A Practical Guide to Data Structures and Algorithms Using Java

Sally A. Goldman and Kenneth J. Goldman
Washington University in St. Louis

Course Lectures

The lectures are provided in two formats: video and pdf. The videos are avi files that include full audio and high resolution screen capture of all activity on the tablet PC. The pdf files show the complete content of each "page" of the lecture. Used in conjunction, the pdf files are helpful to quickly identify sections of lectures that you may want to view in detail within the videos.

These lectures can be used in several ways:

Lecture 1: Introduction

Course Introduction (7:54) [lecture notes]
We describe the goals of this course.

Motivating asymptotic complexity (37:48) [lecture notes]
We use the problem of finding the closest-pair of points in the plane to help motivate the standard asymptotic time complexity analysis that is used by computer scientists. One goal is to demonstrate that the asymptotic value of the number of lines of code executed is a good "back-of-the-envelope" calculation that can be used to evaluate which algorithms will run fastest in practice. To help make this point, we report on empirical tests comparing execution time to the number of statements executed. We stress the value of being able to quickly rule out inferior algorithms, so that the time spent on coding and empirical testing can be used for the most promising algorithms.

Introduction of the closest-pair problem (15:39) [lecture notes]
We present the closest-pair problem, which is used at the start of the course to motivate the theoretical foundations that are going to be used throughout the course. This problem has been selected since it helps students to appreciate that even though they may be able to solve a problem from first principles, this course can help them learn to develop more efficient solutions. This lecture component also covers the brute force algorithm that computes the distance between all pairs of points, and has a high-level overview of the divide-and-conquer algorithm, built up by combining ideas suggested by the class.

Analysis of the brute force closest-pair algorithm (6:35) [lecture notes]
We analyze the brute force closest pair algorithm that computes the distance between all pairs of points including a brief discussion on ways to compute the number of pairs of points from first principles for anyone without combinatorics background.

Empirical comparisons of running times (5:19) [lecture notes]
We present actual running times (in seconds) for the brute force and divide-and-conquer algorithms for the closest-pair problem for differing input sizes on two different computers that vary significantly in processor speed. The goal is to help students appreciate the difference in real time between an O(n log n) and O(n2) algorithm.

Lecture 2: Divide-and-Conquer Algorithms

Divide-and-conquer algorithm design technique (6:45) [lecture notes] [handout]
We describe the divide-and-conquer technique for algorithm design.

Merge sort algorithm (14:12) [lecture notes]
We present merge sort as an example of the divide-and-conquer technique.

Detailed discussion of the divide-and-conquer closest pair algorithm (31:34) [lecture notes]
We expand upon the high-level ideas introduced in the first lecture for an in-depth discussion of the divide-and-conquer closest pair algorithm.

Handling degeneracies in the divide-and-conquer closest-pair algorithm (14:13) [lecture notes]
We discuss the problem that could occur when two or more points share the same x-coordinate along the line that divides the left and right halves. In particular, we describe how a method (called leftOf) can be used to define a unique ordering of the points that is sorted with respect to the x-coordinate to resolve this potential difficulty.

Lecture 3: Divide-and-Conquer Algorithms (cont.)

Pseudocode for the divide-and-conquer closest pair algorithm (8:02) [lecture notes]
We wrap up our discussion of the divide-and-conquer closest pair algorithm by overviewing it through pseudo-code.

Correctness of the divide-and-conquer closest pair algorithm (6:51) [lecture notes]
We argue that the divide-and-conquer algorithm is guaranteed to return the correct answer for all possible inputs. While the discussion uses this problem as an example, it addresses more generally how induction can be used to prove the correctness of any divide-and-conquer algorithm.

Analysis of the divide-and-conquer closest pair algorithm (42:43) [lecture notes]
We analyze the asymptotic time complexity of the closest pair algorithm. While this lecture motivates the benefits of the master method (a "cookbook" approach to solve recurrences of a specified form), it uses the recursion tree technique to solve the recurrence.

Lecture 4: Asymptotic Notation

Asymptotic Notation (38:19) [lecture notes]
We formally define asymptotic notation (big-Oh, big-Omega, big-Theta, little-oh, little-omega), and discuss the limiting behavior as n approaches infinity. We build upon relationships already understood in working with inequalities to develop intuition about asymptotic notation.

Working with asymptotic notation (31:10) [lecture notes]
We use example problems to develop a better understanding of asymptotic notation. It's important to be able to determine relationships between pairs of functions. As part of this lecture, L'Hopital's rule for computing limits is reviewed.

Analyzing code fragments that are nested loops (6:17) [lecture notes]
We analyze the asymptotic time complexity of example code fragments with nested loops.

Lecture 5: Analyzing Divide-and-Conquer Algorithms

Analyzing divide-and-conquer algorithms (12:36) [lecture notes]
We discuss how to define a recurrence relation to express the asymptotic time complexity of a divide-and-conquer algorithm. Next we present the master method to quickly get an asymptotic solution, or how to use a recursion tree to obtain an exact solution.

Master method to asymptotically solve a recurrence relation (19:04) [lecture notes]
We present the master method, a "cookbook" method to asymptotically solve typical recurrence relations that occur for divide and conquer algorithms when all subproblems are roughly the same size. In particular, it can be used to solve recurrences of the form T(n) = a T(n/b) + f(n) where f(n) is of the form nl(log n)k for constants l &ge 0 and k &ge 0.

Exactly solving a recurrence relation (29:50) [lecture notes]
We illustrate how to use a recursion tree to exactly solve recurrence relations that typical of divide and conquer algorithms.

Lecture 6: ADT Taxonomy Part I and Positional Collection ADT

Overview of the ADT taxonomy (11:57) [lecture notes]
We overview the ADT taxonomy used throughout the book.

Introduction of the manually positioned collection (2:42) [lecture notes]
We provide an introduction to manually positioned collections.

A discussion of the positional collection data structures (62:29) [lecture notes]
We present positional collection data structures, including the singly-linked list, doubly-linked list, array, dynamic array, circular array, and tracked array.

Lecture 7: Quicksort, Expected time complexity

Quicksort (59:31) [lecture notes]
We present quicksort with median-of-three partitioning and also randomized quicksort. We also discuss the common optimization of terminating when the input size is a small value (around 30), and then using insertion sort to complete the sort.

Expected time complexity definition (9:00) [lecture notes]
We formally define expected time complexity.

Analysis of randomized quicksort (10:11) [lecture notes]
We analyze the expected time complexity of randomized quicksort.

Lecture 8: Adversary Lower Bound Technique

Adversary lower bound technique (40:47) [lecture notes] [handout]
Using "20 questions" as an example problem, we describe the adversary lower bound technique. In particular, we focus on the basic adversary strategy of making a list of all possible solutions, and then answering to maximize the number of remaining solutions consistent with all past answers. This adversary strategy gives the same bounds as one obtains using the decision tree technique.

Comparison-based sorting lower bound (16:17) [lecture notes]
We apply the adversary lower bound technique to prove an Ω(n log n) lower bound for any comparison-based sorting algorithm.

Lower bound for finding the minimum (13:08) [lecture notes]
We use the problem of finding the minimum element in an array to illustrate that for some problems alternate adversary strategies can yield a better lower bound.

Lecture 9: Linear Time Sorting Algorithms

Counting sort (24:39) [lecture notes]
We present counting sort.

Digitizer interface (9:03) [lecture notes]
We describe the digitizer interface. Similar to the way in which Java's Comparator interface allows a comparison-based sorting algorithm to be applied to any data elements that have a comparator, our Digitizer interface allows any sorting algorithm that treats each element as a sequence of digits to be applied to any data elements for which a Digitizer is defined.

Radix sort (50:19) [lecture notes]
We present radix sort.

Bucket sort (7:19) [lecture notes]
We present bucket sort.

Lecture 10: ADT Taxonomy Part II, Set ADT

ADT taxonomy for algorithmically positioned collections (38:28) [lecture notes]
We return to our earlier discussion of the ADT taxonomy used in the book, focusing on algorithmically positioned collections.

Set ADT (6:58) [lecture notes]
We describe the Set abstract data type.

Lecture 11: Direct Addressing and Open Addressing

Direct addressing (18:56) [lecture notes]
We present the direct addressing data structure.

Introduction to hashing based Set data structures (27:39) [lecture notes]
We introduce hashing, and provide an overview of separate chaining and open addressing.

Open addressing (44:32) [lecture notes]
We present the open addressing data structure.

Lecture 12: Separate Chaining

Separate Chaining (10:18) [lecture notes]
We present the separate chaining data structure.

Comparison of the trade-offs between Set data structures (11:08) [lecture notes]
We systematically compare the advantages and disadvantages of direct addressing, open addressing, and separate chaining.

Lecture 13: Priority Queue ADT and the Binary Heap Data Structure

Priority Queue ADT (7:00) [lecture notes]
We describe the priority queue abstract data type.

Binary Heap (56:34) [lecture notes]
We present the binary heap data structure.

Tracked Binary Heap (5:25) [lecture notes]
We illustrates how the elements stored in a binary heap can be tracked.

Comparison of the trade-offs between priority queue data structures (6:14) [lecture notes]
We systematically compares the advantages and disadvantages of a binary heap, leftist heap, pairing heap, and Fibonacci heap.

Lecture 14: Ordered Collection ADT, Balanced Search Trees

Ordered collection ADT (5:31) [lecture notes]
We describe the ordered collection abstract data type.

Abstract search tree (14:24) [lecture notes]
We discuss the generalization of a binary search tree in which each node can have any number of elements per node. As part of this discussion, we present the properties of an abstract search tree that allow efficient search, and discuss how an inorder traversal can be used to iterate over the elements in sorted order.

Binary search tree review (17:23) [lecture notes]
We review the binary search tree data structure with a focus on how to find the successor (and symmetrically predecessor) of an element, and how to remove an element.

Balanced search tree (27:03) [lecture notes]
We describe the rotations that are used to help create balanced binary search trees. This discussion includes a high-level overview of the ways in which data structures that maintain a balanced binary search tree select when to perform rotations.

Review for Midterm

Midterm topics (15:07) [lecture notes]
A review of the topics that will be covered on the midterm.

Lecture 15: Red-Black Trees

Red-black tree properties (12:24) [lecture notes]
We introduce the red-black tree by describing its properties. We also derive an upper bound on the depth of a red-black tree holding n elements.

Red-black tree methods (54:49) [lecture notes]
We describe the red-black tree methods, starting with a brief discussion of how the non-mutating methods are exactly as for a standard binary search tree. Then we present the mutators. Insertion is presented in depth, followed by a discussion of the high-level approach used by deletion, but without the details of the specific cases that occur.

Lecture 16: B-Trees

Intuition behind the B-Tree design (33:19) [lecture notes]
We discuss secondary storage and use it to motivate the design of a B-tree in terms of its goal of reducing the number of page faults during a search.

B-Tree properties (7:00) [lecture notes]
We describe the B-Tree properties.

B-Tree insertion (19:49) [lecture notes]
We describe both the bottom-up and top-down B-Tree insertion methods.

2-3-4 trees and their relation to red-black trees (17:08) [lecture notes]
We describe the 2-3-4 tree, which is simply a B-tree of order 2 (t=2), and its relationship to the red-black tree.

Lecture 17: B-Trees (cont.) and B+-Trees

B-Tree deletion (18:47) [lecture notes]
We describe the B-Tree deletion with a focus on the high-level ideas to develop a good intuition behind the design of this method.

B-Tree analysis (16:27) [lecture notes]
We derive an upper bound on the height of the B-tree, and use it to analyze the asymptotic time complexity of the B-Tree methods.

B+-tree (17:31) [lecture notes]
We briefly overview the B+-Tree data structure.

Lecture 18: Skip Lists

Intuition behind the skip list design (18:54) [lecture notes]
We motivate the design of a skip list in terms of additional navigation support for a doubly-linked list.

Skip list representation (5:24) [lecture notes]
We describe the internal representation of the skip list.

Skip list methods (17:17) [lecture notes]
We describe the skip list methods, focusing on searching, insertion, and deletion.

Lecture 19: Skip List Analysis, and Comparisons of Ordered Collection Data Structures

Skip list analysis (30:55) [lecture notes]
We analyze the time complexities of the skip list methods.

Relationship between a skip list and B+-tree (3:50) [lecture notes]
We describe the relationship between a skip list and B+-tree.

Comparison and trade-offs among ordered collection data structures (8:39) [lecture notes]
We systematically compare the ordered collection data structures.

Lecture 20: Digitized Ordered Collections, Spatial Collections/k-d Tree

Building a trie (16:02)
This video shows a short visualization of building a trie.

Digitized ordered collection ADT and data structures (33:43) [lecture notes]
We describe the digitized ordered collection abstract data type, and the illustrate by example the trie, compact trie, compressed trie, suffix tree, and indexing trie data structures..

Spatial collection ADT (9:37) [lecture notes]
We describe the spatial collection abstract data type.

kd-tree data structure (14:34) [lecture notes]
We describe the kd-tree data structure for the spatial collection.

Quad tree data structure overview(3:26) [lecture notes]
We briefly describe the quad tree data structure for the spatial collection.

Lecture 21: Graph Problems and Graph Representations

Introduction to graphs (23:50) [lecture notes]
We introduce graphs and describe a variety of problems that are best modeled by graphs.

Graph representations (64:38) [lecture notes]
After reviewing the different types of graphs, we discuss the adjacency list, adjacency set, and adjacency matrix representations and the trade-offs between them.

Lecture 22: Breadth-First Search (BFS)

Breadth-first search (51:17) [lecture notes]
We consider the problem of finding the shortest path in an unweighted path using breadth-first search (bfs). We also introduce the shortest path tree as a way to represent the output from breadth-first search.

Lecture 23: Dijkstra's Shortest Path Algorithm

Dijkstra's single-source shortest path algorithm (38:55) [lecture notes]
We present Dijkstra's algorithm to find the shortest path in a weighted graph (with non-negative edge weights) to all vertices from a specified source vertex.

Short overview of Dijkstra's algorithm (11:32) [lecture notes]
A closing overview of Dijkstra's single-source shortest path algorithm.

Guidance on the design for Project 4 -- Travel Agent Application of Dijkstra's Shortest Path Algorithm (22:37) [lecture notes]
An overview of the internals of the implementation for the last project.

Lecture 24: Greedy Tree Builder

Minimum spanning tree problem definition (4:15) [lecture notes]
We define the minimum spanning tree problem.

Maximum bottleneck problem definition (3:11) [lecture notes]
We define the maximum bottleneck problem.

Greedy tree builder (55:41) [lecture notes]
We define a generalization of Dijkstra's single-source shortest path problem and also Prim's minimum spanning tree problem. We discuss how the greedy tree builder can be used to solve either of these two problems, and also to solve the maximum bottleneck problem.

Time complexity analysis of the greedy tree builder (15:45) [lecture notes]
We analyze the time complexity of the greedy tree builder (and hence of both Dijkstra's single-source shortest path algorithm and Prim's minimum spanning tree algorithm) in terms of the time complexity of the priority queue ADT methods that dominate the time complexity of the greedy tree builder.

Lecture 25: Prim's and Kruskal's Minimum Spanning Tree (MST) Algorithm

Correctness argument for Dijkstra's algorithm (23:35) [lecture notes]
We argue that Dijkstra's single-source shortest path algorithm is correct.

Correctness argument for Prim's minimum spanning tree algorithm (23:44) [lecture notes]
We argue that Prim's minimum spanning tree algorithm is correct.

Kruskal's minimum spanning tree algorithm (11:48) [lecture notes]
We present Kruskal's minimum spanning tree algorithm and prove its correctness. We also briefly introduce the union-find data structure as a way to efficiently implement Kruskal's algorithm. Finally, we analyze the time complexity of Kruskal's algorithm and briefly discuss it in relation to Prim's algorithm.

Lecture 26: DFS and Topological Sort

Depth-first search (33:43) [lecture notes]
We present depth-first search (dfs).

Topological sort (30:14) [lecture notes]
We present how depth-first search can be used to order a set of processes in a precedence graph, and prove that the algorithm is correct. This problem is called topological sort since it is trying to order (or sort) the processes in a way that adheres to the given precedence constraints.

Strongly connected components (5:51) [lecture notes]
We define a strongly connected components, and then briefly describe how depth-first search can be used to solve this problem.

Lecture 27: Garbage Collection Algorithms

What is garbage collection? (11:56) [lecture notes]
We discuss memory management and how one defines which memory cells are garbage.

Mark-and-sweep garbage collection algorithm (51:02) [lecture notes]
We describe the mark-and-sweep garbage collection algorithm, including in-place depth-first search and why that is so important for this application. We close with a brief analysis of the mark-and-sweep algorithm and an example where novice Java programmers often unnecessarily create garbage.

Copying garbage collection algorithm (14:16) [lecture notes] We briefly overview the copying collection garbage collection algorithm and discuss the trade-off between mark-and-sweep and copying collection.

Lecture 28: Course Review

Graphs (15:23) [lecture notes]
We review the portion of the course on graphs and graph algorithms.

ADT taxonomy (11:59) [lecture notes]
We review the ADT taxonomy introduced and used throughout this course to select the best ADT(s) for a given application.

Data structures (13:12) [lecture notes]
We review the data structures covered in this course.

Other topics (7:05) [lecture notes]
We review the remaining topics in this course.