1
Data Structures Introduction
Alon Halevy
2
Clever? Efficient?
Operations: Insert, Delete, Find, Merge, Shortest Paths, Union.
Data Structures: Lists, Stacks, Queues, Heaps, Binary Search Trees, AVL Trees, Hash Tables, Graphs, Disjoint Sets.
Data Structures + Algorithms.
3
Used Everywhere! Graphics, Theory, AI, Applications, Systems. Mastery of this material sets you apart. Perhaps the most important course in your CS curriculum! Guaranteed non-obsolescence!
4
Anecdote #1: An O(N²) “pretty print” routine nearly dooms a major expert system project at AT&T. 10 MB of data = 10 days (at 100 MIPS). The programmer was brilliant, but he skipped 326…
5
Asymptotic Complexity
Our notion of efficiency: how the running time of an algorithm scales with the size of its input. Several ways to further refine: worst case, average case, amortized over a series of runs.
6
The Apocalyptic Laptop
Seth Lloyd, SCIENCE, 31 Aug 2000
7
[Chart: number of operations performed by the Ultimate Laptop in 1 second, 1 day, and 1 year, compared with a 1000 MIPS machine running since the Big Bang]
8
Specific Goals of the Course
Become familiar with some of the fundamental data structures in computer science. Improve your ability to solve problems abstractly: data structures are the building blocks. Improve your ability to analyze your algorithms: prove correctness, gauge (and improve) time complexity. Become modestly skilled with the UNIX operating system (you’ll need this in upcoming courses). This course is designed to familiarize you with the most basic and important data structures in computer science, the ones that will form the foundation of all your future work with computers. Moreover, you’ll learn how to analyze your programs and data structures so that you know how well they work and what sort of effort in the program is acceptable. These are the goals of the course as well as my expectations of you.
9
One Preliminary Hurdle
Recall what you learned in CSE 321: proofs by mathematical induction, proofs by contradiction, formulas for calculating sums and products of series, recursion. Know Sections 1.1–1.4 of the text by heart!
10
A Second Hurdle: Unix Experience. 1975 all over again!
Try to log in, edit, create a Makefile, and compile your favorite “hello world” program right away. Programming Project #1 will be distributed Wednesday. Bring your questions and frustrations to Section on Thursday!
11
A Third Hurdle: Templates
class Set_of_ints {
 public:
  void insert( int x );
  bool is_member( int x );
  …
};

template <class Obj>
class Set {
 public:
  void insert( Obj x );
  bool is_member( Obj x );
  …
};

Set<int> SomeNumbers;
Set<char *> SomeWords;
12
In Every Silver Lining, There’s a Big Dark Cloud – George Carlin
Templates were invented 12 years ago, and still no compiler correctly implements them! Using templates with multiple source files is tricky; see the Course Web pages and TAs for the best way. MAINTAINING SANITY RULE: Write/debug first without templates. Templatize as needed. Keep it simple!
13
Handy Libraries From Weiss
vector<int> MySafeIntArray; vector<double> MySafeFloatArray; string MySafeString;
Like arrays and char*, but provide bounds checking and memory management. STL (Standard Template Library): most of CSE 326 in a box; don’t use (unless told); we’ll be rolling our own.
14
C++ Data Structures. One of the all-time great books in computer science: The Art of Computer Programming by Donald Knuth. Examples in assembly language (and English)! American Scientist says: in the top 12 books of the CENTURY! Very little about C++ in class.
15
Abstract Data Types
Abstract Data Type (ADT): a mathematical description of an object and the set of operations on the object; tradeoffs! Given that this is computer science, I know you’d be disappointed if there were no acronyms in the class. Here’s our first one! Now, what an ADT really is, is the interface of a data structure without any specification of the implementation. In this class, we’ll study groups of data structures to implement any given abstract data type. In that context… Data Types: integer, array, pointers, … Algorithms: binary search, quicksort, …
16
ADT Presentation Algorithm
Present an ADT. Motivate with some applications. Repeat until it’s time to move on: develop a data structure and algorithms for the ADT; analyze its properties: efficiency, correctness, limitations, ease of programming. Contrast strengths and weaknesses. Given those definitions, here’s our first algorithm. This is how I’m going to try to present each set of data structures to you. You should hold me to this! You’re not getting enough out of the presentation if you don’t see these. And look, here’s an ADT now…
17
First Example: Queue ADT
Queue operations: create, destroy, enqueue, dequeue, is_empty. Queue property: if x is enqueued before y is enqueued, then x will be dequeued before y is dequeued. FIFO: First In First Out. [diagram: elements G … A passing through the queue from enqueue to dequeue] You’ve probably seen the Queue before. If so, this is a review and a way for us to get comfortable with the format of data structure presentations in this class. If not, this is a simple but very powerful data structure, and you should make sure you understand it thoroughly. This is an ADT description of the queue. Notice that there are no implementation details. Just a general description of the interface and important properties of those interface methods.
18
Applications of the Q: Hold jobs for a printer.
Store packets on network routers. Make waitlists fair. Breadth-first search. Qs are used widely in computer science. This is just a handful of the high-profile uses, but _many_ programs use queues.
19
Circular Array Q Data Structure
[diagram: a circular array of `size` cells holding b c d e f, with front and back indices]
enqueue(Object x) { Q[back] = x; back = (back + 1) % size; }
dequeue() { x = Q[front]; front = (front + 1) % size; return x; }
How do we test for an empty queue? How do we find the k-th element in the queue? What is the complexity of these operations? Limitations of this structure? Here is a data structure implementation of the Q. The queue is stored as an array, and, to avoid shifting all the elements each time an element is dequeued, we imagine that the array wraps around on itself. This is an excellent example of how implementation can affect interface: notice the “is_full” function. There’s also another problem here. What’s wrong with the Enqueue and Dequeue functions? Your data structures should be robust! Make them robust before you even consider thinking about making them efficient! That is an order! (A more robust sketch follows below.)
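As a concrete illustration of the robustness point, here is a minimal sketch of a bounded circular-array queue in C++ that explicitly checks the empty and full cases. The class and member names are illustrative, not taken from the course code.

#include <cassert>
#include <cstddef>

// A fixed-capacity circular queue of ints (sketch).
class CircularQueue {
 public:
  explicit CircularQueue(std::size_t capacity)
      : data_(new int[capacity]), size_(capacity), front_(0), count_(0) {}
  ~CircularQueue() { delete[] data_; }

  bool is_empty() const { return count_ == 0; }
  bool is_full()  const { return count_ == size_; }

  void enqueue(int x) {
    assert(!is_full());                    // robust: refuse to overwrite
    data_[(front_ + count_) % size_] = x;  // back position is derived
    ++count_;
  }

  int dequeue() {
    assert(!is_empty());                   // robust: refuse to underflow
    int x = data_[front_];
    front_ = (front_ + 1) % size_;
    --count_;
    return x;
  }

 private:
  int* data_;
  std::size_t size_, front_, count_;
};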
20
Linked List Q Data Structure
[diagram: linked list b → c → d → e → f, with front and back pointers]
enqueue(Object x) { back->next = new Node(x); back = back->next; }
dequeue() { saved = front->data; temp = front; front = front->next; delete temp; return saved; }
What are the tradeoffs? Simplicity, speed, robustness, memory usage. Notice the tricky memory management.
21
To Do Return your survey before leaving!
Sign up on the cse326 mailing list. Check out the web page. Log on to the PCs in course labs and access an instructional UNIX server. Read Chapters 1 and 2 in the book.
22
Data Structures Analysis of Algorithms
Alon Halevy
23
Analysis of Algorithms
Analysis of an algorithm gives insight into how long the program runs and how much memory it uses: time complexity, space complexity. Why useful? Input size is indicated by a number n; sometimes we have multiple inputs, e.g. m and n. Running time is a function of n, e.g. n, n², n log n, n log(n²) + 5n³.
24
Simplifying the Analysis
Eliminate low-order terms: 4n → 4n; 0.5 n log n − 2n → 0.5 n log n; 2^n + n³ + 3n → 2^n. Eliminate constant coefficients: 4n → n; 0.5 n log n → n log n; log(n²) = 2 log n → log n; log₃ n = (log₃ 2) log n → log n. We didn’t get very precise in our analysis of the UWID info finder; why? We didn’t know the machine we’d use. Is this always true? Do you buy that coefficients and low-order terms don’t matter? When might they matter? (Linked list memory usage)
25
Order Notation
BIG-O: T(n) = O(f(n)): upper bound; there exist constants c and n₀ such that T(n) ≤ c·f(n) for all n ≥ n₀.
OMEGA: T(n) = Ω(f(n)): lower bound; T(n) ≥ c·f(n) for all n ≥ n₀.
THETA: T(n) = θ(f(n)): tight bound; T(n) is both O(f(n)) and Ω(f(n)).
We’ll use some specific terminology to describe asymptotic behavior. There are some analogies here that you might find useful.
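For reference, the three bounds written out formally (a standard formulation, stated here for completeness rather than copied from the slide):

$$T(n) = O(f(n)) \iff \exists\, c, n_0 > 0 \text{ such that } T(n) \le c\, f(n) \text{ for all } n \ge n_0$$
$$T(n) = \Omega(f(n)) \iff \exists\, c, n_0 > 0 \text{ such that } T(n) \ge c\, f(n) \text{ for all } n \ge n_0$$
$$T(n) = \Theta(f(n)) \iff T(n) = O(f(n)) \text{ and } T(n) = \Omega(f(n))$$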
26
Examples
n² + 100n = O(n²) = Ω(n²) = θ(n²):
(n² + 100n) ≤ 2n² for n ≥ 10; (n² + 100n) ≥ 1·n² for n ≥ 0.
n log n = O(n²); n log n = θ(n log n); n log n = Ω(n).
27
More on Order Notation
Order notation is not symmetric; write 2n² + 4n = O(n²), but never O(n²) = 2n² + 4n: the right-hand side is a crudification of the left. Likewise O(n²) = O(n³) and Ω(n³) = Ω(n²).
28
A Few Comparisons: Function #1 vs. Function #2:
n³ + 2n² vs. 100n² + 1000
n^0.1 vs. log n
n + 100n^0.1 vs. 2n + 10 log n
5n⁵ vs. n!
n^(-15)·2^n/100 vs. 1000n^15
8^(2 log n) vs. 3n⁷ + 7n
29
Race I: n³ + 2n² vs. 100n² + 1000
30
Race II: n^0.1 vs. log n. Well, log n looked good out of the starting gate and indeed kept on looking good until about n^17, at which point n^0.1 passed it up forever. Moral of the story? n^ε beats log n for any ε > 0. BUT, which one of these is really better?
31
Race III: n + 100n^0.1 vs. 2n + 10 log n. Notice that these just look like n and 2n once we get way out. That’s because the larger terms dominate. So, the left is less, but not asymptotically less. It’s a TIE!
32
Race IV: 5n⁵ vs. n!. n! is BIG!!!
33
Race V: n^(-15)·2^n/100 vs. 1000n^15. No matter how you put it, any exponential beats any polynomial. It doesn’t even take that long here (input size ~250).
34
Race VI: 8^(2 log n) vs. 3n⁷ + 7n. We can reduce the left-hand term to n⁶, so they’re both polynomials and it’s an open and shut case.
35
The Losers Win: the asymptotically smaller function is the better algorithm!
Race I: n³ + 2n² vs. 100n² + 1000; winner: Function #2, O(n²)
Race II: n^0.1 vs. log n; winner: Function #2, O(log n)
Race III: n + 100n^0.1 vs. 2n + 10 log n; TIE, both O(n)
Race IV: 5n⁵ vs. n!; winner: Function #1, O(n⁵)
Race V: n^(-15)·2^n/100 vs. 1000n^15; winner: Function #2, O(n^15)
Race VI: 8^(2 log n) vs. 3n⁷ + 7n; winner: Function #1, O(n⁶)
Welcome, everyone, to the Silicon Downs. I’m getting race results as we stand here. Let’s start with the first race. I’ll have the first row bet on race #1. Raise your hand if you bet on function #1 (the jockey is n^0.1). And so on. Show the race slides after each race.
36
Common Names
constant: O(1); logarithmic: O(log n); linear: O(n); log-linear: O(n log n); superlinear: O(n^(1+c)) (c is a constant > 0); quadratic: O(n²); polynomial: O(n^k) (k is a constant); exponential: O(c^n) (c is a constant > 1).
Well, it turns out that the old Silicon Downs is fixed. They dope up the horses to make the first few laps interesting, but we can always find out who wins. Here’s a chart comparing some of the functions. Notice that any exponential beats any polynomial. Any superlinear beats any poly-log-linear. Also keep in mind (though I won’t show it) that sometimes the input has more than one parameter, like if you take in two strings. In that case you need to be very careful about what is constant and what can be ignored. O(log m + 2n) is not necessarily O(2n).
37
Kinds of Analysis
Running time may depend on the actual data input, not just the length of the input. Distinguish: worst case (your worst enemy is choosing the input), best case, average case (assumes some probabilistic distribution of inputs), amortized (average time over many operations). We already discussed the bound flavor. All of these can be applied to any analysis case. For example, we’ll later prove that sorting in the worst case takes at least n log n time. That’s a lower bound on a worst case. Average case is hard! What does “average” mean? For example, what’s the average case for searching an unordered list (as precise as possible, not asymptotic)? WRONG! It’s about n, not n/2. Why? You have to search the whole thing if the element is not there. Note there are two senses of tight. I’ll try to avoid the terminology “asymptotically tight” and stick with the lower definition of tight. O(∞) is not tight!
38
Analyzing Code
C++ operations: constant time. Consecutive statements: sum of times. Conditionals: sum of branches plus condition. Loops: sum of iterations. Function calls: cost of function body. Recursive functions: solve a recursive equation. Above all, use your head!
39
Nested Loops for i = 1 to n do for j = 1 to n do sum = sum + 1
This example is pretty straightforward. Each loop goes N times, constant amount of work on the inside. N*N*1 = O(N^2)
40
Nested Dependent Loops
for i = 1 to n do for j = i to n do sum = sum + 1
There’s a little twist here: j goes from i to n, not 1 to n. So, let’s do the sums. The inside is constant. The next loop is the sum from j = i to n of 1, which equals n − i + 1. The outer loop is the sum from i = 1 to n of (n − i + 1). That’s the same as the sum from 1 to n of i, which is n(n+1)/2, or O(n²). (See the derivation below.)
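Writing the same count as a summation (standard algebra, included only to spell out the step):

$$\sum_{i=1}^{n} \sum_{j=i}^{n} 1 \;=\; \sum_{i=1}^{n} (n - i + 1) \;=\; \sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2} \;=\; O(n^2)$$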
41
Conditionals
if C then S1 else S2: time ≤ time(C) + max( time(S1), time(S2) ). OK, so this isn’t exactly an example, just reiterating the rule: time ≤ time of C plus the max of S1 and S2 ≤ time of C plus S1 plus S2. For loops: time ≤ sum of the times of the iterations, often (# of iterations) × time of S (or worst time of S).
42
Coming Up (Thursday, Friday)
Unix tutorial; first programming project! Finishing up analysis; a little on Stacks and Lists; Homework #1 goes out.
43
Data Structures Analysis of Recursive Algorithms
Alon Halevy
45
Recursion A recursive procedure can often be analyzed by solving a recursive equation Basic form: T(n) = if (base case) then some constant else ( time to solve subproblems + time to combine solutions ) Result depends upon how many subproblems how much smaller are subproblems how costly to combine solutions (coefficients) You may want to take notes on this slide as it just vaguely resembles a homework problem! Here’s a function defined in terms of itself. You see this a lot with recursion. This one is a lot like the profile for factorial. WORK THROUGH Answer: O(n)
46
Example: Sum of Integer Queue
sum_queue(Q){ if (Q.length == 0) return 0; else return Q.dequeue() + sum_queue(Q); }
One subproblem. Linear reduction in size (decrease by 1). Combining: constant c (+), 1 × subproblem.
Equation: T(0) ≤ b; T(n) ≤ c + T(n − 1) for n > 0.
Here’s a function defined in terms of itself. You see this a lot with recursion. This one is a lot like the profile for factorial. WORK THROUGH. Answer: O(n)
47
Sum, Continued
Equation: T(0) ≤ b; T(n) ≤ c + T(n − 1) for n > 0.
Solution: T(n) ≤ c + c + T(n − 2) ≤ c + c + c + T(n − 3) ≤ kc + T(n − k) for all k; taking k = n gives ≤ nc + T(0) ≤ cn + b = O(n).
48
Example: Binary Search
7 12 30 35 75 83 87 90 97 99
One subproblem, half as large.
Equation: T(1) ≤ b; T(n) ≤ T(n/2) + c for n > 1.
Solution: T(n) ≤ T(n/2) + c ≤ T(n/4) + c + c ≤ T(n/8) + c + c + c ≤ T(n/2^k) + kc ≤ T(1) + c log n where k = log n ≤ b + c log n = O(log n).
Generally, then, the strategy is to keep expanding these things out until you see a pattern. Then, write the general form. Finally, sub in for the series bounds to make T(?) come out to a known value and solve all the series. Tip: Look for powers/multiples of the numbers that appear in the original equation. (A code sketch follows below.)
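For concreteness, here is a minimal recursive binary search over a sorted array, matching the T(n) ≤ T(n/2) + c recurrence above; the function name and signature are illustrative, not the course’s own code.

#include <vector>

// Returns the index of x in the sorted vector v, or -1 if x is absent.
// Each call does constant work and recurses on half the range,
// giving the T(n) <= T(n/2) + c recurrence, i.e. O(log n) time.
int binary_search(const std::vector<int>& v, int x, int lo, int hi) {
  if (lo > hi) return -1;                 // empty range: not found
  int mid = lo + (hi - lo) / 2;           // avoids overflow of (lo + hi)
  if (v[mid] == x) return mid;
  if (v[mid] < x)  return binary_search(v, x, mid + 1, hi);
  return binary_search(v, x, lo, mid - 1);
}

// Usage: binary_search(v, 83, 0, (int)v.size() - 1) on the sorted data above.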
49
Example: MergeSort Split array in half, sort each half, merge together
2 subproblems, each half as large; a linear amount of work to combine.
T(1) ≤ b; T(n) ≤ 2T(n/2) + cn for n > 1.
T(n) ≤ 2T(n/2) + cn ≤ 2(2T(n/4) + cn/2) + cn = 4T(n/4) + cn + cn ≤ 4(2T(n/8) + c(n/4)) + cn + cn = 8T(n/8) + cn + cn + cn ≤ 2^k T(n/2^k) + kcn ≤ 2^k T(1) + cn log n where k = log n = O(n log n).
This is the same sort of analysis as the last slide. Here’s a function defined in terms of itself. WORK THROUGH. Answer: O(n log n). Generally, then, the strategy is to keep expanding these things out until you see a pattern. Then, write the general form. Finally, sub in for the series bounds to make T(?) come out to a known value and solve all the series. Tip: Look for powers/multiples of the numbers that appear in the original equation.
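A compact sketch of the split/sort/merge structure described above, in C++; details such as the use of std::inplace_merge are a convenience of this sketch, not the course’s own code.

#include <algorithm>
#include <vector>

// Sorts v[lo, hi) by splitting in half, sorting each half recursively,
// and merging; the merge is the linear "combine" step in T(n) = 2T(n/2) + cn.
void merge_sort(std::vector<int>& v, int lo, int hi) {
  if (hi - lo <= 1) return;                // base case: 0 or 1 elements
  int mid = lo + (hi - lo) / 2;
  merge_sort(v, lo, mid);                  // sort left half
  merge_sort(v, mid, hi);                  // sort right half
  std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
}

// Usage: merge_sort(v, 0, (int)v.size());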
50
Example: Recursive Fibonacci
int Fib(n){ if (n == 0 or n == 1) return 1; else return Fib(n - 1) + Fib(n - 2); }
Running time, lower bound analysis: T(0), T(1) ≥ 1; T(n) ≥ T(n − 1) + T(n − 2) + c if n > 1.
Note: T(n) ≥ Fib(n). Fact: Fib(n) ≥ (3/2)^n, so T(n) = Ω((3/2)^n). Why?
This is the same sort of analysis as the last slide. Here’s a function defined in terms of itself. WORK THROUGH. Answer: exponential time. Generally, then, the strategy is to keep expanding these things out until you see a pattern. Then, write the general form. Finally, sub in for the series bounds to make T(?) come out to a known value and solve all the series.
51
Direct Proof of Recursive Fibonacci
int Fib(n) { if (n == 0 or n == 1) return 1; else return Fib(n - 1) + Fib(n - 2); }
Lower bound analysis: T(0), T(1) ≥ b; T(n) ≥ T(n − 1) + T(n − 2) + c if n > 1.
Analysis: let φ be (1 + √5)/2, which satisfies φ² = φ + 1. Show by induction on n that T(n) ≥ bφ^(n−1).
This is the same sort of analysis as the last slide. Here’s a function defined in terms of itself. WORK THROUGH. Answer: exponential. Generally, then, the strategy is to keep expanding these things out until you see a pattern. Then, write the general form. Finally, sub in for the series bounds to make T(?) come out to a known value and solve all the series.
52
Direct Proof Continued
Basis: T(0) ≥ b > bφ^(−1) and T(1) ≥ b = bφ^0.
Inductive step: Assume T(m) ≥ bφ^(m−1) for all m < n. Then
T(n) ≥ T(n − 1) + T(n − 2) + c ≥ bφ^(n−2) + bφ^(n−3) + c = bφ^(n−3)(φ + 1) + c = bφ^(n−3)φ² + c ≥ bφ^(n−1).
53
Fibonacci Call Tree [diagram: the call tree for Fib(5); node 5 calls 4 and 3, node 4 calls 3 and 2, and so on; the same subproblems are recomputed repeatedly]
54
Learning from Analysis
To avoid redundant recursive calls: store all basis values in a table; each time you calculate an answer, store it in the table; before performing any calculation for a value n, check if a valid answer for n is in the table and, if so, return it. Memoization: a form of dynamic programming. How much time does the memoized version take? (A sketch follows below.)
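One possible memoized version of the earlier Fib, sketched in C++; the table type and names are illustrative, not part of the course code. Each value is computed at most once, so the running time drops to O(n).

#include <unordered_map>

// Memoized Fibonacci: answers are cached, so each Fib(k) is computed once.
long long fib_memo(int n, std::unordered_map<int, long long>& table) {
  if (n == 0 || n == 1) return 1;               // basis values
  auto it = table.find(n);                      // check the table first
  if (it != table.end()) return it->second;     // reuse a stored answer
  long long result = fib_memo(n - 1, table) + fib_memo(n - 2, table);
  table[n] = result;                            // store before returning
  return result;
}

// Usage:
//   std::unordered_map<int, long long> table;
//   long long f30 = fib_memo(30, table);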
55
Kinds of Analysis So far we have considered worst case analysis
We may want to know how an algorithm performs “on average”. Several distinct senses of “on average”: amortized (average time per operation over a sequence of operations); average case (average time over a random distribution of inputs); expected case (average time for a randomized algorithm over different random seeds, for any input).
56
Amortized Analysis: Consider any sequence of operations applied to a data structure; your worst enemy could choose the sequence! Some operations may be fast, others slow. Goal: show that the average time per operation is still good.
57
Stack ADT Stack operations
[diagram: a stack holding A B C D E F; F is pushed and popped at the top]
Stack operations: push, pop, is_empty. Stack property: if x is on the stack before y is pushed, then x will be popped after y is popped. What is the biggest problem with an array implementation?
58
Stretchy Stack Implementation
int data[]; int maxsize; int top;
Push(e){
  if (top == maxsize){
    temp = new int[2*maxsize];
    copy data into temp;
    deallocate data;
    data = temp;
    maxsize = 2*maxsize;
  }
  data[++top] = e;   // store the element whether or not we stretched
}
Best case Push = O( )   Worst case Push = O( )
59
Stretchy Stack Amortized Analysis
Consider a sequence of n operations: push(3); push(19); push(2); … What is the max number of stretches? What is the total time? Let’s say a regular push takes time a, and stretching an array containing k elements takes time kb, for some constants a and b. Amortized time = (an + b(2n − 1))/n = a + 2b − b/n = O(1)
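The total-time figure above can be spelled out with a doubling argument (assuming, as a sketch, that the array starts at size 1 and doubles at each stretch):

$$\text{total time for } n \text{ pushes} \;\le\; a n + b\,(1 + 2 + 4 + \cdots + n) \;\le\; a n + b\,(2n - 1)$$
$$\text{amortized time} \;=\; \frac{a n + b(2n-1)}{n} \;=\; a + 2b - \frac{b}{n} \;=\; O(1)$$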
60
Wrapup: Having math fun? Homework #1 out Wednesday – due in one week.
Programming assignment #1 handed out. Next week: linked lists
61
Data Structures Alon Halevy
71
Average Case Analysis: Attempt to capture the notion of “typical” performance. Imagine inputs are drawn from some random distribution. Ideally this distribution is a mathematical model of the real world; in practice it is usually much simpler – e.g., a uniform random distribution.
72
Example: Find a Red Card
Input: a deck of n cards, half red and half black Algorithm: turn over cards (from top of deck) one at a time until a red card is found. How many cards will be turned over? Best case = Worst case = Average case: over all possible inputs (ways of shuffling deck)
73
Summary Asymptotic Analysis – scaling with size of input
Upper bound O, lower bound Ω. O(1) or O(log n): great. O(2^n): almost never okay. Worst case most important – strong guarantee. Other kinds of analysis sometimes useful: amortized, average case.
74
List ADT: ( A₁ A₂ … Aₙ₋₁ Aₙ )
List properties: length = n; Aᵢ precedes Aᵢ₊₁ for 1 ≤ i < n; Aᵢ succeeds Aᵢ₋₁ for 1 < i ≤ n; the size-0 list is defined to be the empty list. Key operations: Find(item) = position; Find_Kth(integer) = item; Insert(item, position); Delete(position); Next(position) = position. What are some possible data structures? Now, back to work! We’re going to talk about lists briefly and quickly get to an idea which I hope you haven’t seen. Lists are sets of values. The type of those values is arbitrary but fixed (can’t change from one to another in the same list). Each value is at a position, and those positions are totally ordered.
75
Implementations of Linked Lists
Array: [diagram: an array of cells 1–10 holding the characters of a short string] Can we apply binary search to an array representation? Linked list: (optional header) (a b c): L → a → b → c
76
Linked List vs. Array vs. Sorted Array
Compare the cost of each operation in a linked list, an array, and a sorted array: Find(item) = position; Find_Kth(integer) = item; Find_Kth(1) = item; Insert(item, position); Insert(item); Delete(position); Next(position) = position.
77
Tradeoffs For what kinds of applications is a linked list best?
Examples for an unsorted array? Examples for a sorted array?
78
Implementing in C++ [diagram: list (a b c) with an optional header node]
Create separate classes for: Node; List (contains a pointer to the first node); List Iterator (specifies a position in a list; basically, just a pointer to a node). Pro: syntactically distinguishes uses of node pointers. Con: a lot of verbiage! Also, is a position in a list really distinct from a list?
79
Data Structures Alon Halevy
84
Other Data Structures for Lists
Doubly Linked List; Circular List. [diagrams: a circular list of 7, 11, 3, 2 and a doubly linked list of c, d, e, f] Advantages/disadvantages (a previous pointer for the doubly linked list)? Your book also describes header nodes. Are they just a hack? I’m not going to go into these, but you should be able to (for a test) add and delete nodes in all these types of list, not to mention for your daily coding needs!
85
Implementing Linked Lists Using Arrays
[table: indices 1–10; Data = F O A R N R T; Next = 3 8 6 4 -1 10 5; First = 2]
“Cursor implementation,” Ch. 3.2.8. Often useful in any language. Can use the same array to manage a second list of unused cells. (A sketch follows below.)
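A minimal sketch of the cursor idea in C++, assuming a fixed pool of cells and a free list threaded through the same arrays; all names here are illustrative rather than taken from the text.

#include <cassert>

const int POOL_SIZE = 10;
char data_[POOL_SIZE];
int  next_[POOL_SIZE];   // index of the next cell, or -1 for end of list
int  free_list = 0;      // head of the list of unused cells

// Thread all cells onto the free list once at startup.
void init_pool() {
  for (int i = 0; i < POOL_SIZE; ++i) next_[i] = i + 1;
  next_[POOL_SIZE - 1] = -1;
  free_list = 0;
}

// Insert c at the front of the list whose head index is `head`;
// returns the new head. Cells are "allocated" by popping the free list.
int push_front(int head, char c) {
  assert(free_list != -1);       // pool exhausted
  int cell = free_list;
  free_list = next_[cell];
  data_[cell] = c;
  next_[cell] = head;
  return cell;
}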
86
Application: Polynomial ADT
Aᵢ is the coefficient of the x^(n−i) term: 3x² + 2x + 5 → ( 3 2 5 ); 8x + 7 → ( 8 7 ); x² + 3 → ( 1 0 3 ). Here’s an application of the list abstract data type as a _data structure_ for another abstract data type. Is there a problem here? Why? Problem?
87
3x^2001 + 4 → ( 3 0 0 … 0 4 ). What is it about lists that makes this a problem here and not in stacks and queues? (Answer: kth(int)!) Is there a solution? Will we get anything but zeroes overwhelming this data structure?
88
Sparse List Data Structure: 3x^2001 + 4
(<4 0> <2001 3>) [diagram: a linked list of (coefficient, exponent) nodes] This slide is made possible in part by the sparse list data structure. Now, two questions: 1) Is a sparse list really a data structure or an abstract data type? (Answer: It depends but I lean toward data structure. YOUR ANSWER MUST HAVE JUSTIFICATION!) 2) Which list data structure should we use to implement it? Linked Lists or Arrays?
89
Addition of Two Polynomials
Similar to merging two sorted lists – O(n + m).
p = 15 + 10x^50 + 3x^1200
q = 5 + 30x^50 + 4x^100
r = p + q = 20 + 40x^50 + 4x^100 + 3x^1200
[diagram: the three polynomials as sparse linked lists of (coefficient, exponent) nodes]
90
Multiple Linked Lists: Many ADTs such as graphs, relations, sparse matrices, and multivariate polynomials use multiple linked lists. Several options: array of lists, lists of lists, multi-lists. General principle throughout the course: use one ADT to implement a more complicated one.
91
Array of Linked Lists: Adjacency List for Graphs
Array G of unordered linked lists; each list entry corresponds to an edge in the graph G. [diagram: a 5-node graph and its adjacency lists] Graphs are a very important data type. You might think as you read about your project if there are any graphs there. Here, we’re implementing graphs with adjacency lists. The reason is that this is a sparse graph. We want to have every node in an array (so we can find the first edge quickly), but we just need the edges around.
92
Reachability by Marking
Suppose we want to mark all the nodes in the graph which are reachable from a given node k. Let G[1..n] be the adjacency-list representation of the graph, and let M[1..n] be the mark array, initially all false.
mark(int i){
  M[i] = true;
  x = G[i];
  while (x != NULL) {
    if (M[x->node] == false) mark(x->node);   // recurse on the neighbor's index
    x = x->next;
  }
}
Here’s an algorithm that works on our adjacency-list graph.
93
Multi-Lists Suppose we have a set of movies and cinemas, and we want a structure that stores which movies are playing where.
94
More on Multi-Lists What if we also want to store the playing times of movies?
95
Data Structures (end of Lists, then) Trees
Alon Halevy
105
Trees
Family Trees; Organization Charts; Classification trees (is this mushroom poisonous?); File directory structure; Parse Trees, e.g. (x+y*z); Search Trees (often better than lists for sorted data).
106
Definition of a Tree
Recursive definition: the empty tree has no root; given trees T₁,…,Tₖ and a node r, there is a tree T where r is the root of T and the children of r are the roots of T₁, T₂, …, Tₖ. [diagram: r with subtrees T₁, T₂, T₃]
107
Tree Terminology: root, child, parent, sibling, path, descendant, ancestor.
[diagram: an example tree with nodes a through l] Let’s review the words: root: A. leaf: D, E, F, J, K, L, M, N, I. child: A - C or H - K; leaves have no children. parent: C - A or L - H; the root has no parent. sibling: D - E or F, or J - K, L, M, or N. grandparent: G to A. grandchild: C to H or I. ancestor: the node itself or any ancestor’s parent. descendant: the node itself or any child’s descendant. subtree: a node and all its descendants.
108
More Tree Terminology: subtree, leaf, depth, height, branching factor, n-ary, complete.
[diagram: the same example tree, nodes a through l]
109
Basic Tree Data Structure
Each node holds two pointers: first_child and next_sibling. [diagram: nodes a through e linked by first_child and next_sibling pointers]
110
Logical View of Tree [diagram: the example tree, nodes a through l]
111
Actual Data Structure [diagram: the same tree represented with first_child/next_sibling pointers]
112
Combined View of Tree [diagram: the logical tree overlaid with the first_child/next_sibling links]
113
Traversals: Many algorithms involve walking through a tree and performing some computation at each node. Walking through a tree is called a traversal. Common kinds of traversal: Pre-order, Post-order, Level-order.
114
Pre-Order Traversal: Perform computation at the node, then recursively perform computation on each child.
preorder(node * n){
  node * c;
  if (n != NULL){
    DO SOMETHING;
    c = n->first_child;
    while (c != NULL){
      preorder(c);
      c = c->next_sibling;
    }
  }
}
115
Pre-Order Traversal Example
[diagram: the example tree; the traversal starts with a, then visits each subtree] Start with a -
116
Pre-Order Applications
Use when computation at node depends upon values calculated higher in the tree (closer to root) Example: computing depth depth(node) = 1 + depth( parent of node ) Another example: printing out a directory structure.
117
Computing Depth of All Nodes
Add a field “depth” to all nodes.
Depth(node * n, int d){
  node * c;
  if (n != NULL){
    n->depth = d;
    d = d + 1;
    c = n->first_child;
    while (c != NULL){
      Depth(c, d);
      c = c->next_sibling;
    }
  }
}
Call Depth(root, 0) to set the depth field correctly.
118
Depth Calculation [diagram: the example tree annotated with each node’s depth]
119
Post-Order Traversal: Recursively perform computation on each child, and then perform computation at the node.
postorder(node * n){
  node * c;
  if (n != NULL){
    c = n->first_child;
    while (c != NULL){
      postorder(c);
      c = c->next_sibling;
    }
    DO SOMETHING;
  }
}
120
Post-Order Applications
Use when the computation at a node depends upon values calculated lower in the tree (closer to the leaves). Example: computing height: height(node) = 1 + MAX( height(child1), height(child2), … height(childk) ). Example: size of the tree rooted at a node: size(node) = 1 + size(child1) + size(child2) + … + size(childk).
121
Computing Size of Tree
Size(node * n){
  node * c;
  if (n == NULL) return 0;
  else {
    int m = 1;
    c = n->first_child;
    while (c != NULL){
      m = m + Size(c);
      c = c->next_sibling;
    }
    return m;
  }
}
Call Size(root) to compute the number of nodes in the tree.
122
Depth-First Search Both Pre-Order and Post-Order traversals are examples of depth-first search nodes are visited deeply on the left-most branches before any nodes are visited on the right-most branches visiting the right branches deeply before the left would still be depth-first! Crucial idea is “go deep first!” In DFS the nodes “being worked on” are kept on a stack (where?)
123
Level-Order/Breadth-first Traversal
Consider the task of traversing the tree level by level from top to bottom (alphabetic order). What data structure should we use to keep track of the nodes? [diagram: the example tree]
124
Level-Order (Breadth First) Traversal
Put the root in a Queue. Repeat until the Queue is empty: Dequeue a node; Process it; Add its children to the queue.
125
Example: Printing the Tree
print(node * root){
  node * n, * c;
  queue Q;
  Q.enqueue(root);
  while (! Q.empty()){
    n = Q.dequeue();
    print n->data;
    c = n->first_child;
    while (c != NULL){
      Q.enqueue(c);
      c = c->next_sibling;
    }
  }
}
126
QUEUE contents over the traversal: a; b c d e; c d e f g; d e f g; e f g h i j; h i j k; i j k; j k l; k l; l. [diagram: the example tree]
127
Applications of BFS: Find the shortest path from the root to a given node N; if N is at depth k, BFS will never visit a node at depth > k (important for really deep trees). Generalizes to finding shortest paths in graphs. Spidering the world wide web: from a root URL, fetch pages that are further and further away.
128
Data Structures Binary Search Trees
Alon Halevy
129
Binary Trees
Many algorithms are efficient and easy to program for the special case of binary trees. A binary tree is: a root, a left subtree (maybe empty), and a right subtree (maybe empty). [diagram: a binary tree with nodes A through J] Alright, we’ll focus today on one type of trees called binary trees. Here’s one now. Is this binary tree complete? Why not? (C has just one child, right side is much deeper than left.) What’s the maximum # of leaves a binary tree of depth d can have? What’s the max # of nodes a binary tree of depth d can have? Minimum? We won’t go into this, but if you take N nodes and assume all distinct trees of the nodes are equally likely, you get an average depth of SQRT(N). Is that bigger or smaller than log n? Bigger, so it’s not good enough!
130
Representation: each node holds Data, a left pointer, and a right pointer. [diagram: nodes A through F with their left/right pointers]
131
Properties of Binary Trees
Max # of leaves in a tree of height h = ? Max # of nodes in a tree of height h = ? [diagram: a complete binary tree with nodes A through G]
132
Dictionary & Search ADTs
Operations: create, destroy, insert, find, delete. Dictionary: stores values associated with user-specified keys; keys may be any (homogeneous) comparable type; values may be any (homogeneous) type; implementation: the data field is a struct with two parts. Search ADT: keys = values. [example: kim chi = spicy cabbage; kreplach = tasty stuffed dough; kiwi = Australian fruit; insert(kohlrabi - upscale tuber); find(kreplach) returns kreplach - tasty stuffed dough] Dictionaries associate some key with a value, just like a real dictionary (where the key is a word and the value is its definition). In this example, I’ve stored user-IDs associated with descriptions of their coolness level. This is probably the most valuable and widely used ADT we’ll hit. I’ll give you an example in a minute that should firmly entrench this concept.
133
Naïve Implementations
              unsorted array    sorted array     linked list
insert        find + O(1)       find + O(n)      find + O(1)
find          O(n)              O(log n)         O(n)
delete        find + O(1)       find + O(n)      find + O(1)
              (if no shrink)
Goal: fast find like a sorted array, dynamic inserts/deletes like a linked list.
134
Binary Search Tree Dictionary Data Structure
Search tree property: all keys in the left subtree are smaller than the root’s key; all keys in the right subtree are larger than the root’s key. Result: easy to find any given key; inserts/deletes by changing links. [diagram: a BST with root 8 containing 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14] A binary search tree is a binary tree in which all nodes in the left subtree of a node have lower values than the node. All nodes in the right subtree of a node have higher value than the node. It’s like making that recursion into the data structure! I’m storing integers at each node. Does everybody think that’s what I’m _really_ going to store? What do I need to know about what I store? (comparison, equality testing)
135
Example and Counter-Example
[diagrams: two trees; the left one is a BINARY SEARCH TREE, the right one is NOT A BINARY SEARCH TREE] Why is the one on the left a BST? It’s not complete! (Because BSTs don’t need to be complete.) Why isn’t the one on the right a BST? Three children of 5; 20 has a left child larger than it. What’s wrong with 11? Even though 15 isn’t a direct child, it _still_ needs to be less than 11!
136
In-Order Listing: visit left subtree, visit node, visit right subtree.
[diagram: BST with root 10, children 5 and 15, and nodes 2, 7, 9, 17, 20, 30]
if (n != null) { inorder(n->left); cout << n; inorder(n->right); }
In-order listing: 2 5 7 9 10 15 17 20 30
Anyone notice anything interesting about that in-order listing? Everything in the left subtree is listed first. Then the root. Then everything in the right subtree. OK, let’s work out the code to make the in-order listing. Is there an iterative version that doesn’t use its own stack? Not really, no. So, recursion is probably OK here. Anyway, if the tree’s too deep for recursion, you must have a huge amount of data.
137
Finding a Node [diagram: the BST with root 10] runtime:
Node *& find(Comparable x, Node *& root) {
  if (root == NULL) return root;
  else if (x < root->key) return find(x, root->left);
  else if (x > root->key) return find(x, root->right);
  else return root;
}
Now, let’s try finding a node. Find 9. This time I’ll supply the code. This should look a _lot_ like binary search! How long does it take? Log n is an easy answer, but what if the tree is very lopsided? So really, this is worst case O(n)! A better answer is theta of the depth of the node sought. If we can bound the depth of that node, we can bound the length of time a search takes. What about the code? All those &s and *s should look pretty scary. Let’s talk through them. runtime:
138
Insert
Concept: proceed down the tree as in Find; if the new key is not found, then insert a new node at the last spot traversed.
void insert(Comparable x, Node * root) {
  assert( root != NULL );
  if (x < root->key){
    if (root->left == NULL) root->left = new Node(x);
    else insert( x, root->left );
  } else if (x > root->key){
    if (root->right == NULL) root->right = new Node(x);
    else insert( x, root->right );
  }
}
Let’s do some inserts: insert(8), insert(11), insert(31).
139
BuildTree for BSTs: Suppose a₁, a₂, …, aₙ are inserted into an initially empty BST: a₁, a₂, …, aₙ are in increasing order; a₁, a₂, …, aₙ are in decreasing order; a₁ is the median of all, a₂ is the median of the elements less than a₁, a₃ is the median of the elements greater than a₁, etc.; data is randomly ordered. OK, we had a buildHeap, let’s buildTree. How long does this take? Well, IT DEPENDS! Let’s say we want to build a tree. What happens if we insert in order? Reverse order? What about 5, then 3, then 7, then 2, then 1, then 6, then 8, then 9?
140
Examples of Building from Scratch
1, 2, 3, 4, 5, 6, 7, 8, 9 5, 3, 7, 2, 4, 6, 8, 1, 9
141
Analysis of BuildTree
Worst case is O(n²): 1 + 2 + … + n = O(n²).
Average case, assuming all orderings are equally likely, is O(n log n); not averaging over all binary trees, rather averaging over all input sequences (inserts); equivalently: the average depth of a node is log n; proof: see Introduction to Algorithms, Cormen, Leiserson, & Rivest.
Average runtime is equal to the average depth of a node in the tree. We’ll calculate the average depth by finding the sum of all depths in the tree and dividing by the number of nodes. What’s the sum of all depths? D(N) = D(I) + D(N - I - 1) + N - 1 (left subtree has I nodes, the root is 1 node, so the right subtree has N - I - 1; D(I) is the depth sum of the left subtree, each node sits 1 deeper in the overall tree, the same goes for the right, for a total of I + N - I - 1 = N - 1 extra depth). For BSTs, all subtree sizes are equally likely (because we pick the middle element at random and the rest fall on the left or right deterministically). Each subtree then averages 1/N * the sum from j = 0 to N-1 of D(j).
142
Bonus: FindMin/FindMax
Find minimum; Find maximum. [diagram: the BST with root 10] Every now and then everyone succumbs to the temptation to really overuse color.
143
Deletion [diagram: the BST with root 10] And now for something completely different. Let’s say I want to delete a node. Why might it be harder than insertion? It might happen in the middle of the tree instead of at a leaf. Then, I have to fix the BST. Why might deletion be harder than insertion?
144
Deletion - Leaf Case: Delete(17) [diagram: the BST before and after removing the leaf 17]
Alright, we did it the easy way, but what about real deletions? Leaves are easy; we just prune them.
145
Deletion - One Child Case
Delete(15) [diagram: node 15 has one child, 20; after deletion, 20 takes its place] Single-child nodes we remove and… do what? We can just pull up their children. Is the search tree property intact? Yes.
146
Deletion - Two Child Case
Delete(5) [diagram: node 5 has two children, 2 and 9] Ah, now the hard case. How do we delete a two-child node? We remove it and replace it with what? It has all these left and right children that need to be greater and less than the new value (respectively). Is there any value that is guaranteed to be between the two subtrees? Two of them: the successor and predecessor! So, let’s just replace the node’s value with its successor and then delete the successor. Replace the node with a value guaranteed to be between the left and right subtrees: the successor. Could we have used the predecessor instead?
147
Finding the Successor
Find the next larger node in this node’s subtree, not the next larger in the entire tree.
Node * succ(Node * root) { if (root->right == NULL) return NULL; else return min(root->right); }
[diagram: the BST with root 10] Here’s a little digression. Maybe it’ll even have an application at some point. Find the next larger node in 10’s subtree. Can we define it in terms of min and max? It’s the min of the right subtree! How many children can the successor of a node have?
148
Predecessor: Find the next smaller node in this node’s subtree.
Node * pred(Node * root) { if (root->left == NULL) return NULL; else return max(root->left); }
[diagram: the BST with root 10] Predecessor is just the mirror problem.
149
Deletion - Two Child Case
Delete(5) [diagram: node 5 has two children, 2 and 9] Ah, now the hard case. How do we delete a two-child node? We remove it and replace it with what? It has all these left and right children that need to be greater and less than the new value (respectively). Is there any value that is guaranteed to be between the two subtrees? Two of them: the successor and predecessor! So, let’s just replace the node’s value with its successor and then delete the successor. It is always easy to delete the successor – it always has either 0 or 1 children!
150
Delete Code
void delete(Comparable x, Node *& p) {
  Node * q;
  if (p != NULL) {
    if (p->key < x) delete(x, p->right);
    else if (p->key > x) delete(x, p->left);
    else { /* p->key == x */
      if (p->left == NULL) p = p->right;
      else if (p->right == NULL) p = p->left;
      else {
        q = successor(p);
        p->key = q->key;
        delete(q->key, p->right);
      }
    }
  }
}
Here’s the code for deletion using lots of confusing reference pointers BUT no leaders, fake nodes. The iterative version of this can get somewhat messy, but it’s not really any big deal.
151
Lazy Deletion: Instead of physically deleting nodes, just mark them as deleted. Simpler; physical deletions done in batches; some adds just flip the deleted flag. But: extra memory for the deleted flag; many lazy deletions slow finds; some operations may have to be modified (e.g., min and max). [diagram: the BST with root 10] Now, before we move on to all the pains of true deletion, let’s do it the easy way. We’ll just pretend we delete deleted nodes. This has some real advantages: …
152
Lazy Deletion exercise: Delete(17), Delete(15), Delete(5), Find(9), Find(16), Insert(5), Find(17). [diagram: the BST with root 10] OK, let’s do some lazy deletions. Everybody yawn, stretch, and say “Mmmm… doughnut” to get in the mood. Those of you who are already asleep have the advantage.
153
Dictionary Implementations
              unsorted array    sorted array     linked list      BST
insert        find + O(1)       find + O(n)      find + O(1)      O(Depth)
find          O(n)              O(log n)         O(n)             O(Depth)
delete        find + O(1)       find + O(n)      find + O(1)      O(Depth)
BSTs are looking good for shallow trees, i.e. when the depth D is small (log n); otherwise they are as bad as a linked list!
154
Beauty is Only Θ(log n) Deep
Binary Search Trees are fast if they’re shallow: e.g., perfectly complete; e.g., perfectly complete except the “fringe” (leaves); any other good cases? What makes a good BST good? Here are two examples. Are these the only good BSTs? No! Anything without too many long branches is good, right? Problems occur when one branch is much longer than the other! What matters here?
155
Data Structures Binary Search Trees
Alon Halevy
178
Balance
Balance = height(left subtree) − height(right subtree). Zero everywhere: perfectly balanced. Small everywhere: balanced enough. We’ll use the concept of Balance to keep things shallow. Balance between -1 and 1 everywhere gives a maximum height of 1.44 log n.
179
AVL Tree Dictionary Data Structure
Binary search tree properties: binary tree property, search tree property. Balance property: the balance of every node is between -1 and 1 (-1 ≤ b ≤ 1); result: depth is Θ(log n). [diagram: an AVL tree with root 8] So, AVL trees will be Binary Search Trees with one extra feature: they balance themselves! The result is that all AVL trees at any point will have a logarithmic asymptotic bound on their depths.
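To make the balance property concrete, here is a small sketch of an AVL-style node with a stored height and a balance check; the struct and helper names are illustrative, not the course’s code.

#include <algorithm>
#include <cstdlib>

struct Node {
  int key;
  int height;        // height of the subtree rooted here; a leaf has height 0
  Node *left, *right;
};

// Height of a (possibly empty) subtree; the empty tree has height -1.
int height(Node* n) { return n ? n->height : -1; }

// Balance = height(left) - height(right), recomputed from the children.
int balance(Node* n) { return n ? height(n->left) - height(n->right) : 0; }

// The AVL property at a single node: balance must be -1, 0, or +1.
bool avl_ok(Node* n) { return n == nullptr || std::abs(balance(n)) <= 1; }

// After an insert or delete, a node's stored height is refreshed like this:
void update_height(Node* n) {
  n->height = 1 + std::max(height(n->left), height(n->right));
}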
180
An AVL Tree
[diagram: an AVL tree with root 10 and nodes 2, 3, 5, 9, 12, 15, 17, 20, 30; each node stores its data, its height, and pointers to its children] Here’s a revision of that tree that’s balanced. (Same values, similar tree.) This one _is_ an AVL tree (and isn’t leftist). I also have here how we might store the nodes in the AVL tree. Notice that I’m going to keep track of height all the time. WHY?
181
Not AVL Trees 10 10 0-2 = -2 (-1)-1 = -2 5 15 15 12 20 20 17 30 3 2 2
2 0-2 = -2 (-1)-1 = -2 1 5 15 15 1 12 20 20 These two trees use similar values but are NOT AVL trees: in each, some node has balance -2 (either 0-2 = -2 or (-1)-1 = -2), so the balance property is violated even though the search-tree ordering holds. 17 30
182
Staying Balanced M S T Good case: inserting small, tall and middle.
Insert(middle) Insert(small) Insert(tall) 1 M Let’s make a tree from these people with their height as the keys. We’ll start by inserting [MIDDLE] first. Then, [SMALL] and finally [TALL]. Is this tree balanced? Yes! S T
183
Bad Case #1 S M T Insert(small) Insert(middle) Insert(tall) 2 1
But, let’s start over… Insert [SMALL] Now, [MIDDLE]. Now, [TALL]. Is this tree balanced? NO! Who do we need at the root? [MIDDLE!] Alright, let’s pull er up. T
184
Single Rotation S M M S T T 2 1 1 Basic operation used in AVL trees:
This is the basic operation we’ll use in AVL trees. Since this is a right child, it could legally have the parent as its left child. When we finish the rotation, we have a balanced tree! S T T Basic operation used in AVL trees: A right child could legally have its parent as its left child.
185
General Case: Insert Unbalances
h + 1 h + 2 a a h h - 1 h + 1 h - 1 b X b X h-1 h h - 1 h - 1 Z Y Z Y Here’s the general form of this. We insert into the red tree. That ups the three heights on the left. Basically, you just need to pull up on the child. Then, ensure that everything falls in place as legal subtrees of the nodes. Notice, though, the height of this subtree is the same as it was before the insert into the red tree. So? So, we don’t have to worry about ancestors of the subtree becoming imbalanced; we can just stop here!
186
General Single Rotation
h + 2 h + 1 a a X Y b Z h h + 1 h - 1 b X h h - 1 h h - 1 h - 1 Z Y Here’s the general form of this. We insert into the red tree. That ups the three heights on the left. Basically, you just need to pull up on the child. Then, ensure that everything falls in place as legal subtrees of the nodes. Notice, though, the height of this subtree is the same as it was before the insert into the red tree. So? So, we don’t have to worry about ancestors of the subtree becoming imbalanced; we can just stop here! Height of left subtree same as it was before insert! Height of all ancestors unchanged We can stop here!
187
Will a single rotation fix this?
Bad Case #2 Insert(small) Insert(tall) Insert(middle) 2 S 1 T There’s another bad case, though. What if we insert: [SMALL] [TALL] [MIDDLE] Now, is the tree imbalanced? Will a single rotation fix it? (Try it by bringing up tall; doesn’t work!) Will a single rotation fix this? M
188
Double Rotation S S M T M S T M T 2 2 1 1 1
Let’s try two single rotations, starting a bit lower down. First, we rotate up middle. Then, we rotate up middle again! Is the new tree balanced? S T M T
189
General Double Rotation
h + 2 a h + 1 h + 1 c h - 1 b Z h h b a h - 1 W h c h - 1 h - 1 X Y W Z X Y Here’s the general form of this. Notice that the difference here is that we zigged one way than zagged the other to find the problem. We don’t really know or care which of X or Y was inserted into, but one of them was. To fix it, we pull c all the way up. Then, put a, b, and the subtrees beneath it in the reasonable manner. The height is still the same at the end! h - 1? h - 1? Initially: insert into either X or Y unbalances tree (root height goes to h+2) “Zig zag” to pull up c – restores root height to h+1, left subtree height to h
190
Insert Algorithm Find spot for value Hang new node
Search back up looking for imbalance If there is an imbalance: case #1: Perform single rotation and exit case #2: Perform double rotation and exit OK, thank you BST Three! And those two cases (along with their mirror images) are the only four that can happen! So, here’s our insert algorithm. We just hang the node. Search for a spot where there’s imbalance. If there is, fix it (according to the shape of the imbalance). And then we’re done; there can only be one problem!
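A hedged sketch of that insert algorithm in recursive form; it assumes a height() helper that treats NULL as -1, the Node/Comparable types used elsewhere in the deck, and rotation routines in the style of the RotateRight/DoubleRotateRight code shown on a later slide, plus their left-side mirrors (RotateLeft, DoubleRotateLeft). This is illustrative, not the official course code:

void insertAVL(Comparable x, Node *&p) {
    if (p == NULL) { p = new Node(x); return; }        // found the spot: hang the new node
    if (x < p->key)      insertAVL(x, p->left);
    else if (x > p->key) insertAVL(x, p->right);
    else                 return;                        // duplicate key: nothing to do

    // unwinding the recursion: correct the height, then look for an imbalance
    p->height = max(height(p->left), height(p->right)) + 1;

    if (height(p->left) - height(p->right) == 2) {           // left subtree too tall
        if (x < p->left->key) RotateLeft(p);                  // case #1: zig-zig, single rotation
        else                  DoubleRotateLeft(p);            // case #2: zig-zag, double rotation
    } else if (height(p->right) - height(p->left) == 2) {     // right subtree too tall (mirror)
        if (x > p->right->key) RotateRight(p);                // single rotation (brings up right child)
        else                   DoubleRotateRight(p);
    }
}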
191
Easy Insert Insert(3) 10 5 15 2 9 12 20 17 30 3 1 2 1 Let’s insert 3.
1 2 9 12 20 Let’s insert 3. This is easy! It just goes under 2 (to the left). Update the balances: any imbalance? NO! 17 30
192
Hard Insert (Bad Case #1)
2 3 Insert(33) 10 5 15 2 9 12 20 Now, let’s insert 33. Where does it go? Left of 30. 3 17 30
193
Single Rotation 1 2 3 1 2 3 10 10 5 15 5 20 2 9 12 20 2 9 15 30 Here’s the tree with the balances updated. Now, node 15 is bad! Since the problem is in the left subtree of the left child, we can fix it with a single rotation. We pull 20 up. Hang 15 to the left. Pass 17 to 15. And, we’re done! Notice that I didn’t update 10’s height until we checked 15. Did it change after all? 3 17 30 3 12 17 33 33
194
Hard Insert (Bad Case #2)
1 2 3 Insert(18) 10 5 15 2 9 12 20 Now, let’s back up to before 33 and insert 18 instead. Goes right of 17. Again, there’s imbalance. But, this time, it’s a zig-zag! 3 17 30
195
Single Rotation (oops!)
1 2 3 1 2 3 10 10 5 15 5 20 2 9 12 20 2 9 15 30 We can try a single rotation, but we end up with another zig-zag! 3 17 30 3 12 17 18 18
196
Double Rotation (Step #1)
2 3 1 2 3 10 10 5 15 5 15 2 9 12 20 2 9 12 17 So, we’ll double rotate. Start by moving the offending grand-child up. We get an even more imbalanced tree. BUT, it’s imbalanced like a zig-zig tree now! 3 17 30 3 20 18 18 30 Look familiar?
197
Double Rotation (Step #2)
1 2 3 1 2 3 10 10 5 15 5 17 2 9 12 17 2 9 15 20 So, let’s pull 17 up again. Now, we get a balanced tree. And, again, 10’s height didn’t need to change. 3 20 3 12 18 30 18 30
198
AVL Algorithm Revisited
Recursive 1. Search downward for spot 2. Insert node 3. Unwind stack, correcting heights a. If imbalance #1, single rotate b. If imbalance #2, double rotate Iterative 1. Search downward for spot, stacking parent nodes 2. Insert node 3. Unwind stack, correcting heights a. If imbalance #1, single rotate and exit b. If imbalance #2, double rotate and exit OK, here's the algorithm again. Notice that there's very little difference between the recursive and iterative. Why do I keep a stack for the iterative version? To go bottom to top. Can't I go top down? Now, what's left? Single and double rotate!
199
Single Rotation Code X Y Z root temp void RotateRight(Node *& root) {
Node * temp = root->right; root->right = temp->left; temp->left = root; root->height = max(root->right->height, root->left->height) + 1; temp->height = max(temp->right->height, temp->left->height) + 1; root = temp; } Here’s code for one of the two single rotate cases. RotateRight brings up the right child. We’ve inserted into Z, and now we want to fix it.
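The mirror routine RotateLeft (which the DoubleRotateRight code on the next slide relies on) isn't shown in the deck; a plausible sketch, using a height() helper that returns -1 for NULL so empty subtrees aren't dereferenced, might look like this (illustrative only):

int height(Node *n) { return n == NULL ? -1 : n->height; }

void RotateLeft(Node *& root) {            // brings up the left child
    Node * temp = root->left;
    root->left = temp->right;              // temp's right subtree becomes root's left
    temp->right = root;                    // old root hangs to the right of temp
    root->height = max(height(root->left), height(root->right)) + 1;
    temp->height = max(height(temp->left), height(temp->right)) + 1;
    root = temp;                           // temp is the new subtree root
}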
200
Double Rotation Code First Rotation a Z b W c X Y a Z c b X Y W
void DoubleRotateRight(Node *& root) { RotateLeft(root->right); RotateRight(root); } First Rotation a Z b W c X Y a Z c b X Y W Here’s the double rotation code. Pretty tough, eh?
201
Double Rotation Completed
First Rotation Second Rotation a Z c b X Y W c a b X W Z Y
202
Data Structures AVL II Alon Halevy
Alright, today we’ll get a little Yin and Yang. We saw B-Trees, but they were just too hard to use! Let’s see something easier! (a bit)
203
Deletion (Really Easy Case)
1 2 3 Delete(17) 10 5 15 2 9 12 20 OK, if we have a bit of extra time, do this. Let's try deleting. 17 is really easy! It's a leaf, so we just snip it out. Did we disturb the tree? NO! 3 17 30
204
Deletion (Pretty Easy Case)
1 2 3 Delete(15) 10 5 15 2 9 12 20 OK, if we have a bit of extra time, do this. Let’s try deleting. 15 is easy! It has two children, so we do BST deletion. 17 replaces 15. 15 goes away. Did we disturb the tree? NO! 3 17 30
205
Deletion (Pretty Easy Case cont.)
3 Delete(15) 10 2 2 5 17 1 1 2 9 12 20 OK, if we have a bit of extra time, do this. Let’s try deleting. 15 is easy! It has two children, so we do BST deletion. 17 replaces 15. 15 goes away. Did we disturb the tree? NO! 3 30
206
Deletion (Hard Case #1) Delete(12) 10 5 17 2 9 12 20 3 30 3 2 1
2 3 Delete(12) 10 5 17 2 9 12 20 Now, let’s delete 12. 12 goes away. Now, there’s trouble. We’ve put an imbalance in. So, we check up from the point of deletion and fix the imbalance at 17. 3 30
207
Single Rotation on Deletion
1 2 3 3 10 10 2 1 5 17 5 20 1 2 9 20 2 9 17 30 But what happened on the fix? Something very disturbing. What? The subtree’s height changed!! So, the deletion can propagate. 3 30 3 What is different about deletion than insertion?
208
Deletion (Hard Case) Delete(9) 10 5 17 2 9 12 12 20 20 3 11 15 15 18
3 4 Delete(9) 10 5 17 2 9 12 12 20 20 Now, let's delete 9. 9 goes away. Now, there's trouble. We've put an imbalance in. So, we check up from the point of deletion and fix the imbalance at 5. 1 1 3 11 15 15 18 30 30 13 13 33 33
209
Double Rotation on Deletion
Not finished! 1 2 3 4 2 1 3 4 10 10 5 17 3 17 2 2 12 20 2 5 12 20 1 1 1 1 3 11 15 18 30 11 15 18 30 13 33 13 33
210
Deletion with Propagation
2 1 3 4 10 What's different about this case? 3 17 2 5 12 20 1 1 We get to choose whether to single or double rotate! 11 15 18 30 13 33
211
Propagated Single Rotation
2 1 3 4 4 10 17 3 2 3 17 10 20 1 2 1 2 5 12 20 3 12 18 30 1 1 1 11 15 18 30 2 5 11 15 33 13 33 13
212
Propagated Double Rotation
2 1 3 4 4 10 12 2 3 3 17 10 17 1 1 2 2 5 12 20 3 11 15 20 1 1 1 11 15 18 30 2 5 13 18 30 13 33 33
213
AVL Deletion Algorithm
Recursive 1. If at node, delete it 2. Otherwise recurse to find it in the left or right subtree 3. Correct heights a. If imbalance #1, single rotate b. If imbalance #2 (or don't care), double rotate Iterative 1. Search downward for node, stacking parent nodes 2. Delete node 3. Unwind stack, correcting heights a. If imbalance #1, single rotate b. If imbalance #2 (or don't care), double rotate OK, here's the algorithm again. Notice that there's very little difference between the recursive and iterative. Why do I keep a stack for the iterative version? To go bottom to top. Can't I go top down? Now, what's left? Single and double rotate!
214
Fun with AVL Trees Input: sequence of n keys (unordered) 19 3 4 18 7
Insert each into initially empty AVL tree Print using inorder traversal O(n) Result? Are we having fun yet?
215
Is There a Faster Way? But suppose input is already sorted 3 4 7 18 19
Can we do better than O(n log n)?
216
AVL buildTree 5 8 10 15 17 20 30 35 40 Divide & Conquer 17
Divide the problem into parts Solve each part recursively Merge the parts into a general solution 17 IT DEPENDS! How long does divide & conquer take? 8 10 15 5 20 30 35 40
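A sketch of that divide & conquer build from a sorted array: the middle key becomes the root and each half is built recursively. The Node constructor and height() helper are assumptions carried over from earlier sketches; this is illustrative only.

#include <algorithm>
#include <vector>
using std::vector;

Node *buildTree(const vector<int> &keys, int lo, int hi) {
    if (lo > hi) return NULL;                        // empty range: empty subtree
    int mid = (lo + hi) / 2;                         // middle key becomes the root
    Node *root = new Node(keys[mid]);
    root->left  = buildTree(keys, lo, mid - 1);      // build the left half
    root->right = buildTree(keys, mid + 1, hi);      // build the right half
    root->height = 1 + std::max(height(root->left), height(root->right));
    return root;
}
// buildTree(keys, 0, (int)keys.size() - 1): O(1) work per node, O(n) total,
// and the halves differ in size by at most one, so the result is balanced.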
217
BuildTree Example 5 8 10 15 17 20 30 35 40 3 17 5 8 10 15 2 2 20 30 35 40 10 35 20 30 5 8 1 1 8 15 30 40 5 20
218
BuildTree Analysis (Approximate)
T(n) = 2T(n/2) + 1
T(n) = 2(2T(n/4) + 1) + 1 = 4T(n/4) + 3
T(n) = 4(2T(n/8) + 1) + 3 = 8T(n/8) + 7
…
T(n) = 2^k T(n/2^k) + (2^k - 1); let 2^k = n, so k = log n
T(n) = nT(1) + (n - 1) = Θ(n)
Summation is 2^(log n) + 2^(log n - 1) + 2^(log n - 2) + … = n + n/2 + n/4 + n/8 + … ≈ 2n
219
BuildTree Analysis (Exact)
Precise Analysis: T(0) = b; T(n) = T(floor((n-1)/2)) + T(ceil((n-1)/2)) + c. By induction on n: T(n) = (b+c)n + b. Base case: T(0) = b = (b+c)·0 + b. Induction step: T(n) = [(b+c)·floor((n-1)/2) + b] + [(b+c)·ceil((n-1)/2) + b] + c = (b+c)(n-1) + 2b + c = (b+c)n + b. QED: T(n) = (b+c)n + b = Θ(n)
220
Application: Batch Deletion
Suppose we are using lazy deletion When there are lots of deleted nodes (n/2), need to flush them all out Batch deletion: Print non-deleted nodes into an array How? Divide & conquer AVL Treebuild Total time:
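A rough sketch of that flush, assuming the lazy-deletion node (with a deleted flag) and the buildTree routine sketched earlier; the names collectLive and flush are illustrative:

#include <vector>
using std::vector;

// Inorder walk that skips deleted nodes: produces the live keys in sorted order, O(n).
void collectLive(Node *root, vector<int> &out) {
    if (root == NULL) return;
    collectLive(root->left, out);
    if (!root->deleted) out.push_back(root->key);
    collectLive(root->right, out);
}

// Total time: O(n) to collect + O(n) to rebuild = O(n).
Node *flush(Node *root) {
    vector<int> live;
    collectLive(root, live);
    return buildTree(live, 0, (int)live.size() - 1);
}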
221
Thinking About AVL Observations
+ Worst case height of an AVL tree is about 1.44 log n + Insert, Find, Delete in worst case O(log n) + Only one (single or double) rotation needed on insertion - O(log n) rotations needed on deletion + Compatible with lazy deletion - Height fields must be maintained (or 2-bit balance)
222
Alternatives to AVL Trees
Weight balanced trees keep about the same number of nodes in each subtree not nearly as nice Splay trees “blind” adjusting version of AVL trees no height information maintained! insert/find always rotates node to the root! worst case time is O(n) amortized time for all operations is O(log n) mysterious, but often faster than AVL trees in practice (better low-order terms)
223
Data Structures AVL II Alon Halevy
Alright, today we’ll get a little Yin and Yang. We saw B-Trees, but they were just too hard to use! Let’s see something easier! (a bit)
224
Imbalance in AVL Trees Last week's conjecture: in AVL trees, if you remove the bottom level, then you get a complete tree. This week's theorems: All nodes, except the leaves and the parents of leaves, have two children. Single-child nodes can be arbitrarily far from the leaves.
225
AVL Tree with Slight Imbalance
8 5 11 2 6 10 12 So, AVL trees will be Binary Search Trees with one extra feature: They balance themselves! The result is that all AVL trees at any point will have a logarithmic asymptotic bound on their depths 4 7 9 13 14 15
226
Where can we Find Leaves?
Suppose the node N has no children. What is the maximal height of N’s parent? What is the maximal height of N’s grandparent? What is the maximal height of N’s great-grandparent? Conclusion: at what depth can we find a leaf?
227
Deletion (Hard Case #1) Delete(12) 10 5 17 2 9 12 20 3 30 3 2 1
2 3 Delete(12) 10 5 17 2 9 12 20 Now, let’s delete 12. 12 goes away. Now, there’s trouble. We’ve put an imbalance in. So, we check up from the point of deletion and fix the imbalance at 17. 3 30
228
Single Rotation on Deletion
1 2 3 3 10 10 2 1 5 17 5 20 1 2 9 20 2 9 17 30 But what happened on the fix? Something very disturbing. What? The subtree’s height changed!! So, the deletion can propagate. 3 30 3 What is different about deletion than insertion?
229
Deletion (Hard Case #2) Delete(9) 10 5 17 2 9 12 12 20 20 3 11 15 15
3 4 Delete(9) 10 5 17 2 9 12 12 20 20 Now, let's delete 9. 9 goes away. Now, there's trouble. We've put an imbalance in. So, we check up from the point of deletion and fix the imbalance at 5. 1 1 3 11 15 15 18 30 30 13 13 33 33
230
Double Rotation on Deletion
Not finished! 1 2 3 4 2 1 3 4 10 10 5 17 3 17 2 2 12 20 2 5 12 20 1 1 1 1 3 11 15 18 30 11 15 18 30 13 33 13 33
231
Deletion with Propagation
2 1 3 4 10 What's different about this case? 3 17 2 5 12 20 1 1 We get to choose whether to single or double rotate! 11 15 18 30 13 33
232
Propagated Single Rotation
2 1 3 4 4 10 17 3 2 3 17 10 20 1 2 1 2 5 12 20 3 12 18 30 1 1 1 11 15 18 30 2 5 11 15 33 13 33 13
233
Propagated Double Rotation
2 1 3 4 4 10 12 2 3 3 17 10 17 1 1 2 2 5 12 20 3 11 15 20 1 1 1 11 15 18 30 2 5 13 18 30 13 33 33
234
AVL Deletion Algorithm
Recursive 1. If at node, delete it 2. Otherwise recurse to find it in the left or right subtree 3. Correct heights a. If imbalance #1, single rotate b. If imbalance #2 (or don't care), double rotate Iterative 1. Search downward for node, stacking parent nodes 2. Delete node 3. Unwind stack, correcting heights a. If imbalance #1, single rotate b. If imbalance #2 (or don't care), double rotate OK, here's the algorithm again. Notice that there's very little difference between the recursive and iterative. Why do I keep a stack for the iterative version? To go bottom to top. Can't I go top down? Now, what's left? Single and double rotate!
235
Fun with AVL Trees Input: sequence of n keys (unordered) 19 3 4 18 7
Insert each into initially empty AVL tree Print using inorder traversal O(n) Result? Are we having fun yet?
236
Is There a Faster Way? But suppose input is already sorted 3 4 7 18 19
Can we do better than O(n log n)?
237
AVL buildTree 5 8 10 15 17 20 30 35 40 Divide & Conquer 17
Divide the problem into parts Solve each part recursively Merge the parts into a general solution 17 IT DEPENDS! How long does divide & conquer take? 8 10 15 5 20 30 35 40
238
BuildTree Example 5 8 10 15 17 20 30 35 40 3 17 5 8 10 15 2 2 20 30 35 40 10 35 20 30 5 8 1 1 8 15 30 40 5 20
239
BuildTree Analysis (Approximate)
T(n) = 2T(n/2) + 1
T(n) = 2(2T(n/4) + 1) + 1 = 4T(n/4) + 3
T(n) = 4(2T(n/8) + 1) + 3 = 8T(n/8) + 7
…
T(n) = 2^k T(n/2^k) + (2^k - 1); let 2^k = n, so k = log n
T(n) = nT(1) + (n - 1) = Θ(n)
Summation is 2^(log n) + 2^(log n - 1) + 2^(log n - 2) + … = n + n/2 + n/4 + n/8 + … ≈ 2n
240
Thinking About AVL Observations
+ Worst case height of an AVL tree is about 1.44 log n + Insert, Find, Delete in worst case O(log n) + Only one (single or double) rotation needed on insertion - O(log n) rotations needed on deletion - Height fields must be maintained (or 2-bit balance)
241
Alternatives to AVL Trees
Weight balanced trees keep about the same number of nodes in each subtree not nearly as nice Splay trees (after mid-term) “blind” adjusting version of AVL trees no height information maintained! insert/find always rotates node to the root! worst case time is O(n) amortized time for all operations is O(log n) mysterious, but often faster than AVL trees in practice (better low-order terms)
242
B-Trees
243
Beyond Binary Trees One of the most important applications for search trees is databases If the DB is small enough to fit into RAM, almost any scheme for balanced trees (e.g. AVL) is okay 2000 (WalMart) RAM – 1,000,000 MB DB – 1,000,000 MB (terabyte) 1980 RAM – 1MB DB – 100 MB gap between disk and main memory growing!
244
Time Gap For many corporate and scientific databases, the search tree must mostly be on disk. Accessing disk is about 200,000 times slower than RAM. Visiting a node = accessing the disk. Even perfectly balanced binary trees are a disaster! log2( 10,000,000 ) = 24 disk accesses. Goal: Decrease Height of Tree
245
M-ary Search Tree Maximum branching factor of M
Complete tree has depth = logMN Each internal node in a complete tree has M - 1 keys runtime: Here’s the general idea. We create a search tree with a branching factor of M. Each node has M-1 keys and we search between them. What’s the runtime? O(logMn)? That’s a nice thought, and it’s the best case. What about the worst case? Is the tree guaranteed to be balanced? Is it guaranteed to be complete? Might it just end up being a binary tree?
246
B-Trees B-Trees are specialized M-ary search trees
Each node has many keys subtree between two keys x and y contains values v such that x v < y binary search within a node to find correct subtree Each node takes one full page of memory. 3 7 12 21 To address these problems, we’ll use a slightly more structured M-ary tree: B-Trees. As before, each internal node has M-1 kes. To manage memory problems, we’ll tune the size of a node (or leaf) to the size of a memory unit. Usually, a page or disk block. x<3 3x<7 7x<12 12x<21 21x
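For example, the binary search inside a node might look like the sketch below, assuming the node's search keys sit in a sorted array keys[0..numKeys-1]; subtree i then holds values v with keys[i-1] <= v < keys[i] (ends open). The name childIndex is purely illustrative:

// Return the index of the child subtree to follow when looking for x.
int childIndex(const int keys[], int numKeys, int x) {
    int lo = 0, hi = numKeys;                // answer lies in [lo, hi]
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (x < keys[mid]) hi = mid;         // x belongs strictly left of keys[mid]
        else               lo = mid + 1;     // keys[mid] <= x: keep looking right
    }
    return lo;                               // = number of keys <= x
}
// With keys {3, 7, 12, 21}: x = 2 -> child 0, x = 9 -> child 2, x = 21 -> child 4.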
247
B-Tree Properties‡ Properties Result maximum branching factor of M
the root has between 2 and M children other internal nodes have between M/2 and M children internal nodes contain only search keys (no data) smallest datum between search keys x and y equals x each (non-root) leaf contains between L/2 and L keys all leaves are at the same depth Result tree is (logM/2 n/(L/2)) +/- 1 deep (log n) all operations run in time proportional to depth operations pull in at least M/2 or L/2 items at a time The properties of B-Trees (and the trees themselves) are a bit more complex than previous structures we’ve looked at. Here’s a big, gnarly list; we’ll go one step at a time. The maximum branching factor, as we said, is M (tunable for a given tree). The root has between 2 and M children or at most L keys. (L is another parameter) These restrictions will be different for the root than for other nodes. ‡These are technically B+-Trees
248
B-Tree Properties Properties Result maximum branching factor of M
the root has between 2 and M children other internal nodes have between M/2 and M children internal nodes contain only search keys (no data) smallest datum between search keys x and y equals x each (non-root) leaf contains between L/2 and L keys all leaves are at the same depth Result tree is (logM/2 n/(L/2)) (log n) all operations run in time proportional to depth operations pull in at least M/2 or L/2 items at a time All the other internal nodes (non-leaves) will have between M/2 and M children. The funky symbol is ceiling, the next higher integer above the value. The result of this is that the tree is “pretty” full. Not every node has M children but they’ve all at least got M/2 (a good number). Internal nodes contain only search keys. A search key is a value which is solely for comparison; there’s no data attached to it. The node will have one fewer search key than it has children (subtrees) so that we can search down to each child. The smallest datam between two search keys is equal to the lesser search key. This is how we find the search keys to use.
249
B-Tree Properties Properties Result maximum branching factor of M
the root has between 2 and M children other internal nodes have between M/2 and M children internal nodes contain only search keys (no data) smallest datum between search keys x and y equals x each (non-root) leaf contains between L/2 and L keys all leaves are at the same depth Result tree is (logM/2 n/(L/2)) (log n) all operations run in time proportional to depth operations pull in at least M/2 or L/2 items at a time All the leaves (again, except the root) have a similar restriction. They contain between L/2 and L keys. Notice that means you have to do a search when you get to a leaf to find the item you’re looking for. All the leaves are also at the same depth. So, the tree looks kind of complete. It has the triangle shape, and the nodes branch at least as much as M/2.
250
B-Tree Properties Properties Result maximum branching factor of M
the root has between 2 and M children other internal nodes have between M/2 and M children internal nodes contain only search keys (no data) smallest datum between search keys x and y equals x each (non-root) leaf contains between L/2 and L keys all leaves are at the same depth Result tree is (logM/2 n/(L/2)) +/- 1 deep (log n) all operations run in time proportional to depth operations pull in at least M/2 or L/2 items at a time The result of all this is that the tree in the worst case is log n deep. In particular, it’s about logM/2n deep. Does this matter asymptotically? No. What about practically? YES! Since M and L are considered constants, all operations run in log n time. Each operation pulls in at most M search keys or L items at a time. So, we can tune L and M to the size of a disk block!
251
When Big-O is Not Enough
B-Tree is about log_{M/2}( n/(L/2) ) deep = log_{M/2} n - log_{M/2}( L/2 ) = O( log_{M/2} n ) = O(log n) steps per operation (same as BST!) Where's the beef?! log_2( 10,000,000 ) = 24 disk accesses; log_{200/2}( 10,000,000 ) < 4 disk accesses
252
… … B-Tree Nodes Internal node Leaf
i search keys; i+1 subtrees; M - i - 1 inactive entries k1 k2 … ki __ … __ 1 2 i M - 1 Leaf j data keys; L - j inactive entries FIX M-I to M-I-1!! Alright, before we look at any examples, let’s look at what the node structure looks like. Internal nodes are arrays of pointers to children interspersed with search keys. Why must they be arrays rather than linked lists? Because we want contiguous memory! If the node has just I+1 children, it has I search keys, and M-I empty entries. A leaf looks similar (I’ll use green for leaves), and has similar properties. Why are these different? Because internal nodes need subtrees-1 keys. k1 k2 … kj __ … __ 1 2 j L
253
Example B-Tree with M = 4 and L = 4 10 40 3 15 20 30 50 1 2 10 11 12
This is just an example B-tree. Notice that it has 24 entries with a depth of only 2. A BST would be 4 deep. Notice also that the leaves are at the same level in the tree. I’ll use integers as both key and data, but we all know that that could as well be different data at the bottom, right? 1 2 10 11 12 20 25 26 40 42 3 5 6 9 15 17 30 32 33 36 50 60 70
254
Making a B-Tree Insert(3) Insert(14) Now, Insert(1)? The empty B-Tree
M = 3 L = 2 3 3 14 Insert(3) Insert(14) Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? Now, Insert(1)?
255
Splitting the Root Insert(1) And create a new root Too many
keys in a leaf! 3 14 14 1 3 1 3 14 Insert(1) And create a new root 1 3 14 Too many keys in a leaf! Run away! How do we solve this? Well, we definitely need to split this leaf in two. But, now we don’t have a tree anymore. So, let’s make a new root and give it as children the two leaves. This is how B-Trees grow deeper. So, split the leaf.
256
Insertions and Split Ends
Too many keys in a leaf! 14 14 14 Insert(59) Insert(26) 1 3 14 26 59 1 3 14 1 3 14 59 14 26 59 So, split the leaf. Now, let’s do some more inserts. 59 is no problem. What about 26? Same problem as before. But, this time the split leaf just goes under the existing node because there’s still room. What if there weren’t room? 14 59 And add a new child 1 3 14 26 59
257
Too many keys in an internal node!
Propagating Splits 14 59 14 59 Insert(5) Add new child 1 3 5 14 26 59 1 3 14 26 59 1 3 5 Too many keys in an internal node! 5 1 3 14 26 59 5 14 26 59 1 3 When we insert 5, the leaf overflows, but its parent already has too many subtrees! What do we do? The same thing as before but this time with an internal node. We split the node. Normally, we'd hang the new subtrees under their parent, but in this case they don't have one. Now we have two trees! Solution: same as before, make a new root and hang these under it. Create a new root So, split the node.
258
Insertion in Boring Text
Insert the key in its leaf If the leaf ends up with L+1 items, overflow! Split the leaf into two nodes: original with ceil((L+1)/2) items, new one with floor((L+1)/2) items Add the new child to the parent If the parent ends up with M+1 items, overflow! If an internal node ends up with M+1 items, overflow! Split the node into two nodes: original with ceil((M+1)/2) items, new one with floor((M+1)/2) items Add the new child to the parent If the parent ends up with M+1 items, overflow! Split an overflowed root in two and hang the new nodes under a new root OK, here's that process as an algorithm. The new funky symbol is floor; that's just like regular C++ integer division. Notice that this can propagate all the way up the tree. How often will it do that? Notice that the two new leaves or internal nodes are guaranteed to have enough items (or subtrees). Because even the floor of (L+1)/2 is as big as the ceiling of L/2. This makes the tree deeper!
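A rough sketch of just the leaf-split step described above, assuming a leaf stores its keys in a sorted array with room for one temporary overflow; L and all names here are illustrative, not the course's B-Tree code:

const int L = 4;                                   // leaf capacity (tunable)

struct Leaf {
    int numKeys;
    int keys[L + 1];                               // one extra slot for the overflow
};

// Split an overflowed leaf (numKeys == L+1) into two legal leaves:
// the original keeps ceil((L+1)/2) keys, the new sibling gets floor((L+1)/2).
Leaf *splitLeaf(Leaf *leaf) {
    Leaf *sibling = new Leaf();
    int keep = (L + 2) / 2;                        // ceil((L+1)/2) via integer division
    sibling->numKeys = (L + 1) - keep;             // floor((L+1)/2)
    for (int i = 0; i < sibling->numKeys; i++)
        sibling->keys[i] = leaf->keys[keep + i];   // move the upper half over
    leaf->numKeys = keep;
    return sibling;                                // caller adds this new child to the parent
}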
259
After More Routine Inserts
14 Insert(89) Insert(79) 5 59 1 3 5 14 26 59 5 1 3 14 26 59 79 89 OK, we’ve done insertion. What about deletion? For didactic purposes, I will now do two more regular old insertions (notice these cause a split).
260
Deletion Delete(59) 5 1 3 14 26 59 79 89 Now, let’s delete!
Just find the key to delete and snip it out! Easy! Done, right?
261
Deletion and Adoption A leaf has too few keys! Delete(5)
14 14 Delete(5) 5 79 89 ? 79 89 1 3 5 14 26 79 89 1 3 14 26 79 89 So, borrow from a neighbor Of course not! What if we delete an item in a leaf and drive it below L/2 items (in this case to zero)? In that case, we have two options. The easy option is to borrow a neighbor’s item. We just move it over from the neighbor and fix the parent’s key. DIGRESSION: would it be expensive to maintain neighbor pointers in B-Trees? No. Because those leaves are normally going to be huge, and two pointers per leaf is no big deal (might cut down L by 1). How about parent pointers? No problem. In fact, I’ve been assuming we have them! 3 1 14 26 79 89
262
Deletion with Propagation
A leaf has too few keys! 14 14 Delete(3) 3 79 89 ? 79 89 1 3 14 26 79 89 1 14 26 79 89 And no neighbor with surplus! But, what about if the neighbors are too low on items as well? Then, we need to propagate the delete… like an _unsplit_. We delete the node and fix up the parent. Note that if I had a larger M/L, we might have keys left in the deleted node. Why? Because the leaf just needs to drop below ceil(L/2) to be deleted. If L=100, L/2 = 50 and there are 49 keys to distribute! Solution: Give them to the neighbors. Now, what happens to the parent here? It’s down to one subtree! STRESS AGAIN THAT LARGER M and L WOULD MEAN NO NEED TO “RUN OUT”. 14 But now a node has too few subtrees! So, delete the leaf 79 89 1 14 26 79 89
263
Finishing the Propagation (More Adoption)
Adopt a neighbor 1 14 26 79 89 We just do the same thing here that we did earlier: Borrow from a rich neighbor!
264
A Bit More Adoption Delete(1) (adopt a neighbor) 79 79 14 89 26 89 1
OK, let’s do a bit of setup. This is easy, right? 1 14 26 79 89 14 26 79 89
265
Pulling out the Root A leaf has too few keys!
And no neighbor with surplus! 79 79 Delete(26) So, delete the leaf 26 89 89 14 26 79 89 14 79 89 But now the root has just one subtree! A node has too few subtrees and no neighbor with surplus! Now, let’s delete 26. It can’t borrow from its neighbor, so we delete it. Its parent is too low on children now and it can’t borrow either: Delete it. Here, we give its leftovers to its neighbors as I mentioned earlier. But now the root has just one subtree!! 79 Delete the leaf 79 89 89 14 79 89 14 79 89
266
Pulling out the Root (continued)
has just one subtree! Just make the one child the new root! 79 89 14 79 89 But that’s silly! The root having just one subtree is both illegal and silly. Why have the root if it just branches straight down? So, we’ll just delete the root and replace it with its child! 79 89 14 79 89
267
Deletion in Two Boring Slides of Text
Remove the key from its leaf If the leaf ends up with fewer than L/2 items, underflow! Adopt data from a neighbor; update the parent If borrowing won’t work, delete node and divide keys between neighbors If the parent ends up with fewer than M/2 items, underflow! Why will dumping keys always work if borrowing doesn’t? Alright, that’s deletion. Let’s talk about a few of the details. Why will dumping keys always work? If the neighbors were too low on keys to loan any, they must have L/2 keys, but we have one fewer. Therefore, putting them together, we get at most L, and that’s legal.
268
Deletion Slide Two If a node ends up with fewer than M/2 items, underflow! Adopt subtrees from a neighbor; update the parent If borrowing won’t work, delete node and divide subtrees between neighbors If the parent ends up with fewer than M/2 items, underflow! If the root ends up with only one child, make the child the new root of the tree The same applies here for dumping subtrees as on the previous slide for dumping keys. This reduces the height of the tree!
269
Thinking about B-Trees
B-Tree insertion can cause (expensive) splitting and propagation B-Tree deletion can cause (cheap) borrowing or (expensive) deletion and propagation Propagation is rare if M and L are large (Why?) Repeated insertions and deletion can cause thrashing If M = L = 128, then a B-Tree of height 4 will store at least 30,000,000 items height 5: 2,000,000,000! B*-Trees fix thrashing. Propagation is rare because (in a good case) only about 1/L inserts cause a split and only about 1/M of those go up even one level! 30 million’s not so big, right? How about height 5? 2 billion
270
Summary BST: fast finds, inserts, and deletes O(log n) on average (if data is random!) AVL trees: guaranteed O(log n) operations B-Trees: also guaranteed O(log n), but shallower depth makes them better for disk-based databases What would be even better? How about: O(1) finds and inserts?
271
Data Structures B-Trees
Alon Halevy Alright, today we’ll get a little Yin and Yang. We saw B-Trees, but they were just too hard to use! Let’s see something easier! (a bit)
272
B-Tree Properties Properties Result maximum branching factor of M
the root has between 2 and M children other internal nodes have between M/2 and M children internal nodes contain only search keys (no data) smallest datum between search keys x and y equals x each (non-root) leaf contains between L/2 and L keys all leaves are at the same depth Result tree is (logM/2 n/(L/2)) +/- 1 deep (log n) all operations run in time proportional to depth operations pull in at least M/2 or L/2 items at a time The result of all this is that the tree in the worst case is log n deep. In particular, it’s about logM/2n deep. Does this matter asymptotically? No. What about practically? YES! Since M and L are considered constants, all operations run in log n time. Each operation pulls in at most M search keys or L items at a time. So, we can tune L and M to the size of a disk block!
273
When Big-O is Not Enough
B-Tree is about log_{M/2}( n/(L/2) ) deep = log_{M/2} n - log_{M/2}( L/2 ) = O( log_{M/2} n ) = O(log n) steps per operation (same as BST!) Where's the beef?! log_2( 10,000,000 ) = 24 disk accesses; log_{200/2}( 10,000,000 ) < 4 disk accesses
274
… … B-Tree Nodes Internal node Leaf
i search keys; i+1 subtrees; M - i - 1 inactive entries k1 k2 … ki __ … __ 1 2 i M - 1 Leaf j data keys; L - j inactive entries FIX M-I to M-I-1!! Alright, before we look at any examples, let’s look at what the node structure looks like. Internal nodes are arrays of pointers to children interspersed with search keys. Why must they be arrays rather than linked lists? Because we want contiguous memory! If the node has just I+1 children, it has I search keys, and M-I empty entries. A leaf looks similar (I’ll use green for leaves), and has similar properties. Why are these different? Because internal nodes need subtrees-1 keys. k1 k2 … kj __ … __ 1 2 j L
275
Example B-Tree with M = 4 and L = 4 10 40 3 15 20 30 50 1 2 10 11 12
This is just an example B-tree. Notice that it has 24 entries with a depth of only 2. A BST would be 4 deep. Notice also that the leaves are at the same level in the tree. I’ll use integers as both key and data, but we all know that that could as well be different data at the bottom, right? 1 2 10 11 12 20 25 26 40 42 3 5 6 9 15 17 30 32 33 36 50 60 70
276
Making a B-Tree Insert(3) Insert(14) Now, Insert(1)? The empty B-Tree
M = 3 L = 2 3 3 14 Insert(3) Insert(14) Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem? Now, Insert(1)?
277
Splitting the Root Insert(1) And create a new root Too many
keys in a leaf! 3 14 14 1 3 1 3 14 Insert(1) And create a new root 1 3 14 Too many keys in a leaf! Run away! How do we solve this? Well, we definitely need to split this leaf in two. But, now we don’t have a tree anymore. So, let’s make a new root and give it as children the two leaves. This is how B-Trees grow deeper. So, split the leaf.
278
Insertions and Split Ends
Too many keys in a leaf! 14 14 14 Insert(59) Insert(26) 1 3 14 26 59 1 3 14 1 3 14 59 14 26 59 So, split the leaf. Now, let’s do some more inserts. 59 is no problem. What about 26? Same problem as before. But, this time the split leaf just goes under the existing node because there’s still room. What if there weren’t room? 14 59 And add a new child 1 3 14 26 59
279
Too many keys in an internal node!
Propagating Splits 14 59 14 59 Insert(5) Add new child 1 3 5 14 26 59 1 3 14 26 59 1 3 5 Too many keys in an internal node! 5 1 3 14 26 59 5 14 26 59 1 3 When we insert 5, the leaf overflows, but its parent already has too many subtrees! What do we do? The same thing as before but this time with an internal node. We split the node. Normally, we'd hang the new subtrees under their parent, but in this case they don't have one. Now we have two trees! Solution: same as before, make a new root and hang these under it. Create a new root So, split the node.
280
Insertion in Boring Text
Insert the key in its leaf If the leaf ends up with L+1 items, overflow! Split the leaf into two nodes: original with ceil((L+1)/2) items, new one with floor((L+1)/2) items Add the new child to the parent If the parent ends up with M+1 items, overflow! If an internal node ends up with M+1 items, overflow! Split the node into two nodes: original with ceil((M+1)/2) items, new one with floor((M+1)/2) items Add the new child to the parent If the parent ends up with M+1 items, overflow! Split an overflowed root in two and hang the new nodes under a new root OK, here's that process as an algorithm. The new funky symbol is floor; that's just like regular C++ integer division. Notice that this can propagate all the way up the tree. How often will it do that? Notice that the two new leaves or internal nodes are guaranteed to have enough items (or subtrees). Because even the floor of (L+1)/2 is as big as the ceiling of L/2. This makes the tree deeper!
281
Deletion in B-trees Come to section tomorrow. Slides follow.
282
After More Routine Inserts
14 Insert(89) Insert(79) 5 59 1 3 5 14 26 59 5 1 3 14 26 59 79 89 OK, we’ve done insertion. What about deletion? For didactic purposes, I will now do two more regular old insertions (notice these cause a split).
283
Deletion Delete(59) 5 1 3 14 26 59 79 89 Now, let’s delete!
Just find the key to delete and snip it out! Easy! Done, right?
284
Deletion and Adoption A leaf has too few keys! Delete(5)
14 14 Delete(5) 5 79 89 ? 79 89 1 3 5 14 26 79 89 1 3 14 26 79 89 So, borrow from a neighbor Of course not! What if we delete an item in a leaf and drive it below L/2 items (in this case to zero)? In that case, we have two options. The easy option is to borrow a neighbor’s item. We just move it over from the neighbor and fix the parent’s key. DIGRESSION: would it be expensive to maintain neighbor pointers in B-Trees? No. Because those leaves are normally going to be huge, and two pointers per leaf is no big deal (might cut down L by 1). How about parent pointers? No problem. In fact, I’ve been assuming we have them! 3 1 14 26 79 89
285
Deletion with Propagation
A leaf has too few keys! 14 14 Delete(3) 3 79 89 ? 79 89 1 3 14 26 79 89 1 14 26 79 89 And no neighbor with surplus! But, what about if the neighbors are too low on items as well? Then, we need to propagate the delete… like an _unsplit_. We delete the node and fix up the parent. Note that if I had a larger M/L, we might have keys left in the deleted node. Why? Because the leaf just needs to drop below ceil(L/2) to be deleted. If L=100, L/2 = 50 and there are 49 keys to distribute! Solution: Give them to the neighbors. Now, what happens to the parent here? It’s down to one subtree! STRESS AGAIN THAT LARGER M and L WOULD MEAN NO NEED TO “RUN OUT”. 14 But now a node has too few subtrees! So, delete the leaf 79 89 1 14 26 79 89
286
Finishing the Propagation (More Adoption)
Adopt a neighbor 1 14 26 79 89 We just do the same thing here that we did earlier: Borrow from a rich neighbor!
287
A Bit More Adoption Delete(1) (adopt a neighbor) 79 79 14 89 26 89 1
OK, let’s do a bit of setup. This is easy, right? 1 14 26 79 89 14 26 79 89
288
Pulling out the Root A leaf has too few keys!
And no neighbor with surplus! 79 79 Delete(26) So, delete the leaf 26 89 89 14 26 79 89 14 79 89 But now the root has just one subtree! A node has too few subtrees and no neighbor with surplus! Now, let’s delete 26. It can’t borrow from its neighbor, so we delete it. Its parent is too low on children now and it can’t borrow either: Delete it. Here, we give its leftovers to its neighbors as I mentioned earlier. But now the root has just one subtree!! 79 Delete the leaf 79 89 89 14 79 89 14 79 89
289
Pulling out the Root (continued)
has just one subtree! Just make the one child the new root! 79 89 14 79 89 But that’s silly! The root having just one subtree is both illegal and silly. Why have the root if it just branches straight down? So, we’ll just delete the root and replace it with its child! 79 89 14 79 89
290
Deletion in Two Boring Slides of Text
Remove the key from its leaf If the leaf ends up with fewer than L/2 items, underflow! Adopt data from a neighbor; update the parent If borrowing won’t work, delete node and divide keys between neighbors If the parent ends up with fewer than M/2 items, underflow! Why will dumping keys always work if borrowing doesn’t? Alright, that’s deletion. Let’s talk about a few of the details. Why will dumping keys always work? If the neighbors were too low on keys to loan any, they must have L/2 keys, but we have one fewer. Therefore, putting them together, we get at most L, and that’s legal.
291
Deletion Slide Two If a node ends up with fewer than M/2 items, underflow! Adopt subtrees from a neighbor; update the parent If borrowing won’t work, delete node and divide subtrees between neighbors If the parent ends up with fewer than M/2 items, underflow! If the root ends up with only one child, make the child the new root of the tree The same applies here for dumping subtrees as on the previous slide for dumping keys. This reduces the height of the tree!
292
Thinking about B-Trees
B-Tree insertion can cause (expensive) splitting and propagation B-Tree deletion can cause (cheap) borrowing or (expensive) deletion and propagation Propagation is rare if M and L are large (Why?) Repeated insertions and deletion can cause thrashing If M = L = 128, then a B-Tree of height 4 will store at least 30,000,000 items height 5: 2,000,000,000! B*-Trees fix thrashing. Propagation is rare because (in a good case) only about 1/L inserts cause a split and only about 1/M of those go up even one level! 30 million’s not so big, right? How about height 5? 2 billion
293
Tree Summary BST: fast finds, inserts, and deletes O(log n) on average (if data is random!) AVL trees: guaranteed O(log n) operations B-Trees: also guaranteed O(log n), but shallower depth makes them better for disk-based databases What would be even better? How about: O(1) finds and inserts?
294
Hash Table Approach Zasha Steve f(x) Nic Brad Ed
But… is there a problem in this pipe-dream?
295
Hash Table Dictionary Data Structure
Hash function: maps keys to integers result: can quickly find the right spot for a given entry Unordered and sparse table result: cannot efficiently list all entries, Cannot find min and max efficiently, Cannot find all items within a specified range efficiently. f(x) Zasha Steve Nic Brad Ed A binary search tree is a binary tree in which all nodes in the left subtree of a node have lower values than the node. All nodes in the right subtree of a node have higher value than the node. It’s like making that recursion into the data structure! I’m storing integers at each node. Does everybody think that’s what I’m _really_ going to store? What do I need to know about what I store? (comparison, equality testing)
296
Hash Table Terminology
hash function Zasha f(x) Steve Nic collision Brad Ed keys load factor = (# of entries in table) / tableSize
297
Hash Table Code First Pass
Value & find(Key & key) { int index = hash(key) % tableSize; return Table[index]; } What should the hash function be? What should the table size be? How should we resolve collisions?
298
A Good Hash Function… is easy (fast) to compute (O(1) and practically fast). distributes the data evenly (hash(a) ≠ hash(b)). uses the whole hash table (for all 0 ≤ k < size, there's an i such that hash(i) % size = k).
299
Good Hash Function for Integers
Choose tableSize to be prime. hash(n) = n % tableSize Example: tableSize = 7 insert(4) insert(17) find(12) insert(9) delete(17) 1 2 3 4 5 6
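Tracing that example in code, using the first-pass table idea from a few slides back (no collision handling yet; illustrative only):

const int tableSize = 7;                  // prime
int hash(int n) { return n % tableSize; }

// insert(4):  slot 4  % 7 = 4
// insert(17): slot 17 % 7 = 3
// find(12):   look in slot 12 % 7 = 5 -> empty, not found
// insert(9):  slot 9  % 7 = 2
// delete(17): clear slot 17 % 7 = 3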
300
Good Hash Function for Strings?
Ideas?
301
Good Hash Function for Strings?
Sum the ASCII values of the characters. Consider only the first 3 characters. Uses only 2871 out of 17,576 entries in the table on English words. Let s = s1s2s3s4…sn: choose hash(s) = s1 + s2·128 + s3·128^2 + … + sn·128^(n-1). Problems: hash(“really, really big”) = well… something really, really big hash(“one thing”) % 128 = hash(“other thing”) % 128 Think of the string as a base 128 number.
302
Making the String Hash Easy to Compute
Use Horner's Rule int hash(const string & s) { int h = 0; for (int i = s.length() - 1; i >= 0; i--) { h = (s[i] + 128*h) % tableSize; } return h; }
303
Universal Hashing For any fixed hash function, there will be some pathological sets of inputs everything hashes to the same cell! Solution: Universal Hashing Start with a large (parameterized) class of hash functions No sequence of inputs is bad for all of them! When your program starts up, pick one of the hash functions to use at random (for the entire time) Now: no bad inputs, only unlucky choices! If universal class large, odds of making a bad choice very low If you do find you are in trouble, just pick a different hash function and re-hash the previous inputs
304
Universal Hash Function: “Random” Vector Approach
Parameterized by prime size and vector: a = <a0 a1 … ar> where 0 <= ai < size Represent each key as r + 1 integers where ki < size size = 11, key = ==> <3,9,7,5,2> size = 29, key = “hello world” ==> <8,5,12,12,15,23,15,18,12,4> ha(k) = dot product with a “random” vector!
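A hedged sketch of that dot-product hash: the key is assumed to be already broken into r+1 integers k[0..r], each less than the prime table size, and the vector a is chosen at random once at startup. The names universalHash and pickRandomVector are illustrative:

#include <cstdlib>
#include <vector>
using std::vector;

// h_a(k) = (a[0]*k[0] + a[1]*k[1] + ... + a[r]*k[r]) mod size
int universalHash(const vector<int> &a, const vector<int> &k, int size) {
    long long sum = 0;
    for (size_t i = 0; i < k.size(); i++)
        sum = (sum + (long long)a[i] * k[i]) % size;   // reduce as we go to avoid overflow
    return (int)sum;
}

// Pick the "random" vector once when the program starts, then keep using it.
vector<int> pickRandomVector(int r, int size) {
    vector<int> a(r + 1);
    for (int i = 0; i <= r; i++) a[i] = rand() % size;
    return a;
}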
305
Universal Hash Function
Strengths: works on any type as long as you can form ki’s if we’re building a static table, we can try many a’s a random a has guaranteed good properties no matter what we’re hashing Weaknesses must choose prime table size larger than any ki
306
Hash Function Summary Goals of a hash function Hash functions
reproducible mapping from key to table entry evenly distribute keys across the table separate commonly occurring keys (neighboring keys?) complete quickly Hash functions h(n) = n % size h(n) = string as base 128 number % size Universal hash function #1: dot product with random vector The idea of neighboring keys here may change from application to application. In one context, neighboring keys may be those with the same last characters or first characters… say, when hashing names in a school system. Many people may have the same last names or first names (but few will have the same of both).
307
How to Design a Hash Function
Know what your keys are Study how your keys are distributed Try to include all important information in a key in the construction of its hash Try to make “neighboring” keys hash to very different places Prune the features used to create the hash until it runs “fast enough” (very application dependent)
308
Collisions Pigeonhole principle says we can’t avoid all collisions
try to hash without collision m keys into n slots with m > n try to put 6 pigeons into 5 holes What do we do when two keys hash to the same entry? open hashing: put little dictionaries in each entry closed hashing: pick a next entry to try The pigeonhole principle is a vitally important mathematical principle that asks what happens when you try to shove k+1 pigeons into k pigeon sized holes. Don’t snicker. But, the fact is that no hash function can perfectly hash m keys into fewer than m slots. They won’t fit. What do we do? 1) Shove the pigeons in anyway. 2) Try somewhere else when we’re shoving two pigeons in the same place. Does closed hashing solve the original problem?