Data Structures
Lectures: Haim Kaplan and Uri Zwick
Teaching Assistants: Yaron Orenstein and Yanir Kleiman
Website: In Moodle
Exam: 80%, Theoretical Assignments: 10%, Practical Assignments: 10%
Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms (Second/Third Editions)
Data Structures
“In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.”
Topics: Lists, Search Trees, Heaps/Priority Queues, Hashing, Union-Find, Tries and Suffix Trees, Sorting and Selection
Data Structures Lecture 1 Abstract Data Types Lists, Stacks, Queues, Deques Arrays, Linked Lists Amortized Analysis Haim Kaplan and Uri Zwick October 2013
Lists (Sequences)
[a0 a1 a2 a3 … ai-1 ai ai+1 … an-2 an-1]
Retrieve the item at position i
Insert b at position i
Delete the item at position i
When items are inserted or deleted, the indices of some other items change
Abstract Data Type (ADT): Lists (Sequences)
List() – Create an empty list
Length(L) – Return the length of list L
Retrieve(L,i) – Return the i-th item of L
Insert(L,i,b) – Insert b as the i-th item of L
Delete(L,i) – Delete and return the i-th item of L
( Search(L,b) – Return the position of b in L, or −1 )
Concat(L1, L2) – Concatenate L1 and L2
Interesting special cases:
Retrieve-First(L), Insert-First(L,b), Delete-First(L)
Retrieve-Last(L), Insert-Last(L,b), Delete-Last(L)
Abstract Data Types (ADT): Stacks, Queues, Deques
Stacks – Implement only: Insert-Last(L,b), Retrieve-Last(L), Delete-Last(L)
  Also known as: Push(L,b), Top(L), Pop(L)
  Last In First Out (LIFO)
Queues – Implement only: Insert-Last(L,b), Retrieve-First(L), Delete-First(L)
  First In First Out (FIFO)
Deques (double ended queues) – Implement only: Insert-First(L,b), Retrieve-First(L), Delete-First(L), Insert-Last(L,b), Retrieve-Last(L), Delete-Last(L)
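As a rough illustration of how the three ADTs restrict the list operations, the sketch below uses Python's standard `collections.deque`. The variable names are made up for this example, and the mapping to the slide's operation names is given in the comments.

```python
from collections import deque

# Stack: only Insert-Last (Push), Retrieve-Last (Top), Delete-Last (Pop) -> LIFO
stack = deque()
stack.append(1)            # Push
stack.append(2)            # Push
top = stack[-1]            # Top: look at the last item without removing it
popped = stack.pop()       # Pop: remove and return the last item

# Queue: only Insert-Last, Retrieve-First, Delete-First -> FIFO
queue = deque()
queue.append("a")          # Insert-Last
queue.append("b")          # Insert-Last
front = queue[0]           # Retrieve-First
served = queue.popleft()   # Delete-First

# Deque: insertions and deletions are allowed at both ends
dq = deque([2, 3])
dq.appendleft(1)           # Insert-First
dq.append(4)               # Insert-Last
```

The stack returns the most recently inserted item first, while the queue returns the oldest one.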
Using lists to represent graphs
[Figure: an example graph on vertices A to F. A Vertex record has fields name and info; an Edge record has fields vertex1, vertex2 and info.]
[Figure: the graph has two fields, vertices and edges: a list of vertices [A, B, C, …] and a list of edges.]
A more functional representation
[Figure: a Vertex record now has fields name, info and edges; the edges field of A holds the list of edges of A.]
Adjacency list representation
[Figure: a Graph record with a field vertices, the list of vertices [A, B, C, …]; each vertex holds its own list of edges.]
Length(G.vertices) ?
Retrieve(Retrieve(G.vertices,1).edges,2) ?
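The record structure described above can be sketched in Python as follows. This is only an illustration: the helper `add_edge` and the three-vertex example graph are assumptions made for this sketch (the graph drawn on the slides is not reproduced), and Python lists stand in for the list ADT.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Vertex:
    name: str
    info: object = None
    edges: list = field(default_factory=list)   # edges incident to this vertex

@dataclass(eq=False)
class Edge:
    vertex1: Vertex
    vertex2: Vertex
    info: object = None

@dataclass(eq=False)
class Graph:
    vertices: list = field(default_factory=list)

def add_edge(u, v):
    """Hypothetical helper: create an edge and record it at both endpoints."""
    e = Edge(u, v)
    u.edges.append(e)
    v.edges.append(e)
    return e

# A made-up three-vertex example, not the graph from the slides.
a, b, c = Vertex("A"), Vertex("B"), Vertex("C")
g = Graph([a, b, c])
ab = add_edge(a, b)
bc = add_edge(b, c)

num_vertices = len(g.vertices)     # Length(G.vertices)
picked = g.vertices[1].edges[1]    # Retrieve(Retrieve(G.vertices,1).edges,1), 0-based
```

Here `picked` is the second edge stored at vertex B, which in this example is the edge between B and C.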
Implementing lists using arrays
We need to know the maximal length M in advance.
[Figure: a list object L with fields array, maxlen = M and length = n; the items a0, a1, …, an−1 occupy the first n of the M array slots.]
Retrieve(L,i) (of any element) takes O(1) time
Insert-Last(L,b) and Delete-Last(L) take O(1) time, so stack operations take O(1) time
Insert(L,i,b) and Delete(L,i) take O(n−i+1) time
Implementing lists using arrays
[Figure: inserting at the end of the array; length changes from n to n+1.]
Delete-Last is very similar
[Figure: inserting at position i; the items from position i onwards are shifted one slot to the right, and length changes from n to n+1.]
We need to move n−i+1 items, then insert. O(n−i+1) time.
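A minimal sketch of the array implementation, assuming 0-based positions and a fixed capacity passed at creation. The class and method names are chosen for this example; the slide's own code is not preserved.

```python
class ArrayList:
    """List stored in a fixed-capacity array; positions are 0-based."""

    def __init__(self, maxlen):
        self.array = [None] * maxlen
        self.maxlen = maxlen
        self.length = 0

    def retrieve(self, i):
        if not 0 <= i < self.length:
            raise IndexError("position out of range")
        return self.array[i]                      # O(1)

    def insert(self, i, b):
        if self.length == self.maxlen:
            raise OverflowError("list is full")
        if not 0 <= i <= self.length:
            raise IndexError("position out of range")
        # Shift items i..length-1 one slot to the right, starting from the end.
        for j in range(self.length, i, -1):
            self.array[j] = self.array[j - 1]
        self.array[i] = b
        self.length += 1

    def delete(self, i):
        if not 0 <= i < self.length:
            raise IndexError("position out of range")
        b = self.array[i]
        # Shift items i+1..length-1 one slot to the left.
        for j in range(i, self.length - 1):
            self.array[j] = self.array[j + 1]
        self.length -= 1
        self.array[self.length] = None            # clear the vacated slot
        return b

    def insert_last(self, b):
        self.insert(self.length, b)               # no items are shifted

    def delete_last(self):
        return self.delete(self.length - 1)       # no items are shifted
```

Inserting or deleting at the end performs no shifts, which is why the stack operations take constant time, while an operation at position i shifts every item from position i onwards.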
Implementing lists using circular arrays
Implement queue and deque operations in O(1) time
New field: start
[Figure: a list object L with fields array, maxlen = M, length = n and start; example with start = 2.]
Occupied region can wrap around!
[Figure: example with length = 7 and start = M−4; the items occupy slots M−4, …, M−1 and then continue from the beginning of the array.]
[Figure: an example operation in which length changes from n to n−1.]
Code of other operations similar
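The idea of the `start` field can be sketched as below. This is an assumed implementation written for illustration, with 0-based slots: logical position i lives in slot (start + i) mod maxlen, so both ends can grow or shrink without shifting any items.

```python
class CircularArrayList:
    """Deque stored in a fixed-capacity circular array (0-based slots)."""

    def __init__(self, maxlen):
        self.array = [None] * maxlen
        self.maxlen = maxlen
        self.length = 0
        self.start = 0

    def _slot(self, i):
        # Array slot holding logical position i; wraps around the end.
        return (self.start + i) % self.maxlen

    def retrieve(self, i):
        if not 0 <= i < self.length:
            raise IndexError("position out of range")
        return self.array[self._slot(i)]

    def insert_first(self, b):
        if self.length == self.maxlen:
            raise OverflowError("list is full")
        self.start = (self.start - 1) % self.maxlen   # step back, wrapping
        self.array[self.start] = b
        self.length += 1

    def insert_last(self, b):
        if self.length == self.maxlen:
            raise OverflowError("list is full")
        self.array[self._slot(self.length)] = b
        self.length += 1

    def delete_first(self):
        if self.length == 0:
            raise IndexError("list is empty")
        b = self.array[self.start]
        self.array[self.start] = None
        self.start = (self.start + 1) % self.maxlen
        self.length -= 1
        return b

    def delete_last(self):
        if self.length == 0:
            raise IndexError("list is empty")
        j = self._slot(self.length - 1)
        b = self.array[j]
        self.array[j] = None
        self.length -= 1
        return b
```

Each of the four end operations touches a single slot and updates `start` or `length`, so all of them run in constant time.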
Arrays vs. Circular Arrays
                      Arrays       Circular arrays
Insert/Delete-Last    O(1)         O(1)
Insert/Delete-First   O(n+1)       O(1)
Insert/Delete(i)      O(n−i+1)     O(min{i+1, n−i+1})
Retrieve(i)           O(1)         O(1)
Main advantage: Constant access time
Main disadvantage: Inserting or deleting elements ‘in the middle’ is expensive
Adjacency list representation
[Figure: an example graph on vertices A to G; the graph holds a list of vertices [A, B, C, …], and each vertex holds its own list of edges.]
Back to the graph application
[Figure: a new vertex G, referenced by v, is connected by edges to two existing vertices x and y.]
  v ← make-vertex(“G”)
  e1 ← make-edge(v,x)
  Insert-First(x.edges,e1)
  Insert-First(v.edges,e1)
  e2 ← make-edge(v,y)
  ……..
An insertion of an edge into the edge list of a vertex v may take O(1) or O(degree(v)+1) time, depending on the list implementation
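Implementing lists using singly linked lists
Lists of unbounded length
Support some additional operations
[Figure: a List object L with fields first, last and length = n; each List-Node object has fields item and next, and the nodes holding a0, a1, a2, …, an-1 are chained through next.]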
Insert-First with singly linked lists
Insert-First(L,b):
  Generate a new List-Node object B containing b
  Add B to the list
  Adjust L.last, if necessary
  Increment L.length
  (Return B)
[Figure: the new node B, holding b, is placed before the node holding a0; length changes from n to n+1.]
Insert-Last and Delete-First – very similar, also O(1) time
Unfortunately, Delete-Last requires O(n+1) time
Retrieve with singly linked lists
Retrieving the i-th item takes O(i+1) time
Insert with singly linked lists
Inserting an item into position i takes O(i+1) time
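The costs quoted above can be seen in a small sketch of a singly linked list with `first`, `last` and `length` fields. The class and method names are assumptions made for this example, and positions are 0-based.

```python
class ListNode:
    def __init__(self, item):
        self.item = item
        self.next = None

class SinglyLinkedList:
    def __init__(self):
        self.first = None
        self.last = None
        self.length = 0

    def insert_first(self, b):                 # O(1)
        node = ListNode(b)
        node.next = self.first
        self.first = node
        if self.last is None:                  # list was empty
            self.last = node
        self.length += 1
        return node

    def insert_last(self, b):                  # O(1), thanks to the last field
        node = ListNode(b)
        if self.last is None:
            self.first = node
        else:
            self.last.next = node
        self.last = node
        self.length += 1
        return node

    def delete_first(self):                    # O(1)
        if self.first is None:
            raise IndexError("list is empty")
        node = self.first
        self.first = node.next
        if self.first is None:                 # list became empty
            self.last = None
        self.length -= 1
        return node.item

    def retrieve(self, i):                     # O(i+1): follow i links
        if not 0 <= i < self.length:
            raise IndexError("position out of range")
        node = self.first
        for _ in range(i):
            node = node.next
        return node.item

    def delete_last(self):                     # O(n+1): must find the new last node
        if self.first is None:
            raise IndexError("list is empty")
        if self.first is self.last:
            return self.delete_first()
        node = self.first
        while node.next is not self.last:
            node = node.next
        item = self.last.item
        node.next = None
        self.last = node
        self.length -= 1
        return item
```

`delete_last` is the slow operation because a node has no pointer to its predecessor, so the list has to be scanned from `first`.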
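Inserting a node (after a given node)
Insert-After(A,B) – Insert B after A
[Figure: node B, holding b, is linked in between A and the node that followed A.]
The order of the assignments is important
Deleting a node (after a given node, not the last one)
Delete-After(A) – Delete the node following A
[Figure: the node following A is bypassed, so A now points to the node after it.]
What happens to the node removed?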
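The two node-level operations can be sketched as follows, assuming a node with `item` and `next` fields. These helpers only relink nodes; a full implementation would also maintain the list's `last` and `length` fields, which is omitted here.

```python
class ListNode:
    def __init__(self, item):
        self.item = item
        self.next = None

def insert_after(a, b):
    """Insert node b right after node a."""
    b.next = a.next    # must come first: a.next still points to the old successor
    a.next = b         # swapping the two lines would make b point to itself

def delete_after(a):
    """Unlink and return the node that follows a."""
    removed = a.next
    if removed is None:
        raise ValueError("a has no successor")
    a.next = removed.next
    removed.next = None   # detach the removed node from the list
    return removed
```

In a garbage-collected language such as Python, the removed node is reclaimed automatically once nothing refers to it; in a language with manual memory management the caller would have to free it.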
Concatenating lists
Concat(L1,L2) – Attach L2 to the end of L1
[Figure: L1 holds a0, a1, …, an-1 and L2 holds b0, b1, …, bm-1; the last node of L1 is linked to the first node of L2, and the length of L1 changes from n to n+m.]
O(1) time!
(What happened to L2?)
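A possible sketch of constant-time concatenation for singly linked lists with `first`, `last` and `length` fields. The names are assumptions for this example. As one answer to the question about L2, this version leaves L2 empty, since its nodes now belong to L1; other conventions are possible.

```python
class ListNode:
    def __init__(self, item):
        self.item = item
        self.next = None

class LinkedList:
    def __init__(self, items=()):
        self.first = None
        self.last = None
        self.length = 0
        for x in items:
            self.insert_last(x)

    def insert_last(self, b):
        node = ListNode(b)
        if self.last is None:
            self.first = node
        else:
            self.last.next = node
        self.last = node
        self.length += 1

    def items(self):
        out, node = [], self.first
        while node is not None:
            out.append(node.item)
            node = node.next
        return out

def concat(l1, l2):
    """Attach l2 to the end of l1 in O(1) time; l2 is left empty."""
    if l2.first is None:
        return l1
    if l1.first is None:
        l1.first = l2.first
    else:
        l1.last.next = l2.first    # the only link that has to change
    l1.last = l2.last
    l1.length += l2.length
    l2.first = l2.last = None      # design choice: l2 no longer owns the nodes
    l2.length = 0
    return l1
```

No node is visited, so the running time does not depend on the lengths of the two lists.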
Circular arrays vs. Linked Lists
                                  Circular arrays        Linked lists
Insert/Delete-First, Insert-Last  O(1)                   O(1)
Delete-Last                       O(1)                   O(n)
Insert/Delete(i)                  O(min{i+1, n−i+1})     O(i+1)
Retrieve(i)                       O(1)                   O(i+1)
Concat                            O(min{n1, n2}+1)       O(1)
In linked lists we can insert or delete elements ‘in the middle’ in O(1) time, given the node after which the change is made
More fun with linked lists Circular singly linked lists (Circular) Doubly linked lists Doubly linked lists with a sentinel
Circular singly linked lists
If we make the list circular, we don’t need the first field
[Figure: a List object with fields last and length = n; the node holding an-1 points back to the node holding a0.]
All capabilities remain the same
How do we implement Delete-Last(L) in O(1) time? Doubly linked lists!
(Circular) Doubly linked lists
[Figure: a List object with fields first and length = n; each List-Node has fields prev, item and next.]
Each List-Node now has a prev field
All previous benefits + Delete-Last(L) in O(1) time
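Inserting a node into a Doubly Linked List
Insert-After(A,B) – Insert node B after node A
[Figure: node B, holding b, is linked in between A (holding ai) and the node holding ai+1.]
Deleting a node from a Doubly Linked List
Each node now has a prev field
Delete-Node(A) – Delete node A from its list
[Figure: A holds ai; its neighbours, holding ai−1 and ai+1, are linked directly to each other.]
Note: A itself is not changed! Is that good?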
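The two operations can be sketched as below for a plain (non-circular) doubly linked chain, where a missing neighbour is represented by `None`. The names are assumptions for this example, and the list object's own fields are not maintained here. As on the slide, `delete_node` leaves the fields of A untouched.

```python
class DNode:
    def __init__(self, item):
        self.item = item
        self.prev = None
        self.next = None

def insert_after(a, b):
    """Insert node b right after node a."""
    b.prev = a
    b.next = a.next
    if a.next is not None:       # special case: a may be the last node
        a.next.prev = b
    a.next = b

def delete_node(a):
    """Unlink node a from its neighbours; a itself is not modified."""
    if a.prev is not None:       # special case: a may be the first node
        a.prev.next = a.next
    if a.next is not None:       # special case: a may be the last node
        a.next.prev = a.prev
    # a.prev and a.next still point into the list at this point.
```

Leaving the pointers of the deleted node in place means it still refers to its former neighbours, which can be convenient for undoing the deletion but is easy to misuse; clearing them is the more defensive choice. The `None` checks are the special cases that a sentinel removes.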
Circular Doubly linked lists with a sentinel
[Figure: the nodes holding a0, a1, a2, …, an-1 and the sentinel form a single cycle.]
Each node has a successor and a predecessor
No special treatment of first and last elements
No special treatment of empty lists
In some cases we can identify a list with its sentinel
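Circular Doubly linked lists with a sentinel
Empty list
[Figure: in an empty list the sentinel is its own successor and predecessor.]
[Figure: a non-empty list; the sentinel sits between the last node, holding an-1, and the first node.]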
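A minimal sketch of a circular doubly linked list with a sentinel, where the list is identified with its sentinel node. The function names are assumptions for this example. Note that none of the operations needs a test for an empty list or for the first or last element.

```python
class Node:
    def __init__(self, item=None):
        self.item = item
        self.prev = self     # a fresh node is a cycle of length one
        self.next = self

def new_list():
    """An empty list is just a sentinel that points to itself."""
    return Node()

def insert_after(a, b):
    b.prev = a
    b.next = a.next
    a.next.prev = b          # a.next always exists, possibly the sentinel
    a.next = b

def delete_node(a):
    a.prev.next = a.next
    a.next.prev = a.prev

def insert_first(L, b):
    insert_after(L, b)       # right after the sentinel

def insert_last(L, b):
    insert_after(L.prev, b)  # right after the current last node

def delete_last(L):
    node = L.prev
    if node is L:
        raise IndexError("list is empty")
    delete_node(node)
    return node

def to_python_list(L):
    out, a = [], L.next
    while a is not L:
        out.append(a.item)
        a = a.next
    return out
```

Both `insert_first` and `insert_last` reduce to `insert_after`, and `delete_last` runs in constant time because the sentinel's `prev` field points to the last node.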
Abstraction barriers
User
List: Insert, Retrieve, Delete, Search
List-Node: Retrieve-Node, Insert-After, Delete-After, Delete-Node
Suppose we inserted a into L. After some time, we want to delete a from L.
With the current interface we need to do: [code not preserved; two steps, each taking O(n) time]
Can we do it in O(1) time?
Modified ADT for lists
The current specification does not allow us to utilize one of the main capabilities of linked lists: Insert-After, Delete-Node, Next in O(1) time
We next define a new ADT in which the user is allowed to call List-Node, Insert-After, Delete-Node, Next
Lists – A modified abstraction
[ a0 a1 … ai … an-1 ]
Lists are now composed of List-Nodes
List-Node(b) – Create a List-Node containing item b
Item(B) – Return the item contained in List-Node B
List() – Create an empty list
Length(L) – Return the length of list L
Insert(L,i,B) – Insert B as the i-th List-Node of L
Retrieve(L,i) – Return the i-th List-Node of L
Delete(L,i) – Delete and return the i-th List-Node of L
Concat(L1, L2) – Concatenate L1 and L2
We now allow the following additional operations:
Next(A) – Return the List-Node following A (assuming there is one)
Insert-After(A,B) – Insert B after A
Delete-Node(A) – Delete A from its current list
These operations assume that A is contained in some list, while B is not contained in any list
Note: L is not an argument of these operations!
Lists – A modified abstraction
The user explicitly manipulates List-Nodes
The actual implementation using linked lists remains essentially the same
The length field is removed from the implementation, as it is hard to keep it up to date
Due to concatenations, it is hard to keep track of which list contains a given List-Node
Traversing a list
With the previous abstraction:
  for i ← 0 to Length(L) − 1 do {
    a ← Retrieve(L,i)
    process(a)
  }
With the new abstraction:
  A ← Next(L) /* first element or sentinel if list is empty */
  while (A ≠ L) {
    a ← Item(A)
    process(a)
    A ← Next(A)
  }
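The two traversal styles can be compared in Python as follows. This is a sketch under the assumption that the list is a circular doubly linked list identified with its sentinel; the helper names `make_list`, `length` and `retrieve` are made up for this example.

```python
class ListNode:
    def __init__(self, item=None):
        self.item = item
        self.prev = self
        self.next = self

def make_list(items):
    """Build a circular doubly linked list with a sentinel; return the sentinel."""
    sentinel = ListNode()
    for x in items:
        node = ListNode(x)
        last = sentinel.prev
        node.prev, node.next = last, sentinel
        last.next = node
        sentinel.prev = node
    return sentinel

def length(L):
    n, a = 0, L.next
    while a is not L:
        n, a = n + 1, a.next
    return n

def retrieve(L, i):
    """Return the i-th List-Node by walking i+1 links from the sentinel."""
    a = L.next
    for _ in range(i):
        a = a.next
    return a

def traverse_by_position(L, process):
    # Previous abstraction: one Retrieve call per position.
    for i in range(length(L)):
        process(retrieve(L, i).item)

def traverse_by_next(L, process):
    # New abstraction: follow Next until we are back at the sentinel.
    a = L.next
    while a is not L:
        process(a.item)
        a = a.next
```

On a linked implementation the positional loop costs O(n²) in total, because each Retrieve(L,i) takes O(i+1) time, whereas the Next-based loop visits every node once and takes O(n) time.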
Can delete an edge in O(1) time
[Figure: the graph on vertices A to G with its list of vertices; an edge e is linked into two edge lists through two List-Nodes, referenced as e.list-node1 and e.list-node2.]
  Delete-Node(e.list-node1)
  Delete-Node(e.list-node2)
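A sketch of how an edge can be removed from both endpoint lists in constant time, assuming each edge list is a circular doubly linked list with a sentinel and each edge remembers its two List-Nodes. The class layout and the helper `edges_of` are assumptions for this example, and the small graph is made up.

```python
class Node:
    def __init__(self, item=None):
        self.item = item
        self.prev = self
        self.next = self

def insert_after(a, b):
    b.prev, b.next = a, a.next
    a.next.prev = b
    a.next = b

def delete_node(a):
    a.prev.next = a.next
    a.next.prev = a.prev

class Vertex:
    def __init__(self, name):
        self.name = name
        self.edges = Node()          # sentinel of this vertex's edge list

class Edge:
    def __init__(self, v1, v2):
        self.vertex1, self.vertex2 = v1, v2
        # Remember the node that represents this edge in each endpoint's list.
        self.list_node1 = Node(self)
        self.list_node2 = Node(self)
        insert_after(v1.edges, self.list_node1)   # Insert-First
        insert_after(v2.edges, self.list_node2)   # Insert-First

def delete_edge(e):
    """Remove e from both edge lists in O(1) time, with no searching."""
    delete_node(e.list_node1)
    delete_node(e.list_node2)

def edges_of(v):
    out, a = [], v.edges.next
    while a is not v.edges:
        out.append(a.item)
        a = a.next
    return out

a, b, g = Vertex("A"), Vertex("B"), Vertex("G")
e = Edge(a, g)
f = Edge(a, b)
before = edges_of(a)
delete_edge(e)
after = edges_of(a)
```

Because the edge stores direct references to its List-Nodes, the deletion never scans an edge list, regardless of the degrees of the endpoints.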
Adjacency list representation
[Figure: the same graph on vertices A to G; the edge e is linked into two edge lists through List-node1 and List-node2, and the edges f and g are reached by following Next in those lists.]
f ← Item(Next(e.list-node1)) ?
g ← Item(Next(f.list-node2)) ?
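Pitfalls of the modified ADT
[Figure: a list holding a0, a1, a2, …, an-1 that contains node A, and a second list L2 holding b0, b1, b2, …, bm-1 that contains node B.]
Insert-After(A,B)
L2 is not a valid list now
Should call Delete-Node(B) before Insert-After(A,B)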
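The pitfall can be reproduced with a short sketch, assuming circular doubly linked lists with sentinels. The helpers `make_list` and `is_valid` are assumptions introduced for this example; `is_valid` uses a step limit because a corrupted list may never lead back to its sentinel.

```python
class Node:
    def __init__(self, item=None):
        self.item = item
        self.prev = self
        self.next = self

def insert_after(a, b):
    b.prev, b.next = a, a.next
    a.next.prev = b
    a.next = b

def delete_node(a):
    a.prev.next = a.next
    a.next.prev = a.prev

def make_list(items):
    """Return (sentinel, nodes) for a new list holding the given items."""
    sentinel, nodes = Node(), []
    for x in items:
        node = Node(x)
        insert_after(sentinel.prev, node)
        nodes.append(node)
    return sentinel, nodes

def is_valid(L, limit=1000):
    """Check that next/prev agree and that we return to the sentinel."""
    a = L
    for _ in range(limit):
        if a.next.prev is not a:
            return False
        a = a.next
        if a is L:
            return True
    return False

# Wrong: B is still linked into L2 when it is inserted after A.
L1, (a0, a1) = make_list(["a0", "a1"])
L2, (b0, b1, b2) = make_list(["b0", "b1", "b2"])
insert_after(a0, b1)
l2_ok_wrong = is_valid(L2)

# Right: remove B from its list first, then insert it.
M1, (c0, c1) = make_list(["a0", "a1"])
M2, (d0, d1, d2) = make_list(["b0", "b1", "b2"])
delete_node(d1)
insert_after(c0, d1)
m1_ok, m2_ok = is_valid(M1), is_valid(M2)
```

In the wrong order, the neighbours of B in L2 still point to B while B points into the other list, so following Next from L2's sentinel wanders into L1 and never comes back.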
Which-List?
Concatenations move List-Nodes from lists to lists
Which-List(A) – Return the list currently containing A
Naïve implementation: scan the list from A until getting to the sentinel
A much more efficient implementation is possible using a Union-Find data structure.
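The naive implementation can be sketched as below, assuming circular doubly linked lists in which the sentinel is marked by a flag and the list is identified with its sentinel. The `is_sentinel` field and the helper names are assumptions for this example; the Union-Find variant is not shown.

```python
class Node:
    def __init__(self, item=None, is_sentinel=False):
        self.item = item
        self.is_sentinel = is_sentinel
        self.prev = self
        self.next = self

def make_list(items):
    """Return (sentinel, nodes) for a new list holding the given items."""
    sentinel, nodes = Node(is_sentinel=True), []
    for x in items:
        node = Node(x)
        last = sentinel.prev
        node.prev, node.next = last, sentinel
        last.next = node
        sentinel.prev = node
        nodes.append(node)
    return sentinel, nodes

def concat(l1, l2):
    """Move all nodes of l2 to the end of l1 in O(1) time; l2 becomes empty."""
    if l2.next is l2:
        return
    first2, last2, last1 = l2.next, l2.prev, l1.prev
    last1.next, first2.prev = first2, last1
    last2.next, l1.prev = l1, last2
    l2.next = l2.prev = l2

def which_list(a):
    """Scan forward from a until the sentinel is reached; O(n) in the worst case."""
    while not a.is_sentinel:
        a = a.next
    return a

L1, (x, y) = make_list(["x", "y"])
L2, (z,) = make_list(["z"])
owner_before = which_list(z)
concat(L1, L2)
owner_after = which_list(z)
```

After the concatenation the same node reports a different list, which is why a stored per-node list reference would be expensive to keep up to date.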
Implementation of lists
                                          Circular arrays   Doubly linked lists   Balanced trees
Insert/Delete-First, Insert/Delete-Last   O(1)              O(1)                  O(log n)
Insert/Delete(i)                          O(i+1)            O(i+1)                O(log n)
Retrieve(i)                               O(1)              O(i+1)                O(log n)
Concat                                    O(n+1)            O(1)                  O(log n)