External Memory Algorithms Kamesh Munagala. External Memory Model Aggarwal and Vitter, 1988.



External Memory Algorithms Kamesh Munagala

External Memory Model Aggarwal and Vitter, 1988

Typical Memory Hierarchy CPU L1 L2 Main Memory Disk

Simplified Hierarchy: CPU, Main Memory, Disk. Disk access is slow; CPU speed and main memory access are really fast. Goal: Minimize the disk seeks performed by the algorithm.

Disk Seek Model: The disk is divided into contiguous blocks (Block 1, Block 2, Block 3, Block 4 in the figure). Each block stores B objects (integers, vertices, ...). Accessing one disk block is one seek. [Figure: successive block accesses at Time = 1, Time = 2, Time = 3]

Complete Model Specification: Data resides on some N disk blocks, which implies N * B data elements. Computation can be performed only on data in main memory. Main memory can store M blocks. Data size is much larger than main memory size; N ≫ M is the interesting case.

Typical Numbers: Block Size: B = 1000 bytes. Memory Size: M = 1000 blocks. Problem Size: N = 1,000,000 blocks.

Upper Bounds for Sorting John von Neumann, 1945

Merge Sort Iteratively create large sorted groups and merge them Step 1: Create N/(M –1) sorted groups of size (M – 1) blocks each

Merge Sort: Step 1 B = 2; M = 3; N = 8; N/(M - 1) = 4 [Figure: blocks are read into main memory M - 1 = 2 at a time and sorted]

End of Step 1 N/(M-1) = 4 sorted groups of size (M-1) = 2 each One Scan
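
Step 1 can be sketched in a few lines of Python. This is only an in-memory illustration under assumed names: `create_sorted_groups` and the sample block contents are hypothetical, and a Python list of lists stands in for the disk.

```python
def create_sorted_groups(blocks, m):
    """Step 1: read (m - 1) blocks at a time, sort them in memory, write them back.

    blocks: list of blocks, each a list of B objects (a stand-in for the disk).
    m:      number of blocks that fit in main memory (M).
    """
    b = len(blocks[0])                       # objects per block (B)
    groups = []
    for start in range(0, len(blocks), m - 1):
        chunk = blocks[start:start + m - 1]  # at most M - 1 blocks, fits in memory
        items = sorted(x for block in chunk for x in block)
        # Write the sorted group back out as blocks of B objects.
        groups.append([items[i:i + b] for i in range(0, len(items), b)])
    return groups


# Hypothetical input with B = 2, M = 3, N = 8, matching the slide's parameters.
disk = [[9, 4], [7, 1], [16, 3], [12, 8], [2, 15], [6, 11], [14, 5], [10, 13]]
groups = create_sorted_groups(disk, m=3)
print(len(groups), groups[0])
```

Every block is read once and written once, which is the single scan the slide refers to.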

Merging Steps Merge (M-1) sorted groups into one sorted group Iterate until all groups merged

First Merging Step N = 8, M - 1 = 2

[Figure sequence: first merging step, end of first merge, start of second merge, end of second merge; block contents not preserved]

General Merging Step: (M - 1) sorted groups are merged into one sorted group, with one block of each group in memory at a time. Each merging step takes 1 scan.

Overall Algorithm: Repeat merging until all data is sorted. There are N/(M-1) sorted groups to start with, and each iteration merges (M-1) sorted groups into one sorted group, so O(log N / log M) iterations suffice. Each iteration involves one scan of the data, giving O(log N / log M) scans or, alternatively, O(N log N / log M) seeks.
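
The merging iterations can be sketched as follows, assuming the sorted groups from Step 1 are already available as flat sorted lists. The function name `merge_until_sorted` and the sample groups are hypothetical. `heapq.merge` keeps only one pending item per group, which loosely mirrors holding one block per group in memory.

```python
import heapq


def merge_until_sorted(groups, m):
    """Repeatedly merge (m - 1) sorted groups into one.

    Returns the fully sorted data and the number of merging scans performed.
    """
    fan_in = m - 1
    if fan_in < 2:
        raise ValueError("need room for at least two input groups (M >= 3)")
    scans = 0
    while len(groups) > 1:
        # One iteration: every item is read once and written once (one scan).
        groups = [list(heapq.merge(*groups[i:i + fan_in]))
                  for i in range(0, len(groups), fan_in)]
        scans += 1
    return groups[0], scans


# Hypothetical sorted groups for M = 3 (four groups, as at the end of Step 1).
start = [[1, 4, 7, 9], [3, 8, 12, 16], [2, 6, 11, 15], [5, 10, 13, 14]]
result, scans = merge_until_sorted(start, m=3)
print(result, scans)
```

With four groups and a fan-in of two, the loop runs twice, matching the first and second merge in the slides.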

Lower Bounds for Sorting Aggarwal and Vitter, 1988

Complexity in the RAM Model Comparison model: Only allowed operation is comparison No arithmetic permitted Try and lower bound number of comparisons Decision Tree: Input is an array of objects Every node is a comparison between two fixed locations in the array Leaf nodes of tree give sorted order of objects

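Example [Figure: decision tree for sorting A[1], A[2], A[3]. The root compares A[1] ? A[2]; lower nodes compare A[3] ? A[2] and A[1] ? A[3], each with a < branch and a > branch. The six leaves are the orderings A[1] A[2] A[3]; A[1] A[3] A[2]; A[3] A[1] A[2]; A[2] A[1] A[3]; A[2] A[3] A[1]; A[3] A[2] A[1].]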

Decision Tree Complexity. Properties of a decision tree: Two distinct orderings cannot map to the same leaf node. Most sorting algorithms can be thought of as decision trees. The minimum depth over all possible trees gives a lower bound on the number of comparisons required by any possible algorithm.

Complexity of Sorting. Given n objects in an array: Number of possible orderings = n!. Any decision tree has branching factor 2. This implies depth ≥ log(n!) = Θ(n log n). What about the disk seeks needed?

I/O Complexity of Sorting Input has NB elements Main memory size can store MB elements We know the ordering of all elements in main memory What does a new seek do: Read in B array elements New information obtained: Ordering of B elements Ordering of (M-1)B elements already in memory with the B new elements

Example: Main memory has A < B < C. We read in D, E. Possibilities: D < E: DEABC, DAEBC, ADBEC, … D > E: EDABC, EADBC, AEBDC, … 20 possibilities in all.
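
The count of 20 can be checked by brute force. The sketch below (with a hypothetical helper name, `consistent_orders`) enumerates every total order of the five elements and keeps those that agree with the already known order A < B < C.

```python
from itertools import permutations
from math import comb, factorial


def consistent_orders(known, new):
    """All total orders of known + new that keep `known` in its given relative order."""
    result = []
    for perm in permutations(known + new):
        kept = [x for x in perm if x in known]
        if kept == known:
            result.append("".join(perm))
    return result


orders = consistent_orders(["A", "B", "C"], ["D", "E"])
print(len(orders))
# Same count in closed form: choose 2 of the 5 positions for the new
# elements, then order the two new elements among themselves.
print(comb(5, 2) * factorial(2))
```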

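Fan Out of a Seek. Number of possible outcomes of the comparisons: $\binom{MB}{B} \cdot B!$ (the positions of the B new elements among the MB elements now in memory, times the B! orders of the new elements). This is the branching factor for a seek. We need to distinguish between $(NB)!$ possible outcomes.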

Merge Sort Is Optimal Merge Sort is almost optimal Upper Bound for Merge Sort = O(N log N/ log M)

Improved Lower Bound: The relative ordering of the B elements in a block is unknown iff the block has never been seen before. There are only N such seeks. In this case, possible orders = $\binom{MB}{B} \cdot B!$. Else, possible orders = $\binom{MB}{B}$.

Better Analysis: Suppose T seeks suffice. For some N seeks: Branching factor = $\binom{MB}{B} \cdot B!$. For the rest of the seeks: Branching factor = $\binom{MB}{B}$.

Final Analysis Merge Sort is optimal!
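
The formulas on this slide did not survive in the transcript. One standard way to finish the argument is sketched below, using the deck's symbols (N blocks of input, B objects per block, M blocks of memory, T seeks). It is a reconstruction under the stated branching factors, not necessarily the author's exact derivation.

```latex
% The decision tree needs at least (NB)! leaves. At most N seeks have the
% larger branching factor \binom{MB}{B} B!; every other seek has \binom{MB}{B}.
\[
  \binom{MB}{B}^{T} \, (B!)^{N} \;\ge\; (NB)!
\]
% Take logarithms and use
%   \log \binom{MB}{B} \le B \log(eM),
%   \log (NB)! \ge NB \log(NB) - NB \log e,
%   \log B! \le B \log B.
\[
  T \cdot B \log(eM) \;\ge\; NB \log(NB) - NB \log e - NB \log B
  \;=\; NB \log\!\left(\frac{N}{e}\right)
\]
\[
  T \;\ge\; \frac{N \log (N/e)}{\log (eM)} \;=\; \Omega\!\left(\frac{N \log N}{\log M}\right)
\]
```

This matches the O(N log N / log M) seeks used by Merge Sort up to constant factors.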

Some Notation. Sort(X): Number of seeks to sort X objects = $O\left(\frac{X}{B} \cdot \frac{\log (X/B)}{\log M}\right)$. Scan(X): Number of seeks to scan X objects = $X/B$.

Graph Connectivity Munagala and Ranade, 1999

Problem Statement V vertices, E edges Adjacency list: For every vertex, list the adjacent vertices O(E) elements, or O(E/B) blocks Goal: Label each vertex with its connected component

Breadth First Search Properties of BFS: Vertices can be grouped into levels Edges from a particular level go to either: Previous level Current level Next level

Notation Front(t) = Vertices at depth t Nbr(t) = Neighbors of Front(t)

Example BFS Tree Front(1) Front(2) Front(3) Front(4) Nbr(3)

Algorithm: Scan the edges adjacent to Front(t) to compute Nbr(t). Sort the vertices in Nbr(t). Eliminate duplicate vertices and the vertices of Front(t) and Front(t-1). This yields Front(t+1).
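
A minimal in-memory sketch of this level-by-level loop follows. The names `bfs_levels` and `adj` and the sample graph are hypothetical. A Python set stands in for the merge against the sorted lists Front(t) and Front(t-1) that an external-memory version would use.

```python
def bfs_levels(adj, source):
    """Return the BFS fronts reachable from `source` in an undirected graph.

    adj maps each vertex to the list of its adjacent vertices.
    """
    fronts = [[source]]
    prev = []
    while fronts[-1]:
        cur = fronts[-1]
        # Scan the edge lists of Front(t) to get Nbr(t), then sort it.
        nbr = sorted(w for v in cur for w in adj[v])
        # Eliminate duplicates and anything already in Front(t) or Front(t-1).
        seen = set(cur) | set(prev)
        nxt = []
        for w in nbr:
            if w not in seen and (not nxt or nxt[-1] != w):
                nxt.append(w)
        prev = cur
        fronts.append(nxt)
    return fronts[:-1]  # drop the final empty front


# Hypothetical undirected graph given as adjacency lists.
graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(bfs_levels(graph, 1))
```

Checking only the current and previous fronts is enough because, in an undirected graph, edges from a level lead only to the previous, current, or next level.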

Complexity I Scanning Front(t): For each vertex in Front(t): Scan its edge list Vertices in Front(t) are not contiguous on disk Round-off error of one block per vertex O(Scan(E) + V) over all time t Yields Nbr(t)

Complexity II Sorting Nbr(t): Total size of Nbr(t) = O(E) over all times t Implies sorting takes O(sort(E)) I/Os Total I/O: O(sort(E)+V) Round-off error dominates if graph is sparse

PRAM Simulation Chiang, Goodrich, Grove, Tamassia, Vengroff and Vitter 1995

PRAM Model Memory Parallel Processors

Model At each step, each processor: Reads O(1) memory locations Performs computation Writes results to O(1) memory locations Performed in parallel by all processors Synchronously! Idealized model

Terms for Parallel Algorithms Work: Total number of operations performed Depth: Longest dependency chain Need to be sequential on any parallel machine Parallelism: Work/Depth

PRAM Simulation Theorem: Any parallel algorithm with: O(D) input data Performing O(W) work With depth one can be simulated in external memory using O(sort(D+W)) I/Os

Proof Simulate all processors at once: We write on disk the addresses of operands required by each processor O(W) addresses in all We sort input data based on addresses O(sort(D+W)) I/Os Data for first processor appears first, and so on

More Proof. Proof continued: Simulate the work done by each processor in turn and write the results to disk, using O(scan(W)) I/Os. Merge the new data with the input if required: O(sort(D+W)) I/Os.
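
The proof can be made concrete with a small sketch of one depth-one step. Everything here is an illustrative assumption: the function `simulate_step`, its parameters, and the sample memory are hypothetical. Every address read is assumed to exist in memory and write addresses are assumed distinct. The final merge uses a dictionary for brevity where an external-memory version would sort and scan.

```python
from itertools import groupby


def simulate_step(memory, operands, outputs, compute):
    """Simulate one depth-one PRAM step using sorts and sequential scans.

    memory:   list of (address, value) pairs sorted by address
    operands: operands[p] is the tuple of addresses processor p reads
    outputs:  outputs[p] is the address processor p writes
    compute:  function every processor applies to its operand values
    """
    # Write out the operand addresses needed by each processor, sorted by address.
    requests = sorted((addr, p, slot)
                      for p, addrs in enumerate(operands)
                      for slot, addr in enumerate(addrs))
    # One simultaneous scan of memory and requests attaches a value to each request.
    fetched, i = [], 0
    for addr, p, slot in requests:
        while memory[i][0] < addr:
            i += 1
        fetched.append((p, slot, memory[i][1]))
    # Sort by processor so that data for the first processor appears first.
    fetched.sort()
    # Simulate each processor in turn and collect its result.
    writes = []
    for p, group in groupby(fetched, key=lambda t: t[0]):
        values = [v for _, _, v in group]
        writes.append((outputs[p], compute(*values)))
    # Merge the new data with the old memory contents (dict used for brevity).
    merged = dict(memory)
    merged.update(writes)
    return sorted(merged.items())


# Two hypothetical processors, each taking the minimum of a pair of cells.
mem = [(0, 5), (1, 3), (2, 8), (3, 1)]
new_mem = simulate_step(mem, operands=[(0, 1), (2, 3)], outputs=[4, 5], compute=min)
print(new_mem)
```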

Example: Find-min Given N numbers A[1…N] in array Find the smallest number Parallel Algorithm: Pair up the elements For each pair compute the smaller number Recursively solve the N/2 size problem

Quality of Algorithm Work = N + N/2 + … = O(N) Depth at each recursion level = 1 Total depth = O(log N) Parallelism = O(N/log N)
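
The pairing algorithm and its work and depth can be sketched sequentially. The name `parallel_find_min` is hypothetical and the input is assumed to be non-empty. Each loop iteration stands for one parallel round, and work is counted here as the number of comparisons.

```python
def parallel_find_min(a):
    """Return (minimum, work, depth) for the pair-and-recurse algorithm."""
    work = depth = 0
    while len(a) > 1:
        # All pairs would be compared simultaneously on a PRAM.
        nxt = [min(a[i], a[i + 1]) for i in range(0, len(a) - 1, 2)]
        if len(a) % 2:
            nxt.append(a[-1])   # an unpaired element advances unchanged
        work += len(a) // 2     # one comparison per pair
        depth += 1              # each recursion level has depth 1
        a = nxt
    return a[0], work, depth


print(parallel_find_min([5, 3, 8, 1, 9, 2, 7, 4]))
```

For eight numbers this performs 4 + 2 + 1 comparisons over three rounds, in line with O(N) work and O(log N) depth.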

External Memory Find Min First step: W = N, D = N, Depth = 1 I/Os = sort(N) Second step: W = N/2, D = N/2, Depth = 1 I/Os = sort(N/2) … Total I/Os = sort(N) + sort(N/2) + … I/Os = O(sort(N))

Graph Connectivity Due to Chin, Lam and Chen, CACM 1982 Assume vertices given ids 1,2,…,V Step 1: Construct two trees: T1: Make each vertex point to some neighbor with smaller id T2: Make each vertex point to some neighbor with larger id

Example Graph T1 T2

Lemma: One of T1 and T2 has at least V/2 edges, assuming each vertex has at least one neighbor. Proof is homework! Choose the tree with more edges.
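
Step 1 and the choice between the two trees might look like the sketch below. The names `build_trees` and `choose_tree` and the sample graph are hypothetical. The slide only asks for "some" smaller or larger neighbor, so picking the minimum and maximum is one arbitrary but valid choice.

```python
def build_trees(adj):
    """Build T1 and T2 as parent maps. adj maps a vertex id to its neighbor list."""
    t1, t2 = {}, {}
    for v, nbrs in adj.items():
        smaller = [w for w in nbrs if w < v]
        larger = [w for w in nbrs if w > v]
        if smaller:
            t1[v] = min(smaller)   # T1: point to a neighbor with smaller id
        if larger:
            t2[v] = max(larger)    # T2: point to a neighbor with larger id
    return t1, t2


def choose_tree(adj):
    """Return whichever of T1 and T2 has more edges (T1 on a tie)."""
    t1, t2 = build_trees(adj)
    return t1 if len(t1) >= len(t2) else t2


# Hypothetical graph with two components; every vertex has a neighbor.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3], 5: [6], 6: [5]}
t1, t2 = build_trees(graph)
print(t1, t2, choose_tree(graph))
```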

Implementation Finding a larger/smaller id vertex: Find Min or Find Max for each vertex O(E) work and O(log V) depth O(sort(E)) I/Os in external memory In fact, O(scan(E)) I/Os suffice!

Step 2: Pointer Doubling Let Parent[v] = Vertex pointed to by v in tree If v has no parent, set Parent[v] = v Repeat O(log V) times: Parent[v] = Parent[Parent[v]] Each vertex points to root of tree to which it belongs
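
A short sketch of pointer doubling, with a hypothetical function name and sample forest. Building a fresh dictionary in each round mimics the synchronous update, since every vertex reads the old pointers before any are overwritten.

```python
def pointer_doubling(parent):
    """Make every vertex point to the root of its tree.

    parent maps each vertex to its parent; roots map to themselves.
    """
    parent = dict(parent)
    # Each round doubles the distance a pointer jumps, so about log2(V) rounds suffice.
    rounds = max(1, len(parent).bit_length())
    for _ in range(rounds):
        parent = {v: parent[parent[v]] for v in parent}
    return parent


# Hypothetical forest: 4 -> 3 -> 1 <- 2 and 6 -> 5, with roots 1 and 5.
forest = {1: 1, 2: 1, 3: 1, 4: 3, 5: 5, 6: 5}
print(pointer_doubling(forest))
```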

Implementation Work = O(V) per unit depth Depth = O(log V) I/Os = O(log V sort(V)) Total I/Os so far: O(scan(E) + log V sort(V))

Collapsing the Graph

Procedure Create new graph: For every edge (u,v) Create edge (Parent[u], Parent[v]) O(E) work and O(1) depth O(scan(E)) I/Os trivially Vertices: v such that Parent[v] = v Number of vertices at most ¾ V

Duplicate Elimination Sort new edge list and eliminate duplicates O(E) work and O(log E) depth Parallel algorithm complicated O(sort(E)) I/Os trivially using Merge Sort Total I/O so far: O(sort(E) + log V sort(V))
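
Collapsing and duplicate elimination can be combined in one sketch. The name `collapse` and the sample data are hypothetical. Two details go beyond what the slides state and are assumptions of this sketch: self-loops created by the collapse are dropped, and each edge is stored with its smaller endpoint first so that duplicates end up adjacent after sorting.

```python
def collapse(edges, parent):
    """Relabel each edge by the roots of its endpoints, then sort and deduplicate."""
    relabeled = [(parent[u], parent[v]) for u, v in edges]
    # Assumption: drop self-loops and normalize orientation before sorting.
    relabeled = sorted((min(a, b), max(a, b)) for a, b in relabeled if a != b)
    deduped = []
    for edge in relabeled:
        if not deduped or deduped[-1] != edge:   # duplicates are adjacent after sorting
            deduped.append(edge)
    # The new vertices are exactly the roots: v with Parent[v] = v.
    vertices = sorted(v for v in parent if parent[v] == v)
    return vertices, deduped


# Hypothetical parent map after pointer doubling, and the original edge list.
roots = {1: 1, 2: 1, 3: 3, 4: 3, 5: 5, 6: 5}
edge_list = [(1, 2), (2, 3), (1, 4), (3, 4), (4, 5), (6, 3)]
print(collapse(edge_list, roots))
```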

Iterate Problem size: ¾ V vertices At most E edges Iterate until number of vertices at most MB For instance with V’ vertices and E’ edges: I/O complexity: O(sort(E’) + log V’ sort(V’))

Total I/O Complexity: O(log V sort(E)), obtained by summing O(sort(E') + log V' sort(V')) over the O(log V) iterations.

Comparison BFS Complexity: O(sort(E) + V) Better for dense graphs (E > BV) Parallel Algorithm Complexity: O(log V sort(E)) Better for sparse graphs

Best Algorithm Due to Munagala and Ranade, 1999 Upper bound: Lower bound: Ω(sort(E))