I/O-Algorithms Lars Arge January 31, 2005. Lars Arge I/O-algorithms 2 Random Access Machine Model Standard theoretical model of computation: –Infinite.

Slides:



Advertisements
Similar presentations
I/O-Algorithms Lars Arge Spring 2012 April 17, 2012.
Advertisements

Sorting Really Big Files Sorting Part 3. Using K Temporary Files Given  N records in file F  M records will fit into internal memory  Use K temp files,
Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
Lars Arge 1/43 Big Terrain Data Analysis Algorithms in the Field Workshop SoCG June 19, 2012 Lars Arge.
Lars Arge 1/13 Efficient Handling of Massive (Terrain) Datasets Lars Arge A A R H U S U N I V E R S I T E T Department of Computer Science.
Modeling & Analyzing Massive Terrain Data Sets (STREAM Project) Pankaj K. Agarwal Workshop on Algorithms for Modern Massive Data Sets.
External Sorting CS634 Lecture 10, Mar 5, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
I/O-Algorithms Lars Arge University of Aarhus February 21, 2005.
I/O-Algorithms Lars Arge Aarhus University February 27, 2007.
I/O-Algorithms Lars Arge Spring 2011 March 8, 2011.
I/O-Algorithms Lars Arge Aarhus University February 13, 2007.
I/O-Algorithms Lars Arge Spring 2009 February 2, 2009.
I/O-Algorithms Lars Arge Spring 2009 January 27, 2009.
I/O-Algorithms Lars Arge Spring 2007 January 30, 2007.
I/O-Algorithms Lars Arge Aarhus University February 16, 2006.
I/O-Algorithms Lars Arge Aarhus University February 7, 2005.
I/O-Algorithms Lars Arge University of Aarhus February 13, 2005.
I/O-Algorithms Lars Arge University of Aarhus March 1, 2005.
I/O-Algorithms Lars Arge Spring 2009 March 3, 2009.
I/O-Algorithms Lars Arge Aarhus University February 6, 2007.
I/O-Algorithms Lars Arge Spring 2006 February 2, 2006.
Lars Arge 1/14 A A R H U S U N I V E R S I T E T Department of Computer Science Efficient Handling of Massive (Terrain) Datasets Professor Lars Arge University.
I/O-Algorithms Lars Arge Aarhus University March 5, 2008.
I/O-Algorithms Lars Arge Aarhus University April 16, 2008.
I/O-Algorithms Lars Arge Aarhus University February 9, 2006.
I/O-Algorithms Lars Arge Aarhus University March 9, 2006.
I/O-Algorithms Lars Arge Aarhus University February 14, 2008.
Massive Data Algorithmics Faglig Dag, January 17, 2008 Gerth Stølting Brodal University of Aarhus Department of Computer Science.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #5.
Chapter 4 Parallel Sort and GroupBy 4.1Sorting, Duplicate Removal and Aggregate 4.2Serial External Sorting Method 4.3Algorithms for Parallel External Sort.
I/O-Algorithms Lars Arge University of Aarhus March 7, 2005.
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
Flow modeling on grid terrains. Why GIS?  How it all started.. Duke Environmental researchers: computing flow accumulation for Appalachian Mountains.
Introduction to Database Systems 1 Join Algorithms Query Processing: Lecture 1.
Lars Arge 1/12 Lars Arge. 2/12  Pervasive use of computers and sensors  Increased ability to acquire/store/process data → Massive data collected everywhere.
I/O-Algorithms Lars Arge Spring 2008 January 31, 2008.
Lecture 11: DMBS Internals
I/O-Algorithms Lars Arge Fall 2014 August 28, 2014.
Heavily based on slides by Lars Arge I/O-Algorithms Thomas Mølhave Spring 2012 February 9, 2012.
Bin Yao Spring 2014 (Slides were made available by Feifei Li) Advanced Topics in Data Management.
Dealing with MASSIVE Data Feifei Li Dept Computer Science, FSU Sep 9, 2008.
Lars Arge1 Permuting Lower Bound Permuting N elements according to a given permutation takes I/Os in “indivisibility” model Indivisibility model: Move.
RELATIONAL JOIN Advanced Data Structures. Equality Joins With One Join Column External Sorting 2 SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid=S1.sid.
CS 345: Topics in Data Warehousing Tuesday, October 19, 2004.
Lecture 2: External Memory Indexing Structures CS6931 Database Seminar.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Flow Modeling on Massive Grids Laura Toma, Rajiv Wickremesinghe with Lars Arge, Jeff Chase, Jeff Vitter Pat Halpin, Dean Urban in collaboration with.
External Memory Geometric Data Structures Lars Arge Duke University June 27, 2002 Summer School on Massive Datasets.
Laura TomaSimplified External memory Algorithms for Planar DAGs Simplified External Memory Algorithms for Planar DAGs July 2004 Lars Arge Laura Toma Duke.
Lecture 1: Basic Operators in Large Data CS 6931 Database Seminar.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
Internal Memory Pointer MachineRandom Access MachineStatic Setting Data resides in records (nodes) that can be accessed via pointers (links). The priority.
External Sorting. Why Sort? A classic problem in computer science! Data requested in sorted order –e.g., find students in increasing gpa order Sorting.
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
Sorting.
Lecture 16: Data Storage Wednesday, November 6, 2006.
Digital Terrain Analysis for Massive Grids
Lecture 11: DMBS Internals
Advanced Topics in Data Management
Database Management Systems (CS 564)
CPS216: Advanced Database Systems
Database Systems (資料庫系統)
Presentation transcript:

I/O-Algorithms Lars Arge January 31, 2005

Lars Arge I/O-algorithms 2 Random Access Machine Model Standard theoretical model of computation: –Infinite memory –Uniform access cost RAMRAM

Lars Arge I/O-algorithms 3 Hierarchical Memory Modern machines have complicated memory hierarchy –Levels get larger and slower further away from CPU –Levels have different associativity and replacement strategies –Large access time amortized using block transfer between levels Bottleneck often transfers between largest memory levels in use L1L1 L2L2 RAMRAM

Lars Arge I/O-algorithms 4 I/O-Bottleneck I/O is often bottleneck when handling massive datasets –Disk access is 10 6 times slower than main memory access –Large transfer block size (typically 8-16 Kbytes) Important to obtain “locality of reference” –Need to store and access data to take advantage of blocks

Lars Arge I/O-algorithms 5 Massive Data Massive datasets are being collected everywhere Storage management software is billion-$ industry Examples: Phone: AT&T 20TB phone call database, wireless tracking Consumer: WalMart 70TB database, buying patterns WEB: Web crawl of 200M pages and 2000M links, Akamai stores 7 billion clicks per day Geography: NASA satellites generate 1.2TB per day

Lars Arge I/O-algorithms 6 Example: Grid Terrain Data Appalachian Mountains (800km x 800km) 500MB at 100m resolution 5.5GB at 30m resolution – NASA SRTM mission acquired 30m data for 80% of the earth land mass 50GB at 10m resolution (some of US available from USGS) 5TB at 1m resolution

Lars Arge I/O-algorithms 7 I/O-Model Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory K = # output size in searching problem We often assume that M>B 2 I/O: Movement of block between memory and disk D P M Block I/O

Lars Arge I/O-algorithms 8 List Ranking Trivial internal memory algorithm takes O(N) time –and causes O(N) page faults in external memory O(N/B) is the number of I/Os we need to read N element –Difference between N and N/B is extremely important in practice Can we develop O(N/B) algorithm? –Answer is NO CBADEGHF B=M/B=2

Lars Arge I/O-algorithms 9 Fundamental Bounds [AV88] Internal External Scanning: N Sorting: N log N Permuting –List rank Searching: Note: –Permuting not linear –Permuting and sorting bounds are equal in all practical cases –B factor VERY important: –Cannot sort optimally with search tree

Lars Arge I/O-algorithms 10 Sorting Merge sort: –Create N/M memory sized sorted runs –Merge runs together M/B at a time  phases using I/Os each

Lars Arge I/O-algorithms 11 Distribution Sort

Lars Arge I/O-algorithms 12 Finding partition elements

Lars Arge I/O-algorithms 13