I/O-Algorithms Lars Arge Aarhus University February 7, 2005.


Lars Arge Algorithms for Hierarchical Memory

Random Access Machine Model
Standard theoretical model of computation:
–Infinite memory
–Uniform access cost
[Figure: RAM]

Hierarchical Memory
Modern machines have a complicated memory hierarchy:
–Levels get larger and slower further away from the CPU
–Levels have different associativity and replacement strategies
–Large access time is amortized using block transfer between levels
The bottleneck is often the transfers between the largest memory levels in use.
[Figure: memory hierarchy L1, L2, RAM]

I/O-Bottleneck
I/O is often the bottleneck when handling massive datasets:
–Disk access is 10^6 times slower than main memory access
–Large transfer block size (typically 8-16 Kbytes)
Important to obtain “locality of reference”:
–Need to store and access data so as to take advantage of blocks

Massive Data
Massive datasets are being collected everywhere.
Storage management software is a billion-$ industry.
Examples:
–Phone: AT&T 20TB phone call database, wireless tracking
–Consumer: WalMart 70TB database, buying patterns
–WEB: Web crawl of 200M pages and 2000M links; Akamai stores 7 billion clicks per day
–Geography: NASA satellites generate 1.2TB per day

I/O-Model
Parameters:
–N = # elements in problem instance
–B = # elements that fit in a disk block
–M = # elements that fit in main memory
–K = output size in searching problems
We often assume that M > B^2.
I/O: movement of a block between memory and disk.
[Figure: processor P, main memory M, disk D, connected by block I/O]

Fundamental Bounds [AV88]
              Internal     External
Scanning:     N            N/B
Sorting:      N log N      (N/B) log_{M/B} (N/B)
Permuting:    N            min{N, (N/B) log_{M/B} (N/B)}
  –List rank
Searching:    log_2 N      log_B N
Note:
–Permuting is not linear
–Permuting and sorting bounds are equal in all practical cases
–The B factor is VERY important: N/B < (N/B) log_{M/B} (N/B) << N
–Cannot sort optimally with a search tree
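
To get a feel for how much the B factor matters, the standard external-memory bounds (scanning N/B, sorting (N/B) log_{M/B} (N/B), permuting min{N, sort}, searching log_B N) can be evaluated numerically. This is only a sketch with constants dropped; the function name `io_bounds` and the parameter values below are assumptions chosen for illustration, not figures from the slides.

```python
import math

def io_bounds(N, M, B):
    """Evaluate the external-memory bounds with all constants dropped."""
    n = N / B                                  # blocks occupied by the input
    m = M / B                                  # blocks that fit in main memory
    sort = n * (math.log2(n) / math.log2(m))   # (N/B) log_{M/B} (N/B)
    return {
        "scan": n,                             # N/B
        "sort": sort,
        "permute": min(N, sort),               # min{N, sort bound}
        "search": math.log2(N) / math.log2(B), # log_B N
    }

# Hypothetical machine: 2^30 elements, 2^20-element memory, 2^10-element blocks
bounds = io_bounds(N=2**30, M=2**20, B=2**10)
for name, value in bounds.items():
    print(f"{name:8s} {value:,.0f} I/Os")
```

With these assumed parameters the sorting bound is only twice the scanning bound, and both are far below N, which is what one I/O per element would cost.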

Merge Sort
Merge sort:
–Create N/M memory-sized sorted runs
–Merge runs together M/B at a time
  ⇒ O(log_{M/B} (N/M)) phases using O(N/B) I/Os each
Distribution sort is similar (but harder – partition elements)
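
The two steps of merge sort (run formation, then merging M/B runs at a time) can be simulated in memory to count the number of merge phases. This is a minimal sketch, not an actual disk-based implementation: lists stand in for files, and `external_merge_sort`, `M` and `B` are illustrative names and parameters.

```python
import heapq

def external_merge_sort(data, M, B):
    """Simulate external merge sort; return (sorted list, number of merge phases).

    M = elements that fit in memory, B = elements per block (assumed parameters).
    """
    fan_in = max(2, M // B)                    # merge M/B runs at a time
    # Run formation: about N/M sorted runs, each of memory size
    runs = [sorted(data[i:i + M]) for i in range(0, len(data), M)]
    phases = 0
    while len(runs) > 1:
        # One phase reads and writes every element once, i.e. O(N/B) block I/Os
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        phases += 1
    return (runs[0] if runs else []), phases

data = list(range(1000, 0, -1))
result, phases = external_merge_sort(data, M=16, B=4)
print(phases)
```

In this run 63 initial runs shrink to 16, then 4, then 1, so the number of phases grows as log_{M/B} of the number of runs.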

External Search Trees
Binary search tree:
–Standard method for search among N elements
–We assume elements in leaves
–Search traces at least one root-leaf path
–If nodes are stored arbitrarily on disk
  ⇒ Search in O(log_2 N) I/Os
  ⇒ Rangesearch in O(log_2 N + K) I/Os

External Search Trees
BFS blocking:
–Block height O(log_2 N) / O(log_2 B) = O(log_B N)
–Output elements blocked
  ⇒ Rangesearch in Θ(log_B N + K/B) I/Os
Optimal: O(N/B) space and Θ(log_B N + K/B) query

External Search Trees
Maintaining BFS blocking during updates?
–Balance is normally maintained in search trees using rotations
Seems very difficult to maintain BFS blocking during rotation
–Also need to make sure output (leaves) is blocked!
[Figure: rotation of nodes x and y]

B-trees
BFS-blocking naturally corresponds to a tree with fan-out Θ(B).
B-trees are balanced by allowing the node degree to vary:
–Rebalancing performed by splitting and merging nodes

(a,b)-tree
T is an (a,b)-tree (a≥2 and b≥2a-1) if:
–All leaves are on the same level (and contain between a and b elements)
–Except for the root, all nodes have degree between a and b
–The root has degree between 2 and b
An (a,b)-tree uses linear space and has height O(log_a N).
Choosing a,b = Θ(B), each node/leaf is stored in one disk block
  ⇒ O(N/B) space and O(log_B N + K/B) query

(a,b)-Tree Insert
Insert:
  Search and insert element in leaf v
  DO v has b+1 elements/children
    Split v:
      make nodes v’ and v’’ with ⌈(b+1)/2⌉ ≤ b and ⌊(b+1)/2⌋ ≥ a elements
      insert element (ref) in parent(v)
      (make new root if necessary)
    v=parent(v)
Insert touches O(log_a N) nodes.
[Figure: node v split into v’ and v’’]
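
The insert procedure can be sketched as a small leaf-oriented tree in which a node that reaches b+1 elements/children is split and the split propagates towards the root. This is an illustrative sketch, not the slides' own code: the `Node` layout, the routing-key convention (child i holds elements no larger than `keys[i]`), and the helper names are assumptions, and elements are assumed to be distinct.

```python
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # leaf: stored elements; internal: routing keys
        self.children = children    # None for a leaf

    @property
    def is_leaf(self):
        return self.children is None

def _size(node):
    return len(node.keys) if node.is_leaf else len(node.children)

def _split(node):
    """Split a node holding b+1 entries into halves of ceil and floor((b+1)/2)."""
    mid = (_size(node) + 1) // 2
    sep = node.keys[mid - 1]        # everything left of the split is <= sep
    if node.is_leaf:
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
    else:
        right = Node(node.keys[mid:], node.children[mid:])
        node.keys = node.keys[:mid - 1]
        node.children = node.children[:mid]
    return sep, right

def _insert(node, x, b):
    if node.is_leaf:
        bisect.insort(node.keys, x)
    else:
        i = bisect.bisect_left(node.keys, x)     # child whose range contains x
        result = _insert(node.children[i], x, b)
        if result is not None:                   # child split: add ref in parent
            sep, right = result
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right)
    return _split(node) if _size(node) > b else None

def insert(root, x, b):
    """Insert x and return the (possibly new) root."""
    result = _insert(root, x, b)
    if result is None:
        return root
    sep, right = result
    return Node([sep], [root, right])            # make new root

def elements(node):
    """All stored elements in left-to-right leaf order."""
    if node.is_leaf:
        return list(node.keys)
    return [x for child in node.children for x in elements(child)]

root = Node([])
for x in [5, 3, 8, 1, 9, 2, 7, 4, 6, 10, 0, 11]:
    root = insert(root, x, b=4)                  # a (2,4)-tree
print(elements(root))
```

Because splits only ever happen on the search path, one insert touches at most one node per level.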

(a,b)-Tree Insert

(a,b)-Tree Delete
Delete:
  Search and delete element from leaf v
  DO v has a-1 elements/children
    Fuse v with sibling v’:
      move children of v’ to v
      delete element (ref) from parent(v)
      (delete root if necessary)
    If v has >b (and ≤ a+b-1 < 2b) children, split v
    v=parent(v)
Delete touches O(log_a N) nodes.
[Figure: node v fused with sibling v’]
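
The fuse step of delete can be sketched in isolation: given a parent and the index of a child that has dropped to a-1 entries, fuse the child with an adjacent sibling and split again if the fused node exceeds b. This is a hedged sketch under assumptions of its own: the `Node` layout, the routing-key convention (`keys[i]` separates children i and i+1) and the name `fix_underflow` are illustrative, and the caller is assumed to handle a root left with a single child.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # leaf: stored elements; internal: routing keys
        self.children = children    # None for a leaf

def size(node):
    return len(node.keys) if node.children is None else len(node.children)

def fix_underflow(parent, i, a, b):
    """Rebalance parent.children[i], which holds only a-1 elements/children."""
    j = i - 1 if i > 0 else i + 1              # pick an adjacent sibling
    lo, hi = min(i, j), max(i, j)
    left, right = parent.children[lo], parent.children[hi]
    sep = parent.keys.pop(lo)                  # delete the reference in the parent
    parent.children.pop(hi)
    # Fuse: move everything from the right node into the left node
    if left.children is None:
        left.keys += right.keys
    else:
        left.keys += [sep] + right.keys
        left.children += right.children
    # The fused node holds at most a+b-1 < 2b entries; split it if it exceeds b
    if size(left) > b:
        mid = (size(left) + 1) // 2
        new_sep = left.keys[mid - 1]
        if left.children is None:
            new = Node(left.keys[mid:])
            left.keys = left.keys[:mid]
        else:
            new = Node(left.keys[mid:], left.children[mid:])
            left.keys = left.keys[:mid - 1]
            left.children = left.children[:mid]
        parent.keys.insert(lo, new_sep)
        parent.children.insert(lo + 1, new)

# (2,4)-tree fragment: the first leaf has a-1 = 1 element, its sibling is full
parent = Node([1], [Node([1]), Node([2, 3, 4, 5])])
fix_underflow(parent, 0, a=2, b=4)
print(parent.keys, [leaf.keys for leaf in parent.children])
```

In this example the fuse produces five elements, more than b = 4, so the node is split again and the parent keeps two children; when the sibling is small the fuse alone suffices and the parent loses a child, which may propagate the underflow upwards.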

(a,b)-Tree Delete

(a,b)-Tree
(a,b)-tree properties:
–If b=2a-1, one update can cause many rebalancing operations
–If b≥2a, an update causes only O(1) rebalancing operations amortized
–If b>2a, O(1/(b/2 - a)) rebalancing operations amortized
  *Both somewhat hard to show
–If b=4a, easy to show that an update causes O((1/a) log_a N) rebalance operations amortized
  *After a split during insert, a leaf contains ≈ 4a/2 = 2a elements
  *After a fuse during delete, a leaf contains between ≈ 2a and ≈ 5a elements (split if more than 3a ⇒ between 3/2a and 5/2a)
[Figure: insert and delete on a (2,3)-tree]

B-Tree
B-trees: (a,b)-trees with a,b = Θ(B)
–O(N/B) space
–O(log_B N + K/B) query
–O(log_B N) update
B-trees with elements in the leaves are sometimes called B+-trees.
Construction in O((N/B) log_{M/B} (N/B)) I/Os:
–Sort elements and construct leaves
–Build tree level-by-level bottom-up
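
The bottom-up construction can be sketched as follows: pack the sorted elements into leaves of B elements, then repeatedly pack one routing key per child into nodes of fan-out B until a single root remains. This is a simplified in-memory sketch; `bulk_load`, the list-of-levels representation, and the choice of a child's maximum as routing key are assumptions, and the last node on a level may end up below the minimum degree a real B-tree would require.

```python
def bulk_load(sorted_elements, B):
    """Build a leaf-oriented tree bottom-up and return its levels.

    levels[0] holds the leaves (lists of up to B elements). Each higher level
    holds nodes as lists of routing keys, one per child: that child's maximum.
    """
    leaves = [sorted_elements[i:i + B] for i in range(0, len(sorted_elements), B)]
    levels = [leaves]
    maxima = [leaf[-1] for leaf in leaves]
    while len(maxima) > 1:
        # Group B consecutive children under one parent node
        nodes = [maxima[i:i + B] for i in range(0, len(maxima), B)]
        levels.append(nodes)
        maxima = [node[-1] for node in nodes]
    return levels

levels = bulk_load(list(range(100)), B=4)
print([len(level) for level in levels])
```

Each level is produced by one scan of the level below, so after the initial sort the construction costs only a scan-like number of I/Os; the sort dominates the overall bound.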

B-tree
B-tree with branching parameter b and leaf parameter k (b,k≥8):
–All leaves are on the same level and contain between k/4 and k elements
–Except for the root, all nodes have degree between b/4 and b
–The root has degree between 2 and b
B-tree with leaf parameter k = Ω(B):
–O(N/B) space
–Height O(log_b (N/B))
–O(1/k) amortized leaf rebalance operations
–O((1/(bk)) log_b (N/B)) amortized internal node rebalance operations

Permuting lower bound

Sorting lower bound

Searching lower bound