Comparison Based Dictionaries: Fault Tolerance versus I/O Efficiency Gerth Stølting Brodal Allan Grønlund Jørgensen Thomas Mølhave University of Aarhus.

Presentation transcript:

Comparison Based Dictionaries: Fault Tolerance versus I/O Efficiency Gerth Stølting Brodal Allan Grønlund Jørgensen Thomas Mølhave University of Aarhus ADS 2007, 3rd Bertinoro Workshop on Algorithms and Data Structures University Residential Centre of Bertinoro, Italy, September 30-October 5, 2007

Dictionaries: Fault Tolerance versus I/O Efficiency (Brodal, Jørgensen, Mølhave)
Binary Searching · Fault tolerance · I/O Efficiency · This talk · Future work

Binary Searching
Search(17): O(log N) comparisons
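The O(log N) comparison count can be checked with a minimal sketch of ordinary binary search on fault-free memory. The function name and the probe counter are illustrative assumptions, not taken from the slides.

```python
def binary_search(sorted_keys, key):
    """Plain binary search on fault-free memory.

    Returns (index, probes): the position of key (or None if absent)
    and the number of elements that were inspected.
    """
    lo, hi = 0, len(sorted_keys)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1  # one element inspected per iteration
        if sorted_keys[mid] == key:
            return mid, probes
        if sorted_keys[mid] < key:
            lo = mid + 1  # key can only be to the right
        else:
            hi = mid      # key can only be to the left
    return None, probes
```

Each iteration at least halves the remaining range, so an array of 1024 keys needs at most 11 probes.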

Binary Searching
Search(17) with a soft memory error: a corrupted cell (shown as 9 in the figure) can misdirect the search

Faulty-Memory RAM Model (Finocchi and Italiano, STOC’04)
- Content of memory cells can get corrupted
- Corrupted and uncorrupted content cannot be distinguished
- O(1) safe registers
- Assumption: At most δ corruptions
- Example: Sorting requires time Θ(N·log N + δ^2) (Finocchi, Grandoni, Italiano, ICALP’06)

Faulty-Memory RAM: Searching
- Lower bound and upper bound match: Θ(log N + δ) comparisons
Finocchi, Grandoni, Italiano, ICALP’06
Brodal, Fagerberg, Finocchi, Grandoni, Italiano, Jørgensen, Moruz, Mølhave, ESA’07

Faulty-Memory RAM: Searching
(Figure: Search(17) on an array with a possible fault; comparison outcomes are labelled high confidence or low confidence.)
Requirement: If there exists an uncorrupted element equal to the search key, we should find such an element

Faulty-Memory RAM: Searching
When are we done (δ=3)?
- Contradiction, i.e. at least one fault
- If the range contains at least δ+1 elements smaller than x and δ+1 elements larger than x, then at least one of each is uncorrupted, i.e. x must be contained in the range

Faulty-Memory RAM: Θ(log N + δ) Searching
If verification fails → contradiction, i.e. ≥1 memory fault → ignore the last 4 comparisons → backtrack one level of the search
Brodal, Fagerberg, Finocchi, Grandoni, Italiano, Jørgensen, Moruz, Mølhave, ESA’07

Faulty-Memory RAM: Θ(log N + δ) Searching
- Standard binary search + verification steps
- At most δ verification steps can fail and cause backtracking
- Detail: Avoid repeated comparison with the same (wrong) element by grouping elements into blocks of size O(δ)
Brodal, Fagerberg, Finocchi, Grandoni, Italiano, Jørgensen, Moruz, Mølhave, ESA’07

Faulty-Memory RAM: Reliable Values
- Store 2δ+1 copies of value x; at most δ copies can be corrupted
- x = majority
- Time O(δ) using two safe registers (candidate and count), Boyer and Moore ’91

Example with δ=5:
Copies:    y y y x x y x x x y x
Candidate: y y y y y y y – x – x
Count:     1 2 3 2 1 2 1 0 1 0 1
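The majority scan over the copies can be sketched as follows, assuming at most δ of the 2δ+1 copies are corrupted. The function name `reliable_read` and the returned trace are illustrative assumptions; only the candidate/count idea comes from the slide. The two local variables stand in for the two safe registers.

```python
def reliable_read(copies):
    """Boyer-Moore majority vote over 2*delta + 1 stored copies.

    Returns (value, trace). trace records (candidate, count) after each
    copy is read, with "-" shown when no candidate is currently held.
    """
    candidate, count = None, 0  # the two "safe registers"
    trace = []
    for value in copies:
        if count == 0:
            candidate, count = value, 1   # adopt a new candidate
        elif value == candidate:
            count += 1                    # copy agrees with the candidate
        else:
            count -= 1                    # copy cancels one vote
        trace.append((candidate if count > 0 else "-", count))
    # With a strict majority present, the final candidate is that majority.
    return candidate, trace


value, trace = reliable_read(list("yyyxxyxxxyx"))  # delta = 5, five copies corrupted to y
print(value)                                # x
print(" ".join(c for c, _ in trace))        # y y y y y y y - x - x
print(" ".join(str(n) for _, n in trace))   # 1 2 3 2 1 2 1 0 1 0 1
```

Because the uncorrupted value is guaranteed to be a strict majority, the usual second pass that confirms the candidate is not needed here.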

Faulty-Memory RAM: Dynamic Dictionaries
- Packed array (Itai, Konheim, Rodeh, 1981)
  - Reliable pointers and keys
  - Updates O(δ·log^2 N)
  - Searches = fault tolerant O(log N + δ)
- 2-level buckets of size O(δ·log N)
  - Root: Reliable pointers and keys
  - Bucket search/update amortized O(log N + δ)
  - (Figure: buckets of Θ(δ·log N) elements, each split into parts of Θ(δ) elements)
- Search and update amortized O(log N + δ)
Brodal, Fagerberg, Finocchi, Grandoni, Italiano, Jørgensen, Moruz, Mølhave, ESA’07

I/O Model

I/O Model (Aggarwal and Vitter 1988)
- N = problem size
- M = memory size
- B = I/O block size
- One I/O moves B consecutive records from/to disk
- Complexity = number of I/Os
- Example: Sorting requires Θ((N/B)·log_{M/B}(N/B)) I/Os
(Figure: CPU, Memory, I/O, External Memory)
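To get a feel for the parameters, here is a small sketch that evaluates the standard Aggarwal-Vitter sorting bound (N/B)·log_{M/B}(N/B) with constant factors ignored. The function name and the sample values of N, M and B are assumptions chosen for illustration.

```python
import math


def sort_io_bound(n, m, b):
    """Evaluate (N/B) * log_{M/B}(N/B), ignoring constant factors.

    n = problem size, m = memory size, b = block size, all in records.
    """
    blocks = n / b                 # number of blocks holding the input
    fan_in = m / b                 # blocks that fit in memory at once
    return blocks * math.log(blocks, fan_in)


# Hypothetical machine: 2^30 records, memory of 2^20 records, blocks of 2^10 records.
print(sort_io_bound(2**30, 2**20, 2**10))  # about 2^21 block transfers
```

With these values the logarithm is 2, so the bound is roughly two passes over the 2^20 input blocks.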

B-trees
- Node degree Ω(B), height O(log_B N) (figure: search path from root to leaf)
- Search and update O(log_B N) I/Os
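The gap between log_B N and log_2 N can be made concrete with a short sketch. It assumes every node has exactly the given fan-out, which is a simplification of the Ω(B) degree on the slide, and the function name is hypothetical.

```python
def levels(n, fanout):
    """Smallest h with fanout**h >= n, i.e. the number of levels a search visits.

    Uses integer arithmetic to avoid floating-point rounding in logarithms.
    """
    height, reach = 0, 1
    while reach < n:
        reach *= fanout   # one more level multiplies the reachable keys
        height += 1
    return height


n = 10**9
print(levels(n, 1000))  # B-tree with fan-out 1000: 3 levels
print(levels(n, 2))     # binary search tree: 30 levels
```

If each level costs one I/O, the B-tree search touches 3 blocks where a binary search over the same billion keys would make 30 probes.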

Fault-Tolerance versus I/O Efficiency

Lower Bound for Fault-Tolerant External Searching
- Adversary argument
- If B^ε slabs per I/O → factor B^ε reduction and B^(1-ε) faults
- After k I/Os, N/(B^ε)^k − k·B^(1-ε) elements remain (figure: possible values)
- I/Os required: [formula not preserved], minimized wrt ε
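The expression for the elements remaining after k I/Os, N/(B^ε)^k − k·B^(1-ε), can be evaluated directly. The sketch below only computes that expression; the function name and the sample parameters are assumptions, and it does not reproduce the final bound, which is not preserved in the transcript.

```python
def remaining_after(n, b, eps, k):
    """Evaluate N / (B^eps)^k - k * B^(1 - eps) for the given parameters."""
    reduction = (b ** eps) ** k        # factor B^eps per I/O, applied k times
    spent = k * b ** (1 - eps)         # B^(1-eps) faults charged per I/O
    return n / reduction - spent


# Hypothetical parameters: N = 2^40, B = 2^10, eps = 1/2.
for k in range(1, 5):
    print(k, remaining_after(2**40, 2**10, 0.5, k))
```

With ε = 1/2 each I/O divides the candidate set by 32 and charges 32 faults, so two I/Os leave 2^30 − 64 elements.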

Randomized Upper Bound for Fault-Tolerant External Searching
- Sorted array + 2δ identical B-trees (over N/(2δ) elements, stored in BFS layout)
- Search: Select a random tree for each node on the search path + verification
- Probability of no faults on the path: [formula not preserved], where Σ β_i ≤ δ
- Search O(log_B N + δ/B) expected

Deterministic Upper Bound for Fault-Tolerant External Searching
- Sorted array + 2δ/B^(1-ε) identical B-trees of degree B^ε + B^(1-ε) copies of each key + min/max
- Search: Verify against min/max in each step; if verification fails, backtrack one level and advance to the next copy
- Search: [formula not preserved] I/Os

Dynamic Fault-Tolerant External Dictionaries
Static structure + Packed arrays + Buckets of size O(δ·log^3 N)
- Deterministic: [formula not preserved] I/Os for search and updates
- Randomized: Expected O(log_B N + δ/B) I/Os for search and updates

Conclusion
- Fault-tolerant external memory searching: [formula not preserved] I/Os worst-case, minimized wrt ε
- Randomized O(log_B N + δ/B) I/Os

Future Work: Fault Tolerance versus I/O Efficiency
- Randomized algorithms: Memory faults in internal memory?
- Sorting: ?
- ...

THANKS