OpenLDAP Development Back-hdb – Hierarchical Backend Howard ChuMarch 21, 2003

Slides:



Advertisements
Similar presentations
Chapter 4: Trees Part II - AVL Tree
Advertisements

Data Structures Michael J. Watts
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
1 Lecture 8: Data structures for databases II Jose M. Peña
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 9: BTrees Tyler Robison Summer
CSE332: Data Abstractions Lecture 7: AVL Trees Tyler Robison Summer
Indexing Techniques. Advanced DatabasesIndexing Techniques2 The Problem What can we introduce to make search more efficient? –Indices! What is an index?
Tirgul 6 B-Trees – Another kind of balanced trees Some notes regarding Home Work.
B+-tree and Hashing.
1 Overview of Storage and Indexing Chapter 8 (part 1)
Redundant Bit Vectors for the Audio Fingerprinting Server John Platt Jonathan Goldstein Chris Burges.
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
1 Overview of Storage and Indexing Yanlei Diao UMass Amherst Feb 13, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1 CS 430: Information Discovery Lecture 4 Data Structures for Information Retrieval.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
Tirgul 6 B-Trees – Another kind of balanced trees Problem set 1 - some solutions.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
Chapter 9 Multilevel Indexing and B-Trees
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
1 Overview of Storage and Indexing Chapter 8 1. Basics about file management 2. Introduction to indexing 3. First glimpse at indices and workloads.
Tirgul 6 B-Trees – Another kind of balanced trees.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
Making B+-Trees Cache Conscious in Main Memory
Introduction to Database Systems1 B+-Trees Storage Technology: Topic 5.
1 B-Trees Section AVL (Adelson-Velskii and Landis) Trees AVL tree is binary search tree with balance condition –To ensure depth of the tree is.
Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b TR Lee, Sin-Min.
Indexing structures for files D ƯƠ NG ANH KHOA-QLU13082.
 B+ Tree Definition  B+ Tree Properties  B+ Tree Searching  B+ Tree Insertion  B+ Tree Deletion.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
B-trees (Balanced Trees) A B-tree is a special kind of tree, similar to a binary tree. However, It is not a binary search tree. It is not a binary tree.
Chapter 19 - basic definitions - order statistics ( findkth( ) ) - balanced binary search trees - Java implementations Binary Search Trees 1CSCI 3333 Data.
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
Spatial Data Management Chapter 28. Types of Spatial Data Point Data –Points in a multidimensional space E.g., Raster data such as satellite imagery,
March 16 & 21, Csci 2111: Data and File Structures Week 9, Lectures 1 & 2 Indexed Sequential File Access and Prefix B+ Trees.
CS 430: Information Discovery
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
March 7 & 9, Csci 2111: Data and File Structures Week 8, Lectures 1 & 2 Multi-Level Indexing and B-Trees.
COSC 2007 Data Structures II Chapter 15 External Methods.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
1 Trees 4: AVL Trees Section 4.4. Motivation When building a binary search tree, what type of trees would we like? Example: 3, 5, 8, 20, 18, 13, 22 2.
Comp 335 File Structures B - Trees. Introduction Simple indexes provided a way to directly access a record in an entry sequenced file thereby decreasing.
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
CPSC 221: Algorithms and Data Structures Lecture #7 Sweet, Sweet Tree Hives (B+-Trees, that is) Steve Wolfman 2010W2.
Tree-Structured Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
CompSci 100E 39.1 Memory Model  For this course: Assume Uniform Access Time  All elements in an array accessible with same time cost  Reality is somewhat.
1 CS 430: Information Discovery Lecture 4 Files Structures for Inverted Files.
CSE 326 Killer Bee-Trees David Kaplan Dept of Computer Science & Engineering Autumn 2001 Where was that root?
B-Trees ( Rizwan Rehman) Large degree B-trees used to represent very large dictionaries that reside on disk. Smaller degree B-trees used for internal-memory.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
1 Multi-Level Indexing and B-Trees. 2 Statement of the Problem When indexes grow too large they have to be stored on secondary storage. However, there.
I MPLEMENTING FILES. Contiguous Allocation:  The simplest allocation scheme is to store each file as a contiguous run of disk blocks (a 50-KB file would.
CompSci Memory Model  For this course: Assume Uniform Access Time  All elements in an array accessible with same time cost  Reality is somewhat.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 B+-Tree Index Chapter 10 Modified by Donghui Zhang Nov 9, 2005.
Internal and External Sorting External Searching
BINARY TREES Objectives Define trees as data structures Define the terms associated with trees Discuss tree traversal algorithms Discuss a binary.
Indexing. 421: Database Systems - Index Structures 2 Cost Model for Data Access q Data should be stored such that it can be accessed fast q Evaluation.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Content based on Chapter 10 Database Management Systems, (3 rd.
CS4432: Database Systems II
ITEC 2620M Introduction to Data Structures Instructor: Prof. Z. Yang Course Website: ec2620m.htm Office: TEL 3049.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 10.
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
CS222/CS122C: Principles of Data Management Notes #07 B+ Trees
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #06 B+ trees Instructor: Chen Li.
CS222P: Principles of Data Management UCI, Fall Notes #06 B+ trees
Presentation transcript:

OpenLDAP Development Back-hdb – Hierarchical Backend Howard ChuMarch 21, 2003

Motivation Support ModDN/Subtree Rename Avoid the Quadratic growth in DN2ID Enhance update performance Purist – it’s a hierarchical namespace, after all!

Review of back-bdb DN2ID Uses same DN2ID design as back-ldbm Can locate any DN with only 1 DB call – good for fast searching Maintains explicit list of onelevel and subtree IDs per entry – also good for fast searching

Back-bdb DN2ID drawbacks Slow for updates, some kinds of updates are impossible/impractical (e.g. subtree rename) –Costs a minimum of 2 DB updates plus 1 per level of DN depth to Add/Delete entries –Stores excessive redundant DN information –Paradoxically, the deeper/more organized the tree, the worse it performs Historically, people speak of LDAP as “good for reads, bad for writes” – this is largely due to the DN2ID design, not the LDAP/X.500 specs

Back-bdb DN2ID example o=org ou=biz c=us cn=Joe cn=Bob =c=us:1; 1,2,3,4,5 =o=org,c=us:2; 2,3,4,5 =ou=biz,o=org,c=us:3; %ou=biz,o=org,c=us: 3,4,5 =cn=bob,ou=biz,o=org,c=us: 4 =cn=joe,ou=biz,o=org,c=us: 5

Back-bdb DN2ID example (2) 5 nodes in DIT 13 keys in database 23 data items in these 13 keys Keys are long, consume more DB pages It only gets worse from here…

Back-hdb Principles Only store RDNs and parent references –Eliminates redundant DN storage –Allows subtree rename –Performs Add/Delete/ModRDN with only 1 DB update – O(constant) instead of O(n) efficiency. Sacrifices search performance? –Requires Depth(DN) DB searches to locate a base DN –No subtree IDLs, requires recursive DB searches

Back-hdb DN2parent example o=org ou=biz c=us cn=Joe cn=Bob 0: c=us,1 1: o=org,2 2: ou=biz,3 3: cn=bob,4 3: cn=joe,5

Back-hdb DN2parent (2) 5 nodes in DIT 4 keys in database 5 data items in these 4 keys Keys are short, DB remains small

Back-hdb “Sacrifices?” Back-bdb DN2ID search is always O(log(N)). –Efficiency is guaranteed by Btree balancing –But each compare is over a full DN – expensive –N is large, due to subtree/onelevel IDLs Back-hdb best case is O(log(N)). –Efficiency not guaranteed, poor DIT layout will have negative effects –But each compare is only over an RDN – very cheap –N is small, no redundant IDLs cluttering things up

Back-hdb “Sacrifices?” (2) Back-bdb DN2ID subtree IDL speeds subtree searching? –Only for small trees. Beyond the fixed IDL size, the subtree IDL does more harm than good, bringing in false candidates Back-hdb subtree recursion is expensive? –Never brings in false candidates – makes search evaluation more efficient

Test Results Back-hdb & back-bdb search performance tests out to nearly identical, with a 2% advantage to back-hdb. Entry caching has leveled the field here, but back-hdb’s smaller footprint still gives it an edge. Back-hdb Add/Delete performance relative to back-bdb depends on database size; even on small DBs 10% gains are noticeable. Overshadowed by attribute indexing.

Conclusions Back-hdb is faster for both reads and writes, even without entry caching. Minus attribute indexing, back-hdb performs DIT updates in constant time – disproving the myth that LDAP must be slow for writes. This is the most viable approach for really large scaling.