CPS216: Data-intensive Computing Systems


CPS216: Data-intensive Computing Systems Operators for Data Access Shivnath Babu

Problem Relation: Employee (ID, Name, Dept, …) 10 M tuples (Filter) Query: SELECT * FROM Employee WHERE Name = “Bob”

Solution #1: Full Table Scan Storage: Employee relation stored in contiguous blocks Query plan: Scan the entire relation, output tuples with Name = “Bob” Cost: Size of each record = 100 bytes Size of relation = 10 M x 100 = 1 GB Time @ 20 MB/s ≈ 1 Minute

Solution #2 Storage: Employee relation sorted on Name attribute Query plan: Binary search

Solution #2 Cost: Size of a block: 1024 bytes Number of records per block: ⌊1024 / 100⌋ = 10 Total number of blocks: 10 M / 10 = 1 M Blocks accessed by binary search: log2(1 M) ≈ 20 Total time: 20 ms x 20 = 400 ms
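The two cost estimates can be reproduced with a few lines of arithmetic. This is only a sketch of the slides' back-of-the-envelope model: the constant names are made up for illustration, and 1 MB is taken to be 10^6 bytes, which is an assumption the slides do not state.

```python
import math

# Figures quoted on the slides
NUM_TUPLES = 10_000_000       # 10 M tuples
RECORD_BYTES = 100            # size of each record
BLOCK_BYTES = 1024            # size of a block
RANDOM_IO_MS = 20             # cost of one random block access
# Assumption: 20 MB/s with 1 MB = 10**6 bytes
SCAN_BYTES_PER_SEC = 20 * 10**6


def full_scan_seconds():
    """Solution #1: read the whole relation sequentially."""
    return NUM_TUPLES * RECORD_BYTES / SCAN_BYTES_PER_SEC


def binary_search_ms():
    """Solution #2: one random I/O per binary-search probe."""
    records_per_block = BLOCK_BYTES // RECORD_BYTES          # 10
    num_blocks = math.ceil(NUM_TUPLES / records_per_block)   # 1 M
    probes = math.ceil(math.log2(num_blocks))                # 20
    return probes * RANDOM_IO_MS


print(full_scan_seconds())   # 50.0 seconds, i.e. roughly 1 minute
print(binary_search_ms())    # 400 ms
```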

Solution #2: Issues Filters on different attributes: SELECT * FROM Employee WHERE Dept = “Sales” Inserts and Deletes

Indexes Data structures that efficiently evaluate a class of filter predicates over a relation Class of filter predicates: Single or multi-attributes (index-key attributes) Range and/or equality predicates (Usually) independent of physical storage of relation: Multiple indexes per relation

Indexes Disk resident: too large to fit in memory, persistent. Updated when indexed relation updated: relation updates costlier, queries cheaper.

Problem Relation: Employee (ID, Name, Dept, …) (Filter) Query: SELECT * FROM Employee WHERE Name = “Bob” Single-Attribute Index on Name that supports equality predicates

Roadmap Motivation Single-Attribute Indexes: Overview Order-based Indexes B-Trees Hash-based Indexes (May cover in future) Extensible Hashing Linear Hashing Multi-Attribute Indexes (Chapter 14 GMUW, May cover in future)

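Single Attribute Index: General Construction [Figure: relation with attributes A and B holding tuples (a_1, b_1), (a_2, b_2), ..., (a_i, b_i), ..., (a_n, b_n); the index stores a_1, a_2, ..., a_i, ..., a_n, each with a pointer to its tuple, and answers A = val, A > low, A < high]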

Exceptions Sparse Indexes Require specific physical layout of relation Example: Relation sorted on indexed attribute More efficient

Single Attribute Index: General Construction Textbook: Dense Index [Figure: one index entry a_i with a pointer for every tuple (a_i, b_i), i = 1..n, of the relation over attributes A and B; answers A = val, A > low, A < high]

Single Attribute Index: General Construction How do we organize the (attribute, pointer) pairs for a_1, a_2, ..., a_i, ..., a_n so that A = val, A > low, A < high can be answered? Idea: Use dictionary data structures Issue: Disk resident?
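As a rough illustration of the dictionary idea, the sketch below keeps the (attribute, pointer) pairs in a sorted in-memory list and answers equality and range predicates by binary search. The class name `DenseIndex`, its methods, and the sample tuples are hypothetical, and list positions stand in for record pointers. The sketch ignores the disk-residency issue the slide raises, which is what B-trees address.

```python
from bisect import bisect_left, bisect_right


class DenseIndex:
    """Sorted (key, position) pairs over one attribute; one entry per record."""

    def __init__(self, records, attr):
        self.entries = sorted((rec[attr], pos) for pos, rec in enumerate(records))
        self.keys = [key for key, _ in self.entries]

    def equal(self, val):
        """Positions of records with A = val."""
        lo, hi = bisect_left(self.keys, val), bisect_right(self.keys, val)
        return [pos for _, pos in self.entries[lo:hi]]

    def between(self, low, high):
        """Positions of records with A > low and A < high."""
        lo, hi = bisect_right(self.keys, low), bisect_left(self.keys, high)
        return [pos for _, pos in self.entries[lo:hi]]


# Made-up sample data for illustration only
employees = [
    {"ID": 1, "Name": "Bob", "Dept": "Sales"},
    {"ID": 2, "Name": "Alice", "Dept": "Toys"},
    {"ID": 3, "Name": "Carol", "Dept": "Sales"},
    {"ID": 4, "Name": "Bob", "Dept": "Toys"},
]
name_index = DenseIndex(employees, "Name")
print(name_index.equal("Bob"))              # [0, 3]
print(name_index.between("Alice", "Dave"))  # [0, 3, 2]
```

Because the index is separate from the relation, a second `DenseIndex` on `Dept` could coexist with this one without changing how the records are stored.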

Roadmap Motivation Single-Attribute Indexes: Overview Order-based Indexes B-Trees Hash-based Indexes Extensible Hashing Linear Hashing Multi-Attribute Indexes

B-Trees Adaptation of search tree data structure Supports range predicates (and equality)

Use Binary Search Tree Directly? [Figure: keys 16, 74, 71, 54, 32, 92, 83 organized as a binary search tree; sorted order 16, 32, 54, 71, 74, 83, 92]

Use Binary Search Tree Directly? Store records of type <key, left-ptr, right-ptr, data-ptr> Remember position of root Question: will this work? Yes But we can do better!

Use Binary Search Tree Directly? Number of keys: 1 M Number of levels: log2(2^20) = 20 Total cost of index lookup: 20 random disk I/Os, 20 x 20 ms = 400 ms B-Tree: fewer than 3 random disk I/Os

B-Tree vs. Binary Search Tree B-Tree node (k1, k2, k3, ..., k40): 1 random I/O prunes tree by a factor of 40. Binary search tree node (k): 1 random I/O prunes tree by half.
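The effect of fan-out on the number of levels can be sketched with a small helper. The function name `levels` is hypothetical, and the model is deliberately crude: every node is assumed full and every level is assumed to cost one random I/O.

```python
def levels(num_keys, fanout):
    """Smallest h such that fanout**h >= num_keys (all nodes assumed full)."""
    h, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        h += 1
    return h


ONE_MILLION = 2**20  # "1 M" keys, as on the binary-search-tree slide

print(levels(ONE_MILLION, 2))    # 20 levels for a binary search tree
print(levels(ONE_MILLION, 40))   # 4 levels when each I/O prunes by 40
print(levels(ONE_MILLION, 128))  # 3 levels with a larger, assumed fan-out
```

Actual fan-out depends on block size and key size, and the upper levels of the tree are often cached in memory, so the number of disk I/Os per lookup can be lower than the level count.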

B-Tree Example [Figure: leaf-level keys in sorted order: 15, 36, 57, 63, 76, 87, 92, 100]

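B-Tree Example [Figure: root (63); internal nodes (36) and (84, 91); leaves (15), (36, 57), (63, 76), (87), (92, 100), chained left to right by next-leaf pointers, with the last next-leaf pointer null]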

Meaning of Internal Node Node (84, 91): first pointer leads to key < 84, second pointer to 84 ≤ key < 91, third pointer to 91 ≤ key

Meaning of Leaf Nodes Leaf (63, 76): each key comes with a pointer to its record (e.g. pointer to record 63); the last pointer leads to the next leaf

Equality Predicates key = 87 [Figure: lookup on the example tree, step by step: root (63), then internal node (84, 91), then leaf (87)]
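The lookup for key = 87 can be traced in code. The sketch below hard-codes the example tree, assuming it has root (63), internal nodes (36) and (84, 91), and leaves (15), (36, 57), (63, 76), (87), (92, 100). The class and function names are hypothetical, and strings stand in for record pointers.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.records = [f"record {k}" for k in keys]  # stand-ins for record pointers
        self.next = None                              # next-leaf pointer


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children  # always len(keys) + 1 children


def build_example():
    leaves = [Leaf([15]), Leaf([36, 57]), Leaf([63, 76]), Leaf([87]), Leaf([92, 100])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Internal([63], [Internal([36], leaves[0:2]),
                           Internal([84, 91], leaves[2:5])])


def find_leaf(node, key):
    """Descend from the root to the leaf that could contain key."""
    while isinstance(node, Internal):
        # Child i covers keys[i-1] <= key < keys[i], matching the slide's
        # "key < 84, 84 <= key < 91, 91 <= key" reading of an internal node.
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    leaf = find_leaf(root, key)
    if key in leaf.keys:
        return leaf.records[leaf.keys.index(key)]
    return None


root = build_example()
print(lookup(root, 87))  # record 87
print(lookup(root, 80))  # None
```

Each iteration of the loop in `find_leaf` corresponds to one node read, so the lookup for 87 touches three nodes in this tree.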

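Range Predicates 57 ≤ key < 95 [Figure: on the example tree, step by step: descend to the leaf holding 57, then follow next-leaf pointers, returning 57, 63, 76, 87, 92 and stopping at 100]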
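A range predicate can be evaluated by descending once to the leaf for the lower bound and then walking the next-leaf chain. The sketch below again hard-codes the example tree under the same assumed layout (root (63), internal nodes (36) and (84, 91), five chained leaves). All names are hypothetical, and the scan returns keys instead of following record pointers.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # next-leaf pointer


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def build_example():
    leaves = [Leaf([15]), Leaf([36, 57]), Leaf([63, 76]), Leaf([87]), Leaf([92, 100])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Internal([63], [Internal([36], leaves[0:2]),
                           Internal([84, 91], leaves[2:5])])


def find_leaf(node, key):
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, low, high):
    """Keys with low <= key < high, in sorted order."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for key in leaf.keys:
            if key >= high:
                return result      # passed the upper bound: stop the scan
            if key >= low:
                result.append(key)
        leaf = leaf.next           # sequential move along the leaf chain
    return result


root = build_example()
print(range_scan(root, 57, 95))  # [57, 63, 76, 87, 92]
```

Only the initial descent touches internal nodes; the remaining work is a left-to-right pass over leaves, which is why all (key, record pointer) pairs live at the leaf level.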

General B-Trees Fixed parameter: n. Each node has n key slots and n + 1 pointer slots. All leaves at same depth. All (key, record pointer) pairs in leaves.

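B-Tree Example n = 2 [Figure: root (63); internal nodes (36) and (84, 91); leaves (15), (36, 57), (63, 76), (87), (92, 100), chained left to right by next-leaf pointers, with the last next-leaf pointer null]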

General B-Trees: Space related constraints Use at least Root: 2 pointers Internal: ⌈(n+1)/2⌉ pointers Leaf: ⌊(n+1)/2⌋ pointers to data

n = 3: minimum and maximum node occupancy
            Min       Max
Internal    15        5 15 21
Leaf        31 42     31 42 56
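The minimum-occupancy limits can be computed for any n. The sketch assumes the usual textbook convention that internal nodes need at least ⌈(n+1)/2⌉ pointers and leaves at least ⌊(n+1)/2⌋ pointers to data. The function names are hypothetical.

```python
def min_pointers(n, kind):
    """Minimum number of used pointers for a node of the given kind."""
    if kind == "root":
        return 2
    if kind == "internal":
        return (n + 2) // 2   # integer form of ceil((n + 1) / 2)
    if kind == "leaf":
        return (n + 1) // 2   # floor((n + 1) / 2), counting data pointers only
    raise ValueError(f"unknown node kind: {kind}")


def min_keys(n, kind):
    """Minimum number of keys implied by the pointer limit."""
    if kind == "leaf":
        return min_pointers(n, kind)   # one key per data pointer
    return min_pointers(n, kind) - 1   # m keys separate m + 1 child pointers


for kind in ("internal", "leaf"):
    print(kind, min_pointers(3, kind), min_keys(3, kind))
# internal 2 1
# leaf 2 2
```

For n = 3 this gives a minimum internal node with one key and a minimum leaf with two keys, and for n = 2 a leaf may hold a single key.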

Leaf Nodes n key slots, (n+1) pointer slots [Figure: leaf holding keys k_1, k_2, ..., k_m in its n key slots, remaining slots unused; pointer i leads to the record of k_i, and the last pointer leads to the next leaf] Constraint: m ≥ ⌊(n+1)/2⌋

Internal Nodes n key slots, (n+1) pointer slots [Figure: internal node holding keys k_1, k_2, ..., k_m in its n key slots, remaining slots unused; the first pointer leads to key < k_1, the second to k_1 ≤ key < k_2, ..., the last used pointer to k_m ≤ key] Constraint: (m+1) ≥ ⌈(n+1)/2⌉

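Root Node n key slots, (n+1) pointer slots [Figure: root holding keys k_1, k_2, ..., k_m, remaining slots unused; the first pointer leads to key < k_1, the second to k_1 ≤ key < k_2, ..., the last used pointer to k_m ≤ key] Constraint: (m+1) ≥ 2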

Limits Why the specific limits ⌈(n+1)/2⌉ and ⌊(n+1)/2⌋? Why different limits for leaf and internal nodes? Can we reduce each limit? Can we increase each limit? What are the implications?