Cache Conscious Indexing for Decision-Support in Main Memory Pradip Dhara.

Why In-Memory Databases
- Telecommunications
- CAD tools
- Moore's law will allow us to store relations in memory

Redesigning DBMSs
- Optimize memory-to-CPU performance rather than disk-to-memory performance
- Re-evaluate the space/time tradeoff: space isn't cheap
- Given a certain space requirement, optimize response time for lookups

Indices in In-Memory DBMSs
- A little extra space vs. increased performance
- Index design takes on new dimensions in in-memory databases
- Space overhead cannot be ignored: hash tables are unacceptable

Hardware Solutions
- Caches
- Growing disparity between CPU performance and memory performance
- Cache misses can't be overlapped

Solution
- CSS-tree indices exploit cache behavior to improve performance

Direct Mapped Cache

Fully Associative Cache

2-Way Set Associative Cache

Binary Search on Sorted Array
- Store the relation in sorted order on a key
- Cache performance depends on tuple size

T-trees
[Figure: T-tree nodes, each holding several (key, pointer to record) pairs]

Enhanced B+ trees
[Figure: B+-tree nodes holding (key, pointer) pairs for keys 1 to 20]

Hash Indices
[Figure: hash buckets holding key/pointer pairs]
- Put however many pairs fit into a cache line

Idea Behind CSS-trees
- Save space by not storing pointers
- Use an array as a tree
- Implicitly store pointers as offsets into the array
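The idea of replacing stored pointers with arithmetic can be sketched as follows. This is a minimal illustration, not the presenter's code: the function names are hypothetical, and the layout assumed here (nodes stored back to back in one array, m keys per node, node 0 as the root) follows the node numbering used in the worked example later in the slides, where node 0 has children 1 to 3.

```python
def child_range(b, m):
    """Return (first, last) node numbers of the children of node b.

    Assumes each node holds m keys, so it has m + 1 children,
    and that node 0 is the root.
    """
    first = b * (m + 1) + 1
    return first, first + m


def child_for_branch(b, branch, m):
    """Node number reached by taking branch 0..m out of node b."""
    return b * (m + 1) + 1 + branch


def node_offset(b, m):
    """Array offset of the first key of node b (nodes stored back to back)."""
    return b * m
```

Because the child number is computed rather than stored, a node is filled entirely with keys, which is where the space saving comes from.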

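Useful Formulas for CSS-trees
- $n$ = # of elements, $m$ = # of elements per node, $N$ = # of leaf nodes
- Children of a node $b$ are nodes $b(m+1) + 1$ to $b(m+1) + (m+1)$
- $N = \lceil n / m \rceil$
- $k = \lceil \log_{m+1} N \rceil$
- # of internal nodes $= \frac{(m+1)^k - 1}{m} - \left\lfloor \frac{(m+1)^k - N}{m} \right\rfloor$
- First leaf node in bottom level $= \frac{(m+1)^k - 1}{m}$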

How It Works
[Figure: a full CSS-tree with nodes 0 to 6 (root node 0, second level nodes 1 to 3, bottom level nodes 4 to 6), the CSS-tree directory array holding the internal nodes, and the sorted array divided into leaf nodes 0 to 4]
Values (Lemma 4.1):
- $m$ (# keys per node) = 2
- $n$ (# keys) = 10
- $k = \lceil \log_{m+1} N \rceil$ = 2
- $N$ (# of leaf nodes) = 5
- Internal nodes = 2
- First leaf node in bottom level = 4
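The example values can be checked with a few lines of arithmetic. The sketch below is an assumption-laden reading of Lemma 4.1, not the presenter's code: it takes N as the number of leaf nodes, k as the smallest integer with (m+1)^k >= N, the first bottom-level leaf as ((m+1)^k - 1)/m, and the internal node count as that value minus floor(((m+1)^k - N)/m). The function name `css_tree_shape` is hypothetical.

```python
def css_tree_shape(n, m):
    """Return (N, k, internal_nodes, first_bottom_leaf) for a full CSS-tree.

    n: number of keys in the sorted array
    m: number of keys per node (each internal node has m + 1 children)
    """
    N = -(-n // m)                      # ceil(n / m): number of leaf nodes

    # k = ceil(log_{m+1} N), computed with integers to avoid float rounding
    k = 0
    while (m + 1) ** k < N:
        k += 1

    full = (m + 1) ** k                 # leaf capacity of a complete bottom level
    first_bottom_leaf = (full - 1) // m
    internal_nodes = first_bottom_leaf - (full - N) // m
    return N, k, internal_nodes, first_bottom_leaf
```

With the slide's inputs, `css_tree_shape(10, 2)` gives N = 5, k = 2, 2 internal nodes, and node 4 as the first bottom-level leaf, which agrees with the values listed above.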

Building a full CSS-tree

Searching Within a Node

Level CSS-trees
[Figure: level CSS-tree node; label: value of largest key in subtree]
- $m = 2^t$
- Entries per node = $m - 1$

Level vs. Full CSS-trees
- Level CSS-trees will be deeper due to the difference in branching factor
- Level CSS-trees have fewer comparisons per node
- Level CSS-trees have more cache accesses and node traversals
- Comparisons: $\log_2 N$ (level) vs. $\log_2 N \cdot \log_{m+1} m \cdot \left(1 + \frac{2}{m+1}\right)$ (full)
- Node traversals: $\log_m N$ (level) vs. $\log_{m+1} N$ (full)
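To get a feel for the tradeoff, the two pairs of expressions on this slide can be evaluated numerically. This is only a sketch: the function names are hypothetical, and it assumes the first pair of expressions counts key comparisons per search and the second pair counts node traversals, with the same m used for both tree types as on the slide.

```python
import math


def level_css_cost(N, m):
    """(comparisons, node traversals) for a level CSS-tree, per the slide."""
    comparisons = math.log2(N)
    traversals = math.log(N, m)
    return comparisons, traversals


def full_css_cost(N, m):
    """(comparisons, node traversals) for a full CSS-tree, per the slide."""
    comparisons = math.log2(N) * math.log(m, m + 1) * (1 + 2 / (m + 1))
    traversals = math.log(N, m + 1)
    return comparisons, traversals


if __name__ == "__main__":
    for m in (4, 8, 16):
        print(m, level_css_cost(10**6, m), full_css_cost(10**6, m))
```

For any fixed N the level tree comes out with fewer comparisons and slightly more node traversals, which matches the bullets above.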

Time Analysis
- $R$ (size of rid) = 4 bytes
- $K$ (size of key) = 4 bytes
- $P$ (size of pointer) = 4 bytes
- $h$ = 1.2
- $n$ (# records) = $10^7$
- $c$ (cache line) = 32 bytes
- $s$ (node size / $c$) = 1, where $s = mK/c$
- $D$ = time to dereference a pointer
- $A_b$ = time to compute child address for binary search
- $A_{fcss}$ = time to compute child address for full CSS
- $A_{lcss}$ = time to compute child address for level CSS
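The relation s = mK/c fixes how many keys fit in a node for these parameter values. The short calculation below is a sketch under the assumption that a node occupies exactly s cache lines and holds only keys; the helper names are hypothetical, and the leaf-node count assumes the sorted array is simply cut into nodes of m keys.

```python
import math

K = 4        # key size in bytes
c = 32       # cache line size in bytes
s = 1        # node size in cache lines
n = 10**7    # number of records


def keys_per_node(s, c, K):
    """Solve s = m*K/c for m."""
    return (s * c) // K


def leaf_nodes(n, m):
    """Number of m-key nodes needed to cover n keys (ceiling division)."""
    return -(-n // m)


m = keys_per_node(s, c, K)
N = leaf_nodes(n, m)
binary_search_steps = math.ceil(math.log2(n))
print(m, N, binary_search_steps)
```

Under these assumptions each node holds 8 keys, so one cache line fetch brings in everything needed to pick among 9 children.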

Space Analysis
- $R$ (size of rid) = 4 bytes
- $K$ (size of key) = 4 bytes
- $P$ (size of pointer) = 4 bytes
- $h$ = 1.2
- $n$ (# records) = $10^7$
- $c$ (cache line) = 32 bytes
- $s$ (node size / $c$) = 1, where $s = mK/c$
- $D$ = time to dereference a pointer
- $A_b$ = time to compute child address for binary search
- $A_{fcss}$ = time to compute child address for full CSS
- $A_{lcss}$ = time to compute child address for level CSS

Experiment
- Results are for an UltraSPARC II
- Keys are randomly generated integers between 0 and 1 million
- Performed 5 tests of 100,000 searches for random keys

Figure 5a: Array Size vs. Time

Figure 5b: Array Size vs. Time

Figure 6a: Array Size vs. 2nd-Level Cache Accesses

Figure 6b: Array Size vs. 2nd-Level Cache Misses

Figure 7: Node Size vs. Time

CSS Performance on Other Queries
- CSS is very good for individual selection queries
- CSS will probably perform the best in range queries
- Index nested loops join vs. sort merge join

Doubts About CSS
- Flexibility of CSS-trees across different cache designs
- Any applicability to variable-sized records?
- Multiple CSS-tree indices on different keys

Conclusion
- CSS-trees improve search performance by being cache conscious.

One Last Thought
- Cache designs: should we redesign them to let programmers have control?