DATABASE OPERATORS AND SOLID STATE DRIVES Geetali Tyagi (2012037) Mahima Malik (2012053) Shrey Gupta (2012098) Vedanshi Kataria (2012117)


Introduction ■ SSDs have rich internal parallelism – A chance to improve I/O bandwidth through parallel processing ■ Higher IOPS (Input/Output Operations Per Second) than HDDs ■ Traditional query processing algorithms were designed mainly around the mechanical traits of HDDs – Simply replacing HDDs with SSDs yields very little benefit – Algorithms such as scan and join must be redesigned to take full advantage of SSDs

SSD Internal Architecture ■ Multiple channels, each shared by a set of flash memory packages ■ Two levels of parallelism – Channel-level parallelism ■ Each channel can be operated independently and simultaneously – Package-level parallelism ■ Operations on flash memory packages attached to the same channel can also be interleaved

Some Terms ■ A domain is a set of flash memories that share a specific set of resources (e.g. channels) – It can be further partitioned into sub-domains ■ A chunk is a logical data page with a unique logical address. ■ Every domain has its own ScanBuffer to store the scan results for that domain.

ParaScan ■ Most SSDs adopt a RAID-0-like striped data storage mechanism – with consecutive Logical Block Addresses for the striped chunks ■ Distribute the table uniformly across domains to maximize the benefit of parallelism. ■ Example: Assuming 20 domains, the chunks whose logical address is 20*n (n=0,1,2...) are in the 1st domain, the chunks whose logical address is 20*n+1 (n=0,1,2...) are in the 2nd domain, and so on. ■ In the best case, ParaScan is twice as fast as a traditional table scan on SSD and 4 times as fast as a traditional table scan on HDD.
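
The 20-domain example above amounts to a modulo mapping from a chunk's logical address to its domain. The following is a minimal sketch of that mapping, assuming plain round-robin striping; the names `NUM_DOMAINS`, `domain_of` and `chunks_in_domain` are hypothetical and not taken from the slides, and domains are indexed from 0 (so the slide's "1st domain" is domain 0).

```python
NUM_DOMAINS = 20  # value from the slide's example; real SSDs may differ


def domain_of(chunk_addr: int, num_domains: int = NUM_DOMAINS) -> int:
    """Return the 0-based domain holding a chunk, assuming round-robin striping."""
    return chunk_addr % num_domains


def chunks_in_domain(domain: int, total_chunks: int,
                     num_domains: int = NUM_DOMAINS) -> range:
    """Logical addresses of the chunks that one domain scan would read."""
    return range(domain, total_chunks, num_domains)
```

Under this assumption, a domain scan only needs its domain index and the table size to enumerate the chunks it should read, which is what lets the scans proceed without coordinating with each other.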

[Figure: Chunks 0 to 3 distributed across Domain 0 and Domain 1]

ParaScan ■ Domain scan – Read data chunks one by one from a single domain and then put them into its own ScanBuffer, allowing multiple domain scans to be executed in parallel without interference ■ Multi-domain parallel scan – Multiple threads ■ Each in charge of one or more individual domain scans – Entire scan buffer is also divided into several ScanBuffers so that each scan thread can use one ScanBuffer ■ Performance of multi-domain parallel scan depends on the concurrency level – the maximal number of physical threads supported by the processor – the maximal queue depth supported by the SSD
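
The structure of the multi-domain parallel scan can be sketched as below. This is an illustration only: an in-memory list stands in for the SSD, a Python list stands in for each ScanBuffer, and the function names (`domain_scan`, `para_scan`) and the round-robin chunk placement are assumptions rather than details from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def domain_scan(table, domain, num_domains):
    """Read one domain's chunks one by one into that domain's own ScanBuffer."""
    scan_buffer = []
    for addr in range(domain, len(table), num_domains):
        scan_buffer.append(table[addr])  # stands in for an SSD read
    return scan_buffer


def para_scan(table, num_domains, num_threads):
    """Run one domain scan per domain on a pool of scan threads.

    Returns one ScanBuffer per domain, in domain order.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(domain_scan, table, d, num_domains)
                   for d in range(num_domains)]
        return [f.result() for f in futures]
```

With fewer threads than domains, each thread ends up serving more than one domain scan, which matches the slide's "one or more individual domain scans" per thread. Because every domain scan writes only to its own buffer, no locking is needed at this stage.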

[Figure: Multi-domain parallel scan. ParaScan runs one domain scan per SSD domain (Domain #0 to Domain #19) and fills the matching ScanBuffer (ScanBuffer #0 to ScanBuffer #19, each holding pages #0 and #1) in the buffer.]

ParaHashJoin - Parallel Hash Join ■ A two-way equi-join consists of three phases – ParaHash, MiniJoin, and Fetch. ■ 3 times faster than a traditional hash join on SSD [Figure: Table R and Table S pass through ParaScan, then the ParaHash, MiniJoin and Fetch phases of ParaHashJoin]

ParaHash ▪ The buffer is divided into a scan area and a hash area. ▪ Multiple hash threads calculate the hash values of the records in the ScanBuffers – Each thread is assigned to one ScanBuffer ▪ Based on the hash values, the threads put the records' join attributes and RIDs into the corresponding hash buckets in parallel – Hash function: hash_value = join_attr & (B-1), where B is the number of hash buckets ▪ Concurrency control – Each bucket maintains a lightweight lock – A bitmap is used to check whether hash index records with the specified join attribute exist in the bucket
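
The ParaHash step can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes integer join attributes, a bucket count B that is a power of two (so that `join_attr & (B-1)` equals `join_attr % B`), and an ordinary mutex per bucket as the concurrency control. The bitmap check is omitted, and the names `ParaHashTable` and `para_hash` are hypothetical.

```python
import threading


class ParaHashTable:
    """Hash area: B buckets, each guarded by its own lock (an assumption)."""

    def __init__(self, num_buckets):
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.mask = num_buckets - 1
        self.buckets = [[] for _ in range(num_buckets)]
        self.locks = [threading.Lock() for _ in range(num_buckets)]

    def insert(self, join_attr, rid):
        bucket = join_attr & self.mask  # hash_value = join_attr & (B-1)
        with self.locks[bucket]:        # only this bucket is blocked
            self.buckets[bucket].append((join_attr, rid))


def para_hash(scan_buffers, num_buckets):
    """Start one hash thread per ScanBuffer; each inserts (join_attr, RID) pairs."""
    table = ParaHashTable(num_buckets)

    def hash_buffer(buf):
        for join_attr, rid in buf:
            table.insert(join_attr, rid)

    threads = [threading.Thread(target=hash_buffer, args=(buf,))
               for buf in scan_buffers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table
```

Locking per bucket rather than per table means two hash threads only contend when their records land in the same bucket.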

[Figure: ParaHashJoin ParaHash. The buffer is split into a scan area holding table data and a hash area holding the buckets up to B-2 and B-1.]

ParaHashJoin MiniJoin ▪ Input - ParaHash table R and ParaHash table S ▪ Each bucket is read into memory to generate join results - {join_attr, RID_R, RID_S} – Two passes are required to generate MiniJoin results: one pass for ParaScan and one pass for MiniJoin ▪ If there is enough memory to hold the hash table of R (the smaller table) – Table R is ParaScanned and then ParaHashed; Table S only needs to be ParaScanned and can directly probe the hash table of R for join results – Only one pass is required
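
The one-pass case, where the hash table of R fits in memory, can be sketched as a build-and-probe loop that emits {join_attr, RID_R, RID_S} triples. The sketch is single-threaded for clarity and uses a Python dictionary in place of the ParaHash buckets; the function name `mini_join_one_pass` and the input layout (iterables of `(join_attr, rid)` pairs) are assumptions.

```python
from collections import defaultdict


def mini_join_one_pass(r_index, s_index):
    """Build a hash table on R, then probe it with S.

    r_index, s_index: iterables of (join_attr, rid) pairs.
    Returns a list of (join_attr, rid_r, rid_s) MiniJoin results.
    """
    hash_r = defaultdict(list)
    for join_attr, rid_r in r_index:       # build phase over the smaller table
        hash_r[join_attr].append(rid_r)

    results = []
    for join_attr, rid_s in s_index:       # probe phase over the scanned S records
        for rid_r in hash_r.get(join_attr, ()):
            results.append((join_attr, rid_r, rid_s))
    return results
```

Note that the output carries only join attributes and RIDs, not full records, which is why a separate Fetch phase follows.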

ParaHashJoin Fetch ▪ Uses the RIDs in the MiniJoin output index to fetch the necessary attributes and produce the final join results. ▪ TID Hash Join approach – For each join result, fetch the needed data pages to generate the final join result ▪ This approach is reasonable if all the pages of the result fit in memory – When memory is insufficient, some pages may be loaded multiple times, increasing the page-loading cost ▪ Sort-based fetching approach – Sort the MiniJoin results by the RIDs of the outer table – Load the needed pages according to the sorted MiniJoin results to produce the final join results
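
The benefit of sorting before fetching can be shown with a toy cost model. The sketch below assumes that an RID is a `(page_no, slot)` tuple and that memory holds only one outer-table page at a time (the "insufficient memory" case); both assumptions, and the names `page_loads` and `sort_based_order`, are illustrative and not from the slides.

```python
def page_loads(mini_join_results):
    """Count outer-table page loads when only the last loaded page stays in memory."""
    loads = 0
    current_page = None
    for _join_attr, rid_r, _rid_s in mini_join_results:
        page = rid_r[0]                  # RID assumed to be (page_no, slot)
        if page != current_page:
            loads += 1                   # page is not in memory, load it again
            current_page = page
    return loads


def sort_based_order(mini_join_results):
    """Sort MiniJoin results by the RID of the outer table."""
    return sorted(mini_join_results, key=lambda row: row[1])


# Four join results that touch only two distinct outer-table pages (0 and 2).
results = [
    (7, (2, 0), (9, 1)),
    (7, (0, 3), (9, 1)),
    (4, (2, 1), (5, 0)),
    (4, (0, 1), (5, 2)),
]
unsorted_cost = page_loads(results)                  # alternates between pages
sorted_cost = page_loads(sort_based_order(results))  # each page loaded once
```

In this model the sorted order loads each outer-table page exactly once, while the arrival order reloads a page every time the results switch pages.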

ParaAggr (Parallel Aggregation) ■ Parallel implementation of aggregation operations (sum, max, min, count, average). ■ Two phases: – SubAggr: Multiple threads, each corresponding to a ParaScan thread or ScanBuffer. – TotAggr: Combines the results of all SubAggr instances. ■ ParaAggr on SSDs is 3-4 times faster than traditional single-threaded aggregation on HDD.

[Figure: ParaAggr working diagram. Each domain (up to Domain #n) is read by its own ParaScan, each ParaScan feeds a SubAggr, and all SubAggr results are combined by TotAggr.]

ParaAggr - Example ■ Scan records in all domains in parallel. Forward those satisfying the WHERE clause to SubAggr. ■ Query: SELECT count(*) FROM Employee WHERE dept='Sales'

ParaAggr - Example ■ SubAggr threads count the ParaScan results in parallel. ■ Query: SELECT count(*) FROM Employee WHERE dept='Sales'

ParaAggr - Example ■ TotAggr sums the counts produced by the SubAggr operations. ■ Query: SELECT count(*) FROM Employee WHERE dept='Sales'
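
The count(*) example can be sketched as a two-phase aggregation over per-domain ScanBuffers. This is an illustration under assumptions: records are Python dictionaries, each ScanBuffer is a list, and the function names `sub_aggr` and `para_aggr_count` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sub_aggr(scan_buffer, predicate):
    """SubAggr: count the records in one ScanBuffer that satisfy the WHERE clause."""
    return sum(1 for record in scan_buffer if predicate(record))


def para_aggr_count(scan_buffers, predicate):
    """Run one SubAggr per ScanBuffer in parallel, then combine with TotAggr."""
    workers = max(1, len(scan_buffers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(
            pool.map(lambda buf: sub_aggr(buf, predicate), scan_buffers))
    return sum(partial_counts)  # TotAggr: sum of the partial counts


# SELECT count(*) FROM Employee WHERE dept='Sales'
buffers = [
    [{"dept": "Sales"}, {"dept": "HR"}],
    [{"dept": "Sales"}],
    [],
]
sales_count = para_aggr_count(buffers, lambda r: r["dept"] == "Sales")
```

The same split works for sum, max and min by changing the partial and combining functions. Average needs each SubAggr to return a (sum, count) pair, since averaging the partial averages would give the wrong answer when buffers differ in size.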

Summary ■ The rich internal parallelism of SSDs is exploited for faster query processing. ■ The algorithms use parallel threads and SSD domains to speed up processing. – E.g. ParaScan, ParaHashJoin, ParaSort, ParaAggr. ■ The Para-SSD algorithms are much faster than traditional HDD algorithms.

References ■ Y. Fan, W. Lai, X. Meng, Optimizing Database Operators by Exploiting Internal Parallelism of Solid State Drives