Exploiting Asynchronous IO using the Asynchronous Iterator Model
Suresh Iyengar*, S. Sudarshan, Santosh Kumar#, Raja Agrawal&
IIT Bombay
Current affiliations: * Microsoft Hyderabad, # Guruji.com, & SAP

COMAD 2008, IIT Bombay
Agenda
- AIO background
- Exploiting AIO in query processing
  - Asynchronous Iterator model
  - Asynchronous Index Nested Loops join
  - Asynchronous versions of other operators
- Performance results
- Related work
- Conclusion

IO Processing: Traditional way
[Figure: the application makes a Read() system call; the kernel initiates the IO and context-switches away; the application is blocked until the read response returns the data.]
- The CPU is idle most of the time, waiting for IO completion.

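IO Processing: Async. way
[Figure: the application makes an AIO Read() system call; the kernel initiates the IO and the call returns at once, so the application does other work; the kernel notifies the application when the read response (data) is ready.]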

IO Processing: Async. way
- Asynchronous approach
  - Overlaps CPU and IO processing
  - The application can generate multiple IO requests
- Allows the IO subsystem to reorder access to data on disk
  - Important in RAID environments

Asynchronous IO Interface (Linux 2.6 kernel)
- aio_read(aio structure): request an AIO read operation
- aio_error(aio structure): check the status of an AIO request
- lio_listio(array of aio structures): initiate a list of AIO operations
- Each aio structure carries (file descriptor, offset, buffer, numBytes, …)
- We use list AIO in our implementation: it can initiate multiple IO read operations in one system call
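Python's standard library has no binding for POSIX `aio_read`/`lio_listio`, so the sketch below only imitates the list-AIO pattern: a hypothetical `lio_listio_sim` submits several `(offset, num_bytes)` reads in one call, and worker threads running `os.pread` (Unix only) stand in for the kernel. The returned futures play the role of the per-request status that `aio_error` would report.

```python
from __future__ import annotations

import os
from concurrent.futures import Future, ThreadPoolExecutor

def lio_listio_sim(pool: ThreadPoolExecutor, fd: int,
                   requests: list[tuple[int, int]]) -> list[Future]:
    """Hypothetical stand-in for lio_listio(): submit a whole list of
    (offset, num_bytes) reads in one call; threads play the kernel's role."""
    return [pool.submit(os.pread, fd, num_bytes, offset)
            for offset, num_bytes in requests]

def read_blocks(path: str, block_size: int, block_ids: list[int]) -> dict[int, bytes]:
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = lio_listio_sim(
                pool, fd, [(b * block_size, block_size) for b in block_ids])
            # The caller is free to do other (CPU) work here while reads proceed.
            # f.done() is a non-blocking status check, loosely like aio_error();
            # f.result() waits until that particular read has finished.
            return {b: f.result() for b, f in zip(block_ids, futures)}
    finally:
        os.close(fd)
```

Submitting the whole list at once is what lets the IO subsystem reorder the reads, which is the benefit the previous slides attribute to asynchronous IO.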

Handling AIO completion
- Signal-based handler: a signal is generated on IO completion
- Callback using interrupts: an interrupt is generated on IO completion
- In both of the above methods, the completion handler and shared data structures can be accessed concurrently
- Polling: store IO requests in a pending queue, poll periodically for completion, and call the completion handler
- Our experiments show that polling beats the signal/interrupt-based approaches
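The polling scheme can be sketched as below. IO is simulated, not real: each hypothetical `IORequest` "completes" after a fixed number of status checks, and its `check()` method merely plays the role of `aio_error()`. Because the completion handler runs in the polling thread itself, it touches shared structures without the concurrent access that the signal and interrupt variants have to guard against.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class IORequest:
    block_id: int
    polls_left: int                 # simulated latency: checks before the read completes
    data: Optional[bytes] = None

    def check(self) -> bool:
        """Non-blocking status check, standing in for aio_error()."""
        if self.polls_left > 0:
            self.polls_left -= 1
            return False
        self.data = b"block-%d" % self.block_id   # pretend the kernel filled the buffer
        return True

def poll_pending(pending: deque, on_complete: Callable[[IORequest], None]) -> int:
    """One polling pass over the pending queue: call the completion handler for
    every finished request and keep the others queued. Returns how many finished."""
    still_pending = deque()
    completed = 0
    while pending:
        req = pending.popleft()
        if req.check():
            on_complete(req)        # runs in the caller's thread, so no locking needed
            completed += 1
        else:
            still_pending.append(req)
    pending.extend(still_pending)
    return completed
```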

Demand-Driven Iterator
- Bottom-level nodes perform operations such as sequential scans or index scans.
- Upper-level nodes are join nodes or other operator nodes such as sort or aggregate.
[Figure: an NLJ node over scans of Table A and Table B; the parent drives each child through Open(), Next() and Close(), and Next() is a blocking call.]
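The pull-based Open()/Next()/Close() protocol can be sketched in a few lines. The classes `SeqScan` and `NestedLoopJoin` are illustrative, not PostgreSQL's executor API; rows are in-memory Python tuples, so the point where a real scan would block on disk IO is only marked in a comment.

```python
from typing import Iterable, Optional

class SeqScan:
    """Leaf node: one row per next() call. A real scan may block on disk IO here."""
    def __init__(self, rows: Iterable[tuple]):
        self.rows = list(rows)
    def open(self) -> None:
        self.pos = 0
    def next(self) -> Optional[tuple]:
        if self.pos == len(self.rows):
            return None                       # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self) -> None:
        pass

class NestedLoopJoin:
    """Pulls from its children on demand; every child next() is a blocking call."""
    def __init__(self, outer, inner, pred):
        self.outer, self.inner, self.pred = outer, inner, pred
    def open(self) -> None:
        self.outer.open()
        self.cur = self.outer.next()
        self.inner.open()
    def next(self) -> Optional[tuple]:
        while self.cur is not None:
            r = self.inner.next()
            if r is None:                     # inner exhausted: advance outer, rescan inner
                self.cur = self.outer.next()
                self.inner.close()
                self.inner.open()
                continue
            if self.pred(self.cur, r):
                return self.cur + r
        return None
    def close(self) -> None:
        self.outer.close()
        self.inner.close()
```

Usage: `j = NestedLoopJoin(SeqScan(a_rows), SeqScan(b_rows), pred)`, then `j.open()`, call `j.next()` until it returns `None`, and `j.close()`.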


Asynchronous Iterator
[Figure: the same NLJ plan over scans of Table A and Table B. The scan notes "I don't have the tuple available in memory!", issues an AIO read operation and returns "LATER", so Next() becomes a non-blocking call.]

Asynchronous Iterator Model (AIM)
- Allow a node to return a status "LATER" to its parent instead of blocking for IO completion.
- The parent operator could:
  - perform other work, such as fetching data from another input;
  - simply return a LATER status to its own parent; or
  - just loop, re-invoking the child operator until it returns a tuple (e.g. at the root of the execution plan tree).
- The exact action depends on the operator, so each operator needs an asynchronous version.
- We focus on the asynchronous Indexed Nested Loops join.
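A minimal sketch of the LATER protocol, assuming a simulated leaf: the hypothetical `AsyncScan` pretends each row needs a read that stays "in flight" for `io_latency` calls, and answers LATER meanwhile instead of blocking. `root_drain` shows the simplest parent behaviour from the slide, the plan root that just re-invokes its child. The `Status` enum is an illustrative name, not the paper's.

```python
from enum import Enum, auto

class Status(Enum):
    TUPLE = auto()
    LATER = auto()
    END = auto()

class AsyncScan:
    """Leaf that never blocks: if a row's page is not in memory it 'issues' a
    simulated read and returns LATER until that read completes."""
    def __init__(self, rows, io_latency):
        self.rows, self.io_latency = rows, io_latency
        self.pos, self.wait = 0, None
    def next(self):
        if self.pos == len(self.rows):
            return Status.END, None
        if self.wait is None:             # page not resident: issue the AIO read
            self.wait = self.io_latency
        if self.wait > 0:                 # read still in flight
            self.wait -= 1
            return Status.LATER, None
        self.wait = None                  # read done: hand the row to the parent
        row = self.rows[self.pos]
        self.pos += 1
        return Status.TUPLE, row

def root_drain(child):
    """The plan root cannot pass LATER upward, so it re-invokes its child.
    A real root would poll for IO completion between calls instead of spinning."""
    out, later_count = [], 0
    while True:
        status, row = child.next()
        if status is Status.END:
            return out, later_count
        if status is Status.LATER:
            later_count += 1
            continue
        out.append(row)
```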

Asynchronous INL Joins
- Original state of an Indexed Nested Loops (INL) node: left and right subplans and qualifier lists
- Augmented state for the async INL node:
  - An array of outer tuples, each with a queue of matching inner TIDs (AIO may already have been issued for some of them; others are issued later)
  - A workqueue of outer slots whose matching inner TIDs already have AIO issued
  - An IO queue recording all pending AIO requests made by the node, used to poll for completion of AIO requests
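The augmented node state can be written down as two small data classes. The names `OuterSlot`, `AsyncINLState`, `add_outer` and `mark_issued` are hypothetical, the original INL state (subplans, qualifier lists) is left out, and plain strings stand in for the aio structures that would sit in the IO queue.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class OuterSlot:
    """One entry of the array of outer tuples."""
    outer: tuple
    inner_tids: deque = field(default_factory=deque)  # matching inner TIDs still to be joined
    aio_issued: bool = False                          # set once AIO is issued for these TIDs

@dataclass
class AsyncINLState:
    """Extra state of an async INL node (original INL state not modelled)."""
    slots: list = field(default_factory=list)        # array of outer tuples
    workqueue: deque = field(default_factory=deque)  # slots whose TIDs already have AIO issued
    io_queue: deque = field(default_factory=deque)   # every pending AIO request of this node

    def add_outer(self, outer, inner_tids):
        slot = OuterSlot(outer, deque(inner_tids))
        self.slots.append(slot)
        return slot

    def mark_issued(self, slot, requests):
        """Record that AIO was issued for `slot`'s TIDs; `requests` stand in for aio structures."""
        slot.aio_issued = True
        self.workqueue.append(slot)
        self.io_queue.extend(requests)
```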

Asynchronous INL Join (contd.)
- We divide the async INL join operations into two stages:
  - Stage 1: fetch outer tuples and issue AIO requests
  - Stage 2: check for AIO completion, process AIO results and return join results
- The stages are interleaved: Stage 1 may be in progress for some tuples while Stage 2 is in progress for others

Asynchronous INL Join (contd.): Stage 1
1. Fetch outer tuples.
2. For each outer tuple, find the matching inner TIDs and put the outer tuple in the workqueue.
3. Issue a list AIO for the matching inner TIDs of all outer tuples in the workqueue (subject to BATCH_SIZE).
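Stage 1 can be sketched as one function, under stated assumptions: the index is a hypothetical dict from join key to inner TIDs, each workqueue slot is a dict tracking only the TIDs that still await an AIO request, and extending `io_queue` with the batch stands in for a single `lio_listio()` call. TIDs that do not fit under BATCH_SIZE stay in their slot for a later call, matching "AIO may have been issued for some already, others later".

```python
from collections import deque

def stage1(outer_iter, index, workqueue, io_queue, batch_size, fetch_count):
    """Sketch of Stage 1 of the async INL join.
    outer_iter: iterator over outer tuples (join key in column 0)
    index:      dict join key -> list of inner TIDs (hypothetical index probe)
    workqueue:  deque of slots {"outer": tuple, "unissued": deque of TIDs}
    io_queue:   deque of outstanding requests
    Returns the TIDs submitted in this call."""
    # Fetch outer tuples, find their matching inner TIDs, queue them.
    for _ in range(fetch_count):
        outer = next(outer_iter, None)
        if outer is None:
            break                                  # outer input exhausted
        workqueue.append({"outer": outer,
                          "unissued": deque(index.get(outer[0], []))})
    # Gather not-yet-requested TIDs, keeping at most batch_size requests outstanding.
    batch = []
    for slot in workqueue:
        while slot["unissued"] and len(io_queue) + len(batch) < batch_size:
            batch.append(slot["unissued"].popleft())
    io_queue.extend(batch)                         # one list-AIO submission for the batch
    return batch
```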

Asynchronous INL Join (contd.): Rules
- Batch size
  - BATCH_SIZE is the maximum number of outstanding AIO requests. Why a limit? OS limits and efficiency issues.
  - We set MAX_BATCH_SIZE per node to 200 in our experiments.
  - BATCH_SIZE is scaled up in powers of 2 until it reaches MAX_BATCH_SIZE, so that async INL can output tuples quickly at the onset.
  - The case where an outer tuple matches a large number of inner tuples is handled appropriately.
- Keeping the AIO queue filled
  - We issue further AIO requests (fetching outer tuples as required) once 10% of the earlier AIO requests have completed.
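The two rules above reduce to small pieces of arithmetic. In this sketch the starting value of 1 for the power-of-2 ramp is an assumption (the slide only says powers of 2), and the function names are hypothetical; the cap of 200 and the 10% refill threshold come from the slide.

```python
MAX_BATCH_SIZE = 200  # per-node cap used in the experiments

def batch_size_schedule(n_rounds, max_batch=MAX_BATCH_SIZE, start=1):
    """BATCH_SIZE for successive rounds of AIO submission: doubles from `start`
    (an assumed initial value) and then stays capped at max_batch."""
    sizes, size = [], start
    for _ in range(n_rounds):
        sizes.append(size)
        size = min(size * 2, max_batch)
    return sizes

def should_issue_more(issued, completed, percent=10):
    """Refill rule: issue further AIO once `percent`% of the earlier requests
    have completed. Integer arithmetic avoids float rounding at the threshold."""
    return issued > 0 and completed * 100 >= percent * issued
```

With these defaults the schedule runs 1, 2, 4, ..., 128 and then 200, so the first join results can be produced after a single read rather than after a full batch of 200.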

Asynchronous INL Join (contd.): Stage 2
For each outer tuple in the workqueue:
- Check whether any of its matching inner TIDs are present in memory.
  - Yes: remove that inner TID from the outer tuple's TID array and perform the join; if a join result is found, add it to the result and break from the loop.
  - No: move on to the next outer tuple.
Then update the workqueue and continue on the next slide.

Asynchronous INL Join (contd.): Stage 2, continued
- Any join results? Yes: return the result to the parent node.
- No: if there are no outstanding outer tuples and the end of the outer input has been reached, set tupStat = END_OF_RESULT and result = NULL, and return result and tupStat to the parent node.
- Otherwise, poll for AIO completion. If a tuple is found or the parent node cannot handle LATER, go back to the start of Stage 2; else set tupStat = LATER and result = NULL, and return them to the parent node.
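The Stage 2 flowchart can be sketched as one function. This is a simulation under assumptions, not the PostgreSQL code: `buffer` is a dict of inner tuples whose reads have completed, `poll()` and `outer_exhausted()` are hypothetical callbacks, `TupStat.OK` is an invented name for "a result tuple is returned", and the interleaving with Stage 1 (refilling the workqueue) is omitted, so with an empty workqueue and an unfinished outer input this sketch would keep polling.

```python
from collections import deque
from enum import Enum

class TupStat(Enum):
    OK = "ok"                       # hypothetical: a join result is returned
    LATER = "later"
    END_OF_RESULT = "end_of_result"

def stage2(workqueue, buffer, poll, outer_exhausted, parent_handles_later, qual):
    """One Stage 2 call.
    workqueue: deque of (outer_tuple, deque of matching inner TIDs)
    buffer:    dict TID -> inner tuple, for reads that have completed
    poll():    moves finished AIO reads into buffer and returns how many finished
    qual(outer, inner): the join qualifier"""
    while True:
        result = None
        for outer, tids in workqueue:
            for tid in [t for t in tids if t in buffer]:   # inner TIDs already in memory
                tids.remove(tid)                            # consume that TID
                if qual(outer, buffer[tid]):
                    result = outer + buffer[tid]
                    break
            if result is not None:
                break                                       # stop at the first join result
        # Update the workqueue: retire outer tuples with no inner TIDs left.
        remaining = [entry for entry in workqueue if entry[1]]
        workqueue.clear()
        workqueue.extend(remaining)
        if result is not None:
            return TupStat.OK, result
        if not workqueue and outer_exhausted():
            return TupStat.END_OF_RESULT, None
        completed = poll()
        if completed == 0 and parent_handles_later:
            return TupStat.LATER, None
        # Some IO finished, or the parent cannot take LATER: back to the start of Stage 2.
```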

Async. versions of other operators
- Async sequential scan: check whether the next tuple is in the in-memory buffer. If it is present, return the tuple; otherwise initiate an async read, set tupStat = LATER and return.
- Out-of-order sequential scan: start returning the tuples of the relation that are already in memory, even if out of order, and concurrently issue AIO for the other tuples.
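The out-of-order variant can be sketched as a generator, assuming a simplified page model: `buffer` maps page numbers to lists of tuples already in memory, and `issue_read(page_no)` and `poll()` are hypothetical hooks, with `poll()` returning the pages whose reads finished and having loaded them into `buffer`. When nothing has finished, the generator yields a `LATER` sentinel rather than blocking.

```python
LATER = object()   # sentinel standing for the LATER status

def out_of_order_scan(num_pages, buffer, issue_read, poll):
    """Yield tuples of resident pages first (possibly out of physical order),
    issuing AIO for the other pages, then yield those as their reads complete."""
    pending = []
    for page_no in range(num_pages):
        if page_no in buffer:
            yield from buffer[page_no]      # resident page: return its tuples now
        else:
            issue_read(page_no)             # start the read and move on
            pending.append(page_no)
    while pending:
        done = [p for p in poll() if p in pending]
        if not done:
            yield LATER                     # nothing finished; let the parent do other work
        for page_no in done:
            pending.remove(page_no)
            yield from buffer[page_no]
```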

Async. versions of other operators
[Figure: a Merge Join whose inputs involve a Sort and a Seq scan over tables T1 and T2. The seq scan initiates an AIO read and returns LATER, and the merge join notes: "I can start the sorting of the other input!"]
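The merge-join idea can be sketched under loud assumptions: the children are hypothetical zero-argument callables returning a row, a `LATER` sentinel, or an `END` sentinel. While one input answers LATER, the operator keeps pulling rows of the other input into that input's sort buffer instead of idling. Sorting and merging here are an ordinary in-memory sort and a merge join on the first column, not an external sort.

```python
LATER = object()   # sentinel: the child's read is still in flight
END = object()     # sentinel: the child has no more rows

def fill_sort_buffers(left, right):
    """Pull rows from both children into per-input sort buffers; when one child
    answers LATER, keep pulling from the other instead of waiting."""
    buffers = {"left": [], "right": []}
    active = {"left": left, "right": right}
    while active:
        for side, child in list(active.items()):
            row = child()
            if row is END:
                del active[side]
            elif row is not LATER:
                buffers[side].append(row)
        # a real operator would poll for AIO completion here instead of spinning
    return sorted(buffers["left"]), sorted(buffers["right"])

def merge_join(left_sorted, right_sorted):
    """Plain merge join on the first column of two sorted inputs."""
    out, i, j = [], 0, 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk, rk = left_sorted[i][0], right_sorted[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            k = j
            while k < len(right_sorted) and right_sorted[k][0] == lk:
                out.append(left_sorted[i] + right_sorted[k])
                k += 1
            i += 1   # j stays put so a duplicate left key re-scans the same group
    return out
```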

Performance Results
- Experiments with the TPC-H database at scale factors 1 and 10, in three different setups on a Core 2 Duo P4 with:
  - 1 GB RAM and a 1 GB TPC-H database (single disk)
  - 1 GB RAM and a 10 GB TPC-H database (single disk)
  - 3.2 GB RAM and a 10 GB TPC-H database (4 disks, RAID 10)
- We use PostgreSQL as the code base and compare it with our modified version of the same code base, which incorporates the asynchronous iterator model with async INL and async seq. scan.

Performance Results: 1 GB RAM
Query 1a:
select l_orderkey, l_quantity
from orders, lineitem
where o_orderkey = l_orderkey and l_orderkey % 100 = 2 and l_linestatus = 'F'
[Charts: TPC-H 1 GB and TPC-H 10 GB]

Performance Results: 1 GB RAM
Query 2a:
select l_orderkey, l_quantity
from orders, lineitem, customer
where o_orderkey = l_orderkey and o_custkey = c_custkey and l_orderkey % 100 = 2 and l_linestatus = 'F'
[Charts: TPC-H 1 GB and TPC-H 10 GB]

Performance Results: 1 GB RAM
[Chart: Query 2a, join of orders, lineitem and customer with filter (TPC-H 1 GB), showing a startup effect]

Performance Results: 1 GB RAM
Query 2b (no tight selection):
select l_orderkey, l_quantity
from myorders, lineitem, customer
where o_orderkey = l_orderkey and o_custkey = c_custkey
[Charts: TPC-H 1 GB and TPC-H 10 GB, 1 GB RAM]

Performance Results: 3.2 GB RAM + RAID
[Charts, TPC-H 10 GB / 3.2 GB RAM / 4 disks RAID 10: Query 1a (join of orders and lineitem with filter) and Query 2a (join of orders, lineitem and customer with filter)]

Performance Results: 3.2 GB RAM + RAID
[Charts, TPC-H 10 GB / 3.2 GB RAM / 4 disks RAID 10: Query 1b (join of myorders and lineitem) and Query 2b (join of myorders, lineitem and customer)]

Performance Results
TPC-H Q12:
select l_shipmode, sum(...)
from orders, lineitem
where o_orderkey = l_orderkey and ...
group by l_shipmode
order by l_shipmode

Setup                                        | Original INL | Async INL | Gain
TPC-H 1 GB, 1 GB RAM                         | 64.7 sec     | 48 sec    | 25 %
TPC-H 10 GB, 1 GB RAM                        | 687 sec      | 431 sec   | 37 %
TPC-H 10 GB, RAID 10 (4 disks), 3.2 GB RAM   | 164 sec      | 147 sec   | 10 %

Related Work
- Graefe's generalized spool iterator (Graefe [BTW03])
  [Figure: an INL join whose outer input passes through a Spool operator above a scan, with an index lookup on the inner side.]
  - Pre-fetches multiple outer tuples
  - Issues AIO for the matching inner TIDs
  - Can be replenished when empty or when one tuple is joined

Related Work
- AIO is used in database products: Microsoft SQL Server, IBM DB2, Oracle
  - There is no public documentation of how these systems use AIO
- Asynchronous iteration for evaluating web queries (R. Goldman and J. Widom [SIGMOD 2000])
  - They report results only on web queries

Conclusion
- Proposed the Asynchronous Iterator Model (AIM)
- Presented asynchronous versions of INL and some other operators
- Showed gains of over 50% in some cases
- AIM can be useful in web-service access and in data integration systems like IBM DataJoiner
- Future work
  - Implementing async versions of the index lookup, subplan, sort and merge operators
  - Performing async IO in the presence of ordering constraints

Thank You
Questions?

Plans
- Query 1a: seq scan on lineitem, probe on orders
  Merge Join
    -> Index Scan on orders
    -> Sort lineitem
         -> Seq Scan on lineitem
- Query 2a:
  Nested Loop
    -> Nested Loop
         -> Seq Scan on lineitem
         -> Index Scan on orders
    -> Index Scan on customer

Plans
- Query 2a:
  Merge Join
    -> Sort orders
         -> Merge Join
              -> Index Scan on orders
              -> Sort on lineitem
                   -> Seq Scan on lineitem
    -> Index Scan on customer
- Query 2b:
  Nested Loop
    -> Nested Loop
         -> Seq Scan on lineitem
         -> Index Scan on myorders
    -> Index Scan on customer