Unary Query Processing Operators Not in the Textbook!

Slides:



Advertisements
Similar presentations
1 Quick recap of the SQL & the GUI way in Management Studio The Adwentureworks database from the book A script to create tables & insert data for the Amazon.
Advertisements

CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Implementation of relational operations
CS CS4432: Database Systems II Operator Algorithms Chapter 15.
M ATH IN SQL. 222 A GGREGATION O PERATORS Operators on sets of tuples. Significant extension of relational algebra. SUM ( [DISTINCT] A): the sum of all.
CMU SCS /615Faloutsos/Pavlo1 Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications C. Faloutsos & A. Pavlo Lecture #13: Query.
Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.
Midterm Review Spring Overview Sorting Hashing Selections Joins.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Unary Query Processing Operators CS 186, Spring 2006 Background for Homework 2.
Unary Query Processing Operators Not in the Textbook!
Implementation of Relational Operations R&G - Chapters 12 and 14.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Computer Science 101 Web Access to Databases SQL – Extended Form.
SPARQL All slides are adapted from the W3C Recommendation SPARQL Query Language for RDF Web link:
©Silberschatz, Korth and Sudarshan4.1Database System Concepts Chapter 4: SQL Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries.
Relational DBs and SQL Designing Your Web Database (Ch. 8) → Creating and Working with a MySQL Database (Ch. 9, 10) 1.
Chapter 3 Single-Table Queries
SQL Unit 5 Aggregation, GROUP BY, and HAVING Kirk Scott 1.
CSCE Database Systems Chapter 15: Query Execution 1.
1 CS 430 Database Theory Winter 2005 Lecture 12: SQL DML - SELECT.
SQL (Chapter 2: Simple queries; Chapter 7 and 8: Nested and DML queries) Many of the examples in this document are based on the tables in the next slide.
Advanced Databases: Lecture 6 Query Optimization (I) 1 Introduction to query processing + Implementing Relational Algebra Advanced Databases By Dr. Akhtar.
Instructor: Jinze Liu Fall Basic Components (2) Relational Database Web-Interface Done before mid-term Must-Have Components (2) Security: access.
C-Store: How Different are Column-Stores and Row-Stores? Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May. 8, 2009.
Introduction We’ve covered the basic underlying storage, buffering, and indexing technology. – Now we can move on to query processing. Some database operations.
Copyright © Curt Hill Queries in SQL More options.
1 Querying a Single Table Structured Query Language (SQL) - Part II.
AL-MAAREFA COLLEGE FOR SCIENCE AND TECHNOLOGY INFO 232: DATABASE SYSTEMS CHAPTER 7 (Part II) INTRODUCTION TO STRUCTURED QUERY LANGUAGE (SQL) Instructor.
CS 257 Chapter – 15.9 Summary of Query Execution Database Systems: The Complete Book Krishna Vellanki 124.
Multi pass algorithms. Nested-Loop joins Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in S DO FOR each tuple r in R DO IF r and s join to.
CS4432: Database Systems II Query Processing- Part 2.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
1.1 CAS CS 460/660 Introduction to Database Systems Query Evaluation I Slides from UC Berkeley.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
Lectures 2&3: Introduction to SQL. Lecture 2: SQL Part I Lecture 2.
A Guide to SQL, Eighth Edition Chapter Four Single-Table Queries.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Lecture 3 - Query Processing (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Query Processing – Implementing Set Operations and Joins Chap. 19.
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
Thinking in Sets and SQL Query Logical Processing.
SQL and Query Execution for Aggregation. Example Instances Reserves Sailors Boats.
SQL: Structured Query Language Instructor: Mohamed Eltabakh 1 Part II.
7 1 Database Systems: Design, Implementation, & Management, 7 th Edition, Rob & Coronel 7.6 Advanced Select Queries SQL provides useful functions that.
Simple Queries DBS301 – Week 1. Objectives Basic SELECT statement Computed columns Aliases Concatenation operator Use of DISTINCT to eliminate duplicates.
SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.
Retrieving Information Pertemuan 3 Matakuliah: T0413/Current Popular IT II Tahun: 2007.
Unary Query Processing Operators
Query-by-Example (QBE)
Indexed Nested Loop Join Iterator
SQL The Query Language R & G - Chapter 5
SQL: Advanced Options, Updates and Views Lecturer: Dr Pavle Mogin
Lecture#7: Fun with SQL (Part 2)
CS 405G: Introduction to Database Systems
Introduction to Database Systems
Implementation of Relational Operations (Part 2)
Relational Operations
Implementation of Relational Operations
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation
SQL – Entire Select.
Chapter 4 Summary Query.
Access: SQL Participation Project
Chapters 15 and 16b: Query Optimization
SQL: Structured Query Language
The Relational Algebra
Lecture 13: Query Execution
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Implementation of Relational Operations
Presentation transcript:

Unary Query Processing Operators Not in the Textbook!

A “Slice” Thru Query Processing We’ll study single- table queries today –SQL details –Query Executor Architecture –Simple Query “Optimization” Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB SQL Query

Basic Single-Table Queries SELECT [DISTINCT] FROM [WHERE ] [GROUP BY [HAVING ] ] [ORDER BY ]

Basic Single-Table Queries SELECT [DISTINCT] FROM [WHERE ] [GROUP BY [HAVING ] ] [ORDER BY ] Simplest version is straightforward –Produce all tuples in the table that satisfy the predicate –Output the expressions in the SELECT list Expression can be a column reference, or an arithmetic expression over column refs

Basic Single-Table Queries SELECT S.name, S.gpa FROM Students S WHERE S.dept = ‘ CS ’ [GROUP BY [HAVING ] ] [ORDER BY ] Simplest version is straightforward –Produce all tuples in the table that satisfy the predicate –Output the expressions in the SELECT list Expression can be a column reference, or an arithmetic expression over column refs

SELECT DISTINCT SELECT DISTINCT S.name, S.gpa FROM Students S WHERE S.dept = ‘ CS ’ [GROUP BY [HAVING ] ] [ORDER BY ] DISTINCT flag specifies removal of duplicates before output

ORDER BY SELECT DISTINCT S.name, S.gpa, S.age*2 AS a2 FROM Students S WHERE S.dept = ‘ CS ’ [GROUP BY [HAVING ] ] ORDER BY S.gpa, S.name, a2; ORDER BY clause specifies that output should be sorted –Lexicographic ordering again! Obviously must refer to columns in the output –Note the AS clause for naming output columns!

ORDER BY SELECT DISTINCT S.name, S.gpa FROM Students S WHERE S.dept = ‘ CS ’ [GROUP BY [HAVING ] ] ORDER BY S.gpa DESC, S.name ASC; Ascending order by default, but can be overriden –DESC flag for descending, ASC for ascending –Can mix and match, lexicographically

Aggregates SELECT [DISTINCT] AVERAGE(S.gpa) FROM Students S WHERE S.dept = ‘ CS ’ [GROUP BY [HAVING ] ] [ORDER BY ] Before producing output, compute a summary (a.k.a. an aggregate) of some arithmetic expression Produces 1 row of output –with one column in this case Other aggregates: SUM, COUNT, MAX, MIN Note: can use DISTINCT inside the agg function –SELECT COUNT(DISTINCT S.name) FROM Students S –vs. SELECT DISTINCT COUNT (S.name) FROM Students S;

GROUP BY SELECT [DISTINCT] AVERAGE(S.gpa), S.dept FROM Students S [WHERE ] GROUP BY S.dept [HAVING ] [ORDER BY ] Partition the table into groups that have the same value on GROUP BY columns –Can group by a list of columns Produce an aggregate result per group –Cardinality of output = # of distinct group values Note: can put grouping columns in SELECT list –For aggregate queries, SELECT list can contain aggs and GROUP BY columns only! –What would it mean if we said SELECT S.name, AVERAGE(S.gpa) above??

HAVING SELECT [DISTINCT] AVERAGE(S.gpa), S.dept FROM Students S [WHERE ] GROUP BY S.dept HAVING COUNT(*) > 5 [ORDER BY ] The HAVING predicate is applied after grouping and aggregation –Hence can contain anything that could go in the SELECT list –I.e. aggs or GROUP BY columns HAVING can only be used in aggregate queries It’s an optional clause

Putting it all together SELECT S.dept, AVERAGE(S.gpa), COUNT(*) FROM Students S WHERE S.gender = “ F ” GROUP BY S.dept HAVING COUNT(*) > 5 ORDER BY S.dept;

Context We looked at SQL Now shift gears and look at Query Processing Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB SQL Query

Query Processing Overview The query optimizer translates SQL to a special internal “language” –Query Plans The query executor is an interpreter for query plans Think of query plans as “box-and-arrow” dataflow diagrams –Each box implements a relational operator –Edges represent a flow of tuples (columns as specified) –For single-table queries, these diagrams are straight-line graphs SELECT DISTINCT name, gpa FROM Students HeapScan Sort Distinct name, gpa Optimizer

iterator Iterators The relational operators are all subclasses of the class iterator : class iterator { void init(); tuple next(); void close(); iterator &inputs[]; // additional state goes here } Note: –Edges in the graph are specified by inputs (max 2, usually) –Encapsulation: any iterator can be input to any other! –When subclassing, different iterators will keep different kinds of state information

Example: Sort init() : –generate the sorted runs on disk –Allocate runs[] array and fill in with disk pointers. –Initialize numberOfRuns –Allocate nextRID array and initialize to NULLs next(): –nextRID array tells us where we’re “up to” in each run –find the next tuple to return based on nextRID array –advance the corresponding nextRID entry –return tuple (or EOF -- “End of Fun” -- if no tuples remain) close() : –deallocate the runs and nextRID arrays class Sort extends iterator { void init(); tuple next(); void close(); iterator &inputs[1]; int numberOfRuns; DiskBlock runs[]; RID nextRID[]; }

Postgres Version src/backend/executor/nodeSort.c –ExecInitSort (init) –ExecSort (next) –ExecEndSort (close) The encapsulation/inheritence stuff is hardwired into the Postgres C code –Postgres predates even C++! –See src/backend/execProcNode.c for the code that “dispatches the methods” explicitly!

GROUP BY: One Solution –The Sort iterator permutes its input so that all tuples are output in order of their grouping columns The Agg iterator maintains running info (“transition values”) on agg functions in the SELECT list, per group –E.g., for COUNT, it keeps count-so-far –For SUM, it keeps sum-so-far –For AVERAGE it keeps sum-so-far and count-so- far When the Aggregate iterator sees a tuple from a new group: –It produces an output for the old group based on the agg function E.g. for AVERAGE it returns ( sum-so-far/count-so- far ) –It resets its running info. –And updates the running info with the new tuple’s info Sort Agg

We Can Do Better! Can use a hash-based approach –Build a hashtable of groups storing pairs of the form –When we want to insert a new tuple into the hash table If we find a matching GroupVals, just update the TransVals appropriately Else insert a new pair What’s the benefit? –Q: How many pairs will we have to handle? –A: Number of distinct values of GroupVals columns Not the number of tuples!! –Also probably “narrower” than the tuples Can we play the same trick during sorting? What happens when hashtable runs out of memory –Wait for the discussion of Hash Joins, later. Note: This HashAgg idea was HW2 in previous years! HashAgg

Context We looked at SQL We looked at Query Execution –Query plans & Iterators –A specific example How do we map from SQL to query plans? Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB SQL Query

Query Optimization A deep subject, focuses on multi-table queries –We will only need a cookbook version for now. Build the dataflow bottom up: –Choose an Access Method (HeapScan or IndexScan) Non-trivial, we’ll learn about this later! –Next apply any WHERE clause filters –Next apply GROUP BY and aggregation Can choose between sorting and hashing! –Next apply any HAVING clause filters –Next Sort to help with ORDER BY and DISTINCT In absence of ORDER BY, can do DISTINCT via hashing! –Note: Where did SELECT clause go? Implicit!! Distinct HeapScan FilterHashAggFilterSort

Summary Single-table SQL, in detail Exposure to query processing architecture –Query optimizer translates SQL to a query plan –Query executor “interprets” the plan Query plans are graphs of iterators Hashing is a useful alternative to sorting –For many but not all purposes