Download presentation
Presentation is loading. Please wait.
1
February 12th – RDBMS Internals
Cse 344 February 12th – RDBMS Internals
2
Administrivia HW5 out tonight OQ5 out Wednesday
Both due February 21 (11:30 & 11:00) Exam grades on canvas by Wednesday Handed back in section on Thursday
3
Today Back to RDBMS ”Query plans” and DBMS planning
Management between SQL and execution Optimization techniques Indexing and data arrangement
4
Query Evaluation Steps
SQL query Parse & Rewrite Query Logical plan (RA) Select Logical Plan Query optimization Select Physical Plan Physical plan Analogy with sorting: we have decided the ordering of operations, and now we need to choose their implementation. For instance, there are many ways to implement sort. Query Execution Disk
5
Logical vs Physical Plans
Logical plans: Created by the parser from the input SQL text Expressed as a relational algebra tree Each SQL query has many possible logical plans Physical plans: Goal is to choose an efficient implementation for each operator in the RA tree Each logical plan has many possible physical plans
6
Review: Relational Algebra
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = and x.scity = ‘Seattle’ and x.sstate = ‘WA’ πsname σscity=‘Seattle’ and sstate=‘WA’ and pno=2 sid = sid Relational algebra expression is also called the “logical query plan” Supplier Supply
7
Physical Query Plan 1 (On the fly) πsname (On the fly)
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 1 (On the fly) πsname A physical query plan is a logical query plan annotated with physical implementation details (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = and x.scity = ‘Seattle’ and x.sstate = ‘WA’ (Nested loop) sid = sid Describe the “nested loop join”: we have not discussed physical operators in class yet, but it should be clear what a nested loop means. Supplier Supply (File scan) (File scan)
8
Physical Query Plan 2 (On the fly) πsname (On the fly)
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 2 (On the fly) πsname Same logical query plan Different physical plan (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = and x.scity = ‘Seattle’ and x.sstate = ‘WA’ (Hash join) sid = sid Briefly mention that a hash join is a different implementation. Normally, most students in class should have no problem imagining what it might be, but no need to describe it in detail, we will cover it in two lectures. Supplier Supply (File scan) (File scan)
9
Physical Query Plan 3 (On the fly) πsname (d) (Sort-merge join) (c)
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 3 Different but equivalent logical query plan; different physical plan (On the fly) πsname (d) SELECT sname FROM Supplier x, Supply y WHERE x.sid = y.sid and y.pno = and x.scity = ‘Seattle’ and x.sstate = ‘WA’ (Sort-merge join) (c) sid = sid (Scan & write to T1) (a) σscity=‘Seattle’ and sstate=‘WA’ Same here, briefly mention that sort-merge is yet another way to implement join. (b) σpno=2 (Scan & write to T2) Supplier Supply (File scan) (File scan) CSE au
10
Query Optimization Problem
For each SQL query… many logical plans For each logical plan… many physical plans Next: we will discuss physical operators; how exactly are query executed?
11
Physical Operators Each of the logical operators may have one or more implementations = physical operators Will discuss several basic physical operators, with a focus on join
12
Main Memory Algorithms
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Main Memory Algorithms Logical operator: Supplier ⨝sid=sid Supply Propose three physical operators for the join, assuming the tables are in main memory:
13
Main Memory Algorithms
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Main Memory Algorithms Logical operator: Supplier ⨝sid=sid Supply Propose three physical operators for the join, assuming the tables are in main memory: Nested Loop Join O(??) Merge join O(??) Hash join O(??) CSE au
14
Main Memory Algorithms
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Main Memory Algorithms Logical operator: Supplier ⨝sid=sid Supply Propose three physical operators for the join, assuming the tables are in main memory: Nested Loop Join O(n2) Merge join O(n log n) Hash join O(n) … O(n2) CSE au
15
BRIEF Review of Hash Tables
Separate chaining: A (naïve) hash function: 1 2 3 4 5 6 7 8 9 Duplicates OK WHY ?? h(x) = x mod 10 503 103 503 Operations: 76 666 find(103) = ?? insert(488) = ?? 48
16
BRIEF Review of Hash Tables
insert(k, v) = inserts a key k with value v Many values for one key Hence, duplicate k’s are OK find(k) = returns the list of all values v associated to the key k CSE au
17
Iterator Interface Each operator implements three methods: open() next() close() CSE au
18
Iterator Interface Example “on the fly” selection operator
interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Iterator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; while (!found) { Tuple in = child.next(); if (in == EOF) return EOF; found = p(in); return in; void close () { child.close(); }
19
Iterator Interface Example “on the fly” selection operator
interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; while (!found) { Tuple in = child.next(); if (in == EOF) return EOF; found = p(in); return in; void close () { child.close(); }
20
Iterator Interface Example “on the fly” selection operator
interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () {
21
Iterator Interface Example “on the fly” selection operator
interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; Tuple r = null; while (!found) { r = child.next(); if (r == null) break; found = p(in);
22
Iterator Interface Example “on the fly” selection operator
interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; Tuple r = null; while (!found) { r = child.next(); if (r == null) break; found = p(in); return r;
23
Iterator Interface Example “on the fly” selection operator
interface Operator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; Tuple r = null; while (!found) { r = child.next(); if (r == null) break; found = p(in); return r; void close () { child.close(); }
24
Iterator Interface interface Operator {
// initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } Operator q = parse(“SELECT ...”); q = optimize(q); q.open(); while (true) { Tuple t = q.next(); if (t == null) break; else printOnScreen(t); } q.close(); Query plan execution
25
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). Suppliers Supplies (File scan) (File scan)
26
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join open() (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). Suppliers Supplies (File scan) (File scan)
27
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). Suppliers Supplies (File scan) (File scan)
28
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 open() (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). Suppliers Supplies (File scan) (File scan)
29
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 open() (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). open() Suppliers Supplies (File scan) (File scan)
30
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 open() (Nested loop) sno = sno Say: Recall: All operator implement the same iterator interface: open, next, close (SimpleDB also has hasNext and rewind). open() open() Suppliers Supplies (File scan) (File scan)
31
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Suppliers Supplies (File scan) (File scan)
32
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Suppliers Supplies (File scan) (File scan)
33
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 next() (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Suppliers Supplies (File scan) (File scan)
34
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 next() (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. next() Suppliers Supplies (File scan) (File scan)
35
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 next() (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. next() next() Suppliers Supplies (File scan) (File scan)
36
Discuss: open/next/close for nested loop join
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/close for nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 next() (Nested loop) sno = sno Slide assumes first call to next on the right hand side does not result in a join. next() next() next() Suppliers Supplies (File scan) (File scan)
37
Discuss hash-join in class
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss hash-join in class (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Hash Join) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Suppliers Supplies (File scan) (File scan)
38
Pipelining Supplier(sid, sname, scity, sstate)
Supply(sid, pno, quantity) Pipelining Discuss hash-join in class (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Hash Join) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Tuples from here are pipelined Suppliers Supplies (File scan) (File scan)
39
Pipelining Supplier(sid, sname, scity, sstate)
Supply(sid, pno, quantity) Pipelining Discuss hash-join in class (On the fly) πsname (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 Tuples from here are “blocked” (Hash Join) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Tuples from here are pipelined Suppliers Supplies (File scan) (File scan)
40
Discuss merge-join in class
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Blocked Execution (On the fly) πsname Discuss merge-join in class (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Merge Join) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Suppliers Supplies (File scan) (File scan)
41
Discuss merge-join in class
Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Blocked Execution (On the fly) πsname Discuss merge-join in class (On the fly) σscity=‘Seattle’ and sstate=‘WA’ and pno=2 (Merge Join) sno = sno Slide assumes first call to next on the right hand side does not result in a join. Blocked Blocked Suppliers Supplies (File scan) (File scan)
42
Pipelined Execution Tuples generated by an operator are immediately sent to the parent Benefits: No operator synchronization issues No need to buffer tuples between operators Saves cost of writing intermediate data to disk Saves cost of reading intermediate data from disk This approach is used whenever possible Mention that some engines process tuples in batches rather than one at a time. Others process one at a time.
43
Query Execution Bottom Line
SQL query transformed into physical plan Access path selection for each relation Scan the relation or use an index (next lecture) Implementation choice for each operator Nested loop join, hash join, etc. Scheduling decisions for operators Pipelined execution or intermediate materialization Pipelined execution of physical plan
44
Recall: Physical Data Independence
Applications are insulated from changes in physical storage details SQL and relational algebra facilitate physical data independence Both languages input and output relations Can choose different implementations for operators
45
Query Performance My database application is too slow… why?
One of the queries is very slow… why? To understand performance, we need to understand: How is data organized on disk How to estimate query costs In this course we will focus on disk-based DBMSs
46
Student Data Storage ID fName lName 10 Tom Hanks 20 Amy … DBMSs store data in files Most common organization is row-wise storage On disk, a file is split into blocks Each block contains a set of tuples In the example, we have 4 blocks with 2 tuples each 10 Tom Hanks 20 Amy block 1 50 … 200 block 2 DBMS decides how to map relations to OS files. DBMSs can also store data directly on disk, bypassing the OS. We won’t get into those details here. 220 240 block 3 420 800
47
Data File Types The data file can be one of: Heap file Unsorted
Student Data File Types ID fName lName 10 Tom Hanks 20 Amy … The data file can be one of: Heap file Unsorted Sequential file Sorted according to some attribute(s) called key
48
Data File Types The data file can be one of: Heap file Unsorted
Student Data File Types ID fName lName 10 Tom Hanks 20 Amy … The data file can be one of: Heap file Unsorted Sequential file Sorted according to some attribute(s) called key Note: key here means something different from primary key: it just means that we order the file according to that attribute. In our example we ordered by ID. Might as well order by fName, if that seems a better idea for the applications running on our database. CSE au
49
Index An additional file, that allows fast access to records in the data file given a search key
50
Index An additional file, that allows fast access to records in the data file given a search key The index contains (key, value) pairs: The key = an attribute value (e.g., student ID or name) The value = a pointer to the record
51
Index Key = means here search key
An additional file, that allows fast access to records in the data file given a search key The index contains (key, value) pairs: The key = an attribute value (e.g., student ID or name) The value = a pointer to the record Could have many indexes for one table Key = means here search key
52
Keys in indexing Different keys: Primary key – uniquely identifies a tuple Key of the sequential file – how the data file is sorted, if at all Index key – how the index is organized
53
Example 1: Index on ID Data File Student
fName lName 10 Tom Hanks 20 Amy … Data File Student Index Student_ID on Student.ID 10 Tom Hanks 20 Amy 10 20 50 200 220 240 420 800 50 … 200 220 240 Note: this doesn’t speed up a sequential scan if the file data is organized by ID, but it allows fast access to an ID without having to scan the full table Student_ID is the name of the index on the attribute Student.ID 420 800 950 …
54
Example 2: Index on fName
Student Example 2: Index on fName ID fName lName 10 Tom Hanks 20 Amy … Index Student_fName on Student.fName Data File Student 10 Tom Hanks 20 Amy Amy Ann Bob Cho … 50 … 200 220 240 420 800 … Tom
55
Index Organization We need a way to represent indexes after loading into memory so that they can be used Several ways to do this: Hash table B+ trees – most popular They are search trees, but they are not binary instead have higher fanout Will discuss them briefly next Specialized indexes: bit maps, R-trees, inverted index
56
(preferably in memory)
Student Hash table example ID fName lName 10 Tom Hanks 20 Amy … Data File Student Index Student_ID on Student.ID 10 Tom Hanks 20 Amy 10 20 50 200 220 240 420 800 … 50 … 200 220 240 After this, explain Binary trees (they should know this from 311, but just as a refresher) 420 800 Index File (preferably in memory) Data file (on disk)
57
B+ Tree Index by Example
Find the key 40 80 40 <= 80 20 60 100 120 140 20 < 40 <= 60 Good for scanning ranges of tuples 10 15 18 20 30 40 50 60 65 80 85 90 30 < 40 <= 40 10 15 18 20 30 40 50 60 65 80 85 90
58
Clustered vs Unclustered
B+ Tree B+ Tree Index entries Index entries (Index File) (Data file) Data Records Data Records CLUSTERED UNCLUSTERED Every table can have only one clustered and many unclustered indexes Why? 12
59
Index Classification Clustered/unclustered
Clustered = records close in index are close in data Option 1: Data inside data file is sorted on disk Option 2: Store data directly inside the index (no separate files) Unclustered = records close in index may be far in data
60
Index Classification Clustered/unclustered Primary/secondary
Clustered = records close in index are close in data Option 1: Data inside data file is sorted on disk Option 2: Store data directly inside the index (no separate files) Unclustered = records close in index may be far in data Primary/secondary Meaning 1: Primary = is over attributes that include the primary key Secondary = otherwise Meaning 2: means the same as clustered/unclustered
61
Index Classification Clustered/unclustered Primary/secondary
Clustered = records close in index are close in data Option 1: Data inside data file is sorted on disk Option 2: Store data directly inside the index (no separate files) Unclustered = records close in index may be far in data Primary/secondary Meaning 1: Primary = is over attributes that include the primary key Secondary = otherwise Meaning 2: means the same as clustered/unclustered Organization B+ tree or Hash table
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.