Gergely Lukács Pázmány Péter Catholic University


Presentation on theme: "Gergely Lukács Pázmány Péter Catholic University"— Presentation transcript:

1 Database Systems Data Access Structures (Index) Query planning and execution
Gergely Lukács Pázmány Péter Catholic University Faculty of Information Technology Budapest, Hungary

2

3 External/Conceptual/Internal Views
External view: a user is anyone who needs to access some portion of the data. Conceptual/logical view: an abstract representation of the entire information content of the database. Internal/physical view: describes how the data are stored and organised physically (indices, access paths, block size, …).

4 Data access structures, Binary Search Tree B+ tree („Index“)

5 Binary tree each node can have up to two successor nodes
The successor nodes of a node are called its children The predecessor node of a node is called its parent The "beginning" node is called the root (has no parent) A node without children is called a leaf

6 Binary tree - 2 Nodes are organized in levels (indexed from 0).
Level (or depth) of a node: number of edges on the path from the root to that node. Height h of a tree: number of levels L (some books define h as the number of levels minus 1). Full tree: every non-leaf node has exactly two children and all the leaves are on the same level.

7 Binary tree - 3 Max. number of nodes at level l: 2^l. Height h of a full tree with N nodes: h = log2(N+1), i.e. O(log2 N). Tree operations (e.g., insert, delete, retrieve) depend on h: h determines the running time!
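A small sketch of the two formulas on this slide, assuming the "height = number of levels" convention used above. The function names are made up for illustration.

```python
def max_nodes_at_level(level: int) -> int:
    """Maximum number of nodes at a level (the root is level 0): 2**level."""
    return 2 ** level


def full_tree_height(n_nodes: int) -> int:
    """Number of levels h of a full binary tree with n_nodes nodes: h = log2(N + 1).

    For a full tree N + 1 is a power of two, so bit_length() gives the exact
    logarithm without floating-point rounding.
    """
    return (n_nodes + 1).bit_length() - 1


for h in (1, 2, 3, 10, 20):
    n = sum(max_nodes_at_level(level) for level in range(h))  # N = 2**h - 1
    print(f"levels={h:2d}  nodes={n:8d}  computed height={full_tree_height(n)}")
```

Roughly a million nodes fit into 20 levels, which is why the height, and not the number of nodes, drives the cost of a lookup.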

8 Binary Search Tree (BST) - 4
For every node, all values in its left subtree are smaller than the value stored at the node, and all values in its right subtree are greater. How to search a BST? We begin by examining the root node. If the tree is null, the key we are searching for does not exist in the tree. Otherwise, if the key equals that of the root, the search is successful and we return the node. If the key is less than that of the root, we search the left subtree. Similarly, if the key is greater than that of the root, we search the right subtree. This process is repeated until the key is found or the remaining subtree is null; in the latter case the item is not present in the tree. Tree operations (e.g., insert, delete, retrieve) depend on the height of the tree: h determines the running time! (Figure: example BST with the keys 13, 9, 44, 4, 11, 30, 2, 15, 34, 6.)
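The search procedure described on this slide can be sketched as follows. The `Node` class and the `insert` helper are illustrative assumptions; the keys are the ones from the slide's figure, and the insertion order used to build the tree is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the BST rooted at root and return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # an equal key is already present: nothing to do


def search(root: Optional[Node], key: int) -> Optional[Node]:
    """Walk down from the root; return the node holding key, or None."""
    node = root
    while node is not None:
        if key == node.key:
            return node
        # A smaller key can only be in the left subtree, a larger one in the right.
        node = node.left if key < node.key else node.right
    return None  # reached a null subtree: the key is not in the tree


root = None
for k in (13, 9, 44, 4, 11, 30, 2, 15, 34, 6):
    root = insert(root, k)

print(search(root, 15) is not None)  # key present
print(search(root, 7) is not None)   # key absent
```

Each loop iteration descends one level, so the number of comparisons is bounded by the height of the tree.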

9 Disk storage: data is organised in blocks!
Data is transferred between disk and main memory in blocks (e.g., 4/8/16/32 kB), no matter whether 1 byte or the whole block is needed. Significant cost factor in a DBMS: the number of blocks accessed from disk. Sequential access is much faster than random access. 3600 RPM: 16.7 ms per rotation ~ X instructions!!

10 Table storage, blocks

11 Binary search tree – vs. disk storage
Assume: X0(0) million values. Problem: deep tree. Idea: more branches per node, which reduces the height of the tree! Problem: values are stored at different levels. Idea: values only in the leaf nodes.

12 B+ tree as database index
B+ tree: nodes ~ blocks, high fanout (ca. 100-?00). E.g., 4 KB block, one value (separator) of 40 bytes: ~100 values in a block! Flat tree (log_100 N instead of log_2 N levels: ca. 3-5 levels).
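The arithmetic on this slide can be checked with a short sketch. It uses a simplified model (every node is completely full, and the height is the smallest h with fanout^h >= N), which is an assumption; real B+ trees are usually only partly filled. The function name is hypothetical.

```python
def levels_needed(n_values: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= n_values."""
    levels, capacity = 1, fanout
    while capacity < n_values:
        capacity *= fanout
        levels += 1
    return levels


block_size = 4 * 1024   # 4 KB block, as on the slide
entry_size = 40         # one value (separator) of 40 bytes, as on the slide
fanout = block_size // entry_size
print("entries per block:", fanout)

for n in (10**6, 10**8):
    print(n, "values:",
          levels_needed(n, 2), "levels with fanout 2,",
          levels_needed(n, 100), "levels with fanout 100")
```

With one block read per level, the flat tree needs only a handful of disk accesses where a binary tree would need twenty or more.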

13 B+ tree Data (pointers) only in leaf nodes; leaf nodes are chained
Pointers to the data (table)

14 B+ tree – select (search)
Searching for Key Value 6

15 B+ tree: inserting new element
Search for the bucket (leaf). If the bucket is not full, add the record. Otherwise, split the bucket: allocate a new leaf and move half the bucket's elements to the new bucket, then insert the new leaf's smallest key and address into the parent. If the parent is full, split it too and add the middle key to its parent node. Repeat until a parent is found that need not split. If the root splits, create a new root which has one key and two pointers. In short: splits propagate up as far as necessary.
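The insertion steps above can be sketched as a minimal in-memory B+ tree. This is an illustration under assumptions, not the slide author's implementation: `ORDER` (maximum keys per node) is tiny so that splits happen quickly, keys are assumed unique, leaves store only keys (no record pointers), and all names are hypothetical.

```python
import bisect

ORDER = 4  # assumed maximum number of keys per node


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # child nodes (inner nodes only)
        self.next = None    # next leaf in the chain (leaves only)


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split      # the root split: new root with one key, two pointers
    new_root = Node(leaf=False)
    new_root.keys, new_root.children = [sep], [root, right]
    return new_root


def _insert(node, key):
    """Insert into a subtree; return (separator, new right node) on a split."""
    if node.leaf:
        bisect.insort(node.keys, key)
        if len(node.keys) <= ORDER:
            return None
        mid = len(node.keys) // 2            # move the upper half to a new leaf
        right = Node(leaf=True)
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        right.next, node.next = node.next, right
        return right.keys[0], right          # smallest key of the new leaf goes up
    i = bisect.bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right)
    if len(node.keys) <= ORDER:
        return None
    mid = len(node.keys) // 2                # inner node full: middle key moves up
    up = node.keys[mid]
    new_right = Node(leaf=False)
    new_right.keys, new_right.children = node.keys[mid + 1:], node.children[mid + 1:]
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, new_right


def search(root, key):
    node = root
    while not node.leaf:
        node = node.children[bisect.bisect_right(node.keys, key)]
    return key in node.keys


def leaf_keys(root):
    """Follow the leaf chain from the leftmost leaf."""
    node = root
    while not node.leaf:
        node = node.children[0]
    out = []
    while node is not None:
        out.extend(node.keys)
        node = node.next
    return out


root = Node(leaf=True)
for k in [(i * 37) % 101 for i in range(101)]:  # a permutation of 0..100
    root = insert(root, k)
print(leaf_keys(root)[:10], search(root, 51), search(root, 500))
```

Walking the leaf chain returns the keys in sorted order, which is what makes range queries cheap once the first leaf has been found.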

16 Example: inserting new data with key 74
(Figure: B+ tree with separator keys 25, 45, 65, 75 and leaf entries 10, 20, 30, 40, 50, 60, 70, 74, 80, 90.)

17 Example: inserting new data with key 51, overflow
(Figure: the same tree with separator keys 25, 45, 65, 75; inserting 51 causes an overflow.)

18 Example: inserting new data with key 51, overflow
(Figure: new separator: 55; separator keys 25, 45, 65, 75; leaf entries 10, 20, 30, 40, 50, 51, 60, 70, 74, 80, 90.)

19 Example: inserting new data with key 51
(Figure: tree with separator keys 55, 25, 45, 65, 75 and leaf entries 10, 20, 30, 40, 50, 51, 60, 70, 74, 80, 90.)

20 Example: deleting data with key 60, underflow
(Figure: separator keys 55, 25, 45, 65, 75; the entry 60 is removed (X), causing an underflow.)

21 Example: deleting data with key 60
(Figure: the underflow is handled (X); separator keys 25, 45, 65, 75; leaf entries 10, 20, 30, 40, 50, 51, 70, 74, 80, 90.)

22 Example: deleting data with key 60
(Figure: resulting tree with separator keys 25, 45, 55, 75 and leaf entries 10, 20, 30, 40, 50, 51, 70, 74, 80, 90.)

23 B+ tree as database index -- pointers in index to records in table

24 Creating and using indexes
CREATE INDEX Idx_Emp_LName ON Employees ("Last Name")
Exact match query: ... WHERE "Last Name" = 'Doe'
Range query: ... WHERE "Last Name" > 'DA' AND "Last Name" < 'DC'
Selectivity of the query (condition): use an index only for highly selective queries.
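To see the index from this slide in action, here is a sketch using SQLite through Python's standard `sqlite3` module. SQLite is only a stand-in for whatever DBMS the course uses; the `Salary` column and the `plan` helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Employees ("Last Name" TEXT, "First Name" TEXT, Salary INTEGER)')
conn.execute('CREATE INDEX Idx_Emp_LName ON Employees ("Last Name")')


def plan(sql: str) -> str:
    """Return the human-readable part of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in rows)  # column 3 holds the description


exact_plan = plan("""SELECT * FROM Employees WHERE "Last Name" = 'Doe'""")
range_plan = plan("""SELECT * FROM Employees
                     WHERE "Last Name" > 'DA' AND "Last Name" < 'DC'""")

print(exact_plan)
print(range_plan)
```

Both plans should mention `Idx_Emp_LName`; the exact wording of the plan text differs between SQLite versions.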

25 Selectivity in a query
Selectivity of a query (condition): an index is useful only for highly selective queries. Low selectivity: nearly all data blocks have to be read, some even several times!

26 Multiple-Column Indexes
CREATE INDEX Idx_Emp_Name ON Employees ("Last Name", "First Name")
Useful for:
... WHERE "Last Name" = 'Doe'
... WHERE "Last Name" = 'Doe' AND "First Name" = 'John'
... WHERE "First Name" = 'John' AND "Last Name" = 'Doe'
Cannot be used for:
... WHERE "First Name" = 'John'
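The leading-column rule for multiple-column indexes can be observed with SQLite via Python's `sqlite3` module. This is a sketch under assumptions: SQLite stands in for the course's DBMS, the `Salary` column is invented so that the index does not cover the whole row, and no statistics have been collected, so the optimizer uses its defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Employees ("Last Name" TEXT, "First Name" TEXT, Salary INTEGER)')
conn.execute('CREATE INDEX Idx_Emp_Name ON Employees ("Last Name", "First Name")')


def uses_index(where: str) -> bool:
    """True if SQLite's plan for the given WHERE clause mentions Idx_Emp_Name."""
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM Employees WHERE " + where)
    return any("Idx_Emp_Name" in row[3] for row in rows)


checks = {
    "last only": uses_index(""""Last Name" = 'Doe'"""),
    "last and first": uses_index(""""Last Name" = 'Doe' AND "First Name" = 'John'"""),
    "first and last": uses_index(""""First Name" = 'John' AND "Last Name" = 'Doe'"""),
    "first only": uses_index(""""First Name" = 'John'"""),
}
for name, used in checks.items():
    print(f"{name:15s} index used: {used}")
```

The order of the conditions in the WHERE clause does not matter; what matters is whether the first index column is constrained.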

27 Index
Benefits: query conditions, join conditions, sorted results. Costs: space requirement ~ similar to the table; maintenance (updates, inserts) ~ (>) the table. Bulk updates and inserts: deactivate or drop the index first, then reactivate or recreate it.

28 Cost based query optimization

29 Query execution SQL: very high level, declarative
What to retrieve, not how! SQL query is translated by the query processor into a low level program – the execution plan

30 Query optimisation One SQL query – many (!) different execution plans, execution alternatives Index – using, not using tendence: selective query – using Which index? Join order ? (Join execution?) Dramatically different costs! Query optimizer: choosing a relatively good execution plan

31 Query execution
SQL → Query parser → internal query description → Transformation → internal query description → Query optimizer → query execution plan → Query execution

32 Query execution (detailed)
Query parser: parsing, syntactic check (tables, attributes: data dictionary), view expansion. Subsequent stages: Transformation, Query optimizer, Query execution.

33 Query execution (detailed)
Transformation: standardized description (operator tree). Other stages: Query parser, Query optimizer, Query execution.

34 Query execution (detailed)
Query optimizer: cost-based optimisation, i.e. setting up execution plans, estimating their costs, selecting a cheap execution plan. Other stages: Query parser, Transformation, Query execution.

35 Reminder: Key Idea: Algebraic Optimization
N = ((n*5)+(n*2)+0)/1
Algebraic laws:
1. (+) identity: x+0 = x
2. (/) identity: x/1 = x
3. (*) distributes: (y*x)+(z*x) = (y+z)*x
4. (*) commutes: y*x = x*y
Rules 1, 3, 4, 2: N = (5+2)*n
The same idea is applied to queries, using relational algebra.
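A quick check that the rewritten expression is equivalent to the original one. The two function names are made up for this sketch.

```python
def original(n):
    # Four arithmetic operations: two multiplications, an addition of 0, a division by 1.
    return ((n * 5) + (n * 2) + 0) / 1


def simplified(n):
    # After applying the algebraic laws; 5 + 2 is a constant, so only one
    # multiplication depends on n.
    return (5 + 2) * n


equivalent = all(original(n) == simplified(n) for n in range(-1000, 1001))
print(equivalent)
```

A query optimizer does the same kind of rewriting, only with relational algebra laws instead of arithmetic ones: the result stays the same while the work needed to compute it shrinks.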

36 Query execution (detailed)
Query execution: execution of the selected execution plan. Preceding stages: Query parser, Transformation, Query optimizer.

37 Transformation, operator tree
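select A1, A2, ..., An from R1, R2, ..., Rm where B
corresponds to π_{A1, A2, ..., An}(σ_B(R1 × R2 × ... × Rm))
(Figure: operator tree with π_{A1, A2, ..., An} at the root, σ_B below it, and the cross product of R1, ..., Rm at the bottom.)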

38 Basic idea Keep intermediate results as small as possible
Execute σ and π early and ⋈, ×, ∪, … late, as σ and π reduce the volume of data, whereas ⋈, ×, … often result in large intermediate results. Heuristics + cost estimation.
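The effect of executing a selection early can be illustrated on toy data. Everything here is a hypothetical example: the relations, their sizes, the attribute names and the nested-loop `join` helper are assumptions made for this sketch.

```python
# Toy relations as lists of dicts (hypothetical data).
employees = [{"ssn": i, "dept": i % 10} for i in range(100)]
works_on = [{"essn": i % 100, "pno": i % 5} for i in range(300)]


def select(rel, pred):
    return [t for t in rel if pred(t)]


def join(r, s, pred):
    """Naive nested-loop join; merges each matching pair of tuples."""
    return [{**a, **b} for a in r for b in s if pred(a, b)]


def on_ssn(e, w):
    return e["ssn"] == w["essn"]


def in_dept_3(t):
    return t["dept"] == 3


# Plan A: join first, select late. The intermediate result is the full join.
joined = join(employees, works_on, on_ssn)
late = select(joined, in_dept_3)

# Plan B: select early. The join only sees the filtered employees.
filtered = select(employees, in_dept_3)
early = join(filtered, works_on, on_ssn)

print("plan A intermediate rows:", len(joined))
print("plan B intermediate rows:", len(filtered))
print("same result:", late == early, "rows:", len(early))
```

Both plans return the same rows, but plan B carries a far smaller intermediate result into the expensive join.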

39 Equivalences
1. Selection: σ_{c1 ∧ c2 ∧ ... ∧ cn}(R) ≡ σ_{c1}(σ_{c2}(…(σ_{cn}(R))…))
2. σ is commutative: σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))
3. π-cascades: if L1 ⊆ L2 ⊆ … ⊆ Ln, then π_{L1}(π_{L2}(…(π_{Ln}(R))…)) ≡ π_{L1}(R)
4. Exchanging σ and π: if the selection only refers to the projected attributes A1, …, An, selection and projection can be exchanged: σ_c(π_{A1, …, An}(R)) ≡ π_{A1, …, An}(σ_c(R))

40 Equivalences 2
5. ×, ⋈, ∪ and ∩ are commutative
6. A Cartesian product followed by a selection referring to both operands can be replaced by a join: σ_c(R × S) ≡ R ⋈_c S
....
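Rule 6 can be checked on two small relations. The data and the helper functions are hypothetical; tuples stand for rows, and the condition c compares the first attribute of R with the first attribute of S.

```python
from itertools import product

# Hypothetical relations: R(id, a) and S(rid, b).
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(1, "x"), (3, "y"), (3, "z"), (4, "w")]


def cross(r, s):
    """Cartesian product: every tuple of r concatenated with every tuple of s."""
    return [a + b for a, b in product(r, s)]


def select(rel, cond):
    return [t for t in rel if cond(t)]


def join(r, s, cond):
    """Join: combine only the pairs that satisfy the condition."""
    return [a + b for a in r for b in s if cond(a + b)]


def c(t):
    return t[0] == t[2]  # R.id = S.rid


via_cross = select(cross(R, S), c)
via_join = join(R, S, c)

print(len(cross(R, S)), "tuples in the cross product")
print(via_cross == via_join, via_join)
```

The naive join here still tests every pair; the benefit in a real DBMS comes from join algorithms that never materialise the full cross product.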

41 Example select Lname from Employee, WorksOn, Project where Pname = 'GOM' and Pnumber = PNO and ESSN = SSN and Bdate > (Select the last name of an employee born after and working on the project "GOM")

42 Selection as early as possible
(Figure: initial operator tree: π_{Lname}(σ_{Pname='GOM' ∧ Pnumber=PNO ∧ ESSN=SSN ∧ Bdate> }(EMPLOYEE × WORKS_ON × PROJECT)).)

43 Restrictive joins early
(Figure: operator tree over EMPLOYEE, WORKS_ON and PROJECT with π_{Lname} and the conditions Pnumber = PNO, ESSN = SSN, Pname = 'GOM' and Bdate >.)

44 Cross product and selection => join
(Figure: operator tree over PROJECT, WORKS_ON and EMPLOYEE with π_{Lname} and the conditions Pnumber = PNO, ESSN = SSN, Pname = 'GOM' and Bdate >.)

45 Projections as early as possible (attributes for the result and for intermediate results are kept)
(Figure: operator tree over PROJECT, WORKS_ON and EMPLOYEE with π_{Lname}, the joins ⋈_{Pnumber=PNO} and ⋈_{ESSN=SSN}, and the selections Pname = 'GOM' and Bdate >.)

46
(Figure: two operator trees over PROJECT, WORKS_ON and EMPLOYEE with π_{Lname}, ⋈_{Pnumber=PNO}, ⋈_{ESSN=SSN}, Pname = 'GOM' and Bdate > 58.04.16; the second tree adds the projections π_{Pnumber}, π_{ESSN,PNO}, π_{SSN,Lname} and π_{ESSN}.)

47 Cost based selection
Query optimizer: cost-based optimisation, i.e. setting up execution plans, estimating their costs, selecting a cheap execution plan. Other stages: Query parser, Transformation, Query execution.

48 Statistics in databases
Optimizer needs statistics on the data to make decisions! E.g., Size of a table? Selectivity of a (join) condition?

49 Equivalences - …
σ_{c1 ∧ c2 ∧ ... ∧ cn}(R) ≡ σ_{c1}(σ_{c2}(…(σ_{cn}(R))…))
σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))
π_{L1}(π_{L2}(…(π_{Ln}(R))…)) ≡ π_{L1}(R)
σ_c(π_{A1, …, An}(R)) ≡ π_{A1, …, An}(σ_c(R))
R ⋈ S ≡ S ⋈ R (likewise for ×, ∪, ∩)
σ_c(R × S) ≡ R ⋈_c S
σ_c(R × S) ≡ σ_c(R) × S (likewise for ⋈), if c refers only to attributes of R
σ_c(R × S) ≡ σ_{c1}(R) × σ_{c2}(S) (likewise for ⋈), if c = c1 ∧ c2, c1 refers only to R and c2 only to S
π_L(R ⋈_c S) ≡ (π_{A1, …, An}(R)) ⋈_c (π_{B1, …, Bn}(S))
π_L(R ⋈_c S) ≡ π_L((π_{A1, …, An, A1', …, An'}(R)) ⋈_c (π_{B1, …, Bn, B1', …, Bn'}(S)))
(R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T) (likewise for ×, ∪, ∩)
σ_c(R ∪ S) ≡ (σ_c(R)) ∪ (σ_c(S)) (likewise for ∩, −)
π_L(R ∪ S) ≡ (π_L(R)) ∪ (π_L(S))

50 Join order
Many joins; joins are expensive. (Figure: alternative join trees over R1, R2, R3, R4.) Left-oriented join trees, greedy search...
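One possible reading of "left-oriented join trees, greedy search" is sketched below. This is an assumption, not necessarily the strategy the slide has in mind: start with the smallest relation, then repeatedly join the relation that gives the smallest estimated intermediate result. The cardinalities and join selectivities are invented for illustration.

```python
from math import factorial

# Hypothetical statistics: table sizes and pairwise join selectivities.
card = {"R1": 1000, "R2": 10, "R3": 500, "R4": 20000}
sel = {
    frozenset(("R1", "R2")): 0.01,
    frozenset(("R2", "R3")): 0.002,
    frozenset(("R3", "R4")): 0.0001,
}


def estimate(cur_size, joined, r):
    """Estimated size after joining r; pairs without a join condition count as 1.0."""
    size = cur_size * card[r]
    for j in joined:
        size *= sel.get(frozenset((j, r)), 1.0)
    return size


def greedy_order():
    first = min(card, key=card.get)            # start with the smallest relation
    order, size = [first], card[first]
    remaining = set(card) - {first}
    while remaining:
        # Pick the relation with the smallest estimated intermediate result.
        nxt = min(sorted(remaining), key=lambda r: estimate(size, order, r))
        size = estimate(size, order, nxt)
        order.append(nxt)
        remaining.remove(nxt)
    return order


print("left-deep orders to consider exhaustively:", factorial(len(card)))
print("greedy left-deep order:", greedy_order())
```

Greedy search looks at only a few candidates per step instead of all n! left-deep orders, at the price of possibly missing the cheapest plan.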

51 Example select FlugNr from (select F., FT., count (TicketNr) from FLUG F, FLUGZEUGTYP FT, BUCHUNG B where B.FlugNr = F.FlugNr group by F., FT., Datum) as DFT(F.,FT.,count) where F.FtypId = FT.FtypId and FT.First+FT.Business+FT.Economy < DFT.count

52 Example
(Figure: operator tree for the query over FLUG F, FLUGZEUGTYP FT and BUCHUNG B.)
π_{F.FlugNr}(σ_{F.FtypId = FT.FtypId ∧ FT.First+FT.Business+FT.Economy < count}(ρ_{F., FT., count}(π_{F., FT., count(TicketNr)}(γ_{F., FT., Datum}(σ_{B.FlugNr = F.FlugNr}(FLUG F × FLUGZEUGTYP FT × BUCHUNG B))))))

53 Statistics in databases
Oracle:
TABLES: num_rows, num_blocks, avg_row_len
TAB_COL_STATISTICS: num_distinct, num_nulls, num_buckets
INDEXES: leaf_blocks, blevel
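How such statistics can feed a cost estimate is sketched below with a common textbook assumption: values are uniformly distributed, so an equality condition selects about 1/num_distinct of the non-null rows. This is not a description of Oracle's actual estimator; the function names and the numbers are hypothetical.

```python
def equality_selectivity(num_distinct: int) -> float:
    """Fraction of non-null rows expected to match `column = constant` (uniform assumption)."""
    return 1.0 / num_distinct


def estimated_rows(num_rows: int, num_distinct: int, num_nulls: int = 0) -> float:
    """Expected number of result rows for an equality condition."""
    return (num_rows - num_nulls) * equality_selectivity(num_distinct)


# Hypothetical statistics for two columns of a table with one million rows.
print(estimated_rows(1_000_000, num_distinct=50_000))  # many distinct values: few rows match
print(estimated_rows(1_000_000, num_distinct=2))       # two distinct values: half the table
```

The first condition is highly selective and a good candidate for an index; for the second, a full table scan is likely cheaper.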

54 Calculation of statistics
Task of the database administrator (expensive task!): analyze table relation compute statistics for columns attribute, ..., attribute size value. Alternatively: estimate statistics sample value percent. (relation, attribute and value are placeholders.)

55 Oracle SQL Developer, Explain Plan

