Chapter 3 - 1 Secondary Storage Rough Speed Differentials –nanoseconds: retrieve data in main memory –microseconds: retrieve from disk cache or under a.

Slides:



Advertisements
Similar presentations
Disk Storage, Basic File Structures, and Hashing
Advertisements

Databasteknik Databaser och bioinformatik Data structures and Indexing (II) Fang Wei-Kleiner.
Transaction Program unit that accesses the database
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
Hashing and Indexing John Ortiz.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 18 Indexing Structures for Files.
1 Lecture 8: Data structures for databases II Jose M. Peña
CS CS4432: Database Systems II Basic indexing.
Chapter Abstraction Concrete: directly executable/storable Abstract: not directly executable/storable –automatic translation (as good as executable/storable)
Chapter 8 File organization and Indices.
FileOrg: 1 Secondary Storage Rough Speed Differentials –nanoseconds: retrieve data in main memory –microseconds: retrieve from under a read head –milliseconds:
1 Overview of Storage and Indexing Yanlei Diao UMass Amherst Feb 13, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
Efficient Storage and Retrieval of Data
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
CostEst: 1 Cost Estimation Based on: –operations –file and record sizes –file structures –block layout on disk –caching scheme (in our simplified world,
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
Primary Indexes Dense Indexes
CostAnalysis: 1 Cost Analysis Rule-of-Thumb Guidelines As a guide, consider denormalizing if: –redundancy is minimal and update anomalies are not expected.
QueryRewriting: 1 Query Rewriting (for Query “Optimization”) Main Strategy: Make intermediate results small by applying selection and projection early.
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
1 CS 728 Advanced Database Systems Chapter 17 Database File Indexing Techniques, B- Trees, and B + -Trees.
DISK STORAGE INDEX STRUCTURES FOR FILES Lecture 12.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 17 Disk Storage, Basic File Structures, and Hashing.
1 Physical Data Organization and Indexing Lecture 14.
Database Management 9. course. Execution of queries.
1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
1 Chapter 17 Disk Storage, Basic File Structures, and Hashing Chapter 18 Index Structures for Files.
1 Index Structures. 2 Chapter : Objectives Types of Single-level Ordered Indexes Primary Indexes Clustering Indexes Secondary Indexes Multilevel Indexes.
1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.
Indexing.
COSC 2007 Data Structures II Chapter 15 External Methods.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Indexing CS 400/600 – Data Structures. Indexing2 Memory and Disk  Typical memory access: 30 – 60 ns  Typical disk access: 3-9 ms  Difference: 100,000.
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Spring 2003 ECE569 Lecture 05.1 ECE 569 Database System Engineering Spring 2003 Yanyong Zhang
Chapter 15 A External Methods. © 2004 Pearson Addison-Wesley. All rights reserved 15 A-2 A Look At External Storage External storage –Exists beyond the.
6340 DBMS Components. DBMS OS, application, middleware Components: storage, query optimizer, recovery manager, transaction processor, security.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
Spring 2004 ECE569 Lecture 05.1 ECE 569 Database System Engineering Spring 2004 Yanyong Zhang
CS 540 Database Management Systems
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
Chapter 5 Record Storage and Primary File Organizations
1 CSCE 520 Test 2 Info Indexing Modified from slides of Hector Garcia-Molina and Jeff Ullman.
CS4432: Database Systems II
CS411 Database Systems Kazuhiro Minami 16: Final Review Session.
1 Ullman et al. : Database System Principles Notes 4: Indexing.
CS422 Principles of Database Systems Indexes
CS 728 Advanced Database Systems Chapter 18
Database Management System
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
File organization and Indexing
Chapter 11: Indexing and Hashing
File Storage and Indexing
Indexing 4/11/2019.
Chapter 11: Indexing and Hashing
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

Chapter Secondary Storage Rough Speed Differentials –nanoseconds: retrieve data in main memory –microseconds: retrieve from disk cache or under a read head –milliseconds: retrieve from elsewhere on disk Approximate Disk Speeds –seek (head move): 8 milliseconds –rotational latency (spin under): 4 milliseconds –block transfer: 68 microseconds (negligible) –total: – about 12 milliseconds Implications –cluster data on cylinders –make good use of caches

Chapter Sequential Files Operations –Add: write over a deleted record or after last record –Delete: mark deleted –Access: read until record found (half the file, on average) Sorted (doesn’t help much without an index) –Access: binary search (if contiguous) –Add: can be expensive to maintain sort order –Delete: mark deleted

Chapter Indexes Primary (Key) Index Secondary (Key) Index Secondary (Nonkey) Index Guest(GuestNr Name StreetNr City) Smith 12 Maple Boston 102 Carter 10 Main Hartford Jones 20 Main Boston Hansen 12 Oak Boston Black 15 Elm Hartford 764 Barnes 45 Oak Boston Dense & Sparse Indexes Block# Block#/Offset , , , Boston 1, 2,..., 25 Hartford 1,..., 25

Chapter Indexed Sequential File Guest(GuestNr Name StreetNr City) Smith 12 Maple Boston 102 Carter 10 Main Hartford Jones 20 Main Boston Hansen 12 Oak Boston Black 15 Elm Hartford 764 Barnes 45 Oak Boston Index Sorted on primary key 2. Sparse index 3. Overflow buckets 146 Green 10 Main Albany 120 Adams 15 Oak Boston Operations: Access (ok) Delete (ok) Insert (ok, if space in block)

Chapter Variable-Length Records GuestNr RoomNr ArrivalDate NrDays May May May May 5... Three Implementations: 1. Reserve enough space for maximum. 2. Chain each nested record. 3. Reserve space for the expected number and chain the rest.

Chapter Hashing Static Hashing –similar to in-memory hashing (block/offset addresses) –records in contiguous blocks (may be hard to find sufficient) Open Hashing –hash table of pointers to buckets –buckets: chained blocks of dense-index value-pointer pairs –operations: retrieve, add, delete 101 Smith 12 Maple Boston h(101)

Chapter Indexing Verses Hashing Store and retrieve on key – hashing wins (usually) Search on non-key – indexing wins (usually) Range search – indexing wins Search on multiple attributes – indexing wins (usually) However: for highly dynamic updates, indexed-sequential files degenerate quickly – need B + -tree indexes.

Chapter B + -Tree Index N-way, balanced tree whose non-root nodes are always at least half full. Leaf nodes chained and contain value-pointer pairs whose pointers point at blocks of records. Basic Idea

Chapter B + -Tree Index – Example

Chapter B + -Tree: Insertion & Deletion Insertion –place in leaf (if space) –split and promote to parent (do recursively, if necessary) Deletion –remove from leaf (if not too few) –take from neighbor (if not too few) –coalesce and take from parent (do recursively, if necessary)

Chapter Query “Optimization” Actually: Query Improvement Query Rewriting –Main Strategy: Make intermediate results small by applying selection and projection early. –Additional Strategies: Remove unnecessary operations, execute common subexpressions only once, … Cost Estimation & Lowest-Cost Selection –Estimate the execution time for two or more execution methods. –Select the most efficient.

Chapter Rewriting Rules 1 & 2 1.  X e = e Note: the scheme of e is X.  RoomNr, Name, NrBeds, Cost r = r 2.  X  f e =  X  f  XY e Note: f mentions the attrs. of Y.  RoomNr, Name  Cost > 75 r =  RoomNr, Name  Cost > 75  RoomNr, Name, Cost r

Chapter Rewriting Rules 3 & 4 3.  f (e 1  e 2 ) =  f1 (  f2 e 1   f3 e 2 ) Note: f = f1  f2  f3; each pertains to its respective expression.  ArrivalDate = 15 May  City = Boston (g  s) =  City = Boston g   ArrivalDate = 15 May s 4.  X (e 1  e 2 ) =  X (  X 1 Y e 1   X2Y e 2 ) Note: X1 = X  scheme(e 1 ), X2 = X  scheme(e 2 ), and Y = (scheme(e 1 )  scheme(e 2 )) - X.  RoomNr (r  g) =  RoomNr (  RoomNr, Name r   Name g)

Chapter Rewriting Rules 5 & 6 5.  X  Y e =  X e  Name  RoomNr, Name r =  Name r 6.  f1  f2 e =  f1  f2 e  NrBeds = 2  Cost = 80 r =  NrBeds = 2  Cost = 80 r

Chapter Rewriting Rules 7, 8 & 9 Natural join (  ) is (7) commutative, (8) associative, and (9) idempotent. (r  (g  s))  r = 8 r  g  s  r = 7 r  r  g  s = 9 r  g  s

Chapter Query Rewriting – Example  Name, StreetNr, City  ArrivalDate = 10 May (  NrBeds = 2 (r  g  s)) = 3,6,8  Name, StreetNr, City (  NrBeds = 2 r  g   ArrivalDate = 10 May s) = 4,8  Name, StreetNr, City (  RoomNr, Name  NrBeds = 2 r   GuestNr, Name, StreetNr, City g   GuestNr, RoomNr  ArrivalDate = 10 May s) = 1  Name, StreetNr, City (  RoomNr, Name  NrBeds = 2 r  g   GuestNr, RoomNr  ArrivalDate = 10 May s)

Chapter Cost Estimation Based on: –operations –file and record sizes –file structures –block layout on disk Estimate: –number of seeks –number of rotational delays –number of blocks transferred (ignore unless > 100 or so) –(ignore memory-access time and CPU time)

Chapter Cost Estimation – Selection Example  ArrivalDate = 15 May sAssume: 10,000 records, 50/block 1. sequential file, not contiguous, not sorted 10,000/50 = 200 block 12 ms per access = 2400 ms = 2.4 sec 2. sequential file, contiguous, not sorted With 132 blocks per track, 200 blocks fit on a cylinder; 1 access block transfers = 12 ms  68  s = ms = 25.6 ms 3. sequential file, contiguous, sorted (on ArrivalDate) Binary search (  log  = 8); so 1 seek + 8 rotational delays = 8 ms + 8  4 ms = 40 ms (worse than unsorted)

Chapter Cost Estimation – Selection Example  ArrivalDate = 15 May sAssume: 10,000 records, 50/block 4. sequential file, not contiguous, sorted (on ArrivalDate), indexed, with index in memory If considerably fewer than 50, all records are likely to be in one block; 1 access = 12 ms. 5. …, index not in memory Probably need 200 value-pointer pairs for sparse index. If pointer size is 4 bytes and date size is 8, 512/12 = 42.6, so say 40 pairs per block and thus 5 blocks (contiguous). Thus, 1 access for index + 1 access for record = 24 ms.

Chapter Cost Estimation – Selection Example  ArrivalDate = 15 May sAssume: 10,000 records, 50/block 6. sequential file, not contiguous, not sorted (on ArrivalDate), but with secondary index in memory on ArrivalDate If we estimate, with the help of someone who knows the application, that the records we need are in 20 different blocks, we have 20 accesses = 240 ms. 7. …, contiguous,... For 20 blocks, we have 1 seek + 20 rotational delays = 8 ms + 20  4 ms = 88 ms.

Chapter Cost Estimation – Join Example  City  ArrivalDate = 15 May (s  g) Assume: 50/block for s and 25/block for g. Assume: sequential files, not contiguous, g is sorted and has an in-memory index on GuestNr. 1. simple nested-loop join (index not used). For each block of s (200) access each block of g;  400 = 80,200 accesses = 962,400 ms = 16 min 2. for each block of s, use the index to access blocks of g At most 50 blocks of g for each block of s; likely less, say 20; then  20 = 4,200 accesses = 50,400 ms = 50.4 sec.

Chapter Cost Estimation – Join Example  City (  ArrivalDate = 15 May s  g) (equivalent but optimized) Assume: 50/block for s and 25/block for g. Assume: sequential files, not contiguous, g is sorted and has an in-memory index on GuestNr. 3. Get 15-May records from s and do simple nested-loop join (index not used). Access each s block once to get 15-May GuestNr’s and then access each g block once; = ms = 7,200 ms = 7.2 sec 4. Get 15-May records from s and use index to do join. Assume we access, say, 10% of the blocks of g using the index; = 240 accesses or 2,880 ms = 2.9 sec.

Chapter Transaction Program unit that accesses the database Executes atomically (all or nothing) and transforms the DB from one consistent state to another Solves problems: –system crashes –integrity constraint violations –concurrent updates

Chapter Transaction State Diagram Active Partially CommittedFailed CommittedAborted All statements executed Transaction abort or system crash Able to guarantee completion Unable to guarantee completion Necessary adjustments made (e.g., rollback)

Chapter Crash Recovery Logging with Immediate Updates Can update anytime after log records with old/new values have been written to disk.

Chapter Crash Recovery Example assume actual update done assume log record on disk but update not done assume log record not on disk Do “undo” for each value record on disk -- overwrites changed or unchanged values. 1. Assume the system crashes before the commit as follows: 2. Assume the system crashes after the commit (all log records on disk, but not all updates done). “Redo” each value record; overwrite changed and unchanged values.

Chapter Multiple Transactions Recovery –order serially –redo those committed in serial order –undo those not committed in reverse serial order Checkpoints –flush all records from buffer –do all updates for committed transactions –add to log –subsequently ignore committed transactions before the

Chapter Concurrent Updates T1T2 read # enrolled in CS1 add 1 write # enrolled add 1 write # enrolled If 10 are initially enrolled, the DB says 11 after these transactions execute, but should say 12.

Chapter Two-Phase Locking Protocol Strict serial execution (works, but allows no concurrency) 2PLP (equivalent to serial execution, but allows concurrency) –Phase 1: locking – must not unlock –Phase 2: unlocking – must not lock T1T2 LX(N) Read(N) Add 1 to N Write(N) UN(N) LX(N) Read(N) Add 1 to N Write(N) UN(N) Any attempt by T2 to lock or read N here fails. T3 LX(M) Read(M) Add 1 to M Write (M) UN(M) T3 can overlap either T1 or T2 or both

Chapter Simple Locking is Not Enough T1T2 LX(A) A := A+1 UN(A) LX(A) LX(B) A := A*B UN(A) UN(B) LX(A) LX(B) B := B-A UN(A) UN(B) Let A = B = 2 initially. Execute as written: A = 6 B = -4 Execute T1 then T2: A = -3 B = -1 Execute T2 then T1: A = 5 B = -3 Execution as written is not the same as either serial execution. Not 2P

Chapter Deadlock T1T2 LX(A) LX(B) Try LX(B)Try LX(A) Abort either T1 or T2.

Chapter Shared Locks Allow Additional Concurrency T1T2T3 LS(A) LX(B) LS(A) UN(B) LS(B) UN(A) LS(B) UN(A) UN(B) UN(A) UN(B) LS: read, but not write Serializable: T2 T1 T3