1 Query Processing Exercise Session 1

2 How I/O is Done
[Figure: the disk, the buffer blocks B1, B2, B3, …, Bn, and the program’s private memory]
– The system (OS or DBMS) manages the buffer
– An application program reads from and writes to the buffer

3 How I/O is Done
[Figure: the disk, the buffer blocks B1, B2, B3, …, Bn, and the program’s private memory]
– The system (OS or DBMS) manages the buffer
– An application program reads from and writes to the buffer
– The system is in charge of removing blocks to make room for new ones
– When a program wants to read, the system brings the blocks from the disk if they are not already in the buffer
– When a program writes to the buffer, the system is responsible for transferring the data to the disk
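A toy buffer manager can make the division of labor on this slide concrete. This is a minimal Python sketch, not a real DBMS component. The class name `BufferPool`, its methods and the use of a dict to stand in for the disk are all hypothetical. It evicts the least recently used block, which is the usual OS choice discussed on slide 5. It writes a modified block back to the disk only when that block is evicted or flushed.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager: the 'system' side of the picture (hypothetical API)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict: block id -> contents (stands in for the disk)
        self.capacity = capacity      # number of buffer frames
        self.frames = OrderedDict()   # block id -> contents, least recently used first
        self.dirty = set()            # blocks changed in memory but not yet on disk
        self.disk_reads = 0
        self.disk_writes = 0

    def _make_room(self):
        # The system, not the program, decides which block leaves the buffer (LRU here).
        victim, contents = self.frames.popitem(last=False)
        if victim in self.dirty:
            self.disk[victim] = contents   # flush a modified block before dropping it
            self.dirty.discard(victim)
            self.disk_writes += 1

    def _fetch(self, block_id):
        if block_id not in self.frames:    # go to the disk only on a miss
            if len(self.frames) >= self.capacity:
                self._make_room()
            self.frames[block_id] = self.disk[block_id]
            self.disk_reads += 1
        self.frames.move_to_end(block_id)  # record this use as the most recent one
        return self.frames[block_id]

    def read(self, block_id):
        return self._fetch(block_id)

    def write(self, block_id, contents):
        self._fetch(block_id)              # the block must be in the buffer first
        self.frames[block_id] = contents
        self.dirty.add(block_id)

    def flush(self):
        # Transfer every modified block to the disk.
        for block_id in list(self.dirty):
            self.disk[block_id] = self.frames[block_id]
            self.disk_writes += 1
        self.dirty.clear()
```

The program only calls `read` and `write`. The pool alone decides when blocks move between the disk and the buffer, which is the point made on slide 4.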

4 Note That
– An application program is not aware that there are blocks and buffers
– It performs I/O operations directly on records, almost as if those records were always in main memory

5 Replacement Policies
– When the buffer is full, which block should be removed?
  – Ideally, the one that will not be needed again for the longest time
– The OS usually implements an LRU (least recently used) policy
– What if all the blocks in the buffer are still needed by the programs running now?
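The ideal policy in the first bullet is Belady's optimal (MIN) replacement. It evicts the block whose next request lies farthest in the future. A real system cannot see future requests, which is why it uses approximations such as LRU. The sketch below assumes, hypothetically, that the future reference string is known. The function name `choose_victim` is illustrative.

```python
def choose_victim(buffer, future_refs):
    """Pick the buffered block whose next use lies farthest in the future.

    buffer: block ids currently in memory.
    future_refs: the block ids that will be requested next, in order
    (known only to an idealized system).
    """
    def next_use(block):
        try:
            return future_refs.index(block)
        except ValueError:
            return float("inf")   # never needed again: the perfect victim

    return max(buffer, key=next_use)
```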

6 Why LRU is not Good for a DBMS
An example:
– The size of the buffer is n-1 blocks
– We need to read a sequential file that has n blocks several times
In this case, MRU (most recently used) is the best policy for deciding which block to remove
– The same holds when reading the nodes of a B+tree
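You can check the example by replaying the reference string. This is a small simulation sketch under simplifying assumptions: a single program, every request is for a whole block, and the function name `count_misses` is hypothetical. It counts how many requests must go to the disk under each policy.

```python
def count_misses(refs, capacity, policy):
    """Replay a reference string; policy is "LRU" or "MRU"."""
    recency = []      # buffered block ids, least recently used first
    misses = 0
    for block in refs:
        if block in recency:
            recency.remove(block)          # hit: its position is refreshed below
        else:
            misses += 1
            if len(recency) >= capacity:
                # LRU evicts the oldest entry; MRU evicts the newest one
                recency.pop(0 if policy == "LRU" else -1)
        recency.append(block)              # block is now the most recently used
    return misses


n, passes = 4, 3
refs = list(range(n)) * passes             # scan an n-block file several times
lru = count_misses(refs, n - 1, "LRU")
mru = count_misses(refs, n - 1, "MRU")
```

With n = 4 and three passes, LRU misses on all 12 requests, because it always evicts the very block the scan asks for next. MRU keeps most of the file resident and misses only 6 times.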

7 How to Use a Buffer Efficiently
Problem:
– Have a file
  » Sequence of blocks B1, B2, ...
– Have a program
  » Process B1
  » Process B2
  » Process B3 ...

8 Single-Buffer Solution
(1) Read B1 → Buffer
(2) Process data in memory
(3) Read B2 → Buffer
(4) Process data in memory
...

9 Total Time (Not Just I/O)
Say
– P = time to process 1 block
– R = time to read 1 block from the disk
– n = # blocks
– Single-buffer time = n(P+R)

10 Double Buffering
[Figure: a buffer and a disk holding blocks A, B, C, D, E, F, G; a block in the buffer is being processed]
– For simplicity, we assume that the processing is done in the buffer (rather than in the program’s memory)

11 While the Program Processes Block A, the System Reads Block B
[Figure: buffer and disk; block A is being processed while block B is read into the buffer]

12 Now the Program Processes Block B While the System Reads Block C
[Figure: buffer and disk; block B is being processed while block C is read into the buffer]

13 Once Again
[Figure: the same buffer and disk diagram, one step further along]

14 Total Time, Assuming P ≪ R
– What is the total time?
  – Single buffering time = n(R+P)
  – Double buffering time = nR + P
– The CPU time hardly affects the total length of the computation
– It is therefore correct to count just the I/O operations when analyzing the running time
P = time to process 1 block
R = time to read 1 block
n = # blocks
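You can check the two formulas with a small timeline simulation. The sketch makes these assumptions:
– There is one disk and one CPU.
– R and P are constant.
– There are exactly two buffers, so block i can be read only after block i-2 has been processed.
The function names are hypothetical. For P ≤ R the simulation yields nR + P. For P ≥ R it yields R + nP. In general this is R + (n-1)·max(R, P) + P.

```python
def single_buffer_time(n, R, P):
    # Read a block, then process it, strictly one after the other.
    return n * (R + P)


def double_buffer_time(n, R, P):
    """Two buffers: block i+1 is read while block i is processed."""
    read_end = [0] * (n + 1)   # read_end[i]: time block i is in memory (1-based)
    proc_end = [0] * (n + 1)   # proc_end[i]: time processing of block i ends
    for i in range(1, n + 1):
        # Block i reuses the buffer of block i-2, free once that block is processed.
        buffer_free = proc_end[i - 2] if i >= 2 else 0
        # One disk: reads happen one at a time.
        read_end[i] = max(read_end[i - 1], buffer_free) + R
        # One CPU: block i is processed after it arrives and after block i-1 is done.
        proc_end[i] = max(read_end[i], proc_end[i - 1]) + P
    return proc_end[n]
```

For example, with n = 10, R = 10 and P = 1, single buffering takes 110 time units and double buffering takes 101. Swapping R and P gives the same two totals, which bears on the CPU-cruncher question on the next slide.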

15 Questions
– Is double buffering useful also when writing to the disk?
– How do you activate double buffering?
– Suppose your program is a CPU cruncher, that is, P ≫ R
  – Compute the total time for single and double buffering when P ≫ R
  – Does double buffering help?

16 Comments
– “Double buffering” is not limited to a buffer of just two blocks
  – An application program processes k blocks in main memory while the system reads the next k blocks
– Read-ahead buffering
  – When an application wants to read one block, the system reads several more blocks sequentially, anticipating that the application will need them
  – This is just one example of double buffering
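In an application program, one way to get this behavior is to read ahead in a background thread. The sketch below is an illustration under stated assumptions, not how an OS or DBMS implements read-ahead. The following details are hypothetical or simplified:
– `read_ahead` is a hypothetical helper.
– The bounded `queue.Queue` plays the role of the k waiting buffer slots.
– Errors raised in the reader thread are not propagated.

```python
import io
import queue
import threading


def read_ahead(f, block_size, k):
    """Yield blocks of f while a background thread reads up to k blocks ahead."""
    buf = queue.Queue(maxsize=k)   # at most k blocks wait in memory
    done = object()                # sentinel marking the end of the file

    def reader():
        while True:
            block = f.read(block_size)
            if not block:
                buf.put(done)
                return
            buf.put(block)         # waits while k blocks are already buffered

    threading.Thread(target=reader, daemon=True).start()
    while True:
        block = buf.get()
        if block is done:
            return
        yield block                # the program processes this block meanwhile


blocks = list(read_ahead(io.BytesIO(b"abcdefghij"), block_size=4, k=2))
```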

17 Best Case of Joining 2 Relations
– Relation R has B_R blocks
– Relation S has B_S blocks
– The size of the result is C blocks
– The best possible I/O cost is B_R + B_S + C
– How much memory is needed to achieve this cost?
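One common answer is that the smaller relation must fit in memory. You also need roughly one block for scanning the other relation and one block for the output, i.e. about min(B_R, B_S) + 2 blocks. The sketch below uses hypothetical names and makes these assumptions:
– S is the smaller relation.
– Each block is a list of (key, value) tuples.
– The join is on the key.
It counts block reads and writes to show that the cost comes to B_R + B_S + C.

```python
from collections import defaultdict


def one_pass_join(r_blocks, s_blocks, out_block_size):
    """Join R and S on the first field, keeping all of S (the smaller relation) in memory."""
    reads = writes = 0
    table = defaultdict(list)          # in-memory copy of S, indexed by join key
    for block in s_blocks:             # B_S reads
        reads += 1
        for key, value in block:
            table[key].append(value)
    result, out = [], []               # out is the single output block
    for block in r_blocks:             # B_R reads, one input block at a time
        reads += 1
        for key, value in block:
            for s_value in table.get(key, []):
                out.append((key, value, s_value))
                if len(out) == out_block_size:   # output block full: write it
                    result.append(out)
                    writes += 1
                    out = []
    if out:                            # last, partially filled output block
        result.append(out)
        writes += 1
    return result, reads, writes
```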

18 Selection
– ID is a unique key, so what is the cost of the selection ID=102?
– Name is not a unique key; there are 1,000 records with the name “levy”, and a block can store 50 records
  – What is the cost of the selection Name=“levy”?
– It depends on whether the records are clustered on Name, that is, whether all records with the same name are physically close to each other on the disk
  – If the file is sorted on Name, then it is clustered on Name
– Records cannot be clustered on two different fields! (unless one is a unique key)
– Do the answers depend on the total number of blocks?
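You can put the effect of clustering in numbers. This sketch assumes an index on Name that leads directly to the matching records. It ignores the cost of traversing the index. `blocks_for_matches` is a hypothetical name.

```python
import math


def blocks_for_matches(matching_records, records_per_block, clustered):
    """Blocks fetched to retrieve the matching records (index lookup cost ignored)."""
    if clustered:
        # Matching records sit next to each other, so they fill consecutive blocks.
        return math.ceil(matching_records / records_per_block)
    # Worst case without clustering: every record lives in a different block.
    return matching_records


clustered = blocks_for_matches(1000, 50, clustered=True)
unclustered = blocks_for_matches(1000, 50, clustered=False)
```

With clustering, the 1,000 records fill 20 blocks, or possibly 21 if the run does not start at a block boundary. Without clustering, each record may cost a separate block read, for up to 1,000 reads.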

19 Zone Bit Recording
– All sectors have the same capacity (typically 512 bytes)
– All tracks used to have the same number of sectors, but not anymore
  – Why?
– The sustained transfer rate is higher at the OD (outer diameter)
  – This rate goes down as the heads move toward the center
  – Use a software tool to measure the sustained transfer rate of your disks
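The geometry explains why the outer zones are faster. At a fixed rotation speed, a track with more sectors passes more bytes under the head per revolution. The track sizes and the 7200 RPM speed below are hypothetical. The calculation ignores gaps and other formatting overhead.

```python
def sustained_rate_mb_s(sectors_per_track, rpm, sector_bytes=512):
    """Bytes passing under the head per second on one track, in MB/s (10^6 bytes)."""
    revolutions_per_second = rpm / 60
    return sectors_per_track * sector_bytes * revolutions_per_second / 1e6


outer = sustained_rate_mb_s(1000, 7200)   # hypothetical outer-zone track
inner = sustained_rate_mb_s(500, 7200)    # hypothetical inner-zone track
```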

20 How It Used to Be
– Tracks are concentric circles, divided into sectors
– All sectors have the same number of bytes (typically 512)
– There are gaps between sectors and between tracks

21 Zone Bit Recording

23 Physical Addresses are Just “Logical”
– The physical address of a block consists of
  – Device ID
  – Cylinder #
  – Surface # (i.e., which track of the cylinder)
  – Sector #
– Due to zone bit recording (and other reasons), the physical addresses do not reflect the true geometry of the disk
  – They assume the same number of sectors in every track
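The classic conversion from a (cylinder, head, sector) address to a linear block number makes the fiction explicit, because it assumes the same number of sectors on every track. In this sketch the surface # is called `head` and sectors are numbered from 1. The geometry values in the usage are hypothetical. The drive itself maps such logical numbers to its real, zoned layout.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Classic logical-geometry formula; sectors are numbered from 1."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)
```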

24 The Five-Minute Rule
– The Five-Minute Rule for Trading Memory for Disc Accesses, Jim Gray & Franco Putzolu, 1987
– The Five-Minute Rule Ten Years Later, Goetz Graefe & Jim Gray, 1997
– The Five-Minute Rule 20 Years Later (and How Flash Memory Changes the Rules), Goetz Graefe, 2009 (originally 2007)

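25 IOPS
– IOPS = I/O Operations Per Second
  – Currently, the IOPS of a disk is in the range 100 – 200
– D = price of a disk
– I = # of IOPS
– A block has to be brought into memory every X seconds
  – It uses 1/X of the I operations per second that the disk provides
– The (proportional) cost is $\frac{D}{X \cdot I}$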

26 An Alternative
– Keep the block in memory all the time
– M = the cost of memory (RAM) for 1 block (varies with the size of the block)
– The break-even point is when equality holds, that is, $M = \frac{D}{X \cdot I}$, and hence $X = \frac{D}{I \cdot M}$

27 The New Rule
– The cost of 1 IOPS (i.e., D/I) is about $1
– The cost of 1 MB of RAM is about $0.05
  – The # of 4 KB blocks in 1 MB is 256
– Hence, X is about 90 minutes
  – It used to be about 5 minutes in 1987 & 1997
– Buy RAM for every block that you need at least once every 90 minutes
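The 90-minute figure follows from X = D/(I·M) on the previous slide. In the sketch, the price of $1 per IOPS is modelled, hypothetically, as a $100 disk delivering 100 IOPS. Only the ratio D/I matters.

```python
def break_even_seconds(disk_price, iops, ram_price_per_mb, block_kb):
    """X = D / (I * M): keep a block in RAM if it is needed more often than every X s."""
    blocks_per_mb = 1024 / block_kb            # 256 blocks of 4 KB in 1 MB
    m = ram_price_per_mb / blocks_per_mb       # M: price of the RAM holding one block
    return disk_price / (iops * m)


# Hypothetical D = $100 and I = 100, i.e. $1 per IOPS; $0.05 per MB of RAM; 4 KB blocks.
x = break_even_seconds(disk_price=100, iops=100, ram_price_per_mb=0.05, block_kb=4)
minutes = x / 60
```

The result is 5,120 seconds, roughly 85 minutes, which the slide rounds to about 90 minutes.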

28 Not Only A Matter of Cost
– The poor IOPS performance of hard disks is a bottleneck of I/O-intensive systems
– The solution is solid-state drives (SSDs)
  – nsane_ssd_performance/

29 Disk Arrays
– RAIDs (various flavors)
– Block striping
– Mirroring
– Logically, the array is one disk
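Block striping and mirroring come down to simple address mappings. The sketch below assumes round-robin striping with a stripe unit of one block. It also assumes a mirror that copies every block to every disk. Real controllers typically use larger stripe units, and the function names are hypothetical.

```python
def striped_location(logical_block, num_disks):
    """Block striping: logical block b goes to disk b mod N, at offset b div N."""
    return logical_block % num_disks, logical_block // num_disks


def mirrored_targets(logical_block, num_disks):
    """Mirroring: every disk holds a full copy, so a write goes to all of them."""
    return [(disk, logical_block) for disk in range(num_disks)]
```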

30 RAID Tutorial

31 On-Disk Cache
[Figure: P, M, C, ... and the cache on the disk]

32 Summary of Optimizations
– Disk-scheduling algorithms
  – e.g., the elevator algorithm
– Larger blocks (8 KB nowadays) and larger buffers
  – As the price of RAM drops, blocks and buffers get bigger
– Read-ahead buffering, which is useful if
  – The system knows in advance the blocks that will be needed shortly, or
  – The system guesses correctly that the following N contiguous blocks are going to be needed
– RAID
– On-disk cache
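The elevator algorithm (SCAN) serves pending requests in the direction the heads are currently moving, then reverses. The minimal sketch below assumes that requests are identified by cylinder number. The function name is hypothetical.

```python
def elevator_order(pending, head, moving_up=True):
    """SCAN: serve requests in the current direction, then sweep back."""
    ahead = sorted(c for c in pending if c >= head)                  # ascending
    behind = sorted((c for c in pending if c < head), reverse=True)  # descending
    return ahead + behind if moving_up else behind + ahead
```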

33 A Bit More on Bytes
– What does burst rate mean?
– Gibibytes vs. gigabytes
  – gibibytes = gigabinary bytes
  – Memory is measured in gibibytes, whereas the capacity of disks is given in gigabytes
  – 1 MB = K × K, 1 GB = K × K × K, where K = 1024 for RAM but only 1000 for disks
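The difference between the two units shows up clearly on an advertised disk capacity. The 500 GB figure below is a hypothetical example.

```python
def disk_gb_to_gib(gigabytes):
    """Advertised disk capacity (K = 1000) expressed in gibibytes (K = 1024)."""
    return gigabytes * 1000**3 / 1024**3


capacity = disk_gb_to_gib(500)   # a hypothetical "500 GB" disk
```

A disk sold as 500 GB therefore shows up as about 465.66 GiB.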

34 Relational Operations on Bags
– What are the definitions of the five basic operators when they are applied under bag semantics, that is, when relations may have duplicates?
– When can we push selection and projection through join?
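One way to state the bag definitions precisely is to represent a bag as a multiset. The sketch below uses Python's `collections.Counter`, mapping each tuple to its number of copies; the helper names are hypothetical. The operators behave as follows:
– Union adds multiplicities.
– Difference keeps max(m - n, 0) copies.
– Selection and projection keep duplicates.
– Product multiplies multiplicities.

```python
from collections import Counter

# A bag is a Counter mapping each tuple to its number of occurrences.


def bag_union(r, s):
    return r + s                   # multiplicities add up


def bag_difference(r, s):
    return r - s                   # max(m - n, 0) copies survive


def bag_select(r, pred):
    return Counter({t: m for t, m in r.items() if pred(t)})   # copies kept as they are


def bag_project(r, positions):
    out = Counter()
    for t, m in r.items():
        out[tuple(t[i] for i in positions)] += m   # no duplicate elimination
    return out


def bag_product(r, s):
    return Counter({t + u: m * n for t, m in r.items() for u, n in s.items()})
```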

35 Pushing Selections and Projections
– Break each selection into several ones
  – Using the equivalence $\sigma_{C_1 \wedge C_2}(E) \equiv \sigma_{C_1}(\sigma_{C_2}(E))$
– Repeatedly do the following:
  – Push selections through projections
  – Push selections into every operand of a natural join if possible (i.e., if the operand contains all the attributes of the selection)
– After each selection and join, do a projection that leaves only the attributes that are needed for later selections and joins, or for the final result
– Does it work also for bags?
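The step that pushes conjuncts into the operands of a natural join can be sketched as a simple attribute-containment test. The representation is hypothetical: each conjunct is a label with the set of attributes it mentions, and each operand is a name with its schema. The sketch does not handle pushing through projections.

```python
def push_selections(conjuncts, operands):
    """conjuncts: list of (label, attrs); operands: dict of operand name -> schema set."""
    pushed = {name: [] for name in operands}
    remaining = []                 # conjuncts that must stay above the join
    for label, attrs in conjuncts:
        targets = [name for name, schema in operands.items() if attrs <= schema]
        if targets:
            for name in targets:   # push into every operand that has all the attributes
                pushed[name].append(label)
        else:
            remaining.append(label)
    return pushed, remaining


# Example: R(a, b) natural-joined with S(b, c)
pushed, remaining = push_selections(
    [("a>1", {"a"}), ("b=2", {"b"}), ("a<c", {"a", "c"})],
    {"R": {"a", "b"}, "S": {"b", "c"}},
)
```

Here a > 1 goes to R, b = 2 goes to both operands, and a < c must stay above the join.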

36 The Duplicate-Elimination Operator
– $\delta$ is the operation of duplicate elimination
– The result of $\delta(R)$ is obtained from R by removing duplicates
– Through which operations can we push $\delta$?
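Small counterexamples help answer this question. The sketch below models bags as `collections.Counter` objects and uses hypothetical helper names. It compares applying δ before and after an operator on one pair of bags. A check on a single instance can refute an equivalence but cannot prove one.

```python
from collections import Counter


def delta(r):
    """Duplicate elimination: keep exactly one copy of every tuple in bag r."""
    return Counter({t: 1 for t, m in r.items() if m > 0})


def product(r, s):
    return Counter({t + u: m * n for t, m in r.items() for u, n in s.items()})


def project(r, positions):
    out = Counter()
    for t, m in r.items():
        out[tuple(t[i] for i in positions)] += m   # bag projection keeps duplicates
    return out


R = Counter({("a", 1): 2, ("a", 2): 1})
S = Counter({("a", 1): 1})

# Product: here delta(R x S) equals delta(R) x delta(S)
product_ok = delta(product(R, S)) == product(delta(R), delta(S))
# Difference: delta(R - S) differs from delta(R) - delta(S)
difference_ok = delta(R - S) == delta(R) - delta(S)
# Projection: delta(pi(R)) differs from pi(delta(R))
projection_ok = delta(project(R, [0])) == project(delta(R), [0])
```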