Published by Dwight Thornton. Modified over 8 years ago.
1
1 Query Processing Exercise Session 1
2
2 How I/O is Done: The system (OS or DBMS) manages the buffer. An application program reads from and writes to the buffer. [Diagram: disk ↔ buffer blocks B1, B2, B3, …, Bn ↔ program’s private memory]
3
3 How I/O is Done: The system (OS or DBMS) manages the buffer, and an application program reads from and writes to it. When a program wants to read, the system brings the blocks from disk if they are not already in the buffer. When a program writes to the buffer, the system is responsible for transferring the data to the disk. The system is also in charge of removing blocks to make room for new ones. [Diagram: disk ↔ buffer blocks B1, B2, B3, …, Bn ↔ program’s private memory]
4
4 Note that an application program is not aware that there are blocks and buffers. It performs I/O operations directly on records, almost as if those records were always in main memory.
5
5 Replacement Policies: When the buffer is full, which block should be removed? – Ideally, the one that will be needed again only a long time from now. The OS usually implements an LRU (least recently used) policy. What if all the blocks in the buffer are still needed by the programs running now?
6
6 Why LRU is not Good for DBMS: An example: – The size of the buffer is n-1 blocks. – We need to read several times a sequential file that has n blocks. Under LRU, each block is evicted shortly before it is needed again, so every read goes to the disk. In this case, MRU (most recently used) is the best policy for deciding which block to remove. – The same holds when reading nodes of a B+tree.
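The example on this slide can be checked with a small simulation. The following is only a sketch, not taken from the slides: the function name `count_misses` and the concrete sizes (a 5-block file, a 4-block buffer, 3 passes) are assumptions chosen for illustration.

```python
from collections import OrderedDict

def count_misses(num_blocks, buffer_size, passes, policy):
    """Count disk reads when scanning a file of num_blocks blocks `passes` times.

    policy is "LRU" or "MRU" (hypothetical labels used only in this sketch).
    """
    buffer = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for _ in range(passes):
        for block in range(num_blocks):
            if block in buffer:
                buffer.move_to_end(block)  # hit: mark as most recently used
                continue
            misses += 1  # the block must be read from disk
            if len(buffer) >= buffer_size:
                # last=True evicts the most recently used entry (MRU),
                # last=False evicts the least recently used entry (LRU)
                buffer.popitem(last=(policy == "MRU"))
            buffer[block] = True
    return misses

lru_misses = count_misses(num_blocks=5, buffer_size=4, passes=3, policy="LRU")
mru_misses = count_misses(num_blocks=5, buffer_size=4, passes=3, policy="MRU")
print(lru_misses, mru_misses)
```

With these sizes, LRU misses on every access, while MRU misses only once per pass after the first.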
7
7 How to Use a Buffer Efficiently: Problem: we have a file (a sequence of blocks B1, B2, ...) and a program that processes B1, then B2, then B3, and so on.
8
8 Single-Buffer Solution: (1) Read B1 into the buffer (2) Process the data in memory (3) Read B2 into the buffer (4) Process the data in memory ...
9
9 Total Time (not just I/O): Say P = time to process 1 block, R = time to read 1 block from disk, n = # blocks. Single-buffer time = n(P+R).
10
10 Double Buffering: For simplicity, we assume that the processing is done in the buffer (rather than in the program’s memory). [Diagram: disk holding blocks A B C D G E F; a two-block buffer from which the program processes]
11
11 While the Program Processes Block A, the System Reads Block B [Diagram: buffer holds A (being processed) and B (read done); disk: A B C D G E F]
12
12 Now the Program Processes Block B While the System Reads Block C [Diagram: C is read into the buffer slot that held A while B is being processed; disk: A B C D G E F]
13
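13 Once Again [Diagram: the same steps replayed: A is processed while B is read, then B is processed while C is read into the slot that held A; disk: A B C D G E F]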
14
14 Total Time: P = time to process 1 block, R = time to read 1 block, n = # blocks. Assuming P ≤ R, what is the total time? – Single buffering time = n(R+P) – Double buffering time = nR + P. The CPU time hardly affects the total length of the computation, so it is correct to count just the I/O operations when analyzing running time.
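The two totals can be compared with a short calculation. This is a sketch under a stated assumption that is not spelled out on the slide: with two buffers, reading the next block fully overlaps processing the current one, so each step after the first read takes max(P, R). The function names are made up for this illustration.

```python
def single_buffer_time(n, p, r):
    """Each of the n blocks is read (r) and then processed (p), with no overlap."""
    return n * (p + r)

def double_buffer_time(n, p, r):
    """Assumes reading block i+1 overlaps processing block i (two buffers)."""
    process_start = r  # the first block must be read before any processing
    for _ in range(n - 1):
        # the next block can be processed only once the current block is
        # processed AND the next block has been read, whichever takes longer
        process_start += max(p, r)
    return process_start + p  # finally, process the last block

# I/O-bound case (P <= R): the double-buffering total equals n*R + P
print(single_buffer_time(10, 1, 5), double_buffer_time(10, 1, 5))
# CPU-bound case (P >= R): the double-buffering total equals R + n*P
print(single_buffer_time(10, 5, 1), double_buffer_time(10, 5, 1))
```

In both cases double buffering hides the smaller of the two per-block costs, which is why the saving is bounded by a factor of two.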
15
15 Questions: Is double buffering also useful when writing to the disk? How do you activate double buffering? Suppose your program is a CPU cruncher, that is, P ≥ R. – Compute the total time for single and double buffering when P ≥ R. – Does double buffering help?
16
16 Comments: “Double buffering” is not limited to using just a buffer of two blocks. – An application program processes k blocks in main memory while the system reads the next k blocks. Read-ahead buffering: – When an application wants to read one block, the system reads several more blocks sequentially, in anticipation that the application will need them. – This is just one example of double buffering.
17
17 Best Case of Joining 2 Relations: Relation R has B_R blocks. Relation S has B_S blocks. The size of the result is C blocks. The best possible I/O cost is B_R + B_S + C. How much memory is needed to achieve this cost?
18
18 Selection: ID is a unique key, so what is the cost of doing the selection ID=102? Name is not a unique key; there are 1,000 records with the name “levy”, and a block can store 50 records. – What is the cost of the selection Name=“levy”? It depends on whether the records are clustered on Name, that is, whether all records with the same name are physically close to each other on the disk. – If the file is sorted on Name, then it is clustered on Name. Records cannot be clustered on two different fields (unless one is a unique key)! Do the answers depend on the total number of blocks?
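The effect of clustering on the Name=“levy” selection can be made concrete with the numbers from this slide. This is only a sketch, and it rests on an assumption the slide does not state: some index leads directly to the matching records, so only data-block reads are counted. The function names are hypothetical.

```python
import math

def blocks_read_clustered(matching_records, records_per_block):
    """Matching records are stored contiguously, so they fill consecutive blocks.

    If the run of records does not start on a block boundary, one extra
    block may be needed; that case is ignored here.
    """
    return math.ceil(matching_records / records_per_block)

def blocks_read_unclustered_worst_case(matching_records):
    """Worst case: every matching record lies in a different block."""
    return matching_records

clustered = blocks_read_clustered(1000, 50)
unclustered = blocks_read_unclustered_worst_case(1000)
print(clustered, unclustered)
```

Without an index the whole file may have to be scanned, in which case the cost does depend on the total number of blocks.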
19
19 Zone Bit Recording: All sectors have the same capacity (typically 512 bytes). All tracks used to have the same number of sectors, but not anymore. – Why? The sustained transfer rate is higher at the OD (outer diameter). This rate goes down as the heads move toward the center. – Use a software tool to measure the sustained transfer rate of your disks.
20
20 How It Used to Be: Tracks are concentric circles, divided into sectors. All sectors have the same number of bytes (typically 512). There are gaps between sectors and between tracks.
21
21 Zone Bit Recording
23
23 Physical Addresses are Just “Logical”: The physical address of a block consists of – Device ID – Cylinder # – Surface # (i.e., track number) – Sector #. Due to zone bit recording (and other reasons), the physical addresses do not reflect the true geometry of the disk. [Diagram label: same number of sectors in every track]
24
24 The Five-Minute Rule: – “The Five-minute Rule for Trading Memory for Disc Accesses”, Jim Gray & Franco Putzolu, 1987 – “The Five Minute Rule, Ten Years Later”, Goetz Graefe & Jim Gray, 1997 – “The five-minute rule 20 years later (and how flash memory changes the rules)”, Goetz Graefe, 2009 (originally 2007)
25
25 IOPS: IOPS = I/O Operations Per Second. – Currently, IOPS is in the range 100 – 200. D = price of a disk, I = # of IOPS. A block has to be brought into memory every X seconds, so it uses a fraction 1/(XI) of the disk's I/O capacity. The (proportional) cost is D/(XI).
26
26 An Alternative: Keep the block in memory all the time. M = the cost of memory (RAM) for 1 block (varies with the size of the block). The break-even point is when equality holds, that is, M = D/(XI), and hence X = D/(IM).
27
27 The New Rule: Cost of 1 IOP (that is, D/I) is about $1. Cost of 1MB RAM is about $0.05. – The # of 4KB blocks in 1MB is 256. Hence, X is about 90 minutes. – It used to be about 5 minutes in 1987 & 1997. Buy RAM for each block you need at least every 90 minutes.
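The break-even interval follows from the prices on this slide by setting the RAM cost of one block equal to the disk cost D/(XI) and solving for X. The sketch below only redoes that arithmetic; the variable and function names are chosen for illustration.

```python
def break_even_seconds(price_per_iops, ram_price_per_block):
    """Solve M = D/(X*I) for X, where price_per_iops stands for D/I."""
    return price_per_iops / ram_price_per_block

price_per_iops = 1.00        # dollars per I/O operation per second (from the slide)
ram_price_per_mb = 0.05      # dollars per MB of RAM (from the slide)
blocks_per_mb = 1024 // 4    # 256 blocks of 4KB in 1MB

ram_price_per_block = ram_price_per_mb / blocks_per_mb
x_seconds = break_even_seconds(price_per_iops, ram_price_per_block)
x_minutes = x_seconds / 60
print(round(x_seconds), round(x_minutes, 1))
```

The result is roughly 5,120 seconds, a little over 85 minutes, which the slide rounds to about 90 minutes.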
28
28 Not Only A Matter of Cost: The poor IOPS performance of hard disks is a bottleneck of I/O-intensive systems. The solution is solid-state drives (SSD). – http://www.theregister.co.uk/2009/09/23/insane_ssd_performance/
29
29 Disk Arrays: RAIDs (various flavors): block striping, mirroring. Logically one disk.
30
30 RAID Tutorial http://www.acnc.com/04_01_00.html
31
31 On-Disk Cache [Diagram labels: P, M, C, ..., cache]
32
32 Summary of Optimizations: (1) Disk-scheduling algorithms – e.g., the elevator algorithm. (2) Larger blocks (8KB nowadays) and larger buffers – As the price of RAM drops, blocks and buffers get bigger. (3) Read-ahead buffering, which is useful if – the system knows in advance the blocks that will be needed shortly, or – the system guesses correctly that the following N contiguous blocks are going to be needed. (4) RAID. (5) On-disk cache.
33
33 A Bit More on Bytes: What does burst rate mean? Gibibytes vs. gigabytes: – gibibytes = gigabinary bytes. Memory is measured in gibibytes, whereas the capacity of disks is given in gigabytes. 1MB = K × K and 1GB = K × K × K, where K = 1024 for RAM but only 1000 for disks.
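The gap between the two units is easy to quantify. The following sketch converts a disk capacity quoted in gigabytes into gibibytes; the 500 GB figure is just an illustrative assumption, not a number from the slides.

```python
GB = 1000 ** 3   # gigabyte: K = 1000, as used for disk capacity
GIB = 1024 ** 3  # gibibyte: K = 1024, as used for RAM

def gb_to_gib(gigabytes):
    """Convert a size in gigabytes to gibibytes."""
    return gigabytes * GB / GIB

disk_gib = gb_to_gib(500)
print(round(disk_gib, 2))
```

A disk sold as 500 GB therefore holds about 465.66 GiB, roughly 7% less than the number on the label suggests when read as binary units.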
34
34 Relational Operations on Bags: What are the definitions of the five basic operators when they are applied under the bag semantics, that is, when relations may have duplicates? When can we push selection and projection through join?
35
35 Pushing Selections and Projections: Break each selection into several ones – using the equivalence σ_{C1 ∧ C2}(E) ≡ σ_{C1}(σ_{C2}(E)). Repeatedly do the following: – Push selections through projections. – Push selections into every operand of a natural join if possible (i.e., if the operand contains all the attributes of the selection). After each selection and join, do a projection that leaves only the attributes that are needed for later selections and joins, or for the final result. Does it also work for bags?
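The selection-splitting equivalence can be tried out on a small bag. The sketch below models a bag as a `collections.Counter` that maps each tuple to its multiplicity; this representation, the `select` helper, and the sample data are assumptions made for illustration. It checks one instance only and is not a proof.

```python
from collections import Counter

def select(bag, predicate):
    """Bag selection: keep each qualifying tuple with its full multiplicity."""
    return Counter({row: count for row, count in bag.items() if predicate(row)})

# A bag E with duplicates: tuple (1, "a") occurs twice, (1, "b") three times
E = Counter({(1, "a"): 2, (2, "a"): 1, (1, "b"): 3})

def c1(row):
    return row[0] == 1

def c2(row):
    return row[1] == "a"

combined = select(E, lambda row: c1(row) and c2(row))  # one selection on C1 AND C2
cascaded = select(select(E, c2), c1)                   # C2 first, then C1
print(combined == cascaded, dict(combined))
```

Both forms keep the two copies of (1, "a"), since a selection tests each tuple independently and never changes multiplicities.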
36
36 The Duplicate-Elimination Operator: δ is the operation of duplicate elimination. The result of δ(R) is obtained from R by removing duplicates. Through which operations can we push δ?