Query Processing Exercise Session 1
How I/O is Done: An application program reads from and writes to its private memory. The buffer (in RAM) holds blocks B1, B2, B3, …, Bn brought from the disk. The system (OS or DBMS) transfers data between the disk, the buffer, and the program’s private memory, and it is in charge of removing blocks from the buffer to make room for new ones.
When a program wants to read, the system brings the blocks from the disk if they are not already in the buffer. When a program writes, the system is responsible for transferring the blocks from the buffer to the disk.
Application Programs Only Deal with Records: An application program works only with records and is not aware that there are blocks and buffers. When a program reads a record, the system brings the relevant block from the disk to the buffer and then copies the record to the program’s private memory. Similarly (but in the opposite direction) for writing.
Replacement Policies: When the buffer is full, which block should be removed? The one that will be needed again only a long time from now. The OS usually implements an LRU (least recently used) policy. What if all the blocks in the buffer are still needed by the programs running now? The answer to this question will be given during the last week of the semester.
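As a concrete illustration, here is a minimal sketch of an LRU buffer pool in Python (my own toy code, not the buffer manager of any real OS or DBMS; the class and function names are made up):

from collections import OrderedDict

def read_block_from_disk(block_id):
    # Placeholder standing in for a real disk read.
    return "data of block %s" % block_id

class LRUBufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()             # block_id -> data, least recently used first

    def get(self, block_id):
        if block_id in self.blocks:             # buffer hit
            self.blocks.move_to_end(block_id)   # mark as most recently used
            return self.blocks[block_id]
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used block
        data = read_block_from_disk(block_id)   # buffer miss: bring the block from disk
        self.blocks[block_id] = data
        return data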
Why LRU is not Good for a DBMS: An example: the buffer holds n-1 blocks, and we need to read, several times, a sequential file that has n blocks. In this case, MRU (most recently used) is the best policy for deciding which block to remove (see the simulation sketched below). The same holds when reading the nodes of a B+tree.
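A small simulation of this example (toy code of mine, just to make the counts concrete): with a file of 10 blocks, a buffer of 9 blocks, and 5 sequential scans, LRU misses on every single access, while MRU misses only once per scan after the first one.

def simulate(policy, n_blocks=10, capacity=9, scans=5):
    # Counts buffer misses for repeated sequential scans of a file of n_blocks blocks.
    buffer = []                    # block ids, ordered by last use (end = most recently used)
    misses = 0
    for _ in range(scans):
        for block in range(n_blocks):
            if block in buffer:
                buffer.remove(block)
                buffer.append(block)       # hit: just update the recency order
            else:
                misses += 1
                if len(buffer) >= capacity:
                    if policy == "LRU":
                        buffer.pop(0)      # evict the least recently used block
                    else:
                        buffer.pop()       # MRU: evict the most recently used block
                buffer.append(block)
    return misses

print("LRU misses:", simulate("LRU"))      # 50 (every access is a miss)
print("MRU misses:", simulate("MRU"))      # 14 (10 cold misses, then 1 per scan)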
How to Use a Buffer Efficiently: Problem: we have a file that is a sequence of blocks B1, B2, …, and a program that processes B1, then B2, then B3, and so on.
Single-Buffer Solution: (1) read B1 into the buffer; (2) process the data in memory; (3) read B2 into the buffer; (4) process the data in memory; ...
Total Time (not just I/O): Say P = time to process one block, R = time to read one block from disk, and n = # of blocks. Single-buffer time = n(P+R).
Double Buffering: [Figure: a buffer of two blocks and a disk holding blocks A, B, C, D, E, F, G; block A has been read into the buffer.] For simplicity, we assume that the processing is done in the buffer (rather than in the program’s memory).
While the Program Processes Block A, the System Reads Block B.
Now the Program Processes Block B While the System Reads Block C.
Once Again: the pattern repeats; the program processes the block that was just read while the system reads the next one.
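The following Python sketch mimics this interaction (my own schematic code; a thread stands in for the system’s asynchronous read, and read_block and process_block are hypothetical placeholders):

import threading

def read_block(i):
    return "block %d" % i          # stands in for a disk read that takes time R

def process_block(data):
    pass                           # stands in for CPU work that takes time P

def double_buffered_scan(n):
    if n <= 0:
        return
    buffers = [None, None]
    buffers[0] = read_block(0)                     # fill the first buffer synchronously
    for i in range(n):
        cur, nxt = i % 2, (i + 1) % 2
        reader = None
        if i + 1 < n:
            def prefetch(slot=nxt, block_no=i + 1):
                # The "system" reads the next block into the other buffer...
                buffers[slot] = read_block(block_no)
            reader = threading.Thread(target=prefetch)
            reader.start()
        process_block(buffers[cur])                # ...while the program processes the current one
        if reader is not None:
            reader.join()                          # wait if reading is slower than processing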
Total Time, Assuming P ≤ R: P = time to process one block, R = time to read one block, n = # of blocks. What is the total time? Single-buffering time = n(R+P). Double-buffering time = nR+P. The CPU time hardly affects the total length of the computation, so it is correct to count just the I/O operations when analyzing running time.
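A worked example with made-up numbers: for n = 1,000 blocks, R = 10 ms, and P = 2 ms (so P ≤ R), single buffering takes n(R+P) = 1,000 · 12 ms = 12 seconds, whereas double buffering takes nR + P = 1,000 · 10 ms + 2 ms ≈ 10 seconds.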
The Actual Difference: The actual difference between single and double buffering is much worse than n(R+P) – (nR+P). Why? These are actually two different R’s: double buffering enables reading the file sequentially, whereas single buffering is even worse than random reading, since the rotational latency is almost a full revolution.
Questions: Is double buffering useful also when writing to the disk? How do you activate double buffering? Suppose your program is a CPU cruncher, that is, P ≥ R; compute the total time for single and double buffering when P ≥ R. Does double buffering help? Answers: Double buffering is useful also when writing to the disk; however, it is a different story if you have to read each block, modify it (based on some computation), and then write it back before reading the next block. In many cases, double buffering is activated automatically by the DBMS; to be sure, check the details of the system you are using.
Using a Buffer of 2k Blocks: “Double buffering” is not limited to using just a buffer of two blocks; an application program processes k blocks in main memory while the system reads the next k blocks. Is it better to use k blocks, rather than 2? In an ideal situation it is not better, but in practice double buffering does not work perfectly, because other programs occasionally “steal” the controller. After a steal, there are nonzero seek time and latency, and it is better if their cost is spread over k blocks (see the calculation below).
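An illustrative calculation with assumed numbers: if a steal costs about 10 ms of seek time plus latency and transferring one block takes 0.5 ms, then reading k blocks after the steal costs 10 + 0.5k ms, i.e., 0.5 + 10/k ms per block; with k = 2 that is 5.5 ms per block, but with k = 32 it is only about 0.8 ms per block.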
Read-Ahead Buffering: When an application asks for one block, the system reads several more blocks sequentially, in anticipation that the application will need them. This is just one example of using double buffering.
Best Case of Joining 2 Relations: It is meaningless to specify the I/O cost without saying how much memory is needed to realize that cost. Relation R has BR blocks, relation S has BS blocks, and the size of the result is C blocks. The best possible I/O cost is BR + BS + C. How much memory is needed to achieve this cost?
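One way to get intuition for the answer (a sketch of mine, under the assumption that the smaller relation fits entirely in main memory): build an in-memory table on the smaller relation, then stream the other relation block by block, so that every block of R, S, and the result is read or written exactly once. The names read_blocks_R, read_blocks_S, and write_block are hypothetical placeholders for block-level I/O.

from collections import defaultdict

def best_case_join(read_blocks_R, read_blocks_S, write_block, join_attr, records_per_block=50):
    # Assumes S (say, the smaller relation) fits entirely in memory:
    # roughly BS blocks for S plus one input block and one output block.
    table = defaultdict(list)
    for block in read_blocks_S():                  # BS block reads
        for s in block:
            table[s[join_attr]].append(s)
    out = []
    for block in read_blocks_R():                  # BR block reads
        for r in block:
            for s in table.get(r[join_attr], []):
                out.append({**r, **s})
                if len(out) == records_per_block:
                    write_block(out)               # C block writes in total
                    out = []
    if out:
        write_block(out)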
Selection Using an Index: An index is a data structure that gives the addresses of the records with a given value in some field(s). For now, we only consider the I/O cost of accessing the file itself and ignore the cost of using the index. ID is a unique key, so what is the cost of doing the selection ID=102 using an index? The I/O cost is 1 (because only one record satisfies the condition of the selection).
Selection Using an Index (cont.): Name is not a unique key, there are 1,000 records with the name “levy”, and a block can store 50 records. What is the cost of the selection Name=“levy”? It depends on whether the file is clustered on Name, that is, whether all the records with the same name are physically close to each other on the disk. If the file is sorted on Name, then it is clustered. Note that a file cannot be clustered on two different fields (unless one is a unique key)!
So, What is the Answer? If the file is clustered on Name, then there are at least 20 and at most 21 blocks holding records with Name=“levy”. When does the best case (i.e., 20) happen? So, the I/O cost (in the worst case) is 21. If the file is not clustered, then (in the worst case) the I/O cost is 1,000. Does the cost depend on the file size? Yes: if the file has only 700 blocks, then it is better to read all the blocks of the file one by one than to use the index. That is, we do not use an index when just scanning the file is better.
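A back-of-the-envelope check of these numbers (the 700-block figure is taken from the slide; the rest follows from 1,000 matching records and 50 records per block):

import math

matching_records = 1000
records_per_block = 50

clustered_worst   = math.ceil(matching_records / records_per_block) + 1   # 21 (20 in the best case, when the run starts at a block boundary)
unclustered_worst = matching_records                                      # 1000: one I/O per matching record
full_scan         = 700                                                   # cost of simply reading a 700-block file

print(clustered_worst, unclustered_worst, full_scan)   # 21 1000 700: scanning beats the unclustered index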
Zone Bit Recording: All sectors have the same capacity (typically 512 bytes). All tracks used to have the same number of sectors, but not anymore. Why? Outer tracks are longer and can hold more sectors, so the sustained transfer rate at the OD (outer diameter) is higher; this rate goes down as the heads move toward the center. Use a software tool to measure the sustained transfer rate of your disks.
How It Used to Be: Tracks are concentric circles, divided into sectors, with gaps between sectors and between tracks. All sectors have the same number of bytes (typically 512).
Zone Bit Recording
Physical Addresses are Just “Logical”: The physical address of a block consists of a device ID, a cylinder #, a surface # (i.e., track number), and a sector #. Due to zone bit recording (and other reasons), these addresses do not reflect the true geometry of the disk; for example, they pretend that there is the same number of sectors in every track.
The Five-Minute Rule: “The Five-Minute Rule for Trading Memory for Disc Accesses”, Jim Gray & Franco Putzolu, 1987; “The Five Minute Rule, Ten Years Later”, Goetz Graefe & Jim Gray, 1997; “The Five-Minute Rule 20 Years Later (and How Flash Memory Changes the Rules)”, Goetz Graefe, 2009 (originally 2007).
IOPS: IOPS = I/O operations per second; currently, IOPS is in the range 100–200. Let D = price of a disk and I = its # of IOPS. Suppose a block has to be brought into memory every X seconds; the (proportional) cost is D/(XI).
An Alternative: Keep the block in memory all the time. M = the cost of memory (RAM) for one block (varies with the size of the block). The break-even point is when equality holds, that is, M = D/(XI), and hence X = D/(IM).
The New Rule: The cost of one IOPS (i.e., D/I) is about $1, the cost of 1MB of RAM is about $0.05, and the # of 4KB blocks in 1MB is 256. Hence, X is about 90 minutes; it used to be about 5 minutes in 1987 & 1997. Buy RAM for each block you need at least once every 90 minutes.
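A quick check of the 90-minute figure, using the prices quoted above:

cost_per_iops = 1.0                  # dollars per I/O operation per second (D / I)
ram_per_mb    = 0.05                 # dollars per MB of RAM
blocks_per_mb = 256                  # number of 4KB blocks in 1MB
M = ram_per_mb / blocks_per_mb       # RAM cost of keeping one 4KB block in memory
X = cost_per_iops / M                # break-even interval in seconds, X = D / (I * M)
print(X / 60)                        # about 85 minutes, i.e., roughly 90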
Not Only a Matter of Cost: The poor IOPS performance of hard disks is a bottleneck of I/O-intensive systems. The solution is solid-state drives (SSD). http://www.theregister.co.uk/2009/09/23/insane_ssd_performance/
Disk Arrays: RAIDs (various flavors), e.g., block striping and mirroring; several physical disks appear logically as one disk.
RAID Tutorial http://www.acnc.com/04_01_00.html
On-Disk Cache
Summary of Optimizations: disk-scheduling algorithms (e.g., the elevator algorithm); larger blocks (8KB nowadays) and larger buffers (as the price of RAM drops, blocks and buffers get bigger); read-ahead buffering, which is useful if the system knows in advance the blocks that will be needed shortly, or correctly guesses that the following N contiguous blocks are going to be needed; RAID; on-disk cache.
A Bit More on Bytes: What does burst rate mean? Gibibytes vs. gigabytes: gibibyte = gigabinary byte. Memory is measured in gibibytes, whereas the capacity of disks is given in gigabytes. 1MB = K·K bytes and 1GB = K·K·K bytes, where K = 1024 for RAM but only 1000 for disks.
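For example, 1 GiB = 1024^3 bytes ≈ 1.074 × 10^9 bytes, so a disk marketed as 500 GB (500 × 10^9 bytes) holds only about 465 GiB.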
Relational Operations on Bags: What are the definitions of the five basic operators when they are applied under bag semantics, that is, when relations may have duplicates? When can we push selection and projection through join?
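A small illustration of bag semantics (my own example, using Python’s Counter as a multiset): bag union adds multiplicities, bag difference subtracts them but not below zero, and bag intersection takes the minimum.

from collections import Counter

R = Counter({("a",): 2, ("b",): 1})   # the bag {a, a, b}
S = Counter({("a",): 1, ("c",): 3})   # the bag {a, c, c, c}

bag_union        = R + S              # multiplicities are added:       a:3, b:1, c:3
bag_difference   = R - S              # subtracted, but never below 0:  a:1, b:1
bag_intersection = R & S              # minimum of the multiplicities:  a:1

print(bag_union, bag_difference, bag_intersection)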
Pushing Selections and Projections (does it work also for bags?): Repeatedly split each selection with ⋀ using the equivalence σC1⋀C2(E) ≡ σC1(σC2(E)). Then repeatedly do the following: push selections through projections; push selections into every operand of a natural join if possible (i.e., if the operand contains all the attributes of the selection). After each selection and each join, do a projection that leaves only the attributes that are needed either for later selections and joins, or for the final result.
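For example (my own instance of these rules), if R(a,b) and S(b,c), then σa=1⋀c=2(R ⋈ S) ≡ σa=1(R) ⋈ σc=2(S), and on each side we may also apply a projection that keeps only the attributes needed later (here, the join attribute b plus the output attributes). These equivalences hold for bags as well, since selection and natural join treat each copy of a tuple independently.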
The Duplicate-Elimination Operator: δ is the operation of duplicate elimination; the result of δ(R) is obtained from R by removing duplicates. Through which operations can we push δ?
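As an instance (not an exhaustive answer): δ(R ⋈ S) ≡ δ(R) ⋈ δ(S) and δ(σC(R)) ≡ σC(δ(R)), but δ does not push through the bag projection: for R = {(1,2),(1,3)} with attributes (a,b), πa(R) is the bag {1,1}, so δ(πa(R)) = {1}, whereas πa(δ(R)) = {1,1}.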