
Query Processing Part 1: Managing Disks


1 Query Processing Part 1: Managing Disks

2 Main Topics on Query Processing
Running-time analysis
Indexes (e.g., search trees, hashing)
Efficient algorithms for the relational operators
Optimizing the evaluation of a whole query
The characteristics of disks are different from those of main memory

3 Disks (HDDs)
HDD – Hard Disk Drive
Not to be confused with the newer SSD (Solid-State Drive)

4 Typical Computer
[diagram: processor (P), main memory (M), and controllers (C), with disks and other secondary storage devices]
The processor (CPU), main memory (RAM) and controllers are connected by a bus

5 Processor and Memory
Processor speed: ~100 MIPS (MIPS = Million Instructions per Second)
Memory access time: measured in nanoseconds (1 ns = 10⁻⁹ sec)

6 “Typical Disk”
Terms: Platter, Head, Actuator, Cylinder, Track, Sector (physical), Block (logical), Gap

7 Top (& Bottom) View of a Disk Platter
Tracks are concentric circles, divided into sectors
Gaps between sectors and between tracks
All sectors have the same number of bytes (typically 512)

8 More Details
Both surfaces of each platter are used
There is a head for each surface
All tracks with the same radius form a cylinder
The heads move together and are always over the same cylinder
A block consists of N contiguous sectors
N is determined when the OS formats the disk
The DBMS may choose a different value for N

9 Memory vs. Disk
Memory is fast and Disk is slow (not exactly true …)
Memory is fast:
time to read (or write) a byte is fixed
can read (or write) just what is needed
Disk is slow:
must read (or write) at least one block
time to read (or write) a block varies
Memory is volatile whereas a disk keeps the data even without electricity

10 Disk Access Time
[flowchart: user needs block X – is block X in memory? if not, it must be read from the disk]

11 Time = Seek Time + Rotational Delay + Transfer Time + Other …
Seek Time – the time it takes to move the heads to the cylinder where the block is
Rotational Delay – the time it takes until the beginning of the block arrives under the head
Transfer Time – the time it takes to actually read the block
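This cost model can be sketched as a small function (a hedged sketch; the function name and parameters are mine, and the example numbers anticipate the typical values given on the next slides):

```python
def access_time_ms(seek_ms, rpm, block_kb, rate_kb_per_ms):
    """Time to read one random block: seek + rotational delay + transfer.
    Other delays (controller, bus) are ignored, as in the slides."""
    rotational_delay = (60_000 / rpm) / 2   # half a revolution, in msec
    transfer = block_kb / rate_kb_per_ms    # msec to move one block
    return seek_ms + rotational_delay + transfer

# Typical values: 10 ms average seek, 7200 rpm, 4 KB block at 40 KB/ms:
print(round(access_time_ms(10, 7200, 4, 40), 2))   # 14.27
```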

12 Seek Time
[chart: seek time as a function of the number of cylinders traveled (1 to N); it grows from x for a short move to roughly 3x–5x across the whole disk]

13 Average Seek Time
Can be measured empirically
Alternatively, it can be proved that the average distance from one random cylinder to another is 1/3 of the maximal distance (i.e., from innermost to outermost track)
Hence, the average seek time is about 1/3 of the maximum
Typical average seek time is about 10 msec
For the fastest disks it is about 3 msec
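The 1/3 claim can also be checked empirically with a quick simulation (a sketch; the helper name and parameters are mine):

```python
import random

def avg_seek_distance(n_cylinders, trials=100_000, seed=0):
    """Average distance between two uniformly random cylinders;
    approaches n_cylinders / 3 as n_cylinders grows."""
    rng = random.Random(seed)
    return sum(abs(rng.randrange(n_cylinders) - rng.randrange(n_cylinders))
               for _ in range(trials)) / trials

print(avg_seek_distance(10_000))   # close to 10_000 / 3 ≈ 3333
```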

14 Rotational Delay (Latency)
[diagram: the needed block must rotate until it arrives under the head]
The average latency is ½ of the time of one revolution
It is 4.17 msec (7200 rpm)
Only 2 msec for the fastest disks (15,000 rpm)
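The two latency figures follow directly from the rotation speed; a one-line check (the function name is mine):

```python
def avg_latency_ms(rpm):
    # average rotational delay = half the time of one revolution
    return (60_000 / rpm) / 2

print(round(avg_latency_ms(7_200), 2))   # 4.17
print(round(avg_latency_ms(15_000), 2))  # 2.0
```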

15 Transfer Time
The transfer time can be computed from the sustained transfer rate, which is measured in MB/sec
A transfer time of 0.1 msec for a 4KB block amounts to a rate of 40 MB/sec
This is a conservative estimate with respect to recent models of disks

16 Other Delays
CPU time to issue I/O
Contention for controller
Contention for bus, memory
We ignore these delays

17 Time to Read
The time to read a block of 4KB is
avgSeek + avgLatency + transferTime = 10 + 4.17 + 0.1 = 14.27 msec
If we read 11 sequential blocks (on the same track), then seek & latency are needed just for the first block
So, the time is 10 + 4.17 + 11 × 0.1 = 15.27 msec
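The random-vs-sequential arithmetic above can be captured in a few lines (a sketch using the slides' typical numbers; the constant and function names are mine):

```python
AVG_SEEK_MS = 10.0      # typical average seek
AVG_LATENCY_MS = 4.17   # half a revolution at 7200 rpm
TRANSFER_MS = 0.1       # one 4 KB block at ~40 MB/sec

def read_time_ms(n_blocks):
    """Seek and latency are paid once; transfer is paid per block,
    assuming the blocks are sequential on the same track."""
    return AVG_SEEK_MS + AVG_LATENCY_MS + n_blocks * TRANSFER_MS

print(round(read_time_ms(1), 2))        # 14.27 (one random block)
print(round(read_time_ms(11), 2))       # 15.27 (11 sequential blocks)
print(round(read_time_ms(11) / 11, 2))  # 1.39 msec per sequential block
```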

18 Summary
Random I/O is expensive
Average per 4KB block is ~15 msec
Sequential I/O much less
Average per 4KB block is ~1.5 msec (when reading 11 sequential blocks)
However, even sequential I/O is slower than memory by at least a factor of 100

19 Writing and Updating
Cost of writing is similar to reading
Unless we want to verify; if so, add 1 revolution + transfer time
To update a block, we must read it into memory, modify it, and then write it back to the disk

20 Typical DB Application
while blocks to read do
    read next block from disk
    process the block
    write some result to disk
end
The CPU can execute tens of thousands (if not millions) of instructions while the controller reads or writes a single block

21 Running-Time Analysis: I/O Cost
We only count the number of blocks that are read from or written to the disk
The CPU time is negligible in comparison
Furthermore, the controller can read from and write to the disk while the CPU is processing other blocks
So, the CPU time that can actually influence an exact analysis is even more negligible
The goal is to minimize the number of blocks that we read and write

22 We Count Blocks, But …
What is the cost (in time) of each block? We cannot tell whether a block was read randomly or sequentially (with other blocks)
We should organize data on disks and write programs so that the I/O will be sequential as much as possible
The DBMS helps a lot in this task!
It is also capable of minimizing the number of accessed blocks when processing queries
And it tries to keep the controller busy while the CPU processes blocks that are already in memory

23 Best-Case Analysis
Read B1 blocks from the disk
Compute the result and write it back to the disk; suppose that the size of the result is B2
What is the best possible I/O cost? What is needed to achieve it?
The best possible I/O cost is B1 + B2. It can be realized if the main memory is large enough to hold simultaneously all the input, all the output and all the intermediate results. (This is an upper bound on the main-memory size needed to realize the best I/O cost; sometimes less than that is enough.)

24 Summary
The running time of an algorithm is the I/O cost
We measure the I/O cost in terms of the number of blocks that are read or written
A block that is read and then written is counted as 2

25 Arranging Data on Disks

26 The Goal
Arrange data on disks so that
Queries and updates can be performed by reading and writing as few blocks as possible, and
Blocks would usually be read sequentially
The optimal arrangement depends on the typical queries and updates that are going to be executed, and is therefore harder to achieve

27 Addresses of Records on Disks

28 Addresses for Records on Disks
We need the ability to refer to a particular record
In fact, some records have pointers to other records or to blocks
Pointers are inherent to object-relational database systems
Even in purely relational systems, pointers are needed in indexes
The DBMS stores indexes – not just relations!

29 Several Types of Addresses
How does one refer to records?
Many options, ranging from physical to indirect

30 Purely Physical
A record address consists of a block ID plus the offset in the block
The block ID itself consists of: Device ID, Cylinder #, Track #, Block #

31 Fully Indirect (Record IDs)
A record ID is a bit string (assigned by the system) that can be translated to a physical address by means of a map table
[diagram: the map table translates the record ID of R to its physical address A]

32 Tradeoff
Flexibility to move records (for deletions, insertions) vs. the cost of indirection
Physical addresses limit the ability to move records or use their space when deleting them – why?
Logical addresses have the cost of indirection
If we move a record, we would have to search the whole database for pointers to that record and change them. If we use a deleted space for another record, then first we have to search the whole database for pointers to that record and indicate that they refer to a record that has been deleted. So, these would be prohibitively expensive operations.

33 Half & Half Approach
Many options in between physical and indirect …
One option: physical address of the block + logical address inside the block

34 Illustration
[diagram: a block with a header (fixed part + array A of pointers), free space in the middle, and records R5, R6, R7, R8 packed at the end]
The address of R6 is the pair (P, 2), where P is the physical address of the block
Given (P, 2), we go to the block having the address P and then follow the pointer in A[2]

35 More Details on Half & Half
One field of the fixed part (of the header) contains the size of the array A
The header is at the beginning of the block
Any record R can be moved freely inside the block
Only need to change the pointer to R in A
All records are packed at the end of the block
Available free space is between the header and the records
Why do we want the free space to be contiguous? If the free space is contiguous, it can be used most efficiently when we insert additional records into the block.

36 Insertions
Insert a new record R at the end of the free space, and add to the array A a pointer to R
The address of R is determined when space is allocated to R
[diagram: the block with header (fixed part + array A), free space, and records packed at the end]

37 Deletions
To delete a record R, put a null in the entry of A for R – why do we need to do that?
Move records toward the end to fill gaps and update their entries in A
We have to put a null in the entry of A for R, because we do not know where in the database there are pointers to R, so we need to leave an indication that R was deleted, in case some process follows those pointers.
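The insertion and deletion rules above can be sketched as a toy slotted block (my own simplified model; it keeps only the array A and ignores free-space bookkeeping and in-block compaction):

```python
class Block:
    """Toy model of the half & half scheme: array A in the header maps
    slot numbers to records, so a record's address is (block, slot)."""
    def __init__(self):
        self.A = []                # slot -> record, or None if deleted

    def insert(self, record):
        self.A.append(record)      # record goes at the end of free space
        return len(self.A) - 1     # slot number = stable in-block address

    def delete(self, slot):
        # keep a null: pointers elsewhere may still refer to (block, slot)
        self.A[slot] = None

    def lookup(self, slot):
        return self.A[slot]

b = Block()
s5 = b.insert("R5")
s6 = b.insert("R6")
b.delete(s5)
print(b.lookup(s5))   # None: the slot records that R5 was deleted
print(b.lookup(s6))   # R6: its address (b, 1) is unchanged
```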

38 Updates
Can be done in-place, except when:
The record grows in size
We may have to move the record or parts of it to another block if there is not enough space
We update a field that is used to keep the file in sorted order
We may have to move the record to another block, as dictated by the sorted order
This case is really like a deletion followed by an insertion

39 Types of Files

40 Arranging a File on Disk
Try to allocate a contiguous portion of the disk to the file
It is a good idea to chain the file’s blocks in both directions
In a heap, records are packed into blocks in no particular order
In a sorted file (also called sequential file), records are inserted in sorted order according to some field(s)
Why the name “sequential file”? It is stored sequentially (i.e., in contiguous blocks and cylinders) according to the sorted order, and therefore it can be read quickly in sorted order.

41 Heap
Assume that the file has 1,000,000 blocks
Easy to insert – records can be added either at the end or in any block that has available space
I/O cost of insertion is 2, not 1, because the block must be read and then written back
Suppose there are 100 records for “Levy”
What if we want to read all of them? I/O cost is 1,000,000 blocks (must read all blocks)
How much time will it take if we have the IDs of all those records? In the worst case, each record is in a different block, so the I/O cost is 100

42 Sorted File
Must insert a new record in the location dictated by the order
How much time does it take (the file has N blocks)?
What if each block has some free space – does it help?

43 Sorted File
Must insert a new record in the location dictated by the order
How much time does insertion take (the file has N blocks)?
We assume that binary search can be done (what is needed to make it possible? the file must be stored sequentially, i.e., contiguously, on the disk)
Need to read log N blocks to find the location
On average we have to read and write half of the file’s blocks to make room for the new record (if existing blocks are full)
Hence the I/O cost is N, where N is the number of blocks of the file
To avoid this high cost, use overflow blocks
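The insertion cost just derived can be written out as a formula (a sketch; the function name is mine, and it follows the slide's model of a binary search plus shifting half the file):

```python
import math

def sorted_insert_io(n_blocks):
    """I/O cost of inserting into a full sorted file:
    ~log2(N) reads for the binary search, then on average
    N/2 reads + N/2 writes = N block accesses to shift records."""
    return math.ceil(math.log2(n_blocks)) + n_blocks

print(sorted_insert_io(1_000_000))   # 1000020
```

The shifting term dominates completely, which is why the slides turn to overflow blocks next.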

44 Overflow Blocks
We need to insert 350, 490 and 600, but the block (holding 100, 200, 300, 400, 500) is full
Use an overflow block: the header of the full block points to an overflow block that holds 350, 490 and 600
What is the problem with overflow? If overflow blocks are used, then the file is no longer stored in sequential order. The more overflow blocks there are, the longer it takes to read the whole file according to the sorted order.

45 Interesting Problems
How much free space to leave in each block, track, cylinder?
How often to reorganize the file + overflow?

46 Heap vs. Sorted File
A file with 1,000,000 blocks (each is 4K bytes long) contains 100 records for “Levy” (each has a size of 320 bytes)
We have the IDs of all the records for “Levy” and need to read them
If the file is organized as a heap, then in the worst case the I/O cost is 100 blocks
If the file is sorted on Name, then
The records for “Levy” occupy a minimum of 8 blocks and 9 in the average case, so the I/O cost is 9
In the best case, the system will read starting with the first “Levy” (in sequential order) and will use read-ahead buffering, so all 9 blocks will be read sequentially
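The 8-block minimum can be reproduced with a quick calculation (a sketch; it assumes, as the slide's minimum seems to, that the 100 “Levy” records are contiguous and byte-packed, i.e., records may span block boundaries):

```python
import math

BLOCK_BYTES = 4096
RECORD_BYTES = 320
N_RECORDS = 100

# Sorted file: the 100 records are contiguous, occupying 32,000 bytes
min_blocks = math.ceil(N_RECORDS * RECORD_BYTES / BLOCK_BYTES)
print(min_blocks)    # 8

# Heap worst case: every record sits in a different block
print(N_RECORDS)     # 100
```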

47 Variable-Length Records
Reasons for variable-length records:
Repeating fields, e.g., data about children
Variable format, e.g., a record of a person with data about medical tests
Fields whose size varies, for example the address of a person, or a BLOB (binary large object) such as a video clip
Also, long fixed-length records cause a problem if they cannot be spanned across blocks

48 Handling Variable-Length Records
There are several options for arranging variable-length records in blocks
Read the textbook
Read about how it is done in a specific DBMS you may want to use
You need to understand these things to achieve optimal performance

49 Simple Example
How to store data about students and the courses they take?
Fixed-length records (S#,C#), or
One variable-length record per student (S#,C#*)
For the variable-length option: does the system allocate space in each record for the max number of courses, or does it use truly variable-length records with overflow blocks?
The variable-length option could save space (disk & memory), and all the courses for a given student can be found very efficiently
But how efficient is it to search on C#?

50 Addresses of Records on Disks are Different from Addresses in Main Memory
So, what happens when a block of records is read into main memory?

51 Pointer Swizzling
[diagram: block 1 has been read into memory; on disk, record B in block 1 points to record A in block 2]
Block 1 was read into memory and record B continues to point to record A on the disk

52 Now We Also Read Block 2
[diagram: both blocks are in memory; the pointer in record B now refers to the in-memory copy of record A]
When reading block 2 into memory, we need to change (swizzle) the pointer to A in record B

53 A Table Translates DB Addresses to Memory Addresses
This table is just for the DB addresses that are currently in memory
One entry per record or per block? One entry per block is enough
This table is different from the one that translates logical addresses to physical ones (Slide 31)

54 Several Approaches to Swizzling
Automatic swizzling
When reading a block into memory, the pointers in that block are swizzled if they are in the table
Is this enough? No, because some of the pointers in that block may point to blocks that will be read into memory later (so those pointers will have to be changed later)
Swizzling on demand (lazy approach)
No swizzling (i.e., use the table all the time)
Each pointer carries a bit indicating whether it is a DB address or a memory address
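The "no swizzling" option can be sketched as follows (a toy model; the dictionary-based "disk", the translation table, and all names are mine, not from any real DBMS):

```python
disk = {1: ["recA", "recB"], 2: ["recC"]}  # pretend disk: block -> records
translation = {}                           # DB block address -> memory copy

def deref(db_addr):
    """Follow a DB pointer (block, offset), faulting the block in if needed.
    With no swizzling, every dereference consults the translation table."""
    block_no, offset = db_addr
    if block_no not in translation:
        translation[block_no] = list(disk[block_no])   # "read" from disk
    return translation[block_no][offset]

print(deref((2, 0)))   # recC (block 2 is brought into memory on first use)
print(deref((1, 1)))   # recB
```

Swizzled pointers avoid this per-dereference table lookup, at the price of the unswizzling work described on the next slide.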

55 Unswizzling
At some point, a block B is removed from memory to make room for another block
If B was changed (while in memory), then first it has to be written to disk
Need to unswizzle the pointers in the block
Must also update the table, and unswizzle pointers in memory that are pointing to B
This requires a list of all the pointers in memory that point to B

