Next: Data Items Records Blocks Files Memory CS 4432 lecture #5
Goal : placing records into blocks a file records assume fixed length blocks assume a single file (for now) CS 4432 lecture #5
Options for storing records in blocks: (1) separating records (2) spanned vs. unspanned (3) mixed record types – clustering (4) split records (5) sequencing (6) indirection CS 4432 lecture #5
(1) Separating records Block (a) no need to separate if fixed size records. (b) or, use special marker (c) or, give record lengths (or offsets) - within each record - in block header R1 R2 R3 CS 4432 lecture #5
(2) Spanned vs. Unspanned Unspanned: records within one block block 1 block 2 ... Spanned : records wrap across 2 blocks block 1 block 2 ... R1 R2 R3 R4 R5 R1 R2 R3 (a) R3 (b) R4 R5 R6 R7 (a) CS 4432 lecture #5
Unspanned is much simpler, but may waste space… Spanned vs. unspanned: Unspanned is much simpler, but may waste space… Spanned essential if record size > block size CS 4432 lecture #5
Example 106 records each of size 2,050 bytes (fixed) block size = 4096 bytes block 1 block 2 2050 bytes wasted 2046 2050 bytes wasted 2046 R1 R2 Utiliz = 50% -> ½ of space is wasted CS 4432 lecture #5
(3) Mixed versus uniform record types Mixed - records of different types (e.g., EMPLOYEE, DEPT) allowed in same block e.g., a block EMP e1 DEPT d1 DEPT d2 CS 4432 lecture #5
Why do we want to mix? Answer: CLUSTERING Records that are frequently accessed together should be placed into the same block Problems Creates variable length records in block Aim to avoid duplicates (how to cluster?) Insert/deletes are harder CS 4432 lecture #5
Example Clustering Q1: select C_NAME, C_CITY, AMOUNT, … from DEPOSIT, CUSTOMER where DEPOSIT.C_NAME = CUSTOMER.C.NAME a block layout: CUSTOMER,NAME=SMITH DEPOSIT,NAME=SMITH DEPOSIT,NAME=SMITH CUSTOMER,NAME=JONES Question: Good idea or bad idea ? CS 4432 lecture #5
But if instead Q2 frequent with : Q2: SELECT * FROM CUSTOMER If Q1 frequent with join on customer and deposit relations, then clustering good But if instead Q2 frequent with : Q2: SELECT * FROM CUSTOMER then clustering is counter-productive CS 4432 lecture #5
Compromise: No mixing, but keep related records in same cylinder ... CS 4432 lecture #5
Recap: Storing records in blocks (1) Separating records (2) Spanned vs. Unspanned (3) Mixed record types - Clustering (4) Split records (5) Sequencing (6) Indirection CS 4432 lecture #5
Options for storing records in blocks: (1) separating records (2) spanned vs. unspanned (3) mixed record types – clustering (4) split records (5) sequencing (6) indirection CS 4432 lecture #5
(4) Split records Fixed part in one block Typically for hybrid format Variable part in another block CS 4432 lecture #5
R1 (a) R1 (b) R2 (a) R2 (b) R2 (c) Block with fixed recs. Block with variable recs. R1 (a) R1 (b) R2 (a) R2 (b) R2 (c) CS 4432 lecture #5
(5) Sequencing Ordering records in file (and block) by some key value Sequential file ( - sequenced file) Why sequencing ? Typically to make it possible to efficiently read records in order CS 4432 lecture #5
Sequencing Options (a) Next record physically contiguous ... (b) Linked What about INSERT/ DELETE ? Next (R1) R1 R1 Next (R1) CS 4432 lecture #5
Sequencing Options (c) Overflow area Records in sequence header R1 CS 4432 lecture #5
(6) Indirection Addressing How does one refer to records? Problem: Records can be on disk or in (virtual) memory. Need common address, but have different physical locations. Rx Many options: Physical Indirect CS 4432 lecture #5
Purely Physical Addressing Device ID E.g., Record Cylinder # Address = Track # ( ID ) Block # Offset in block Block ID CS 4432 lecture #5
Fully Indirect Addressing Solution: Record ID (Oracle: ROWID) as global address, maintain a map table. Map Table rec ID r address a Rec ID Physical addr. CS 4432 lecture #5
What to do : Options inbetween ? Tradeoff Physical Indirect Flexibility Cost to move records of indirection (for deletions, insertions) (lookup) What to do : Options inbetween ? CS 4432 lecture #5
Ex #1 : Indirection in block Block Header A block: Free space R3 R4 R1 R2 CS 4432 lecture #5
Ex. #2 Use logical block #’s. understood by file system Ex. #2 Use logical block #’s understood by file system instead of direct disk access REC ID File ID Block # Record # or Offset File ID, Physical Block # Block ID File System Map CS 4432 lecture #5
Recap: Storing records in blocks (1) Separating records (2) Spanned vs. Unspanned (3) Mixed record types - Clustering (4) Split records (5) Sequencing (6) Indirection CS 4432 lecture #5