Published by Iris Young. Modified over 9 years ago.
1
CS4432: Database Systems II. Record and Page Formats. Chapter 12
2
Overview: Data Items, Records, Blocks, Files, Memory
3
What are the data items we want to store?
- a salary
- a name
- a picture
What we have available: bytes (8 bits each)
4
To represent: Integer (short): 2 bytes, e.g., 35 is 0000000000100011
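As a small check of the 2-byte layout, the sketch below packs 35 with Python's `struct` module and prints the bit pattern. Big-endian byte order and a signed short are assumptions; the slide does not specify either.

```python
import struct

# Pack 35 as a 2-byte signed short. ">" selects big-endian (an assumption).
encoded = struct.pack(">h", 35)

# Render each byte as 8 bits to compare with the pattern on the slide.
bits = "".join(f"{byte:08b}" for byte in encoded)

# Unpacking with the same format recovers the integer.
decoded = struct.unpack(">h", encoded)[0]

print(len(encoded), bits, decoded)
```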
5
To represent:
- Boolean, e.g., TRUE = 1111, FALSE = 0000
- Enumeration types, e.g., RED = 1, GREEN = 3, BLUE = 2, YELLOW = 4, …
Can we use less than 1 byte per code? Yes, but only if desperate...
6
To represent: Characters
Various coding schemes have been suggested (e.g., ASCII).
Example: A: 1000001, a: 1100001, 5: 0110101, LF: 0001010
7
To represent: String of characters
- Null terminated
- Length given, e.g., the length 3 stored before the characters
- Fixed length
In Oracle, define the string length, e.g., name CHAR(20)
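The three string layouts can be sketched as follows. The function names, the 1-byte length prefix, and the blank padding are illustrative assumptions, not something the slides prescribe.

```python
def null_terminated(s: str) -> bytes:
    """Characters followed by a single zero byte as terminator."""
    return s.encode("ascii") + b"\x00"


def length_prefixed(s: str) -> bytes:
    """One length byte (assumed width), then the characters."""
    data = s.encode("ascii")
    if len(data) > 255:
        raise ValueError("string too long for a 1-byte length prefix")
    return bytes([len(data)]) + data


def fixed_length(s: str, width: int) -> bytes:
    """Characters padded with blanks to exactly `width` bytes."""
    data = s.encode("ascii")
    if len(data) > width:
        raise ValueError("string does not fit in the fixed width")
    return data.ljust(width, b" ")


print(null_terminated("cat"), length_prefixed("cat"), fixed_length("cat", 5))
```

Only the fixed-length form lets a reader find the next item without inspecting the string itself, which is why the item's type must carry the size in that case.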
8
Key Points
- Fixed length items
- Variable length items: usually length given at beginning
- Type of an item: tells us how to interpret it (plus size if fixed)
9
Overview: Data Items, Records, Blocks, Files, Memory
10
Record: a collection of related data items (called FIELDS)
E.g., Employee record: name CHAR(20), salary NUMBER, date-of-hire DATE, ...
11
Types of records. Main choices:
- FIXED vs VARIABLE FORMAT
- FIXED vs VARIABLE LENGTH
12
Fixed format
A SCHEMA contains information such as:
- # fields (attributes)
- type of each field (length)
- order of attributes in record
- meaning of each field (domain)
- constraints (primary key, etc.)
The schema is not associated with each record.
13
Example: fixed format & fixed length
Employee record schema:
(1) E.id, 2 byte integer
(2) E.name, 10 char.
(3) Dept, 2 byte code
We can simply concatenate fields.
Records:
55 | s m i t h | 02
83 | j o n e s | 01
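A minimal sketch of the concatenated layout, assuming big-endian integers and blank-padded names (neither is stated on the slide). Because every record has the same size, record i starts at byte i times the record size, with no separators needed.

```python
import struct

# Schema: E.id (2-byte integer), E.name (10 chars), Dept (2-byte code).
EMPLOYEE = struct.Struct(">H10sH")


def pack_employee(emp_id: int, name: str, dept: int) -> bytes:
    # Pad the name with blanks to 10 bytes (longer names would be truncated).
    return EMPLOYEE.pack(emp_id, name.encode("ascii").ljust(10, b" "), dept)


def read_employee(block: bytes, index: int):
    # Fixed length means the offset is simple arithmetic.
    emp_id, name, dept = EMPLOYEE.unpack_from(block, index * EMPLOYEE.size)
    return emp_id, name.decode("ascii").rstrip(" "), dept


block = pack_employee(55, "smith", 2) + pack_employee(83, "jones", 1)
print(EMPLOYEE.size, read_employee(block, 1))
```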
14
Variable format
What if:
- not all fields are included in the record,
- and/or fields possibly appear in different orders?
Then:
- the record itself must contain its format, i.e., it is “self-describing”
15
Why Variable Format?
- “sparse” records
- repeating fields
- evolving formats
16
Example: variable format and length
Record: 2 | 5 | I | 46 | 4 | S | 4 | F O R D
- 2: # fields
- 5: code identifying field as E#
- I: integer type
- 4: code for Ename
- S: string type
- 4: length of string
Field name codes could also be strings, i.e., TAGS
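A self-describing record of this kind could be encoded roughly as below. The exact widths (1-byte field count, 1-byte field code, 1-byte type letter, 2-byte integers, 1-byte string length) and the sample field codes are assumptions made for illustration.

```python
import struct


def encode_record(fields):
    """fields: list of (field_code, value); value is an int or a str."""
    out = bytearray([len(fields)])            # number of fields
    for code, value in fields:
        if isinstance(value, int):
            out += bytes([code]) + b"I" + struct.pack(">H", value)
        else:
            data = value.encode("ascii")
            out += bytes([code]) + b"S" + bytes([len(data)]) + data
    return bytes(out)


def decode_record(buf):
    """Walk the record using only the format information stored inside it."""
    count, pos, fields = buf[0], 1, []
    for _ in range(count):
        code, kind = buf[pos], buf[pos + 1:pos + 2]
        pos += 2
        if kind == b"I":
            (value,) = struct.unpack_from(">H", buf, pos)
            pos += 2
        elif kind == b"S":
            length = buf[pos]
            value = buf[pos + 1:pos + 1 + length].decode("ascii")
            pos += 1 + length
        else:
            raise ValueError("unknown type code")
        fields.append((code, value))
    return fields


record = encode_record([(5, 46), (4, "FORD")])
print(record, decode_record(record))
```

The decoder never consults a schema, so fields can be omitted or reordered per record, at the cost of the extra code and type bytes.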
17
EXAMPLE: variable format record with repeating fields
E.g., Employee has one or more children
Record: 3 | E_name: Fred | Child: Sally | Child: Tom
Do repeating fields always require variable format and length?
18
Repeating fields with fixed format & length
Example: a person and her hobbies.
Record: Mary | Sailing | Chess | --
Allocate the maximum number of repeating fields; if not used, set to null.
19
Many variants between fixed and variable format:
Example 1: Include record type in record
Record: 5 | 27 | ....
- 5: record type, tells me what to expect (i.e., points to schema)
- 27: record length
20
Record header: data at beginning that describes record
May contain:
- pointer to schema (record type)
- length of record
- time stamp (create time, mod. time)
- other stuff (e.g., ROW-ID in Oracle)
21
Example 2: Variant between FIXED/VAR format
Hybrid format: one part is fixed, the other is variable
E.g., all employees have E#, name, dept; other fields vary.
Record: 25 | Smith | Toy | 2 | retired | Hobby: chess
- 2: # of var fields
22
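Also, many variations in internal organization of record
Just to show one:
Length stored before each field: 3 | 10 | F1 | 5 | F2 | 12 | F3
* * *
Total size and offsets stored up front: 3 | 32 | 5 | 15 | 20 | F1 | F2 | F3
- 32: total size
- 5, 15, 20: offsets of F1, F2, F3
- byte positions: 0 1 2 3 4 5 15 20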
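The offset-based organization can be sketched like this, assuming every header entry (field count, total size, each offset) fits in one byte. With fields of 10, 5, and 12 bytes the header comes out as 3, 32, 5, 15, 20.

```python
def build_record(fields):
    """Header: [#fields, total size, offset of each field], then the field bytes."""
    header_size = 2 + len(fields)
    offsets, pos = [], header_size
    for field in fields:
        offsets.append(pos)
        pos += len(field)
    if pos > 255:
        raise ValueError("record too large for 1-byte header entries")
    return bytes([len(fields), pos, *offsets]) + b"".join(fields)


def get_field(record, i):
    """Jump straight to field i; its end is the next offset or the total size."""
    count, total = record[0], record[1]
    start = record[2 + i]
    end = record[3 + i] if i + 1 < count else total
    return record[start:end]


rec = build_record([b"A" * 10, b"B" * 5, b"C" * 12])
print(list(rec[:5]), get_field(rec, 1))
```

Compared with storing a length before each field, the offsets allow direct access to any field without scanning the earlier ones.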
23
Question: We have seen examples for:
* Fixed format and length records
* Variable format and length records
(a) Does fixed format and variable length make sense?
(b) Does variable format and fixed length make sense?
24
Next: Data Items, Records, Blocks, Files, Memory
25
Goal: placing records into blocks
Records go into blocks, and the blocks make up a file.
- assume fixed length blocks
- assume a single file (for now)
26
Options for storing records in blocks:
(1) separating records
(2) spanned vs. unspanned
(3) mixed record types - clustering
(4) split records
(5) sequencing
(6) indirection
27
(1) Separating records
Block: R1 | R2 | R3
(a) no need to separate if fixed size records
(b) or, use special marker
(c) or, give record lengths (or offsets)
- within each record
- in block header
28
(2) Spanned vs. Unspanned
Unspanned: each record lies within one block
- block 1: R1 | R2
- block 2: R3 | R4 | R5
- ...
Spanned: records wrap across 2 blocks
- block 1: R1 | R2 | R3 (a)
- block 2: R3 (b) | R4 | R5 | R6 | R7 (a)
- ...
29
Spanned vs. unspanned:
- Unspanned is much simpler, but may waste space…
- Spanned is essential if record size > block size
30
Example
10^6 records, each of size 2,050 bytes (fixed)
block size = 4096 bytes
Unspanned, each block holds one record:
- block 1: R1, 2050 bytes used, 2046 bytes wasted
- block 2: R2, 2050 bytes used, 2046 bytes wasted
Utilization = 50% -> ½ of space is wasted
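The arithmetic behind the 50% figure can be checked directly. The spanned estimate below ignores any per-fragment header overhead, which is a simplifying assumption.

```python
BLOCK_SIZE = 4096
RECORD_SIZE = 2050
NUM_RECORDS = 10**6

# Unspanned: only whole records fit, so each block holds 4096 // 2050 = 1 record.
per_block = BLOCK_SIZE // RECORD_SIZE
unspanned_blocks = -(-NUM_RECORDS // per_block)      # ceiling division
unspanned_util = NUM_RECORDS * RECORD_SIZE / (unspanned_blocks * BLOCK_SIZE)

# Spanned: records are packed back to back across block boundaries.
spanned_blocks = -(-(NUM_RECORDS * RECORD_SIZE) // BLOCK_SIZE)

print(per_block, unspanned_blocks, round(unspanned_util, 3), spanned_blocks)
```

Under these assumptions the spanned layout needs roughly half as many blocks, which is the space that the unspanned layout gives up for simplicity.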
31
(3) Mixed versus uniform record types
Mixed: records of different types (e.g., EMPLOYEE, DEPT) allowed in same block
E.g., a block: EMP e1 | DEPT d1 | DEPT d2
32
Why do we want to mix?
Answer: CLUSTERING
Records that are frequently accessed together should be placed into the same block
Problems:
- Creates variable length records in block
- Aim to avoid duplicates (how to cluster?)
- Inserts/deletes are harder
33
Example: Clustering
Q1: select C_NAME, C_CITY, AMOUNT, … from DEPOSIT, CUSTOMER where DEPOSIT.C_NAME = CUSTOMER.C_NAME
A block layout: CUSTOMER,NAME=SMITH | DEPOSIT,NAME=SMITH | CUSTOMER,NAME=JONES
Question: Good idea or bad idea?
34
If Q1 is frequent, with its join on the customer and deposit relations, then clustering is good.
But if instead Q2 is frequent:
Q2: SELECT * FROM CUSTOMER
then clustering is counter-productive, because the CUSTOMER records are spread over more blocks.
35
Compromise: No mixing, but keep related records in same cylinder...
36
So far: Storing records in blocks
(1) Separating records
(2) Spanned vs. Unspanned
(3) Mixed record types - Clustering
(4) Split records
(5) Sequencing
(6) Indirection
37
Options for storing records in blocks:
(1) separating records
(2) spanned vs. unspanned
(3) mixed record types - clustering
(4) split records
(5) sequencing
(6) indirection
38
(4) Split records
Typically for hybrid format:
- Fixed part in one block
- Variable part in another block
39
Block with fixed recs.: R1 (a) | R2 (a)
Block with variable recs.: R1 (b) | R2 (b) | R2 (c)
40
(5) Sequencing
Ordering records in file (and block) by some key value
- Sequential file (sequenced file)
Why sequencing?
- Typically to make it possible to efficiently read records in order
41
Sequencing Options
(a) Next record physically contiguous: R1 | Next (R1) | ...
(b) Linked: R1 holds a pointer to Next (R1)
What about INSERT/DELETE?
42
Sequencing Options
(c) Overflow area
Records in sequence: R1 | R2 | R3 | R4 | R5
Overflow area (with header): R2.1 | R1.3 | R4.7
43
(6) Indirection
How does one refer to records (e.g., record Rx)?
Problem: Records can be on disk or in (virtual) memory. Need common address, but have different physical locations.
Addressing: many options, ranging from Physical to Indirect
44
Purely Physical Addressing
E.g., Record Address (ID) =
- Device ID
- Cylinder #
- Track #
- Block #
- Offset in block
The first four together form the Block ID.
45
Fully Indirect Addressing
Solution: Record ID (Oracle: ROWID) as global address; maintain a map table.
Map Table: Rec ID r -> Physical addr. a
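A map table of this kind can be sketched as a dictionary from record ID to physical address. The class name, the integer IDs, and the (block, offset) address tuples are hypothetical; the point is only that moving a record touches one map entry and leaves every stored record ID valid.

```python
class MapTable:
    """Record ID -> physical address, here a (block_id, offset) tuple."""

    def __init__(self):
        self._addr = {}
        self._next_id = 1

    def insert(self, physical_addr):
        rec_id = self._next_id
        self._next_id += 1
        self._addr[rec_id] = physical_addr
        return rec_id

    def lookup(self, rec_id):
        # The extra lookup is the cost of indirection.
        return self._addr[rec_id]

    def move(self, rec_id, new_addr):
        # Only the map changes; holders of rec_id are unaffected.
        if rec_id not in self._addr:
            raise KeyError(rec_id)
        self._addr[rec_id] = new_addr


table = MapTable()
rid = table.insert((7, 128))
table.move(rid, (9, 0))
print(rid, table.lookup(rid))
```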
46
Tradeoff
- Flexibility to move records (for deletions, insertions)
- Cost of indirection (lookup)
What to do: options in between Physical and Indirect?
47
Ex. #1: Indirection in block
A block: Header | Free space | R3 | R4 | R1 | R2
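One common way to realize indirection inside a block is a record directory in the block header, so that outside pointers name a slot rather than a byte offset. The sketch below is a simplified assumption of such a layout: the directory is kept as a Python list rather than serialized into the block, and the header's own size is ignored.

```python
class Block:
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []          # record directory: slot -> (offset, length) or None
        self.free_end = size     # records are placed from the end of the block

    def insert(self, record: bytes) -> int:
        start = self.free_end - len(record)
        if start < 0:
            raise ValueError("block full")
        self.data[start:self.free_end] = record
        self.free_end = start
        self.slots.append((start, len(record)))
        return len(self.slots) - 1            # slot number is the stable address

    def read(self, slot: int) -> bytes:
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(slot)
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        self.slots[slot] = None

    def compact(self) -> None:
        """Slide live records together; slot numbers stay the same."""
        live = [(s, self.read(s)) for s, e in enumerate(self.slots) if e is not None]
        self.free_end = len(self.data)
        for slot, rec in live:
            self.free_end -= len(rec)
            self.data[self.free_end:self.free_end + len(rec)] = rec
            self.slots[slot] = (self.free_end, len(rec))
```

Records can move within the block during compaction, yet a reference of the form (block, slot) still resolves, which is the middle ground between purely physical and fully indirect addressing.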
48
Block header: data at beginning that describes block
May contain:
- File ID (or RELATION or DB ID)
- This block ID
- Record directory
- Pointer to free space
- Type of block (e.g., contains recs type 4; is overflow, …)
- Pointer to other blocks “like it”
- Timestamp...
49
Ex. #2: Use logical block #’s understood by file system instead of direct disk access
REC ID = File ID | Block # | Record # or Offset
File System Map: (File ID, Block #) -> Physical Block ID
50
Recap: Storing records in blocks
(1) Separating records
(2) Spanned vs. Unspanned
(3) Mixed record types - Clustering
(4) Split records
(5) Sequencing
(6) Indirection
51
Other Topics in Chapter 12
(1) Insertion/Deletion
(2) Buffer Management
(3) Comparison of Schemes
52
Deletion
Block: record Rx within the block is deleted
53
Options:
(a) Delete and immediately reclaim space
(b) Mark deleted
- May need chain of deleted records (for re-use)
- Need a way to mark: special characters, delete field, or in map
54
As usual, many tradeoffs...
- How expensive is it to move a valid record to free space for immediate reclaim?
- How much space is wasted? E.g., deleted records, delete fields, free space chains, ...
55
Concern with deletions: Dangling pointers
A pointer to deleted record R1 now points to ?
Note: If pointers point to physical locations (rather than ROWIDs), storing new data in deleted block corrupts data.
56
Solution #1: Do not worry
57
Solution #2: Tombstones
E.g., Leave “MARK” in map or old location
Physical IDs, a block:
- the space holding the tombstone is never re-used
- the rest of the deleted record's space can be re-used
58
Solution #2: Tombstones
E.g., Leave “MARK” in map or old location
Logical IDs: the map (ID -> LOC) keeps a tombstone in the entry for ID 7788
Never reuse ID 7788 nor space in map...
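For logical IDs, the tombstone idea can be sketched as a map whose deleted entries are kept and marked instead of removed. The class and sentinel names are hypothetical, and IDs are assumed to come from a counter that only increases.

```python
TOMBSTONE = object()   # marker left in the map entry of a deleted record


class RecordMap:
    def __init__(self):
        self._loc = {}
        self._next_id = 1      # never decreases, so IDs are never reused

    def insert(self, loc):
        rec_id = self._next_id
        self._next_id += 1
        self._loc[rec_id] = loc
        return rec_id

    def delete(self, rec_id):
        if self._loc.get(rec_id, TOMBSTONE) is TOMBSTONE:
            raise KeyError(rec_id)
        # Keep the entry: old pointers now find the tombstone, not a new record.
        self._loc[rec_id] = TOMBSTONE

    def lookup(self, rec_id):
        loc = self._loc[rec_id]
        return None if loc is TOMBSTONE else loc
```

The data space of the deleted record can be handed to a new record, because that record receives a fresh ID; the cost is that the map entry itself is occupied forever.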
59
Solution #3 (?):
- Place record ID within every record
- When you follow a pointer, check if it leads to correct record
E.g., a pointer to 3-77 should lead to a record containing rec-id: 3-77
Does this work??? If space reused, won’t new record have same ID?
60
Insert
Easy case: records fixed length/not in sequence
- Insert new record at end of file
- or, in deleted slot
A little harder: if records are variable size, not as easy
- may not be able to reuse space: fragmentation
Hard case: records in sequence
- If free space “close by”, not too bad...
- Or, use overflow idea...
- Or worst case, reorganize file...
61
Free space
Interesting problems:
- How much free space to leave in each block, track, cylinder?
- How often do I reorganize file + overflow?
62
Buffer Management (read the textbook!)
- DB features needed
- Why LRU may be bad
- Pinned blocks
- Forced output
- Double buffering
- Swizzling
63
Pointer Swizzling
Issue: If records (objects) contain pointers to other objects, translate locations when loading objects into memory.
[Figure: on disk, block 1 (containing Rec A) and block 2; in memory, a copy of block 1 containing Rec A]
64
One Option: Translation Table
DB Addr -> Mem Addr, e.g., Rec-A -> Rec-A-inMem
Solution: Insert fields that represent pointers into map table. Translate pointers as needed.
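A translation table can be sketched as a dictionary from database address to the in-memory copy of the record. Everything here is an illustrative assumption: database addresses are strings, records are dicts, and a pointer field is swizzled the first time it is followed.

```python
class TranslationTable:
    def __init__(self, disk):
        self.disk = disk           # db_addr -> record as stored on "disk"
        self.in_memory = {}        # db_addr -> loaded in-memory record

    def load(self, db_addr):
        """Bring a record into memory once and remember where it lives."""
        if db_addr not in self.in_memory:
            self.in_memory[db_addr] = dict(self.disk[db_addr])
        return self.in_memory[db_addr]

    def follow(self, record, field):
        """Follow a pointer field, replacing a DB address with a memory reference."""
        ref = record[field]
        if isinstance(ref, dict):      # already swizzled
            return ref
        target = self.load(ref)
        record[field] = target         # swizzle: store the direct reference
        return target


disk = {"A": {"name": "Rec-A", "next": "B"}, "B": {"name": "Rec-B", "next": None}}
table = TranslationTable(disk)
rec_a = table.load("A")
rec_b = table.follow(rec_a, "next")
print(rec_b["name"], rec_a["next"] is rec_b)
```

Before such a record is written back, the memory reference would have to be turned back into its database address, which this sketch does not do.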
65
Another Option:
In-memory pointers need a “type” bit that says whether the pointer is to disk or to memory (M)
66
Swizzling Options
- Automatic
- On-demand
- No swizzling / program control
Swizzling Issues
- Must ‘unswizzle’
- Updating/writing of records
67
Comparison
There are 1,000,001 ways to organize my data on disk… Which is right for me?
68
Issues:
- Flexibility
- Space Utilization
- Complexity
- Performance
69
To evaluate a given strategy, compute the following parameters:
-> space used for expected data, on average
-> expected time to:
- fetch record given key
- fetch record with next key
- insert/delete/update record
- read complete file
- reorganize file (maybe sort)
-> usage patterns / workload:
- how many/which user queries/updates
70
NEXT: Chapter 13 in book
How to find a record quickly, given a key