Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 121 Chapter 12: Representing Data Elements (Slides by Hector Garcia-Molina,

Similar presentations


Presentation on theme: "Chapter 121 Chapter 12: Representing Data Elements (Slides by Hector Garcia-Molina,"— Presentation transcript:

1 Chapter 121 Chapter 12: Representing Data Elements (Slides by Hector Garcia-Molina, http://www-db.stanford.edu/~hector/cs245/notes.htm)

2 Chapter 122 How to lay out data on disk Sec. 12.3 and 15.7 (self-study): How to move data to memory Topics for today

3 Chapter 123 Issues: FlexibilitySpace Utilization ComplexityPerformance

4 Chapter 124 To evaluate a given strategy, compute following parameters: -> space used for expected data -> expected time to - fetch record given key - fetch record with next key - insert record - append record - delete record - update record - read all file - reorganize file

5 Chapter 125 What are the data items we want to store? a salary a name a date a picture What we have available: Bytes 8 bits

6 Chapter 126 To represent: Integer (short): 2 bytes e.g., 35 is 0000000000100011 Real, floating point n bits for mantissa, m for exponent….

7 Chapter 127 Characters  various coding schemes suggested, most popular is ascii To represent: Example: A: 1000001 a: 1100001 5: 0110101 LF: 0001010

8 Chapter 128 Boolean e.g., TRUE FALSE 1111 0000 To represent: Application specific e.g., RED  1 GREEN  3 BLUE  2 YELLOW  4 … Can we use less than 1 byte/code? Yes, but only if desperate...

9 Chapter 129 Dates e.g.: - Integer, # days since Jan 1, 1900 - 8 characters, YYYYMMDD - 7 characters, YYYYDDD (not YYMMDD! Why?) Time e.g. - Integer, seconds since midnight - characters, HHMMSSFF To represent:

10 Chapter 1210 String of characters –Null terminated e.g., –Length given e.g., - Fixed length ctacta 3 To represent:

11 Chapter 1211 Bag of bits LengthBits To represent:

12 Chapter 1212 Key Point Fixed length items Variable length items - usually length given at beginning

13 Chapter 1213 Type of an item: Tells us how to interpret (plus size if fixed) Also

14 Chapter 1214 Data Items Records Blocks Files Memory Overview

15 Chapter 1215 Record - Collection of related data items (called FIELDS) E.g.: Employee record: name field, salary field, date-of-hire field,...

16 Chapter 1216 Types of records: Main choices: –FIXED vs VARIABLE FORMAT –FIXED vs VARIABLE LENGTH

17 Chapter 1217 A SCHEMA (not record) contains following information - # fields - type of each field - order in record - meaning of each field Fixed format

18 Chapter 1218 Example: fixed format and length Employee record (1) E#, 2 byte integer (2) E.name, 10 char.Schema (3) Dept, 2 byte code 55 s m i t h 02 83 j o n e s 01 Records

19 Chapter 1219 Record itself contains format “Self Describing” Variable format

20 Chapter 1220 Example: variable format and length 4I524SDROF46 Field name codes could also be strings, i.e. TAGS # Fields Code identifying field as E# Integer type Code for Ename String type Length of str.

21 Chapter 1221 Variable format useful for: “sparse” records repeating fields evolving formats But may waste space...

22 Chapter 1222 EXAMPLE: var format record with repeating fields Employee can have children 3E_name: FredChild: SallyChild: Tom

23 Chapter 1223 Note: Repeating fields does not imply - variable format, nor - variable size JohnSailingChess-- Key is to allocate maximum number of repeating fields (if not used  null)

24 Chapter 1224 Many variants between fixed - variable format: Ex. #1: Include record type in record recordtype record length tells me what to expect (i.e. points to schema) 527....

25 Chapter 1225 Record header - data at beginning that describes record May contain: - record type - record length - time stamp - other stuff...

26 Chapter 1226 Ex #2 of variant between FIXED/VAR format Hybrid format –one part is fixed, other variable E.g.: All employees have E#, name, dept other fields vary. 25SmithToy2retiredHobby:chess # of var fields

27 Chapter 1227 Other interesting issues: Compression –within record - e.g. code selection –collection of records - e.g. find common patterns Encryption

28 Chapter 1228 Next: placing records into blocks blocks... a file assume fixed length blocks assume a single file (for now)

29 Chapter 1229 (1) separating records (2) spanned vs. unspanned (3) mixed record types – clustering (4) split records Options for storing records in blocks:

30 Chapter 1230 Block (a) no need to separate - fixed size recs. (b) special marker (c) give record lengths (or offsets) - within each record - in block header (1) Separating records R2R1R3

31 Chapter 1231 Unspanned: records must be within one block block 1 block 2... Spanned block 1 block 2... (2) Spanned vs. Unspanned R1R2 R1 R3R4R5 R2 R3 (a) R3 (b) R6R5R4 R7 (a)

32 Chapter 1232 need indication of partial recordof continuation “pointer” to rest(+ from where?) R1R2 R3 (a) R3 (b) R6R5R4 R7 (a) With spanned records:

33 Chapter 1233 Unspanned is much simpler, but may waste space… Spanned essential if record size > block size Spanned vs. unspanned:

34 Chapter 1234 Example 10 6 records each of size 2,050 bytes (fixed) block size = 4096 bytes block 1 block 2 2050 bytes wasted 2046 2050 bytes wasted 2046 R1R2 Total wasted = 2 x 10 9 Utiliz = 50% Total space = 4 x 10 9

35 Chapter 1235 Mixed - records of different types (e.g. EMPLOYEE, DEPT) allowed in same block e.g., a block (3) Mixed record types EMP e1 DEPT d1 DEPT d2

36 Chapter 1236 Why do we want to mix? Answer: CLUSTERING Records that are frequently accessed together should be in the same block

37 Chapter 1237 Compromise: No mixing, but keep related records in same cylinder...

38 Chapter 1238 Example Q1: select A#, C_NAME, C_CITY, … from DEPOSIT, CUSTOMER where DEPOSIT.C_NAME = CUSTOMER.C.NAME a block CUSTOMER,NAME=SMITH DEPOSIT,NAME=SMITH

39 Chapter 1239 If Q1 frequent, clustering good But if Q2 frequent Q2: SELECT * FROM CUSTOMER CLUSTERING IS COUNTER PRODUCTIVE

40 Chapter 1240 Fixed part in one block Typically for hybrid format Variable part in another block (4) Split records

41 Chapter 1241 Block with fixed recs. R1 (a) R1 (b) Block with variable recs. R2 (a) R2 (b) R2 (c) This block also has fixed recs.

42 Chapter 1242 Question What is difference between - Split records - Simply using two different record types?

43 Chapter 1243 (1) Separating records (2) Spanned vs. Unspanned (3) Mixed record types - Clustering (4) Split records Options for storing records in blocks

44 Chapter 1244 (1) Insertion/Deletion (2) Buffer Management (3) Comparison of Schemes Other Topics

45 Chapter 1245 Block Deletion Rx

46 Chapter 1246 Options: (a)Immediately reclaim space (b)Mark deleted –May need chain of deleted records (for re-use) –Need a way to mark: special characters

47 Chapter 1247 As usual, many tradeoffs... How expensive is to move valid record to free space for immediate reclaim? How much space is wasted? –e.g., deleted records, free space chains,...

48 Chapter 1248 Dangling pointers Concern with deletions R1?

49 Chapter 1249 Solution #1: Do not worry

50 Chapter 1250 E.g., Leave “MARK” in map or old location Solution #2: Tombstones Physical IDs A block This spaceThis space can never re-usedbe re-used

51 Chapter 1251 Logical IDs IDLOC 7788 map Never reuse ID 7788 nor space in map... E.g., Leave “MARK” in map or old location Solution #2: Tombstones

52 Chapter 1252 Easy case: records not in sequence  Insert new record at end of file or in deleted slot  If records are variable size, not as easy... Insert

53 Chapter 1253 Hard case: records in sequence  If free space “close by”, not too bad...  Or use overflow idea... Insert

54 Chapter 1254 Interesting problems: How much free space to leave in each block, track, cylinder? How often do I reorganize file + overflow?

55 Chapter 1255 There are 10,000,000 ways to organize my data on disk… Which is right for me? Comparison

56 Chapter 1256 Issues: FlexibilitySpace Utilization ComplexityPerformance

57 Chapter 1257 To evaluate a given strategy, compute following parameters: -> space used for expected data -> expected time to - fetch record given key - fetch record with next key - insert record - append record - delete record - update record - read all file - reorganize file

58 Chapter 1258 Example How would you design Megatron 3000 storage system? (for a relational DB, low end) –Variable length records? –Spanned? –What data types? –Fixed format? –How to handle deletions?

59 Chapter 1259 How to lay out data on disk Data Items Records Blocks Files Memory DBMS Summary

60 Chapter 1260 How to find a record quickly, given a key Next


Download ppt "Chapter 121 Chapter 12: Representing Data Elements (Slides by Hector Garcia-Molina,"

Similar presentations


Ads by Google