Published by Reynard Scot Rodgers. Modified over 9 years ago.
1
Chapter 3: Representing Data Elements
1. How to lay out data on disk
2. How to move it to memory
2
Data Elements: Attributes, Records, Blocks, Files, Memory
3
Representing Relational Database Elements CREATE TABLE MovieStar( name CHAR(30) PRIMARY KEY, salary INTEGER, address VARCHAR(255), gender CHAR(1), birthdate DATE );
4
Representing Objects
interface Star {
  attribute string name;
  attribute Struct Addr {string street, string city} address;
  relationship Set<Movie> starredIn inverse Movie::stars;
};
5
Some Differences
- Objects can have methods.
- Objects have an object identifier (OID).
- Objects can have relationships to other objects.
So we need a way to represent addresses (pointers), because objects have OIDs and may have relationships to other objects. We also need the ability to represent arbitrarily long lists of objects, such as the list of movies for a given star.
6
What are the data attributes we want to store? A salary, a name, a date, a picture.
What we have available: bytes (8 bits each).
7
To represent:
Integer: 2 bytes (short), 4 bytes (long), e.g., 35 is 0000000000100011 as a 2-byte integer
Float: 4 bytes; Double Float: 8 bytes
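As a rough illustration of the sizes above, here is a minimal Python sketch using the standard struct module. The big-endian byte order and the helper name bits16 are assumptions made for this example; the slide does not fix a byte order.

```python
import struct

def bits16(n):
    """Return the 2-byte (big-endian) encoding of n as a string of 0s and 1s."""
    raw = struct.pack(">h", n)  # 2-byte signed integer
    return "".join(f"{byte:08b}" for byte in raw)

# Byte counts of the fixed-width encodings named on the slide.
SIZES = {
    "short": struct.calcsize(">h"),
    "long": struct.calcsize(">i"),
    "float": struct.calcsize(">f"),
    "double": struct.calcsize(">d"),
}
```

Calling bits16(35) reproduces the 16-bit pattern shown on the slide.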
8
Characters
To represent: various coding schemes have been suggested; the most popular is ASCII.
Example: A: 1000001, a: 1100001, 5: 0110101, LF: 0001010
9
Fixed-Length Character Strings
To represent: a CHAR(5) attribute A with value "cat" is stored as c, a, t followed by two pad characters.
The "pad" character is one whose 8-bit code is not one of the legal characters for SQL strings.
10
Variable-Length Character Strings
To represent:
- Null terminated, e.g., c a t followed by a null character
- Length given, e.g., 3 c a t
The SQL type VARCHAR(n) is treated here as a field of fixed length, although its value has a length that varies: n+1 bytes are dedicated to the value of the string regardless of how long it is.
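The string representations on this slide and the previous one can be sketched in a few lines of Python. This is only an illustration: the pad byte (0x00), the one-byte length prefix, and all function names are assumptions, not something the slides prescribe.

```python
PAD = b"\x00"  # assumed pad byte; the slides only require a code that is not a legal SQL character

def encode_char(value, n):
    """CHAR(n): always n bytes, padded on the right."""
    data = value.encode("ascii")
    if len(data) > n:
        raise ValueError("value too long for CHAR(n)")
    return data + PAD * (n - len(data))

def encode_null_terminated(value):
    """Variable length: the characters followed by a null byte."""
    return value.encode("ascii") + b"\x00"

def encode_length_prefixed(value):
    """Variable length: one length byte followed by the characters."""
    data = value.encode("ascii")
    if len(data) > 255:
        raise ValueError("length does not fit in one byte")
    return bytes([len(data)]) + data

def encode_varchar_fixed(value, n):
    """VARCHAR(n) stored in n+1 bytes: a length byte plus an n-byte area."""
    if n > 255:
        raise ValueError("this sketch uses a one-byte length")
    return encode_length_prefixed(value) + PAD * (n - len(value))
```

With this layout, every VARCHAR(255) value occupies 256 bytes no matter how short the string is, which matches the n+1 rule stated above.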
11
To represent dates and times (SQL2 standard):
Date: YYYY-MM-DD, e.g., 1958-05-14
Time: HH:MM:SS, e.g., 20:19:02
Time including fractions of a second: '20:19:02.25'
12
Bag of bits
To represent: Length | Bits
A sequence of bits is packed eight to a byte.
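A minimal sketch of the length-plus-bits idea follows. Padding the final byte with zero bits and the function names are assumptions for this example.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters eight to a byte; return (length, bytes)."""
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only 0 and 1")
    padded = bits + "0" * (-len(bits) % 8)  # fill the last byte with zeros
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return len(bits), data

def unpack_bits(length, data):
    """Recover the original bit string; the stored length tells us where to stop."""
    return "".join(f"{byte:08b}" for byte in data)[:length]
```

The stored length is what lets the reader distinguish real trailing zeros from padding.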
13
Boolean e.g., TRUE FALSE 1111 0000 To represent: 1000 0000 0000
14
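Enumerated Types
To represent: e.g., RED 1, GREEN 3, BLUE 2, YELLOW 4, …
Can we use less than 1 byte per code? Yes, but only with fewer than 256 values: k values need about log2(k) bits each, rounded up, so a code shorter than 8 bits requires at most 128 values.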
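To make the question about codes shorter than a byte concrete, here is a small sketch that computes the bits needed per code and packs several codes into one integer. Storing each code zero-based and the function names are assumptions for the example.

```python
CODES = {"RED": 1, "BLUE": 2, "GREEN": 3, "YELLOW": 4}  # codes from the slide

def bits_needed(num_values):
    """Smallest number of bits that can distinguish num_values codes."""
    return max(1, (num_values - 1).bit_length())

def pack_colors(names):
    """Pack colors into one integer, using bits_needed(len(CODES)) bits each."""
    width = bits_needed(len(CODES))
    packed = 0
    for name in names:
        packed = (packed << width) | (CODES[name] - 1)  # store a zero-based code
    return packed
```

With four values, each code takes 2 bits, so four colors fit in a single byte, at the price of shifting and masking on every access.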
15
Representing Records Fixed-Length Records Variable-Length Records Record Addresses Record Modification
16
Record - Collection of related data items (called FIELDS) E.g.: Employee record: name field, salary field, date-of-hire field,...
17
Types of records: Main choices: –FIXED vs VARIABLE FORMAT –FIXED vs VARIABLE LENGTH
18
Fixed-Length Records
A MovieStar record: name at offset 0, address at 30, gender at 286, birthdate at 287; the record ends at byte 297.
The layout of MovieStar tuples when fields are required to start at a multiple of 4 bytes: name at 0, address at 32, gender at 288, birthdate at 292; the record ends at byte 304.
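The two sets of offsets can be reproduced with a short calculation. In this sketch the field sizes (30, 256, 1, and 10 bytes) are inferred from the offsets on the slide, and the function name layout is an assumption.

```python
def layout(fields, alignment=1):
    """Return ({field: offset}, total_length) for fields laid out in order.

    Each field starts at the next multiple of `alignment`, and the record
    length is rounded up to a multiple of `alignment` as well.
    """
    offsets = {}
    position = 0
    for name, size in fields:
        position = -(-position // alignment) * alignment  # round up
        offsets[name] = position
        position += size
    total = -(-position // alignment) * alignment
    return offsets, total

# Field sizes in bytes, inferred from the offsets shown on the slide.
MOVIESTAR = [("name", 30), ("address", 256), ("gender", 1), ("birthdate", 10)]
```

Calling layout(MOVIESTAR) gives the packed layout, and layout(MOVIESTAR, 4) gives the 4-byte-aligned one.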
19
Fixed format
A SCHEMA (not the record) contains the following information:
- number of fields
- type of each field
- order in record
- meaning of each field
20
Example: fixed format and length
Employee record schema:
(1) E#, 2-byte integer
(2) E.name, 10 char.
(3) Dept, 2-byte code
Records:
55 | s m i t h | 02
83 | j o n e s | 01
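A fixed-format, fixed-length record like this maps directly onto a struct format string. The sketch below assumes big-endian integers and space padding for the name; both choices, and the function names, are illustrative only.

```python
import struct

# E# (2-byte integer), E.name (10 characters), Dept (2-byte code): 14 bytes total.
EMPLOYEE = struct.Struct(">H10sH")

def pack_employee(number, name, dept):
    """Encode one employee record; the name is padded with spaces to 10 bytes."""
    return EMPLOYEE.pack(number, name.encode("ascii").ljust(10), dept)

def unpack_employee(record):
    """Decode a 14-byte employee record back into (number, name, dept)."""
    number, name, dept = EMPLOYEE.unpack(record)
    return number, name.decode("ascii").rstrip(), dept
```

Because the schema lives outside the record, the bytes carry no field names or types; the reader must already know the format string.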
21
Adding header information to records representing tuples of the MovieStar relation
Layout: header at bytes 0 to 12 (pointer to schema, length, timestamp), name at 12, address at 44, gender at 300, birthdate at 304; the record ends at byte 316.
The schema records:
- the attributes of the relation
- their types
- the order in which attributes appear in the tuple
- constraints on the attributes and the relation itself
22
Packing Fixed-Length Records into Blocks
A typical block holding records: Header | Record 1 | Record 2 | … | Record n
The header may contain:
- Links to one or more other blocks that are part of a network of blocks.
- Information about the role played by this block in such a network.
- Information about which relation the tuples of this block belong to.
- A "directory" giving the offset of each record in the block.
- A "block ID".
- Timestamp(s) indicating the time of the block's last modification and/or access.
23
Block and Record Addresses
Client: application address space (memory addresses)
Server: database address space (logical addresses, physical addresses)
24
Indirection
How does one refer to records (e.g., record Rx)?
Many options: Physical … Indirect
25
Purely Physical
E.g., Record Address or ID = Device ID, Cylinder #, Track #, Block # (together the Block ID), plus Offset in block
26
Physical Addresses: (1) Host Name (2) Device ID (3) Cylinder # (4) Track # (5) Block # (6) Offset
Logical Addresses: byte strings of some fixed length
A map table, with columns for logical address and physical address, translates logical addresses to physical addresses.
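The map table can be sketched as a dictionary from logical to physical addresses. The class and method names here are hypothetical; the six physical-address components follow the list on the slide.

```python
from typing import NamedTuple

class PhysicalAddress(NamedTuple):
    host: str
    device: int
    cylinder: int
    track: int
    block: int
    offset: int

class MapTable:
    """Translates fixed-length logical addresses (byte strings) to physical ones."""

    def __init__(self):
        self._entries = {}

    def bind(self, logical, physical):
        self._entries[logical] = physical

    def resolve(self, logical):
        return self._entries[logical]

    def move(self, logical, new_physical):
        """Relocate a record: only the map entry changes, not the pointers to it."""
        if logical not in self._entries:
            raise KeyError(logical)
        self._entries[logical] = new_physical
```

Every pointer that stores the logical address stays valid after move, which is the flexibility that indirection buys at the cost of an extra lookup.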
27
Fully Indirect
E.g., the Record ID is an arbitrary bit string. A map with columns Rec ID and Physical addr. takes rec ID r to address a.
28
Tradeoff
Flexibility to move records (for deletions, insertions) vs. cost of indirection
29
Physical Indirect Many options in between …
30
Logical and Structured Addresses A physical address of the block + the key value for the record being referred to. A physical address of the block + the offset of the entry in the block’s offset table for the record being referred to.
31
Ex #1: Indirection in block
A block: Header | Free space | R3 | R4 | R1 | R2
32
Block header - data at beginning that describes block May contain: - File ID (or RELATION or DB ID) - This block ID - Record directory - Pointer to free space - Type of block (e.g. contains recs type 4; is overflow, … ) - Pointer to other blocks “ like it ” - Timestamp...
33
A block with a table of offsets telling us the position of each record within the block
Layout: header | offset table | unused | record 4 | record 3 | record 2 | record 1
This organization allows:
1. Moving the record around within the block by changing the record's entry in the offset table.
2. Moving the record to another block by holding a "forwarding address" in its offset-table entry.
3. Deleting the record by leaving a tombstone in its offset-table entry.
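The three operations on an offset table can be sketched as follows. This is a simplified model, not a byte-level block format: the dictionary standing in for the block's storage, the tuple used as a forwarding address, and all names are assumptions.

```python
TOMBSTONE = "tombstone"  # marker left in an offset-table entry after deletion

class Block:
    """A block whose offset table maps slot numbers to record positions."""

    def __init__(self):
        self.slots = []  # offset table: position, forwarding address, or TOMBSTONE
        self.data = {}   # position in block -> record bytes (stand-in for raw storage)

    def insert(self, position, record):
        self.slots.append(position)
        self.data[position] = record
        return len(self.slots) - 1  # the slot number is the stable part of the address

    def move_within(self, slot, new_position):
        """1. Move inside the block: only the offset-table entry changes."""
        self.data[new_position] = self.data.pop(self.slots[slot])
        self.slots[slot] = new_position

    def forward(self, slot, block_id, other_slot):
        """2. Move to another block: leave a forwarding address behind."""
        record = self.data.pop(self.slots[slot])
        self.slots[slot] = ("forward", block_id, other_slot)
        return record

    def delete(self, slot):
        """3. Delete: leave a tombstone in the offset-table entry."""
        del self.data[self.slots[slot]]
        self.slots[slot] = TOMBSTONE

    def read(self, slot):
        entry = self.slots[slot]
        if entry == TOMBSTONE:
            return None
        if isinstance(entry, tuple):
            return entry  # caller follows the forwarding address
        return self.data[entry]
```

External pointers refer to (block, slot), so none of the three operations invalidates them.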
34
Database Address and Memory Address
The translation table has two columns: DBaddr, the database address (logical address or physical address), and mem-addr, the memory address. It turns database addresses into their equivalents in memory.
35
Pointer Swizzling
[Figure: Block 1 is read from disk into memory and Block 2 stays on disk; the figure labels one pointer as swizzled and another as unswizzled.]
36
Strategies to Swizzle Pointers Automatic Swizzling Swizzling on Demand No Swizzling Programmer Control of Swizzling
37
Returning Blocks to Disk
When a block is moved from memory back to disk, any pointers within the block must be "unswizzled".
To swizzle: SELECT memaddr FROM translationtable WHERE dbaddr = x; (create an index on dbaddr)
To unswizzle: SELECT dbaddr FROM translationtable WHERE memaddr = y; (create an index on memaddr)
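The two lookups correspond to keeping the translation table indexed in both directions. In this sketch a pointer is a tagged pair, ("db", address) or ("mem", address); that tagging, the class name, and the method names are assumptions for illustration.

```python
class TranslationTable:
    """Translation table with an index on dbaddr and an index on memaddr."""

    def __init__(self):
        self._by_db = {}   # dbaddr  -> memaddr
        self._by_mem = {}  # memaddr -> dbaddr

    def add(self, dbaddr, memaddr):
        self._by_db[dbaddr] = memaddr
        self._by_mem[memaddr] = dbaddr

    def swizzle(self, pointer):
        """Replace a database address by a memory address if the target is in memory."""
        kind, addr = pointer
        if kind == "db" and addr in self._by_db:
            return ("mem", self._by_db[addr])
        return pointer  # target not in memory: leave the pointer unswizzled

    def unswizzle(self, pointer):
        """Replace a memory address by its database address before writing to disk."""
        kind, addr = pointer
        if kind == "mem":
            return ("db", self._by_mem[addr])
        return pointer
```

Without the second index, unswizzling would require scanning the whole table for each pointer.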
38
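A block in memory is said to be pinned if it is referred to by a swizzled pointer from somewhere else. Before a pinned block can be written back to disk, we must either unpin it, by unswizzling the pointers that refer to it, or let the block remain in memory.
[Figure: a translation table entry maps database address x to memory address y; swizzled pointers hold y.]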
39
Variable-Length Records Data items whose size varies Repeating fields Variable-format records Enormous fields
40
Records With Variable-Length Fields
Layout: other header information | record length | pointer to address | gender | birthdate | name | address
The first variable-length field (name) needs no pointer, because it starts right after the fixed-length fields.
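A sketch of this layout: a small header holds the record length and a pointer to address, the fixed-length fields follow, and name starts at a position that can be computed without a pointer. The 2-byte header fields, the field order, and the sample values are assumptions for the example.

```python
import struct

HEADER = struct.Struct(">HH")    # record length, offset of the address field
FIXED = struct.Struct(">1s10s")  # gender (1 byte), birthdate (10 bytes)
NAME_START = HEADER.size + FIXED.size  # name needs no pointer: its offset is known

def pack_star(name, address, gender, birthdate):
    name_b = name.encode("ascii")
    addr_b = address.encode("ascii")
    addr_start = NAME_START + len(name_b)
    length = addr_start + len(addr_b)
    return (HEADER.pack(length, addr_start)
            + FIXED.pack(gender.encode("ascii"), birthdate.encode("ascii"))
            + name_b + addr_b)

def unpack_star(record):
    length, addr_start = HEADER.unpack_from(record, 0)
    gender, birthdate = FIXED.unpack_from(record, HEADER.size)
    name = record[NAME_START:addr_start]   # ends where address begins
    address = record[addr_start:length]    # ends at the record length
    return (name.decode("ascii"), address.decode("ascii"),
            gender.decode("ascii"), birthdate.decode("ascii"))
```

The length of name is never stored; it falls out of the pointer to address, and the length of address falls out of the record length.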
41
Records With Repeating Fields
A record with a repeating group of references to movies:
other header information | record length | pointer to address | pointer to movie pointers | name | address | pointers to movies
42
Storing variable-length fields separately from the record
Record: record header information | pointer to name | length of name | pointer to address | length of address | pointer to movie references | number of references
Additional space: name, address
43
Variable-Format Records
A record with tagged fields:
N S 14 Clint Eastwood | R S 16 Hog's Breath Inn | …
N: code for name; S: code for string type; 14: length
R: code for restaurant owned; S: code for string type; 16: length
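Tagged fields can be encoded and decoded without any schema, as this sketch shows. One ASCII character per code and a one-byte length are assumptions for the example, as are the function names.

```python
def encode_tagged(fields):
    """fields: list of (field_code, type_code, text); each code is one ASCII character."""
    out = bytearray()
    for field_code, type_code, text in fields:
        data = text.encode("ascii")
        if len(data) > 255:
            raise ValueError("length does not fit in one byte")
        out += field_code.encode("ascii") + type_code.encode("ascii")
        out.append(len(data))
        out += data
    return bytes(out)

def decode_tagged(record):
    """Walk the record, reading (field code, type code, length, value) repeatedly."""
    fields = []
    pos = 0
    while pos < len(record):
        field_code = chr(record[pos])
        type_code = chr(record[pos + 1])
        length = record[pos + 2]
        start = pos + 3
        fields.append((field_code, type_code, record[start:start + length].decode("ascii")))
        pos = start + length
    return fields
```

Each record describes itself, so records of the same relation may carry different fields, at the cost of three extra bytes per field in this encoding.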
44
Spanned Records
Storing spanned records across blocks:
Block 1: block header | record 1 | record 2-a
Block 2: block header | record 2-b | record 3
Each record or record fragment begins with a record header.
45
BLOBS Storage of BLOBS Stored on a sequence of blocks; Striped across several disks. Retrieval of BLOBS Several blocks at a time; Suitable index structure.
46
Record Modification Insertion Deletion Update
47
Insert
Easy case: records not in sequence
- Insert new record at end of file or in deleted slot
- If records are variable size, not as easy...
48
Insert
Hard case: records in sequence
- If free space is "close by", not too bad...
- Or use overflow idea...
49
Block Deletion Rx
50
Options:
(a) Immediately reclaim space
(b) Mark deleted
- May need chain of deleted records (for re-use)
- Need a way to mark: special characters, a delete field, or in the map
51
As usual, many tradeoffs...
- How expensive is it to move valid records to free space for immediate reclaim?
- How much space is wasted? E.g., deleted records, delete fields, free space chains,...
52
Concern with deletions
Dangling pointers: what does a pointer to deleted record R1 now refer to?
53
Solution: Tombstones
E.g., leave a "MARK" in the old location.
A block: the space holding the mark can never be re-used; the rest of the deleted record's space can be re-used.
54
Solution: Tombstones
E.g., leave a "MARK" in the map.
Map columns: ID, LOC; the entry for ID 7788 holds the mark.
Never reuse ID 7788 nor its space in the map...
55
Update
Fixed-length record: no effect on the storage system.
Variable-length record: handled with the methods for insertion and deletion, except that no tombstone is needed.
56
Interesting problems: How much free space to leave in each block, track, cylinder? How often do I reorganize file + overflow?
57
Comparison
There are 10,000,000 ways to organize my data on disk… Which is right for me?
58
Issues: Flexibility, Space Utilization, Complexity, Performance
59
To evaluate a given strategy, compute the following parameters:
-> space used for expected data
-> expected time to
- fetch record given key
- fetch record with next key
- insert record
- append record
- delete record
- update record
- read all file
- reorganize file
60
Example How would you design Megatron 3000 storage system? (for a relational DB, low end) –Variable length records? –Spanned? –What data types? –Fixed format? –Record IDs ? –Sequencing? –How to handle deletions?
61
Summary
How to lay out data on disk: Data Items, Records, Blocks, Files, Memory, DBMS
62
Next
How to find a record quickly, given a key
63
Exercises of Chapter 2, 3 EX 2.2.1 EX 2.2.2 EX 2.3.1 EX 2.6.7 EX 3.2.2 EX 3.3.4