1
Lecture 15: Midterm Review Data Storage
Monday, October 31, 2005
2
Midterm
- Monday, 11:30, this room (in class)
- 50 minutes, open book: notes, notebooks, anything
- No computers
3
Midterm
- SQL
- E/R Diagrams
- Functional Dependencies
- XML/XPath/XQuery
4
SQL
- Know the basics: SFW, GROUP-BY, HAVING…
- When are two queries equivalent?
- Eliminating subqueries
- Be aware of duplicates
- Insert/delete, especially more than one tuple
- Constraints in SQL
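The points on equivalence, subquery elimination, and duplicates interact. As a small illustration (a sketch using Python's built-in sqlite3 module; the two-table schema and the sample rows are made up for this example), unnesting an IN subquery into a join changes the answer unless duplicates are removed:

```python
import sqlite3

# Hypothetical schema and data, loosely modeled on a Product/Company example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (name TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE Product (pid INT PRIMARY KEY, name TEXT, maker TEXT);
    INSERT INTO Company VALUES ('Acme', 'Seattle');
    INSERT INTO Company VALUES ('Zenith', 'Portland');
    INSERT INTO Product VALUES (1, 'gizmo', 'Acme');
    INSERT INTO Product VALUES (2, 'gadget', 'Acme');
    INSERT INTO Product VALUES (3, 'widget', 'Zenith');
""")

# Subquery form: each company is listed at most once.
with_subquery = conn.execute("""
    SELECT c.name FROM Company c
    WHERE c.name IN (SELECT p.maker FROM Product p)
    ORDER BY c.name
""").fetchall()

# Naive unnesting: a company is listed once per product it makes.
naive_join = conn.execute("""
    SELECT c.name FROM Company c, Product p
    WHERE c.name = p.maker
    ORDER BY c.name
""").fetchall()

# DISTINCT makes the join return the same rows as the subquery form.
distinct_join = conn.execute("""
    SELECT DISTINCT c.name FROM Company c, Product p
    WHERE c.name = p.maker
    ORDER BY c.name
""").fetchall()
```

Here the naive join lists Acme twice, so it is not equivalent to the subquery form under bag semantics; the DISTINCT version is.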
5
E/R Diagrams
- Good design (don’t make stupid mistakes)
- Translation to relations
- Many-many vs. many-one relationships
- Subtleties:
  - Inheritance
  - Union types
  - Weak entity sets
6
Functional Dependencies
- Know the definition of X → Y
- Does a given table satisfy X → Y?
- Understand inference: if A → B and B → C, does it follow that C → A? Why? Why not?
- Understand closure: X+
- Understand BCNF and 3NF
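For practice, the closure and the satisfaction check can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function names `closure` and `satisfies` and the tiny example table are made up here.

```python
def closure(attrs, fds):
    """Return X+: every attribute determined by attrs under the given FDs.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its left side is already covered and it adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def satisfies(rows, lhs, rhs):
    """Check whether a table (list of dicts) satisfies lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in sorted(lhs))
        val = tuple(row[a] for a in sorted(rhs))
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
a_plus = closure({"A"}, fds)   # A determines B, and B determines C
c_plus = closure({"C"}, fds)   # nothing has C on its left side

# A counterexample table: A -> B and B -> C hold, but C -> A does not.
table = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 2, "B": 2, "C": 1},
]
```

Since A is not in C+, C → A does not follow from the two given dependencies, and the counterexample table confirms it.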
7
XML
- Basics in XPath and XQuery
- In what sense is XML “semistructured”?
- Mapping relations to XML
- Simple ways to store XML data
- Exclude the XML index
8
Midterm
How to prepare:
- Read lecture notes
- Read from the textbook
- Review the homeworks
- Make sure you understand
9
Outline
- Disks 11.3
  - Recommended reading: entire chapter 11
- Representing data elements 12
10
The Mechanics of Disk
Mechanical characteristics:
- Rotation speed (5400 RPM)
- Number of platters (1-30)
- Number of tracks (<= 10000)
- Number of bytes/track (10^5)
Unit of read or write: disk block
- Once in memory: page
- Typically: 4k or 8k or 16k
Figure labels: cylinder, spindle, tracks, disk head, sector, platters, arm assembly, arm movement
11
Disk Access Characteristics
- Disk latency = time between when command is issued and when data is in memory
- Disk latency = seek time + rotational latency
- Seek time = time for the head to reach the cylinder: 10ms – 40ms
- Rotational latency = time for the sector to rotate under the head
  - Rotation time = 10ms
  - Average latency = 10ms/2
- Transfer rate = typically 40MB/s
- Disks read/write one block at a time
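A quick worked example with the numbers from this slide. This is only a sketch: the helper name is made up, the 4 KB block size is one of the typical sizes mentioned earlier, 40MB/s is read as 40 * 10^6 bytes per second, and adding the transfer time on top of seek plus rotational latency is an assumption of this sketch.

```python
def block_access_time_ms(seek_ms, rotation_ms, block_bytes, transfer_bytes_per_s):
    """Estimate the time to read one block, in milliseconds."""
    rotational_latency_ms = rotation_ms / 2              # on average, half a rotation
    transfer_ms = block_bytes / transfer_bytes_per_s * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


# Best-case seek (10ms), 10ms rotation, 4 KB block, 40MB/s transfer rate.
t = block_access_time_ms(
    seek_ms=10,
    rotation_ms=10,
    block_bytes=4096,
    transfer_bytes_per_s=40 * 10**6,
)
transfer_only_ms = 4096 / (40 * 10**6) * 1000            # about 0.1 ms
```

With these numbers the access takes about 15 ms, of which the transfer itself is only about 0.1 ms: seek and rotation dominate, which is why sequential access to neighboring blocks is so much cheaper than random access.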
12
Average Seek Time
Suppose we have N tracks. What is the average seek time?
Getting from cylinder x to cylinder y takes time |x-y|
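One way to answer the question, assuming (this is an assumption, not stated on the slide) that the start position x and the target y are independent and uniformly distributed over [0, N], and treating positions as continuous:

```latex
\begin{align*}
\mathbb{E}\,|x-y|
  &= \frac{1}{N^2}\int_0^N\!\!\int_0^N |x-y|\,dy\,dx
   = \frac{2}{N^2}\int_0^N\!\!\int_0^x (x-y)\,dy\,dx \\
  &= \frac{2}{N^2}\int_0^N \frac{x^2}{2}\,dx
   = \frac{2}{N^2}\cdot\frac{N^3}{6}
   = \frac{N}{3}
\end{align*}
```

Under these assumptions the average seek covers one third of the tracks, not one half.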
13
RAID
Several disks that work in parallel
- Redundancy: use parity to recover from disk failure
- Speed: read from several disks at once
Various configurations (called levels):
- RAID 1 = mirror
- RAID 4 = n disks + 1 parity disk
- RAID 5 = n+1 disks, assign parity blocks round robin
- RAID 6 = “Hamming codes”
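To see why a single parity disk is enough to survive one disk failure, here is a minimal sketch of RAID 4 style parity with bytewise XOR. The block contents and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, one block each, plus one parity block.
data = [b"disk", b"page", b"raid"]
parity = xor_blocks(data)

# The second disk fails: XOR the surviving data blocks with the parity block.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
```

XOR-ing the survivors cancels every block that appears twice and leaves exactly the lost block, so `recovered` equals `b"page"`.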
14
Buffer Management in a DBMS
Figure: page requests (READ, WRITE) from higher levels go to the buffer pool in main memory; disk pages move between the DB on disk and free frames via INPUT and OUTPUT; the choice of frame is dictated by the replacement policy.
- Data must be in RAM for DBMS to operate on it!
- Table of <frame#, pageid> pairs is maintained
15
Buffer Manager
- Manages buffer pool: the pool provides space for a limited number of pages from disk.
- Needs to decide on page replacement policy
  - LRU
  - Clock algorithm
  - Both work well in OS, but not always in DB
- Enables the higher levels of the DBMS to assume that the needed data is in main memory.
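A toy buffer pool with LRU replacement might look as follows. This is a sketch under simplifying assumptions: the "disk" is a plain Python dict, the class and method names are made up, and pinning is left out, which a real buffer manager would need.

```python
from collections import OrderedDict


class BufferPool:
    """A fixed number of frames with LRU replacement (illustrative only)."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk                # hypothetical disk: page_id -> bytes
        self.frames = OrderedDict()     # page_id -> [data, dirty], least recent first
        self.reads = 0                  # number of disk reads performed

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)      # mark as most recently used
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], False]
            self.reads += 1
        return self.frames[page_id][0]

    def write_page(self, page_id, data):
        self.get_page(page_id)                    # make sure the page is in the pool
        self.frames[page_id] = [data, True]       # update in memory, mark dirty

    def _evict(self):
        victim, (data, dirty) = self.frames.popitem(last=False)  # least recently used
        if dirty:
            self.disk[victim] = data              # write back before reusing the frame


disk = {1: b"a", 2: b"b", 3: b"c"}
pool = BufferPool(num_frames=2, disk=disk)
pool.get_page(1)
pool.get_page(2)
pool.get_page(1)          # hit: page 1 becomes most recently used
pool.get_page(3)          # miss: page 2 is evicted
resident = list(pool.frames)
```

After this sequence the pool holds pages 1 and 3 and has done three disk reads; a dirty page reaches the disk only when it is evicted.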
16
Buffer Manager
Why not use the Operating System for the task?
- DBMS may be able to anticipate access patterns
- Hence, may also be able to perform prefetching
- DBMS needs the ability to force pages to disk, for recovery purposes
17
Representing Data Elements
Relational database elements:
- A tuple is represented as a record
- The table is a sequence of records

CREATE TABLE Product (
    pid INT PRIMARY KEY,
    name CHAR(20),
    description VARCHAR(200),
    maker CHAR(10) REFERENCES Company(name)
)
18
Issues
- Represent attributes inside the records
- Represent the records inside the blocks
19
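Record Formats: Fixed Length
Figure: a record starting at base address (B); Address = B+L1+L2, where L1 and L2 are the lengths of the first two fields.
- Information about field types is the same for all records in a file; stored in system catalogs.
- Finding the i’th field does not require a scan of the record: its address is B plus the lengths of the preceding fields.
- Note the importance of schema information!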
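The address computation B+L1+L2 can be tried out with Python's struct module. This is a sketch: it uses only the fixed-length fields of a Product-like record (pid INT, name CHAR(20), maker CHAR(10)), and the field-length list stands in for what the system catalog would store.

```python
import struct

# Assumed catalog information: byte length of each field, in order.
FIELD_LENGTHS = [4, 20, 10]          # pid INT, name CHAR(20), maker CHAR(10)
RECORD_FORMAT = "<i20s10s"           # little-endian, no alignment padding


def field_offset(i):
    """Offset of field i from the record's base address: L1 + ... + L(i)."""
    return sum(FIELD_LENGTHS[:i])


# Short strings are padded with zero bytes up to the declared length.
record = struct.pack(RECORD_FORMAT, 42, b"gizmo", b"Acme")

# Jump straight to the third field without looking at the first two.
start = field_offset(2)
maker = record[start:start + FIELD_LENGTHS[2]].rstrip(b"\x00")
```

The third field starts at offset 4+20 = 24, which mirrors Address = B+L1+L2 with B = 0.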
20
Record Header
Figure: a header (pointer to schema, length, timestamp) followed by fields F1 to F4 with lengths L1 to L4.
Need the header because:
- The schema may change; for a while new+old may coexist
- Records from different relations may coexist
21
Variable Length Records
Figure: a header (other header information, length) followed by fields F1 to F4 with lengths L1 to L4.
- Place the fixed fields first: F1
- Then the variable length fields: F2, F3, F4
- Null values take 2 bytes only
- Sometimes they take 0 bytes (when at the end)
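One possible encoding along these lines, as a sketch only: the header holds the total record length plus one 2-byte end offset per variable-length field, the single fixed field is a 4-byte integer, and a NULL field contributes no data bytes, so it costs just its 2-byte header entry. The exact layout and the function names are assumptions made for this example, and the sketch cannot tell NULL apart from an empty string.

```python
import struct


def encode(pid, var_fields):
    """Encode one fixed INT field followed by variable-length byte fields."""
    header_size = 2 + 2 * len(var_fields)      # total length + one end offset each
    body = struct.pack("<i", pid)              # fixed field goes first
    ends = []
    for value in var_fields:
        body += value or b""                   # None (NULL) adds no data bytes
        ends.append(header_size + len(body))   # where this field ends in the record
    total = header_size + len(body)
    return struct.pack(f"<H{len(ends)}H", total, *ends) + body


def decode_var_field(record, i, num_var):
    """Return variable-length field i using only the offsets in the header."""
    header_size = 2 + 2 * num_var
    ends = struct.unpack_from(f"<{num_var}H", record, 2)
    # The first variable field starts right after the 4-byte fixed field.
    start = header_size + 4 if i == 0 else ends[i - 1]
    return record[start:ends[i]]


rec = encode(7, [b"gizmo", None, b"Acme"])
first = decode_var_field(rec, 0, 3)
middle = decode_var_field(rec, 1, 3)    # the NULL field: empty
last = decode_var_field(rec, 2, 3)
```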
22
Records With Repeating Fields
Figure: a header (other header information, length) followed by fields F1 to F3 with lengths L1 to L3.
Needed e.g. in Object Relational systems, or fancy representations of many-many relationships
23
Storing Records in Blocks
Blocks have fixed size (typically 4k – 8k)
Figure: a block holding records R1, R2, R3, R4
24
Spanning Records Across Blocks
- When records are very large
- Or even medium size: saves space in blocks
Figure: two blocks, each with a block header, holding R1, R2, R3; record R2 appears in both blocks
25
BLOB
Binary large objects
- Supported by modern database systems
- E.g. images, sounds, etc.
- Storage: attempt to cluster blocks together
CLOB = character large object
Supports only restricted operations
26
Modifications: Insertion
- File is unsorted: add it to the end (easy)
- File is sorted:
  - Is there space in the right block?
    - Yes: we are lucky, store it there
  - Is there space in a neighboring block?
    - Look 1-2 blocks to the left/right, shift records
  - If all else fails, create overflow block
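The sorted-file case can be sketched as follows. This is a simplified illustration, not a full implementation: blocks are Python lists of keys with a made-up capacity of 3, only the immediate neighbors are tried, every block is assumed non-empty, and overflow records are kept in a dict keyed by block index.

```python
import bisect

CAPACITY = 3   # records per block (hypothetical)


def insert_sorted(blocks, overflow, key):
    """Insert key into a sorted file; report which case applied."""
    # The right block is the last one whose first key is <= key.
    firsts = [block[0] for block in blocks]
    i = max(bisect.bisect_right(firsts, key) - 1, 0)
    if len(blocks[i]) < CAPACITY:
        bisect.insort(blocks[i], key)
        return "in place"
    if i + 1 < len(blocks) and len(blocks[i + 1]) < CAPACITY:
        bisect.insort(blocks[i], key)
        blocks[i + 1].insert(0, blocks[i].pop())     # shift largest record right
        return "shifted right"
    if i > 0 and len(blocks[i - 1]) < CAPACITY:
        bisect.insort(blocks[i], key)
        blocks[i - 1].append(blocks[i].pop(0))       # shift smallest record left
        return "shifted left"
    overflow.setdefault(i, []).append(key)           # last resort: overflow block
    return "overflow"


blocks = [[10, 20, 30], [40, 50], [70, 80, 90]]
overflow = {}
first_result = insert_sorted(blocks, overflow, 25)   # right block full, neighbor has room
second_result = insert_sorted(blocks, overflow, 45)  # everything nearby is full
```

The first insert shifts record 30 into the neighboring block; the second finds no room anywhere nearby and lands in an overflow block for the middle block.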
27
Overflow Blocks
Figure: Block n-1, Block n, Block n+1 and an overflow block
After a while the file starts being dominated by overflow blocks: time to reorganize
28
Modifications: Deletions
- Free space in block, shift records
- May be able to eliminate an overflow block
- Can never really eliminate the record, because others may point to it
  - Place a tombstone instead (a NULL record)
How can we point to a record in an RDBMS?
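A minimal sketch of the tombstone idea, assuming records are addressed by a slot number inside their block (the `Block` class and its methods are made up for illustration):

```python
TOMBSTONE = None   # marker left behind in place of a deleted record


class Block:
    def __init__(self):
        self.slots = []

    def insert(self, record):
        self.slots.append(record)
        return len(self.slots) - 1      # the slot number acts as the record's address

    def delete(self, slot):
        # Keep the slot itself, so pointers held elsewhere never
        # end up referring to a different record.
        self.slots[slot] = TOMBSTONE

    def fetch(self, slot):
        record = self.slots[slot]
        if record is TOMBSTONE:
            raise KeyError(f"record in slot {slot} was deleted")
        return record


block = Block()
first = block.insert("gizmo")
second = block.insert("gadget")
block.delete(first)
still_there = block.fetch(second)       # other addresses are unaffected
```

Following a stale pointer to the deleted record raises an error instead of silently returning some other record that happened to move into its place.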
29
Modifications: Updates
If new record is shorter than previous, easy If it is longer, need to shift records, create overflow blocks
30
Pointers
Where do we need them in RDBMS?
Physical addresses
Each block and each record has a physical address that consists of:
- The host
- The disk
- The cylinder number
- The track number
- The block within the track
- For records: an offset in the block’s header
Note: review what a pointer in C is
31
Pointers
Logical address: a string of bytes (10-16)
- More flexible: can move blocks/records around
- But need translation table:

Logical address    Physical address
L1                 P1
L2                 P2
L3                 P3
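The translation table can be pictured as a simple map. In this sketch the `PhysicalAddress` fields follow the components listed for physical addresses, and all concrete values and function names are made up:

```python
from typing import NamedTuple


class PhysicalAddress(NamedTuple):
    host: str
    disk: int
    cylinder: int
    track: int
    block: int
    offset: int


# Translation table: logical address -> physical address.
translation = {
    "L1": PhysicalAddress("db1", 0, 17, 3, 5, 0),
    "L2": PhysicalAddress("db1", 0, 17, 3, 5, 1),
}


def resolve(logical):
    """Look up where a record currently lives."""
    return translation[logical]


def move_record(logical, new_physical):
    """Moving a record only touches the table, not the pointers to it."""
    translation[logical] = new_physical


before = resolve("L1")
move_record("L1", PhysicalAddress("db1", 1, 42, 0, 9, 0))
after = resolve("L1")
```

Everything that stores the logical address L1 keeps working after the move; the price is one extra table lookup per access.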
32
Main Memory Address
- When the block is read in main memory, it receives a main memory address
- Need another translation table:

Memory address    Logical address
M1                L1
M2                L2
M3                L3
33
Optimization: Pointer Swizzling
= the process of replacing a physical/logical pointer with a main memory pointer Still need translation table, but subsequent references are faster
34
Pointer Swizzling
Figure: Block 1 and Block 2 on disk; a block read into memory, with swizzled and unswizzled pointers
35
Pointer Swizzling
- Automatic: when block is read in main memory, swizzle all pointers in the block
- On demand: swizzle only when user requests
- No swizzling: always use translation table
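The on-demand variant can be sketched like this. It is an illustration under simplifying assumptions: Python object references play the role of main memory addresses, the "disk" is a dict keyed by logical address, and all class and method names are made up.

```python
class Pointer:
    def __init__(self, logical):
        self.logical = logical     # logical address, always kept
        self.target = None         # direct in-memory reference once swizzled
        self.swizzled = False


class BlockCache:
    def __init__(self, disk):
        self.disk = disk           # hypothetical disk: logical address -> block
        self.memory = {}           # translation table: logical address -> in-memory block
        self.lookups = 0           # number of translation table lookups

    def _translate(self, logical):
        self.lookups += 1
        if logical not in self.memory:
            self.memory[logical] = self.disk[logical]    # read block into memory
        return self.memory[logical]

    def follow(self, ptr):
        if not ptr.swizzled:       # first use: go through the table, then swizzle
            ptr.target = self._translate(ptr.logical)
            ptr.swizzled = True
        return ptr.target          # later uses skip the table entirely

    def unswizzle(self, ptr):
        # Needed before the target block returns to disk.
        ptr.target = None
        ptr.swizzled = False


cache = BlockCache({"L1": ["r1", "r2"]})
p = Pointer("L1")
block_a = cache.follow(p)
block_b = cache.follow(p)          # no table lookup this time
```

Following the pointer twice costs only one table lookup; after `unswizzle` the next access goes through the table again.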
36
Pointer Swizzling
When blocks return to disk: pointers need to be unswizzled
Danger: someone else may point to this block
- Pinned blocks: we don’t allow them to return to disk
- Keep a list of references to this block