Lecture 15: Data Storage Tuesday, February 20, 2001.

Slides:



Advertisements
Similar presentations
Storing Data: Disk Organization and I/O
Advertisements

Physical DataBase Design
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 7.
1 Introduction to Database Systems CSE 444 Lectures 19: Data Storage and Indexes November 14, 2007.
Data Storage and Access Methods Min Song IS698. Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
DBMS Internals: Storage February 27th, Representing Data Elements Relational database elements: A tuple is represented as a record CREATE TABLE.
Layers of a DBMS Query optimization Execution engine Files and access methods Buffer management Disk space management Query Processor Query execution plan.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 9.
Lecture 11: DMBS Internals
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How are data stored? –physical level –logical level.
Physical Storage Susan B. Davidson University of Pennsylvania CIS330 – Database Management Systems November 20, 2007.
Introduction to Database Systems 1 Storing Data: Disks and Files Chapter 3 “Yea, from the table of my memory I’ll wipe away all trivial fond records.”
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 7 “ Yea, from the table of my memory I ’ ll wipe away.
1 Storing Data: Disks and Files Chapter 9. 2 Disks and Files  DBMS stores information on (“hard”) disks.  This has major implications for DBMS design!
Lecture #19 May 13 th, Agenda Trip Report End of constraints: triggers. Systems aspects of SQL – read chapter 8 in the book. Going under the lid!
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
CS411 Database Systems Kazuhiro Minami 09: Storage.
Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Storage and Representation Spring 2016.
1 Lecture 15: Data Storage, Recovery Monday, February 13, 2006.
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
1 Storing Data: Disks and Files Chapter 9. 2 Objectives  Memory hierarchy in computer systems  Characteristics of disks and tapes  RAID storage systems.
Storing Data: Disks and Files Memory Hierarchy Primary Storage: main memory. fast access, expensive. Secondary storage: hard disk. slower access,
The very Essentials of Disk and Buffer Management.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Disks and Files.
CS222: Principles of Data Management Lecture #4 Catalogs, Buffer Manager, File Organizations Instructor: Chen Li.
CS522 Advanced database Systems
Module 11: File Structure
Storing Data: Disks and Files
Lecture 16: Data Storage Wednesday, November 6, 2006.
Database Management Systems (CS 564)
End of XQuery DBMS Internals
CS222/CS122C: Principles of Data Management Lecture #3 Heap Files, Page Formats, Buffer Manager Instructor: Chen Li.
Database Management Systems (CS 564)
Lecture 11: DMBS Internals
Storing Data: Disks and Files
Lecture 10: Buffer Manager and File Organization
Lecture 9: Data Storage and IO Models
Disk Storage, Basic File Structures, and Buffer Management
Database Implementation Issues
Database Management Systems (CS 564)
Database Systems November 2, 2011 Lecture #7.
5. Disk, Pages and Buffers Why Not Store Everything in Main Memory
Lecture 21: Indexes Monday, November 13, 2000.
Lecture 15: Midterm Review Data Storage
Lecture 19: Data Storage and Indexes
CSE 544: Lectures 13 and 14 Storing Data, Indexes
CS222/CS122C: Principles of Data Management Lecture #4 Catalogs, File Organizations Instructor: Chen Li.
Lecture 6: Data Storage and Indexes
Basics Storing Data on Disks and Files
DATABASE IMPLEMENTATION ISSUES
CSE 544: Lecture 11 Storing Data, Indexes
Introduction to Database Systems CSE 444 Lectures 19: Data Storage and Indexes May 16, 2008.
Files and access methods
Storing Data: Disks and Files
Database Implementation Issues
Lecture 18: DMBS Overview and Data Storage
Lecture 17: Data Storage and Recovery
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #03 Row/Column Stores, Heap Files, Buffer Manager, Catalogs Instructor: Chen Li.
Database Implementation Issues
Lecture 20: Representing Data Elements
Presentation transcript:

Lecture 15: Data Storage Tuesday, February 20, 2001

Outline Storing Data: Disks (7.1-7.3) Buffer manager (7.4) Representing data (7.5-7.7) External Sorting (Chapter 11)

The Memory Hierarchy Main Memory = Disk Cache Processor Cache: access time 10 nano’s Volatile 256M-1G expensive Access time: 10-100 nanoseconds Disk Tape Persistent 2-10 GB storage speed: Rate=5-10 MB/S Access time= 10-15 msecs. 1.5 MB/S transfer rate 280 GB typical capacity Only sequential access Not for operational data

Main Memory Fastest, most expensive Today: 256MB are common even on PCs Many databases could fit in memory New industry trend: Main Memory Database E.g TimesTen Main issue is volatility

Secondary Storage Disks Slower, cheaper than main memory Persistent !!! The unit of disk I/O = block Typically 1 block = 4k Used with a main memory buffer

The Mechanics of Disk Mechanical characteristics: Cylinder Mechanical characteristics: Rotation speed (5400RPM) Number of platers (1-30) Number of tracks (<=10000) Number of bytes/track(105) Spindle Tracks Disk head Sector Arm movement Platters Arm assembly

Important Disk Access Characteristics Disk latency = time between when command is issued and when data is in memory Disk latency = seek time + rotational latency Seek time = time for the head to reach cylinder 10ms – 40ms Rotational latency = time for the sector to rotate Rotation time = 10ms Average latency = 10ms/2 Transfer time = typically 5-10MB/s Disks read/write one block at a time (typically 4kB)

RAIDs = “Redundant Array of Independent Disks” Was “inexpensive” disks Idea: use more disks, increase reliability Recall: Database recovery helps after a systems crash, not after a disk crash 6 ways to use RAIDs. More important: Level 4: use N-1 data disks, plus one parity disk Level 5: same, but alternate which disk is the parity Level 6: use Hamming codes instead of parity

Buffer Management in a DBMS Page Requests from Higher Levels BUFFER POOL disk page free frame MAIN MEMORY DISK DB choice of frame dictated by replacement policy Data must be in RAM for DBMS to operate on it! Table of <frame#, pageid> pairs is maintained 4

Buffer Manager When a page is first requested, it is read in a free frame The DBMS typically requests it to be pinned: pin_count = how many processes requested it pinned When the DBMS writes to it, the buffer manager marks it dirty When the DBMS doesn’t need it any more, un-pinned: pin_count is decremented Typical replacement policy (always chooses among un-pinned frames): LRU Clock MRU

Buffer Manager Why not use the Operating System for the task?? - DBMS may be able to anticipate access patterns - Hence, may also be able to perform prefetching - DBMS needs the ability to force pages to disk.

Representing Data Elements Relational database elements: CREATE TABLE Product ( pid INT PRIMARY KEY, name CHAR(20), description VARCHAR(200), maker CHAR(10) REFERENCES Company(name)) A tuple is represented as a record

Representing Data Elements Representing objects: interface Company { attribute string name; relationship Set<Product> makes inverse Product::maker; } An object is represented as a record plus object identifier What to do with repeating fields (e.g. makes)

Record Formats: Fixed Length Base address (B) Address = B+L1+L2 Information about field types same for all records in a file; stored in system catalogs. Finding i’th field requires scan of record. Note the importance of schema information! 9

Record Header To schema length F1 F2 F3 F4 L1 L2 L3 L4 header timestamp Need the header because: The schema may change for a while new+old may coexist Records from different relations may coexist 9

Variable Length Records Other header information header F1 F2 F3 F4 L1 L2 L3 L4 length Place the fixed fields first: F1, F2 Then the variable length fields: F3, F4 Null values take 2 bytes only Sometimes they take 0 bytes (when at the end) 9

Records With Repeating Fields Other header information header F1 F2 F3 L1 L2 L3 length E.g. to represent one-many or many-many relationships 9

Storing Records in Blocks Blocks have fixed size (typically 4k) BLOCK R4 R3 R2 R1

Spanning Records Across Blocks When records are very large Or even medium size: saves space in blocks block header block header R1 R2 R3 R2

BLOB Binary large objects Supported by modern database systems E.g. images, sounds, etc. Storage: attempt to cluster blocks together

Modifications: Insertion File is unsorted (= heap file) add it to the end (easy ) File is sorted: Is there space in the right block ? Yes: we are lucky, store it there Is there space in a neighboring block ? Look 1-2 blocks to the left/right, shift records If anything else fails, create overflow block

Overflow Blocks Blockn-1 Blockn Blockn+1 Overflow After a while the file starts being dominated by overflow blocks: time to reorganize

Modifications: Deletions Free space in block, shift records Maybe be able to eliminate an overflow block Can never really eliminate the record, because others may point to it Place a tombstone instead (a NULL record)

Modifications: Updates If new record is shorter than previous, easy  If it is longer, need to shift records, create overflow blocks

Physical Addresses Each block and each record have a physical address that consists of: The host The disk The cylinder number The track number The block within the track For records: an offset in the block sometimes this is in the block’s header

Logical Addresses Logical address: a string of bytes (10-16) More flexible: can blocks/records around But need translation table: Logical address Physical address L1 P1 L2 P2 L3 P3

Main Memory Address When the block is read in main memory, it receives a main memory address Buffer manager has another translation table Memory address Logical address M1 L1 M2 L2 M3 L3

Optimization: Pointer Swizzling = the process of replacing a physical/logical pointer with a main memory pointer Still need translation table, but subsequent references are faster

Pointer Swizzling Block 2 Block 1 Disk read in memory swizzled Memory unswizzled

Pointer Swizzling Automatic: when block is read in main memory, swizzle all pointers in the block On demand: swizzle only when user requests No swizzling: always use translation table

Pointer Swizzling When blocks return to disk: pointers need unswizzled Danger: someone else may point to this block Pinned blocks: we don’t allow it to return to disk Keep a list of references to this block

The I/O Model of Computation In main memory algorithms we care about CPU time In databases time is dominated by I/O cost Assumption: cost is given only by I/O Consequence: need to redesign certain algorithms Will illustrate here with sorting

Sorting Illustrates the difference in algorithm design when your data is not in main memory: Problem: sort 1Gb of data with 1Mb of RAM. Arises in many places in database systems: Data requested in sorted order (ORDER BY) Needed for grouping operations First step in sort-merge join algorithm Duplicate removal Bulk loading of B+-tree indexes. 4

2-Way Merge-sort: Requires 3 Buffers Pass 1: Read a page, sort it, write it. only one buffer page is used Pass 2, 3, …, etc.: three buffer pages used. INPUT 1 OUTPUT INPUT 2 Main memory buffers Disk Disk 5

Two-Way External Merge Sort 3,4 6,2 9,4 8,7 5,6 3,1 2 Input file Each pass we read + write each page in file. N pages in the file => the number of passes So total cost is: Improvement: start with larger runs Sort 1GB with 1MB memory in 10 passes PASS 0 3,4 2,6 4,9 7,8 5,6 1,3 2 1-page runs PASS 1 2,3 4,7 1,3 2-page runs 4,6 8,9 5,6 2 PASS 2 2,3 4,4 1,2 4-page runs 6,7 3,5 8,9 6 PASS 3 1,2 2,3 3,4 8-page runs 4,5 6,6 7,8 9 6

Can We Do Better ? We have more main memory Should use it to improve performance

Cost Model for Our Analysis B: Block size M: Size of main memory N: Number of records in the file R: Size of one record 3

External Merge-Sort Phase one: load M bytes in memory, sort Result: runs of length M/R records M/R records . . . . . . Disk Disk M bytes of main memory

Phase Two . . . . . . Merge M/B – 1 runs into a new run Result: runs have now M/R (M/B – 1) records Input 1 . . . Input 2 . . . Output . . . . Input M/B Disk Disk M bytes of main memory 7

Phase Three . . . . . . Merge M/B – 1 runs into a new run Result: runs have now M/R (M/B – 1)2 records Input 1 . . . Input 2 . . . Output . . . . Input M/B Disk Disk M bytes of main memory 7

Cost of External Merge Sort Number of passes: Think differently Given B = 4KB, M = 64MB, R = 0.1KB Pass 1: runs of length M/R = 640000 Have now sorted runs of 640000 records Pass 2: runs increase by a factor of M/B – 1 = 16000 Have now sorted runs of 10,240,000,000 = 1010 records Pass 3: runs increase by a factor of M/B – 1 = 16000 Have now sorted runs of 1014 records Nobody has so much data ! Can sort everything in 2 or 3 passes ! 8