CS 505: Intermediate Topics to Database Systems


CS 505: Intermediate Topics to Database Systems Instructor: Jinze Liu Fall 2008

Review: Database Design Jinze Liu @ University of Kentucky 4/28/2019

A DBMS Preview

Outline
- It’s all about disks!
  - That’s why we always draw databases as [figure: database drawn as a disk]
  - And why the single most important metric in database processing is the number of disk I/Os performed
- Storing data on a disk
  - Record layout
  - Block layout

The Storage Hierarchy
[Figure: storage hierarchy, from smaller and faster at the top to bigger and slower at the bottom. Source: Operating Systems Concepts, 5th Edition]
- Main memory (RAM) for currently used data
- Disk for the main database (secondary storage)
- Tapes for archiving older versions of the data (tertiary storage)

Jim Gray’s Storage Latency Analogy: How Far Away is the Data?

Storage level          Relative latency   Analogy             Travel time
Registers              1                  My Head             1 min
On-chip cache          2                  This Room
On-board cache         10                 This Lecture Hall   10 min
Memory                 100                Lexington           1.5 hr
Disk                   10^6               Pluto               2 years
Tape / optical robot   10^9               Andromeda           2,000 years

A typical disk
[Figure: a disk with its platter, spindle, tracks, cylinders, disk arm, and disk head labeled]
“Moving parts” are slow:
- Arm movement
- Spindle rotation

Top view
[Figure: top view of a platter showing tracks and sectors]
- Higher-density sectors on inner tracks and/or more sectors on outer tracks
- A block is a logical unit of transfer consisting of one or more sectors

Notes: Why use a block as the unit of transfer rather than individual bits (write a bit at location a, another at location b, and so on)? We want economy of scale. Recall that in the latency analogy the disk is as far away as Pluto, so each trip should bring back as much as possible. Grocery shopping is a closer-to-home comparison: you buy many items per trip. Compare this with getting a book from a bookshelf in your room, where fetching one item at a time is cheap. The same pattern holds throughout the memory hierarchy: a fetch from cache probably returns just one address, but a fetch from memory actually transfers an entire cache line.

Disk access time
Sum of:
- Seek time: time for the disk heads to move to the correct cylinder
- Rotational delay: time for the desired block to rotate under the disk head
- Transfer time: time to read/write the data in the block (= time for the disk to rotate over the block)

Random disk access
Seek time + rotational delay + transfer time
- Average seek time
  - Time to skip one half of the cylinders? Not quite; it should be the time to skip a third of them (why?)
  - “Typical” value: 5 ms
- Average rotational delay
  - Time for a half rotation (a function of RPM)
  - “Typical” value: 4.2 ms (7200 RPM)
- Typical transfer time
  - 0.08 ms per 8 KB block
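The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the "a third of the cylinders" check assumes that the start and target cylinders are independent and uniformly distributed, which the slide does not state.

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay = time for half a rotation, in milliseconds."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2


def random_access_ms(seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """Seek time + rotational delay + transfer time for one random block."""
    return seek_ms + avg_rotational_delay_ms(rpm) + transfer_ms


def avg_seek_fraction(num_cylinders: int) -> float:
    """Mean seek distance as a fraction of the disk, averaged over all
    (start, target) cylinder pairs (assumption: both uniformly random)."""
    n = num_cylinders
    total = sum(abs(a - b) for a in range(n) for b in range(n))
    return total / (n * n) / n


if __name__ == "__main__":
    print(round(avg_rotational_delay_ms(7200), 2))       # about 4.2 ms
    print(round(random_access_ms(5.0, 7200, 0.08), 2))   # one random 8 KB read
    print(round(avg_seek_fraction(300), 3))              # close to one third
```

The average distance between two random points on a line is one third of its length, not one half, which is where the "a third of the cylinders" figure comes from.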

Sequential Disk Access Improves Performance
Seek time + rotational delay + transfer time
- Seek time: 0 (assuming data is on the same track)
- Rotational delay: 0 (assuming data is in the next block on the track)
- Easily an order of magnitude faster than random disk access!
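To see how large the gap can be, the following sketch compares reading the same number of blocks randomly and sequentially, using the "typical" values quoted in these slides. It assumes, for illustration only, that a sequential read pays one seek and one rotational delay up front and then only transfer time; track and cylinder switches are ignored.

```python
SEEK_MS = 5.0        # "typical" average seek time from the slides
ROTATION_MS = 4.2    # "typical" average rotational delay at 7200 RPM
TRANSFER_MS = 0.08   # transfer time per 8 KB block


def random_read_ms(num_blocks: int) -> float:
    """Every block pays seek + rotational delay + transfer."""
    return num_blocks * (SEEK_MS + ROTATION_MS + TRANSFER_MS)


def sequential_read_ms(num_blocks: int) -> float:
    """One initial seek and rotation, then transfer time only (assumption)."""
    return SEEK_MS + ROTATION_MS + num_blocks * TRANSFER_MS


if __name__ == "__main__":
    n = 1000
    print(random_read_ms(n), sequential_read_ms(n))
    print(random_read_ms(n) / sequential_read_ms(n))  # speed-up factor
```

Under these simplified assumptions the speed-up for 1000 blocks is well over the "order of magnitude" the slide promises; real disks land somewhere in between because long files span multiple tracks.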

Performance tricks
- Disk layout strategy
  - Keep related things (what are they?) close together: same sector/block -> same track -> same cylinder -> adjacent cylinder
- Double buffering
  - While processing the current block in memory, prefetch the next block from disk (overlap I/O with processing)
- Disk scheduling algorithm
- Track buffer
  - Read/write one entire track at a time
- Parallel I/O
  - More disk heads working at the same time
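Double buffering can be sketched with a single background I/O thread. This is a minimal illustration, not how any particular DBMS implements it; `read_block` and `process` are hypothetical callables supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(read_block, process, num_blocks):
    """Process blocks 0..num_blocks-1, prefetching block i+1 from "disk"
    while block i is being processed (overlap I/O with processing)."""
    results = []
    if num_blocks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_block, 0)          # start the first read
        for i in range(num_blocks):
            current = pending.result()              # wait for block i
            if i + 1 < num_blocks:
                pending = io.submit(read_block, i + 1)  # prefetch block i+1
            results.append(process(current))        # runs while the read proceeds
    return results


if __name__ == "__main__":
    # Stand-in "disk": block i is a list of three copies of i.
    print(process_all(lambda i: [i] * 3, sum, 4))
```

Only two blocks are ever in flight at once (the one being processed and the one being read), which is why the technique is called double buffering.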

Files
- Blocks are the interface for I/O, but higher levels of the DBMS operate on records, and files of records.
- FILE: A collection of pages, each containing a collection of records. Must support:
  - insert/delete/modify a record
  - fetch a particular record (specified using its record id)
  - scan all records (possibly with some conditions on the records to be retrieved)

Unordered (Heap) Files
- Simplest file structure; contains records in no particular order.
- As the file grows and shrinks, disk pages are allocated and de-allocated.
- To support record-level operations, we must:
  - keep track of the pages in a file
  - keep track of free space on pages
  - keep track of the records on a page
- There are many alternatives for keeping track of this. We’ll consider two.

Heap File Implemented as a List
[Figure: a header page linked to two lists of data pages: pages with free space, and full pages]
- The header page id and heap file name must be stored someplace: the database “catalog”.
- Each page contains 2 “pointers” plus data.

Heap File Using a Page Directory
[Figure: a header page and directory whose entries point to Data Page 1, Data Page 2, ..., Data Page N]
- The entry for a page can include the number of free bytes on the page.
- The directory is a collection of pages; a linked list implementation is just one alternative.
  - Much smaller than a linked list of all heap file pages!
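A page directory that records free bytes per page might look like the sketch below. The class and method names are hypothetical, and the directory is held in an in-memory dict rather than in directory pages on disk.

```python
from typing import Dict, Optional


class PageDirectory:
    """Maps page id -> free bytes, so an insert can pick a page
    without reading the data pages themselves."""

    def __init__(self) -> None:
        self.free_bytes: Dict[int, int] = {}

    def add_page(self, page_id: int, page_size: int) -> None:
        """Register a newly allocated, empty data page."""
        self.free_bytes[page_id] = page_size

    def find_page_with_space(self, record_size: int) -> Optional[int]:
        """Return the first page that can hold the record, or None."""
        for page_id, free in self.free_bytes.items():
            if free >= record_size:
                return page_id
        return None

    def note_insert(self, page_id: int, record_size: int) -> None:
        """Update the directory entry after a record is placed on a page."""
        if self.free_bytes[page_id] < record_size:
            raise ValueError("record does not fit on this page")
        self.free_bytes[page_id] -= record_size


if __name__ == "__main__":
    d = PageDirectory()
    d.add_page(1, 100)
    d.add_page(2, 4096)
    print(d.find_page_with_space(200))
```

With the linked-list organization, finding a page with enough room may require visiting many data pages; here only the (much smaller) directory is consulted.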

Record layout
Record = row in a table
- Variable-format records
  - Rare in DBMS: the table schema dictates the format
  - Relevant for semi-structured data such as XML
- Focus on fixed-format records
  - With fixed-length fields only, or
  - With possible variable-length fields

Record Formats: Fixed Length
[Figure: a record starting at base address B; with field lengths L1, L2, ..., the third field starts at address B + L1 + L2]
- All field lengths and offsets are constant
  - Computed from the schema, stored in the system catalog
- Finding the i’th field is done via arithmetic.
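The "arithmetic" is a running sum of field lengths. A small sketch, with illustrative function names and 0-based field indexes:

```python
from itertools import accumulate
from typing import List


def field_offsets(lengths: List[int]) -> List[int]:
    """Offset of each field from the start of the record:
    0, L1, L1+L2, ... (computed once from the schema)."""
    return [0] + list(accumulate(lengths))[:-1]


def field_address(base: int, lengths: List[int], i: int) -> int:
    """Address of field i (0-based) in a record stored at `base`."""
    return base + field_offsets(lengths)[i]


if __name__ == "__main__":
    lengths = [4, 20, 4, 8]           # example field lengths in bytes
    print(field_offsets(lengths))
    print(field_address(1000, lengths, 2))   # B + L1 + L2
```

Because the offsets depend only on the schema, they can be computed once and kept in the system catalog instead of being stored in every record.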

Fixed-length fields
Example: CREATE TABLE Student(SID INT, name CHAR(20), age INT, GPA FLOAT);
[Figure: record layout with byte offsets. SID = 142 at offset 0; name = "Bart" (padded with spaces) at offset 4; age = 10 at offset 24; GPA = 2.3 at offset 28; the record ends at offset 36]
- Watch out for alignment
  - May need to pad; reorder columns if that helps
- What about NULL?
  - Add a bitmap at the beginning of the record
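Python's standard `struct` module can mimic this layout. The sketch below is an assumption-laden illustration: it uses a packed little-endian format (no alignment padding), an 8-byte FLOAT to match the 28-to-36 offsets in the figure, and a one-byte NULL bitmap in front of the record. The bit assignment (bit i set means field i is NULL) is a choice made here, not something the slides specify.

```python
import struct

# Layout: 1-byte null bitmap, SID (4), name (20), age (4), GPA (8) = 37 bytes.
RECORD = struct.Struct("<Bi20sid")


def pack_student(sid, name, age, gpa) -> bytes:
    """Serialize one Student row; None marks a NULL field."""
    bitmap = 0
    for i, value in enumerate([sid, name, age, gpa]):
        if value is None:
            bitmap |= 1 << i                      # bit i set => field i is NULL
    return RECORD.pack(
        bitmap,
        sid if sid is not None else 0,
        (name or "").encode("ascii").ljust(20, b" "),  # pad CHAR(20) with spaces
        age if age is not None else 0,
        gpa if gpa is not None else 0.0,
    )


def unpack_student(data: bytes):
    """Inverse of pack_student; NULL fields come back as None."""
    bitmap, sid, name, age, gpa = RECORD.unpack(data)
    values = [sid, name.decode("ascii").rstrip(" "), age, gpa]
    return tuple(None if bitmap & (1 << i) else v for i, v in enumerate(values))


if __name__ == "__main__":
    rec = pack_student(142, "Bart", 10, 2.3)
    print(len(rec), unpack_student(rec))
```

Switching the format prefix from `<` to `@` makes `struct` apply the platform's native alignment, which typically inserts padding before the 8-byte GPA field; that is the kind of padding the slide warns about.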

Record Formats: Variable Length
Two alternative formats (# fields is fixed):
- Fields delimited by special symbols: F1 $ F2 $ F3 $ F4 $
- Array of field offsets, followed by the fields F1 F2 F3 F4
The second offers direct access to the i’th field and efficient storage of nulls (a special “don’t know” value), at the cost of a small directory overhead.
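The second format can be sketched as follows. The encoding details are assumptions made for illustration: offsets are 16-bit little-endian integers, there is one extra offset marking the end of the last field, and a zero-length field stands for NULL (so this sketch cannot distinguish NULL from an empty string).

```python
import struct
from typing import List, Optional


def encode_record(fields: List[Optional[bytes]]) -> bytes:
    """Header of len(fields)+1 offsets, followed by the field bytes."""
    header_size = 2 * (len(fields) + 1)
    offsets = [header_size]
    body = b""
    for f in fields:
        if f is not None:
            body += f
        offsets.append(header_size + len(body))   # end of this field
    return struct.pack(f"<{len(offsets)}H", *offsets) + body


def get_field(record: bytes, i: int) -> Optional[bytes]:
    """Direct access to field i: read two adjacent offsets, then slice."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return None if start == end else record[start:end]


if __name__ == "__main__":
    rec = encode_record([b"142", b"Bart", None, b"2.3"])
    print(get_field(rec, 1), get_field(rec, 2), get_field(rec, 3))
```

Note that a NULL costs no data bytes here, only its directory entry, and that reaching field i never requires scanning the earlier fields as the delimiter format does.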

LOB fields
Example: CREATE TABLE Student(SID INT, name CHAR(20), age INT, GPA FLOAT, picture BLOB(32000));
- Student records get “de-clustered”
  - Bad because most queries do not involve picture
- Decomposition (automatically done by the DBMS and transparent to the user)
  - Student(SID, name, age, GPA)
  - StudentPicture(SID, picture)

Block layout
How do you organize records in a block?
- Fixed-length records
- Variable-length records
- NSM (N-ary Storage Model) is used in most commercial DBMSs

Page Formats: Fixed Length Records
[Figure: two page layouts. PACKED: slots 1 to N hold records, followed by free space, with the number of records (N) stored on the page. UNPACKED, BITMAP: slots 1 to M, with a bitmap of M bits marking which slots are in use and the number of slots (M) stored on the page]
- Record id = <page id, slot #>.
- In the first alternative, moving records for free space management changes the rid; this may not be acceptable.

NSM
- Store records from the beginning of each block
- Use a directory at the end of each block
  - To locate records and manage free space
  - Necessary for variable-length records
[Figure: a block holding the records (142, Bart, 10, 2.3), (123, Milhouse, 10, 3.1), (456, Ralph, 8, 2.3), and (857, Lisa, 8, 4.3), with the directory at the end]
- Why store data and directory at two different ends?
  - Both can grow easily
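A slotted block in this style can be sketched as below. This is a simplification under stated assumptions: the directory is kept as a Python list rather than being serialized into the tail of the block, although its space (an assumed 4 bytes per entry) is charged against the block so that records and directory compete for the same free space in the middle.

```python
from typing import List, Optional, Tuple


class SlottedPage:
    SLOT_BYTES = 4  # assumed size of one directory entry (offset + length)

    def __init__(self, size: int = 4096) -> None:
        self.size = size
        self.data = bytearray(size)
        self.free_start = 0                       # records grow from the front
        self.slots: List[Tuple[int, int]] = []    # directory grows from the end

    def free_space(self) -> int:
        """Bytes left between the last record and the directory."""
        return self.size - self.free_start - self.SLOT_BYTES * len(self.slots)

    def insert(self, record: bytes) -> Optional[int]:
        """Store a record and return its slot number, or None if it does not fit."""
        if len(record) + self.SLOT_BYTES > self.free_space():
            return None
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        """Follow the directory entry to the record bytes."""
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])


if __name__ == "__main__":
    page = SlottedPage(64)
    s = page.insert(b"142|Bart|10|2.3")
    print(s, page.read(s), page.free_space())
```

Because callers address records by slot number, a record could later be moved within the block by updating only its directory entry.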

Options
- Reorganize after every update/delete to avoid fragmentation (gaps between records)
  - Need to rewrite half of the block on average
- What if records are fixed-length?
  - Reorganize after delete
    - Only need to move one record
    - Need a pointer to the beginning of free space
  - Do not reorganize after update
    - Need a bitmap indicating which slots are in use
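For fixed-length records, the two options can be contrasted in a few lines. The sketch models a block as a Python list of slots; the function names are hypothetical.

```python
from typing import List, Optional


def delete_packed(records: List[str], slot: int) -> None:
    """Reorganize after delete: move the last record into the hole.
    Only one record moves, but that record's slot number changes."""
    last = records.pop()
    if slot < len(records):
        records[slot] = last


def delete_bitmap(in_use: List[bool], slot: int) -> None:
    """Do not reorganize: just clear the slot's bit."""
    in_use[slot] = False


def insert_bitmap(records: List[Optional[str]], in_use: List[bool],
                  record: str) -> Optional[int]:
    """Reuse the first free slot according to the bitmap, if any."""
    for slot, used in enumerate(in_use):
        if not used:
            records[slot] = record
            in_use[slot] = True
            return slot
    return None


if __name__ == "__main__":
    packed = ["a", "b", "c", "d"]
    delete_packed(packed, 1)
    print(packed)   # "d" now occupies slot 1
```

In the packed variant, `len(records)` plays the role of the pointer to the beginning of free space; in the bitmap variant, record ids stay stable because nothing is moved.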

System Catalogs
- For each relation:
  - name, file location, file structure (e.g., heap file)
  - attribute name and type, for each attribute
  - index name, for each index
  - integrity constraints
- For each index:
  - structure (e.g., B+ tree) and search key fields
- For each view:
  - view name and definition
- Plus statistics, authorization, buffer pool size, etc.
- Catalogs are themselves stored as relations!

Attr_Cat(attr_name, rel_name, type, position)

attr_name   rel_name        type      position
attr_name   Attribute_Cat   string    1
rel_name    Attribute_Cat   string    2
type        Attribute_Cat   string    3
position    Attribute_Cat   integer   4
sid         Students        string    1
name        Students        string    2
login       Students        string    3
age         Students        integer   4
gpa         Students        real      5
fid         Faculty         string    1
fname       Faculty         string    2
sal         Faculty         real      3
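Since the catalog is itself a relation, looking up a table's schema is an ordinary query over it. A sketch using a Python list of tuples as a stand-in for the stored relation (only the Students and Faculty rows are included here; `schema_of` is a hypothetical helper):

```python
from typing import List, Tuple

# (attr_name, rel_name, type, position): a subset of the catalog rows.
ATTR_CAT = [
    ("sid", "Students", "string", 1),
    ("name", "Students", "string", 2),
    ("login", "Students", "string", 3),
    ("age", "Students", "integer", 4),
    ("gpa", "Students", "real", 5),
    ("fid", "Faculty", "string", 1),
    ("fname", "Faculty", "string", 2),
    ("sal", "Faculty", "real", 3),
]


def schema_of(rel_name: str) -> List[Tuple[str, str]]:
    """Select the rows for one relation and order them by position,
    much like: SELECT attr_name, type FROM Attr_Cat
               WHERE rel_name = ? ORDER BY position"""
    rows = [row for row in ATTR_CAT if row[1] == rel_name]
    return [(name, typ) for name, _, typ, _ in sorted(rows, key=lambda r: r[3])]


if __name__ == "__main__":
    print(schema_of("Faculty"))
```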

Indexes (a sneak preview)
- A heap file allows us to retrieve records:
  - by specifying the rid, or
  - by scanning all records sequentially
- Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g.,
  - Find all students in the “CS” department
  - Find all students with a gpa > 3
- Indexes are file structures that enable us to answer such value-based queries efficiently.
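As a rough preview, an index can be pictured as a sorted list of (key, rid) entries that is searched by bisection instead of scanning the heap file. This is a stand-in for illustration only; a real DBMS would use a disk-based structure such as the B+ tree mentioned on the catalog slide. The heap contents and the `dept` field below are made-up sample data.

```python
import bisect
from typing import Any, Dict, List, Tuple

Rid = Tuple[int, int]  # <page id, slot #>


class SortedIndex:
    """Sorted (key, rid) entries over one field of a heap file."""

    def __init__(self, heap: Dict[Rid, Dict[str, Any]], field: str) -> None:
        entries = sorted((rec[field], rid) for rid, rec in heap.items())
        self.keys = [key for key, _ in entries]
        self.rids = [rid for _, rid in entries]

    def equal(self, value: Any) -> List[Rid]:
        """Rids of records whose field equals value."""
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return self.rids[lo:hi]

    def greater_than(self, value: Any) -> List[Rid]:
        """Rids of records whose field is strictly greater than value."""
        return self.rids[bisect.bisect_right(self.keys, value):]


if __name__ == "__main__":
    heap = {
        (1, 0): {"name": "Bart", "dept": "CS", "gpa": 2.3},
        (1, 1): {"name": "Milhouse", "dept": "CS", "gpa": 3.1},
        (2, 0): {"name": "Lisa", "dept": "Math", "gpa": 4.3},
    }
    print(SortedIndex(heap, "dept").equal("CS"))
    print(SortedIndex(heap, "gpa").greater_than(3))
```

Both lookups touch only the index entries that match, whereas answering the same queries on the heap file alone would require a full scan.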

Summary
- Disks provide cheap, non-volatile storage.
  - Random access, but cost depends on the location of the page on disk; it is important to arrange data sequentially to minimize seek and rotation delays.

Summary (Contd.)
- DBMS vs. OS file support
  - A DBMS needs features not found in many OSs, e.g., forcing a page to disk, controlling the order of page writes to disk, files spanning disks, and the ability to control pre-fetching and page replacement policy based on predictable access patterns.
- The variable-length record format with a field offset directory offers support for direct access to the i’th field and for null values.