Lecture 15: Midterm Review Data Storage


Lecture 15: Midterm Review Data Storage Monday, October 31, 2005

Midterm: Monday, 11:30, this room (in class). 50 minutes. Open book: notes, notebooks, anything. No computers.

Midterm topics: SQL; E/R Diagrams; Functional Dependencies; XML/XPath/XQuery.

SQL: Know the basics: SFW, GROUP-BY, HAVING… When are two queries equivalent? Eliminating subqueries. Be aware of duplicates. Insert/delete, especially more than one tuple. Constraints in SQL.

E/R Diagrams: Good design (don’t make stupid mistakes). Translation to relations. Many-many vs. many-one relationships. Subtleties: inheritance, union types, weak entity sets.

Functional Dependencies: Know the definition of X → Y. Does a given table satisfy X → Y? Understand inference: if A → B, B → C, does it follow that C → A? Why? Why not? Understand closure: X+. Understand BCNF and 3NF.
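As a study aid for the inference and closure questions above, here is a minimal Python sketch. The function names `closure` and `satisfies` are made up for illustration, and attributes are assumed to be single-character names. `closure` computes X+ by repeatedly applying dependencies whose left side is already covered; `satisfies` checks one dependency against a small in-memory table.

```python
def closure(attrs, fds):
    """Return X+: every attribute determined by attrs under the dependencies fds.

    fds is a list of (lhs, rhs) pairs; each side is a string of
    single-character attribute names (an assumption of this sketch).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once lhs is fully inside the current closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def satisfies(rows, lhs, rhs):
    """Check whether a table (list of dicts) satisfies lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


fds = [("A", "B"), ("B", "C")]
a_plus = closure("A", fds)  # A reaches B, then C
c_plus = closure("C", fds)  # no dependency has C on its left side
```

With these two dependencies, C+ contains only C, so C → A does not follow, while A+ contains all three attributes.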

XML: Basics in XPath and XQuery. In what sense is XML “semistructured”? Mapping relations to XML. Simple ways to store XML data. Exclude the XML index.

Midterm: How to prepare: Read lecture notes. Read from the textbook. Review the homeworks. Make sure you understand.

Outline: Disks (11.3). Representing data elements (12). Recommended reading: entire chapter 11.

The Mechanics of Disk: Mechanical characteristics: rotation speed (5400 RPM), number of platters (1-30), number of tracks (<=10000), number of bytes/track (10^5). Unit of read or write: disk block. Once in memory: page. Typically: 4k or 8k or 16k. [Figure labels: spindle, platters, tracks, sector, cylinder, disk head, arm assembly, arm movement]

Disk Access Characteristics: Disk latency = time between when command is issued and when data is in memory. Disk latency = seek time + rotational latency. Seek time = time for the head to reach cylinder: 10ms – 40ms. Rotational latency = time for the sector to rotate under the head. Rotation time = 10ms; average latency = 10ms/2. Transfer rate = typically 40MB/s. Disks read/write one block at a time.
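To make the numbers on this slide concrete, here is a small worked calculation in Python. It is a sketch under stated assumptions: the function name is made up, 1 MB is taken as 10^6 bytes, the block is 4096 bytes, and the time to transfer the block is added on top of seek time and rotational latency.

```python
def access_time_ms(seek_ms, rotation_ms, block_bytes, transfer_mb_per_s):
    """Estimate the time to read one block, in milliseconds."""
    # On average the wanted sector is half a rotation away from the head.
    rotational_latency_ms = rotation_ms / 2
    # Time to move the block itself at the given rate (1 MB = 10**6 bytes assumed).
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


# Slide values: best-case 10 ms seek, 10 ms rotation, 40 MB/s; 4096-byte block assumed.
t = access_time_ms(seek_ms=10, rotation_ms=10, block_bytes=4096, transfer_mb_per_s=40)
# 10 ms seek + 5 ms rotational latency + about 0.1 ms transfer
```

The transfer term is tiny next to the mechanical terms, which is why reading one block at a time from random positions is so expensive.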

Average Seek Time: Suppose we have N tracks, what is the average seek time? Getting from cylinder x to y takes time |x-y|.
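One way to check an answer to this question is to brute-force it. The sketch below assumes the start and target cylinders are chosen independently and uniformly among the N tracks, which the slide does not state. Under that assumption the average of |x-y| is exactly (N^2 - 1)/(3N), which is close to N/3 for large N.

```python
from fractions import Fraction


def average_seek_distance(n):
    """Average |x - y| over all n*n (start, target) cylinder pairs."""
    total = sum(abs(x - y) for x in range(n) for y in range(n))
    return Fraction(total, n * n)


def closed_form(n):
    """Closed form (n^2 - 1) / (3n) for the same average."""
    return Fraction(n * n - 1, 3 * n)


# For a disk with 10000 tracks the average seek covers about a third of them.
approx = float(closed_form(10000))
```

The names `average_seek_distance` and `closed_form` are illustrative only.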

RAID: Several disks that work in parallel. Redundancy: use parity to recover from disk failure. Speed: read from several disks at once. Various configurations (called levels): RAID 1 = mirror; RAID 4 = n disks + 1 parity disk; RAID 5 = n+1 disks, assign parity blocks round robin; RAID 6 = “Hamming codes”.
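The parity idea behind RAID 4 and RAID 5 can be sketched in a few lines of Python. This is an illustration only: blocks are assumed to be equal-length byte strings, and the function names are made up. The parity block is the byte-wise XOR of the data blocks, so any single lost block equals the XOR of all the others.

```python
from functools import reduce


def parity(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover(surviving_blocks, parity_block):
    """Rebuild the single missing data block from the survivors and the parity."""
    return parity(surviving_blocks + [parity_block])


data = [b"disk", b"read", b"fast"]   # three data disks, one block each
p = parity(data)                     # stored on the parity disk
rebuilt = recover([data[0], data[2]], p)  # disk 1 failed; rebuild its block
```

This handles one failed disk; surviving two simultaneous failures needs a stronger code than single parity.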

Buffer Management in a DBMS: Data must be in RAM for DBMS to operate on it! Table of <frame#, pageid> pairs is maintained. Choice of frame dictated by replacement policy. [Figure: page requests from higher levels; buffer pool in main memory holding disk pages and free frames; READ/WRITE between higher levels and the pool; INPUT/OUTPUT between the pool and the DB on disk]

Buffer Manager: Manages buffer pool: the pool provides space for a limited number of pages from disk. Needs to decide on page replacement policy: LRU, Clock algorithm. Both work well in OS, but not always in DB. Enables the higher levels of the DBMS to assume that the needed data is in main memory.
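A minimal sketch of a buffer pool with LRU replacement, in Python. The class name, the `read_page` callback, and the use of an `OrderedDict` as the page table are assumptions made for illustration. Writing dirty pages back to disk and pinning are left out to keep the replacement policy visible.

```python
from collections import OrderedDict


class BufferPool:
    """Keeps at most num_frames pages in memory, evicting the least recently used."""

    def __init__(self, num_frames, read_page):
        self.num_frames = num_frames
        self.read_page = read_page   # hypothetical callback: page_id -> page contents
        self.frames = OrderedDict()  # page_id -> contents, least recently used first

    def get_page(self, page_id):
        if page_id in self.frames:
            # Hit: mark the page as most recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        if len(self.frames) >= self.num_frames:
            # Miss with a full pool: drop the least recently used page.
            self.frames.popitem(last=False)
        data = self.read_page(page_id)
        self.frames[page_id] = data
        return data
```

Higher levels call `get_page` and never see whether the page came from memory or from disk. A sequential scan larger than the pool defeats LRU, because each page is evicted before it is needed again; this is one reason the policy does not always work well in a DBMS.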

Buffer Manager: Why not use the Operating System for the task? - DBMS may be able to anticipate access patterns - Hence, may also be able to perform prefetching - DBMS needs the ability to force pages to disk, for recovery purposes

Representing Data Elements: Relational database elements: A tuple is represented as a record. The table is a sequence of records. CREATE TABLE Product ( pid INT PRIMARY KEY, name CHAR(20), description VARCHAR(200), maker CHAR(10) REFERENCES Company(name) )

Issues: Represent attributes inside the records. Represent the records inside the blocks.

Record Formats: Fixed Length: Information about field types is the same for all records in a file; stored in system catalogs. Finding the i’th field does not require a scan of the record: its address is the base address B plus the lengths of the preceding fields (e.g. the third field is at Address = B+L1+L2). Note the importance of schema information!
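The address arithmetic for fixed-length records can be sketched with Python's `struct` module. The schema below is a hypothetical all-fixed-length variant of the Product table (pid, name, maker), and the helper names are made up. Field offsets come from the schema alone, so a field is read at base address plus the lengths of the fields before it.

```python
import struct

# Hypothetical fixed-length schema: (field name, struct format code).
# "i" is a 4-byte integer, "20s" and "10s" are fixed-width byte strings.
SCHEMA = [("pid", "i"), ("name", "20s"), ("maker", "10s")]


def field_offsets(schema):
    """Return (offset of each field, total record length), from the schema only."""
    offsets, pos = [], 0
    for _, fmt in schema:
        offsets.append(pos)
        pos += struct.calcsize("<" + fmt)  # "<" means no alignment padding
    return offsets, pos


def read_field(buf, base, schema, i):
    """Read field i of the record that starts at byte offset base in buf."""
    offsets, _ = field_offsets(schema)
    return struct.unpack_from("<" + schema[i][1], buf, base + offsets[i])[0]


offsets, record_len = field_offsets(SCHEMA)
# Two records stored back to back, as in a file of fixed-length records.
file_bytes = (struct.pack("<i20s10s", 7, b"gizmo", b"acme")
              + struct.pack("<i20s10s", 8, b"widget", b"bolt"))
second_maker = read_field(file_bytes, record_len, SCHEMA, 2).rstrip(b"\x00")
```

A real system would keep the schema in its catalog rather than in a module-level constant.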

Record Header: The header holds a pointer to the schema, the record length, and a timestamp; the fields F1, F2, F3, F4 (of lengths L1, L2, L3, L4) follow. Need the header because: The schema may change; for a while new+old may coexist. Records from different relations may coexist.

Variable Length Records: The header holds the record length and other header information; the fields F1, F2, F3, F4 (of lengths L1, L2, L3, L4) follow. Place the fixed fields first: F1. Then the variable length fields: F2, F3, F4. Null values take 2 bytes only. Sometimes they take 0 bytes (when at the end).
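One possible byte layout for such a record, sketched in Python. The layout is an assumption, not the one from the slide's figure: a 2-byte total length, then one 2-byte length slot per variable-length field, then the fixed field, then the variable-length data. A NULL is encoded by a reserved value in its length slot, so it costs 2 bytes and no data. The trick of dropping trailing NULLs entirely is not implemented.

```python
import struct

NULL = 0xFFFF  # assumed marker in a length slot meaning "this field is NULL"


def encode(pid, var_fields):
    """Pack one fixed int field followed by variable-length byte-string fields."""
    lengths = [NULL if f is None else len(f) for f in var_fields]
    body = struct.pack("<i", pid) + b"".join(f for f in var_fields if f is not None)
    header_size = 2 + 2 * len(var_fields)
    # Header: total record length, then one length slot per variable field.
    header = struct.pack("<H%dH" % len(var_fields), header_size + len(body), *lengths)
    return header + body


def decode(record, num_var_fields):
    """Inverse of encode; returns (pid, list of fields with None for NULL)."""
    _total, *lengths = struct.unpack_from("<H%dH" % num_var_fields, record, 0)
    pos = 2 + 2 * num_var_fields
    (pid,) = struct.unpack_from("<i", record, pos)
    pos += 4
    fields = []
    for n in lengths:
        if n == NULL:
            fields.append(None)      # NULL: nothing stored in the body
        else:
            fields.append(record[pos:pos + n])
            pos += n
    return pid, fields


rec = encode(7, [b"gizmo", None, b"acme"])
```

Because the fixed field comes first and all lengths sit in the header, the start of any field can be computed without scanning the data bytes.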

Records With Repeating Fields: The header holds the record length and other header information; the fields F1, F2, F3 (of lengths L1, L2, L3) follow. Needed e.g. in Object Relational systems, or fancy representations of many-many relationships.

Storing Records in Blocks: Blocks have fixed size (typically 4k – 8k). [Figure: a block holding records R1, R2, R3, R4]

Spanning Records Across Blocks: When records are very large. Or even medium size: saves space in blocks. [Figure: two blocks, each with a block header; records R1, R2, R3, with R2 appearing in both blocks]

BLOB: Binary large objects. Supported by modern database systems. E.g. images, sounds, etc. Storage: attempt to cluster blocks together. CLOB = character large objects. Supports only restricted operations.

Modifications: Insertion: File is unsorted: add it to the end (easy). File is sorted: Is there space in the right block? Yes: we are lucky, store it there. Is there space in a neighboring block? Look 1-2 blocks to the left/right, shift records. If everything else fails, create overflow block.
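The decision sequence for inserting into a sorted file can be sketched as follows. This is a simplified Python illustration with made-up names: blocks are sorted lists of keys with a fixed capacity, no block is empty, and only the immediate left and right neighbors are examined rather than 1-2 blocks in each direction.

```python
import bisect


class SortedFile:
    def __init__(self, blocks, capacity):
        self.blocks = [sorted(b) for b in blocks]  # assumes no block is empty
        self.capacity = capacity
        self.overflow = {}  # block index -> keys stored in its overflow block

    def _target(self, key):
        # The last block whose first key is <= key (block 0 for very small keys).
        firsts = [b[0] for b in self.blocks]
        return max(bisect.bisect_right(firsts, key) - 1, 0)

    def insert(self, key):
        i = self._target(key)
        block = self.blocks[i]
        if len(block) < self.capacity:
            bisect.insort(block, key)              # lucky: space in the right block
            return "in place"
        if i + 1 < len(self.blocks) and len(self.blocks[i + 1]) < self.capacity:
            bisect.insort(block, key)
            self.blocks[i + 1].insert(0, block.pop())   # shift largest key right
            return "shifted right"
        if i > 0 and len(self.blocks[i - 1]) < self.capacity:
            bisect.insort(block, key)
            self.blocks[i - 1].append(block.pop(0))     # shift smallest key left
            return "shifted left"
        self.overflow.setdefault(i, []).append(key)     # everything else failed
        return "overflow"
```

Each shift moves the record at the edge of the full block, so the keys stay in order across block boundaries.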

Overflow Blocks: After a while the file starts being dominated by overflow blocks: time to reorganize. [Figure: Block n-1, Block n, Block n+1, and an overflow block]

Modifications: Deletions: Free space in block, shift records. May be able to eliminate an overflow block. Can never really eliminate the record, because others may point to it. Place a tombstone instead (a NULL record). How can we point to a record in an RDBMS?

Modifications: Updates: If new record is shorter than previous, easy. If it is longer, need to shift records, create overflow blocks.

Pointers: Where do we need them in RDBMS? Physical addresses: Each block and each record have a physical address that consists of: the host, the disk, the cylinder number, the track number, the block within the track. For records: an offset in the block’s header. Note: review what a pointer in C is.

Pointers: Logical address: a string of bytes (10-16). More flexible: can move blocks/records around. But need translation table of (logical address, physical address) pairs: (L1, P1), (L2, P2), (L3, P3).

Main Memory Address: When the block is read into main memory, it receives a main memory address. Need another translation table of (memory address, logical address) pairs: (M1, L1), (M2, L2), (M3, L3).

Optimization: Pointer Swizzling = the process of replacing a physical/logical pointer with a main memory pointer. Still need translation table, but subsequent references are faster.

Pointer Swizzling [Figure labels: Disk, Memory, Block 1, Block 2, read in memory, swizzled, unswizzled]

Pointer Swizzling: Automatic: when block is read into main memory, swizzle all pointers in the block. On demand: swizzle only when user requests. No swizzling: always use translation table.

Pointer Swizzling: When blocks return to disk: pointers need to be unswizzled. Danger: someone else may point to this block. Pinned blocks: we don’t allow them to return to disk. Keep a list of references to this block.
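On-demand swizzling and unswizzling can be sketched in Python, using object references in place of main memory addresses. All names here (`Block`, `Memory`, `follow`, `unswizzle`) are hypothetical, and a plain dictionary stands in for the disk. A pointer is stored as a logical address until it is first followed; from then on it refers directly to the in-memory block.

```python
class Block:
    def __init__(self, logical_addr, pointers):
        self.logical_addr = logical_addr
        # Each entry is a logical address (str) or, once swizzled, a Block.
        self.pointers = list(pointers)


class Memory:
    def __init__(self, disk):
        self.disk = disk   # stand-in for disk: logical address -> list of pointers
        self.table = {}    # translation table: logical address -> in-memory Block

    def load(self, addr):
        """Read a block into memory once and record it in the translation table."""
        if addr not in self.table:
            self.table[addr] = Block(addr, self.disk[addr])
        return self.table[addr]

    def follow(self, block, i):
        """Follow pointer i, swizzling it on first use."""
        target = block.pointers[i]
        if not isinstance(target, Block):
            target = self.load(target)   # one translation-table lookup
            block.pointers[i] = target   # later references skip the table
        return target

    def unswizzle(self, block):
        """Pointer list as it must look before the block returns to disk."""
        return [p.logical_addr if isinstance(p, Block) else p for p in block.pointers]
```

The sketch does not track which blocks point at a given block, so it cannot tell when a block must stay pinned; that bookkeeping is the list of references mentioned on the slide.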