

Data Storage and Access Methods Min Song IS698

Database Design Process
[Figure labels: conceptual requirements for Applications 1 to 4; Conceptual Model; Logical Model; External Models for Applications 1 to 4; Internal Model; Physical Design.]

Physical Database Design
- Many physical database design decisions are implicit in the technology adopted.
- Organizations may also have standards or an "information architecture" that specifies operating systems, DBMS, and data access languages, thus constraining the range of possible physical implementations.
- We will be concerned with some of the possible physical implementation issues.

Physical Database Design
- The primary goal of physical database design is data processing efficiency.
- We will concentrate on choices often available to optimize the performance of database services.
- Physical database design requires information gathered during earlier stages of the design process.

Physical Design Information
- Information needed for physical file and database design includes:
  - Normalized relations, plus size estimates for them
  - Definitions of each attribute
  - Descriptions of where and when data are used: entered, retrieved, deleted, and updated, and how often
  - Expectations and requirements for response time, data security, backup, recovery, retention, and integrity
  - Descriptions of the technologies used to implement the database

Physical Design Decisions
- Several critical decisions affect the integrity and performance of the system:
  - Storage format
  - Physical record composition
  - Data arrangement
  - Indexes
  - Query optimization and performance tuning

Storage Format
- Choose the storage format of each field (attribute). The DBMS provides a set of data types that can be used for the physical storage of fields in the database.
- The data type (format) is chosen to minimize storage space and maximize data integrity.

Objectives of Data Type Selection
- Minimize storage space
- Represent all possible values
- Improve data integrity
- Support all data manipulations
- The correct data type should, in minimal space, represent every possible value of the associated attribute (while excluding illegal values) and support the required data manipulations (e.g., numerical or string operations).

Access Data Types
- Numeric (1, 2, 4, or 8 bytes; fixed or float)
- Text (255 max)
- Memo (64000 max)
- Date/Time (8 bytes)
- Currency (8 bytes; 15 digits + 4 decimal digits)
- Autonumber (4 bytes)
- Yes/No (1 bit)
- OLE (limited only by disk space)
- Hyperlinks (up to chars)

Access Numeric Types
- Byte: stores numbers from 0 to 255 (no fractions). 1 byte.
- Integer: stores numbers from -32,768 to 32,767 (no fractions). 2 bytes.
- Long Integer (default): stores numbers from -2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes.
- Single: stores floating-point numbers from approximately -3.4E38 to -1.4E-45 for negative values and from approximately 1.4E-45 to 3.4E38 for positive values. 4 bytes.
- Double: stores floating-point numbers from approximately -1.8E308 to -4.9E-324 for negative values and from approximately 4.9E-324 to 1.8E308 for positive values. 15 digits of decimal precision. 8 bytes.
- Replication ID: globally unique identifier (GUID). Decimal precision N/A. 16 bytes.
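
As an illustration of the "minimize storage space, represent all possible values" objective, the following sketch picks the smallest whole-number type from the list above that covers an attribute's expected range. The function name and the sample ranges are assumptions made for this example; they are not part of Access.

```python
# Whole-number ranges taken from the Access numeric types listed above:
# (type name, lowest value, highest value, storage in bytes).
INTEGER_TYPES = [
    ("Byte", 0, 255, 1),
    ("Integer", -32_768, 32_767, 2),
    ("Long Integer", -2_147_483_648, 2_147_483_647, 4),
]


def smallest_integer_type(lowest, highest):
    """Return (type name, bytes) of the smallest type covering [lowest, highest].

    Hypothetical helper: it only illustrates the selection rule.
    """
    for name, lo, hi, size in INTEGER_TYPES:
        if lo <= lowest and highest <= hi:
            return name, size
    raise ValueError("range does not fit any whole-number type listed")


print(smallest_integer_type(0, 120))        # e.g. an age attribute
print(smallest_integer_type(-500, 20_000))  # e.g. a signed quantity
print(smallest_integer_type(0, 1_000_000))  # e.g. a large counter
```

A type that is too small cannot represent every legal value, while a type that is too large wastes space and admits more illegal values.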

Designing Physical Records
- A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit.
- Fields may be fixed length or variable length.

Data Storage
- Storing data: disks
- Buffer manager
- Representing relational data on disk

The Memory Hierarchy
- Processor cache: 512K; access time 10 ns
- Main memory (= disk cache): volatile; 256M-1G; access time: nanoseconds
- Disk: persistent; GB storage; transfer rate 5-10 MB/s; access time: msecs
- Tape: 1.5 MB/s transfer rate; 280 GB typical capacity; only sequential access; not for operational data

Main Memory
- Fastest, most expensive (excluding cache)
- Today: 512MB is common even on PCs
- Many databases could fit in memory
  - New industry trend: main memory databases, e.g., TimesTen
- Main issue is volatility

Secondary Storage
- Disks
- Slower and cheaper than main memory
- Persistent
- The unit of disk I/O is the block
  - Typically 1 block = 4k
  - A disk block is also called a disk page, or simply a page
- Used with a main memory buffer

Block
- The blocking factor (bfr) for a file is the average number of records stored in a disk block.
- Suppose the block size of a database system is 2000 bytes. The Customer table has an average record length of 190 bytes. Assume the overhead of a block for the data is 100 bytes. What is the blocking factor?
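
One way to work the question above, assuming records are not spanned across blocks (only whole records count) and that the 100-byte overhead is subtracted from the block before packing. The function name and the 10,000-record file size are assumptions for this example.

```python
import math


def blocking_factor(block_size, record_length, block_overhead=0):
    """Whole records that fit in one block after subtracting block overhead."""
    usable = block_size - block_overhead
    return usable // record_length


bfr = blocking_factor(block_size=2000, record_length=190, block_overhead=100)
print(bfr)  # (2000 - 100) // 190 = 10

# Hypothetical follow-up: blocks needed to hold 10,000 customer records.
print(math.ceil(10_000 / bfr))  # 1000
```

Under these assumptions the usable space is 1900 bytes, so exactly 10 records of 190 bytes fit in each block.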

The Mechanics of Disk
Mechanical characteristics:
- Rotation speed (5400 RPM)
- Number of platters (1-30)
- Number of tracks (<= 10000)
- Number of sectors (256/track)
- Number of bytes per sector (2^9 = 512)
- Block size (2^12 = 4096)
[Figure labels: platters, spindle, disk head, arm movement, arm assembly, tracks, sector, cylinder.]
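
The figures above can be combined into a rough geometry calculation. This is only a sketch: the number of recording surfaces is an assumption (4 double-sided platters), and the track count uses the slide's upper bound.

```python
BYTES_PER_SECTOR = 2 ** 9    # 512, from the slide
SECTORS_PER_TRACK = 256      # from the slide
BLOCK_SIZE = 2 ** 12         # 4096, from the slide
TRACKS_PER_SURFACE = 10_000  # the slide gives this as an upper bound
SURFACES = 8                 # assumption: 4 platters, both sides used

sectors_per_block = BLOCK_SIZE // BYTES_PER_SECTOR
blocks_per_track = SECTORS_PER_TRACK // sectors_per_block
capacity_bytes = (
    SURFACES * TRACKS_PER_SURFACE * SECTORS_PER_TRACK * BYTES_PER_SECTOR
)

print(sectors_per_block)  # 8 sectors make up one block
print(blocks_per_track)   # 32 blocks per track
print(capacity_bytes)     # 10485760000 bytes, roughly 10 GB
```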

Important Disk Access Characteristics
- Block access time = disk latency + transfer time
- Disk latency = seek time + rotational latency
- Seek time = time for the head to reach the right track (10 ms to 40 ms)
- Rotational latency = rotation time to get to the right sector
  - Time for one rotation = 10 ms
  - Average rotational latency = 10 ms / 2
- Transfer rate = typically 5-10 MB/s (transfer time = block size / transfer rate)
- Disks read/write one block at a time (typically 4 kB)
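
The formulas above can be put together as a small estimate. The sketch below assumes a 10 ms seek (the low end of the slide's range), a 4096-byte block, and 1 MB = 1,000,000 bytes; the function name is made up for this example.

```python
def block_access_time_ms(seek_ms, rotation_ms, block_bytes, transfer_bytes_per_s):
    """Block access time = seek time + average rotational latency + transfer time."""
    rotational_latency_ms = rotation_ms / 2  # on average, half a rotation
    transfer_ms = block_bytes / transfer_bytes_per_s * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


t = block_access_time_ms(
    seek_ms=10,                      # assumed: low end of the 10-40 ms range
    rotation_ms=10,                  # from the slide
    block_bytes=4096,                # one 4 kB block
    transfer_bytes_per_s=5_000_000,  # 5 MB/s, low end of the slide's range
)
print(round(t, 2))  # about 15.82 ms
```

With these numbers the transfer itself takes under 1 ms, so seek time and rotational latency dominate. This is why physical design tries to keep related records in the same or adjacent blocks.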

Representing Data Elements
- Relational database elements:
    CREATE TABLE Product (
        pid INT PRIMARY KEY,
        name CHAR(20),
        description VARCHAR(200),
        maker CHAR(10) REFERENCES Company(name))
- A tuple is represented as a record

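Record Formats: Fixed Length
- Information about field types is the same for all records in a file; it is stored in the system catalogs.
- Finding the i'th field does not require a scan of the record: its offset follows from the field lengths.
- Note the importance of schema information!
[Figure: a record starting at base address B with fields F1 to F4 of lengths L1 to L4; the address of F3 is B + L1 + L2.]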
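
The address rule in the figure (address of the third field = B + L1 + L2) generalizes to any field. The sketch below uses made-up field lengths and a made-up base address to show the arithmetic.

```python
from itertools import accumulate


def field_offsets(lengths):
    """Offset of each field from the record's base address."""
    return [0] + list(accumulate(lengths))[:-1]


def field_address(base, lengths, i):
    """Address of the i-th field (1-based): B + L1 + ... + L(i-1)."""
    return base + sum(lengths[: i - 1])


lengths = [4, 20, 10, 8]  # hypothetical L1..L4 in bytes
print(field_offsets(lengths))           # [0, 4, 24, 34]
print(field_address(1000, lengths, 3))  # 1000 + 4 + 20 = 1024
```

Because the lengths come from the schema in the system catalog, the offsets can be computed once per file rather than once per record.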

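Record Header
[Figure: a record with fields F1 to F4 of lengths L1 to L4, preceded by a header holding a pointer to the schema, the record length, and a timestamp.]
The header is needed because:
- The schema may change; for a while, new and old records may coexist
- Records from different relations may coexist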

Variable Length Records
[Figure: a record with fields F1 to F4 of lengths L1 to L4, preceded by a header holding the record length and other header information.]
- Place the fixed-length fields first: F1, F2
- Then the variable-length fields: F3, F4
- Null values take only 2 bytes
- Sometimes they take 0 bytes (when at the end)
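
A minimal sketch of one possible variable-length layout, using the Product fields as an example. The exact layout is an assumption made for illustration: a header with the total record length and one 2-byte length per variable field, then the fixed fields (pid, maker), then the variable fields (name, description). Real systems differ in the details.

```python
import struct

FIXED_FORMAT = "<i10s"  # pid: 4-byte int, maker: 10 bytes (null-padded)
HEADER_FORMAT = "<HHH"  # total length, len(name), len(description)


def encode_record(pid, maker, name, description):
    """Header first, then fixed-length fields, then variable-length fields."""
    fixed = struct.pack(FIXED_FORMAT, pid, maker.encode("ascii"))
    name_b = name.encode("utf-8")
    desc_b = description.encode("utf-8")
    total = struct.calcsize(HEADER_FORMAT) + len(fixed) + len(name_b) + len(desc_b)
    header = struct.pack(HEADER_FORMAT, total, len(name_b), len(desc_b))
    return header + fixed + name_b + desc_b


def decode_record(buf):
    """Inverse of encode_record; variable fields are located via the header."""
    _total, len_name, len_desc = struct.unpack_from(HEADER_FORMAT, buf, 0)
    fixed_at = struct.calcsize(HEADER_FORMAT)
    pid, maker = struct.unpack_from(FIXED_FORMAT, buf, fixed_at)
    start = fixed_at + struct.calcsize(FIXED_FORMAT)
    name = buf[start:start + len_name].decode("utf-8")
    desc = buf[start + len_name:start + len_name + len_desc].decode("utf-8")
    return pid, maker.rstrip(b"\x00").decode("ascii"), name, desc


rec = encode_record(7, "Acme", "Widget", "")
print(len(rec))            # 6-byte header + 14 fixed bytes + 6 bytes of name
print(decode_record(rec))
```

In this layout an empty variable field costs only its 2-byte length entry in the header, which is similar in spirit to the slide's remark that null values take 2 bytes.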

Records With Referencing Fields
[Figure: a record with fields F1 to F3 of lengths L1 to L3, preceded by a header holding the record length and other header information.]
- E.g., to represent one-to-many or many-to-many relationships

Storing Records in Blocks
- Blocks have a fixed size (typically 4k)
[Figure: a block containing records R1, R2, R3, and R4.]

Spanning Records Across Blocks
- When records are very large
- Or even medium size: saves space in blocks
[Figure: two blocks, each with a block header; records R1, R2, and R3 are stored across them.]

BLOB
- Binary large objects
- Supported by modern database systems
- E.g., images, sounds, etc.
- Storage: attempt to cluster blocks together

Modifications: Insertion
- File is unsorted: add the record to the end
- File is sorted:
  - Is there space in the right block?
    - Yes: we are lucky, store it there
  - Is there space in a neighboring block?
    - Look 1-2 blocks to the left/right, shift records
  - If everything else fails, create an overflow block

Overflow Blocks
- After a while the file starts being dominated by overflow blocks: time to reorganize
[Figure: Block n-1, Block n, Block n+1, with an overflow block attached.]
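
The insertion steps for a sorted file can be sketched as follows. This is a simplified illustration under stated assumptions: each block is a sorted Python list with a made-up capacity of 3 records, only the immediate neighbors are checked, and overflow blocks are modeled as a dictionary keyed by block index.

```python
import bisect

BLOCK_CAPACITY = 3  # hypothetical number of records per block


def insert_sorted(blocks, overflow, key):
    """Insert key into a sorted file; return which strategy was used."""
    # The "right" block is the first one whose largest key is >= key,
    # or the last block if the new key is larger than everything stored.
    target = next(
        (i for i, b in enumerate(blocks) if b and key <= b[-1]),
        len(blocks) - 1,
    )
    block = blocks[target]
    if len(block) < BLOCK_CAPACITY:
        bisect.insort(block, key)
        return "right block"
    right = target + 1
    if right < len(blocks) and len(blocks[right]) < BLOCK_CAPACITY:
        bisect.insort(block, key)
        blocks[right].insert(0, block.pop())  # shift largest record right
        return "shifted right"
    left = target - 1
    if left >= 0 and len(blocks[left]) < BLOCK_CAPACITY:
        bisect.insort(block, key)
        blocks[left].append(block.pop(0))     # shift smallest record left
        return "shifted left"
    overflow.setdefault(target, []).append(key)
    return "overflow"


blocks = [[10, 20, 30], [40, 50], [70, 80, 90]]
overflow = {}
print(insert_sorted(blocks, overflow, 25), blocks)
print(insert_sorted(blocks, overflow, 60), overflow)
```

Once most insertions end in the overflow branch, lookups have to read extra blocks, which is the signal that the file should be reorganized.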

Modifications: Deletions
- Free space in the block, shift records
- May be able to eliminate an overflow block

Modifications: Updates
- If the new record is shorter than the previous one, it is easy
- If it is longer, need to shift records or create overflow blocks

Physical Addresses
- Each block and each record have a physical address that consists of:
  - The host
  - The disk
  - The cylinder number
  - The track number
  - The block within the track
  - For records: an offset in the block (sometimes this is in the block's header)

Logical Addresses
- Logical address: a string of bytes (10-16)
- More flexible: can move blocks/records around
- But needs a translation table (logical address -> physical address): L1 -> P1, L2 -> P2, L3 -> P3

Main Memory Address
- When the block is read into main memory, it receives a main memory address
- The buffer manager has another translation table (memory address -> logical address): M1 -> L1, M2 -> L2, M3 -> L3
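
The two translation tables can be modeled as two dictionaries. The sketch below is illustrative only: the class names, the sample addresses, and the shape of a physical address (host, disk, cylinder, track, block) are assumptions based on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalAddress:
    host: str
    disk: int
    cylinder: int
    track: int
    block: int


class AddressTables:
    def __init__(self):
        self.logical_to_physical = {}  # translation table on disk
        self.memory_to_logical = {}    # buffer manager's table

    def place(self, logical, physical):
        """Record (or change) where a logical block physically lives."""
        self.logical_to_physical[logical] = physical

    def load(self, logical, memory_address):
        """Note that a block has been read into main memory."""
        if logical not in self.logical_to_physical:
            raise KeyError(logical)
        self.memory_to_logical[memory_address] = logical

    def physical_for_memory(self, memory_address):
        logical = self.memory_to_logical[memory_address]
        return self.logical_to_physical[logical]


tables = AddressTables()
tables.place("L1", PhysicalAddress("hostA", 0, 12, 3, 7))
tables.load("L1", 0x1000)
# Moving the block only changes one table entry; "L1" stays valid everywhere.
tables.place("L1", PhysicalAddress("hostA", 0, 40, 1, 2))
print(tables.physical_for_memory(0x1000))
```

This is the flexibility the slides refer to: anything that stores the logical address is unaffected when the block moves on disk.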

Designing Physical/Internal Model
- Overview
- Terminology
- Access methods

Physical Design
- Internal Model/Physical Model
[Figure labels: user request; Interface 1; DBMS; External Model; Interface 2; Internal Model Access Methods; Interface 3; Operating System Access Methods; Data Base.]

Physical Design
- Interface 1: user request to the DBMS. The user presents a query, and the DBMS determines which physical DBs are needed to resolve the query.
- Interface 2: the DBMS uses an internal model access method to access the data stored in a logical database.
- Interface 3: the internal model access methods and OS access methods access the physical records of the database.

Physical File Design
- A physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records.
- Pointer: a field of data that can be used to locate a related field or record of data.
- Access method: an operating system algorithm for storing and locating data in secondary storage.
- Page: the amount of data read or written in one disk input or output operation.

Internal Model Access Methods
- Many types of access methods:
  - Physical Sequential
  - Indexed Sequential
  - Indexed Random
  - Inverted
  - Direct
  - Hashed
- They differ in:
  - Access efficiency
  - Storage efficiency

Physical Sequential
- Key values of the physical records are in logical sequence
- Main use is for "dump" and "restore"
- Access method may be used for storage as well as retrieval
- Storage efficiency is near 100%
- Access efficiency is poor (unless the physical records are of fixed size)

Indexed Sequential
- Key values of the physical records are in logical sequence
- Access method may be used for storage and retrieval
- An index of key values is maintained, with entries for the highest key values per block(s)
- Access efficiency depends on the levels of index, the storage allocated for the index, the number of database records, and the amount of overflow
- Storage efficiency depends on the size of the index and the volatility of the database

Indexed Sequential
Index (actual value -> address, i.e. block number): Dumpling -> 1, Harty -> 2, Texaci -> 3, ...
Data file:
- Block 1: Adams, Becker, Dumpling
- Block 2: Getta, Harty
- Block 3: Mobile, Sunoci, Texaci
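
A lookup through this kind of index can be sketched with a binary search over the highest key of each block. The block contents below follow the figure, grouped so that each index entry is the highest key in its block; the function name is made up for this example.

```python
import bisect

# Data file: keys are in logical sequence across the blocks.
DATA_BLOCKS = {
    1: ["Adams", "Becker", "Dumpling"],
    2: ["Getta", "Harty"],
    3: ["Mobile", "Sunoci", "Texaci"],
}
# Sparse index: one entry per block, holding that block's highest key.
INDEX_KEYS = ["Dumpling", "Harty", "Texaci"]
INDEX_BLOCKS = [1, 2, 3]


def lookup(key):
    """Return the block number holding key, or None if it is not stored."""
    # First index entry whose key is >= the search key.
    i = bisect.bisect_left(INDEX_KEYS, key)
    if i == len(INDEX_KEYS):
        return None  # larger than every key in the file
    block_no = INDEX_BLOCKS[i]
    return block_no if key in DATA_BLOCKS[block_no] else None


print(lookup("Getta"))   # 2
print(lookup("Mobile"))  # 3
print(lookup("Carter"))  # None: block 1 is read, but the key is absent
```

Only one data block is read per lookup, and the index stays small because it has one entry per block rather than one per record.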

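Indexed Sequential: Two Levels
[Figure: a top-level index of (key value, address) entries with addresses 7, 8, 9, ... pointing to second-level indexes, which hold (key value, address) entries with addresses 1-2, 3-4, and 5-6. The key values are not recoverable from the transcript.]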

Indexed Random
- Key values of the physical records are not necessarily in logical sequence
- The index may be stored and accessed with the Indexed Sequential Access Method
- The index has an entry for every database record. The index keys are in ascending logical sequence; the database records are not necessarily in ascending sequence.
- Access method may be used for storage and retrieval

Indexed Random
Index (actual value -> address, i.e. block number), in key order: Adams, Becker, Dumpling, Getta, Harty
Data file, in physical order: Becker, Harty, Adams, Getta, Dumpling
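
A dense index over an unordered data file can be sketched as follows. The physical order of the records follows the figure; the addresses 1 to 5 are an assumption, since the figure's address values did not survive in the transcript.

```python
import bisect

# Records in physical (arrival) order; assumed addresses are 1, 2, 3, ...
DATA_FILE = ["Becker", "Harty", "Adams", "Getta", "Dumpling"]

# Dense index: one (key, address) entry per record, kept in key order.
INDEX = sorted((key, addr) for addr, key in enumerate(DATA_FILE, start=1))
INDEX_KEYS = [key for key, _ in INDEX]


def find(key):
    """Return the address of the record with this key, or None."""
    i = bisect.bisect_left(INDEX_KEYS, key)
    if i < len(INDEX_KEYS) and INDEX_KEYS[i] == key:
        return INDEX[i][1]
    return None


print(INDEX)          # keys ascend; addresses do not
print(find("Getta"))  # 4
print(find("Mobile")) # None
```

Unlike the indexed sequential case, a missing index entry means the record does not exist, so no data block has to be read for an unsuccessful search.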

B-tree
[Figure: a B-tree whose root holds the keys F, P, Z; its three child nodes hold B, D, F; H, L, P; and R, S, Z. The leaf records shown are Aces, Boilers, Cars, Devils, Flyers, Hawkeyes, Hoosiers, Minors, Panthers, and Seminoles.]

Inverted
- Key values of the physical records are not necessarily in logical sequence
- Access method is better used for retrieval
- An index may be built for every field to be inverted
- Access efficiency depends on the number of database records, the levels of index, and the storage allocated for the index

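Inverted
[Figure: an index on course number (actual values CH 145, CS 201, CS 623, PH 345), where each entry lists the addresses of the matching records, and a data file of student names (Adams, Becker, Dumpling, Getta, Harty, Mobile) with their course numbers (CH145, CS201, CH145, CS623, ...). The address lists are only partly recoverable from the transcript.]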
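
An inverted index on a non-key field maps each field value to the addresses of all records that contain it. The sketch below uses the first four students from the figure; the record addresses (101 to 104) are made up, because the figure's address lists are not fully recoverable.

```python
from collections import defaultdict

# address -> (student name, course number); addresses are hypothetical.
RECORDS = {
    101: ("Adams", "CH145"),
    102: ("Becker", "CS201"),
    103: ("Dumpling", "CH145"),
    104: ("Getta", "CS623"),
}


def build_inverted_index(records, field_position):
    """Map each value of the chosen field to the list of record addresses."""
    index = defaultdict(list)
    for address, record in sorted(records.items()):
        index[record[field_position]].append(address)
    return dict(index)


course_index = build_inverted_index(RECORDS, field_position=1)
print(course_index)

# Retrieval: all students enrolled in CH145.
print([RECORDS[a][0] for a in course_index.get("CH145", [])])
```

Every inverted field needs its own index, and each insert or update must maintain all of them, which is why the method is better suited to retrieval.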

Direct
- Key values of the physical records are not necessarily in logical sequence
- There is a one-to-one correspondence between a record key and the physical address of the record
- May be used for storage and retrieval
- Access efficiency is always 1
- Storage efficiency depends on the density of keys
- No duplicate keys permitted
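
A minimal sketch of direct access, assuming integer keys in a known range and the simplest possible one-to-one mapping (address = key minus the lowest key). The class and its methods are hypothetical and serve only to illustrate the properties listed above.

```python
class DirectFile:
    def __init__(self, min_key, max_key):
        self.min_key = min_key
        # One slot is reserved for every possible key in the range.
        self.slots = [None] * (max_key - min_key + 1)

    def address(self, key):
        """One-to-one mapping from key to slot address."""
        addr = key - self.min_key
        if not 0 <= addr < len(self.slots):
            raise KeyError(key)
        return addr

    def store(self, key, record):
        addr = self.address(key)
        if self.slots[addr] is not None:
            raise ValueError("duplicate key")  # no duplicate keys permitted
        self.slots[addr] = record

    def fetch(self, key):
        return self.slots[self.address(key)]   # always a single access

    def storage_efficiency(self):
        used = sum(slot is not None for slot in self.slots)
        return used / len(self.slots)


f = DirectFile(1000, 1009)
f.store(1000, "first")
f.store(1004, "second")
print(f.fetch(1004))
print(f.storage_efficiency())  # 2 of 10 slots used
```

With sparse keys most slots stay empty, which is the sense in which storage efficiency depends on the density of keys.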

Hashing
- Key values of the physical records are not necessarily in logical sequence
- Many key values may share the same physical address (block)
- May be used for storage and retrieval
- Access efficiency depends on the distribution of keys, the algorithm for key transformation, and the space allocated
- Storage efficiency depends on the distribution of keys and the algorithm used for key transformation
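
The sketch below shows one common key transformation, division-remainder (key modulo the number of blocks), with one overflow area per block. The block count, block capacity, and class name are assumptions chosen for illustration.

```python
class HashedFile:
    def __init__(self, n_blocks, block_capacity):
        self.block_capacity = block_capacity
        self.blocks = [[] for _ in range(n_blocks)]
        self.overflow = [[] for _ in range(n_blocks)]

    def block_for(self, key):
        """Key transformation: division-remainder hashing."""
        return key % len(self.blocks)

    def store(self, key, record):
        b = self.block_for(key)
        if len(self.blocks[b]) < self.block_capacity:
            self.blocks[b].append((key, record))
        else:
            self.overflow[b].append((key, record))  # home block is full

    def fetch(self, key):
        """Return (record, number of blocks read)."""
        b = self.block_for(key)
        for reads, chain in enumerate((self.blocks[b], self.overflow[b]), start=1):
            for k, record in chain:
                if k == key:
                    return record, reads
        return None, 2  # both the home block and its overflow were read


f = HashedFile(n_blocks=5, block_capacity=2)
for key, rec in [(12, "a"), (17, "b"), (22, "c")]:  # all hash to block 2
    f.store(key, rec)
print(f.fetch(12))  # found in the home block: one read
print(f.fetch(22))  # found in overflow: two reads
```

When many keys collide, as 12, 17, and 22 do here, records spill into overflow and access efficiency drops, which is why the distribution of keys and the space allocated both matter.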

Comparative Access Methods
- Storage space. Sequential: no wasted space. Indexed: no wasted space for data, but extra space for the index. Hashed: more space needed for addition and deletion of records after the initial load.
- Sequential retrieval on primary key. Sequential: very fast. Indexed: moderately fast. Hashed: impractical.
- Random retrieval. Sequential: impractical. Indexed: moderately fast. Hashed: very fast.
- Multiple key retrieval. Sequential: possible, but needs a full scan. Indexed: very fast with multiple indexes. Hashed: not possible.
- Deleting records. Sequential: can create wasted space. Indexed: OK if dynamic. Hashed: very easy.
- Adding records. Sequential: requires rewriting the file. Indexed: OK if dynamic. Hashed: very easy.
- Updating records. Sequential: usually requires rewriting the file. Indexed: easy, but requires maintenance of indexes. Hashed: very easy.