Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Storage and Access Methods Min Song IS698. Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual.

Similar presentations


Presentation on theme: "Data Storage and Access Methods Min Song IS698. Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual."— Presentation transcript:

1 Data Storage and Access Methods Min Song IS698

2 Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual requirements Conceptual requirements Conceptual requirements Application 1 Application 2Application 3Application 4 Application 2 Application 3 Application 4 External Model External Model External Model Internal Model Physical Design

3 Physical Database Design  Many physical database design decisions are implicit in the technology adopted Also, organizations may have standards or an “information architecture” that specifies operating systems, DBMS, and data access languages -- thus constraining the range of possible physical implementations.  We will be concerned with some of the possible physical implementation issues

4 Physical Database Design  The primary goal of physical database design is data processing efficiency  We will concentrate on choices often available to optimize performance of database services  Physical Database Design requires information gathered during earlier stages of the design process

5 Physical Design Information  Information needed for physical file and database design includes: Normalized relations plus size estimates for them Definitions of each attribute Descriptions of where and when data are used  entered, retrieved, deleted, updated, and how often Expectations and requirements for response time, and data security, backup, recovery, retention and integrity Descriptions of the technologies used to implement the database

6 Physical Design Decisions  There are several critical decisions that will affect the integrity and performance of the system Storage Format Physical record composition Data arrangement Indexes Query optimization and performance tuning

7 Storage Format  Choosing the storage format of each field (attribute). The DBMS provides some set of data types that can be used for the physical storage of fields in the database  Data Type (format) is chosen to minimize storage space and maximize data integrity

8 Objectives of data type selection  Minimize storage space  Represent all possible values  Improve data integrity  Support all data manipulations  The correct data type should, in minimal space, represent every possible value (but eliminate illegal values) for the associated attribute and can support the required data manipulations (e.g. numerical or string operations)

9 Access Data Types  Numeric (1, 2, 4, 8 bytes, fixed or float)  Text (255 max)  Memo (64000 max)  Date/Time (8 bytes)  Currency (8 bytes, 15 digits + 4 digits decimal)  Autonumber (4 bytes)  Yes/No (1 bit)  OLE (limited only by disk space)  Hyperlinks (up to 64000 chars)

10 Access Numeric types  Byte Stores numbers from 0 to 255 (no fractions). 1 byte  Integer Stores numbers from –32,768 to 32,767 (no fractions) 2 bytes  Long Integer(Default) Stores numbers from –2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes  Single Stores numbers from -3.402823E38 to –1.401298E–45 for negative values and from 1.401298E–45 to 3.402823E38 for positive values.4 bytes  Double Stores numbers from –1.79769313486231E308 to – 4.94065645841247E–324 for negative values and from 1.79769313486231E308 to 4.94065645841247E–324 for positive values.158 bytes  Replication ID Globally unique identifier (GUID)N/A16 bytes

11 Designing Physical Records  A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit  Fixed Length and variable fields

12 Data Storage  Storing Data: Disks  Buffer manager  Representing relational data in a disk

13 The Memory Hierarchy Main Memory = Disk Cache Volatile 256M-1G Access time: 10-100 nanoseconds Persistent 10-100 GB storage speed: Rate=5-10 MB/S Access time= 10-15 msecs. 1.5 MB/S transfer rate 280 GB typical capacity Only sequential access Not for operational data Processor Cache: access time 10 nano’s 512K Disk Tape

14 Main Memory  Fastest, most expensive (excluding cache)  Today: 512MB are common even on PCs  Many databases could fit in memory New industry trend: Main Memory Database E.g TimesTen  Main issue is volatility

15 Secondary Storage  Disks  Slower, cheaper than main memory  Persistent !!!  The unit of disk I/O = block Typically 1 block = 4k A disk block is also called a disk page or simply a page  Used with a main memory buffer

16 Block  Blocking factor (bfr) for a file is the average number of records stored in a disk block.  Suppose the block size of a database system is 2000 bytes. Customer table has an average record length of 190 bytes. Assume the overhead of a block for the data is 100 bytes. What is the blocking factor?

17 The Mechanics of Disk Mechanical characteristics:  Rotation speed (5400RPM)  Number of platters (1-30)  Number of tracks (<=10000)  Number of sectors (256/track)  Number of bytes / sector (2 9 =512)  Block size (2 12 =4096) Platters Spindle Disk head Arm movement Arm assembly Tracks Sector Cylinder

18 Important Disk Access Characteristics  Block access time = Disk latency + transfer time  Disk latency = seek time + rotational latency  Seek time = time for the head to reach the right track 10ms – 40ms  Rotational latency = rotation time to get to the right sector Time for one rotation = 10ms Average rotation latency = 10ms/2  Transfer time = typically 5-10MB/s  Disks read/write one block at a time (typically 4kB)

19 Representing Data Elements  Relational database elements: CREATE TABLE Product ( pid INT PRIMARY KEY, name CHAR(20), description VARCHAR(200), maker CHAR(10) REFERENCES Company(name))  A tuple is represented as a record

20 Record Formats: Fixed Length  Information about field types same for all records in a file; stored in system catalogs.  Finding i ’ th field requires scan of record.  Note the importance of schema information! Base address (B) L1L2 L3L4 F1F2 F3F4 Address = B+L1+L2

21 Record Header L1L2 L3L4 F1F2 F3F4 To schema length timestamp Need the header because: The schema may change for a while new+old may coexist Records from different relations may coexist header

22 Variable Length Records L1L2 L3L4 F1F2 F3F4 Other header information length Place the fixed fields first: F1, F2 Then the variable length fields: F3, F4 Null values take 2 bytes only Sometimes they take 0 bytes (when at the end) header

23 Records With Referencing Fields L1L2 L3 F1F2 F3 Other header information length header E.g. to represent one-many or many-many relationships

24 Storing Records in Blocks  Blocks have fixed size (typically 4k) R1R2R3 BLOCK R4

25 Spanning Records Across Blocks  When records are very large  Or even medium size: saves space in blocks block header block header R1R2 R3

26 BLOB  Binary large objects  Supported by modern database systems  E.g. images, sounds, etc.  Storage: attempt to cluster blocks together

27 Modifications: Insertion  File is unsorted add it to the end  File is sorted: Is there space in the right block ?  Yes: we are lucky, store it there Is there space in a neighboring block ?  Look 1-2 blocks to the left/right, shift records If anything else fails, create overflow block

28 Overflow Blocks  After a while the file starts being dominated by overflow blocks: time to reorganize Block n-1 Block n Block n+1 Overflow

29 Modifications: Deletions  Free space in block, shift records  Maybe be able to eliminate an overflow block

30 Modifications: Updates  If new record is shorter than previous, easy  If it is longer, need to shift records, create overflow blocks

31 Physical Addresses  Each block and each record have a physical address that consists of: The host The disk The cylinder number The track number The block within the track For records: an offset in the block  sometimes this is in the block ’ s header

32 Logical Addresses  Logical address: a string of bytes (10- 16)  More flexible: can blocks/records around  But need translation table: Logical address Physical address L1P1 L2P2 L3P3

33 Main Memory Address  When the block is read in main memory, it receives a main memory address  Buffer manager has another translation table Memory address Logical address M1L1 M2L2 M3L3

34 Designing Physical/Internal Model  Overview  terminology  Access methods

35 Physical Design  Internal Model/Physical Model Operating System Access Methods Data Base User request DBMS Internal Model Access Methods External Model Interface 1 Interface 3 Interface 2

36 Physical Design  Interface 1: User request to the DBMS. The user presents a query, the DBMS determines which physical DBs are needed to resolve the query  Interface 2: The DBMS uses an internal model access method to access the data stored in a logical database.  Interface 3: The internal model access methods and OS access methods access the physical records of the database.

37 Physical File Design  A Physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records  Pointers - a field of data that can be used to locate a related field or record of data  Access Methods - An operating system algorithm for storing and locating data in secondary storage  Pages - The amount of data read or written in one disk input or output operation

38 Internal Model Access Methods  Many types of access methods: Physical Sequential Indexed Sequential Indexed Random Inverted Direct Hashed  Differences in Access Efficiency Storage Efficiency

39 Physical Sequential  Key values of the physical records are in logical sequence  Main use is for “dump” and “restore”  Access method may be used for storage as well as retrieval  Storage Efficiency is near 100%  Access Efficiency is poor (unless fixed size physical records)

40 Indexed Sequential  Key values of the physical records are in logical sequence  Access method may be used for storage and retrieval  Index of key values is maintained with entries for the highest key values per block(s)  Access Efficiency depends on the levels of index, storage allocated for index, number of database records, and amount of overflow  Storage Efficiency depends on size of index and volatility of database

41 Index Sequential Data File Block 1 Block 2 Block 3 Address Block Number 123…123… Actual Value Dumpling Harty Texaci... Adams Becker Dumpling Getta Harty Mobile Sunoci Texaci

42 Indexed Sequential: Two Levels Address 789…789… Key Value 385 678 805 001 003. 150 705 710. 785 251. 385 455 480. 536 605 610. 678 791. 805 Address 1212 Key Value 150 385 Address 3434 Key Value 536 678 Address 5656 Key Value 785 805

43 Indexed Random  Key values of the physical records are not necessarily in logical sequence  Index may be stored and accessed with Indexed Sequential Access Method  Index has an entry for every data base record. These are in ascending order. The index keys are in logical sequence. Database records are not necessarily in ascending sequence.  Access method may be used for storage and retrieval

44 Indexed Random Address Block Number 2132121321 Actual Value Adams Becker Dumpling Getta Harty Becker Harty Adams Getta Dumpling

45 Btree F | | P | | Z | R | | S | | Z |H | | L | | P |B | | D | | F | Devils Aces Boilers Cars Minors Panthers Seminoles Flyers Hawkeyes Hoosiers

46 Inverted  Key values of the physical records are not necessarily in logical sequence  Access Method is better used for retrieval  An index for every field to be inverted may be built  Access efficiency depends on number of database records, levels of index, and storage allocated for index

47 Inverted Address Block Number 123…123… Actual Value CH 145 CS 201 CS 623 PH 345 CH 145 101, 103,104 CS 201 102 CS 623 105, 106 Adams Becker Dumpling Getta Harty Mobile Student name Course Number CH145 cs201 ch145 cs623

48 Direct  Key values of the physical records are not necessarily in logical sequence  There is a one-to-one correspondence between a record key and the physical address of the record  May be used for storage and retrieval  Access efficiency always 1  Storage efficiency depends on density of keys  No duplicate keys permitted

49 Hashing  Key values of the physical records are not necessarily in logical sequence  Many key values may share the same physical address (block)  May be used for storage and retrieval  Access efficiency depends on distribution of keys, algorithm for key transformation and space allocated  Storage efficiency depends on distibution of keys and algorithm used for key transformation

50 Comparative Access Methods Indexed No wasted space for data but extra space for index Moderately Fast Very fast with multiple indexes OK if dynamic OK if dynamic Easy but requires Maintenance of indexes Factor Storage space Sequential retrieval on primary key Random Retr. Multiple Key Retr. Deleting records Adding records Updating records Sequential No wasted space Very fast Impractical Possible but needs a full scan can create wasted space requires rewriting file usually requires rewriting file Hashed more space needed for addition and deletion of records after initial load Impractical Very fast Not possible very easy


Download ppt "Data Storage and Access Methods Min Song IS698. Database Design Process Conceptual Model Logical Model External Model Conceptual requirements Conceptual."

Similar presentations


Ads by Google