Storage & File Structure
Meghan Nagpal

Storage Media
• Cache: the smallest and fastest form of storage; managed by the hardware. The database system does not manage cache storage itself, but cache effects should be kept in mind when designing queries and algorithms.
• Main memory: the storage medium for data that is being operated on; generally too small to hold the entire database; volatile.
• Flash memory: non-volatile; used in cameras, cell phones, and USB drives.

Storage Media
• Magnetic-disk storage: long-term online storage that usually holds the entire database. The system must move data from disk to main memory before operating on it. Non-volatile, although disk devices can fail and destroy data.
• Optical storage: CDs, DVDs, Blu-ray discs. ROM: read-only; WORM: write once, read many; RW: can be written many times. Provides direct access to specified data.
• Tape storage: used for backup and archival data; slower; sequential access to data.

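Hierarchy
• Higher levels are more expensive per byte, but faster.
• Primary storage: cache and main memory. Volatile.
• Secondary (online) storage: flash memory and magnetic disk. Non-volatile.
• Tertiary (offline) storage: optical disks and magnetic tape. Non-volatile.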

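Magnetic Disk & Flash Storage
• Platters: information is recorded on the platter surfaces; 1 to 5 platters per disk.
• Tracks: each disk surface is divided into tracks; 50K to 100K tracks per platter.
• Sectors: tracks are divided into sectors, the unit of information read from or written to disk. A sector is 512 bytes; there are 500 to 1000 sectors per inner track and 1000 to 2000 per outer track.
• Read-write head: stores data magnetically by reversing the direction of magnetization of the magnetic material; moves across the platter to access different tracks.
• Spindle: the platters are mounted on it.
• Disk arm: the heads are mounted on the disk arm.
• Arm assembly: holds the arms so that all heads move together and stay lined up on the same track.
• Cylinder: when the head on one platter is on the ith track, the heads on all platters are on the ith track. The ith tracks of all platters together form the ith cylinder.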

Disk Controllers
• Disk controller: the interface between the computer system and the disk-drive hardware.
• Checksums: the disk controller attaches a checksum to each sector it writes, computed from the data written to that sector. When the sector is read back, the checksum is recomputed and compared with the stored checksum to detect corruption.
• For damaged sectors, the disk controller can remap the sector to a different physical location.
• Disks can be connected to a high-speed network. Remote access allows disks to be shared by multiple computers and run in parallel.
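The checksum bullet can be illustrated with a small sketch. It assumes CRC-32 as the checksum (real controllers use their own error-detecting and error-correcting codes), and the function names `write_sector` and `read_sector` are hypothetical, chosen only for this example.

```python
import zlib


def write_sector(data: bytes):
    """Return the sector payload together with the checksum stored beside it."""
    return data, zlib.crc32(data)


def read_sector(data: bytes, stored_checksum: int) -> bytes:
    """Recompute the checksum on read and compare it with the stored one."""
    if zlib.crc32(data) != stored_checksum:
        raise IOError("checksum mismatch: sector is probably corrupted")
    return data


# A 512-byte sector that reads back cleanly.
payload, checksum = write_sector(b"x" * 512)
clean = read_sector(payload, checksum)

# The same sector with its first byte damaged is rejected.
corrupted = b"y" + payload[1:]
try:
    read_sector(corrupted, checksum)
    detected = False
except IOError:
    detected = True
```

On a mismatch a real controller would typically retry the read before reporting failure; the sketch only shows the detection step.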

Performance measures of Disks
• Access time: the time between a read or write request and the start of data transfer. The arm must first move to the correct track (seek) and then wait for the sector to rotate under the head.
• Average seek time: the average time to reposition the arm over the correct track. Typically 4 to 10 ms.
• Rotational latency: the time spent waiting for the sector once the head is on the track. A full rotation typically takes 4 to 11.1 ms, so the average latency is about half a rotation.
• Data transfer rate: the rate at which data can be retrieved from or stored to the disk. 25 to 100 MB per second.
• Mean time to failure: the average time we can expect the disk to run continuously without failure. Most disks have a life expectancy of 5 years.
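A quick arithmetic sketch of these measures. The rotation speeds used below (5400 and 15000 RPM) are assumptions, picked because they reproduce the 11.1 ms and 4 ms rotation times quoted above; the function names are illustrative.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full rotation, in milliseconds."""
    return 60_000.0 / rpm


def average_access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Average seek time plus average rotational latency (half a rotation)."""
    return avg_seek_ms + rotation_time_ms(rpm) / 2


slow_rotation = rotation_time_ms(5400)    # roughly 11.1 ms
fast_rotation = rotation_time_ms(15000)   # 4.0 ms
best_case_access = average_access_time_ms(4.0, 15000)  # 4 ms seek + 2 ms latency
```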

Optimization of Disk-Block-Access
• Block: a fixed number of contiguous sectors. Requests are made to a disk address referred to as a block number.
• Techniques to improve the speed of access to blocks:
  ◦ Buffering: blocks that have been read are stored temporarily in a buffer to serve future requests. Done by the OS or the DB system.
  ◦ Read-ahead: when a disk block is accessed, consecutive blocks on the same track are also read into the memory buffer. Good for sequential access.
  ◦ Scheduling: algorithms that order pending requests to reduce arm movement, for example by serving blocks on the same cylinder together. In the elevator algorithm the arm moves like an elevator, servicing each track on its way that has an access request, then changes direction and sweeps back.
  ◦ File organization: organize data blocks in the way we expect them to be accessed. If we expect sequential access, we keep all data blocks of a file sequentially on disk.
  ◦ Non-volatile write buffers: database updates must reach stable storage so that they survive a system crash, so write speed is crucial. Non-volatile RAM (NVRAM) speeds up disk writes: when the system requests that a block be written, the controller writes it to NVRAM and writes it to disk later, when there are no other requests or when the NVRAM buffer is full.
  ◦ Log disk: similar to NVRAM. All access to the log disk is sequential and several consecutive blocks are written at once. The write to the actual disk location can happen afterwards, so the DB system does not have to wait for it to complete, and those later writes can be reordered to minimize arm movement. File systems that do this are called journaling file systems; they can also keep the data and the log on the same disk, at the cost of lower performance.
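The elevator algorithm can be sketched as a function that returns the order in which pending track requests are serviced. This is a minimal illustration that assumes all requests are known up front; the name `elevator_order` and its parameters are hypothetical.

```python
def elevator_order(head, requests, moving_up=True):
    """Order track requests the way an elevator-style arm would service them.

    The arm keeps moving in its current direction, servicing every requested
    track it passes, then reverses and services the remaining requests.
    """
    here = [t for t in requests if t == head]
    higher = sorted(t for t in requests if t > head)
    lower = sorted((t for t in requests if t < head), reverse=True)
    if moving_up:
        return here + higher + lower
    return here + lower + higher


# Head at track 50, moving toward higher track numbers.
up_first = elevator_order(50, [95, 10, 60, 40, 70])
# Same requests, but the arm is currently moving toward lower track numbers.
down_first = elevator_order(50, [95, 10, 60, 40, 70], moving_up=False)
```

With the head at track 50, the upward sweep services 60, 70, 95 and then 40, 10 on the way back. A real scheduler would also have to handle requests that arrive during the sweep.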

Flash Storage
• NOR flash: allows access to individual words of memory. Read time is comparable to main memory.
• NAND flash: reads an entire page of data at a time, 512 to 4096 bytes, similar to a sector on a disk. Cheaper and more commonly used.
• Faster random access than magnetic disk: 1 to 2 μs vs. 5 to 10 ms to retrieve data.
• Lower transfer rate, at about 20 MB/s.
• A written page cannot be overwritten directly; it must be erased and then rewritten. Erasing is slow, 1 to 2 ms, and is limited to 100,000 to 1,000,000 erase cycles.

Flash Storage
• When a logical page is updated, it can be remapped to an already erased physical page, and the original location can be erased later. Each physical page stores its logical address, and the original page is marked as deleted when the logical address is remapped. This makes up for the slow erase speed.
• The logical-to-physical page mapping is replicated in an in-memory translation table for quick access.
• Distributing erase operations evenly across physical pages is called wear levelling. Data that is rarely updated is called "cold data"; data that is updated regularly is "hot data".
• Flash translation layer: carries out the above actions. Above this layer, flash storage looks identical to magnetic disk storage.
• Hybrid disk drives combine magnetic storage with a small amount of flash memory, which is used as a cache for frequently accessed data.
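The remapping idea can be sketched as a toy flash translation layer. This is a simplification under stated assumptions: real flash erases whole blocks of pages rather than single pages, and the class name, its fields, and the "pick the least-erased page" wear-levelling rule are illustrative choices, not a description of any real device.

```python
class FlashTranslationLayer:
    def __init__(self, num_physical_pages):
        self.pages = [None] * num_physical_pages        # (logical address, data) per page
        self.erased = list(range(num_physical_pages))   # pages ready to be written
        self.table = {}                                 # logical page -> physical page
        self.deleted = set()                            # stale pages awaiting erase
        self.erase_counts = [0] * num_physical_pages

    def write(self, logical, data):
        if not self.erased:
            self.erase_deleted_pages()
        if not self.erased:
            raise RuntimeError("no erased page available")
        # Simple wear levelling: use the erased page with the fewest erases.
        physical = min(self.erased, key=lambda p: self.erase_counts[p])
        self.erased.remove(physical)
        self.pages[physical] = (logical, data)  # the page stores its logical address
        old = self.table.get(logical)
        if old is not None:
            self.deleted.add(old)               # original page is only marked deleted
        self.table[logical] = physical

    def read(self, logical):
        return self.pages[self.table[logical]][1]

    def erase_deleted_pages(self):
        """The slow erase step, deferred until erased pages run out."""
        for physical in self.deleted:
            self.pages[physical] = None
            self.erase_counts[physical] += 1
            self.erased.append(physical)
        self.deleted.clear()


ftl = FlashTranslationLayer(4)
ftl.write(7, b"v1")
ftl.write(7, b"v2")   # update goes to a fresh page; the old page is marked deleted
```

After the two writes, logical page 7 reads back `b"v2"`, and the physical page that held `b"v1"` waits in the deleted set until an erase is needed.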

RAID – Improvement of Reliability via Redundancy
• A large number of disks is needed to store large amounts of data. This is an opportunity to improve the rate at which data is read or written, if the disks operate in parallel.
• Redundant arrays of independent disks (RAID) improve performance and reliability.
• Redundancy: store extra information that can be used to rebuild data if one disk fails. Mirroring (duplicating a disk) is the simplest example. Mirrored systems have a mean time to data loss of about 55 to 110 years. However, failures are not necessarily independent (power failures, natural disasters, etc.).
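To see why the independence caveat matters, here is the common textbook idealization for a mirrored pair: data is lost only if the second disk fails while the first is being repaired, which gives a mean time to data loss of MTTF² / (2 × MTTR). The input values below (100,000 hours to failure, 10 hours to repair) are assumptions for illustration and are not taken from the slides.

```python
HOURS_PER_YEAR = 24 * 365


def mirrored_mttdl_hours(mttf_hours, mttr_hours):
    """Idealized mean time to data loss for a mirrored pair.

    Assumes the two disks fail independently, which real systems do not
    guarantee (shared power supply, same site, same manufacturing batch).
    """
    return mttf_hours ** 2 / (2 * mttr_hours)


hours = mirrored_mttdl_hours(100_000, 10)
years = hours / HOURS_PER_YEAR
```

Under these assumptions the result is 500 million hours, tens of thousands of years. That is far above the 55 to 110 years quoted above, which is consistent with correlated failures dominating in practice.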

RAID – Improvement in Performance through Parallelism
• Bit-level striping: the bits of each byte are split across multiple disks. The number of disks is generally a multiple or a factor of 8 (for example, 4 or 8). Every disk participates in every access, so the number of accesses per second is the same as on a single disk, but each access reads 4 or 8 times as much data in the same time.
  ◦ Example: in an array of 8 disks, bit i of each byte is written to disk i.
• Block-level striping: blocks are striped across multiple disks. Given n disks, logical block i is stored on disk (i mod n) + 1, in physical block ⌊i/n⌋ of that disk.
  ◦ Example: in an array of 8 disks, logical block 11 is stored in physical block 1 of disk 4.
  ◦ Large reads get a high data transfer rate, since n blocks can be fetched at a time from n disks.
• Goals of parallelism:
  1. Load-balance multiple small accesses (blocks).
  2. Parallelize large accesses to reduce response time.
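The block-level striping rule translates directly into code. The function name `locate_block` is hypothetical; disks are numbered from 1 and physical blocks from 0, as in the example above.

```python
def locate_block(i, n):
    """Map logical block i to (disk number, physical block) on an n-disk array."""
    disk = (i % n) + 1        # disks are numbered 1..n
    physical_block = i // n   # floor(i / n)
    return disk, physical_block


where_is_11 = locate_block(11, 8)   # (4, 1): physical block 1 of disk 4
```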

RAID Levels
• Striping alone does not improve reliability, and mirroring is expensive. The RAID levels are alternative schemes that provide redundancy at lower cost.
• Level 0: disk striping at the block level, no redundancy.
• Level 1: disk mirroring with block striping. The slide's figure shows an array holding four disks' worth of data, where C indicates a second copy of the data.
• Level 2: error-correcting bits computed from the bits of each byte are stored on further disks and can be used to reconstruct damaged data. The slide's figure shows an array holding four disks' worth of data, where P indicates the error-correcting bits.

RAID Levels
• Level 3: similar to level 2, but uses just one parity bit, stored on one extra disk. Reduces storage overhead.
• Level 4: similar to level 3, except that parity is computed and stored per block on an extra disk.
• Level 5: parity and data blocks are distributed among all disks. A parity block is never stored on the same disk as the data blocks it protects, since a failure of that disk would make the data unrecoverable.
• Level 6: like level 5, but stores extra redundant information through error-correcting codes: 2 bits of redundant data for every 4 bits of data. Tolerates 2 disk failures.
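How a parity block lets a single lost block be rebuilt (levels 4 and 5) can be sketched with XOR. The block contents and the helper name `xor_blocks` are made up for this example, and all blocks are assumed to be the same length.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks on three disks, plus a parity block on a fourth disk.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Suppose the disk holding the second block fails: XOR the survivors with parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, `rebuilt` equals the lost block `b"BBBB"`. This works for one missing block per parity group, which is why level 6 needs additional redundancy to survive two failures.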

RAID
• The choice of RAID level depends on cost, efficiency, performance during failure, and rebuild performance.
• Consider the hardware level as well. Some implementations are purely in software. Otherwise there are special hardware implementations that use non-volatile RAM to complete incomplete writes after a power failure.
• Scrubbing: when the system is idle, disks are scanned for damaged sectors and the damaged data is recovered from the redundant information.
• Hot swapping: faulty disks can be removed and replaced by new ones while the power is on. An extra (spare) disk can also be kept available in the array. Good for 24/7 systems.

Optical disks
• CDs store software, multimedia data, and electronically published information. 640 to 700 MB. Cheap to mass-produce. Data transfer rates are around 3 to 6 MB/s.
• DVDs are larger, up to 17 GB in the DVD-18 format (2 sides, 2 recording layers). Blu-ray discs can store 27 to 54 GB. Data transfer rates are 8 to 20 MB/s.
• Seek times are longer than on magnetic disk drives (about 100 ms) because the head assembly is heavier. Rotation is slower, at about 3000 rotations per minute.

Magnetic Tapes
• Permanent storage.
• Hold large amounts of data. Some formats can store up to 300 GB.
• Sequential access to data.
• Slow. Moving to the desired spot takes seconds or minutes. Data transfer rates of a few to tens of MB/s.
• Tapes are cheap, but tape drives are more expensive than disk drives, so tapes are not used as often as disks.

File Organization
• A file is a sequence of records; records are mapped onto disk blocks.
• Block sizes are typically 4 to 8 KB.
• We assume no record is larger than a disk block, and each record is entirely contained in a single block.

Fixed Length Record
Example:
    type instructor = record
        ID varchar(5);
        name varchar(20);
        dept_name varchar(20);
        salary numeric(8,2);
    end
• Each record needs 53 bytes (5 + 20 + 20 + 8, assuming one byte per character and 8 bytes for the numeric). Allocate as many records to a block as fit entirely in the block, and leave the remaining bytes unused.
• A file header is added at the beginning of the file and contains information about the file. The address of the first deleted record is stored in the file header, and the remaining deleted records are chained in a linked list known as the free list. A new record is inserted into the slot the header points to, and the header is then updated to point to the next available slot. If no space is available, the new record is added to the end of the file.

Fixed Length Record Free List
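The free list can be sketched as follows. This is an in-memory illustration, not an on-disk format: the class `FixedLengthFile`, its tagged-tuple slots, and the sample records are all assumptions made for the example.

```python
class FixedLengthFile:
    def __init__(self):
        self.slots = []         # each slot holds a record or a free-list link
        self.free_head = None   # file header: index of the first deleted slot

    def delete(self, slot):
        # The deleted slot stores the old head of the free list,
        # and the header now points at this slot.
        self.slots[slot] = ("free", self.free_head)
        self.free_head = slot

    def insert(self, record):
        if self.free_head is None:
            # No deleted slot to reuse: add the record at the end of the file.
            self.slots.append(("record", record))
            return len(self.slots) - 1
        slot = self.free_head
        _, next_free = self.slots[slot]
        self.free_head = next_free          # header points to the next available slot
        self.slots[slot] = ("record", record)
        return slot


f = FixedLengthFile()
for name in ["Srinivasan", "Wu", "Mozart", "Einstein"]:
    f.insert(name)
f.delete(1)
f.delete(3)
reused_first = f.insert("Gold")    # reuses slot 3, the most recently deleted
reused_second = f.insert("Katz")   # reuses slot 1
appended = f.insert("Kim")         # free list is empty, so this goes to the end
```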

Variable Length Record
• A record has an initial part holding the fixed-length attributes, followed by the data of the variable-length attributes.
• In the initial part, each variable-length attribute is represented by an (offset, length) pair. The offset denotes where the attribute's data begins within the record, and the length is the size in bytes of the variable-sized attribute.
• A null bitmap indicates which attributes have null values.

Variable Length Records – Slotted Page Structure
• A header at the beginning of the block contains the number of record entries, the end of free space in the block, and an array whose entries contain the location and size of each record.
• Records are stored and inserted contiguously, starting at the end of the block.
• Free space is contiguous, between the final entry in the header array and the first record.
• When a record is deleted, other records are moved to occupy the freed space, so that the free space stays contiguous.
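A minimal sketch of the slotted-page idea follows. To keep it short, the header (slot array and end-of-free-space pointer) is held in Python attributes rather than serialized into the block's first bytes; that simplification, the class name `SlottedPage`, and the 32-byte page are assumptions for illustration.

```python
class SlottedPage:
    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.free_end = size   # end of free space; records grow down from the block end
        self.slots = []        # header array: (offset, length), or None if deleted

    def insert(self, record):
        if len(record) > self.free_end:
            raise ValueError("not enough free space in the block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        offset, length = self.slots[slot]
        start = self.free_end
        # Shift the records stored before the deleted one toward the block end,
        # so that the free space stays contiguous.
        self.data[start + length:offset + length] = self.data[start:offset]
        self.free_end += length
        self.slots[slot] = None
        for i, entry in enumerate(self.slots):
            if entry is not None and entry[0] < offset:
                self.slots[i] = (entry[0] + length, entry[1])


page = SlottedPage(32)
a = page.insert(b"alpha")
b = page.insert(b"be")
c = page.insert(b"gamma")
page.delete(b)   # "gamma" slides up by 2 bytes; its slot entry is updated
```

Because other structures refer to a record by its slot number rather than its byte offset, records can be moved inside the block without those references changing.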

Sequential File Organization
• Records are sorted in order by some search key (an attribute or set of attributes, not necessarily the primary key or a superkey).
• Records are stored physically in search-key order, as far as possible.
• Deletion is managed with pointer chains. On insertion, store the record in a free (deleted) slot of the appropriate block if one is available; if not, place the new record in an overflow block. In either case, adjust the pointers to keep the search-key order. If there is too much overflow, the file must be reorganized. Reorganization is costly and time-consuming, so it is done when system load is low.

Multitable Clustering File Organization
• Usually a given block stores records of only one relation, but sometimes it is advantageous to store records of more than one relation in a block.
• A multitable clustering file organization stores records of two or more relations in one block.
• This allows us to read records that satisfy a join condition with one block read, so the join is faster.

Data-Dictionary Storage
• The system maintains metadata, or data about the data.
• Metadata is stored in the data dictionary, or system catalogue:
  ◦ Names of relations, attribute names of relations, domains and lengths of attributes, names and definitions of views, integrity constraints.
• Data about users of the system:
  ◦ Names, authorization and accounting information about users, authentication information.
• Statistical and descriptive data about relations:
  ◦ Number of tuples, method of storage (clustered, non-clustered).
• Storage organization and location of each relation.

Data Dictionary Storage

Buffer Manager
• Buffer: the part of main memory available for storing copies of disk blocks. The buffer manager allocates buffer space.
• Buffer replacement strategy: when there is no room left in the buffer, the least recently used block is removed (and written back to disk if it was modified).
• Pinned blocks: to recover correctly from crashes, the times at which a block may be written back to disk must be restricted. A block that is not allowed to be written back is referred to as a pinned block.
• Forced output of blocks: in situations where a block must be written to disk, it is forced out even if its buffer space is not needed.
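These three ideas (LRU replacement, pinning, forced output) can be sketched together. The class and method names are hypothetical, and a plain dictionary stands in for the disk; a real buffer manager works on fixed-size frames and coordinates with the recovery system.

```python
from collections import OrderedDict


class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict: block number -> contents (stand-in for a disk)
        self.buffer = OrderedDict()   # buffered blocks, least recently used first
        self.pinned = set()           # blocks that must not be written back or evicted
        self.dirty = set()            # blocks modified since they were read

    def get(self, block_no):
        if block_no in self.buffer:
            self.buffer.move_to_end(block_no)   # mark as most recently used
            return self.buffer[block_no]
        if len(self.buffer) >= self.capacity:
            self._evict()
        self.buffer[block_no] = self.disk[block_no]
        return self.buffer[block_no]

    def write(self, block_no, contents):
        self.get(block_no)                      # make sure the block is buffered
        self.buffer[block_no] = contents
        self.dirty.add(block_no)

    def force_output(self, block_no):
        """Write the block to disk now, even though its space is not needed."""
        self.disk[block_no] = self.buffer[block_no]
        self.dirty.discard(block_no)

    def _evict(self):
        # Least recently used block that is not pinned.
        victim = next((b for b in self.buffer if b not in self.pinned), None)
        if victim is None:
            raise RuntimeError("all buffered blocks are pinned")
        if victim in self.dirty:
            self.force_output(victim)
        del self.buffer[victim]


disk = {1: "a", 2: "b", 3: "c"}
bm = BufferManager(2, disk)
bm.get(1)
bm.get(2)
bm.get(1)   # block 1 becomes most recently used
bm.get(3)   # buffer is full: block 2, the least recently used, is evicted
```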

Final Remarks
• The storage hierarchy ranges from cache down to tertiary storage devices.
• Redundancy can be used to improve reliability; parallelism improves performance.
• Consider the storage and file organization of the system when designing a database.
• Data dictionaries store the metadata.
