FILE & SYSTEM STRUCTURE (CHAPTER 11)

FILE & SYSTEM STRUCTURE (CHAPTER 11) 2018/12/8

TERMINOLOGY
Computers represent data as a sequence of zeros and ones, termed bits:
0101111110011010101010110000000000…..00000
- A byte is eight contiguous bits.
- 1024 bytes is one Kilobyte (KB), e.g., your textbook.
- 1024 KB is one Megabyte (MB), e.g., a high-resolution photograph.
- 1024 MB is one Gigabyte (GB), e.g., a DVD-quality movie.
- 1024 GB is one Terabyte (TB), e.g., all text in the Library of Congress.
- 1024 TB is one Petabyte (PB), e.g., the entire multimedia collection at the Library of Congress.
- 1024 PB is one Exabyte (EB), e.g., a recording of all phone conversations in a year.
- 1024 EB is one Zettabyte (ZB), e.g., all uncompressed medical data.

HOW MUCH DATA IS THERE?
- Approximately 5000 films are made each year (worldwide). At a two-hour display time and 240 Mbps: roughly 900 TB.
- Approximately 52 billion photographs are taken each year. At 10 KB per photograph: roughly 520 TB.
- Library of Congress:
  - 20 million books @ 1 MB: 20 TB
  - 15 million photographs @ 1 MB: 15 TB
  - 4 million maps @ 100 MB: 400 TB
  - 500,000 movies @ 10 GB: 5 PB
  - 3.5 million sound recordings @ 1 audio CD each: 2 PB
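
The estimates on this slide are simple multiplications, so they can be checked mechanically. The sketch below is one way to do that in Python; the helper names (`film_bytes`, `human`) are made up for illustration, the 1024-based units follow the TERMINOLOGY slide, and "240 Mbps" is assumed to mean 240 million bits per second.

```python
# Units as defined on the TERMINOLOGY slide (powers of 1024).
KB, MB, GB, TB, PB = (1024 ** n for n in range(1, 6))

def film_bytes(hours=2, mbps=240):
    """Size of one film, assuming a constant bit rate of `mbps` million bits/s."""
    bits = mbps * 1_000_000 * hours * 3600
    return bits // 8

def human(n):
    """Format a byte count using the largest unit that keeps the value below 1024."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n < 1024 or unit == "PB":
            return f"{n:.1f} {unit}"
        n /= 1024

estimates = {
    "films per year": 5000 * film_bytes(),
    "photographs per year": 52_000_000_000 * 10 * KB,
    "LoC books": 20_000_000 * 1 * MB,
    "LoC maps": 4_000_000 * 100 * MB,
    "LoC movies": 500_000 * 10 * GB,
}

if __name__ == "__main__":
    for label, size in estimates.items():
        print(f"{label}: {human(size)}")
```

Because the units are powers of 1024 rather than powers of 1000, the printed values come out slightly below the rounded figures quoted on the slide.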

Physical Storage Media
A system consists of several forms of storage:
- Cache: fastest and most costly form of storage; volatile; managed by the computer system hardware.
- Main memory:
  - Fast access (10 ns to 100 ns; 1 nanosecond = 10^-9 seconds).
  - Generally too small (or too expensive) to store the entire database.
  - Capacities of up to a few Gigabytes are widely used currently.
  - Capacities have gone up and per-byte costs have decreased steadily and rapidly (roughly a factor of 2 every 2 to 3 years).
  - Volatile: contents of main memory are usually lost if a power failure or system crash occurs.

Physical Storage Media (Cont.)
Magnetic disk:
- Data is stored on a spinning disk and read/written magnetically.
- Primary medium for the long-term storage of data; typically stores the entire database.
- Data must be moved from disk to main memory for access, and written back for storage.
  - Much slower access than main memory (more on this later).
- Direct access: data on disk can be read in any order, unlike magnetic tape.
- Capacities range up to roughly ? GB currently.
  - Much larger capacity, and much lower cost per byte, than main memory.
  - Growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2 years).
- Survives power failures and system crashes.
  - Disk failure can destroy data, but is very rare.

Magnetic Hard Disk Mechanism
[Figure: hard disk mechanism]
NOTE: The diagram is schematic and simplifies the structure of actual disk drives.

Magnetic Disks
- Read-write head:
  - Positioned very close to the platter surface (almost touching it).
  - Reads or writes magnetically encoded information.
- The surface of a platter is divided into circular tracks.
  - Over 16,000 tracks per platter on typical hard disks.
- Each track is divided into sectors.
  - A sector is the smallest unit of data that can be read or written.
  - Sector size is typically 512 bytes.
  - Typical sectors per track: 200 (on inner tracks) to 400 (on outer tracks).
- To read/write a sector:
  - The disk arm swings to position the head on the right track.
  - The platter spins continually; data is read/written as the sector passes under the head.
- Head-disk assemblies:
  - Multiple disk platters on a single spindle (typically 2 to 4).
  - One head per platter, mounted on a common arm.
- Cylinder i consists of the i-th track of all the platters.

Magnetic Disks (Cont.)
- Earlier-generation disks were susceptible to head crashes.
  - The surface of earlier-generation disks had metal-oxide coatings, which would disintegrate on a head crash and damage all data on the disk.
  - Current-generation disks are less susceptible to such disastrous failures, although individual sectors may get corrupted.
- Disk controller: interfaces between the computer system and the disk drive hardware.
  - Accepts high-level commands to read or write a sector.
  - Initiates actions such as moving the disk arm to the right track and actually reading or writing the data.
  - Computes and attaches checksums to each sector to verify that data is read back correctly. If data is corrupted, with very high probability the stored checksum won’t match the recomputed checksum.
  - Ensures successful writing by reading back the sector after writing it.
  - Performs remapping of bad sectors.

Disk Subsystem
- Multiple disks are connected to a computer system through a controller.
  - Controller functionality (checksums, bad-sector remapping) is often carried out by the individual disks, which reduces the load on the controller.
- Disk interface standards families:
  - ATA (AT Attachment) range of standards.
  - SCSI (Small Computer System Interface) range of standards.
  - Several variants of each standard (different speeds and capabilities).

Performance Measures of Disks
- Access time: the time from when a read or write request is issued to when data transfer begins. It consists of:
  - Seek time: the time it takes to reposition the arm over the correct track.
    - Average seek time is 1/2 the worst-case seek time. It would be 1/3 if all tracks had the same number of sectors and we ignored the time to start and stop arm movement.
    - 4 to 10 milliseconds on typical disks.
  - Rotational latency: the time it takes for the sector to be accessed to appear under the head.
    - Average latency is 1/2 of the worst-case latency (one full rotation).
    - A full rotation takes 4 to 11 milliseconds on typical disks (15000 down to 5400 r.p.m.), so the average latency is roughly 2 to 5.5 milliseconds.
- Data-transfer rate: the rate at which data can be retrieved from or stored to the disk.
  - 4 to 8 MB per second is typical.
  - Multiple disks may share a controller, so the rate that the controller can handle is also important. E.g. ATA-5: 66 MB/s, SCSI-3: 40 MB/s, Fiber Channel: 256 MB/s.
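
The relationship between rotation speed, latency, and access time can be made concrete with a few lines of arithmetic. The following is a minimal sketch; the function names are illustrative only, and the transfer-time term assumes a transfer rate expressed in millions of bytes per second.

```python
def full_rotation_ms(rpm):
    """Time for one full rotation, i.e. the worst-case rotational latency."""
    return 60_000 / rpm

def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of a full rotation."""
    return 0.5 * full_rotation_ms(rpm)

def access_time_ms(avg_seek_ms, rpm):
    """Access time = seek time + rotational latency (both averages)."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)

def read_time_ms(avg_seek_ms, rpm, nbytes, transfer_mb_per_s):
    """Access time plus the time to transfer `nbytes` at the given rate."""
    transfer_ms = nbytes / (transfer_mb_per_s * 1_000_000) * 1000
    return access_time_ms(avg_seek_ms, rpm) + transfer_ms

if __name__ == "__main__":
    for rpm in (5400, 7200, 15000):
        print(rpm, "rpm:", round(avg_rotational_latency_ms(rpm), 2), "ms average latency")
    # One 4 KB block on a 7200 rpm disk with an 8 ms average seek and 8 MB/s transfer
    print(round(read_time_ms(8, 7200, 4096, 8), 2), "ms")
```

For small transfers the seek and rotational terms dominate, which is why the later slides focus on reducing arm movement.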

Optimization of Disk-Block Access
- Block: a contiguous sequence of sectors from a single track.
  - Data is transferred between disk and main memory in blocks.
  - Sizes range from 512 bytes to several kilobytes.
    - Smaller blocks: more transfers from disk.
    - Larger blocks: more space wasted due to partially filled blocks.
    - Typical block sizes today range from 4 to 16 kilobytes.
- Disk-arm-scheduling algorithms order pending accesses to tracks so that disk arm movement is minimized.
  - Elevator algorithm: move the disk arm in one direction (from outer to inner tracks or vice versa), processing the next request in that direction until there are no more requests in that direction, then reverse direction and repeat.
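
The elevator algorithm described above can be sketched for a fixed batch of pending requests. This is a simplified illustration, not a full scheduler: it assumes no new requests arrive while the batch is served, and the function names and the sample track numbers are made up for the example.

```python
def elevator_order(pending, head, direction=1):
    """Order in which pending track requests are served.

    direction > 0 means the arm first moves toward higher track numbers;
    otherwise it first moves toward lower track numbers.
    """
    if direction > 0:
        ahead = sorted(t for t in pending if t >= head)
        behind = sorted((t for t in pending if t < head), reverse=True)
    else:
        ahead = sorted((t for t in pending if t <= head), reverse=True)
        behind = sorted(t for t in pending if t > head)
    return ahead + behind   # sweep one way, then reverse

def arm_movement(order, head):
    """Total number of tracks the arm crosses when serving `order`."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total

if __name__ == "__main__":
    requests = [98, 183, 37, 122, 14, 124, 65, 67]
    order = elevator_order(requests, head=53)
    print("elevator order:", order, "movement:", arm_movement(order, 53))
    print("arrival order movement:", arm_movement(requests, 53))
```

Comparing the two totals shows how much arm movement is saved relative to serving requests in arrival order.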

Optimization of Disk Block Access (Cont.)
- File organization: optimize block access time by organizing the blocks to correspond to how data will be accessed.
  - E.g., store related information on the same or nearby cylinders.
  - Files may get fragmented over time.
    - E.g., if data is inserted into or deleted from the file.
    - Or free blocks on disk are scattered, and a newly created file has its blocks scattered over the disk.
    - Sequential access to a fragmented file results in increased disk arm movement.
  - Some systems have utilities to defragment the file system in order to speed up file access.

FILE & SYSTEM STRUCTURE (Cont…)
A database system is organized as several layers of software:
- Query parser: translates a higher-level query language to an internal representation.
- Query optimizer: transforms the internal representation into an efficient execution plan.
- Concurrency control and crash recovery: ensures consistency of data in the presence of multiple concurrent update operations and crash recoveries.
- Index methods: efficient access to records for fast retrieval and update operations.
- Abstraction of multiple records on a disk page: implements the concept of multiple records on a disk page.

BIG PICTURE
[Figure: the query SELECT SS# FROM emp WHERE sal > 50K is submitted to the DBMS]

Overall Organization
SELECT SS# FROM emp WHERE sal > 50K
- Query Parser: translates the query into the relational algebra expression π_{SS#}(σ_{sal > 50K}(emp)).
- Relational algebra operators include selection (σ) and projection (π).

π_{SS#}(σ_{sal > 50K}(emp)) becomes a query tree (root at the top):
  Computer Screen
    π_{SS#}
      TMP File1
        σ_{sal > 50K}
          emp
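
To make the query tree concrete, the sketch below evaluates it bottom-up over an in-memory table: selection writes an intermediate result (standing in for TMP File1), and projection produces what would be shown on screen. The sample rows and the helper names `select` and `project` are hypothetical, and "50K" is read as 50,000.

```python
# Hypothetical contents of the emp relation.
emp = [
    {"SS#": "111-11-1111", "name": "Ann", "sal": 62_000},
    {"SS#": "222-22-2222", "name": "Bob", "sal": 48_000},
    {"SS#": "333-33-3333", "name": "Eve", "sal": 75_000},
]

def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attributes):
    """Projection: keep only the named attributes of each row."""
    return [{a: row[a] for a in attributes} for row in rows]

# Leaf to root: emp -> selection -> TMP File1 -> projection -> screen
tmp_file1 = select(emp, lambda row: row["sal"] > 50_000)
result = project(tmp_file1, ["SS#"])

if __name__ == "__main__":
    for row in result:
        print(row["SS#"])
```

A real system would stream rows between operators or spill the intermediate result to disk rather than hold it in a Python list.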

Overall Organization
Layers, from top to bottom:
- Query Parser
- Query Optimizer
- Query Interpreter
- Relational algebra operators (e.g., selection σ, projection π)
- Index structures
- Abstraction of records
- Buffer Pool Manager
- File System

FILE & SYSTEM STRUCTURE (Cont…)
- The buffer manager maintains a portion of memory that is conceptualized as disk page frames.
  - It tracks which disk pages are memory resident.
  - It implements a replacement policy in order to swap a page out in favor of another disk page that is being referenced. This is necessary because the number of memory page frames is significantly smaller than the number of disk pages.
- The file manager provides the following services:
  - create a file, delete a file,
  - read a disk page into a specific memory address, given the physical address of the disk page on the secondary storage device,
  - write a disk page from a memory address to the appropriate physical disk address,
  - insert a page into a file, modify a page, and delete a page from a file.

FILE & SYSTEM STRUCTURE (Cont…)
When a program requests a disk page (by specifying its address), the buffer manager takes the following steps:
1. Check if the page is in the buffer. If it is, pass its address to the calling program.
2. Otherwise, read the page from the disk into the buffer, possibly replacing some other page, and then pass its address to the calling program.
Additional services:
- Pinned blocks: occasionally, the DBMS needs to indicate that some blocks have to be kept in the buffer until released by unpinning them. These blocks are termed pinned.
- Forced writing of blocks to disk: to preserve the consistency of the database during crash recovery, the DBMS might force the buffer manager to flush some blocks to disk.
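
The request steps, pinning, and forced writes described above can be sketched as a small class. The slides do not name a replacement policy, so least-recently-used (LRU) is assumed here purely for illustration; the class name, method names, and the `read_page`/`write_page` callbacks standing in for the file manager are all hypothetical.

```python
from collections import OrderedDict

class BufferManager:
    def __init__(self, capacity, read_page, write_page):
        self.capacity = capacity          # number of page frames
        self.read_page = read_page        # callback: page_id -> bytes
        self.write_page = write_page      # callback: (page_id, bytes) -> None
        self.frames = OrderedDict()       # page_id -> bytearray, oldest first
        self.pin_count = {}
        self.dirty = set()

    def get_page(self, page_id, pin=False):
        if page_id in self.frames:                    # step 1: already buffered
            self.frames.move_to_end(page_id)
        else:                                         # step 2: read, maybe replace
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = bytearray(self.read_page(page_id))
        if pin:
            self.pin_count[page_id] = self.pin_count.get(page_id, 0) + 1
        return self.frames[page_id]

    def unpin(self, page_id):
        if self.pin_count.get(page_id, 0) > 0:
            self.pin_count[page_id] -= 1

    def mark_dirty(self, page_id):
        self.dirty.add(page_id)

    def flush(self, page_id):
        """Forced write: push a modified page back to disk."""
        if page_id in self.dirty and page_id in self.frames:
            self.write_page(page_id, bytes(self.frames[page_id]))
            self.dirty.discard(page_id)

    def _evict(self):
        # Assumed policy: least recently used page that is not pinned.
        victim = next((p for p in self.frames
                       if self.pin_count.get(p, 0) == 0), None)
        if victim is None:
            raise RuntimeError("all buffer frames are pinned")
        self.flush(victim)
        del self.frames[victim]
```

A caller would pass functions that read and write pages through the file manager; a pinned page is skipped by `_evict` until `unpin` brings its count back to zero.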

PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS
Key issues in organizing a file into blocks and records:
- Formatting fields within a record.
- Formatting records within a block.
- Assigning records into blocks.

PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS (Cont…)
Formatting fields within a record:
- Fixed-length fields stored in a specific order:
  - Address of attribute i = β + ∑_{k=1}^{i-1} L_k, where β is the starting address of the record and L_k is the length of field k.
  - [Figure: record starting at β with fields Name (32 bytes), SS#, age, salary (4-byte int)]
- Fixed-length fields stored on an indexed heap:
  - Fields may be stored in an arbitrary manner.
  - There is exactly one pointer in the header for each field, whether it is present or not.
  - The order of pointers is fixed and specifies the order of attributes for all records.
  - [Figure: header of pointers to the fields Name, SS#, age, salary]
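
The address formula for fixed-length fields, β plus the sum of the lengths of the preceding fields, translates directly into code. In this sketch the field lengths (32, 4, 4, 4 bytes) are assumed from the figure labels, and the function name and base address are made up for the example.

```python
# (field name, length in bytes); lengths assumed from the figure labels.
FIELDS = [("Name", 32), ("SS#", 4), ("age", 4), ("salary", 4)]

def field_address(beta, lengths, i):
    """Address of attribute i (1-based): beta + L_1 + ... + L_(i-1)."""
    if not 1 <= i <= len(lengths):
        raise IndexError(f"attribute index {i} out of range")
    return beta + sum(lengths[: i - 1])

if __name__ == "__main__":
    lengths = [length for _, length in FIELDS]
    beta = 1000   # hypothetical starting address of the record
    for i, (name, _) in enumerate(FIELDS, start=1):
        print(name, field_address(beta, lengths, i))
```

Because every length is known in advance, no per-record header is needed to locate a field.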

PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS (Cont…)
- Variable-length fields delimited by special symbols.
  - [Figure: fields name, SS#, age, salary separated by delimiter symbols]
- Variable-length fields delimited by length.
  - [Figure: each field preceded by its length: 32 name, 4 SS#, 4 age, 4 salary]
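
Both variable-length layouts can be illustrated with short encoders and decoders. This is a sketch under stated assumptions: the delimiter byte (the ASCII unit separator), the 2-byte little-endian length prefix, and all function names are choices made for the example, and the delimited form assumes the delimiter never occurs inside a field value.

```python
import struct

DELIM = "\x1f"   # assumed delimiter: ASCII unit separator

def encode_delimited(fields):
    """Fields separated by a special symbol."""
    return DELIM.join(fields).encode("utf-8")

def decode_delimited(buf):
    return buf.decode("utf-8").split(DELIM)

def encode_length_prefixed(fields):
    """Each field preceded by its length as a 2-byte unsigned integer."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        out += struct.pack("<H", len(data)) + data
    return bytes(out)

def decode_length_prefixed(buf):
    fields, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        fields.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return fields

if __name__ == "__main__":
    record = ["Smith", "123456789", "42", "55000"]
    print(encode_delimited(record))
    print(encode_length_prefixed(record))
```

The length-prefixed form lets a reader skip over a field without scanning it, at the cost of a few extra bytes per field.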

PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS (Cont…)
Once the structure of a record is defined, records must be mapped to disk pages. Consider fixed-length records only.
- Fixed-length: store records contiguously within the block.
  - Record i is located at R_i = β + (i-1)L, where β is the offset of the first record and L is the record length.
  - [Figure: block starting at β holding records 1, 2, 3, …, n, each of length L]
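
The placement rule R_i = β + (i-1)L can be tried out on an in-memory block. The sketch below uses Python's `struct` module; the block size, the value of β, and the record layout (a 32-byte name plus three 4-byte integers) are assumptions chosen for the example.

```python
import struct

BLOCK_SIZE = 4096                   # assumed block size in bytes
BETA = 8                            # assumed offset of record 1 within the block
RECORD = struct.Struct("<32siii")   # Name (32 bytes), SS#, age, salary
L = RECORD.size                     # record length: 44 bytes

def record_offset(i, beta=BETA, length=L):
    """Offset of record i (1-based): beta + (i - 1) * L."""
    return beta + (i - 1) * length

def records_per_block(block_size=BLOCK_SIZE, beta=BETA, length=L):
    """How many whole records fit after the first beta bytes."""
    return (block_size - beta) // length

def write_record(block, i, name, ssn, age, salary):
    RECORD.pack_into(block, record_offset(i), name.encode("utf-8"), ssn, age, salary)

def read_record(block, i):
    name, ssn, age, salary = RECORD.unpack_from(block, record_offset(i))
    return name.rstrip(b"\x00").decode("utf-8"), ssn, age, salary

if __name__ == "__main__":
    block = bytearray(BLOCK_SIZE)
    write_record(block, 3, "Ann", 123456789, 42, 55000)
    print(record_offset(3), read_record(block, 3), records_per_block())
```

Any bytes left over after the last whole record stay unused, which is the space cost of not letting a record span two blocks.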

PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS (Cont…)
Disadvantages:
- Records may span multiple disk pages.
  - Solution: do not allow spanning, although this results in unused space within pages (fragmentation).
- Insertion and deletion become complicated.
  - How do you utilize space that was unallocated?
- Page reorganization affects external pointers.

PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS (Cont…)
Indexed Heap:
- Each page contains an array of pointers in its header; each pointer points to a record within the block.
- A record is located by providing its block number and its index in the pointer array. This combination is called a TID or an RID.
- Insertion and deletion are easy, accomplished by manipulating the pointer array.
- The contents of a block may be reorganized without affecting external pointers to records. The RID does not change when records are moved around within a block.
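
The key property of the indexed heap, that a record's RID survives reorganization of the block, can be demonstrated with a small in-memory page. This is a simplified sketch: the class and method names are hypothetical, the pointer array is kept as a Python list rather than inside the page bytes, and no page size limit is enforced.

```python
class SlottedPage:
    def __init__(self, page_no):
        self.page_no = page_no
        self.data = bytearray()   # record bytes
        self.slots = []           # pointer array: (offset, length) or None

    def insert(self, record):
        """Store a record and return its RID (page number, slot index)."""
        entry = (len(self.data), len(record))
        self.data += record
        for i, slot in enumerate(self.slots):
            if slot is None:                 # reuse a freed slot
                self.slots[i] = entry
                return (self.page_no, i)
        self.slots.append(entry)
        return (self.page_no, len(self.slots) - 1)

    def get(self, slot):
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(slot)
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        """Deletion only clears the pointer; the space is reclaimed by compact()."""
        self.slots[slot] = None

    def compact(self):
        """Move live records together; slot indexes (and so RIDs) are unchanged."""
        packed = bytearray()
        for i, entry in enumerate(self.slots):
            if entry is not None:
                offset, length = entry
                self.slots[i] = (len(packed), length)
                packed += self.data[offset:offset + length]
        self.data = packed
```

For example, after inserting three records, deleting the second, and calling `compact()`, the third record is still reachable through the RID returned when it was inserted, even though its bytes have moved.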