FILE & SYSTEM STRUCTURE (CHAPTER 11)

Presentation on theme: "FILE & SYSTEM STRUCTURE (CHAPTER 11)"— Presentation transcript:

1 FILE & SYSTEM STRUCTURE (CHAPTER 11)
2018/12/8

2 TERMINOLOGY Computers represent data as a sequence of zeros and ones, termed bits. A byte is eight contiguous bits.

5-11 TERMINOLOGY (Cont.)
Computers represent data as a sequence of zeros and ones, termed bits. A byte is eight contiguous bits.
1024 bytes is one Kilobyte (KB), e.g., your textbook.
1024 KB is one Megabyte (MB), e.g., a high-resolution photograph.
1024 MB is one Gigabyte (GB), e.g., a DVD-quality movie.
1024 GB is one Terabyte (TB), e.g., all text in the Library of Congress.
1024 TB is one Petabyte (PB), e.g., the entire multimedia collection at the Library of Congress.
1024 PB is one Exabyte (EB), e.g., a recording of all phone conversations in a year.
1024 EB is one Zettabyte (ZB), e.g., all uncompressed medical data.
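
The ladder of units above multiplies by 1024 at each step, which is easy to check in a few lines of Python. This is a minimal sketch: the names `UNITS`, `to_bytes`, and `human_readable` are illustrative and not from the slides, and the abbreviations EB and ZB are the standard ones for exabyte and zettabyte.

```python
# Binary (power-of-1024) units, in the order the slides introduce them.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a quantity in the given unit to a number of bytes."""
    return value * 1024 ** UNITS.index(unit)


def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1024."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


print(to_bytes(1, "GB"))              # 1073741824
print(human_readable(5 * 1024 ** 4))  # 5.0 TB
```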

12 HOW MUCH DATA IS THERE?
Approximately 5000 films are made each year (worldwide). Two-hour display time at 240 Mbps; 900 TB.
Approximately 52 billion photographs are taken each year @ 10 KB per photograph; 520 PB.
Library of Congress:
  20 million 1MB; 20 TB
  15 million 1 MB; 13 TB
  4 million 100 MB; 400 TB
  500, GB; 5 PB
  3.5 million sound recordings at library of 1 audio per CD; 2 PB

13 Physical Storage Media
A system consists of several forms of storage:
Cache: the fastest and most costly form of storage; volatile; managed by the computer system hardware.
Main memory:
  Fast access (10 ns to 100 ns; 1 nanosecond = 10^-9 seconds).
  Generally too small (or too expensive) to store the entire database.
  Capacities of up to a few Gigabytes are widely used currently.
  Capacities have gone up and per-byte costs have decreased steadily and rapidly (roughly a factor of 2 every 2 to 3 years).
  Volatile: contents of main memory are usually lost if a power failure or system crash occurs.

14 Physical Storage Media (Cont.)
Magnetic disk:
  Data is stored on a spinning disk and read/written magnetically.
  Primary medium for the long-term storage of data; typically stores the entire database.
  Data must be moved from disk to main memory for access, and written back for storage. Access is much slower than main memory (more on this later).
  Direct access: it is possible to read data on disk in any order, unlike magnetic tape.
  Capacities range up to roughly ? GB currently.
  Much larger capacity and much lower cost per byte than main memory.
  Capacity is growing constantly and rapidly with technology improvements (a factor of 2 to 3 every 2 years).
  Survives power failures and system crashes; a disk failure can destroy data, but is very rare.

15 Magnetic Hard Disk Mechanism
NOTE: Diagram is schematic, and simplifies the structure of actual disk drives 2018/12/8

16 Magnetic Disks
Read-write head:
  Positioned very close to the platter surface (almost touching it).
  Reads or writes magnetically encoded information.
Surface of the platter is divided into circular tracks:
  Over 16,000 tracks per platter on typical hard disks.
Each track is divided into sectors:
  A sector is the smallest unit of data that can be read or written.
  Sector size is typically 512 bytes.
  Typical sectors per track: 200 (on inner tracks) to 400 (on outer tracks).
To read/write a sector:
  The disk arm swings to position the head on the right track.
  The platter spins continually; data is read/written as the sector passes under the head.
Head-disk assemblies:
  Multiple disk platters on a single spindle (typically 2 to 4).
  One head per platter, mounted on a common arm.
Cylinder i consists of the ith track of all the platters.
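
The figures on this slide are enough for a back-of-the-envelope capacity estimate. The sketch below is only illustrative: the average of 300 sectors per track is an assumed midpoint of the 200 to 400 range, and `disk_capacity_bytes` and its `recording_surfaces` parameter are hypothetical names.

```python
SECTOR_BYTES = 512             # sector size from the slide
TRACKS_PER_SURFACE = 16_000    # "over 16,000 tracks per platter"
AVG_SECTORS_PER_TRACK = 300    # assumption: midpoint of the 200-400 range


def disk_capacity_bytes(recording_surfaces):
    """Estimate capacity as surfaces x tracks x sectors per track x sector size."""
    return (recording_surfaces * TRACKS_PER_SURFACE
            * AVG_SECTORS_PER_TRACK * SECTOR_BYTES)


capacity = disk_capacity_bytes(4)
print(capacity)                          # 9830400000
print(round(capacity / 1024 ** 3, 2))    # about 9.16 (in units of 1024^3 bytes)
```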

17 Magnetic Disks (Cont.)
Earlier generation disks were susceptible to head crashes:
  The surface of earlier generation disks had metal-oxide coatings, which would disintegrate on a head crash and damage all data on the disk.
  Current generation disks are less susceptible to such disastrous failures, although individual sectors may get corrupted.
Disk controller: interfaces between the computer system and the disk drive hardware.
  Accepts high-level commands to read or write a sector.
  Initiates actions such as moving the disk arm to the right track and actually reading or writing the data.
  Computes and attaches checksums to each sector to verify that data is read back correctly. If data is corrupted, with very high probability the stored checksum won't match the recomputed checksum.
  Ensures successful writing by reading back the sector after writing it.
  Performs remapping of bad sectors.
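
The checksum idea can be sketched in Python. This is an illustration only: real controllers use their own error-detecting and error-correcting codes in firmware, and CRC-32 from the standard `zlib` module stands in for them here. The function names `write_sector` and `verify_sector` are hypothetical.

```python
import zlib

SECTOR_BYTES = 512


def write_sector(data):
    """Return the stored form of a sector: the payload plus its CRC-32 checksum."""
    if len(data) != SECTOR_BYTES:
        raise ValueError("sector payload must be exactly 512 bytes")
    return data, zlib.crc32(data)


def verify_sector(stored):
    """Recompute the checksum on read and compare it with the stored one."""
    data, stored_checksum = stored
    return zlib.crc32(data) == stored_checksum


stored = write_sector(bytes(SECTOR_BYTES))
print(verify_sector(stored))                      # True

# Simulate corruption of the first byte while the stored checksum stays the same.
corrupted = (b"\x01" + bytes(SECTOR_BYTES - 1), stored[1])
print(verify_sector(corrupted))                   # False
```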

18 Disk Subsystem
Multiple disks are connected to a computer system through a controller.
  Controller functionality (checksums, bad-sector remapping) is often carried out by the individual disks, which reduces the load on the controller.
Disk interface standards families:
  ATA (AT Attachment) range of standards.
  SCSI (Small Computer System Interface) range of standards.
  Several variants of each standard (different speeds and capabilities).

19 Performance Measures of Disks
Access time: the time it takes from when a read or write request is issued to when data transfer begins. Consists of:
  Seek time: the time it takes to reposition the arm over the correct track.
    Average seek time is 1/2 the worst-case seek time. It would be 1/3 if all tracks had the same number of sectors and we ignored the time to start and stop arm movement.
    4 to 10 milliseconds on typical disks.
  Rotational latency: the time it takes for the sector to be accessed to appear under the head.
    Average latency is 1/2 of the worst-case latency.
    4 to 11 milliseconds on typical disks (5400 to r.p.m.).
Data-transfer rate: the rate at which data can be retrieved from or stored to the disk.
  4 to 8 MB per second is typical.
  Multiple disks may share a controller, so the rate that the controller can handle is also important.
  E.g., ATA-5: 66 MB/second; SCSI-3: 40 MB/s; Fiber Channel: 256 MB/s.
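
These measures combine by simple arithmetic, sketched below. The function names are illustrative, the example seek time of 8 ms is an assumed value inside the 4 to 10 ms range, and the transfer calculation assumes 1 MB = 1024 × 1024 bytes as in the terminology slides.

```python
def average_rotational_latency_ms(rpm):
    """Average latency is half of one full rotation, in milliseconds."""
    full_rotation_ms = 60_000 / rpm
    return full_rotation_ms / 2


def average_access_time_ms(avg_seek_ms, rpm):
    """Access time = seek time + rotational latency (averages)."""
    return avg_seek_ms + average_rotational_latency_ms(rpm)


def transfer_time_ms(num_bytes, rate_mb_per_s):
    """Time to move num_bytes at the given data-transfer rate."""
    return num_bytes / (rate_mb_per_s * 1024 * 1024) * 1000


print(round(average_rotational_latency_ms(5400), 2))   # 5.56
print(round(average_access_time_ms(8, 5400), 2))       # 13.56
print(round(transfer_time_ms(4096, 4), 2))             # 0.98
```

For a small block, the access time dominates the transfer time by more than an order of magnitude, which is why the following slides focus on reducing arm movement.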

20 Optimization of Disk-Block Access
Block: a contiguous sequence of sectors from a single track.
  Data is transferred between disk and main memory in blocks.
  Sizes range from 512 bytes to several kilobytes.
    Smaller blocks: more transfers from disk.
    Larger blocks: more space wasted due to partially filled blocks.
    Typical block sizes today range from 4 to 16 kilobytes.
Disk-arm-scheduling algorithms order pending accesses to tracks so that disk arm movement is minimized.
  Elevator algorithm: move the disk arm in one direction (from outer to inner tracks or vice versa), processing the next request in that direction until there are no more requests in that direction, then reverse direction and repeat.
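
The elevator algorithm can be sketched for a fixed snapshot of pending requests. This is a simplified illustration: it assumes no new requests arrive during the sweep, and the function names and the sample track numbers are made up for the example.

```python
def elevator_order(pending_tracks, head, moving_up=True):
    """Return the order in which pending track requests are serviced."""
    if moving_up:
        ahead = sorted(t for t in pending_tracks if t >= head)
        behind = sorted((t for t in pending_tracks if t < head), reverse=True)
    else:
        ahead = sorted((t for t in pending_tracks if t <= head), reverse=True)
        behind = sorted(t for t in pending_tracks if t > head)
    # Finish everything in the current direction, then reverse.
    return ahead + behind


def total_arm_movement(order, head):
    """Sum of track distances travelled when servicing requests in this order."""
    moved = 0
    for track in order:
        moved += abs(track - head)
        head = track
    return moved


requests = [98, 183, 37, 122, 14, 124, 65, 67]
order = elevator_order(requests, head=53)
print(order)                                  # [65, 67, 98, 122, 124, 183, 37, 14]
print(total_arm_movement(order, 53))          # 299
print(total_arm_movement(requests, 53))       # 640 (arrival order, for comparison)
```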

21 Optimization of Disk Block Access (Cont.)
File organization: optimize block access time by organizing the blocks to correspond to how data will be accessed.
  E.g., store related information on the same or nearby cylinders.
Files may get fragmented over time:
  E.g., if data is inserted into or deleted from the file.
  Or free blocks on disk are scattered, and a newly created file has its blocks scattered over the disk.
  Sequential access to a fragmented file results in increased disk arm movement.
Some systems have utilities to defragment the file system, in order to speed up file access.

22 FILE & SYSTEM STRUCTURE (Cont…)
A database system is organized as several layers of software:
Query parser: translates a higher-level query language into an internal representation.
Query optimizer: transforms the internal representation into an efficient execution plan.
Concurrency control and crash recovery: ensure consistency of data in the presence of multiple concurrent update operations and crashes.
Index methods: provide efficient retrieval of records for lookup and update operations.
Abstraction of multiple records on a disk page: implements the concept of multiple records on a disk page.

23 BIG PICTURE SELECT SS# FROM emp WHERE sal > 50K DBMS 2018/12/8

24 Overall Organization
SELECT SS# FROM emp WHERE sal > 50K
Relational Algebra operators, e.g., σ, π, ÷

25 π_SS#(σ_sal>50K(emp))
Overall Organization: SELECT SS# FROM emp WHERE sal > 50K is translated by the Query Parser into π_SS#(σ_sal>50K(emp)).
Relational Algebra operators, e.g., σ, π, ÷

26 π_SS#(σ_sal>50K(emp)) becomes a query tree:
[Figure: query tree, from leaf to root: emp, σ sal > 50K, TMP File1, π SS#, Computer Screen]
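
The query tree can be walked bottom-up in a few lines of Python. This is only a sketch: the `emp` rows are made-up sample data, `select` and `project` are illustrative helpers rather than any DBMS API, and the temporary file is modeled as an in-memory list. Unlike the strict relational projection, this `project` does not remove duplicates, which matches SQL's SELECT without DISTINCT.

```python
# Made-up sample relation; attribute names follow the slide's query.
emp = [
    {"SS#": "111-11-1111", "name": "Ann", "sal": 62_000},
    {"SS#": "222-22-2222", "name": "Bob", "sal": 48_000},
    {"SS#": "333-33-3333", "name": "Cy", "sal": 75_000},
]


def select(rows, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attributes):
    """Projection (pi): keep only the named attributes of each row."""
    return [{a: row[a] for a in attributes} for row in rows]


# Leaf to root: emp -> sigma -> temporary result -> pi -> screen.
tmp_file1 = select(emp, lambda row: row["sal"] > 50_000)
result = project(tmp_file1, ["SS#"])
print(result)   # [{'SS#': '111-11-1111'}, {'SS#': '333-33-3333'}]
```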

27 Overall Organization
Layers, from top to bottom: Query Parser, Query Optimizer, Query Interpreter, Relational Algebra operators (e.g., σ, π, ÷), Index structures, Abstraction of records, Buffer Pool Manager, File System.

28 FILE & SYSTEM STRUCTURE (Cont…)
Buffer manager: maintains a portion of memory that is conceptualized as disk page frames. It tracks which disk pages are memory resident. It also implements a replacement policy in order to swap a page out in favor of another disk page that is being referenced. This is necessary because the number of memory page frames is significantly smaller than the number of disk pages.
File manager: provides the following services: create a file; delete a file; read a disk page into a specific memory address, given the physical address of the disk page on the secondary storage device; write a disk page from a memory address to the appropriate physical disk address; insert a page into a file; modify a page; and delete a page from a file.

29 FILE & SYSTEM STRUCTURE (Cont…)
When a program requests a disk page (by specifying its address), the buffer manager takes the following steps:
  1. Check if the page is in the buffer. If it is, pass its address to the calling program.
  2. Otherwise, read the page from the disk into the buffer, possibly replacing some other page, and then pass its address to the calling program.
Pinned blocks: Occasionally, the DBMS needs to specifically indicate that some blocks have to be kept in the buffer until released by unpinning them. These blocks are termed pinned.
Forced writing of blocks to disk: To preserve the consistency of the database during crash recovery, the DBMS might force the buffer manager to flush some blocks to disk.
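
The request steps, pinning, and forced writes can be sketched as a toy buffer pool. Several things here are assumptions, not part of the slides: the replacement policy is LRU (the slides do not name one), the disk is modeled as a Python dict from page address to contents, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool with LRU replacement, pinning, and forced writes."""

    def __init__(self, disk, num_frames):
        self.disk = disk               # dict: page address -> page contents
        self.num_frames = num_frames
        self.frames = OrderedDict()    # resident pages, least recently used first
        self.pinned = set()
        self.dirty = set()

    def get_page(self, addr):
        if addr in self.frames:        # step 1: page already in the buffer
            self.frames.move_to_end(addr)
        else:                          # step 2: read it in, evicting if full
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[addr] = self.disk[addr]
        return self.frames[addr]

    def write_page(self, addr, contents):
        self.get_page(addr)
        self.frames[addr] = contents
        self.dirty.add(addr)

    def pin(self, addr):
        self.get_page(addr)
        self.pinned.add(addr)

    def unpin(self, addr):
        self.pinned.discard(addr)

    def force(self, addr):
        """Flush a modified page to disk immediately."""
        if addr in self.dirty:
            self.disk[addr] = self.frames[addr]
            self.dirty.discard(addr)

    def _evict(self):
        # Pick the least recently used page that is not pinned.
        victim = next((a for a in self.frames if a not in self.pinned), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        self.force(victim)             # write back before dropping the page
        del self.frames[victim]
```

With two frames and page 1 pinned, requesting pages 2 and then 3 evicts page 2, because page 1 cannot be chosen as a victim even though it is the least recently used.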

30 PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS
Key issues in organizing a file into blocks and records: Formatting fields within a record. Formatting records within a block. Assigning records into blocks. 2018/12/8

31 PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS (Cont…)
Formatting fields within a record:
Fixed-length fields stored in a specific order: Address of attribute i = $\beta + \sum_{k=1}^{i-1} L_k$, where $\beta$ is the start address of the record and $L_k$ is the length of field k.
[Figure: record layout starting at β with fields Name, SS#, age, salary; surviving labels: 32 bytes, 4 bytes, int]
Fixed-length fields stored on an indexed heap:
  Fields may be stored in an arbitrary manner.
  There is exactly one pointer in the header for each field, whether it is present or not.
  The order of pointers is fixed and specifies the order of attributes for all records.
[Figure: header pointers to the fields Name, SS#, age, salary]
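
The address formula for fixed-length fields stored in a fixed order is a running sum of the preceding field lengths. In the sketch below the field lengths are assumptions for illustration (the slide's figure labels survive only partially), and `FIELDS` and `field_address` are hypothetical names.

```python
# Assumed field lengths in bytes, in storage order.
FIELDS = [("Name", 32), ("SS#", 4), ("age", 4), ("salary", 4)]


def field_address(base, i):
    """Address of attribute i (1-based): base + sum of L_k for k = 1 .. i-1."""
    return base + sum(length for _, length in FIELDS[: i - 1])


base = 1000   # start address of the record (beta in the slide)
for i, (name, _) in enumerate(FIELDS, start=1):
    print(name, field_address(base, i))
# Name 1000
# SS# 1032
# age 1036
# salary 1040
```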

32 PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS (Cont…)
Variable-length fields delimited by special symbols.
[Figure: fields name, SS#, age, salary separated by delimiter symbols]
Variable-length fields delimited by length.
[Figure: each field preceded by its length: 32 name, 4 SS#, 4 age, 4 salary]
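
Length-delimited fields can be sketched with the standard `struct` module. The 4-byte big-endian length prefix is an assumption chosen for the example, and `encode_record` and `decode_record` are illustrative names.

```python
import struct


def encode_record(fields):
    """Prefix each field (bytes) with its length as a 4-byte big-endian integer."""
    out = bytearray()
    for value in fields:
        out += struct.pack(">I", len(value)) + value
    return bytes(out)


def decode_record(buf):
    """Walk the buffer: read a length, then that many bytes, until exhausted."""
    fields, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        fields.append(buf[pos:pos + length])
        pos += length
    return fields


record = [b"Ann Lee", b"111111111", b"41", b"62000"]
encoded = encode_record(record)
print(len(encoded))                       # 39
print(decode_record(encoded) == record)   # True
```

Compared with delimiter symbols, length prefixes cost a few bytes per field but allow any byte value inside a field, since no symbol has to be reserved or escaped.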

33 PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS (Cont…)
Once the structure of a record is defined, it must be mapped to a disk page. Consider fixed-length records only.
Fixed-length: store records contiguously within the block. Record i is located at $R_i = \beta + (i-1)L$, where $L$ is the record length.
[Figure: block holding n records of length L, starting at offset β]
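
The record-address formula, and the number of records that fit in a block, can be checked with a short sketch. The block size, the offset of the first record, and the record length used in the example are assumed values, and the function names are hypothetical.

```python
def record_address(base, record_len, i):
    """Address of record i (1-based) in a block: R_i = base + (i - 1) * L."""
    return base + (i - 1) * record_len


def records_per_block(block_size, base, record_len):
    """Whole records that fit after offset base when no record may span blocks."""
    return (block_size - base) // record_len


# Assumed values: 4096-byte block, records start at offset 16, 44-byte records.
print(record_address(16, 44, 3))          # 104
n = records_per_block(4096, 16, 44)
print(n)                                  # 92
print(4096 - 16 - n * 44)                 # 32 bytes left unused at the end
```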

34 PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS (Cont…)
Disadvantages:
  Records may span multiple disk pages. Solution: do not allow spanning, even though this leaves unused space at the end of a page (fragmentation).
  Insertion and deletion become complicated. How do you reuse space that was freed by a deletion?
  Page reorganization affects external pointers.

35 PHYSICAL ORGANIZATION OF RECORDS AND BLOCKS (Cont…)
Indexed Heap: Each page consists of an array of pointers; each pointer points to a record within the block. A record is located by providing its block number and its index in the pointer array. This combination is called a TID (tuple identifier) or RID (record identifier).
  Insertion and deletion are easy, accomplished by manipulating the pointer array.
  The contents of a block may be reorganized without affecting external pointers to records: the RID does not change when records are moved around within a block.
[Figure: page with a header holding the pointer array]
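
A toy version of such a page shows why RIDs stay valid when the block is reorganized. This is a sketch under simplifying assumptions: the pointer array is a Python list of (offset, length) pairs, deleted slots are marked `None` and never reused, and the class and method names are hypothetical.

```python
class Page:
    """Toy indexed-heap page: a pointer array (slots) over a byte area."""

    def __init__(self, block_no):
        self.block_no = block_no
        self.data = bytearray()
        self.slots = []                 # slot index -> (offset, length) or None

    def insert(self, record):
        """Append the record and return its RID (block number, slot index)."""
        self.slots.append((len(self.data), len(record)))
        self.data += record
        return (self.block_no, len(self.slots) - 1)

    def delete(self, rid):
        """Deletion only clears the pointer; the bytes become dead space."""
        self.slots[rid[1]] = None

    def get(self, rid):
        offset, length = self.slots[rid[1]]
        return bytes(self.data[offset:offset + length])

    def compact(self):
        """Squeeze out dead space; only the pointers change, never the RIDs."""
        new_data = bytearray()
        for i, slot in enumerate(self.slots):
            if slot is not None:
                offset, length = slot
                self.slots[i] = (len(new_data), length)
                new_data += self.data[offset:offset + length]
        self.data = new_data


page = Page(block_no=7)
r1 = page.insert(b"aaa")
r2 = page.insert(b"bbbb")
r3 = page.insert(b"cc")
page.delete(r2)
page.compact()
print(r3, page.get(r3))   # (7, 2) b'cc'
```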

