File Processing: Storage Media
2017 Spring, Pusan National University
Ki-Joune Li

Major Functions of Computer
- Computation
- Storage
- Communication
- Presentation

Storage of Data
- Major challenges
  - How to store and manage a large amount of data
    - Example: more than 100 petabytes for the EOS Project
  - How to represent sophisticated data

Modeling and Representation of Real World
- Example: building a DB about Korean history
  - Very complicated, and dependent on viewpoint
- Database course: 2017 Fall semester
[Figure: Real World, Computer World]

Managing Large Volume of Data
- Cost of storage media
  - Not very important; negligible
- Processing time
  - Time is the most valuable resource
- Comparison between main memory and disk access time
  - RAM (Random Access Memory): several $10^{-9}$ sec
  - SSD (Solid State Drive): under $10^{-4}$ sec
  - HDD (Hard Disk Drive): several $10^{-3}$ sec
  - HDD is $10^{6}$ times slower than RAM
- Difference between handling data in RAM and on HDD
  - Handling data on HDD: same way that we handle data in RAM
  - How to handle this gap between RAM and disk memory

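The gap between the devices is easier to see as a ratio. The following is a minimal sketch that takes the order-of-magnitude access times from the slide as constants; the macro names and the helper `slowdown` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Order-of-magnitude access times taken from the slide, in seconds. */
#define T_RAM 1e-9   /* RAM: several 10^-9 sec */
#define T_SSD 1e-4   /* SSD: under 10^-4 sec   */
#define T_HDD 1e-3   /* HDD: several 10^-3 sec */

/* How many times slower a device with access time t_slow is
 * than a device with access time t_fast. */
double slowdown(double t_slow, double t_fast)
{
    assert(t_fast > 0.0);
    return t_slow / t_fast;
}
```

With these constants, `slowdown(T_HDD, T_RAM)` is about $10^{6}$, which is the factor quoted on the slide.
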
Managing Large Volume of Data
- Management of data
  - Secure management
    - From hacking
    - From any kind of disaster
  - Consistency of data
    - Example: failure during a flight reservation transaction
    - Concurrent transactions

Goals of File Systems
To provide
1. Efficient data structures for storing large and complex data
2. Access methods for rapid search
3. Query processing methods
4. Robust management of transactions

Memory Hierarchy
- Large data volume
  - Cannot be stored in main memory, but in secondary memory
- Memory hierarchy (faster toward the top, cheaper toward the bottom)
  - Cache memory: 8 MB (Core i7, L3 cache)
  - Main memory: 16 GB
  - Secondary memory: 1 TB
  - Tertiary memory: 10 petabytes

SSD (Flash Memory)
- Solid State Drive
  - Only electronic operations, unlike HDD
- Characteristics
  - Aging problem: only a limited number of write/erase cycles (e.g. 1 M)
  - Asymmetric read/write speed
    - Read: fast; a byte (or word) can be read at a time
    - Write: a little slower than reading; erasing has to be done to an entire bank of memory
    - Erase: slower
- NAND vs. NOR flash memory

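The read/write/erase asymmetry can be illustrated with a toy model. This is only a sketch under stated assumptions: the type `FlashBlock`, the function names, and the 16-byte block size are hypothetical, and the rule that programming can only change bits from 1 to 0 is general flash behavior that the slide itself does not state.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 16                 /* hypothetical erase-block size in bytes */
#define MAX_ERASE_CYCLES 1000000UL    /* the slide's "e.g. 1 M" */

typedef struct {
    unsigned char cells[BLOCK_SIZE];
    unsigned long erase_count;        /* wear counter for the aging problem */
} FlashBlock;

/* Erase always works on the whole block: every byte returns to 0xFF.
 * Returns 0 on success, -1 if the block is worn out. */
int flash_erase(FlashBlock *b)
{
    if (b->erase_count >= MAX_ERASE_CYCLES)
        return -1;
    memset(b->cells, 0xFF, BLOCK_SIZE);
    b->erase_count++;
    return 0;
}

/* Reading is fine-grained: one byte at a time. */
unsigned char flash_read(const FlashBlock *b, int i)
{
    assert(i >= 0 && i < BLOCK_SIZE);
    return b->cells[i];
}

/* Programming may only clear bits (1 -> 0). Returns -1 when the new
 * value would need a 0 -> 1 change, i.e. the block must be erased first. */
int flash_program(FlashBlock *b, int i, unsigned char v)
{
    assert(i >= 0 && i < BLOCK_SIZE);
    if ((b->cells[i] & v) != v)
        return -1;
    b->cells[i] = v;
    return 0;
}
```

In this model, overwriting a single byte with an arbitrary value may force an erase of the whole block, and every erase consumes one of the limited cycles.
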
Optical Storage
- Non-volatile
- CD, DVD
- Speed: slower than HDD
- Juke-box systems
  - Large numbers of removable disks, few drives, and a mechanism for automatic loading/unloading of disks
  - For storing large volumes of data

Tape
- Non-volatile and large volume (e.g. 15 TB per cartridge)
- Primarily used for backup
- Sequential access: much slower than disk
- But data transfer rate: up to 750 MB/sec for some tape drives

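Data Access with Secondary Memory
- Access request
  - If in main memory: get data from main memory
  - If not in main memory: access the disk, load the data into main memory, then get data
- Hit ratio: $r_h = n_h / n_a$
  - $n_h$: number of requests found in main memory (hits)
  - $n_a$: number of all access requests
- How to increase the hit ratio?
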
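The hit ratio can be measured by replaying a sequence of requests. The sketch below assumes the simplest possible buffer, one that holds only the most recently loaded block; the function name `hit_ratio` and this one-block policy are assumptions made for illustration, not something the slide prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Hit ratio r_h = n_h / n_a for a sequence of n_a block requests,
 * served through a buffer that holds a single block.
 * Block numbers are assumed to be non-negative. */
double hit_ratio(const long *blocks, size_t n_a)
{
    size_t n_h = 0;
    long buffered = -1;               /* -1 means the buffer is empty */

    assert(blocks != NULL || n_a == 0);
    for (size_t i = 0; i < n_a; i++) {
        if (blocks[i] == buffered)
            n_h++;                    /* hit: data already in main memory */
        else
            buffered = blocks[i];     /* miss: access disk, load the block */
    }
    return n_a ? (double)n_h / (double)n_a : 0.0;
}
```

For the request sequence 0, 0, 0, 1, 1, 0, 0, 0 this gives 5 hits out of 8 requests, so $r_h = 0.625$. Requests that stay within the same block raise the hit ratio, which is one answer to the question on the slide.
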
Why Is Hit Ratio So Important?
- Example
    for (int i = 0; i < 1000; i++)
        Nbytes = read(fd, buf, 100);
- 1000 disk accesses?
  - When $r_h = 0$: $1000 \times 10^{-2}$ sec $= 10$ sec
  - When $r_h = 1$: $1000 \times 10^{-8}$ sec $= 10^{-5}$ sec

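The two extreme cases can be generalized to any hit ratio. The sketch below assumes a simple weighted-average model in which a hit costs one memory access and a miss costs one disk access; this model and the name `total_time` are assumptions for illustration, since the slide only gives the cases $r_h = 0$ and $r_h = 1$.

```c
#include <assert.h>

/* Estimated total time in seconds for n accesses, given hit ratio r_h,
 * memory access time t_mem and disk access time t_disk.
 * Assumed model: a hit costs t_mem, a miss costs t_disk. */
double total_time(long n, double r_h, double t_mem, double t_disk)
{
    assert(n >= 0);
    assert(r_h >= 0.0 && r_h <= 1.0);
    return (double)n * (r_h * t_mem + (1.0 - r_h) * t_disk);
}
```

With the slide's values ($10^{-8}$ sec for memory, $10^{-2}$ sec for disk), `total_time(1000, 0.0, 1e-8, 1e-2)` is about 10 sec and `total_time(1000, 1.0, 1e-8, 1e-2)` is about $10^{-5}$ sec. Even at $r_h = 0.5$ the total is still about 5 sec, because the disk term dominates.
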
Physical Structure of Disk
[Figure: physical structure of a disk. Labels: 200~400 sectors (per track), 512 bytes (per sector), 2 * nDF]

Disk Access Time
- Disk access time $t = t_S + t_R + t_T$, where
  - $t_S$: seek time
    - Time to reposition the head over the correct track
    - Average seek time is 1/2 of the worst-case seek time
    - 4 to 10 milliseconds on typical disks
  - $t_R$: rotational latency
    - Time to reposition the head over the correct sector
    - Average rotational latency: $\frac{1}{2} r$ (to find index point) $+ \frac{1}{2} r = r$, where $r$ is the time for one rotation
    - In case of 15000 rpm: $r = 1 \times 60\ \text{sec} / 15000 = 4$ msec
  - $t_T$: transfer time
    - Time to transfer data from disk to main memory via channel
    - Proportional to the number of sectors to read
    - Real transfer time is negligible

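The formula $t = t_S + t_R + t_T$ can be evaluated directly. The sketch below follows the slide in taking the average rotational latency to be one full rotation time $r$; the function names and the choice of milliseconds as the unit are assumptions made for this example.

```c
#include <assert.h>

/* Time for one full rotation, in milliseconds, at the given speed. */
double rotation_ms(double rpm)
{
    assert(rpm > 0.0);
    return 60.0 * 1000.0 / rpm;
}

/* Disk access time t = t_S + t_R + t_T in milliseconds.
 * t_R follows the slide: average rotational latency r/2 + r/2 = r. */
double access_time_ms(double seek_ms, double rpm, double transfer_ms)
{
    double t_R = rotation_ms(rpm);
    return seek_ms + t_R + transfer_ms;
}
```

For a 15000 rpm disk, `rotation_ms(15000.0)` is 4 msec. With an assumed seek time of 4 msec and a negligible transfer time, `access_time_ms(4.0, 15000.0, 0.0)` gives about 8 msec per access.
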
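Block-Oriented Disk Access
- Example
    for (int i = 0; i < 1000; i++)
        Nbytes = read(fd, buf, 10);
  - 10 bytes per read, 1000 times
- The disk is accessed one block at a time (e.g. 1024 bytes), loaded into a buffer in main memory (1024 bytes)
  - One block in the buffer serves about 100 reads
- Number of disk accesses: 10 times
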
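The count of 10 disk accesses can be checked by simulating the reads. The sketch below assumes sequential reads from offset 0 and a buffer that holds a single 1024-byte block; the function name `count_disk_accesses` and this single-block buffer are assumptions for illustration only.

```c
#include <assert.h>

#define BLOCK_BYTES 1024L   /* block size used in the slide's example */

/* Number of disk accesses needed to serve n_reads sequential reads of
 * read_bytes each, when the disk is read one block at a time into a
 * one-block buffer in main memory. */
long count_disk_accesses(long n_reads, long read_bytes)
{
    long accesses = 0;
    long buffered_block = -1;   /* -1 means the buffer is empty */
    long offset = 0;

    assert(n_reads >= 0 && read_bytes > 0);
    for (long i = 0; i < n_reads; i++) {
        long first = offset / BLOCK_BYTES;
        long last = (offset + read_bytes - 1) / BLOCK_BYTES;
        /* A read may straddle a block boundary and touch two blocks. */
        for (long b = first; b <= last; b++) {
            if (b != buffered_block) {
                accesses++;          /* miss: fetch the block from disk */
                buffered_block = b;
            }
        }
        offset += read_bytes;
    }
    return accesses;
}
```

`count_disk_accesses(1000, 10)` returns 10: the 10,000 requested bytes span blocks 0 through 9, so 1000 calls to `read` cause only 10 disk accesses.
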
Disk Block
- Unit of disk access
- Block size
  - Normally a multiple of sectors
  - 1K, 4K, 16K or 64K bytes depending on configuration
- Why not a large block?
  - Limited by the size of available main memory
  - Too large: unnecessary accesses of sectors
    - e.g. only 100 bytes are needed, when block size is given as 64K
    - 1 block: 128 sectors (about 1/2 track, 1/2 rotation, 2 msec)
    - Too wasteful

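The cost of an oversized block can be worked out numerically. The sketch below uses the 512-byte sector size from the slides; the value of 256 sectors per track (within the 200~400 range) and the 4 msec rotation time (15000 rpm) are assumptions picked to match the "about 1/2 track, 2 msec" figure, and all function names are hypothetical.

```c
#include <assert.h>

#define SECTOR_BYTES 512L   /* sector size from the slides */

/* Number of sectors that make up one block. */
long sectors_per_block(long block_bytes)
{
    assert(block_bytes > 0 && block_bytes % SECTOR_BYTES == 0);
    return block_bytes / SECTOR_BYTES;
}

/* Fraction of a transferred block that the program actually needed. */
double useful_fraction(long needed_bytes, long block_bytes)
{
    assert(needed_bytes >= 0 && needed_bytes <= block_bytes);
    return (double)needed_bytes / (double)block_bytes;
}

/* Time in milliseconds for the block to pass under the head, assuming
 * its sectors lie consecutively on one track. */
double block_transfer_ms(long block_bytes, long sectors_per_track,
                         double rotation_ms)
{
    assert(sectors_per_track > 0);
    return (double)sectors_per_block(block_bytes)
           / (double)sectors_per_track * rotation_ms;
}
```

A 64K block is `sectors_per_block(65536)` = 128 sectors. Reading it takes `block_transfer_ms(65536, 256, 4.0)` = 2 msec under the assumptions above, while `useful_fraction(100, 65536)` shows that only about 0.15% of the transferred data is needed.
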