Folk/Zoellick/Riccardi, File Structures 1 Objectives: To get familiar with: Data compression Storage management Internal sorting and binary search Chapter.

Presentation transcript:

Folk/Zoellick/Riccardi, File Structures
Chapter 6: Organizing Files for Performance
Objectives: to become familiar with:
–Data compression
–Storage management
–Internal sorting and binary search

Outline
Data compression
Reclaiming space in files
–Record deletion
–Dynamic space reclaiming for fixed-length records
–Dynamic space reclaiming for variable-length records
–Storage fragmentation
Internal sorting and binary search
Keysorting

Data Compression
Data compression: encoding a file so that it takes up less space.
–Uses less storage,
–Can be transmitted faster,
–Can be processed faster sequentially.
Encoding with a different notation
–The “State” field in the address file takes two bytes. However, 50 states can be encoded in 6 bits, which fit in a single byte: a 50% space saving for each occurrence of the state field.
–This compact notation is a redundancy reduction technique.
–Costs:
»The file is not readable by humans.
»Encoding and decoding add overhead.
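The compact notation can be sketched in a few lines. This is only an illustration: the abbreviated STATES table and the function names encode_state and decode_state are assumptions made for the example, not part of the slides.

```python
# Hypothetical, abbreviated lookup table; a real one would list all 50 states.
STATES = ["AL", "AK", "AZ", "IA", "OK"]
CODE_OF = {abbr: index for index, abbr in enumerate(STATES)}

def encode_state(abbr: str) -> bytes:
    """Replace the two-byte abbreviation with a one-byte table index."""
    return bytes([CODE_OF[abbr]])

def decode_state(packed: bytes) -> str:
    """Look the abbreviation back up; this is the decoding overhead."""
    return STATES[packed[0]]

packed = encode_state("OK")
print(len("OK"), "bytes ->", len(packed), "byte")
print(decode_state(packed))
```

The stored byte is meaningless without the table, which is the readability cost the slide mentions.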

Data Compression (cont’d)
Suppressing repeating sequences
–Suitable for sparse arrays or images with regions of the same color.
–Run-length encoding: choose an unused byte value to indicate that a run-length code follows that byte.
–Encoding algorithm:
»Read through the data (pixels or values) that make up the image or data content, copying the data values to the file in sequence, except where the same value occurs more than once in succession.
»Where the same value occurs more than once in succession, substitute the following three entries:
•The special run-length code indicator,
•The data value that is repeated, and
•The number of times that the value is repeated.
»Example: the encoded sequence uses ff as the run-length code indicator.
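A minimal sketch of run-length encoding follows, under two assumptions that are not in the slides: the byte 0xFF never occurs in the data, and only runs of at least min_run values are replaced (a three-byte code for a run of two would make the output longer). The names rle_encode and rle_decode are made up for the example.

```python
RUN_MARKER = 0xFF  # assumed to be a byte value that never occurs in the data

def rle_encode(data: bytes, min_run: int = 4) -> bytes:
    """Replace each long run with: marker, repeated value, repeat count."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Measure the run; the count must fit in one byte, so stop at 255.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        if run >= min_run:
            out += bytes([RUN_MARKER, value, run])
        else:
            out += data[i:i + run]  # short runs are copied unchanged
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Expand every marker/value/count triple back into a run."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == RUN_MARKER:
            out += bytes([encoded[i + 1]]) * encoded[i + 2]
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return bytes(out)

pixels = bytes([0x22, 0x23] + [0x24] * 7 + [0x25])
print(rle_encode(pixels).hex(" "))
```

Runs longer than 255 values are simply split into several triples, since the count is stored in a single byte.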

Data Compression (cont’d)
Variable-length encoding
–Letters with high frequency are encoded using shorter symbols.
–Letters with low frequency are encoded using longer symbols.
–Huffman code (for a set of seven letters):
»at most four bits per letter (a fixed-length code for seven letters needs at least 3 bits per letter).
–The string “abefd” is encoded by concatenating the codes of its letters.
–Huffman codes are used in some UNIX systems for data compression.
Irreversible compression techniques
–Voice coding
–Some image coding schemes that change pixel granularity or reduce color quality
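The idea of variable-length codes can be sketched with a small Huffman builder. The letter counts below are invented for the illustration (they are not the slide's table), so the resulting bit patterns need not match the slide's code; the names huffman_codes, encode and decode are likewise hypothetical.

```python
import heapq
from itertools import count

def huffman_codes(freq: dict) -> dict:
    """Build a prefix code: frequent symbols receive shorter bit strings."""
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(weight, next(tie), {sym: ""}) for sym, weight in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in freq}
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

def encode(text: str, codes: dict) -> str:
    return "".join(codes[ch] for ch in text)

def decode(bits: str, codes: dict) -> str:
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # safe because no code is a prefix of another
            out.append(inverse[current])
            current = ""
    return "".join(out)

# Hypothetical counts for seven letters, with "a" the most frequent.
counts = {"a": 4, "b": 1, "c": 1, "d": 1, "e": 1, "f": 1, "g": 1}
codes = huffman_codes(counts)
print(codes)
print(encode("abefd", codes))
```

Decoding needs no separators between letters because the code is prefix-free.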

Reclaiming Space in Files
A file organization must support the following operations:
–record insertion
–record deletion
–record modification
Space reclaiming is needed when
–deleting fixed-length or variable-length records
–modifying variable-length records
»a modification can be treated as a deletion followed by an insertion

Record Deletion
Identifying deleted records
–Place a special mark in each deleted record, e.g., an asterisk (*) as the first field of a deleted record.
»Before deletion
Ames|John|123 Maple|Stillwater|OK|74075|...
Morrison|Sebastian|9035 South Hillcrest|Forest Village|OK|78420|
Brown|Martha|625 Kimbark|Des Moines|IA|50311|...
»After deletion
Ames|John|123 Maple|Stillwater|OK|74075|...
*|rrison|Sebastian|9035 South Hillcrest|Forest Village|OK|78420|
Brown|Martha|625 Kimbark|Des Moines|IA|50311|...
–Keep the deleted records around for some time.
»This delays disk compaction.
»Programs must be able to ignore the deleted records.
»Records can be “undeleted”.

Record Deletion (cont’d)
Space reclamation:
–Happens after a number of deleted records have accumulated.
–A simple solution is to copy the file, skipping the deleted records.
»Suitable for both fixed-length and variable-length records.
»After space reclamation
Ames|John|123 Maple|Stillwater|OK|74075|...
Brown|Martha|625 Kimbark|Des Moines|IA|50311|...
–In-place space reclamation (without copying the file) is more complicated and time-consuming.
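Marking and copy-based compaction can be sketched as below. The sketch assumes one record per text line, which is a simplification chosen for the example; mark_deleted and compact are hypothetical helper names.

```python
import io

DELETE_MARK = "*|"  # asterisk as the first field of a deleted record

def mark_deleted(record: str) -> str:
    """Overwrite the start of the record with the mark, keeping its length."""
    return DELETE_MARK + record[len(DELETE_MARK):]

def compact(src, dst) -> int:
    """Copy src to dst, skipping marked records; return how many were dropped."""
    dropped = 0
    for line in src:
        if line.startswith(DELETE_MARK):
            dropped += 1
        else:
            dst.write(line)
    return dropped

records = [
    "Ames|John|123 Maple|Stillwater|OK|74075|\n",
    "Morrison|Sebastian|9035 South Hillcrest|Forest Village|OK|78420|\n",
    "Brown|Martha|625 Kimbark|Des Moines|IA|50311|\n",
]
records[1] = mark_deleted(records[1])

old_file = io.StringIO("".join(records))
new_file = io.StringIO()
print(compact(old_file, new_file), "record(s) reclaimed")
print(new_file.getvalue(), end="")
```

Until compact runs, the marked record is still in the file, so it could be undeleted if its first bytes were saved elsewhere.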

Dynamic Space Reclaiming -- Fixed-Length Records
A naive approach: when inserting a new record,
–search the file record by record;
–if a deleted record is found, insert the new record in its place;
–otherwise, insert the new record at the end of the file.
Issues in reclaiming space quickly:
–How to know immediately whether there are empty slots in the file?
–How to jump to one of those slots, if they exist?
Linking all deleted records together using a linked list:
Head pointer -> deleted record -> deleted record -> deleted record -> ...

Dynamic Space Reclaiming -- Fixed-Length Records (cont’d)
–Use the linked list of deleted records as a stack:
Head pointer -> RRN 5 -> RRN 2
–Add (push) a recently deleted record of RRN 3 to the top of the stack:
Head pointer -> RRN 3 -> RRN 5 -> RRN 2
–Remove (pop) the free slot at the top of the stack, RRN 3, for an inserted record:
Head pointer -> RRN 5 -> RRN 2

Dynamic Space Reclaiming -- Fixed-Length Records (cont’d)
–Use the linked list of deleted records as a stack.
–Add (push) a recently deleted record of RRN 3 to the top of the stack.
–Insert three new records into the space of the deleted records: each insertion pops the slot at the top of the stack.
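The stack of deleted slots can be modelled in memory as a rough sketch. Here a Python list stands in for the file and each list index plays the role of an RRN; the class FixedLengthFile and the END sentinel value -1 are assumptions made for the illustration.

```python
class FixedLengthFile:
    """In-memory stand-in for a file of fixed-length records."""

    END = -1  # assumed sentinel meaning "no more free slots"

    def __init__(self):
        self.slots = []       # slots[i] is the record at RRN i
        self.head = self.END  # RRN of the first free slot (the stack top)

    def delete(self, rrn: int) -> None:
        # Push: the freed slot stores the old head, then becomes the head.
        self.slots[rrn] = ("*", self.head)
        self.head = rrn

    def insert(self, record: str) -> int:
        if self.head == self.END:
            self.slots.append(record)  # no free slot: append at end of file
            return len(self.slots) - 1
        # Pop: reuse the slot on top of the stack and follow its link.
        rrn = self.head
        _, self.head = self.slots[rrn]
        self.slots[rrn] = record
        return rrn

f = FixedLengthFile()
for n in range(6):
    f.insert(f"record {n}")
for rrn in (2, 5, 3):      # the free list becomes 3 -> 5 -> 2
    f.delete(rrn)
print([f.insert(name) for name in ("new A", "new B", "new C", "new D")])
```

Both delete and insert touch only the head and one slot, so neither needs to scan the file.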

Dynamic Space Reclaiming -- Variable-Length Records
An available list stores the deleted variable-length records:
–How to link the deleted records together into a list?
–How to add newly deleted records to the available list?
–How to find and remove records from the available list when space is reclaimed?
An available list of variable-length records (the number in front of a record is its length in bytes):
HEAD.FIRST_AVAILABLE:
Ames|John|123 Maple|Stillwater|OK|74075|
64 Morrison|Sebastian|9035 South Hillcrest|Forest Village|OK|78420|
45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|
Delete the second record:
HEAD.FIRST_AVAILABLE:
Ames|John|123 Maple|Stillwater|OK|74075|
64 *| |
45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|

Dynamic Space Reclaiming -- Variable-Length Records (cont’d)
When inserting a new record, we need to search the available list for a deleted record that is large enough:
–The current available list holds slots of size 47, 38, 72, and 68.
–Insert a record of 55 bytes: the size-72 slot is removed from the list, and a new link joins the slots around it, leaving the slots of size 47, 38, and 68.
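The search for a large enough slot can be sketched as follows. A plain Python list of slot sizes stands in for the linked available list, and the order of the sizes is an assumption, since the slide's figure did not survive; take_first_fit is a hypothetical name.

```python
def take_first_fit(avail: list, needed: int):
    """Remove and return the first slot size >= needed, or None if none fits."""
    for position, size in enumerate(avail):
        if size >= needed:
            # Unlinking the slot: its neighbours become adjacent in the list.
            return avail.pop(position)
    return None  # the caller appends the record at the end of the file

avail = [47, 38, 72, 68]  # slot sizes in an assumed list order
print(take_first_fit(avail, 55), avail)
```

In a real file each list entry would hold a byte offset and a link to the next free slot rather than just a size.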

Storage Fragmentation
Internal fragmentation caused by fixed-length records:
Ames|John|123 Maple|Stillwater|OK|74075|
Morrison|Sebastian|9035 South Hillcrest|Forest Village|OK|78420|
Brown|Martha|625 Kimbark|Des Moines|IA|50311|
Internal fragmentation caused by variable-length records:
–The inserted record is shorter than the deleted record it replaces:
HEAD.FIRST_AVAILABLE:
Ames|John|123 Maple|Stillwater|OK|74075|
64 Ham|Al|28 Elm|Ada|OK|70332| |
45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|
–Reclaim the unused part of the deleted record:
HEAD.FIRST_AVAILABLE:
Ames|John|123 Maple|Stillwater|OK|74075|
35 *|
Ham|Al|28 Elm|Ada|OK|70332|
45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|

Storage Fragmentation (cont’d)
External fragmentation: as records continue to be inserted, some free space becomes too fragmented to be useful:
–Insert a record of 25 bytes
HEAD.FIRST_AVAILABLE:
Ames|John|123 Maple|Stillwater|OK|74075|
8 *|
Lee|Ed|Rt 2|Ada|OK|
Ham|Al|28 Elm|Ada|OK|70332|
45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|
How to handle external fragmentation:
–storage compaction: regenerate the file when external fragmentation becomes intolerable.
–coalescing the holes: combine two record slots on the available list if they are physically adjacent.
–placement strategy: adopt a placement strategy that minimizes fragmentation.
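Coalescing can be sketched by describing each hole as a pair of byte offset and size; that representation and the function name coalesce are assumptions made for the example, not something the slides specify.

```python
def coalesce(holes: list) -> list:
    """Merge free slots that are physically adjacent in the file."""
    merged = []
    for offset, size in sorted(holes):  # visit the holes in file order
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # The previous hole ends exactly where this one starts.
            prev_offset, prev_size = merged[-1]
            merged[-1] = (prev_offset, prev_size + size)
        else:
            merged.append((offset, size))
    return merged

# Three small holes given as (offset, size); all three happen to be adjacent.
print(coalesce([(100, 8), (40, 35), (75, 25)]))
```

Sorting by offset is the expensive part, which hints at why adjacency is hard to detect on a list ordered by size.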

Placement Strategies
First-fit placement strategy: search the available list for the first space that is large enough for the inserted record.
–Requires the least amount of work when a newly available space is placed on the list.
Best-fit placement strategy: search for the smallest available space that is large enough for the inserted record.
–Order the available list in ascending order by size, then use the first-fit placement strategy.
–After the new record is inserted, the free area left over may be too small to be useful, which may cause serious external fragmentation.
–The small free slots collect at the beginning of the available list, making the search for the first fitting space increasingly long as time goes on.
Worst-fit placement strategy:
–Order the available list in descending order by size, then use the first-fit placement strategy.
»Always try the first slot for the new record. If the first slot is not large enough, the new record is appended to the end of the file.
»Decreases the chance of external fragmentation.
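Best fit and worst fit differ only in how the available list is ordered, which a short sketch can show. The lists of slot sizes and the function names best_fit and worst_fit are illustrative assumptions; returning the leftover space to the list is left out.

```python
import bisect

def best_fit(avail_ascending: list, needed: int):
    """List sorted ascending: the first slot that fits is also the smallest."""
    position = bisect.bisect_left(avail_ascending, needed)
    if position == len(avail_ascending):
        return None  # nothing fits: append the record to the file
    return avail_ascending.pop(position)

def worst_fit(avail_descending: list, needed: int):
    """List sorted descending: only the first (largest) slot is examined."""
    if avail_descending and avail_descending[0] >= needed:
        return avail_descending.pop(0)
    return None  # even the largest slot is too small: append to the file

needed = 55
for name, slot in (("best fit", best_fit([38, 47, 68, 72], needed)),
                   ("worst fit", worst_fit([72, 68, 47, 38], needed))):
    print(name, "uses slot", slot, "leaving", slot - needed, "bytes")
```

The leftover from best fit is the smallest possible, which is exactly why it tends to be too small to reuse.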

Binary Search
Search by guessing.
–Use the RRN to jump around the file.
Searching a file of n records:
–the worst case: ⌊log₂ n⌋ + 1 comparisons,
–the average case: ⌊log₂ n⌋ + 1/2 comparisons.
Requirements
–Works only for fixed-length records.
–The records must be sorted on the search field.
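A binary search that jumps by RRN might look like the sketch below. It assumes 16-byte, space-padded records whose whole content is the key, and it uses an in-memory io.BytesIO in place of a disk file; RECORD_SIZE, read_key and binary_search are hypothetical names.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record length in bytes

def read_key(f, rrn: int) -> bytes:
    """Jump straight to record number rrn and return its key."""
    f.seek(rrn * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b" ")

def binary_search(f, n_records: int, key: bytes) -> int:
    """Return the RRN of the record holding key, or -1 if it is absent."""
    low, high = 0, n_records - 1
    while low <= high:
        mid = (low + high) // 2
        current = read_key(f, mid)  # one seek and one read per guess
        if current == key:
            return mid
        if current < key:
            low = mid + 1
        else:
            high = mid - 1
    return -1

names = [b"Ames", b"Brown", b"Ham", b"Lee", b"Morrison"]  # already sorted
f = io.BytesIO(b"".join(name.ljust(RECORD_SIZE) for name in names))
print(binary_search(f, len(names), b"Lee"))
```

The offset computation rrn * RECORD_SIZE is what ties the method to fixed-length records.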

Sorting a Disk File in RAM
If the records are not in order, they must be sorted before we can use binary search.
Consider any internal sorting algorithm: bubble sort, quick sort, bucket sort, etc.
–If applied directly to data stored on disk, they require many disk accesses (seek time, rotational delay) and multiple passes over the list, which is extremely slow.
–If the entire file fits into RAM, load the entire contents of the file into RAM and perform the internal sort there.
»The records can be read sequentially.
»Much faster if the file is stored sequentially.
»This is an example of a general rule: minimize disk access cost by forcing disk accesses into a sequential mode and performing the complex, direct accesses in memory.

Limitations of Binary Searching and Internal Sorting
Binary searching requires more than one or two disk accesses
–When accessing records by relative record number (RRN), we can retrieve a record with a single disk access.
–Ideally, we would combine RRN retrieval (single access) with search by key (ease of use).
Keeping a file sorted is very expensive
–If record insertion is as frequent as record search, it is expensive to keep the records sorted.
–An alternative is to keep the records unsorted and use sequential search.
An internal sort works only on small files
–It is not possible to read all the records of a large file into main memory.
–Load only the keys into main memory -- keysorting.

Keysorting
Load only the record keys into RAM.
A KEYNODES[ ] array has two fields: KEY and RRN.
Each entry in KEYNODES[ ] corresponds to one record in the actual file.
The actual sorting process simply sorts the KEYNODES[ ] array according to the KEY field.
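A keysort can be sketched as follows, assuming 16-byte, space-padded records whose key is the first "|"-delimited field and using io.BytesIO objects in place of disk files. The function name keysort and the record layout are assumptions; KEYNODES from the slide becomes a list of (KEY, RRN) tuples.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record length in bytes

def keysort(f, n_records: int, out) -> list:
    """Sort (KEY, RRN) pairs in memory, then copy the records in key order."""
    # First pass: read the file sequentially, keeping only KEY and RRN.
    keynodes = []
    f.seek(0)
    for rrn in range(n_records):
        record = f.read(RECORD_SIZE)
        key = record.split(b"|", 1)[0]
        keynodes.append((key, rrn))

    keynodes.sort()  # the internal sort touches only the small key array

    # Second pass: one seek per record, in key order rather than file order.
    for _, rrn in keynodes:
        f.seek(rrn * RECORD_SIZE)
        out.write(f.read(RECORD_SIZE))
    return keynodes

records = [b"Morrison|OK", b"Ames|OK", b"Brown|IA"]
f = io.BytesIO(b"".join(r.ljust(RECORD_SIZE) for r in records))
sorted_file = io.BytesIO()
print(keysort(f, len(records), sorted_file))
```

The returned list already maps each key to its RRN, so it can serve as a lookup table for the unsorted file.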

Limitation of Keysorting
The keysort method requires two reads and one write for each record.
–The first pass of reads can be done sequentially, sector by sector.
–The second pass of reads cannot be done sequentially. It may require many random seeks.
–Since the write operations interleave with the reads in the second pass, these writes also require separate seeks.
If only one copy of the records is kept on disk, it is not easy to create a sorted version of the file from the KEYNODES[ ] array.
Solution:
–Do not write the sorted file back to disk.
–Write only the KEYNODES[ ] array back to disk, as an index file.