Published by Lambert Thompson. Modified over 9 years ago.
1
Folk/Zoellick/Riccardi, File Structures
Chapter 6: Organizing Files for Performance
Objectives: to become familiar with
– Data compression
– Storage management
– Internal sorting and binary search
2
Outline
– Data compression
– Reclaiming space in files
  » Record deletion
  » Dynamic space reclaiming for fixed-length records
  » Dynamic space reclaiming for variable-length records
  » Storage fragmentation
– Internal sorting and binary search
– Keysorting
4
Dish Party
Bring something delicious for yourself and two more people. It can be anything, at any budget.
Cannot? Forgot? Bring yourself. Come and join us.
"A pleasant morsel is enough for a hundred."
"The food of one is enough for two, the food of two is enough for four, and the food of four is enough for eight." (noble hadith)
5
Improving Performance
Less space (Ch. 6)
– Less storage
– Reclaiming space
– Defragmentation
Less time (Ch. 7)
– Indexing
6
Compression
Definition
– Reduce the size of data (the number of bits needed to represent the data)
Benefits
– Reduces the storage needed
– Reduces transmission cost / latency / bandwidth
7
Sources of Compressibility
Redundancy
– Recognize repeating patterns
– Exploit them using
  » A dictionary
  » Variable-length encoding
Human perception
– Less sensitive to some information
– Less important data can be discarded
8
Types of Compression
Lossless
– Preserves all information
– Exploits redundancy in the data
– Applied to general data
Lossy
– May lose some information
– Exploits redundancy and human perception
– Applied to audio, image, and video
9
Effectiveness of Compression
Metrics
– Bits per byte (a byte is 8 bits)
  » 2 bits / byte → ¼ of the original size
  » 8 bits / byte → no compression
– Percentage
  » 75% compression → ¼ of the original size
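The two metrics above are simple ratios of the compressed size to the original size. A minimal sketch of the arithmetic follows; the function names are illustrative and not from the slides.

```python
def bits_per_byte(original_size, compressed_size):
    """Average number of output bits spent per input byte."""
    return 8 * compressed_size / original_size


def percent_compression(original_size, compressed_size):
    """Share of the original size that was saved, in percent."""
    return 100 * (1 - compressed_size / original_size)


# A 1000-byte file compressed to 250 bytes is 1/4 of its original size.
print(bits_per_byte(1000, 250), percent_compression(1000, 250))
```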
10
Effectiveness of Compression
Depends on the data
– Random data is hard to compress
  » Example: 1001110100 → ?
– Organized data is easy to compress
  » Example: 1111111111 → 1 × 10
Corollary
– There is no universally best compression algorithm
11
Data Compression
Data compression: encoding a file so that it occupies less space.
– Uses less storage,
– Can be transmitted faster,
– Can be processed faster sequentially.
1) Encoding with a different notation
– The “State” field in the address file takes two bytes (two letters). The 50 states can be distinguished with only 6 bits (2^6 = 64 ≥ 50), so the field fits in a single byte: a 50% space saving for each occurrence of the state field.
– The compact notation is a redundancy reduction technique.
– Costs:
  » The file is not readable by humans.
  » The overhead of encoding and decoding operations.
12
Example
State        Two letters   Code   6-bit encoding
New York     NY            1      000001
California   CA            2      000010
Florida      FL            3      000011
…
Louisiana    LA            50     110010
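A minimal sketch of the compact notation, assuming a lookup table that maps each two-letter abbreviation to a small integer stored in one byte. Only the three codes shown in the example are filled in; the table contents beyond those entries and the function names are hypothetical.

```python
# Partial table taken from the example; a real table would list all 50 states.
STATE_CODES = {"NY": 1, "CA": 2, "FL": 3}
CODE_TO_STATE = {code: abbrev for abbrev, code in STATE_CODES.items()}


def encode_state(abbrev):
    """Replace a two-letter field by a one-byte code (50% saving)."""
    return bytes([STATE_CODES[abbrev]])


def decode_state(field):
    """Recover the two-letter abbreviation from the one-byte code."""
    return CODE_TO_STATE[field[0]]


# The code for California, shown as the 6 significant bits.
print(format(STATE_CODES["CA"], "06b"))
```

The cost noted on the slide shows up directly: the stored byte is no longer readable without the table, and every read or write goes through a lookup.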
13
Data Compression (cont’d)
2) Suppressing repeating sequences
– Suitable for sparse arrays or images with regions of the same color.
– Run-length encoding: choose an unused byte value to indicate that a run-length code follows that byte.
– Encoding algorithm:
  » Read through the data (pixels or values) that make up the image or data content, copying the data values to the file in sequence, except where the same data value occurs more than once in succession.
14
Data Compression (cont’d)
2) Suppressing repeating sequences
  » Where the same value occurs more than once in succession, substitute the following three entries:
    The special run-length code indicator,
    The data value that is repeated, and
    The number of times that the value is repeated.
  » Example (ff is the run-length code indicator):
    50 51 52 52 52 52 52 53 54 54 54 54 54 54 54 55 52 52 53 53 53 54
    The encoded sequence is:
    50 51 ff 52 05 53 ff 54 07 55 ff 52 02 ff 53 03 54
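The run-length scheme can be sketched as follows. This is a minimal illustration, not a production codec: it assumes the values are bytes, that 0xff never occurs in the input, and (an added assumption, not stated on the slides) that runs are capped at 255 so the count fits in one byte.

```python
RUN_MARKER = 0xFF  # assumed to be a byte value that never occurs in the data


def rle_encode(data):
    """Replace every run of two or more equal bytes by marker, value, count."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run, capping it so the count fits in a single byte.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        if run > 1:
            out += bytes([RUN_MARKER, value, run])
        else:
            out.append(value)
        i += run
    return bytes(out)


def rle_decode(encoded):
    """Expand marker, value, count triples back into runs."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == RUN_MARKER:
            value, run = encoded[i + 1], encoded[i + 2]
            out += bytes([value]) * run
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return bytes(out)
```

Note that under the "more than once" rule a run of two costs three bytes, so the encoded form can be longer than the input for data with many short runs.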
15
Data Compression (cont’d)
3) Variable-length encoding
– Letters with high frequency are encoded using shorter symbols.
– Letters with low frequency are encoded using longer symbols.
– Huffman code (for a set of seven letters):
  » four bits per letter (minimum 3 bits).
– The string “abefd” is encoded as “1010000100100000”.
– Huffman codes are used in some UNIX systems for data compression.
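The slide text does not reproduce the code table for the seven letters. The sketch below assumes a prefix-free table (a hypothetical assignment, chosen only because it is consistent with the encoded string quoted above) and shows how encoding and decoding with such a table could work.

```python
# Assumed code table: not given in the slide text, chosen to agree with
# the slide's example encoding of "abefd".
CODES = {"a": "1", "b": "010", "c": "011", "d": "0000",
         "e": "0001", "f": "0010", "g": "0011"}


def encode(text):
    """Concatenate the variable-length code of each letter."""
    return "".join(CODES[letter] for letter in text)


def decode(bits):
    """Read bits until they match a code; prefix-freeness makes this unambiguous."""
    reverse = {code: letter for letter, code in CODES.items()}
    letters, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            letters.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(letters)
```

Because no code is a prefix of another, the decoder needs no separators between letters.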
16
Huffman Code (Was explained in lecture. Read about it.)
Approach
– Variable-length encoding of symbols
– Exploit the statistical frequency of symbols
– Efficient when symbol probabilities vary widely
Principle
– Use fewer bits to represent frequent symbols
– Use more bits to represent infrequent symbols
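One common way to derive such codes is to merge the two least frequent subtrees repeatedly. The sketch below is a minimal illustration using Python's `heapq`; the function name and the tie-breaking order are choices made here, so the exact bit patterns may differ from any table used in the lecture while the code lengths follow the same principle.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix-free code in which frequent symbols get shorter codes."""
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(weight, i, {symbol: ""})
            for i, (symbol, weight) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]
```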
17
Data Compression (cont’d)
4) Irreversible compression techniques
– Voice coding
– Some image coding schemes that change pixel granularity or reduce color quality
19
Reclaiming Space in Files
File organization with the following operations:
– record insertion
– record deletion
– record modification
Space reclaiming is needed when
– deleting fixed-length and variable-length records
– modifying variable-length records
  » can be treated as a deletion followed by an insertion
20
Record Deletion
Identifying deleted records
– Place a special mark in each deleted record, e.g., an asterisk (*) as the first field of a deleted record.
  » Before deletion:
    Ames|John|123 Maple|Stillwater|OK|74075|...
    Morrison|Sebastian|9035 South Hillcrest|Forest Village|OK|78420|
    Brown|Martha|625 Kimbark|Des Moines|IA|50311|...
  » After deletion:
    Ames|John|123 Maple|Stillwater|OK|74075|...
    *|rrison|Sebastian|9035 South Hillcrest|Forest Village|OK|78420|
    Brown|Martha|625 Kimbark|Des Moines|IA|50311|...
21
Record Deletion
– Keep the deleted records around for some time.
  » Delays the disk compaction.
  » Programs must be able to ignore the deleted records.
  » Allows records to be “undeleted”.
22
Record Deletion (cont’d)
Space reclamation:
– Happens after a number of deleted records have accumulated.
– A simple solution is to copy the file, skipping the deleted records.
  » Suitable for both fixed-length and variable-length records.
  » After space reclamation:
    Ames|John|123 Maple|Stillwater|OK|74075|...
    Brown|Martha|625 Kimbark|Des Moines|IA|50311|...
– In-place space reclamation (without copying the file) is more complicated and time-consuming.
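The marking scheme and the copy-based reclamation can be sketched in a few lines. This is a minimal in-memory illustration: records are held as strings in a list rather than in a file, and the function names are hypothetical.

```python
DELETE_MARK = "*|"  # overwrites the first two characters of a deleted record


def mark_deleted(record):
    """Flag a record as deleted without changing its length."""
    return DELETE_MARK + record[len(DELETE_MARK):]


def is_deleted(record):
    return record.startswith(DELETE_MARK)


def compact(records):
    """Copy the file, skipping every record that carries the delete mark."""
    return [record for record in records if not is_deleted(record)]
```

Until `compact` runs, readers must call `is_deleted` and skip marked records. Note that the two overwritten characters are lost, so a full undelete would need them stored elsewhere.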
23
Dynamic Space Reclaiming -- Fixed-Length Records
A naive approach: when inserting a new record,
– search the file record by record;
– if a deleted record is found, insert the new record in its place;
– otherwise, insert the new record at the end of the file.
24
Dynamic Space Reclaiming -- Fixed-Length Records
Issues in reclaiming space quickly:
– How can we know immediately whether there are empty slots in the file?
– How can we jump to one of those slots, if they exist?
Linking all deleted records together using a linked list:
[Figure: head pointer → deleted record → deleted record → deleted record → ...]
25
Dynamic Space Reclaiming -- Fixed-Length Records (cont’d)
– Use the linked list of deleted records as a stack:
  Head pointer → RRN 5 → RRN 2
– Add (push) a recently deleted record of RRN 3 to the top of the stack:
  Head pointer → RRN 3 → RRN 5 → RRN 2
– Remove (pop) the free slot at the top of the stack for an inserted record:
  Head pointer → RRN 5 → RRN 2
26
Dynamic Space Reclaiming -- Fixed-Length Records (cont’d)
– Use the linked list of deleted records as a stack:
– Add (push) a recently deleted record of RRN 3 to the top of the stack:
– Insert three new records into the space of the deleted records:
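The stack of free slots can be sketched as below. This is an in-memory illustration under stated assumptions: the "file" is a Python list indexed by RRN, a deleted slot stores a `("*", next_rrn)` pair in place of the record, and -1 marks the end of the list. The class and method names are hypothetical.

```python
class FixedLengthFile:
    def __init__(self, records):
        self.slots = list(records)  # slot i holds the record with RRN i
        self.head = -1              # RRN of the first free slot, -1 if none

    def delete(self, rrn):
        """Push the slot onto the stack: it stores a link to the old head."""
        self.slots[rrn] = ("*", self.head)
        self.head = rrn

    def insert(self, record):
        """Pop a free slot if one exists, otherwise append to the file."""
        if self.head == -1:
            self.slots.append(record)
            return len(self.slots) - 1
        rrn = self.head
        self.head = self.slots[rrn][1]  # follow the link stored in the slot
        self.slots[rrn] = record
        return rrn
```

Both questions from the earlier slide are answered by the head pointer alone: a value of -1 means there is no free slot, and any other value is the slot to jump to, with no scan of the file.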
28
Dynamic Space Reclaiming -- Variable-Length Records
An available list to store the deleted variable-length records:
– How to link the deleted records together into a list?
– How to add newly deleted records to the available list?
– How to find and remove records from the available list when space is reclaimed?
29
Dynamic Space Reclaiming -- Variable-Length Records
An available list of variable-length records:
  HEAD.FIRST_AVAILABLE: -1
  40 Ames|John|123 Maple|Stillwater|OK|74075|64 Morrison|Sebastian|9035 South Hillcrest|Forest Village|OK|78420|45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|
Delete the second record:
  HEAD.FIRST_AVAILABLE: 43
  40 Ames|John|123 Maple|Stillwater|OK|74075|64 *|-1...........................................................|45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|
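A minimal sketch of the deletion step, assuming the layout suggested by the example: each record is stored as an ASCII length, one space, and then the record bytes, and the list head is kept outside the record area. The function name is hypothetical, and unlike the figure the sketch pads the whole remainder of the slot with dots.

```python
def delete_record(buf, offset, head):
    """Mark the record starting at byte `offset` as deleted.

    The record body is overwritten with "*|<old head>" and padded with dots,
    so its length is unchanged. Returns the new head of the available list.
    """
    space = buf.index(b" ", offset)
    size = int(buf[offset:space])          # the length field, e.g. 64
    body_start = space + 1
    marker = f"*|{head}".encode()          # link to the previous list head
    buf[body_start:body_start + size] = marker.ljust(size, b".")
    return offset


stream = bytearray(
    b"40 Ames|John|123 Maple|Stillwater|OK|74075|"
    b"64 Morrison|Sebastian|9035 South Hillcrest|Forest Village|OK|78420|"
    b"45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|"
)
first_available = delete_record(stream, 43, -1)
```

The length field is left intact on purpose: it is what a later insertion reads to decide whether the slot is large enough.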
30
Dynamic Space Reclaiming -- Variable-Length Records (cont’d)
When inserting a new record, we need to search the available list for a deleted record whose length is large enough:
– The current available list:
  [Figure: available list with slots of size 72, 68, 38, and 47]
– Insert a record of 55 bytes:
  [Figure: the size-72 slot is removed from the list and a new link bypasses it, leaving slots of size 68, 38, and 47]
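The search-and-unlink step can be sketched as follows. The order of the slots and their byte offsets are assumptions made for the illustration (the figure's order is not recoverable from the text), and the list is modelled as a dictionary from offset to `[size, next_offset]` rather than as bytes in a file.

```python
def remove_first_fit(slots, head, needed):
    """Unlink the first slot of at least `needed` bytes.

    `slots` maps a byte offset to [size, next_offset]; -1 ends the list.
    Returns (offset of the removed slot or -1, new head).
    """
    prev, cur = -1, head
    while cur != -1:
        size, nxt = slots[cur]
        if size >= needed:
            if prev == -1:
                head = nxt            # the first slot was taken
            else:
                slots[prev][1] = nxt  # the "new link" bypasses the slot
            del slots[cur]
            return cur, head
        prev, cur = cur, nxt
    return -1, head
```

Unlike the fixed-length case, this is no longer a pure stack: a pop may have to walk past slots that are too small.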
32
Storage Fragmentation
Internal fragmentation caused by fixed-length records:
  Ames|John|123 Maple|Stillwater|OK|74075|........................
  Morrison|Sebastian|9035 South Hillcrest|Forest Village|OK|78420|
  Brown|Martha|625 Kimbark|Des Moines|IA|50311|...................
Internal fragmentation caused by variable-length records:
– The inserted record is shorter than the deleted record:
  HEAD.FIRST_AVAILABLE: -1
  40 Ames|John|123 Maple|Stillwater|OK|74075|64 Ham|Al|28 Elm|Ada|OK|70332|....................................|45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|
– Reclaim the unused part of the deleted record:
  HEAD.FIRST_AVAILABLE: 43
  40 Ames|John|123 Maple|Stillwater|OK|74075|35 *|-1...............................26 Ham|Al|28 Elm|Ada|OK|70332|45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|
34
Storage Fragmentation (cont’d)
External fragmentation: as records continue to be inserted, some free space becomes too fragmented to be useful.
– Insert a record of 25 bytes:
  HEAD.FIRST_AVAILABLE: 43
  40 Ames|John|123 Maple|Stillwater|OK|74075|8 *|-1....25 Lee|Ed|Rt 2|Ada|OK|74820|26 Ham|Al|28 Elm|Ada|OK|70332|45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|
How to handle external fragmentation:
– Storage compaction: regenerate the file when external fragmentation becomes intolerable.
– Coalescing the holes: combine two record slots on the available list if they are physically adjacent.
– Placement strategy: adopt a placement strategy that minimizes fragmentation.
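Coalescing can be sketched as a pass over the free slots sorted by position. This is a simplified illustration: each hole is an `(offset, size)` pair where the size is assumed to cover the whole slot including its length field, and the function name is hypothetical.

```python
def coalesce(holes):
    """Merge free slots that are physically adjacent in the file."""
    merged = []
    for offset, size in sorted(holes):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This hole starts exactly where the previous one ends.
            last_offset, last_size = merged[-1]
            merged[-1] = (last_offset, last_size + size)
        else:
            merged.append((offset, size))
    return merged
```

The sort is needed because the available list is ordered by deletion time or by size, not by position, which is why adjacency is not visible from the list itself.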
36
Placement Strategies
First-fit placement strategy: use the first available slot that is large enough for the inserted record.
– Requires the least work when a newly available slot is placed on the list.
Best-fit placement strategy: use the smallest available slot that is large enough for the inserted record.
– Order the available list in ascending order by size, then use the first-fit placement strategy.
– After the new record is inserted, the free area left over may be too small to be useful, which may cause serious external fragmentation.
– The small free slots accumulate at the beginning of the available list, making the search for a fitting slot increasingly long as time goes on.
Worst-fit placement strategy: use the largest available slot.
– Order the available list in descending order by size, then use the first-fit placement strategy.
  » Always try the first slot; if it is not large enough, the new record is appended to the end of the file.
  » Decreases the chance of external fragmentation.
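The three strategies differ only in which fitting slot they pick. The sketch below compares them on an unsorted list of slot sizes; it selects by scanning rather than by keeping the list ordered as the slides suggest, and the function name and sample sizes are illustrative.

```python
def choose_slot(sizes, needed, strategy):
    """Return the index of the slot to use, or None to append to the file."""
    fitting = [(size, index) for index, size in enumerate(sizes)
               if size >= needed]
    if not fitting:
        return None
    if strategy == "first":
        return min(fitting, key=lambda slot: slot[1])[1]  # earliest in list
    if strategy == "best":
        return min(fitting)[1]                            # smallest that fits
    if strategy == "worst":
        return max(fitting, key=lambda slot: slot[0])[1]  # largest slot
    raise ValueError(f"unknown strategy: {strategy}")


sizes = [47, 38, 72, 68]
for strategy in ("first", "best", "worst"):
    print(strategy, choose_slot(sizes, 55, strategy))
```

For a 55-byte record, best fit leaves 13 bytes over in the size-68 slot while worst fit leaves 17 in the size-72 slot; the larger remainder is the one more likely to be reusable.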
38
13.38 Logical view of an indexed file