File Structures, SNU-OOPSLA Lab. Chap6. Organizing Files for Performance. Seoul National University, School of Computer Engineering, Object-Oriented Systems Laboratory (SNU-OOPSLA-LAB), Prof. Hyoung-Joo Kim. File Structures by Folk, Zoellick.

Presentation transcript:

File Structures, SNU-OOPSLA Lab. 1
Chap6. Organizing Files for Performance
Seoul National University, School of Computer Engineering, Object-Oriented Systems Laboratory (SNU-OOPSLA-LAB), Prof. Hyoung-Joo Kim
File Structures by Folk, Zoellick, and Riccardi

Chapter Objectives (1)
- Look at several approaches to data compression
- Look at storage compaction as a simple way of reusing space in a file
- Develop a procedure for deleting fixed-length records that allows vacated file space to be reused dynamically
- Illustrate the use of linked lists and stacks to manage an avail list
- Consider several approaches to the problem of deleting variable-length records
- Introduce the concepts associated with the terms internal fragmentation and external fragmentation

Chapter Objectives (2)
- Outline some placement strategies associated with the reuse of space in a variable-length record file
- Provide an introduction to the idea underlying a binary search
- Undertake an examination of the limitations of binary searching
- Develop a keysort procedure for sorting larger files; investigate the costs associated with keysort
- Introduce the concept of a pinned record

Contents
- 6.1 Data compression
- 6.2 Reclaiming space in files
- 6.3 Finding things quickly: An introduction to internal sorting and binary searching
- 6.4 Keysorting

6.1 Data Compression

Data Compression (1)
- Reasons for data compression
  - less storage
  - faster transmission, decreasing access time
  - faster sequential processing

Data Compression (2): Using a different notation
- Fixed-length fields are good candidates
- Decrease the number of bits by finding a more compact notation
  - ex) the original state field takes 16 bits, but because there are only 50 states, a 6-bit notation is enough (2^6 = 64 >= 50)
- Cons
  - unreadable by humans
  - cost in encoding time
  - decoding modules increase the complexity of the software, so the technique is used only for particular applications
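The compact-notation idea can be sketched as a table lookup. This is only an illustration under stated assumptions: the names kStates, encodeState, and decodeState are hypothetical, and the table lists just four codes for brevity, whereas a real table would hold all 50 states and still fit in 6 bits.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical lookup table (assumption): a full version would list all 50
// state codes, so every index stays below 64 and fits in 6 bits.
static const std::array<const char*, 4> kStates = {{"IA", "OK", "TX", "WA"}};

// Returns the compact index of a two-letter code, or -1 if it is not listed.
int encodeState(const std::string& code) {
    for (std::size_t i = 0; i < kStates.size(); ++i)
        if (code == kStates[i]) return static_cast<int>(i);
    return -1;
}

// Maps a compact index back to the two-letter code (throws if out of range).
std::string decodeState(int index) {
    return kStates.at(static_cast<std::size_t>(index));
}
```

The decode step is the extra module the slide lists as a cost: every program that reads the file needs the same table.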

Data Compression (3): Suppressing repeating sequences
- Run-length encoding algorithm
  - read through the pixels, copying pixel values to the file in sequence, except where the same pixel value occurs more than once in succession
  - when the same value occurs more than once in succession, substitute the following three bytes:
    1. a special run-length code indicator (ex: ff)
    2. the pixel value that is repeated
    3. the number of times that value is repeated
- ex) ... → ... ff ... ff ...
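A minimal sketch of the run-length encoder described above. The function name rleEncode, the 255 cap on a run (so the count fits in one byte), and the choice to always escape a literal pixel equal to the indicator are assumptions added here, not part of the slide.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes runs as three bytes: indicator, pixel value, repeat count.
// Assumption: a pixel equal to the indicator is always written as a run,
// even of length 1, so the decoder never confuses data with the indicator.
std::vector<uint8_t> rleEncode(const std::vector<uint8_t>& pixels,
                               uint8_t marker = 0xff) {
    std::vector<uint8_t> out;
    std::size_t i = 0;
    while (i < pixels.size()) {
        uint8_t value = pixels[i];
        std::size_t run = 1;
        // Count how many times the same value repeats (at most 255).
        while (i + run < pixels.size() && pixels[i + run] == value && run < 255)
            ++run;
        if (run > 1 || value == marker) {
            out.push_back(marker);
            out.push_back(value);
            out.push_back(static_cast<uint8_t>(run));
        } else {
            out.push_back(value);  // single pixel copied unchanged
        }
        i += run;
    }
    return out;
}
```

With this sketch, seven consecutive pixels of value 24 (hex) become the three bytes ff 24 07, while a run of only two pixels grows from two bytes to three.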

Representation of an Image in a Computer
[Figure: the screen is divided into pixels; the light intensity at each pixel is an electrical signal (analog), which is converted to a number (digital).]

Representation of an Image in a Computer (cont'd)
[Figure: the screen and its pixel values.]

Representation of a Color Image in a Computer
[Figure: the screen and its color pixel values.]
- Note: video (moving pictures) works by replacing still images a number of times per second

Data Compression (3): Suppressing repeating sequences
- Run-length encoding (cont'd)
  - an example of redundancy reduction
  - cons
    - does not guarantee any particular amount of space savings
    - under some circumstances, the compressed image is larger than the original image
    - Why? Can you prevent this?

Data Compression (4): Assigning variable-length codes
- Morse code: the oldest and most common variable-length code scheme
- Some values occur more frequently than others
  - those values should take the least amount of space
- Huffman coding
  - based on probability of occurrence
  - determine the probability of each value occurring
  - build a binary tree with a search path for each value
  - more frequently occurring values are given shorter search paths in the tree

Data Compression (5): Assigning variable-length codes
- Huffman coding

  Letter:  a   b    c    d     e     f     g
  Prob:    ...
  Code:    1   010  011  0000  0001  0010  0011

- ex) the string "abde" → 101000000001

Huffman Tree
[Figure: binary tree whose leaves are a(1), b(010), c(011), d(0000), e(0001), f(0010), g(0011).]
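The code table from the Huffman tree can be exercised with a short sketch. It does not build the tree from probabilities; it only encodes and decodes with the fixed codes shown on the slide. The names kCodes, huffEncode, and huffDecode are hypothetical, and bits are kept as '0'/'1' characters for readability rather than packed.

```cpp
#include <map>
#include <string>

// Code table taken from the tree on the slide.
static const std::map<char, std::string> kCodes = {
    {'a', "1"},    {'b', "010"},  {'c', "011"},  {'d', "0000"},
    {'e', "0001"}, {'f', "0010"}, {'g', "0011"}};

// Concatenates the code of each letter (throws on a letter outside a..g).
std::string huffEncode(const std::string& text) {
    std::string bits;
    for (char ch : text) bits += kCodes.at(ch);
    return bits;
}

// Because no code is a prefix of another, the decoder can emit a letter
// as soon as the bits read so far match a code.
std::string huffDecode(const std::string& bits) {
    std::map<std::string, char> reverse;
    for (const auto& kv : kCodes) reverse[kv.second] = kv.first;
    std::string text, pending;
    for (char bit : bits) {
        pending += bit;
        auto it = reverse.find(pending);
        if (it != reverse.end()) {
            text += it->second;
            pending.clear();
        }
    }
    return text;
}
```

Encoding "abde" this way yields 12 bits, compared with 32 bits for four 8-bit characters.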

Data Compression (6): Irreversible compression techniques
- Some information can be sacrificed
- Less common in data files
- Shrinking a raster image
  - 400-by-400 pixels to 100-by-100 pixels
  - 1 pixel for every 16 pixels
- Speech compression
  - voice coding (the lost information is of little or no value)
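The raster-shrinking example can be sketched as follows. The slide does not say how the one surviving pixel is chosen; keeping the top-left pixel of each block is an assumption made here, and the function name shrink is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Keeps one pixel (the top-left one, by assumption) from every
// factor-by-factor block of a row-major grayscale image.
std::vector<uint8_t> shrink(const std::vector<uint8_t>& image,
                            std::size_t width, std::size_t height,
                            std::size_t factor) {
    std::vector<uint8_t> out;
    for (std::size_t y = 0; y + factor <= height; y += factor)
        for (std::size_t x = 0; x + factor <= width; x += factor)
            out.push_back(image[y * width + x]);
    return out;
}
```

With factor 4, a 400-by-400 image (160,000 pixels) becomes 100-by-100 (10,000 pixels), and the discarded 15 of every 16 pixels cannot be recovered.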

Compression in UNIX
- System V
  - pack and unpack use Huffman codes
  - after compressing a file, pack appends ".z" to the end of the packed file's name
- Berkeley UNIX
  - compress and uncompress use the Lempel-Ziv method
  - after compressing a file, compress appends ".Z" to the end of the compressed file's name

6.2 Reclaiming Space in Files

Record Deletion and Storage Compaction
- Storage compaction
  - record deletion: just mark each deleted record
  - compaction: reclaim the space of all deleted records at once
  - pros: delete and undelete operations take little effort
- Ex)
  Original file:
    Ames|123|OK|…...| Morrison|9035|OK| Brown|625|IA|…...|
  After deleting the second record:
    Ames|123|OK|…| *|rrison|9035|OK| Brown|625|IA|…|
  After compaction:
    Ames|123|OK|…| Brown|625|IA|…|
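A minimal in-memory sketch of the mark-then-compact idea, assuming (as in the example) that a deleted record is recognized by '*' in its first byte. The function name compact is hypothetical, and a real implementation would copy records from one file to another rather than between vectors.

```cpp
#include <string>
#include <vector>

// Copies every record that is not marked as deleted; the marked records
// are simply left behind, which is what reclaims their space.
std::vector<std::string> compact(const std::vector<std::string>& records) {
    std::vector<std::string> kept;
    for (const auto& rec : records)
        if (rec.empty() || rec[0] != '*') kept.push_back(rec);
    return kept;
}
```

Until compaction runs, undeleting is cheap only in the sense that the record's space is still there; the bytes overwritten by the mark must be restored by other means.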

Deleting Fixed-length Records for Reclaiming Space Dynamically (1)
- Reuse the space from deleted records as soon as possible
  - deleted records must be marked in a special way
  - we must be able to find the deleted space
- To reuse record slots quickly, we need
  - a way to know immediately whether there are empty slots in the file
  - a way to jump directly to one of those slots if they exist
  => linked lists or stacks for the avail list
- avail list: a list made up of the deleted records

Deleting Fixed-length Records for Reclaiming Space Dynamically (2)
[Figure: the linked list and the stack. (a) Head pointer → RRN 5 → RRN 2. (b) After pushing the record with RRN 3: head pointer → RRN 3 → RRN 5 → RRN 2.]

Deleting Fixed-length Records for Reclaiming Space Dynamically (3)
- Linking and stacking deleted records
  - arranging and rearranging links makes one available record slot point to the next
  - the second field of a deleted record points to the next record

Sample file showing linked list of deleted records

  RRN:   0            1            2         3            4           5            6
  (a) List head (first available record) = 5 (after deleting records 3 and 5)
         Edwards...   Betas...     Wills...  *-1          Masters..   *3           Chavez...
  (b) List head (first available record) = 1 (after deleting record 1)
         Edwards...   *5           Wills...  *-1          Masters..   *3           Chavez...
  (c) List head (first available record) = -1 (after inserting three new records)
         Edwards...   1st new rec  Wills...  3rd new rec  Masters..   2nd new rec  Chavez...
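The sample file above can be reproduced with a small in-memory sketch of the avail list as a stack. RecordFile, remove, and insert are hypothetical names, the slots are strings in a vector rather than fixed-length records on disk, and the link is stored as "*" followed by the RRN of the next available slot, as in the figure.

```cpp
#include <string>
#include <vector>

struct RecordFile {
    std::vector<std::string> slots;  // one entry per fixed-length record slot
    int head = -1;                   // RRN of the first available slot, -1 if none

    // Push the slot onto the avail stack: mark it and link it to the old head.
    void remove(int rrn) {
        slots[rrn] = "*" + std::to_string(head);
        head = rrn;
    }

    // Pop a slot from the avail stack if one exists, else append at the end.
    // Returns the RRN where the record was stored.
    int insert(const std::string& rec) {
        if (head == -1) {
            slots.push_back(rec);
            return static_cast<int>(slots.size()) - 1;
        }
        int rrn = head;
        head = std::stoi(slots[rrn].substr(1));  // follow the stored link
        slots[rrn] = rec;
        return rrn;
    }
};
```

Deleting records 3, 5, and 1 in that order and then inserting three records fills slots 1, 5, and 3, matching parts (a) through (c) of the figure.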

Deleting Variable-length Records
- Avail list of variable-length records
  - each record has its byte count at the beginning
  - links use byte offsets instead of RRNs
- Adding and removing records
  - when adding a record, search through the avail list for a slot of the right size (that is, big enough)

Removal of a record from an avail list with variable-length records
[Figure: (a) Before removal: the avail list links slots of size 47, 38, 72, and 68. (b) After removal: the size-72 slot is the removed record, and a new link connects the size-38 slot to the size-68 slot.]
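Searching the avail list for a big-enough slot and unlinking it can be sketched as below. The Slot struct, the function name takeFirstFit, and the use of std::list in memory (instead of byte-offset links stored inside the file) are assumptions for illustration.

```cpp
#include <list>

struct Slot {
    long offset;  // byte offset of the deleted record in the file
    int size;     // byte count of the slot
};

// Walks the avail list, removes the first slot that is big enough, and
// returns its byte offset; returns -1 if no slot can hold the record.
long takeFirstFit(std::list<Slot>& avail, int needed) {
    for (auto it = avail.begin(); it != avail.end(); ++it) {
        if (it->size >= needed) {
            long offset = it->offset;
            avail.erase(it);  // neighbours are relinked around the slot
            return offset;
        }
    }
    return -1;
}
```

For a list of sizes 47, 38, 72, 68, a request for 70 bytes removes the size-72 slot and leaves 47, 38, 68 linked together, as in the figure.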

Storage Fragmentation
- Internal fragmentation (in fixed-length records)
  - wasted space within a record
  - variable-length records minimize wasted space by doing away with internal fragmentation
- External fragmentation (in variable-length records)
  - unused space outside or between individual records
  - three possible solutions:
    1. storage compaction
    2. coalescing the holes into a single, larger record slot
    3. minimizing fragmentation by adopting a placement strategy

Internal Fragmentation in Fixed-length Records

  Ames | John | 123 Maple | Stillwater | OK | |
  Morrison | Sebastian | 9035 South Hillcrest | Forest Village | OK | |
  Brown | Martha | 625 Kimbark | Des Moines | IA | |

  (...-byte fixed-length records)
  Unused space at the end of each record -> internal fragmentation

External Fragmentation in Variable-length Records

  Record[1] (length 40): Ames | John | 123 Maple | Stillwater | OK | |
  Record[2] (length 64): Morrison | Sebastian | 9035 South Hillcrest | Forest Village | OK | |
  Record[3] (length 45): Brown | Martha | 625 Kimbark | Des Moines | IA | |

- ex) Delete Record[2] and insert new Record[i]
  Record[i] (length 52): Adams | Kits | 3301 Washington D.C | Forest Village | IA | |
  - the 52-byte record goes into the 64-byte slot, leaving 12 bytes of unused space: external fragmentation

Placement Strategies
- First-fit
  - select the first available record slot that is big enough
  - suitable when lost space is due to internal fragmentation
- Best-fit
  - select the available record slot closest in size
  - avail list kept in ascending order
  - suitable when lost space is due to internal fragmentation
- Worst-fit
  - select the largest record slot
  - avail list kept in descending order
  - suitable when lost space is due to external fragmentation
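The three strategies differ only in which qualifying slot they pick, which a short sketch can make concrete. The function names are hypothetical, and each function scans an unsorted list of slot sizes; keeping the avail list sorted, as the slide suggests for best-fit and worst-fit, would let the first qualifying slot be the answer.

```cpp
#include <cstddef>
#include <vector>

// Each function returns the index of the chosen slot, or -1 if none fits.

int firstFit(const std::vector<int>& sizes, int needed) {
    for (std::size_t i = 0; i < sizes.size(); ++i)
        if (sizes[i] >= needed) return static_cast<int>(i);
    return -1;
}

int bestFit(const std::vector<int>& sizes, int needed) {
    int best = -1;  // smallest slot that is still big enough
    for (std::size_t i = 0; i < sizes.size(); ++i)
        if (sizes[i] >= needed && (best == -1 || sizes[i] < sizes[best]))
            best = static_cast<int>(i);
    return best;
}

int worstFit(const std::vector<int>& sizes, int needed) {
    int worst = -1;  // largest slot, so the leftover piece stays usable
    for (std::size_t i = 0; i < sizes.size(); ++i)
        if (sizes[i] >= needed && (worst == -1 || sizes[i] > sizes[worst]))
            worst = static_cast<int>(i);
    return worst;
}
```

For slots of size 47, 38, 72, 68 and a 50-byte record, first-fit and worst-fit both choose the 72-byte slot, while best-fit chooses the 68-byte slot.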

6.3 Finding Things Quickly: An Introduction to Internal Sorting and Binary Searching

Finding Things Quickly (1)
- Goal: minimize the number of disk accesses
- Finding things in simple field and record files may require many seeks
- Binary search algorithm for fixed-size records

    int BinarySearch(FixedRecordFile &file, RecType &obj, KeyType &key)
    // binary search for key; returns 1 if found (obj holds the record), else 0
    {
        int low = 0;
        int high = file.NumRecs() - 1;
        while (low <= high) {
            int guess = (low + high) / 2;          // midpoint of the current range
            file.ReadByRRN(obj, guess);
            if (obj.Key() == key) return 1;        // record found
            if (obj.Key() < key) low = guess + 1;  // search after guess
            else high = guess - 1;                 // search before guess
        }
        return 0; // loop ended without finding key
    }

Classes and Methods for Binary Search

    class KeyType {
    public:
        int operator == (KeyType &);
        int operator < (KeyType &);
    };

    class RecType {
    public:
        KeyType Key();
    };

    class FixedRecordFile {
    public:
        int NumRecs();
        int ReadByRRN(RecType & Record, int RRN);
    };
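To try the binary search without a disk file, the interfaces can be replaced by simplified in-memory stand-ins. Everything below is an assumption for illustration: keys are plain std::string values instead of a KeyType class, RecType carries a hypothetical data field, and FixedRecordFile wraps a vector that must already be sorted by key.

```cpp
#include <string>
#include <utility>
#include <vector>

struct RecType {
    std::string key;
    std::string data;
    const std::string& Key() const { return key; }
};

// In-memory stand-in for a file of fixed-length records sorted by key.
class FixedRecordFile {
public:
    explicit FixedRecordFile(std::vector<RecType> recs) : recs_(std::move(recs)) {}
    int NumRecs() const { return static_cast<int>(recs_.size()); }
    // Copies record number rrn into record; returns 1 on success, 0 otherwise.
    int ReadByRRN(RecType& record, int rrn) const {
        if (rrn < 0 || rrn >= NumRecs()) return 0;
        record = recs_[rrn];
        return 1;
    }
private:
    std::vector<RecType> recs_;
};

// Returns 1 and fills obj if key is found, 0 otherwise.
int BinarySearch(const FixedRecordFile& file, RecType& obj, const std::string& key) {
    int low = 0;
    int high = file.NumRecs() - 1;
    while (low <= high) {
        int guess = low + (high - low) / 2;    // midpoint without overflow
        file.ReadByRRN(obj, guess);
        if (obj.Key() == key) return 1;
        if (obj.Key() < key) low = guess + 1;  // key lies after guess
        else high = guess - 1;                 // key lies before guess
    }
    return 0;
}
```

Each loop iteration corresponds to one ReadByRRN call, which on a real file is one disk access; that is the cost the binary search is trying to keep near log2(n).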

Finding Things Quickly (2)
- Binary search vs. sequential search
  - binary search
    - O(log n)
    - the list must be sorted by key
  - sequential search
    - O(n)

Finding Things Quickly (3)
- Sorting a disk file in RAM
  - read the entire file from disk into memory
  - use an internal sort (= sort in memory)
  - the UNIX sort utility uses an internal sort
- Limitations of binary search and internal sorting
  - binary search requires more than one or two accesses (cf. a single access by RRN)
  - keeping a file sorted is very expensive
  - an internal sort works only on small files

Internal Sort
[Figure: the unsorted file on disk is read entirely into memory, sorted in memory, and written out as a sorted file.]

6.4 Keysorting

Key Sorting and Its Limitations
- Also called "tag sort": only the keys are sorted
- Sorting procedure
  1. Read only the keys into memory
  2. Sort the keys
  3. Rearrange the records in the file according to the sorted keys
- Advantage
  - needs less RAM than an internal sort
- Disadvantages (= limitations)
  - the records on disk must be read twice
  - constructing the new (sorted) file requires a lot of seeking for records

KEYNODES array (in RAM) and records (on secondary storage)

  Conceptual view before sorting:
    KEY        RRN      Records
    HARRISON   1        Harrison|Susan|387 Eastern....
    KELLOG     2        Kellog|Bill|17 Maple....
    HARRIS     3        Harris|Margaret|4343 West....
    ...
    BELL       k        Bell|Robert|8912 Hill....

  Conceptual view after sorting keys in RAM (the records stay where they are):
    KEY        RRN      Records
    BELL       k        Harrison|Susan|387 Eastern....
    HARRIS     3        Kellog|Bill|17 Maple....
    HARRISON   1        Harris|Margaret|4343 West....
    ...
    KELLOG     2        Bell|Robert|8912 Hill....

Pseudocode for keysort (1)

    Program: keysort
      open input file as IN_FILE
      create output file as OUT_FILE
      read header record from IN_FILE and write a copy to OUT_FILE
      REC_COUNT := record count from header record
      /* read in records; set up KEYNODES array */
      for i := 1 to REC_COUNT
        read record from IN_FILE into BUFFER
        extract canonical key and place it in KEYNODES[i].KEY
        KEYNODES[i].RRN = i
    (continued....)

Pseudocode for keysort (2)

      /* sort KEYNODES[].KEY, thereby ordering RRNs correspondingly */
      sort(KEYNODES, REC_COUNT)
      /* read in records according to sorted order, and write them out in this order */
      for i := 1 to REC_COUNT
        seek in IN_FILE to record with RRN of KEYNODES[i].RRN
        read record from IN_FILE into BUFFER
        write BUFFER contents to OUT_FILE
      close IN_FILE and OUT_FILE
    end PROGRAM
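The keysort pseudocode can be sketched in memory as follows. Assumptions: the input "file" is a vector of delimited records with no header record, the canonical key is the uppercased first field, RRNs start at 0, and the names canonicalKey and keysort are hypothetical. Indexing into the vector stands in for the seek-and-read step.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Canonical key (assumption): the first '|'-delimited field, uppercased.
std::string canonicalKey(const std::string& record) {
    std::string key = record.substr(0, record.find('|'));
    for (char& c : key)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return key;
}

std::vector<std::string> keysort(const std::vector<std::string>& inFile) {
    // Pass 1: build the KEYNODES array of (key, RRN) pairs.
    std::vector<std::pair<std::string, int>> keynodes;
    for (int rrn = 0; rrn < static_cast<int>(inFile.size()); ++rrn)
        keynodes.emplace_back(canonicalKey(inFile[rrn]), rrn);

    // Sort only the small key nodes, not the full records.
    std::sort(keynodes.begin(), keynodes.end());

    // Pass 2: fetch each record by RRN in key order and write it out.
    std::vector<std::string> outFile;
    for (const auto& node : keynodes)
        outFile.push_back(inFile[node.second]);
    return outFile;
}
```

The second loop is where the cost lies on a real disk: the RRNs arrive in key order, not physical order, so each fetch is likely to need its own seek.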

Two Solutions: why bother to write the file back?
- Write out the sorted KEYNODES[] array without writing the records back in sorted order
- Use the KEYNODES[] array as an index file

Relationship between the index file and the data file

    Index file               Original file
    KEY        RRN           Records
    BELL       k             Harrison|Susan|387 Eastern....
    HARRIS     3             Kellog|Bill|17 Maple....
    HARRISON   1             Harris|Margaret|4343 West....
    ...
    KELLOG     2             Bell|Robert|8912 Hill....
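Using the sorted KEYNODES array as an index means a lookup searches the small index and then makes one direct access to the data file. A minimal sketch, assuming an in-memory index of (key, RRN) pairs sorted by key; KeyNode and lookupRRN are hypothetical names, and the RRN values in any example are illustrative.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> KeyNode;  // (key, RRN in the data file)

// Binary-searches the sorted index; returns the RRN of the matching
// record in the unsorted data file, or -1 if the key is absent.
int lookupRRN(const std::vector<KeyNode>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const KeyNode& node, const std::string& k) { return node.first < k; });
    if (it != index.end() && it->first == key) return it->second;
    return -1;
}
```

Because the data file is never rearranged, record locations stay valid, which is also what makes this approach usable when records are pinned.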

Pinned records (1)
- Records whose physical location is referenced by other records
- We are not free to alter the physical location of such records, because doing so would create dangling references
- Pinned records make sorting more difficult and sometimes impossible
  - solution: use an index file, while keeping the actual data file in its original order

Pinned records (2)
[Figure: a file with pinned records. Record (i) and Record (i+1) each point to a pinned record; deleting a pinned record leaves a dangling pointer.]

Let's Review !!!
- 6.1 Data compression
- 6.2 Reclaiming space in files
- 6.3 Finding things quickly: An introduction to internal sorting and binary searching
- 6.4 Keysorting