1
File Organization & Processing
Cairo University, FCI, 2014
CS 215 File Organization & Processing, Lecture 11: Rest of Ch. 4 & Ch. 5
Presenter: Dr. Mohammad El-Ramly (many slides by others)
2
Using Classes to Manipulate Buffers
Examples of three C++ classes that encapsulate the operation of a buffer object:
- Functions: Pack, Unpack, Read, Write
- Output: pack fields into a buffer, then write the buffer to a file
- Input: read a buffer from a file, then unpack the fields from the buffer
- Pack and Unpack each deal with only one field at a time
- DelimTextBuffer class for delimited fields
- LengthTextBuffer class for length-based fields
- FixedTextBuffer class for fixed-length fields
- Appendix E: full implementation (buggy)
3
Buffer Class for Delimited Text Fields(1)
Variable-length buffer; fields are represented as delimited text.
class DelimTextBuffer {
public:
    DelimTextBuffer (char delim = '|', int maxBytes = 1000);
    void Clear ();                        // empty the buffer (used by Read)
    int Read (istream & file);
    int Write (ostream & file) const;
    int Pack (const char * str, int size = -1);
    int Unpack (char * str);
private:
    char Delim;       // delimiter character
    char * Buffer;    // character array to hold field values
    int BufferSize;   // current size of packed fields
    int MaxBytes;     // maximum # of chars in the buffer
    int NextByte;     // packing/unpacking position in buffer
};
4
Buffer Class for Delimited Text Fields(2)
int DelimTextBuffer::Unpack(char * str) {
    start = NextByte
    search from start to the end of the buffer for the delimiter
    if not found, return FALSE
    if found, copy from start up to the delimiter into str and add '\0'
    update NextByte to just past the delimiter
    if NextByte is past the end of the buffer, return FALSE; otherwise return TRUE
}
Unpack() extracts one field from a record in a buffer.
5
Buffer Class for Delimited Text Fields(3)
int DelimTextBuffer::Unpack (char * str)
// extract the value of the next field of the buffer;
// str must have room for the field plus '\0' (uses strncpy from <cstring>)
{
    int len = -1;              // length of packed string
    int start = NextByte;      // first character to be unpacked
    for (int i = start; i < BufferSize; i++)
        if (Buffer[i] == Delim) { len = i - start; break; }
    if (len == -1) return FALSE;          // delimiter not found
    NextByte += len + 1;                  // skip the field and its delimiter
    if (NextByte > BufferSize) return FALSE;
    strncpy (str, &Buffer[start], len);
    str[len] = 0;                         // zero termination for string
    return TRUE;
}
Unpack() extracts one field from a record in a buffer.
6
Buffer Class for Delimited Text Fields(4)
int DelimTextBuffer::Pack(const char * str, int size) {
    if the string is shorter than size, return FALSE
    if the field plus its delimiter would overflow the buffer, return FALSE
    else copy the string into the buffer starting at NextByte
    add the delimiter
    update NextByte (and BufferSize)
    return TRUE
}
Pack() copies the characters of its argument to the buffer and then adds the delimiter character.
7
Buffer Class for Delimited Text Fields(5)
int DelimTextBuffer :: Pack (const char * str, int size)
// set the value of the next field of the buffer;
// if size = -1 (default), use strlen(str) as the length of the field
// (uses strlen and memcpy from <cstring>)
{
    int len;                          // length of string to be packed
    if (size >= 0) len = size;
    else len = strlen (str);          // C-string length: # chars up to '\0'
    if (len > (int) strlen (str))     // str is too short!
        return FALSE;
    int start = NextByte;             // first character to be packed
    if (start + len + 1 > MaxBytes)   // field + delimiter would overflow;
        return FALSE;                 // NextByte is left unchanged
    memcpy (&Buffer[start], str, len);
    Buffer [start + len] = Delim;     // add delimiter
    NextByte = start + len + 1;
    BufferSize = NextByte;
    return TRUE;
}
Pack() copies the characters of its argument to the buffer and then adds the delimiter character.
8
Buffer Class for Delimited Text Fields (6)
Read method of DelimTextBuffer:
- clears the current buffer contents
- reads the record size into BufferSize
- rejects records that do not fit in the buffer
- reads that many bytes into the buffer
int DelimTextBuffer::Read (istream & stream)
{
    Clear ();
    stream.read ((char *) &BufferSize, sizeof (BufferSize));
    if (stream.fail ()) return FALSE;
    if (BufferSize < 0 || BufferSize > MaxBytes) return FALSE;   // bad or oversized record
    stream.read (Buffer, BufferSize);
    return stream.good ();
}
9
Buffer Class for Delimited Text Fields (7)
Write method of DelimTextBuffer:
- writes the record size
- writes the buffer contents
int DelimTextBuffer :: Write (ostream & stream) const
{
    stream.write ((const char *) &BufferSize, sizeof (BufferSize));
    stream.write (Buffer, BufferSize);
    return stream.good ();
}
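To see Pack, Write, Read and Unpack working together, here is a minimal, self-contained sketch of a delimited-field buffer. It is a simplified stand-in for DelimTextBuffer, not the Appendix E code. The class name SimpleDelimBuffer is hypothetical, and it uses std::string, std::vector and bool in place of a raw char array and TRUE/FALSE. A std::stringstream stands in for the disk file. As on the slides, each record is written with its size first.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical simplified version of DelimTextBuffer (not the Appendix E code).
class SimpleDelimBuffer {
public:
    explicit SimpleDelimBuffer(char delim = '|', int maxBytes = 1000)
        : delim_(delim), maxBytes_(maxBytes) {}

    void Clear() { buffer_.clear(); nextByte_ = 0; }

    // Append one field followed by the delimiter; fail if the buffer would overflow.
    bool Pack(const std::string& field) {
        if (static_cast<int>(buffer_.size() + field.size() + 1) > maxBytes_) return false;
        buffer_.insert(buffer_.end(), field.begin(), field.end());
        buffer_.push_back(delim_);
        return true;
    }

    // Extract the next field (up to the next delimiter) into 'field'.
    bool Unpack(std::string& field) {
        for (std::size_t i = nextByte_; i < buffer_.size(); ++i) {
            if (buffer_[i] == delim_) {
                field.assign(buffer_.data() + nextByte_, i - nextByte_);
                nextByte_ = i + 1;  // skip the delimiter
                return true;
            }
        }
        return false;  // no delimiter left: no more fields
    }

    // Write the record size, then the packed bytes.
    bool Write(std::ostream& stream) const {
        int size = static_cast<int>(buffer_.size());
        stream.write(reinterpret_cast<const char*>(&size), sizeof(size));
        stream.write(buffer_.data(), size);
        return stream.good();
    }

    // Read the record size, then exactly that many bytes.
    bool Read(std::istream& stream) {
        Clear();
        int size = 0;
        stream.read(reinterpret_cast<char*>(&size), sizeof(size));
        if (stream.fail() || size < 0 || size > maxBytes_) return false;
        buffer_.resize(size);
        stream.read(buffer_.data(), size);
        return stream.good();
    }

private:
    char delim_;
    int maxBytes_;
    std::vector<char> buffer_;
    std::size_t nextByte_ = 0;
};
```

Typical use: Pack "Ames" and "Mary", Write the record to a stream, then Read it into a second buffer and Unpack the two fields in order. A third Unpack returns false because no fields remain.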
11
CS215: File Structure and Processing Chapter 5
Managing Files of Records
13
Chapter Objectives
Extend the file structure concepts of Chapter 4:
- search keys and canonical forms
- sequential search and direct access
- file access and file organization
Examine other kinds of file structures in terms of:
- abstract data models
- metadata
- object-oriented file access
- extensibility
Examine issues of portability and standardization.
14
Record Access
- Record key
- Canonical form: a standard form of a key
  e.g. Ames, ames, or AMES (needs conversion to one form)
- Distinct keys: uniquely identify a single record
- Primary keys, secondary keys, candidate keys
- Primary keys should be dataless (not updatable)
- Primary keys should be unchanging
- Social security number: a good primary key, but not for non-registered aliens, who may lack one
- Measurement of work:
  - comparisons: occur in main memory
  - disk accesses: the main bottleneck
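Converting a key to canonical form is usually a small normalization step that is applied before every comparison. The sketch below assumes one possible canonical form: uppercase letters, with leading and trailing blanks removed. The exact rule is a design choice, and the names CanonicalKey and SameKey are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical canonical form: trim surrounding blanks, then uppercase.
std::string CanonicalKey(const std::string& key) {
    std::size_t first = key.find_first_not_of(' ');
    if (first == std::string::npos) return "";  // key was all blanks
    std::size_t last = key.find_last_not_of(' ');
    std::string result = key.substr(first, last - first + 1);
    for (char& c : result)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return result;
}

// Two keys match if their canonical forms are equal.
bool SameKey(const std::string& a, const std::string& b) {
    return CanonicalKey(a) == CanonicalKey(b);
}
```

With this rule, "Ames", " ames " and "AMES" all reduce to the same key "AMES".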
15
Sequential Search
- Sequential search is the least efficient method; our main pursuit for the rest of the term is to present improved search methods
- O(n), where n is the number of records
- Use record blocking to reduce work: a block holds several records (fields < records < blocks)
- Still O(n), but blocking decreases the number of seeks; the search is sequential within each block
- e.g. 4,000 records, 512 bytes each, sector size 512 bytes:
  - unblocked (sector-sized buffers): 512-byte (½K) buffer => average 2,000 READ() calls
  - blocked (16 records/block): 8K buffer => average 125 READ() calls
- Performance can be improved further (in a file sorted by key) by giving each block a key equal to its last record's key, so that blocks which cannot contain the data are not searched
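The READ() counts above follow from simple arithmetic. An unblocked search reads one record per call, a blocked search reads one block per call, and on average half the file is scanned before a hit. The sketch below simulates this by counting block reads for a given key. It assumes the file is modeled as an in-memory vector of keys rather than a real disk file, and the function name BlockReadsToFind is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simulation: scan 'keys' one block at a time (blockSize records per
// READ() call, blockSize > 0) and return the number of block reads needed to
// find 'target', or -1 if it is not in the file.
int BlockReadsToFind(const std::vector<std::string>& keys,
                     const std::string& target, std::size_t blockSize) {
    int reads = 0;
    for (std::size_t start = 0; start < keys.size(); start += blockSize) {
        ++reads;  // one READ() brings a whole block into memory
        std::size_t end = std::min(start + blockSize, keys.size());
        for (std::size_t i = start; i < end; ++i)  // sequential search within the block
            if (keys[i] == target) return reads;
    }
    return -1;
}
```

With 4,000 records, a key at position 2,000 (the average case) costs 2,000 reads with a block size of 1 and 125 reads with 16 records per block. These figures match the slide.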
16
Sequential Search: Best Uses
When is sequential search superior?
- Repetitive hits: searching for patterns in ASCII files, or searching for records with a certain secondary key value
- Small search set: processing files with few records
- Devices/media most hospitable to sequential access, e.g. tape
18
Direct Access
- Access a record without searching: an O(1) operation
- RRN (Relative Record Number): gives the relative position of the record
- Finding a record by RRN is an O(n) process with variable-length records
- Easy with fixed-length records: byte offset = n × r, where n is the RRN value and r is the record size
- View the file as a collection of records, not bytes; all byte information is internal
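With fixed-length records, direct access reduces to one multiplication and one seek. The sketch below assumes a record size of 32 bytes and uses a std::stringstream in place of a disk file. The names RECORD_SIZE and ReadByRRN are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

const int RECORD_SIZE = 32;  // hypothetical fixed record length r, in bytes

// Seek to byte offset n * r and read exactly one record.
bool ReadByRRN(std::istream& file, int rrn, std::string& record) {
    std::streamoff offset = static_cast<std::streamoff>(rrn) * RECORD_SIZE;
    file.clear();                       // reset eof/fail flags left by earlier reads
    file.seekg(offset, std::ios::beg);  // no searching: jump straight to the record
    record.assign(RECORD_SIZE, '\0');
    file.read(&record[0], RECORD_SIZE);
    return file.gcount() == RECORD_SIZE;  // false if the RRN is past the end
}
```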
20
Direct Access
Class IOBuffer includes direct read (DRead) and direct write (DWrite) operations, which take a byte offset as an argument, along with the stream.
21
Choosing Record Length and Structure
- Record length is related to the size of the fields
- Trade-offs: access vs. fragmentation vs. implementation
- Fixed-length records can hold fixed-length fields or variable-length fields
- In C, the unused portion of the record is filled with null characters
- e.g. delimited fields in a fixed-length record, padded with nulls:
  OHIO| |7|264.9|41330|35|3|1|1803|17|COLUMBUS\0....\0
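The null padding shown above can be produced in two steps: pack the delimited fields, then fill the rest of the fixed-length slot with '\0'. This is a minimal sketch under two illustrative assumptions: a 64-byte record length, and a '|' after every field (as DelimTextBuffer::Pack does). PackFixedRecord is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Pack variable-length fields, each followed by '|', into a fixed-length record;
// unused space is filled with null characters. Returns "" if the fields do not fit.
std::string PackFixedRecord(const std::vector<std::string>& fields,
                            std::size_t recordLength = 64) {
    std::string record;
    for (const std::string& f : fields) {
        record += f;
        record += '|';
    }
    if (record.size() > recordLength) return "";  // would not fit in the slot
    record.resize(recordLength, '\0');             // pad with nulls
    return record;
}
```

Because every record occupies exactly recordLength bytes, the RRN formula for direct access still works even though the fields inside vary in length.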
22
Header Records: File as a Self-Describing Object
- General information about the file:
  - date and time of the most recent update, number of records
  - size of records and fields (fixed-length records and fields)
  - delimiter (variable-length fields)
- Often placed at the beginning of the file
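A header record can be as simple as a fixed-size struct written before the first data record. The sketch below assumes a header that holds the record count, the record size and the field delimiter. The FileHeader layout and the function names are hypothetical. Writing a struct's raw bytes is only portable between machines with the same byte order and struct layout (see the portability slides).

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical header layout stored at the beginning of the file.
struct FileHeader {
    std::int32_t recordCount;
    std::int32_t recordSize;  // meaningful for fixed-length records
    char delimiter;           // meaningful for variable-length fields
};

bool WriteFileHeader(std::ostream& file, const FileHeader& h) {
    file.seekp(0, std::ios::beg);  // the header always lives at offset 0
    file.write(reinterpret_cast<const char*>(&h), sizeof(h));
    return file.good();
}

bool ReadFileHeader(std::istream& file, FileHeader& h) {
    file.clear();
    file.seekg(0, std::ios::beg);
    file.read(reinterpret_cast<char*>(&h), sizeof(h));
    return file.gcount() == static_cast<std::streamsize>(sizeof(h));
}
// Data records then start at byte offset sizeof(FileHeader).
```

Each time records are added, the program would update recordCount and rewrite the header in place.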
23
IO Buffer Class definition
class IOBuffer
// Abstract base class for file buffers
{
public:
    virtual int Read (istream &) = 0;                  // read a buffer from the stream
    virtual int Write (ostream &) const = 0;           // write a buffer to the stream
    // these are the direct access read and write operations
    virtual int DRead (istream &, int recref);         // read specified record
    virtual int DWrite (ostream &, int recref) const;  // write specified record
    // these header operations return the size of the header
    virtual int ReadHeader (istream &);
    virtual int WriteHeader (ostream &) const;
protected:
    int Initialized;   // TRUE if buffer is initialized
    char * Buffer;     // character array to hold field values
    // ... (remaining members not shown on this slide)
};
24
IO Buffer Class definition
Full definition of the buffer class hierarchy:
- WriteHeader method: writes the header string at the beginning of the file
  - possible strings: “Variable”, “Fixed”
  - returns the size of the header written
- ReadHeader method: reads the header id string
  - the string must match the expected record type, variable or fixed length
  - if the string matches that subclass’s header string, returns the size of the header
  - any other string causes a return of -1 (the header doesn't match the buffer)
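The header protocol described above can be sketched as follows. This is not the book's IOBuffer code. It uses free functions, and it assumes the header is just the id string followed by a terminating '\0'. The names WriteIdHeader and ReadIdHeader are hypothetical.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write the id string (e.g. "Variable" or "Fixed") plus its '\0' at the start of
// the file; return the number of header bytes written, or -1 on failure.
int WriteIdHeader(std::ostream& file, const std::string& id) {
    file.seekp(0, std::ios::beg);
    file.write(id.c_str(), static_cast<std::streamsize>(id.size() + 1));
    return file.good() ? static_cast<int>(id.size() + 1) : -1;
}

// Read the header and compare it with the id this buffer type expects.
// Return the header size on a match, -1 otherwise.
int ReadIdHeader(std::istream& file, const std::string& expectedId) {
    file.clear();
    file.seekg(0, std::ios::beg);
    std::string id;
    if (!std::getline(file, id, '\0')) return -1;  // no header at all
    if (id != expectedId) return -1;               // header does not match buffer type
    return static_cast<int>(id.size() + 1);
}
```

A variable-length buffer would call ReadIdHeader(file, "Variable") and refuse to process a file that was written with "Fixed".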
25
IO Buffer Class definition
Full definition of the buffer class hierarchy:
- DWrite/DRead methods: operate using the byte address of the record as the record reference
- The methods begin by seeking to the requested spot
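DRead and DWrite can be thought of as a seek followed by the ordinary Read or Write. The sketch below shows that idea for length-prefixed records: a 4-byte size (std::int32_t here) followed by the record bytes, as in DelimTextBuffer::Write. It is a simplified illustration, not the IOBuffer implementation, and the function names are hypothetical. Both functions return the record's byte address, so the caller can store it as a record reference.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a length-prefixed record at byte address 'recref'; return recref or -1.
long DWriteRecord(std::ostream& file, long recref, const std::string& data) {
    file.seekp(recref, std::ios::beg);  // seek to the requested spot
    std::int32_t size = static_cast<std::int32_t>(data.size());
    file.write(reinterpret_cast<const char*>(&size), sizeof(size));
    file.write(data.data(), size);
    return file.good() ? recref : -1;
}

// Read the length-prefixed record stored at byte address 'recref'; return recref or -1.
long DReadRecord(std::istream& file, long recref, std::string& data) {
    file.clear();
    file.seekg(recref, std::ios::beg);  // seek to the requested spot
    std::int32_t size = 0;
    if (!file.read(reinterpret_cast<char*>(&size), sizeof(size)) || size < 0) return -1;
    data.assign(static_cast<std::size_t>(size), '\0');
    if (size > 0 && !file.read(&data[0], size)) return -1;
    return recref;
}
```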
27
File Access and File Organization
There is a difference between file access and file organization:
- Variable-length records: sequential access is suitable
- Fixed-length records: both direct access and sequential access are possible

File Organization          | File Access
Variable-length records    | Sequential access
Fixed-length records       | Direct access
28
Abstract Data Model
- Data objects such as documents, images, sound
- An abstract data model does not view data as it appears on a particular medium:
  - application-oriented view
  - the application is shielded from the details of storage on the medium
- How to specify a file's content? Headers and self-describing files, e.g. the leading bytes of:
  - images: jpg: ÿØÿà JFIF (bytes FF D8 FF E0), gif: GIF89a
  - sounds: mp3: ÿûD EQ¹à, wav: RIFF$P WAVEfmt
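A self-describing file lets a program identify its content from the first few bytes. The sketch below checks three of the signatures listed above. JPEG files begin with the bytes FF D8 FF. GIF files begin with the ASCII text "GIF87a" or "GIF89a". WAV files begin with "RIFF", followed by a 4-byte size and then "WAVE". MP3 is left out because an MP3 file may start either with an ID3 tag or directly with a frame header. The function name DetectFileType is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Identify a file's type from its leading bytes ("magic numbers").
std::string DetectFileType(const std::string& head) {
    auto startsWith = [&head](const std::string& sig, std::size_t at = 0) {
        return head.size() >= at + sig.size() && head.compare(at, sig.size(), sig) == 0;
    };
    if (startsWith("\xFF\xD8\xFF")) return "jpeg";
    if (startsWith("GIF87a") || startsWith("GIF89a")) return "gif";
    if (startsWith("RIFF") && startsWith("WAVE", 8)) return "wav";  // bytes 4..7 hold a size
    return "unknown";
}
```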
29
Graphics Interchange Format
Example: GIF (Graphics Interchange Format)
- Industry-standard graphics format for on-screen viewing through the Internet and the Web
- Not meant to be used for printing
- The best format for all images except scanned photographic images (use JPEG for these)
- GIF supports lossless compression
31
Metadata: data that describe the primary data in a file
- e.g. <meta> tags in HTML
- Stored in the header record
- Standard format, as shown on the next slide
32
Html: Metadata
33
Metadata
<!DOCTYPE html>
<html>
<head>
  <meta name="description" content="My Web tutorials">
  <meta name="keywords" content="HTML,CSS,XML,JavaScript">
  <meta name="author" content="Hege Refsnes">
  <meta charset="UTF-8">
</head>
<body>
  <p>All meta information goes in the head section...</p>
</body>
</html>
34
Mixing Object Types in a File
- Each field is identified using a “keyword = value” tag
- An index table with tags locates the objects in the file
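A tagged header made of "keyword = value" lines can be read into a table without knowing the keywords in advance. This is a minimal sketch that assumes one tag per line and '=' as the separator. ParseTags is a hypothetical name. Real tagged formats add rules about quoting and line length, which are ignored here.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Trim blanks from both ends of s.
static std::string Trim(const std::string& s) {
    std::size_t b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Parse lines of the form "KEYWORD = value" into a keyword -> value table.
// Lines without '=' are skipped.
std::map<std::string, std::string> ParseTags(const std::string& text) {
    std::map<std::string, std::string> tags;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        tags[Trim(line.substr(0, eq))] = Trim(line.substr(eq + 1));
    }
    return tags;
}
```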
37
Object-oriented file access
- Separate the translation to and from the physical format from the application (representation-independent file access)
- Provide a function that handles access (OO style) and encapsulates the details
- read_image() is independent of the image file type; the method determines the file type
Program find_star:
    read_image(“star1”, image)
    process image
end find_star
(Figure: find_star reads images star1 and star2 from disk into an image in RAM.)
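The find_star idea can be expressed with an abstract reader class. The program calls read_image, which inspects the file's leading bytes and delegates to a type-specific reader, so find_star never sees the physical format. Everything below is a hypothetical sketch: ImageReader, GifReader, JpegReader, the Image struct, and the in-memory "file" contents passed as a string. The readers only record which format was detected instead of decoding pixels.

```cpp
#include <memory>
#include <string>

struct Image {              // hypothetical in-memory representation
    std::string format;
    std::string name;
};

class ImageReader {         // interface: one subclass per physical format
public:
    virtual ~ImageReader() = default;
    virtual Image Read(const std::string& name, const std::string& bytes) const = 0;
};

class GifReader : public ImageReader {
public:
    Image Read(const std::string& name, const std::string&) const override {
        return {"gif", name};   // real code would decode the GIF data here
    }
};

class JpegReader : public ImageReader {
public:
    Image Read(const std::string& name, const std::string&) const override {
        return {"jpeg", name};  // real code would decode the JPEG data here
    }
};

// read_image hides the file type: it picks a reader from the header bytes.
bool read_image(const std::string& name, const std::string& bytes, Image& image) {
    std::unique_ptr<ImageReader> reader;
    if (bytes.compare(0, 3, "GIF") == 0)
        reader = std::make_unique<GifReader>();
    else if (bytes.compare(0, 2, "\xFF\xD8") == 0)
        reader = std::make_unique<JpegReader>();
    else
        return false;           // unknown format
    image = reader->Read(name, bytes);
    return true;
}
```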
38
Extensibility
- Advantage of using tags: objects can be identified within files without a priori knowledge of their types
- To add a new type of object:
  - implement methods for reading and writing it in the appropriate module (separation of concerns)
  - call the method
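Extensibility through tags can be sketched as a table that maps each tag to a reader function. Supporting a new object type then means registering one more function, while the code that walks the file stays unchanged. The ObjectRegistry class, the ReaderFn type and the tag names below are hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// A reader turns the raw value stored under a tag into a description of the object.
using ReaderFn = std::function<std::string(const std::string&)>;

class ObjectRegistry {
public:
    // Adding a new object type = registering one more reader (in its own module).
    void Register(const std::string& tag, ReaderFn reader) {
        readers_[tag] = std::move(reader);
    }

    // Dispatch on the tag; unknown tags are reported, not fatal.
    std::string ReadObject(const std::string& tag, const std::string& value) const {
        auto it = readers_.find(tag);
        if (it == readers_.end()) return "unknown tag: " + tag;
        return it->second(value);
    }

private:
    std::map<std::string, ReaderFn> readers_;
};
```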
39
Factors Affecting Portability
- Differences among operating systems, e.g. CR/LF line endings in DOS
- Differences among languages: the physical layout of files may be constrained by language limitations
- Differences in machine architectures: byte order, e.g. the Unix conversion functions htonl/ntohl and htons/ntohs
- Differences among platforms, e.g. EBCDIC vs. ASCII
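Byte-order differences are usually handled by agreeing on one order in the file and converting on every read and write. Network byte order (big-endian) is the order that htonl/ntohl convert to. The sketch below does the conversion with shifts, so it produces the same bytes on little- and big-endian machines without relying on the socket headers. The function names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Encode a 32-bit value as 4 bytes, most significant byte first (big-endian).
std::string EncodeBigEndian32(std::uint32_t v) {
    std::string bytes(4, '\0');
    for (int i = 0; i < 4; ++i)
        bytes[i] = static_cast<char>((v >> (24 - 8 * i)) & 0xFF);
    return bytes;
}

// Decode 4 big-endian bytes (bytes.size() >= 4) into a host integer,
// whatever the host's own byte order is.
std::uint32_t DecodeBigEndian32(const std::string& bytes) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | static_cast<unsigned char>(bytes[i]);
    return v;
}
```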
40
Achieving Portability
- Standardization:
  - standard physical record format: extensible, simple
  - standard binary encoding for data elements: IEEE, XDR
- File structure conversion
- Number and text conversion
- Established, well-known methods of conversion
41
Achieving Portability
- File system differences:
  - block size is 512 bytes on UNIX systems
  - block size is 2880 bytes on many non-UNIX systems
- UNIX and portability:
  - UNIX supports portability by being commonly available on a large number of platforms
  - UNIX provides a utility called dd, which facilitates data conversion