File Organization & processing

Presentation transcript:

File Organization & Processing
Cairo University, FCI, 2014
CS 215, Lecture 11: Rest of Ch4 & Ch5
By: m_el_ramly@yahoo.ca
Presenter: Dr. Mohammad El-Ramly
Many slides by others

Using Classes to Manipulate Buffers
- Examples of three C++ classes that encapsulate the operations of a buffer object
- Functions: Pack, Unpack, Read, Write
  - Output: pack fields into a buffer, then write the buffer to a file
  - Input: read a buffer from a file, then unpack fields from the buffer
  - Each Pack or Unpack call deals with only one field
- DelimTextBuffer: class for delimited fields
- LengthTextBuffer: class for length-based fields
- FixedTextBuffer: class for fixed-length fields
- Appendix E: full implementation (buggy)

Buffer Class for Delimited Text Fields (1)
- Variable-length buffer
- Fields are represented as delimited text

    class DelimTextBuffer {
    public:
        DelimTextBuffer(char Delim = '|', int maxBytes = 1000);
        int Read(istream & file);
        int Write(ostream & file) const;
        int Pack(const char * str, int size = -1);
        int Unpack(char * str);
    private:
        char Delim;      // delimiter character
        char * Buffer;   // character array to hold field values
        int BufferSize;  // current size of packed fields
        int MaxBytes;    // maximum # of chars in the buffer
        int NextByte;    // packing/unpacking position in buffer
    };

Buffer Class for Delimited Text Fields (2)
Unpack() extracts one field from a record in a buffer. In outline:

    int DelimTextBuffer::Unpack(char *str)
    {
        start = NextByte
        search from start to the end of the buffer for the delimiter
        if the delimiter is not found, return FALSE
        if found, copy the characters from start up to the delimiter into str
        update NextByte
        if NextByte is still within the buffer, return TRUE; else return FALSE
    }

Buffer Class for Delimited Text Fields (3)
Unpack() extracts one field from a record in a buffer.

    int DelimTextBuffer::Unpack(char *str)
    // extract the value of the next field of the buffer
    {
        int len = -1;          // length of packed string
        int start = NextByte;  // first character to be unpacked
        for (int i = start; i < BufferSize; i++)
            if (Buffer[i] == Delim) { len = i - start; break; }
        if (len == -1) return FALSE;  // delimiter not found
        NextByte += len + 1;
        if (NextByte > BufferSize) return FALSE;
        strncpy(str, &Buffer[start], len);  // caller must provide room for len + 1 chars
        str[len] = 0;  // zero termination for string
        return TRUE;
    }

Buffer Class for Delimited Text Fields (4)
Pack() copies the characters of its argument to the buffer and then adds the delimiter character. In outline:

    int DelimTextBuffer::Pack(const char * str, int size)
    {
        if the string is shorter than the requested size, return FALSE
        if the string would overflow the buffer, return FALSE
        else copy the string into the buffer starting at NextByte
        add the delimiter
        update NextByte
        return TRUE
    }

Buffer Class for Delimited Text Fields (5)
Pack() copies the characters of its argument to the buffer and then adds the delimiter character.

    int DelimTextBuffer::Pack(const char * str, int size)
    // set the value of the next field of the buffer;
    // if size = -1 (default) use strlen(str) as the length of the field
    {
        int len;  // length of string to be packed
        if (size >= 0) len = size;
        else len = strlen(str);  // C-string length function: # chars before \0
        if (len > (int)strlen(str))  // str is too short!
            return FALSE;
        int start = NextByte;  // first character to be packed
        if (start + len + 1 > MaxBytes) return FALSE;  // field would overflow the buffer
        NextByte = start + len + 1;  // advance only once the field is known to fit
        memcpy(&Buffer[start], str, len);
        Buffer[start + len] = Delim;  // add delimiter
        BufferSize = NextByte;
        return TRUE;
    }

Buffer Class for Delimited Text Fields (6)
Read method of DelimTextBuffer:
- Clears the current buffer contents
- Extracts the record size
- Reads the proper number of bytes into the buffer
- Sets the buffer size

    int DelimTextBuffer::Read(istream & stream)
    {
        Clear();
        stream.read((char *)&BufferSize, sizeof(BufferSize));
        if (stream.fail()) return FALSE;
        if (BufferSize > MaxBytes) return FALSE;
        stream.read(Buffer, BufferSize);
        return stream.good();
    }

Buffer Class for Delimited Text Fields (7)
Write method of DelimTextBuffer:
- Writes the size of the data
- Writes the buffer contents

    int DelimTextBuffer::Write(ostream & stream) const
    {
        stream.write((char *)&BufferSize, sizeof(BufferSize));
        stream.write(Buffer, BufferSize);
        return stream.good();
    }
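The Pack and Unpack steps can be tried out in a few lines. The sketch below is only an illustration: the class name MiniDelimBuffer, its use of std::string in place of a raw char array, and the Data() accessor are assumptions made for brevity, not the textbook's DelimTextBuffer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for DelimTextBuffer (not the textbook class).
class MiniDelimBuffer {
public:
    explicit MiniDelimBuffer(char delim = '|', std::size_t maxBytes = 1000)
        : delim_(delim), maxBytes_(maxBytes) {}

    // Append one field plus the delimiter; on overflow, fail without changing the buffer.
    bool Pack(const std::string& field) {
        if (buffer_.size() + field.size() + 1 > maxBytes_) return false;
        buffer_ += field;
        buffer_ += delim_;
        return true;
    }

    // Extract the next field; returns false when no complete field remains.
    bool Unpack(std::string& field) {
        std::size_t end = buffer_.find(delim_, next_);
        if (end == std::string::npos) return false;
        field = buffer_.substr(next_, end - next_);
        next_ = end + 1;  // skip past the delimiter
        return true;
    }

    // Packed bytes, e.g. "Ames|Mary|" after packing two fields.
    const std::string& Data() const { return buffer_; }

private:
    char delim_;
    std::size_t maxBytes_;
    std::string buffer_;
    std::size_t next_ = 0;  // unpacking position
};
```

Packing "Ames" and then "Mary" leaves "Ames|Mary|" in the buffer, and two Unpack calls return the fields in the same order. As with any delimited format, a field that itself contains the delimiter would be split incorrectly.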

CS215: File Structure and Processing Chapter 5 Managing Files of Records

Chapter Objectives
- Extend the file structure concepts of Chapter 4:
  - Search keys and canonical forms
  - Sequential search and direct access
  - File access and file organization
- Examine other kinds of file structures in terms of:
  - Abstract data models
  - Metadata
  - Object-oriented file access
  - Extensibility
- Examine issues of portability and standardization

Record Access
- Record key
  - Canonical form: a standard form of a key, e.g. Ames, ames, and AMES all need conversion to one form
  - Distinct keys: uniquely identify a single record
  - Primary keys, secondary keys, candidate keys
  - Primary keys should be dataless (not updatable)
  - Primary keys should be unchanging
  - Social security number: a good primary key, but 999-99-9999 is used for all non-registered aliens
- Measurement of work
  - Comparisons: occur in main memory
  - Disk accesses: the main bottleneck
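Converting a key to canonical form before comparing is easy to sketch. The helper name CanonicalKey and the chosen rule (trim blanks, upper-case ASCII letters) are assumptions for illustration; any single agreed form works as long as stored keys and search keys use the same one.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: canonical form = surrounding blanks removed, ASCII upper case.
std::string CanonicalKey(const std::string& raw) {
    std::size_t first = raw.find_first_not_of(" \t");
    if (first == std::string::npos) return "";  // key was empty or all blanks
    std::size_t last = raw.find_last_not_of(" \t");
    std::string key = raw.substr(first, last - first + 1);
    for (char& c : key)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return key;
}
```

With this rule, "Ames", " ames ", and "AMES" all map to the same key, so a search for any of them matches the same record.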

Sequential Search
- Sequential search is the least efficient method: O(n), where n is the number of records
- Our main pursuit for the duration of the term is to present improved search methods
- Use record blocking to reduce work
  - A block holds several records: fields < records < blocks
  - Still O(n), but blocking decreases the number of seeks; the search is sequential within each block
- e.g. 4000 records of 512 bytes each, sector size 512 bytes
  - Unblocked (sector-sized, ½K buffer): on average 2000 READ() calls
  - Blocked (16 records per block, 8K buffer): on average 125 READ() calls
- Performance can be improved further with a block key holding the last record key of the block, which avoids searching within blocks that cannot contain the data
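The READ() counts in the example follow from simple arithmetic: a successful sequential search reads, on average, half of the blocks in the file. A minimal sketch, assuming the target is equally likely to be anywhere in the file (the function name AverageReads is hypothetical):

```cpp
// Average number of READ() calls for a successful sequential search,
// assuming the target record is equally likely to be in any block.
long AverageReads(long numRecords, long recordsPerBlock) {
    long blocks = (numRecords + recordsPerBlock - 1) / recordsPerBlock;  // ceiling division
    return blocks / 2;  // on average, half of the blocks are read
}
```

AverageReads(4000, 1) gives 2000 and AverageReads(4000, 16) gives 125, matching the unblocked and blocked cases; the 16-record block needs a 16 * 512 = 8192-byte (8K) buffer.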

Sequential Search: Best Uses
When is sequential search superior?
- Repetitive hits
  - Searching for patterns in ASCII files
  - Searching for records with a certain secondary key value
- Small search set
  - Processing files with few records
- Devices/media most hospitable to sequential access
  - e.g. tape

Direct Access
- Access a record without searching: an O(1) operation
- RRN (Relative Record Number)
  - Gives the relative position of the record
  - Finding a record by RRN is an O(n) process with variable-length records
  - Easy with fixed-length records: RRN * sizeof(record)
  - View the file as a collection of records, not bytes; all byte information is internal
- Byte offset = n * r, where r is the record size and n is the RRN value

Direct Access
- Class IOBuffer includes
  - direct read (DRead)
  - direct write (DWrite)
- Both take a byte offset as an argument, along with the stream

Choosing Record Length and Structure
- Record length is related to the size of the fields
- Trade-off: access vs. fragmentation vs. implementation
- A fixed-length record can hold
  - fixed-length fields, or
  - variable-length fields (e.g. delimited); the unused portion is filled with null characters in C
- e.g.
    fixed-length fields:  OHIO 10847115 7264.9 4133035 3 1180317COLUMBUS
    delimited fields:     OHIO|10847115|7|264.9|41330|35|3|1|1803|17|COLUMBUS\0....\0

Header Records
- File as a self-describing object
- General information about the file
  - date and time of the most recent update, number of records
  - size of record and fields (fixed-length records and fields)
  - delimiter (variable-length fields)
- Often placed at the beginning of the file

IO Buffer Class Definition

    class IOBuffer
    // Abstract base class for file buffers
    {
    public:
        virtual int Read(istream &) = 0;         // read a buffer from the stream
        virtual int Write(ostream &) const = 0;  // write a buffer to the stream
        // these are the direct access read and write operations
        virtual int DRead(istream &, int recref);         // read specified record
        virtual int DWrite(ostream &, int recref) const;  // write specified record
        // these header operations return the size of the header
        virtual int ReadHeader(istream &);
        virtual int WriteHeader(ostream &) const;
    protected:
        int Initialized;  // TRUE if buffer is initialized
        char *Buffer;     // character array to hold field values
        // (remaining members are not shown on the slide)
    };

IO Buffer Class Definition
Full definition of the buffer class hierarchy:
- WriteHeader method: writes the header string at the beginning of the file
  - Possible strings: "Variable", "Fixed"
  - Returns the size of the header written
- ReadHeader method: reads the header id string
  - The string must match the expected record type, variable or fixed length
  - If the string matches that subclass's header string, returns the size of the header
  - Any other string causes a return of -1: the header doesn't match the buffer

IO Buffer Class Definition
Full definition of the buffer class hierarchy:
- DWrite/DRead methods: operate using the byte address of the record as the record reference
- Both methods begin by seeking to the requested position

File Access and File Organization
There is a difference between file access and file organization.
- Variable-length records: sequential access is suitable
- Fixed-length records: direct access and sequential access are both possible

    File organization          File access
    Variable-length records    Sequential access
    Fixed-length records       Direct access

Abstract Data Model
- Data objects such as documents, images, and sound
- An abstract data model does not view data as it appears on a particular medium
  - application-oriented view
  - the application is shielded from the details of storage on the medium
- How to specify a file's content? Headers and self-describing files
  - e.g. images: jpg: ÿØÿà JFIF, gif: GIF89a
  - e.g. sounds: mp3: ÿûD EQ¹à, wav: RIFF$P WAVEfmt
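Those leading bytes are what make a file self-describing: a program can inspect them instead of trusting the file name. A minimal sketch, assuming the first bytes of the file have already been read into a string (the function name DetectType is hypothetical, and only three of the formats are checked):

```cpp
#include <string>

// Hypothetical helper: guess a file type from its leading bytes.
std::string DetectType(const std::string& head) {
    // JPEG files start with the bytes FF D8 FF (shown as "ÿØÿ" in Latin-1).
    if (head.compare(0, 3, "\xFF\xD8\xFF") == 0) return "jpeg";
    // GIF files start with the ASCII signature GIF87a or GIF89a.
    if (head.compare(0, 6, "GIF89a") == 0 || head.compare(0, 6, "GIF87a") == 0) return "gif";
    // WAV files are RIFF containers: "RIFF", a 4-byte size, then "WAVE".
    if (head.size() >= 12 && head.compare(0, 4, "RIFF") == 0 &&
        head.compare(8, 4, "WAVE") == 0) return "wav";
    return "unknown";
}
```

MP3 is left out on purpose: a file may begin with an ID3 tag or directly with a frame header, so a fixed-prefix test is less reliable there.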

Example: GIF (Graphics Interchange Format)
- Widely used graphic format for on-screen viewing on the Internet and the Web
- Not meant to be used for printing
- Well suited to images with a limited number of colors; use JPEG for scanned photographic images
- GIF supports lossless compression

http://faculty.kutztown.edu/spiegel/CSc402/demo/States/DelimText/

Metadata
- Data that describe the primary data in a file, e.g. <meta> in HTML
- Stored in the header record
- Standard format, as shown on the next slide

HTML: Metadata

Metadata

    <!DOCTYPE html>
    <html>
    <head>
      <meta name="description" content="My Web tutorials">
      <meta name="keywords" content="HTML,CSS,XML,JavaScript">
      <meta name="author" content="Hege Refsnes">
      <meta charset="UTF-8">
    </head>
    <body>
      <p>All meta information goes in the head section...</p>
    </body>
    </html>

Mixing Object Types in a File
- Each field is identified using "keyword = value"
- Index table with tags
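Reading "keyword = value" fields back is a small parsing job. The sketch below assumes one pair per line with no spaces around the equals sign; that layout and the function name ParseTagged are illustrative choices, not a format defined in the slides.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser: one "keyword=value" pair per line.
std::map<std::string, std::string> ParseTagged(const std::string& text) {
    std::map<std::string, std::string> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // skip lines that carry no keyword
        fields[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return fields;
}
```

Because each value carries its own keyword, fields may appear in any order and a reader can skip keywords it does not recognize.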

Object-Oriented File Access
- Separate the translation to and from the physical format from the application (representation-independent file access)
- Provide a function to handle access (OO style) and encapsulate the details
- read_image() is independent of the image file type; the method determines the file type

    Program find_star:
        read_image("star1", image)
        process image
    end find_star

- Figure: the images star1 and star2 are stored on disk; the program works on image in RAM

Extensibility
- Advantages of using tags
  - Identify objects within files
  - Do not require a priori knowledge of the types of objects
- To support a new type of object
  - implement methods for reading and writing it in the appropriate module (separate concerns)
  - call the method

Factors Affecting Portability
- Differences among operating systems, e.g. CR/LF line endings in DOS
- Differences among languages: the physical layout of files may be constrained by language limitations
- Differences in machine architectures: byte order, e.g. the Unix htonl/ntohl family of functions
- Differences among platforms, e.g. EBCDIC vs. ASCII
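Byte-order differences disappear if a file format fixes one order and every program converts explicitly. The sketch below writes and reads a 32-bit integer in big-endian order using shifts, so it behaves the same on any host; the function names PutUint32BE and GetUint32BE are hypothetical.

```cpp
#include <cstdint>

// Store a 32-bit value most significant byte first, independent of host byte order.
void PutUint32BE(std::uint32_t value, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(value >> 24);
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}

// Rebuild the value from four big-endian bytes.
std::uint32_t GetUint32BE(const unsigned char in[4]) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

Writing the raw bytes of an int directly, as the Read and Write methods of a buffer class might do for the record size, produces files that a machine with the opposite byte order will misread; an explicit encoding like this avoids that.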

Achieving Portability
- Standardization
  - Standard physical record format: extensible, simple
  - Standard binary encoding for data elements: IEEE, XDR
- File structure conversion
  - Number and text conversion
  - Established, well-known methods of conversion

Achieving Portability
- File system differences
  - Block size is 512 bytes on UNIX systems
  - Block size is 2880 bytes on many non-UNIX systems
- UNIX and portability
  - UNIX supports portability by being commonly available on a large number of platforms
  - UNIX provides a utility called dd, which facilitates data conversion