Chap 5. Managing Files of Records. Chapter Objectives  Extend the file structure concepts of Chapter 4: Search keys and canonical forms Sequential search.

Slides:



Advertisements
Similar presentations
File Processing - Organizing file for Performance MVNC1 Organizing Files for Performance Chapter 6 Jim Skon.
Advertisements

Dr. Kalpakis CMSC 661, Principles of Database Systems Representing Data Elements [12]
File StructureSNU-OOPSLA Lab1 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 김 형주 교수 Chap 5. Managing Files of Records File Structures by Folk, Zoellick, and Ricarrdi.
File Organization & processing
Variable Length Data and Records Eswara Satya Pavan Rajesh Pinapala CS 257 ID: 221.
NetCDF An Effective Way to Store and Retrieve Scientific Datasets Jianwei Li 02/11/2002.
Chapter 12 File Management
CPSC 231 Managing Files of Records (D.H.) 1 Learning Objectives Concept of key - primary and secondary keys. Sequential versus direct access. RRN Use of.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
17 File Processing. OBJECTIVES In this chapter you will learn:  To create, read, write and update files.  Sequential file processing.  Random-access.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Folk/Zoellick/Riccardi, File Structures 1 Objectives: To get familiar with Alternatives in field and record organizations Object-oriented approach to buffered.
Managing Files of Records CS 3050, Spring /4/2007 Dr Melanie Martin.
File Structure Fundamentals (D.H.)1 Learning Objectives Field and record organization Index file C++ code that deals with field and record organization.
CSE Lectures 22 – Huffman codes
File StructureSNU-OOPSLA Lab.1 Chap 7. Indexing 서울대학교 컴퓨터공학과 객체지향시스템연구실 SNU-OOPSLA-LAB 김 형 주 교수 File Structures by Folk, Zoellick, and Ricarrdi.
Introduction to Java Appendix A. Appendix A: Introduction to Java2 Chapter Objectives To understand the essentials of object-oriented programming in Java.
Beyond Record Structures Dr. Robert J. Hammell Assistant Professor Towson University Computer and Information Sciences Department 8000 York Road - Suite.
CHP - 9 File Structures. INTRODUCTION In some of the previous chapters, we have discussed representations of and operations on data structures. These.
Stream Handling Streams - means flow of data to and from program variables. - We declare the variables in our C++ for holding data temporarily in the memory.
File StructuresSNU-OOPSLA Lab.1 Chap4. Fundamental File Structure Concepts 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 김 형 주 교수 File Structures by Folk, Zoellick,
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 17 Disk Storage, Basic File Structures, and Hashing.
February 1 & 31 Csci 2111: Data and File Structures Week4, Lectures 1 & 2 Fundamental File Structure Concepts & Managing Files of Records.
Fundamental File Structure Concepts & Managing Files of Records
Prof. Yousef B. Mahdy , Assuit University, Egypt File Organization Prof. Yousef B. Mahdy Chapter -4 Data Management in Files.
1 Real-World File Structures by Tom Davis Asst. Professor, Computer Science St. Edward's University 3001 South Congress Avenue Austin, Texas 78704
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 75 Database Systems II Record Organization.
1 Chap 7. Indexing. 2 Chapter Objectives(1)  Introduce concepts of indexing that have broad applications in the design of file systems  Introduce the.
File Systems CSCI What is a file? A file is information that is stored on disks or other external media.
Operating Systems (CS 340 D) Dr. Abeer Mahmoud Princess Nora University Faculty of Computer & Information Systems Computer science Department.
1 Chap4. Fundamental File Structure Concepts. 2 Chapter Objectives  Introduce file structure concepts dealing with Stream files Reading and writing fields.
Operating Systems COMP 4850/CISG 5550 File Systems Files Dr. James Money.
©Silberschatz, Korth and Sudarshan11.1Database System Concepts Chapter 11: Storage and File Structure File Organization Organization of Records in Files.
CS4432: Database Systems II Record Representation 1.
FILE HANDLING IN C++.
Topic 1 Object Oriented Programming. 1-2 Objectives To review the concepts and terminology of object-oriented programming To discuss some features of.
File Processing - Fundamental concepts MVNC1 Fundamental File Structure Concepts Chapter 4.
Chapter 13 – C++ String Class. String objects u Do not need to specify size of string object –C++ keeps track of size of text –C++ expands memory region.
File I/O. fstream files File: similar to vector of elements Used for input and output Elements of file can be –character (text)struct –object (non-text.
Storage Structures. Memory Hierarchies Primary Storage –Registers –Cache memory –RAM Secondary Storage –Magnetic disks –Magnetic tape –CDROM (read-only.
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
Copyright 2003 Scott/Jones Publishing Standard Version of Starting Out with C++, 4th Edition Chapter 12 Advanced File Operations.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
DCT1063 Programming 2 CHAPTER 3 FILE INPUT AND FILE OUTPUT Mohd Nazri Bin Ibrahim Faculty of Computer, Media & Technology TATi University College
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
12/15/2015Engineering Problem Solving with C++, Second Edition, J. Ingber 1 Engineering Problem Solving with C++, Etter Chapter 6 One-Dimensional Arrays.
Main Index Contents 11 Main Index Contents Complete Binary Tree Example Complete Binary Tree Example Maximum and Minimum Heaps Example Maximum and Minimum.
The Instruction Set Architecture. Hardware – Software boundary Java Program C Program Ada Program Compiler Instruction Set Architecture Microcode Hardware.
1 Today’s Objectives  Announcements Homework #5 is due next Monday, July 24. Since this date is so close to the end of the semester, no late assignments.
Learners Support Publications Working with Files.
Binary Files. Text Files vs. Binary Files Text files: A way to store data in a secondary storage device using Text representation (e.g., ASCII characters)
Comp 335 File Structures Fundamental File Structure Concepts.
Chapter 5 Record Storage and Primary File Organizations
CS212: Object Oriented Analysis and Design
17 File Processing.
Fundamental File Structure Concepts
Module 11: File Structure
CPSC 231 Managing Files of Records (D.H.)
FILE HANDLING IN C++.
Disk Storage, Basic File Structures, and Hashing
files Dr. Bhargavi Goswami Department of Computer Science
Topics Input and Output Streams More Detailed Error Testing
Variable Length Data and Records
Chap4. Fundamental File Structure Concepts
Files Management – The interfacing
Chap 5. Managing Files of Records
Real-World File Structures
VIJAYA PAMIDI CS 257- Sec 01 ID:102
Beyond Record Structures
Lecture 20: Representing Data Elements
Presentation transcript:

Chap 5. Managing Files of Records

Chapter Objectives  Extend the file structure concepts of Chapter 4: Search keys and canonical forms Sequential search and Direct access Files access and file organization  Examine other kinds of the file structures in terms of Abstract data models Metadata Object-oriented file access Extensibility Examine issues of portability and standardization.

Contents 5.1 Record Access 5.2 More about Record Structures 5.3 Encapsulating Record I/O Ops in a Single Class 5.4 File Access and File Organization 5.5 Beyond Record Structures 5.6 Portability and Standardization

Record Access  Record Key Canonical form : a standard form of a key –e.g. Ames or ames or AMES (need conversion) Distinct keys : uniquely identify a single record Primary keys, Secondary keys, Candidate keys –Primary keys should be dataless (not updatable) –Primary keys should be unchanging Social-securiy-number: good primary key –but, for all non-registered aliens

 O (n), n : the number of records  Use record blocking A block of several records fields < records < blocks O(n), but blocking decreases the number of seeking e.g records, 512 bytes length Unblocked (sector-sized buffers): 512byte size buffer => average 2000 READ() calls Blocked (16 recs / block) : 8K size buffer ==> average 125 READ() call Sequential Search (1)

Sequential Search (2)  UNIX tools for sequential processing cat, wc, grep  When sequential search is useful Searching patterns in ASCII files Processing files with few records Searching records with a certain secondary key value

Direct Access  O(1) operation  RRN ( Relative Record Number ) It gives relative position of the record  Byte offset = N X R r : record size, n : RRN value In fixed length records  Class IOBuffer includes direct read (DRead) direct write (DWrite)

Record length is related to the size of the fields Access vs. fragmentaion vs. implementation Fixed length record (a) With a fixed-length fields (b) With a variable-length fields Unused space portion is filled with null character in C Ames John 123 Maple Stillwater OK74075 Mason Alan 90 Eastgate Ada OK74820 Ames|John|123 Maple|Stillwater|OK|74075| Mason|Alan|90 Eastgate|Ada|OK|74820| Unused space (a) (b) Choosing a record length and structure

Header Records  General information about file date and time of recent update, count of the num of records  Header record is often placed at the beginning of the file  Header records are a widely used, important file design tool

IO Buffer Class definition(1) class IOBuffer Abstract base class for file buffers public : virtual int Read( istream & ) = 0; // read a buffer from the stream virtual int Write( ostream &) const = 0; // write a buffer to the stream // these are the direct access read and write operations virtual int DRead( istream &, int recref ); //read specified record virtual int DWrite( ostream &, int recref ) const; // write specified record // these header operations return the size of the header virtual int ReadHeader ( istream & ); virtual int WriteHeader ( ostream &) const; protected : int Initialized ; // TRUE if buffer is initialized char *Buffer; // character array to hold field values

 The full definition of buffer class hierarchy write method : adds header to a file and return the number of bytes in the header read method : reads the header and check for consistency WriteHeader method : writes the string IOBuffer at the beginning of the file. ReadHeader method : reads the record size from the header and checks that its value is the same as that of the BufferSize member of the buffer object DWrite/DRead methods : operates using the byte address of the record as the record reference. Dread method begins by seeking to the requested spot. IO Buffer Class definition(2)

ReadHeader, WriteHeader Member Functions static const char * headerStr = "IOBuffer"; static const int headerSize = strlen (headerStr); int IOBuffer::ReadHeader (istream & stream) { char str[headerSize+1]; stream. seekg (0, ios::beg); stream. read (str, headerSize); if (! stream. good ()) return -1; if (strncmp (str, headerStr, headerSize)==0) return headerSize; else return -1; } int IOBuffer::WriteHeader (ostream & stream) const { stream. seekp (0, ios::beg); stream. write (headerStr, headerSize); if (! stream. good ()) return -1; return headerSize; }

WriteHeader Member Function – VariableLengthBuffer int VariableLengthBuffer :: WriteHeader (ostream & stream) const // write a buffer header to the beginning of the stream // A header consists of the //IOBUFFER header //header string //Variable sized record of length fields //that describes the file records { int result; // write the parent (IOBuffer) header result = IOBuffer::WriteHeader (stream); if (!result) return FALSE; // write the header string stream. write (headerStr, headerSize); if (!stream. good ()) return FALSE; // write the record description return stream. tellp(); }

ReadHeader Member Function - VariableLengthBuffer int VariableLengthBuffer :: ReadHeader (istream & stream) // read the header and check for consistency { char str[headerSize+1]; int result; // read the IOBuffer header result = IOBuffer::ReadHeader (stream); if (!result) return FALSE; // read the header string stream. read (str, headerSize); if (!stream.good()) return FALSE; if (strncmp (str, headerStr, headerSize) != 0) return FALSE; // read and check the record description return stream. tellg (); }

Read Member Function - VariableLengthBuffer int VariableLengthBuffer :: Read (istream & stream) // write the number of bytes in the buffer field definitions // the record length is represented by an unsigned short value { if (stream.eof()) return -1; int recaddr = stream. tellg (); Clear (); unsigned short bufferSize; stream. read ((char *)&bufferSize, sizeof(bufferSize)); if (! stream. good ()){stream.clear(); return -1;} BufferSize = bufferSize; if (BufferSize > MaxBytes) return -1; // buffer overflow stream. read (Buffer, BufferSize); if (! stream. good ()){stream.clear(); return -1;} return recaddr; }

Write Member Function - VariableLengthBuffer int VariableLengthBuffer :: Write (ostream & stream) const // write the length and buffer into the stream { int recaddr = stream. tellp (); unsigned short bufferSize; bufferSize = BufferSize; stream. write ((char *)&bufferSize, sizeof(bufferSize)); if (!stream) return -1; stream. write (Buffer, BufferSize); if (! stream. good ()) return -1; return recaddr; }

DRead, DWrite Member Functions int IOBuffer::DRead (istream & stream, int recref) // read specified record { stream. seekg (recref, ios::beg); if (stream. tellg () != recref) return -1; return Read (stream); } int IOBuffer::DWrite (ostream & stream, int recref) const // write specified record { stream. seekp (recref, ios::beg); if (stream. tellp () != recref) return -1; return Write (stream); }

Encapsulation Record I/O Ops in a Single Class(1)  Good design for making objects persistent provide operation to read and write objects directly  Write operation until now : two operation : pack into a buffer + write the buffer to a file  Class ‘RecordFile’ supports a read operation that takes an object of some class and writes it to a file. the use of buffers is hidden inside the class problem with defining class ‘RecordFile’: –how to make it possible to support files for different object types without needing different versions of the class

class BufferFile // file with buffers { public: BufferFile (IOBuffer &); // create with a buffer int Open(char * fname, int MODE); // open an existing file int Create (char * fname, int MODE); // create a new file int Close(); int Rewind();// reset to the first data record // Input and Output operations int Read(int recaddr = -1); int Write(int recaddr = -1); int Append(); // write the current buffer at the end of file protected: IOBuffer & Buffer; // reference to the file’s buffer fstream File;// the C++ stream of the file }; Usage: DelimFieldBuffer buffer; BufferFile file(buffer); file.open(myfile); file.Read(); buffer.Unpack(myobject); BufferFile Class Definition

template class RecordFile : public BufferFile { public: int Read(RecType& record, int recaddr = -1); int Write(const RecType& record, int recaddr = -1 ); RecordFile(IOBuffer& buffer) : BufferFile(buffer) { } };  Class ‘RecordFile’ uses C++ template features to solve the problem definition of the template class RecordFile Encapsulation Record I/O Operation in a Single Class(2)

// template method bodies template int RecordFile ::Read (RecType & record, int recaddr = -1) { int writeAdd, result; writeAddr = BufferFile::Read (recaddr); if (!writeAddr) return -1; result = record.Unpack(Buffer); if (!result) return -1; return writeAddr; } template int RecordFile ::Write (const RecType & record, int recaddr = -1) { int result; result = record. Pack (Buffer); if (!result) return -1; return BufferFile::Write (recaddr); }

There is difference between file access and file organization. Variable-length records Sequential access is suitable Fixed-length records Direct access and sequential access are possible File Organization File Access Variable-length Records Sequential access Fixed-length records Direct access File Access and File Organization

Abstract Data Model  Data object such as document, images, sound e.g. color raster images, FITS image file  Abstract Data Model does not view data as it appears on a particular medium.  application-oriented view  Headers and Self-describing files

Metadata  Data that describe the primary data in a file  A place to store metadata in a file is the header record  Standard format FITS (Flexible Image Transport System) by International Astronomers’ union (see Figure 5.7)

Mixing object Types in a file  Each field is identified using “keyword = value”  Index table with tags e.g.

Object-oriented file access  Separate translating to and from the physical format and application (representation- independent file access) RAM image : star1star2 Disk Program find_star : read_image(“star1”, image) process image : end find_star

Extensibility  Advantage of using tags Identify object within files is that we do not have to know a priori what all of the objects will look like  When we encounter new type of object, we implement method for reading and writing that object and add the method.

Factor affecting Portability  Differences among operating system  Differences among language  Differences in machine architecture  Differences on platforms EBCDIC and ASCII

Achieving Portability (1)  Standardization Standard physical record format –extensible, simple Standard binary encoding for data elements –IEEE, XDR  File structure conversion  Number and text conversion

Achieving Portability (2)  File system difference Block size is 512 bytes on UNIX systems Block size is 2880 bytes on non-UNIX systems  UNIX and Portability UNIX support portability by being commonly available on a large number of platforms UNIX provides a utility called dd –dd : facilitates data conversion

Portability  화일 공유 화일이 서로 다른 컴퓨터에서, 서로 다른 프로그램에서 접근 가능 이식성 (Portability) 과 표준화 (Standardization)  이식성에 영향을 주는 요인들 두 회사가 화일을 공유 –A 회사 : sun 컴퓨터, C 프로그래밍, B 회사 : IBM PC 에서 Turbo PASCAL 프로그래밍 운영체제 사이의 차이점들 – 화일의 궁극적인 물리적 형식은 운영체제 사이의 차이점에 의해 변할 수 있음 프로그래밍 언어들 사이의 차이점들  화일 공유 화일이 서로 다른 컴퓨터에서, 서로 다른 프로그램에서 접근 가능 이식성 (Portability) 과 표준화 (Standardization)  이식성에 영향을 주는 요인들 두 회사가 화일을 공유 –A 회사 : sun 컴퓨터, C 프로그래밍, B 회사 : IBM PC 에서 Turbo PASCAL 프로그래밍 운영체제 사이의 차이점들 – 화일의 궁극적인 물리적 형식은 운영체제 사이의 차이점에 의해 변할 수 있음 프로그래밍 언어들 사이의 차이점들

Portability  이식성의 달성 표준이 되는 물리적인 레코드 형식에 동의하고 그것을 따름 – 물리적 표준 : 어떤 언어, 기계, 운영체제에 상관 없이 물리적으 로 같게 표현되는 것 –ex) FITS 데이터 요소를 위한 표준 이진 코드화에 동의 – 기본적 데이터 요소 : 텍스트, 숫자 –ex) IEEE 표준형식과 XDR  이식성의 달성 표준이 되는 물리적인 레코드 형식에 동의하고 그것을 따름 – 물리적 표준 : 어떤 언어, 기계, 운영체제에 상관 없이 물리적으 로 같게 표현되는 것 –ex) FITS 데이터 요소를 위한 표준 이진 코드화에 동의 – 기본적 데이터 요소 : 텍스트, 숫자 –ex) IEEE 표준형식과 XDR

Portability  변환 1: 직접 변환 형태  변환 2 : 중간 표준 형태  변환 1: 직접 변환 형태  변환 2 : 중간 표준 형태 IBM VAX Cray Sun 3 IBM PC IBM VAX Cray Sun 3 IBM PC IBM VAX Cray Sun 3 IBM PC IBM VAX Cray Sun 3 IBM PC XDR

Let’s Review !!! 5.1 Record Access 5.2 More about Record Structures 5.3 Encapsulating Record I/O Ops in a Single Class 5.4 File Access and File Organization 5.5 Beyond Record Structures 5.6 Portability and Standardization