PROGRAMMING CONCEPTS CHAPTER 8


PROGRAMMING CONCEPTS CHAPTER 8 FILE PROCESSING CONCEPT

CONTENTS
Introduction; Primary Key; Classification of Data Files (by content, by mode of processing, by organization of files: serial, sequential, indexed sequential, random); Transformation Method; Q&A

Introduction
File processing is a computer programming term that refers to the use of computer files to store data in persistent (permanent) storage. Variables and arrays hold data only temporarily. File processing is a useful alternative to a database only where the information will be accessed by a single user, where speed of data input is vital, and where the amount of data stored is relatively small.

Introduction: Elements of a Computer File
A computer file is a collection of information stored on magnetic media, optical disks or a pen drive. Data files are similar in concept. Files can be created, updated and processed. A file contains logical records, a record contains fields, and a field contains characters. There are 2 categories of record: logical and physical. A logical record is one line of data in a file. A physical record is one or more logical records read into or written from main memory as a unit of information.

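Introduction: Data Hierarchy
[Figure: data hierarchy. A FILE is made up of records (REC-1, REC-2, ... REC-n); each record is made up of fields (Field-1, Field-2, ... Field-n); each field is made up of characters (Char-1, Char-2, ... Char-n). The example file shows rows such as "Mimi, HND IS, Single, 25" and "Ena, HND CP, Married, 28", with the row "Minor, HND CP, Single, 24" labelled as a LOGICAL RECORD, one of its values as a FIELD, and a single letter as a CHARACTER.]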

Introduction
The number of characters grouped into a field can vary from field to field in a record. There are 2 types of record. Fixed length: each record has a fixed length, e.g. 90 characters; fields that are not completely filled are padded with space characters, which wastes space. Variable length: the size of each field, and so of the record, varies according to the data it contains; special characters called field separators mark the boundaries between fields, and a similar marker indicates where each record ends.
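
The difference between the two record types can be sketched in a few lines of Python. This is only an illustration: the field widths and the "|" separator are assumptions chosen for the example, not values given in the slides.

```python
def to_fixed(fields, widths):
    """Pad each field with spaces up to its fixed width."""
    return "".join(field.ljust(width) for field, width in zip(fields, widths))


def to_variable(fields, separator="|"):
    """Join fields with a separator; the record length follows the data."""
    return separator.join(fields)


fields = ["Mimi", "HND IS", "Single", "25"]
widths = [8, 6, 7, 2]  # hypothetical field widths (23 characters in total)

fixed = to_fixed(fields, widths)
variable = to_variable(fields)

print(repr(fixed), len(fixed))        # always 23 characters, padding included
print(repr(variable), len(variable))  # length depends on the field contents
```

The fixed-length form wastes the padding characters but makes every record the same size, which matters later for random files.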

Introduction
The information contained in a file relates to one specific kind of detail. Different files are used to store different types of details; different types of details are not mixed in a single file. Records are not usually transferred to and from main memory as single logical records but are grouped together as a block of logical records. When read, records are held temporarily in a buffer. A file normally ends with an "end of file" marker.
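
Grouping logical records into blocks can be sketched as follows. The function name and the block size of 2 (often called a blocking factor) are assumptions made for the example.

```python
def block_records(records, blocking_factor):
    """Group logical records into blocks (physical records) of a given size."""
    return [
        records[i:i + blocking_factor]
        for i in range(0, len(records), blocking_factor)
    ]


logical_records = ["REC-1", "REC-2", "REC-3", "REC-4", "REC-5"]
blocks = block_records(logical_records, 2)  # hypothetical block size of 2

for number, block in enumerate(blocks, start=1):
    print("block", number, block)
```

Each inner list stands for one physical record transferred to or from the buffer as a unit; the last block may be only partly filled.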

Primary Key
Each record in a file contains a primary key: a field (or combination of two or more fields) whose value is unique, so that it identifies one particular record. The primary key allows easier and quicker search and retrieval of a particular record by matching the search key against the primary key.

Classification of Data Files
The way data files are used depends on the contents, the mode of processing and the organisation of the file.

Classification according to content
6 basic categories: Master File, Transaction File, Index File, Table File, Archival/History File, Backup File.

Master File
Contains permanent information reflecting current status. Used for basic identification and for accumulating certain statistical data, e.g. product file, staff file, customer file.
Transaction File
Contains the data on activities (transactions) that affect the master file. The accumulated records are used to update the master file, e.g. invoices, purchase orders. The updating method is batch.

Index File
Index files actually consist of a pair of files: one holding the data and one storing an index to that data. The index is used to indicate the location of specific records in other files (usually the master file) using an index key or address.
Table File
Static reference data used during processing, e.g. a pay rate table used in preparing the payroll.

Archival/History File
Often also termed master files. They contain non-current statistical data, used to create comparative reports, pay commission, etc. They are normally updated periodically and involve a large volume of data.
Backup File
Non-current files stored in the file library. Used when the current master file is destroyed.

Classification according to processing mode
Input: data is read into main memory and processed, and the output is placed in another file.
Output: data is processed and written to another file.
Overlay: a record is accessed, read into main memory, updated and written back to its original location, overwriting the original value.

Classification according to organization of file
File organization is how the records are stored, processed and accessed. It has 3 functions: storage of records; maintenance of files (updating, editing, deleting); and retrieval of required items (searching).

Classification according to organization of file
There are several types of file organization: Serial, Sequential, Indexed Sequential, Random.

Serial File
The simplest form of file organization. Records are not kept in any pre-determined order; they are positioned one after another, and new records are added to the bottom of the file regardless of what they contain. This technique is normally used for storing records for further processing (e.g. sorting) and is normally applied to storage on magnetic tape. Accessing records is very slow.

Sequential File
More organised than a serial file. Records are kept in a pre-defined order, the order of the primary key, e.g. book data stored alphabetically by author. It is not necessary to search the whole file to establish that a record is absent: the search can stop as soon as a key higher than the search key is reached. Access is still inflexible, because to find books by authors whose names begin with N we need to scan along from A until we come to N.
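
A minimal sketch of searching a sequential file, assuming the records are held as (key, data) pairs already sorted by key. The author names are invented for the example. It counts how many records are read, to show both the scan from A to N and the early stop when a record is absent.

```python
def find_sequential(records, key):
    """Scan records sorted by key; stop once a larger key has been passed."""
    reads = 0
    for record_key, data in records:
        reads += 1
        if record_key == key:
            return data, reads
        if record_key > key:
            break  # sorted order: the key cannot appear later in the file
    return None, reads


# Hypothetical book records, ordered by author (the primary key).
books = [("Adams", "A1"), ("Brown", "B1"), ("Nash", "N1"), ("Young", "Y1")]

found = find_sequential(books, "Nash")     # has to read Adams and Brown first
missing = find_sequential(books, "Clark")  # stops at Nash, never reads Young
print(found, missing)
```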

Sequential File
Data cannot be modified without the risk of destroying other data in the file. E.g. if the name "Sam" needed to be changed to "Shaun", the old name cannot simply be overwritten: the new value contains more characters than the original, so the characters beyond the 'a' in "Shaun" would overwrite the beginning of the next sequential record in the file. Suitable for storage on magnetic tape. Sequential access is not usually used to update records in place. Instead the entire file is usually rewritten, which means processing every record in the file to update one record.
NOTE: In both kinds of file (serial and sequential), an individual record can only be found by reading through the file from the start until the required key value is located.
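
The rewrite-the-whole-file update can be sketched like this. The sketch assumes comma-separated "key,name" lines and uses in-memory text files (io.StringIO) in place of real files or tape; the record keys and the other names are made up for the example.

```python
import io


def rewrite_with_update(old_file, key, new_name):
    """Copy every record to a new file, changing only the matching record."""
    new_file = io.StringIO()
    for line in old_file:
        record_key, name = line.rstrip("\n").split(",")
        if record_key == key:
            name = new_name  # the longer value is safe: nothing is overwritten
        new_file.write(f"{record_key},{name}\n")
    return new_file


old = io.StringIO("001,Ann\n002,Sam\n003,Tom\n")
new = rewrite_with_update(old, "002", "Shaun")
result = new.getvalue()
print(result)
```

Every record is read and written once, even though only one record changes.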

Indexed Sequential File
Basically a hybrid of the sequential and random file organisation techniques (it uses both sequential and random access methods). Often referred to as ISAM (Indexed Sequential Access Method). Records are maintained in key sequence but have an index structure built on top of the actual data. The index to a large file may be split into different index levels: an INDEX OF INDEXES. The master index is the highest-level index and contains pointers to the low-level indexes.

Indexed Sequential File
A particular record is located by following the index tree from the master index to the data block containing the target record. The block is then read to find the record with the matching key. This organisation may be useful for automated bank machines: customers access their accounts randomly throughout the day, and at the end of the day the bank can update the whole file sequentially. One drawback of this organization is that several tables must be stored for the index, which makes for a considerable storage overhead.

Indexed Sequential File
Locating record 7, whose address is 050E.
INDEX
Block #   Last rec key
2         048E
3         050K
4         051D
Block 2: 044A 046E 047J 048E
Block 3: 049A 049T 050E 050K
Block 4: 050J 050Z 051C 051D
The key 050E is greater than 048E but not greater than 050K, so the index entry "3 050K" is selected and block 3 is read to find the record.
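
The single-level lookup in this example can be sketched as below, using the index and block contents as they appear on the slide. Keys are compared as plain strings, which is an assumption of this sketch, and the function name is hypothetical.

```python
# (block number, last record key), as listed in the slide's INDEX table.
INDEX = [(2, "048E"), (3, "050K"), (4, "051D")]

# Block contents as transcribed from the slide.
BLOCKS = {
    2: ["044A", "046E", "047J", "048E"],
    3: ["049A", "049T", "050E", "050K"],
    4: ["050J", "050Z", "051C", "051D"],
}


def locate(key):
    """Return (block number, position in block) for key, or None if absent."""
    for block_number, last_key in INDEX:
        if key <= last_key:  # first block whose last key is not below the key
            block = BLOCKS[block_number]
            if key in block:
                return block_number, block.index(key)
            return None  # the key would belong here but is not stored
    return None  # key is beyond the last indexed block


print(locate("050E"))
```

Only one data block is read per lookup; the rest of the work is done in the much smaller index.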

Indexed Sequential File: multi-level structure
Locating record 100, whose address is 053X.
Index
Low-level index #   Last rec key
2                   053X
3                   098E
4                   122A
Low-level index 2
Block #   Last rec key
81        002A
82        004C
.         007E
158       052A
159       058E
160       063X
Block 159: 052C 053X 056J 058E
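
A two-level lookup over the same figures might look like the sketch below. Only the rows legible on the slide are included: the row with the elided block number is left out, and only block 159 has contents, so other keys simply return None. Function names are hypothetical and keys are compared as strings.

```python
# Top-level index: (low-level index number, last record key).
MASTER_INDEX = [(2, "053X"), (3, "098E"), (4, "122A")]

# Low-level index 2: (block number, last record key); elided rows omitted.
LOW_LEVEL_INDEXES = {
    2: [(81, "002A"), (82, "004C"), (158, "052A"), (159, "058E"), (160, "063X")],
}

# Only block 159 is shown on the slide.
DATA_BLOCKS = {159: ["052C", "053X", "056J", "058E"]}


def first_covering(index, key):
    """Return the number of the first entry whose last key is >= key."""
    for number, last_key in index:
        if key <= last_key:
            return number
    return None


def locate(key):
    """Follow master index -> low-level index -> data block."""
    low = first_covering(MASTER_INDEX, key)
    if low is None or low not in LOW_LEVEL_INDEXES:
        return None
    block = first_covering(LOW_LEVEL_INDEXES[low], key)
    if block is None or block not in DATA_BLOCKS:
        return None
    return (low, block) if key in DATA_BLOCKS[block] else None


print(locate("053X"))
```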

Random File
Records are normally fixed in length and are accessed directly, without searching through the preceding records. Data can be inserted in a randomly accessed file without destroying other data in the file. Data previously stored can also be updated or deleted without rewriting the entire file. E.g. airline reservation systems, banking systems. Since every record is the same length, the computer can quickly calculate (as a function of the record key) the exact location of a record relative to the beginning of the file.
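
A minimal sketch of direct access, assuming the simplest possible mapping: the record key is the record number, so the location is record number times record length. The 16-byte record size and the names are invented for the example, and an in-memory io.BytesIO stands in for a disk file.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes


def write_record(f, number, text):
    """Write one space-padded record at its calculated position."""
    data = text.encode("ascii")
    if len(data) > RECORD_SIZE:
        raise ValueError("value does not fit in a fixed-length record")
    f.seek(number * RECORD_SIZE)      # location = record number * record size
    f.write(data.ljust(RECORD_SIZE))  # pad with spaces to the fixed length


def read_record(f, number):
    """Read one record directly, without touching the preceding records."""
    f.seek(number * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip()


f = io.BytesIO()
for number, name in enumerate(["Ann", "Sam", "Tom"]):
    write_record(f, number, name)

write_record(f, 1, "Shaun")  # update in place; neighbours are untouched
updated = read_record(f, 1)
neighbour = read_record(f, 2)
print(updated, neighbour)
```

Unlike the sequential case, changing "Sam" to "Shaun" here is safe because every record owns a fixed-size slot.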

Random File
A random file uses a block address calculation algorithm: the algorithm takes the record key as input and returns the block number. The problem is how to store data efficiently, so that the storage location can be found from the record key alone. Keys are unlikely to run sequentially, so the file has clusters and gaps. For example, suppose storage is determined by the first letter of the customer name, in alphabetical order. Some letters are common, e.g. A, B, D, but some are not, e.g. Q, X. A good algorithm is needed to generate uniformly distributed addresses: a hashing algorithm.

Transformation Method
There are 5 major techniques for hash coding: Division, Truncation, Extraction, Folding, Randomizing. All techniques aim to generate a set of addresses that maps the keys to the storage area as uniformly as possible. The best known and most used technique is division. Division is done by dividing the primary key by a positive integer, usually a prime number approximately equal to the number of available addresses, and using the remainder as the address.
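
The division technique can be sketched in a couple of lines, assuming numeric keys. The table size of 97 and the sample keys are made up for the example.

```python
TABLE_SIZE = 97  # assumed: a prime close to the number of available addresses


def division_hash(key, table_size=TABLE_SIZE):
    """Divide the key by the table size and use the remainder as the address."""
    return key % table_size


keys = [1234, 5678, 9012, 1331]  # hypothetical primary keys
addresses = [division_hash(key) for key in keys]
print(addresses)
```

Keys 1234 and 1331 both map to address 70 here, which is a collision: two different keys producing the same address. Any real scheme needs a way to resolve that.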

Transformation Method
Here are some relatively simple hash functions that have been used.
The division-remainder method: The number of items in the table is estimated. That number is then used as a divisor into each original value or key to extract a quotient and a remainder. The remainder is the hashed value. (Since this method is liable to produce a number of collisions, any search mechanism would have to be able to recognize a collision and offer an alternate search mechanism.)
Folding: This method divides the original value (digits in this case) into several parts, adds the parts together, and then uses the last four digits (or some other arbitrary number of digits that will work) as the hashed value or key.
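
Folding as described above can be sketched as follows. The part size of four digits and the sample key are assumptions for the example; only "keep the last four digits" comes from the text.

```python
def fold_hash(key, part_size=4, digits=4):
    """Split the key's digits into parts, add them, keep the last `digits` digits."""
    text = str(key)
    parts = [int(text[i:i + part_size]) for i in range(0, len(text), part_size)]
    return sum(parts) % 10 ** digits


folded = fold_hash(123456789012)  # parts 1234 + 5678 + 9012 = 15924
print(folded)
```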

Transformation Method
Radix transformation: Where the value or key is digital, the number base (or radix) can be changed, resulting in a different sequence of digits. (For example, a decimal numbered key could be transformed into a hexadecimal numbered key.) High-order digits could be discarded to fit a hash value of uniform length.
Digit rearrangement: This is simply taking part of the original value or key, such as the digits in positions 3 through 6, reversing their order, and then using that sequence of digits as the hash value or key.
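
Both techniques can be sketched briefly. The sketch assumes decimal integer keys, a hash width of four digits, and positions counted from 1; the sample keys and function names are invented for the example.

```python
def radix_hash(key, width=4):
    """Rewrite a decimal key in hexadecimal and discard high-order digits."""
    return format(key, "X")[-width:]


def rearrange_hash(key, start=3, end=6):
    """Take the digits in positions start..end (1-based) and reverse them."""
    digits = str(key)
    return digits[start - 1:end][::-1]


radix_value = radix_hash(123456)          # 123456 is 1E240 in hexadecimal
rearranged = rearrange_hash(12345678)     # positions 3 to 6 are "3456"
print(radix_value, rearranged)
```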