PROGRAMMING CONCEPTS CHAPTER 8
FILE PROCESSING CONCEPT
CONTENTS
- Introduction
- Primary Key
- Classification of Data Files
  - By Content
  - By Mode of Processing
  - By Organization of Files
    - Serial
    - Sequential
    - Indexed Sequential
    - Random
- Transformation Method
- Q&A
Introduction
- File processing is a programming term that refers to the use of computer files to store data in persistent (permanent) storage.
- Variables and arrays store data only temporarily.
- File processing is a useful alternative to a database only where the information will be accessed by a single user, where speed of data input is vital, and where the amount of data stored is relatively small.
Elements of a Computer File
- A file is a collection of information stored on magnetic media, optical disks, or a pen drive.
- Data files are similar in concept.
- Files can be created, updated, and processed.
- A file contains logical records; a record contains fields; a field contains characters.
- There are two categories of record: logical and physical. A logical record is one line of data in a file. A physical record is one or more logical records read into or written from main memory as a unit of information.
Introduction – Data Hierarchy
[Diagram: data hierarchy. A FILE contains records REC-1, REC-2, ... REC-n; each record contains fields Field-1, Field-2, ... Field-n; each field contains characters Char-1, Char-2, ... Char-n. In the sample file (records for Mimi, Anna, Ena and Minor), "Minor HND CP Single 24" is a logical record, "Minor" is a field, and a single letter of "Minor" is a character.]
- The number of characters grouped into a field can vary from field to field within a record.
- There are two types of record:
  - Fixed length: every record has the same length, e.g. 90 characters. Fields that are not completely filled are padded with space characters, which wastes space.
  - Variable length: the size of each field, and so of each record, varies with the data it contains. Special characters called separators mark where each field ends and where the record ends.
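The difference between the two record types can be sketched in a few lines of Python. This is only an illustration: the field widths (10, 8 and 2 characters), the comma as field separator and the newline as record separator are assumptions, not values from the slides.

```python
# Hypothetical field widths for a fixed-length record.
NAME_LEN, COURSE_LEN, AGE_LEN = 10, 8, 2
RECORD_LEN = NAME_LEN + COURSE_LEN + AGE_LEN  # every record is 20 bytes


def to_fixed(name: str, course: str, age: int) -> bytes:
    """Pad (or cut) each field to its width so every record has the same length."""
    text = (
        name[:NAME_LEN].ljust(NAME_LEN)
        + course[:COURSE_LEN].ljust(COURSE_LEN)
        + str(age)[:AGE_LEN].rjust(AGE_LEN)
    )
    return text.encode("ascii")


def to_variable(name: str, course: str, age: int) -> bytes:
    """Store only the characters present; ',' ends a field, a newline ends the record."""
    return f"{name},{course},{age}\n".encode("ascii")


fixed = to_fixed("Mimi", "HND IS", 25)
variable = to_variable("Mimi", "HND IS", 25)
print(len(fixed), len(variable))
```

The fixed-length form spends bytes on padding but lets a program compute where any record starts; the variable-length form is compact but has to be scanned for separators.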
- The information contained in a file relates to one specific kind of detail. Different files store different kinds of detail; different kinds are not mixed in a single file.
- Records are not usually transferred to and from main memory as single logical records; they are grouped together as a block of logical records.
- When read, records are held temporarily in a buffer.
- A file normally ends with an "end of file" marker.
Primary Key
- Each record in a file contains a primary key: a field with a unique value that identifies that particular record.
- A primary key is made up of one field, or of a combination of two or more fields of the record.
- A primary key allows quicker search and retrieval of a particular record, by matching the search key against the primary key.
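Matching a search key against a primary key might look like the sketch below. The `student_id` field and its values are hypothetical; the names, courses and status values are taken from the sample records in the data hierarchy slide.

```python
# Sample records; student_id is a made-up single-field primary key.
records = [
    {"student_id": 101, "name": "Mimi", "course": "HND IS", "status": "Single"},
    {"student_id": 102, "name": "Ena", "course": "HND CP", "status": "Married"},
    {"student_id": 103, "name": "Minor", "course": "HND CP", "status": "Single"},
]


def find_by_key(records, key_fields, search_key):
    """Return the first record whose key fields equal search_key, else None.

    key_fields is a tuple of field names, so the same function handles a
    single-field key and a key made up of several fields.
    """
    for record in records:
        if tuple(record[field] for field in key_fields) == search_key:
            return record
    return None


by_id = find_by_key(records, ("student_id",), (102,))
by_pair = find_by_key(records, ("name", "course"), ("Minor", "HND CP"))
```

The second call treats the combination of name and course as the key, which only works if that combination is unique in the file.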
Classification of Data file
The way data files are used depends on the contents, the mode of processing, and the organisation of the file.
Classification according to content
There are six basic categories:
- Master File
- Transaction File
- Index File
- Table File
- Archival/History File
- Backup File
Master File
- Contains permanent information reflecting the current status.
- Used for basic identification and for accumulating certain statistical data, e.g. product file, staff file, customer file.

Transaction File
- Contains records of the activities (transactions) that affect the master file.
- The accumulated records are used to update the master file, e.g. invoices, purchase orders.
- Updating is done in batches.
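A batch update of a master file from accumulated transactions could be sketched as follows. The customer codes, balances and the rule of rejecting transactions with unknown keys are illustrative assumptions; a dictionary stands in for the master file.

```python
def apply_batch(master: dict, transactions: list):
    """Apply accumulated (key, amount) transactions to a copy of the master data.

    Returns the updated master and the list of transactions whose key
    does not exist in the master file.
    """
    updated = dict(master)  # leave the original master untouched
    rejected = []
    for key, amount in transactions:
        if key in updated:
            updated[key] += amount
        else:
            rejected.append((key, amount))
    return updated, rejected


# Hypothetical customer balances and one day's accumulated transactions.
master = {"C001": 100, "C002": 50}
transactions = [("C001", 25), ("C002", -10), ("C001", 5), ("C999", 1)]

new_master, rejected = apply_batch(master, transactions)
```

Keeping the old master unchanged mirrors the idea of retaining a non-current copy as a backup.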
Index File
- Index files actually consist of a pair of files: one holding the data and one storing an index to that data.
- Used to indicate the location of specific records in other files (usually the master file) using an index key or address.

Table File
- Static reference data used during processing, e.g. a pay rate table used in preparing payroll.
Archival/History File
- Often termed master files.
- Contains non-current statistical data, used to create comparative reports, pay commission, etc.
- Normally updated periodically, and involves a large volume of data.

Backup File
- Non-current files stored in the file library.
- Used when the current master file is destroyed.
Classification according to processing mode
- Input: data is loaded into main memory and processed, and the output is placed in another file.
- Output: data is processed and written to another file.
- Overlay: a record is accessed, loaded into main memory, updated, and written back to its original location, overwriting the original value.
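Overlay processing can be illustrated with a small sketch. It assumes fixed-length records of 8 bytes (a made-up size) and uses an in-memory `io.BytesIO` object in place of a real file opened for reading and writing.

```python
import io

RECORD_SIZE = 8  # hypothetical fixed record length in bytes


def overlay_update(f, record_no: int, new_value: bytes) -> bytes:
    """Read one record, then write its replacement back to the same location.

    Returns the old record so the caller can see what was overwritten.
    """
    offset = record_no * RECORD_SIZE
    f.seek(offset)
    old = f.read(RECORD_SIZE)
    f.seek(offset)  # go back to where the record started
    # Pad or cut the new value so it occupies exactly one record.
    f.write(new_value.ljust(RECORD_SIZE)[:RECORD_SIZE])
    return old


f = io.BytesIO(b"AAAAAAAA" b"BBBBBBBB" b"CCCCCCCC")
old = overlay_update(f, 1, b"XY")
```

Only the second record changes; the records on either side are untouched because the write is exactly one record long.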
Classification according to organization of file
File organization is how the records are stored, processed, and accessed. It has three functions:
- Storage of records.
- Maintenance of files (updating, editing, deleting).
- Retrieval of required items (searching).
There are several types of file organization:
- Serial
- Sequential
- Indexed Sequential
- Random
Serial File
- The simplest form of file organization.
- Records are not kept in any predetermined order.
- Records are positioned one after another; new records are added to the end of the file regardless of their contents.
- Normally used to store records for further processing (e.g. sorting).
- Normally applied to storage on magnetic tape.
- Accessing records is very slow.
Sequential File
- More organised than a serial file.
- Records are kept in a predefined order, the order of the primary key, e.g. book data stored alphabetically by author.
- It is not necessary to search the whole file to establish that a record is absent: the search can stop as soon as a key greater than the search key is read.
- Access is still inflexible: to find books by authors whose names begin with N, we must scan along from A until we come to N.
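The early stop that key order makes possible can be sketched as below. The author names are invented for illustration, and a sorted Python list stands in for the sequential file.

```python
def sequential_search(sorted_keys, target):
    """Scan keys in order; return (position or None, number of keys examined).

    Because the keys are sorted, the scan can stop at the first key that
    is greater than the target: the target cannot appear after it.
    """
    examined = 0
    for position, key in enumerate(sorted_keys):
        examined += 1
        if key == target:
            return position, examined
        if key > target:
            return None, examined
    return None, examined


# Hypothetical author names, stored in alphabetical order.
authors = ["Adams", "Brown", "Nolan", "Young"]

found = sequential_search(authors, "Nolan")
missing = sequential_search(authors, "Clark")
```

Finding "Nolan" still means reading "Adams" and "Brown" first, which is the inflexibility noted above; the saving is only for absent keys.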
- Data cannot be modified without the risk of destroying other data in the file. For example, if the name "Sam" needs to be changed to "Shaun", the old name cannot simply be overwritten: the new name has more characters than the original, so the characters after the 'a' in "Shaun" would overwrite the beginning of the next record in the file.
- Suitable for storage on magnetic tape.
- Sequential access is not usually used to update records in place. Instead, the entire file is usually rewritten, which means processing every record in the file to update one record.

NOTE: In both serial and sequential files, an individual record can only be found by reading the file from the beginning until the required key value is located.
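The "Sam" to "Shaun" problem can be reproduced in a short sketch. It assumes variable-length records separated by newlines, and the second record, "Tom", is made up to show what gets damaged.

```python
import io


def overwrite_in_place(data: bytes, offset: int, new: bytes) -> bytes:
    """Write new bytes over the existing bytes at offset, as a naive update would."""
    buf = io.BytesIO(data)
    buf.seek(offset)
    buf.write(new)
    return buf.getvalue()


def rewrite_all(lines, old, new):
    """Safe update: copy every record to a new file, replacing the one that changed."""
    return [new if line == old else line for line in lines]


original = b"Sam\nTom\n"  # two newline-terminated records

# "Shaun" is two characters longer than "Sam", so "un" lands on the
# record separator and on the first letter of the next record.
corrupted = overwrite_in_place(original, 0, b"Shaun")

rewritten = rewrite_all(["Sam", "Tom"], "Sam", "Shaun")
```

The in-place write destroys the boundary between the two records, while the rewrite keeps both records intact at the cost of touching every record.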
Indexed Sequential File
- Basically a hybrid of the sequential and random file organisation techniques: it uses both sequential and random access methods.
- Often referred to as ISAM (Indexed Sequential Access Method).
- Records are maintained in key sequence, with an index structure built on top of the actual data.
- The index to a large file may be split into different index levels: an index of indexes.
- The master index is the highest-level index and contains pointers to the low-level indexes.
- Locating a particular record means following the index tree from the master index to the data block containing the target record. The block is then read to locate the record with the matching key.
- This organisation may be useful for automated teller machines: customers access their accounts randomly throughout the day, and at the end of the day the bank can update the whole file sequentially.
- One drawback of this organisation is that several tables must be stored for the index, which creates considerable storage overhead.
Example: locating record 7, whose key is 050E.

INDEX
Block #    Last rec key
2          048E
3          050K
4          051D

Block 2: 044A 046E 047J 048E
Block 3: 049A 049T 050E 050K
Block 4: 050J 050Z 051C 051D

The index is scanned for the first block whose last record key is not less than 050E. That is block 3 (last key 050K), so block 3 is read and searched for 050E.
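The single-level lookup in this example might be sketched as follows. The index entries and the keys of blocks 2 and 3 are copied from the slide; the first entry of block 4 is garbled in the source and is left out here. Using `bisect` to scan the index is a choice made for this sketch, and lists and a dictionary stand in for the index and the data blocks.

```python
from bisect import bisect_left

# (block number, last record key in that block), in key order.
INDEX = [(2, "048E"), (3, "050K"), (4, "051D")]

BLOCKS = {
    2: ["044A", "046E", "047J", "048E"],
    3: ["049A", "049T", "050E", "050K"],
    4: ["050Z", "051C", "051D"],  # garbled first entry from the slide omitted
}


def find_block(key: str):
    """Return the first block whose last key is >= key, or None if key is too large."""
    last_keys = [last for _, last in INDEX]
    pos = bisect_left(last_keys, key)
    return INDEX[pos][0] if pos < len(INDEX) else None


def locate(key: str):
    """Return (block number, position within block), or None if the key is absent."""
    block_no = find_block(key)
    if block_no is None or key not in BLOCKS[block_no]:
        return None
    return block_no, BLOCKS[block_no].index(key)


result = locate("050E")
```

Only one data block is read for each lookup; the index decides which block, and the block itself is then searched sequentially.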
Multi-level structure

Example: locating record 100, whose key is 053X.

Index (master)
Low Level Index #    Last Rec Key
2                    053X
3                    098E
4                    122A

Low-level index 2
Block #    Last Rec Key
81         002A
82         004C
.          007E
158        052A
159        058E
160        063X

Block 159: 052C 053X 056J 058E
Random File
- Records are normally fixed in length.
- Records are accessed directly, without searching through the preceding records.
- Data can be inserted into a randomly accessed file without destroying other data in the file. Previously stored data can also be updated or deleted without rewriting the entire file.
- Examples: airline reservation systems, banking systems.
- Since every record is the same length, the computer can quickly calculate (as a function of the record key) the exact location of a record relative to the beginning of the file.
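Direct access by computed position can be sketched with Python's `struct` module. The record layout (a 4-byte account number followed by a 10-byte name) and the use of the slot number as the key are assumptions made for illustration; `io.BytesIO` stands in for a file opened in binary read/write mode.

```python
import io
import struct

# Hypothetical layout: unsigned 4-byte account number + 10-byte name.
RECORD = struct.Struct("<I10s")  # RECORD.size is 14 bytes


def write_record(f, slot: int, account: int, name: str) -> None:
    """Place a record directly at slot * record size, without touching other slots."""
    f.seek(slot * RECORD.size)
    f.write(RECORD.pack(account, name.encode("ascii")))


def read_record(f, slot: int):
    """Jump straight to the slot and decode the record stored there."""
    f.seek(slot * RECORD.size)
    account, raw_name = RECORD.unpack(f.read(RECORD.size))
    return account, raw_name.rstrip(b"\x00").decode("ascii")


f = io.BytesIO()
write_record(f, 2, 1002, "Ena")   # records can be written in any order
write_record(f, 0, 1000, "Mimi")

third = read_record(f, 2)
first = read_record(f, 0)
```

Slot 1 was never written, so it reads back as zero bytes: this is the kind of gap that appears when keys do not run sequentially.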
- A random file uses a block address calculation algorithm: the record key is the input and the block number is the output.
- The problem is how to store data efficiently so that, given the record key, the storage location can be found.
- Keys are unlikely to run sequentially, so the file has clusters and gaps. For example, if storage is determined by the first letter of the customer name, some letters are common (e.g. A, B, D) and others are not (e.g. Q, X).
- A good algorithm is needed to generate uniformly distributed addresses: a hashing algorithm.
Transformation Method
There are five major techniques for hash coding:
- Division
- Truncation
- Extraction
- Folding
- Randomizing

All techniques aim to generate a uniformly distributed set of addresses, mapping the keys to the storage area as evenly as possible. The best-known and most widely used technique is division: the primary key is divided by a positive integer, usually a prime number approximately equal to the number of available addresses, and the remainder is used as the address.
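The division technique amounts to one line of arithmetic. In this sketch the divisor 97 is an assumed prime, chosen as if the file had roughly 100 available addresses, and the keys are made up.

```python
ADDRESS_COUNT = 97  # hypothetical prime close to the number of available addresses


def division_hash(key: int, divisor: int = ADDRESS_COUNT) -> int:
    """Return the remainder of key / divisor, used as the storage address."""
    return key % divisor


address = division_hash(12345)

# Two different keys can produce the same remainder (a collision):
# 26 and 123 differ by exactly 97.
collision = division_hash(26) == division_hash(123)
```

Every address falls in the range 0 to 96, and because distinct keys can share a remainder, a real file also needs a way to handle collisions.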
Here are some relatively simple hash functions that have been used:
- The division-remainder method: The number of items in the table is estimated. That number is then used as a divisor into each original value or key to extract a quotient and a remainder. The remainder is the hashed value. (Since this method is liable to produce a number of collisions, any search mechanism must be able to recognize a collision and offer an alternate search mechanism.)
- Folding: This method divides the original value (digits in this case) into several parts, adds the parts together, and then uses the last four digits (or some other arbitrary number of digits that will work) as the hashed value or key.
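Folding can be sketched as follows. Splitting the key into four-digit parts from the left is an assumption; the description above only says the value is divided into several parts.

```python
def fold_hash(key: int, part_len: int = 4, keep: int = 4) -> int:
    """Split the key's digits into parts, add the parts, keep the last `keep` digits."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % 10 ** keep


# 1234 + 5678 + 9012 = 15924, and the last four digits are 5924.
hashed = fold_hash(123456789012)
```

Keys shorter than one part pass through unchanged, so folding is mainly useful when keys are much longer than the address.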
- Radix transformation: Where the value or key is digital, the number base (or radix) can be changed, resulting in a different sequence of digits. (For example, a decimal numbered key could be transformed into a hexadecimal numbered key.) High-order digits could be discarded to fit a hash value of uniform length.
- Digit rearrangement: This is simply taking part of the original value or key, such as the digits in positions 3 through 6, reversing their order, and then using that sequence of digits as the hash value or key.
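Both techniques can be sketched in a few lines. The four-digit hash length and the sample keys are assumptions; positions are counted from 1, as in the description above.

```python
def radix_transform(key: int, length: int = 4) -> str:
    """Rewrite a decimal key in hexadecimal and keep only the low-order digits."""
    hex_digits = format(key, "x")
    # Discard high-order digits; left-pad short results to a uniform length.
    return hex_digits[-length:].zfill(length)


def digit_rearrangement(key: str, start: int = 3, end: int = 6) -> str:
    """Take the digits in positions start..end (1-based, inclusive) and reverse them."""
    return key[start - 1:end][::-1]


# 1234567 in hexadecimal is 12d687; the last four digits are d687.
radix_value = radix_transform(1234567)

# Positions 3 through 6 of 123456789 are 3456, reversed to 6543.
rearranged = digit_rearrangement("123456789")
```

Neither function guarantees unique results, so, as with division, collisions still have to be handled by whatever uses the hash value.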