Data Structures & File Processing
Topic 1: Introduction and Overview
Basic Terminology; Elementary Data Organization
Data: values or sets of values.
Data item: a single unit of values.
Group items: data items that are divided into subitems.
Elementary items: data items that are not divided.
E.g., an employee's name may be divided into three subitems (first name, middle initial and last name), so it is a group item; a social security number is usually treated as a single, elementary item.
Entity: something that has attributes or properties, which may be assigned values. Values may be numeric or nonnumeric.
Entities with similar attributes (e.g., all employees in an organization) form an entity set.
Example of one employee entity:
Attributes:  Name              Age   Sex   Social Security Number
Values:      Rumaisya, Saifi   34    F     134-24-5533
Each attribute of an entity set has a range of values: the set of all possible values that could be assigned to that attribute.
The term “information” is sometimes used for data with given attributes or, in other words, for meaningful or processed data.
Field: a single elementary unit of information representing an attribute of an entity.
Record: the collection of field values of a given entity.
File: the collection of records of the entities in a given entity set.
Each record in a file may contain many field items, but the value in a certain field may uniquely determine the record in the file. Such a field K is called a primary key, and the values k1, k2, … in such a field are called keys or key values.
Example: automobile dealership inventory file. Each record contains: Serial Number, Type, Year, Price, Accessories.
Primary key: Serial Number, since each automobile has a unique serial number.
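The dealership example can be sketched in Python. This is only an illustration under stated assumptions: the serial numbers, prices and the helper name `build_index` are invented here and do not come from the slides. Each record is a dictionary of field values, the file is a list of records, and a field qualifies as a primary key only if no two records share a value in it.

```python
# Hypothetical inventory file; every value below is invented for illustration.
inventory_file = [
    {"serial": "A1001", "type": "sedan", "year": 2019, "price": 18500, "accessories": ["radio"]},
    {"serial": "A1002", "type": "coupe", "year": 2021, "price": 24000, "accessories": []},
    {"serial": "A1003", "type": "sedan", "year": 2019, "price": 17900, "accessories": ["sunroof"]},
]

def build_index(records, key_field):
    """Map each key value to its record; reject a field whose values repeat."""
    index = {}
    for record in records:
        k = record[key_field]
        if k in index:
            raise ValueError(f"{key_field!r} is not a primary key: duplicate {k!r}")
        index[k] = record
    return index

# The serial number uniquely determines a record, so it works as a primary key.
by_serial = build_index(inventory_file, "serial")
print(by_serial["A1002"]["price"])  # 24000

# "type" repeats across records, so it cannot serve as a primary key.
try:
    build_index(inventory_file, "type")
except ValueError as err:
    print(err)
```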
Example: organization membership file. Each record contains: Name, Address, Telephone Number, Dues Owed.
Group items: Name, Address.
Primary key: Name. Address and Telephone Number may not serve as primary keys, since several members (e.g., of one family) may share the same address and telephone number.
Records may also be classified according to length. A file can have:
Fixed-length records: all records contain the same data items, with the same amount of space assigned to each item.
Variable-length records: records in the file may have different lengths. E.g., student records usually vary in length, since different students take different numbers of courses. Variable-length records usually have a minimum and a maximum length.
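To make the distinction concrete, here is a rough Python sketch using the standard `struct` module. The byte layout (a 10-byte name, a 4-byte ID, 6-byte course codes) and the function names are assumptions chosen for illustration, not a layout given in the slides.

```python
import struct

# Assumed fixed layout: 10-byte name field followed by a 4-byte unsigned ID.
FIXED = struct.Struct("<10sI")

def pack_fixed(name, student_id):
    """Every record occupies FIXED.size bytes; short names are null-padded."""
    return FIXED.pack(name.encode("ascii"), student_id)

def pack_variable(name, student_id, courses):
    """Fixed part, then a 1-byte course count, then one 6-byte code per course."""
    parts = [pack_fixed(name, student_id), struct.pack("<B", len(courses))]
    parts.extend(struct.pack("<6s", code.encode("ascii")) for code in courses)
    return b"".join(parts)

def unpack_variable(data):
    """Read the count byte to learn how long this particular record is."""
    name, student_id = FIXED.unpack_from(data, 0)
    (count,) = struct.unpack_from("<B", data, FIXED.size)
    start = FIXED.size + 1
    courses = [
        data[start + 6 * i : start + 6 * (i + 1)].rstrip(b"\0").decode("ascii")
        for i in range(count)
    ]
    return name.rstrip(b"\0").decode("ascii"), student_id, courses

print(len(pack_fixed("Ann", 1)), len(pack_fixed("Bartholomew", 2)))  # 14 14
print(len(pack_variable("Ann", 1, ["CS101"])))                       # 21
print(len(pack_variable("Raj", 2, ["CS101", "MA201", "PH110"])))     # 33
```

In this layout the minimum record length is 15 bytes (no courses), and the 1-byte count caps the maximum at 255 courses.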
Data Structures
Data structure: the logical or mathematical model of a particular organization of data.
The choice of a data model depends on two considerations:
It must be rich enough in structure to mirror the actual relationships of the data in the real world.
It must be simple enough that one can effectively process the data when necessary.
Classification of Data Structures
Arrays
Linear (or one-dimensional) array: a list of a finite number n of similar data elements, referenced respectively by a set of n consecutive numbers, usually 1, 2, 3, …, n.
If the array is named A, its elements are denoted by:
subscript notation: a1, a2, a3, …, an
parenthesis notation: A(1), A(2), A(3), …, A(N)
bracket notation: A[1], A[2], A[3], …, A[N]
The number K in A[K] is called a subscript, and A[K] is called a subscripted variable.
Linear arrays are called one-dimensional arrays because each element is referenced by one subscript.
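A two-dimensional array is a collection of similar data elements in which each element is referenced by two subscripts. E.g., a chain of stores can list its weekly sales in a two-dimensional array named SALES, where the first subscript denotes the store and the second subscript the department.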
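A short Python sketch of both kinds of array follows. Python lists are 0-based, so the helpers `element` and `sales` (names assumed here) translate the 1-based subscripts used on the slides. All numbers are invented, and the 3-by-2 shape of SALES is an assumption.

```python
# A linear array A with n = 5 elements (values invented).
A = [12, 7, 30, 4, 19]

def element(array, k):
    """Return A[K] for a 1-based subscript K, as written on the slides."""
    if not 1 <= k <= len(array):
        raise IndexError(f"subscript {k} outside 1..{len(array)}")
    return array[k - 1]

# SALES[store][department]: 3 hypothetical stores with 2 departments each.
SALES = [
    [2872, 805],
    [2196, 1223],
    [3257, 1017],
]

def sales(store, department):
    """Two subscripts select one element: first the store, then the department."""
    return SALES[store - 1][department - 1]

print(element(A, 3))   # 30
print(sales(2, 1))     # 2196
print(sum(SALES[0]))   # 3677, the weekly total for store 1
```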
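Linked List
Example: a brokerage firm keeps a file in which each record holds a customer's name and that customer's salesperson.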
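Instead of storing the salesperson's name in each customer record, the record can store an integer pointer to the salesperson. An integer used as a pointer requires less space than a name.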
Suppose the firm wants the list of customers for a given salesperson. Using Fig. 1.5, the firm would have to search through the entire customer file.
Disadvantage of storing a set of customer pointers with each salesperson: each salesperson may have many pointers, and the set of pointers changes as customers are added and deleted.
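The trade-offs can be sketched in Python. The figures themselves are not reproduced in these notes, so the names and pointer values below are invented, and the linked variant (one pointer per salesperson to a first customer, plus one link per customer to the next customer of the same salesperson) is the usual linked-list arrangement, assumed here rather than taken from the figures.

```python
# Salesperson file; list positions act as integer pointers (names invented).
salespeople = ["Smith", "Ray", "Jones"]

# Customer file: each record holds an integer pointer to its salesperson.
customers = [
    {"name": "Adams", "salesperson": 2},
    {"name": "Brown", "salesperson": 0},
    {"name": "Clark", "salesperson": 2},
    {"name": "Drew", "salesperson": 1},
]

def customers_by_scan(sp):
    """Search the entire customer file for one salesperson's customers."""
    return [c["name"] for c in customers if c["salesperson"] == sp]

# Linked variant (assumed): first[sp] points to the salesperson's first
# customer, and link[c] points to the next customer of the same salesperson.
NIL = -1
first = [1, 3, 0]
link = [2, NIL, NIL, NIL]

def customers_by_links(sp):
    """Follow the chain of links instead of scanning every record."""
    names = []
    ptr = first[sp]
    while ptr != NIL:
        names.append(customers[ptr]["name"])
        ptr = link[ptr]
    return names

print(customers_by_scan(2))   # ['Adams', 'Clark']
print(customers_by_links(2))  # ['Adams', 'Clark']
```

With links, each salesperson keeps exactly one pointer no matter how many customers are added or deleted.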
Trees
Data frequently contain a hierarchical relationship between elements. The data structure that reflects this relationship is called a rooted tree graph or, simply, a tree.
Example (record structure): an employee personnel record may contain the data items Social Security Number, Name, Address, Age, Salary, Dependents.
Group item    Subitems
Name          Last, First, MI (middle initial)
Address       Street, Area
Area          City, State, ZIP code number
Example (algebraic expression): $(2x + y)(a - 7b)^3$
In the tree for this expression, exponentiation is denoted by a vertical arrow (↑) and multiplication by an asterisk (*).
The order of operations is reflected in the tree: exponentiation takes place after the subtraction, and the multiplication at the top of the tree is executed last.
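As an illustration, the expression tree can be written as nested Python tuples and evaluated bottom-up. The tuple encoding, the `evaluate` helper and the sample variable values are assumptions made for this sketch.

```python
import operator

# Operators as they appear in the tree; "↑" stands for exponentiation.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "↑": operator.pow}

# (operator, left, right) tuples are internal nodes; strings are variables
# and numbers are constants. This encodes (2x + y)(a - 7b)^3.
TREE = ("*",
        ("+", ("*", 2, "x"), "y"),
        ("↑", ("-", "a", ("*", 7, "b")), 3))

def evaluate(node, env):
    """Evaluate both subtrees first, then apply the operator at this node."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]
    return node

# Sample values (invented): (2*1 + 2) * (10 - 7*1)^3 = 4 * 27
print(evaluate(TREE, {"x": 1, "y": 2, "a": 10, "b": 1}))  # 108
```

Because a node is applied only after its subtrees, the root multiplication is necessarily the last operation performed.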
Stack
Also called a last-in first-out (LIFO) system: a linear list in which insertions and deletions can take place only at one end, called the top.
E.g., a stack of dishes: new dishes are inserted only at the top of the stack, and dishes are deleted only from the top.
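A minimal Python sketch of the dish example, assuming a small `Stack` class (the class and its method names are not from the slides). The end of a Python list plays the role of the top.

```python
class Stack:
    """LIFO list: insertions and deletions happen only at the top."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)      # insert at the top

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()      # delete from the top

    def is_empty(self):
        return not self._items

dishes = Stack()
for dish in ["plate 1", "plate 2", "plate 3"]:
    dishes.push(dish)

removed = [dishes.pop() for _ in range(3)]
print(removed)  # ['plate 3', 'plate 2', 'plate 1']
```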
Queue
Also called a first-in first-out (FIFO) system: a linear list in which deletions can take place only at one end of the list, the “front”, and insertions can take place only at the other end, the “rear”.
E.g., a line of people waiting at a bus stop.
The first person in line is the first person to board the bus.
Likewise, for automobiles waiting to pass through an intersection, the first car in line is the first car through.
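The bus-stop line can be sketched with `collections.deque` from the Python standard library, which supports efficient insertion at one end and deletion at the other. The names of the people are invented.

```python
from collections import deque

bus_line = deque()

# Insertions take place only at the rear of the line.
for person in ["Ana", "Ben", "Cy"]:
    bus_line.append(person)

# Deletions take place only at the front: first in line, first to board.
boarded = [bus_line.popleft() for _ in range(2)]

print(boarded)         # ['Ana', 'Ben']
print(list(bus_line))  # ['Cy']
```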
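Graph
Data sometimes contain a relationship between pairs of elements that is not necessarily hierarchical. The data structure that reflects this type of relationship is called a graph.
E.g., an airline flies only between the cities connected by lines in its route map.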
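One common way to hold such pairwise relationships is an adjacency structure. The sketch below assumes an undirected route map; the city labels, the routes and the helper names are all hypothetical.

```python
# Hypothetical route map: each pair is a line connecting two cities.
routes = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

def build_graph(edges):
    """Map each city to the set of cities it is directly connected to."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)   # flights run in both directions
    return graph

def has_direct_flight(graph, a, b):
    return b in graph.get(a, set())

flights = build_graph(routes)
print(sorted(flights["C"]))                  # ['A', 'B', 'D']
print(has_direct_flight(flights, "A", "D"))  # False
```

No city acts as a root here, which is what distinguishes this structure from a tree.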
Data Structure
Terminology: the term “record” is used for the elements of files; the term “node” is used for the elements of linked lists, trees and graphs.
Data Structure Operations
The data structure that one chooses for a given situation depends largely on the frequency with which specific operations are performed.
Four operations:
Traversing: accessing each record exactly once so that certain items in the record may be processed (sometimes called “visiting” the record).
Searching: finding the location of the record with a given key value, or finding the locations of all records that satisfy one or more conditions.
Inserting: adding a new record to the structure.
Deleting: removing a record from the structure.
Sometimes two or more of these operations are used together; e.g., to delete the record with a given key value, one must first search for the location of that record.
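The four operations can be sketched on a file held as a Python list of records. The record contents and function names are assumptions for illustration. Note how `delete` calls `search` first, as described above.

```python
# A small file of records; keys and names are invented.
records = [
    {"key": 17, "name": "Lee"},
    {"key": 42, "name": "Kim"},
    {"key": 8, "name": "Roy"},
]

def traverse(file, visit):
    """Access each record exactly once and process it with visit()."""
    for record in file:
        visit(record)

def search(file, key):
    """Return the location (index) of the record with this key, or None."""
    for location, record in enumerate(file):
        if record["key"] == key:
            return location
    return None

def insert(file, record):
    """Add a new record, refusing a duplicate key."""
    if search(file, record["key"]) is not None:
        raise ValueError(f"duplicate key {record['key']}")
    file.append(record)

def delete(file, key):
    """Deleting by key first needs a search for the record's location."""
    location = search(file, key)
    if location is None:
        raise KeyError(key)
    return file.pop(location)

names = []
traverse(records, lambda r: names.append(r["name"]))
print(names)  # ['Lee', 'Kim', 'Roy']

insert(records, {"key": 5, "name": "Ada"})
delete(records, 42)
print([r["key"] for r in records])  # [17, 8, 5]
```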
Two further operations, used in special situations:
Sorting: arranging the records in some logical order (e.g., alphabetically according to a NAME key, or in numerical order according to a NUMBER key such as a social security number or account number).
Merging: combining the records of two different sorted files into a single sorted file.
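Both operations can be sketched in a few lines of Python. Sorting uses the built-in `sorted`; `merge` is a hand-written two-pointer merge, shown only to make the idea explicit. The account numbers and the helper name `by_number` are invented.

```python
def by_number(record):
    return record["number"]

def merge(file_a, file_b, key):
    """Combine two files that are already sorted by key into one sorted file."""
    merged = []
    i = j = 0
    while i < len(file_a) and j < len(file_b):
        if key(file_a[i]) <= key(file_b[j]):
            merged.append(file_a[i])
            i += 1
        else:
            merged.append(file_b[j])
            j += 1
    merged.extend(file_a[i:])   # at most one of these two tails is non-empty
    merged.extend(file_b[j:])
    return merged

# Sorting: arrange each file in numerical order by its NUMBER key.
file_a = sorted([{"number": 305}, {"number": 101}, {"number": 220}], key=by_number)
file_b = sorted([{"number": 410}, {"number": 150}], key=by_number)

combined = merge(file_a, file_b, by_number)
print([r["number"] for r in combined])  # [101, 150, 220, 305, 410]
```

The merge reads each input record once, which is why both files must already be sorted.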