File and Database Concepts


1 File and Database Concepts

2 Records
“A record is an organised collection of information about an object or item, with a unique identifier or key.”
Example record: Ted Postlethwaite, 10 West Hill, Highgate, London N6 3PY
Fixed or variable length records: pros and cons?

3 Pros and Cons: Fixed and Variable Length Records
Fixed-Length Records:
Easy to implement
Can predict where records will start and end, so supports direct access
Wastes space
Can be inflexible, as the space allocated may become too small
Variable-Length Records:
Uses disk space economically
Flexible
Difficult to implement
Sequential access only
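A minimal sketch of the trade-off, assuming a hypothetical two-field record (name and postcode) with field widths picked only for illustration. The fixed-length file is larger, but any record can be read straight from its calculated position; the variable-length file is smaller, but a record can only be found by reading past the ones before it.

```python
import struct

# Hypothetical layout: 20-byte name + 8-byte postcode = 28 bytes per record.
FIXED = struct.Struct("20s8s")

def pack_fixed(name: str, postcode: str) -> bytes:
    # struct pads short values with zero bytes and truncates long ones.
    return FIXED.pack(name.encode("ascii"), postcode.encode("ascii"))

def pack_variable(name: str, postcode: str) -> bytes:
    # Fields separated by '|', record terminated by a newline.
    return f"{name}|{postcode}\n".encode("ascii")

records = [("Ted Postlethwaite", "N6 3PY"), ("Shah", "N6 3PY")]

fixed_file = b"".join(pack_fixed(n, p) for n, p in records)
variable_file = b"".join(pack_variable(n, p) for n, p in records)

# Fixed length: record i starts at i * FIXED.size, so it can be read directly.
second = FIXED.unpack_from(fixed_file, 1 * FIXED.size)
second_name = second[0].rstrip(b"\x00").decode("ascii")

# Variable length: the file is smaller, but finding record i means
# reading through every earlier record to locate its terminator.
print(len(fixed_file), len(variable_file), second_name)
```

The zero-byte padding in the fixed layout is the wasted space mentioned above, and a name longer than 20 bytes would be cut off, which is the inflexibility.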

4 Key fields
A key field is a piece of data that uniquely identifies a record.
Fields like surname or date of birth are not sufficient because they are not necessarily unique.
Most systems generate a number (often a sequential one) to serve as a key.
Advanced database systems will detect immediately if a key is not unique.
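As an illustration of both points, here is a small sketch of a store that generates its own keys and rejects a duplicate key on insertion. The class `RecordStore` and its method names are hypothetical, and the counter-based key generator is just one possible choice, not how any particular database does it.

```python
import itertools

class RecordStore:
    """Toy in-memory table that enforces unique keys."""

    def __init__(self):
        self._records = {}
        self._counter = itertools.count(1)  # source of generated keys

    def insert(self, key, record):
        # Reject the record immediately if the key is already in use.
        if key in self._records:
            raise KeyError(f"duplicate key: {key!r}")
        self._records[key] = record

    def insert_with_generated_key(self, record):
        # Generate keys until an unused one is found, then store the record.
        key = next(self._counter)
        while key in self._records:
            key = next(self._counter)
        self.insert(key, record)
        return key

store = RecordStore()
first = store.insert_with_generated_key({"surname": "Harris"})
second = store.insert_with_generated_key({"surname": "Harris"})  # same surname, new key

try:
    store.insert(first, {"surname": "Shah"})
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True
```

Two people called Harris can coexist because the surname is not the key; reusing an existing key is refused.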

5 Unordered Sequential Access
Records not in order.
Search starts at the beginning of the file.
Records read in the order that they are stored.
Example file: Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Hansen, Schmidt, Arnold, O’Hanlon, Zachary, Christopher

6 Ordered Sequential Access
Records are in some order.
Search starts at the beginning of the file.
Example file: Arnold, Christopher, Hansen, Harris, Heim, Hussein, O’Hanlon, Oworu, Schmidt, Shah, Tooey, Wickham, Zachary

7 Ordered vs Unordered?
Imagine you have two files, each of 100,000 surnames, many of which are repeated. One file is unordered, the other is alphabetical. You want to find all occurrences of the name “Smith”.

8 Five-Minute Task
How much of the unordered file do you have to search?
How much of the ordered file do you have to search?
Geek question: In terms of search time, how much more efficient are ordered sequential files than unordered sequential files on average in this type of search?

9 Five-Minute Task Feedback: Unordered File
With an unordered file you would always have to search all of the file. Why? Because you have no way of knowing that the last record of the file isn’t a “Smith”!

10 Five-Minute Task Feedback: Ordered File
You would only have to search until the end of the “Smiths”. Why? Because the file is in order, you know there are no more “Smiths” to look for.
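The two answers can be sketched in code. This is an illustrative example on a seven-name file, not the 100,000-name file from the task; the function names are made up for the sketch. Each function returns the positions of the matches and the number of records it had to read.

```python
def find_all_unordered(names, target):
    """Search a file in no particular order: every record must be read."""
    matches, records_read = [], 0
    for position, name in enumerate(names):
        records_read += 1
        if name == target:
            matches.append(position)
    return matches, records_read

def find_all_ordered(names, target):
    """Search an alphabetical file: stop at the first name after the target."""
    matches, records_read = [], 0
    for position, name in enumerate(names):
        records_read += 1
        if name == target:
            matches.append(position)
        elif name > target:
            break  # past the block of matches, so there can be no more
    return matches, records_read

unordered = ["Harris", "Smith", "Oworu", "Tooey", "Smith", "Arnold", "Zachary"]
ordered = sorted(unordered)

print(find_all_unordered(unordered, "Smith"))  # ([1, 4], 7)
print(find_all_ordered(ordered, "Smith"))      # ([3, 4], 6)
```

Note that the ordered search reads one record beyond the last “Smith” (here “Tooey”), because that is how it learns the block has ended.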

11 Five-Minute Task Feedback: Geek Question
Ordered files would be approximately twice as efficient on average. Why?
Of 100,000 sequenced records, you would sometimes need to search 1 record, sometimes 2, sometimes 3… sometimes 99,999, sometimes 100,000.
The average of 1 … 100,000 is (100,001 / 2) ≈ 50,000.
So on average you would have to search 50,000 records to find the block you want.
With an unordered file we know we always have to search all 100,000 records. That’s twice as many as the ordered file, so an ordered file is twice as efficient for this type of search!
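The arithmetic can be checked directly. This sketch assumes, as the feedback does, that the block you want is equally likely to end at any position from 1 to 100,000; the variable names are illustrative.

```python
N = 100_000

# Ordered file: the search stops after 1, 2, 3, ... or N records,
# each taken to be equally likely, so the average is (N + 1) / 2.
average_ordered = sum(range(1, N + 1)) / N

# Unordered file: every record is always read.
always_unordered = N

ratio = always_unordered / average_ordered
print(average_ordered, ratio)
```

The average comes out at 50,000.5 records, so the ratio is very slightly under 2, which is why the answer says “approximately twice”.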

12 Direct Access
Also called random access.
Direct access to a single record with no need to search.
Hashing algorithm creates disk address from record key.
A “collision” occurs when the hashing algorithm tries to put two things in the same place!
[Diagram: the key “Hansen?” goes through the Hashing Algorithm to locate its record in the file: Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Hansen, Schmidt, Arnold, O’Hanlon, Zachary, Christopher]

13 Dealing with Collisions
Hashing algorithm gives the address at which to store the record.
Overflow to next address if target address is full.
Performance deteriorates over time as more overflows happen.
[Diagram: the Hashing Algorithm sends “Hansen” to a ‘bucket’ that is FULL, so the record overflows to the next address. File: Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Schmidt, Arnold, O’Hanlon, Zachary, Christopher]
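A minimal sketch of overflow handling, under several assumptions that are not from the slides: each address holds exactly one record, keys are integers, the hashing algorithm is the remainder after dividing by the number of addresses, and overflow wraps round from the last address to the first. The class `HashFile` is hypothetical.

```python
class HashFile:
    """Toy direct-access file: one record per address, overflow to the next."""

    def __init__(self, size):
        self.slots = [None] * size

    def _address(self, key):
        # Assumed hashing algorithm: remainder after dividing by the file size.
        return key % len(self.slots)

    def store(self, key, record):
        """Store the record and return the address actually used."""
        start = self._address(key)
        for step in range(len(self.slots)):
            address = (start + step) % len(self.slots)
            slot = self.slots[address]
            if slot is None or slot[0] == key:
                self.slots[address] = (key, record)
                return address
        raise RuntimeError("file is full")

    def fetch(self, key):
        """Return (record, number_of_addresses_examined)."""
        start = self._address(key)
        for step in range(len(self.slots)):
            slot = self.slots[(start + step) % len(self.slots)]
            if slot is None:
                break  # an empty address means the key was never stored
            if slot[0] == key:
                return slot[1], step + 1
        raise KeyError(key)

f = HashFile(11)
home = f.store(12, "Hansen")    # 12 hashes to address 1
overflow = f.store(23, "Heim")  # 23 also hashes to 1, so it overflows to 2
print(home, overflow, f.fetch(12), f.fetch(23))
```

Fetching key 23 has to examine two addresses instead of one. As more records overflow, more fetches need these extra steps, which is the deterioration described above.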

14 How Hashing Works
There are lots of ways, but we’ll look at N Mod M.
Modulo (Mod) is a mathematical operation like +, −, ×, ÷, which gives the remainder when one number is divided by another.
Question: What is the highest possible remainder when a number is divided by M?

15 N Mod M
We have 11 buckets, so M = 11.
Each record key (1234 to 1244) is passed through Mod 11 to choose its bucket:
Bucket 0: 1243
Bucket 1: 1244
Bucket 2: 1234
Bucket 3: 1235
Bucket 4: 1236
Bucket 5: 1237
Bucket 6: 1238
Bucket 7: 1239
Bucket 8: 1240
Bucket 9: 1241
Bucket 10: 1242
1234 Mod 11 = 2, so the record with the key 1234 goes into bucket number 2.
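The bucket assignment on this slide can be reproduced in a few lines. The variable names are illustrative; `%` is Python's remainder operator.

```python
M = 11                    # number of buckets
keys = range(1234, 1245)  # record keys 1234 .. 1244

# Start with M empty buckets, numbered 0 .. M - 1.
buckets = {number: [] for number in range(M)}

for key in keys:
    buckets[key % M].append(key)  # N Mod M picks the bucket

for number, contents in buckets.items():
    print(number, contents)
```

Key 1234 lands in bucket 2, and 1243 wraps round to bucket 0 because it is an exact multiple of 11.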

16 Indexed Sequential
Supports direct access and sequential access.
A table with more than one column can have more than one index.
A very large index may have its own index (multi-level index).
[Diagram labels: “Wickham?”, Hashing Algorithm, INDEX. File: Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Hansen, Schmidt, Arnold, O’Hanlon, Zachary, Christopher]

17 Another method of direct access
Let's say we use fixed-length records.
Each record is 128 bytes long.
I know the memory address of the first record.
How do I get to the nth record?
I add an offset to the address of the first record: 128n if the first record counts as record 0, or 128(n − 1) if it counts as record 1.
Example file: Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Hansen, Schmidt, Arnold, O’Hanlon, Zachary, Christopher
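A short sketch of the offset calculation, using an in-memory byte stream in place of a real file. It counts the first record as record 1, so the offset is 128 × (n − 1); if the first record is counted as record 0, the offset is simply 128n. The function name and the space padding are assumptions made for the example.

```python
import io

RECORD_SIZE = 128  # bytes per record, as on the slide

def read_record(f, n, first_record_address=0):
    """Read the nth record, counting the first record as n = 1."""
    f.seek(first_record_address + RECORD_SIZE * (n - 1))
    return f.read(RECORD_SIZE)

# Build a small file of names, each padded with spaces to 128 bytes.
names = ["Harris", "Shah", "Oworu", "Tooey"]
data = b"".join(name.encode("ascii").ljust(RECORD_SIZE, b" ") for name in names)
f = io.BytesIO(data)

third = read_record(f, 3).rstrip().decode("ascii")
print(third)  # Oworu
```

No earlier record is read on the way: the seek jumps straight to byte 256.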

18 The Power of Indexes
From the phone book
Contestant number one find… Ibay, Aileen M
Contestant number two find…

19 Partially indexed files
The file is ordered.
The index contains an entry for every nth record.
To access a record, you sequentially search the index for the key just before the key you are seeking.
Then follow the index pointer to the memory address in the middle of the file.
Then search the file sequentially for the correct record.
Index: Arnold, Heim, Schmidt
File: Arnold, Christopher, Hansen, Harris, Heim, Hussein, O’Hanlon, Oworu, Schmidt, Shah, Tooey, Wickham, Zachary
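The steps above can be sketched as follows, using the slide's file and its three index entries (every 4th record). List positions stand in for memory addresses, the function name is hypothetical, and “the key just before” is taken to mean the last index key that is not greater than the one sought. O'Hanlon is written with a plain ASCII apostrophe so that string comparison keeps the file in order.

```python
file_records = ["Arnold", "Christopher", "Hansen", "Harris", "Heim", "Hussein",
                "O'Hanlon", "Oworu", "Schmidt", "Shah", "Tooey", "Wickham",
                "Zachary"]

# Partial index: (key, position in file) for every 4th record.
index = [("Arnold", 0), ("Heim", 4), ("Schmidt", 8)]

def lookup(key):
    """Return (position or None, index entries read, file records read)."""
    start, index_reads = None, 0
    # Step 1: search the index sequentially for the last key <= the target.
    for index_key, position in index:
        index_reads += 1
        if index_key > key:
            break
        start = position
    if start is None:
        return None, index_reads, 0  # target sorts before the first record
    # Steps 2 and 3: jump into the file and search sequentially from there.
    file_reads = 0
    for position in range(start, len(file_records)):
        file_reads += 1
        if file_records[position] == key:
            return position, index_reads, file_reads
        if file_records[position] > key:
            break  # passed the place where the key would have been
    return None, index_reads, file_reads

print(lookup("Wickham"))  # (11, 3, 4)
```

Finding Wickham takes 3 index reads plus 4 file reads, compared with 12 reads for a plain sequential search of the file.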

20 Fully indexed files
The file is unordered.
To access a record, you search the index sequentially.
This is followed by direct access to the record.
File (with its INDEX): Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Hansen, Schmidt, Arnold, O’Hanlon, Zachary, Christopher
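A sketch of a fully indexed file, using the unordered file from the slide. Two things are assumed for illustration: the index is kept in key order (the slide does not say), and list positions stand in for record addresses. The function name is hypothetical, and O'Hanlon uses a plain ASCII apostrophe.

```python
# The data file stays in whatever order the records were added.
file_records = ["Harris", "Shah", "Oworu", "Tooey", "Wickham", "Hussein",
                "Heim", "Hansen", "Schmidt", "Arnold", "O'Hanlon", "Zachary",
                "Christopher"]

# Full index: one (key, position) entry per record, here sorted by key.
index = sorted((name, position) for position, name in enumerate(file_records))

def lookup(key):
    """Search the index sequentially, then access the record directly."""
    for index_key, position in index:
        if index_key == key:
            return position, file_records[position]  # direct access
        if index_key > key:
            break  # the index is ordered, so the key is not present
    return None

print(lookup("Hansen"))  # (7, 'Hansen')
```

Only the index is searched; the data file itself is touched once, at the position the index entry points to.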

21 Five-Minute Task
In groups, think carefully about exactly what you do when you look up someone’s name in the phone book.
Which access methods do you use?
What real-world objects correspond to:
The key
The record

22 Five-Minute Task Feedback
You open the book somewhere in the middle: Direct Access
You skip a few pages till you find the right one and then run your finger down the list to find the right name: Sequential Access
These two add up to: Indexed Sequential Access
The key must be the surname.
The record could either be the address and phone number or, if you consider the address as a pointer, the house itself!

23 Review
So far we have looked at:
Sequential Access (unordered)
Sequential Access (ordered)
Direct Access (using hashing)
Direct Access (using fixed-length records)
Indexed Sequential Access

24 Five-minute task: Fill in the table
Columns: Access starts; Can access records in sequence?; Example
Rows: Sequential (unordered); Sequential (ordered); Direct Access (hashing); Direct Access (fixed-length); Indexed Sequential

25 Five-minute task feedback: File Organisation Summary
Columns: Access starts; Can access records in sequence?; Example
Sequential (unordered): access starts at the beginning of the file; example: HTTP log
Sequential (ordered): example: Video tape
Direct Access (hashing): access starts anywhere; example: Booking system
Direct Access (fixed length): can access records in sequence if the file is ordered; example: Account transactions
Indexed Sequential: example: Relational database table

26 Past Exam Question Higher Level Paper 2 May 2007

27 Past Exam Question Higher Level Paper 2 November 2008

28 Past Exam Question Higher Level Paper 2 May 2009

