File and Database Concepts

Records
“A record is an organised collection of information about an object or item, with a unique identifier or key.”
Example record: Ted Postlethwaite, 10 West Hill, Highgate, London N6 3PY
Question: fixed or variable length records, what are the pros and cons?

Pros and Cons: Fixed and Variable Length Records
Fixed-length records: easy to implement; you can predict where each record starts and ends, so direct access is supported. However, they waste space and can be inflexible, because the space allocated may become too small.
Variable-length records: use disk space economically and are flexible. However, they are difficult to implement and allow sequential access only.
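
To make the trade-off concrete, here is a minimal Python sketch using the example record above. The field widths (20 bytes for the name, 40 for the address) and the "|" and newline delimiters in the variable-length version are illustrative assumptions, not part of the slides.

```python
import struct

# Assumed layout: a 20-byte name field followed by a 40-byte address field.
FIXED_FORMAT = "20s40s"
RECORD_SIZE = struct.calcsize(FIXED_FORMAT)  # 60 bytes for every record

def pack_fixed(name: str, address: str) -> bytes:
    """Fixed-length record: struct pads short fields with NUL bytes
    and silently truncates fields that are too long."""
    return struct.pack(FIXED_FORMAT, name.encode(), address.encode())

def unpack_fixed(data: bytes) -> tuple:
    """Split a fixed-length record back into its two fields."""
    name, address = struct.unpack(FIXED_FORMAT, data)
    return name.rstrip(b"\0").decode(), address.rstrip(b"\0").decode()

def pack_variable(name: str, address: str) -> bytes:
    """Variable-length record: fields separated by '|' and the record ended
    by a newline, so it uses only as many bytes as its data needs."""
    return f"{name}|{address}\n".encode()

name, address = "Ted Postlethwaite", "10 West Hill Highgate London N6 3PY"
print(len(pack_fixed(name, address)))     # 60
print(len(pack_variable(name, address)))  # 54
```

Every fixed-length record occupies 60 bytes, which is what makes it possible to calculate where a record starts; the variable-length version is shorter here, but its records can only be found by reading from the beginning. A name longer than 20 bytes would be cut off in the fixed-length version, which is the inflexibility mentioned above.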

Key Fields
A key field is a piece of data that uniquely identifies a record. Fields like surname or date of birth are not sufficient, because they are not necessarily unique. Many systems therefore generate a unique number (for example, a customer or account number) to serve as the key. Database systems that enforce key constraints detect immediately if a key is not unique and reject the record.
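
As an illustration only, the sketch below shows a toy store that hands out sequential numbers as keys and rejects a duplicate key. The class `RecordStore` and its methods are hypothetical names, not a real database API.

```python
class RecordStore:
    """Toy record store (hypothetical) that generates sequential keys and
    refuses to accept a record whose key is already in use."""

    def __init__(self):
        self._records = {}  # key -> record
        self._next_key = 1

    def new_key(self) -> int:
        """Hand out the next unused number as a key."""
        key = self._next_key
        self._next_key += 1
        return key

    def insert(self, key, record) -> None:
        """Store a record, rejecting it if the key is not unique."""
        if key in self._records:
            raise ValueError(f"key {key} is not unique")
        self._records[key] = record

    def get(self, key):
        """Return the record with this key, or None."""
        return self._records.get(key)

store = RecordStore()
k = store.new_key()
store.insert(k, ("Ted Postlethwaite", "10 West Hill Highgate London N6 3PY"))
```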

Unordered Sequential Access
Records are not in any order. A search starts at the beginning of the file, and records are read in the order in which they are stored.
Example file: Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Hansen, Schmidt, Arnold, O’Hanlon, Zachary, Christopher
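
A minimal Python sketch of searching the example file for every occurrence of a key; the function name and the choice to return the matches together with a count of records read are illustrative.

```python
def sequential_search(records, target):
    """Search an unordered file from the beginning for every record whose key
    equals target. Returns (positions found, number of records read)."""
    positions = []
    reads = 0
    for position, key in enumerate(records):  # records in stored order
        reads += 1
        if key == target:
            positions.append(position)
    # No early exit: the very last record might still match.
    return positions, reads

UNORDERED = ["Harris", "Shah", "Oworu", "Tooey", "Wickham", "Hussein", "Heim",
             "Hansen", "Schmidt", "Arnold", "O'Hanlon", "Zachary", "Christopher"]
print(sequential_search(UNORDERED, "Heim"))  # ([6], 13)
```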

Ordered Sequential Access
Records are stored in key order (here, alphabetical by surname). A search starts at the beginning of the file.
Example file: Arnold, Christopher, Hansen, Harris, Heim, Hussein, O’Hanlon, Oworu, Schmidt, Shah, Tooey, Wickham, Zachary
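
Because the file is in order, a sequential search can stop as soon as it reads a key that comes after the one it is looking for. A minimal Python sketch, with an illustrative function name:

```python
def ordered_search(records, target):
    """Sequential search of a file sorted by key. Stops as soon as a key
    greater than target is read, because no later record can match.
    Returns (positions found, number of records read)."""
    positions = []
    reads = 0
    for position, key in enumerate(records):
        reads += 1
        if key == target:
            positions.append(position)
        elif key > target:
            break  # past the place where target would be
    return positions, reads

ORDERED = ["Arnold", "Christopher", "Hansen", "Harris", "Heim", "Hussein",
           "O'Hanlon", "Oworu", "Schmidt", "Shah", "Tooey", "Wickham", "Zachary"]
print(ordered_search(ORDERED, "Heim"))  # ([4], 6)
```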

Ordered vs Unordered?
Imagine you have two files, each containing 100,000 surnames, many of which are repeated. One file is unordered; the other is in alphabetical order. You want to find all occurrences of the name “Smith”.

Five-Minute Task
How much of the unordered file do you have to search?
How much of the ordered file do you have to search?
Geek question: in terms of search time, how much more efficient are ordered sequential files than unordered sequential files, on average, for this type of search?

Five-Minute Task Feedback: Unordered File
With an unordered file you always have to search the whole file. Why? Because you have no way of knowing that the last record in the file isn’t a “Smith”!

Five-Minute Task Feedback: Ordered File
You only have to search until the end of the block of “Smiths”. Why? Because the file is in order, once you read a name that comes after “Smith” you know there are no more “Smiths” to look for.

Five-Minute Task Feedback: Geek Question
Ordered files are approximately twice as efficient on average. Why? In an ordered file of 100,000 records, depending on where the “Smiths” fall, you would sometimes need to read 1 record, sometimes 2, sometimes 3… sometimes 99,999, sometimes 100,000. If each of these is equally likely, the average of 1 … 100,000 is 100,001 / 2 ≈ 50,000. So on average you would read about 50,000 records to get past the block you want. With an unordered file you always have to read all 100,000 records. That is twice as many as for the ordered file, so the ordered file is about twice as efficient for this type of search!
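
The arithmetic can be checked with a few lines of Python. This sketch assumes, as the answer does, that the ordered search is equally likely to end after any number of reads from 1 to N; the function names are illustrative.

```python
def average_reads_ordered(n: int) -> float:
    """Mean records read in an ordered file of n records, assuming the search
    is equally likely to end after 1, 2, ..., n reads."""
    return sum(range(1, n + 1)) / n  # equals (n + 1) / 2

def reads_unordered(n: int) -> int:
    """An unordered file must always be read in full."""
    return n

N = 100_000
ordered = average_reads_ordered(N)
print(ordered)                       # 50000.5
print(reads_unordered(N) / ordered)  # just under 2
```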

Direct Access
Also called random access. A single record is accessed directly, with no need to search. A hashing algorithm calculates the disk address from the record key. In the diagram, the hashing algorithm works out where the record “Hansen” should go.
A “collision” occurs when the hashing algorithm tries to put two records in the same place!
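
The slide does not say which hashing algorithm is used, so the Python sketch below assumes a simple one: add up the character codes of the key and take the remainder modulo the number of addresses. `hash_address`, `store` and `fetch` are hypothetical names.

```python
NUM_ADDRESSES = 11  # assumed number of storage locations

def hash_address(key: str, m: int = NUM_ADDRESSES) -> int:
    """Illustrative hashing algorithm: add up the character codes of the key,
    then take the remainder mod m to get an address from 0 to m - 1."""
    return sum(ord(ch) for ch in key) % m

def store(table, key, record):
    """Put the record at the address calculated from its key."""
    address = hash_address(key, len(table))
    if table[address] is not None:
        raise ValueError(f"collision at address {address}")
    table[address] = (key, record)

def fetch(table, key):
    """Direct access: recompute the address and look only there."""
    entry = table[hash_address(key, len(table))]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

table = [None] * NUM_ADDRESSES
store(table, "Hansen", "Hansen's record")
print(fetch(table, "Hansen"))  # Hansen's record
```

This version simply reports a collision as an error.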

Dealing with Collisions
The hashing algorithm gives the address (‘bucket’) at which to store the record. If the target address is full, the record overflows to the next address. In the diagram, Hansen’s bucket is full, so Hansen is stored in the next available location (overflow). Performance deteriorates over time as more overflows happen.
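
“Overflow to the next address” is the technique usually called linear probing. The Python sketch below assumes one record per bucket, integer keys, and N Mod M hashing (passing Python’s built-in `int` as the hash function, so the home bucket is simply the key mod the table size); `insert` and `lookup` are illustrative names.

```python
def insert(table, key, record, hash_fn):
    """Store a record at its hashed address; if that bucket is full, overflow
    to the next address, wrapping round to the start of the table.
    Returns (address used, number of overflow steps)."""
    m = len(table)
    home = hash_fn(key) % m
    for step in range(m):
        address = (home + step) % m
        if table[address] is None:
            table[address] = (key, record)
            return address, step
    raise OverflowError("every bucket is full")

def lookup(table, key, hash_fn):
    """Probe from the home address until the key or an empty bucket is found."""
    m = len(table)
    home = hash_fn(key) % m
    for step in range(m):
        entry = table[(home + step) % m]
        if entry is None:
            return None
        if entry[0] == key:
            return entry[1]
    return None

table = [None] * 11
print(insert(table, 1234, "first", int))   # (2, 0)
print(insert(table, 1245, "second", int))  # (3, 1): bucket 2 full, overflow
print(lookup(table, 1245, int))            # second
```

Each overflow adds an extra step to every later lookup of that key, which is why performance deteriorates as the table fills.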

How Hashing Works
There are lots of ways, but we’ll look at N Mod M. Modulo (Mod) is a mathematical operation, like +, −, × and ÷, which gives the remainder when one number is divided by another.
Question: what is the highest possible remainder when a number is divided by M?

N Mod M
We have 11 buckets, so M = 11. Each record key is divided by 11, and the remainder is the number of the bucket the record goes into.
Record key | Key Mod 11 (bucket)
1234 | 2
1235 | 3
1236 | 4
1237 | 5
1238 | 6
1239 | 7
1240 | 8
1241 | 9
1242 | 10
1243 | 0
1244 | 1
For example, 1234 Mod 11 = 2, so the record with the key 1234 goes into bucket number 2.
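
The bucket table can be reproduced with Python’s `%` (remainder) operator. A minimal sketch; the function name `bucket` is illustrative.

```python
M = 11  # eleven buckets, so M = 11

def bucket(key: int, m: int = M) -> int:
    """N Mod M: the remainder, and so the bucket number, is always 0 .. m - 1."""
    return key % m

for key in range(1234, 1245):
    print(key, "->", bucket(key))
```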

Indexed Sequential
Supports both direct access and sequential access. In the diagram, an index is used to find “Wickham” without searching the whole file. A table with more than one column can have more than one index. A very large index may have its own index (a multi-level index).
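
A minimal Python sketch of one index giving both kinds of access. It assumes the index is a list of (key, position) pairs kept in key order and, as an illustrative choice not stated on the slide, uses binary search (the standard `bisect` module) to look a key up.

```python
import bisect

# The slide's example file, in stored (unordered) order.
FILE = ["Harris", "Shah", "Oworu", "Tooey", "Wickham", "Hussein", "Heim",
        "Hansen", "Schmidt", "Arnold", "O'Hanlon", "Zachary", "Christopher"]

# Index: (key, position) pairs sorted by key.
INDEX = sorted((key, pos) for pos, key in enumerate(FILE))
INDEX_KEYS = [key for key, _ in INDEX]

def direct_lookup(key):
    """Direct access: binary-search the index, then go straight to the record.
    Returns the record's position in FILE, or None if the key is absent."""
    i = bisect.bisect_left(INDEX_KEYS, key)
    if i < len(INDEX_KEYS) and INDEX_KEYS[i] == key:
        return INDEX[i][1]
    return None

def read_in_key_order():
    """Sequential access: visit the records in the order given by the index."""
    return [FILE[pos] for _, pos in INDEX]

print(direct_lookup("Wickham"))  # 4
print(read_in_key_order()[:3])   # ['Arnold', 'Christopher', 'Hansen']
```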

Another Method of Direct Access
Let’s say we use fixed-length records, each 128 bytes long, and we know the address of the first record. How do we get to the nth record?
Counting records from 0, the nth record starts at an offset of 128 × n bytes from the first record, so its address is (address of first record) + 128n. No searching is needed.
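
A minimal Python sketch of the offset calculation. An in-memory `io.BytesIO` object stands in for a disk file, and the records (surnames padded with spaces to 128 bytes) are illustrative.

```python
import io

RECORD_SIZE = 128  # bytes per record, as on the slide

def read_record(f, n: int) -> bytes:
    """Jump straight to record n (counting from 0) at byte offset 128 * n."""
    f.seek(RECORD_SIZE * n)
    return f.read(RECORD_SIZE)

# Build an in-memory "file" of fixed-length records for demonstration.
names = ["Harris", "Shah", "Oworu", "Tooey"]
data = b"".join(name.encode().ljust(RECORD_SIZE, b" ") for name in names)
f = io.BytesIO(data)
print(read_record(f, 2).rstrip())  # b'Oworu'
```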

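The Power of Indexes
Using the phone book: Contestant number one, find… Ibay, Aileen M. Contestant number two, find… 294 1639.
The phone book is ordered by surname, so a name can be found quickly; a number can only be found by searching the book sequentially.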

Partially Indexed Files
The file is ordered, and the index contains an entry for every nth record. To access a record:
1. Search the index sequentially for the last key that comes before (or equals) the key you are seeking.
2. Follow that entry’s pointer to the corresponding position in the file.
3. Search the file sequentially from there for the correct record.
Example index: Arnold, Heim, Schmidt
Example file: Arnold, Christopher, Hansen, Harris, Heim, Hussein, O’Hanlon, Oworu, Schmidt, Shah, Tooey, Wickham, Zachary
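
The three steps can be sketched in Python. This assumes one index entry for every 4th record, which gives Arnold, Heim and Schmidt as in the example (plus Zachary, the 13th record); `partial_index_search` is a hypothetical name.

```python
ORDERED = ["Arnold", "Christopher", "Hansen", "Harris", "Heim", "Hussein",
           "O'Hanlon", "Oworu", "Schmidt", "Shah", "Tooey", "Wickham", "Zachary"]
STEP = 4  # assumed spacing: one index entry for every 4th record

# Partial index: (key, position) for positions 0, 4, 8 and 12.
INDEX = [(ORDERED[i], i) for i in range(0, len(ORDERED), STEP)]

def partial_index_search(key):
    """Return the position of key in ORDERED, or None if it is absent."""
    # 1. Scan the index for the last entry whose key is <= the target.
    start = None
    for entry_key, pos in INDEX:
        if entry_key > key:
            break
        start = pos
    if start is None:
        return None  # target sorts before the first record in the file
    # 2. Jump to that position, then 3. search the file sequentially.
    for pos in range(start, len(ORDERED)):
        if ORDERED[pos] == key:
            return pos
        if ORDERED[pos] > key:
            break
    return None

print(partial_index_search("Oworu"))  # 7
```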

Fully Indexed Files
The file is unordered, and the index contains an entry for every record. To access a record, you search the index sequentially, then follow the index pointer for direct access to the record.
Example file: Harris, Shah, Oworu, Tooey, Wickham, Hussein, Heim, Hansen, Schmidt, Arnold, O’Hanlon, Zachary, Christopher
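
A minimal Python sketch of a full index over the unordered example file. It assumes the index holds one (key, position) entry per record kept in key order, which lets the sequential scan of the index stop early; the slide itself only requires that every record has an entry. `full_index_search` is a hypothetical name.

```python
FILE = ["Harris", "Shah", "Oworu", "Tooey", "Wickham", "Hussein", "Heim",
        "Hansen", "Schmidt", "Arnold", "O'Hanlon", "Zachary", "Christopher"]

# Full index: one (key, position) entry for every record, kept in key order.
INDEX = sorted((key, pos) for pos, key in enumerate(FILE))

def full_index_search(key):
    """Search the index sequentially, then use the stored position for
    direct access to the record. Returns the position or None."""
    for entry_key, pos in INDEX:
        if entry_key == key:
            return pos  # direct access: FILE[pos]
        if entry_key > key:
            break       # index is in key order, so the key is absent
    return None

pos = full_index_search("Hansen")
print(pos, FILE[pos])  # 7 Hansen
```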

Five-Minute Task
In groups, think carefully about exactly what you do when you look up someone’s name in the phone book. Which access methods do you use? What real-world objects correspond to the key and to the record?

Five-Minute Task Feedback
You open the book somewhere in the middle: direct access.
You skip a few pages until you find the right one, then run your finger down the list to find the right name: sequential access.
Together these make indexed sequential access.
The key must be the surname. The record could be either the address and phone number or, if you consider the address as a pointer, the house itself!

Review
So far we have looked at:
Sequential access (unordered)
Sequential access (ordered)
Direct access (using hashing)
Direct access (using fixed-length records)
Indexed sequential access

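Five-Minute Task: Fill in the Table
For each file organisation, state where access starts, whether records can be accessed in sequence, and give an example.
File organisation | Access starts | Can access records in sequence? | Example
Sequential (unordered) | | |
Sequential (ordered) | | |
Direct access (hashing) | | |
Direct access (fixed length) | | |
Indexed sequential | | |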

Five-Minute Task Feedback: File Organisation Summary
File organisation | Access starts | Can access records in sequence? | Example
Sequential (unordered) | Beginning of file | Yes | HTTP log
Sequential (ordered) | Beginning of file | Yes | Video tape
Direct access (hashing) | Anywhere | No | Booking system
Direct access (fixed length) | Anywhere | Yes, if the file is ordered | Account transactions
Indexed sequential | Anywhere | Yes | Relational database table

Past Exam Question Higher Level Paper 2 May 2007

Past Exam Question Higher Level Paper 2 November 2008

Past Exam Question Higher Level Paper 2 May 2009