IDA / ADIT Databasteknik Databaser och bioinformatik Data structures and Indexing (I) Fang Wei-Kleiner.

Slides:



Advertisements
Similar presentations
Disk Storage, Basic File Structures, and Hashing
Advertisements

Databasteknik Databaser och bioinformatik Data structures and Indexing (II) Fang Wei-Kleiner.
Physical DataBase Design
Tutorial 8 CSI 2132 Database I. Exercise 1 Both disks and main memory support direct access to any desired location (page). On average, main memory accesses.
1 Lecture 8: Data structures for databases II Jose M. Peña
Advance Database System
IELM 230: File Storage and Indexes Agenda: - Physical storage of data in Relational DB’s - Indexes and other means to speed Data access - Defining indexes.
Predecessor to the Database: Traditional File Processing Records are stored in files. Programs are customized to process the data.
1 Storing Data: Disks and Files Yanlei Diao UMass Amherst Feb 15, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
Efficient Storage and Retrieval of Data
1 Storage Hierarchy Cache Main Memory Virtual Memory File System Tertiary Storage Programs DBMS Capacity & Cost Secondary Storage.
CS 728 Advanced Database Systems Chapter 16
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
File Organizations and Indexing Lecture 4 R&G Chapter 8 "If you don't find it in the index, look very carefully through the entire catalogue." -- Sears,
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
1 Disk Storage, Basic File Structures, and Hashing.
©Silberschatz, Korth and Sudarshan11.1Database System Concepts Chapter 11: Storage and File Structure Overview of Physical Storage Media Magnetic Disks.
DISK STORAGE INDEX STRUCTURES FOR FILES Lecture 12.
Layers of a DBMS Query optimization Execution engine Files and access methods Buffer management Disk space management Query Processor Query execution plan.
1 Rizwan Rehman Centre for Computer Studies Dibrugarh University.
1 Lecture 7: Data structures for databases I Jose M. Peña
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 5, 6 of Elmasri “ How index-learning turns no student.
Lecture 11: DMBS Internals
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How are data stored? –physical level –logical level.
Disk Storage Copyright © 2004 Pearson Education, Inc.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 17 Disk Storage, Basic File Structures, and Hashing.
Disk Storage, Basic File Structures, and Hashing
1 Physical Data Organization and Indexing Lecture 14.
Announcements Exam Friday Project: Steps –Due today.
1 Chapter 17 Disk Storage, Basic File Structures, and Hashing Chapter 18 Index Structures for Files.
ICS 321 Fall 2011 Overview of Storage & Indexing (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 11/9/20111Lipyeow.
Chapter 9 Disk Storage and Indexing Structures for Files Copyright © 2004 Pearson Education, Inc.
External data structures
1) Disk Storage, Basic File Structures, and Hashing This material is a modified version of the slides provided by Ramez Elmasri and Shamkant Navathe for.
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
Indexing.
External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary.
File Processing : Storage Media 2015, Spring Pusan National University Ki-Joune Li.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.
Chapter 13 Disk Storage, Basic File Structures, and Hashing. Copyright © 2004 Pearson Education, Inc.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Storage Structures. Memory Hierarchies Primary Storage –Registers –Cache memory –RAM Secondary Storage –Magnetic disks –Magnetic tape –CDROM (read-only.
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Database Systems Disk Management Concepts. WHY DO DISKS NEED MANAGING? logical information  physical representation bigger databases, larger records,
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Chapter 5 Record Storage and Primary File Organizations
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
1 CS122A: Introduction to Data Management Lecture #14: Indexing Instructor: Chen Li.
Decomposition Storage Model (DSM) An alternative way to store records on disk.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Lec 5 part1 Disk Storage, Basic File Structures, and Hashing.
File Organization Record Storage and Primary File Organization
Query Processing Part 1: Managing Disks 1.
Lecture 16: Data Storage Wednesday, November 6, 2006.
Database Management Systems (CS 564)
Oracle SQL*Loader
9/12/2018.
Lecture 11: DMBS Internals
Chapters 17 & 18 6e, 13 & 14 5e: Design/Storage/Index
Disk Storage, Basic File Structures, and Hashing
Disk Storage, Basic File Structures, and Buffer Management
Disk storage Index structures for files
File Storage and Indexing
Lecture 20: Indexes Monday, February 27, 2006.
Advance Database System
Presentation transcript:

IDA / ADIT Databasteknik Databaser och bioinformatik Data structures and Indexing (I) Fang Wei-Kleiner

IDA / ADIT Searching and Indexing There are 8 Million archives. Each archive consists of the personal information such as person number, name, address, age, … The task is given a person number, find the corresponding archive. 2

IDA / ADIT Searching and Indexing Constraint: the archives are stored in some place, you can go and fetch the archive (each time only one piece), but you can only read the information after you are back in your place. How much time do we need to find the archive given a person number n? 3

IDA / ADIT Sort the archives Sort the archives according to the person number What is the time cost for finding the archive given a person number n?  Binary search!

IDA / ADIT Sort the archives according to the person number What is the time cost for finding the archive given a person number n?  Binary search! Middle

IDA / ADIT Sort the archives according to the person number What is the time cost for finding the archive given a person number n?  Binary search! Middle

IDA / ADIT Sort the archives according to the person number What is the time cost for finding the archive given a person number n?  Binary search! Middle

IDA / ADIT Sort the archives according to the person number What is the time cost for finding the archive given a person number n?  Binary search! Each time half of the documents are removed… Middle

IDA / ADIT Sort the archives according to the person number What is the time cost for finding the archive given a person number n?  Binary search! Each time half of the documents are removed… Middle Case 1

IDA / ADIT Now consider that we can put every 100 sorted archives into one box.  order remains! The number of boxes: /100=80000 Time cost for retrieving the document with a given person number?

IDA / ADIT Now consider that we can put every 100 sorted archives into one box.  order remains! The number of boxes: /100=80000 Time cost for retrieving the document with a given person number? Case 2

IDA / ADIT The number of boxes: Index entry contains: the smallest person number of the box, box number How about building Index for the boxes?

IDA / ADIT The number of boxes: Index entry contains: the smallest person number of the box, box number (i.e. the address of the box) Note that the person number entries are sorted. We store the index entries in boxes! Assume we can store 1000 index entries in each box. We need 80000/1000=80 boxes to store indexes box box2 … box box2 …

IDA / ADIT So we have data boxes + 80 index boxes. Searching? We search first the index boxes. Binary search, since they are ordered box box2 … box box2 … Index box Data box

IDA / ADIT So we have data boxes + 80 index boxes. Searching? We search first the index boxes. Binary search, since they are ordered. +1 means as soon as we find the entry in the index, we need to fetch the document box box2 … box box2 … Index box Data box Case 3

IDA / ADIT So we have data boxes + 80 index boxes. We can build index over index (second level index). Each index box  one entry So we need 80 index entries  that fits in one box box box2 … box box2 … Index box Data box

IDA / ADIT So we have data boxes + 80 index boxes. We can build index over index (second level index). Each index box  one entry So we need 80 index entries  that fits in one box. So we have data boxes + 80 index boxes + 1 index-index box box box2 … box box2 … Index-Index box Data box index box Index box2 … index box Index box2 … Index box

IDA / ADIT So we have data boxes + 80 index boxes. We can build index over index (second level index). Each index box  one entry So we need 80 index entries  that fits in one box. So we have data boxes + 80 index boxes + 1 index-index box. Search? box box2 … box box2 … Index-Index box Data box index box Index box2 … index box Index box2 … Index box Case 4

IDA / ADIT The number of boxes: Now we search for a document given only the name value  no person number. Time cost?? We have to go through all the boxes  worst case 80000, average 80000/2=

IDA / ADIT The documents are not sorted after name  we need for each document one entry in the index  ! Index entry contains: name, box number But we sort the names in the index of course! How about building index for the names?

IDA / ADIT Assume again we can store 1000 index entries in each box. We need /1000=8000 boxes to store indexes for the names  100 times more than the person number. So now we have data boxes index boxes for names. Search according to name? 21 Aalan John box764 Aalan Michael box964 … Aalan John box764 Aalan Michael box964 …

IDA / ADIT Assume again we can store 1000 index entries in each box. We need /1000=8000 boxes to store indexes for the names  100 times more than the person number. So now we have data boxes index boxes for names. Search according to name? 22 Aalan John box764 Aalan Michael box964 … Aalan John box764 Aalan Michael box964 … Case 5

IDA / ADIT So now we have data boxes index boxes for names. We can build a second level index  names are sorted in the indexes, so each index gets one entry in the second level index. Since each box fits 1000 entries, we need 8000/1000=8 second level index boxes  data boxes index boxes + 8 second level index boxes. 23 Aalan John box764 Aalan Michael box964 … Aalan John box764 Aalan Michael box964 …

IDA / ADIT Since each box fits 1000 entries, we need 8000/1000=8 second level index boxes  data boxes index boxes + 8 second level index boxes. Third level index  one more box. So searching on the 3 level indexes takes 3+1 times of fetching the boxes (3 times index box and 1 time document). 24 Aalan John box764 Aalan Michael box964 … Aalan John box764 Aalan Michael box964 … Case 6

IDA / ADIT 25 Storage hierarchy CPU Cache memory Main memory Disk Tape Primary storage (fast, small, expensive, volatile, accessible by CPU) Secondary storage (slow, big, cheap, permanent, inaccessible by CPU) Important because it effects query efficiency. Databases

IDA / ADIT 26Disk sector

IDA / ADIT 27 Disk Formatting divides the hard-coded sectors into equal-sized blocks. Block is the unit of transfer of data between disk and main memory, e.g. o Read = copy block from disk to buffer in main memory. o Write = the opposite way. o R/w time = seek time + rotational delay + block transfer time. search track search block 1-2 msec msec.

IDA / ADIT Inside a harddisk

IDA / ADIT 29 Files and records Data stored in files. File is a sequence of records. Record is a set of field values. For instance, file = relation, record = entity, and field = attribute. Records are allocated to file blocks.

IDA / ADIT 30 Files and records Let us assume o B is the size in bytes of the block. o R is the size in bytes of the record. o r is the number of records in the file. Blocking factor: Blocks needed: What is the space wasted per block ?

IDA / ADIT Files and records Wasted space per block = B – bfr * R. Solution: Spanned records. block i record 1 record 2 wasted block i record 1 record 2 record 3 p block i+1 record 3 record 4 record 5 block i+1 record 3 record 5 wasted Unspanned Spanned

IDA / ADIT 32 From file blocks to disk blocks Contiguous allocation: cheap sequential access but expensive record addition. Why ? Linked allocation: expensive sequential access but cheap record addition. Why ? Linked clusters allocation. Indexed allocation.

IDA / ADIT 33 File organization Heap files. Sorted files. Hash files. File organization != access method, though related in terms of efficiency.