Database Management 7. course. Reminder Disk and RAM RAID Levels Disk space management Buffering Heap files Page formats Record formats.

Database Management 7. course

Reminder Disk and RAM RAID Levels Disk space management Buffering Heap files Page formats Record formats

Today System catalogue Hash-based indexing – Static – Extendible – Linear Time-cost of operations

System catalogue Special table Indexes – Type of the data structure and search key Tables – Name, filename, file structure (e.g. heap) – Attribute names, types – Integrity constraints – Index names Views – Name and definition Statistics, permissions, buffer size, etc.

Attr_Cat(attr_name, rel_name, type, position)

Hash-based indexing

Basic thought Index for every search key Hash function ( f ) between search key ( K ) and memory address ( A ): A = f ( K ) Ideally bijective: key is the address

Hashing Ideal for joining tables Just for equality check Many versions

Static hashing File~collection of buckets Bucket: one primary page and overflow pages File has N buckets: 0..N-1 Data entries – Data records with key k –

To identify the bucket hash function h is applied. In the bucket alternative search is applied Insertion h is used to find the proper bucket If there is not enough space, create an overflow chain to the bucket

In case of deletion h is used to locate tha bucket If the deleted was the last record, than page is removed Bucket number: h ( value ) mod N h ( value ) = ( a * value + b ) a and b are constants

Primary pages stored sequentially on the disk If the file grows a lot – Long overflow chain – Worsens the search – Create new file with more buckets! If the file shrinks a lot – A lot of space is wasted – Merge buckets!

Solution Ideally – 80% of the buckets is used – no overflow Periodically rehash the file – Takes time – Index cannot be used during rehashing Use dynamic hashing – Extendible Hashing – Linear Hashing

Extendible hashing

Like Static Hashing If a new entry is to be inserted to a full bucket – Double the number of buckets – Use directory of pointers (only the directory file has to be doubled) – Split only the overflowed bucket

Example

Insert 20*

Result

Insert 9*

Split bucket B

If bucket gets empty Merging buckets is also possible Not always done Decrease local depth

Storage Typical: 100 MB file 100 bytes/data entry Page size: 4KB 1,000,000 data entries 25,000 elements in the directory High chance that it will fit in memory  speed=speed of Static Hashing Otherwise twice slow Collision: entries with the same hash values (overflow pages are needed)

Linear Hashing Family of hash functions: h 0, h 1, … Each function's range is twice that of its predecessor E.g. h i (value) = h(value) mod (2 i N). d o :number of bits of N’s representation d i :d o +i Example: N=32, d o =5, h 1 is h mod (2*32), d 1 =6

Basic idea Rounds of splitting Number of actual round is Level Only h Level and are h Level+1 in use At any given point within a round we have – splitted buckets – buckets yet to be splitted – buckets created by splits in this round

Searching h Level is applied – If it leads to an unsplitted bucket, we look there – If it leads to a splitted bucket, we apply h Level+1 to decide in which bucket our data is Insertion may need overflow page If the overflow chain gets big then split is triggered

Example Level=0 round number N Level =N*2 Level number of buckets at the beginning of the L th round (N 0 =N)

If split is triggered, actual (Next) bucket is split and redistributed by h L+1 The new bucket gets to the end of the buckets Next is incremented by 1 Apply h Level and if the searched hash value is before Next then apply h Level+1 Continue: insert 43*, 37*, 29*, 22*, 66*, 34*, and 50*.

22, 66, 34

Deletion If the last bucket is empty, it can be removed Merging can be triggered for not empty buckets New round, merging: empty buckets are removed, Level is decremented Next=N Level /2-1

Comparison If Linear hashing is stored as Extendible Hashing function is similar to Extendible hashing (h i  h i+1 ~ doubling the directory) Extendible hashing: reduced number of splits and higher bucket occupancy

Linear hashing – Avoids directory structure – Primary pages are stored consecutively. Quicker equality selection. – Skewed distribution results in almost empty buckets

If directory structure for Linear hashing: one bucket=one directory Overflow pages are stored easily Overhead of a directory level Costly for large, uniformly distributed files Improves space occupancy

File organizations

Cost model To analyze the (time) cost of the DB operations No. of data pages: B Records/page: R Time of reading/writing: D=15ms (dominant) Time of record processing: C=100nanos Time of hashing: H=100nanos

Reduced calculation just for the I/O time 3 basic file organization: – Heap files – Sorted files – Hashed files

File operations Scan Search with equality selection (=) Search with range selection (>,<) Insert Delete

Heap files Scan the file: B ( D + RC ) Search with equality selection: – One result: in average B ( D + RC ) / 2 – Several results: search the entire file, B ( D + RC ) Search with range selection: B ( D + RC ) Insert: fetch the last page, add record, write back, 2D + C Delete: find record, delete, write page, cost of searching + C + D B data pages R records/page D time of reading/writing C time of record processing

Sorted files Scan: B ( D + RC ) Search with equality selection: – One result: D log 2 B + C log 2 R – Several results: D log 2 B + C log 2 R + no. of results Search with range selection: D log 2 B + C log 2 R + no. of results Insert: find place, insert, move the rest, write pages, search position + B ( D + RC ) in average Delete: find record, delete, move the rest, write pages, cost of searching + B ( D + RC ) B data pages R records/page D time of reading/writing C time of record processing

Hashed files No overflow pages 80% occupancy of buckets Scan the file: 1.25 * B ( D + RC ) Search with equality selection: in average H + D + RC/2 Search with range selection: 1.25 * B ( D + RC ) Insert: locate page, add record, write back, search + D + C Delete: find record, delete, write page, cost of searching + C + D B data pages R records/page D time of reading/writing C time of record processing H time of hashing

Summary Heap file: Storage +, modifying +, searching - Sorted file: Searching +, modifying - Hashed file: Modifying +, range selection --, storage - TypeScanEq. SearchRange search InsertDelete HeapBDBD/2BD2D2DSearch + D SortedBDDlog 2 BDlog 2 B + #matches Search + BD Hashed1.25BDD 2D2DSearch + D

Thank you for your attention! Book is uploaded: R. Ramakrishnan, J. Gehrke: Database Management Systems, 2nd edition

Database Management 7. course. Reminder Disk and RAM RAID Levels Disk space management Buffering Heap files Page formats Record formats.

Similar presentations

Presentation on theme: "Database Management 7. course. Reminder Disk and RAM RAID Levels Disk space management Buffering Heap files Page formats Record formats."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Database Management 7. course. Reminder Disk and RAM RAID Levels Disk space management Buffering Heap files Page formats Record formats.

Similar presentations

Presentation on theme: "Database Management 7. course. Reminder Disk and RAM RAID Levels Disk space management Buffering Heap files Page formats Record formats."— Presentation transcript:

Similar presentations

About project

Feedback