Published by Annis Patrick. Modified over 9 years ago.
1
Database Management, 7th course
2
Reminder
Disk and RAM
RAID levels
Disk space management
Buffering
Heap files
Page formats
Record formats
3
Today
System catalogue
Hash-based indexing
– Static
– Extendible
– Linear
Time cost of operations
4
System catalogue
Special table
Indexes
– Type of the data structure and search key
Tables
– Name, filename, file structure (e.g. heap)
– Attribute names, types
– Integrity constraints
– Index names
Views
– Name and definition
Statistics, permissions, buffer size, etc.
5
Attr_Cat(attr_name, rel_name, type, position)
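As a rough illustration, the catalogue table can describe itself: Attr_Cat holds one row per attribute of every table, including its own four attributes. The sketch below assumes the column types (string, integer) and the helper name attributes_of; neither is given on the slide.

```python
# Rows of Attr_Cat(attr_name, rel_name, type, position).
# The type values are assumptions made for this sketch.
attr_cat = [
    ("attr_name", "Attr_Cat", "string", 1),
    ("rel_name", "Attr_Cat", "string", 2),
    ("type", "Attr_Cat", "string", 3),
    ("position", "Attr_Cat", "integer", 4),
]


def attributes_of(rel_name):
    """Return the attribute names of a relation, ordered by position."""
    rows = [row for row in attr_cat if row[1] == rel_name]
    return [row[0] for row in sorted(rows, key=lambda row: row[3])]


print(attributes_of("Attr_Cat"))
```

A query processor would look up a table's schema this way before touching the table's own file.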
6
Hash-based indexing
7
Basic thought
An index for every search key
A hash function f maps the search key K to a memory address A: A = f(K)
Ideally f is bijective, so the key itself determines the address
8
Hashing
Ideal for joining tables
Supports only equality checks, not range searches
Many versions exist
9
Static hashing
File ~ collection of buckets
Bucket: one primary page and possibly overflow pages
File has N buckets: 0..N-1
Data entries
– Data records with key k
–
10
To identify the bucket, the hash function h is applied
Within the bucket, a second search is applied to find the entry
Insertion
h is used to find the proper bucket
If there is not enough space, an overflow page is chained to the bucket
11
Deletion
h is used to locate the bucket
If the deleted record was the last one on an overflow page, that page is removed
Bucket number: h(value) mod N
h(value) = a * value + b, where a and b are constants
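The search, insertion and deletion steps of static hashing can be sketched as follows. This is a minimal in-memory model, not a disk-based implementation: the class name StaticHashFile, the page capacity of 2 and the constants a = 3, b = 1 are assumptions chosen for the example.

```python
class StaticHashFile:
    """Static hashing: N fixed buckets, each a primary page plus overflow pages."""

    def __init__(self, n_buckets=4, page_capacity=2, a=3, b=1):
        self.n = n_buckets
        self.page_capacity = page_capacity
        self.a, self.b = a, b
        # Each bucket is a list of pages; page 0 is the primary page.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def bucket_no(self, value):
        # Bucket number: h(value) mod N with h(value) = a * value + b.
        return (self.a * value + self.b) % self.n

    def search(self, value):
        # Scan every page of the bucket (primary page, then overflow chain).
        return any(value in page for page in self.buckets[self.bucket_no(value)])

    def insert(self, value):
        pages = self.buckets[self.bucket_no(value)]
        for page in pages:
            if len(page) < self.page_capacity:
                page.append(value)
                return
        pages.append([value])  # no room anywhere: chain a new overflow page

    def delete(self, value):
        pages = self.buckets[self.bucket_no(value)]
        for i, page in enumerate(pages):
            if value in page:
                page.remove(value)
                if not page and i > 0:  # an emptied overflow page is removed
                    del pages[i]
                return True
        return False


f = StaticHashFile()
for v in (1, 5, 9):  # (3v + 1) mod 4 == 0 for all three, so they collide
    f.insert(v)
pages_before = len(f.buckets[0])  # primary page plus one overflow page
f.delete(9)
pages_after = len(f.buckets[0])   # the overflow page is gone again
```

The number of buckets never changes here, which is why long overflow chains build up as the file grows.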
13
Primary pages are stored sequentially on the disk
If the file grows a lot
– Long overflow chains
– Searching gets slower
– Create a new file with more buckets!
If the file shrinks a lot
– A lot of space is wasted
– Merge buckets!
14
Solution
Ideally
– Buckets are about 80% full
– No overflow
Periodically rehash the file
– Takes time
– The index cannot be used during rehashing
Use dynamic hashing
– Extendible hashing
– Linear hashing
15
Extendible hashing
16
Like static hashing
If a new entry is to be inserted into a full bucket
– Double the number of buckets
– Do this through a directory of pointers (only the directory file has to be doubled)
– Split only the overflowed bucket
17
Example
18
Insert 20*
19
Result
20
Insert 9*
21
Split bucket B
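Since the slide figures are not reproduced here, the following sketch walks through the same two insertions (20* and 9*) in code. It is a simplified in-memory model under stated assumptions: the hash value is the key itself, the directory is indexed by the last global_depth bits, each bucket holds 4 entries, and the starting contents of the buckets are made up for illustration.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []


class ExtendibleHashFile:
    def __init__(self, capacity=4, global_depth=2):
        self.capacity = capacity
        self.global_depth = global_depth
        self.directory = [Bucket(global_depth) for _ in range(2 ** global_depth)]

    def _slot(self, key):
        # Directory slot = last global_depth bits of the (identity) hash value.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._slot(key)].entries

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.entries) < self.capacity:
            bucket.entries.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Only the directory is doubled; both halves share the old buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split only the overflowed bucket.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        old_entries, bucket.entries = bucket.entries, []
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = image
        for k in old_entries:
            self.directory[self._slot(k)].entries.append(k)
        # Retry; this sketch does not handle more than `capacity` equal keys.
        self.insert(key)


f = ExtendibleHashFile()
for k in (4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19):  # assumed initial data
    f.insert(k)
f.insert(20)  # bucket 00 is full and local depth == global depth: directory doubles
f.insert(9)   # bucket 01 is full but local depth < global depth: split only
```

After the two insertions the directory has 8 slots, while the untouched buckets are still shared by two slots each.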
22
If a bucket becomes empty
Merging it with its split image is also possible
Not always done
Merging decreases the local depth
23
Storage
Typical: 100 MB file, 100 bytes/data entry, page size 4 KB
1,000,000 data entries, 40 entries per page, so 25,000 elements in the directory
High chance that the directory fits in memory; then speed = speed of static hashing
Otherwise twice as slow (one extra page read for the directory)
Collision: entries with the same hash value (overflow pages are needed)
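The directory size quoted above can be checked with a few lines of arithmetic. The sketch assumes that 100 MB means 10^8 bytes, that a 4 KB page is 4096 bytes, and that the directory has one element per full data page.

```python
FILE_SIZE = 100 * 10**6   # 100 MB, taken as 10^8 bytes (assumption)
ENTRY_SIZE = 100          # bytes per data entry
PAGE_SIZE = 4096          # 4 KB page

data_entries = FILE_SIZE // ENTRY_SIZE               # entries in the file
entries_per_page = PAGE_SIZE // ENTRY_SIZE           # whole entries per page
directory_elements = data_entries // entries_per_page

print(data_entries, entries_per_page, directory_elements)
```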
24
Linear hashing
Family of hash functions: h_0, h_1, …
Each function's range is twice that of its predecessor
E.g. h_i(value) = h(value) mod (2^i N)
d_0: number of bits needed to represent N bucket numbers
d_i: d_0 + i
Example: N = 32, d_0 = 5, h_1 is h mod (2 * 32), d_1 = 6
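The family of hash functions from the example can be written down directly. In this sketch the base hash h is assumed to be the identity on integers and N is assumed to be a power of two; only N = 32 comes from the slide.

```python
N = 32  # initial number of buckets, as in the example above


def h(value):
    # Assumption: the base hash is the identity on integers.
    return value


def h_i(i, value):
    """i-th member of the family: h(value) mod (2^i * N)."""
    return h(value) % (2 ** i * N)


def d_i(i):
    """Number of bits h_i looks at: d_0 + i, with d_0 = log2(N)."""
    return (N.bit_length() - 1) + i  # valid because N is a power of two


print(h_i(0, 100), h_i(1, 100), d_i(0), d_i(1))
```

Each step from h_i to h_(i+1) looks at one more bit of the hash value, which is what doubles the range.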
25
Basic idea
Rounds of splitting
The number of the current round is Level
Only h_Level and h_(Level+1) are in use
At any given point within a round we have
– buckets that have been split
– buckets yet to be split
– buckets created by splits in this round
27
Searching
h_Level is applied
– If it leads to an unsplit bucket, we look there
– If it leads to a split bucket, we apply h_(Level+1) to decide which bucket holds our data
Insertion may need an overflow page
If the overflow chain gets long, a split is triggered
28
Example
Level = 0: round number
N_Level = N * 2^Level: number of buckets at the beginning of round Level (N_0 = N)
29
If a split is triggered, the current bucket (Next) is split and its entries are redistributed by h_(Level+1)
The new bucket is added at the end of the bucket list
Next is incremented by 1
Searching: apply h_Level; if the resulting bucket number is before Next, apply h_(Level+1)
Continue: insert 43*, 37*, 29*, 22*, 66*, 34*, and 50*.
30
43
31
37
32
29
33
22, 66, 34
34
50
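The insertion sequence 43*, 37*, 29*, 22*, 66*, 34*, 50* can be replayed with a small in-memory model of linear hashing. The sketch makes several assumptions that the slides do not state: N = 4 initial buckets, 4 entries per primary page, the key itself as hash value, a split triggered whenever an insert lands on an overflow page, and made-up initial bucket contents.

```python
class LinearHashFile:
    def __init__(self, initial_buckets, capacity=4):
        self.n0 = len(initial_buckets)  # N: buckets at the start of round 0
        self.capacity = capacity        # entries per primary page
        self.level = 0                  # current round
        self.next = 0                   # next bucket to be split
        # Entries beyond `capacity` in a bucket stand for its overflow pages.
        self.buckets = [list(b) for b in initial_buckets]

    def _h(self, i, key):
        return key % (2 ** i * self.n0)

    def _bucket_of(self, key):
        b = self._h(self.level, key)
        if b < self.next:               # bucket was already split this round
            b = self._h(self.level + 1, key)
        return b

    def search(self, key):
        return key in self.buckets[self._bucket_of(key)]

    def insert(self, key):
        bucket = self.buckets[self._bucket_of(key)]
        needs_overflow = len(bucket) >= self.capacity
        bucket.append(key)
        if needs_overflow:
            self._split()

    def _split(self):
        # Split bucket Next (not necessarily the one that overflowed).
        old = self.buckets[self.next]
        self.buckets[self.next] = []
        self.buckets.append([])         # new bucket goes to the end
        for k in old:
            self.buckets[self._h(self.level + 1, k)].append(k)
        self.next += 1
        if self.next == 2 ** self.level * self.n0:  # round finished
            self.level += 1
            self.next = 0


# Assumed starting state; each key k sits in bucket k mod 4.
f = LinearHashFile([[32, 44, 36], [9, 25, 5], [14, 18, 10, 30], [31, 35, 7, 11]])
for key in (43, 37, 29, 22, 66, 34, 50):
    f.insert(key)
```

With these assumptions the last insertion (50*) splits the final unsplit bucket, so the round ends with 8 buckets, Level = 1 and Next = 0, while bucket 2 still keeps one entry on an overflow page.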
35
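Deletion
If the last bucket is empty, it can be removed
Merging can also be triggered for non-empty buckets
If Next is 0, merging goes back to the previous round: the empty bucket is removed, Level is decremented, and Next = N_Level / 2 - 1 (with N_Level the bucket count before the decrement)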
36
Comparison
If linear hashing is pictured with a directory, as in extendible hashing, the hash function is similar to that of extendible hashing (going from h_i to h_(i+1) ~ doubling the directory)
Extendible hashing: reduced number of splits and higher bucket occupancy
37
Linear hashing
– Avoids the directory structure
– Primary pages are stored consecutively, so equality selection is quicker
– A skewed distribution results in almost empty buckets
38
If a directory structure is used for linear hashing: one bucket = one directory element
Overflow pages are stored easily
Overhead of a directory level
Costly for large, uniformly distributed files
Improves space occupancy
39
File organizations
40
Cost model
To analyze the (time) cost of the DB operations
No. of data pages: B
Records/page: R
Time of reading/writing a page: D = 15 ms (dominant)
Time of record processing: C = 100 ns
Time of hashing: H = 100 ns
41
Reduced calculation: only the I/O time is counted
3 basic file organizations:
– Heap files
– Sorted files
– Hashed files
42
File operations
Scan
Search with equality selection (=)
Search with range selection (>, <)
Insert
Delete
43
Heap files
Scan the file: B(D + RC)
Search with equality selection:
– One result: on average B(D + RC) / 2
– Several results: search the entire file, B(D + RC)
Search with range selection: B(D + RC)
Insert: fetch the last page, add record, write back: 2D + C
Delete: find record, delete, write page: cost of searching + C + D
44
Sorted files
Scan: B(D + RC)
Search with equality selection:
– One result: D log2 B + C log2 R
– Several results: D log2 B + C log2 R + no. of results
Search with range selection: D log2 B + C log2 R + no. of results
Insert: find place, insert, move the rest, write pages: cost of searching the position + B(D + RC) on average
Delete: find record, delete, move the rest, write pages: cost of searching + B(D + RC)
45
Hashed files
No overflow pages, 80% occupancy of buckets
Scan the file: 1.25 * B(D + RC)
Search with equality selection: on average H + D + RC/2
Search with range selection: 1.25 * B(D + RC)
Insert: locate page, add record, write back: cost of searching + D + C
Delete: find record, delete, write page: cost of searching + C + D
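To get a feel for the magnitudes, the formulas for the three file organizations can be evaluated numerically. The sketch below uses D = 15 ms and C = H = 100 ns from the cost model; the file size (B = 1024 pages, R = 128 records per page) and the function names are assumptions picked for the example.

```python
from math import log2

D = 15e-3    # seconds to read or write a page
C = 100e-9   # seconds to process a record
H = 100e-9   # seconds to apply the hash function


def heap_costs(B, R):
    scan = B * (D + R * C)
    return {"scan": scan, "eq_search": scan / 2,
            "range_search": scan, "insert": 2 * D + C}


def sorted_costs(B, R):
    search = D * log2(B) + C * log2(R)   # binary search over pages, then records
    return {"scan": B * (D + R * C), "eq_search": search,
            "insert": search + B * (D + R * C)}


def hashed_costs(B, R):
    search = H + D + R * C / 2
    scan = 1.25 * B * (D + R * C)        # pages are only 80% full
    return {"scan": scan, "eq_search": search,
            "range_search": scan, "insert": search + D + C}


B, R = 1024, 128                         # assumed file size for the example
heap, srt, hsh = heap_costs(B, R), sorted_costs(B, R), hashed_costs(B, R)
for name, costs in (("heap", heap), ("sorted", srt), ("hashed", hsh)):
    print(name, {op: round(t, 4) for op, t in costs.items()})
```

For this file size an equality search takes several seconds on the heap file, about 0.15 s on the sorted file and about 15 ms on the hashed file, which shows why the page I/O time D dominates the analysis.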
46
Summary
Heap file: storage +, modifying +, searching -
Sorted file: searching +, modifying -
Hashed file: modifying +, range selection --, storage -

Type     Scan      Eq. search   Range search          Insert        Delete
Heap     BD        BD/2         BD                    2D            Search + D
Sorted   BD        D log2 B     D log2 B + #matches   Search + BD   Search + BD
Hashed   1.25 BD   D            1.25 BD               2D            Search + D
47
Thank you for your attention! Book is uploaded: R. Ramakrishnan, J. Gehrke: Database Management Systems, 2nd edition