Database Management 7. course. Reminder Disk and RAM RAID Levels Disk space management Buffering Heap files Page formats Record formats.


1 Database Management 7. course

2 Reminder Disk and RAM RAID Levels Disk space management Buffering Heap files Page formats Record formats

3 Today System catalogue Hash-based indexing – Static – Extendible – Linear Time-cost of operations

4 System catalogue Special table Indexes – Type of the data structure and search key Tables – Name, filename, file structure (e.g. heap) – Attribute names, types – Integrity constraints – Index names Views – Name and definition Statistics, permissions, buffer size, etc.

5 Attr_Cat(attr_name, rel_name, type, position)
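A minimal sketch of how the Attr_Cat schema can be held as an ordinary table that also describes itself. The type names "string" and "integer" and the helper attributes_of are assumptions made for illustration; they are not given on the slide.

```python
from typing import NamedTuple


class AttrCatRow(NamedTuple):
    """One tuple of Attr_Cat(attr_name, rel_name, type, position)."""
    attr_name: str
    rel_name: str
    type: str
    position: int


# The catalogue is itself a table, so it stores rows about its own attributes.
# The type names are illustrative assumptions.
ATTR_CAT = [
    AttrCatRow("attr_name", "Attr_Cat", "string", 1),
    AttrCatRow("rel_name", "Attr_Cat", "string", 2),
    AttrCatRow("type", "Attr_Cat", "string", 3),
    AttrCatRow("position", "Attr_Cat", "integer", 4),
]


def attributes_of(rel_name):
    """Return the attribute names of a relation, ordered by position."""
    rows = [r for r in ATTR_CAT if r.rel_name == rel_name]
    return [r.attr_name for r in sorted(rows, key=lambda r: r.position)]
```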

6 Hash-based indexing

7 Basic idea Index for every search key Hash function ( f ) between search key ( K ) and memory address ( A ): A = f ( K ) Ideally bijective: the key determines the address

8 Hashing Ideal for joining tables Supports only equality searches (no range searches) Many versions

9 Static hashing File: a collection of buckets Bucket: one primary page and possibly overflow pages File has N buckets: 0..N-1 Data entries – Data records with key k –

10 To identify the bucket, hash function h is applied. Within the bucket, a further search is applied. Insertion: h is used to find the proper bucket. If there is not enough space, an overflow page is added to the bucket's overflow chain.

11 Deletion: h is used to locate the bucket. If the deleted record was the last one on an overflow page, then that page is removed. Bucket number: h(value) mod N, where h(value) = a * value + b, and a and b are constants
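The static hashing operations described above can be sketched as follows. This is an in-memory illustration under stated assumptions: pages are Python lists, page_capacity and the default constants a and b are arbitrary choices, and the class name StaticHashFile is hypothetical.

```python
class StaticHashFile:
    """Static hashing sketch: N fixed buckets, each a chain of pages."""

    def __init__(self, n_buckets=4, page_capacity=2, a=3, b=1):
        self.n = n_buckets
        self.cap = page_capacity      # entries per page (assumed value)
        self.a, self.b = a, b         # constants of h(value) = a * value + b
        # Each bucket is a list of pages; page 0 is the primary page,
        # any further pages form the overflow chain.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def bucket_no(self, value):
        """Bucket number: h(value) mod N."""
        return (self.a * value + self.b) % self.n

    def insert(self, value):
        pages = self.buckets[self.bucket_no(value)]
        for page in pages:
            if len(page) < self.cap:
                page.append(value)
                return
        pages.append([value])         # no space left: extend the overflow chain

    def search(self, value):
        """Equality search: hash to the bucket, then scan its pages."""
        return any(value in page for page in self.buckets[self.bucket_no(value)])

    def delete(self, value):
        pages = self.buckets[self.bucket_no(value)]
        for i, page in enumerate(pages):
            if value in page:
                page.remove(value)
                if not page and i > 0:
                    pages.pop(i)      # emptied overflow page is removed
                return True
        return False
```

With a = 1 and b = 0 the values 0, 4 and 8 all land in bucket 0 of a four-bucket file, so the third insert creates an overflow page and deleting that value removes the page again.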

12

13 Primary pages stored sequentially on the disk If the file grows a lot – Long overflow chain – Worsens the search – Create new file with more buckets! If the file shrinks a lot – A lot of space is wasted – Merge buckets!

14 Solution Ideally – buckets are about 80% full – no overflow Periodically rehash the file – Takes time – Index cannot be used during rehashing Use dynamic hashing – Extendible Hashing – Linear Hashing

15 Extendible hashing

16 Like Static Hashing If a new entry is to be inserted into a full bucket – Double the number of buckets – Use a directory of pointers to buckets (only the directory has to be doubled) – Split only the overflowed bucket
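A minimal sketch of the directory idea, assuming integer keys, a directory indexed by the least significant bits of the key, and no overflow pages (so more than capacity keys with identical hash values would make insert recurse without end). The class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth
        self.capacity = capacity
        self.items = []


class ExtendibleHash:
    """Extendible hashing sketch: a directory of pointers to buckets."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.global_depth = 1
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # Use the last global_depth bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].items

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.items) < bucket.capacity:
            bucket.items.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Only the directory is doubled; buckets are shared by two slots.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)              # retry after the split

    def _split(self, bucket):
        # Split only the overflowed bucket into itself and a "split image".
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth, self.capacity)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = image
        old, bucket.items = bucket.items, []
        for k in old:
            self.directory[self._index(k)].items.append(k)
```

In this sketch, an insert into a full bucket whose local depth equals the global depth doubles the directory, while an insert into a full bucket with a smaller local depth only splits that bucket.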

17 Example

18 Insert 20*

19 Result

20 Insert 9*

21 Split bucket B

22 If bucket gets empty Merging buckets is also possible Not always done Decrease local depth

23 Storage Typical: 100 MB file, 100 bytes/data entry, page size: 4 KB, 1,000,000 data entries, 25,000 elements in the directory High chance that the directory will fit in memory → speed = speed of Static Hashing Otherwise twice as slow Collision: entries with the same hash value (overflow pages are needed)
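The directory size quoted above can be checked with a short calculation. It assumes that 1 MB is taken as 10^6 bytes, that 4 KB is 4096 bytes, that every page is completely full, and that there is one directory element per page; none of these assumptions is stated on the slide.

```python
FILE_SIZE = 100 * 10**6   # 100 MB, with 1 MB = 10^6 bytes (assumption)
ENTRY_SIZE = 100          # bytes per data entry
PAGE_SIZE = 4096          # 4 KB page

data_entries = FILE_SIZE // ENTRY_SIZE                 # 1,000,000 data entries
entries_per_page = PAGE_SIZE // ENTRY_SIZE             # 40 whole entries fit on a page
directory_elements = data_entries // entries_per_page  # 25,000 directory elements
```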

24 Linear Hashing Family of hash functions: h_0, h_1, … Each function's range is twice that of its predecessor E.g. h_i(value) = h(value) mod (2^i N). d_0: number of bits needed to represent N; d_i = d_0 + i Example: N = 32, d_0 = 5, h_1 is h mod (2 * 32), d_1 = 6
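A small sketch of this family of hash functions. The base function h defaults to the identity on integers, which is an assumption for illustration, and d_0 is computed as the number of bits needed to address buckets 0..N-1, which matches the example only when N is a power of two.

```python
def make_h(i, n, h=lambda value: value):
    """Return h_i, where h_i(value) = h(value) mod (2^i * N)."""
    return lambda value: h(value) % (2**i * n)


def d(i, n):
    """d_i = d_0 + i, with d_0 the bits needed to address buckets 0..N-1."""
    d0 = (n - 1).bit_length()
    return d0 + i


h0 = make_h(0, 32)   # range 0..31, looks at the last 5 bits
h1 = make_h(1, 32)   # range 0..63, looks at the last 6 bits
```

For N = 32 the value 37 is mapped to bucket 5 by h0 and to bucket 37 by h1.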

25 Basic idea Rounds of splitting The number of the current round is Level Only h_Level and h_{Level+1} are in use At any given point within a round we have – split buckets – buckets yet to be split – buckets created by splits in this round

26

27 Searching h_Level is applied – If it leads to an unsplit bucket, we look there – If it leads to a split bucket, we apply h_{Level+1} to decide which bucket holds our data Insertion may need an overflow page If the overflow chain gets long, a split is triggered

28 Example Level = 0: round number N_Level = N * 2^Level: number of buckets at the beginning of round Level (N_0 = N)

29 If a split is triggered, the current (Next) bucket is split and its entries are redistributed by h_{Level+1} The new bucket is appended to the end of the buckets Next is incremented by 1 Searching: apply h_Level, and if the resulting bucket number is before Next, apply h_{Level+1} Continue: insert 43*, 37*, 29*, 22*, 66*, 34*, and 50*.
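The search and split rules can be sketched as follows. The sketch assumes integer keys with h(value) = value, models overflow pages as a bucket list that grows beyond capacity, and triggers a split on every insert that lands in a bucket that is already full; the slides leave the exact trigger open, so that rule is an assumption, as are the class and attribute names.

```python
class LinearHash:
    """Linear hashing sketch with rounds (Level) and a split pointer (Next)."""

    def __init__(self, n=4, capacity=4):
        self.n = n                    # N: number of buckets in round 0
        self.capacity = capacity      # entries on a primary page
        self.level = 0
        self.next = 0
        # A bucket longer than `capacity` stands for primary page + overflow.
        self.buckets = [[] for _ in range(n)]

    def _h(self, i, value):
        return value % (2**i * self.n)

    def _bucket_no(self, value):
        b = self._h(self.level, value)
        if b < self.next:             # bucket was already split in this round
            b = self._h(self.level + 1, value)
        return b

    def search(self, value):
        return value in self.buckets[self._bucket_no(value)]

    def insert(self, value):
        bucket = self.buckets[self._bucket_no(value)]
        bucket.append(value)
        if len(bucket) > self.capacity:   # the insert went to an overflow page
            self._split()

    def _split(self):
        old = self.buckets[self.next]
        self.buckets[self.next] = []
        self.buckets.append([])           # new bucket goes to the end
        for v in old:                     # redistribute with h_{Level+1}
            self.buckets[self._h(self.level + 1, v)].append(v)
        self.next += 1
        if self.next == self.n * 2**self.level:   # every bucket of the round split
            self.level += 1
            self.next = 0
```

Note that the bucket being split is the one Next points to, which is not necessarily the bucket that overflowed.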

30 43

31 37

32 29

33 22, 66, 34

34 50

35 Deletion If the last bucket is empty, it can be removed Merging can also be triggered for non-empty buckets New round when merging: empty buckets are removed, Level is decremented, Next = N_Level / 2 - 1

36 Comparison If Linear Hashing is imagined with a directory like that of Extendible Hashing, its hash function is similar to that of Extendible Hashing (h_i → h_{i+1} ~ doubling the directory) Extendible Hashing: reduced number of splits and higher bucket occupancy

37 Linear hashing – Avoids directory structure – Primary pages are stored consecutively. Quicker equality selection. – Skewed distribution results in almost empty buckets

38 If a directory structure is used for Linear Hashing: one bucket = one directory element Overflow pages are stored easily Overhead of a directory level Costly for large, uniformly distributed files Improves space occupancy

39 File organizations

40 Cost model To analyze the (time) cost of the DB operations No. of data pages: B Records/page: R Time of reading/writing a page: D = 15 ms (dominant) Time of record processing: C = 100 ns Time of hashing: H = 100 ns

41 Reduced calculation: just the I/O time Three basic file organizations: – Heap files – Sorted files – Hashed files

42 File operations Scan Search with equality selection (=) Search with range selection (>,<) Insert Delete

43 Heap files Scan the file: B(D + RC) Search with equality selection: – One result: on average B(D + RC) / 2 – Several results: search the entire file, B(D + RC) Search with range selection: B(D + RC) Insert: fetch the last page, add record, write back: 2D + C Delete: find record, delete, write page: cost of searching + C + D Legend: B data pages, R records/page, D time of reading/writing, C time of record processing
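The heap file formulas can be evaluated directly with the constants of the cost model (D = 15 ms, C = 100 ns). The function name heap_costs and the example sizes in the usage note are assumptions for illustration.

```python
D = 15e-3    # time of reading/writing a page, in seconds
C = 100e-9   # time of processing a record, in seconds


def heap_costs(B, R):
    """Heap file costs in seconds for B data pages with R records per page."""
    scan = B * (D + R * C)
    eq_one = scan / 2                 # one result: half the file on average
    return {
        "scan": scan,
        "eq_one": eq_one,
        "eq_many": scan,              # several results: whole file
        "range": scan,
        "insert": 2 * D + C,          # fetch last page, add record, write back
        "delete_one": eq_one + C + D, # search + process + write the page
    }
```

For a file of 1000 pages with 100 records per page, the scan term B * D alone is 15 seconds, while the record processing term B * R * C adds only 0.01 seconds, which is why the summary keeps only the I/O time.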

44 Sorted files Scan: B(D + RC) Search with equality selection: – One result: D log_2 B + C log_2 R – Several results: D log_2 B + C log_2 R + no. of results Search with range selection: D log_2 B + C log_2 R + no. of results Insert: find place, insert, move the rest, write pages: search position + B(D + RC) on average Delete: find record, delete, move the rest, write pages: cost of searching + B(D + RC) Legend: B data pages, R records/page, D time of reading/writing, C time of record processing
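A sketch of the sorted file formulas, using the same constants as the cost model. The "+ no. of results" term is left out because the slide does not give its unit; the function name sorted_costs is hypothetical.

```python
import math

D = 15e-3    # time of reading/writing a page, in seconds
C = 100e-9   # time of processing a record, in seconds


def sorted_costs(B, R):
    """Sorted file costs in seconds for B data pages with R records per page."""
    scan = B * (D + R * C)
    # Binary search over the pages, then binary search within the page.
    eq_one = D * math.log2(B) + C * math.log2(R)
    return {
        "scan": scan,
        "eq_one": eq_one,
        "insert": eq_one + scan,      # find position, then shift the rest
        "delete": eq_one + scan,      # find record, then shift the rest
    }
```

With B = 1024 pages an equality search needs about 10 page reads (roughly 0.15 s), whereas an insert costs about as much as a full scan on top of that.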

45 Hashed files No overflow pages, 80% occupancy of buckets Scan the file: 1.25 * B(D + RC) Search with equality selection: on average H + D + RC/2 Search with range selection: 1.25 * B(D + RC) Insert: locate page, add record, write back: search + D + C Delete: find record, delete, write page: cost of searching + C + D Legend: B data pages, R records/page, D time of reading/writing, C time of record processing, H time of hashing
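The hashed file formulas can be sketched the same way. The factor 1.25 follows from the 80% occupancy (1 / 0.8 = 1.25 times as many pages); the function name hashed_costs is an assumption.

```python
D = 15e-3    # time of reading/writing a page, in seconds
C = 100e-9   # time of processing a record, in seconds
H = 100e-9   # time of hashing, in seconds


def hashed_costs(B, R):
    """Hashed file costs in seconds, assuming 80% occupancy and no overflow."""
    scan = 1.25 * B * (D + R * C)     # 80% full pages: 1.25 times more pages
    eq = H + D + R * C / 2            # hash, read one page, scan half of it
    return {
        "scan": scan,
        "eq": eq,
        "range": scan,                # hashing does not help range selection
        "insert": eq + D + C,         # search + write back + process
        "delete": eq + C + D,
    }
```

An equality search costs roughly one page read regardless of B, while scans and range selections are 25% more expensive than on a heap file.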

46 Summary
Heap file: storage +, modifying +, searching -
Sorted file: searching +, modifying -
Hashed file: modifying +, range selection --, storage -

Type    Scan    Eq. search  Range search          Insert       Delete
Heap    BD      BD/2        BD                    2D           Search + D
Sorted  BD      D log_2 B   D log_2 B + #matches  Search + BD  Search + BD
Hashed  1.25BD  D           1.25BD                2D           Search + D

47 Thank you for your attention! Book is uploaded: R. Ramakrishnan, J. Gehrke: Database Management Systems, 2nd edition

