Advanced Database Systems: DBS CB, 2nd Edition

Disks and Files, Records and Blocks organization, Storage and Indices Ch Ch. 14

Course Content
Cover advanced topics from the textbook, including:
- Storage and Indices
- Query Processing & Optimization
- Concurrency Control
- Transaction Management & Recovery
- Data Warehouse, OLAP, and Data Mining
- Parallel & Distributed Databases
Briefly cover advanced research topics in RDBMS.
In addition, students (divided into groups) will research one of the above topics, produce a formal paper on it, and give a presentation to the class.

Textbook
Database Systems: The Complete Book, Second Edition
Authors: Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom
Publisher: Pearson – Prentice Hall
ISBN-13:

Outline
- Database Data Models Evolution
- Disks and Files
- Records and Blocks
- Indices
  - Single-dimensional indexes: B-tree and B+ tree
  - Hash tables: Extensible Hash Table, Linear Hash Table
  - Multidimensional indexes
    - Hash structures: Grid Files, Partitioned Hash
    - Tree structures: Multi-key Index, Kd-Tree, Quad Tree, and R-Tree
  - Bitmap Indexes

Database Data Models Evolution

Database Data Models Evolution
[Figure: timeline of data models from the 1960s to now, showing Hierarchical, Network, Relational, Object Bases, Knowledge Bases, XQuery, and SPARQL]

Disks and Files

Overview of Storage and Indexing
- Disk: components, access to a disk block, arranging blocks on disk
- Disk RAID: concepts, levels 0 to 5
- Disk Space Manager: lowest layer of the DBMS; manages space on disk on one side and provides a simple page model to the DB engine on the other
- Buffer Manager: pin/unpin, dirty bit, replacement policies (LRU, MRU, Clock, etc.)
- File, Page, and Record Format:
  - File structure: linked-list, directory-based
  - Page structure: fixed-length/variable-length records
  - Record structure: fixed-length/variable-length
- System Catalog

Components of a Disk
- Platter: a disk has several platters; each platter has 2 surfaces. Disks are usually 3.5” (older disks are 5.25”).
- Track: concentric ring on a platter surface
- Disk Sector: usually 512 B, or 2048 B for optical disks
- Disk Block (page): contiguous sequence of sectors; the unit of read/write. Block size = N * sector size
- Cylinder: set of all tracks with the same diameter
- Disk Head: positioned over a block before the read/write takes place
- Controller: interfaces the disk drive to the computer
- Spindle: turns the platters (e.g., at 5400 rpm, revolutions per minute)

Accessing a Disk Block
- Seek time: varies between 1 and 20 msec
- Rotational delay: varies between 0 and 10 msec
- Transfer rate: ~100+ MB/sec
- The key to lowering I/O cost is reducing seek and rotational delays
- Techniques: sequential scan, pre-fetching several pages, etc.
Disk Arrays: an arrangement of several disks that gives the abstraction of a single large disk
- Increased performance: built from inexpensive, redundant components
- Increased reliability: because of redundancy, the array can survive mechanical failures
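To make the cost components above concrete, here is a minimal sketch that estimates the time to read one block as seek time plus average rotational delay plus transfer time. The function name and the sample values (10 msec seek, 5400 rpm, 4 KB block, 100 MB/sec) are assumptions chosen for illustration, not figures from the slides.

```python
def block_access_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s):
    """Return (seek, average rotational delay, transfer) in msec for one block read."""
    rotation_ms = 60_000 / rpm                 # time for one full revolution
    avg_rotational_ms = rotation_ms / 2        # on average the head waits half a turn
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1_000
    return seek_ms, avg_rotational_ms, transfer_ms


# Hypothetical drive: 10 msec seek, 5400 rpm, 4 KB blocks, 100 MB/sec transfer.
seek, rotational, transfer = block_access_ms(10, 5400, 4096, 100)
total = seek + rotational + transfer
print(f"seek={seek:.2f} rotational={rotational:.2f} transfer={transfer:.3f} total={total:.2f} msec")
```

With these numbers the transfer itself is a tiny fraction of the total, which is why sequential scans and pre-fetching (which avoid repeated seeks and rotations) pay off.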

Disk-RAID
- RAID 0: Striping
- RAID 1: Mirroring
- RAID 0+1: Striping and Mirroring

Disk-RAID
- RAID 4: Block-Interleaved Parity
- RAID 5: Block-Interleaved Distributed Parity
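The parity used by RAID 4 and RAID 5 can be sketched as a byte-wise XOR across the data blocks of a stripe. The snippet below is an illustrative sketch only (tiny 4-byte blocks, a hypothetical `parity` helper); it shows how a lost block is rebuilt from the surviving blocks plus the parity block.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-sized blocks byte by byte and return the resulting block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks (contents are made up).
data = [b"ABCD", b"EFGH", b"IJKL"]
parity_block = parity(data)

# Suppose the disk holding data[1] fails: XOR the survivors with the parity block.
recovered = parity([data[0], data[2], parity_block])
print(recovered)
```

RAID 4 keeps all parity blocks on one dedicated disk, while RAID 5 distributes them across the disks; the XOR computation is the same in both.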

Files
Buffer Manager: brings needed pages into RAM
- Pages stay in RAM until released; modified pages are written back to disk
- A replacement policy decides which frame to replace
- The buffer manager tries to pre-fetch several pages at a time
File: a collection of pages, each containing a collection of records
- Must support insert/update/delete operations
- Ability to read a particular record given a unique record-id
- Ability to scan all records, possibly with some condition (predicate) on the records to be retrieved
File Access Layer:
- Tracks the pages in a file
- Supports the abstraction of a collection of records
- Pages with free space are identified through a linked list or directory structure
Indexes: provide efficient retrieval of records based on the values of some fields
System Catalog: stores information about relations (tables), indexes, and views within a schema
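The buffer manager behavior described above (pin/unpin, dirty bit, replacement policy) can be sketched as follows. This is a toy model under stated assumptions: the class name `BufferPool`, the dict standing in for the disk, and the choice of LRU as the replacement policy are all illustrative.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager: LRU replacement, pin counts, and dirty bits."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                 # dict page_id -> bytes, stands in for the disk
        self.frames = OrderedDict()      # page_id -> [data, pin_count, dirty], in LRU order

    def pin(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)      # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [bytearray(self.disk[page_id]), 0, False]
        frame = self.frames[page_id]
        frame[1] += 1
        return frame[0]

    def unpin(self, page_id, dirty=False):
        frame = self.frames[page_id]
        frame[1] -= 1
        frame[2] = frame[2] or dirty

    def _evict(self):
        # Scan from least to most recently used; only unpinned frames may be replaced.
        for victim, (data, pins, dirty) in self.frames.items():
            if pins == 0:
                if dirty:
                    self.disk[victim] = bytes(data)   # write the page back before reuse
                del self.frames[victim]
                return
        raise RuntimeError("all frames are pinned")


disk = {1: b"aaaa", 2: b"bbbb", 3: b"cccc"}
pool = BufferPool(capacity=2, disk=disk)
page = pool.pin(1)
page[0:1] = b"X"                 # modify page 1 in memory
pool.unpin(1, dirty=True)
pool.pin(2)
pool.unpin(2)
pool.pin(3)                      # pool is full: page 1 is evicted and written back
```

After the last call, the modified page 1 has been flushed to the simulated disk and frames 2 and 3 remain in the pool.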

File Types
Unordered (Heap) File: simplest file structure, containing records in no particular order. It can grow and shrink.
To support record-level operations, we need to:
- Keep track of the pages in a file
- Keep track of free space on pages
- Keep track of the records on a page
The above can be done using either a linked-list or a directory-based organization.
Linked-list Organization

File Types Directory-based Organization

Pages with Fixed-length Records
- Packed: if a record is deleted, move the last record on the page into the vacated slot
- Unpacked/Bitmap: keep an M-bit bitmap that indicates which slots are vacant

Pages with Variable-length Records
Records can be moved on the page without changing the record-id (RID); this makes the organization attractive for fixed-length records as well.
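One common way to get this property is a slot directory: the RID names a slot number, and the slot stores the record's current offset and length. The sketch below assumes that layout (the slide's figure is not preserved, so the class `SlottedPage` and its details are illustrative) and shows that compaction moves bytes without changing slot numbers.

```python
class SlottedPage:
    """Toy page with a slot directory mapping slot number -> (offset, length)."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []      # entry is (offset, length), or None for a deleted record
        self.free = 0        # offset of the first free byte

    def insert(self, record):
        if self.free + len(record) > len(self.data):
            raise ValueError("page full")
        self.data[self.free:self.free + len(record)] = record
        self.slots.append((self.free, len(record)))
        self.free += len(record)
        return len(self.slots) - 1       # slot number; RID = (page id, slot number)

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        self.slots[slot] = None

    def compact(self):
        """Slide live records together; slot numbers (and therefore RIDs) are unchanged."""
        new_data, pos = bytearray(len(self.data)), 0
        for i, entry in enumerate(self.slots):
            if entry is None:
                continue
            offset, length = entry
            new_data[pos:pos + length] = self.data[offset:offset + length]
            self.slots[i] = (pos, length)
            pos += length
        self.data, self.free = new_data, pos


page = SlottedPage()
s0 = page.insert(b"smith")
s1 = page.insert(b"jo")
s2 = page.insert(b"garcia")
page.delete(s1)
page.compact()                 # "garcia" moves from offset 7 to offset 5
print(s2, page.read(s2))
```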

Schema and Records
Fixed-format / fixed-length fields (for simplicity):
Schema (Employee record):
  (1) E#, 2-byte integer
  (2) E.name, 10 char
  (3) Dept, 2-byte code
Records:
  55  s m i t h  02
  83  j o n e s  01
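The Employee schema above maps directly onto a fixed-length binary layout. As a minimal sketch, Python's `struct` module can pack it into 14 bytes; the big-endian byte order and NUL padding of the name are assumptions made for the example.

```python
import struct

# E#: 2-byte unsigned int, E.name: 10 chars, Dept: 2-byte code (big-endian, no padding).
EMPLOYEE = struct.Struct(">H10sH")


def pack_employee(e_no, name, dept):
    # The "10s" field pads shorter names with NUL bytes up to 10 bytes.
    return EMPLOYEE.pack(e_no, name.encode("ascii"), dept)


def unpack_employee(raw):
    e_no, name, dept = EMPLOYEE.unpack(raw)
    return e_no, name.rstrip(b"\x00").decode("ascii"), dept


raw = pack_employee(55, "smith", 2)
print(EMPLOYEE.size, unpack_employee(raw))

# Because every field has a fixed offset, one field can be read without decoding the rest:
(dept_only,) = struct.unpack_from(">H", raw, 12)   # Dept starts at byte 2 + 10 = 12
```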

Fields Management Within Records
Fixed-format / fixed-length fields:
- Information about field types is stored in the system catalog
- Finding a specific field is simple (fixed offset)
Fixed-format / variable-length fields:

Variable-format / variable-length fields:
- Field name codes can be strings as well
- Useful for sparse records, repeating fields, and evolving formats, but wastes space!
Example record: 2 | 5 I 46 | 4 S 4 F O R D
- 2: number of fields
- 5: code identifying the field as E#
- I: E# is of Int type
- 46: value of E#
- 4: code for Ename
- S: Ename is of String type
- 4: length of the string
- F O R D: value of Ename

Repeating fields do not imply variable format or variable size.
- The key is to allocate the maximum number of repeating fields; NULL if not used
Example: variable-format record with repeating fields
  3 | E_name: Fred | Child: Sally | Child: Tom
Variation of variable format: include the record type and record length in the record
  5 | 27 | ...
- Record type (5): tells what to expect (i.e., points to a schema)
- Record length (27)

Variation of Fixed/Variable (Hybrid) Format: all employees have E#, name, and Dept; other fields may vary.
Variation of record organization: Total size = 5 + ( ) = = 32

Records and Blocks

Storing Records in Disk Blocks
We have multiple options:
1. Separating records
2. Spanned vs. unspanned
3. Mixed record types (clustering)
4. Split records
5. Sequencing
6. Indirection

1. Separating Records
- No need to separate fixed-size records
- Use a special marker (delimiter) for variable-size records
- Give the record length (or offset):
  - Within each record
  - In the block header

2. Spanned vs. Unspanned Records
- Unspanned records: each record must fit within one block
- Spanned records: need an indication of a partial record and a pointer to the rest. Spanning is essential if record size > block size

3. Mixed Record Types - Clustering
Mixing records of different types that are frequently accessed together Compromise: no mixing but keep related records on the same cylinder

4. Split Records Typically for Hybrid format:

5. Sequencing
- Ordering records in a file (and block) by some key value
- Efficient for reading records in sorted order
Options for implementing sequencing:
- Next record is physically contiguous
- Records are linked
- Overflow area

6. Indirection
- Many options between a physical block address and an indirect block address
- Physical address: <Device-id, Cylinder#, Track#, Block#, Offset within block>
- Fully indirect: RID is an arbitrary bit string
Indirection in the block header. The header may contain:
- File-Id
- This Block-Id
- Record directory
- Pointer to free space
- Type of block (data vs. overflow)
- Pointer to other blocks in the file
- Timestamp

Record Operations
Record Deletion:
- Immediately reclaim space, or
- Mark deleted; may need a chain of deleted records
- Concern is dangling pointers
Record Insertion:
- Sequenced records are a problem; use overflow if needed
- How much free space per block?
Buffer Management:
- Pinned blocks
- Forced writing to disk (flushing)
- Which replacement policy to use?
- Double buffering
- Swizzling (pointer swizzling)
Comparison of the Different Schemes:

Record Delete Operation
- Dangling pointers!
- Solution: tombstones. Leave a mark in the map or in the old location:
  - Physical ID:
  - Logical ID:

Record Insert Operation
How much free space to leave in each block, track, cylinder? How often do we need to reorganize file + overflow?

Record Buffer Management
Pointer Swizzling: the ability to convert an object reference by name into a memory address (obj name -> mem addr), or to reference an object in memory by its disk block address (mem ref -> disk block address)
One option:

Record Buffer Management
- In-memory pointers (type bit):
- Swizzling (replacing) types:
  - Automatic
  - On-demand
  - No swizzling / program control

Comparison of Schemes
Strategy:
- Compute the space used for the expected data
- Compute the expected time to:
  - Fetch a record given the key
  - Fetch the record with the next key
  - Insert a record
  - Append a record
  - Delete a record
  - Update a record
  - Read the whole file
  - Reorganize the file
- Decide how to lay out data on disk

Indexes: Single Dimensional Indexes

Single Dimensional Indexes Overview
Conventional index: sequential files
- Index file for sequential files: dense, sparse
- Terms: index file, search key, primary index (index on the sorted key), secondary index, multi-level index
- Sparse: less index space (more of the index can be kept in memory)
- Sparse is better for insertion and for a primary index
- Dense: can tell whether a record exists without accessing the file
- Dense is better for a secondary index
B-tree and B+ tree index
Hash table index
In SQL, one cannot specify the type of index (B-tree or hash) or its parameters (size of hash, utilization factor, etc.)
[Figure: a sequential file (keys 10, 20, ..., 100, two records per block) shown once with a dense index (one entry per record) and once with a sparse index (one entry per block)]
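The dense/sparse trade-off above can be sketched with a small in-memory model. The block contents mirror the keys in the figure; representing the dense index as a dict and the sparse index as a sorted list of first keys is an assumption made for brevity.

```python
from bisect import bisect_right

# Sequential file: two records per block, sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one entry (key -> block number) per record.
dense = {key: b for b, block in enumerate(blocks) for key in block}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0] for block in blocks]


def dense_contains(key):
    """A dense index answers existence queries without touching the file."""
    return key in dense


def sparse_lookup(key):
    """Find the block that could hold key, then read that block to check."""
    pos = bisect_right(sparse_keys, key) - 1     # last entry whose first key <= key
    if pos < 0:
        return None
    return pos if key in blocks[pos] else None


print(dense_contains(40), dense_contains(45))
print(sparse_lookup(40), sparse_lookup(45))
```

The sparse index here has 5 entries against 10 for the dense one, but every sparse lookup, including the unsuccessful one for key 45, has to read a data block.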

Single Dimension: Conventional Index – Duplicate Keys
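[Figure: a sequential file containing duplicate keys (values 10, 20, 30, 40, 45), shown with four index options]
- Dense index: one entry per record, including duplicates
- Better dense index: one entry per distinct key
- Sparse index (pointer to each block): careful if looking for 20 or 30!
- Sparse index (place the 1st new key from the block): should this be 40?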

Single Dimension: Conventional Index – Deletion from Sparse Index
[Figure: a sparse index over a sequential file (two records per block), shown before and after each of three deletions]
- Delete record 40
- Delete record 30
- Delete records 30 & 40

Single Dimension: Conventional Index – Deletion from Dense Index
[Figure: a dense index over a sequential file (keys 10 to 80), shown before and after deleting record 30]

Single Dimension: Conventional Index – Insert into Sparse/Dense Index
[Figure: a sparse index over a sequential file (keys 10 to 60), shown for three insertions]
- Insert record 34: our lucky day! We have free space where we need it.
- Insert record 15: immediate reorganization, or insert a new block (chained file) and update the index
- Insert record 25: use overflow blocks (reorganize later...)

Single Dimension: Conventional Index – Secondary (not sorted) Indexes
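[Figure: a file sequenced on another field, with secondary key values 50, 30, 70, 20, 40, 80, 10, 100, 60, 90, ...]
- A sparse index placed directly on the unsorted secondary key does not make sense!
- Instead, the lowest index level is dense, with a sparse high level on top of it
With secondary indexes, the lowest level is dense; the others are sparse.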

Single Dimension: Conventional Index – Secondary Indexes & Duplicates
[Figure: a secondary index with duplicate key values (10, 20, 40, 30, 50, 60, ...), with and without buckets of record pointers]
Problems (when records with the same key are chained):
- Need to add fields to records
- Need to follow the chain to find the records
Why are buckets useful?
- Query: get employees in (Toy Dept) ^ (2nd floor)
- With a Dept. index and a Floor index on EMP: intersect the Toy bucket and the 2nd-floor bucket to get the set of matching EMPs

Single Dimension: Conventional Index – Secondary Index - Inverted Indexes
Keyword index in documents
[Figure: inverted lists for the keywords "cat" and "dog", pointing into documents such as "...the cat is fat...", "...was raining cats and dogs...", and "...Fido the dog..."]
The idea is used in Information Retrieval.
Queries:
- Find articles with “cat” and “dog”
- Find articles with “cat” or “dog”
- Find articles with “cat” and not “dog”
- Find articles with “cat” in “title”
- Find articles with “cat” and “dog” within 5 words
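The first three queries above reduce to set operations on inverted lists. The sketch below builds an inverted index over the three document fragments from the figure; the document ids and the deliberately crude `stem` helper are assumptions for illustration.

```python
import re
from collections import defaultdict

docs = {
    "d1": "the cat is fat",
    "d2": "was raining cats and dogs",
    "d3": "Fido the dog",
}


def stem(word):
    """Very crude stemmer (assumption): lowercase and drop a trailing 's' on longer words."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


# Inverted index: term -> set of ids of the documents containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in re.findall(r"[a-z]+", text.lower()):
        inverted[stem(word)].add(doc_id)

cat, dog = inverted["cat"], inverted["dog"]
cat_and_dog = cat & dog          # articles with "cat" and "dog"
cat_or_dog = cat | dog           # articles with "cat" or "dog"
cat_not_dog = cat - dog          # articles with "cat" and not "dog"
print(sorted(cat_and_dog), sorted(cat_or_dog), sorted(cat_not_dog))
```

The last two queries in the list ("in title", "within 5 words") need each posting to carry a field type and a position as well, not just a document id.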

Single Dimension: Conventional Index – Secondary Index - Inverted Indexes
[Figure: inverted-list entries for "cat" and "dog" that record, for each occurrence, the document (d1, d2, d3), the type (Title, Author, Abstract), and the position]
Information Retrieval (IR) discussion:
- Stop words: words that are filtered out during NLP
- Stemming: the process of reducing derived words to their stem. Search engines treat words with the same stem as synonyms
- Thesaurus: related words, including synonyms, etc.
- Full text vs. abstract
- Vector model: algebraic model for representing a text document as a vector of identifiers

Single Dimension: B-tree
- A B-tree of degree “t” has a minimum of “t-1” keys and “t” child pointers per node
- A B-tree of degree “t” has a maximum of “2t-1” keys and “2t” children per node
- The root is an exception: it can have a minimum of “1” key and a maximum of “2t-1” keys
- Keys within a node are sorted in ascending order from left to right
- A leaf node has, for every key, a pointer to the record that contains that key, and a pointer to its right sibling
- B-tree non-leaf nodes have pointers to records as well
- Worst-case height of the tree is about log_t(N), where “N” is the total number of keys in the tree; best-case height is about log_2t(N)
- A B-tree is always balanced, i.e., every leaf is at the same distance from the root
- On insertion into a full node with “2t-1” keys, the node is first split into two nodes of “t-1” keys each, and the middle key is promoted to the parent node

Number of keys/children for a B-tree of degree “t”:
- Non-leaf (non-root): max 2t pointers, max 2t-1 keys; min t pointers, min t-1 keys
- Root: min 1 key
Number of keys/children for a B+ tree; the order m (# of children) is defined in terms of “n” (# of keys per node), m = n + 1:
- Non-leaf (non-root): max n+1 pointers, max n keys; min ⌈(n+1)/2⌉ pointers, min ⌈(n+1)/2⌉ - 1 keys
- Leaf: min ⌊(n+1)/2⌋ pointers to data
- Root: min 1 key

- On key deletion from a minimum node with “t-1” keys, the node merges with another node
- B-tree: intermediate nodes (neither root nor leaf) have keys
- B+ tree: record pointers are only in the leaves
- B+ trees are well suited to range searches, as all keys are sorted in ascending order across the leaves, left to right
- A leaf node has, for every key, a pointer to the record that contains that key, and also a pointer to its right sibling leaf node
[Figure: sample B+ tree with n = 3 (min = 1 key, max = 3 keys): root key 100; interior keys 30 and 120, 150, 180; leaf keys ranging from 3 to 200]

B+ tree Operations (n = 3)
[Figure: four insertion examples on a B+ tree with n = 3]
- Insert key = 32: goes into the leaf holding 30, 31
- Insert key = 7 (leaf overflow): the leaf holding 3, 5, 11 is full
- Insert key = 160 (non-leaf overflow)
- Insert key = 45 (new root)
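The two leaf-level cases (simple insert and leaf overflow) can be sketched as below. This is a simplified illustration, not a full B+ tree: it handles a single leaf, and it assumes the common textbook rule that the left leaf keeps the first ⌈(n+1)/2⌉ keys and the smallest key of the new right leaf is copied up to the parent.

```python
from bisect import insort
from math import ceil

N = 3   # maximum number of keys per node, as in the examples above


def insert_into_leaf(leaf, key, n=N):
    """Insert key into a sorted leaf.

    Returns (left, right, separator). Without overflow, right and separator are None.
    On overflow the leaf is split and separator is the key copied up to the parent.
    """
    keys = list(leaf)
    insort(keys, key)
    if len(keys) <= n:
        return keys, None, None
    split = ceil((n + 1) / 2)            # number of keys kept in the left leaf
    left, right = keys[:split], keys[split:]
    return left, right, right[0]


simple = insert_into_leaf([30, 31], 32)       # fits: no split
overflow = insert_into_leaf([3, 5, 11], 7)    # leaf overflow: split into two leaves
print(simple)
print(overflow)
```

If the parent is itself full when the separator arrives, the same kind of split repeats one level up (non-leaf overflow), and a split of the root creates a new root.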

Single Dimension: Hash Table
Two basic alternative types of hashing:
- key -> h(key) gives the bucket that holds the records themselves
- key -> h(key) gives the bucket of an index whose entries point to the records (used for a secondary index)
Buckets are typically 1 disk block.
Example hash function:
- Key = ‘x1 x2 … xn’, an n-byte character string
- Have b buckets
- H(key): add x1 + x2 + … + xn, then compute the sum modulo b
A good hash function produces an equal number of keys per bucket.
Sorting keys within a bucket is not required; it is a plus if searches are frequent and inserts/deletes are not.
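The example hash function above (sum of the key's bytes modulo b) is short enough to write out directly. The sample keys and the choice of b = 4 buckets are assumptions for illustration.

```python
def h(key, b):
    """Add the byte values x1 + x2 + ... + xn of the key and take the sum modulo b."""
    return sum(key.encode("ascii")) % b


b = 4
buckets = [[] for _ in range(b)]
for name in ["smith", "jones", "ford", "fred"]:
    buckets[h(name, b)].append(name)

print(h("smith", b), h("ford", b))
print(buckets)
```

Note that this function ignores character order, so anagrams always collide; that is one reason it is only a teaching example.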

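Example: assume 2 records per bucket and buckets 0 to 3
- INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, h(e) = 1
  Result: bucket 0 holds d; bucket 1 holds a, c, with e in an overflow block; bucket 2 holds b
- DELETE: e, f
  [Figure: buckets before and after the deletions; after a deletion, maybe move “g” up from the overflow block]
Rule of thumb: try to keep space utilization between 50% and 80%
- Utilization = (# keys used) / (total # keys that fit)
- If utilization < 50%, space is wasted; if utilization > 80%, overflow is significant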

How to deal with growth:
- Overflow and reorganization
- Dynamic hashing:
  - Extensible hashing: use i of the b bits output by the hash function (i grows over time), and use a directory that maps h(k)[i] to a bucket
  - Linear hashing: use the i low-order bits of the hash (i grows over time); the file grows linearly

Single Dimension: Extensible Hashing
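Use i of the b bits output by the hash function. Assume h(k) has b = 4 bits and 2 keys per bucket.
[Figure: with i = 1, one bucket holds 0001 and the other holds 1001, 1100. Inserting 1010 overflows the second bucket, so a new directory with i = 2 (00, 01, 10, 11) is created and that bucket is split into 1001, 1010 and 1100. Further inserts: 0111, 0000]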

Single Dimension: Extensible Hashing
Use i of the b bits output by the hash function (contd.). Assume h(k) has b = 4 bits and 2 keys per bucket.
[Figure: inserting 1001 overflows the bucket holding 1001, 1010, so the directory doubles from i = 2 (00, 01, 10, 11) to i = 3 (000 to 111)]
Deletion:
- No merging of blocks, or
- Merge blocks and cut the directory if possible (reverse of insert)

Single Dimension: Extensible Hashing Summary
+ Can handle growing files, with less wasted space and with no full reorganizations
- Indirection (not bad if the directory is in memory)
- Directory doubles in size (now it fits, now it does not)

Single Dimension: Linear Hashing
- Linear hashing is another type of dynamic hashing
- Since blocks cannot always be split, overflow blocks are allowed. However, the average number of overflow blocks per bucket will be much less than 1
- The number of buckets “n” is always chosen so that the average number of records per bucket is a fixed fraction, e.g., 80%, of the number of records that fill a bucket
- Use the “i” low-order bits of the b bits output by the hash function. Treat these bits “a1 a2 … ai” (a1 is the leftmost) as an integer m
- The number of bits used to number the bucket array is i = ⌈log2(n)⌉, where n is the current number of buckets
- If m < n, then bucket m exists
- If n ≤ m < 2^i, then bucket m does not exist yet, and we place the record in bucket m - 2^(i-1), which is the bucket we get if we set a1 to 0
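The bucket-selection rule above can be sketched as a small function. The name `bucket_for` is hypothetical, and buckets are assumed to be numbered 0 to n-1.

```python
from math import ceil, log2


def bucket_for(hash_value, n):
    """Return the bucket number (0 .. n-1) for a hash value when n buckets exist."""
    i = max(1, ceil(log2(n)))            # bits needed to number n buckets
    m = hash_value & ((1 << i) - 1)      # the i low-order bits, read as an integer
    if m < n:
        return m                         # bucket m exists
    return m - (1 << (i - 1))            # bucket m does not exist yet: clear the leading bit


# With 3 buckets (i = 2), hash 1111 has low bits 11 = 3; bucket 3 does not exist yet,
# so the record goes to bucket 01. Once a 4th bucket exists, it maps to bucket 11.
print(bucket_for(0b1111, 3), bucket_for(0b1111, 4))
print(bucket_for(0b1010, 2), bucket_for(0b1010, 3))
```

This is why adding bucket n only requires splitting one existing bucket (the one that was standing in for it), rather than rehashing the whole file.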

Linear hashing example: use the i low-order bits of the b bits output by the hash function.
Assume h(k) has b = 4 bits, i = 1, 2 keys per bucket, and threshold = 0.85 (an average of 0.85 * 2 = 1.7 records per bucket).
[Figure: bucket 0 holds 0000 and 1010, bucket 1 holds 1111; m = 1 is the max used bucket, and buckets 10 and 11 are future growth buckets. Inserts can create overflow chains!]
- Insert 0101: the inserted item “0101” fits, but ratio = 4/2 = 2, which is > 1.7; create a new bucket (n = 3); i = ⌈log2(3)⌉ = 2 bits
- Having item “0101” in bucket 1 fits, but ratio = 2/2 = 1 > 0.85 (exceeds threshold): create an overflow block for bucket “1”
- With i = 2 bits, the buckets are (00, 01, 10, 11):
  - Record “1111” in bucket “01” moves to bucket “11”
  - Record “0101” moves back into bucket “01”
  - Record “1010” moves to bucket “10”
- Ratio = 4/4 = 1 << 1.7; we are O.K.

How to grow beyond 4 buckets?
Expansion strategy:
- Keep track of U = (# used slots) / (total # of slots)
- If U > threshold, then increase m (and maybe i)
[Figure: buckets 00 to 11 holding 0000, 0101, 1010, 1111, with i = 2 and m = 11 (max used block); growth continues with buckets 100, 101, ... For now, i = 3 but the number of buckets is 6, not yet 8]

Indexes: Multidimensional Indexes

Multidimensional Indexes Overview
- Single-dimensional indexes assume a single search key
- Even though the search key can consist of multiple attributes, values must be provided for all of those attributes in order to use the index
- Multidimensional indexes do not assume that values are provided for all of the search key attributes
- While they can be used in a traditional RDBMS, specialized systems such as GIS (Geographic Information Systems) use data structures that support kinds of queries that are not common in SQL
Example: find records where Dept = “Toy” AND Salary > $50K
- 1st strategy: use one index, say on Dept, and check the salary of all output tuples
- 2nd strategy: use 2 indexes and manipulate pointers
- 3rd strategy: use a multiple-key index

Multiple-key Index
[Figure: a Dept index (Art, Sales, Toy) whose entries lead to Salary indexes (entries such as 10k, 12k, 15k, 17k, 19k, 21k). Example record: Name=Joe, DEPT=Sales, SAL=15k]

GIS (geographic data):
DATA: <X1, Y1, attributes>, <X2, Y2, attributes>, ...
Typical multidimensional queries:
- Partial match queries
- Range queries
- Nearest neighbor queries
- Where-am-I queries
Example queries:
- What city is at <Xi, Yi>?
- What is within 5 miles of <Xi, Yi>?
- Which is the closest point to <Xi, Yi>?

GIS Example:
[Figure: points a to o plotted in the X-Y plane, with an index tree that splits on X and Y values (5, 10, 15, 20, 25, 30, 35, 40)]
- Search for points near f
- Search for points near b
- Search for points with Yi > 20
- Find points close to z = <12, 38>

Multidimensional Indexes: Hash Structures
Grid Files Index:
[Figure: a grid with Key1 values V1, V2, ..., Vn on one axis and Key2 values X1, X2, ..., Xn on the other; each cell leads to the records with that combination, e.g., key1 = V3, key2 = X2]
Can quickly find records with:
- Key1 = Vi ^ Key2 = Xj
- Key1 = Vi
- Key2 = Xj
- Key1 ≥ Vi ^ Key2 < Xj
How is a grid index stored on disk?
- Like an array: needs regularity to be able to compute the position of <Vi, Xj>
- Use indirection: the grid only contains pointers to buckets, so the grid can be regular without wasting space; the penalty is the indirection
- The grid can work on value ranges as well

Multidimensional Indexes: Hash Structures
Multiple-key Index Summary
+ Good for multiple-key search
- Space, management overhead (nothing is free)
- Need partitioning ranges that evenly split keys

Multidimensional Indexes: Hash Structures
Partitioned Hash Index: the bucket number is formed from the bits of h1(Key1) followed by the bits of h2(Key2)
Example (buckets 000 to 111):
- h1(toy) = 0, h1(sales) = 1, h1(art) = 1
- h2(10k) = 01, h2(20k) = 11, h2(30k) = 01, h2(40k) = 00
Operations:
- Insert <Fred, toy, 40k>, <Joe, sales, 30k>, <Sally, art, 40k>
- Find Emp with Dept = Sales ^ Sal = $40K: look in bucket 100, which holds <Sally>
- Find Emp with Sal = $30K: look in buckets 001 and 101, which hold <Adam>, <Jan>, <Joe>, <John>
[Figure: buckets holding <Fred>, <Joe>, <John>, <Adam>, <Jan>, <Mary>, <Sally>, <Tom>, <Bill>, <Andy>]
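A minimal sketch of the partitioned hash example follows. It assumes the bucket number is the 1-bit h1 value followed by the 2-bit h2 value, and it models h1 and h2 as lookup tables holding exactly the values given above; the function names are hypothetical.

```python
H1 = {"toy": 0b0, "sales": 0b1, "art": 0b1}                   # 1 bit from Dept
H2 = {"10k": 0b01, "20k": 0b11, "30k": 0b01, "40k": 0b00}     # 2 bits from Salary


def bucket(dept, sal):
    """Concatenate the h1 bit and the two h2 bits into a 3-bit bucket number."""
    return (H1[dept] << 2) | H2[sal]


buckets = {b: [] for b in range(8)}
for name, dept, sal in [("Fred", "toy", "40k"), ("Joe", "sales", "30k"), ("Sally", "art", "40k")]:
    buckets[bucket(dept, sal)].append(name)


def buckets_for_salary(sal):
    """Partial match on salary only: the h1 bit is unknown, so both values are tried."""
    return [(bit << 2) | H2[sal] for bit in (0, 1)]


print(format(bucket("sales", "40k"), "03b"), buckets[bucket("sales", "40k")])
print([format(b, "03b") for b in buckets_for_salary("30k")])
```

A bucket only narrows the search: Sally's record lands in bucket 100 because h1(art) = h1(sales), so the records found there still have to be checked against the actual query values.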

Multidimensional Indexes: Tree Structures
- Multiple-key index: tree scheme where the nodes at each level are indexes for one attribute, similar to the earlier multiple-key index example
- K-d trees (k-dimensional trees): main-memory data structure generalizing the binary search tree to multidimensional data
  - Binary tree where each interior node has an attribute (e.g., age, salary, etc.) and a value V
  - The node splits the data into two parts: less than V, and greater than or equal to V
  - The attributes at different levels of the tree are different
  - The original flavor allows data points to be placed in interior nodes; another flavor allows data points only in leaf nodes
[Figure: kd-tree over <age, salary> points; interior nodes split alternately on salary and age (e.g., age 47, salary 300, age 38, salary 80, age 60), and leaves hold points such as <45, 60>, <50, 75>, <25, 60>, <50, 275>, <60, 260>, <25, 400>, <30, 260>]

Quad trees: each interior node corresponds to a square region in 2 dimensions, or to a k-dimensional cube in k dimensions.
- If a node (block) can hold all the data points in its square, then the node is a leaf; otherwise, we treat the square as an interior node, with children corresponding to its four quadrants
[Figure: quad tree for the previous <age, salary> example, with age from 0 to 100 and salary from 0 to 400K. The root splits at <50, 200> into the quadrants SW, SE, NW, NE; further splits occur at <75, 100> and <25, 300>. Points include <25, 60>, <45, 60>, <50, 75>, <50, 120>, <70, 110>, <85, 40>, <30, 260>, <50, 275>, <60, 260>]

R-trees (region trees): an R-tree captures the spirit of a B-tree for multidimensional data.
- Instead of keys, R-tree interior nodes have subregions (not data regions) that represent the contents of their children
- Subregions are allowed to overlap
- In principle, a region can be of any shape
- R-trees are ideal for “where-am-I” queries: we specify a data point and ask in which region or regions the point lies
- The root is associated with the entire region. We search among the children for the node/region that contains the point we are looking for

Indexes: Bitmap Indexes

Bitmap Indexes Overview
- Useful for indexing OLAP data
- Index on a particular column
- Each unique value in the column is represented by a bit position in a bit vector; bit operations are fast
- The length of each bit vector: # of unique values for that column in the base table
- The bit that is set in the ith bit-vector entry reflects the value in the ith row of the base table
- Not suitable for columns with high-cardinality domains (attributes that have many distinct values)
[Figure: example bitmap indexes: index on Region, index on Type]
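A bitmap index can be sketched with Python integers as bit vectors. This sketch stores one bitmap per distinct value (bit i set when row i has that value), which holds the same bits as the per-row layout described above, just transposed. The table contents and column values are hypothetical.

```python
# Hypothetical base table: one (region, type) pair per row.
rows = [
    ("North", "Retail"),
    ("South", "Dealer"),
    ("North", "Dealer"),
    ("West", "Retail"),
]


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is 1 when row i has that value."""
    index = {}
    for i, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << i)
    return index


region_index = build_bitmap_index(region for region, _ in rows)
type_index = build_bitmap_index(kind for _, kind in rows)


def matching_rows(bitmap):
    """Return the row numbers whose bit is set in the bitmap."""
    return [i for i in range(len(rows)) if (bitmap >> i) & 1]


# Region = North AND Type = Dealer is a single bitwise AND of two bitmaps.
north_dealers = matching_rows(region_index["North"] & type_index["Dealer"])
print(north_dealers)
```

The number of bitmaps equals the number of distinct values in the column, which is why the technique suits low-cardinality columns.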
