Indexing, Chapter 8, 11/19/2018
Database Index
Indexes are used to find rows with specific column values quickly. Without an index, the DBMS must begin with the first row and read through the entire table to find the relevant rows; the larger the table, the more this costs. If the table has an index on the columns in question, the DBMS can quickly determine the position to seek to in the middle of the data file without having to look at all the data. Most DBMS indexes (PRIMARY KEY, UNIQUE, INDEX) are stored in B-trees.
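The difference between a full scan and an index lookup can be observed directly. The following is a minimal sketch using Python's built-in sqlite3 module; the table customers, its columns, and the index name idx_customers_last_name are made up for illustration, and other DBMSs expose their query plans through different commands.

```python
import sqlite3

# In-memory database with a hypothetical table (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (last_name, city) VALUES (?, ?)",
    [(f"name{i}", f"city{i % 10}") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM customers WHERE last_name = 'name500'"

before = plan(query)  # no index on last_name yet: SQLite scans the table
conn.execute("CREATE INDEX idx_customers_last_name ON customers (last_name)")
after = plan(query)   # the same query can now search via the index

print("without index:", before)
print("with index:   ", after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should report a scan of the table and the second a search that uses the new index.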
Create index
An index can be created on a table to find data more quickly and efficiently. Users cannot see the indexes; they are only used to speed up searches and queries. Updating a table with indexes takes more time than updating a table without them, because the indexes must be updated as well. So you should create indexes only on columns (and tables) that will be searched frequently.
Create index
SQL CREATE INDEX Syntax
Creates an index on a table. Duplicate values are allowed:
    CREATE INDEX index_name ON table_name (column_name)
SQL CREATE UNIQUE INDEX Syntax
Creates a unique index on a table. Duplicate values are not allowed:
    CREATE UNIQUE INDEX index_name ON table_name (column_name)
The primary key of the table should have an index. The more often an attribute is used in queries, the better a candidate it is for an index.
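To see the two statements in action, here is a small sketch that runs them against an in-memory SQLite database from Python. The table employees and both index names are hypothetical; the point is only that the plain index accepts duplicate values while the unique index rejects them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT, email TEXT)")

# Plain index: duplicate last names are allowed.
conn.execute("CREATE INDEX idx_employees_last_name ON employees (last_name)")
# Unique index: duplicate e-mail addresses are not allowed.
conn.execute("CREATE UNIQUE INDEX idx_employees_email ON employees (email)")

conn.execute("INSERT INTO employees (last_name, email) VALUES ('Smith', 'a@example.com')")
# Same last_name again: accepted, because idx_employees_last_name is not unique.
conn.execute("INSERT INTO employees (last_name, email) VALUES ('Smith', 'b@example.com')")

try:
    # Same email again: rejected by the unique index.
    conn.execute("INSERT INTO employees (last_name, email) VALUES ('Jones', 'a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(duplicate_rejected, row_count)
```

A unique index therefore doubles as a constraint: it speeds up lookups on the column and also enforces that no two rows share a value.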
Drop Index
    DROP INDEX <index name>
Despite the importance of indexes to DBMS performance, indexes are not part of the SQL standard. The rationale is that creating indexes belongs to the physical storage and access of the data, whereas the SQL standard is limited to the logical description of the data. However, any production-grade DBMS must have indexes, and most provide a mechanism for you to add your own. If the syntax presented here does not work for you, check your DBMS documentation.
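As one concrete case, the sketch below drops an index in SQLite, which accepts the form shown on this slide. The table orders and the index idx_orders_customer are invented for the example. Because the syntax is vendor-specific, other systems may differ; MySQL, for instance, expects DROP INDEX index_name ON table_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and index, for illustration only.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")


def index_names():
    """List the indexes SQLite has recorded for the orders table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'orders'"
    )
    return [r[0] for r in rows]


before = index_names()
conn.execute("DROP INDEX idx_orders_customer")  # SQLite form: no table name needed
after = index_names()

print(before, after)
```

Dropping the index removes only the access structure; the rows in the table are untouched.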
B-Tree Example
Operations
B-Tree of order 4: each node has at most 4 pointers and 3 keys, and at least 2 pointers and 1 key.
Insert: 5, 3, 21, 9, 1, 13, 2, 7, 10, 12, 4, 8
Delete: 2, 21, 10, 3, 4
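The node limits above can be written down as a checker. This is a sketch under assumptions that go slightly beyond the slide: nodes are represented as plain dictionaries with "keys" and an optional "children" list, the root is allowed to be an empty leaf, and all leaves are required to sit at the same depth. The function name check is illustrative.

```python
MAX_KEYS = 3  # order 4: at most 4 pointers and 3 keys per node
MIN_KEYS = 1  # at least 2 pointers and 1 key per node


def check(node, lo=None, hi=None, is_root=True):
    """Return the height of a valid order-4 B-tree; raise ValueError otherwise."""
    keys = node["keys"]
    children = node.get("children", [])
    # Assumption: only a root that is also a leaf may be empty.
    min_keys = MIN_KEYS if (children or not is_root) else 0
    if not min_keys <= len(keys) <= MAX_KEYS:
        raise ValueError(f"node has {len(keys)} keys: {keys}")
    if any(a >= b for a, b in zip(keys, keys[1:])):
        raise ValueError(f"keys not strictly increasing: {keys}")
    if any((lo is not None and k <= lo) or (hi is not None and k >= hi) for k in keys):
        raise ValueError(f"key outside the range set by the parent: {keys}")
    if not children:
        return 1  # a leaf has height 1
    if len(children) != len(keys) + 1:
        raise ValueError(f"{len(keys)} keys but {len(children)} pointers")
    # Child i may only hold keys strictly between bounds[i] and bounds[i + 1].
    bounds = [lo] + keys + [hi]
    heights = {
        check(child, bounds[i], bounds[i + 1], is_root=False)
        for i, child in enumerate(children)
    }
    if len(heights) != 1:
        raise ValueError("leaves are at different depths")
    return heights.pop() + 1


# A small valid tree: root [9] with leaves [3, 5] and [21].
good = {"keys": [9], "children": [{"keys": [3, 5]}, {"keys": [21]}]}
# An invalid tree: the left leaf holds 4 keys, one more than allowed.
bad = {"keys": [9], "children": [{"keys": [1, 2, 3, 5]}, {"keys": [21]}]}

print(check(good))
try:
    check(bad)
except ValueError as err:
    print("invalid:", err)
```

A checker like this is handy for verifying each intermediate tree in a hand-worked insertion or deletion sequence.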
Insert 5, 3, 21
  After 5:   a: [5]
  After 3:   a: [3, 5]
  After 21:  a: [3, 5, 21]
Insert 9
  a: [9]
  b: [3, 5]   c: [21]
Node a splits, creating two children, b and c.
Insert 1, 13
  a: [9]
  b: [1, 3, 5]   c: [13, 21]
Nodes b and c have room for more elements.
Insert 2
  a: [3, 9]
  b: [1, 2]   d: [5]   c: [13, 21]
Node b has no more room, so it splits, creating node d.
Insert 7, 10
  a: [3, 9]
  b: [1, 2]   d: [5, 7]   c: [10, 13, 21]
Nodes d and c have room for more elements.
Insert 12
  a: [3, 9, 13]
  b: [1, 2]   d: [5, 7]   c: [10, 12]   e: [21]
Node c must split into nodes c and e.
Insert 4
  a: [3, 9, 13]
  b: [1, 2]   d: [4, 5, 7]   c: [10, 12]   e: [21]
Node d has room for another element.
Insert 8
  a: [9]
  f: [3, 7]   g: [13]
  b: [1, 2]   d: [4, 5]   h: [8]   c: [10, 12]   e: [21]
Node d must split into 2 nodes. This causes node a to split into 2 nodes, and the tree grows a level.
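The insertion sequence on these slides can be replayed in code. The sketch below is one possible implementation, not the only one: it inserts into a leaf first and splits a node only after it overflows to 4 keys, promoting the third key to the parent. That choice of promoted key is an assumption made to match the splits shown on the slides. The names Node, insert, and levels are illustrative.

```python
from bisect import bisect_left

MAX_KEYS = 3  # order 4: at most 4 pointers and 3 keys per node


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # an empty list marks a leaf


def _insert(node, key):
    """Insert key below node. Return (promoted_key, new_right_node) on a split."""
    i = bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(i, key)  # leaf: place the key in sorted position
    else:
        split = _insert(node.children[i], key)
        if split:
            promoted, right = split
            node.keys.insert(i, promoted)
            node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    # Overflow (4 keys): promote the third key, keep two keys on the left
    # and one on the right, as in the slides.
    mid = 2
    promoted = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return promoted, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split:
        promoted, right = split
        return Node([promoted], [root, right])  # the tree grows one level
    return root


def levels(root):
    """Return the keys of every node, grouped level by level from the root."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


tree = Node()
for k in [5, 3, 21, 9, 1, 13, 2, 7, 10, 12, 4, 8]:
    tree = insert(tree, k)

for depth, nodes in enumerate(levels(tree)):
    print(depth, nodes)
```

After all twelve insertions the sketch yields a root holding 9, a middle level with [3, 7] and [13], and the leaves [1, 2], [4, 5], [8], [10, 12], [21], which is the tree shown after inserting 8.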
Delete 2
  a: [9]
  f: [3, 7]   g: [13]
  b: [1]   d: [4, 5]   h: [8]   c: [10, 12]   e: [21]
Node b can lose an element without underflow.
Delete 21
  a: [9]
  f: [3, 7]   g: [12]
  b: [1]   d: [4, 5]   h: [8]   c: [10]   e: [13]
Deleting 21 causes node e to underflow, so elements are redistributed between nodes c, g, and e: 12 moves up from c into g, and 13 moves down from g into e.
Delete 10
  a: [3, 7, 9]
  b: [1]   d: [4, 5]   h: [8]   e: [12, 13]
Deleting 10 causes node c to underflow. This causes the parent, node g, to recombine with nodes f and a, so the tree shrinks by one level.
Delete 3
  a: [4, 7, 9]
  b: [1]   d: [5]   h: [8]   e: [12, 13]
Because 3 is a key in node a that separates the nodes below it, deleting 3 requires keys to be redistributed between nodes a and d: 4 moves up from d to take its place.
Delete 4
  a: [7, 9]
  b: [1, 5]   h: [8]   e: [12, 13]
Deleting 4 requires a redistribution of the keys in the subtrees of 4; however, nodes b and d do not have enough keys to redistribute without causing an underflow. Thus, nodes b and d must be combined.
Ordered Index
Two Types of Indices
Ordered index (primary index or clustering index): used to access data sorted by order of values.
Hash index (secondary index or non-clustering index): used to access data that is distributed uniformly across a range of buckets.
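The practical difference between the two kinds of index can be sketched in a few lines of Python. Everything here is illustrative: the rows, the bucket count of 4, and the function names are assumptions, and a sorted list with binary search stands in for the ordered index that a real DBMS would typically keep in a B-tree.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table rows: (key, value). A row's position in the list plays
# the role of its location in the data file.
rows = [(5, "Eve"), (3, "Cal"), (21, "Uma"), (9, "Ira"), (1, "Abe"), (13, "Mia")]

# Ordered index: (key, row position) pairs kept sorted by key.
ordered = sorted((key, pos) for pos, (key, _) in enumerate(rows))
ordered_keys = [key for key, _ in ordered]


def range_lookup(lo, hi):
    """Return all rows with lo <= key <= hi, in key order."""
    i = bisect_left(ordered_keys, lo)
    j = bisect_right(ordered_keys, hi)
    return [rows[pos] for _, pos in ordered[i:j]]


# Hash index: keys are spread over a fixed range of buckets.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]
for pos, (key, _) in enumerate(rows):
    buckets[hash(key) % NUM_BUCKETS].append((key, pos))


def equality_lookup(key):
    """Return all rows whose key equals the given key."""
    bucket = buckets[hash(key) % NUM_BUCKETS]  # only one bucket is examined
    return [rows[pos] for k, pos in bucket if k == key]


print(range_lookup(4, 13))
print(equality_lookup(9))
```

The ordered index answers range queries by walking a contiguous run of sorted keys, while the hash index goes straight to one bucket for an exact match but offers no help with ranges, since neighbouring keys land in unrelated buckets.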
Hash Index