Data Warehousing Seminar Chapter 13 Indexing the Warehouse M.S. 2 Hyeyoung Cho
Oracle Storage
Logical: Database > Tablespace > Segment > Extent > Oracle block
Physical: Data file > OS block
Data block and Row
Data block: header, free space, row data
Row: row header, then a column length and column value for each column
What Is an Index?
A structure separate from the table
Stores the location of rows based on the specified column values
Speeds up the retrieval of rows by using a pointer
Reduces disk I/O by using a rapid path access method to locate the data quickly
Used and maintained automatically by the Oracle Server
When to create an Index?
The table is large
The columns are often used as conditions in queries
The column contains a wide range of values or a large number of null values
Most queries are expected to retrieve less than 5% of the rows
An index is created automatically when you define a PRIMARY KEY or UNIQUE constraint
Classification of Indexes
Logical (application perspective)
  Single column or concatenated
  Unique or nonunique
  Function-based
Physical (storing perspective)
  B-tree: normal or reverse key
  Bitmap
  Partitioned or nonpartitioned
Single-column and Composite index
Single-column index: one column in the index key
Composite index: multiple columns in the index key
  Max: 32 columns, 1/3 of the data block size
create index purchase1 on purchase (purchase_id)
  storage (initial 2m next 2m pctincrease 0)
  tablespace purch_ind1;
create index purchase1 on purchase (purchase_id, purchase_date, total_amt)
  storage (initial 2m next 2m pctincrease 0)
  tablespace purch_ind1;
Unique or nonunique
Unique: a single key points to only one row
Nonunique: a single key is associated with multiple rows
Function-based indexes(1/3)
Oracle 8i new feature
Query rewrite privilege required
Built on functions or expressions that involve one or more columns in the table
Precomputes the value of the function or expression and stores it in the index
Created as either a B-tree or a Bitmap index
Function-based indexes(2/3)
Example 1. Client names in mixed case
Index creation
create index billing_upcl on billing (upper(client))
  storage (initial 20m next 80m maxextents unlimited pctincrease 0)
  tablespace my_indexes;
Statement predicate
select bill_id, client, state_nm from billing
  where upper(client) = 'MONSANTO';
Function-based indexes(3/3)
Example 2. Commission exceeded 25% of salary for a certain time period
Index creation
create index sale_bc_amt on sale (comm/(base+comm)*100) …
  storage (initial 20m next 80m maxextents unlimited pctincrease 0)
  tablespace my_indexes;
Statement predicate
select sum(comm) from sale
  where comm/(base+comm) * 100 > 25
  and tr_date between to_date('01-MAY-2002', 'DD-MON-YYYY')
                  and to_date('30-JUN-2002', 'DD-MON-YYYY');
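The idea behind Example 1 can be tried without an Oracle instance. The following is a minimal sketch that uses SQLite's expression indexes (through Python's sqlite3 module) as a stand-in for an Oracle function-based index. The sample rows are invented for illustration, the storage and tablespace clauses are dropped because SQLite has no equivalent, and the query plan text is SQLite's, not Oracle's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table billing (bill_id integer, client text, state_nm text)")
# Hypothetical sample rows with client names in mixed case.
conn.executemany(
    "insert into billing values (?, ?, ?)",
    [(1, "Monsanto", "MO"), (2, "MONSANTO", "IL"), (3, "Acme", "NY")],
)
# Expression index: SQLite's counterpart of a function-based index.
conn.execute("create index billing_upcl on billing (upper(client))")

query = "select bill_id from billing where upper(client) = 'MONSANTO'"
matching_ids = sorted(row[0] for row in conn.execute(query))
# The last column of each plan row holds the human-readable plan step.
plan = " ".join(row[3] for row in conn.execute("explain query plan " + query))

print(matching_ids)
print(plan)
```

The predicate must repeat the indexed expression exactly, as in the slide's query, for the index to be considered.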
B-tree indexes(1/3)
Traditional indexing technique!
Stores a list of ROWIDs for each key
A hierarchy of higher-level and lower-level index blocks (root > branch > leaf)
Leaf Entry Format
  Header: chaining info, row lock status, number of columns
  Key column length and value pairs
  ROWID: the ROWID of the row that contains the key values (block num., row num., file num.)
Simplicity, easy maintenance, suited to high-cardinality columns
Suitable for exact match queries and range queries
B-tree indexes(2/3)
Structure (figure): a root block points to branch blocks, which point to leaf blocks. Each index entry in a leaf block consists of an index entry header, key column length, key column value, and ROWID.
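To see why an ordered leaf level serves both exact match and range queries, here is a minimal sketch that models only the leaf entries as a sorted list of (key, ROWID) pairs and searches them with binary search. It is not Oracle's block format: the root and branch levels are replaced by Python's bisect module, and the ROWID strings are hypothetical placeholders.

```python
from bisect import bisect_left, bisect_right

# Leaf level of a hypothetical index on employee(last_name):
# (key, rowid) pairs kept in key order. Rowids are placeholders.
leaf_entries = sorted([
    ("ALLEN", "r1"), ("JONES", "r4"), ("MARTIN", "r5"),
    ("SMITH", "r2"), ("WARD", "r3"),
])
keys = [key for key, _ in leaf_entries]

def exact_match(key):
    """Return the rowids of all entries whose key equals `key`."""
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [rowid for _, rowid in leaf_entries[lo:hi]]

def range_scan(low, high):
    """Return the rowids of all entries with low <= key <= high."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return [rowid for _, rowid in leaf_entries[lo:hi]]

print(exact_match("SMITH"))
print(range_scan("J", "N"))
```

Because neighbouring keys sit next to each other, a range query is one search for the lower bound followed by a sequential walk.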
B-tree indexes(3/3)
Creating Normal B-Tree Indexes
Create index employee_last_name_idx on employee(last_name)
  pctfree 30
  storage (initial 200k next 200k pctincrease 0 maxextents 50)
  tablespace indx;
Syntax
Create [UNIQUE] index [schema.] index
  on [schema.] table(column [ASC | DESC] [, column [ASC | DESC] ] … )
  [TABLESPACE tablespace]
  [PCTFREE integer]
  [INITRANS integer]
  [MAXTRANS integer]
  [storage-clause]
  [LOGGING | NOLOGGING]
  [NOSORT]
Reverse Key indexes(1/3)
Reverses the bytes of each indexed column (except the ROWID)
Spreads the workload across multiple blocks
Unsuitable for range queries
Use the keyword reverse
Reverse Key indexes(2/3)
Index on EMPLOYEE(ID)
KEY (ID)   ROWID (BLOCK# ROW# FILE#)
--------   -------------------------
1257       0000000000F.0002.0001
2877       0000000000F.0006.0001
4567       0000000000F.0004.0001
6657       0000000000F.0003.0001
8967       0000000000F.0005.0001
…          …
EMPLOYEE table
ID     FIRST_NAME   JOB
----   ----------   --------
7499   ALLEN        SALESMAN
7369   SMITH        CLERK
7521   WARD         SALESMAN
7566   JONES        MANAGER
7654   MARTIN       SALESMAN
…      …            …
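The index keys in the example are the employee IDs read backwards (7521 is stored as 1257). A minimal sketch of that transformation follows. It reverses decimal digits, as the slide's example does, whereas Oracle reverses the bytes of the stored column value; the run of sequential IDs is made up for illustration.

```python
def reverse_key(key):
    """Reverse the decimal digits of a key, mirroring the slide's example."""
    return str(key)[::-1]

# IDs taken from the EMPLOYEE table on the slide.
slide_ids = [7521, 7566, 7654]
slide_keys = [reverse_key(i) for i in slide_ids]

# Hypothetical run of sequentially generated IDs.
sequential_ids = [7521, 7522, 7523, 7524]
reversed_keys = [reverse_key(i) for i in sequential_ids]

print(slide_keys)
print(reversed_keys)
```

Consecutive IDs now differ in their leading character, so inserts land in different parts of the index instead of the same rightmost leaf block. The same scattering is why range queries on the original values cannot use the index order.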
Reverse Key indexes(3/3)
Creating Reverse key index
Create unique index orders_id_idx on orders(id) reverse
  pctfree 30
  storage (initial 200k next 200k pctincrease 0 maxextents 50)
  tablespace indx;
Syntax
Create [UNIQUE] index [schema.] index
  on [schema.] table(column [ASC | DESC] [, column [ASC | DESC] ] … )
  [TABLESPACE tablespace]
  [PCTFREE integer]
  [INITRANS integer]
  [MAXTRANS integer]
  [storage-clause]
  [LOGGING | NOLOGGING]
  [NOSORT]
  REVERSE
Bitmap indexes(1/4)
Stores a bitmap for each key value
For low-cardinality columns
Leaf Entry Format
  Header: chaining information, row lock status, number of columns
  Key column length and value pairs
  Start ROWID, End ROWID: the first row and the last row pointed to by the bitmap (block num., row num., file num.)
  Bitmap: a string of bits, each set when the corresponding row contains the key value
Create bitmap index person_region on person (region);
Bitmap indexes(2/4)
Structure (figure labels): Table (File 3, Block 10, Block 11), Index
Create bitmap index person_region on person (region);
Key      Start ROWID   End ROWID   Bitmap
Blue     10.0.3        12.8.3      1000100100010
Green    10.0.3        12.8.3      0001010000100
Red      10.0.3        12.8.3      0100000011000
Yellow   10.0.3        12.8.3      0010001000001
Bitmap indexes(3/4)
Creating Bitmap index
Create bitmap index person_region on person(region)
  pctfree 30
  storage (initial 200k next 200k pctincrease 0 maxextents 50)
  tablespace indx;
Syntax
Create BITMAP index [schema.] index
  on [schema.] table(column [ASC | DESC] [, column [ASC | DESC] ] … )
  [TABLESPACE tablespace]
  [PCTFREE integer]
  [INITRANS integer]
  [MAXTRANS integer]
  [storage-clause]
  [LOGGING | NOLOGGING]
  [NOSORT]
Bitmap indexes(4/4)
Example: a bitmap index on the PERSON table
REGION Bitmap Index
Row   Region   North Bitmap   East Bitmap   West Bitmap   South Bitmap
1     North    1              0             0             0
2     East     0              1             0             0
3     West     0              0             1             0
4     West     0              0             1             0
5     East     0              1             0             0
6     West     0              0             1             0
7     South    0              0             0             1
8     North    1              0             0             0
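The table above can be reproduced with a few lines of code. This is a minimal sketch of the concept only: it keeps one list of bits per region and answers an OR predicate by OR-ing bitmaps. It does not model Oracle's compressed storage or ROWID ranges, and the function names are hypothetical.

```python
# Region column of the eight PERSON rows shown on the slide.
regions = ["North", "East", "West", "West", "East", "West", "South", "North"]

def build_bitmaps(values):
    """Build one bitmap (list of 0/1) per distinct value."""
    bitmaps = {}
    for position, value in enumerate(values):
        bitmaps.setdefault(value, [0] * len(values))[position] = 1
    return bitmaps

def rows_matching_any(bitmaps, keys):
    """Answer `region = k1 OR region = k2 ...` by OR-ing the bitmaps."""
    combined = [0] * len(next(iter(bitmaps.values())))
    for key in keys:
        combined = [a | b for a, b in zip(combined, bitmaps[key])]
    # Row numbers are 1-based, as in the slide's table.
    return [i + 1 for i, bit in enumerate(combined) if bit]

bitmaps = build_bitmaps(regions)
print(bitmaps["North"])
print(rows_matching_any(bitmaps, ["North", "South"]))
```

The OR is a cheap bitwise operation over the whole column, which is why bitmap indexes handle OR predicates well.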
B-Tree index VS Bitmap index
B-tree                                        Bitmap
Suitable for high-cardinality columns         Suitable for low-cardinality columns
Row-level locking                             Bitmap-segment-level locking
Updates on keys relatively inexpensive        Updates on keys very expensive
More storage                                  Less storage
Inefficient for queries using OR predicates   Efficient for queries using OR predicates
Useful for OLTP                               Useful for data warehousing
B-Tree space VS Bitmap space
A bitmap index can use as little as 1/100 of the space of a B-tree index!
Table with 1,000,000 rows
Unique Column Values   Cardinality(%)   B-Tree Space   Bitmap Space
500,000                50.00            15.29          12.35
100,000                10.00            15.21          5.25
10,000                 1.00             14.34          2.99
100                    0.01             13.40          1.38
5                      < 0.01                          0.78
Index-organized tables(IOT)(1/3)
Merge the data and index pieces into the same segment
No duplication of the values for the key column
Faster key-based access for queries involving exact match and range searches
Must have a primary key
Specify an overflow tablespace name and percentage
Secondary indexes (Oracle 8i new feature)
Index-organized tables(IOT)(2/3)
Figure: regular table access goes from the index to a ROWID and then to the table. IOT access needs only one scan, because the index itself holds the row header, key columns, and non-key columns.
Index-organized tables(IOT)(3/3)
Creating Index-organized table
create table sales (
  office_cd number(3),
  qtr_end date,
  revenue number(10,2),
  review varchar2(1000),
  constraint sales_pk PRIMARY KEY (office_cd, qtr_end))
  ORGANIZATION INDEX tablespace indx
  PCTTHRESHOLD 20 INCLUDING revenue
  OVERFLOW tablespace user_data;
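A rough analogue of an index-organized table can be tried in SQLite, whose WITHOUT ROWID tables also store each row inside the primary key B-tree. The following is a minimal sketch under that assumption. The column types are simplified, the sample rows are invented, and the Oracle-specific clauses (ORGANIZATION INDEX, PCTTHRESHOLD, INCLUDING, OVERFLOW) have no counterpart here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID keeps the whole row in the primary key B-tree,
# which is the closest SQLite feature to an index-organized table.
conn.execute("""
    create table sales (
        office_cd integer,
        qtr_end   text,
        revenue   real,
        review    text,
        constraint sales_pk primary key (office_cd, qtr_end)
    ) without rowid
""")
# Hypothetical sample rows.
conn.executemany("insert into sales values (?, ?, ?, ?)", [
    (1, "2002-03-31", 100.0, "ok"),
    (1, "2002-06-30", 120.0, "good"),
    (2, "2002-03-31", 80.0, "slow"),
])

# Exact match on the full key: one B-tree lookup returns the row itself.
row = conn.execute(
    "select revenue from sales where office_cd = ? and qtr_end = ?",
    (1, "2002-06-30"),
).fetchone()

# Range search on the leading key column: rows come back in key order.
range_rows = conn.execute(
    "select qtr_end, revenue from sales where office_cd = 1 order by qtr_end"
).fetchall()

print(row)
print(range_rows)
```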
Indexes on Partitioned Tables(1/4)
An index in several segments
Spread across many tablespaces
Decreases contention for index lookups
Increases manageability and scalability
Used with partitioned tables
Creates an index partition for each table partition
Indexes on Partitioned Tables(2/4)
Local index: the partition keys of the index match those of its underlying table
Global index: the partition keys of the index differ from those of its underlying table
Prefixed index: the left-most column in a partitioned index matches the left-most column in that index's partition key
Nonprefixed index: the left-most column in a partitioned index differs from the left-most column in that index's partition key
Indexes on Partitioned Tables(3/4)
Creating a local partitioned index
create table rumors(
  thorn_id number(10),
  rumor_id number(4), …)
  partition by range (rumor_id)
  (partition rumors_p001 values less than(41),
   partition rumors_p002 values less than(50),
   …
   partition rumors_pmax values less than(maxvalue));
create unique index rumors_u1 on rumors(thorn_id, rumor_id)
  local
  (partition rumors_u1_p001,
   partition rumors_u1_p002,
   …
   partition rumors_u1_pmax);
Indexes on Partitioned Tables(4/4)
Creating a global partitioned index
create table billing(
  bill_id number(10),
  region_id varchar2(3), …)
  partition by range (bill_id)
  (partition bill_p001 values less than(90000),
   partition bill_p002 values less than(130000),
   …
   partition bill_pmax values less than(maxvalue));
create unique index billing_u1 on billing(bill_id)
  global partition by range (bill_id)
  (partition bill_u1_p001 values less than(100000),
   partition bill_u1_p002 values less than(200000),
   …
   partition bill_u1_pmax values less than(maxvalue));
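Range partitioning assigns a key to the first partition whose "values less than" bound exceeds it. Here is a minimal sketch of that routing rule, using only the rumors partitions that the local index example spells out. The partitions elided by "…" are left out, so the result for larger IDs is an assumption of this sketch, and partition_for is a hypothetical helper, not an Oracle function.

```python
from bisect import bisect_right

# Upper bounds ("values less than") of the partitions named on the slide.
# Partitions hidden behind the ellipsis are not modelled here.
bounds = [41, 50]
names = ["rumors_p001", "rumors_p002", "rumors_pmax"]

def partition_for(rumor_id):
    """Return the partition whose range contains rumor_id."""
    # bisect_right counts the bounds that are <= rumor_id, which is the
    # index of the first partition whose upper bound is strictly greater.
    return names[bisect_right(bounds, rumor_id)]

print(partition_for(40))
print(partition_for(41))
print(partition_for(50))
```

With a local index, the same routing picks the index partition, since each index partition covers exactly one table partition. With a global index, the index has its own list of bounds.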
Optimizer Histograms(1/2)
Rule-based optimizer
  Uses a set of rules for ranking access paths
  Syntax- and data dictionary-driven
Cost-based optimizer
  Chooses the least-cost (resource, time) path
  Statistics-driven
analyze table table_name compute statistics;
Optimizer Histograms(2/2)
Describe the data distribution of a particular column in more detail
Better predicate selectivity estimates for unevenly distributed data
Bucket: the number of distinct column values
analyze table table_name compute statistics
  for table for all indexed columns size 6;
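To illustrate why a histogram helps with unevenly distributed data, the following minimal sketch builds a height-balanced histogram with 6 buckets (matching size 6 above) over a made-up skewed column. It is a simplified model, not Oracle's internal format: each bucket covers the same number of rows, and only the value at the end of each bucket is recorded.

```python
from collections import Counter

def height_balanced_endpoints(values, buckets):
    """Return the last value of each equal-sized bucket of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // buckets - 1] for i in range(1, buckets + 1)]

# Hypothetical skewed column: value 1 fills 9 of 12 rows.
values = [1] * 9 + [2, 3, 4]
endpoints = height_balanced_endpoints(values, 6)
endpoint_counts = Counter(endpoints)

# Without a histogram, a uniform assumption over 4 distinct values
# would estimate 1/4 of the rows for any single value.
uniform_estimate = 1 / len(set(values))
# A value that ends several buckets is popular: estimate by bucket share.
histogram_estimate = endpoint_counts[1] / len(endpoints)

print(endpoints)
print(uniform_estimate, histogram_estimate)
```

The value 1 closes four of the six buckets, so the histogram-based estimate (about 0.67) is much closer to the true share of 0.75 than the uniform estimate of 0.25.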
Optimizer Histograms(2/2) http://www.akadia.com/services/oratips/costbased_optimizer/optm.htm
Guidelines(1)
Use NOLOGGING for large index creation
Rebuilding an index
  Use a different tablespace
  Converting an index into a reverse key index
Index build times with and without NOLOGGING
Rows in Table   Indexes   With NOLOGGING   Without NOLOGGING
46,340          13        57s              3m57s
1,094,814       6         10m5s            24m36s
4,4013,309      4         27m54s           60m48s
ALTER INDEX orders_region_id_idx REBUILD REVERSE
  TABLESPACE indx02 NOLOGGING;
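The build times in the table can be turned into speedup factors with a little arithmetic. This is a minimal sketch that parses the durations exactly as printed on the slide; to_seconds is a hypothetical helper written for this format only.

```python
def to_seconds(text):
    """Parse durations such as '3m57s' or '57s' into seconds."""
    minutes, _, rest = text.rpartition("m")
    return int(minutes or 0) * 60 + int(rest.rstrip("s"))

# (with NOLOGGING, without NOLOGGING) for the three rows of the table.
timings = [("57s", "3m57s"), ("10m5s", "24m36s"), ("27m54s", "60m48s")]

speedups = [
    round(to_seconds(without) / to_seconds(with_nologging), 2)
    for with_nologging, without in timings
]
print(speedups)
```

On these figures, building with NOLOGGING is roughly two to four times faster.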
Guidelines(2)
Temporary workspace
  Created during the life of a create index statement
  Dropped after the activity completes
Sort space parameter
  SORT_AREA_SIZE
Shared pool parameter
  shared_pool_size = 10000000
SGA(System Global Area)
Figure: Oracle instance and database
Oracle Instance
  Background processes: SMON, DBW0, PMON, CKPT, LGWR
  Memory structures: SGA (System Global Area)
    Data buffer cache
    Redo log buffer
    Shared pool: library cache, data dictionary cache
Database
PGA (Program Global Area): sort area, cursor state, session info, stack space
User process > Server process (sql)