Statistics Profile For Query Optimization

WENYI NI, Spring 2004, CSE8330 Presentation, 05/01/04

Introduction: What is a statistics profile?
Every object has its own status, and to know that status we need statistics. The relation between a statistics profile and statistics: the status of the objects in a database is summarized by statistics, and a statistics profile is the object that collects those statistics.

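When does a DBMS use a statistics profile?
During query optimization: the optimizer's cost model uses the statistics to estimate the cost of alternative execution plans. (Figure: cost model, from M. Tamer Özsu.)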

What does a statistics profile collect?
- The central tendency of the data
- The range of the data
- The size of the data
- The distribution of the data
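As a minimal sketch, assuming a column is available as a Python list of numbers, the four kinds of statistics above could be computed as follows. The helper name `profile_column` and the use of mean and median as the measures of central tendency are illustrative assumptions, not part of the presentation.

```python
from collections import Counter
from statistics import mean, median


def profile_column(values):
    """Summarize one column with the four kinds of statistics (hypothetical helper)."""
    return {
        # Central tendency: mean and median of the column.
        "central_tendency": {"mean": mean(values), "median": median(values)},
        # Range: smallest and largest value.
        "range": (min(values), max(values)),
        # Size: number of values in the column.
        "size": len(values),
        # Distribution: how often each distinct value occurs.
        "distribution": Counter(values),
    }


if __name__ == "__main__":
    print(profile_column([55, 72, 72, 81, 90]))
```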

Common types of statistics profile
- Table profile
- Attribute profile
- Index profile

Typical profiles
Table profile: Cardinality 500; Row size 30; Pages 100; Number of attributes 6
Attribute profile: Values 100; Max value; Min value; Size 5; Data distribution: skew
Index profile: Pages 50; Size 5; Distinct values
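One way to hold these profiles in code is sketched below. The class and field names are hypothetical, and fields that have no value on the slide are left as `None`. The `rows_per_page` helper shows one derived figure an optimizer might read off a table profile: 500 rows on 100 pages gives 5 rows per page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableProfile:
    cardinality: int      # number of rows
    row_size: int         # size of one row
    pages: int            # pages occupied by the table
    num_attributes: int

    def rows_per_page(self) -> float:
        # Average number of rows stored on one page.
        return self.cardinality / self.pages


@dataclass
class AttributeProfile:
    values: int                      # the slide's "Values" figure
    size: int
    distribution: str                # e.g. "skew" or "uniform"
    max_value: Optional[int] = None  # not given on the slide
    min_value: Optional[int] = None  # not given on the slide


@dataclass
class IndexProfile:
    pages: int
    size: int
    distinct_values: Optional[int] = None  # not given on the slide


table = TableProfile(cardinality=500, row_size=30, pages=100, num_attributes=6)
attribute = AttributeProfile(values=100, size=5, distribution="skew")
index = IndexProfile(pages=50, size=5)
```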

Three ways to collect statistics
- Exhaustive accumulation
- Sampling
- Piggyback

Exhaustive accumulation
Compute every statistic descriptor by scanning the related object exhaustively.
Advantage: most accurate.
Disadvantage: heavy system load.
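A minimal sketch of exhaustive accumulation, assuming the table is an iterable of dicts: every row is read once, so the statistics are exact, but the cost grows with the whole table. The function name and data layout are illustrative assumptions.

```python
def exhaustive_profile(rows, column):
    """Scan every row once and compute exact statistics for one column.

    `rows` is assumed to be an iterable of dicts (hypothetical layout).
    """
    count = 0
    lo = hi = None
    distinct = set()
    for row in rows:  # full scan: every row is read
        v = row[column]
        count += 1
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
        distinct.add(v)
    return {"cardinality": count, "min": lo, "max": hi, "distinct": len(distinct)}
```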

Sampling
Scan only part of the related object and estimate the statistics from the sample data.
Advantage: low system overhead.
Disadvantage: there is still some overhead, and the statistics are not 100% accurate.
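The sketch below illustrates the trade-off, assuming the table is a Python list: only `sample_size` rows are examined, and the fraction that satisfies a predicate is scaled up to the full table. The toy table and the name `sampled_selectivity` are hypothetical; the fixed seed only makes the run repeatable.

```python
import random


def sampled_selectivity(table, predicate, sample_size, seed=None):
    """Estimate how many rows satisfy `predicate` from a random sample."""
    rng = random.Random(seed)
    sample = rng.sample(table, sample_size)  # sampling without replacement
    hits = sum(1 for v in sample if predicate(v))
    fraction = hits / sample_size
    return fraction * len(table)  # scale the sample fraction up to the table


scores = list(range(1000))  # toy table: exactly 400 values are >= 600
estimate = sampled_selectivity(scores, lambda v: v >= 600, sample_size=200, seed=1)
print(estimate)  # close to 400, but usually not exact
```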

Piggyback
Collect statistics from data that queries have already brought into memory. The SQL statement is changed slightly so that the query makes full use of these data.
Types of piggyback:
- Vertical piggyback
- Horizontal piggyback
- Mixed piggyback

Vertical piggyback
Include extra columns during query processing.
Example: SELECT student.name FROM student;
rewritten to: SELECT student.name, student.age FROM student;
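A minimal sketch of the vertical rewrite using Python's built-in `sqlite3` module: the rewritten query fetches `age` as well, the statistics collector counts ages, and the caller still receives only the `name` column. The toy table contents, the `pid` column, and the helper names are assumptions for illustration.

```python
import sqlite3
from collections import Counter


def make_student_db():
    # Hypothetical toy table for the example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (pid INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO student (name, age) VALUES (?, ?)",
        [("Ann", 20), ("Bob", 22), ("Cy", 20), ("Di", 23)],
    )
    return conn


def vertical_piggyback(conn):
    # Original query:  SELECT student.name FROM student;
    # Rewritten query also fetches student.age for the statistics collector.
    cursor = conn.execute("SELECT student.name, student.age FROM student")
    age_counts = Counter()
    answer = []
    for name, age in cursor:
        age_counts[age] += 1     # statistics gathered on the extra column
        answer.append((name,))   # the user still sees only the name column
    return answer, age_counts


conn = make_student_db()
answer, age_counts = vertical_piggyback(conn)
```

Because the extra column lives in the same rows the query already reads, the added cost is mostly CPU rather than I/O.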

This adds no extra I/O, but it does add extra CPU load. Solution: set a piggyback level.
AC1 = { x | x is a column of table Ri referenced by query Q }
AC2 = { x | x is an index column of table Ri } - AC1
AC3 = { x | x is a column of table Ri that is part of the primary key or a foreign key, or is referenced by a foreign key } - AC2
AC4 = { x | x is a column of table Ri } - AC3
Advantage: the piggyback level can be chosen according to the CPU load.
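The sketch below computes the column sets for one table. It rests on two assumptions that the slide does not state: each set excludes every column already placed in an earlier set (so AC1 through AC4 partition the table's columns), and piggyback level k reads the columns of AC1 through ACk. The table, its columns, and the function names are hypothetical.

```python
def piggyback_levels(all_cols, query_cols, index_cols, key_cols):
    """Split a table's columns into AC1..AC4.

    Assumption: each set excludes all columns placed in earlier sets.
    """
    ac1 = set(query_cols)
    ac2 = set(index_cols) - ac1
    ac3 = set(key_cols) - ac1 - ac2
    ac4 = set(all_cols) - ac1 - ac2 - ac3
    return [ac1, ac2, ac3, ac4]


def columns_at_level(levels, k):
    """Columns read at piggyback level k (1..4), assumed to be AC1 through ACk."""
    chosen = set()
    for s in levels[:k]:
        chosen |= s
    return chosen


# Hypothetical student table: the query touches name, score is indexed,
# pid is the primary key and dept_id is a foreign key.
levels = piggyback_levels(
    all_cols=["pid", "name", "age", "score", "dept_id", "email"],
    query_cols=["name"],
    index_cols=["score"],
    key_cols=["pid", "dept_id"],
)
```

Under this reading, a lightly loaded system could run at level 4 and profile every column, while a busy one could stay at level 1.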

Horizontal piggyback
Include extra rows during query processing.
Example: SELECT student.name, student.score FROM student WHERE score > 60;
rewritten to: SELECT student.name, student.score FROM student WHERE score > 60 OR student.pid IN (SELECT student.pid FROM student WHERE score > 60);
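A simple way to realize the same idea is to relax the predicate so that more rows are read, collect statistics over every fetched row, and then re-apply the original predicate before returning results. The sketch below does this with `sqlite3` and a hypothetical relaxed threshold of 50. It is a swapped-in variant for illustration, not the rewrite shown on the slide, and the toy data are assumptions.

```python
import sqlite3
from collections import Counter


def make_student_db():
    # Hypothetical toy table for the example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (pid INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO student (name, score) VALUES (?, ?)",
        [("Ann", 45), ("Bob", 58), ("Cy", 65), ("Di", 72), ("Ed", 91)],
    )
    return conn


def horizontal_piggyback(conn, relaxed_threshold=50):
    # Original query: SELECT name, score FROM student WHERE score > 60;
    # The relaxed threshold must be <= 60 so no answer rows are lost.
    cursor = conn.execute(
        "SELECT student.name, student.score FROM student WHERE score > ?",
        (relaxed_threshold,),
    )
    histogram = Counter()
    answer = []
    for name, score in cursor:
        histogram[score // 10 * 10] += 1  # width-10 buckets over all fetched rows
        if score > 60:                     # re-apply the user's original predicate
            answer.append((name, score))
    return answer, histogram


conn = make_student_db()
answer, histogram = horizontal_piggyback(conn)
```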

Mixed piggyback
Use both the vertical and the horizontal piggyback methods.
Advantage:
- Vertical piggybacking increases the quantity of the updated statistics.
- Horizontal piggybacking increases the quality of the updated statistics.
- Mixed piggybacking expands the query in both directions, so both the quantity and the quality can be improved in one query.

Value distribution
Why do we need it? Example: SELECT * FROM Student WHERE score > 60; what is the size of the result?
Attribute profile: score
Max value: 100; Min value: 0; Size: 10; Values: 101
Distribution table:
0~9: 1%
10~19: 1%
20~29: 1%
30~39: 3%
40~49: 6%
50~59: 10%
60~69: 10%
70~79: 31%
80~89: 30%
90~100: 10%

Answer: Size = 500 * 0.81 * 30 = 12,150, where 500 is the cardinality of the student table, 0.81 is the fraction of scores in the buckets 60~69 through 90~100 (10% + 31% + 30% + 10%), and 30 is the size of each record. In other words, about 405 rows of size 30 each.
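The same arithmetic as a small sketch. The bucket percentages are copied from the distribution table; the function name is hypothetical. As in the worked answer, every bucket whose low end is at least the bound is counted in full, and integer arithmetic avoids floating-point rounding.

```python
# Distribution table for student.score, in whole percent.
SCORE_DISTRIBUTION = {
    (0, 9): 1, (10, 19): 1, (20, 29): 1, (30, 39): 3, (40, 49): 6,
    (50, 59): 10, (60, 69): 10, (70, 79): 31, (80, 89): 30, (90, 100): 10,
}


def estimate_result_size(cardinality, row_size, distribution, lower_bound):
    """Estimate rows and size for `WHERE score > lower_bound` from bucket percentages.

    Buckets whose low end is >= lower_bound count in full; the rest are ignored.
    """
    percent = sum(p for (low, _high), p in distribution.items() if low >= lower_bound)
    rows = cardinality * percent // 100
    return rows, rows * row_size


rows, size = estimate_result_size(500, 30, SCORE_DISTRIBUTION, 60)
print(rows, size)  # 405 12150
```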

How do we get the distribution table?
From a histogram, which can be built in two ways:
- Equal width
- Equal height

Number of buckets = 1 + log2(n) [Sturges' rule, 1926]
Example: student table (500 records): 1 + log2(500) ≈ 9.97, i.e. 10 buckets.
For equal width, put each value into the bucket whose range contains it.
For equal height, sort the values; if the sample size is m, set the bucket height k = m / (number of buckets) and fill the buckets in order, k values per bucket.
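A minimal sketch of both construction methods plus Sturges' rule, assuming the values are a Python list of numbers. Rounding Sturges' result up is an assumption, and `equal_height` expects at least as many values as buckets. The function names are hypothetical.

```python
import math


def sturges_buckets(n):
    """Number of buckets by Sturges' rule: 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))


def equal_width(values, buckets, low, high):
    """Count values in `buckets` ranges of equal width covering [low, high]."""
    width = (high - low) / buckets
    counts = [0] * buckets
    for v in values:
        i = min(int((v - low) / width), buckets - 1)  # `high` goes in the last bucket
        counts[i] += 1
    return counts


def equal_height(values, buckets):
    """Split the sorted values into `buckets` runs of (nearly) equal size.

    Returns (first value, last value, count) for each bucket.
    Requires len(values) >= buckets so that no bucket is empty.
    """
    s = sorted(values)
    m = len(s)
    result = []
    for i in range(buckets):
        chunk = s[i * m // buckets:(i + 1) * m // buckets]
        result.append((chunk[0], chunk[-1], len(chunk)))
    return result
```

Equal-width buckets are cheap to build but can leave most values in a few buckets when the data are skewed; equal-height buckets adapt their boundaries to the data at the cost of a sort.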

Sampling: how many samples do we need?
A sample size of 1064 gives an error rate below 10% with 99% probability (Mannino 1988).
To keep the same error rate across tables of different sizes, the sampling rate drops as the table grows. Drop rate: log(n)/n.
Example: 20 samples give a 2% error rate on a table with 100 records, so we need 1000 * 0.2 * (1 - log(1000)/1000) samples to reach a 2% error rate on a table with 1000 records.
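The example's arithmetic can be evaluated directly. The sketch below applies the formula literally and assumes `log` means base-10 logarithm, which the slide does not specify; it reproduces the example only and is not offered as a general sample-size rule. The function name is hypothetical.

```python
import math


def slide_sample_estimate(small_n, small_samples, large_n):
    """Evaluate the example formula literally (log assumed to be base 10).

    large_n * (small_samples / small_n) * (1 - log(large_n) / large_n)
    """
    rate = small_samples / small_n
    return large_n * rate * (1 - math.log10(large_n) / large_n)


print(slide_sample_estimate(100, 20, 1000))  # about 199.4
```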

Summary & Future work
- Low overhead
- Low error rate, though there is still room for improvement
- Estimating the size of project and join operations from statistics still needs to be improved.

The end
