Published by Sølvi Bakke. Modified over 5 years ago.
1
CSE 6392 – Data Exploration and Analysis in Relational Databases
January 31, 2006
2
Example Problem
Suppose you had the following tables, Employee and Employee-Sample, each with the columns Gender and Salary.
3
Possible Queries
Some possible queries to get the average salary of all females in the company:
1. Select avg(salary) from Employee where gender = 'F'
2. Select avg(salary) from Employee-Sample where gender = 'F'
3. Select count(*) as C, sum(salary) as S, S/C from Employee-Sample where gender = 'F'
Is there a difference between 2 and 3 in terms of results? No.
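The three queries can be tried against an in-memory SQLite database. This is only a sketch: the table contents are randomly generated placeholder data, the table sizes are arbitrary, and the function name `average_female_salary` is hypothetical. Query 3 is written as two aggregates divided in Python, because SQLite does not allow the aliases C and S to be reused inside the same select list.

```python
import random
import sqlite3

SAMPLE = '"Employee-Sample"'  # the hyphen forces a quoted identifier in SQL


def average_female_salary(seed=0, population=10_000, sample_size=500):
    """Return (query 1, query 2, query 3) on synthetic placeholder data."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (gender TEXT, salary INTEGER)")
    conn.execute(f"CREATE TABLE {SAMPLE} (gender TEXT, salary INTEGER)")

    # Made-up rows: gender chosen uniformly, salary uniform in [30000, 120000).
    rows = [(rng.choice("FM"), rng.randrange(30_000, 120_000))
            for _ in range(population)]
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)
    # Uniform random sample without replacement.
    conn.executemany(f"INSERT INTO {SAMPLE} VALUES (?, ?)",
                     rng.sample(rows, sample_size))

    q1 = conn.execute(
        "SELECT avg(salary) FROM Employee WHERE gender = 'F'").fetchone()[0]
    q2 = conn.execute(
        f"SELECT avg(salary) FROM {SAMPLE} WHERE gender = 'F'").fetchone()[0]
    c, s = conn.execute(
        f"SELECT count(*), sum(salary) FROM {SAMPLE} WHERE gender = 'F'"
    ).fetchone()
    q3 = s / c  # same quantity as query 2, computed as sum / count
    conn.close()
    return q1, q2, q3


q1, q2, q3 = average_female_salary()
print(f"exact: {q1:.2f}  sample avg: {q2:.2f}  sample sum/count: {q3:.2f}")
```

Queries 2 and 3 agree because avg is defined as sum divided by count; both differ from query 1 only by sampling error.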
4
Estimator
What is an estimator?
An estimator is a value computed from a sample that approximates a quantity of the whole population. Ex. (count in the sample) * (population size / sample size). On the previous slide, 2 and 3 are estimators for 1.
What is an unbiased estimator?
Basically, an estimator that is not tilted towards the lower or higher side of the quantity being estimated.
Formally: an estimator $\hat{x}$ for some quantity $x$ is unbiased if $E[\hat{x}] = x$.
5
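Unbiased Estimators
Example
select count(*) as FC from Employee where gender = 'F'
select count(*) * (N/n) as EFC from Employee-Sample where gender = 'F'
EFC is an unbiased estimator of FC when Employee-Sample is a uniform random sample of Employee. Here N is the number of rows in Employee and n is the number of rows in Employee-Sample; (N/n) is called the 'ratio scale'.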
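Unbiasedness of the scaled count can be checked exactly on a toy population by averaging the estimate over every possible sample. The sketch below assumes uniform sampling without replacement with a fixed sample size; the eight-row gender list and the function name `expected_scaled_count` are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations


def expected_scaled_count(genders, n):
    """Average of count('F' in sample) * (N/n) over all size-n samples."""
    N = len(genders)
    samples = list(combinations(range(N), n))
    estimates = [
        Fraction(N, n) * sum(1 for i in sample if genders[i] == "F")
        for sample in samples
    ]
    # Every sample is equally likely, so the expectation is the plain mean.
    return sum(estimates) / len(samples)


genders = ["F", "M", "F", "F", "M", "M", "F", "M"]  # hypothetical population
true_count = genders.count("F")
print(expected_scaled_count(genders, 3), true_count)
```

Using `Fraction` keeps the arithmetic exact, so the expected value can be compared to the true count with `==` rather than a tolerance.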
6
Unbiased Estimators (1)
Example
select sum(salary) as TFS from Employee where gender = 'F'
select sum(salary) * (N/n) as ETFS from Employee-Sample where gender = 'F'
ETFS is an unbiased estimator of TFS.
Note: Unbiasedness is important to statisticians, but secondary for our purposes; we are more concerned about the error.
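Since the error matters more than unbiasedness here, one way to look at it is the root-mean-square error of the scaled sum over all possible samples. This is a sketch under the assumption of uniform sampling without replacement; the rows and the function name `rmse_of_scaled_sum` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def rmse_of_scaled_sum(rows, n):
    """Root-mean-square error of sum(salary) * (N/n) over all size-n samples."""
    N = len(rows)
    true_total = sum(salary for gender, salary in rows if gender == "F")
    squared_errors = []
    for sample in combinations(rows, n):
        estimate = (N / n) * sum(s for g, s in sample if g == "F")
        squared_errors.append((estimate - true_total) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical (gender, salary) rows.
rows = [("F", 50), ("M", 70), ("F", 65), ("F", 40),
        ("M", 90), ("M", 55), ("F", 80), ("M", 60)]
for n in (2, 4, 6, 8):
    print(n, round(rmse_of_scaled_sum(rows, n), 2))
```

The estimator is unbiased at every sample size, yet its error shrinks as n grows and reaches zero when the sample is the whole table.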
7
Unbiased Estimators (2)
Example
Select avg(salary) as AFS from Employee where gender = 'F'
Select count(*) as C, sum(salary) as S, S/C as EAFS from Employee-Sample where gender = 'F'
Is EAFS unbiased? Not necessarily. S/C equals ETFS/EFC because the factor (N/n) cancels, so EAFS is a ratio of two unbiased estimators, and such a ratio is not guaranteed to be unbiased (ratio estimation).
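The general point about ratios can be illustrated on a toy population of (x, y) pairs that is unrelated to the salary tables. In this sketch the numbers and the function name `ratio_estimator_expectations` are made up, and sampling is assumed uniform without replacement. Both scaled sums are unbiased, but the expectation of their ratio differs from the true ratio.

```python
from fractions import Fraction
from itertools import combinations


def ratio_estimator_expectations(xs, ys, n):
    """Exact E[X_hat], E[Y_hat] and E[X_hat / Y_hat] over all size-n samples."""
    N = len(xs)
    scale = Fraction(N, n)
    samples = list(combinations(range(N), n))
    x_hats = [scale * sum(xs[i] for i in s) for s in samples]
    y_hats = [scale * sum(ys[i] for i in s) for s in samples]
    ratios = [x / y for x, y in zip(x_hats, y_hats)]
    k = len(samples)
    return sum(x_hats) / k, sum(y_hats) / k, sum(ratios) / k


xs = [1, 2, 3, 10]  # hypothetical numerator values
ys = [1, 1, 2, 2]   # hypothetical denominator values
e_x, e_y, e_ratio = ratio_estimator_expectations(xs, ys, 2)
print("E[X_hat] =", e_x, " true X =", sum(xs))
print("E[Y_hat] =", e_y, " true Y =", sum(ys))
print("E[X_hat/Y_hat] =", e_ratio, " true X/Y =", Fraction(sum(xs), sum(ys)))
```

The gap appears because the expectation of a quotient is in general not the quotient of the expectations.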
8
Probability
Example: roll a die. How many times will you get 1, 2, 3, 4, 5 or 6?
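The die question can be explored with a quick simulation. This is a sketch assuming a fair die; the number of rolls and the function name `roll_counts` are arbitrary choices.

```python
import random
from collections import Counter


def roll_counts(trials=60_000, seed=0):
    """Count how often each face appears in `trials` rolls of a fair die."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) for _ in range(trials))


counts = roll_counts()
for face in range(1, 7):
    print(face, counts[face])
```

Each face should appear roughly trials/6 times, with small random deviations around that value.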
9
Probability Density
What is the probability that a random number generator will generate exactly .43 (of numbers between 0 and 1)? Answer: 0% (1/infinity).
What about a number between .43 and .53? Answer: 10% (1/10).
The total area under the probability density curve (its integral) is 1, and the probability of an interval is the area under the curve over that interval. Any single number has a 0% probability, but an interval has a nonzero chance.
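The interval-versus-point distinction can be seen empirically. The sketch below assumes a uniform generator on [0, 1); the function name `interval_fraction` and the trial count are illustrative.

```python
import random


def interval_fraction(lo, hi, trials=100_000, seed=0):
    """Return (fraction of draws in [lo, hi), number of draws equal to lo)."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(trials)]
    in_interval = sum(lo <= x < hi for x in draws) / trials
    # Floats have finite precision, so an exact hit is not strictly
    # impossible on a computer, only so unlikely that none is expected.
    exact_hits = sum(x == lo for x in draws)
    return in_interval, exact_hits


fraction, hits = interval_fraction(0.43, 0.53)
print(f"fraction in [.43, .53): {fraction:.4f}, exact hits of .43: {hits}")
```

The fraction lands near 0.10, the width of the interval, which is the area under the uniform density over that interval.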
10
Probability Density Function
A density $f$ is a proper distribution if its integral over the whole range equals 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
11
Probability Example
How many female employees (out of 50K employees)?
12
Probability Sample
If we sampled another company where the actual number of females is 5K, the variance would decrease.
13
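Relative Error
In Approximate Query Processing, people use absolute error for statistical analysis, but relative error in practice.
$\text{relative error}^2 = \dfrac{(ETFC - TFC)^2}{TFC^2}$
Here TFC is the true value and ETFC is its estimate, following the E-prefix naming of the earlier slides.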
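As a small worked example of the formula, the sketch below computes both error measures. The function names and the numbers 5200 and 5000 are hypothetical.

```python
def absolute_error(estimate, truth):
    """Absolute error |estimate - truth|."""
    return abs(estimate - truth)


def relative_error(estimate, truth):
    """Relative error |estimate - truth| / |truth|; undefined when truth is 0."""
    if truth == 0:
        raise ValueError("relative error is undefined when the true value is 0")
    return abs(estimate - truth) / abs(truth)


estimate, truth = 5200, 5000  # made-up estimated and true counts
print(absolute_error(estimate, truth))       # off by 200
print(relative_error(estimate, truth))       # off by 4% of the true value
print(relative_error(estimate, truth) ** 2)  # the squared form on the slide
```

An absolute error of 200 means little by itself; relative to a true value of 5000 it is a 4% error, which is why the relative form is the one people usually quote.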
14
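Central Limit Theorem
The main point of this theorem is that it does not matter how the data was originally distributed: as the sample size grows, the distribution of the sample mean (or sum) approaches a normal distribution.
Normal distribution: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$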
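A simulation can make the theorem concrete. This sketch draws samples from a strongly skewed exponential distribution (mean 1, standard deviation 1) and checks how many sample means fall within one standard error of the true mean; for a normal distribution that share is about 68%. The function name, sample size, and repeat count are arbitrary choices.

```python
import random


def fraction_within_one_se(sample_size=100, repeats=5_000, seed=0):
    """Share of sample means within one standard error of the true mean."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and std. dev. of an exponential with rate 1
    standard_error = sigma / sample_size ** 0.5
    within = 0
    for _ in range(repeats):
        sample_mean = sum(rng.expovariate(1.0)
                          for _ in range(sample_size)) / sample_size
        if abs(sample_mean - mu) <= standard_error:
            within += 1
    return within / repeats


print(fraction_within_one_se())
```

Even though individual exponential draws are far from normal, the sample means behave approximately like a normal variable centered on the true mean.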