
Confidence Intervals. Underlying model with an unknown parameter. We know how to calculate point estimates, e.g. by regression analysis. But different data would change our estimates, so we treat our estimates as random variables. We want a measure of how confident we are in our estimate: calculate a “confidence interval”.

What is it? If we know how the data were sampled, we can construct a confidence interval (C.I.) for an unknown parameter, θ. A 95% C.I. gives a range such that the true θ is in the interval 95% of the time. A 100(1-α)% C.I. captures the true θ a fraction (1-α) of the time. The smaller α is, the more sure we are that the true θ falls in the interval, but the wider the interval.
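The "95% of the time" statement can be checked by simulation. The sketch below is only an illustration under assumed values: it draws many Normal samples with a known mean, builds a t-based interval from each, and counts how often the interval contains the true mean. The true mean (50), standard deviation (15), sample size (8), and the helper names `t_interval` and `coverage` are all assumptions made for this example; the critical value 2.365 is the 7 d.f. value used later in the slides.

```python
import random
import statistics

def t_interval(sample, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * standard error of the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5
    return mean - t_crit * std_err, mean + t_crit * std_err

def coverage(true_mean, true_sd, n, t_crit, trials, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = t_interval(sample, t_crit)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / trials

# Assumed settings: Normal data, mean 50, sd 15, n = 8, t_97.5 = 2.365 (7 d.f.)
rate = coverage(true_mean=50.0, true_sd=15.0, n=8, t_crit=2.365, trials=20000)
print(f"empirical coverage: {rate:.3f}")
```

The printed fraction should sit close to 0.95; using a smaller critical value narrows each interval and lowers the coverage.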

Example 1: Lead in Water. Lead in drinking water causes serious health problems. To test for contamination, we require a control site. Problem: what is the lead concentration at the control site? Estimate a 95% confidence interval.

Example 2: Gas Market. Recall the U.S. gas market question: by how much does gas consumption decrease when price increases? Our linear model: estimate of β1 is -.04237. How confident are we in this estimate? Construct a 90% C.I. for this estimate.

If Data ~ N(μ, σ²). Since we don’t know σ, use the t-distribution. 95% C.I. for μ: x̄ ± t_97.5 · s, where s is the standard error of the mean and t_97.5 is the critical value of the t distribution. (Draw on board: Prob = 2.5% in each tail.)
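A short derivation may help explain why the 97.5th percentile appears in a 95% interval. The block below is a sketch under the slide's Normal-data assumption. The symbol T for the standardized estimate and n for the number of data points are introduced here for illustration, and x̄ denotes the sample mean.

```latex
% T is the standardized sample mean; s is the standard error of the mean.
T = \frac{\bar{x} - \mu}{s} \sim t_{n-1}

% 2.5% of the probability lies in each tail beyond \pm t_{97.5}:
P\left(-t_{97.5} \le T \le t_{97.5}\right) = 0.95

% Rearranging the inequality for \mu gives the interval:
P\left(\bar{x} - t_{97.5}\, s \;\le\; \mu \;\le\; \bar{x} + t_{97.5}\, s\right) = 0.95
```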

t-distribution. Similar to the Normal distribution, but requires “degrees of freedom” (d.f.). d.f. = (# data points) - (# variables). E.g. mean of lead concentration: 8 samples, one variable, so d.f. = 7. The higher the d.f., the closer t is to the Normal distribution.
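To see how the critical value depends on the degrees of freedom, here is a minimal sketch that uses only the Python standard library. It integrates the t density numerically (Simpson's rule) and inverts the result by bisection. The function names `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical helpers written for this illustration; a statistics package would normally supply these values.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, via Simpson's rule on [0, x] plus the left half."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Critical value t with P(T <= t) = p, for 0.5 < p < 1, by bisection."""
    lo, hi = 0.0, 50.0  # assumed bracket, wide enough for the cases below
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (7, 31, 1000):
    print(f"d.f. = {df:4d}: t_97.5 = {t_quantile(0.975, df):.3f}")
print(f"Normal     : z_97.5 = {NormalDist().inv_cdf(0.975):.3f}")
```

As the degrees of freedom grow, the t critical value shrinks toward the Normal one, which is the sense in which t gets "closer" to the Normal distribution.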

If Distribution Unknown. Can use “bootstrapping”: 1. Draw a large sample from the data, with replacement. 2. Calculate its mean. 3. Repeat many times. 4. Draw a histogram of the sample means. 5. Calculate the empirical 95% C.I. from that histogram. Requires no previous knowledge of the underlying process.
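The bootstrap steps above can be sketched as follows. This is a minimal percentile-bootstrap illustration, not the procedure actually used for the slides: it assumes each resample has the same size as the original data, and the measurements in `sample` are made-up placeholder values (the slides do not list the raw lead data). The function name `bootstrap_ci` is hypothetical.

```python
import random
import statistics

def bootstrap_ci(data, level=0.95, n_resamples=10000, seed=0):
    """Percentile bootstrap C.I. for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Steps 1-3: resample with replacement, take the mean, repeat many times.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Steps 4-5: read the interval off the sorted resampled means
    # (the sorted list plays the role of the histogram).
    alpha = 1 - level
    lower_idx = round((alpha / 2) * n_resamples)
    upper_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lower_idx], means[upper_idx]

# Hypothetical measurements, for illustration only (NOT the lead data).
sample = [38.2, 61.5, 47.0, 55.3, 42.8, 66.1, 49.9, 50.4]
lower, upper = bootstrap_ci(sample)
print(f"bootstrap 95% C.I. for the mean: [{lower:.2f}, {upper:.2f}]")
```

No distributional assumption enters the calculation; the data themselves stand in for the unknown distribution.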

Lead Concentration. 8 lead measurements: mean = 51.39, s = 5.75, t_97.5 = 2.365. Lower = 51.39 - (5.75)(2.365); Upper = 51.39 + (5.75)(2.365). C.I. = [37.8, 65.0]. Using bootstrapped samples: C.I. = [40.8, 62.08].
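The t-based arithmetic on this slide can be reproduced in a few lines. The sketch below uses the summary values quoted above (mean, standard error s, and critical value); the helper name `t_confidence_interval` is hypothetical.

```python
def t_confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

# Values from the slide: mean = 51.39, s = 5.75, t_97.5 = 2.365 (7 d.f.)
lower, upper = t_confidence_interval(51.39, 5.75, 2.365)
print(f"95% C.I. = [{lower:.1f}, {upper:.1f}]")  # prints: 95% C.I. = [37.8, 65.0]
```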

Gas Regression: S-Plus
Coefficients:
                  Value  Std. Error     t value   Pr(>|t|)
(Intercept)  -0.0898134   0.0507787  -1.7687217  0.0867802
PG           -0.0423712   0.0098406  -4.3057672  0.0001551
Y             0.0001587   0.0000068  23.4188561  0.0000000
PNC          -0.1013809   0.0617077  -1.6429209  0.1105058
PUC          -0.0432496   0.0241442  -1.7913093  0.0830122
Residual standard error: 0.02680668 on 31 degrees of freedom
Multiple R-Squared: 0.9678838
F-statistic: 233.5615 on 4 and 31 degrees of freedom, the p-value is 0

Gas Price Response. b_1 = -.04237, s = .00984. 90% C.I.: t_95 = 1.695 (d.f. = 36 - 5 = 31, matching the regression output). C.I. = [-.0591, -.0257]. Using bootstrapped samples: C.I. = [-.063, -.026]. Response is probably between 2.5 gallons and 6 gallons.
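The same calculation extends to every coefficient in the S-Plus output. The sketch below takes the values and standard errors from that output and the critical value t_95 = 1.695 quoted on this slide (the S-Plus output reports 31 residual degrees of freedom), then builds a 90% interval for each coefficient. The names `coefficients` and `interval` are hypothetical.

```python
T_95 = 1.695  # critical value quoted on the slide

# (value, standard error) pairs copied from the S-Plus coefficient table
coefficients = {
    "(Intercept)": (-0.0898134, 0.0507787),
    "PG": (-0.0423712, 0.0098406),
    "Y": (0.0001587, 0.0000068),
    "PNC": (-0.1013809, 0.0617077),
    "PUC": (-0.0432496, 0.0241442),
}

def interval(value, std_error, t_crit=T_95):
    """Return (lower, upper) = value -/+ t_crit * std_error."""
    margin = t_crit * std_error
    return value - margin, value + margin

intervals = {name: interval(v, se) for name, (v, se) in coefficients.items()}

for name, (lo, hi) in intervals.items():
    # An interval that excludes 0 matches a two-sided p-value below 0.10.
    note = "excludes 0" if lo > 0 or hi < 0 else "contains 0"
    print(f"{name:12s} [{lo: .5f}, {hi: .5f}]  {note}")
```

Comparing the "contains 0" flags with the Pr(>|t|) column gives a quick consistency check between the intervals and the reported p-values.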

Interpretation & Other Facts. There is a 95% chance that the true average lead concentration lies in this range. There is a 90% chance that the true value of β1 lies in this range. One can also calculate a “confidence region” for 2 or more variables.
