Interval Estimation

We are interested in methods that produce an interval. Common interval methods are: confidence intervals, prediction intervals, tolerance intervals, and credibility/probability intervals (Bayesian). Provided the assumptions of a method are satisfied, the interval it produces covers the true value of the parameter with (approximate) probability at least $1-\alpha$.

Confidence Intervals

$\theta$ is a parameter we are interested in, and we assume we don't know its true value (e.g. a mean, a sd, a proportion, etc.). Consider an experiment that will collect a sample of data. Then, BEFORE we collect the data, we can devise a procedure such that

$$P\left(\hat{\theta}_L \le \theta \le \hat{\theta}_U\right) \ge 1-\alpha,$$

where $\hat{\theta}_L$ and $\hat{\theta}_U$ are estimates we will get from the sample we have yet to collect.

Confidence Intervals

To get actual numerical values for $\hat{\theta}_L$ and $\hat{\theta}_U$, we perform the experiment and plug in the data. The outcomes for this experiment are: $\theta$ is in $[\hat{\theta}_L, \hat{\theta}_U]$, or $\theta$ is not in $[\hat{\theta}_L, \hat{\theta}_U]$. Under the frequentist definition, probabilities (other than 0 or 1) exist only for outcomes of experiments that haven't happened yet.

Confidence Intervals

Once the data are collected, the "probability" of the outcome is either 0 (the outcome did not happen) or 1 (the outcome did happen). This is true even if you don't know what the outcome was: for a realized CI something happened, we just can't tell which outcome it was if we don't know the true value of $\theta$. So we cannot say that $\theta$ is in the specific, realized interval with probability greater than or equal to $1-\alpha$. What we could say is: "considering the data we've collected, $[\hat{\theta}_L, \hat{\theta}_U]$ is a set of plausible values for $\theta$." But that's a mouthful, so let's make up a new word: confidence.

Confidence Intervals

Given a sample of data, the $(1-\alpha)\times 100\%$ confidence interval for a parameter $\theta$ estimated on the sample is $[\hat{\theta}_L, \hat{\theta}_U]$, and we say: "We are $(1-\alpha)\times 100\%$ confident that the true value of $\theta$ is covered by $[\hat{\theta}_L, \hat{\theta}_U]$." The CI's level of confidence, $(1-\alpha)\times 100\%$, is the same "number" as the CI method's probability of producing an interval that covers $\theta$, but confidence is not probability. Confidence says something about the "plausibility" of $\theta$ being one of the values in the measured interval; the "amount of plausibility" is the confidence.
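The following is a minimal simulation sketch in R, the language of the `qnorm`/`qt` calls used in these slides. It illustrates the "before the data" probability statement: a hypothetical experiment is repeated many times, and we count how often the realized interval covers $\theta$. The true mean `theta = 10`, the known sd `sigma = 2`, `n = 30` and the choice of the known-sd z-interval are all illustrative assumptions. None of these values come from the slides.

```r
# Repeat a hypothetical experiment many times and record whether each
# realized 95% interval covers the true parameter value.
set.seed(1)
theta   <- 10     # true mean (hypothetical, for illustration only)
sigma   <- 2      # known population sd (hypothetical)
n       <- 30     # sample size per experiment
alpha   <- 0.05
n.exper <- 2000   # number of repeated experiments

z <- qnorm(1 - alpha / 2)
covered <- logical(n.exper)
for (i in seq_len(n.exper)) {
  x  <- rnorm(n, mean = theta, sd = sigma)
  lo <- mean(x) - z * sigma / sqrt(n)
  hi <- mean(x) + z * sigma / sqrt(n)
  # A realized interval either covers theta (TRUE) or it does not (FALSE)
  covered[i] <- (lo <= theta) && (theta <= hi)
}
coverage <- mean(covered)   # long-run fraction of covering intervals
coverage
```

Each individual `covered[i]` is simply TRUE or FALSE. The 95% describes the long-run behaviour of the procedure, not any single realized interval.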

Confidence Intervals

So how do we compute a $(1-\alpha)\times 100\%$ confidence interval given a set of data? Conceptually, if we are trying to estimate a parameter $\theta$ with some estimator $\hat{\theta}$, we have to know something about the sampling distribution of the estimator. For large IID samples, one can show that $\hat{\theta}$ is typically approximately normal:

$$\hat{\theta} \;\overset{\text{approx}}{\sim}\; N\!\left(\theta,\ \sigma_{\hat{\theta}}^2\right)$$

(approximate sampling distribution of an estimator; large IID sample assumed).

Confidence Intervals

Since we don't know $\theta$ or $\sigma_{\hat{\theta}}$, we can plug in their sample estimates once we've collected a sample:

$$N\!\left(\hat{\theta},\ \hat{\sigma}_{\hat{\theta}}^2\right)$$

This is a "plausible" approximate sampling distribution, considering the sample collected. So where do (say) 95% of the "plausible samples" fall?

...say symmetrically around the estimate? About 95% of a normal density falls within the first two standard deviations of the mean: the area between $\hat{\theta} - 1.96\,\hat{\sigma}_{\hat{\theta}}$ and $\hat{\theta} + 1.96\,\hat{\sigma}_{\hat{\theta}}$ is ≈ 0.95. This gives a two-sided, equal-tailed CI.

...say up to the first 95% most plausible? The area between $-\infty$ and $\hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}}$ is ≈ 0.95, so 95% of the "plausible samples" are lower than $\hat{\theta} + 1.645\,\hat{\sigma}_{\hat{\theta}}$. This gives a one-sided CI (upper bound).

...say the highest 95% most plausible? The area between $\hat{\theta} - 1.645\,\hat{\sigma}_{\hat{\theta}}$ and $\infty$ is ≈ 0.95, so 95% of the "plausible samples" are higher than $\hat{\theta} - 1.645\,\hat{\sigma}_{\hat{\theta}}$. This gives a one-sided CI (lower bound).
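The normal quantiles behind the "about 95%" statements can be checked directly in R with `qnorm` (quantiles) and `pnorm` (areas). This is a small sketch that only uses the standard normal, N(0,1). Exactly two standard deviations cover slightly more than 95%. The two-sided quantile is about 1.96, and the one-sided quantile is about 1.645.

```r
# N(0,1) quantiles behind the "about 95%" statements
z.two <- qnorm(0.975)                     # two-sided, equal tailed: 2.5% in each tail
z.one <- qnorm(0.95)                      # one-sided: the whole 5% in one tail
area.2sd <- pnorm(2) - pnorm(-2)          # area within two sds (slightly over 0.95)
area.two <- pnorm(z.two) - pnorm(-z.two)  # area between -z.two and z.two
area.low <- pnorm(z.one)                  # area between -Inf and z.one
area.up  <- 1 - pnorm(-z.one)             # area between -z.one and Inf
round(c(z.two = z.two, z.one = z.one), 3)
round(c(area.2sd = area.2sd, area.two = area.two,
        area.low = area.low, area.up = area.up), 4)
```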

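Confidence Intervals

By "standardizing",

$$Z = \frac{\theta - \hat{\theta}}{\hat{\sigma}_{\hat{\theta}}}$$

gives the number of standard deviations $\theta$ is from $\hat{\theta}$. For $Z \sim N(0,1)$:

$$P\left(z_{\alpha/2} \le Z \le z_{1-\alpha/2}\right) = 1-\alpha \quad\Longrightarrow\quad \hat{\theta} + z_{\alpha/2}\,\hat{\sigma}_{\hat{\theta}} \le \theta \le \hat{\theta} + z_{1-\alpha/2}\,\hat{\sigma}_{\hat{\theta}},$$

and since $z_{\alpha/2} = -z_{1-\alpha/2}$, the interval is $\hat{\theta} \pm z_{1-\alpha/2}\,\hat{\sigma}_{\hat{\theta}}$.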

Compute the Confidence Intervals

The mass of an unknown powder was determined 30 times. The results are shown below (units: mg):

4.11, 3.70, 3.36, 3.68, 4.42, 3.23, 4.03, 4.03, 3.52, 4.75, 5.09, 3.47, 3.02, 4.24, 4.74, 4.51, 2.90, 4.15, 3.54, 3.81, 2.98, 3.82, 4.32, 3.06, 4.00, 4.05, 3.19, 3.17, 3.67, 4.37

Compute:
a. The sample mean: $\bar{x}$
b. The sample sd: $s$
c. The estimated standard error of the mean: $s/\sqrt{n}$
d. The number of estimated standard errors that cover 95% of the sampling distribution symmetrically about the sample mean: $z_{1-\alpha/2}$ or, equivalently, $-z_{\alpha/2}$

Compute the Confidence Intervals

a. Sample mean = 3.83
b. Sample sd = 0.58
c. Est. se of mean = 0.11
d. For 95%, $\alpha = 0.05$. To spread 95% symmetrically about the mean we want $z_{\alpha/2}$ and $z_{1-\alpha/2}$, i.e. $z = \pm 1.96$

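Compute the Confidence Intervals

e. Compute the two-sided 95% CI for the mean given these data:

$$\left[\bar{x} - z_{1-\alpha/2}\,\frac{s}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\right]$$

Same thing, but easier to typeset: $\bar{x} \pm z_{1-\alpha/2}\, s/\sqrt{n}$ = [ 3.83 – 1.96*0.11, 3.83 + 1.96*0.11 ] = [ 3.61, 4.05 ] mg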
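Steps a to e can be reproduced in R with base functions only. This is a minimal sketch using the 30 powder masses from the example. Note that `sd()` uses the $n-1$ denominator. The sketch carries full precision throughout, whereas the slide rounds the standard error to 0.11 before multiplying.

```r
# Powder masses (mg) from the example
x <- c(4.11, 3.70, 3.36, 3.68, 4.42, 3.23, 4.03, 4.03, 3.52, 4.75,
       5.09, 3.47, 3.02, 4.24, 4.74, 4.51, 2.90, 4.15, 3.54, 3.81,
       2.98, 3.82, 4.32, 3.06, 4.00, 4.05, 3.19, 3.17, 3.67, 4.37)
n     <- length(x)
xbar  <- mean(x)                 # a. sample mean
s     <- sd(x)                   # b. sample sd (n - 1 denominator)
se    <- s / sqrt(n)             # c. estimated standard error of the mean
alpha <- 0.05
z     <- qnorm(1 - alpha / 2)    # d. about 1.96
ci    <- c(xbar - z * se, xbar + z * se)   # e. two-sided 95% CI
round(c(mean = xbar, sd = s, se = se, z = z), 2)
round(ci, 2)
```

Without intermediate rounding, the interval comes out at roughly [3.62, 4.04] mg. This is slightly narrower than the hand calculation, because 0.11 overstates the standard error of about 0.106.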

Confidence Intervals

Points of interest:
$(1-\alpha)$ is called the level of confidence and is between 0 and 1. Common standard choices are 0.95, 0.99 and 0.90.
$\alpha$ is called the significance level and is between 0 and 1. Common standard choices are 0.05, 0.01 and 0.10.
Estimate the standard error of $\hat{\theta}$ with the bootstrap if the sampling distribution of $\hat{\theta}$ is not known, the sample size is small, or the algorithm for $\hat{\theta}$ is very complicated. Why not? You can always do it!
For small sample sizes, use Student-t based formulas (coming up) and bootstrap the required estimates (below).

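Confidence Intervals

So how do we compute a $(1-\alpha)\times 100\%$ confidence interval given a set of data?

Case 1a: $(1-\alpha)\times 100\%$ CIs for the mean $\mu$. Large sample $n$ (at least 30), sd $\sigma_X$ known:

Two sided:
$$\left[\bar{x} - z_{1-\alpha/2}\frac{\sigma_X}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\frac{\sigma_X}{\sqrt{n}}\right]$$
One sided, lower bound:
$$\left[\bar{x} - z_{1-\alpha}\frac{\sigma_X}{\sqrt{n}},\ \infty\right)$$
One sided, upper bound:
$$\left(-\infty,\ \bar{x} + z_{1-\alpha}\frac{\sigma_X}{\sqrt{n}}\right]$$

N(0,1) quantiles: qnorm(1-a/2) or qnorm(1-a)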

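Confidence Intervals

So how do we compute a $(1-\alpha)\times 100\%$ confidence interval given a set of data?

Case 1b: $(1-\alpha)\times 100\%$ CIs for the mean $\mu$. Large sample $n$ (at least 30), sd $\sigma_X$ unknown (plug in the sample sd $s$):

Two sided:
$$\left[\bar{x} - z_{1-\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\frac{s}{\sqrt{n}}\right]$$
One sided, lower bound:
$$\left[\bar{x} - z_{1-\alpha}\frac{s}{\sqrt{n}},\ \infty\right)$$
One sided, upper bound:
$$\left(-\infty,\ \bar{x} + z_{1-\alpha}\frac{s}{\sqrt{n}}\right]$$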

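Confidence Intervals

So how do we compute a $(1-\alpha)\times 100\%$ confidence interval given a set of data?

Case 1c: $(1-\alpha)\times 100\%$ CIs for the mean $\mu$. Small sample $n$, sd $\sigma_X$ unknown:

Two sided:
$$\left[\bar{x} - t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right]$$
One sided, lower bound:
$$\left[\bar{x} - t_{1-\alpha,\,n-1}\frac{s}{\sqrt{n}},\ \infty\right)$$
One sided, upper bound:
$$\left(-\infty,\ \bar{x} + t_{1-\alpha,\,n-1}\frac{s}{\sqrt{n}}\right]$$

Student-t(n-1) quantiles: qt(1-a/2, df=n-1) or qt(1-a, df=n-1)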
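Cases 1a to 1c differ only in which sd is used and in whether N(0,1) or Student-t(n-1) quantiles are applied, so they can be folded into one helper. This is a sketch. The function name `mean.ci` and its arguments are hypothetical. The function switches to t automatically when no `sigma` is supplied and n is below 30, following the slides' "at least 30" rule of thumb.

```r
# Hypothetical helper covering Cases 1a-1c for a CI on the mean
mean.ci <- function(x, conf = 0.95, sigma = NULL,
                    side = c("two", "lower", "upper")) {
  side <- match.arg(side)
  n    <- length(x)
  a    <- 1 - conf
  xbar <- mean(x)
  # Case 1a: use the known sd sigma_X; Cases 1b/1c: plug in the sample sd s
  se   <- (if (is.null(sigma)) sd(x) else sigma) / sqrt(n)
  # Case 1c: unknown sd and small n (< 30) -> Student-t(n - 1) quantiles
  use.t <- is.null(sigma) && n < 30
  q2 <- if (use.t) qt(1 - a / 2, df = n - 1) else qnorm(1 - a / 2)
  q1 <- if (use.t) qt(1 - a,     df = n - 1) else qnorm(1 - a)
  switch(side,
         two   = c(xbar - q2 * se, xbar + q2 * se),
         lower = c(xbar - q1 * se, Inf),
         upper = c(-Inf, xbar + q1 * se))
}
```

For example, `mean.ci(x)` on the 30 powder masses follows Case 1b. `mean.ci(x[1:10], conf = 0.99)` follows Case 1c, and `side = "lower"` or `side = "upper"` gives the one-sided bounds.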

Example: Confidence Intervals

A suspect, one Mr. B. Mayhew, is captured by law enforcement officials in possession of 50 mini-Ziploc baggies containing what is determined to be very pure methamphetamine ("meth"). Under Federal statute 21 USC §§ 841(a), 841(b)(1)(B); § 2D1.1, the mandatory minimum sentence for possession with intent to distribute is 10 years if the amount is greater than or equal to 50 g, but 5 years if it is less than 50 g. Considering the sentence differential, it is important to determine the total mass as accurately as possible. The baggies are emptied and collected into one mass of crystals, and 10 mass measurements are taken (g):

a. Compute the two-sided 99% CI for the mean mass.
b. Compute the one-sided 99% CI for the lower bound on the mean mass.
c. Compute the one-sided 99% CI for the upper bound on the mean mass.

The lab's analytical balances have uncertainty in the 4th decimal place. The lab policy is to round up if the fourth decimal place is greater than or equal to 5.

Example: Confidence Intervals a.

Example: Confidence Intervals

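Confidence Intervals

So how do we compute a $(1-\alpha)\times 100\%$ confidence interval given a set of data?

Case 2a: $(1-\alpha)\times 100\%$ CIs for a proportion $p$. Large sample $n$ (at least 30), $p$ not too close to 0 or 1:

Two sided:
$$\left[\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$$
One sided, lower bound:
$$\left[\hat{p} - z_{1-\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ 1\right]$$
One sided, upper bound:
$$\left[0,\ \hat{p} + z_{1-\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$$

Remember: $\hat{p} = x/n$, where $x$ is the number of "successes" among the $n$ trials.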

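Confidence Intervals

So how do we compute a $(1-\alpha)\times 100\%$ confidence interval given a set of data?

Case 2b: $(1-\alpha)\times 100\%$ CIs for a proportion $p$. Small sample $n$ and/or $p$ close to 0 or 1.

Define (Agresti, Coull):
$$\tilde{n} = n + z_{1-\alpha/2}^2, \qquad \tilde{p} = \frac{1}{\tilde{n}}\left(x + \frac{z_{1-\alpha/2}^2}{2}\right)$$
Two sided:
$$\left[\tilde{p} - z_{1-\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}},\ \tilde{p} + z_{1-\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}}\right]$$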

Example: Confidence Intervals

Saunders, Davis and Buscaglia define random match probability (RMP) in handwriting analysis as "[T]he chance of randomly selecting two individuals from some relevant population and then randomly selecting two writing samples, one from each individual's available body of handwriting, that are declared to 'match' on the basis of the chosen comparison procedure."

Say a suspect is apprehended in a case and is alleged to have written a threatening letter. A database search yields 100 "best matching" individuals (one writing sample each). Assume this serves as a sample from a "relevant population". It is known that none of the samples was actually produced by the suspect; each individual produced only their own writing sample. Each item in the sample is compared to each of the others ($n = \binom{100}{2} = 4950$ comparisons), and two pairs are found to "match". The estimated RMP is thus 2/4950 ≈ 4.04×10⁻⁴.

Compute the estimated two-sided CI (neglecting correlations) for this RMP at the 95% level of confidence.
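The following R sketch implements the Case 2a (Wald) and Case 2b (Agresti-Coull) formulas and applies both to the RMP example with x = 2 "matches" out of n = 4950 comparisons. As in the example, it treats the comparisons as independent and so neglects correlations. The helper name `prop.ci` is hypothetical.

```r
# Hypothetical helper: two-sided CI for a proportion (Case 2a or Case 2b)
prop.ci <- function(x, n, conf = 0.95, method = c("wald", "agresti-coull")) {
  method <- match.arg(method)
  z <- qnorm(1 - (1 - conf) / 2)
  if (method == "wald") {
    n.use <- n                        # Case 2a: p-hat = x / n
    p.use <- x / n
  } else {
    n.use <- n + z^2                  # Case 2b: n-tilde
    p.use <- (x + z^2 / 2) / n.use    # p-tilde
  }
  se <- sqrt(p.use * (1 - p.use) / n.use)
  c(lower = p.use - z * se, upper = p.use + z * se)
}

# RMP example: 2 "matches" among 4950 comparisons
wald <- prop.ci(2, 4950, method = "wald")
ac   <- prop.ci(2, 4950, method = "agresti-coull")
signif(rbind(wald, ac), 3)
```

Here p̂ is very close to 0, which is outside the range where Case 2a is appropriate. As a result, the Wald lower bound comes out negative, which is impossible for a probability. The Agresti-Coull interval stays above 0, at roughly 8×10⁻⁶ to 1.6×10⁻³.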

Example: Confidence Intervals

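Confidence Intervals

So how do we compute a $(1-\alpha)\times 100\%$ confidence interval given a set of data?

Case 3: $(1-\alpha)\times 100\%$ CIs for a Poisson mean count $\lambda$. Large sample $n$ (at least 30):

Two sided:
$$\left[\hat{\lambda} - z_{1-\alpha/2}\sqrt{\frac{\hat{\lambda}}{n}},\ \hat{\lambda} + z_{1-\alpha/2}\sqrt{\frac{\hat{\lambda}}{n}}\right], \qquad \hat{\lambda} = \bar{x}$$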
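The following sketch implements the Case 3 formula in R. It uses the sample mean count as $\hat{\lambda}$ and the fact that a Poisson variance equals its mean, so the estimated standard error is $\sqrt{\hat{\lambda}/n}$. The function name `pois.ci` is hypothetical, and the slides supply no count data to run it on.

```r
# Hypothetical helper: large-sample two-sided CI for a Poisson mean count
pois.ci <- function(counts, conf = 0.95) {
  n      <- length(counts)
  lambda <- mean(counts)                 # lambda-hat: mean count per unit
  z      <- qnorm(1 - (1 - conf) / 2)
  se     <- sqrt(lambda / n)             # Poisson variance equals its mean
  c(lower = lambda - z * se, upper = lambda + z * se)
}
```

Call it as `pois.ci(counts)`, where `counts` is a vector of at least 30 observed counts.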

Bootstrap Confidence Intervals

So how do we compute a $(1-\alpha)\times 100\%$ confidence interval given a set of data? For any parameter, you can try to obtain bootstrap-based CIs. For a sample of size $n$:

Obtain a bootstrap sampling distribution for $\hat{\theta}$: boot.reps
Find the $(1-\alpha)\times 100\%$ empirical percentiles:
Two sided: quantile(boot.reps, probs=c(a/2, 1-a/2))
One sided, lower bound: quantile(boot.reps, probs=c(a))
One sided, upper bound: quantile(boot.reps, probs=c(1-a))
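Here is a minimal percentile-bootstrap sketch in R for the sample mean, using the same `boot.reps` and `quantile` calls as the slide. Because the Mayhew measurements are not reproduced here, the sketch runs on the 30 powder masses from the earlier example. The choices `a = 0.05`, `B = 10000` and the seed are illustrative assumptions. The sd of the replicates serves as the bootstrap standard error estimate.

```r
# Powder masses (mg) from the earlier example
x <- c(4.11, 3.70, 3.36, 3.68, 4.42, 3.23, 4.03, 4.03, 3.52, 4.75,
       5.09, 3.47, 3.02, 4.24, 4.74, 4.51, 2.90, 4.15, 3.54, 3.81,
       2.98, 3.82, 4.32, 3.06, 4.00, 4.05, 3.19, 3.17, 3.67, 4.37)
a <- 0.05       # 1 - confidence level
B <- 10000      # number of bootstrap resamples
set.seed(1)

# Bootstrap sampling distribution of the sample mean: resample n values
# with replacement and recompute the estimator each time
boot.reps <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

boot.se  <- sd(boot.reps)                                 # bootstrap standard error
ci.two   <- quantile(boot.reps, probs = c(a/2, 1 - a/2))  # two sided
ci.lower <- quantile(boot.reps, probs = c(a))             # one sided, lower bound
ci.upper <- quantile(boot.reps, probs = c(1 - a))         # one sided, upper bound
list(se = boot.se, two.sided = ci.two, lower = ci.lower, upper = ci.upper)
```

For these data, the two-sided interval should land close to the normal-theory interval of about [3.62, 4.04] mg. The bootstrap standard error should come out near 0.10 mg.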

Example: Bootstrap Confidence Intervals

Consider again the case of Mr. B. Mayhew, with seizure mass measurements of (g):

a. Compute the 99% CI for the mean mass via the bootstrap.
b. What is your bootstrap standard error estimate for the estimated mean?
c. Approximately, with what level of confidence can you report that the mean measurement is equal to or exceeds g?
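Part c can be approached by inverting the one-sided percentile lower bound. Suppose `quantile(boot.reps, probs = a)` is (approximately) equal to a given threshold. Then `a` is the fraction of replicates below that threshold, so the reportable confidence $1-a$ is the fraction at or above it. This is a sketch under that reading. The helper name `conf.mean.at.least` and its arguments are hypothetical, and the threshold must be supplied by the user.

```r
# Hypothetical helper: approximate confidence that the mean is >= threshold,
# read off a percentile-bootstrap distribution of the sample mean
conf.mean.at.least <- function(x, threshold, B = 10000) {
  boot.reps <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))
  # The lower bound quantile(boot.reps, a) sits at the threshold when a is
  # the fraction of replicates below it; the confidence 1 - a is the
  # fraction of replicates at or above it
  mean(boot.reps >= threshold)
}
```

Call it as `conf.mean.at.least(masses, threshold)` with the measured masses. Because the result comes from a random resample, calling `set.seed()` first makes it reproducible.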

Example: Bootstrap Confidence Intervals

Look at what happens by just demanding a little more precision: