Chapter 4

Exercise 1 The 95% CI represents a range of values that contains a population parameter with a 0.95 probability. This range is determined by the values that correspond to the 0.025 and 0.975 quantiles of the sampling distribution of the sample statistic.

Exercise 2 c would be the 1-α/2 quantile of the standard normal distribution. From Table 1 or the R function qnorm: For a CI of 0.8, the 0.9 quantile is 1.282. For a CI of 0.92, the 0.96 quantile is 1.751. For a CI of 0.98, the 0.99 quantile is 2.326.
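The same quantile lookups can be done without R; this is a minimal sketch mirroring the qnorm calls above, using only Python's standard library:

```python
from statistics import NormalDist

# Standard normal quantiles, mirroring the qnorm lookups above
z80 = NormalDist().inv_cdf(0.90)   # c for a 0.80 CI, about 1.282
z92 = NormalDist().inv_cdf(0.96)   # c for a 0.92 CI, about 1.751
z98 = NormalDist().inv_cdf(0.99)   # c for a 0.98 CI, about 2.326
```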

Exercise 3 From Table 1:

Exercise 4 From Table 1:

Exercise 5 μ=1200, σ=25, n=36. For a CI of 95%: the 95% CI for μ does not contain 1200, so the claim seems unreasonable.
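A sketch of the known-sigma CI used in this exercise. The sample mean is not shown in the transcript, so the value of xbar below is a hypothetical placeholder, for illustration only:

```python
from math import sqrt
from statistics import NormalDist

# Known-sigma 95% CI for the mean. xbar is a HYPOTHETICAL placeholder value,
# since the transcript does not show the sample mean.
sigma, n, mu_claimed = 25, 36, 1200
xbar = 1187.0                              # hypothetical, illustration only
z = NormalDist().inv_cdf(0.975)            # about 1.96
half_width = z * sigma / sqrt(n)           # 1.96 * 25/6, about 8.17
ci = (xbar - half_width, xbar + half_width)
# The claim mu = 1200 looks unreasonable if 1200 falls outside the CI
claim_rejected = not (ci[0] <= mu_claimed <= ci[1])
```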

Exercise 6

Exercise 7 Random sampling requires: 1. That all observations are sampled from the same distribution. 2. That the sampled observations are independent, meaning that the probability of sampling a given observation does not alter the probability of sampling another. (Note: this is not the same as equal probability.)

Exercise 8 The sampling distribution is centered around the population mean μ, so it will be 9. The variance of the sampling distribution is given by σ²/n. In this case:

Exercise 9 X: P(x) So

Exercise 10 The expected value of the sample mean equals the population mean, so if you average 1000 sample means, the grand average should approximately equal μ, in this case 2.7.

Exercise 11 Based on the same principle, the expected value of the sample variance equals the population variance, so the average of 1000 sample variances should approximately equal the population variance, in this case 1.01.

Exercise 12 a=c(2,6,10,1,15,22,11,29), n=8. var(a) [1] 94.28571 The variance of the sample mean is estimated by s²/n = 94.29/8 = 11.79, and the standard error is estimated by s/√n = √11.79 = 3.43.
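The same computation without the R helpers, as a standard-library sketch (an equivalent of var(a) and the SE formulas above):

```python
from math import sqrt
from statistics import variance

# Sample variance and standard error of the mean for Exercise 12
a = [2, 6, 10, 1, 15, 22, 11, 29]
n = len(a)
s2 = variance(a)        # sample variance, 660/7, about 94.29
var_mean = s2 / n       # estimated variance of the sample mean, about 11.79
se = sqrt(var_mean)     # estimated standard error, about 3.43
```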

Exercise 13 The estimate of μ in this case would be based on a single observation: 32. With a single observation, it is not possible to estimate the standard error because there is no variance in the sample. As the sample size increases, the variance of the sampling distribution (the squared standard error) decreases; note that n is in the denominator of the standard error. Lower variance in the sampling distribution means a smaller standard error, and thus less error in the sample estimates.

Exercise 14 b=c(450,12,52,80,600,93,43,59,1000,102,98,43), n=12. var(b) [1] 93663.52 Squared SE = 93663.52/12 = 7805.29

Exercise 15 b=c(450,12,52,80,600,93,43,59,1000,102,98,43) > out(b) $out.val [1] 450 600 1000 These outliers substantially inflate the standard error, as they inflate the variance.

Exercise 16 c=c(6,3,34,21,34,65,23,54,23), n=9. var(c) [1] 413.9444 The squared SE is: 413.94/9 = 46.0

Exercise 17 No. An accurate estimate of the standard error requires independence among sampled observations.

Exercise 18 The variance of the mixed normal is 10.9, so the squared standard error for a sample of 25 would be 10.9/25 = 0.436, compared to 1/25 = 0.04 for the standard normal. This means that under small departures from normality, the standard error can inflate more than 10-fold. The inflation greatly increases error and the length of CIs.
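The inflation can be checked by simulation. This sketch assumes the standard contaminated normal (with probability 0.9 draw from N(0,1), otherwise from N(0,10²)), whose population variance is 0.9·1 + 0.1·100 = 10.9:

```python
import random
from statistics import pvariance

# Monte Carlo check of the mixed-normal variance. Assumed form:
# 90% N(0,1) contaminated with 10% N(0, 10^2), variance 10.9.
random.seed(42)
draws = [random.gauss(0, 1) if random.random() < 0.9 else random.gauss(0, 10)
         for _ in range(100_000)]
v = pvariance(draws)    # close to 10.9
sq_se = v / 25          # squared SE for n = 25, close to 0.436
```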

Exercise 19 When sampling from a non-normal distribution, the sampling distribution of the mean no longer conforms to the probabilities of the normal curve. In other words, the sampling distribution is no longer normal, so the SE cannot be used accurately to determine probabilities and CIs.

Exercise 20 μ=30, σ=2, n=16, so SE=2/4=0.5. Determine Z and consult Table 1, or use R. pnorm(29,30,2/sqrt(16)) [1] 0.0228 pnorm(30.5,30,2/sqrt(16)) [1] 0.8413; 1 − 0.8413 = 0.159 pnorm(31,30,2/sqrt(16)) [1] 0.9772; 0.9772 − 0.0228 = 0.955
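The pnorm calls above can be mirrored with Python's standard library; a sketch:

```python
from math import sqrt
from statistics import NormalDist

# Sampling distribution of the mean: X-bar ~ N(30, 2/sqrt(16))
mu, sigma, n = 30, 2, 16
xbar_dist = NormalDist(mu, sigma / sqrt(n))
p_below_29 = xbar_dist.cdf(29)                      # about 0.0228
p_above_30_5 = 1 - xbar_dist.cdf(30.5)              # about 0.159
p_between = xbar_dist.cdf(31) - xbar_dist.cdf(29)   # about 0.955
```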

Exercise 21 μ=5, σ=5, n=25, so SE=5/5=1. Determine Z and consult Table 1, or use R. a. pnorm(4,5,1) [1] 0.1587 b. pnorm(7,5,1) [1] 0.9772; 1 − 0.9772 = 0.023 c. pnorm(3,5,1) [1] 0.0228; 0.9772 − 0.0228 = 0.955

Exercise 22 μ=100000, σ=10000, n=16, so SE=10000/4=2500. From Table 1, Z = (95000 − 100000)/2500 = −2, so P = 0.0228. Using R: pnorm(95000,100000,10000/sqrt(16)) [1] 0.02275

Exercise 23 μ=100000, σ=10000, n=16, so SE=10000/4=2500. Compute z scores for each value and consult Table 1. Or use R: pnorm(97500,100000,10000/sqrt(16)) [1] 0.1587 pnorm(102500,100000,10000/sqrt(16)) [1] 0.8413

Exercise 24 μ=750, σ=100, n=9, so SE=100/3=33.33. Compute z scores for each value and consult Table 1. Or use R. > pnorm(700,750,100/sqrt(9)) [1] 0.0668 > pnorm(800,750,100/sqrt(9)) [1] 0.9332; 0.9332 − 0.0668 = 0.866

Exercise 25 μ=36, σ=5, n=16, so SE=5/4=1.25. pnorm(37,36,5/4) [1] 0.7881 pnorm(33,36,5/4) [1] 0.0082; 1 − 0.0082 = 0.992 pnorm(34,36,5/4) [1] 0.0548 (from Table 1, Z = −1.6) pnorm(37,36,5/4) − pnorm(34,36,5/4) = 0.7881 − 0.0548 = 0.734

Exercise 26 μ=25, σ=3, n=25, so SE=3/5=0.6. a. pnorm(24,25,3/5) [1] 0.0478 b. pnorm(26,25,3/5) [1] 0.9522 c. 1 − 0.9522 = 0.048 d. 0.9522 − 0.0478 = 0.904 (0.903 from Table 1)

Exercise 27 Heavy-tailed distributions generally yield long CIs for the mean because their large variance inflates the SE. The central limit theorem does not remedy this problem.

Exercise 28 Light-tailed, symmetric distributions provide relatively accurate probability coverage for CIs even with small sample sizes. The central limit theorem works relatively well in this case.

Exercise 29 c is the 1−α/2 quantile of a T distribution with n−1 degrees of freedom. Look up c in Table 4 (the 0.975 quantile with 9 df), or use R: qt(0.975,9) [1] 2.262157 a. b. c.

Exercise 30 c is the 1−α/2 quantile of a T distribution with n−1 degrees of freedom. Look up c in Table 4 (the 0.99 quantile with 9 df), or use R: qt(0.99,9) [1] 2.821 a. b. c.

Exercise 31 x=c(77,87,88,114,151,210,219,246,253,262,296,299,306,376,428,515,666,1310,2611) The R function t.test(x) returns: One Sample t-test data: x t = 3.285, df = 18, p-value = 0.004 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: 161.5 734.7 sample estimates: mean of x 448.1053

Exercise 32 y=c(5,12,23,24,18,9,18,11,36,15) The R function t.test(y) returns: One Sample t-test data: y t = 6.042, df = 9, p-value = 0.0002 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: 10.698 23.502 sample estimates: mean of x 17.1
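The t.test computation can be reproduced by hand; the sketch below hardcodes the critical value t = 2.262157 (the 0.975 quantile with 9 df, qt(0.975,9) in R, as in Exercise 29):

```python
from math import sqrt
from statistics import mean, stdev

# One-sample t statistic and 95% CI for Exercise 32, by hand
y = [5, 12, 23, 24, 18, 9, 18, 11, 36, 15]
n = len(y)
ybar = mean(y)                   # 17.1
se = stdev(y) / sqrt(n)          # about 2.830
t = ybar / se                    # about 6.042, matching the printed t statistic
crit = 2.262157                  # qt(0.975, 9), from the t table
ci = (ybar - crit * se, ybar + crit * se)   # about (10.70, 23.50)
```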

Exercise 33 Heavy-tailed distributions inflate the standard error in a manner that changes the cumulative probabilities of the T distribution. In this situation, the new T quantiles correspond to values that differ from T under normality. The inflation of the SE, due to the larger frequency of extreme values in the tails, leads to very long CIs that far exceed the stated probability coverage under normality. For example, the intended 95% CI will yield a range that in reality covers over 99% of the distribution. When distributions are skewed, T becomes skewed and off-centered (the mean and median are no longer 0, due to the dependency that is now created between the mean and SD), with values that do not correspond to the quantiles in Table 4. This results in highly inaccurate probability coverage for CIs.

Exercise 34 When the variance is estimated from the empirical sample in a light-tailed, skewed distribution, the t distribution markedly departs from Student's t (becoming skewed and no longer centered around 0), so probability coverage is no longer accurate.

Exercise 35 c corresponds to the 0.975 quantile of a T distribution with n − 2g − 1 df, where g = 0.2n rounded down. a. For n=24: df = 24 − 2×4 − 1 = 15. qt(0.975,15) [1] 2.131 b. For n=36: df = 36 − 2×7 − 1 = 21. qt(0.975,21) [1] 2.080 c. For n=12: df = 12 − 2×2 − 1 = 7. qt(0.975,7) [1] 2.365
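The degrees-of-freedom rule can be sketched as a small function (assuming 20% trimming with g rounded down, as stated above):

```python
from math import floor

# Degrees of freedom for the 20% trimmed mean: df = n - 2g - 1,
# where g = floor(0.2 * n) observations are trimmed from each tail.
def trimmed_df(n, trim=0.2):
    g = floor(trim * n)
    return n - 2 * g - 1
```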

Exercise 36 c corresponds to the 0.99 quantile of a T distribution with n − 2g − 1 df, where g = 0.2n rounded down. a. qt(0.99,15) [1] 2.602 b. qt(0.99,21) [1] 2.518 c. qt(0.99,7) [1] 2.998

Exercise 37 x=c(77,87,88,114,151,210,219,246,253,262,296, 299,306,376,428,515,666,1310,2611) The R function trimci(x) returns $ci [1]

Exercise 38 With trimmed means the CI length is 573.2/2.34 ≈ 245; with means it is 573.2, which is 2.34 times longer. The mean has a larger standard error, resulting in a larger CI.

Exercise 39 m=c(56,106,174,207,219,237,313,365,458,497,515,529,557,615,625,645,973,1065,3215) For the mean: t.test(m). For the trimmed mean: trimci(m). Checking for outliers: out(m) $out.val [1] 3215 The CI for the trimmed mean is far shorter than the CI for the mean because the outlier (3215) inflates the SE. In the case of the trimmed mean, it is trimmed. Other values in the data set may have a similar effect.

Exercise 40 Under normality, the sample mean has the smallest standard error, so it is the only candidate for being ideal. But as we have seen, other estimators have a smaller standard error than the mean in other situations, so an optimal estimator does not exist across the board.

Exercise 41 No, because what often appears to be normal is not normal. In addition, there are robust estimators that compare relatively well (although not as well) to the mean under normality but perform far better in situations that mildly depart from normality. In other words, under normality the difference is small; under non-normality it can be very large.

Exercise 42 c=c(250,220,281,247,230,209,240,160,370,274,210,204,243,251,190,200,130,150,177,475,221,350,224,163,272,236,200,171,98) CI for the mean: t.test(c) 95 percent confidence interval: CI for the trimmed mean: trimci(c) [1]

Exercise 43 An outlier analysis reveals 4 outliers: out(c) $out.val [1] 98 350 370 475 These increase the length of the CI for the mean. They are trimmed with the trimmed mean CI.

Exercise 44 Even if the two measures are identical, outliers can greatly inflate the CI based on means, rendering the outcome less informative.

Exercise 45 In this case we have 16 successes in 16 trials. The R function: binomci(16,16, alpha=0.01) $ci [1]
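The method binomci uses is not shown in the transcript; as a rough check, the exact Clopper-Pearson lower bound when all n trials succeed can be computed directly (a sketch; binomci may use a different approximation, so its printed interval can differ slightly):

```python
# Exact Clopper-Pearson lower bound for x = n successes: solve p^n = alpha/2.
# binomci itself may use a different method; this is only a sanity check.
alpha = 0.01          # for a 99% CI, as in binomci(16, 16, alpha=0.01)
n = 16
lower = (alpha / 2) ** (1 / n)   # about 0.718
ci = (lower, 1.0)                # upper limit is 1 when every trial succeeds
```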

Exercise 46 In this case we have 0 successes in 200,000 trials. The R function binomci(0,200000) returns a $ci with lower limit 0 and an upper limit on the order of 1e-05.

Exercise 47 val=0; for(i in 1:5000) val[i]=median(rbinom(25,6,0.9)); splot(val) This is an example of how the sampling distribution of the median can largely depart from the expected bell curve due to tied values. Each of the 5000 samples has many tied values because there are 25 observations in every sample and only 7 possible outcomes (0 through 6). Thus values are bound to repeat.
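The same simulation can be sketched in Python's standard library (a rough translation of the R code above; splot's plot is omitted, since the point is the discreteness of the sampled medians):

```python
import random
from statistics import median

# 5000 sample medians, each from 25 observations of a Binomial(6, 0.9)
random.seed(0)

def binom_draw(size=6, p=0.9):
    # one draw from a Binomial(size, p) distribution
    return sum(random.random() < p for _ in range(size))

medians = [median(binom_draw() for _ in range(25)) for _ in range(5000)]
# Only the outcomes 0..6 are possible, so the medians are heavily tied
# and pile up on a few values instead of forming a bell curve.
```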