Daniela Stan Raicu School of CTI, DePaul University


1 Daniela Stan Raicu School of CTI, DePaul University
CSC 323 Quarter: Spring 02/03 Daniela Stan Raicu School of CTI, DePaul University 5/8/2019 Daniela Stan - CSC323

2 Outline Chapter 6: Introduction to Inference
Introduction to statistical inference
Confidence intervals
Confidence intervals for the population mean µ
SAS procedure for confidence intervals

3 Introduction to Inference
Statistical inference provides methods for drawing conclusions from sample data.
Statistical inference uses the language of probability to say how trustworthy its conclusions are.
When we use statistical inference, we assume that the data come from a random sample or a randomized experiment.
We study two types of statistical inference:
  Confidence intervals, for estimating the value of a population parameter
  Tests of significance, which assess the evidence for a claim

4 Introduction to Inference
Example: You are in charge of quality control at Coca-Cola and are asked to determine the average amount µ dispensed by a certain bottling machine. You take a sample of n = 49 bottles and record a sample average value of x̄ = ml. How much confidence do you have in this estimate?

5 Introduction to Inference
Example (cont.): The law of large numbers says that x̄ is close to µ, so we expect that µ is “somewhere near” x̄.
To make this more precise we ask: “How would x̄ vary if we looked at many samples of 49 bottles?”
The central limit theorem says that x̄ has an approximately normal distribution with mean µ and standard deviation σ/√49. Suppose you know that σ = 10 ml; then x̄ has standard deviation 10/7 = 1.43 ml.
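The standard deviation of the sample mean in this example can be checked with a few lines of Python. This is only a sketch: the function name `standard_error` is an illustrative choice, and Python is used here as a convenient calculator, not because the course uses it.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean of n observations: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / math.sqrt(n)

# Values from the bottling example: sigma = 10 ml, n = 49 bottles.
se = standard_error(10.0, 49)
print(round(se, 2))      # 1.43
print(round(2 * se, 2))  # 2.86, the "2 standard deviations" distance
```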

6 Estimating with Confidence
Population: mean µ unknown, standard deviation σ known.
How do we estimate µ?
Take a sample: its sample mean x̄ is an unbiased estimator of µ.
How reliable is this estimator?
What is the variability of the estimator?
What would happen if we used the statistic many times?

7 How do we estimate µ?
Repeated samples SRS 1, SRS 2, ..., SRS n give sample means x̄1, x̄2, ..., x̄n.
The center of the distribution of the sample averages is the population average µ. A measure of the spread is the standard error.
The central limit theorem says: in repeated sampling, the sample mean x̄ follows (approximately) the normal distribution centered at the unknown population mean µ, with standard deviation σ_x̄ = σ/√n.

8 Confidence Intervals (CI)
Population: mean µ unknown. The sample mean x̄ is an unbiased estimator of µ. How reliable/accurate is this estimator? What is the variability of the estimator?
The 68-95-99.7 rule says: in the normal distribution with mean µ and standard deviation σ, approximately 95% of the observations fall within 2σ of µ.
Since x̄ has a normal distribution, x̄ ~ N(µ, σ/√n), the rule says: for approximately 95% of all samples, the interval (x̄ - 2σ/√n, x̄ + 2σ/√n) will capture the true value µ.

9 95% Confidence Intervals
[Figure: 95% confidence intervals centered at the sample means x̄1, x̄2, ..., x̄25 of 25 samples. 1 out of 25 confidence intervals does not cover the true value of µ.]
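The idea that about 95% of intervals cover µ can be explored by simulation. The sketch below assumes a normal population and uses a hypothetical true mean of 300 ml (the real µ is unknown in the example); σ = 10 and n = 49 come from the bottling example, and the function name `coverage` is illustrative.

```python
import math
import random

def coverage(mu: float, sigma: float, n: int, z_star: float,
             trials: int, seed: int = 0) -> float:
    """Fraction of simulated samples whose interval x-bar +/- z*·sigma/sqrt(n) covers mu."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one simple random sample of size n and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials

# mu = 300.0 is a hypothetical value chosen only for the simulation.
print(coverage(mu=300.0, sigma=10.0, n=49, z_star=1.96, trials=10_000))
```

The printed fraction should land close to 0.95. Any single run of 25 intervals may miss µ zero, one, or several times; the 95% figure describes the long-run behavior of the method.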

10 Confidence Intervals (CI)
Example (cont.) In our sample, x̄ = ; therefore, we are 95% confident that the unknown µ lies between x̄ - 2.86 = and x̄ + 2.86 = . The range of numbers [ , ] is called a 95% confidence interval for µ.

11 Confidence Intervals (CI)
Example (cont.) Be sure you understand the grounds for your confidence. There are only two possibilities:
1. The interval [ , ] contains the true value µ.
2. Our SRS was one of the few samples for which x̄ is not within 2.86 of the true µ. Only 5% of all samples give such inaccurate results.
The interval [ , ] either contains µ or it doesn’t. The 95% confidence refers to the method used to create [ , ], not to this particular result.

12 Confidence Intervals (cont.)
In the language of statistical inference, we say that we are 95% confident that the unknown population mean lies in the interval (x̄ - 2σ/√n, x̄ + 2σ/√n), the 95% confidence interval for µ.
In only 5% of all samples is the sample mean x̄ farther than 2σ/√n from µ; that is, only 5% of all samples give an interval that misses µ.
A Confidence Interval (CI) has the form: (estimate - margin of error, estimate + margin of error)
  estimate is the guess for the value of the unknown population parameter
  margin of error shows how accurate we believe our guess is

13 Properties of the CI
It is an interval of the form (a, b), where a and b are numbers computed from the data; its purpose is to estimate an unknown parameter with an indication of how accurate the estimate is and how confident we are that the result is correct.
It has a property called a confidence level, which gives the probability that the method produces an interval covering the parameter; we use C to denote the confidence level in decimal form; for example, a 95% confidence level corresponds to C = 0.95.
Users can choose the confidence level, most often 90% or higher, because we want to be quite sure about our conclusions.

14 CI for a population mean
Choose an SRS of size n from a population having unknown mean µ and known standard deviation σ.
A level 0.95 confidence interval for µ is (x̄ - 2σ/√n, x̄ + 2σ/√n).
In general, a level C confidence interval for µ is (x̄ - z*σ/√n, x̄ + z*σ/√n).
The relationship between C and z*: P(x̄ - z*σ/√n < µ < x̄ + z*σ/√n) = C
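The general level C interval can be written as a small Python function. This is a sketch under the slide's assumptions (SRS, known σ); the name `z_interval` and its signature are illustrative, and the critical value z* is passed in by the caller.

```python
import math

def z_interval(xbar: float, sigma: float, n: int, z_star: float = 1.96):
    """Return (lower, upper) for the interval x-bar +/- z*·sigma/sqrt(n), with sigma known."""
    if n <= 0 or sigma < 0:
        raise ValueError("need n > 0 and sigma >= 0")
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Single potassium measurement from the blood test example later in the deck:
# x-bar = 3.2, sigma = 0.2, n = 1, 90% confidence (z* = 1.645).
low, high = z_interval(3.2, 0.2, 1, z_star=1.645)
print(round(low, 3), round(high, 3))  # 2.871 3.529
```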

15 Other confidence levels
Other confidence levels are found by checking the normal table (Table D).
C      z*      Margin of error    Confidence interval
90%    1.645   1.645 σ/√n         x̄ ± 1.645 σ/√n
95%    1.960   1.960 σ/√n         x̄ ± 1.960 σ/√n
99%    2.576   2.576 σ/√n         x̄ ± 2.576 σ/√n
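Instead of reading a printed normal table, z* can be computed from the standard normal quantile function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `z_star` is illustrative.

```python
from statistics import NormalDist

def z_star(confidence: float) -> float:
    """Critical value z* with central area `confidence` under the standard normal curve."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Central area C leaves (1 - C)/2 in each tail, so z* is the (1 + C)/2 quantile.
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}  z* = {z_star(c):.3f}")
```

To three decimals this gives 1.645, 1.960, and 2.576 for the 90%, 95%, and 99% levels.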

16 Confidence Intervals (CI)
Coca-Cola Example revisited:
90% confidence interval for µ in Coke problem: ± (1.645)(10/√49) = [ , ]
95% confidence interval for µ in Coke problem: ± (1.960)(10/√49) = [ , ]
99% confidence interval for µ in Coke problem: ± (2.576)(10/√49) = [ , ]

17 Confidence Intervals (CI)
Blood test Example: A test for the level of potassium in the blood is not perfectly precise. Moreover, the actual level of potassium in a person’s blood varies slightly from day to day. Suppose that repeated measurements for the same person on different days vary normally with σ = 0.2.
(a) Julie’s potassium level is measured once. The result is x̄ = 3.2. Give a 90% confidence interval for her mean potassium level µ.
Solution: For 90% confidence we use z* = 1.645. Since σ = 0.2 and n = 1, we get a margin of error equal to (1.645)(0.2/√1) = .329. Therefore, the interval we want is 3.2 ± .329 = [2.871, 3.529].

18 Confidence Intervals (CI)
Blood test Example (cont.):
(b) If three measurements are taken on different days with a mean of x̄ = 3.2, what is a 90% confidence interval for her mean potassium level µ?
Solution: For 90% confidence we use z* = 1.645. Since σ = 0.2 and n = 3, we get a margin of error equal to (1.645)(0.2/√3) = .190. Therefore, the interval we want is 3.20 ± .190 = [3.01, 3.39].
Notice that an interval based on a sample of size n = 3 is narrower (more precise) than an interval based on a single observation (n = 1).

19 Remarks
1. Notice the trade-off between the margin of error and the confidence level. The greater the confidence you want to place in your prediction, the larger the margin of error is (and hence the less informative your interval has to be).
To make the margin of error smaller, you can:
  take a larger sample
  decrease the population standard deviation σ
  decrease the confidence level C
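The three ways of shrinking the margin of error can be compared numerically. The sketch below starts from the bottling example (σ = 10, n = 49, 95% confidence) and changes one ingredient at a time; the alternative values n = 196 and σ = 5 are hypothetical, chosen only to make the effect easy to see.

```python
import math

def margin_of_error(z_star: float, sigma: float, n: int) -> float:
    """Margin of error z*·sigma/sqrt(n) for a mean with known sigma."""
    return z_star * sigma / math.sqrt(n)

base = margin_of_error(1.960, 10.0, 49)            # 95%, sigma = 10, n = 49
bigger_sample = margin_of_error(1.960, 10.0, 196)  # hypothetical: 4 times the sample size
smaller_sigma = margin_of_error(1.960, 5.0, 49)    # hypothetical: half the sigma
lower_conf = margin_of_error(1.645, 10.0, 49)      # 90% instead of 95%

for label, m in [("baseline", base), ("n = 196", bigger_sample),
                 ("sigma = 5", smaller_sigma), ("C = 90%", lower_conf)]:
    print(f"{label:10s} margin = {m:.2f}")
```

Quadrupling n or halving σ cuts the margin in half (2.80 to 1.40), while dropping from 95% to 90% confidence only reduces it to 2.35.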

20 Remarks (cont.)
2. A C.I. gives the range of values for the unknown population parameter that are plausible in light of the observed sample statistic. The confidence level says how plausible.
3. A C.I. is defined for the population parameter, NOT the sample statistic.
4. The confidence intervals are approximate and hold in large samples. This is because they are defined using the normal approximation.

21 More on CI’s
Example: The McClatchy News Service (San Luis Obispo Telegram-Tribune, June ) reported on a sample of prime time television hours.
Mean Number of Network Violent Acts/Hour: ABC = 15.6, CBS = 11.9, FOX = 11.7, NBC = 11.0
Suppose that each sample mean was based on an SRS of n = 50 viewing hours and that σ = 5 is known.
(a) Compute a 95% confidence interval for µABC, the true mean number of violent acts per hour on ABC.
Solution: The confidence interval is given by x̄ ± z*σ/√n. For 95% confidence, we use z* = 1.96. We are given n = 50, σ = 5, and x̄ = 15.6. The confidence interval is 15.6 ± (1.96)(5/√50) = 15.6 ± 1.386 = [14.214, 16.986]. We are 95% confident that µABC is between 14.214 and 16.986 violent acts per hour.

22 More on CI’s
Example: (b) Compute a 95% confidence interval for each of the other three networks.
Solution: As before we obtain
95% confidence interval for µCBS: 11.9 ± (1.96)(5/√50) = [10.514, 13.286]
95% confidence interval for µFOX: 11.7 ± (1.96)(5/√50) = [10.314, 13.086]
95% confidence interval for µNBC: 11.0 ± (1.96)(5/√50) = [9.614, 12.386]

23 More on CI’s
Example: (c) The National Coalition on Television Violence claims that shows on ABC are more violent than on other networks. Based on the confidence intervals from parts (a) and (b), do you agree?
Answer: The x̄ value for ABC is so large that its confidence interval lies entirely to the right of all other confidence intervals. Even taking random variation into account, the mean number of violent acts per hour on ABC shows is clearly larger than on other networks’ shows.
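The four network intervals and the comparison in part (c) can be reproduced in a few lines. This sketch uses the sample means, σ = 5, n = 50, and z* = 1.96 from the example; the variable names are illustrative.

```python
import math

Z_STAR_95 = 1.96
SIGMA = 5.0
N = 50
means = {"ABC": 15.6, "CBS": 11.9, "FOX": 11.7, "NBC": 11.0}

# Same margin for every network, since sigma, n, and the confidence level are shared.
margin = Z_STAR_95 * SIGMA / math.sqrt(N)
intervals = {net: (m - margin, m + margin) for net, m in means.items()}

for net, (low, high) in intervals.items():
    print(f"{net}: [{low:.3f}, {high:.3f}]")

# Part (c): does ABC's interval lie entirely to the right of all the others?
abc_low = intervals["ABC"][0]
others_high = max(high for net, (low, high) in intervals.items() if net != "ABC")
print(abc_low > others_high)  # True
```

ABC's lower limit (about 14.214) exceeds the largest upper limit among the other networks (about 13.286 for CBS), which is what the answer to part (c) relies on.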

24 Summary for C.I.’s
The formula for a C.I. is valid for estimates computed from a simple random sample. It may not be valid for other types of samples.
A C.I. is used when estimating an unknown parameter from sample data.
The C.I. is an interval for the population parameter (the true value), NOT for the sample estimate.
The C.I. gives a plausible range for the unknown parameter.
The C.I. is constructed from the sample and depends on the sample!

25 SAS procedure for the C.I. of a population average
PROC MEANS DATA = data-name N MEAN STD CLM ALPHA=value MAXDEC=number;
  VAR measurement-variable;
RUN;
where
ALPHA=value is the (1 - confidence level) value (thus ALPHA=0.05 for a 95% C.I., ALPHA=0.1 for a 90% C.I., ALPHA=0.01 for a 99% C.I.)
CLM is the option that requests confidence limits for the mean
MAXDEC=number defines how many decimal places are displayed (typically 1 to 4)

26 C.I. for population average
SAS Example
proc means data=dist n mean std clm alpha=0.05 maxdec=4;
  var x;
  title "C.I. for population average";
run;
Output:
C.I. for population average
The MEANS Procedure
Analysis Variable : x
N    Mean    Std Dev    Lower 95% CL for Mean    Upper 95% CL for Mean
________________________________________________________

27 More on CI’s
Problems 6.7, 6.10/page 430
Choose the sample size:
To obtain both a high confidence level C and a small desired margin of error m when calculating the confidence interval for a normal mean, solve the following equation for n: m = z*σ/√n
A smaller m requires a larger n.
Sample size: n = (z*σ/m)²
Problems 6.16, 6.17/page 432
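The sample-size formula can be turned into a small helper. Since n must be a whole number, the sketch below rounds up so that the margin of error is no larger than m; the function name and the example target m = 2 ml are hypothetical, while σ = 10 comes from the bottling example.

```python
import math

def required_sample_size(z_star: float, sigma: float, m: float) -> int:
    """Smallest n with z*·sigma/sqrt(n) <= m, that is, n = ceil((z*·sigma/m)^2)."""
    if m <= 0:
        raise ValueError("the desired margin of error m must be positive")
    return math.ceil((z_star * sigma / m) ** 2)

# Hypothetical target: 95% confidence, margin of error at most 2 ml, sigma = 10 ml.
print(required_sample_size(1.960, 10.0, 2.0))  # 97, since (1.96 * 10 / 2)^2 = 96.04
```

Rounding up rather than to the nearest integer is a deliberate choice: 96 bottles would give a margin slightly above 2 ml.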

