Data Analysis and Statistical Software I (323-21-403) Quarter: Autumn 02/03

Data Analysis and Statistical Software I (323-21-403)
Quarter: Autumn 02/03
Daniela Stan, PhD
Course homepage: http://facweb.cs.depaul.edu/Dstan/csc323
Office hours (no appointment needed):
M, 3:00pm - 3:45pm at LOOP, CST 471
W, 3:00pm - 3:45pm at LOOP, CST 471
11/7/2019 Daniela Stan - CSC323

Outline
Chapter 7: Inference for Distributions
Summary of tests of significance (Chapter 6)
Inference for the mean of a population (Section 7.1)
The t-distributions
The one-sample t confidence interval
The one-sample t significance test
Robustness of the t procedures

General comments on stating hypotheses
It is not easy to state the null and the alternative hypotheses! Often we set Ha first and then define H0 as the "opposite" statement. The hypotheses are statements about population values. The alternative hypothesis Ha is often called the "research hypothesis", because it is the hypothesis we are interested in. A significance test is a test against the null hypothesis: it measures the evidence the data provide against H0.

Significance levels & P-values
If the p-value is small, the null hypothesis should not be accepted (in statistical terminology, it is rejected). In common statistical terminology:
If P is less than α = 0.05, the null hypothesis is rejected at the 5% significance level, and the test result is called "statistically significant".
If P is less than α = 0.01, the null hypothesis is rejected at the 1% significance level, and the test result is called "highly significant".
If P is larger than 0.05, the null hypothesis cannot be rejected, and the test result is called "not significant".
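The decision rule above can be sketched in Python as a small helper. The function name `describe_p_value` is hypothetical, and the cutoffs 0.01 and 0.05 are just the two conventional levels quoted on the slide.

```python
def describe_p_value(p: float) -> str:
    """Return the slide's wording for a p-value (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.01:
        # Rejected at the 1% level, and therefore also at the 5% level.
        return "highly significant: reject H0 at the 1% level"
    if p < 0.05:
        return "statistically significant: reject H0 at the 5% level"
    # Remaining case: p >= 0.05, so H0 cannot be rejected.
    return "not significant: H0 cannot be rejected"


for p in (0.003, 0.0368, 0.2):
    print(p, "->", describe_p_value(p))
```

The checks run from the strictest level down, so a p-value below 0.01 is reported with the stronger wording rather than only as "significant at 5%".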

Assumptions when applying the z-statistic
1. The population has a normal distribution with mean µ and standard deviation σ.
2. The standard deviation σ is known.
3. The size n of the simple random sample (SRS) is large.
4. The appropriate test statistic for inference about µ when σ is known is the z statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
where the expected value µ0 is the value assumed in the null hypothesis H0. Under H0, z has the standard normal distribution N(0,1).
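A minimal Python sketch of the z statistic above, assuming σ is known and using the standard library's `statistics.NormalDist` for the N(0,1) curve. The function name `z_test` and the four data values are hypothetical.

```python
import math
from statistics import NormalDist, mean


def z_test(sample: list[float], mu0: float, sigma: float) -> tuple[float, float]:
    """One-sample z test with a known population standard deviation sigma.

    Returns the z statistic and the two-sided p-value from N(0, 1).
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value: area in both tails beyond |z|.
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical data: xbar = 12.5, n = 4, known sigma = 2, testing H0: mu = 10.
z, p = z_test([11.0, 12.0, 13.0, 14.0], mu0=10.0, sigma=2.0)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Here σ/√n = 2/2 = 1, so z = (12.5 - 10)/1 = 2.5 and the two-sided p-value is about 0.012.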

Assumptions when applying the z-statistic (cont.)
Is the z-statistic appropriate to use when:
1. the sample size is small?
2. the population does not have a normal distribution?
3. the population has a normal distribution but the standard deviation σ is unknown?
When the standard deviation of a statistic (in our case $\bar{x}$) is estimated from the data, the result is called the standard error of the statistic:
$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$
What is the distribution of
$$\frac{\bar{x} - \mu_0}{s/\sqrt{n}}\ ?$$
It is not normal!
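A quick Monte Carlo sketch (not from the slides) of why the statistic with s in place of σ is not normal: draw samples from N(0, 1) so that H0: µ = 0 is true, and count how often the statistic falls beyond ±1.96. A standard normal statistic would do so about 5% of the time. The function name, seed, and repetition counts are arbitrary choices.

```python
import math
import random
from statistics import mean, stdev


def tail_fraction(n: int, reps: int = 10000, seed: int = 1) -> float:
    """Fraction of simulated samples with |(xbar - 0) / (s / sqrt(n))| > 1.96.

    Samples of size n are drawn from N(0, 1), so H0: mu = 0 is true.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        stat = mean(sample) / (stdev(sample) / math.sqrt(n))  # s replaces sigma
        if abs(stat) > 1.96:
            hits += 1
    return hits / reps


print(tail_fraction(5))          # tiny sample: well above 0.05
print(tail_fraction(100, 2000))  # larger sample: close to 0.05
```

For n = 5 the fraction comes out at roughly 0.12 (the exact tail area for the t-distribution with 4 degrees of freedom is about 0.12), more than twice the 5% a normal curve would give.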

The t-distributions
Suppose that an SRS of size n is drawn from an N(µ, σ) population. Then, when H0: µ = µ0 is true, the one-sample t statistic
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
has the t-distribution with n - 1 degrees of freedom. The degrees of freedom come from the standard deviation s in the denominator of t.
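The one-sample t statistic and its degrees of freedom can be sketched with the standard library's `statistics.stdev` (which uses the n - 1 divisor, matching s). The function name `one_sample_t` and the sample are hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return the one-sample t statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    standard_error = stdev(sample) / math.sqrt(n)  # SE of xbar = s / sqrt(n)
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1


# Hypothetical sample: xbar = 3, s^2 = 2.5, n = 5, testing H0: mu = 2.
t, df = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

With s²/n = 2.5/5 = 0.5, the standard error is √0.5 and t = 1/√0.5 = √2 ≈ 1.4142 on 4 degrees of freedom.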

Inference on averages for small samples
When the sample is small, say n < 50, the z-test has to be modified: we need to use other methods! Consider the following example. A new type of keyboard has been developed. The producers want to test whether the new design makes data entry easier and faster. They take a random sample of 24 individuals and, for each individual, record the input time of a standard data entry task with the new keyboard. A previous study showed that the same data entry task was completed in 47.20 seconds (on average) using a current type of keyboard. We need to perform a test of significance on the hypotheses:
Null hypothesis H0: µ = 47.20 sec
Alternative hypothesis Ha: µ ≠ 47.20 sec

Inference on averages for small samples (cont.)
The number of observations (n = 24) is so small that the normal approximation is not accurate enough to calculate the p-value! If the data arise from a population with a normal distribution, we can use a different curve, called the t-distribution or Student's curve. The t-distribution was discovered by W. S. Gosset (born on 13 June 1876 in Canterbury, England), the chief statistician of the Guinness brewery in Dublin, Ireland. He discovered the t-distribution in order to deal with small samples arising in statistical quality control. The brewery had a policy against employees publishing under their own names, so he published his results about the t-distribution under the pen name "Student", and that name has become attached to the distribution.

Comparing Student's curve and the standard normal curve
[Figure: Student's curves with d.f. = 5, d.f. = 15, and d.f. = 30, each plotted against the standard normal curve.]
Student's curve has "fatter" tails. For d.f. around 30, Student's curve is very similar to the standard normal curve.
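The "fatter tails" can be quantified numerically. This sketch computes the t density from its standard formula and integrates it with Simpson's rule (a choice made here to stay dependency-free, not how SAS or the textbook table compute it), then compares P(|T| > 2) for the d.f. values in the figure with P(|Z| > 2) for the standard normal curve. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1.0 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_half)


# Probability of landing more than 2 units from the center.
tail_t = {df: t_two_sided_p(2.0, df) for df in (5, 15, 30)}
tail_z = 2 * (1 - NormalDist().cdf(2.0))
for df, p in tail_t.items():
    print(f"d.f. = {df:2d}: P(|T| > 2) = {p:.4f}")
print(f"normal:   P(|Z| > 2) = {tail_z:.4f}")
```

The tail area shrinks toward the normal value as the degrees of freedom grow, which is the pattern the figure shows.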

Finding the p-value using Student's curve
There are many Student's curves: one for each number of degrees of freedom. For tests on averages:
Degrees of freedom = number of observations - 1
In the previous example we had 24 observations, so the degrees of freedom are d.f. = 24 - 1 = 23. The p-value is found using a table of values for the Student's curves or a statistical package such as SAS. The table for the t-distribution (Table D) can be found on page T-11 in the appendix.
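Using a table only brackets the p-value between two tabulated tail areas. A minimal sketch of that lookup for d.f. = 23, assuming the critical values below, which are the usual entries of a standard t table (they are not copied from the course's Table D). The names `CRITICAL_VALUES_DF23` and `bracket_p_value` are hypothetical.

```python
# (two-sided tail area, critical value t*) for d.f. = 23, from a standard t table.
CRITICAL_VALUES_DF23 = [
    (0.20, 1.319),
    (0.10, 1.714),
    (0.05, 2.069),
    (0.02, 2.500),
    (0.01, 2.807),
]


def bracket_p_value(t: float, table: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (lower, upper) bounds on the two-sided p-value of t from a table.

    The table must list (tail area, critical value) pairs with increasing
    critical values, as a row of a t table does.
    """
    abs_t = abs(t)
    upper = 1.0
    for area, t_star in table:
        if abs_t < t_star:
            return area, upper
        upper = area  # |t| is at least this critical value, so p <= area
    return 0.0, upper


n = 24
df = n - 1  # degrees of freedom = number of observations - 1
low, high = bracket_p_value(-2.22, CRITICAL_VALUES_DF23)
print(f"d.f. = {df}: {low} < p < {high}")
```

For |t| = 2.22 the value falls between 2.069 and 2.500, so the table gives 0.02 < p < 0.05; a statistical package reports the exact value instead of a bracket.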

The one-sample t test
Step 1: Specify the hypotheses in the significance test: set up the null hypothesis H0 and the alternative hypothesis Ha.
Step 2: Compute the test statistic. The test statistic measures the difference between the data and what is expected under the null hypothesis.
Step 3: Determine the appropriate Student's curve. The p-value is obtained NOT from the normal curve but from one of the Student's curves, with degrees of freedom d.f. = number of observations - 1.

The one-sample t test (cont.)
Step 4: Compute the p-value using the Student's curve with the degrees of freedom calculated in Step 3.
Step 5: Draw a conclusion about the test on the basis of the p-value. Small p-values are evidence against the null hypothesis: they indicate that the observed difference from H0 is unlikely to be due to chance alone.
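The five steps can be collected into one function. This is a sketch under stated assumptions: a two-sided Ha, α = 0.05 by default, and the p-value computed by integrating the t density with Simpson's rule (a dependency-free substitute for a table or SAS). The function names and the sample are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1.0 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def one_sample_t_test(sample: list[float], mu0: float, alpha: float = 0.05):
    # Step 1: H0: mu = mu0 against the two-sided Ha: mu != mu0.
    n = len(sample)
    # Step 2: test statistic t = (xbar - mu0) / (s / sqrt(n)).
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    # Step 3: Student's curve with d.f. = number of observations - 1.
    df = n - 1
    # Step 4: p-value from that curve.
    p = t_two_sided_p(t, df)
    # Step 5: conclusion based on the p-value.
    decision = "reject H0" if p < alpha else "do not reject H0"
    return t, df, p, decision


t, df, p, decision = one_sample_t_test([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)
print(f"t = {t:.3f}, d.f. = {df}, p = {p:.3f}: {decision}")
```

For this toy sample t = √2 on 4 degrees of freedom and p is about 0.23, so H0 is not rejected.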

When to use the t-test
When should we use it? Each of the following conditions should hold:
1. We are computing a statistical test on averages.
2. The sample is a simple random sample.
3. The number of observations is small: the sample size n is less than 30.
4. The distribution of the population is bell-shaped, i.e. not too different from the normal distribution. (Not easy to check, but typically true for measurements!)

Tests on averages: z-test or t-test?
If the amount of current data is:
Large: use the z-test and the normal curve.
Small (n < 50): it depends on the distribution of the population.
If it is unknown but not very different from the normal curve: use the t-test and Student's curve.
If it is unknown but quite different from the normal curve: do not use the t-test!
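The decision chart above can be sketched as a function. The name `choose_mean_test` is hypothetical, and `roughly_normal` stands for the analyst's judgment that the population is "not very different from the normal curve". The cutoff is a parameter because this slide uses n < 50 while the previous slide uses n < 30.

```python
def choose_mean_test(n: int, roughly_normal: bool, small_cutoff: int = 50) -> str:
    """Follow the z-test / t-test decision chart for tests on averages."""
    if n >= small_cutoff:
        # Large amount of data: the normal curve is accurate enough.
        return "z-test with the normal curve"
    if roughly_normal:
        # Small sample from a population close to normal.
        return "t-test with Student's curve"
    # Small sample from a clearly non-normal population.
    return "do not use the t-test"


print(choose_mean_test(24, roughly_normal=True))   # the keyboard example
print(choose_mean_test(24, roughly_normal=False))
print(choose_mean_test(200, roughly_normal=False))
```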

Example: keyboard data
Null hypothesis H0: µ = 47.20 sec
Alternative hypothesis Ha: µ ≠ 47.20 sec
Assume the data are drawn from an approximately normal population. The sample size is small, so we use the t-test. We use SAS to compute the test. Here PROC UNIVARIATE tests against a zero population mean (Mu0=0 in the output). To test H0: µ = c for some nonzero constant c:
Transform the data by subtracting c.
The new null hypothesis is H0: µ_new = 0.
PROC UNIVARIATE computes the two-sided p-value "Pr > |t|" for the alternative hypothesis Ha: µ_new ≠ 0.
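Why subtracting c works: shifting every observation by c shifts the sample mean by c and leaves s unchanged, so the t statistic is identical. A minimal check in Python, using hypothetical completion times (not the keyboard study's data) and a hypothetical helper `t_statistic`.

```python
import math
from statistics import mean, stdev


def t_statistic(sample: list[float], mu0: float) -> float:
    """One-sample t statistic (xbar - mu0) / (s / sqrt(n))."""
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))


c = 47.20
times = [45.1, 43.8, 48.0, 44.6, 46.2, 42.9]  # hypothetical completion times (sec)
shifted = [x - c for x in times]              # transformed data: x - c

t_original = t_statistic(times, mu0=c)        # test H0: mu = 47.20
t_shifted = t_statistic(shifted, mu0=0.0)     # test H0: mu_new = 0
print(f"{t_original:.6f} {t_shifted:.6f}", math.isclose(t_original, t_shifted))
```

The two statistics agree up to floating-point rounding, so the p-value for H0: µ_new = 0 is the p-value for H0: µ = c.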

SAS output: Testing for Population Mean Completion Time of 47.20
The UNIVARIATE Procedure
...
Tests for Location: Mu0=0
Test           Statistic        p Value
Student's t    t   -2.21735     Pr > |t|    0.0368
Sign           M   -5           Pr >= |M|   0.0639
Signed Rank    S   -72          Pr >= |S|   0.0366
Observed t = -2.22 < 0 and the two-sided p-value = 0.0368 (< 0.05). Therefore the result is statistically significant at the 5% level, and we can reject the null hypothesis. Since t = -2.22 is negative, the average completion time with the new keyboard is probably shorter than 47.20 seconds, so we conclude that the new keyboard is better!
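As a cross-check (not part of the original slides), the two-sided p-value can be recomputed from the reported t = -2.21735 and d.f. = 24 - 1 = 23. This sketch integrates the t density with Simpson's rule, a dependency-free choice; the source does not say which algorithm SAS uses. The helper names are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1.0 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


t_observed = -2.21735  # Student's t reported by PROC UNIVARIATE
df = 24 - 1            # 24 observations
p_two_sided = t_two_sided_p(t_observed, df)
# For the one-sided Ha: mu < 47.20 the p-value is half the two-sided value,
# because the observed t lies on the side that this Ha predicts.
p_one_sided = p_two_sided / 2
print(f"two-sided p = {p_two_sided:.4f}, one-sided p = {p_one_sided:.4f}")
```

The two-sided value should agree with SAS's "Pr > |t| 0.0368" up to rounding.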