G89.2228 Lecture 6b: Generalizing from tests of quantitative variables to tests of categorical variables

1 G89.2228 Lect 6b
G89.2228 Lecture 6b: Generalizing from tests of quantitative variables to tests of categorical variables
Testing a hypothesis about a single proportion
– "Exact" binomial test
– Large sample test: Normal
– Chi Square test
– Making the Binomial and Large sample tests agree
Confidence bound on a proportion
– Symmetric bound
– Nonsymmetric bound
Differences in proportions
– Large sample test
– Confidence bounds
– Chi Square test (2 × 2 table)

2 G89.2228 Lect 6b
Generalizing from tests of quantitative variables to tests of categorical variables
Binary variables (X = 0 or X = 1) resemble quantitative variables in several ways:
The mean, E(X), is in the range (0, 1)
– It is interpreted as a probability, p
The variance of X, computed the usual way, turns out to be p(1 − p)
The sample mean, p̂ = X̄, is itself approximately normally distributed for large sample sizes
The logic for tests of binary means works in large samples the same way it does for continuous means

3 G89.2228 Lect 6b
Binomial Variation
If we know E(X) = p, then we also know V(X) = p(1 − p)
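As a quick numerical illustration of this point (not part of the original slides), the following Python sketch simulates a binary variable and confirms that its sample variance is close to p(1 − p); numpy is assumed to be available.

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.25
    x = rng.binomial(1, p, size=100_000)   # 0/1 observations with E(X) = p

    print(x.mean())        # close to p = 0.25
    print(x.var())         # close to p * (1 - p) = 0.1875
    print(p * (1 - p))     # 0.1875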

4 G89.2228 Lect 6b
Testing a hypothesis about a single proportion, H0: E(X) = p = k
Example: Whether a population of musicians has the same proportion of left-handed people as the population at large.
– Sample 20 Juilliard musicians, and find that 5 are left-handed (fictional data)
– If X = 0 for right-handed and X = 1 for left-handed, then X̄ = p̂ = 5/20 = .25
– If p = .1, is 5/20 an unusual event?
Exact test: application of the binomial distribution
P(X ≥ 5 | n = 20, p = .1) = 1 − Σ(k = 0 to 4) C(20, k)(.1)^k(.9)^(20−k) = 1 − .957 = .043
One-tailed inference would call H0 into question, but not two-tailed inference, although there are no relevant possibilities in the other tail!
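A minimal sketch of the exact test in Python, using scipy for the binomial tail probability (an assumption; any binomial CDF routine would do):

    from scipy.stats import binom

    n, k, p0 = 20, 5, 0.10
    # One-tailed exact p value: P(X >= 5 | n = 20, p = .1) = 1 - P(X <= 4)
    p_one_tail = binom.sf(k - 1, n, p0)
    print(round(p_one_tail, 3))   # ~0.043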

5 G89.2228 Lect 6b
Large sample test: Normal
While we can work with 20 subjects and 5 positive cases, what about 36, 48, or 180 subjects? Binomial calculations are often tedious. How about using the central limit theorem to test a z statistic?
Assuming H0 is true (null value p0 = .10), the standard error of p̂ is sqrt[p0(1 − p0)/n] = sqrt[(.1)(.9)/20] = .067, so
Z = (p̂ − p0)/sqrt[p0(1 − p0)/n] = (.25 − .10)/.067 = .15/.067 = 2.24
Under the null hypothesis, such an extreme z would be observed only about 13 times out of 1000 for a one-tailed test, or 25 out of 1000 for a two-tailed test.
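The same z statistic can be reproduced in a few lines of Python (scipy assumed for the normal tail probability):

    import math
    from scipy.stats import norm

    n, p_hat, p0 = 20, 0.25, 0.10
    se0 = math.sqrt(p0 * (1 - p0) / n)      # SE under H0, ~0.067
    z = (p_hat - p0) / se0                  # ~2.24

    print(round(z, 2))                      # 2.24
    print(round(norm.sf(z), 3))             # one-tailed p, ~0.013
    print(round(2 * norm.sf(z), 3))         # two-tailed p, ~0.025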

6 G89.2228 Lect 6b
Chi Square test
Z² = (2.24)² = 5.0 can also be evaluated as χ²(1). On page 672 of Howell we see that this value corresponds to a p of .025. Squaring the Z makes it implicitly a two-tailed test.
Pearson showed that this same statistic can be computed by comparing the observed frequencies, 5 and 15, to the expected frequencies under H0, 2 and 18. Let Oi represent the observed frequency and Ei the expected frequency. Pearson's goodness-of-fit Chi Square is:
χ² = Σ (Oi − Ei)²/Ei = (5 − 2)²/2 + (15 − 18)²/18 = 4.5 + 0.5 = 5.0
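A sketch of the same goodness-of-fit computation in Python; scipy.stats.chisquare reproduces both the statistic and the (implicitly two-tailed) p value:

    from scipy.stats import chisquare

    observed = [5, 15]    # left-handed, right-handed
    expected = [2, 18]    # 20 * .1 and 20 * .9 under H0
    stat, pval = chisquare(f_obs=observed, f_exp=expected)

    print(round(stat, 2))   # 5.0, equal to z^2
    print(round(pval, 3))   # ~0.025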

7 G89.2228 Lect 6b
Making the Binomial and Large sample tests agree
The one-tailed p value for the binomial test was .043, while for the z (or χ²) it was .013. Why the difference?
The binomial has discrete jumps in probability as we consider the possibility of 0, 1, …, 5 left-handed persons out of 20. The z and χ² tests make use of continuous distributions.
Yates suggested a correction that shrinks the observed deviation by half a count (1/2n on the proportion scale):
Z = (|p̂ − p0| − 1/2n)/sqrt[p0(1 − p0)/n] = (.15 − .025)/.067 = 1.86
The p value for this corrected z (one-tailed) is .031. It is called a "correction for continuity", but its use is somewhat controversial.
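A short Python sketch of the continuity-corrected z as reconstructed above (the 1/2n adjustment is the version implied by the slide's numbers, so treat it as an assumption):

    import math
    from scipy.stats import norm

    n, p_hat, p0 = 20, 0.25, 0.10
    se0 = math.sqrt(p0 * (1 - p0) / n)
    # Shrink the observed deviation by half a count (1/2n on the proportion scale)
    z_corrected = (abs(p_hat - p0) - 1 / (2 * n)) / se0

    print(round(z_corrected, 2))            # ~1.86
    print(round(norm.sf(z_corrected), 3))   # one-tailed p, ~0.031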

8 G89.2228 Lect 6b
Confidence Bound on binomial proportion
What procedure can be used to define a bound on µ that will contain the parameter 95% of the time it is used?
So far, we have considered symmetric bounds of the form θ̂ ± 1.96·SE(θ̂), where θ̂ is the estimate of the parameter value θ. This general form does not always work well on parameters that are bounded, such as the mean of a binary variable.
If p is in the range (.2, .8), we can usually get by with the symmetric form p̂ ± 1.96·sqrt[p̂(1 − p̂)/n] = .25 ± .19 = (.060, .440).
The continuity correction expands the bounds by 1/2n: (.060 − .025, .440 + .025) = (.035, .465).
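A minimal Python sketch of the symmetric and continuity-expanded bounds for this example (pure Python; values rounded as on the slide):

    import math

    n, p_hat = 20, 0.25
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)   # ~0.19
    cc = 1 / (2 * n)                                         # continuity correction, 0.025

    print(round(p_hat - half_width, 3), round(p_hat + half_width, 3))            # ~(.060, .440)
    print(round(p_hat - half_width - cc, 3), round(p_hat + half_width + cc, 3))  # ~(.035, .465)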

9 G89.2228 Lect 6b
A nonsymmetric CI for p
Fleiss (1981), Statistical Methods for Rates and Proportions, 2nd Edition, gives better expressions for continuity-corrected bounds:
Lower = [2np + k² − 1 − k·sqrt(k² − 2 − 1/n + 4p(nq + 1))] / [2(n + k²)]
Upper = [2np + k² + 1 + k·sqrt(k² + 2 − 1/n + 4p(nq − 1))] / [2(n + k²)]
where q = 1 − p and k = 1.96 for 95% bounds.
Applying these formulas with p̂ = .25 and n = 20 gives the bounds (.096, .49).
Which bounds to use? Simulations show Fleiss's to be best, but they are not necessarily the most often used.
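The bounds above can be checked with this Python sketch, which codes the continuity-corrected limits exactly as reconstructed here (treat the formulas as an assumption and verify against Fleiss (1981) before relying on them):

    import math

    n, p, k = 20, 0.25, 1.96
    q = 1 - p

    lower = (2 * n * p + k**2 - 1
             - k * math.sqrt(k**2 - 2 - 1 / n + 4 * p * (n * q + 1))) / (2 * (n + k**2))
    upper = (2 * n * p + k**2 + 1
             + k * math.sqrt(k**2 + 2 - 1 / n + 4 * p * (n * q - 1))) / (2 * (n + k**2))

    print(round(lower, 3), round(upper, 3))   # ~(0.096, 0.494)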

10 G89.2228 Lect 6b
Difference in proportions from independent samples
Henderson-King & Nisbett had a binary outcome that is most appropriately analyzed using methods for categorical data. Consider their question: is the choice to sit next to a Black person different in groups exposed to a disruptive Black vs. White person?
In their study, p̂1 = 11/37 = .297 and p̂2 = 16/35 = .457. Are these numbers consistent with the same population mean?
Consider the general large sample test statistic, which will be distributed N(0, 1) when the sample sizes are large:
Z = [(p̂1 − p̂2) − 0] / SE(p̂1 − p̂2)

11 G89.2228 Lect 6b
Differences in proportions
Under the null hypothesis, the standard errors of the two sample means are σ/sqrt(n1) and σ/sqrt(n2), where σ is the common population standard deviation.
Under the null hypothesis, the common proportion is estimated by pooling the data: p̄ = (11 + 16)/(37 + 35) = 27/72 = .375
The common variance is p̄(1 − p̄) = (.375)(.625) = .234
The Z statistic is then
Z = (p̂2 − p̂1) / sqrt[p̄(1 − p̄)(1/n1 + 1/n2)] = .160/.114 = 1.40
The two-tailed p-value is .16. A 95% CI bound on the difference is .160 ± (1.96)(.114) = (−.06, .38). It includes the H0 value of zero.
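A Python sketch of the pooled two-proportion z test for these data (scipy assumed for the normal tail probability; the counts 11/37 and 16/35 are taken from the slides):

    import math
    from scipy.stats import norm

    x1, n1 = 11, 37
    x2, n2 = 16, 35
    p1, p2 = x1 / n1, x2 / n2                 # .297 and .457
    p_pool = (x1 + x2) / (n1 + n2)            # .375

    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))   # ~.114
    diff = p2 - p1                                              # .160
    z = diff / se                                               # ~1.40

    print(round(z, 2), round(2 * norm.sf(abs(z)), 2))               # 1.40, ~0.16
    print(round(diff - 1.96 * se, 2), round(diff + 1.96 * se, 2))   # ~(-.06, .38)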

12 G89.2228 Lect 6b
Pearson Chi Square for 2 × 2 Tables
The z test statistic (e.g., for the difference of two proportions) is a standard normal; z² is distributed as χ² with 1 degree of freedom.
From the example, z² = (1.40)² = 1.96; p is between .1 and .25, and this is effectively a 2-tailed test.
Pearson's calculation for this test statistic is:
χ² = Σ (Oi − Ei)²/Ei, summed over the four cells of the 2 × 2 table,
where Oi is an observed frequency and Ei is the expected frequency given the null hypothesis of equal proportions.
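The Pearson statistic can be verified cell by cell with a short Python sketch (the expected counts are derived from the marginals shown on the next two slides):

    # Observed and expected counts for the four cells of the 2 x 2 table
    observed = [11, 26, 16, 19]
    expected = [13.875, 23.125, 13.125, 21.875]   # row total * column total / 72

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    print(round(chi2, 2))   # ~1.96, matching z^2 = (1.40)^2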

13 G89.2228 Lect 6b
Expected values for no association
From the example: p̂1 = 11/37 = .297 and p̂2 = 16/35 = .457
The expected frequencies are based on a pooled p = (11 + 16)/(37 + 35) = 27/72 = .375:
– Group 1 (n = 37): expected 37 × .375 = 13.9 chose to sit next, 37 × .625 = 23.1 did not
– Group 2 (n = 35): expected 35 × .375 = 13.1 chose to sit next, 35 × .625 = 21.9 did not
(Observed frequencies: 11 and 26 in group 1; 16 and 19 in group 2.)

14 G89.2228 Lect 6b
Chi square test of independence vs. association, continued
Marginal probabilities = pooled proportions (e.g., 27/72 = .375 for the outcome; 37/72 = .514 for group 1)
Expected joint probabilities under H0 = product of the marginals (e.g., .193 = .375 × .514)
Ei = expected joint probability × n (e.g., 13.9 = .193 × 72 = 27 × 37/72)
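As a check on these hand calculations (not part of the original slides), scipy's chi2_contingency reproduces the expected frequencies from the marginals and the uncorrected Pearson statistic:

    from scipy.stats import chi2_contingency

    table = [[11, 26],    # group 1: 11 of 37 chose to sit next
             [16, 19]]    # group 2: 16 of 35 chose to sit next

    stat, pval, dof, expected = chi2_contingency(table, correction=False)

    print(expected)                         # [[13.875, 23.125], [13.125, 21.875]]; 13.875 = 27 * 37 / 72
    print(round(stat, 2), round(pval, 2))   # ~1.96, ~0.16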