Professor William H. Press, Department of Computer Science, the University of Texas at Austin1 Opinionated in Statistics by Bill Press Lessons #32 Contingency.

Slides:



Advertisements
Similar presentations
Copyright ©2006 Brooks/Cole A division of Thomson Learning, Inc. Introduction to Probability and Statistics Twelfth Edition Robert J. Beaver Barbara M.
Advertisements

Finish Anova And then Chi- Square. Fcrit Table A-5: 4 pages of values Left-hand column: df denominator df for MSW = n-k where k is the number of groups.
Chapter 13: The Chi-Square Test
Copyright ©2011 Brooks/Cole, Cengage Learning More about Inference for Categorical Variables Chapter 15 1.
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Categorical Variables Chapter 15.
© 2010 Pearson Prentice Hall. All rights reserved The Chi-Square Test of Homogeneity.
© 2010 Pearson Prentice Hall. All rights reserved The Chi-Square Test of Independence.
Copyright ©2011 Brooks/Cole, Cengage Learning Analysis of Variance Chapter 16 1.
ChiSq Tests: 1 Chi-Square Tests of Association and Homogeneity.
PSY 340 Statistics for the Social Sciences Chi-Squared Test of Independence Statistics for the Social Sciences Psychology 340 Spring 2010.
Ch. 28 Chi-square test Used when the data are frequencies (counts) or proportions for 2 or more groups. Example 1.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 14 Goodness-of-Fit Tests and Categorical Data Analysis.
11-2 Goodness-of-Fit In this section, we consider sample data consisting of observed frequency counts arranged in a single row or column (called a one-way.
Chi-Square Tests and the F-Distribution
Nonparametrics and goodness of fit Petter Mostad
1 Chapter 20 Two Categorical Variables: The Chi-Square Test.
Presentation 12 Chi-Square test.
The Chi-Square Test Used when both outcome and exposure variables are binary (dichotomous) or even multichotomous Allows the researcher to calculate a.
Statistics for the Social Sciences Psychology 340 Fall 2013 Tuesday, November 19 Chi-Squared Test of Independence.
1 Chapter 9 Inferences from Two Samples In this chapter we will deal with two samples from two populations. The general goal is to compare the parameters.
Copyright © Cengage Learning. All rights reserved. 11 Applications of Chi-Square.
Regression Analysis (2)
Chapter 11: Applications of Chi-Square. Count or Frequency Data Many problems for which the data is categorized and the results shown by way of counts.
Chapter 11: Applications of Chi-Square. Chapter Goals Investigate two tests: multinomial experiment, and the contingency table. Compare experimental results.
The University of Texas at Austin, CS 395T, Spring 2008, Prof. William H. Press 1 Computational Statistics with Application to Bioinformatics Prof. William.
Chi-Square as a Statistical Test Chi-square test: an inferential statistics technique designed to test for significant relationships between two variables.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. 1.. Section 11-2 Goodness of Fit.
Chapter 9: Non-parametric Tests n Parametric vs Non-parametric n Chi-Square –1 way –2 way.
Multinomial Distribution
Chapter 16 – Categorical Data Analysis Math 22 Introductory Statistics.
Two Way Tables and the Chi-Square Test ● Here we study relationships between two categorical variables. – The data can be displayed in a two way table.
1 Section 9-4 Two Means: Matched Pairs In this section we deal with dependent samples. In other words, there is some relationship between the two samples.
Copyright © Cengage Learning. All rights reserved. 14 Elements of Nonparametric Statistics.
1 In this case, each element of a population is assigned to one and only one of several classes or categories. Chapter 11 – Test of Independence - Hypothesis.
Chi-Square Procedures Chi-Square Test for Goodness of Fit, Independence of Variables, and Homogeneity of Proportions.
Introduction Many experiments result in measurements that are qualitative or categorical rather than quantitative. Humans classified by ethnic origin Hair.
Slide Slide 1 Section 8-6 Testing a Claim About a Standard Deviation or Variance.
EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005 Dr. John Lipp Copyright © Dr. John Lipp.
FPP 28 Chi-square test. More types of inference for nominal variables Nominal data is categorical with more than two categories Compare observed frequencies.
Section 8-5 Testing a Claim about a Mean: σ Not Known.
BPS - 5th Ed. Chapter 221 Two Categorical Variables: The Chi-Square Test.
© 2000 Prentice-Hall, Inc. Statistics The Chi-Square Test & The Analysis of Contingency Tables Chapter 13.
Statistical Significance for a two-way table Inference for a two-way table We often gather data and arrange them in a two-way table to see if two categorical.
AP Statistics Section 14.. The main objective of Chapter 14 is to test claims about qualitative data consisting of frequency counts for different categories.
Copyright © 2010 Pearson Education, Inc. Slide
Introduction to Probability and Statistics Thirteenth Edition Chapter 13 Analysis of Categorical Data.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Chapter Outline Goodness of Fit test Test of Independence.
1 Chapter 10. Section 10.1 and 10.2 Triola, Elementary Statistics, Eighth Edition. Copyright Addison Wesley Longman M ARIO F. T RIOLA E IGHTH E DITION.
Copyright © Cengage Learning. All rights reserved. Chi-Square and F Distributions 10.
12/23/2015Slide 1 The chi-square test of independence is one of the most frequently used hypothesis tests in the social sciences because it can be used.
11.2 Tests Using Contingency Tables When data can be tabulated in table form in terms of frequencies, several types of hypotheses can be tested by using.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Lesson 12 - R Chapter 12 Review. Objectives Summarize the chapter Define the vocabulary used Complete all objectives Successfully answer any of the review.
Chapter 14 – 1 Chi-Square Chi-Square as a Statistical Test Statistical Independence Hypothesis Testing with Chi-Square The Assumptions Stating the Research.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 12 Tests of Goodness of Fit and Independence n Goodness of Fit Test: A Multinomial.
1 Statistical Analysis Professor Lynne Stokes Department of Statistical Science Lecture #1 Chi-square Contingency Table Test.
BPS - 5th Ed. Chapter 221 Two Categorical Variables: The Chi-Square Test.
Chapter 18 Chi-Square Tests.  2 Distribution Let x 1, x 2,.. x n be a random sample from a normal distribution with  and  2, and let s 2 be the sample.
Objectives (BPS chapter 12) General rules of probability 1. Independence : Two events A and B are independent if the probability that one event occurs.
1 Opinionated in Statistics by Bill Press Lessons #15.5 Poisson Processes and Order Statistics Professor William H. Press, Department of Computer Science,
11/12 9. Inference for Two-Way Tables. Cocaine addiction Cocaine produces short-term feelings of physical and mental well being. To maintain the effect,
CHI SQUARE DISTRIBUTION. The Chi-Square (  2 ) Distribution The chi-square distribution is the probability distribution of the sum of several independent,
Log-linear Models Please read Chapter Two. We are interested in relationships between variables White VictimBlack Victim White Prisoner151 (151/160=0.94)
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Testing a Claim About a Mean:  Not Known
Exact Test Fisher’s Statistics
#8 Some Standard Distributions
Opinionated Lessons #13 The Yeast Genome in Statistics by Bill Press
Presentation transcript:

Professor William H. Press, Department of Computer Science, the University of Texas at Austin1 Opinionated in Statistics by Bill Press Lessons #32 Contingency Tables: A First Look

Professor William H. Press, Department of Computer Science, the University of Texas at Austin2 Contigency Tables, a.k.a. Cross-Tabulation Is alcohol implicated in malformations? This kind of data is often used to set public policy, so it is important that we be able to assess its statistical significance.

Professor William H. Press, Department of Computer Science, the University of Texas at Austin3 Contingency Tables (a.k.a. cross-tabulation) Ask: Is a gene is more likely to be single-exon if it is AT-rich? function table = contingencytable(rowcons, colcons) nrow = size(rowcons,2); ncol = size(colcons,2); table = squeeze(sum( repmat(rowcons,[1 1 ncol]).*... permute(repmat(colcons,[1 1 nrow]),[1 3 2]),1 )); rowcon = [(g.ne = = 1) (g.ne > 1)]; colcon = [(g.atf 0.6)]; table = contingencytable(rowcon,colcon) table = sum(table, 1) ans = ptable = table./ repmat(sum(table,1),[2 1]) ptable = my contingency table function: column marginals So can we claim that these are statistically identical? Or is the effect here also “significant but small”? (fewer genes AT rich than CG rich) <.4 >.6 1 >1 counts

Professor William H. Press, Department of Computer Science, the University of Texas at Austin4 nhtable = sum(table,2)*sum(table,1)/sum(sum(table)) nhtable = 1.0e+004 * chis = sum(sum((table-nhtable).^2./nhtable)) chis = p = chi2cdf(chis,1) p = table = d.f. = 4 – 2 – wow, can’t get less significant than this! No evidence of an association between single-exon and AT- vs. CG-rich.  notation: null hypothesis: the statistic is: Are the conditions for valid chi-square distribution satisfied? Yes, because number of counts in all bins is large. If they were small, we couldn’t use fix-the- moments trick, because small number of bins (no CLT). This occurs often in biomedical data. So what then? (We will return to this!) Chi-square (or Pearson) statistic for contingency tables expected value of N ij

Professor William H. Press, Department of Computer Science, the University of Texas at Austin5 When counts are small, some subtle issues show up. Let’s look closely. The setup is: “conditions”, e.g. healthy vs. sick “factors”, e.g. vaccinated vs. unvaccinated counts marginals (totals, dot means summed over) The null hypothesis is: “Conditions and factors are unrelated.” To do a p-value test we must: 1.Invent a statistic that measures deviation from the null hypothesis. 2.Compute that statistic for our data. 3.Find the distribution of that statistic over the (unseen) population. That’s the hard part! What is the “population” of contingency tables? We’ll soon see that it depends (maybe only slightly?) on the experimental protocol, not just on the counts!

Professor William H. Press, Department of Computer Science, the University of Texas at Austin6 Let’s review the hypergeometric distribution What is the (null hypothesis) probability of a car race finishing with 2 Ferraris, 2 Renaults, and 1 Honda in the top 5 if each team has 6 cars in the race and the race consists of only those teams? Out of N genes, m are associated with disease 1 and n with disease 2. What is the (null hypothesis) probability of finding r genes overlap? ¡ A a ¢¡ B b ¢¡ C c ¢ ¡ A + B + C a + b + c ¢ = ¡ 6 2 ¢¡ 6 2 ¢¡ 6 1 ¢ ¡ 18 5 ¢ = 0 : 1576 N m n r Hypergeometric probabilities have product of “chooses” in the numerator, and a denominator “choose” with sums of numerator arguments. ( N m )( m r )( N ¡ m n ¡ r ) ( N m )( N n ) = ( m r )( N ¡ m n ¡ r ) ( N n ) choose 1 st set choose overlap choose rest of 2 nd set choose each set independently Yes, it is symmetrical on m and n! = m ! n ! ( N ¡ m ) ! ( N ¡ n ) ! r ! ( m ¡ r ) ! ( n ¡ r ) ! ( N ¡ m ¡ n + r ) ! N ! ´ h yper ( r; N ; m ; n )

Professor William H. Press, Department of Computer Science, the University of Texas at Austin7 And now, review the multinomial distribution On each i.i.d. try, exactly one of K outcomes occurs, with probabilities p 1 ; p 2 ;:::; p K K X i = 1 p i = 1 For N tries, the probability of seeing exactly the outcome n 1 ; n 2 ;:::; n K K X i = 1 n i = N is probability of one specific outcome number of equivalent arrangements abcde fgh ijklmnop q rs tuvwxyz (12345)(123)( )(1)(12)( ) N=26: N! arrangements n 1 = 5 n 2 = 3 n 6 = 7 partition into the observed n i ’s P ( n 1 ;:::; n K j N ; p 1 ;:::; p K ) = N ! n 1 ! ¢¢¢ n K ! p n 1 1 p n 2 2 ¢¢¢ p n K K