
Fitting probability models to frequency data

Review - proportions. Data: a discrete nominal variable with two states (“success” and “failure”). You can do two things: – Estimate a parameter with a confidence interval – Test a hypothesis

Estimating a proportion

Confidence interval for a proportion*, where Z = 1.96 for a 95% confidence interval. * The Agresti-Coull method
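The formula on this slide did not survive in the transcript. As a minimal sketch, assuming the slide used the common "plus four" form of the Agresti-Coull interval (adjusted proportion p' = (X + 2) / (n + 4), interval p' ± Z·sqrt(p'(1 − p') / (n + 4))), the calculation could look like this. The function name and the sample counts are hypothetical.

```python
import math

def agresti_coull_interval(successes, n, z=1.96):
    """Approximate confidence interval for a proportion ("plus four" form).

    Assumption: this is the simplified Agresti-Coull form; the slide's
    exact formula is not preserved in the transcript.
    """
    # Adjusted estimate: add two successes and two failures.
    p_adj = (successes + 2) / (n + 4)
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - half_width, p_adj + half_width

# Hypothetical sample: 8 successes out of 20 trials.
lower, upper = agresti_coull_interval(8, 20)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```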

Hypothesis testing: – Want to know something about a population – Take a sample from that population – Measure the sample – What would you expect the sample to look like under the null hypothesis? – Compare the actual sample to this expectation

[Figure: outcomes labeled “weird” and “not so weird”]

[Diagram: Sample → Test statistic; Null hypothesis → Null distribution; compare: how unusual is this test statistic? P < 0.05: reject H₀. P > 0.05: fail to reject H₀.]

Binomial test

Test statistic: for the binomial test, the test statistic is the number of successes.

Binomial test

The binomial distribution

[Figure: binomial distribution, n = 20, p = 0.5; horizontal axis: x]

[Figure: the same distribution with the test statistic marked on the x axis]

P-value: the probability of obtaining the data* if the null hypothesis were true. *Or data showing as great or greater a difference from the null hypothesis.

P-value: – Add up the probabilities from the null distribution – Start at the test statistic, and go towards the tail – Multiply by 2 for a two-tailed test

Binomial distribution, n = 20, p = 0.5: P = 2 × (Pr[16] + Pr[17] + Pr[18] + Pr[19] + Pr[20])
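The sum on this slide can be evaluated directly from the binomial probabilities. The sketch below assumes the observed test statistic was 16 successes (inferred from the first term, Pr[16]); the helper name is illustrative only.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p0 = 20, 0.5
observed = 16  # assumed test statistic, taken from Pr[16] in the sum

# Start at the test statistic and add probabilities out to the tail.
upper_tail = sum(binomial_pmf(k, n, p0) for k in range(observed, n + 1))

# Multiply by 2 for a two-tailed test.
p_value = 2 * upper_tail
print(f"P = {p_value:.4f}")
```

Under these assumptions P falls below 0.05, so the null hypothesis p = 0.5 would be rejected.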

[Diagram: Sample → Test statistic; Null hypothesis → Null distribution; compare: how unusual is this test statistic? P < 0.05: reject H₀. P > 0.05: fail to reject H₀.]

n = 20, p₀ = 0.5. This is a pain…

Calculating P-values: – By hand – Use computer software such as JMP or Excel – Use tables

[Diagram: Sample → Test statistic; Null hypothesis → Null distribution; compare: how unusual is this test statistic? P < 0.05: reject H₀. P > 0.05: fail to reject H₀.]

Discrete distribution: a probability distribution describing a discrete numerical random variable. Examples: – Number of heads from 10 flips of a coin – Number of flowers in a square meter – Number of disease outbreaks in a year

χ² goodness-of-fit test: compares counts to a discrete probability distribution.

Hypotheses for χ² test

Test statistic for χ² test
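The formula itself is missing from the transcript. The standard goodness-of-fit statistic sums (Observed − Expected)² / Expected over all categories, and a minimal sketch of that calculation follows. The counts are hypothetical and are not the values from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories (not from the slides).
observed = [30, 50, 20]
expected = [25, 50, 25]

chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.2f}")
```

Each term measures how far one category's count is from its expectation, scaled by the expectation, so large values indicate a poor fit to the null model.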

Hypotheses for day of birth example

[Table: observed (Obs) and expected (Exp) counts by day: Sun., Mon., Tues., Wed., Thu., Fri., Sat., and Total]

The calculation for Sunday

The sampling distribution of χ² by simulation [Figure: histogram; horizontal axis: χ², vertical axis: Frequency]
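One way to build such a sampling distribution is to draw many samples under the null hypothesis and compute the test statistic for each. The sketch below assumes seven equally likely categories and a hypothetical sample size of 350; neither the sample size nor the number of simulations comes from the slides.

```python
import random

def simulate_chi_square(n, probs, n_sims, seed=0):
    """Simulate the chi-square statistic under the null hypothesis."""
    rng = random.Random(seed)
    categories = range(len(probs))
    expected = [n * p for p in probs]
    stats = []
    for _ in range(n_sims):
        # Draw n observations according to the null probabilities.
        draws = rng.choices(categories, weights=probs, k=n)
        observed = [0] * len(probs)
        for c in draws:
            observed[c] += 1
        stats.append(sum((o - e) ** 2 / e for o, e in zip(observed, expected)))
    return stats

# Hypothetical setup: 350 births, 7 equally likely days of the week.
sim = simulate_chi_square(n=350, probs=[1 / 7] * 7, n_sims=2000)
mean_stat = sum(sim) / len(sim)
print(f"mean simulated chi-square = {mean_stat:.2f}")
```

A histogram of `sim` approximates the null distribution, and its mean should sit near the number of categories minus one.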

Sampling distribution of χ² approximated by the χ² distribution

Degrees of freedom: the number of degrees of freedom specifies which of a family of distributions to use as the sampling distribution.

Degrees of freedom for χ² test: df = (Number of categories) − 1 − (Number of parameters estimated from the data)

Degrees of freedom for day of birth: df = 7 − 1 − 0 = 6

Finding the P-value

Critical value: the value of the test statistic where P = α.

Critical value for df = 6 at α = 0.05: χ² = 12.59
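The value 12.59 can be checked numerically without tables. The sketch below computes the upper-tail probability of the χ² distribution for integer degrees of freedom using a standard recurrence, then finds the critical value by bisection. Both function names are hypothetical, and this is one possible approach rather than the method used in the lecture.

```python
import math

def chi_square_sf(x, df):
    """Upper-tail probability Pr[chi-square >= x] for integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    # Base cases: df = 2 has a simple exponential tail, df = 1 uses erfc.
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q

def chi_square_critical_value(alpha, df, tol=1e-8):
    """Smallest x with Pr[chi-square >= x] <= alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi_square_sf(hi, df) > alpha:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi_square_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"critical value, df = 6: {chi_square_critical_value(0.05, 6):.2f}")
print(f"P at 12.59, df = 6: {chi_square_sf(12.59, 6):.3f}")
```

A test statistic larger than the critical value corresponds to P < 0.05.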

P < 0.05, so we can reject the null hypothesis. Babies in the US are not born randomly with respect to the day of the week.

Assumptions of χ² test: – No more than 20% of categories have Expected < 5 – No category with Expected ≤ 1
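These two rules are easy to check before running the test. The sketch below assumes the second rule reads "Expected ≤ 1" (the comparison symbol is lost in the transcript); the function name and example values are hypothetical.

```python
def check_chi_square_assumptions(expected):
    """Return True if the expected counts satisfy both rules of thumb."""
    # Rule 1: no more than 20% of categories have Expected < 5.
    n_small = sum(1 for e in expected if e < 5)
    few_small = n_small / len(expected) <= 0.20
    # Rule 2 (assumed reading): no category has Expected <= 1.
    none_tiny = all(e > 1 for e in expected)
    return few_small and none_tiny

print(check_chi_square_assumptions([50, 50, 50, 50, 50, 50, 50]))
print(check_chi_square_assumptions([0.5, 20, 30]))
```

When the check fails, a common remedy is to combine neighboring categories until the expected counts are large enough.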

χ² test as approximation of binomial test: if the number of data points is large, then a χ² goodness-of-fit test can be used in place of a binomial test. See text for an example.
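To illustrate how close the approximation can be, the sketch below compares the two tests on a hypothetical sample of 60 successes in 100 trials with p₀ = 0.5. It is not the example from the text. With two categories, df = 1, and the χ² upper-tail probability reduces to erfc(sqrt(χ²/2)).

```python
import math
from math import comb

n, successes, p0 = 100, 60, 0.5  # hypothetical data

# Exact two-tailed binomial test: double the upper-tail probability.
upper_tail = sum(
    comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(successes, n + 1)
)
p_binomial = 2 * upper_tail

# Chi-square goodness-of-fit test with two categories (df = 2 - 1 - 0 = 1).
observed = [successes, n - successes]
expected = [n * p0, n * (1 - p0)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_chi2 = math.erfc(math.sqrt(chi2 / 2))  # upper tail for df = 1

print(f"binomial P = {p_binomial:.4f}, chi-square P = {p_chi2:.4f}")
```

The two P-values are similar but not identical, and the agreement generally improves as the sample size grows.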

The Poisson distribution: another discrete probability distribution. Describes the number of successes in blocks of time or space, when successes happen independently of each other and occur with equal probability at every point in time or space.

Poisson distribution
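The formula on this slide is not preserved in the transcript. The standard Poisson probability of X successes with mean μ is Pr[X] = e^(−μ) μ^X / X!, sketched below. The mean of 1.26 is the value used in the soccer example in this transcript; the function name is illustrative.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x successes when the mean is mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

mu = 1.26  # mean goals per team from the World Cup example
for x in range(6):
    print(f"Pr[{x}] = {poisson_pmf(x, mu):.4f}")
```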

Example: number of goals per side in World Cup soccer. Q: Is the outcome of a soccer game (at this level) random? In other words, is the number of goals per team distributed as expected by pure chance?

World Cup 2002 scores

Number of goals for a team (World Cup 2002)

What’s the mean, μ?

Poisson with μ = 1.26 [Table: columns X and Pr[X]]

Finding the Expected [Table: columns X, Pr[X], and Expected; the expected counts in the highest categories are too small]
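Expected counts are the sample size multiplied by each probability, and categories whose expected counts are too small can be pooled into a single "X or more" category. The sketch below assumes a hypothetical sample size of 100 and a pooling point of X ≥ 4; the actual values from the slides are not preserved.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x successes when the mean is mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

def expected_counts(mu, n, pool_from):
    """Expected counts for X = 0 .. pool_from - 1, plus a pooled X >= pool_from."""
    probs = [poisson_pmf(x, mu) for x in range(pool_from)]
    # The pooled tail takes whatever probability remains.
    probs.append(1 - sum(probs))
    return [n * p for p in probs]

# Hypothetical: n = 100 observations, mean 1.26, pooling X >= 4.
expected = expected_counts(mu=1.26, n=100, pool_from=4)
labels = ["0", "1", "2", "3", ">=4"]
for label, e in zip(labels, expected):
    print(f"X = {label}: expected {e:.2f}")
```

Pooling reduces the number of categories, which also reduces the degrees of freedom of the test.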

Calculating χ² [Table: columns X, Expected, and Observed]

Degrees of freedom for Poisson: df = (Number of categories) − 1 − (Number of parameters estimated from the data) = 5 − 1 − 1 = 3. Estimated one parameter, μ.

Critical value

Comparing χ² to the critical value: the test statistic does not exceed the critical value, so we cannot reject the null hypothesis. There is no evidence that the number of goals scored by a team in a World Cup soccer game is not Poisson distributed.