Distributions of Nominal Variables


Distributions of Nominal Variables 12/03

Nominal Data
- Some measurements are just types or categories
  - Favorite color, college major, political affiliation, how you get to school, where you're from
- Minimal mathematical structure, but we can still do hypothesis testing
- Hypotheses about frequencies or probabilities
  - Are all categories equally likely?
  - Do two groups differ in their distributions?
  - Are two nominal variables related or independent?

Extending the Binomial Test
- Frequency of observations in yes/true category
- Compare to prediction of null hypothesis
- Normal approximation
  - Treat binomial distribution as Normal
  - Convert frequency to z-score
[Figure labels: $\mu_{freq}$, $f$]
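
The Normal approximation on this slide can be sketched in a few lines of Python. The slide does not show the formula, so the standard deviation sqrt(n p (1 - p)) used here is the textbook binomial value, and the function name `binomial_z` and the counts are hypothetical.

```python
import math

def binomial_z(f_obs, n, p):
    """z-score of an observed count, treating Binomial(n, p) as Normal."""
    mean = n * p                      # frequency predicted by the null hypothesis
    sd = math.sqrt(n * p * (1 - p))   # textbook binomial standard deviation (assumed)
    return (f_obs - mean) / sd

# Hypothetical data: 60 "yes" responses out of 100, with H0 predicting p = 0.5
z = binomial_z(60, 100, 0.5)
print(round(z, 2))
```

A z-score far from 0 (in either direction) counts as evidence against the null hypothesis.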

Extending the Binomial Test
- Multinomial test
  - Count observations in every category
  - Observed frequencies, $f_{obs}$
  - Convert each to z-score
  - H0 predicts each z should be near 0
- Chi-square statistic
  - Sum of squared z-scores
  - Measures deviation from null hypothesis
- p-value
  - Probability of result greater than $\chi^2$
  - Uses chi-square distribution
  - df = k – 1 (k is number of categories)
  - Counts are not independent; last constrained by rest

Details of z-score
- Expected frequencies
  - Frequency of each category predicted by H0
  - Expected value or mean of sampling distribution
  - Category probability times number of observations (n): $f_{exp} = p \cdot n$
  - If all categories equally likely: $f_{exp} = n / k$
- Standard error
  - Denominator of z formula
  - Standard deviation of sampling distribution (adjusted for degrees of freedom)
  - Equals square root of expected frequency: $\sqrt{f_{exp}}$
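
As a rough illustration of these pieces, the sketch below computes an expected frequency and a z-score for each category, then sums the squared z-scores into a chi-square statistic. The function name `z_scores` and the counts are made up for the example; they do not come from the slides.

```python
import math

def z_scores(observed, probabilities):
    """One z-score per category: (f_obs - f_exp) / sqrt(f_exp)."""
    n = sum(observed)                      # total number of observations
    zs = []
    for f_obs, p in zip(observed, probabilities):
        f_exp = p * n                      # expected frequency under H0
        zs.append((f_obs - f_exp) / math.sqrt(f_exp))
    return zs

# Hypothetical counts for k = 4 categories, with H0: all equally likely
observed = [30, 20, 25, 25]
probs = [0.25] * 4

zs = z_scores(observed, probs)
chi2 = sum(z * z for z in zs)              # sum of squared z-scores
df = len(observed) - 1                     # k - 1
print(zs, chi2, df)
```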

Example: Favorite Colors
- Choices: Red, Yellow, Green, Blue, Purple
- Are they all equally popular?
- Null hypothesis: For each color, $p = 1/5$
- Deviances:
- Squared z-scores:
- Chi-square statistic:
- Critical value (df = 4, $\alpha$ = 5%): 9.49

Independence of Nominal Variables
- Are two nominal variables related?
  - Same question as correlation, but need different approach
  - Do probabilities for one variable differ between categories of another?
  - Experimental condition vs. success of learning; sex vs. political affiliation; origin vs. major
- Independent nominal variables
  - Probabilities for each variable unaffected by other
  - Example: 80% from CO, 10% psych majors
    - 80% of psych majors are from CO
    - 80% × 10% = 8% both psych and from CO
  - p(x & y) = p(x)p(y)

Chi-square Test of Independence
- Null hypothesis: Variables are independent
- Use H0 to calculate expected frequencies
  - Find observed marginal frequencies for each variable
    - Total count for each category, ignoring levels of other variable
  - Multiply marginal frequencies (and divide by the total n) to get expected frequency for combination
- Same formula as before:

Observed frequencies, with expected frequencies in parentheses:

         y1        y2        y3        Total
x1       40 (20)   10 (40)   30 (20)     80
x2       30 (30)   80 (60)   10 (30)    120
x3       20 (40)   90 (80)   50 (40)    160
Total    90       180        90         360
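
The expected frequencies can be checked with a short Python sketch. It assumes the slide's table is read as observed counts 40, 10, 30 / 30, 80, 10 / 20, 90, 50 (rows x1 to x3, columns y1 to y3), with the remaining numbers being expected counts and marginal totals; the variable names are illustrative.

```python
# Observed counts as read from the slide's table (rows x1..x3, columns y1..y3)
observed = [
    [40, 10, 30],
    [30, 80, 10],
    [20, 90, 50],
]

row_totals = [sum(row) for row in observed]         # marginal totals for x
col_totals = [sum(col) for col in zip(*observed)]   # marginal totals for y
n = sum(row_totals)                                 # grand total

# Under independence: f_exp = (row total * column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square: sum over all cells of (f_obs - f_exp)^2 / f_exp
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(expected, chi2, df)
```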

General Principles of Chi-square Tests
- Can use any prediction about data as null hypothesis
  - Very general approach
- Measure goodness of fit
  - Actually badness of fit
  - Deviation of data from prediction
- Nominal data
  - Calculate z-score for each frequency within its sampling distribution
    - Observed minus expected frequency, divided by $\sqrt{f_{exp}}$
  - Square zs and sum, to get $\chi^2$
  - Distribution of one variable; dependence between two variables
- Compare GoF to chi-square distribution to get p-value
  - $p = p(\chi^2_{df} > \chi^2)$
  - df comes from number of parameters constrained by H0
  - k – 1 for multinomial test; $(k_X - 1)(k_Y - 1)$ for independence test
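
The p-value step can be sketched without any statistics library. The helper below, `chi2_sf`, is a hypothetical name; it evaluates the upper tail of the chi-square distribution through a series expansion of the regularized incomplete gamma function. That numerical method is an assumption of this sketch (the slides do not say how the p-value is obtained), and it is suitable only for moderate values of the statistic.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability p(chi2_df > x), via a series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a          # first term of the series
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= y / (a + k)  # next term: multiply by y / (a + k)
        total += term
    lower = total * math.exp(-y + a * math.log(y) - math.lgamma(a))
    return 1.0 - lower

# Tail area beyond the critical value quoted for df = 4, alpha = 5%
print(chi2_sf(9.49, 4))
# Tail area for a statistic of 6.61 with df = 3
print(chi2_sf(6.61, 3))
```

The first call should land very close to 0.05, which is what makes 9.49 the 5% critical value for df = 4.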

Review
Dogs in the U.S. are
- 30% Labrador
- 20% Chihuahua
- 15% German Shepherd
- 35% Other
The local shelter has 200 dogs. How many of each would you expect if shelter dogs had the same distribution as the general population?
- 50 Lab, 50 Chihuahua, 50 GSD, 50 other
- 30 Lab, 20 Chihuahua, 15 GSD, 35 other
- 60 Lab, 40 Chihuahua, 30 GSD, 70 other
- 25 Lab, 25 Chihuahua, 25 GSD, 25 other

Review
The actual breed counts are as follows:

            Labrador   Chihuahua   GSD   Other
Observed       65         50        20     65
Expected       60         40        30     70

Calculate a $\chi^2$ statistic for testing whether this shelter reliably deviates from the breed distribution in the general population.
- 0.74
- 6.61
- 7.77
- 250
p = .085
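
One way to check the arithmetic for this review question is the sketch below, which uses the breed proportions and observed counts given in the review slides. The dictionary layout and variable names are illustrative choices.

```python
# Population proportions and observed shelter counts from the review slides
proportions = {"Labrador": 0.30, "Chihuahua": 0.20, "German Shepherd": 0.15, "Other": 0.35}
observed = {"Labrador": 65, "Chihuahua": 50, "German Shepherd": 20, "Other": 65}

n = sum(observed.values())                                   # 200 dogs in the shelter
expected = {breed: p * n for breed, p in proportions.items()}

# Sum of squared z-scores: (f_obs - f_exp)^2 / f_exp for each breed
chi2 = sum((observed[b] - expected[b]) ** 2 / expected[b] for b in observed)
df = len(observed) - 1                                       # k - 1 = 3
print(round(chi2, 2), df)
```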

Review
Next we count how many get adopted in the next month:

           Labrador   Chihuahua   GSD   Other   Total
Adopted       20         35         4     21      80
Not           45         15        16     44     120
Total         65         50        20     65     200

If being adopted is independent of breed, how many German Shepherds would you have expected to be adopted?
- 8
- 10
- 20
- 25
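
The expected adoption counts follow from the marginal totals in the table. A minimal sketch, using the counts from the slide and illustrative variable names:

```python
# Adoption counts by breed, from the review slide
adopted = {"Labrador": 20, "Chihuahua": 35, "German Shepherd": 4, "Other": 21}
not_adopted = {"Labrador": 45, "Chihuahua": 15, "German Shepherd": 16, "Other": 44}

total_adopted = sum(adopted.values())                 # marginal total for "Adopted"
n = total_adopted + sum(not_adopted.values())         # all dogs in the shelter
breed_totals = {b: adopted[b] + not_adopted[b] for b in adopted}

# Under independence: expected adopted = breed total * adopted total / n
expected_adopted = {b: breed_totals[b] * total_adopted / n for b in adopted}
print(expected_adopted["German Shepherd"])
```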