Distributions of Nominal Variables 12/02



Nominal Data
Some measurements are just types or categories
– Favorite color, college major, political affiliation, how you get to school, where you’re from
Minimal mathematical structure, but we can still do hypothesis testing
Hypotheses about frequencies or probabilities
– Are all categories equally likely?
– Do two groups differ in their distributions?
– Are two nominal variables related or independent?

Extending the Binomial Test
Binomial test
– Frequency of observations in yes/true category
– Compare to prediction of null hypothesis
Normal approximation
– Treat binomial distribution as Normal
– Convert frequency to z-score
[Figure: axis labeled freq, with the observed frequency f marked]
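The conversion from frequency to z-score can be sketched in a few lines. This is a minimal illustration, assuming the usual Normal approximation with mean n·p and standard deviation sqrt(n·p·(1 − p)); the function name `binomial_z` and the counts (60 of 100, null probability 0.5) are hypothetical and not taken from the slides.

```python
import math

def binomial_z(f_obs: int, n: int, p0: float) -> float:
    """z-score of an observed frequency under the Normal approximation to the binomial."""
    mean = n * p0                      # frequency predicted by the null hypothesis
    sd = math.sqrt(n * p0 * (1 - p0))  # binomial standard deviation
    return (f_obs - mean) / sd

# Hypothetical data: 60 "yes" observations out of 100, with H0: p = 0.5
z = binomial_z(60, 100, 0.5)
print(round(z, 2))
```

A z far from 0 in either direction counts as evidence against the null hypothesis.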

Extending the Binomial Test
Multinomial test
– Count observations in every category: observed frequencies, $f_{obs}$
– Convert each to z-score
– $H_0$ predicts each z should be near 0
Chi-square statistic
– Sum of squared z-scores
– Measures deviation from null hypothesis
p-value
– Probability of result greater than $\chi^2$
– Uses chi-square distribution
– $df = k - 1$ (k is number of categories)
– Counts are not independent; last constrained by rest
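The p-value step can be sketched with the standard library alone. The helper below, `chi2_sf`, is a hypothetical name; it assumes integer degrees of freedom and uses the closed forms for df = 1 and df = 2 together with the standard recurrence for the upper tail of the chi-square distribution. In practice a statistics package would normally supply this function.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square with integer df > x).

    Uses Q(df + 2) = Q(df) + (x/2)**(df/2) * exp(-x/2) / Gamma(df/2 + 1),
    starting from the closed forms for df = 1 and df = 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, d = math.erfc(math.sqrt(half)), 1  # closed form for df = 1
    else:
        q, d = math.exp(-half), 2             # closed form for df = 2
    while d < df:
        q += half ** (d / 2) * math.exp(-half) / math.gamma(d / 2 + 1)
        d += 2
    return q

# With k = 5 categories, df = k - 1 = 4; a statistic of 9.49 sits at about the 5% level.
k = 5
p_value = chi2_sf(9.49, df=k - 1)
print(round(p_value, 3))
```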

Details of z-score
Expected frequencies
– Frequency of each category predicted by $H_0$
– Expected value or mean of sampling distribution
– Category probability times number of observations (n)
– If all categories equally likely: $f_{exp} = n/k$
Standard error
– Denominator of z formula
– Standard deviation of sampling distribution (adjusted for degrees of freedom)
– Equals square root of expected frequency: $\sqrt{f_{exp}}$, so $z = (f_{obs} - f_{exp}) / \sqrt{f_{exp}}$
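One way to see why dividing by the square root of the expected frequency works is the two-category case, where the chi-square statistic reduces to the squared binomial z-score. The derivation below is a sketch under that assumption; the symbols $f_1$, $f_2$, and $p$ (the null probability of the first category) are introduced here for illustration.

```latex
% Two categories with f_1 + f_2 = n, so f_2 - n(1-p) = -(f_1 - np).
\begin{align*}
\chi^2 &= \frac{(f_1 - np)^2}{np} + \frac{\bigl(f_2 - n(1-p)\bigr)^2}{n(1-p)} \\
       &= (f_1 - np)^2 \left[ \frac{1}{np} + \frac{1}{n(1-p)} \right] \\
       &= \frac{(f_1 - np)^2}{np(1-p)} = z^2
\end{align*}
```

The two squared terms together carry the information of a single binomial z-score, which matches df = k − 1 = 1.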

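Example: Favorite Colors
Choices: Red, Yellow, Green, Blue, Purple
Are they all equally popular?
Null hypothesis: for each color, $p = 1/5$, so $f_{exp} = n/5$
Deviances: [values missing from transcript]
Squared z-scores: [values missing from transcript]
Chi-square statistic: [value missing from transcript]
Critical value ($df = 4$, $\alpha = 5\%$): 9.49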
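The steps of this example can be sketched as follows. The counts are hypothetical (the slide's own numbers are not available here) and `goodness_of_fit` is an illustrative helper, not a library function; the critical value 9.49 is the one given on the slide.

```python
import math

def goodness_of_fit(observed, probs=None):
    """Return expected frequencies, z-scores, and the chi-square statistic."""
    n = sum(observed)
    k = len(observed)
    if probs is None:
        probs = [1 / k] * k                # H0: all categories equally likely
    expected = [n * p for p in probs]      # category probability times n
    z = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
    chi2 = sum(v * v for v in z)           # sum of squared z-scores
    return expected, z, chi2

# Hypothetical counts for Red, Yellow, Green, Blue, Purple (n = 100)
observed = [30, 10, 20, 28, 12]
expected, z_scores, chi2 = goodness_of_fit(observed)

critical_value = 9.49                      # df = 4, alpha = 5%, from the slide
print(round(chi2, 2), chi2 > critical_value)
```

With these made-up counts the statistic exceeds the critical value, so the hypothesis of equal popularity would be rejected.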

Independence of Nominal Variables
Are two nominal variables related?
– Same question as correlation, but need different approach
– Do probabilities for one variable differ between categories of another?
– Experimental condition vs. success of learning; sex vs. political affiliation; origin vs. major
Independent nominal variables
– Probabilities for each variable unaffected by other
– Example: 80% from CO, 10% psych majors
  80% of psych majors are from CO
  80% × 10% = 8% both psych and from CO
– $p(x \,\&\, y) = p(x) \times p(y)$

Chi-square Test of Independence
Null hypothesis: Variables are independent
– Use $H_0$ to calculate expected frequencies
Find observed marginal frequencies for each variable
– Total count for each category, ignoring levels of other variable
Multiply marginal frequencies and divide by n to get expected frequency for combination: $f_{exp}(x, y) = f_x f_y / n$
Same formula as before: $\chi^2 = \sum (f_{obs} - f_{exp})^2 / f_{exp}$
[Table: rows x1, x2, x3; columns y1, y2, y3; cell values not preserved in the transcript]
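The expected-frequency calculation for a two-way table can be sketched as below. The 2 × 3 table is hypothetical and the function names are illustrative; each expected count is the row total times the column total divided by n, which is the product of the marginal probabilities scaled back up to counts.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return the chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical observed counts: 2 levels of x (rows) by 3 levels of y (columns)
table = [[10, 20, 30],
         [20, 20, 20]]
chi2, df = chi_square_independence(table)
print(expected_counts(table), round(chi2, 3), df)
```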

General Principles of Chi-square Tests
Can use any prediction about data as null hypothesis
– Very general approach
Measure goodness of fit
– Actually badness of fit
– Deviation of data from prediction
Nominal data
– Calculate z-score for each frequency within its sampling distribution: observed minus expected frequency, divided by $\sqrt{f_{exp}}$
– Square the z-scores and sum to get $\chi^2$
– Distribution of one variable; dependence between two variables
Compare GoF to chi-square distribution to get p-value
– $p = P(\chi^2_{df} > \chi^2)$
df comes from number of parameters constrained by $H_0$
– $k - 1$ for multinomial test; $(k_X - 1)(k_Y - 1)$ for independence test

Review
Dogs in the U.S. are:
– 30% Labrador
– 20% Chihuahua
– 15% German Shepherd
– 35% Other
The local shelter has 200 dogs. How many of each would you expect if shelter dogs had the same distribution as the general population?
A. 50 Lab, 50 Chihuahua, 50 GSD, 50 other
B. 30 Lab, 20 Chihuahua, 15 GSD, 35 other
C. 60 Lab, 40 Chihuahua, 30 GSD, 70 other
D. 25 Lab, 25 Chihuahua, 25 GSD, 25 other
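The expected counts in this question follow from multiplying each population percentage by the shelter size. A minimal sketch, using only the numbers stated in the question (the variable names are illustrative):

```python
# Population percentages and shelter size as stated in the question
percent = {"Labrador": 30, "Chihuahua": 20, "German Shepherd": 15, "Other": 35}
n_dogs = 200

# Expected frequency = category probability times number of observations
expected = {breed: pct * n_dogs / 100 for breed, pct in percent.items()}
print(expected)
```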

Review
The actual breed counts are as follows:
[Table: rows Observed, Expected; columns Labrador, Chihuahua, GSD, Other; cell values not preserved in the transcript]
Calculate a $\chi^2$ statistic for testing whether this shelter reliably deviates from the breed distribution in the general population.
A. 0.74
B. 6.61
C. 7.77
D. 250
p = .085

Review
Next we count how many get adopted in the next month:
[Table: rows Adopted, Not adopted, Total; columns Labrador, Chihuahua, GSD, Other, Total; cell values not preserved in the transcript]
If being adopted is independent of breed, how many German Shepherds would you have expected to be adopted?
A. 8
B. 10
C. 20
D. 25