Copyright © 2014, 2011 Pearson Education, Inc.
Chapter 5: Association between Categorical Variables

Contingency Tables: Which hosts send more buyers to Amazon.com?
- To answer this question, we must gather data on two categorical variables: Host and Purchase.
- Host identifies the originating site: Comcast, Google, or Nextag. Purchase indicates whether or not the visit results in a sale.

Contingency Tables: Consider Two Categorical Variables Simultaneously
- Contingency table: a table that shows counts of cases on one categorical variable contingent on the value of another (for every combination of both variables).
- Cells in a contingency table are mutually exclusive: each case falls in exactly one cell.
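
A minimal Python sketch of tabulating two categorical variables, assuming each case is recorded as a (Host, Purchase) pair. The records below are hypothetical toy data, not the Amazon.com data from the slides, and `contingency_table` is an illustrative helper name. Because each case lands in exactly one cell, the cell counts always add up to the number of cases.

```python
from collections import Counter

# Hypothetical toy records (NOT the textbook data): one (host, purchase) pair per visit.
visits = [
    ("Comcast", "Yes"), ("Comcast", "No"), ("Google", "No"),
    ("Google", "No"), ("Nextag", "Yes"), ("Nextag", "No"),
    ("Comcast", "Yes"), ("Google", "Yes"),
]

def contingency_table(cases):
    """Count the cases in every (row, column) combination of two categorical variables.

    Rows are the Purchase values, columns the Host values.
    Combinations that never occur get a count of 0.
    """
    counts = Counter((purchase, host) for host, purchase in cases)
    rows = sorted({purchase for _, purchase in cases})
    cols = sorted({host for host, _ in cases})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = contingency_table(visits)
for purchase, row in table.items():
    print(purchase, row)
```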

Contingency Tables: Contingency Table for Web Shopping

Contingency Tables: Marginal and Conditional Distributions
- Marginal distributions appear in the “margins” of a contingency table and give the totals (frequencies) for each categorical variable separately.
- Conditional distributions refer to counts within a row or column of a contingency table (restricted to cases satisfying a condition).
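
The two kinds of distribution can be computed directly from a table of counts. This is a sketch with hypothetical counts (not the slide's web-shopping data) and illustrative function names; the conditional distribution shown is the column-by-column one, i.e. the distribution of Purchase within each Host.

```python
# Hypothetical counts (NOT the textbook's web-shopping data):
# rows = Purchase, columns = Host.
table = {
    "Yes": {"Comcast": 30, "Google": 20, "Nextag": 10},
    "No":  {"Comcast": 70, "Google": 180, "Nextag": 190},
}

def row_margin(tbl):
    """Marginal distribution of the row variable: total count in each row."""
    return {r: sum(cells.values()) for r, cells in tbl.items()}

def column_margin(tbl):
    """Marginal distribution of the column variable: total count in each column."""
    margin = {}
    for cells in tbl.values():
        for c, n in cells.items():
            margin[c] = margin.get(c, 0) + n
    return margin

def conditional_by_column(tbl):
    """Conditional distribution of the row variable within each column (column proportions)."""
    col_totals = column_margin(tbl)
    return {c: {r: tbl[r][c] / col_totals[c] for r in tbl} for c in col_totals}

print(row_margin(table))
print(column_margin(table))
print(conditional_by_column(table)["Comcast"])
```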

Contingency Tables: Conditional Distribution of Purchase for Each Host (Column Counts and Percentages)

Contingency Tables: Conditional Distribution
- The conditional distribution reveals that Comcast has the highest rate of purchases, more than twice that of Nextag.
- Because the purchase rate differs from host to host, Host and Purchase are associated.

Contingency Tables: Stacked Bar Charts
- Used to display conditional distributions.
- A stacked bar chart divides each bar of a bar chart into segments proportional to the percentage of cases in each category of a second variable.
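
A sketch of the arithmetic behind a stacked bar chart, assuming hypothetical counts (not textbook data) and illustrative function names: each column of the table becomes one bar of total height 1, split at the cumulative conditional proportions. A crude text rendering stands in for a real plotting library.

```python
# Hypothetical counts (NOT textbook data): rows = Purchase, columns = Host.
table = {
    "Yes": {"Comcast": 30, "Google": 20, "Nextag": 10},
    "No":  {"Comcast": 70, "Google": 180, "Nextag": 190},
}

def stacked_segments(tbl):
    """For each column, split a bar of height 1 into segments proportional to the
    conditional distribution of the row variable within that column.

    Returns {column: [(row, start, end), ...]} with start/end as fractions of the bar.
    """
    columns = next(iter(tbl.values())).keys()
    segments = {}
    for c in columns:
        total = sum(tbl[r][c] for r in tbl)
        start = 0.0
        parts = []
        for r in tbl:
            end = start + tbl[r][c] / total
            parts.append((r, start, end))
            start = end
        segments[c] = parts
    return segments

def text_bar(parts, width=40):
    """Render one stacked bar as text, using the first letter of each row label."""
    return "".join(r[0] * round((end - start) * width) for r, start, end in parts)

for host, parts in stacked_segments(table).items():
    print(f"{host:8s} {text_bar(parts)}")
```

If the bars come out with segments of about the same size in every column, the conditional distributions match and the chart shows no association.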

Contingency Tables: Contingency Table of Purchase by Region

Contingency Tables: Stacked Bar Chart Shows No Association

Contingency Tables: Mosaic Plots
- An alternative to the stacked bar chart.
- A plot in which the area of each “tile” is proportional to the count in the corresponding cell of a contingency table.
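
One common way to lay out a mosaic plot, sketched here under stated assumptions (hypothetical counts and generic category labels, not the slide's shirt data; `mosaic_tiles` is an illustrative name): give each column a strip whose width is its share of all cases, then stack that column's tiles with heights equal to the conditional proportions. Tile area is then width × height = cell count / total.

```python
# Hypothetical counts (NOT the textbook's shirt data): rows = Size, columns = Style.
table = {
    "Small": {"Style 1": 10, "Style 2": 20},
    "Large": {"Style 1": 30, "Style 2": 40},
}

def mosaic_tiles(tbl):
    """Lay out a mosaic plot in the unit square.

    Each column gets a strip whose width is that column's share of all cases; within
    the strip, each row's tile height is its conditional share of the column. Hence
    tile area = width * height = cell count / grand total.
    Returns a list of (row, column, x, y, width, height).
    """
    rows = list(tbl)
    columns = list(tbl[rows[0]])
    grand_total = sum(tbl[r][c] for r in rows for c in columns)
    tiles, x = [], 0.0
    for c in columns:
        col_total = sum(tbl[r][c] for r in rows)
        width = col_total / grand_total
        y = 0.0
        for r in rows:
            height = tbl[r][c] / col_total
            tiles.append((r, c, x, y, width, height))
            y += height
        x += width
    return tiles

for tile in mosaic_tiles(table):
    print(tile)
```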

Contingency Tables: Contingency Table of Shirt Size by Style

Contingency Tables: Mosaic Plot Shows Association

4M Example 5.1: CAR THEFT
Motivation: Should insurance companies vary the premiums for different car models? That is, are some cars more likely to be stolen than others?

4M Example 5.1: CAR THEFT
Method: Data obtained from the National Highway Traffic Safety Administration (NHTSA) on car thefts for seven popular models. There are two categorical variables: type of car and whether the car was stolen.

4M Example 5.1: CAR THEFT
Mechanics

4M Example 5.1: CAR THEFT
Message: The Dodge Intrepid is more likely to be stolen than the other popular models. The data suggest that insurers should charge higher premiums for theft insurance on models that are more likely to be stolen.

Lurking Variables and Simpson’s Paradox: Association Is Not Necessarily Causation
- Lurking variable: a concealed variable that affects the apparent relationship between two other variables.
- Simpson’s paradox: a change in the association between two variables (for example, a reversal of its direction) when the data are separated into groups defined by a third variable.
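
A small numerical illustration of Simpson’s paradox, using entirely hypothetical on-time counts (not the Bureau of Transportation Statistics data from Example 5.2) and illustrative function names. Destination plays the role of the lurking variable.

```python
# Hypothetical on-time counts (NOT the Bureau of Transportation Statistics data):
# (on_time, total_flights) for each airline and destination.
flights = {
    "Airline A": {"Easy airport": (900, 1000), "Busy airport": (50, 100)},
    "Airline B": {"Easy airport": (190, 200), "Busy airport": (600, 1000)},
}

def rate(on_time, total):
    """On-time rate for one airline at one destination."""
    return on_time / total

def overall_rate(by_destination):
    """On-time rate after pooling all destinations (ignoring the third variable)."""
    on_time = sum(o for o, _ in by_destination.values())
    total = sum(t for _, t in by_destination.values())
    return on_time / total

for airline, by_dest in flights.items():
    per_dest = {d: round(rate(*counts), 3) for d, counts in by_dest.items()}
    print(airline, per_dest, "overall:", round(overall_rate(by_dest), 3))
```

With these made-up counts, Airline B has the higher on-time rate at each destination, yet the lower rate overall, because most of its flights go to the busy airport, where delays are common for both airlines.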

Lurking Variables and Simpson’s Paradox: Hidden Lurking Variable (Weight)

Lurking Variables and Simpson’s Paradox: Adjusted for Lurking Variable (Weight)

4M Example 5.2: AIRLINE ARRIVALS
Motivation: Does it matter which of two airlines a corporate CEO chooses when flying to meetings if he wants to avoid delays?

4M Example 5.2: AIRLINE ARRIVALS
Method: Data obtained from the US Bureau of Transportation Statistics on flight delays for two airlines. There are two categorical variables: airline and whether the flight arrived on time.

4M Example 5.2: AIRLINE ARRIVALS
Mechanics

4M Example 5.2: AIRLINE ARRIVALS
Mechanics: Is destination a lurking variable?

4M Example 5.2: AIRLINE ARRIVALS
Mechanics: This is Simpson’s paradox.

4M Example 5.2: AIRLINE ARRIVALS
Message: The CEO should book on US Airways, because its flights are more likely to arrive on time regardless of destination.

Strength of Association: Chi-Squared Statistic
- A measure of association in a contingency table.
- Calculated by comparing the observed contingency table to an artificial table with the same marginal totals but no association.
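
In symbols (standard notation, not taken from the slides): write O_ij for the observed count in row i and column j of the contingency table, and n for the total number of cases. The artificial no-association table has expected counts E_ij, and chi-squared adds up the squared gaps between the two tables, each scaled by the expected count:

```latex
E_{ij} = \frac{(\text{total of row } i)\,(\text{total of column } j)}{n},
\qquad
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```

Chi-squared is 0 exactly when every observed count equals its expected count, and it grows as the observed table departs from the no-association table.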

Strength of Association: Contingency Table

Strength of Association: Calculating the Chi-Squared Statistic
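
A minimal Python sketch of the chi-squared calculation, assuming the table is given as a list of rows of counts. The function names and the small tables are hypothetical illustrations, not the textbook's music-sharing data.

```python
def expected_counts(table):
    """Counts expected under no association: row total * column total / grand total.

    This artificial table has the same marginal totals as `table`.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2x2 table of counts (NOT the textbook data).
observed = [[10, 20],
            [20, 10]]
print(expected_counts(observed))  # [[15.0, 15.0], [15.0, 15.0]]
print(chi_squared(observed))
```

For this table every expected count is 15, and each of the four cells contributes 5² / 15, so chi-squared is 20/3 (about 6.67).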

Strength of Association: Cramér’s V
- Derived from the chi-squared statistic.
- Ranges in value from 0 (variables are not associated) to 1 (variables are perfectly associated).

Strength of Association: Calculating Cramér’s V
- V = 0.20 for our example.
- There is a weak association between group (students or staff) and attitude toward sharing copyrighted music.
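
Cramér’s V rescales chi-squared to the 0-to-1 range: V = sqrt(chi-squared / (n × (k − 1))), where n is the number of cases and k is the smaller of the number of rows and the number of columns. The sketch below uses hypothetical function names and toy tables (not the slide’s student/staff data) and recomputes chi-squared so that it runs on its own.

```python
import math

def chi_squared(table):
    """Chi-squared statistic: sum of (observed - expected)^2 / expected over all cells,
    where expected = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )

def cramers_v(table):
    """Cramér's V = sqrt(chi^2 / (n * (k - 1))), k = min(#rows, #columns); needs k >= 2."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_squared(table) / (n * (k - 1)))

# Hypothetical tables (NOT the textbook data).
print(cramers_v([[10, 20], [20, 40]]))  # no association, so V = 0
print(cramers_v([[10, 0], [0, 20]]))    # perfect association, so V is 1 up to rounding
print(cramers_v([[10, 20], [20, 10]]))  # in between
```

For a 2×2 table, k − 1 = 1, so V reduces to the square root of chi-squared divided by n; the third toy table gives sqrt((20/3) / 60) = 1/3.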

Strength of Association: Checklist for Chi-Squared and Cramér’s V
- Verify that the variables are categorical.
- Verify that there are no obvious lurking variables.

4M Example 5.3: REAL ESTATE
Motivation: Do people who heat their homes with gas also prefer to cook with gas? What heating systems and appliances should a developer select for newly built homes?

4M Example 5.3: REAL ESTATE
Method: The developer contacts homeowners to obtain the data. There are two categorical variables: the type of fuel used for home heating (gas or electric) and the type of fuel used for cooking (gas or electric).

4M Example 5.3: REAL ESTATE
Mechanics: Chi-squared = 98.62; Cramér’s V = 0.47

4M Example 5.3: REAL ESTATE
Message: Homeowners prefer gas to electric heat by about 2 to 1, so the developer should build about two-thirds of new homes with gas heat. Put electric appliances in all homes with electric heat and in half of the homes with gas heat (assuming that buyers of new homes have the same preferences).
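
As a quick check of what this plan implies (simple arithmetic, assuming exactly one-third of new homes get electric heat and two-thirds gas heat), the share of new homes that would receive electric cooking appliances is

```latex
\underbrace{\tfrac{1}{3}}_{\text{electric heat}} \times 1
\;+\;
\underbrace{\tfrac{2}{3}}_{\text{gas heat}} \times \tfrac{1}{2}
\;=\; \tfrac{1}{3} + \tfrac{1}{3}
\;=\; \tfrac{2}{3},
```

so about two-thirds of the homes would get electric appliances and about one-third gas appliances.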

Best Practices
- Use contingency tables to find and summarize association between two categorical variables.
- Be on the lookout for lurking variables.
- Use plots to show association.
- Exploit the absence of association.

Pitfalls
- Don’t interpret association as causation.
- Don’t display too many numbers in a table.