Looking at Data - Relationships Data analysis for two-way tables


Looking at Data - Relationships Data analysis for two-way tables IPS Chapter 2.5 © 2009 W.H. Freeman and Company

Objectives (IPS Chapter 2.5) Data analysis for two-way tables Two-way tables Joint distributions Marginal distributions Relationships between categorical variables Conditional distributions Simpson’s paradox

Charting Categorical Data RECALL: For two quantitative variables, we usually graph them using a scatterplot. If you have one categorical and one quantitative variable, we use the various graphs discussed in chapter 1 (e.g. histograms, bar charts, etc.). If you have two categorical variables, we usually display them using “Two-Way Tables”.

Two-way tables for Categorical Data: Factors & Levels When examining the relationship between two categorical variables, we can’t use a scatterplot. However, we can use a plain old-fashioned “two-way table”. Factors: In this table, we have two factors: Age (first factor) and Education (second factor). Levels: Age has three levels, and education has four levels. Data obtained from the 2000 U.S. Census.

Respect tables! Treat tables with respect! While they may seem simple at first glance, they sometimes contain all kinds of information that may not be apparent without some careful examination. By the same token, watch out for traps – there are all kinds of ways of misinterpreting tables!!

Frequency Table When each cell in the table contains a simple frequency count, we call it a frequency table.

Marginal Frequencies Sometimes, we include the total for each row and each column in the “margins”. These numbers are known as marginal frequencies. These are pretty useful to include in your tables. Marginal frequencies are sometimes expressed in percentages (though not in this case). Data: 2000 U.S. census.

As always, YOU get to decide how to look at the data. For example, you might want to look at each of the two marginal distributions separately. In that case, you might create a separate bar graph for each. In the case shown here, the authors decided to make it even easier to compare the levels within each factor by converting each frequency to a percentage of the total. For example, 58,077 / 175,230 = 33.1%.

Conditional Frequency Table For each individual cell, we sometimes compute a proportion by dividing the cell count by the sum of all counts in the original table. A new table with the collection of these proportions is called a conditional frequency table here, although proportions of the grand total are more commonly known as the joint distribution. In this example, the 25-34/No-H.S. cell is calculated by dividing 4,459 by 175,230 to give 0.0254. In other words, of all the people in this study, 2.54% fell into the 25-34, no-H.S. category. We can compute proportions of the entire table (as shown here), or of a particular row or column.

           25-34    35-54    >55
No HS      0.0254   0.0524   0.0812
HS         0.0660   0.1510   0.1145
<4 Coll    0.0610   0.1292   0.0635
4+ Coll    0.0632   0.1322   0.0605
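As a rough sketch of how the marginal distributions relate to the table of proportions, the Python below sums the twelve proportions across rows and down columns. The variable names are made up for this illustration; the numbers are the ones shown on the slide.

```python
# Proportions of the grand total, copied from the slide.
# Rows are education levels; columns are age groups.
ages = ["25-34", "35-54", ">55"]
education = ["No HS", "HS", "<4 Coll", "4+ Coll"]
joint = [
    [0.0254, 0.0524, 0.0812],
    [0.0660, 0.1510, 0.1145],
    [0.0610, 0.1292, 0.0635],
    [0.0632, 0.1322, 0.0605],
]

# Marginal distribution of education: sum each row.
education_marginal = {e: sum(row) for e, row in zip(education, joint)}

# Marginal distribution of age: sum each column (zip(*joint) transposes).
age_marginal = {a: sum(col) for a, col in zip(ages, zip(*joint))}

# All twelve proportions together should come to about 1.
total = sum(education_marginal.values())

for name, p in education_marginal.items():
    print(f"{name:8s} {p:.4f}")
for name, p in age_marginal.items():
    print(f"{name:8s} {p:.4f}")
print(f"total    {total:.4f}")
```

The HS row comes to about 0.33, in line with 58,077 / 175,230, and the grand total differs from 1 only because the slide's proportions are rounded to four decimals.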

Conditional Frequency by Level Sometimes we wish to look at conditional frequencies for each level of a factor. For example, in the table, the 25 to 34 age group occupies the first column. To find the conditional distribution of education in this age group (i.e. for this particular level), look only at that column and compute each count as a percent of the column total. These percents should add up to 100% because all persons in this age group fall into one of the education categories. These four percents together are “the conditional distribution of education, given the 25 to 34 age group” (see next slide). Data: 2000 U.S. census.

Conditional distributions For example, the percentage of college graduates given the 25-34 age group is 29.30%. The percentage of college graduates given the 35-54 age group is 28.44%. Etc. Note that the conditional distributions in this particular table were calculated for examination of the age category: the percents are calculated by age range (columns). If you were interested in the education category, you’d need to create a separate table.

11,071 / 37,785 = 29.30% (cell total / column total)
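A minimal sketch of the same calculation in Python, assuming we start from the 25-34 column of the table of proportions rather than the raw counts (most of which are not reproduced in this transcript). Dividing each proportion by the column total gives the conditional distribution of education for that age group; the names below are illustrative only.

```python
# 25-34 column of the proportions table, as shown on the earlier slide.
education = ["No HS", "HS", "<4 Coll", "4+ Coll"]
joint_25_34 = [0.0254, 0.0660, 0.0610, 0.0632]

# Dividing by the column total conditions on the 25-34 age group.
column_total = sum(joint_25_34)
conditional = {e: p / column_total for e, p in zip(education, joint_25_34)}

for name, p in conditional.items():
    print(f"{name:8s} {100 * p:5.1f}%")
print(f"sum      {100 * sum(conditional.values()):5.1f}%")

# The same figure from the two raw counts quoted on the slide.
from_counts = 11071 / 37785
print(f"4+ Coll from counts: {100 * from_counts:.2f}%")
```

The value for college graduates agrees with the slide's 29.30% to within the rounding of the four-decimal proportions, and the four percents add to 100% as the text says they must.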

The conditional distributions can be compared graphically using side-by-side bar graphs of one variable for each value of the other variable. Here, the percents are calculated by age range (columns).

Example A study was done to establish the preferred activities among men and women between TV, sports, and dancing. A random sample of 30 men and 20 women were asked their preferences. At first glance, we might be tempted to say that all three activities were about equal (18, 16, and 16). However, upon closer examination, we see some major differences. E.g., women overwhelmingly preferred dance relative to men. In fact, the effect is even more pronounced: note that while 16 women preferred dance compared with only 2 men, this was 2 men out of 30 while there were 16 women out of 20. This is why it is important to go beyond basic frequencies and look at conditional frequencies. Key Point: With tables, as with just about anything in statistics, if you just look at pieces without careful examination of the relationship to the whole picture, you run the risk of drawing entirely flawed conclusions!

         Dance   Sports   TV   Total
Men          2       11    7      30
Women       16        5    9      20
Total       18       16   16      50

Music and wine purchase decision What is the relationship between type of music played in supermarkets and type of wine purchased? We want to compare the conditional distributions of the response variable (wine purchased) for each value of the explanatory variable (music played). Therefore, we calculate column percents.

30 / 84 = 35.7% (cell total / column total)

Calculations: When no music was played, there were 84 bottles of wine sold. Of these, 30 were French wine. 30/84 = 0.357, so 35.7% of the wine sold was French when no music was played. Note how both variables are categorical: Music (French composer, Italian composer, etc.) and Wine (from France, Italy, etc.). We calculate the column conditional percents similarly for each of the nine cells in the table.

For every two-way table, there are two sets of possible conditional distributions. Does background music in supermarkets influence customer purchasing decisions? In this case, we can look at either (or both) of the two following distributions: wine purchased for each kind of music played (column percents), and music played for each kind of wine purchased (row percents).
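The two sets of conditional distributions can be sketched in Python as below. This is an illustration only: the transcript gives just one cell (30 French bottles) and one column total (84 bottles with no music), so every other count in the table here is a placeholder chosen for the example, and the function names are hypothetical.

```python
# Rows: wine purchased; columns: music played.
# ASSUMPTION: only the 30 and the no-music column total of 84 come from
# the slide; all other counts are placeholders for illustration.
wines = ["French", "Italian", "Other"]
music = ["None", "French", "Italian"]
counts = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]

def column_percents(table):
    """Each cell as a percent of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / t for cell, t in zip(row, col_totals)]
            for row in table]

def row_percents(table):
    """Each cell as a percent of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

col_pct = column_percents(counts)
row_pct = row_percents(counts)

# Wine purchased given no music (a column), and
# music played given French wine (a row).
print("Wine given no music:",
      [f"{w}: {r[0]:.1f}%" for w, r in zip(wines, col_pct)])
print("Music given French wine:",
      [f"{m}: {p:.1f}%" for m, p in zip(music, row_pct[0])])
```

Column percents answer "what was bought under each kind of music", which is the comparison wanted when music is the explanatory variable; row percents answer the reverse question.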

** Simpson’s paradox Combining groups can lead to inaccurate conclusions. Example: Hospital death rates. On the surface, Hospital B would seem to have a better record. But once patient condition is taken into account, we see that Hospital A in fact has the better record for both patient conditions!

** Simpson’s paradox In this case, the misleading impression arose because all patients were grouped together instead of being stratified by the condition in which they were first admitted. Here, patient condition was the lurking variable, and this is a great example of one. The Moore textbook has a very good example using airline arrival times.
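To make the reversal concrete, here is a small Python sketch. The hospital counts are not reproduced in this transcript, so the numbers below are illustrative values chosen to show the effect and should not be read as the slide's data; the function name is likewise hypothetical.

```python
# (deaths, patients) by hospital and condition at admission.
# ASSUMPTION: illustrative counts, not taken from the slide.
hospitals = {
    "A": {"good": (6, 600), "poor": (57, 1500)},
    "B": {"good": (8, 600), "poor": (8, 200)},
}

def overall_rate(strata):
    """Death rate after pooling all conditions together."""
    deaths = sum(d for d, _ in strata.values())
    patients = sum(n for _, n in strata.values())
    return deaths / patients

# Pooled rate per hospital, and rate within each patient condition.
overall = {h: overall_rate(s) for h, s in hospitals.items()}
by_condition = {
    h: {c: d / n for c, (d, n) in s.items()}
    for h, s in hospitals.items()
}

for h in hospitals:
    print(f"Hospital {h}: overall {100 * overall[h]:.1f}%, "
          f"good {100 * by_condition[h]['good']:.1f}%, "
          f"poor {100 * by_condition[h]['poor']:.1f}%")
```

With these counts Hospital B has the lower pooled death rate, yet Hospital A has the lower rate within each condition. The reversal happens because Hospital A treats far more patients admitted in poor condition, which is exactly the role of the lurking variable.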