Topic 4: Exploring Categorical Data

Frequency tables and bar plots

Data matrix for emails
Rows 1, 2, 3, and 3921 of a data matrix are displayed below. It contains data collected on 3,921 emails that were received.

        Spam    Num_char    Line_breaks    Format    Number
1       no      21706       551            html      small
2               7011        183                      big
3       yes     631         28             text      none
...
3921            2225        65

Variable       Description
Spam           Specifies whether the email is spam
Num_char       Number of characters in email
Line_breaks    Number of line breaks in email
Format         Specifies whether email was in html or text format
Number         Indicates if email contained no number, a small number (under 1,000,000), or a big number

Categorical variables
Of the variables in the data matrix for emails, Spam, Format, and Number are categorical. Num_char and Line_breaks are numerical.

Frequency Table
A table that summarizes data for a single categorical variable is called a frequency table. A frequency table can display raw counts, proportions, or both. Examples for the variable number are below.

Raw count:
None    Small    Big    Total
549     2827     545    3921

Proportion:
None    Small    Big     Total
0.14    0.72     0.14    1

Both:
         Count    Proportion
None     549      0.14
Small    2827     0.72
Big      545      0.14
Total    3921     1
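As a rough illustration of how such a table could be computed, the sketch below tallies a categorical column and divides each count by the total. The function name `frequency_table` and the rebuilt `number` list are assumptions made for the example. Only the counts 549, 2827, and 545 come from the slide.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, proportion)} for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical reconstruction of the `number` column from the slide's counts.
# The real data matrix would have one entry per email, in arbitrary order.
number = ["none"] * 549 + ["small"] * 2827 + ["big"] * 545

table = frequency_table(number)

# Print count and proportion side by side, like the "both" table.
for category in ("none", "small", "big"):
    count, proportion = table[category]
    print(f"{category:<6}{count:>6}{proportion:>7.2f}")
print(f"{'total':<6}{len(number):>6}{1:>7.2f}")
```

Proportions are rounded only when printed, so the stored values still sum to 1 up to floating-point error.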

Bar plot
A bar plot is a graphical representation of a frequency table.
[Figure: bar plots of the variable number, one by raw count and one by proportion.]
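To make the link between a frequency table and its bar plot concrete, here is a minimal text-only sketch that draws one bar per category with length proportional to the count. The helper `text_bar_plot` and its `width` parameter are hypothetical. A plotting library would normally be used instead.

```python
def text_bar_plot(counts, width=40):
    """Return one line per category; bar length is proportional to the count."""
    largest = max(counts.values())
    lines = []
    for category, n in counts.items():
        # Scale so that the largest category fills the full width.
        bar = "#" * round(width * n / largest)
        lines.append(f"{category:<6}|{bar} {n}")
    return lines

# Raw counts for the variable number, taken from the frequency table.
number_counts = {"none": 549, "small": 2827, "big": 545}

bar_lines = text_bar_plot(number_counts)
print("\n".join(bar_lines))
```

Passing proportions instead of counts gives bars of the same relative lengths, since only the axis scale differs between the two versions of the plot.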

The order of the bars
There is often a natural ordering for the bars, such as by class year in the example below.

Changing the order of the bars
When the bars are ordered from highest count to lowest count, the plot is sometimes called a Pareto chart.
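The reordering can be sketched in a couple of lines. The example below reuses the counts for the variable number as a stand-in, since the class-year data behind the original figure is not available here. The names `natural_order` and `pareto_order` are assumptions for illustration.

```python
# Counts for the variable number, taken from the frequency table.
number_counts = {"none": 549, "small": 2827, "big": 545}

# Natural ordering: categories listed by the size of the number in the email.
natural_order = ["none", "small", "big"]

# Pareto ordering: categories sorted from highest count to lowest count.
pareto_order = sorted(number_counts, key=number_counts.get, reverse=True)

for category in pareto_order:
    print(f"{category:<6}{number_counts[category]:>6}")
```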

Bar plot vs. pie chart
Pie charts are another way to graphically represent a frequency table. They are well known, but generally not as useful as bar plots.

Categorical data pairs: contingency tables, side-by-side bar plots, segmented bar plots, and mosaic plots

Recall the data matrix for emails
The data matrix shown earlier contains data collected on 3,921 emails that were received. Its categorical variables are Spam, Format, and Number.

Pairing two categorical variables
Two categorical variables from the data matrix, such as Spam and Number, can be summarized together.

Contingency Table
A table that summarizes data for two categorical variables is called a contingency table.

Row and column proportions
Row proportions are computed using row totals, and column proportions using column totals.

Row proportions:
            None                Small                Big                 Total
Spam        149/367 = 0.406     168/367 = 0.458      50/367 = 0.136      1.000
Not spam    400/3554 = 0.113    2657/3554 = 0.748    495/3554 = 0.139    1.000
Total       549/3921 = 0.140    2827/3921 = 0.721    545/3921 = 0.139    1.000

Column proportions:
            None               Small                Big                Total
Spam        149/549 = 0.271    168/2827 = 0.059     50/545 = 0.092     367/3921 = 0.094
Not spam    400/549 = 0.729    2657/2827 = 0.940    495/545 = 0.908    3554/3921 = 0.906
Total       1.000              1.000                1.000              1.000

Segmented bar plot vs. side-by-side bar plot

Segmented bar plot: count vs. proportion

Mosaic Plot

Simpson’s Paradox

Example: long-term study on smoking
A survey of 1,314 women in the United Kingdom during 1972-1974 asked each woman whether she was a smoker. Twenty years later, a follow-up survey observed whether each woman was dead or still alive. Below is a summary of the results.

                  Survival status
Smoking status    Dead            Alive           Total
Smoker            139 (23.88%)    443 (76.12%)    582 (100%)
Non-smoker        230 (31.42%)    502 (68.58%)    732 (100%)
Total             369 (28.08%)    945 (71.92%)    1314 (100%)
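The percentages in the summary are row proportions, and they can be reproduced with the short sketch below. The dictionary layout and the helper `death_rate` are assumptions for illustration. The section heading suggests the slides go on to break these results down by a further variable, but that breakdown is not part of this transcript, so the sketch stops at the aggregate figures.

```python
# Counts from the summary table: smoking status by survival status.
survey = {
    "smoker":     {"dead": 139, "alive": 443},
    "non-smoker": {"dead": 230, "alive": 502},
}

def death_rate(group):
    """Proportion of a group recorded as dead at the follow-up."""
    return group["dead"] / (group["dead"] + group["alive"])

# Death rate within each smoking status (row proportions).
rates = {status: death_rate(group) for status, group in survey.items()}

# Death rate over all women, pooling both groups.
total_dead = sum(group["dead"] for group in survey.values())
total_women = sum(group["dead"] + group["alive"] for group in survey.values())
overall_rate = total_dead / total_women

for status, rate in rates.items():
    print(f"{status:<11}{rate:.2%}")
print(f"{'overall':<11}{overall_rate:.2%}")
```

In the aggregate, the smokers show the lower death rate. That is the surprising pattern this example starts from.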