Download presentation
Presentation is loading. Please wait.
1
Displaying and Describing Categorical Data
Plots, Graphs, Pictures and Line Graphs
2
Well-Designed Statistical Pictures
“Out of the clutter, find simplicity” – Albert Einstein “Simplicity is the ultimate sophistication.” - Leonardo DaVinci Basic Characteristics: 1. Data should stand out clearly from background. 2. Clear labeling that indicates a. title or purpose of picture. b. what each axis, bar, pie segment, …, denotes. c. scale of each axis, including starting points. 3. Source for the data. 4. As little “chart junk” (extraneous material) as possible.
3
Thought Question 1. Here is a plot that has some problems. Give two reasons why this is not a good plot.
4
Pictures of Categorical Data
Three common pictures: Pie Charts Bar Graphs Pictograms
5
Pie Charts When you are interested in parts of the whole, a pie chart might be your display of choice. Pie charts show the whole group of cases as a circle. They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.
6
What Can Go Wrong? While some people might like the pie chart on the left better, it is harder to compare fractions of the whole, which a well- done pie chart does.
7
Nightingale Graph preventable diseases (in blue), the results of wounds (in red), due to other causes (in black)
8
Frequency Tables: Making Piles
We can “pile” the data by counting the number of data values in each category of interest. We can organize these counts into a frequency table, which records the totals and the category names. A relative frequency table is similar, but gives the percentages (instead of counts) for each category.
9
Conditional Distributions
The conditional distributions tell us that there is a difference in class for those who survived and those who perished. This is better shown with pie charts of the two distributions:
10
Contingency Tables A contingency table allows us to look at two categorical variables together. Example: we can examine the class of ticket and whether a person survived the Titanic:
11
Conditional Distributions
A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable. The following is the conditional distribution of ticket Class, conditional on having survived:
12
Are Class and Survival Independent?
We see that the distribution of Class for the survivors is different from that of the nonsurvivors. This leads us to believe that Class and Survival are associated, that they are not independent.
13
SAS Windowing Environment
For the class, we will be using the SAS Windowing Environment ….
14
SAS - The FREQ Procedure
The FREQ procedure can do the following: produce one-way to n-way frequency and crosstabulation (contingency) tables compute chi-square tests for one-way to n-way tables and measures of association and agreement for contingency tables automatically display the output in a report and save the output in a SAS data set General form of the FREQ procedure: PROC FREQ DATA=SASdataset <option(s)>; TABLES variable(s) </ option(s)>; RUN;
15
one-way frequency tables two-way frequency table
The TABLES Statement Note: Place at top of Program libname orion 'C:\Columbia\classes\stat1111\Spring_2012\SAS\pgms_data'; The TABLES statement specifies the frequency and crosstabulation tables to produce. An asterisk between variables requests a n-way crosstabulation table. proc freq data=orion.sales; tables Gender Country; run; one-way frequency tables proc freq data=orion.sales; tables Gender*Country; run; two-way frequency table p112d01
16
The TABLES Statement A one-way frequency table produces frequencies, cumulative frequencies, percentages, and cumulative percentages. proc freq data=orion.sales; tables Gender Country; run;
17
The TABLES Statement An n-way frequency table produces cell frequencies, cell percentages, cell percentages of row frequencies, and cell percentages of column frequencies, plus total frequency and percent. proc freq data=orion.sales; tables Gender*Country; run; rows columns
18
The TABLES Statement
19
Physicians’ Health Study (1988)
5-year randomized experiment 22,071 male physicians (40 to 84 years old). Group 1: took ordinary aspirin tablet every other day. Group 2: took placebo (looked like aspirin but no active ingredients).
21
Bar Charts A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison. A bar chart stays true to the area principle. Thus, a better display for the ship data is:
22
Bar Charts A relative frequency bar chart displays the relative proportion of counts for each category. A relative frequency bar chart also stays true to the area principle. Replacing counts with percentages in the ship data:
23
Bar Charts Percentage of men and women 16 and over in the labor force
Show what percentage or frequency of the whole fall into each category – can be used for two or three variables simultaneously.
24
Bar Charts – Botox Study Example
25
Segmented Bar Charts A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles. Each bar is treated as the “whole” and is divided proportionally into segments corresponding o the percentage in each group. Here is the segmented bar chart for ticket Class by Survival status:
26
Bar Graphs Energy Data
27
Bar Graphs
28
Bar Graphs
29
Bar Graphs
30
Bar Graphs
31
Pictograms Percentage of Ph.D.s earned by women.
Bar graph that uses pictures related to topic. Left pictogram: Misleading because eye focuses on area rather than just height. Right pictogram: Visually more accurate, but less appealing.
32
Study: Hematocrit was not validated as a surrogate end point for survival among epoetin-treated hemodialysis patients – Journal of Clinical Epidemiology, 2004
33
Producing Bar Charts in SAS
goptions reset=all; pattern value = solid color = blue; title "Generating a Bar Chart - Using PROC GCHART"; proc gchart data=store; vbar Region; run; quit;
35
Producing Bar Charts in SAS
title "Generating a Bar Chart - Using PROC SGPLOT"; proc sgplot data=store; vbar Region; run;
37
Producing Bar Charts in SAS
38
The Kids Are More Than All Right NY Times – Feb 2nd, 2012
Every few years, parents find new reasons to worry about their teenagers. And while there is no question that some kids continue to experiment with sex and substance abuse, the latest data point to something perhaps more surprising: the current generation is, well, a bit boring when it comes to bad behavior.
39
The Kids Are More Than All Right NY Times – Feb 2nd, 2012
40
Challenger Shuttle Disaster
The morning of 27 January 1986 was particularly cold in the USA. Several engineers raised concerns over the low temperature at Kennedy Space Center, particularly the rubber O-rings. No one had tested the O-rings at such low temperatures. However management needed to get the mission underway, as soon as possible. The night before, the decision was made to go ahead with the launch.
41
Challenger Shuttle Disaster
42
Challenger Shuttle Disaster
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.