Chapter 3 Displaying and Describing Categorical Data.


1 Chapter 3 Displaying and Describing Categorical Data

2 Categorical variable There are only a finite number of possible values

3 Categorical variable
There are only a finite number of possible values
– Gender
– Car size
– Course grade

4 Example: Titanic
The Titanic struck an iceberg at 11:40 pm on April 14, 1912
Over 1500 passengers and crew members died
How can we process the information more efficiently?
Survived  Age    Sex     Class
Dead      Adult  Male    Third
Dead      Adult  Male    Crew
Dead      Adult  Male    Third
Dead      Adult  Male    Crew
Dead      Adult  Male    Crew
Dead      Adult  Male    Crew
Alive     Adult  Female  First
Dead      Adult  Male    Third
Dead      Adult  Male    Crew

5 Three rules of data analysis
1. Make a picture
2. Make a picture
3. Make a picture

6 Three rules of data analysis
1. Make a picture: a picture can reveal the patterns and relationships hidden in your data
2. Make a picture: a picture can show extraordinary data values or unexpected patterns
3. Make a picture: a picture is fast for readers to understand

7 Florence Nightingale
Founder of modern nursing
First female member of the Royal Statistical Society
Used a picture to argue forcefully for better hospital conditions for soldiers

8

9 Different colors represent deaths from different causes. The blue area corresponds to deaths from preventable or mitigable zymotic diseases. The picture showed that in the Crimean War far more soldiers died of illness and infection than of battle wounds. Nightingale's campaign succeeded in improving hospital conditions and nursing for soldiers.

10 Frequency tables
Making piles: count the number of cases corresponding to each category and pile them up
Titanic example: by ticket class
Class   Count
First   325
Second  285
Third   706
Crew    885
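The pile-making step can be sketched in a few lines of Python. This is only an illustration: it counts the nine sample rows shown on the Titanic example slide, not the full data set, and the names `sample_classes` and `frequency_table` are made up for this sketch.

```python
from collections import Counter

# Ticket class for the nine sample rows on the Titanic example slide
# (a small illustrative sample, not all 2201 people on board).
sample_classes = [
    "Third", "Crew", "Third", "Crew", "Crew",
    "Crew", "First", "Third", "Crew",
]


def frequency_table(values):
    """Count how many cases fall into each category."""
    return Counter(values)


counts = frequency_table(sample_classes)

# Print the piles from largest to smallest.
for category, count in counts.most_common():
    print(f"{category:<6} {count}")
```

Run on the full passenger list, the same counting would be expected to give the class counts in the table above.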

11 Relative frequency table
Proportion: divide counts by the total number of cases
Percentage: multiply the proportion by 100
The frequency table or relative frequency table describes the distribution of a categorical variable

12 Relative frequency table
Proportion: divide counts by the total number of cases
Percentage: multiply the proportion by 100
The frequency table or relative frequency table describes the distribution of a categorical variable
Class   Percentage
First   14.77%
Second  12.95%
Third   32.08%
Crew    40.21%
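As a rough check on these percentages, the calculation can be reproduced from the class counts on the frequency table slide. The dictionary and function names here are illustrative choices, not part of the original slides.

```python
# Counts by ticket class, taken from the frequency table slide.
class_counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}


def relative_frequencies(counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


percentages = relative_frequencies(class_counts)

for category, pct in percentages.items():
    print(f"{category:<7} {pct:.2f}%")
```

Because every case falls in exactly one category, the percentages should sum to 100 apart from rounding.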

13 What is your feeling about the proportion of crew members on board?

14 Why is the picture misleading?

15 The length of each ship corresponds to the number of people in each category. Our eyes tend to be more impressed by area than by other aspects of an image. Even though one ship is about 3 times as long as another, its area is about 9 times as large, and that is misleading.

16 The area principle: the area occupied by a part of the graph should correspond to the magnitude of the value it represents.
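The arithmetic behind the misleading ship picture can be sketched as follows, assuming the picture is scaled in both length and height while a bar in a bar chart keeps a fixed width. Both function names are hypothetical.

```python
def scaled_picture_area_ratio(length_ratio):
    """Area ratio when a picture is stretched in both dimensions.

    Scaling length and height by the same factor multiplies the
    area by the square of that factor.
    """
    return length_ratio ** 2


def fixed_width_bar_area_ratio(length_ratio):
    """Area ratio for bars of equal width: only the length changes."""
    return length_ratio


# A category with 3 times the count:
print(scaled_picture_area_ratio(3))   # 9, overstates the difference
print(fixed_width_bar_area_ratio(3))  # 3, matches the data
```

This is why a bar chart with equal-width bars respects the area principle while a scaled picture does not.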

17 Bar chart Display of counts of a categorical variable with bars

18 Bar chart Display of counts of a categorical variable with bars

19 Pie Charts

20

21 When you make a bar chart or pie chart, pay attention to the following

22 Make sure the variable is indeed categorical

23 When you make a bar chart or pie chart, pay attention to the following
– Make sure the variable is indeed categorical
– Your data are counts or percentages of cases in categories

24 When you make a bar chart or pie chart, pay attention to the following
– Make sure the variable is indeed categorical
– Your data are counts or percentages of cases in categories
– Make sure that the categories do not overlap

25 Was there a relationship between the kind of ticket a passenger held and the passenger’s chances of making it into the lifeboat? What table should we make to answer this question?

26 Contingency table
A two-way table
The table shows how the subjects are distributed along each variable, contingent on the value of the other variable
        First  Second  Third  Crew  Total
Alive     203     118    178   212    711
Dead      122     167    528   673   1490
Total     325     285    706   885   2201

27 Add relative frequencies
                     First    Second   Third    Crew     Total
Alive  Counts        203      118      178      212      711
       % of Row      28.55%   16.60%   25.04%   29.82%   100.00%
       % of Column   62.46%   41.40%   25.21%   23.95%   32.30%
       % of Table    9.22%    5.36%    8.09%    9.63%    32.30%
Dead   Counts        122      167      528      673      1490
       % of Row      8.19%    11.21%   35.44%   45.17%   100.00%
       % of Column   37.54%   58.60%   74.79%   76.05%   67.70%
       % of Table    5.54%    7.59%    23.99%   30.58%   67.70%
Total  Counts        325      285      706      885      2201
       % of Row      14.77%   12.95%   32.08%   40.21%   100.00%
       % of Column   100.00%  100.00%  100.00%  100.00%  100.00%
       % of Table    14.77%   12.95%   32.08%   40.21%   100.00%

28 A simplified table
        First   Second  Third    Crew     Total
Alive   9.22%   5.36%   8.09%    9.63%    32.30%
Dead    5.54%   7.59%   23.99%   30.58%   67.70%

29 Percent of what?
What percent of the survivors were in second class?
What percent were second-class passengers who survived?
What percent of the second-class passengers survived?

30 Percent of what?
What percent of the survivors were in second class?
– 118/711
What percent were second-class passengers who survived?
– The Who is everyone on board, i.e., 2201 is the denominator
– 118/2201
What percent of the second-class passengers survived?
– 118/285
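The three questions share a numerator and differ only in the denominator. A minimal sketch of the arithmetic, using counts from the contingency table (the variable names are illustrative):

```python
alive_second = 118     # second-class passengers who survived
total_alive = 711      # all survivors
total_second = 285     # all second-class passengers
total_on_board = 2201  # everyone on board


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole


# Same numerator, three different denominators.
pct_of_survivors = percent(alive_second, total_alive)    # row percent
pct_of_everyone = percent(alive_second, total_on_board)  # table percent
pct_of_second = percent(alive_second, total_second)      # column percent

print(f"Of the survivors, in second class:    {pct_of_survivors:.2f}%")
print(f"Of everyone, second class and alive:  {pct_of_everyone:.2f}%")
print(f"Of second class, survived:            {pct_of_second:.2f}%")
```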

31 Marginal distribution
The frequency distribution of one of the variables, found in the margins (the totals) of a contingency table, is called its marginal distribution
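One way to see where the margins come from is to sum the cell counts of the Titanic contingency table by row and by column. The nested dictionary layout below is just one convenient representation chosen for this sketch.

```python
# Cell counts from the Titanic contingency table (without the totals).
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

# Marginal distribution of survival: sum across each row.
survival_margin = {status: sum(row.values()) for status, row in table.items()}

# Marginal distribution of ticket class: sum down each column.
class_margin = {}
for row in table.values():
    for ticket_class, n in row.items():
        class_margin[ticket_class] = class_margin.get(ticket_class, 0) + n

print(survival_margin)
print(class_margin)
```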

32 Conditional distribution
        First    Second   Third    Crew     Total
Alive   203      118      178      212      711
        28.55%   16.60%   25.04%   29.82%   100.00%
Dead    122      167      528      673      1490
        8.19%    11.21%   35.44%   45.17%   100.00%
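The row percentages on this slide are the distribution of ticket class conditional on survival status. A short sketch of that calculation, with `conditional_distribution` as a hypothetical helper name:

```python
# Cell counts from the Titanic contingency table (without the totals).
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}


def conditional_distribution(row):
    """Percent of the row's cases that fall in each column category."""
    total = sum(row.values())
    return {category: 100 * n / total for category, n in row.items()}


# Distribution of ticket class, conditional on survival status.
class_given_status = {
    status: conditional_distribution(row) for status, row in table.items()
}

for status, dist in class_given_status.items():
    cells = "  ".join(f"{c}: {p:.2f}%" for c, p in dist.items())
    print(f"{status:<6} {cells}")
```

If the two variables were unrelated, the two conditional distributions would be expected to look about the same; here they clearly differ.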

33 Pie chart for conditional distributions

34 Bar chart for conditional distributions

35 Segmented Bar Chart

36 What can go wrong? Do not violate the area principle

37 What can go wrong?
Keep it honest
– Pay attention to labels
– Check whether all percentages add up to 100%
Do not confuse similar-sounding percentages
– The percentage of passengers who were both in first class and survived
– The percentage of the first-class passengers who survived
– The percentage of the survivors who were in first class

38 What can go wrong?
Do not forget to look at the variables separately, too
– Look at both conditional and marginal distributions
Be sure to use enough individuals
Do not overstate your case

39 What can go wrong?
Be careful with averages of proportions across several different groups
Simpson’s Paradox: on-time record for two pilots
        Day            Night          Overall
Moe     90/100 = 90%   10/20 = 50%    100/120 = 83%
Jill    19/20 = 95%    75/100 = 75%   94/120 = 78%
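The reversal in the pilots example can be checked directly from the counts on the slide. This is a small sketch, with the data structure and function names chosen only for illustration.

```python
# (on-time flights, total flights) for each pilot and time of day,
# taken from the on-time record on the slide.
records = {
    "Moe":  {"Day": (90, 100), "Night": (10, 20)},
    "Jill": {"Day": (19, 20),  "Night": (75, 100)},
}


def rate(pilot, time_of_day):
    """On-time proportion for one pilot in one condition."""
    on_time, flights = records[pilot][time_of_day]
    return on_time / flights


def overall_rate(pilot):
    """On-time proportion pooled over day and night flights."""
    on_time = sum(s for s, _ in records[pilot].values())
    flights = sum(n for _, n in records[pilot].values())
    return on_time / flights


for pilot in records:
    print(
        f"{pilot:<5} day {rate(pilot, 'Day'):.0%}  "
        f"night {rate(pilot, 'Night'):.0%}  "
        f"overall {overall_rate(pilot):.0%}"
    )
```

Jill has the better record both by day and by night, yet Moe looks better overall, because most of Moe's flights were in the easier daytime condition while most of Jill's were at night.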

