
Slide 1: Chapter 2, Displaying and Describing Categorical Data
Copyright © 2014, 2012, 2009 Pearson Education, Inc.

Slide 2: Objectives
The student will be able to:
5. Appropriately display categorical data using a frequency table, bar chart, segmented bar chart, or pie chart.
6. Using a contingency table, determine marginal and conditional distributions.
7. Use conditional distributions to draw conclusions about the independence of two categorical variables.

Slide 3: 2.1 Summarizing and Displaying a Single Categorical Variable

Slide 4: Three Rules of Data Analysis
Make a picture: it helps you think clearly about patterns and relationships hidden in the data table.
Make a picture: it shows the important features of the data.
Make a picture: it tells others about the data.

Slide 5: A Titanic Misconception
Were most people aboard the Titanic crew members?
There were about three times as many crew members as second-class passengers.
The eye is tricked because the area drawn for the crew is nine times as large.

Slide 6: The Area Principle
The area occupied by a part of the graph should correspond to the magnitude of the value it represents.
Bars should have equal widths in a bar chart.
Be cautious when using two-dimensional pictures to exhibit one-dimensional data.

Slide 7: Context
Let’s consider the people on board the Titanic on April 14, 1912.
Who – People on the Titanic
What – Survival status, age, sex, ticket class
When – April 14, 1912
Where – North Atlantic
How – A variety of sources and Internet sites
Why – Historical interest

Slide 8: Frequency Tables: Making Piles
Rather than looking at each case (each person aboard the Titanic), we can “pile” the data by counting the number of data values in each category of interest.
We can organize these counts into a frequency table, which records the totals and the category names.

Slide 9: Frequency Tables
A frequency table is a table whose first column displays each distinct outcome and whose second column displays that outcome’s frequency.
If there are many distinct outcomes, then combining them into a few categories is recommended.

Slide 10: Relative Frequency Tables
A relative frequency table is a table whose first column displays each distinct outcome and whose second column displays that outcome’s relative frequency, that is, its count divided by the total number of cases (often written as a percent).
It is similar to the frequency table, but it displays relative frequencies rather than frequencies.
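As a minimal sketch of both kinds of table, the Python below tallies a list of category labels into counts and then divides each count by the total. The `majors` list is made-up illustration data (it does not come from the slides), and `frequency_table` and `relative_frequency_table` are hypothetical helper names.

```python
from collections import Counter


def frequency_table(values):
    """Count how many times each distinct category appears."""
    return dict(Counter(values))


def relative_frequency_table(values):
    """Divide each category count by the total number of cases."""
    counts = frequency_table(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical sample of declared majors (illustration only).
majors = ["business", "sciences", "business", "health",
          "liberal arts", "business", "sciences", "health"]

freq = frequency_table(majors)
rel = relative_frequency_table(majors)

# One row per category: name, count, percent of all cases.
for category in freq:
    print(f"{category:<14}{freq[category]:>3}{rel[category]:>8.1%}")
```

The relative frequencies always sum to 1 (100%), which is a quick check that no case was dropped.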

Slide 11: Bar Charts
A bar chart displays the frequency or relative frequency of each category.
All bars must have the same width.
Bar charts are good for a general audience.
[Figures: Frequency Bar Chart; Relative Frequency Bar Chart]

Slide 12: Pie Charts
A pie chart presents each category as a slice of a circle, so that each slice’s size is proportional to that category’s fraction of the whole.
Pie charts are also good for a general audience.
Pie charts help to display the fraction of the whole that each category represents.

Slide 13: Think Before You Draw
Choose the chart that best tells the story of your data.
Think about the intended audience to select a chart that is best for them.
Charts often work better when the categories do not overlap.
Don’t try to fool your audience; give a chart that honestly expresses the interesting features of the data.

Slide 14: Practice
Create a frequency table for majors (use categories: liberal arts, social sciences, sciences, business, health, other/undecided).
Sketch a bar chart and a relative frequency bar chart.

Slide 15: 2.2 Exploring the Relationship Between Two Categorical Variables

Slide 16: Contingency Tables
A contingency table allows us to look at two categorical variables together.
It shows how individuals are distributed along each variable, contingent on the value of the other variable.
Example: we can examine the class of ticket and whether a person survived the Titanic:

Slide 17: Contingency Tables (cont.)
The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables.
Each frequency distribution is called a marginal distribution of its respective variable.
The marginal distribution of Survival is represented by the last column.
The marginal distribution of Class is represented by the last row.

Slide 18: Contingency Tables (cont.)
Each cell of the table gives the count for a combination of values of the two variables.
For example, the second cell in the crew column tells us that 673 crew members died when the Titanic sank.
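The marginal totals can be sketched in a few lines of Python. The slide's table image is not part of this transcript, so the counts below are an assumption: they are the commonly cited Titanic figures, and they agree with the numbers quoted in the slides (673 crew deaths, 325 first-class passengers, 2201 people aboard).

```python
# ASSUMPTION: commonly cited Titanic counts; the slide's own table image
# is not in the transcript. Rows are Survival, columns are ticket Class.
counts = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

# Marginal distribution of Survival: the row totals (the last column).
survival_totals = {status: sum(row.values()) for status, row in counts.items()}

# Marginal distribution of Class: the column totals (the last row).
classes = list(counts["Alive"])
class_totals = {c: sum(row[c] for row in counts.values()) for c in classes}

# Both margins must add up to the same grand total.
grand_total = sum(survival_totals.values())

print(survival_totals)
print(class_totals)
print(grand_total)
```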

Slide 19: Table of Percents
A table of percents can be misleading.
Looking at “Alive”, was it better to have a second- or third-class ticket?
Of everyone aboard, 8.1% were third-class survivors and 5.4% were second-class survivors.
What is wrong with just comparing these percentages?

Slide 20: Conditional Distributions
A conditional distribution gives the percentages for one variable among only those cases that satisfy a condition on another variable.
25.2% of all third-class ticket holders survived.
Was it better to have a second- or third-class ticket?
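A small sketch of why the denominator matters: the same survivor counts are divided first by everyone aboard and then by the size of each class. The counts are an assumption (the commonly cited Titanic figures, since the slide's table is not in the transcript); they reproduce the 8.1%, 5.4%, and 25.2% quoted in the slides.

```python
# ASSUMPTION: commonly cited Titanic counts, not read from the slide image.
alive = {"Second": 118, "Third": 178}
class_total = {"Second": 285, "Third": 706}
everyone_aboard = 2201

results = {}
for ticket in ("Second", "Third"):
    pct_of_whole = alive[ticket] / everyone_aboard      # table-of-percents view
    pct_of_class = alive[ticket] / class_total[ticket]  # conditional on class
    results[ticket] = (pct_of_whole, pct_of_class)
    print(f"{ticket}: {pct_of_whole:.1%} of everyone aboard, "
          f"{pct_of_class:.1%} of that class survived")
```

Third class has more survivors as a share of everyone aboard only because it was the larger group; within each class, a larger share of second-class ticket holders survived.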

Slide 21: Conditional Distributions
A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable.
The following is the conditional distribution of ticket Class, conditional on having survived:

Slide 22: Conditional Distributions (cont.)
The following is the conditional distribution of ticket Class, conditional on having perished:
Note: we isolated each row of the contingency table and divided each count by the row total to get the percentages of these conditional distributions.
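The row-by-row calculation described in the note can be sketched as follows. The counts are again an assumption (the commonly cited Titanic figures, because the slide's tables are not in the transcript), and `conditional_distribution` is a hypothetical helper name.

```python
# ASSUMPTION: commonly cited Titanic counts, not read from the slide image.
counts = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}


def conditional_distribution(row):
    """Divide each count in one row by that row's total."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}


class_given_alive = conditional_distribution(counts["Alive"])
class_given_dead = conditional_distribution(counts["Dead"])

# Side-by-side view: share of each class among survivors and among the dead.
for ticket in counts["Alive"]:
    print(f"{ticket:<8}{class_given_alive[ticket]:>8.1%}"
          f"{class_given_dead[ticket]:>8.1%}")
```

Each conditional distribution sums to 100% on its own, because each row is divided by its own total rather than by the grand total.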

Slide 23: Conditional Distribution: Rows or Columns
The condition can be based on either rows or columns.
This table shows that crew members made up the largest percentage of the survivors.
Crew members also made up the largest percentage of the dead.

Slide 24: Conditional Distributions as Pie Charts
Pie charts can give a visual representation of the conditional distributions.
Compare how the first-class ticket holders were represented amongst the survivors vs. the dead.

Slide 25: Segmented Bar Charts
A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles.
Here is the segmented bar chart for ticket Class by Survival status:

Slide 26: Conditional Distributions (cont.)
We see that the distribution of Class for the survivors is different from that of the nonsurvivors.
This leads us to believe that Class and Survival are associated, and thus are not independent of one another.
The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.

Slide 27: Independence
Independence: The distribution of one variable is the same for all categories of another.
For dependent variables, there is an association between the two variables.

Slide 28: Independence Example
Is there an association between gender and interest in Super Bowl TV coverage?
For men, there is a large difference between watching the game and watching the commercials.
For women, the difference is smaller.
There is an association between gender and interest.

Slide 29: Practice
Create a conditional distribution of areas of study conditional on being female.
Create a conditional distribution of areas of study conditional on being male.
Create a segmented bar chart or pie chart of each distribution.
Do you think the two variables are independent or not?
What are some limitations to drawing a conclusion?

Slide 30: Examples
200 adults shopping at a supermarket were asked about the highest level of education they had completed and whether or not they smoke cigarettes. Results are summarized below. Is there an association between education level and smoking?
                   Smoker   Nonsmoker   Total
High School            32          61      93
2-yr college            5          17      22
4+ year college        13          72      85
Total                  50         150     200
Think – What is the context?
Show – If there is no association between smoking and education level, then the conditional distributions of education should be the same for smokers and nonsmokers. Calculate the conditional distribution of education by smoking status. Make a graph (e.g. a segmented bar graph).
Tell – It seems that smokers tend to have a lower education level than nonsmokers. 64% of the smokers had only a high school education, compared to about 41% of nonsmokers. And nonsmokers were almost twice as likely as smokers (48% to 26%) to have completed at least 4 years of college.
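The "Show" step can be sketched in Python using the counts from the table above. `education_given` is a hypothetical helper name; it divides each column of counts by that column's total.

```python
# Counts from the supermarket survey table (education level by smoking status).
table = {
    "High School":     {"Smoker": 32, "Nonsmoker": 61},
    "2-yr college":    {"Smoker": 5,  "Nonsmoker": 17},
    "4+ year college": {"Smoker": 13, "Nonsmoker": 72},
}


def education_given(status):
    """Conditional distribution of education level for one smoking status."""
    column_total = sum(row[status] for row in table.values())
    return {level: row[status] / column_total for level, row in table.items()}


smokers = education_given("Smoker")
nonsmokers = education_given("Nonsmoker")

# If the variables were independent, the two columns would be (nearly) equal
# and the gap in the last column would be close to zero.
for level in table:
    gap = smokers[level] - nonsmokers[level]
    print(f"{level:<16}{smokers[level]:>7.1%}{nonsmokers[level]:>7.1%}{gap:>+8.1%}")
```

The gaps describe this sample only; deciding whether differences of this size could arise by chance needs a formal test, which this chapter does not cover.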

Slide 31: Example (#32 in text)
A survey of autos parked in student and staff lots at a large university classified the brands by country of origin, as seen in the table:
            Student   Staff
American        107     105
European         33      12
Asian            55      47
a) What percent of the cars surveyed were foreign?
b) What percent of the American cars were owned by students?
c) What percent of the students owned American cars?
d) What is the marginal distribution of origin?
e) What are the conditional distributions of origin by driver classification?
f) Do you think that the origin of the car is independent of the type of driver?
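One way to set up parts (a) through (e) is sketched below, using the counts from the table. The variable names are illustrative; the point is that each question picks a different denominator.

```python
# Counts from the parking-lot survey table (car origin by driver type).
cars = {
    "American": {"Student": 107, "Staff": 105},
    "European": {"Student": 33,  "Staff": 12},
    "Asian":    {"Student": 55,  "Staff": 47},
}
drivers = ("Student", "Staff")

origin_totals = {origin: sum(row.values()) for origin, row in cars.items()}
driver_totals = {d: sum(row[d] for row in cars.values()) for d in drivers}
total = sum(origin_totals.values())

# (a) Foreign cars out of all cars surveyed.
pct_foreign = (total - origin_totals["American"]) / total
# (b) Student-owned cars out of American cars only.
pct_american_owned_by_students = cars["American"]["Student"] / origin_totals["American"]
# (c) American cars out of student-owned cars only.
pct_students_with_american = cars["American"]["Student"] / driver_totals["Student"]
# (d) Marginal distribution of origin.
marginal_origin = {o: n / total for o, n in origin_totals.items()}
# (e) Conditional distributions of origin for each driver type.
origin_by_driver = {
    d: {o: cars[o][d] / driver_totals[d] for o in cars} for d in drivers
}

print(f"(a) {pct_foreign:.1%}")
print(f"(b) {pct_american_owned_by_students:.1%}")
print(f"(c) {pct_students_with_american:.1%}")
```

Parts (b) and (c) use the same cell count, 107, but different denominators, which is why they give different answers.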

Slide 32: A Quick Intro to StatCrunch
MyLab -> Multimedia Library
Select StatCrunch and then hit “FindNow”.
Then select StatCrunch.
StatCrunch is also located at http://campusweb.howardcc.edu/statcrunch/ (passcode: stats). However, the chapter data sets are only available through Course Compass.
Find the car data set under Chapter 3 (use the navigation bar on the left).
Create a bar chart (bar plot) of car origin.
Create 2 separate bar charts of car origin grouped by driver.
Create a pie chart of car origin grouped by driver.
What observations can you make?

Slide 33: Words of Caution
Don’t confuse percents of the whole with marginal percents.
Don’t leave out marginal percents.
Don’t make conclusions based on only a handful of individuals.
Don’t make independence conclusions where there is only a small difference.

Slide 34: What Can Go Wrong? (cont.)
Don’t confuse similar-sounding percentages: pay particular attention to the wording of the context.
Example:
The percentage of the people aboard who were in first class and who survived: 203/2201 or 9.2%
The percentage of first-class passengers who survived: 203/325 or 62.5%
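The two percentages share a numerator and differ only in the denominator, which is easy to check directly. The three counts are the ones quoted in the example; the variable names are illustrative.

```python
first_class_survivors = 203
first_class_total = 325
everyone_aboard = 2201

# Joint percentage: in first class AND survived, out of everyone aboard.
pct_of_everyone = first_class_survivors / everyone_aboard

# Conditional percentage: survived, out of first-class passengers only.
pct_of_first_class = first_class_survivors / first_class_total

print(f"{pct_of_everyone:.1%}")     # 9.2%
print(f"{pct_of_first_class:.1%}")  # 62.5%
```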

Slide 35: Simpson’s Paradox
Which pilot had a better on-time flight record?
Moe was better overall.
Jill was better for both day and night flights.
Simpson’s Paradox: One is higher overall while the other is higher in every category.
[Table: Number of On-Time Flights]
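The paradox is easier to believe with numbers in hand. The slide's table is not in the transcript, so the counts below are illustrative assumptions chosen to show the pattern; they should not be read as the slide's data. `rate` and `overall` are hypothetical helper names.

```python
# ILLUSTRATIVE counts only: (on-time flights, total flights) per time of day.
flights = {
    "Moe":  {"Day": (90, 100), "Night": (10, 20)},
    "Jill": {"Day": (19, 20),  "Night": (75, 100)},
}


def rate(on_time, total):
    """On-time proportion for one group of flights."""
    return on_time / total


def overall(pilot):
    """On-time proportion with day and night flights pooled together."""
    on_time = sum(n for n, _ in flights[pilot].values())
    total = sum(t for _, t in flights[pilot].values())
    return rate(on_time, total)


for pilot, by_time in flights.items():
    day = rate(*by_time["Day"])
    night = rate(*by_time["Night"])
    print(f"{pilot:<5} day {day:.0%}  night {night:.0%}  overall {overall(pilot):.1%}")
```

With these counts Jill has the higher rate in each category, yet Moe has the higher pooled rate, because most of Moe's flights are day flights (where both pilots do well) and most of Jill's are night flights.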

Slide 36: Learning Objectives
Summarize categorical data by counting cases and expressing the results as percents.
Create and interpret bar charts, pie charts, and contingency tables.
Interpret marginal and conditional distributions.
Draw conclusions about independence and association by analyzing conditional distributions.

