
1 Summarizing and Displaying Categorical Data
Objectives: Identify a variable as categorical and choose an appropriate display for it. Use a frequency table to summarize categorical data. Describe and use the area principle. Display the distribution of a categorical variable with a bar chart or pie chart.

2 Summarizing Categorical Data: Frequency Tables
A frequency table lists the categories of a categorical variable and gives the count of observations for each category.
Table columns: After High School Plans | Counts. Categories: Four Year College, Two Year College, Enlist, Join the Workforce, Other.

3 Summarizing Categorical Data: Relative Frequency Tables
Table columns: After High School Plans | Relative Frequency. Categories: Four Year College, Two Year College, Enlist, Join the Workforce, Other, Total Students.
In a relative frequency table, we are interested in the fraction or proportion of the data in each category, so we divide each count by the total number of cases.
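A minimal Python sketch of that division step, assuming the counts are kept in a dict keyed by category. The function name `relative_frequencies` is hypothetical, and the counts are borrowed from the 18-senior After High School Plans example on slide 10 (8, 3, and 7 seniors).

```python
def relative_frequencies(counts):
    """Return each category's proportion of the total number of cases."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# After High School Plans totals for the 18 seniors on slide 10
plans = {"4 Year College": 8, "2 Year College": 3, "Enlist": 7}

for category, proportion in relative_frequencies(plans).items():
    print(f"{category}: {proportion:.1%}")
```

This prints 44.4%, 16.7%, and 38.9%. The proportions always add to 1 (100%), apart from rounding in the printed values.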

4 Displaying Categorical Data: The Area Principle
Area Principle: The area occupied by a part of the graph should correspond to the magnitude of the value it represents.

5 Displaying Categorical Data: Bar Charts

6

7 Bell Work 2/12
Give three conclusions that can be drawn from each of the graphs provided ("Average time" and "How Other Drivers Irk Us").

8 Bell Work 1/13
1) The pie chart summarizes the genres of 120 first-run movies released in 2005. Which genre was least common?
2) Here is a bar chart summarizing the 2005 movie genres shown in the pie chart above. a) Which genre was most common? b) Is it easier to see that in the pie chart or in the bar chart? Explain.
3) The pie chart shows the ratings assigned to 120 first-run movies released in 2005. Which was the most common rating?

9 Summarizing Categorical Data: Contingency Tables
Objectives: Make and examine a contingency table. Make and examine displays of the conditional distributions. Compare conditional and marginal percentages. Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.

10 Contingency Tables
We have already looked at how to summarize one categorical variable using a frequency or relative frequency table. When we are interested in a possible relationship between two variables, we organize the data into a two-way table called a contingency table.
Gender by After High School Plans:
Gender | 4 Year College | 2 Year College | Enlist | Total
Female | 4 | 2 | 5 | 11
Male | 4 | 1 | 2 | 7
Total | 8 | 3 | 7 | 18

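11 Row, Column, and Table Percentages (Gender by After High School Plans)
Gender | Measure | 4 Year College | 2 Year College | Enlist | Total
Female | Count | 4 | 2 | 5 | 11
Female | Row % | 36.4% | 18.2% | 45.5% | 100%
Female | Column % | 50% | 66.7% | 71.4% | 61.1%
Female | Table % | 22.2% | 11.1% | 27.8% | 61.1%
Male | Count | 4 | 1 | 2 | 7
Male | Row % | 57.1% | 14.3% | 28.6% | 100%
Male | Column % | 50% | 33.3% | 28.6% | 38.9%
Male | Table % | 22.2% | 5.6% | 11.1% | 38.9%
Total | Count | 8 | 3 | 7 | 18
Total | Row % | 44.4% | 16.7% | 38.9% | 100%
Total | Column % | 100% | 100% | 100% | 100%
Total | Table % | 44.4% | 16.7% | 38.9% | 100%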
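The three kinds of percentages differ only in the denominator: the row total, the column total, or the grand total. A short Python sketch, assuming the counts are stored as one list per gender in the column order 4 Year College, 2 Year College, Enlist (the helper `cell_percentages` is a hypothetical name), reproduces the percentages on this slide:

```python
def cell_percentages(counts):
    """Return row %, column %, and table % for every cell of a two-way table.

    `counts` maps each row label to a list of cell counts (one per column).
    """
    grand_total = sum(sum(row) for row in counts.values())
    column_totals = [sum(col) for col in zip(*counts.values())]
    result = {}
    for label, row in counts.items():
        row_total = sum(row)
        result[label] = {
            "row": [100 * n / row_total for n in row],                    # denominator: row total
            "column": [100 * n / c for n, c in zip(row, column_totals)],  # denominator: column total
            "table": [100 * n / grand_total for n in row],                # denominator: grand total
        }
    return result


# Columns: 4 Year College, 2 Year College, Enlist
counts = {"Female": [4, 2, 5], "Male": [4, 1, 2]}

for gender, pcts in cell_percentages(counts).items():
    for kind, values in pcts.items():
        print(gender, kind, [round(v, 1) for v in values])
```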

12 Percent of what?
What percent of the seniors are female? 11/18 × 100% ≈ 61.1%
What percent of the seniors are planning to attend a 4-year college? 8/18 × 100% ≈ 44.4%
What percent of the seniors are female and plan to attend a 4-year college? 4/18 × 100% ≈ 22.2%
What percent of the female seniors are planning to attend a 4-year college? 4/11 × 100% ≈ 36.4%
What percent of seniors planning to attend a 4-year college are female? 4/8 × 100% = 50%

13 Percent of what?
Prior to graduation, a high school class was surveyed about its plans. The following table displays the results for white and minority students (the “Minority” group included African-American, Asian, Hispanic, and Native American students):
Plans | White | Minority | Total
4-year college | 198 | 44 | 242
2-year college | 36 | 6 | 42
Enlist | 4 | 1 | 5
Employment | 14 | 3 | 17
Other | 16 | 3 | 19
Total | 268 | 57 | 325
a) What percent of the seniors are white? 268/325 × 100% ≈ 82.5%
b) What percent of the seniors are planning to attend a 2-year college? 42/325 × 100% ≈ 12.9%
c) What percent of the seniors are white and planning to attend a 2-year college? 36/325 × 100% ≈ 11.1%
d) What percent of the white seniors are planning to attend a 2-year college? 36/268 × 100% ≈ 13.4%
e) What percent of the seniors planning to attend a 2-year college are white? 36/42 × 100% ≈ 85.7%
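Each of questions a) through e) uses a different denominator. A minimal Python sketch, assuming the table is stored as (White, Minority) pairs keyed by plan; the variable names are illustrative only:

```python
# Plans by race: (White, Minority) counts from the senior survey
table = {
    "4-year college": (198, 44),
    "2-year college": (36, 6),
    "Enlist": (4, 1),
    "Employment": (14, 3),
    "Other": (16, 3),
}

grand_total = sum(w + m for w, m in table.values())   # all seniors
white_total = sum(w for w, _ in table.values())       # all white seniors
two_year_white, two_year_minority = table["2-year college"]
two_year_total = two_year_white + two_year_minority   # all 2-year college seniors

answers = {
    "a) white": white_total / grand_total,
    "b) 2-year college": two_year_total / grand_total,
    "c) white AND 2-year": two_year_white / grand_total,
    "d) 2-year, among white": two_year_white / white_total,
    "e) white, among 2-year": two_year_white / two_year_total,
}
for question, p in answers.items():
    print(f"{question}: {p:.1%}")
```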

14 Percent of what?
An article in the Winter 2003 issue of Chance magazine examined the impact of an applicant’s ethnicity on the likelihood of admission to the Houston Independent School District’s magnet schools programs. Those data are summarized in the table below:
a) What percent of all applicants were Asian? 292/1755 × 100% ≈ 16.6%
b) What percent of the students accepted were Asian? 110/931 × 100% ≈ 11.8%
c) What percent of Asians were accepted? 110/292 × 100% ≈ 37.7%
d) What percent of all applicants were accepted? 931/1755 × 100% ≈ 53%

15 Conditional Distributions
Conditional distributions show the relative frequency distribution of one variable for just those cases that satisfy a condition on another variable.
Gender | 4 Year College | 2 Year College | Enlist | Total
Female | 4 (36.4%) | 2 (18.2%) | 5 (45.5%) | 11 (100%)
Male | 4 (57.1%) | 1 (14.3%) | 2 (28.6%) | 7 (100%)
Total | 8 (44.4%) | 3 (16.7%) | 7 (38.9%) | 18 (100%)
Conditional distributions can be looked at in one of two ways. Here we condition on gender, so we see how after-high-school plans are distributed among the females and among the males.

16 Graphical Interpretation Using Pie Charts In a contingency table, if the distribution of one variable is the same for each category of the other, the variables are said to be independent. Otherwise, we say there is an association between the variables.

17 Conditional Distributions
Gender | 4 Year College | 2 Year College | Enlist | Total
Female | 4 (50%) | 2 (66.7%) | 5 (71.4%) | 11 (61.1%)
Male | 4 (50%) | 1 (33.3%) | 2 (28.6%) | 7 (38.9%)
Total | 8 (100%) | 3 (100%) | 7 (100%) | 18 (100%)
Conditional distributions can be looked at in one of two ways. Here we condition on after-high-school plans, so we look at how gender is distributed within each category of after-high-school plans.

18 Graphical Interpretation Using Bar Graphs

19 Graphical Interpretation: Segmented Bar Charts
Gender | 4 Year College | 2 Year College | Enlist | Total
Female | 36.4% | 18.2% | 45.5% | 100%
Male | 57.1% | 14.3% | 28.6% | 100%
Segmented bar charts treat each bar as a whole and divide it proportionally into segments corresponding to the percentage in each group.
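Segmented bar charts are normally drawn with a plotting tool. As a dependency-free sketch, the hypothetical helper `segmented_bar` below renders one bar as text: every bar has the same total width, and each segment's length is proportional to its share. Largest-remainder rounding is an assumption made here so the segment lengths always add up to the full width.

```python
def segmented_bar(counts, width=40, symbols="#=."):
    """Draw one segmented bar as text.

    Each segment's length is proportional to its share of the bar's total.
    `symbols` needs one character per segment.
    """
    total = sum(counts)
    exact = [width * c / total for c in counts]
    lengths = [int(x) for x in exact]  # floor of each exact length
    # Give the leftover characters to the segments with the largest
    # fractional parts so the lengths add up to exactly `width`.
    by_remainder = sorted(range(len(counts)),
                          key=lambda i: exact[i] - lengths[i], reverse=True)
    for i in by_remainder[: width - sum(lengths)]:
        lengths[i] += 1
    return "".join(s * n for s, n in zip(symbols, lengths))


# Segments: 4 Year College (#), 2 Year College (=), Enlist (.)
print("Female |" + segmented_bar([4, 2, 5]) + "|")
print("Male   |" + segmented_bar([4, 1, 2]) + "|")
```

With the default width of 40 characters, the female bar splits 15/7/18 and the male bar 23/6/11.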

20 Graphical Interpretation Segmented Bar Charts Segmented bar charts treat each bar as a whole and divide it proportionally into segments corresponding to the percentage in each group.

21 Students in an introductory statistics course were asked to describe their politics as “Liberal,” “Moderate,” or “Conservative.” Here are the results:
Gender | Liberal (L) | Moderate (M) | Conservative (C) | Total
Female | 35 | 36 | 6 | 77
Male | 50 | 44 | 21 | 115
Total | 85 | 80 | 27 | 192
a) What percent of the class is male?
b) What percent of the class considers themselves to be “Conservative”?
c) What percent of the males in the class consider themselves to be “Conservative”?
d) What percent of all students in the class are males who consider themselves to be “Conservative”?
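One way to check answers to a) through d), sketched in Python under the assumption that each gender's counts are listed in the order Liberal, Moderate, Conservative (variable names are illustrative):

```python
# Politics counts by gender, in the order Liberal, Moderate, Conservative
politics = {"Female": [35, 36, 6], "Male": [50, 44, 21]}

class_total = sum(sum(row) for row in politics.values())
male_total = sum(politics["Male"])
conservative_total = sum(row[2] for row in politics.values())
male_conservative = politics["Male"][2]

print(f"a) male: {male_total / class_total:.1%}")                         # out of the whole class
print(f"b) Conservative: {conservative_total / class_total:.1%}")         # out of the whole class
print(f"c) Conservative among males: {male_conservative / male_total:.1%}")  # out of males only
print(f"d) male AND Conservative: {male_conservative / class_total:.1%}")    # out of the whole class
```

These work out to about 59.9%, 14.1%, 18.3%, and 10.9%.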

22 Conditional Distributions: Checking for Independence
Gender | L | M | C | Total
Female | 35 | 36 | 6 | 77
Male | 50 | 44 | 21 | 115
Total | 85 | 80 | 27 | 192
a) Find the conditional distribution (percentages) of political views for the females.
b) Find the conditional distribution (percentages) of political views for the males.
c) Make a graphical display that compares the two distributions.
d) Do the variables Politics and Sex appear to be independent? Explain.

23 Conditional Distributions: Checking for Independence
a) Conditional distribution of political views for the females:
Female | L | M | C | Total
Count | 35 | 36 | 6 | 77
Row % | 45% | 47% | 8% | 100%
b) Conditional distribution of political views for the males:
Male | L | M | C | Total
Count | 50 | 44 | 21 | 115
Row % | 43% | 38% | 18% | 100%

24 Conditional Distributions: Checking for Independence
c) Make a graphical display that compares the two distributions.
d) Do the variables Politics and Sex appear to be independent? Explain.
No. About the same fraction of males and females consider themselves “Liberal” (43% vs. 45%), but females are more likely to be “Moderate” (47% vs. 38%). A much higher percentage of males (18%) than females (8%) consider themselves “Conservative.”
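The comparison in d) can be made explicit by computing each gender's conditional distribution and the percentage-point gap between them. This Python sketch is only a descriptive comparison (it does not perform a formal significance test), and `conditional_distribution` is a hypothetical helper name:

```python
def conditional_distribution(row):
    """Percent of one group (one row of the table) in each category."""
    total = sum(row)
    return [100 * n / total for n in row]


views = ["Liberal", "Moderate", "Conservative"]
female = conditional_distribution([35, 36, 6])
male = conditional_distribution([50, 44, 21])

# If Politics and Sex were independent, these two distributions would match,
# so every gap would be close to zero.
gaps = {view: m - f for view, f, m in zip(views, female, male)}
for view, gap in gaps.items():
    print(f"{view}: male minus female = {gap:+.1f} percentage points")
```

The gaps come out to about -2.0 points (Liberal), -8.5 (Moderate), and +10.5 (Conservative).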

25 Conditional Distributions
Examine the table about ethnicity and acceptance for the Houston Independent School District’s magnet schools program. Does it appear that admissions decisions are made independently of the applicant’s ethnicity?

26 Conditional Distributions: Admission Decision’s Distribution Across Ethnicity
Ethnicity | Accepted | Turned Away | Wait-listed | Total
Black/Hispanic | 93.81% | 6.19% | 0% | 100%
Asian | 37.67% | 45.55% | 16.78% | 100%
White | 35.52% | 37.95% | 26.53% | 100%
Total | 53.05% | 29.86% | 17.09% | 100%

27 Pie Chart of Distribution of Admission Across Ethnicity

28 Segmented Bar Graph of Conditional Distribution of Admission Across Ethnicity

29 Conditional Distribution: Distribution of Ethnicity for Each Category of Admission Decision
Ethnicity | Accepted | Turned Away | Wait-listed | Total
Black/Hispanic | 52.09% | 6.11% | 0.00% | 29.46%
Asian | 11.82% | 25.38% | 16.33% | 16.64%
White | 36.09% | 68.51% | 83.67% | 53.90%
Total | 100% | 100% | 100% | 100%

30 Bar Chart of Distribution of Ethnicity Across Admission Decisions

31 Bell Work 2/14
For the research examples given below, determine whether the study is an observational study or an experiment. If the study is observational, determine whether it was prospective or retrospective, and identify the subjects and the variables. If the study is an experiment, identify the subjects, treatments, and response variable.
1) Researchers investigating appetite control as a means of losing weight found that female rats ate less and lost weight after injections of the hormone leptin, while male rats responded better to insulin. (Science News, July 20, 2002)
2) Some doctors have expressed concern that men who have vasectomies seemed more likely to develop prostate cancer. Medical researchers used a national cancer registry to identify 923 men who had had prostate cancer and 1224 men of similar ages who had not. Roughly one quarter of the men in each group had undergone a vasectomy, many more than 25 years before the study. The study’s authors concluded that there is strong evidence that having the operation presents no long-term risk for developing prostate cancer. (Science News, July 20, 2002)

32 Summarizing Categorical Data: Contingency Tables
Objectives: Make and examine a contingency table. Make and examine displays of the conditional distributions. Compare conditional and marginal percentages. Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.

33 Conditional Distributions: Checking for Independence Between Variables
For a period of five years, physicians at McGill University Health Center followed more than 5000 adults over the age of 50. The researchers were investigating whether people taking a certain class of antidepressants (SSRIs) might be at greater risk of bone fractures. Their observations are summarized in the table:
Outcome | Taking SSRI | No SSRI | Total
Experienced Fractures | 14 | 244 | 258
No Fractures | 123 | 4627 | 4750
Total | 137 | 4871 | 5008
Do these results suggest there’s an association between taking SSRI antidepressants and experiencing bone fractures? Explain.

34 Conditional Distributions: Checking for Independence Between Variables
To determine whether there is an association between taking SSRI antidepressants and experiencing bone fractures, let’s look at the distribution of fractures and no fractures within the SSRI group and within the No SSRI group.
Outcome | Taking SSRI | No SSRI | Total
Experienced Fractures | 14 (10.2%) | 244 (5.0%) | 258 (5.2%)
No Fractures | 123 (89.8%) | 4627 (95.0%) | 4750 (94.8%)
Total | 137 (100%) | 4871 (100%) | 5008 (100%)
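Because the two groups are very different sizes (137 vs. 4871 people), comparing raw counts would mislead; the conditional percentages put them on the same scale. A minimal Python sketch, with `fracture_rate` as a hypothetical helper name:

```python
# (experienced fractures, no fractures) for each group
groups = {"Taking SSRI": (14, 123), "No SSRI": (244, 4627)}


def fracture_rate(fractures, no_fractures):
    """Proportion of a group that experienced a fracture."""
    return fractures / (fractures + no_fractures)


# Marginal rate: pool both groups together
overall = fracture_rate(sum(f for f, _ in groups.values()),
                        sum(n for _, n in groups.values()))

for name, (f, n) in groups.items():
    print(f"{name}: {fracture_rate(f, n):.1%} fractured (n = {f + n})")
print(f"All subjects: {overall:.1%}")
```

The SSRI group's fracture rate (10.2%) is roughly double that of the No SSRI group (5.0%).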

35 Make and examine displays of the conditional distributions.
Outcome | Taking SSRI | No SSRI | Total
Experienced Fractures | 10.2% | 5.0% | 5.2%
No Fractures | 89.8% | 95.0% | 94.8%
Total | 100% | 100% | 100%

36 Make and examine displays of the conditional distributions. Segmented bar graph.

37 Conditional Distributions: Checking for Independence Between Variables
Hearing anecdotal reports that some patients undergoing treatment for the eating disorder anorexia seemed to be responding positively to the antidepressant Prozac, medical researchers conducted an experiment to investigate. They found 93 women being treated for anorexia who volunteered to participate. For one year, 49 randomly selected patients were treated with Prozac and the other 44 were given a placebo. At the end of the year, patients were diagnosed as healthy or relapsed, as summarized in the table:
Diagnosis by treatment:
Diagnosis | Prozac | Placebo | Total
Healthy | 35 | 32 | 67
Relapse | 14 | 12 | 26
Total | 49 | 44 | 93
Do these results provide evidence that Prozac might be helpful in treating anorexia? Explain.

38 Conditional Distributions: Checking for Independence Between Variables
Let’s look at the conditional distribution of diagnoses within each treatment group (the conditional distribution of diagnosis, conditioned on taking Prozac or a placebo).
Diagnosis | Prozac | Placebo | Total
Healthy | 35 (71.4%) | 32 (72.7%) | 67 (72%)
Relapse | 14 (28.6%) | 12 (27.3%) | 26 (28%)
Total | 49 (100%) | 44 (100%) | 93 (100%)

39 Make and examine displays of the conditional distributions.
Diagnosis | Prozac | Placebo | Total
Healthy | 71.4% | 72.7% | 72%
Relapse | 28.6% | 27.3% | 28%
Total | 100% | 100% | 100%

40

41 Conditional Distributions: Checking for Independence Between Variables
Let’s look at the conditional distribution of treatments for each diagnosis (the conditional distribution of treatment, conditioned on being healthy or experiencing a relapse).
Diagnosis | Prozac | Placebo | Total
Healthy | 35 (52.2%) | 32 (47.8%) | 67 (100%)
Relapse | 14 (53.8%) | 12 (46.2%) | 26 (100%)
Total | 49 (52.7%) | 44 (47.3%) | 93 (100%)
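Slides 38 and 41 condition on different variables, so the same cell (35 healthy Prozac patients) yields two different percentages. A Python sketch that makes the denominator explicit; the `share` helper and its `given` argument are hypothetical names chosen for illustration:

```python
counts = {
    ("Healthy", "Prozac"): 35, ("Healthy", "Placebo"): 32,
    ("Relapse", "Prozac"): 14, ("Relapse", "Placebo"): 12,
}
diagnoses = ["Healthy", "Relapse"]
treatments = ["Prozac", "Placebo"]


def share(diagnosis, treatment, given):
    """Percent of one cell relative to the total of the conditioning category."""
    if given == "treatment":
        # Denominator: everyone who received this treatment (column total)
        denom = sum(counts[(d, treatment)] for d in diagnoses)
    else:
        # Denominator: everyone with this diagnosis (row total)
        denom = sum(counts[(diagnosis, t)] for t in treatments)
    return 100 * counts[(diagnosis, treatment)] / denom


print(f"{share('Healthy', 'Prozac', 'treatment'):.1f}% of Prozac patients were healthy")
print(f"{share('Healthy', 'Prozac', 'diagnosis'):.1f}% of healthy patients took Prozac")
```

Conditioning on treatment is what the question on slide 37 calls for: 71.4% of Prozac patients and 72.7% of placebo patients were healthy, and rates that close suggest these data give little evidence that Prozac helps.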

42 Bell Work 2/19
In 2000, the Journal of the American Medical Association (JAMA) published a study that examined pregnancies that resulted in the birth of twins. Births were classified as preterm with intervention (induced labor or cesarean), preterm without procedures, or term/post-term. Researchers also classified the pregnancies by the level of prenatal medical care the mother received (inadequate, adequate, or intensive). The data, from the years 1995–1997, are summarized in the table below. Figures are in thousands of births.
a) What percent of these mothers received inadequate medical care during their pregnancies? 63/278 × 100% ≈ 22.7%
b) What percent of all twin births were preterm? (76 + 71)/278 × 100% ≈ 52.9%
c) Among the mothers who received inadequate medical care, what percent of the twin births were preterm? (12 + 13)/63 × 100% ≈ 39.7%

43 Summarizing Categorical Data: Contingency Tables
Objectives: Make and examine a contingency table. Make and examine displays of the conditional distributions. Compare conditional and marginal percentages. Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.

44 Conditional Distributions: Checking for Independence Between Variables
Medical researchers followed 6272 Swedish men for 30 years to see if there was any association between the amount of fish in their diet and prostate cancer. Their results are summarized in the table:
Fish Consumption | Prostate Cancer: No | Prostate Cancer: Yes
Never/Seldom | 110 | 14
Small Part of Diet | 2420 | 201
Moderate Part | 2769 | 209
Large Part | 507 | 42
Is there an association between fish consumption and prostate cancer?

45 Conditional Distributions: Checking for Independence Between Variables
Fish Consumption | Prostate Cancer: No | Prostate Cancer: Yes | Total
Never/Seldom | 110 | 14 | 124
Small Part of Diet | 2420 | 201 | 2621
Moderate Part | 2769 | 209 | 2978
Large Part | 507 | 42 | 549
Total | 5806 | 466 | 6272
Let’s look at the conditional distribution of fish consumption for each category of prostate cancer:
Fish Consumption | Conditioned on No Cancer | Conditioned on Cancer | Marginal Distribution of Fish Consumption
Never/Seldom | 1.89% | 3.0% | 1.98%
Small Part of Diet | 41.7% | 43.1% | 41.8%
Moderate Part | 47.7% | 44.8% | 47.5%
Large Part | 8.73% | 9.0% | 8.8%
Total | 100% | 100% | 100%

46 Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.
Segmented bar charts of the conditional distributions of fish consumption for those diagnosed with prostate cancer and those who were cancer free:
Fish Consumption | No Cancer | Cancer | Total
Never/Seldom | 1.89% | 3.00% | 1.98%
Small Part | 41.68% | 43.13% | 41.79%
Moderate Part | 47.69% | 44.85% | 47.48%
Large Part | 8.73% | 9.01% | 8.75%
Total | 100% | 100% | 100%

47 Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable. Segmented bar charts of the conditional distributions of fish consumption for those diagnosed with prostate cancer and those who were cancer free.

48 Conditional Distributions: Checking for Independence Between Variables
Fish Consumption | Prostate Cancer: No | Prostate Cancer: Yes | Total
Never/Seldom | 110 | 14 | 124
Small Part of Diet | 2420 | 201 | 2621
Moderate Part | 2769 | 209 | 2978
Large Part | 507 | 42 | 549
Total | 5806 | 466 | 6272
Let’s look at the distribution of prostate cancer within each category of fish consumption (the conditional distribution of prostate cancer, conditioned on each category of fish consumption):
Fish Consumption | No Cancer | Cancer
Never/Seldom | 88.7% | 11.3%
Small Part of Diet | 92.3% | 7.7%
Moderate Part | 93.0% | 7.02%
Large Part | 92.3% | 7.7%
Total (marginal distribution of prostate cancer) | 92.6% | 7.43%
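A Python sketch of this conditioning, assuming each level's counts are stored as a (no cancer, cancer) pair; it also reports how far each level's cancer rate sits from the marginal rate. Variable names are illustrative:

```python
# (no cancer, cancer) counts for each level of fish consumption
fish = {
    "Never/Seldom": (110, 14),
    "Small Part of Diet": (2420, 201),
    "Moderate Part": (2769, 209),
    "Large Part": (507, 42),
}

# Conditional distribution: percent with cancer within each consumption level
rates = {level: 100 * yes / (no + yes) for level, (no, yes) in fish.items()}

# Marginal rate of prostate cancer across all 6272 men
no_total = sum(no for no, _ in fish.values())
yes_total = sum(yes for _, yes in fish.values())
marginal = 100 * yes_total / (no_total + yes_total)

for level, rate in rates.items():
    print(f"{level:<20}{rate:6.2f}% cancer ({rate - marginal:+.2f} points vs. overall)")
print(f"{'Overall':<20}{marginal:6.2f}%")
```

Only Never/Seldom (11.29%) differs noticeably from the overall 7.43%, and it is also by far the smallest group (124 men).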

49 Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.
Bar chart of the conditional distribution of prostate cancer, conditioned on each category of fish consumption:
Fish Consumption | No Cancer | Cancer | Total
Never/Seldom | 88.71% | 11.29% | 100%
Small Part | 92.33% | 7.67% | 100%
Moderate Part | 92.98% | 7.02% | 100%
Large Part | 92.35% | 7.65% | 100%
Total | 92.57% | 7.43% | 100%

50 Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable. Bar Chart of the Conditional Distribution of Prostate Cancer conditioned on each category of Fish Consumption:

51 Bell Work 2/20
Is your birth order related to your choice of major? A statistics professor at a large university polled his students to find out what their majors were and what position they held in the family birth order. The results are summarized in the table.
a) What percent of these students are oldest or only children? 113/223 × 100% ≈ 50.7%
b) What percent of Humanities majors are oldest children? 15/43 × 100% ≈ 34.9%
c) What percent of oldest children are Humanities students? 15/113 × 100% ≈ 13.3%
d) What percent of the students are oldest children majoring in the Humanities? 15/223 × 100% ≈ 6.7%

52 Summarizing Categorical Data: Contingency Tables
Objectives: Make and examine a contingency table. Make and examine displays of the conditional distributions. Compare conditional and marginal percentages. Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.

53 Conditional Distributions: Checking for Independence
Compare conditional and marginal percentages.
Survival | Class 1 | Class 2 | Class 3 | Class C | Total
Alive | 203 | 118 | 178 | 212 | 711
Dead | 122 | 167 | 528 | 673 | 1490
Total | 325 | 285 | 706 | 885 | 2201

54 Conditional Distributions: Checking for Independence
Compare conditional and marginal percentages.
Distribution of ticket class | Class 1 | Class 2 | Class 3 | Class C | Total
Alive (conditioned on surviving) | 28.6% | 16.6% | 25.0% | 29.8% | 100%
Dead (conditioned on not surviving) | 8.2% | 11.2% | 35.4% | 45.2% | 100%
Total (marginal distribution of ticket class) | 14.8% | 12.9% | 32.1% | 40.2% | 100%
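If survival were independent of ticket class, the Alive and Dead rows would both match the marginal row. A Python sketch that computes all three rows from the counts on slide 53; `distribution` is a hypothetical helper name:

```python
classes = ["1", "2", "3", "C"]
alive = [203, 118, 178, 212]
dead = [122, 167, 528, 673]
total = [a + d for a, d in zip(alive, dead)]


def distribution(counts):
    """Percent of the group falling in each ticket class."""
    n = sum(counts)
    return [100 * c / n for c in counts]


rows = {
    "Alive": distribution(alive),             # conditioned on surviving
    "Dead": distribution(dead),               # conditioned on not surviving
    "Total (marginal)": distribution(total),  # all 2201 people
}

print(" " * 17 + "".join(f"{c:>8}" for c in classes))
for label, pcts in rows.items():
    print(f"{label:<17}" + "".join(f"{p:7.1f}%" for p in pcts))
```

Class C and Class 3 make up a much larger share of the non-survivors (45.2% and 35.4%) than of the survivors (29.8% and 25.0%), while Class 1 shows the reverse pattern.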

55 Pie charts of the conditional distributions of ticket class for the survivors and non-survivors: Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.

56 Segmented bar charts of the conditional distributions of ticket class for the survivors and non-survivors: Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.
Survival | Class 1 | Class 2 | Class 3 | Class C | Total
Alive | 28.6% | 16.6% | 25.0% | 29.8% | 100%
Dead | 8.2% | 11.2% | 35.4% | 45.2% | 100%
Total | 14.8% | 12.9% | 32.1% | 40.2% | 100%

57 Segmented bar charts of the conditional distributions of ticket class for the survivors and non-survivors

58 Conditional Distributions: Checking for Independence
Conditional distribution of survival for each ticket class, and the marginal distribution of survival:
Survival | Class 1 | Class 2 | Class 3 | Class C | Total (marginal)
Alive | 62.5% | 41.4% | 25.2% | 24.0% | 32.3%
Dead | 37.5% | 58.6% | 74.8% | 76.0% | 67.7%
Total | 100% | 100% | 100% | 100% | 100%

59 Make and examine displays of the conditional distributions. Segmented bar graph. Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.

60

61 Conditional Distributions: Checking for Independence
In 2000, the Journal of the American Medical Association (JAMA) published a study that examined pregnancies that resulted in the birth of twins. Births were classified as preterm with intervention (induced labor or cesarean), preterm without procedures, or term/post-term. Researchers also classified the pregnancies by the level of prenatal medical care the mother received (inadequate, adequate, or intensive). The data, from the years 1995–1997, are summarized in the table below. Figures are in thousands of births.
Level of Care | Preterm (induced or cesarean) | Preterm (without procedures) | Term or post-term | Total
Intensive | 18 | 15 | 28 | 61
Adequate | 46 | 43 | 65 | 154
Inadequate | 12 | 13 | 38 | 63
Total | 76 | 71 | 131 | 278
a) Find the conditional distribution of the outcome of these pregnancies, conditioned on level of care. Create an appropriate graph of the conditional distributions.
b) Write a few sentences describing the association between these two variables.

62 Conditional Distributions: Checking for Independence
Find the conditional distribution of the outcome of these pregnancies, conditioned on level of care. Create an appropriate graph of the conditional distributions.
Level of Care | Preterm (induced or cesarean) | Preterm (without procedures) | Term or post-term
Intensive (conditioned on intensive care) | 29.5% | 24.6% | 45.9%
Adequate (conditioned on adequate care) | 29.9% | 27.9% | 42.2%
Inadequate (conditioned on inadequate care) | 19.05% | 20.6% | 60.3%
Total (marginal distribution of birth outcomes) | 27.3% | 25.5% | 47.1%
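A Python sketch of part a), assuming each level of care's counts (in thousands of births) are listed in the column order of the table; `percents` is a hypothetical helper name:

```python
outcomes = ["Preterm (induced/cesarean)", "Preterm (no procedures)", "Term/post-term"]
care = {  # thousands of twin births, one list per level of care
    "Intensive": [18, 15, 28],
    "Adequate": [46, 43, 65],
    "Inadequate": [12, 13, 38],
}
totals = [sum(col) for col in zip(*care.values())]  # column totals


def percents(row):
    """Convert one row of counts to percentages of that row's total."""
    n = sum(row)
    return [100 * x / n for x in row]


conditional = {level: percents(row) for level, row in care.items()}
marginal = percents(totals)

for level, pcts in conditional.items():
    print(level, [f"{p:.1f}%" for p in pcts])
print("Marginal", [f"{p:.1f}%" for p in marginal])
```

The inadequate-care row stands out: 60.3% of those twin births were term or post-term, compared with 47.1% overall.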

63 Make and examine displays of the conditional distributions. Segmented bar graph. Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.
Level of Care | Preterm (Induced or Cesarean) | Preterm (Without Procedures) | Term or post-term
Intensive | 29.51% | 24.59% | 45.90%
Adequate | 29.87% | 27.92% | 42.21%
Inadequate | 19.05% | 20.63% | 60.32%
Total | 27.34% | 25.54% | 47.12%

64

65 Conditional Distributions: Checking for Independence
A company held a blood pressure screening clinic for its employees. The results are summarized in the table below by age group and blood pressure level:
Blood Pressure | Under 30 | 30-49 | Over 50 | Total
Low | 27 | 37 | 31 | 95
Normal | 48 | 91 | 93 | 232
High | 23 | 51 | 73 | 147
Total | 98 | 179 | 197 | 474

66 Conditional Distributions: Checking for Independence
Find the conditional distribution of blood pressure level within each age group.
Blood Pressure | Under 30 | 30-49 | Over 50
Low | 27.6% | 20.7% | 15.7%
Normal | 49.0% | 50.8% | 47.2%
High | 23.5% | 28.5% | 37.1%
Total | 100% | 100% | 100%
Share of employees in each age group (marginal distribution of age): Under 30 20.7%, 30-49 37.8%, Over 50 41.6%.
Find the marginal distribution of blood pressure levels: Low 95/474 ≈ 20.0%, Normal 232/474 ≈ 48.9%, High 147/474 ≈ 31.0%.
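Here the conditioning variable (age) runs along the columns, so each count is divided by its column total; the marginal distribution of blood pressure instead divides each row total by the grand total of 474. A Python sketch, with variable names chosen for illustration:

```python
ages = ["Under 30", "30-49", "Over 50"]
bp = {  # counts by blood pressure level, one entry per age group
    "Low": [27, 37, 31],
    "Normal": [48, 91, 93],
    "High": [23, 51, 73],
}
age_totals = [sum(col) for col in zip(*bp.values())]  # column totals
grand_total = sum(age_totals)

# Conditional distribution of blood pressure within each age group (column %)
within_age = {
    level: [100 * n / t for n, t in zip(counts, age_totals)]
    for level, counts in bp.items()
}
# Marginal distribution of blood pressure (row total / grand total)
marginal_bp = {level: 100 * sum(counts) / grand_total for level, counts in bp.items()}

print(" " * 7 + "".join(f"{a:>10}" for a in ages))
for level, pcts in within_age.items():
    print(f"{level:<7}" + "".join(f"{p:9.1f}%" for p in pcts))
print("Marginal:", ", ".join(f"{level} {p:.1f}%" for level, p in marginal_bp.items()))
```

The share with high blood pressure rises from 23.5% (under 30) to 28.5% (30-49) to 37.1% (over 50).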

67 Make and examine displays of the conditional distributions. Segmented bar graph of the conditional distribution of blood pressure levels within each age group. Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.
Blood Pressure | Under 30 | 30-49 | Over 50
Low | 27.55% | 20.67% | 15.74%
Normal | 48.98% | 50.84% | 47.21%
High | 23.47% | 28.49% | 37.06%
Total | 100% | 100% | 100%

68 Make and examine displays of the conditional distributions. Segmented bar graph of the conditional distribution of blood pressure levels within each age group. Describe any patterns, anomalies, or extraordinary features revealed by the display of a variable.

69 Conditional Distributions: Checking for Independence
To determine whether people’s preferences in dogs had changed in recent years, organizers of a local dog show asked people who attended the show to indicate which breed was their favorite. This information was compiled by dog breed and by the gender of the respondents. The table summarizes the responses.

70 Conditional Distributions: Checking for Independence
Favorite Breed | Female | Male | Total
Yorkshire Terrier | 73 | 59 | 132
Dachshund | 49 | 47 | 96
Golden Retriever | 58 | 33 | 91
Labrador | 37 | 41 | 78
Dalmatian | 45 | 28 | 73
Other Breeds | 86 | 67 | 153
Total | 348 | 275 | 623
Favorite Breed | Conditional distribution among women | Conditional distribution among men | Marginal distribution
Yorkshire Terrier | 20.98% | 21.5% | 21.2%
Dachshund | 14.1% | 17.1% | 15.4%
Golden Retriever | 16.7% | 12.0% | 14.6%
Labrador | 10.6% | 14.9% | 12.5%
Dalmatian | 12.9% | 10.2% | 11.7%
Other Breeds | 24.7% | 24.4% | 24.6%
Total | 100% | 100% | 100%

71 Conditional Distributions: Checking for Independence
Do you think the breed selection is independent of gender? Give statistical evidence to support your conclusion.
Conditional distribution of gender for each category of preferred dog breed:
Favorite Breed | Female | Male
Yorkshire Terrier | 55.3% | 44.7%
Dachshund | 51.0% | 49.0%
Golden Retriever | 63.7% | 36.3%
Labrador | 47.4% | 52.6%
Dalmatian | 61.6% | 38.4%
Other Breeds | 56.2% | 43.8%
Total (marginal distribution of gender) | 55.9% | 44.1%
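If breed preference were independent of gender, each breed's female share would be close to the overall female share. A Python sketch that computes those shares from the slide 70 counts (variable names are illustrative):

```python
breeds = {  # (female, male) counts for each favorite breed
    "Yorkshire Terrier": (73, 59),
    "Dachshund": (49, 47),
    "Golden Retriever": (58, 33),
    "Labrador": (37, 41),
    "Dalmatian": (45, 28),
    "Other Breeds": (86, 67),
}
female_total = sum(f for f, _ in breeds.values())
male_total = sum(m for _, m in breeds.values())
# Marginal distribution of gender among all respondents
marginal_female = 100 * female_total / (female_total + male_total)

# Conditional distribution of gender within each breed
female_share = {breed: 100 * f / (f + m) for breed, (f, m) in breeds.items()}
for breed, pct in female_share.items():
    print(f"{breed:<18} {pct:5.1f}% female  {100 - pct:5.1f}% male")
print(f"{'All respondents':<18} {marginal_female:5.1f}% female")
```

Golden Retriever (63.7% female) and Dalmatian (61.6%) sit above the overall 55.9%, while Labrador (47.4%) and Dachshund (51.0%) sit below it.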

