Published by Gustav Gregersen. Modified over 6 years ago.
Lecture 1. Introduction
Outline for Today
1. Types of Variables
2. Categorical Data
3. SA3202 Data Sets
4. Categorical Data Analysis
1/18/2019 SA3202, Lecture 1
Types of Variables:
a). Binary Variable: takes only two possible values
    e.g. Sex
    e.g. Marital Status
b). Nominal Variable: takes several unordered values
    e.g. Nationality
    e.g. Race
c). Ordinal Variable: takes several ordered values
    e.g. Grade
    e.g. Social class
    e.g. Political view
d). Discrete Variable: takes a countable number of possible values
    e.g. # of students in a class
    e.g. # of fish in a pond
e). Continuous Variable: takes any value in an interval
    e.g. Height
    e.g. Income
Categorical Variable: takes a finite number of possible values; this covers types a)-c), and possibly d). The possible values of a categorical variable are referred to as its categories or levels.
    e.g. Type of blood
    e.g. Grade of a class
Categorical Data: data collected on one or several categorical variables.
Other terms: count data, frequency data, discrete data, qualitative data, cross-classified data.
Contingency Table: a table presenting the categorical data
Cells: correspond to the different combinations of the categories (levels)
Entry in a cell: the frequency of the cell
Dimension: the number of categorical variables
    e.g. One-way Table
    e.g. Two-way Table
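The definitions of cells, entries and dimension can be illustrated with a small sketch. The observations below are hypothetical, invented only for this example, and the variable names are assumptions rather than anything from the course material. A one-way table counts the levels of one variable; a two-way table counts each combination of levels of two variables.

```python
from collections import Counter

# Hypothetical raw observations: (sex, smokes) pairs, invented for illustration.
observations = [
    ("F", "yes"), ("F", "no"), ("M", "no"), ("M", "yes"),
    ("F", "no"), ("M", "no"), ("F", "no"), ("M", "yes"),
]

# One-way table: the frequency of each level of a single variable.
one_way = Counter(sex for sex, _ in observations)

# Two-way table: one cell per combination of levels; the entry is its frequency.
two_way = Counter(observations)

# Print the two-way table with one row per level of the first variable.
for sex in sorted(one_way):
    row = {smokes: two_way[(sex, smokes)] for smokes in ("yes", "no")}
    print(sex, row, "row total:", one_way[sex])
```

A `Counter` returns 0 for a combination that never occurred, which matches the convention that an empty cell has frequency zero.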
SA3202 Data Sets
1. Random Number Data
The following table shows the frequency of each digit when 100 “random digits” were generated on a pocket calculator:
Digit: [one column per digit] | Total
Frequency: [counts missing]
2. Suicide Data
The following table shows the classification of suicides in France by day of the week. Based on these data, Durkheim (1897) concludes that suicide diminishes at the end of the week, beginning on Friday. He also notes that the suicide rate is not lower on Sunday than on Saturday.
Day: Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday | Total
# of Suicides: [counts missing]
3. Homicide Data
The following table shows the monthly distribution of homicides in the USA in 1970.
Month: Jan. | Feb. | Mar. | Apr. | May | Jun. | Jul. | Aug. | Sept. | Oct. | Nov. | Dec. | Total
# of Homicides: [counts missing]
4. Political Views Data
The following table shows the political views of 1397 Americans in 1975.
Columns: Response | Code | Frequency
Rows: Extremely Liberal, Liberal, Slightly Liberal, Moderate, Slightly Conservative, Conservative, Extremely Conservative, Total
[codes and frequencies missing]
5. Number of Boys Data
The following table shows the number of boys among the first 4 children in 3343 Swedish families of size 4 or more.
Number of boys: [one column per value] | Total
Frequency: [counts missing]
6. World Cup Data
The following table shows the number of goals scored per team per game, for the 32 matches played in the 1996 Football World Cup.
Number of Goals: [one column per value] | Total
Frequency: [counts missing]
7. Vitamin C Data
The following table is based on a 1961 French study regarding the therapeutic value of ascorbic acid (Vitamin C). The study was double blind, with one group of 140 skiers receiving a placebo while a second group of 139 received 1 gram of ascorbic acid per day. Of interest is the relative occurrence of colds in the two groups.
Columns: Placebo | Vitamin C
Rows: Cold, Not Cold, Total
[counts missing]
8. Seat Belt Data
The following table is based on the records of accidents in Florida, USA, in 1988.
Columns (Safety Equipment): Seat Belt | None | Total
Rows (Injury): Fatal, Nonfatal, Total
[counts missing]
9. Death Penalty Data
The following table is based on a study concerning the effects of racial characteristics on whether individuals convicted of homicide receive the death penalty. It shows the defendant’s race (white, black) and the verdict (death penalty, no death penalty) in 326 cases of homicide in Florida, USA, during [period missing].
Columns (Defendant’s Race): White | Black | Total
Rows (Death Penalty): Yes, No, Total
[counts missing]
10. University Admission Data
The following table shows admission results for the six largest graduate departments at the University of California at Berkeley, for the fall 1973 session.
Columns (Applicant’s Gender): Male | Female
Rows (Whether Admitted): Yes, No, Total
[counts missing]
11. Smoking and Lung Cancer Data
The following table is based on a retrospective study of lung cancer and tobacco smoking among patients in hospitals in several English cities. The table compares male lung cancer patients with control patients having other diseases, according to the average number of cigarettes smoked daily over a ten-year period preceding the onset of the disease.
Columns: Daily Average # of Cigarettes | Lung Cancer Patients | Control Patients
Rows: None, then the "<" category and the remaining cigarette ranges [ranges and counts missing]
12. Smoking Habit Data
The following table is from a study concerning the smoking habits of high school students in Arizona, USA.
Columns: Student Smokes | Student Does Not Smoke
Rows: Both parents smoke, One parent smokes, Neither parent smokes
[counts missing]
13. Income and Job Satisfaction Data
The following table is taken from the 1984 General Social Survey of the National Data Program in the US. The variables are income and job satisfaction. Income has four levels: <$6000, between $6000 and $15000, between $15000 and $25000, and over $25000. Job satisfaction has four levels: very dissatisfied (VD), little dissatisfied (LD), moderately satisfied (MS), and very satisfied (VS):
Columns (Job Satisfaction): VD | LD | MS | VS
Rows (Income): the four income levels above
[counts missing]
14. British Social Mobility Data
The following table relates father’s and son’s occupational status for a sample of 3500 British father-son pairs.
Columns: Son’s Status
Rows: Father’s Status
[status levels and counts missing]
15. Danish Social Mobility Data
The following table presents data on intergenerational mobility in Denmark, similar to the British Social Mobility Data:
Columns: Son’s Status
Rows: Father’s Status
[status levels and counts missing]
16. The “Complete” Death Penalty Data
The following table gives the victim’s race (black, white), as well as the defendant’s race and the verdict, for the Death Penalty Data presented earlier.
Columns: Defendant’s Race | Victim’s Race | Death Penalty | Not Death Penalty
Rows: White/White, White/Black, Black/White, Black/Black
[counts missing]
17. The “Complete” University Admission Data
The following table gives the admission decisions for each of the six largest graduate departments for the University Admission Data.
Columns: Department | Male (Whether Admitted): Yes, No | Female (Whether Admitted): Yes, No
Rows: A, B, C, D, E, F, Total
[counts missing]
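The “complete” data sets add a third variable to a two-way table presented earlier, so summing a three-way table over the extra variable should recover the two-way table. The sketch below illustrates that collapsing step. The counts, the generic variable names A, B, C and the helper `collapse` are all hypothetical, invented for illustration; they are not the course data.

```python
from collections import Counter

# Hypothetical three-way table keyed by (level of A, level of B, level of C).
# The counts are invented for illustration only.
three_way = {
    ("a1", "b1", "c1"): 5, ("a1", "b1", "c2"): 20,
    ("a1", "b2", "c1"): 1, ("a1", "b2", "c2"): 4,
    ("a2", "b1", "c1"): 3, ("a2", "b1", "c2"): 7,
    ("a2", "b2", "c1"): 2, ("a2", "b2", "c2"): 18,
}

def collapse(table, keep):
    """Sum a multi-way table over every variable whose position is not in `keep`."""
    marginal = Counter()
    for levels, count in table.items():
        marginal[tuple(levels[i] for i in keep)] += count
    return dict(marginal)

# Two-way table of A by C, obtained by summing over the levels of B.
two_way = collapse(three_way, keep=(0, 2))
print(two_way)
```

Collapsing never changes the grand total, which gives a quick consistency check between a “complete” table and its two-way summary.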
Categorical Data Analysis: the analysis of categorical data, which usually means fitting a statistical model to the data: first postulate a model for the underlying population by formulating a statistical hypothesis, and then test whether or not the model fits the data. This is also known as hypothesis testing for contingency tables: compare the observed frequencies with their “expected frequencies” (the frequencies expected under the model) to see how close they are.
Method 1: Pearson’s Goodness of Fit Test
The test statistic is
$$X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i},$$
where $O_i$ is the observed frequency and $E_i$ the expected frequency of cell $i$.
Degrees of freedom: (number of cells - 1) - (number of free parameters estimated under the model).
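As a rough illustration of Pearson's test, the sketch below sums, over the cells, the squared difference between observed and expected frequency divided by the expected frequency. The counts are hypothetical and the model (all categories equally likely) is an assumption chosen for simplicity; the function name is likewise an invention for this example.

```python
def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum over cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical one-way table with 4 categories, invented for illustration.
observed = [18, 22, 30, 30]
n = sum(observed)

# Assumed model: all categories equally likely, so every expected frequency is n / k.
k = len(observed)
expected = [n / k] * k

x2 = pearson_chi_square(observed, expected)

# No parameters are estimated under this fully specified model,
# so the degrees of freedom are (number of cells - 1).
df = k - 1
print(f"X^2 = {x2:.2f} on {df} degrees of freedom")
```

A large value of the statistic relative to the chi-square distribution with `df` degrees of freedom would suggest that the model does not fit the data.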
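Method 2: Wilks’ Likelihood Ratio Test
The test statistic is
$$G^2 = 2 \sum_i O_i \log\frac{O_i}{E_i},$$
where $O_i$ is the observed frequency and $E_i$ the expected frequency of cell $i$.
Degrees of freedom: (number of cells - 1) - (number of free parameters estimated under the model).
Method 1 and Method 2 are asymptotically equivalent.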
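The asymptotic equivalence of the two methods can be seen informally by computing both statistics on the same table. The sketch below uses hypothetical counts and an assumed equal-probability model; the function names are inventions for this example. The likelihood ratio statistic is twice the sum of each observed frequency times the log of its ratio to the expected frequency.

```python
import math

def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def likelihood_ratio_g2(observed, expected):
    """Likelihood ratio statistic: 2 * sum of O * log(O / E).

    Cells with O == 0 contribute nothing, by the convention 0 * log(0) = 0.
    """
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

# Hypothetical one-way table with 4 categories, invented for illustration.
observed = [18, 22, 30, 30]
n = sum(observed)

# Assumed model: all categories equally likely.
expected = [n / len(observed)] * len(observed)

x2 = pearson_chi_square(observed, expected)
g2 = likelihood_ratio_g2(observed, expected)
print(f"Pearson X^2 = {x2:.3f}, likelihood ratio G^2 = {g2:.3f}")
```

With these counts the two values are close but not identical; the agreement tends to improve as the expected frequencies grow, which is what the asymptotic equivalence describes.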