
1 (9/23/2015) Published reports of research usually contain a section that describes key characteristics of the sample included in the study. The “key” characteristics included in the report depend on the context of the research, but often include demographics such as sex, age, race, and education. When a characteristic is measured by a qualitative (categorical) variable, we summarize the data with frequency distributions and contingency tables.

2 The frequency distribution provides a profile of a single categorical variable: the number and percent of cases that fall in each category. In SPSS, a frequency distribution table is laid out as follows. The categories of the variable are listed in the first column; I prefer to list both code numbers and value labels. The counts are listed in the second column (Frequency). Counts communicate less information than percentages, because we don’t know how to translate a raw count to the population from which our sample was drawn. The third and fourth columns both show percentages. The third column (Percent) includes the cases with missing data in the total. The fourth column (Valid Percent) excludes missing cases from the total. We generally report the percentage from the fourth column.
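A minimal Python sketch of how the two percent columns differ, using the counts that appear later in this walkthrough (146 HIGH SCHOOL respondents, 267 valid cases, 3 missing). The function names `percent` and `valid_percent` are our own labels, not SPSS commands.

```python
def percent(count: int, n_valid: int, n_missing: int) -> float:
    """Percent column: missing cases are included in the denominator."""
    return round(100 * count / (n_valid + n_missing), 1)


def valid_percent(count: int, n_valid: int) -> float:
    """Valid Percent column: missing cases are excluded from the denominator."""
    return round(100 * count / n_valid, 1)


# HIGH SCHOOL: 146 respondents, 267 valid cases, 3 missing cases.
print(percent(146, 267, 3))     # 54.1
print(valid_percent(146, 267))  # 54.7
```

Rounding to one decimal place mirrors how the SPSS table displays both columns; the gap between 54.1 and 54.7 comes entirely from the three missing cases in the denominator.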

3 The percentage in a category, divided by 100, is often reported as the proportion of cases in that category. That proportion is also the probability that an individual subject fell in that category. Based on the counts or percentages, we can describe one category as the most likely (the modal category). Comparing the counts or percentages across categories, we can describe one category as more probable, or more likely, than another.

4 Based on the counts or percentages across categories, we can also describe the odds of falling in one category rather than another, e.g. subjects were about 1.4 times more likely to be religiously moderate than fundamentalist. The odds are computed by dividing the number in one category (e.g. 101) by the number in the second category (e.g. 71). The odds of being in the first category rather than the second are 1.4 to 1 (101 ÷ 71 ≈ 1.42).
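The odds calculation on this slide is a single division. A small Python sketch, using the 101 moderate and 71 fundamentalist counts from the example; the helper name `odds` is our own.

```python
def odds(n_first: int, n_second: int) -> float:
    """Odds of being in the first category rather than the second: n_first / n_second."""
    if n_second == 0:
        raise ValueError("the second category has no cases, so the odds are undefined")
    return n_first / n_second


# Counts from the example: 101 religiously moderate, 71 fundamentalist.
print(round(odds(101, 71), 2))  # 1.42
print(round(odds(71, 101), 2))  # 0.7
```

Swapping the arguments gives the odds in the opposite direction, which is the reciprocal of the first result.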

5 Odds are generally expressed as some number to 1, and the “to 1” is not stated. The odds of being in the second category rather than the first are 0.70 to 1 (71 ÷ 101 ≈ 0.70). Note that the odds in opposite directions are reciprocals, i.e. 1 ÷ 1.42 ≈ 0.70 and 1 ÷ 0.70 ≈ 1.4. If the odds are greater than 1, we characterize the result as “more likely.” If the odds are less than 1, we characterize it as “less likely.” If the odds are very close to 1, we characterize it as “equally likely.”

6 It is easier to communicate odds that are “more likely” rather than “less likely,” so when the result is less than 1 we should reverse the direction of the comparison: swap the two categories and take the reciprocal. While odds may seem out of place outside of betting on horse races, they are a construct of increasing importance in social and behavioral research because they give us a strategy for predicting which category a subject is likely to belong to, using techniques like logistic regression (which we cover in spring).
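The “flip when less than 1” rule can be sketched in Python as below. Two assumptions are ours: we borrow the 0.95 through 1.05 band that the homework problems use for “about equally likely” (slide 37), and we print the odds to one decimal place. The function name `describe_direction` is hypothetical.

```python
def describe_direction(first: str, n_first: int, second: str, n_second: int) -> str:
    """Report odds in the 'more likely' direction, reversing the comparison when odds < 1."""
    odds = n_first / n_second
    if 0.95 <= odds <= 1.05:
        return f"{first} and {second} are about equally likely"
    if odds < 1:
        # Swap the categories; the reversed odds are the reciprocal.
        first, second, odds = second, first, 1 / odds
    return f"{first} is {odds:.1f} times more likely than {second}"


print(describe_direction("fundamentalist", 71, "moderate", 101))
# moderate is 1.4 times more likely than fundamentalist
```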

7 The introductory statement in the question indicates the data set to use (GSS200R), the statistic to use (frequency distribution), and the variable to analyze (degree).

8 The first statement for us to evaluate concerns the number of valid and missing cases. To answer this question, we produce the frequency distribution in SPSS.

9 To compute a frequency distribution in SPSS, select the Descriptive Statistics > Frequencies command from the Analyze menu.

10 First, in the Frequencies dialog box, scroll down the list of variables and click on degree. Second, click on the arrow button to move degree to the list box for variables. Until a variable has been moved to that list box, the OK button is inactive, because we have not yet provided enough information to compute the statistic.

11 Though we don’t need it to solve the problem, we will request a bar chart to display the variable. We can use the bar chart to confirm the statistical facts that we include in our description. For example, if we describe one category as the most likely, it should have the tallest bar in the chart. If we say that one category is twice as likely as another, its bar should look about twice as tall as the other’s.

12 First, click on the option button for Bar charts. Second, click on the Continue button to close the dialog box. We have a choice of plotting frequency counts or percentages. While this choice affects the numeric values shown on the chart, the relative sizes of the bars are exactly the same in either option.
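To see why counts and percentages produce the same bars, here is a rough text-chart sketch in Python. It uses only the two categories whose counts and valid percents this walkthrough reports (HIGH SCHOOL 146 / 54.7%, JUNIOR COLLEGE 19 / 7.1%); the function `text_bars` and the 40-character width are our own choices. Because percentages are the counts divided by a constant, scaling the tallest bar to a fixed width gives identical bars (rounded percents could, in principle, shift a bar by one character).

```python
def text_bars(values: dict[str, float], width: int = 40) -> dict[str, str]:
    """Draw horizontal text bars scaled so the largest value spans `width` characters."""
    largest = max(values.values())
    return {label: "#" * round(width * v / largest) for label, v in values.items()}


counts = {"HIGH SCHOOL": 146, "JUNIOR COLLEGE": 19}
percents = {"HIGH SCHOOL": 54.7, "JUNIOR COLLEGE": 7.1}

for label, bar in text_bars(counts).items():
    print(f"{label:15} {bar}")
print(text_bars(counts) == text_bars(percents))  # True
```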

13 Click on the OK button to obtain the output. When one or more variables are included in the Variable(s) list box, the OK button becomes active.

14 Before we start answering the questions in the problem, we will look at the statements that are supported by the bar chart. Clearly, the most frequent response was a High School degree. More respondents had a Bachelor degree than a Junior College degree; in fact, it looks like respondents were about twice as likely to have a Bachelor degree as a Junior College degree. We will check the numeric results to make more precise statements.

15 The output for the Frequencies command consists of two tables. The first table lists the statistics for N, the number of cases: both the number of valid cases and the number of cases with missing data are included. The second table shows the counts and percentages for each value of the variable. Missing cases can be either “System” missing (the data was not entered into SPSS) or “user-defined” missing (the researcher designated certain codes as representing missing data, e.g. 9 for NA).

16 The information about the number of cases with missing data appears in two places in the output. First, in the Statistics table, the total number of cases with missing data is listed as 3. Second, the number and codes for missing data are listed at the bottom of the frequency table; if there were any “System” missing data, it would also be listed there. If there were no missing data, the Missing number would be 0 and there would be no entry for Missing in the frequency table.
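To make the “System” versus “user-defined” distinction concrete, here is a hedged Python sketch that splits raw responses into valid values and the two kinds of missing data. Everything in it is illustrative: `None` stands in for a system-missing value, 9 = NA follows the example above, 8 is an extra hypothetical code, and the responses are made up rather than taken from the GSS.

```python
from collections import Counter
from typing import Iterable, Optional

# Illustrative user-defined missing codes; check the codebook for the real ones.
USER_MISSING = {8, 9}


def tally(responses: Iterable[Optional[int]]) -> tuple[Counter, Counter, int]:
    """Return (valid counts, user-defined missing counts, system-missing total)."""
    valid: Counter = Counter()
    user_missing: Counter = Counter()
    system_missing = 0
    for code in responses:
        if code is None:             # nothing was entered
            system_missing += 1
        elif code in USER_MISSING:   # researcher-designated missing code
            user_missing[code] += 1
        else:
            valid[code] += 1
    return valid, user_missing, system_missing


# Hypothetical responses, not the GSS data.
valid, user_missing, system_missing = tally([0, 1, 1, 2, 9, None, 8, 1])
n_valid = sum(valid.values())
n_missing = sum(user_missing.values()) + system_missing
print(n_valid, n_missing)  # 5 3
```

As in the SPSS Statistics table, the missing total combines both kinds of missing data.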

17 The statement that “there were 248 cases available for the analysis; 22 cases were missing data” is not correct. The frequency table in the SPSS output showed the total number of valid cases to be 267, with a total of 3 cases spread across several categories of missing data. The check box is not marked.

18 The next statement gives the frequency count of subjects with a junior college degree.

19 The third row of the frequency table shows the number of subjects with a junior college degree to be 19.

20 The statement that “in this sample, there were 19 survey respondents who had a junior college degree” is correct. In the frequency table, the count of cases in the 'Frequency' column for JUNIOR COLLEGE was 19. Because the statement is correct, the check box is marked.

21 The next statement concerns the proportion, or percentage, of respondents who had not graduated from high school.

22 The percent of cases in the first row of the table (less than high school) under the Valid Percent column is 16.9%. Remember that we use the Valid Percent column and not the Percent column.

23 The statement that “in this sample, the proportion of survey respondents who had not graduated from high school was 16.9%” is correct: in the frequency table, the percent of cases in the 'Valid Percent' column for LT HIGH SCHOOL was 16.9. The check box for this statement is marked.

24 The next statement in the problem gives the probability that a subject in this sample was in the category that graduated from high school. We convert the valid percent from the frequency table to a probability by moving the decimal point two places to the left, in effect dividing by 100.

25 The valid percent for the High School category is 54.7. Moving the decimal point two places to the left gives a probability of 0.547. We could have produced the same result by dividing by 100: 54.7 ÷ 100 = 0.547.

26 The statement that “in this sample, the probability that a survey respondent had graduated from high school was 0.541” is not correct. In the frequency table, the percent of cases in the 'Valid Percent' column for HIGH SCHOOL was 54.7, not 54.1. The probability is computed by dividing this percent by 100 (54.7 ÷ 100 = 0.547). Note: 0.541 was computed by incorrectly using the entry in the Percent column (54.1) instead of the Valid Percent column (54.7). The check box for the statement is not marked.
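A short Python sketch of the percent-to-probability conversion, showing why the claimed 0.541 comes from the wrong column. The 54.7 and 54.1 values and the 146 of 267 valid cases are from this walkthrough; the helper name is ours, and rounding to three decimals is an assumption that matches how the statement is written.

```python
def probability_from_percent(pct: float) -> float:
    """Convert a percentage to a probability by moving the decimal two places left."""
    return round(pct / 100, 3)


valid_percent_hs = 54.7  # Valid Percent column, HIGH SCHOOL
percent_hs = 54.1        # Percent column, HIGH SCHOOL (the wrong column here)

print(probability_from_percent(valid_percent_hs))  # 0.547
print(probability_from_percent(percent_hs))        # 0.541
print(round(146 / 267, 3))                         # 0.547, computed from the counts
```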

27 The next statement in the problem requires us to identify the category with the largest number or percentage of cases. The largest category is also called the mode of the distribution. The mode is the preferred measure of central tendency for nominal-level variables.

28 The category with the largest number of cases, and correspondingly, the largest percentage of cases is High School.

29 Note: it is possible for a distribution to have more than one mode, i.e. several categories can tie for the largest number of cases. Such a distribution is called bimodal or multimodal, and the check box would not be marked, because the statement ignores an important fact about the distribution. The statement that “survey respondents were most likely to have graduated from high school” is correct. The category HIGH SCHOOL had the largest percentage of cases (54.7%), making it the modal category. The check box for the statement is marked.
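The tie rule above can be checked mechanically: collect every category tied for the largest count, and accept a “most likely” statement only when exactly one category is returned. A Python sketch with hypothetical counts (not the GSS data); both function names are our own.

```python
def modal_categories(counts: dict[str, int]) -> list[str]:
    """Return every category tied for the largest count (more than one means multimodal)."""
    top = max(counts.values())
    return [category for category, n in counts.items() if n == top]


def most_likely_statement_holds(counts: dict[str, int], claimed: str) -> bool:
    """A 'most likely category' statement holds only if the claimed category is the single mode."""
    return modal_categories(counts) == [claimed]


# Hypothetical counts for illustration only.
unimodal = {"A": 12, "B": 30, "C": 8}
bimodal = {"A": 20, "B": 20, "C": 8}
print(modal_categories(unimodal))                 # ['B']
print(modal_categories(bimodal))                  # ['A', 'B']
print(most_likely_statement_holds(bimodal, "A"))  # False
```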

30 The final question asks us to compute the odds of being in one category (graduated from high school) versus another (graduated from junior college). The odds are the ratio of the two counts or percentages. We interpret odds as the likelihood of being in one category rather than the other. Since there are more than two categories for this variable, the problem also states that only 165 cases are used to compute the odds instead of the entire 267 cases.

31 First, we can sum the number of cases in the high school and junior college categories to make certain the number of cases in the subset is stated correctly: 146 + 19 = 165.

32 The number of cases in the subset is correctly stated to be 165. Before we can mark the check box, we must make certain that the odds are correctly stated as about seven and two-thirds.

33 We can use Excel to compute the ratio of the number of cases in the two groups. The numerator of the ratio is the number for the group that is mentioned first, i.e. high school. The number for the group mentioned second (junior college) is the denominator.

34 In Excel, I enter the formula =146/19 in cell A1 and press the Enter key. The result appears in cell A1. With cell A1 selected, the formula appears in the formula bar, so I can double-check that I entered the correct numbers. You could, of course, have used the computer’s calculator or a hand calculator to do the arithmetic.

35 Dividing the counts (146 ÷ 19 ≈ 7.68) and dividing the valid percents (54.7 ÷ 7.1 ≈ 7.70) give slightly different answers because the percents have been rounded. Dividing the frequency counts avoids the rounding problem and provides a more precise answer, although using the percents from the Valid Percent column still gives a similar answer.
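The same comparison in Python, as an alternative to the Excel formula: the counts and the rounded Valid Percent values for HIGH SCHOOL and JUNIOR COLLEGE are from the frequency table discussed above.

```python
from_counts = 146 / 19      # frequency counts: HIGH SCHOOL / JUNIOR COLLEGE
from_percents = 54.7 / 7.1  # rounded Valid Percent values for the same categories

print(f"{from_counts:.3f}")    # 7.684
print(f"{from_percents:.3f}")  # 7.704
```

The two ratios agree to one decimal place only after rounding; the count-based ratio is the more precise one.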

36 The ratio we computed was 7.68, which is very close to seven and two-thirds (7.67). The number of cases in the subset (165) used for the analysis is also correct. The statement that “for the subset of 165 cases who had graduated from high school or had a junior college degree, survey respondents were about seven and two thirds times more likely to have graduated from high school than to have a junior college degree” is correct. The odds are computed by dividing the 'Frequency' for HIGH SCHOOL by the 'Frequency' for JUNIOR COLLEGE (146 ÷ 19 = 7.68). Since both the number of cases in the subset and the odds are correct, we mark the check box for the statement.
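Both checks for this statement, the subset size and the “two thirds” wording, can be combined in one hedged Python sketch. It uses the homework’s “two thirds” band from slide 38 (a decimal fraction greater than 0.63 and less than 0.70); the function name and its narrow focus on that one wording are our own assumptions.

```python
def odds_statement_holds(n_first: int, n_second: int,
                         claimed_subset: int, claimed_whole: int) -> bool:
    """Check a '<whole> and two thirds times more likely' statement for a two-category subset."""
    subset_ok = n_first + n_second == claimed_subset
    hundredths = round(100 * n_first / n_second)  # odds to two decimals, as an integer
    whole, fraction = divmod(hundredths, 100)
    odds_ok = whole == claimed_whole and 63 < fraction < 70
    return subset_ok and odds_ok


print(odds_statement_holds(146, 19, 165, 7))  # True
print(odds_statement_holds(146, 19, 267, 7))  # False: wrong subset size
```

Working in whole hundredths avoids floating-point surprises at the band edges.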

37 The homework problems translate some of the decimal fractions for odds and odds ratios from numbers to text. The following table shows the translations used.

If the odds are:     Homework problems will describe the likelihood as:   Examples:
0.95 through 1.05    about equally likely                                 0.95, 0.96, 0.97, 0.98, 0.99, 1.00, 1.01, 1.02, 1.03, 1.04, 1.05
1.95 through 2.05    about twice as likely                                1.95, 1.96, 1.97, 1.98, 1.99, 2.00, 2.01, 2.02, 2.03, 2.04, 2.05
2.95 through 3.05    about three times as likely                          2.95, 2.96, 2.97, 2.98, 2.99, 3.00, 3.01, 3.02, 3.03, 3.04, 3.05
3.95 through 4.05    about four times as likely                           3.95, 3.96, 3.97, 3.98, 3.99, 4.00, 4.01, 4.02, 4.03, 4.04, 4.05
4.95 through 5.05    about five times as likely                           4.95, 4.96, 4.97, 4.98, 4.99, 5.00, 5.01, 5.02, 5.03, 5.04, 5.05
5.95 through 6.05    about six times as likely                            5.95, 5.96, 5.97, 5.98, 5.99, 6.00, 6.01, 6.02, 6.03, 6.04, 6.05
6.95 through 7.05    about seven times as likely                          6.95, 6.96, 6.97, 6.98, 6.99, 7.00, 7.01, 7.02, 7.03, 7.04, 7.05
7.95 through 8.05    about eight times as likely                          7.95, 7.96, 7.97, 7.98, 7.99, 8.00, 8.01, 8.02, 8.03, 8.04, 8.05
8.95 through 9.05    about nine times as likely                           8.95, 8.96, 8.97, 8.98, 8.99, 9.00, 9.01, 9.02, 9.03, 9.04, 9.05
9.95 through 10.05   about ten times as likely                            9.95, 9.96, 9.97, 9.98, 9.99, 10.00, 10.01, 10.02, 10.03, 10.04, 10.05
and so on…

38 The homework problems also translate the decimal fraction of the odds into text, as shown in the following table.

If the decimal fraction for the odds is:   Homework problems will describe the likelihood as:   Example:
0.20 through 0.30                          and a quarter times more likely                      3.21: three and a quarter times more likely
greater than 0.30 and less than 0.37       and a third times more likely                        3.36: three and a third times more likely
0.45 through 0.55                          and a half times more likely                         3.49: three and a half times more likely
greater than 0.63 and less than 0.70       and two thirds times more likely                     3.69: three and two thirds times more likely
0.70 through 0.80                          and three quarters times more likely                 3.70: three and three quarters times more likely
otherwise                                  a number rounded to one decimal place                3.42: 3.4 times more likely
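The translation rules on slides 37 and 38 can be sketched as a single Python function. This is our reading of the two tables, not the homework system’s actual code. Assumptions: the odds have already been oriented so they are 0.95 or larger; whole numbers above ten are written as digits (the slide only says “and so on”); and the whole-number bands of slide 37 are checked before the fractional bands of slide 38.

```python
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]


def number_word(n: int) -> str:
    """Spell out small whole numbers; larger ones fall back to digits (an assumption)."""
    return WORDS[n] if n < len(WORDS) else str(n)


def describe_odds(odds: float) -> str:
    """Translate odds of 0.95 or larger into the homework's wording."""
    if odds < 0.95:
        raise ValueError("reverse the comparison first so the odds are 0.95 or larger")
    # Work in whole hundredths to avoid floating-point surprises at band edges.
    hundredths = round(odds * 100)
    whole, fraction = divmod(hundredths, 100)

    # Slide 37: within 0.05 of a whole number.
    if fraction >= 95 or fraction <= 5:
        n = whole + 1 if fraction >= 95 else whole
        if n == 1:
            return "about equally likely"
        if n == 2:
            return "about twice as likely"
        return f"about {number_word(n)} times as likely"

    # Slide 38: common fractions for the decimal part.
    if 20 <= fraction <= 30:
        part = "and a quarter"
    elif 30 < fraction < 37:
        part = "and a third"
    elif 45 <= fraction <= 55:
        part = "and a half"
    elif 63 < fraction < 70:
        part = "and two thirds"
    elif 70 <= fraction <= 80:
        part = "and three quarters"
    else:
        return f"{hundredths / 100:.1f} times more likely"
    return f"{number_word(whole)} {part} times more likely"


print(describe_odds(146 / 19))  # seven and two thirds times more likely
```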

39 We have now evaluated all of the statements for this problem. To save the answers we have marked for the question, click on the Save button.

40 When we have finished all of the questions, we click on the Submit button at the bottom of the assignment.

41 After Blackboard grades the assignment, it gives you an option to review the results. For this problem, we received the full 10 points because we marked all of the correct answers and did not mark any of the incorrect answers. Note: this version of Blackboard does not give partial credit.

42 The feedback after the graded answer explains what the correct answer should have been.

