9/23/2015 Slide 1: Published reports of research usually contain a section that describes key characteristics of the sample included in the study. The “key” characteristics included in the report depend on the context of the research, but they often include demographics such as sex, age, race, and education. When a characteristic is measured by a qualitative (categorical) variable, we summarize the data with frequency distributions and contingency tables.

Slide 2: The frequency distribution provides a profile of an individual categorical variable: the number and percentage of cases that fall in each category. In the SPSS frequency table, the categories of the variable are listed in the first column. I prefer to list both code numbers and value labels. The counts are listed in the second column, but counts communicate less information than percentages because a count alone does not tell us how the category translates to the population from which our sample was drawn. The third and fourth columns both show percentages. The third column (Percent) includes the missing data in the total. The fourth column (Valid Percent) shows percentages when missing cases are excluded. We generally report the percentage from the fourth column.
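The difference between the two percent columns comes down to the denominator. A minimal Python sketch, assuming (as described above) that Percent divides by all cases and Valid Percent divides by valid cases only; the function name is hypothetical, and the counts (146 high school graduates, 267 valid cases, 3 missing) are the ones reported later in this problem.

```python
def percent_columns(count: int, n_valid: int, n_missing: int) -> tuple[float, float]:
    """Return (Percent, Valid Percent) for one category, rounded to one decimal."""
    # Percent: missing cases stay in the denominator
    percent = 100 * count / (n_valid + n_missing)
    # Valid Percent: missing cases are excluded from the denominator
    valid_percent = 100 * count / n_valid
    return round(percent, 1), round(valid_percent, 1)


# High School row of the degree example: 146 cases, 267 valid, 3 missing
print(percent_columns(146, 267, 3))  # (54.1, 54.7)
```

The gap between 54.1 and 54.7 is exactly the error that a later statement in this problem is built around.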

Slide 3: The percentage in a category is often reported as the proportion of cases in that category. Expressed as a decimal (the percentage divided by 100), it is also the probability that an individual subject fell in that category. Based on the numbers or percentages, we can describe one category as the most likely (the modal category), and we can describe one category as more probable, or more likely, than another.
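As a small illustration, here is a Python sketch that turns Valid Percent values into probabilities and compares two categories. The function name is hypothetical; the percentages 54.7 (high school) and 7.1 (junior college) are the Valid Percent values from this problem's frequency table.

```python
def probability_from_valid_percent(valid_percent: float) -> float:
    # Moving the decimal point two places to the left is the same as dividing by 100
    return valid_percent / 100


p_high_school = probability_from_valid_percent(54.7)
p_junior_college = probability_from_valid_percent(7.1)
print(round(p_high_school, 3))           # 0.547
print(p_high_school > p_junior_college)  # True, so high school is the more likely category
```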

Slide 4: Based on the numbers or percentages across categories, we can also describe the odds of falling in one category rather than another, e.g. subjects were about 1.4 times more likely to be religiously moderate than fundamentalist. The odds are computed by dividing the number in one category (e.g. 101) by the number in the second category (e.g. 71). The odds of being in the first category rather than the second are 1.4 to 1.
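The odds calculation is a single division of counts. A minimal Python sketch using the counts from the religious fundamentalism example (101 moderate, 71 fundamentalist); the function name is an illustrative assumption.

```python
def odds(count_first: int, count_second: int) -> float:
    """Odds of falling in the first category rather than the second (implicitly 'to 1')."""
    return count_first / count_second


moderate_vs_fundamentalist = odds(101, 71)
fundamentalist_vs_moderate = odds(71, 101)
print(round(moderate_vs_fundamentalist, 2))  # 1.42
print(round(fundamentalist_vs_moderate, 2))  # 0.7
```

Because the two directions are reciprocals, their product is 1 (up to floating-point error).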

Slide 5: Odds are generally expressed as some number to 1, and the “to 1” is usually not stated. The odds of being in the second category rather than the first are 0.70 to 1 (71 ÷ 101). Note that the odds in opposite directions are reciprocals, i.e. 1 ÷ 1.42 ≈ 0.70 and 1 ÷ 0.70 ≈ 1.4. If the odds are greater than 1, we characterize the result as “more likely.” If the odds are less than 1, we characterize it as “less likely.” If the odds are very close to 1, we characterize it as “equally likely.”
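These rules, together with the advice on the next slide to report odds in the "more likely" direction, can be sketched in Python. The function names are hypothetical, and treating 0.95 through 1.05 as "very close to 1" is an assumption borrowed from the homework translation table at the end of this problem.

```python
def describe_direction(odds: float) -> str:
    # Work in whole hundredths so that odds of exactly 1.05 still fall inside
    # the band despite floating-point representation error
    hundredths = round(odds * 100)
    if 95 <= hundredths <= 105:  # assumed "very close to 1" band: 0.95 through 1.05
        return "equally likely"
    return "more likely" if odds > 1 else "less likely"


def flip_if_less_likely(odds: float) -> tuple[float, bool]:
    """Return odds of at least 1, plus a flag saying whether the direction was reversed."""
    if odds < 1:
        return 1 / odds, True
    return odds, False


print(describe_direction(71 / 101))           # less likely
flipped, reversed_direction = flip_if_less_likely(71 / 101)
print(round(flipped, 2), reversed_direction)  # 1.42 True
```

When the direction is flipped, the category names in the sentence must be swapped as well.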

Slide 6: It is easier to communicate odds that are “more likely” than odds that are “less likely,” so we should reverse the direction of the odds when the result is less than one. While odds may seem out of place outside of betting on horse races, they are a construct of increasing importance in social and behavioral research because they give us a strategy for predicting which category a subject is likely to belong to, using techniques like logistic regression (which we cover in the spring).

Slide 7: The introductory statement in the question indicates the data set to use (GSS2000R), the statistic to use (frequency distribution), and the variable to analyze (degree).

Slide 8: The first statement for us to evaluate concerns the number of valid and missing cases. To answer it, we produce the frequency distribution in SPSS.

Slide 9: To compute a frequency distribution in SPSS, select Descriptive Statistics > Frequencies from the Analyze menu.

Slide 10: First, in the Frequencies dialog box, scroll down the list of variables and click on degree. Second, click the arrow button to move degree to the list box for variables. Until a variable is in that list box, the OK button is inactive because we have not yet provided enough information to compute the statistic.

Slide 11: Though we don’t need it to solve the problem, we will request a bar chart to display the variable. We can use the bar chart to confirm the statistical facts that we include in our description. For example, if we describe one category as the most likely, it should have the tallest bar in the chart. If we say that one category is twice as likely as another, its bar should be about twice as tall as the other’s.

Slide 12: First, click the option button for Bar charts. Second, click the Continue button to close the dialog box. We have a choice of plotting frequency counts or percentages. While this choice affects the numeric values on the chart’s axis, the size of the bars will be exactly the same with either option.
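Why the bars do not change: percentages are just the counts multiplied by the same constant (100 ÷ number of valid cases), so every bar keeps the same height relative to the tallest one. A hypothetical text-mode bar chart in Python makes this concrete; it uses only the two categories whose counts are given in this problem (146 and 19 of 267 valid cases) and is not how SPSS draws its charts.

```python
def bar_lengths(values: dict[str, float], width: int = 40) -> dict[str, int]:
    """Scale each value so that the largest bar is `width` characters long."""
    largest = max(values.values())
    return {label: round(width * value / largest) for label, value in values.items()}


counts = {"HIGH SCHOOL": 146, "JUNIOR COLLEGE": 19}
percents = {label: 100 * n / 267 for label, n in counts.items()}  # Valid Percent

# Same bars whether we plot counts or percentages
print(bar_lengths(counts) == bar_lengths(percents))  # True
for label, length in bar_lengths(counts).items():
    print(f"{label:<15}{'#' * length}")
```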

Slide 13: Click the OK button to obtain the output. When one or more variables are included in the Variable(s) list box, the OK button becomes active.

Slide 14: Before we start answering the questions in the problem, we will look at the statements that are supported by the bar chart. Clearly, the most frequent response was a High School degree. More respondents had a Bachelor degree than a Junior College degree; it looks like respondents were about twice as likely to have a Bachelor degree as a Junior College degree. We will check the numeric results to make more precise statements.

Slide 15: The output for the Frequencies command consists of two tables. The first table lists the statistics for N, the number of cases: the number of valid cases and the number of cases with missing data. The second table shows the count and percentages for each value of the variable. Missing cases can be either “System” missing (the data were not entered into SPSS) or “user-defined” missing (the researcher designated certain codes as representing missing data, e.g. 9 for NA).
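The distinction between the two kinds of missing data can be sketched in Python. This is an illustration, not how SPSS stores data: the sketch assumes None stands for a value that was never entered (system missing), and the responses and the missing codes 8 and 9 are hypothetical.

```python
from collections import Counter


def summarize_cases(responses, user_missing_codes):
    """Split raw responses into valid values, user-defined missing and system missing.

    Assumption for this sketch: None marks a value that was never entered
    (system missing); codes in `user_missing_codes` were designated as missing
    by the researcher (user-defined missing).
    """
    valid = Counter()
    user_missing = Counter()
    system_missing = 0
    for value in responses:
        if value is None:
            system_missing += 1
        elif value in user_missing_codes:
            user_missing[value] += 1
        else:
            valid[value] += 1
    return valid, user_missing, system_missing


# Hypothetical responses; 8 and 9 are hypothetical user-defined missing codes
responses = [1, 1, 2, 9, None, 3, 8]
valid, user_missing, system_missing = summarize_cases(responses, {8, 9})
print(sum(valid.values()), sum(user_missing.values()), system_missing)  # 4 2 1
```

Valid plus user-defined missing plus system missing always equals the total number of cases, which is the check used on the next two slides.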

Slide 16: The information about the number of cases with missing data appears in two places in the output. First, in the Statistics table, the total number of cases with missing data is listed as 3. Second, the number and codes for missing data are listed at the bottom of the frequency table; any “System” missing data would also be listed there. If there were no missing data, the Missing number would be 0 and there would be no entry for Missing in the frequency table.

Slide 17: The statement that there were 248 cases available for the analysis and 22 cases were missing data is not correct. The frequency table in the SPSS output showed the total number of valid cases to be 267, with a total of 3 cases spread across several categories of missing data. The check box is not marked.

Slide 18: The next statement gives the frequency count for the number of subjects with a junior college degree.

Slide 19: The third row of the frequency table shows the number of subjects with a junior college degree to be 19.

Slide 20: The statement that in this sample there were 19 survey respondents who had a junior college degree is correct. In the frequency table, the count in the 'Frequency' column for JUNIOR COLLEGE was 19, so the check box is marked.

Slide 21: The next statement gives the proportion, or percentage, of respondents who had not graduated from high school.

Slide 22: The percentage in the first row of the table (less than high school), under the Valid Percent column, is 16.9%. Remember that we use the Valid Percent column and not the Percent column.

Slide 23: The statement that in this sample the proportion of survey respondents who had not graduated from high school was 16.9% is correct. In the frequency table, the percentage in the 'Valid Percent' column for LT HIGH SCHOOL was 16.9, so the check box for this statement is marked.

Slide 24: The next statement in the problem gives the probability that a subject in this sample was in the category that graduated from high school. We convert the valid percent from the frequency table to a probability by moving the decimal point two places to the left, in effect dividing by 100.

Slide 25: The valid percent for the High School category is 54.7. Moving the decimal point two places to the left results in a probability of .547. We could have produced the same result by dividing by 100: 54.7 ÷ 100 = .547.

Slide 26: The statement that in this sample the probability that a survey respondent had graduated from high school was .541 is not correct. In the frequency table, the percentage in the 'Valid Percent' column for HIGH SCHOOL was 54.7, not 54.1. The probability is computed by dividing this percentage by 100 (54.7 ÷ 100 = .547). Note: .541 was computed by incorrectly using the entry in the Percent column (54.1) instead of the Valid Percent column (54.7). The check box for the statement is not marked.

Slide 27: The next statement requires us to identify the category with the largest number or percentage of cases. The largest category is also the mode of the distribution, and the mode is the preferred measure of central tendency for nominal-level variables.

Slide 28: The category with the largest number of cases, and correspondingly the largest percentage of cases, is High School.

Slide 29: Note: it is possible for a distribution to have more than one mode, i.e. several categories can tie for a number of cases that is larger than the number in every other category. Such a distribution is called bi-modal or multi-modal, and the check box would not be marked because the statement ignores an important fact about the distribution. In this problem, the statement that survey respondents were most likely to have graduated from high school is correct. The category HIGH SCHOOL had the largest percentage of cases (54.7%), making it the only modal category, so the check box for the statement is marked.
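Checking for ties is what separates a single mode from a bi-modal or multi-modal distribution. A short Python sketch (function name hypothetical) that returns every category tied for the largest count; the second call uses made-up categories A, B and C purely to show a tie.

```python
def modal_categories(counts: dict[str, int]) -> list[str]:
    """Return every category tied for the largest count.

    One name means a single mode; two or more means the distribution is
    bi-modal or multi-modal.
    """
    largest = max(counts.values())
    return [label for label, n in counts.items() if n == largest]


print(modal_categories({"HIGH SCHOOL": 146, "JUNIOR COLLEGE": 19}))  # ['HIGH SCHOOL']
print(modal_categories({"A": 10, "B": 25, "C": 25}))                 # ['B', 'C'] (hypothetical tie)
```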

Slide 30: The final statement asks us to compute the odds of being in one category (graduated from high school) rather than another (graduated from junior college). The odds are the ratio of the two numbers (or percentages), and we interpret them as the likelihood of being in one category rather than the other. Since this variable has more than two categories, the problem also states that only the 165 cases in these two categories are used to compute the odds, rather than all 267 valid cases.

Slide 31: First, we sum the number of cases in the high school and junior college categories to make certain that the number of cases in the subset is stated correctly: 146 + 19 = 165.

Slide 32: The number of cases in the subset is correctly stated as 165. Before we can mark the check box, we must make certain that the odds are correctly stated as about seven and two-thirds.

Slide 33: We can use Excel to compute the ratio of the numbers of cases in the two groups. The numerator of the ratio is the number for the group mentioned first, i.e. high school; the denominator is the number for the group mentioned second.

Slide 34: In Excel, I enter the formula =146/19 in cell A1 and press the Enter key. The result appears in cell A1. With cell A1 selected, the formula appears in the formula bar, so I can double-check that I entered the correct numbers. You could, of course, have used the computer’s calculator or a hand calculator to do the arithmetic.

Slide 35: The difference between the two answers is due to the fact that the percentages have been rounded (i.e. 54.7 and 7.1). Dividing the frequency counts avoids the rounding problem and provides a more precise answer. I could have used the percentages from the Valid Percent column and gotten a similar answer.

Slide 36: The ratio we computed was 7.68, which is very close to seven and two-thirds (7.67). The number of cases in the subset used for the analysis (165) is correct. The statement that for the subset of 165 cases who had graduated from high school or had a junior college degree, survey respondents were about seven and two-thirds times more likely to have graduated from high school than to have a junior college degree is correct. The odds are computed by dividing the 'Frequency' for HIGH SCHOOL by the 'Frequency' for JUNIOR COLLEGE (146 ÷ 19 = 7.68). Since both the number of cases in the subset and the odds are correct, we mark the check box for the statement.
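The whole check for this statement (subset size, odds from counts, and the slightly different answer from rounded percentages) fits in a few lines of Python. The counts 146 and 19 and the Valid Percent values 54.7 and 7.1 come from this problem; the variable names are illustrative.

```python
high_school, junior_college = 146, 19   # Frequency column counts

subset_size = high_school + junior_college       # cases used for the comparison
odds_from_counts = high_school / junior_college  # numerator: category mentioned first
odds_from_percents = 54.7 / 7.1                  # rounded Valid Percent values

print(subset_size)                   # 165
print(round(odds_from_counts, 2))    # 7.68
print(round(odds_from_percents, 2))  # 7.7
```

The small gap between 7.68 and 7.70 is the rounding effect described on Slide 35.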

Slide 37: The homework problems translate some of the decimal fractions for odds and odds ratios from numbers to text. The following table shows the translations used.

If the odds are: Homework problems will describe the likelihood as (examples)
0.95 through 1.05: about equally likely (0.95, 0.96, 0.97, 0.98, 0.99, 1.00, 1.01, 1.02, 1.03, 1.04, 1.05)
1.95 through 2.05: about twice as likely (1.95, 1.96, 1.97, 1.98, 1.99, 2.00, 2.01, 2.02, 2.03, 2.04, 2.05)
2.95 through 3.05: about three times as likely (2.95, 2.96, 2.97, 2.98, 2.99, 3.00, 3.01, 3.02, 3.03, 3.04, 3.05)
3.95 through 4.05: about four times as likely (3.95, 3.96, 3.97, 3.98, 3.99, 4.00, 4.01, 4.02, 4.03, 4.04, 4.05)
4.95 through 5.05: about five times as likely (4.95, 4.96, 4.97, 4.98, 4.99, 5.00, 5.01, 5.02, 5.03, 5.04, 5.05)
5.95 through 6.05: about six times as likely (5.95, 5.96, 5.97, 5.98, 5.99, 6.00, 6.01, 6.02, 6.03, 6.04, 6.05)
6.95 through 7.05: about seven times as likely (6.95, 6.96, 6.97, 6.98, 6.99, 7.00, 7.01, 7.02, 7.03, 7.04, 7.05)
7.95 through 8.05: about eight times as likely (7.95, 7.96, 7.97, 7.98, 7.99, 8.00, 8.01, 8.02, 8.03, 8.04, 8.05)
8.95 through 9.05: about nine times as likely (8.95, 8.96, 8.97, 8.98, 8.99, 9.00, 9.01, 9.02, 9.03, 9.04, 9.05)
9.95 through 10.05: about ten times as likely (9.95, 9.96, 9.97, 9.98, 9.99, 10.00, 10.01, 10.02, 10.03, 10.04, 10.05)
and so on…

Slide 38: When the odds are not close to a whole number, the homework problems translate the decimal fraction as follows.

If the decimal fraction for the odds is: Homework problems will describe the likelihood as (example)
0.20 through 0.30: … and a quarter times more likely (3.21: three and a quarter times more likely)
Greater than 0.30 and less than 0.37: … and a third times more likely (3.36: three and a third times more likely)
0.45 through 0.55: … and a half times more likely (3.49: three and a half times more likely)
Greater than 0.63 and less than 0.70: … and two thirds times more likely (3.69: three and two thirds times more likely)
0.70 through 0.80: … and three quarters times more likely (3.70: three and three quarters times more likely)
Otherwise: reported as a number rounded to one decimal place, times more likely
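The two translation tables can be combined into one function. This is a sketch under stated assumptions, not the grading software's actual logic: the function name is hypothetical, odds below 0.95 must be flipped to the "more likely" direction before calling it, and number words are only spelled out up to ten (larger whole numbers fall back to digits, since the table only says "and so on").

```python
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
                6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}


def describe_odds(odds: float) -> str:
    """Translate odds (already flipped so they are at least 0.95) into homework wording."""
    hundredths = round(odds * 100)  # work in whole hundredths to avoid float noise
    if hundredths < 95:
        raise ValueError("odds below 0.95: flip the direction first (use 1 / odds)")
    whole, frac = divmod(hundredths, 100)
    # Within 0.05 of a whole number: first table ("about N times as likely")
    if frac <= 5 or frac >= 95:
        n = whole if frac <= 5 else whole + 1
        if n == 1:
            return "about equally likely"
        if n == 2:
            return "about twice as likely"
        return f"about {NUMBER_WORDS.get(n, str(n))} times as likely"
    # Otherwise: second table, based on the decimal fraction
    name = NUMBER_WORDS.get(whole, str(whole))
    if 20 <= frac <= 30:
        return f"{name} and a quarter times more likely"
    if 30 < frac < 37:
        return f"{name} and a third times more likely"
    if 45 <= frac <= 55:
        return f"{name} and a half times more likely"
    if 63 < frac < 70:
        return f"{name} and two thirds times more likely"
    if 70 <= frac <= 80:
        return f"{name} and three quarters times more likely"
    return f"{hundredths / 100:.1f} times more likely"


print(describe_odds(146 / 19))  # seven and two thirds times more likely
print(describe_odds(101 / 71))  # 1.4 times more likely
```

Rounding to hundredths first mirrors the way the tables are written (two decimal places) and keeps values such as 3.36 or 1.05 from slipping across a boundary because of floating-point representation.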

Slide 39: We have now evaluated all of the statements for this problem. To save the answers we have marked for the question, click the Save button.

Slide 40: When we have finished all of the questions, we click the Submit button at the bottom of the assignment.

Slide 41: After Blackboard grades the assignment, it gives you the option to review the results. For this problem, we received the full 10 points because we marked all of the correct answers and did not mark any of the incorrect ones. Note: this version of Blackboard does not give partial credit.

Slide 42: The feedback after the graded answer explains what the correct answer should have been.