SW388R6 Data Analysis and Computers I Slide 1 Comparing Central Tendency and Variability across Groups Impact of Missing Data on Group Comparisons Sample.

Presentation transcript:

SW388R6 Data Analysis and Computers I Slide 1 Comparing Central Tendency and Variability across Groups
• Impact of Missing Data on Group Comparisons
• Sample Homework Problem
• Solving the Problem with SPSS
• Logic for Comparing Central Tendency and Variability Problems

SW388R6 Data Analysis and Computers I Slide 2 Impact of missing data on group comparisons - 1
• When we analyze variables individually, we report on the valid and missing cases for each variable.
• When we compare a measure of central tendency and variability for groups, we are analyzing two variables simultaneously:
   - one which defines the groups, and
   - one that represents the characteristic we are describing.
• We report statistics for the cases that have valid data on both variables.

SW388R6 Data Analysis and Computers I Slide 3 Impact of missing data on group comparisons - 2
• When we compare the measures of central tendency and variability on multiple characteristics for groups, the issue of valid and missing data becomes more complex. For example, if we wanted to compare age, income, and education for males and females, we may get different values for the means and the standard deviations depending on how the analysis is conducted in SPSS.
• SPSS can compute the statistics for each characteristic in a separate analysis, or it can compute the statistics for all variables in a single analysis.

SW388R6 Data Analysis and Computers I Slide 4 Impact of missing data on group comparisons - 3
• SPSS uses two rules for deciding how to handle missing data for multiple variables:
   - pairwise deletion of cases missing data, and
   - listwise deletion of cases missing data.
• The default rule that SPSS will use in the Explore procedure, unless instructed otherwise, is listwise deletion.
• In listwise deletion, SPSS omits cases that are missing data for any of the variables included in the analysis. Using our example, SPSS would omit a case from its calculations if it was missing data on age, income, education, or sex.

SW388R6 Data Analysis and Computers I Slide 5 Impact of missing data on group comparisons - 4
• In pairwise deletion, SPSS omits a case only if it is missing data for either of the two variables needed for a specific calculation. When using pairwise deletion to compute the mean age for males and females, a case would be omitted only if it were missing data for age or sex. Cases missing data for income or education, but not for age or sex, are included in the calculations.
• Computing statistics using listwise deletion of missing cases may produce different answers than one would get using pairwise deletion.
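The difference between the two rules can be sketched in a few lines of Python. This is only an illustration, not SPSS itself: the five toy cases, the `group_means` helper, and the use of `None` for a missing value are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy cases (not from GSS2000R.Sav); None marks a missing value.
cases = [
    {"sex": "male",   "age": 30, "income": 40000, "educ": 12},
    {"sex": "male",   "age": 50, "income": None,  "educ": 16},
    {"sex": "female", "age": 40, "income": 35000, "educ": None},
    {"sex": "female", "age": 20, "income": 30000, "educ": 14},
    {"sex": None,     "age": 60, "income": 50000, "educ": 12},
]

def group_means(rows, group_var, value_var, required):
    """Mean of value_var per group, keeping only rows with valid data on `required`."""
    kept = [r for r in rows if all(r[v] is not None for v in required)]
    groups = {}
    for r in kept:
        groups.setdefault(r[group_var], []).append(r[value_var])
    return {g: mean(values) for g, values in groups.items()}

# Listwise: a case must be valid on every variable in the analysis.
listwise = group_means(cases, "sex", "age", required=["sex", "age", "income", "educ"])

# Pairwise: a case must be valid only on the two variables being used.
pairwise = group_means(cases, "sex", "age", required=["sex", "age"])

print(listwise)
print(pairwise)
```

In this toy data the mean age for each group changes between the two rules, because pairwise deletion keeps the two cases that are missing only income or education.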

SW388R6 Data Analysis and Computers I Slide 6 Impact of missing data on group comparisons - 5
• If we can get two different values for the same statistic, the obvious question is which one is correct.
• An argument can be made that listwise deletion is correct because the same cases are used for all calculations.
• An argument can be made that pairwise deletion better represents the value for the statistic because it makes use of more cases.

SW388R6 Data Analysis and Computers I Slide 7 Impact of missing data on group comparisons - 6
• The problem can become even more complex when we use multiple SPSS procedures for the same set of variables. For example, we will use the “Explore” procedure to get measures for interval and ordinal variables, and the “Crosstabs” procedure to get the mode and modal percentage, which “Explore” does not compute.
• If we include only two variables in each procedure, the measures would have the same value under pairwise or listwise deletion, because the list only includes one pair of variables.

SW388R6 Data Analysis and Computers I Slide 8 Impact of missing data on group comparisons - 7
• We can force SPSS to omit cases listwise across procedures if we create a variable that indicates whether or not a case had valid data for all variables, and select only those cases as a subset for the analysis.
• While we will not do this in our problems, the method requires that we create a new variable, e.g. “nmissing”, which uses the SPSS NMISS function to count the number of variables that are missing data for the specified list of variables. We would then select cases to be included if nmissing = 0.
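The same idea can be sketched outside SPSS. The Python below is a rough analogue of counting missing values per case and keeping the cases where the count is zero. The toy cases and the `count_missing` helper are hypothetical, and the sketch does not reproduce SPSS's NMISS function itself.

```python
# Hypothetical toy cases; None marks a missing value.
cases = [
    {"age": 30,   "income": 40000, "educ": 12, "sex": "male"},
    {"age": 50,   "income": None,  "educ": 16, "sex": "male"},
    {"age": None, "income": None,  "educ": 14, "sex": "female"},
    {"age": 20,   "income": 30000, "educ": 14, "sex": "female"},
]
variables = ["age", "income", "educ", "sex"]

def count_missing(case, names):
    """Count how many of the named variables are missing for one case."""
    return sum(1 for name in names if case[name] is None)

# Add the count to every case, then keep only the complete cases.
for case in cases:
    case["nmissing"] = count_missing(case, variables)

complete_cases = [case for case in cases if case["nmissing"] == 0]
print(len(complete_cases), "of", len(cases), "cases have valid data on all variables")
```

Running every later analysis on `complete_cases` gives the same set of cases to each procedure, which is the point of the approach.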

SW388R6 Data Analysis and Computers I Slide 9 Impact of missing data on group comparisons - 8
• In solving the problems for this assignment, we will follow the strategy of including the variables only two at a time in the SPSS procedures:
   - one variable which defines the groups, and
   - one variable that represents the characteristic we are describing.
• If you include multiple characteristic variables when you do the statistics in SPSS, you will probably get different values from the ones stated in the problem and answer the problem incorrectly.

SW388R6 Data Analysis and Computers I Slide 10 Homework problems: Comparing central tendency and variability
This is the general framework for the problems in the homework assignment on comparing central tendency and variability. The measures of central tendency and variability are used to compare and contrast two groups whose differences are important to the research being reported.
This problem uses the data set "GSS2000R.Sav" to compare the distribution of survey respondents who had not seen an x-rated movie in the last year to the distribution of survey respondents who had seen an x-rated movie in the last year for the variables: "highest year of school completed" [educ], "sex" [sex], "liberal or conservative political views" [polviews] and "frequency of attendance at religious services" [attend]. The groups are based on the variable "seen x-rated movie in last year" [xmovie].
The data available for this study included 136 survey respondents who had not seen an x-rated movie in the last year and 49 survey respondents who had seen an x-rated movie in the last year. Out of the total of 270 cases in the dataset, 85 were omitted because of missing data and 0 cases were in other categories of the variable "seen x-rated movie in last year" [xmovie].
Survey respondents who had not seen an x-rated movie in the last year had completed fewer years of school (M = 13.01) than survey respondents who had seen an x-rated movie in the last year (M = 13.53). The scores for highest year of school completed varied more widely for survey respondents who had not seen an x-rated movie in the last year (SD = 3.25) compared to the scores for survey respondents who had seen an x-rated movie in the last year (SD = 2.73).
Survey respondents who had not seen an x-rated movie in the last year were most likely to have been female (68.4%), while survey respondents who had seen an x-rated movie in the last year were most likely to have been male (69.4%).
Continued on next slide…

SW388R6 Data Analysis and Computers I Slide 11 Homework problems: Comparing central tendency and variability
Continued from previous slide
Survey respondents who had not seen an x-rated movie in the last year were most likely to describe their political views as moderate (41.7%), as were survey respondents who had seen an x-rated movie in the last year (37.8%).
Survey respondents who had not seen an x-rated movie in the last year attended religious services more often (Mdn = 3.00) than survey respondents who had seen an x-rated movie in the last year (Mdn = 2.00). The scores for frequency of attendance at religious services varied by the same amount for survey respondents who had not seen an x-rated movie in the last year (IQR = 5.00) compared to the scores for survey respondents who had seen an x-rated movie in the last year (IQR = 5.00).
o True
o False
o Inappropriate use of a statistic

SW388R6 Data Analysis and Computers I Slide 12 Homework problems: Data set, groups and variables
The first paragraph of the problem identifies:
• the data set to use, e.g. GSS2000R.Sav
• the groups to be compared in the analysis
• the variables used as the descriptors of the groups
• the variable to use to create the groups
In this problem, the variable used to define groups has only two categories. If the grouping variable had more than two categories, the problem would ignore the results for the categories not listed in the problem.

SW388R6 Data Analysis and Computers I Slide 13 Homework problems: Sample size
The second paragraph of the problem describes:
• the number of cases in each group,
• the number of total cases in the data set,
• the number of cases with missing data, and
• the number of cases that were in other categories of the grouping variable.
The answer to the problem can only be true if all of the numbers describing the groups and sample are correct. The number of cases in the analysis will be the number in the two groups mentioned in the problem statement.

SW388R6 Data Analysis and Computers I Slide 14 Homework problems: Comparing central tendency and variability
The remaining paragraphs of the problem describe each demographic characteristic in terms of central tendency and variability. These paragraphs are written in the descriptive format similar to what would appear in a journal, rather than as tables of statistical values.
This will require you to translate the SPSS output, variable, and value labels to more descriptive statements. The statistics themselves are shown in parentheses at the end of the statements, using APA formatting style where a style has been defined. For example, “M” is defined as the correct abbreviation for the mean.

SW388R6 Data Analysis and Computers I Slide 15 Homework problems: Comparing interval level variables
Comparison for an interval level variable that is not skewed, e.g. years of schooling, is done using the mean and the standard deviation of each group. This comparison supports statements about which group had a higher score or greater variability on the variable. For example, we can say that one group has more years of education than the other group, and that its distribution of scores was more or less varied.
If an interval level variable is badly skewed, the comparison is done with the median and the interquartile range, following the same rule which we used for individual variables.
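As a rough illustration of the comparison for an interval level variable that is not skewed, the sketch below computes the mean and the sample standard deviation for each group in Python. The scores are hypothetical, not the GSS2000R.Sav data, and the group labels "no" and "yes" are placeholders for the two categories of the grouping variable.

```python
from statistics import mean, stdev

# Hypothetical years-of-schooling scores for two groups.
educ_by_group = {
    "no":  [12, 14, 16, 10],
    "yes": [12, 13, 14, 13],
}

# (mean, sample standard deviation) for each group, rounded to two decimals.
summary = {
    group: (round(mean(scores), 2), round(stdev(scores), 2))
    for group, scores in educ_by_group.items()
}

# The group with the larger standard deviation varied more widely.
more_varied = max(summary, key=lambda group: summary[group][1])

for group, (m, sd) in summary.items():
    print(f"{group}: M = {m:.2f}, SD = {sd:.2f}")
print("varied more widely:", more_varied)
```

Here the two hypothetical groups have the same mean but different standard deviations, so a statement about variability would still distinguish them.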

SW388R6 Data Analysis and Computers I Slide 16 Homework problems: Comparing ordinal level variables
For an ordinal level variable, e.g. frequency of church attendance, groups are compared using the values for the median and the interquartile range. This comparison supports statements about which group had a higher score or greater variability on the variable. For example, we can say that one group went to church more often than the other group, and that its distribution of scores was more or less spread out.
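A minimal Python sketch of the same comparison follows, using hypothetical attendance codes. Quartile conventions differ between programs, so the interquartile range computed here (Python's default "exclusive" method in `statistics.quantiles`) is an assumption of the sketch and may not match a given SPSS table.

```python
from statistics import median, quantiles

# Hypothetical ordinal codes for frequency of attendance (higher = more often).
attend_by_group = {
    "no":  [0, 1, 2, 3, 4, 5, 6, 7],
    "yes": [0, 0, 1, 2, 3, 4, 5],
}

def median_and_iqr(scores):
    """Return (median, interquartile range) using Python's default quartile method."""
    q1, _, q3 = quantiles(scores, n=4)
    return median(scores), q3 - q1

summary = {group: median_and_iqr(scores) for group, scores in attend_by_group.items()}

for group, (mdn, iqr) in summary.items():
    print(f"{group}: Mdn = {mdn:.2f}, IQR = {iqr:.2f}")
```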

SW388R6 Data Analysis and Computers I Slide 17 Homework problems: Comparing ordinal variables with many tie scores
“Political views” is also an ordinal variable, but it contains an excessive number of tied scores, which compromise the meaning of the median and interquartile range as measures of central tendency and variability. When a variable has excessive ties, its mode and the percent of cases in the modal category are reported. The value label is used instead of the numeric code.
An ordinal variable will be considered to have excessive tie scores when the median has the same value as either the lower or upper bound of the interquartile range, following the same rule we used for central tendency and variability for individual variables.
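The tie rule can be written as a small check. The sketch below is one way to express it in Python. The function name `has_excessive_ties` and the two sets of scores are hypothetical, and the quartiles use Python's default method, which may differ from the quartiles SPSS reports.

```python
from statistics import median, quantiles

def has_excessive_ties(scores):
    """True when the median equals the lower or upper bound of the interquartile range."""
    q1, _, q3 = quantiles(scores, n=4)
    return median(scores) in (q1, q3)

# Hypothetical 7-point political views codes, heavily clustered on the middle category.
clustered = [1, 2, 4, 4, 4, 4, 4, 4, 5, 6, 7]

# Hypothetical scores spread evenly across the scale.
spread = [1, 2, 3, 4, 5, 6, 7]

print(has_excessive_ties(clustered))
print(has_excessive_ties(spread))
```

When the check is true for a variable, the mode and modal percentage would be reported for it instead of the median and interquartile range.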

SW388R6 Data Analysis and Computers I Slide 18 Homework problems: Comparing nominal level variables
Nominal (including dichotomous) variables, e.g. sex, are compared using their modal category and the percent of cases in the modal category. The value label is used for the modal category instead of the numeric code.
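To illustrate, the Python sketch below finds the modal category and its percentage for each group. The value labels and counts are hypothetical, and `modal_category` is a helper written for this example rather than anything SPSS provides.

```python
from collections import Counter

def modal_category(labels):
    """Return (modal value label, percent of cases in that category)."""
    counts = Counter(labels)
    # most_common(1) returns the first-encountered label if two categories tie.
    label, count = counts.most_common(1)[0]
    return label, round(100 * count / len(labels), 1)

# Hypothetical value labels for sex within each group.
sex_by_group = {
    "no":  ["female", "female", "female", "male"],
    "yes": ["male", "male", "female"],
}

modes = {group: modal_category(labels) for group, labels in sex_by_group.items()}

for group, (label, pct) in modes.items():
    print(f"{group}: most likely {label} ({pct}%)")
```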

SW388R6 Data Analysis and Computers I Slide 19 Survey respondents who had not seen an x-rated movie in the last year had completed fewer years of school (M = 13.01) than survey respondents who had seen an x-rated movie in the last year (M = 13.53). The scores for highest year of school completed varied more widely for survey respondents who had not seen an x-rated movie in the last year (SD = 3.25) compared to the scores for survey respondents who had seen an x-rated movie in the last year (SD = 2.73). Survey respondents who had not seen an x-rated movie in the last year were most likely to have been female (68.4%), while survey respondents who had seen an x-rated movie in the last year were most likely to have been male (69.4%). Survey respondents who had not seen an x-rated movie in the last year were most likely to describe their political views as moderate (41.7%), as were survey respondents who had seen an x-rated movie in the last year (37.8%). Survey respondents who had not seen an x-rated movie in the last year attended religious services more often (Mdn = 3.00) than survey respondents who had seen an x-rated movie in the last year (Mdn = 2.00). The scores for frequency of attendance at religious services varied by the same amount for survey respondents who had not seen an x-rated movie in the last year (IQR = 5.00) compared to the scores for survey respondents who had seen an x-rated movie in the last year (IQR = 5.00). o True o False o Inappropriate use of a statistic Homework problems: Choosing an answer The answer to a problem will be True if all of the statements about the sample size, and the comparisons of central tendency and variability are correct, both in terms of the statistic selected and the value reported. 
The answer to a problem will be Inappropriate use of a statistic if the reported statistic violates the level of measurement criteria, i.e., the mean and standard deviation are reported for an ordinal or nominal variable, or the median and interquartile range are reported for a nominal variable. The answer to a problem will be False if a wrong value is reported for the sample size or for a statistic, or if the wrong statistic is reported but the level of measurement criteria are not violated.

SW388R6 Data Analysis and Computers I Slide 20 Solving the problem with SPSS: Checking the number of cases - 1 Our first task is to use a frequency distribution to verify the number of cases in both groups to check the statement in the problem that: The data available for this study included 136 survey respondents who had not seen an x-rated movie in the last year and 49 survey respondents who had seen an x-rated movie in the last year. Out of the total of 270 cases in the dataset, 85 were omitted because of missing data and 0 cases were in other categories of the variable "seen x-rated movie in last year" [xmovie]. Select the Descriptive Statistics > Frequencies… command from the Analyze menu.

SW388R6 Data Analysis and Computers I Slide 21 Solving the problem with SPSS: Checking the number of cases - 2 In the Frequencies dialog box, we move the variable used to define the groups, xmovie, to the Variable(s): list box. Since all we want is the frequency distribution, we click on the OK button to generate the output.

SW388R6 Data Analysis and Computers I Slide 22 Solving the problem with SPSS: Checking the number of cases - 3 The data available for this study included 136 survey respondents who had not seen an x-rated movie in the last year and 49 survey respondents who had seen an x-rated movie in the last year. Out of the total of 270 cases in the dataset, 85 were omitted because of missing data and 0 cases were in other categories of the variable "seen x-rated movie in last year" [xmovie]. As we can see in the frequency table, each of these numbers is correct.
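The same check can be sketched outside SPSS. The short Python example below is only an illustration: the list `xmovie` is a stand-in built from the counts stated in the problem (it is not the real data file), and the coding 0 = had not seen, 1 = had seen, `None` = missing is an assumption made for the sketch.

```python
from collections import Counter

# Stand-in for the xmovie column, built from the counts stated in the problem.
# Assumed coding: 0 = had not seen, 1 = had seen, None = missing.
xmovie = [0] * 136 + [1] * 49 + [None] * 85


def case_counts(values):
    """Return (counts of valid codes, number of missing cases, total cases)."""
    valid = Counter(v for v in values if v is not None)
    missing = sum(1 for v in values if v is None)
    return valid, missing, len(values)


valid, missing, total = case_counts(xmovie)
print(valid[0], valid[1], missing, total)  # prints: 136 49 85 270
```

The two valid groups plus the missing cases account for all 270 cases, which is the arithmetic the frequency table is being used to confirm.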

SW388R6 Data Analysis and Computers I Slide 23 Solving the problem with SPSS: Generating the output - 1 Select the Descriptive Statistics > Explore… command from the Analyze menu. We will use the Explore procedure to generate the measures of central tendency and variability that we need to evaluate the statements about the individual demographic variables. The Explore procedure gives us the measures of central tendency and variability needed for interval and ordinal variables. To get the mode and modal percent for nominal level variables, we will use the Crosstabs procedure.

SW388R6 Data Analysis and Computers I Slide 24 Solving the problem with SPSS: Generating the output - 2 First, in the Explore dialog box, we move the first variable we want to compare, educ, to the Dependent List list box. Second, we move the variable defining the groups, xmovie, to the Factor List list box. Third, we click the Statistics option button to limit the output displayed by SPSS. Fourth, we click on the Statistics… button to select specific statistics. Following the discussion about missing data above, we will analyze characteristics one at a time.

SW388R6 Data Analysis and Computers I Slide 25 Solving the problem with SPSS: Generating the output - 3 In the Explore: Statistics dialog box, we can only select the general category of Descriptives. We do not have an option to specify individual measures. While Descriptives will include the interquartile range, it does not include the values of the first and third quartile, which we need to identify excessive ties for an ordinal variable. To get the quartiles, we mark the Percentiles check box. When we have marked the options we want, we click on the Continue button to close the dialog.

SW388R6 Data Analysis and Computers I Slide 26 Solving the problem with SPSS: Generating the output - 4 Having selected the statistics we want, we click on the OK button to generate the output.

SW388R6 Data Analysis and Computers I Slide 27 Solving the problem with SPSS: Statistical comparison of education - 1 Since educ is an interval level variable, we check the skewness of the distribution to determine whether we report the mean or the median as the measure of central tendency. The skewness for survey respondents who had not seen an x-rated movie in the last year is -0.33, and the skewness for both groups falls between -1 and +1. The mean and standard deviation should be reported as the measures of central tendency and variability for education.
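The skewness rule of thumb can be sketched in Python as follows. This is a minimal illustration, not the SPSS implementation: `sample_skewness` uses the adjusted Fisher-Pearson formula, which is assumed here to correspond to the skewness SPSS reports, and `choose_statistics` treats the boundaries -1 and +1 as "not skewed", which is also an assumption. Both function names are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(z**3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)


def choose_statistics(skewness):
    """Rule of thumb from the slides: skewness between -1 and +1 -> mean and SD."""
    if -1.0 <= skewness <= 1.0:
        return "mean and standard deviation"
    return "median and interquartile range"


# The skewness of -0.33 reported for the first group falls inside the range.
print(choose_statistics(-0.33))
```

The rule is applied to each group separately; the mean and standard deviation are reported only when both groups fall inside the range.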

SW388R6 Data Analysis and Computers I Slide 28 Solving the problem with SPSS: Statistical comparison of education - 2 The statement that survey respondents who had not seen an x-rated movie in the last year had completed fewer years of school (M = 13.01) than survey respondents who had seen an x-rated movie in the last year (M = 13.53) is correct.

SW388R6 Data Analysis and Computers I Slide 29 Solving the problem with SPSS: Statistical comparison of education - 3 The statement that the scores for highest year of school completed varied more widely for survey respondents who had not seen an x-rated movie in the last year (SD = 3.25) compared to the scores for survey respondents who had seen an x-rated movie in the last year (SD = 2.73) is also correct.

SW388R6 Data Analysis and Computers I Slide 30 Generating the output for sex - 1 SPSS provides a short cut for us to use when we want to run the same procedure again. Position the mouse over the Dialog Recall tool button on the tool bar. Next, we will compute central tendency for the dichotomous variable, sex.

SW388R6 Data Analysis and Computers I Slide 31 Generating the output for sex - 2 Click the mouse on the Dialog Recall tool button. A drop-down menu listing the last procedures run appears. Click on the Explore item at the top of the menu.

SW388R6 Data Analysis and Computers I Slide 32 Generating the output for sex - 3 Click on the left arrow button to remove educ from the Dependent List.

SW388R6 Data Analysis and Computers I Slide 33 Generating the output for sex - 4 Move sex to the Dependent List. Since we only want to change the variable being analyzed and keep all of the options we previously specified, we click on the OK button.

SW388R6 Data Analysis and Computers I Slide 34 Solving the problem with SPSS: Statistical comparison of sex - 1 The Explore procedure does not supply us with any information about central tendency for a nominal level variable. We will use the Crosstabs procedure to create a contingency table.

SW388R6 Data Analysis and Computers I Slide 35 Generating more output for sex - 1 Select the Descriptive Statistics > Crosstabs… command from the Analyze menu.

SW388R6 Data Analysis and Computers I Slide 36 Generating more output for sex - 2 First, move the variable for the characteristic we want to analyze, sex, to the Row(s) list box. Second, move the group variable, xmovie, to the Column(s) list box. Third, click on the Cells button to specify what will appear in the cells of the crosstabulated table. To keep from confusing myself, I always create crosstabs tables with the grouping variable (independent variable) in the columns and the characteristic variable (dependent variable) in the rows. The mode for each group will be on the row that has the largest percentage within the column.

SW388R6 Data Analysis and Computers I Slide 37 Generating more output for sex - 3 Accept the default Observed check box so that the table contains the tally of cases in each cell. Mark the check box for Column percentages. The cell with the largest percentage in each column is the mode for the group specified in the column, and the percentage in that cell is the modal percent. Click on the Continue button to close the dialog box.

SW388R6 Data Analysis and Computers I Slide 38 Generating more output for sex - 4 Click on the OK button to request the output.

SW388R6 Data Analysis and Computers I Slide 39 Solving the problem with SPSS: Statistical comparison of sex - 2 The mode for subjects who saw an x-rated movie is “1 MALE” because the largest percentage (69.4%) in the “1 YES” column is located on that row. The mode for subjects who did not see an x-rated movie is “2 FEMALE” because the largest percentage (68.4%) in the “0 NO” column is located on that row. The statement that survey respondents who had not seen an x-rated movie in the last year were most likely to have been female (68.4%), while survey respondents who had seen an x-rated movie in the last year were most likely to have been male (69.4%) is correct.
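Reading the mode and modal percent off a crosstab column can be sketched in Python as below. The cell counts are hypothetical: they were chosen only so that the column percentages round to the 68.4% and 69.4% reported on the slide, and they are not taken from the SPSS output. The function name `modal_category` is likewise an invention for this sketch.

```python
def modal_category(column_counts):
    """Return (label, percent) for the largest cell in one crosstab column."""
    total = sum(column_counts.values())
    label = max(column_counts, key=column_counts.get)
    return label, round(100 * column_counts[label] / total, 1)


# Hypothetical cell counts for each xmovie column (rows are categories of sex).
not_seen = {"MALE": 43, "FEMALE": 93}   # "0 NO" column, 136 cases
seen = {"MALE": 34, "FEMALE": 15}       # "1 YES" column, 49 cases

print(modal_category(not_seen))  # ('FEMALE', 68.4)
print(modal_category(seen))      # ('MALE', 69.4)
```

Note that the value label, not the numeric code, is returned for the modal category, in line with how nominal variables are reported in the homework problems.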

SW388R6 Data Analysis and Computers I Slide 40 Generating the output for political views - 1 After selecting Explore from the Dialog Recall menu, remove the variable sex from the Dependent List and move the variable polviews to the Dependent List. Since we only want to change the variable being analyzed and keep all of the options we previously specified, we click on the OK button.

SW388R6 Data Analysis and Computers I Slide 41 Solving the problem with SPSS: Statistical comparison of political views - 1 Since polviews is an ordinal level variable, we identify the medians and interquartile ranges for the two groups. We will compare these as our measures of central tendency and variability, provided that there are not excessive ties in either group for the medians of 4.00.

SW388R6 Data Analysis and Computers I Slide 42 Solving the problem with SPSS: Statistical comparison of political views - 2 Since the first quartile is the same as the 25th percentile and the third quartile is the 75th percentile, we can use the Percentiles table to detect excessive ties. We will use the row for Weighted Average (the SPSS default) as the calculation for percentiles. Tukey’s hinges are the percentiles used for box plots and may differ from other calculations for percentile. The first quartile for the group which saw an x-rated movie, 4.00, is the same value as the median for the group, 4.00, indicating excessive ties. The mode should be used as the measure of central tendency rather than the median.
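The excessive-ties check amounts to comparing the median with the two quartiles. The Python sketch below illustrates it; the function names and the sample scores are hypothetical. It also assumes that `statistics.quantiles` with `method="exclusive"` (which interpolates at position (n + 1)p) gives the same values as the SPSS Weighted Average row for unweighted data, which should be verified against real output before relying on it.

```python
import statistics


def quartile_summary(values):
    """Return (Q1, median, Q3) using interpolation at position (n + 1) * p."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q1, median, q3


def has_excessive_ties(q1, median, q3):
    """Excessive ties: the median equals the lower or upper bound of the IQR."""
    return median == q1 or median == q3


# Hypothetical ordinal scores with many responses at the middle category, 4.
tied_scores = [4, 4, 4, 4, 4, 5, 6]
q1, median, q3 = quartile_summary(tied_scores)
print(q1, median, q3, has_excessive_ties(q1, median, q3))

# The values reported on the slide: first quartile 4.00 and median 4.00.
print(has_excessive_ties(4.00, 4.00, 5.00))  # True (the 5.00 is a placeholder)
```

When the check returns True for either group, the mode replaces the median as the measure of central tendency for the comparison.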

SW388R6 Data Analysis and Computers I Slide 43 Generating more output for political views - 1 To generate the mode for political views, we use the Crosstabs procedure, which we used for sex. Click on the Dialog Recall tool button and select Crosstabs from the drop-down menu.

SW388R6 Data Analysis and Computers I Slide 44 Generating more output for political views - 2 Remove the variable sex from the Row(s) list box and move the variable polviews to the Row(s) list box. Since we only want to change the variable being analyzed and keep all of the options we previously specified, we click on the OK button.

SW388R6 Data Analysis and Computers I Slide 45 Solving the problem with SPSS: Statistical comparison of political views - 3 The mode for subjects who saw an x-rated movie is “4 MODERATE” because the largest percentage (37.8%) in the “1 YES” column is located on that row. The mode for subjects who did not see an x-rated movie is “4 MODERATE” because the largest percentage (41.7%) in the “0 NO” column is located on that row. The statement that survey respondents who had not seen an x-rated movie in the last year were most likely to describe their political views as moderate (41.7%), as were survey respondents who had seen an x-rated movie in the last year (37.8%) is correct.

SW388R6 Data Analysis and Computers I Slide 46 Generating output for church attendance - 1 After selecting Explore from the Dialog Recall menu, remove the variable polviews from the Dependent List and move the variable attend to the Dependent List. Since we only want to change the variable being analyzed and keep all of the options we previously specified, we click on the OK button.

SW388R6 Data Analysis and Computers I Slide 47 Solving the problem with SPSS: Comparison of church attendance - 1 Since attend is an ordinal level variable, we identify the medians and interquartile ranges for the two groups. We will compare these as our measures of central tendency and variability, provided that there are not excessive ties in either group for the medians of 3.00 and 2.00.

SW388R6 Data Analysis and Computers I Slide 48 Solving the problem with SPSS: Comparison of church attendance - 2 Since the first quartile is the same as the 25th percentile and the third quartile is the 75th percentile, we can use the Percentiles table to detect excessive ties. Neither the first quartile nor the third quartile matches the median for either group, so we will report the median and interquartile range. The statement that survey respondents who had not seen an x-rated movie in the last year attended religious services more often (Mdn = 3.00) than survey respondents who had seen an x-rated movie in the last year (Mdn = 2.00) is correct. The statement that the scores for frequency of attendance at religious services varied by the same amount for survey respondents who had not seen an x-rated movie in the last year (IQR = 5.00) compared to the scores for survey respondents who had seen an x-rated movie in the last year (IQR = 5.00) is also correct. Since all of the reported statistics were correctly chosen and reported, the answer to the overall problem is True.

SW388R6 Data Analysis and Computers I Slide 49 Logic for homework problems: Comparing central tendency and variability - 1 The logic for the problems in this assignment is the same as the logic for central tendency and variability except for the requirement at the end that the comparative statement must be correct as well as the reported statistical values. Flowchart: Is the number of valid and missing cases correct? If no, the answer is False. If yes, what is the measurement level of the variable: interval, ordinal, or nominal (dichotomous)?

SW388R6 Data Analysis and Computers I Slide 50 Logic for homework problems: Comparing central tendency and variability - 2 Flowchart for interval/ratio variables: Is the variable skewed? A variable is skewed if its skewness is not between -1.0 and +1.0. If the variable is not skewed, are the mean and standard deviation reported? If the variable is skewed, are the median and interquartile range reported? If not, the answer is False. If so, are the values and the comparison correct? If yes, the answer is True; if no, the answer is False. Mode reported? The mode is legitimate for interval variables, but not meaningful unless values are grouped. Homework problems do not include modes for interval variables.

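SW388R6 Data Analysis and Computers I Slide 51 Logic for homework problems: Comparing central tendency and variability - 3 Flowchart for ordinal variables: Are the mean and standard deviation reported? If yes, the answer is Inappropriate application of a statistic. If no, are the median and interquartile range reported? If yes, are there excessive ties? Excessive ties occur when the median is equal to either the lower or upper bound of the IQR. If the median and interquartile range are not reported, is the mode reported? If no, the answer is False. Finally, are the values and the comparison correct? If yes, the answer is True; if no, the answer is False.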

SW388R6 Data Analysis and Computers I Slide 52 Logic for homework problems: Comparing central tendency and variability - 4 Flowchart for nominal (dichotomous) variables: Are the mean and standard deviation reported? If yes, the answer is Inappropriate application of a statistic. Are the median and interquartile range reported? If yes, the answer is Inappropriate application of a statistic. Is the mode reported? If yes, are the values and the comparison correct? If yes, the answer is True; if no, the answer is False.
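The decision logic on the last four slides can be sketched as a small Python function. This is one reading of the flowcharts, not an official answer key: the function and parameter names are hypothetical, `"mean"` stands for mean and standard deviation, `"median"` for median and interquartile range, and the treatment of a median reported for an unskewed interval variable, or a mode reported for an ordinal variable without excessive ties, as False is an assumption.

```python
INAPPROPRIATE = "Inappropriate use of a statistic"


def expected_statistic(level, skewed=False, excessive_ties=False):
    """Statistic the slides call for, given the level of measurement."""
    if level == "interval":
        return "median" if skewed else "mean"
    if level == "ordinal":
        return "mode" if excessive_ties else "median"
    if level == "nominal":
        return "mode"
    raise ValueError(f"unknown level of measurement: {level!r}")


def grade(level, reported, counts_correct=True, values_correct=True,
          skewed=False, excessive_ties=False):
    """Return 'True', 'False', or INAPPROPRIATE for one reported comparison."""
    # Slide 49: wrong counts of valid and missing cases make the answer False.
    if not counts_correct:
        return "False"
    # Level of measurement violations.
    if reported == "mean" and level in ("ordinal", "nominal"):
        return INAPPROPRIATE
    if reported == "median" and level == "nominal":
        return INAPPROPRIATE
    # Wrong statistic without a level of measurement violation (assumed False).
    if reported != expected_statistic(level, skewed, excessive_ties):
        return "False"
    # Right statistic: the values and the comparison must also be correct.
    return "True" if values_correct else "False"


# The four variables in the sample problem, as worked through in the slides.
print(grade("interval", "mean"))                      # educ
print(grade("nominal", "mode"))                       # sex
print(grade("ordinal", "mode", excessive_ties=True))  # polviews
print(grade("ordinal", "median"))                     # attend
```

Since a problem is True only when every statement in it is correct, the overall answer would combine the results for all variables, with any single False or Inappropriate result deciding the outcome.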