Assumption of normality

Slides:



Advertisements
Similar presentations
Descriptive Statistics-II
Advertisements

Computing Transformations
4/4/2015Slide 1 SOLVING THE PROBLEM A one-sample t-test of a population mean requires that the variable be quantitative. A one-sample test of a population.
4/12/2015Slide 1 We have seen that skewness affects the way we describe the central tendency and variability of a quantitative variable: if a distribution.
SW388R6 Data Analysis and Computers I Slide 1 Testing Assumptions of Linear Regression Detecting Outliers Transforming Variables Logic for testing assumptions.
SW388R6 Data Analysis and Computers I Slide 1 Paired-Samples T-Test of Population Mean Differences Key Points about Statistical Test Sample Homework Problem.
One-sample T-Test of a Population Mean
5/15/2015Slide 1 SOLVING THE PROBLEM The one sample t-test compares two values for the population mean of a single variable. The two-sample test of a population.
Strategy for Complete Regression Analysis
Assumption of normality
Detecting univariate outliers Detecting multivariate outliers
Chi-square Test of Independence
Principal component analysis
Multiple Regression – Assumptions and Outliers
Multiple Regression – Basic Relationships
SW388R7 Data Analysis & Computers II Slide 1 Computing Transformations Transforming variables Transformations for normality Transformations for linearity.
Regression Analysis We have previously studied the Pearson’s r correlation coefficient and the r2 coefficient of determination as measures of association.
Assumption of Homoscedasticity
SW388R6 Data Analysis and Computers I Slide 1 One-sample T-test of a Population Mean Confidence Intervals for a Population Mean.
Slide 1 Detecting Outliers Outliers are cases that have an atypical score either for a single variable (univariate outliers) or for a combination of variables.
Testing Assumptions of Linear Regression
Logistic Regression – Complete Problems
Slide 1 Testing Multivariate Assumptions The multivariate statistical techniques which we will cover in this class require one or more the following assumptions.
8/9/2015Slide 1 The standard deviation statistic is challenging to present to our audiences. Statisticians often resort to the “empirical rule” to describe.
SW388R7 Data Analysis & Computers II Slide 1 Assumption of normality Transformations Assumption of normality script Practice problems.
SW388R7 Data Analysis & Computers II Slide 1 Multiple Regression – Basic Relationships Purpose of multiple regression Different types of multiple regression.
SW388R7 Data Analysis & Computers II Slide 1 Multiple Regression – Split Sample Validation General criteria for split sample validation Sample problems.
Assumption of linearity
Assumptions of multiple regression
SW388R7 Data Analysis & Computers II Slide 1 Analyzing Missing Data Introduction Problems Using Scripts.
SW388R6 Data Analysis and Computers I Slide 1 Chi-square Test of Goodness-of-Fit Key Points for the Statistical Test Sample Homework Problem Solving the.
8/15/2015Slide 1 The only legitimate mathematical operation that we can use with a variable that we treat as categorical is to count the number of cases.
Sampling Distribution of the Mean Problem - 1
SW318 Social Work Statistics Slide 1 Estimation Practice Problem – 1 This question asks about the best estimate of the mean for the population. Recall.
Slide 1 SOLVING THE HOMEWORK PROBLEMS Simple linear regression is an appropriate model of the relationship between two quantitative variables provided.
8/20/2015Slide 1 SOLVING THE PROBLEM The two-sample t-test compare the means for two groups on a single variable. the The paired t-test compares the means.
SW388R7 Data Analysis & Computers II Slide 1 Logistic Regression – Hierarchical Entry of Variables Sample Problem Steps in Solving Problems.
8/23/2015Slide 1 The introductory statement in the question indicates: The data set to use: GSS2000R.SAV The task to accomplish: a one-sample test of a.
SW388R7 Data Analysis & Computers II Slide 1 Assumption of Homoscedasticity Homoscedasticity (aka homogeneity or uniformity of variance) Transformations.
Hierarchical Binary Logistic Regression
Chi-Square Test of Independence Practice Problem – 1
Stepwise Multiple Regression
Slide 1 Hierarchical Multiple Regression. Slide 2 Differences between standard and hierarchical multiple regression  Standard multiple regression is.
SW388R7 Data Analysis & Computers II Slide 1 Logistic Regression – Hierarchical Entry of Variables Sample Problem Steps in Solving Problems Homework Problems.
6/2/2016Slide 1 To extend the comparison of population means beyond the two groups tested by the independent samples t-test, we use a one-way analysis.
SW388R6 Data Analysis and Computers I Slide 1 Independent Samples T-Test of Population Means Key Points about Statistical Test Sample Homework Problem.
SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample.
6/4/2016Slide 1 The one sample t-test compares two values for the population mean of a single variable. The two-sample t-test of population means (aka.
SW388R6 Data Analysis and Computers I Slide 1 Multiple Regression Key Points about Multiple Regression Sample Homework Problem Solving the Problem with.
11/4/2015Slide 1 SOLVING THE PROBLEM Simple linear regression is an appropriate model of the relationship between two quantitative variables provided the.
11/19/2015Slide 1 We can test the relationship between a quantitative dependent variable and two categorical independent variables with a two-factor analysis.
Chi-square Test of Independence
SW388R7 Data Analysis & Computers II Slide 1 Hierarchical Multiple Regression Differences between hierarchical and standard multiple regression Sample.
SW318 Social Work Statistics Slide 1 One-way Analysis of Variance  1. Satisfy level of measurement requirements  Dependent variable is interval (ordinal)
SW318 Social Work Statistics Slide 1 Percentile Practice Problem (1) This question asks you to use percentile for the variable [marital]. Recall that the.
SW388R6 Data Analysis and Computers I Slide 1 Percentiles and Standard Scores Sample Percentile Homework Problem Solving the Percentile Problem with SPSS.
SW388R7 Data Analysis & Computers II Slide 1 Detecting Outliers Detecting univariate outliers Detecting multivariate outliers.
12/23/2015Slide 1 The chi-square test of independence is one of the most frequently used hypothesis tests in the social sciences because it can be used.
1/5/2016Slide 1 We will use a one-sample test of proportions to test whether or not our sample proportion supports the population proportion from which.
1/11/2016Slide 1 Extending the relationships found in linear regression to a population is procedurally similar to what we have done for t-tests and chi-square.
1/23/2016Slide 1 We have seen that skewness affects the way we describe the central tendency and variability of a quantitative variable: if a distribution.
2/24/2016Slide 1 The standard deviation statistic is challenging to present to our audiences. Statisticians often resort to the “empirical rule” to describe.
SW388R7 Data Analysis & Computers II Slide 1 Principal component analysis Strategy for solving problems Sample problem Steps in principal component analysis.
(Slides not created solely by me – the internet is a wonderful tool) SW388R7 Data Analysis & Compute rs II Slide 1.
SW388R7 Data Analysis & Computers II Slide 1 Assumption of linearity Strategy for solving problems Producing outputs for evaluating linearity Assumption.
Evaluating Univariate Normality
Computing Transformations
Strategy for Complete discriminant Analysis
Multiple Regression – Split Sample Validation
Chapter Nine: Using Statistics to Answer Questions
Presentation transcript:

Assumption of normality Transformations Assumption of normality script Practice problems

Assumption of Normality Many of the statistical methods that we will apply require the assumption that a variable or variables are normally distributed. With multivariate statistics, the assumption is that the combination of variables follows a multivariate normal distribution. Since there is not a direct test for multivariate normality, we generally test each variable individually and assume that they are multivariate normal if they are individually normal, though this is not necessarily the case.

Evaluating normality There are both graphical and statistical methods for evaluating normality. Graphical methods include the histogram and normality plot. Statistical methods include diagnostic hypothesis tests for normality, and a rule of thumb that says a variable is reasonably close to normal if its skewness and kurtosis have values between –1.0 and +1.0. None of the methods is absolutely definitive.

Transformations When a variable is not normally distributed, we can create a transformed variable and test it for normality. If the transformed variable is normally distributed, we can substitute it in our analysis. Three common transformations are: the logarithmic transformation, the square root transformation, and the inverse transformation. All of these change the measuring scale on the horizontal axis of a histogram to produce a transformed variable that is mathematically equivalent to the original variable.

When transformations do not work When none of the transformations induces normality in a variable, including that variable in the analysis will reduce our effectiveness at identifying statistical relationships, i.e. we lose power. We do have the option of changing the way the information in the variable is represented, e.g. substitute several dichotomous variables for a single metric variable.

Problem 1 In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance. Based on a diagnostic hypothesis test of normality, total hours spent on the Internet is normally distributed. 1. True 2. True with caution 3. False 4. Incorrect application of a statistic

Computing “Explore” descriptive statistics To compute the statistics needed for evaluating the normality of a variable, select the Explore… command from the Descriptive Statistics menu.

Adding the variable to be evaluated Second, click on right arrow button to move the highlighted variable to the Dependent List. First, click on the variable to be included in the analysis to highlight it.

Selecting statistics to be computed To select the statistics for the output, click on the Statistics… command button.

Including descriptive statistics First, click on the Descriptives checkbox to select it. Clear the other checkboxes. Second, click on the Continue button to complete the request for statistics.

Selecting charts for the output To select the diagnostic charts for the output, click on the Plots… command button.

Including diagnostic plots and statistics First, click on the None option button on the Boxplots panel since boxplots are not as helpful as other charts in assessing normality. Finally, click on the Continue button to complete the request. Second, click on the Normality plots with tests checkbox to include normality plots and the hypothesis tests for normality. Third, click on the Histogram checkbox to include a histogram in the output. You may want to examine the stem-and-leaf plot as well, though I find it less useful.

Completing the specifications for the analysis Click on the OK button to complete the specifications for the analysis and request SPSS to produce the output.

The histogram An initial impression of the normality of the distribution can be gained by examining the histogram. In this example, the histogram shows a substantial violation of normality caused by a extremely large value in the distribution.

The normality plot The problem with the normality of this variable’s distribution is reinforced by the normality plot. If the variable were normally distributed, the red dots would fit the green line very closely. In this case, the red points in the upper right of the chart indicate the severe skewing caused by the extremely large data values.

The test of normality Problem 1 asks about the results of the test of normality. Since the sample size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample size were 50 or less, we would use the Shapiro-Wilk statistic instead. The null hypothesis for the test of normality states that the actual distribution of the variable is equal to the expected distribution, i.e., the variable is normally distributed. Since the probability associated with the test of normality is < 0.001 is less than or equal to the level of significance (0.01), we reject the null hypothesis and conclude that total hours spent on the Internet is not normally distributed. (Note: we report the probability as <0.001 instead of .000 to be clear that the probability is not really zero.) The answer to problem 1 is false.

The assumption of normality script An SPSS script to produce all of the output that we have produced manually is available on the course web site. After downloading the script, run it to test the assumption of linearity. Select Run Script… from the Utilities menu.

Selecting the assumption of normality script First, navigate to the folder containing your scripts and highlight the NormalityAssumptionAndTransformations.SBS script. Second, click on the Run button to activate the script.

Specifications for normality script First, move variables from the list of variables in the data set to the Variables to Test list box. The default output is to do all of the transformations of the variable. To exclude some transformations from the calculations, clear the checkboxes. Third, click on the OK button to run the script.

The test of normality The script produces the same output that we computed manually, in this example, the tests of normality.

Problem 2 In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Based on the rule of thumb for the allowable magnitude of skewness and kurtosis, total hours spent on the Internet is normally distributed. 1. True 2. True with caution 3. False 4. Incorrect application of a statistic

Table of descriptive statistics To answer problem 2, we look at the values for skewness and kurtosis in the Descriptives table. The skewness and kurtosis for the variable both exceed the rule of thumb criteria of 1.0. The variable is not normally distributed. The answer to problem 2 if false.

Problem 3 In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Use 0.01 as the level of significance. Based on a diagnostic hypothesis test of normality, "total hours spent on the Internet" is not normally distributed. A logarithmic transformation of "total hours spent on the Internet" results in a variable that is normally distributed. 1. True 2. True with caution 3. False 4. Incorrect application of a statistic

The test of normality Problem 3 specifically asks about the results of the test of normality for the logarithmic transformation. Since our sample size is larger than 50, we use the Kolmogorov-Smirnov test. The null hypothesis for the Kolmogorov-Smirnov test of normality states that the actual distribution of the transformed variable is equal to the expected distribution, i.e., the transformed variable is normally distributed. Since the probability associated with the test of normality (0.200) is greater than the level of significance, we fail to reject the null hypothesis and conclude that the logarithmic transformation of total hours spent on the Internet is normally distributed. The answer to problem 3 is true.

Other problems on assumption of normality A problem may ask about the assumption of normality for a nominal level variable. The answer will be “An inappropriate application of a statistic” since there is no expectation that a nominal variable be normal. A problem may ask about the assumption of normality for an ordinal level variable. If the variable or transformed variable is normal, the correct answer to the question is “True with caution” since we may be required to defend treating an ordinal variable as metric. Questions will specify a level of significance to use and the statistical evidence upon which you should base your answer.

Steps in answering questions about the assumption of normality – question 1 The following is a guide to the decision process for answering problems about the normality of a variable: Incorrect application of a statistic Yes No Is the variable to be evaluated metric? Does the statistical evidence support normality assumption? No False Yes Are any of the metric variables ordinal level? No True Yes True with caution

Steps in answering questions about the assumption of normality – question 2 The following is a guide to the decision process for answering problems about the normality of a transformation: Incorrect application of a statistic Yes No Is the variable to be evaluated metric? Statistical evidence for transformation supports normality? Statistical evidence supports normality? No No False Yes Either variable ordinal level? No True Yes True with caution