1 2/24/2016Slide 1 The standard deviation statistic is challenging to present to our audiences. Statisticians often resort to the “empirical rule” to describe the standard deviation. The empirical rule states that approximately 68% of the cases in a normal distribution fall within one standard deviation of the mean, 95% of the cases in a normal distribution fall within two standard deviations of the mean, and 99.7% of all cases fall within three standard deviations of the mean. While distributions of real quantitative variables are usually not normal, the empirical rule has been demonstrated to be applicable if the distribution is “nearly normal.” The determination that a variable is “nearly normal” requires us to propose a set of criteria for determining the boundary between “nearly normal” and “not nearly normal.”
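The percentages quoted by the empirical rule can be reproduced from the normal distribution itself. The following is a minimal sketch in Python (the slides use SPSS; the function name `normal_coverage` is an illustrative choice, not something from the presentation). It relies on the standard identity that the proportion of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)).

```python
import math

def normal_coverage(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the coverage for 1, 2, and 3 standard deviations as percentages.
for k in (1, 2, 3):
    print(f"within {k} SD: {normal_coverage(k) * 100:.2f}%")
```

The exact figures are slightly different from the rounded 68%, 95%, and 99.7% used in the rule, which is why the rule is stated as "approximately."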

2 2/24/2016Slide 2 Like all of the criteria that we use in statistics, we will propose a criterion, recognize that it is really an approximation rather than a precise estimate, and hope that common sense will prevail in applying it. We have previously identified the criteria we will use for assessing the “nearly normal” condition: skewness, kurtosis, and extreme outliers. We will use our previous requirements for skewness and kurtosis (both between -1.0 and +1.0), but we will define outliers as cases that are three or more standard deviations from the mean (either above or below). The last criterion is derived from the empirical rule: if 99.7% of the cases in a normal distribution fall within three standard deviations of the mean, then those that fall outside three standard deviations must be relatively uncommon.

3 2/24/2016Slide 3 The requirement to compare the scores in a distribution to the mean plus or minus three standard deviations could lead to a lot of tedious arithmetic. Fortunately, there is a relatively easy substitute – converting the values in the distribution to “standard scores.” Standard scores convert the values of any distribution into the distance between each individual value and the mean of the distribution, in standard deviation units. Standardizing variables gives them a common unit of measure that makes it easy to compare scores across quantitative variables. For example, if I converted a student’s GRE score (e.g. 1100) and GPA (3.78) to standard scores, I would know which was further away from the mean for all students, and thus a higher measure of academic potential.
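As an illustration of standard scores, here is a small Python sketch. The data values are hypothetical, and the helper name `z_scores` is an assumption made for this example. It also assumes the sample standard deviation (n - 1 denominator), which is the statistic a descriptives table normally reports.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert each value to its distance from the mean in standard deviation units."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    return [(v - m) / s for v in values]

# Hypothetical scores, used only to show the calculation.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_scores(scores)
for raw, std in zip(scores, z):
    print(f"{raw:>4} -> z = {std:+.3f}")
```

After standardizing, the scores have a mean of 0 and a standard deviation of 1, so z-scores from different variables can be compared directly.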

4 2/24/2016Slide 4 SPSS will automatically convert any distribution to standard scores (also referred to as z-scores) and we can use the same formula over and over to identify outliers. Many procedures use standardized scores to present findings or diagnostics, e.g. we will analyze standardized residuals in regression analysis. If the original variable does not satisfy the “nearly normal condition,” we will re-express the data values as logarithms and squares to see if we can induce normality. If the transformation is successful at meeting the criteria for a nearly normal distribution, we will calculate the percentage of cases falling within 1 and 2 standard deviations of the mean and compare our findings to the percentage prescribed by the empirical rule.

5 2/24/2016Slide 5 In these problems, we will base our assessment of normality on more expanded criteria than we have used previously. Since we are concerned with determining probabilities or percentages based on the normal distribution, we are concerned with kurtosis as well as skewness. The height of the distribution as measured by kurtosis is related to the standard deviation and has an impact on the percentage of cases within one standard deviation of the mean and within two standard deviations of the mean. In the last assignment, we used a boxplot strategy to identify outliers. In this assignment, we will define outliers as cases falling outside three standard deviations of the mean.

6 SOLVING HOMEWORK PROBLEMS The Empirical Rule states that about 68% of the values will fall within 1 standard deviation of the mean and 95% of the values will fall within 2 standard deviations of the mean, provided the variable satisfies the nearly normal condition that the distribution is unimodal and symmetric. There are numerous statistical tests and graphic methods for evaluating the normality of a distribution. In these problems, we will use a simple rule of thumb that states that the distribution of the variable is reasonably normal if both skewness and kurtosis of the distribution are between -1.0 and +1.0 and there are no outliers, i.e. no cases three or more standard deviations below the mean or three or more standard deviations above the mean. Slide 6
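The rule of thumb above can be sketched in Python as follows. This is not the SPSS implementation. The skewness and kurtosis functions use the bias-adjusted sample formulas commonly reported by statistical packages, which is an assumption here. All function names are illustrative, and treating the -1.0 and +1.0 boundaries as inclusive is also an assumption, since the slides do not say.

```python
from statistics import mean, stdev

def skewness(values):
    """Bias-adjusted sample skewness (assumed to match the package output)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)

def kurtosis(values):
    """Bias-adjusted sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    z4 = sum(((v - m) / s) ** 4 for v in values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * z4 - correction

def count_outliers(values, cutoff=3.0):
    """Number of cases whose standard score is at or beyond +/- cutoff."""
    m, s = mean(values), stdev(values)
    return sum(1 for v in values if abs((v - m) / s) >= cutoff)

def nearly_normal(values):
    """Apply the three criteria: skewness, kurtosis, and no extreme outliers."""
    return (-1.0 <= skewness(values) <= 1.0
            and -1.0 <= kurtosis(values) <= 1.0
            and count_outliers(values) == 0)
```

For example, `nearly_normal([1, 2, 3, 4, 5])` returns False: the data are perfectly symmetric, but the kurtosis of -1.2 falls outside the allowed range.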

7 If the distribution satisfies the nearly normal condition, we will test whether or not the percentages specified by the empirical rule hold for the variable. We will consider the rule to be satisfied if the actual percentage of values falls within 2% of the proportion indicated by the empirical rule. If the distribution does not satisfy the nearly normal condition, we will examine the impact on the normality assumption when the distribution is re-expressed by computing the logarithm of the values if the variable is skewed to the right. If the variable is negatively skewed, we will square the values and examine the impact on the normality assumption. If the transformation is successful at meeting the criteria for a nearly normal distribution, we will calculate the percentage of cases falling within 1 and 2 standard deviations of the mean and compare the actual percentage to the percentage prescribed by the empirical rule. Slide 7
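The choice of re-expression described above can be sketched as a small Python function. The name `re_express` is hypothetical, and the base-10 logarithm is an assumption (the slides say only "logarithm"). The log branch requires all values to be positive.

```python
import math

def re_express(values, skew):
    """Re-express a variable based on the direction of its skewness.

    Positive skew (skewed to the right): take logarithms (base 10 assumed).
    Negative skew (skewed to the left): square the values.
    """
    if skew > 0:
        return [math.log10(v) for v in values]  # values must be positive
    return [v ** 2 for v in values]

print(re_express([1, 10, 100], skew=1.3))   # log transformation
print(re_express([1, 2, 3], skew=-1.21))    # square transformation
```

The re-expressed values would then be checked against the same skewness, kurtosis, and outlier criteria as the original variable.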

8 The Normal Distribution and the Empirical Rule Homework Problems

9 2/24/2016Slide 9 The Normal Distribution and the Empirical Rule - 1 These problems include a series of narrative statements, but no table. APA guidelines suggest that a table not be used if it would contain information for only a single variable. The notes provide information about the data set to use (world2003.sav), the variable used in the problem, literacy, and the formulas for re-expression if needed.

10 2/24/2016Slide 10 The Normal Distribution and the Empirical Rule - 2 The first paragraph asks about the number of cases in the data set, the number of cases missing data, the number of valid cases, and describes the analysis to be conducted.

11 2/24/2016Slide 11 The Normal Distribution and the Empirical Rule - 3 To compute the descriptive statistics and standard scores, select the Descriptive Statistics > Descriptives command from the Analyze menu. For these problems, outliers will be defined by standard scores (z scores). In SPSS, standard scores are created by the Descriptives procedure and not the Explore procedure, so we use the Descriptives command.

12 2/24/2016Slide 12 The Normal Distribution and the Empirical Rule - 4 Move the variable for the analysis literacy to the Variable(s) list box. Click on the Options button to select optional statistics.

13 2/24/2016Slide 13 The Normal Distribution and the Empirical Rule - 5 The check boxes for Mean and Std. Deviation are already marked by default. Mark the Kurtosis and Skewness check boxes. This will provide the statistics for assessing normality. Click on the Continue button to close the dialog box.

14 2/24/2016Slide 14 The Normal Distribution and the Empirical Rule - 6 Mark the check box Save standardized values as variables. Click on the OK button to produce the output.

15 2/24/2016Slide 15 The Normal Distribution and the Empirical Rule - 7 The descriptives output does not produce a case processing summary. It only tells us the number of valid cases for the problem, 131.

16 2/24/2016Slide 16 The Normal Distribution and the Empirical Rule - 8 We can obtain the total number of cases in the data set, 192, by scrolling to the last row of data in the SPSS Data Editor.

17 2/24/2016Slide 17 The Normal Distribution and the Empirical Rule - 9 We enter the total number of cases listed in the Data Editor. We enter the number of valid cases from the Descriptives output table. We compute the number of cases excluded because of missing values from the other two pieces of information: 192 – 131 = 61 cases were excluded.
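The same bookkeeping can be sketched in Python. The helper `case_counts` and the use of `None` to mark a missing value are assumptions for illustration. The last lines repeat the arithmetic from the slide using its reported counts.

```python
def case_counts(column):
    """Return (total, valid, missing) counts for a column where None marks missing data."""
    total = len(column)
    valid = sum(1 for v in column if v is not None)
    return total, valid, total - valid

# Hypothetical column with two missing values.
print(case_counts([99.0, None, 45.5, 87.0, None]))

# The counts reported on the slides: 192 total cases, 131 valid cases.
total_cases, valid_cases = 192, 131
missing_cases = total_cases - valid_cases
print(missing_cases)
```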

18 2/24/2016Slide 18 The Normal Distribution and the Empirical Rule - 10 The first sentence in the next paragraph asks us to evaluate three criteria for the “nearly normal” condition: skewness, kurtosis, and outliers three or more standard deviations from the mean.

19 2/24/2016Slide 19 The Normal Distribution and the Empirical Rule - 11 The first item in this sentence asks us to enter and characterize the degree and direction of the skewness for the distribution.

20 2/24/2016Slide 20 The Normal Distribution and the Empirical Rule - 12 We enter the value of skewness from the table of descriptive statistics. Since the skewness is negative, we characterize it as skewness to the left. Since skewness (-1.21) is smaller than -1.0, we characterize it as badly skewed.

21 2/24/2016Slide 21 The Normal Distribution and the Empirical Rule - 13 The second part of the sentence asks us to enter and characterize the kurtosis of the distribution.

22 2/24/2016Slide 22 The Normal Distribution and the Empirical Rule - 14 Since the kurtosis is positive, we characterize it as peaked. Since the kurtosis is less than 1.0, we characterize it as slightly more peaked. We enter the value of kurtosis from the table of descriptive statistics.

23 2/24/2016Slide 23 The Normal Distribution and the Empirical Rule - 15 The next sentence asks us to enter the mean and to identify the number of outliers that were three or more standard deviations from the mean. These will be cases that have a standard score equal to or less than -3.0 or equal to or greater than +3.0.

24 2/24/2016Slide 24 The Normal Distribution and the Empirical Rule - 16 In previous examples, we identified the number of outliers by creating a variable and doing a frequency distribution on it. In this example, we will count the number of outliers by sorting the column of data values. First, click on the column header for the variable (Zliteracy) containing the standard scores to select the column of data. Second, right click on the column header (Zliteracy) and select Sort Ascending from the popup menu. This will show any negative outliers at the top of the column.

25 2/24/2016Slide 25 The Normal Distribution and the Empirical Rule - 17 Scroll down in the data editor, past the cases with missing values. We see two negative values less than or equal to -3.0: -3.20711 and -3.12964. These two cases are outliers.

26 2/24/2016Slide 26 The Normal Distribution and the Empirical Rule - 18 Click the right mouse button again on the column header for Zliteracy, and select Sort Descending from the pop-up menu. This will show any positive outliers at the top of the column.

27 2/24/2016Slide 27 The Normal Distribution and the Empirical Rule - 19 With the data for Zliteracy sorted in descending order, we see that the largest standard score was 1.0037. Since this is less than +3.0, there are no outliers at the positive end of the distribution.

28 2/24/2016Slide 28 The Normal Distribution and the Empirical Rule - 20 We enter the number of outliers found in the column of standard scores: 2 We complete the sentence by entering the value for the mean from the table of descriptive statistics.
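The sort-and-inspect procedure for counting outliers can be sketched in Python as follows. The function name `find_outliers` is illustrative, and `None` stands in for missing values. The list mixes the three standard scores quoted on the slides (-3.20711, -3.12964, and 1.0037) with hypothetical filler values.

```python
def find_outliers(z_values, cutoff=3.0):
    """Return (low, high): standard scores at or below -cutoff and at or above +cutoff."""
    valid = sorted(z for z in z_values if z is not None)  # skip missing, sort ascending
    low = [z for z in valid if z <= -cutoff]
    high = [z for z in valid if z >= cutoff]
    return low, high

# Three scores from the slides plus hypothetical values and missing cases.
z_literacy = [None, -3.20711, 0.25, -3.12964, 1.0037, None, -0.5]
low, high = find_outliers(z_literacy)
print(low, high, len(low) + len(high))
```

Filtering on the cutoff gives the same count as sorting the column in each direction and reading the extreme values from the top.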

29 2/24/2016Slide 29 The Normal Distribution and the Empirical Rule - 21 The next sentence asks us to evaluate the criteria for a nearly normal distribution: skewness and kurtosis between -1.0 and +1.0 and no outliers three or more standard deviations from the mean.

30 2/24/2016Slide 30 The Normal Distribution and the Empirical Rule - 22 Since the distribution was badly skewed to the left and contained two outliers with z-scores more than 3 standard deviations below the mean, it is not nearly normal. The next question in the sentence focuses on our remedy for a distribution that is not nearly normal.

31 2/24/2016Slide 31 The Normal Distribution and the Empirical Rule - 23 Since the skewness for this variable was negative, we select re-expressed with squares.

32 2/24/2016Slide 32 The Normal Distribution and the Empirical Rule - 24 The next statement assumes that we have re-expressed the data and are testing whether or not the re-expressed data has a nearly normal distribution. We will defer selecting an answer to this question until we have evaluated the individual criteria. The formula for re-expressing the data is given in the second note.

33 2/24/2016Slide 33 The Normal Distribution and the Empirical Rule - 25 The first item in the second sentence asks us to enter and characterize the degree and direction of the skewness for the distribution. In order to answer this question, we first re-express the variable.

34 2/24/2016Slide 34 The Normal Distribution and the Empirical Rule - 26 To compute the transformed variable, select the Compute command from the Transform menu.

35 2/24/2016Slide 35 The Normal Distribution and the Empirical Rule - 27 In the Compute Variable dialog box, we type the name for the new variable, SQ_literacy, in the Target Variable text box. In the Numeric Expression text box, type the formula as shown in note 2. My convention for naming transformed variables is to add the variable name to the letters LG_ for a log transformation and SQ_ for a square transformation. This helps me keep the relationship between the variables clear. Click on the OK button to compute the transformed variable.

36 2/24/2016Slide 36 The Normal Distribution and the Empirical Rule - 28 Scroll the data editor window to the right to see the transformed variable, SQ_literacy.

37 2/24/2016Slide 37 The Normal Distribution and the Empirical Rule - 29 To calculate the descriptive statistics so we can check the normality conditions for the transformed variable, click on the Recall Recently Used Dialogs tool button, and select Descriptives.

38 2/24/2016Slide 38 The Normal Distribution and the Empirical Rule - 30 Since we want the same statistics that we computed for the variable literacy, we only need to replace the variable literacy with SQ_literacy. Be sure the check box for saving standardized values remains checked so that Descriptives will compute standard scores for SQ_literacy. Click on the OK button to produce the output.

39 2/24/2016Slide 39 The Normal Distribution and the Empirical Rule - 31 The square transformation of literacy [SQ_literacy] satisfied the skewness and kurtosis criteria for a nearly normal distribution. The skewness of the distribution (-0.65) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.73) was between -1.0 and +1.0. Next, we will check for outliers that had a standard score less than or equal to -3.0 or greater than or equal to +3.0.

40 2/24/2016Slide 40 The Normal Distribution and the Empirical Rule - 32 Since the skewness is negative, we characterize it as skewness to the left. Since skewness (-.64) is greater than -1.0, we characterize it as slightly skewed. We enter the value of skewness from the table of descriptive statistics.

41 2/24/2016Slide 41 The Normal Distribution and the Empirical Rule - 33 The second part of the sentence asks us to enter and characterize the kurtosis of the distribution.

42 2/24/2016Slide 42 The Normal Distribution and the Empirical Rule - 34 Since the kurtosis is negative, we characterize it as flat. Since the kurtosis is greater than -1.0, we characterize it as slightly flatter. We enter the value of kurtosis from the table of descriptive statistics.

43 2/24/2016Slide 43 The Normal Distribution and the Empirical Rule - 35 The final part of the sentence asks us to identify the number of outliers that were three or more standard deviations from the mean. These will be cases that have a standard score of the re-expressed variable equal to or less than -3.0 or equal to or greater than +3.0.

44 2/24/2016Slide 44 The Normal Distribution and the Empirical Rule - 36 To identify outliers, we examine ZSQ_literacy, which the Descriptives procedure added to the data set. When we sort ZSQ_literacy in ascending order, we see that there are no outliers with standard scores less than or equal to -3.0.

45 2/24/2016Slide 45 The Normal Distribution and the Empirical Rule - 37 When we sort ZSQ_literacy in descending order, we see that there are no outliers with standard scores greater than or equal to +3.0. The distribution of the re-expressed variable does not include any outliers.

46 2/24/2016Slide 46 The Normal Distribution and the Empirical Rule - 38 We enter a zero for the number of extreme outliers. Since the re-expressed variable met all of the criteria for being nearly normal, we can now choose satisfied from the drop-down list. Note: if the original distribution is “nearly normal”, re-expression is not necessary and the answers to all of the questions in this paragraph are na.

47 2/24/2016Slide 47 The Normal Distribution and the Empirical Rule - 39 The final paragraph compares the actual percentage within one and two standard deviations to the percent predicted by the Empirical Rule. We will create two new variables: one to represent the number of cases within 1 standard deviation, and a second to represent the number of cases within two standard deviations. A frequency distribution of these variables will tell us the actual percentages within these ranges.

48 2/24/2016Slide 48 The Normal Distribution and the Empirical Rule - 40 We will create a new variable that will have a value of 1 if the standard score is within 1 standard deviation of the mean, and 0 if it has a value outside this range. To compute the new variable, select the Compute command from the Transform menu.

49 2/24/2016Slide 49 The Normal Distribution and the Empirical Rule - 41 We will name the new variable within1sd, selecting a name which describes its contents. Type the formula as shown in the Numeric Expression text box. The formula will assign within1sd a value of 1 if the standard score for the square transformation of literacy is greater than or equal to -1.0 and less than or equal to +1.0. If the value is not between -1.0 and +1.0, within1sd will be assigned a 0. Click on the OK button to create the new variable. Note: If re-expression was not needed, Zliteracy would be used in the formula instead of ZSQ_literacy.

50 2/24/2016Slide 50 The Normal Distribution and the Empirical Rule - 42 Scroll down in data view to see values of 0 and 1 for within1sd. When the standard scores for SQ_literacy are larger than 1.0 or smaller than -1.0, within1sd is assigned the value of 0. When the standard scores for SQ_literacy are greater than or equal to -1.0 and less than or equal to 1.0, within1sd is assigned the value of 1.

51 2/24/2016Slide 51 The Normal Distribution and the Empirical Rule - 43 To find the percentage of cases that have a standard score between -1.0 and +1.0 (within1sd = 1), we will run a frequency distribution on within1sd. To create the frequency distribution, select Descriptive Statistics > Frequencies from the Analyze menu.

52 2/24/2016Slide 52 The Normal Distribution and the Empirical Rule - 44 First, move the variable within1sd to the Variable(s) list box. Second, click on the OK button to produce the output.

53 2/24/2016Slide 53 The Normal Distribution and the Empirical Rule - 45 61.1% of the cases fall within one standard deviation of the mean. In this example, the Empirical Rule percentage of 68% overstates the percentage within one standard deviation.
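The indicator variable and its frequency tally can be sketched in Python. The function name `percent_within` is illustrative, the z-scores are hypothetical, and `None` stands in for a missing value. Like the valid percent in a frequency table, the calculation excludes missing cases.

```python
def percent_within(z_values, k):
    """Percentage of valid cases whose standard score lies between -k and +k, inclusive."""
    valid = [z for z in z_values if z is not None]        # drop missing cases
    flags = [1 if -k <= z <= k else 0 for z in valid]     # 1 = within k SD, 0 = outside
    return 100 * sum(flags) / len(flags)

# Hypothetical standard scores, used only to show the calculation.
z_example = [-2.5, -1.0, -0.3, 0.0, 0.8, 1.0, 1.4, 2.1, None]
print(percent_within(z_example, 1))
print(percent_within(z_example, 2))
```

Passing `k=2` gives the equivalent of the within2sd variable, so one function covers both checks.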

54 2/24/2016Slide 54 The Normal Distribution and the Empirical Rule - 46 The 61.1 percent of cases actually within 1 standard deviation of the mean is entered in the problem statement. The second sentence focuses on the percentage of cases within two standard deviations. Note: if re-expression did not result in a nearly normal distribution, the answer to this question would be na.

55 2/24/2016Slide 55 The Normal Distribution and the Empirical Rule - 47 We will create a second new variable that will have a value of 1 if the standard score is within 2 standard deviations of the mean, and 0 if it has a value outside this range. To compute the new variable, select the Compute Variable command from the Recall Dialog pop-up menu.

56 2/24/2016Slide 56 The Normal Distribution and the Empirical Rule - 48 Replace the variable name “within1sd” with the name “within2sd”. Replace the criteria of -1.0 with -2.0 and replace +1.0 with +2.0. Note: If re-expression was not needed, Zliteracy would be used in the formula instead of ZSQ_literacy.

57 2/24/2016Slide 57 The Normal Distribution and the Empirical Rule - 49 Scroll down in data view to see values of 0 and 1 for within2sd. When the standard scores for SQ_literacy are less than -2.0 (or larger than +2.0), within2sd is assigned the value of 0. When the standard scores for SQ_literacy are between -2.0 and +2.0, within2sd is assigned the value of 1.

58 2/24/2016Slide 58 The Normal Distribution and the Empirical Rule - 50 We will request a second frequency distribution to tally within2sd. To request the frequency distribution, select the Frequencies command from the Recall Dialog pop-up menu.

59 2/24/2016Slide 59 The Normal Distribution and the Empirical Rule - 51 First, remove the variable within1sd from the Variable(s) list box and move the variable within2sd into the list box. Second, click on the OK button to produce the output.

60 2/24/2016Slide 60 The Normal Distribution and the Empirical Rule - 52 96.2% of the cases fall within two standard deviations of the mean, compared to the 95% predicted by the empirical rule.
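The comparison against the empirical rule can be sketched with the two percentages reported on the slides. The function name `rule_satisfied` is illustrative. Reading "within 2%" as two percentage points is an assumption about how the tolerance is meant to be applied.

```python
def rule_satisfied(actual_pct, expected_pct, tolerance=2.0):
    """True if the actual percentage is within `tolerance` percentage points of the expected one."""
    return abs(actual_pct - expected_pct) <= tolerance

# Percentages reported on the slides for the re-expressed variable.
print(rule_satisfied(61.1, 68.0))  # within one standard deviation
print(rule_satisfied(96.2, 95.0))  # within two standard deviations
```

Under this reading, the one-standard-deviation figure misses the tolerance (a gap of 6.9 points), while the two-standard-deviation figure meets it (a gap of 1.2 points).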

61 2/24/2016Slide 61 The Normal Distribution and the Empirical Rule - 53 The 96.2 percent of cases actually within 2 standard deviations of the mean is entered in the final blank in the problem statement. Click on the Submit button to grade the problem.

62 2/24/2016Slide 62 The Normal Distribution and the Empirical Rule - 54 The green shading on the answers indicates that all were correct.

