Slide 1 Statistics Workshop Tutorial 6 Measures of Relative Standing Exploratory Data Analysis
Slide 2 Copyright © 2004 Pearson Education, Inc. Created by Tom Wegleitner, Centreville, Virginia Section 2-6 Measures of Relative Standing
Slide 3 Definition z Score (or standard score): the number of standard deviations that a given value x is above or below the mean.
Slide 4 Measures of Position: z score Sample: $z = \frac{x - \bar{x}}{s}$ Population: $z = \frac{x - \mu}{\sigma}$ Round z to 2 decimal places.
Slide 5 Interpreting Z Scores Whenever a value is less than the mean, its corresponding z score is negative. Ordinary values: z score between –2 and 2 sd. Unusual values: z score < –2 sd or z score > 2 sd. FIGURE 2-14
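A minimal sketch of the sample z score and the ordinary/unusual rule from the slides. The function names `z_score` and `is_unusual` and the `heights` data are made up for illustration; they are not part of the original tutorial.

```python
import statistics

def z_score(x, values):
    """Sample z score of x: (x - mean) / s, rounded to 2 decimal places."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return round((x - mean) / s, 2)

def is_unusual(z):
    """A value is unusual if its z score is below -2 or above 2."""
    return z < -2 or z > 2

heights = [60, 62, 64, 66, 68]  # hypothetical sample, mean 64
for x in (60, 64, 72):
    z = z_score(x, heights)
    print(x, z, "unusual" if is_unusual(z) else "ordinary")
```

A value below the mean (60 here) gets a negative z score, matching the rule on the slide.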
Slide 6 Definition Q 1 (First Quartile) separates the bottom 25% of sorted values from the top 75%. Q 2 (Second Quartile) same as the median; separates the bottom 50% of sorted values from the top 50%. Q 3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%.
Slide 7 Quartiles Q 1, Q 2, Q 3 divide ranked scores into four equal parts, each holding 25% of the values: (minimum) | 25% | Q 1 | 25% | Q 2 (median) | 25% | Q 3 | 25% | (maximum)
Slide 8 Percentiles Just as there are quartiles separating data into four parts, there are 99 percentiles denoted P 1, P 2,... P 99, which partition the data into 100 groups.
Slide 9 Finding the Percentile of a Given Score $\text{percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$
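The percentile formula can be sketched as a short function. The name `percentile_of_value` and the `scores` list are hypothetical; the result is left unrounded here because the slide does not state a rounding rule.

```python
def percentile_of_value(x, values):
    """Percentile of x: 100 * (number of values less than x) / (total number of values)."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]  # hypothetical data, n = 10
print(percentile_of_value(8, scores))  # five of the ten values are below 8
```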
From Percentile to Data Value What score is at the kth percentile? (1) Rank the data from lowest to highest. (2) Find the locator $L = \frac{k}{100} \cdot n$, where n is the number of values. (a) If L is not a whole number, round it up and take the score in that position. (b) If L is a whole number, take the average of the scores in positions L and L+1.
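The locator steps above can be sketched as follows. This assumes k is a whole number from 1 to 99; the function name `value_at_percentile` and the data are invented for illustration. Integer arithmetic (`divmod`) is used to decide whether L is a whole number, which avoids floating-point surprises.

```python
def value_at_percentile(k, values):
    """Score at the kth percentile using the locator L = (k / 100) * n."""
    ranked = sorted(values)                # step 1: rank from lowest to highest
    n = len(ranked)
    whole, remainder = divmod(k * n, 100)  # step 2: L = whole + remainder / 100
    if remainder != 0:
        # (a) L is not a whole number: round up and take that position (1-based)
        return ranked[whole]               # position whole + 1 is index whole
    # (b) L is a whole number: average the scores in positions L and L + 1
    return (ranked[whole - 1] + ranked[whole]) / 2

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]  # hypothetical data, n = 10
print(value_at_percentile(25, scores))  # L = 2.5, so position 3
print(value_at_percentile(50, scores))  # L = 5, so average of positions 5 and 6
```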
Slide 11 Some Other Statistics Interquartile Range (or IQR): $Q_3 - Q_1$ Semi-interquartile Range: $\frac{Q_3 - Q_1}{2}$ Midquartile: $\frac{Q_3 + Q_1}{2}$ 10-90 Percentile Range: $P_{90} - P_{10}$
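As a quick arithmetic check, these four statistics can be computed from quartile and percentile values that are already known. The function name, the dictionary keys, and the input numbers below are assumptions chosen for illustration.

```python
def other_statistics(q1, q3, p10, p90):
    """Spread and position statistics built from Q1, Q3, P10 and P90."""
    return {
        "iqr": q3 - q1,                    # interquartile range
        "semi_iqr": (q3 - q1) / 2,         # semi-interquartile range
        "midquartile": (q3 + q1) / 2,      # midpoint between the quartiles
        "percentile_range": p90 - p10,     # 10-90 percentile range
    }

# Hypothetical values: Q1 = 4, Q3 = 10, P10 = 3, P90 = 13.5
stats = other_statistics(q1=4, q3=10, p10=3, p90=13.5)
print(stats)
```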
Slide 13 Section 2-7 Exploratory Data Analysis (EDA)
Slide 14 Definition Exploratory Data Analysis is the process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate data sets in order to understand their important characteristics.
Outliers An outlier is a very high or very low value that stands apart from the rest of the data. Outliers may come from data collection errors or data entry errors, or they may simply be valid but unusual data values. Always identify and examine outliers to determine whether they are in error.
Slide 16 Important Principles An outlier can have a dramatic effect on the mean. An outlier can have a dramatic effect on the standard deviation. An outlier can have a dramatic effect on the scale of the histogram, so that the true nature of the distribution is totally obscured.
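A small illustration of the first two principles, using made-up numbers. The helper name `summarize` is hypothetical, and the median is included only for contrast, since it barely moves when an outlier is added.

```python
import statistics

def summarize(values):
    """Mean, median and sample standard deviation of a data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

data = [10, 12, 11, 13, 12]   # hypothetical data with no outlier
with_outlier = data + [95]    # the same data plus one extreme value

print(summarize(data))
print(summarize(with_outlier))
```

With the single value 95 added, the mean more than doubles and the standard deviation grows many times over, while the median stays at 12.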
Slide 17 Definitions For a set of data, the 5-number summary consists of the minimum value; the first quartile, Q 1; the median (or second quartile, Q 2); the third quartile, Q 3; and the maximum value. A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q 1; the median; and the third quartile, Q 3.
Slide 18 Boxplots Figure 2-16
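A sketch of the 5-number summary, assuming the quartiles are found with the locator rule from the "From Percentile to Data Value" slide (Q 1 = P 25, Q 2 = P 50, Q 3 = P 75). Other textbooks and software use slightly different quartile conventions, so results may differ. All names and data here are illustrative.

```python
def value_at_percentile(k, ranked):
    """kth percentile of already-sorted data via the locator L = (k / 100) * n."""
    whole, remainder = divmod(k * len(ranked), 100)
    if remainder != 0:
        return ranked[whole]                        # round L up, 1-based position
    return (ranked[whole - 1] + ranked[whole]) / 2  # average positions L and L + 1

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ranked = sorted(values)
    return (
        ranked[0],
        value_at_percentile(25, ranked),
        value_at_percentile(50, ranked),
        value_at_percentile(75, ranked),
        ranked[-1],
    )

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]  # hypothetical data
print(five_number_summary(scores))
```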
Outliers A data point is considered an outlier if it lies more than 1.5 times the interquartile range above the 75th percentile (Q3) or more than 1.5 times the interquartile range below the 25th percentile (Q1). In other words, outliers are values outside the interval [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
Box Plots and Histograms When looking at one variable, it’s a good idea to look at the box plot and histogram together. Box plots complement histograms by providing more specific information about the center, the quartiles, and outliers.
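The 1.5*IQR rule can be sketched as below. The quartiles are passed in as arguments so the example does not commit to a particular quartile convention; the function name `find_outliers` and the data are assumptions for illustration.

```python
def find_outliers(values, q1, q3):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12, 40]  # hypothetical data
# Assumed quartiles for this data: Q1 = 4, Q3 = 10, so IQR = 6
# and the interval is [4 - 9, 10 + 9] = [-5, 19].
print(find_outliers(scores, q1=4, q3=10))
```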
Slide 21 Figure 2-17 Boxplots
Shape, Center and Spread What should you tell about a quantitative variable? Always report the shape, center and spread. If the distribution is skewed, report the median and IQR. If the distribution is symmetric, report the mean and standard deviation. If there are any clear outliers and you are reporting the mean and the standard deviation, report them both with the outliers and without them.
Slide 23 Now we are ready for Part 21 of Day 1