Exploratory Data Analysis: The process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate data sets in order to understand their important characteristics. (page 94 of text)
Outliers: An outlier is a value located very far away from almost all of the other values. An extreme value can have a dramatic effect on the mean, on the standard deviation, and on the scale of the histogram, so that the true nature of the distribution is totally obscured. It is important to be aware of such data values.
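The effect described on this slide can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the sample values and the names `values` and `with_outlier` are made up for illustration and do not come from the text.

```python
import statistics

# Hypothetical sample (illustrative values, not from the text).
values = [4, 5, 5, 6, 7, 7, 8]
# The same sample with one value located far away from all the others.
with_outlier = values + [60]

for label, data in (("without outlier", values), ("with outlier", with_outlier)):
    print(
        label,
        "mean =", statistics.mean(data),
        "median =", statistics.median(data),
        "standard deviation =", round(statistics.stdev(data), 2),
    )
```

In this sketch the single extreme value roughly doubles the mean and inflates the standard deviation many times over, while the median barely moves.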
Boxplots (Box-and-Whisker Diagrams): A boxplot reveals the center of the data, the spread of the data, the distribution of the data, and the presence of outliers. Boxplots are excellent for comparing two or more data sets. (page 96 of text)
Boxplots: 5-number summary: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. Medians and quartiles are not very sensitive to extreme values. Refer to Section 2-6 for finding the Q1 and Q3 values, and to Section 2-4 for finding the median.
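A 5-number summary can be computed in a few lines. The sketch below is one possible approach, not the procedure from Sections 2-4 and 2-6: it assumes the "inclusive" quartile method of Python's `statistics.quantiles`, and quartile conventions vary, so results may differ slightly from the textbook method. The function name `five_number_summary` and the sample values are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of at least two numbers."""
    ordered = sorted(data)
    # Assumption: the "inclusive" quartile convention; the text's method may differ.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

# Hypothetical sample (illustrative values, not from the text).
sample = [9, 1, 5, 3, 7, 2, 8, 4, 6]
summary = five_number_summary(sample)
print(summary)

# Replace the largest value with an extreme one: only the maximum changes.
extreme_sample = [900, 1, 5, 3, 7, 2, 8, 4, 6]
extreme_summary = five_number_summary(extreme_sample)
print(extreme_summary)
```

Comparing the two printed summaries illustrates the point on the slide: the quartiles and the median stay the same when one value becomes extreme.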
Boxplots: Figure 2-18, Boxplot of Qwerty Word Ratings (axis ticks from 2 to 14; values 2, 4, 6, and 14 labeled on the plot). It appears that the Qwerty data set has a right-skewed distribution. (page 97 of text)
Boxplots (Figure 2-19), Bell-Shaped: A boxplot of bell-shaped data should have approximately the same distance from the minimum to Q1 as from Q3 to the maximum. The distance from Q1 to the median should also be approximately the same as the distance from the median to Q3.
Boxplots (Figure 2-19), Uniform: A boxplot of uniform data should have approximately the same distance from the minimum to Q1, from Q1 to the median, from the median to Q3, and from Q3 to the maximum.
Figure 2-19 Boxplots: Bell-Shaped, Uniform, Skewed
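The distance comparisons used to read these boxplot shapes can be sketched as a rough heuristic. This is an illustrative assumption, not a rule from the text: the function name `rough_shape`, the `tolerance` parameter (a fraction of the range), and the sample summaries are all hypothetical, and a 5-number summary alone cannot confirm a distribution's shape.

```python
def rough_shape(summary, tolerance=0.05):
    """Guess a shape from (minimum, Q1, median, Q3, maximum) by comparing distances."""
    minimum, q1, median, q3, maximum = summary
    left_whisker = q1 - minimum
    left_box = median - q1
    right_box = q3 - median
    right_whisker = maximum - q3
    # Hypothetical cutoff: two distances "match" if they differ by at most
    # this fraction of the full range.
    allowed = tolerance * (maximum - minimum)

    def close(a, b):
        return abs(a - b) <= allowed

    if close(left_whisker, right_whisker) and close(left_box, right_box):
        if close(left_whisker, left_box):
            return "roughly uniform"  # all four distances about equal
        return "roughly symmetric"  # consistent with a bell shape
    if right_box + right_whisker > left_box + left_whisker:
        return "skewed right"
    return "skewed left"

# Hypothetical 5-number summaries (illustrative values, not from the text).
print(rough_shape((0, 25, 50, 75, 100)))
print(rough_shape((0, 40, 50, 60, 100)))
print(rough_shape((1, 2, 4, 7, 20)))
```

The symmetric case only says the boxplot is consistent with a bell shape; a histogram is still needed to see the actual distribution.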
Exploring
Measures of center: mean, median, and mode
Measures of variation: standard deviation and range
Measures of spread and relative location: minimum value, maximum value, and quartiles
Unusual values: outliers
Distribution: histograms, stem-and-leaf plots, and boxplots
(page 97 of text)
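As a starting point for exploring a data set, the measures of center and variation listed above can be collected in one place. The sketch below uses Python's standard `statistics` module; the function name `explore` and the sample values are hypothetical, and the standard deviation shown is the sample standard deviation.

```python
import statistics

def explore(data):
    """Collect basic measures of center and variation for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "standard deviation": statistics.stdev(data),  # sample standard deviation
    }

# Hypothetical sample (illustrative values, not from the text).
ratings = [2, 3, 3, 4, 5, 7]
report = explore(ratings)
for name, value in report.items():
    print(name, "=", value)
```

A summary like this complements, rather than replaces, the graphs: histograms, stem-and-leaf plots, and boxplots still show the distribution and any outliers.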