Describing Quantitative Data with Numbers Part 2
Five-Number Summary
The five-number summary consists of the minimum value, the first quartile, the median, the third quartile, and the maximum value in the data set.
It is written in order: min, Q1, M, Q3, max.
It essentially divides the data into fourths: about 25% of the data values lie in each interval between consecutive numbers.
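As a minimal sketch, the five-number summary can be computed as shown below. The function name `five_number_summary` and the data set `scores` are hypothetical, and the quartile rule is an assumption: Q1 and Q3 are taken as the medians of the lower and upper halves, leaving out the overall median when the count is odd. Other software may use slightly different quartile conventions.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, M, Q3, max) for a list of at least two numbers.

    Assumed quartile rule: Q1 and Q3 are the medians of the lower and
    upper halves of the sorted data, excluding the overall median when
    the number of values is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # skip the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data, made up for illustration only
scores = [2, 4, 5, 7, 8, 10, 12, 13, 15, 30]
print(five_number_summary(scores))  # (2, 5, 9.0, 13, 30)
```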
Boxplots (Box and Whisker)
A graphical representation of the five-number summary.
A central box is drawn from the first quartile to the third quartile.
A line in the box represents the median.
Lines (called whiskers) extend from the box out to the lowest and highest observations that are NOT outliers.
Dots beyond the whiskers mark observations that are outliers, if any exist (1.5 × IQR rule).
Boxplots can be oriented horizontally or vertically.
The axis parallel to the boxplot should be labeled and scaled.
The perpendicular axis may be labeled with a description of the variable.
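The pieces a boxplot needs can be sketched in a few lines. This is an illustration under stated assumptions: the helper names `quartiles` and `boxplot_stats` and the data set `scores` are hypothetical, and the quartiles are taken as medians of the lower and upper halves of the sorted data. An observation is flagged as an outlier when it lies more than 1.5 × IQR below Q1 or above Q3.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (assumed rule)."""
    n = len(data)
    half = n // 2
    return median(data[:half]), median(data[half + (n % 2):])

def boxplot_stats(values):
    """Box edges, whisker ends, and outliers by the 1.5 x IQR rule."""
    data = sorted(values)
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Hypothetical data, made up for illustration only
scores = [2, 4, 5, 7, 8, 10, 12, 13, 15, 30]
stats = boxplot_stats(scores)
print(stats)
```

With this data, Q1 = 5 and Q3 = 13, so IQR = 8 and the fences sit at -7 and 25. The value 30 is drawn as a dot, and the upper whisker ends at 15.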
Number of Home Runs Hit by Barry Bonds
Measure of Spread: Standard Deviation
Measures spread by finding the typical distance of the values from the MEAN.
The VARIANCE is the average of the squared distances (deviations) from the mean.
The deviations are squared so that negative differences (left of the mean) don't cancel out positive differences (right of the mean).
The square root of the variance is the standard deviation; the square of the standard deviation is the variance.
Variance is in SQUARED units.
Standard deviation is in the same units as the data.
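The steps above can be traced by hand in a short sketch. The data set `data` is hypothetical, and both common divisors are shown because the choice is an assumption: dividing by n gives the population variance, while dividing by n - 1 gives the sample variance used in many introductory courses.

```python
from math import sqrt
from statistics import variance

# Hypothetical data (say, quiz scores in points), made up for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                       # 5.0
deviations = [x - mean for x in data]      # signed distances from the mean
print(sum(deviations))                     # 0.0: raw deviations cancel out

squared = [d ** 2 for d in deviations]     # squaring removes the signs
sum_of_squares = sum(squared)              # 32.0

population_variance = sum_of_squares / n        # divide by n
sample_variance = sum_of_squares / (n - 1)      # divide by n - 1

# Standard deviation is the square root of the variance,
# so it is back in the original units (points, not points squared).
population_sd = sqrt(population_variance)
sample_sd = sqrt(sample_variance)

print(population_variance, population_sd)  # 4.0 2.0
print(sample_variance, sample_sd)
```

The standard library's `statistics.variance` uses the n - 1 divisor, so it should agree with `sample_variance` here.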
Formula for Variance and Standard Deviation
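For reference, a common textbook form of these formulas is sketched below. It assumes the sample versions, which divide by n - 1 (some treatments divide by n when the data are a whole population). The symbols s_x, x̄, and n are notation chosen here for illustration.

```latex
% Sample variance: sum the squared deviations from the mean, divide by n - 1
s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2

% Sample standard deviation: the square root of the variance
s_x = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 }
```

Here x̄ is the mean of the n observations x_1, ..., x_n.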
Properties of Standard Deviation
It measures the spread about the mean and should be used only when the mean is the chosen measure of center.
It is always greater than or equal to 0 (it equals 0 only when all the values are the same).
It has the same units as the data.
It is highly vulnerable to extreme values (outliers) and is not a good measure of spread for skewed data.
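A small experiment can illustrate the last property. In this sketch the two data sets and the helper `iqr` are hypothetical; the quartiles are assumed to be the medians of the lower and upper halves of the sorted data, and `statistics.stdev` computes the sample standard deviation (n - 1 divisor).

```python
from statistics import median, stdev

def iqr(values):
    """Interquartile range, with quartiles as medians of each half (assumed rule)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + n % 2:])
    return q3 - q1

# Hypothetical data: identical except that the largest value is made extreme
typical = [2, 4, 5, 7, 8, 10, 12, 13, 15, 16]
with_outlier = [2, 4, 5, 7, 8, 10, 12, 13, 15, 60]

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(f"{label}: sd = {stdev(data):.2f}, IQR = {iqr(data)}")
```

Changing a single value more than triples the standard deviation here, while the IQR stays at 8. That is why the IQR is often preferred for skewed data or data with outliers.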