Plan for Today: Chapter 11: Displaying Distributions with Graphs; Chapter 12: Describing Distributions with Numbers
Histograms A histogram is the most common graph of the distribution of a quantitative variable. Pie charts and bar graphs are the common graphs of the distribution of a categorical variable.
Histograms Note: There is no space between bars. Each bar covers a range of values, for example the observations from 85 to 95, and its height gives the number or percentage of observations in that range.
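The counting behind each bar can be sketched in Python. The scores, the class width of 10, the starting point of 55, and the helper name `bin_counts` are all made up for illustration; only the 85 to 95 class comes from the slide. The sketch also assumes each class includes its left endpoint and excludes its right endpoint.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each class [left, left + width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:  # ignore values outside the classes
            counts[idx] += 1
    return counts

# Hypothetical exam scores, not data from the slides.
scores = [58, 67, 72, 74, 81, 86, 88, 90, 93, 97]
counts = bin_counts(scores, start=55, width=10, n_bins=5)
percents = [100 * c / len(scores) for c in counts]

for i, (c, p) in enumerate(zip(counts, percents)):
    left = 55 + 10 * i
    print(f"{left} to {left + 10}: {c} obs ({p:.0f}%)")
```

The bar heights can be drawn from either `counts` or `percents`; the shape of the histogram is the same in both cases.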
Overall Pattern of a Distribution See if the distribution has a simple shape that you can describe in a few words. Then describe its center and spread.
Histograms: center and spread [Figure: Histogram A and Histogram B]
Histograms: shape Symmetric: if the right and left sides of the histogram are approximately mirror images of each other.
Histograms: shape Skewed to the right: if the right side of the histogram extends much farther out than the left side.
Histograms: shape Skewed to the left: if the left side of the histogram extends much farther out than the right side.
Stemplot A stemplot (a.k.a. stem-and-leaf plot) is quicker to make than a histogram and presents more detailed information.
Stemplot The max temperatures for the first 11 days this February at West Lafayette (I faked the number 19). Stem: the largest place value. Leaf: the next place to the right. Keep this row even if you don’t have any 20s. Duplicates have to be written separately.
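A minimal Python sketch of these stemplot rules follows. The temperatures are hypothetical (only the value 19 and the missing 20s echo the slide), the helper name `stemplot` is invented here, and the code assumes non-negative two-digit integers so that the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows: tens digit as stem, ones digit as leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)  # duplicates get one leaf each
    rows = []
    # Walk every stem from smallest to largest so empty rows are kept.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

# Hypothetical temperatures, not the values recorded on the slide.
temps = [19, 31, 34, 34, 36, 38, 40, 42, 45, 49, 55]
for line in stemplot(temps):
    print(line)
```

The row for stem 2 is printed with no leaves, and the two 34s appear as two separate 4s on the stem 3 row.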
Boxplots: The median M is the midpoint of a distribution. Half the observations are smaller than M and the other half are larger. How to find the median: 1) Arrange all observations in order of size, from smallest to largest. 2) If the number of observations n is odd, the median M is the center observation in the ordered list. 3) If the number of observations n is even, the median M is the average of the two center observations in the ordered list.
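The three steps above can be sketched in Python as follows. The function name `median` and the sample lists are illustrative only, and the sketch assumes a non-empty list of numbers.

```python
def median(values):
    """Median by the three-step rule: sort, then take the center."""
    ordered = sorted(values)          # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2: odd n, center observation
        return ordered[mid]
    # step 3: even n, average of the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 1, 3]))      # odd number of observations
print(median([4, 1, 3, 2]))   # even number of observations
```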
Boxplots: The median divides the ordered observations into left and right subgroups. The first quartile Q1 is the median of the left subgroup. The third quartile Q3 is the median of the right subgroup.
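A short Python sketch of this quartile rule is given below. It assumes that when the number of observations is odd, the median itself is left out of both subgroups; other textbooks and software use different conventions and can give slightly different quartiles. The function name `quartiles` and the example lists are made up for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    half = len(ordered) // 2
    left = ordered[:half]
    # Assumption: for odd n, the median is excluded from both subgroups.
    right = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return median(left), median(ordered), median(right)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # even n
print(quartiles([1, 2, 3, 4, 5, 6, 7]))     # odd n
```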
Boxplots: median [ ] Q1 = 10.5, Q3 = 26
Boxplots (without Outliers): [Figure: boxplot without outliers, marking the minimum, Q1, the median, Q3, and the maximum; 25% of the data lies between neighboring marks.]
Outliers: The interquartile range (IQR) is the distance between the first quartile Q1 and the third quartile Q3: IQR = Q3 - Q1. Any observation that lies more than 1.5*IQR below the first quartile or more than 1.5*IQR above the third quartile is considered an outlier. [Figure: boxplot marking the median, Q1, Q3, the IQR, and a distance of 1.5*IQR.]
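The 1.5*IQR rule can be sketched in Python as below. The data are hypothetical, the name `iqr_outliers` is invented for this example, and the quartiles are computed as medians of the lower and upper halves (leaving out the median when n is odd), which is an assumed convention.

```python
from statistics import median

def iqr_outliers(values):
    """Return (IQR, low fence, high fence, outliers) by the 1.5*IQR rule."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return iqr, low_fence, high_fence, outliers

# Hypothetical observations, not data from the slides.
data = [10, 31, 34, 34, 36, 38, 40, 42, 45, 49, 55]
print(iqr_outliers(data))
```

Here Q1 = 34 and Q3 = 45, so the IQR is 11 and the fences sit at 17.5 and 61.5; the value 10 falls below the low fence and is flagged.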
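Modified Boxplots (with Outliers) [Figure: modified boxplot with outliers. The upper whisker ends at the largest non-outlier point; the lower whisker ends at the minimum, since we don’t have any outliers at that end.]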
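As a rough sketch, the numbers needed to draw a modified boxplot could be computed like this in Python. The data and the name `modified_boxplot_summary` are hypothetical, and the quartile convention (medians of the two halves, median excluded for odd n) is an assumption.

```python
from statistics import median

def modified_boxplot_summary(values):
    """Whisker ends, quartiles, and outliers for a modified boxplot."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1, q3 = median(ordered[:half]), median(ordered[-half:])
    step = 1.5 * (q3 - q1)
    # Points within 1.5*IQR of the quartiles are not outliers.
    inside = [v for v in ordered if q1 - step <= v <= q3 + step]
    return {
        "whisker_low": inside[0],     # smallest non-outlier point
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "whisker_high": inside[-1],   # largest non-outlier point
        "outliers": [v for v in ordered if v not in inside],
    }

# Hypothetical observations with one unusually large value.
data = [31, 34, 34, 36, 38, 40, 42, 45, 49, 55, 80]
print(modified_boxplot_summary(data))
```

With these values the upper whisker stops at 55 and 80 is plotted as a separate point, while the lower whisker reaches the minimum, 31, because nothing falls below the low fence.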
Center and Spread: We often use two measures of central tendency: 1) Median 2) Mean (average); sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Center and Spread: We often use two measures of variability or “spread”: 1) Interquartile range (IQR) 2) Standard deviation (std dev); sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$; sample std dev: $s = \sqrt{s^2}$
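The sample mean, sample variance, and sample standard deviation can be sketched in Python as follows, assuming the usual n - 1 divisor for the sample variance. The function names and the example data are illustrative only, and at least two observations are required.

```python
import math

def sample_mean(values):
    """Sum of the observations divided by their count n."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    xbar = sample_mean(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical observations, not data from the slides.
data = [1, 2, 3, 4, 5]
print(sample_mean(data), sample_variance(data), sample_std_dev(data))
```

For this list the mean is 3, the squared deviations add up to 10, and dividing by n - 1 = 4 gives a variance of 2.5.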
Center and Spread: The median, Q1, and Q3 are less affected by the presence of outliers. The mean and standard deviation have better numerical properties.
Center and Spread: The max temperatures for the first 10 days this February at West Lafayette. The researcher made a typo when he recorded the value 49. Before: After: Table (Before / After): Median 40; Q1 36; Q3 49 / 55. Table (Before / After): Mean; Std Dev.
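The effect of a single mistyped value can be explored with a small Python sketch. The ten temperatures below are hypothetical: they were chosen only so that the median and quartiles agree with the values shown on the slide, and they are not the recorded data. The mistyped value of 94 is also an assumption, as is the quartile convention (medians of the lower and upper halves).

```python
from statistics import mean, median, stdev

def summary(values):
    """Median, quartiles, mean, and sample std dev of a list."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return {
        "median": median(ordered),
        "q1": median(ordered[:half]),
        "q3": median(ordered[-half:]),
        "mean": mean(ordered),
        "std_dev": stdev(ordered),
    }

# Hypothetical temperatures, NOT the values recorded on the slide.
before = [30, 34, 36, 38, 39, 41, 45, 49, 55, 60]
# Assumed typo: 49 entered as 94 (the real mistyped value is not known).
after = [94 if t == 49 else t for t in before]

for label, data in (("Before", before), ("After", after)):
    print(label, summary(data))
```

In this made-up example the median and Q1 do not move and Q3 shifts only to the next observation, while the mean and the standard deviation both increase noticeably.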