Numerical descriptions of distributions


1 Numerical descriptions of distributions
Describe the shape, center, and spread of a distribution. For shape, see slide #6 below. Center: mean and median. Spread: range, IQR, standard deviation. We treat these as aids to understanding the distribution of the variable at hand. We'll start with the mean: the mean is often called the "average" and is in fact the arithmetic average ("add all the values and divide by the number of observations").

2 Mathematical notation:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
[Table: height x_i of woman i, for i = 1, …, 25.]
Learn right away how to get the mean with calculators & JMP.
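As a quick check on the notation, the minimal sketch below computes x̄ exactly as the formula reads: add the observations and divide by n. The heights are hypothetical (the slide's table of 25 heights is not reproduced here), and `mean_of` is an illustrative name, not a JMP function.

```python
from statistics import mean

def mean_of(values):
    """Arithmetic mean: x-bar = (x_1 + x_2 + ... + x_n) / n."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)

# Hypothetical heights in inches; not the slide's data.
heights = [64, 66, 62, 65, 63]
x_bar = mean_of(heights)
print(x_bar)  # 64.0

# Python's statistics.mean gives the same value.
assert x_bar == mean(heights)
```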

3 Your numerical summary must be meaningful!
[Figure: height of 25 women in a class.] The distribution of women's heights appears symmetrical, so the mean is a good numerical summary. [Figure: a second, irregular distribution.] Here the shape of the distribution is wildly irregular. Why? Could we have more than one plant species or phenotype?

4 A single numerical summary here would not make sense.

5 The median (M) is often called the "middle" value: it is the value at the midpoint of the observations when they are ranked from smallest to largest. To find it, arrange the data from smallest to largest. If n is odd, the median is the single observation in the center (at position (n+1)/2 in the ordering). If n is even, the median is the average of the two middle observations (position (n+1)/2 falls halfway between them). In Table 1.10 (1.2, 1/11), calculate the mean and median of the 2-seater cars' city m.p.g. to see that the mean is more sensitive to outliers than the median. Use JMP; get the data from the eBook.
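The odd/even rule for the median translates directly into code. This is a minimal sketch (the function name `median_of` is hypothetical); Python's `statistics.median` follows the same rule.

```python
def median_of(values):
    """Median via the (n+1)/2 rule: the middle value if n is odd,
    the average of the two middle values if n is even."""
    if not values:
        raise ValueError("the median of an empty sample is undefined")
    ordered = sorted(values)               # rank from smallest to largest
    n = len(ordered)
    mid = n // 2                           # 0-based index of the upper-middle value
    if n % 2 == 1:
        return ordered[mid]                # 1-based position (n+1)/2
    return (ordered[mid - 1] + ordered[mid]) / 2  # average the two middle values

print(median_of([5, 1, 3]))     # 3
print(median_of([4, 1, 3, 2]))  # 2.5
```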

6 Skewness
[Figure: three distributions. Symmetric: mode = mean = median. Skewed left (negatively): mean, median, and mode from left to right. Skewed right (positively): mode, median, and mean from left to right.]

7 Mean and median of a distribution with outliers
[Figure: histograms of percent of people dying, without the outliers and with the outliers.] The mean is pulled to the right a lot by the outliers (from 3.4 to 4.2). The median, on the other hand, is only slightly pulled to the right by the outliers (from 3.4 to 3.6).
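The slide's data are not reproduced here, but the same effect shows up on a made-up sample. The minimal sketch below uses hypothetical values and Python's standard `statistics` module: it adds two large values on the right and compares how far the mean and the median move.

```python
from statistics import mean, median

# Hypothetical, roughly symmetric sample (not the slide's data).
base = [2.0, 3.0, 3.5, 4.0, 4.5]
# The same sample plus two large values on the right.
with_outliers = base + [15.0, 20.0]

print(mean(base), median(base))                    # 3.4 3.5
print(mean(with_outliers), median(with_outliers))  # about 7.43 and 4.0

# The mean moves much further than the median.
print(mean(with_outliers) - mean(base) > median(with_outliers) - median(base))  # True
```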

8 Mean and median of a symmetric … and a right-skewed distribution
Impact of skewed data. [Figure: Disease X, a symmetric distribution: mean and median are the same. Multiple myeloma, a right-skewed distribution: the mean is pulled toward the direction of the skew.]

9 Spread: percentiles, quartiles (Q1 and Q3), IQR, 5-number summary (and boxplots), range, standard deviation
The pth percentile of a variable is a data value such that p% of the values of the variable are less than or equal to it. The lower (Q1) and upper (Q3) quartiles are special percentiles that divide the data into quarters (fourths); get them by finding the medians of the lower and upper halves of the data. IQR = interquartile range = Q3 − Q1 = the spread of the middle 50% of the data. The IQR is used in the so-called 1.5 × IQR criterion for outliers (know this!).
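The definition of the pth percentile can be turned into a small function. This is a minimal sketch of one convention only, the "nearest-rank" rule: take the smallest data value that has at least p% of the values at or below it. JMP and other packages may interpolate between data values, so their percentiles can differ slightly. The function name is hypothetical.

```python
import math

def percentile_nearest_rank(values, p):
    """Smallest data value x such that at least p% of the values are <= x."""
    if not values:
        raise ValueError("empty sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank; computing p * n first keeps integer inputs exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical data: 2 of the 8 values (25%) are <= 4.
print(percentile_nearest_rank([12, 7, 3, 9, 15, 4, 10, 6], 25))  # 4
```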

10 Measure of spread: the quartiles
The first quartile, Q1, is the value in the sample that has 25% of the data less than or equal to it (it is the median of the lower half of the sorted data, excluding M). The third quartile, Q3, is the value in the sample that has 75% of the data less than or equal to it (it is the median of the upper half of the sorted data, excluding M). In this example: Q1 = first quartile = 2.2, M = median = 3.4, Q3 = third quartile = 4.35.

11 Five-number summary and boxplot
[Figure: boxplot.] Five-number summary: min, Q1, M, Q3, max. Smallest = min = 0.6; Q1 = first quartile = 2.2; M = median = 3.4; Q3 = third quartile = 4.35; Largest = max = 6.1.
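The five-number summary can be sketched in a few lines, using the quartile rule from these slides: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle observation left out when n is odd. This is a minimal sketch with a hypothetical function name and made-up data; JMP's quartiles may use a different convention and differ slightly.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, M, Q3, max), with Q1 and Q3 as the medians of the
    lower and upper halves of the sorted data (M excluded when n is odd)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    q1 = median(ordered[:half])             # lower half
    q3 = median(ordered[half + n % 2:])     # upper half, skipping M if n is odd
    return ordered[0], q1, median(ordered), q3, ordered[-1]

# Hypothetical data (not the slide's data set).
print(five_number_summary([7, 1, 3, 9, 5, 11, 2]))  # (1, 2, 5, 9, 11)
```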

12 Boxplots for skewed data
[Figure: comparing boxplots for a normal and a right-skewed distribution.] Boxplots remain true to the data and clearly depict symmetry or skew.

13 5-number summary: min, Q1, median, Q3, max
When plotted, the 5-number summary is a boxplot. We can also draw a modified boxplot to show outliers (mild and extreme). Boxplots have less detail than histograms and are often used for comparing distributions, e.g., Fig. 1.19, p. 37 and below.
Figure 1.19 Introduction to the Practice of Statistics, Sixth Edition © 2009 W.H. Freeman and Company

14 [Figure annotations: Q1 = 2.2; Q3 = 4.35; interquartile range Q3 − Q1 = 4.35 − 2.2 = 2.15; distance to Q3 = 7.9 − 4.35 = 3.55.]
Individual #25 has a value of 7.9 years, which is 3.55 years above the third quartile. This is more than 1.5 × IQR = 1.5 × 2.15 = 3.225 years. Thus, individual #25 is an outlier by our 1.5 × IQR rule.
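The 1.5 × IQR rule amounts to two "fences": Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. A value beyond either fence is flagged as a suspected outlier. The minimal sketch below (hypothetical function names) plugs in the quartiles and the value of individual #25 from this slide; it uses strict inequalities because the rule asks whether a value is *more than* 1.5 × IQR beyond a quartile.

```python
def iqr_outlier_fences(q1, q3):
    """Return the (lower, upper) fences of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_suspected_outlier(x, q1, q3):
    """True if x lies more than 1.5 * IQR below Q1 or above Q3."""
    lower, upper = iqr_outlier_fences(q1, q3)
    return x < lower or x > upper

# Values from the slide: Q1 = 2.2, Q3 = 4.35, individual #25 = 7.9.
lower, upper = iqr_outlier_fences(2.2, 4.35)
print(round(lower, 3), round(upper, 3))      # -1.025 7.575
print(is_suspected_outlier(7.9, 2.2, 4.35))  # True
```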

15 Definition, pg 40–41: the variance and standard deviation
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2, \qquad s = \sqrt{s^2}$$

16 Be sure you know how to compute the standard deviation
Look at Example 1.19 on page 41 (1.2, 8/11); see Figure 1.21 for a graph of deviations from the mean. Metabolic rates for 7 men in a dieting study: 1792, 1666, 1362, 1614, 1460, 1867, …; mean = 1600 calories, s = … calories. Be sure you know how to compute the standard deviation with JMP, since it's almost never done by hand with the previous page's formula. Put the metabolic rates into a JMP table and analyze.
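Outside JMP, the same computation can be sketched in Python by following the variance formula step by step: deviations, squares, sum, divide by n − 1, square root. One assumption must be flagged loudly: the list above gives only six of the seven rates, so the seventh value (1439) is inferred here as the number that makes the seven rates average to the stated 1600 calories. The helper name `sample_sd` is hypothetical.

```python
import math
from statistics import stdev

# ASSUMPTION: the seventh rate (1439) is inferred from n = 7 and mean = 1600;
# the other six values are the ones listed on the slide.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

s = sample_sd(rates)
print(sum(rates) / len(rates))  # 1600.0
print(round(s, 2))              # 189.24

# Python's statistics.stdev (also divides by n - 1) agrees.
assert math.isclose(s, stdev(rates))
```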

17 Why do we square the deviations?
Why do we square the deviations? There are two technical reasons that we'll see when we discuss the normal distribution in the next section.
Why do we use the standard deviation (s) instead of the variance (s²)? s² has units that are the squares of the original units of the data, while s is in the original units.
Why do we divide by n − 1 instead of n? n − 1 is called the number of degrees of freedom: since the sum of the deviations is zero, the last deviation can always be found if we know the other n − 1.
Which measure of spread is best? The 5-number summary is better than the mean and s.d. for skewed data; use the mean and s.d. for symmetric data.
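Two of these answers are easy to check numerically. The minimal sketch below uses hypothetical measurements in cm (chosen so the mean is a whole number): the deviations sum to zero, so the last one is fixed by the other n − 1; and the variance is in squared units, while its square root is back in cm. Python's `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by n.

```python
import math
from statistics import pvariance, variance

# Hypothetical measurements in cm (not from the slides).
data = [2, 4, 4, 5, 10]
n = len(data)
x_bar = sum(data) / n                     # 5.0
deviations = [x - x_bar for x in data]
print(deviations, sum(deviations))        # [-3.0, -1.0, -1.0, 0.0, 5.0] 0.0

# The deviations sum to zero, so any n - 1 of them determine the last one.
print(-sum(deviations[:-1]))              # 5.0, equal to deviations[-1]

# Dividing by n - 1 (sample variance s^2) versus n.
s_squared = variance(data)                # 36 / 4 = 9, in cm squared
print(s_squared, pvariance(data))         # n - 1 gives 9; n gives 7.2
print(math.sqrt(s_squared))               # s = 3.0, back in cm
```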

18 What should you use, when, and why?
Arithmetic mean or median? Middletown is considering imposing an income tax on citizens. City hall wants a numerical summary of its citizens' income to estimate the total tax base. In a study of the standard of living of typical families in Middletown, a sociologist makes a numerical summary of family income in that city.
Mean: although income is likely to be right-skewed, the city government wants to know about the total tax base, and the total equals the mean times the number of citizens.
Median: the sociologist is interested in a "typical" family and wants to lessen the impact of extreme incomes.
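Why the mean suits city hall: multiplying the mean by the number of people recovers the total exactly, while the median does not. The minimal sketch below uses a hypothetical six-person "town" with one very large income; the numbers are made up for illustration.

```python
from statistics import mean, median

# Hypothetical annual incomes in dollars; right-skewed by one large income.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000, 400_000]
total = sum(incomes)                         # 600,000

# Mean x n recovers the total tax base exactly.
print(mean(incomes) * len(incomes) == total)  # True

# Median x n badly underestimates it, but the median better describes
# a "typical" income here.
print(median(incomes), median(incomes) * len(incomes))  # 42500.0 255000.0
```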

19 Finish reading section 1.2
Be sure to go over the Summary at the end of each section and know all the terminology. Do #1.56, …, 1.67, 1.69 (Mean/Median Applet), 1.78, 1.79. Use JMP for any problem requiring more than very simple computations.

