Section 4.1: Describing the Center of a Data Set
The two most popular measures of center are the mean and the median. We will look at them separately, then compare the two.
Mean – the average (sometimes called the sample mean)
Sample mean – denoted by $\bar{x}$:
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$
Example: Range of Motion After Knee Surgery
Traumatic knee dislocation often requires surgery to repair ruptured ligaments. One measure of recovery is range of motion. The article “Reconstruction of the Anterior and Posterior Cruciate Ligaments After Knee Dislocation” reported the following postsurgical range of motion for a sample of 13 patients:
Range of Motion (degrees)
x_1 = 154    x_2 = 142    x_3 = 137    x_4 = 133    x_5 = 122
x_6 = 126    x_7 = 135    x_8 = 135    x_9 = 108    x_10 = 120
x_11 = 127   x_12 = 134   x_13 = 122
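As a quick check of the arithmetic, the sample mean of the 13 range-of-motion values can be computed directly. This is a minimal Python sketch; the variable names are illustrative and do not come from the original slides.

```python
# Range of motion (degrees) for the 13 patients in the example
range_of_motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 134, 122]

n = len(range_of_motion)        # sample size
total = sum(range_of_motion)    # sum of all observations
sample_mean = total / n         # sample mean = sum / n

print(f"n = {n}, sum = {total}, mean = {sample_mean:.2f}")
# prints: n = 13, sum = 1695, mean = 130.38
```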
Population mean – denoted by μ, the average of all x values in the entire population.
Example: County Population Sizes
The 50 states plus the District of Columbia contain 3137 counties. Let x denote the number of residents of a county. Then there are 3137 values of the variable x in the population. The sum of these 3137 values is 248,709,873 (1990 census), so the population average value of x is:
$\mu = \frac{248{,}709{,}873}{3137} \approx 79{,}282.7$ residents per county
One potential drawback of the mean as a measure of center is that its value can be strongly affected by even a single outlier.
Outlier – an unusually large or small observation in the data set
Example: Number of Visits to a Class Website
Forty students were enrolled in a section of STAT 130, a general education course in statistical reasoning. One month after the course began, the instructor requested a report that indicated how many times each student had accessed a web page on the class site. The 40 observations were:
The sample mean for the data set is 23.10.
Median – the middle value in the list
Sample median – obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). The sample median is:
– The single middle value if n is odd
– The average of the middle two values if n is even
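The ordered-list definition of the sample median can be sketched in Python as follows. The function name `sample_median` and the four-value even-sized sample are hypothetical, chosen only for illustration; the odd-sized sample is the range-of-motion data from the knee surgery example. The standard library's `statistics.median` is used as a cross-check.

```python
import statistics

def sample_median(values):
    """Sample median from the ordered-list definition."""
    ordered = sorted(values)      # smallest to largest, repeated values kept
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # n odd: the single middle value
    # n even: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

# n = 13 (odd): range-of-motion data from the knee surgery example
knee = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 134, 122]
odd_median = sample_median(knee)

# n = 4 (even): hypothetical values, not from the slides
even_sample = [3, 9, 1, 7]
even_median = sample_median(even_sample)

print(odd_median, even_median)    # prints: 133 5.0

# Cross-check against the standard library
assert odd_median == statistics.median(knee)
assert even_median == statistics.median(even_sample)
```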
Example: Website Data Revisited
The sample size for the website access data was n = 40, an even number. The median is the average of the 20th and 21st values (after arranging the data in order from least to greatest).
The median can now be determined:
$\text{median} = \frac{\text{20th value} + \text{21st value}}{2} = 13$
This value appears to be more typical of the data than the sample mean of 23.10.
Population median – the middle value of the ordered list consisting of all population observations.
Comparing Mean and Median
Symmetric – mean = median
Positively skewed (longer upper tail) – mean is greater than the median
Negatively skewed (longer lower tail) – mean is smaller than the median
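The three cases can be illustrated with small data sets. The values below are hypothetical and were chosen only so that each shape is easy to see; they do not come from the slides.

```python
import statistics

# Hypothetical samples, one per distribution shape
samples = {
    "symmetric":         [1, 2, 3, 4, 5],
    "positively skewed": [1, 2, 3, 4, 20],     # one large value stretches the upper tail
    "negatively skewed": [1, 17, 18, 19, 20],  # one small value stretches the lower tail
}

results = {}
for shape, data in samples.items():
    mean = statistics.mean(data)
    median = statistics.median(data)
    results[shape] = (mean, median)
    print(f"{shape}: mean = {mean}, median = {median}")

# prints:
# symmetric: mean = 3, median = 3
# positively skewed: mean = 6, median = 3
# negatively skewed: mean = 15, median = 18
```

In each skewed sample the mean is pulled toward the long tail, while the median stays near the bulk of the data.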
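Sample Proportion of Successes
Sample proportion of successes – denoted by p:
$p = \frac{\text{number of S’s in the sample}}{n}$
where S is the label used for the response designated as success.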
Example: Tampering with Automobile Antipollution Equipment
The use of antipollution equipment on automobiles has substantially improved air quality in certain areas. Unfortunately, many car owners have tampered with smog control devices to improve performance. Suppose that a sample of n = 15 cars is selected and that each car is classified as S or F, according to whether or not tampering has taken place. The resulting data are:
S F S S S F F S S F S S S F F
This sample contains nine S’s, so:
$p = \frac{9}{15} = 0.60$
That is, 60% of the sample responses are S’s.
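The count and the proportion for this example can be verified with a short Python sketch. The variable names are illustrative; the responses are the 15 S/F values listed above.

```python
# The 15 sample responses from the tampering example
responses = "S F S S S F F S S F S S S F F".split()

n = len(responses)                  # sample size
successes = responses.count("S")    # number of S's in the sample
p = successes / n                   # sample proportion of successes

print(f"n = {n}, S's = {successes}, p = {p:.2f}")
# prints: n = 15, S's = 9, p = 0.60
```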
Population proportion of S’s – denoted by π (here π is a symbol for the proportion, not the number 3.14)
Trimmed mean – computed by first ordering the data values from smallest to largest, then deleting a selected number of values from each end of the ordered list, and finally averaging the remaining values.
Trimming percentage – the percentage of values deleted from each end of the ordered list
Example: Alcohol Exposure
Alcohol exposure in seconds
Let’s compute the 10% trimmed mean. Delete the three smallest and the three largest values from the ordered list.
New data values are:
We deleted the three smallest values (three zeros) and the three largest values (76, 123, and 414).
The 10% trimmed mean is 18.
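The trimmed-mean procedure (order, delete the same number of values from each end, average the rest) can be sketched in Python. The function name `trimmed_mean` and the ten data values are hypothetical; the alcohol exposure observations are not listed in these notes, so they are not used here. The sketch also assumes that the number of values deleted from each end is rounded down when the trimming percentage does not give a whole number.

```python
def trimmed_mean(values, trim_pct):
    """Mean after deleting trim_pct percent of the values from EACH end."""
    ordered = sorted(values)                # smallest to largest
    n = len(ordered)
    k = int(n * trim_pct / 100)             # values to delete per end (rounded down: an assumption)
    kept = ordered[k:n - k]                 # drop the k smallest and k largest
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)

# Hypothetical sample of 10 values with one extreme observation
data = [5, 95, 0, 7, 3, 9, 2, 6, 8, 5]

plain_mean = sum(data) / len(data)
trimmed = trimmed_mean(data, 10)            # deletes 0 and 95

print(plain_mean, trimmed)                  # prints: 14.0 5.625
```

With the extreme value 95 removed, the trimmed mean sits much closer to the bulk of the data than the ordinary mean does.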