1.2 Describing Distributions with Numbers Is the mean a good measure of center? Ex. Roger Maris’s yearly homerun production: 8 13 14 16 23 26 28 33 39.

1 1.2 Describing Distributions with Numbers Is the mean a good measure of center? Ex. Roger Maris’s yearly homerun production: 8 13 14 16 23 26 28 33 39 61
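A minimal sketch of the question on this slide, using only the ten yearly totals listed above. It compares the mean and median with and without the 61-home-run season; the variable names are illustrative, not from the presentation.

```python
from statistics import mean, median

# Roger Maris's yearly home run totals, as listed on the slide.
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]

mean_all = mean(maris)      # arithmetic average of all ten seasons
median_all = median(maris)  # midpoint of the sorted list

# Drop the 61-home-run season to see how much each measure moves.
without_61 = [x for x in maris if x != 61]
mean_trim = mean(without_61)
median_trim = median(without_61)

print(f"all seasons: mean={mean_all:.2f}, median={median_all}")
print(f"without 61:  mean={mean_trim:.2f}, median={median_trim}")
```

The mean drops from 26.1 to about 22.2 when the single large value is removed, while the median only moves from 24.5 to 23.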

2

3 Mean/Median (Centers) Both measure center in different ways, and both are useful. Use the median if you want a “typical” number. Mean = “arithmetic average value.” The mean and median of a symmetric distribution are close together; if a distribution is exactly symmetric, mean = median. In a skewed distribution, the mean is farther out in the long tail than the median.

4 Measures of Spread Range = largest observation – smallest observation in a list. What’s the problem with this? Better measure of spread: quartiles. Measures of spread: range, quartiles, five-number summary, variance, standard deviation.

5

6 Male/Female Surgeons (# of hysterectomies performed)
Male doctors, in ascending order (odd number of observations): 20 25 25 27 28 31 33 34 36 37 44 50 59 85 86. Five-number summary: Min, Q1, M, Q3, Max.
Female doctors, in ascending order (even number of observations): 5 7 10 14 18 19 25 29 31 33. Five-number summary: Min, Q1, M = 18.5, Q3, Max.
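One way to fill in the five-number summaries for the two lists above is sketched below. It assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out of both halves when the count is odd. The slide does not say which convention it uses, and other software may interpolate differently. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: quartiles are medians of the lower and upper
    halves; for an odd count the overall median is excluded from both.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hysterectomy counts from the slide.
male = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
female = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]

male_summary = five_number_summary(male)
female_summary = five_number_summary(female)
print("male:  ", male_summary)
print("female:", female_summary)
```

Under that convention the female median comes out as 18.5, which agrees with the value given on the slide.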

7

8 Boxplots You can instantly see that female doctors perform fewer hysterectomies than male doctors. Also, there is less variation among female doctors.

9 Notes on boxplots Best used for side-by-side comparisons of more than one distribution. Less detail than histograms or stemplots. Always include the numerical scale. ..\Simulations\Hotdog Data.xls

10 Travel Times to Work #1 How long does it take you to get from home to school? Here are the travel times from home to work in minutes for 15 workers in North Carolina, chosen at random by the Census Bureau: 30 20 10 40 25 20 10 60 15 40 5 30 12 10 10

11 The distribution… Describe it. Is the longest travel time (60 minutes) an outlier? How many of the travel times are larger than the mean? If you leave out the large time, how does that change the mean? The mean in this example is nonresistant because it is sensitive to the influence of extreme observations. The mean is the arithmetic average, but it may not be a “typical” number!
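The questions about the mean on this slide can be checked with a short sketch. It uses the 15 North Carolina travel times read off the previous slide; the variable names are illustrative.

```python
from statistics import mean

# Travel times to work (minutes) for the 15 North Carolina workers.
nc = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]

nc_mean = mean(nc)

# How many travel times are larger than the mean?
above_mean = sum(1 for x in nc if x > nc_mean)

# Leave out the largest time (60 minutes) and recompute the mean.
trimmed = [x for x in nc if x != 60]
trimmed_mean = mean(trimmed)

print(f"mean of all 15 times: {nc_mean:.2f}")
print(f"times above the mean: {above_mean} of {len(nc)}")
print(f"mean without 60:      {trimmed_mean:.2f}")
```

Only 6 of the 15 times exceed the mean of about 22.5 minutes, and dropping the single 60-minute commute pulls the mean down to about 19.8.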

12 Travel Times to Work #2 Travel times to work in New York State are (on the average) longer than in North Carolina. Here are the travel times in minutes of 20 randomly chosen New York workers: 10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

13 Interquartile Range (IQR) IQR = Q3 – Q1. Measures the spread of the middle ½ of the data. An observation is an outlier if it is:
• Less than Q1 – 1.5(IQR), or
• Greater than Q3 + 1.5(IQR)
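The 1.5 × IQR rule above can be sketched as follows, applied to the two travel-time data sets from the earlier slides. The quartile convention (medians of the lower and upper halves, excluding the overall median when the count is odd) is an assumption, and the helper names `quartiles` and `outliers` are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves (assumed convention)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def outliers(data):
    """Return the observations flagged by the 1.5 x IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Travel times (minutes) from the North Carolina and New York slides.
nc = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
ny = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
      15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

nc_outliers = outliers(nc)
ny_outliers = outliers(ny)
print("NC quartiles:", quartiles(nc), "outliers:", nc_outliers)
print("NY quartiles:", quartiles(ny), "outliers:", ny_outliers)
```

With this convention the North Carolina upper fence is exactly 60, so the 60-minute commute is not strictly greater than it and is not flagged. In the New York data the 85-minute commute exceeds the upper fence of 83.75 and is flagged.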

14 Looking at the spread…
• Quartiles show the spread of the middle ½ of the data.
• The spacing of the quartiles and extremes about the median gives an indication of the symmetry or skewness of the distribution.
Symmetric distributions: the 1st and 3rd quartiles are equally distant from the median. Right-skewed distributions: the 3rd quartile will be farther above the median than the 1st quartile is below it.

15 Is there a difference between the number of programmed telephone numbers in girls’ cell phones and the number of programmed numbers in boys’ cell phones? Do you think there is a difference? If so, in what direction?
1) Count the number of programmed telephone numbers in your cell phone and write the total on a piece of paper.
2) Make a back-to-back stemplot of this information, then draw boxplots. When you test for outliers, how many do you find for males and how many do you find for females using the 1.5 × IQR test?
3) Find the five-number summary for each group. Compare the two distributions (SOCS!).
4) It is important in any study that you have “data integrity” (the data are reported accurately and truthfully). Do you think this is the case here? Do you see any suspicious observations? Can you think of any reason someone may make up a response or stretch the truth? If you DO see a difference between the two groups, can you suggest a possible reason for this difference?
5) Do you think a study of cell phone programmed numbers for a sophomore algebra class would yield similar results? Why or why not?

16 Spring ’09 Student Data Girls: 5345724136222106237 7529615427570134 Boys: 298 65819535141247 6017633

17 Standard Deviation: A measure of spread Standard deviation looks at how far observations are from their mean. It’s the natural measure of spread for the Normal distribution. We like s instead of s-squared (variance) since the units of measurement are easier to work with (original scale). The variance s-squared is the average of the squares of the deviations of the observations from their mean (dividing by n-1 rather than n); s is the square root of the variance.

18 The standard deviation s, like the mean, is strongly influenced by extreme observations. A few outliers can make s very large. Skewed distributions with a few observations in the single long tail have a large s, so s is not very helpful in this case. As the observations become more spread out about the mean, s gets larger.

19 Mean vs. Median; Standard Deviation vs. 5-Number Summary The mean and standard deviation are more common than the median and the five-number summary as measures of center and spread. No single number describes the spread well. Remember: a graph gives the best overall picture of a distribution. ALWAYS PLOT YOUR DATA! The choice of mean or median depends on the shape of the distribution.
• When dealing with a skewed distribution, use the median and the five-number summary.
• When dealing with reasonably symmetric distributions, use the mean and standard deviation.

20 The variance and standard deviation are… LARGE if observations are widely spread about the mean SMALL if observations are close to the mean

21 Degrees of Freedom (n-1) Definition: the number of independent pieces of information that are included in your measurement. Calculated from the size of the sample. They are a measure of the amount of information from the sample data that has been used up. Every time a statistic is calculated from a sample, one degree of freedom is used up. If the mean of 4 numbers is 250, we have (4-1) = 3 degrees of freedom. Why? ____ ____ ____ ____ mean = 250 If we freely choose numbers for the first 3 blanks, the 4th number HAS to be a certain number in order to obtain the mean of 250.
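The fill-in-the-blanks argument can be sketched in a few lines. The three "free" numbers below are made up for illustration and do not come from the slide; the helper name `forced_last_value` is hypothetical.

```python
def forced_last_value(free_values, target_mean):
    """Return the one remaining value that makes the overall mean equal target_mean."""
    n = len(free_values) + 1  # total count, including the forced value
    return n * target_mean - sum(free_values)

# Hypothetical free choices for the first three blanks.
free = [100, 300, 275]
last = forced_last_value(free, 250)

all_four = free + [last]
print("forced 4th value:", last)
print("check mean:", sum(all_four) / len(all_four))
```

Whatever three numbers are chosen, the total must be 4 × 250 = 1000, so the fourth value is determined. Only 3 of the 4 values are free to vary.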

22 A person’s metabolic rate is the rate at which the body consumes energy. Metabolic rate is important in studies of weight gain, dieting, and exercise. Here are the metabolic rates of 7 men who took part in a study of dieting: 1792 1666 1362 1614 1460 1867 1439 Find the mean Column 1: Observations (x) Column 2: Deviations Column 3: Squared deviations (TI-83: STAT/Calc/1-var-Stats L1 after entering list into L1)
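The three-column calculation described on this slide can be sketched as below, as a by-hand counterpart to the calculator's 1-Var Stats. It uses the seven metabolic rates listed above and divides by n-1 for the sample variance; the variable names are illustrative.

```python
from math import sqrt

# Metabolic rates of the 7 men in the dieting study.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
x_bar = sum(rates) / n                   # mean
deviations = [x - x_bar for x in rates]  # column 2: x - mean
squared = [d ** 2 for d in deviations]   # column 3: (x - mean)^2

variance = sum(squared) / (n - 1)        # sample variance, n-1 degrees of freedom
s = sqrt(variance)                       # standard deviation

print(f"{'x':>6} {'x - mean':>10} {'(x - mean)^2':>14}")
for x, d, sq in zip(rates, deviations, squared):
    print(f"{x:>6} {d:>10.0f} {sq:>14.0f}")
print(f"mean = {x_bar:.0f}, variance = {variance:.2f}, s = {s:.2f}")
```

The deviations sum to zero, which is a quick check on the arithmetic before squaring.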

23

24 You do! (By Hand) Let X = What is the variance and standard deviation?

25 You do! (using 1-Var Stats) During the years 1929-1939 of the Great Depression, the weekly average hours worked in manufacturing jobs were 45, 43, 41, 39, 39, 35, 37, 40, 39, 36, and 37. What are the variance and standard deviation?
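For checking a calculator answer, a minimal sketch using Python's standard `statistics` module is shown here. Its `variance` and `stdev` functions compute the sample versions, dividing by n-1.

```python
from statistics import mean, stdev, variance

# Weekly average hours worked in manufacturing, 1929-1939.
hours = [45, 43, 41, 39, 39, 35, 37, 40, 39, 36, 37]

hours_mean = mean(hours)
hours_var = variance(hours)  # sample variance (divides by n-1)
hours_sd = stdev(hours)      # sample standard deviation

print(f"mean = {hours_mean:.2f}")
print(f"variance = {hours_var:.2f}")
print(f"standard deviation = {hours_sd:.2f}")
```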

26

27 Miami Heat Salaries
1) Suppose that each member receives a $100,000 bonus. How will this affect the center, shape, and spread?
2) Suppose that each player is offered a 10% increase in base salary. What happens to the centers and spread?

Player        Salary
Shaq          27.7
Eddie Jones   13.46
Wade          2.83
Jones         2.5
Doleac        2.4
Butler        1.2
Wright        1.15
Woods         1.13
Laettner      1.10
Smith         1.10
Anderson      0.87
Dooling       0.75
Wang          0.75
Haslem        0.62
Mourning      0.33
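Both questions can be explored with a short sketch. It assumes the salary figures are in millions of dollars (the slide does not state the unit), so the $100,000 bonus is entered as 0.1. The variable names are illustrative.

```python
from statistics import mean, median, stdev

# Salaries from the slide; ASSUMED to be in millions of dollars.
salaries = [27.7, 13.46, 2.83, 2.5, 2.4, 1.2, 1.15, 1.13,
            1.10, 1.10, 0.87, 0.75, 0.75, 0.62, 0.33]

with_bonus = [x + 0.1 for x in salaries]   # 1) add $100,000 to each salary
with_raise = [x * 1.10 for x in salaries]  # 2) raise each salary by 10%

for label, data in [("original", salaries),
                    ("+0.1 bonus", with_bonus),
                    ("10% raise", with_raise)]:
    print(f"{label:>10}: mean={mean(data):.3f}  median={median(data):.3f}  "
          f"s={stdev(data):.3f}")
```

Adding a constant shifts the mean and median by that constant and leaves the standard deviation and shape unchanged. Multiplying by 1.10 scales the mean, median, and standard deviation by 1.10.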

