Chapter 3: Data Description
Chapter 2: Organize data into charts/graphs. Chapter 3: Take those charts and graphs and come up with a summary (mean, median, mode, midrange, etc.): Measures of Central Tendency; Measures of Variation (range, variance, and standard deviation); Measures of Position (percentiles, quartiles, and deciles).
3-1: Measures of Central Tendency Summarizing data using ‘middle’ values: Mean, Median, Mode and Mid-Range
What do we mean by the AVERAGE? Think on your own first and jot down a couple of ideas Now, let’s discuss Why are there so many different options?
Statistic: a value obtained by using a SAMPLE. Parameter: a value obtained by using a POPULATION. Symbols for these come on the next slide.
MEAN: aka arithmetic average or average. Symbols: $\bar{X}$ (sample), $\mu$ (population). Sum of all the values divided by the total number of values (n: sample, N: population): $\bar{X} = \frac{\sum X}{n}$, $\mu = \frac{\sum X}{N}$.
MEAN: (continued) Rounding rule: round to one more decimal place than what occurs in the raw data. Ex: if all data values are to the tenths, then the mean should be rounded to the hundredths. See p. 112, ex. #1 and 2.
MEDIAN: the halfway point in the data set. Symbol: MD. Arrange the data in order from smallest to largest and find the middle. What do you do if there is an even # of data values? Take the 2 middle values and average them together.
MODE: the value that occurs most often. There may be no mode, one mode (unimodal), two modes (bimodal), or many modes (multimodal). No mode: when NO value occurs more often than any other. We do not say it is 0.
MIDRANGE: Add the highest and the lowest values and divide by 2: $MR = \frac{\text{lowest} + \text{highest}}{2}$. Symbol: MR
EXAMPLE: these values represent the # of short-term parking spaces at 15 different airports. Let’s find the Mean, Median, Mode and Midrange 750 3400 1962 700 203 900 8662 260 1479 5905 9239 690 9822 2516
Mean: 3145.9 Median: 1479 Mode: 700 Midrange: 5012.5 Show calculator method
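The four results above can be checked with a short Python sketch using only the standard library. Note that the example lists 14 values although it mentions 15 airports; the stated answers are reproduced when a second 700 is included, so the list below adds that value as an assumption.

```python
from statistics import mean, median, multimode

# Short-term parking spaces from the example slide.
# ASSUMPTION: the final 700 is not among the listed values; it is added here
# because the stated results (mode 700, mean 3145.9) are reproduced with it.
spaces = [750, 3400, 1962, 700, 203, 900, 8662, 260,
          1479, 5905, 9239, 690, 9822, 2516, 700]

data_mean = round(mean(spaces), 1)               # sum of values / number of values
data_median = median(spaces)                     # middle value of the sorted data
data_modes = multimode(spaces)                   # every value tied for most frequent
data_midrange = (min(spaces) + max(spaces)) / 2  # (lowest + highest) / 2

print(data_mean, data_median, data_modes, data_midrange)
```

`multimode` returns a list, which matches the idea that a data set can be unimodal, bimodal, or multimodal. When every value occurs once it returns all of them, so the "no mode" case still has to be recognized by the reader.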
WEIGHTED MEAN: multiply each value by its corresponding weight and sum, then divide by the sum of the weights. Ex…Grades in college classes
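A minimal sketch of the weighted mean, assuming a made-up semester in which grade points are weighted by credit hours. The function name and the grades are hypothetical and only illustrate the rule stated above.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical semester: grade points (A=4, B=3, C=2) weighted by credit hours.
grade_points = [4, 3, 2]
credit_hours = [3, 4, 2]

gpa = weighted_mean(grade_points, credit_hours)
print(round(gpa, 2))
```

With equal weights the result is the ordinary mean, which is a quick way to sanity-check the function.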
Grouped Frequency Tables: on the calculator (otherwise LOTS of work). Find the midpoint of each class; that’s what goes in the first column. Second column: enter the frequency values. Stat → Calc → 1-Variable Stats (leave list 1 and list 2 alone) → Calculate. Let’s try one and evaluate the results (p. 124: 13).
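The calculator steps above amount to a frequency-weighted mean of the class midpoints. The sketch below shows that arithmetic on a hypothetical three-class table; the class boundaries and frequencies are invented for illustration and are not the textbook exercise.

```python
# Hypothetical grouped frequency table: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

# First calculator column: the midpoint of each class.
midpoints = [(low + high) / 2 for low, high, _ in classes]
# Second calculator column: the frequency of each class.
freqs = [f for _, _, f in classes]

n = sum(freqs)  # total number of data values
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n

print(midpoints, n, grouped_mean)
```

Because every value in a class is replaced by the class midpoint, the result is an approximation of the mean of the raw data.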
3-2: Measures of Variation We all need to know more than JUST the mean
Two experimental brands of outdoor paint are tested to see how long each will last before fading. Six cans of each brand constitute a small population. The results (in months) are shown. Find the mean and range of each group.
The average for both brands is the same, but the range for Brand A is much greater than the range for Brand B. Think about…Which brand would you buy? Why? I would buy… Look at the graph on p. 128
Range, Variance and Standard Deviation are used to help us make better decisions about our data RANGE: highest value - lowest value How can that be helpful to us? The larger the range the larger the spread of the values…
Range is useful, but can’t tell us everything… Variance and standard deviation are usually MORE helpful They are based on the distance EACH value is from the MEAN
Uses of the Variance and Standard Deviation: to determine the spread of the data; to determine the consistency of a variable. Let’s try one by hand because…
Let’s go back to our paint problem. Let’s use column 2. Table columns: Mean, Value – Mean, (Value – Mean)². Population symbols: Variance = sum of the last column divided by N: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$. Standard Deviation = SD = $\sigma = \sqrt{\sigma^2}$.
Rounding Rule: same as for the mean: one more decimal place than what is given in the data. Why do we square the Value – Mean column? Symbols for Population: $\sigma^2$ (variance), $\sigma$ (standard deviation). Symbols for Sample: $s^2$ (variance), $s$ (standard deviation).
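The by-hand table (value, value minus mean, squared difference) can be sketched in Python as follows. The data values are hypothetical, since the paint results are not reproduced in these notes. The sample versions divide by n - 1 instead of N, which is the standard convention and is assumed here.

```python
from math import sqrt
from statistics import pstdev, stdev

# Hypothetical data set (not the paint data from the slide).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mu = sum(values) / len(values)                  # mean
deviations = [x - mu for x in values]           # "Value - Mean" column
squared_devs = [d ** 2 for d in deviations]     # "(Value - Mean)^2" column

pop_var = sum(squared_devs) / len(values)       # population variance: divide by N
pop_sd = sqrt(pop_var)                          # population standard deviation

sample_var = sum(squared_devs) / (len(values) - 1)  # sample variance: divide by n - 1
sample_sd = sqrt(sample_var)                        # sample standard deviation

# Cross-check against the standard library.
print(pop_sd, pstdev(values))
print(sample_sd, stdev(values))
```

The deviations always sum to zero, which is one way to see why they are squared before being added.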
How to do it on the calculator… If it is one list of data: use 1 VAR stats. If it is two lists of data and you want them separately: use 1 VAR stats (L2) for the second list. A frequency table is the only time (so far) where you use 2 VAR stats.
Showed grouped on calculator… do #20 p. 144 together.
Coefficient of Variation: used when you want to compare 2 different sets of data with different units (all you need is the mean & SD). $CVar = \frac{s}{\bar{X}} \cdot 100$ (samples); $CVar = \frac{\sigma}{\mu} \cdot 100$ (populations). 4th hour: $\bar{X} = 6.8$, $s = 1.2$; 7th hour: $\bar{X} = 7.2$, $s = 1.5$. $\frac{1.2}{6.8} \cdot 100 = 17.6\%$ and $\frac{1.5}{7.2} \cdot 100 = 20.8\%$, so 7th hour had more variation than 4th.
Range rule of thumb: used to ESTIMATE the standard deviation: $s \approx \frac{\text{range}}{4}$ (this is ONLY an approximation).
Chebyshev’s Theorem: The proportion of values from a data set that will fall within k standard deviations of the mean will be at least $1 - \frac{1}{k^2}$, where k is a number greater than 1. (What does this mean?)
For k = 2: $1 - \frac{1}{2^2} = \frac{3}{4}$, so at least ¾ of the data values will fall within 2 standard deviations of the mean. OR: $\bar{X} \pm 2(SD)$ = the lower/upper boundaries of the interval that holds 2 standard deviations. Let’s look at an ex...
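The 4th hour and 7th hour comparison can be reproduced with a short sketch. The function name is an illustrative choice; the means and standard deviations are the ones given in the example.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("mean must be nonzero")
    return sd / mean * 100

cvar_4th = coefficient_of_variation(1.2, 6.8)   # 4th hour: mean 6.8, s = 1.2
cvar_7th = coefficient_of_variation(1.5, 7.2)   # 7th hour: mean 7.2, s = 1.5

print(round(cvar_4th, 1), round(cvar_7th, 1))
print("7th hour varies more" if cvar_7th > cvar_4th else "4th hour varies more")
```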
The mean price of houses in a certain neighborhood is $50,000, and the standard deviation is $10,000. Find the price range for which at least 75% of the houses will sell. Chebyshev’s Theorem states that at least 75% of a data set will fall within 2 standard deviations of the mean. 50,000 – 2(10,000) = 30,000 50,000 + 2(10,000) = 70,000 OR….sometimes you have to find k first
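$k = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$, and then use that in $1 - \frac{1}{k^2}$… look at the example on p. 141
Empirical Rule: If the data are bell shaped (approximately normal), then approximately 68% of the values fall within 1 standard deviation of the mean, approximately 95% within 2, and approximately 99.7% within 3.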
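Both directions of Chebyshev's theorem can be sketched in Python using the house-price example: going from k to an interval, and going from a value to k. The function names are illustrative assumptions.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def chebyshev_interval(mean, sd, k):
    """Lower and upper boundaries that lie k standard deviations from the mean."""
    return mean - k * sd, mean + k * sd

def k_from_value(value, mean, sd):
    """Number of standard deviations a value lies from the mean."""
    return abs(value - mean) / sd

# House prices: mean $50,000, standard deviation $10,000.
low, high = chebyshev_interval(50_000, 10_000, 2)
print(chebyshev_min_proportion(2), low, high)

# Going the other way: find k for a boundary first, then the proportion.
k = k_from_value(70_000, 50_000, 10_000)
print(k, chebyshev_min_proportion(k))
```

The theorem gives a lower bound for any distribution, so the actual proportion inside the interval can be larger than the value returned.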
3-3: Measures of Position -Standard Scores (z – scores) and Percentiles
Used to locate RELATIVE position within a data set In your past: doctor visits when you were young or Iowa Assessment Scores Median = 50th percentile (not the same as %)
Z-scores (Standard Scores). You’ve heard that you can’t compare apples to oranges, but with this you can…sort of. $z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$ (have you seen this before?) EX: You scored a 78 on a test in English that had a mean of 70 with an sd of 3.5. In Science you scored an 85 on a test that had a mean of 80 with an sd of 2. Compare your relative position within each class. z-score (English) = $\frac{78 - 70}{3.5} \approx 2.29$; z-score (Science) = $\frac{85 - 80}{2} = 2.5$. Comparatively, you scored better in your science class.
If you turn ALL the data into z-scores, then you have ‘adjusted’ the mean to 0 and the sd to 1. Therefore, the z-score = the # of sd’s the value is from the mean (compare k in Chebyshev’s thm). Very useful to us in the future.
Percentiles: measures of position most often used in the education and health care fields. They help compare an individual to a group and divide the data set into 100 equal parts.
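The English and Science comparison can be checked with a minimal z-score sketch. The function name is an illustrative choice; the scores, means, and standard deviations come from the example.

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd

z_english = z_score(78, 70, 3.5)
z_science = z_score(85, 80, 2)

print(round(z_english, 2), round(z_science, 2))
print("Science" if z_science > z_english else "English", "has the better relative position")
```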
Percentile graphs can be constructed, but we aren’t going to do that b/c computers can do that for us We do need to be able to find percentile values though – use the formula below
A teacher gives a 20-point test to 10 students. Find the percentile rank of a score of 12. 18, 15, 12, 6, 8, 2, 3, 5, 20, 10. Step 1: Arrange the data in order from smallest to largest: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. Step 2: Plug into the formula: Percentile = $\frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100 = \frac{6 + 0.5}{10} \cdot 100 = 65$. A student whose score was 12 did better than 65% of the class: the 65th percentile.
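A sketch of the percentile-rank calculation for the test-score example. The formula used, (number of values below X + 0.5) divided by the total number of values, times 100, is assumed here because it reproduces the 65th percentile stated above. The function name is illustrative.

```python
def percentile_rank(data, x):
    """Percentile rank of x: ((count of values below x) + 0.5) / n * 100."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for v in data if v < x)   # number of values strictly below x
    return (below + 0.5) * 100 / len(data)

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
rank = percentile_rank(scores, 12)
print(round(rank))
```

Counting with `v < x` means the data do not need to be sorted first. Sorting by hand only makes the count easier to see.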
What about if we need to go backwards? A teacher gives a 20-point test to 10 students. Find the value corresponding to the 25th percentile. 18, 15, 12, 6, 8, 2, 3, 5, 20, 10. Step 1: Order the data from smallest to largest: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. Step 2: Use the formula… Step 3: Always round up…this is the position in your data, not the ANSWER. The value 5 corresponds to the 25th percentile.
Percentiles/Deciles/Quartiles – Relationship. Percentiles: P1 – P100. Deciles: D1 – D10 = P10, P20, …, P100. Quartiles: Q1 – Q4 = P25, P50, P75, P100. Interquartile Range (IQR) = Q3 – Q1. Identifying Outliers: multiply the IQR by 1.5, add that amount to Q3, and subtract it from Q1. Anything outside of those two boundaries will be considered an outlier.
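Going backwards can be sketched as follows. The position formula c = n · p / 100 is an assumption that reproduces the answer above (10 · 25 / 100 = 2.5, rounded up to position 3, whose value is 5). The sketch follows the "always round up" instruction. Some textbooks instead average two neighbouring values when c is already a whole number, and that case is not treated specially here.

```python
import math

def value_at_percentile(data, p):
    """Value at the p-th percentile using position c = n * p / 100, rounded up."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(data)                 # Step 1: smallest to largest
    c = len(ordered) * p / 100             # Step 2: a position, not the answer
    position = math.ceil(c)                # Step 3: round up
    return ordered[position - 1]           # positions are 1-based, lists are 0-based

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
p25 = value_at_percentile(scores, 25)
print(p25)
```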
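The outlier check can be sketched as below. The data set is hypothetical, and the quartiles are found as the medians of the lower and upper halves of the sorted data, which is one common convention and an assumption here. Other quartile methods can give slightly different boundaries.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd number of values, the middle value belongs to neither half.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)

def find_outliers(data):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, plus the two boundaries."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if v < low or v > high], (low, high)

# Hypothetical data set with one unusually large value.
sample = [5, 6, 12, 13, 15, 18, 22, 50]
outliers, fences = find_outliers(sample)
print(quartiles(sample), fences, outliers)
```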
3-4: Exploratory Data Analysis: BOXPLOTS or Stem and Leaf Plots. Already looked at Stem and Leaf. Boxplots (review): 5 important values: Minimum, Q1, Median, Q3, Maximum.
Boxplot – aka Box and Whisker Plot It looks like a box with whiskers (segments) attached to each end of the box Let’s draw one…
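Before drawing a boxplot, the five important values can be computed with a short sketch. It reuses the 10 test scores from the percentile example and assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data. The function name is illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum (quartiles as medians of the two halves)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    # For an odd number of values, the middle value belongs to neither half.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
summary = five_number_summary(scores)
print(summary)
```

The box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.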