Published by Barry Wheeler; modified over 8 years ago
2
Last chapter... Four Corners: go to your corner based on whether your birthday falls in Winter, Spring, Summer, or Fall (1 minute). In your group, come to a consensus about the three most important topics we learned and list them on the board (5 minutes).
3
Last chapter, we learned... Appropriate graphical representations (for numerical and categorical data). Always graph the data; always. Always embed context; always. Describing numerical distributions/data sets via SOCS (the basics; our descriptions will get more sophisticated soon). Do we use SOCS to describe categorical data distributions? Why or why not?
4
SOCS... Shape, Outlier(s), Center, Spread. We loosely defined ‘center’ and ‘spread’; now we will be much more specific and detailed... And remember: always embed context. Here we go...
5
Word association time... When I say a word, you immediately write down what you think it means; don’t think, just write. Don’t talk; don’t say anything to anyone. Ready?
6
Word association time... Average
7
Patrons in a diner... The annual salaries of 7 patrons in a diner are listed below. Find the mean and the median using Stat Crunch. Are the mean and the median similar? Would they represent a ‘typical’ or ‘average’ customer’s salary? Should we use the mean or the median in this case? Graph the data using Stat Crunch (let’s practice a histogram, then a box plot). What shape is the distribution? Salaries: $45,000, $48,000, $52,000, $40,000, $35,000, $58,000, $46,000
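As a cross-check on the Stat Crunch output, here is a minimal Python sketch using only the standard library’s `statistics` module. The variable names are our own; the values are the seven salaries from the slide.

```python
import statistics

# Annual salaries of the 7 diner patrons (from the slide)
salaries = [45_000, 48_000, 52_000, 40_000, 35_000, 58_000, 46_000]

mean_salary = statistics.mean(salaries)      # arithmetic average
median_salary = statistics.median(salaries)  # middle value once sorted

print(f"mean   = {mean_salary:,.2f}")
print(f"median = {median_salary:,}")
```

The mean (about $46,286) and the median ($46,000) are close, as we would expect from a roughly symmetric data set with no outliers.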
8
Now, Bill Gates walks into the diner... Find the mean and the median using Stat Crunch. Are the mean and the median similar? Would both or either represent a ‘typical’ or ‘average’ customer’s salary? Should we use the mean or the median in this case? Graph the data using Stat Crunch (histogram, then box plot). What shape is the distribution? Salaries: $45,000, $48,000, $52,000, $40,000, $35,000, $58,000, $46,000, $3,710,000,000
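To see how much one extreme value matters, this Python sketch (standard-library `statistics`; variable names are ours) computes both measures of center with and without Bill Gates’s salary from the slide.

```python
import statistics

# The 7 original salaries plus Bill Gates's figure from the slide
salaries = [45_000, 48_000, 52_000, 40_000, 35_000, 58_000, 46_000]
with_gates = salaries + [3_710_000_000]

for label, data in [("without Gates", salaries), ("with Gates", with_gates)]:
    print(f"{label:>13}: mean = {statistics.mean(data):>15,.0f}, "
          f"median = {statistics.median(data):>8,.0f}")
```

One extreme value drags the mean from about $46,000 to over $460 million, while the median moves only from $46,000 to $47,000: the median is resistant, the mean is not.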
9
What’s the moral of this story? Means are excellent measures of central tendency if the data is (fairly) symmetric. However, means are highly influenced by outlier(s). So, if the data has an outlier(s), a better measure of central tendency is the median, which is barely influenced by outliers; this property is called ‘resistant’. So, consider the shape of the data/distribution, then wisely choose an appropriate measure of central tendency.
10
Which measure of central tendency should we use?
11
Which is larger: mean or median? Which should we use to describe the ‘typical’ or middle value?
12
The ‘C’ in SOCS. So, when we analyze a numerical distribution (a histogram, stem plot, box plot, etc.), we need to wisely choose which ‘C’ to use: mean or median. Generally, if the distribution is symmetric, use the mean (or median) as the measure of central tendency; the two will be similar (or equal) in value. If it is skewed (left or right), use the median as the measure of central tendency; why?
13
Measures of Spread. What is the median of each of the following data sets (you can use Stat Crunch if you need to): (4, 4, 5, 6, 6) and (5, 5, 5, 5, 5)? Are they the same distribution/data set? Another characteristic that is helpful in describing distributions/data sets is the measure of spread (the typical distance from the center).
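A short Python sketch (standard-library `statistics`; variable names are ours) showing that the two data sets share a median but not a spread. It uses the sample standard deviation (n − 1 divisor), which we assume is what Stat Crunch reports by default.

```python
import statistics

a = [4, 4, 5, 6, 6]
b = [5, 5, 5, 5, 5]

for name, data in [("a", a), ("b", b)]:
    # Same center...
    med = statistics.median(data)
    # ...but different spread (sample standard deviation, n - 1 divisor)
    sd = statistics.stdev(data)
    print(f"{name}: median = {med}, standard deviation = {sd:.3f}")
```

Both medians are 5, but the standard deviations are 1 and 0: the center alone does not tell the two data sets apart.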
14
Spread... The second ‘S’ in SOCS. Another characteristic that is helpful in describing distributions/data sets is the measure of spread (the typical distance from the center). The two measures of spread we will focus on in this course are the standard deviation and the inter-quartile range (IQR).
15
Standard Deviation is... a typical distance of the observations from their mean, i.e., a number that measures how far away the typical observation is from the center of the distribution.
16
Let’s play the standard deviation game... Your team’s task: create a data set of four whole numbers (each from 0 to 10) with the lowest standard deviation possible. Input your four numbers into Stat Crunch, then calculate the standard deviation. Change a value or values until you get the lowest standard deviation you can. 3 minutes. Go. Now create a data set (again using only 0 to 10) with the largest possible standard deviation.
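A computer can play this game by simply trying every possibility. This brute-force Python sketch is our own illustration (not part of the activity); it checks all 1001 data sets of four whole numbers from 0 to 10, using the sample standard deviation. It spoils the game, so run it only after your team has played.

```python
import itertools
import statistics

# Every data set of four whole numbers from 0 to 10 (order does not matter,
# repeats allowed): C(14, 4) = 1001 candidates.
candidates = list(itertools.combinations_with_replacement(range(11), 4))

# Sample standard deviation (n - 1 divisor) of each candidate
sds = {c: statistics.stdev(c) for c in candidates}

lowest = min(sds.values())
highest = max(sds.values())
print("lowest SD :", lowest, "e.g.", min(sds, key=sds.get))
print("highest SD:", round(highest, 4), "achieved by",
      [c for c, s in sds.items() if s == highest])
```

Any four equal numbers give an SD of 0; splitting the values between the two extremes, 0, 0, 10, 10, gives the largest SD (about 5.77).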
17
Which has the largest SD?
18
Calculating the standard deviation...
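The slide’s worked example is not reproduced in this text, so here is a sketch of the usual sample standard deviation formula, s = √( Σ(xᵢ − x̄)² / (n − 1) ), written out step by step in Python. We assume the n − 1 (sample) version, and `sample_sd` is a hypothetical helper name; the data set (4, 4, 5, 6, 6) comes from the Measures of Spread slide.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation, computed step by step."""
    n = len(data)
    mean = sum(data) / n                    # 1. find the mean
    deviations = [x - mean for x in data]   # 2. distance of each value from the mean
    squared = [d ** 2 for d in deviations]  # 3. square each deviation
    variance = sum(squared) / (n - 1)       # 4. add them up, divide by n - 1
    return math.sqrt(variance)              # 5. take the square root

data = [4, 4, 5, 6, 6]
print(sample_sd(data))         # step-by-step result
print(statistics.stdev(data))  # library result, for comparison
```

Both lines print 1.0: the squared deviations are 1, 1, 0, 1, 1, their sum 4 divided by n − 1 = 4 is 1, and √1 = 1.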
19
Variance... Another measure of spread, not used very often; usually, if we use the mean as our measure of central tendency, we use the standard deviation as our measure of spread. Variance is related to the standard deviation: variance = (standard deviation)², and standard deviation = √variance.
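To check the relationship numerically, this Python sketch uses the standard library’s `statistics.variance` and `statistics.stdev` (both sample versions, n − 1 divisor) on an example data set of our own choosing, (0, 0, 10, 10).

```python
import math
import statistics

data = [0, 0, 10, 10]  # example data set chosen for illustration

s = statistics.stdev(data)       # sample standard deviation
var = statistics.variance(data)  # sample variance

# variance = (standard deviation)^2  and  standard deviation = sqrt(variance)
assert math.isclose(var, s ** 2)
assert math.isclose(s, math.sqrt(var))
print(f"SD = {s:.3f}, variance = {var:.3f}")
```

Note the units: if the data are in inches, the SD is in inches but the variance is in square inches, one reason the SD is the more commonly reported measure.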
20
The Empirical Rule... When a distribution is unimodal, ≈ symmetric, and mean ≈ median, then... life is beautiful. The distribution is said to be ≈ Normal, and: about 68% of the data fall within 1 standard deviation of the mean; about 95% within 2 standard deviations of the mean; about 99.7% within 3 standard deviations of the mean.
21
68-95-99.7 Rule (Empirical Rule): for (≈) Normal distributions only
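Where do 68, 95, and 99.7 come from? They are rounded areas under the Normal curve. This Python sketch compares them with the exact areas from the standard library’s `statistics.NormalDist` (available in Python 3.8+).

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, SD 1

for k, rule in [(1, 68), (2, 95), (3, 99.7)]:
    # Exact area within k standard deviations of the mean
    exact = z.cdf(k) - z.cdf(-k)
    print(f"within {k} SD: rule says {rule}%, exact Normal model gives {exact:.2%}")
```

The exact values (about 68.27%, 95.45%, and 99.73%) round to the rule’s numbers, which is why the rule is only an approximation, and only for approximately Normal distributions.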
22
Empirical ‘Model’...
23
Let’s practice... What percentage of adult females have a height that is: 1) as tall as or taller than 64.5”? Is this typical? 2) 64.5” or shorter? Is this unlikely? 3) between 62” and 67”? How common is this? 4) either shorter than 59.5” or taller than 69.5”? 5) taller than 67”? Is this unlikely? 6) between 62” and 64.5”? 7) between 57” and 59.5”?
24
More practice with the Empirical ‘Model’... The weight of a certain type of chocolate bar is Normally distributed with a mean weight of 8.1 ounces and a standard deviation of 0.1 ounces. Draw the density curve and label the 1, 2, and 3 standard deviation values on it. What proportion of chocolate bars weigh: 1) between 8 ounces and 8.2 ounces? How likely is this? 2) more than 8.3 ounces? Is this common? 3) less than 7.8 ounces? Is this weight expected? 4) either less than 7.9 ounces or more than 8.3 ounces?
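The 68-95-99.7 bookkeeping can be automated. This Python sketch is our own helper (the names `CUM_PCT`, `sds_from_mean`, and `pct_between` are hypothetical); it only handles cut-offs that sit exactly 0, 1, 2, or 3 SDs from the mean, which is all the Empirical Rule can answer.

```python
# Cumulative percentage of a Normal distribution lying below mean + k*SD,
# according to the 68-95-99.7 rule (e.g. below mean - 1 SD: (100 - 68) / 2 = 16%).
CUM_PCT = {-3: 0.15, -2: 2.5, -1: 16.0, 0: 50.0, 1: 84.0, 2: 97.5, 3: 99.85}


def sds_from_mean(x, mean, sd):
    """Number of SDs x lies from the mean; must be a whole number from -3 to 3."""
    z = (x - mean) / sd
    k = round(z)
    if abs(z - k) > 1e-9 or k not in CUM_PCT:
        raise ValueError(f"{x} is not 0, 1, 2, or 3 SDs from the mean")
    return k


def pct_between(low, high, mean, sd):
    """Empirical-rule percentage between low and high (None = no bound)."""
    below_high = 100.0 if high is None else CUM_PCT[sds_from_mean(high, mean, sd)]
    below_low = 0.0 if low is None else CUM_PCT[sds_from_mean(low, mean, sd)]
    return below_high - below_low


MEAN, SD = 8.1, 0.1  # chocolate bar weights, in ounces
print("1)", pct_between(8.0, 8.2, MEAN, SD))   # between 8.0 and 8.2 oz
print("2)", pct_between(8.3, None, MEAN, SD))  # more than 8.3 oz
print("3)", pct_between(None, 7.8, MEAN, SD))  # less than 7.8 oz
print("4)", pct_between(None, 7.9, MEAN, SD) + pct_between(8.3, None, MEAN, SD))
```

It reproduces the hand answers: 68%, 2.5%, 0.15%, and 5%. Rounding `z` guards against floating-point noise such as (8.2 − 8.1) / 0.1 coming out as 1.0000000000000053.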
25
Your turn... Suppose that the age of retirement in a country is Normally distributed with a mean of 64 years of age and a standard deviation of 3.5 years. 1. What would you consider a ‘common’ age range at which to retire? 2. What percentage of people retire at age 71? 3. What percentage of people retire either after age 74.5 or before age 53.5? 4. What percentage of people retire between the ages of 64 and 67.5? 5. How likely is it that someone will retire at the age of 74.5 or older?
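After answering with the Empirical Rule, you can compare against exact Normal-model areas. This Python sketch (our own, not part of the original activity) uses the standard library’s `statistics.NormalDist` (Python 3.8+) with the slide’s mean of 64 and SD of 3.5.

```python
from statistics import NormalDist

retire = NormalDist(mu=64, sigma=3.5)  # retirement age, in years

# A "common" range: within 2 SDs of the mean (about 95% of retirees)
low, high = retire.mean - 2 * retire.stdev, retire.mean + 2 * retire.stdev
print(f"common range: {low:g} to {high:g} years")

# After 74.5 or before 53.5 (3 SDs above / below the mean)
tails = (1 - retire.cdf(74.5)) + retire.cdf(53.5)
# Between 64 and 67.5 (mean to 1 SD above)
middle = retire.cdf(67.5) - retire.cdf(64)
# 74.5 or older
old = 1 - retire.cdf(74.5)

print(f"tails: {tails:.2%}, between 64 and 67.5: {middle:.2%}, 74.5+: {old:.3%}")
```

The exact areas (about 0.27%, 34.13%, and 0.135%) sit very close to the Empirical Rule’s 0.3%, 34%, and 0.15%.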
26
Slight detour... We will get back to the Empirical Rule in a few minutes... Is 120 big or small? Think-Pair-Share.
27
TPS... Is 120 big or small? Big if it is a day’s temperature in LA in degrees Fahrenheit, or the number of units a student takes during a semester (really big!). Small if it is the monthly rent paid for an apartment in LA. Usual or ‘average’ if it is the weight in pounds of a 15-year-old girl, or a systolic blood pressure. It is nearly impossible to say how unusual 120 is unless we know what we are comparing 120 to.
28
Something else to consider... A student’s ACT score was 25.9; their SAT score was 1172. Which is the better score? ACT (national): mean = 21, standard deviation = 4.7. SAT (national, critical reading & math): mean = 1010, standard deviation = 163.
29
When we have a Normal distribution, then... z-scores (standardizing)... we can calculate z-scores, i.e., standardize the data: convert each raw value into the number of SDs it lies from the mean, z = (value − mean) / SD.
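Applied to the ACT/SAT question, the z-score formula takes one line of code. A minimal Python sketch; `z_score` is our own helper name, and the means and SDs are the national figures given on the ACT/SAT slide.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

act_z = z_score(25.9, mean=21, sd=4.7)    # student's ACT score
sat_z = z_score(1172, mean=1010, sd=163)  # student's SAT score

print(f"ACT z = {act_z:.2f}, SAT z = {sat_z:.2f}")
print("Relatively better score:", "ACT" if act_z > sat_z else "SAT")
```

The ACT score is about 1.04 SDs above its mean and the SAT score about 0.99 SDs above its mean, so, relative to other test takers, the ACT score is (slightly) better.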
30
Let’s practice with some of our heights...
31
Data gathering time again... Write the number of siblings you have on the board, and enter the data into Stat Crunch. Do a numerical analysis (statistical summary in Stat Crunch) and a graphical representation. Describe the distribution.
32
Skewed? Then we shouldn’t use the mean and SD, but we still need to describe the center and the spread of the distribution. Use the median and IQR (inter-quartile range). The median and IQR are resistant: they are not strongly affected by outlier(s). IQR = Q3 – Q1. The IQR is the amount of space the middle 50% of the data occupies.
33
Range of data... Another measure of variability (usable with any distribution) is the range: Range = maximum value – minimum value. Range for our data =
34
Boxplots... based on the 5-number summary (minimum, Q1, median, Q3, maximum)
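A Python sketch of the 5-number summary, using the diner salaries from the start of the chapter. The quartile rule here (median of each half, leaving out the overall median when n is odd) is one common textbook convention and an assumption on our part; Stat Crunch or other software may compute Q1 and Q3 slightly differently, especially for small data sets. `five_number_summary` is a hypothetical helper name.

```python
import statistics

def five_number_summary(data):
    """(min, Q1, median, Q3, max), taking Q1 and Q3 as the medians of the
    lower and upper halves; when n is odd, the overall median is left out
    of both halves. Other software may use slightly different rules."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


salaries = [45_000, 48_000, 52_000, 40_000, 35_000, 58_000, 46_000]
for label, data in [("7 patrons", salaries),
                    ("with Bill Gates", salaries + [3_710_000_000])]:
    lo, q1, med, q3, hi = five_number_summary(data)
    print(f"{label}: min={lo:,} Q1={q1:,} median={med:,} Q3={q3:,} "
          f"max={hi:,}  IQR={q3 - q1:,}")
```

Adding Bill Gates moves the IQR only from $12,000 to $12,500, while the maximum (and so the range) explodes: the IQR is resistant, the range is not.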
35
Boxplots...
36
Modified boxplot – shows outlier(s)
37
Two modified boxplots...
38
What are outliers? Boxplots are the only graphical representation for which we specifically define an outlier. Potential outliers are values that lie more than 1.5 IQRs below Q1 or above Q3. Compute IQR × 1.5; add that product to Q3; any value(s) beyond that point are outliers to the right. Subtract the same product from Q1; any value(s) beyond that point are outliers to the left.
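The fence arithmetic on this slide can be written as a short Python sketch. The quartile convention (median of each half) is our assumption and may differ from Stat Crunch’s for some data sets; `quartiles` and `outliers_15_iqr` are hypothetical helper names. The example reuses the diner salaries with Bill Gates included.

```python
import statistics

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    return statistics.median(xs[: n // 2]), statistics.median(xs[(n + 1) // 2 :])


def outliers_15_iqr(data):
    """Values more than 1.5 IQRs below Q1 or above Q3, plus the two fences."""
    q1, q3 = quartiles(data)
    step = 1.5 * (q3 - q1)  # IQR x 1.5
    low_fence, high_fence = q1 - step, q3 + step
    flagged = [x for x in data if x < low_fence or x > high_fence]
    return flagged, (low_fence, high_fence)


salaries = [45_000, 48_000, 52_000, 40_000, 35_000, 58_000, 46_000, 3_710_000_000]
flagged, fences = outliers_15_iqr(salaries)
print("fences:", fences)
print("outliers:", flagged)
```

Here Q1 = $42,500 and Q3 = $55,000, so the fences are $23,750 and $73,750, and only Bill Gates’s salary is flagged.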
39
Go back to our Siblings data... Using Stat Crunch, calculate descriptive statistics. Let’s calculate (by hand) whether we have any outliers: Q3 – Q1 = IQR. IQR × 1.5; add this product to Q3; are there any values in our data set beyond this point to the right? IQR × 1.5; subtract this product from Q1; are there any values in our data set beyond this point to the left? Now use Stat Crunch to create a boxplot; does the boxplot confirm our calculations?
40
Be careful with outliers... Are they really outliers? Is your data correct? Was it input accurately? Example: COC’s recent 99-year-old graduate. Don’t automatically throw out an unusual piece of data; investigate.
41
Be careful... one more thing...
42
Partner Practice...
43
Your turn... In pairs, choose a set of data from the Math 140 spreadsheet that is skewed (left or right); you probably won’t know whether the data is skewed until you copy and paste it into Stat Crunch and create a graph. Create a box plot; print it out; put your names on it. Label (on the graph) the 5-number summary, with arrows pointing to each value. Analyze through SOCS (which measure of central tendency should you use? Which measure of spread should you use?); be sure to show your work to justify that any point(s) are outliers. Now, using the same data, create a histogram. What characteristics of the data does the histogram show that the box plot does not?
44
Homework... Practice... 3.1, 3.3, 3.4, 3.11, 3.13, 3.14, 3.21, 3.28, 3.33, 3.34, 3.42, 3.49
45
Let’s talk about Exam #1...