Statistics 200 Lecture #5 Tuesday, September 6, 2016 Textbook: Sections 2.7 through 3.2 Objectives: • Define z-scores and relate them to the empirical (68-95-99.7) rule • Explore scatterplots as a tool for visualizing two quantitative variables • Familiarize yourselves with least squares regression lines: – slope interpretation – y-intercept interpretation – dangerous to extrapolate
Standardized z-scores A z-score tells us how many standard deviations an observation is from the mean. It is a useful measure of the relative value of any observation in a dataset, and it allows comparison of observations from different data sets.
Standardized z-scores Z-scores correspond directly to the Empirical Rule. About 68% of values have z-scores between –1 and 1. About 95% of values have z-scores between –2 and 2. About 99.7% of values have z-scores between –3 and 3.
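The three percentages in the Empirical Rule can be checked numerically. The sketch below assumes the data follow an exactly normal (bell-shaped) distribution, and the helper name `normal_prob_within` is made up for illustration. It uses the standard identity P(–k < Z < k) = erf(k / √2) for a standard normal Z.

```python
import math

def normal_prob_within(k: float) -> float:
    """Probability that a standard normal value has a z-score between -k and k."""
    return math.erf(k / math.sqrt(2))

# Share of values within 1, 2, and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(f"z-scores between -{k} and {k}: {normal_prob_within(k):.4f}")
```

The printed proportions come out close to 0.68, 0.95, and 0.997, which is where the name "68-95-99.7 rule" comes from.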
Example 1 What is the z-score and interpretation in the following situation? Obs = 3, mean = 4, SD = 0.5 Z-score = (observation – mean)/SD = (3 – 4) / 0.5 = –1 / 0.5 = –2 Interpretation: The observation of 3 is 2 standard deviations below the mean.
Example 2 What is the z-score and interpretation in the following situation? Obs = 200, mean=150, SD = 20 Z-score = (observation – mean)/SD = (200-150)/20 = 50/20 = 2.5 Interpretation: The observation 200 is 2.5 standard deviations above the mean.
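The two examples above use the same formula, so they can be reproduced with one small function. This is a minimal sketch; the function name `z_score` is an illustrative choice, not something from the textbook.

```python
def z_score(observation: float, mean: float, sd: float) -> float:
    """Number of standard deviations the observation lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (observation - mean) / sd

# Example 1: Obs = 3, mean = 4, SD = 0.5
print(z_score(3, 4, 0.5))      # negative: below the mean

# Example 2: Obs = 200, mean = 150, SD = 20
print(z_score(200, 150, 20))   # positive: above the mean
```

The sign of the result tells you the direction (below or above the mean), and the size tells you how many standard deviations away the observation is.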
More complicated example: which person has a more unusual height? Me: a 53” tall woman My husband: a 73” tall man Women’s heights are normal with mean 54” and std. dev. 3”. Men’s heights are normal with mean 70” and std. dev. 3” These heights come from different distributions, so we cannot compare them directly. We need a tool to make them comparable… Z-score!
Calculate Z-scores for both: Me: Z-score = (obs – mean)/(std. dev) = (53 – 54) / (3) = -1/3 = -0.33 Husband: Z-score = (obs – mean) / (std. dev) = (73 – 70) / 3 = 3 / 3 = 1
Compare Z-scores – draw them below Me Husband
Compare Z-scores Me: 0.33 std. dev. below the mean. Husband: 1 std. dev. above the mean. Conclusion: My husband’s height is more unusual than mine, because it is more standard deviations from the mean.
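The height comparison can be sketched in code as well. The version below assumes that "more unusual" means "larger absolute z-score", which matches the conclusion on this slide; the dictionary layout and names are illustrative only.

```python
def z_score(observation: float, mean: float, sd: float) -> float:
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd

# (observed height, mean, std. dev.) in inches, taken from the slides
people = {
    "me": (53, 54, 3),
    "husband": (73, 70, 3),
}

z_scores = {name: z_score(*values) for name, values in people.items()}

# The more unusual height is the one farthest from its own mean in SD units
most_unusual = max(z_scores, key=lambda name: abs(z_scores[name]))

for name, z in z_scores.items():
    direction = "below" if z < 0 else "above"
    print(f"{name}: {abs(z):.2f} std. dev. {direction} the mean")
print("more unusual:", most_unusual)
```

Taking the absolute value matters here: a z-score of –0.33 and one of +1 point in opposite directions, but only the distance from the mean decides which height is more unusual.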
So far… We have talked about quantitative variables, but only one at a time. Now we’re going to begin looking at the relationships between two different quantitative variables. Start with looking at a Scatterplot
Scatterplots: A scatterplot is a two-dimensional graph of two numeric variables. There are two axes on a scatterplot, the vertical axis (y-axis) and the horizontal axis (x-axis). The y-axis is assigned to the response variable The x-axis is assigned to the explanatory variable.
Example 1: Apartment size and rent Two Variables: size of one-bedroom apartment (square feet) and monthly rent ($) Size (Square Ft) Rent ($) 415 438 485 636 548 666 646 545 690 688 538 469 1000 833 1003 1089 1150 1181 1237 1225 1469 1501 1177 958
What is the average pattern? What is the direction of the pattern? A positive, linear association Response / dependent / y variable Explanatory / independent / x variable
Linear versus curvilinear Linear relationship a relationship that, on average, will follow a line Curvilinear or nonlinear relationship a relationship that, on average, will follow a curve
Association: a term used to describe the direction of the pattern shown by the two variables. A positive association occurs when the values of one variable tend to increase as the values of the other variable increase. A negative association occurs when the values of one variable tend to decrease as the values of the other variable increase.
Outliers When we consider two variables, an outlier is a point with an unusual combination of values. Outliers may be unusual and interesting data points, or they may be errors.
Example: Tornado Activity Variables: year and number of tornadoes (Jan – May). Unusually high observations that don’t follow the trend of the other observations. Source: National Weather Service
Formalize the trend: Regression lines Regression line: a straight line that describes how values of the response variable (y) are related, on average, to values of the explanatory variable (x). We can use the regression line to… Estimate the average value of y at a specified value of x. Predict the unknown value of y for an individual using that individual’s x value.
Specify Linear Relationships with Simple Linear Regression Model used to find the best straight line to fit the data points. Name of Procedure: Least Squares. Least Squares Model: the line with the smallest sum of the squared differences found with all possible lines.
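To see what "smallest sum of squared differences" means in practice, here is a minimal sketch of a least squares fit. The five data points are made up purely for illustration (they are not from the lecture), and the function names are hypothetical. The slope and intercept use the standard textbook formulas b1 = Sxy / Sxx and b0 = ȳ – b1·x̄.

```python
def least_squares(xs, ys):
    """Return (b0, b1): intercept and slope of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
print(f"least squares line: y = {b0:.2f} + {b1:.2f} x")
print("SSE of least squares line:", round(sse(xs, ys, b0, b1), 4))

# Any other line we try has a larger sum of squared differences
for other_b0, other_b1 in [(0.0, 2.0), (0.5, 1.8), (-0.5, 2.2)]:
    print(f"SSE of y = {other_b0} + {other_b1} x:",
          round(sse(xs, ys, other_b0, other_b1), 4))
```

The candidate lines in the loop are arbitrary guesses; the point is only that none of them beats the least squares line on the sum of squared differences.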
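The regression equation In math: the slope-intercept form of a line. In statistics: $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the average value of y, $b_0$ is the y-intercept, and $b_1$ is the slope.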
In a picture:
Example: Positive linear relationship between meal bill ($) and amount of tip ($). Data from a restaurant: r = 0.830 and n = 10 bills.
Example: Tip example Question: How can we use the amount of the bill ($) to estimate the amount of tip left ($), on average? Identify the Variables: Bill ($): explanatory. Tip ($): response. Note: the explanatory variable is also called the predictor variable.
To fit a regression line in Minitab: Stat > Regression > Fitted Line Plot correctly identify explanatory variable and response straight line: simple linear regression
Least Squares Regression Equation The regression equation is Tip = -0.60 + 0.190 Bill, where -0.60 is the sample y-intercept (b0) and 0.190 is the sample slope (b1).
Slope Interpretation Tip = -0.60 + 0.19 Bill For each additional $1 found on the bill, you can expect the tip to increase by 19 cents, on the average.
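The slope interpretation can be checked by comparing the predicted tips for two bills that differ by exactly $1. This is a small sketch using the fitted equation from the slides; the function name `predicted_tip` and the $25 starting bill are arbitrary choices for illustration.

```python
def predicted_tip(bill: float) -> float:
    """Average tip ($) predicted by the fitted line Tip = -0.60 + 0.19 * Bill."""
    return -0.60 + 0.19 * bill

# Two bills that differ by exactly $1
difference = predicted_tip(26) - predicted_tip(25)
print(f"change in predicted tip per extra $1 on the bill: ${difference:.2f}")
```

Whichever pair of bills you pick, a $1 difference in the bill gives the same 19-cent difference in the predicted tip, because the slope of a straight line is constant.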
Y-intercept Interpretation Tip = -0.60 + 0.19 Bill b0 = -$0.60 In theory it says: When you have no bill, you can expect a tip to be -$0.60. So does the y-intercept have a logical interpretation in the context of this problem? No: we have no data for bill = 0.
Estimation & Limitations Question: If the bill is $30, estimate the average amount left for a tip. Tip = -0.60 + 0.19 × Bill = -0.60 + 0.19 × (30) = $5.10. Note: Bill = $30 is not an actual observation in the sample. Can: Estimate within the range of $15 to $45.
Example 5B: Estimation & Limitations Question: If the bill is $70, estimate the average amount left for a tip. x = $70 Tip = -0.60 + 0.19 × Bill Can’t: Extrapolate outside the range of $15 to $45.
To remember about regression equations: Y-intercept: a logical interpretation is restricted to data where x = 0 is in the range of data in the sample. No Extrapolation: don’t use a regression equation to estimate a value for the response variable outside the range of x values. Estimation: the regression equation estimates the average value for y at a given value of x.
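One way to build the "no extrapolation" rule into a calculation is to refuse to estimate outside the observed range of bills. The sketch below uses the fitted equation and the $15 to $45 range from the slides; the function name, constant names, and the choice to raise an error are illustrative assumptions rather than a standard procedure.

```python
BILL_MIN = 15.0  # smallest bill ($) in the sample, per the slides
BILL_MAX = 45.0  # largest bill ($) in the sample, per the slides

def estimate_tip(bill: float) -> float:
    """Estimate the average tip ($), but only inside the observed range of bills."""
    if not BILL_MIN <= bill <= BILL_MAX:
        raise ValueError(
            f"bill of ${bill:.2f} is outside ${BILL_MIN:.0f} to ${BILL_MAX:.0f}: "
            "extrapolation is not supported by the data"
        )
    return -0.60 + 0.19 * bill

for bill in (30, 70):
    try:
        print(f"bill ${bill}: estimated average tip ${estimate_tip(bill):.2f}")
    except ValueError as err:
        print(f"bill ${bill}: cannot estimate ({err})")
```

A $30 bill is inside the range, so the estimate goes through; a $70 bill is rejected, since the line was fitted without any information about bills that large.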
Review: If you understood today’s lecture, you should be able to solve 3.1, 3.3, 3.5, 3.13, 3.15, 3.19, 3.21 Recall Objectives: • Define z-scores and relate them to the empirical (68-95-99.7) rule • Explore scatterplots as a tool for visualizing two quantitative variables • Familiarize yourselves with least squares regression lines: – slope interpretation – y-intercept interpretation – dangerous to extrapolate