Statistics 200 Lecture #5 Tuesday, September 6, 2016


Textbook: Sections 2.7 through 3.2
Objectives:
• Define z-scores and relate them to the empirical (68-95-99.7) rule
• Explore scatterplots as a tool for visualizing two quantitative variables
• Familiarize yourselves with least squares regression lines:
  – slope interpretation
  – y-intercept interpretation
  – dangerous to extrapolate

Standardized z-scores
• A z-score tells us how many standard deviations an observation is from the mean.
• It is a useful measure of the relative value of any observation in a dataset.
• It allows comparison of observations in different data sets.

Standardized z-scores
Z-scores correspond directly to the Empirical Rule.
• About 68% of values have z-scores between –1 and 1.
• About 95% of values have z-scores between –2 and 2.
• About 99.7% of values have z-scores between –3 and 3.
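The three percentages can be checked against the standard normal curve. The following is a minimal sketch, assuming the values are exactly normally distributed; Python and the helper name `share_within` are choices made here for illustration and are not part of the lecture.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Proportion of a normal distribution with z-scores between -k and k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"z between -{k} and {k}: {share_within(k):.1%}")
```

The printed shares round to roughly 68%, 95%, and 99.7%, which is where the rule gets its name.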

Example 1
What is the z-score and interpretation in the following situation?
Obs = 3, mean = 4, SD = 0.5
Z-score = (observation – mean) / SD = (3 – 4) / 0.5 = –1 / 0.5 = –2
Interpretation: The observation of 3 is 2 standard deviations below the mean.

Example 2
What is the z-score and interpretation in the following situation?
Obs = 200, mean = 150, SD = 20
Z-score = (observation – mean) / SD = (200 – 150) / 20 = 50 / 20 = 2.5
Interpretation: The observation of 200 is 2.5 standard deviations above the mean.
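The arithmetic in Examples 1 and 2 can be reproduced with a few lines of code. This is only a sketch; the function names `z_score` and `interpret` are hypothetical helpers introduced here, not something the lecture defines.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd


def interpret(observation, mean, sd):
    """Put the z-score into words, in the style of the lecture's interpretations."""
    z = z_score(observation, mean, sd)
    if z == 0:
        return f"The observation of {observation} is exactly at the mean."
    direction = "above" if z > 0 else "below"
    return (f"The observation of {observation} is {abs(z):g} "
            f"standard deviations {direction} the mean.")


print(interpret(3, 4, 0.5))      # Example 1: z = -2
print(interpret(200, 150, 20))   # Example 2: z = 2.5
```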

More complicated example: which person has a more unusual height?
• Me: a 53” tall woman
• My husband: a 73” tall man
Women’s heights are normal with mean 54” and std. dev. 3”. Men’s heights are normal with mean 70” and std. dev. 3”.
These heights come from different distributions, so we cannot compare them directly. We need a tool to make them comparable: the z-score!

Calculate Z-scores for both:
• Me: Z-score = (obs – mean) / (std. dev.) = (53 – 54) / 3 = –1/3 ≈ –0.33
• Husband: Z-score = (obs – mean) / (std. dev.) = (73 – 70) / 3 = 3/3 = 1

Compare Z-scores: draw them below. [Sketch area: Me, Husband]

Compare Z-scores
• Me: 0.33 std. dev. below the mean
• Husband: 1 std. dev. above the mean
Conclusion: My husband’s height is more unusual than mine, because it is more std. dev. from the mean.
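The comparison comes down to which z-score is farther from zero, regardless of sign. A small sketch of that idea follows, using the heights and distributions from the example; the variable names are illustrative assumptions.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd


# Each height is standardized against its own distribution.
z_me = z_score(53, mean=54, sd=3)        # woman, compared with women's heights
z_husband = z_score(73, mean=70, sd=3)   # man, compared with men's heights

# "More unusual" means farther from the mean in either direction,
# so compare absolute values rather than the signed z-scores.
more_unusual = "husband" if abs(z_husband) > abs(z_me) else "me"

print(f"Me: {z_me:.2f}, Husband: {z_husband:.2f}, more unusual: {more_unusual}")
```

Comparing absolute values matters here: the signed values alone would also rank a z-score of –3 below a z-score of 1, even though –3 is the more unusual observation.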

So far… We have talked about quantitative variables, but only one at a time. Now we’re going to begin looking at the relationships between two different quantitative variables. Start by looking at a scatterplot.

Scatterplots: A scatterplot is a two-dimensional graph of two numeric variables. There are two axes on a scatterplot, the vertical axis (y-axis) and the horizontal axis (x-axis).
• The y-axis is assigned to the response variable.
• The x-axis is assigned to the explanatory variable.

Example 1: Apartment size and rent
Two variables: size of one-bedroom apartment (square feet) and monthly rent ($)

Size (Square Ft)   Rent ($)
415                1000
438                833
485                1003
636                1089
548                1150
666                1181
646                1237
545                1225
690                1469
688                1501
538                1177
469                958

What is the average pattern? What is the direction of the pattern? A positive, linear association.
[Scatterplot labels: y-axis = response / dependent / y variable; x-axis = explanatory / independent / x variable]

Linear versus curvilinear
• Linear relationship: a relationship that, on average, will follow a line.
• Curvilinear or nonlinear relationship: a relationship that, on average, will follow a curve.

Association: a term used to describe the direction of the pattern shown by the two variables.
• A positive association occurs when the values of one variable tend to increase as the values of the other variable increase.
• A negative association occurs when the values of one variable tend to decrease as the values of the other variable increase.

Outliers
When we consider two variables, an outlier is a point with an unusual combination of values. Outliers may be unusual and interesting data points, or they may be errors.

Example: Tornado Activity
Variables: year and number of tornadoes (Jan – May)
Unusually high observations that don’t follow the trend of the other observations.
Source: National Weather Service

Formalize the trend: Regression lines
Regression line: a straight line that describes how values of the response variable (y) are related, on average, to values of the explanatory variable (x).
We can use the regression line to:
• Estimate the average value of y at a specified value of x.
• Predict the unknown value of y for an individual using that individual’s x value.

Specify Linear Relationships with Simple Linear Regression
Model used to find the best straight line to fit the data points.
Name of Procedure: Least Squares
Least Squares Model: smallest sum of the squared differences found with all possible lines
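To make "smallest sum of the squared differences" concrete, the sketch below fits a line with the standard closed-form least squares formulas and checks that nudging the line makes the sum larger. It reuses the apartment numbers from Example 1 under the assumption that the first twelve values are sizes and the last twelve are the matching rents, in order; the function names are hypothetical, and the lecture itself fits lines in Minitab rather than by hand.

```python
from statistics import mean

# ASSUMPTION: the i-th size is paired with the i-th rent (see lead-in).
sizes = [415, 438, 485, 636, 548, 666, 646, 545, 690, 688, 538, 469]
rents = [1000, 833, 1003, 1089, 1150, 1181, 1237, 1225, 1469, 1501, 1177, 958]


def least_squares(x, y):
    """Return (b0, b1): the intercept and slope of the least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_differences(x, y, b0, b1):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = least_squares(sizes, rents)
best = sum_squared_differences(sizes, rents, b0, b1)

# Any other line should do worse; try a few nudged intercepts and slopes.
other_lines = [(b0 + 10, b1), (b0, b1 + 0.05), (b0 - 10, b1 - 0.05)]
others = [sum_squared_differences(sizes, rents, a, b) for a, b in other_lines]

print(f"Rent = {b0:.2f} + {b1:.3f} Size")
print("Fitted line has the smallest sum:", all(s > best for s in others))
```

Checking three nudged lines is only a spot check, not a proof; the closed-form formulas are what guarantee the minimum over all possible lines.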

The regression equation
In math: $y = mx + b$
In statistics: $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the average value of y, $b_0$ is the y-intercept, and $b_1$ is the slope.

In a picture: [figure not included in the transcript]

Example: Positive linear relationship between meal bill ($) and amount of tip ($)
Data from a restaurant: r = 0.830 and n = 10 bills

Example: Tip example
Question: Using the amount of the bill ($), what is the estimated amount of tip left ($), on the average?
Identify the variables:
• Bill ($): explanatory
• Tip ($): response
Note: the explanatory variable is also called the predictor variable.

To fit a regression line in Minitab:
• Stat > Regression > Fitted Line Plot
• correctly identify the explanatory variable and the response
• straight line: simple linear regression

Least Squares Regression Equation
The regression equation is Tip = -0.60 + 0.190 Bill
• sample y-intercept (b0): -0.60
• sample slope (b1): 0.190

Slope Interpretation
Tip = -0.60 + 0.19 Bill
For each additional $1 found on the bill, you can expect the tip to increase by 19 cents, on the average.

Y-intercept Interpretation
Tip = -0.60 + 0.19 Bill, so b0 = -$0.60
In theory it says: when you have no bill, you can expect a tip to be -$0.60.
So does the y-intercept have a logical interpretation in the context of this problem? No: we have no data for bill = 0.

Estimation & Limitations
Question: If the bill is $30, what is the estimated average amount left for a tip?
Tip = -0.60 + 0.19 × Bill
Tip = -0.60 + 0.19 × (30)
Tip = $5.10
Note: Bill = $30 is not an actual observation in the sample.
Can: Estimate within the range of $15 to $45
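The $30 estimate and the slope interpretation can both be read off the fitted equation in code. A brief sketch, using the intercept and slope reported in the lecture; the names `B0`, `B1`, and `estimated_tip` are illustrative only.

```python
B0 = -0.60  # sample y-intercept from the lecture's fitted equation
B1 = 0.19   # sample slope from the lecture's fitted equation


def estimated_tip(bill):
    """Estimated average tip ($) for a given bill ($): Tip = b0 + b1 * Bill."""
    return B0 + B1 * bill


tip_at_30 = estimated_tip(30)
print(f"Estimated average tip for a $30 bill: ${tip_at_30:.2f}")

# Slope interpretation: one extra dollar on the bill changes the estimate by b1.
change_per_dollar = estimated_tip(31) - estimated_tip(30)
print(f"Change in estimated tip per extra $1: {change_per_dollar * 100:.0f} cents")
```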

Example 5B: Estimation & Limitations
Question: If the bill is $70, estimate the average amount left for a tip.
x = $70
Tip = -0.60 + 0.19 × Bill
Can’t: Extrapolate outside the range of $15 to $45

To remember about regression equations:
• Y-intercept: a logical interpretation is restricted to data where x = 0 is in the range of data in the sample.
• No extrapolation: don’t use a regression equation to estimate a value for the response variable outside the range of x values.
• Estimation: the regression equation estimates the average value for y at a given value of x.
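One way to keep the no-extrapolation rule in view is to have the estimating function refuse x values outside the sampled range. The sketch below assumes the $15 to $45 range of bills stated in the tip example; raising an error is a design choice made here, not something the lecture prescribes.

```python
B0, B1 = -0.60, 0.19              # fitted intercept and slope from the tip example
BILL_MIN, BILL_MAX = 15.0, 45.0   # range of bills in the sample, per the lecture


def estimated_tip(bill):
    """Estimated average tip ($), allowed only inside the sampled range of bills."""
    if not BILL_MIN <= bill <= BILL_MAX:
        raise ValueError(
            f"Bill ${bill} is outside ${BILL_MIN:.0f} to ${BILL_MAX:.0f}: "
            "using the equation there would be extrapolation."
        )
    return B0 + B1 * bill


for bill in (30, 70, 0):
    try:
        print(f"Bill ${bill}: estimated tip ${estimated_tip(bill):.2f}")
    except ValueError as error:
        print(f"Bill ${bill}: {error}")
```

A bill of $0 is rejected for the same reason as a bill of $70, which matches the point that the y-intercept has no logical interpretation here.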

Review: If you understood today’s lecture, you should be able to solve 3.1, 3.3, 3.5, 3.13, 3.15, 3.19, 3.21
Recall Objectives:
• Define z-scores and relate them to the empirical (68-95-99.7) rule
• Explore scatterplots as a tool for visualizing two quantitative variables
• Familiarize yourselves with least squares regression lines:
  – slope interpretation
  – y-intercept interpretation
  – dangerous to extrapolate