Chapter 3 Describing Relationships Section 3.2

Slides:



Advertisements
Similar presentations
Request Dispatching for Cheap Energy Prices in Cloud Data Centers
Advertisements

SpringerLink Training Kit
Luminosity measurements at Hadron Colliders
From Word Embeddings To Document Distances
Choosing a Dental Plan Student Name
Virtual Environments and Computer Graphics
Chương 1: CÁC PHƯƠNG THỨC GIAO DỊCH TRÊN THỊ TRƯỜNG THẾ GIỚI
THỰC TIỄN KINH DOANH TRONG CỘNG ĐỒNG KINH TẾ ASEAN –
D. Phát triển thương hiệu
NHỮNG VẤN ĐỀ NỔI BẬT CỦA NỀN KINH TẾ VIỆT NAM GIAI ĐOẠN
Điều trị chống huyết khối trong tai biến mạch máu não
BÖnh Parkinson PGS.TS.BS NGUYỄN TRỌNG HƯNG BỆNH VIỆN LÃO KHOA TRUNG ƯƠNG TRƯỜNG ĐẠI HỌC Y HÀ NỘI Bác Ninh 2013.
Nasal Cannula X particulate mask
Evolving Architecture for Beyond the Standard Model
HF NOISE FILTERS PERFORMANCE
Electronics for Pedestrians – Passive Components –
Parameterization of Tabulated BRDFs Ian Mallett (me), Cem Yuksel
L-Systems and Affine Transformations
CMSC423: Bioinformatic Algorithms, Databases and Tools
Some aspect concerning the LMDZ dynamical core and its use
Bayesian Confidence Limits and Intervals
实习总结 (Internship Summary)
Current State of Japanese Economy under Negative Interest Rate and Proposed Remedies Naoyuki Yoshino Dean Asian Development Bank Institute Professor Emeritus,
Front End Electronics for SOI Monolithic Pixel Sensor
Face Recognition Monday, February 1, 2016.
Solving Rubik's Cube By: Etai Nativ.
CS284 Paper Presentation Arpad Kovacs
انتقال حرارت 2 خانم خسرویار.
Summer Student Program First results
Theoretical Results on Neutrinos
HERMESでのHard Exclusive生成過程による 核子内クォーク全角運動量についての研究
Wavelet Coherence & Cross-Wavelet Transform
yaSpMV: Yet Another SpMV Framework on GPUs
Creating Synthetic Microdata for Higher Educational Use in Japan: Reproduction of Distribution Type based on the Descriptive Statistics Kiyomi Shirakawa.
MOCLA02 Design of a Compact L-­band Transverse Deflecting Cavity with Arbitrary Polarizations for the SACLA Injector Sep. 14th, 2015 H. Maesaka, T. Asaka,
Hui Wang†*, Canturk Isci‡, Lavanya Subramanian*,
Fuel cell development program for electric vehicle
Overview of TST-2 Experiment
Optomechanics with atoms
داده کاوی سئوالات نمونه
Inter-system biases estimation in multi-GNSS relative positioning with GPS and Galileo Cecile Deprez and Rene Warnant University of Liege, Belgium  
ლექცია 4 - ფული და ინფლაცია
10. predavanje Novac i financijski sustav
Wissenschaftliche Aussprache zur Dissertation
FLUORECENCE MICROSCOPY SUPERRESOLUTION BLINK MICROSCOPY ON THE BASIS OF ENGINEERED DARK STATES* *Christian Steinhauer, Carsten Forthmann, Jan Vogelsang,
Particle acceleration during the gamma-ray flares of the Crab Nebular
Interpretations of the Derivative Gottfried Wilhelm Leibniz
Advisor: Chiuyuan Chen Student: Shao-Chun Lin
Widow Rockfish Assessment
SiW-ECAL Beam Test 2015 Kick-Off meeting
On Robust Neighbor Discovery in Mobile Wireless Networks
Chapter 6 并发:死锁和饥饿 Operating Systems: Internals and Design Principles
You NEED your book!!! Frequency Distribution
Y V =0 a V =V0 x b b V =0 z
Fairness-oriented Scheduling Support for Multicore Systems
Climate-Energy-Policy Interaction
Hui Wang†*, Canturk Isci‡, Lavanya Subramanian*,
Ch48 Statistics by Chtan FYHSKulai
The ABCD matrix for parabolic reflectors and its application to astigmatism free four-mirror cavities.
Measure Twice and Cut Once: Robust Dynamic Voltage Scaling for FPGAs
Online Learning: An Introduction
Factor Based Index of Systemic Stress (FISS)
What is Chemistry? Chemistry is: the study of matter & the changes it undergoes Composition Structure Properties Energy changes.
THE BERRY PHASE OF A BOGOLIUBOV QUASIPARTICLE IN AN ABRIKOSOV VORTEX*
Quantum-classical transition in optical twin beams and experimental applications to quantum metrology Ivano Ruo-Berchera Frascati.
The Toroidal Sporadic Source: Understanding Temporal Variations
FW 3.4: More Circle Practice
ارائه یک روش حل مبتنی بر استراتژی های تکاملی گروه بندی برای حل مسئله بسته بندی اقلام در ظروف
Decision Procedures Christoph M. Wintersteiger 9/11/2017 3:14 PM
Limits on Anomalous WWγ and WWZ Couplings from DØ
Presentation transcript:

Chapter 3 Describing Relationships Section 3.2 Least-Squares Regression

Least-Squares Regression MAKE predictions using regression lines, keeping in mind the dangers of extrapolation. CALCULATE and interpret a residual. INTERPRET the slope and y intercept of a regression line. DETERMINE the equation of a least-squares regression line using technology or computer output. CONSTRUCT and INTERPRET residual plots to assess whether a regression model is appropriate.

Least-Squares Regression INTERPRET the standard deviation of the residuals and r2 and use these values to assess how well a least-squares regression line models the relationship between two variables. DESCRIBE how the least-squares regression line, standard deviation of the residuals, and r2 are influenced by outliers. FIND the slope and y intercept of the least-squares regression line from the means and standard deviations of x and y and their correlation.

Regression Lines Linear (straight-line) relationships between two quantitative variables are common. A regression line summarizes the relationship between two variables, but only in a specific setting: when one variable helps explain the other.

Regression Lines Linear (straight-line) relationships between two quantitative variables are common. A regression line summarizes the relationship between two variables, but only in a specific setting: when one variable helps explain the other.

Regression Lines Linear (straight-line) relationships between two quantitative variables are common. A regression line summarizes the relationship between two variables, but only in a specific setting: when one variable helps explain the other. A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. Regression lines are expressed in the form 𝑦 = 𝑏 0 + 𝑏 1 𝑥 where 𝑦 (pronounced “y-hat”) is the predicted value of y for a given value of x.

Prediction A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Predict the price of a Ford F-150 that has been driven 100,000 miles.

Prediction A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Predict the price of a Ford F-150 that has been driven 100,000 miles.

Prediction 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Predict the price of a Ford F-150 that has been driven 100,000 miles. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛

Prediction 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 100000 A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Predict the price of a Ford F-150 that has been driven 100,000 miles. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 100000

Prediction 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 100000 A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Predict the price of a Ford F-150 that has been driven 100,000 miles. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 100000 𝑝𝑟𝑖𝑐𝑒 =$21,967

Extrapolation Can we predict the price of a Ford F-150 with 300,000 miles driven?

Extrapolation 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 Can we predict the price of a Ford F-150 with 300,000 miles driven? 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛

Extrapolation 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 Can we predict the price of a Ford F-150 with 300,000 miles driven? 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 300000

Extrapolation 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 Can we predict the price of a Ford F-150 with 300,000 miles driven? 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 300000 𝑝𝑟𝑖𝑐𝑒 =−$10,613

Extrapolation 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 Can we predict the price of a Ford F-150 with 300,000 miles driven? 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 300000 Extrapolation is the use of a regression line for prediction far outside the interval of x values used to obtain the line. Such predictions are often not accurate. 𝑝𝑟𝑖𝑐𝑒 =−$10,613

Extrapolation 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 Can we predict the price of a Ford F-150 with 300,000 miles driven? 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 300000 Extrapolation is the use of a regression line for prediction far outside the interval of x values used to obtain the line. Such predictions are often not accurate. 𝑝𝑟𝑖𝑐𝑒 =−$10,613 CAUTION: Don’t make predictions using values of x that are much larger or much smaller than those that actually appear in your data.

Residuals In most cases, no line will pass exactly through all the points in a scatterplot. Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatterplot. These vertical distances are called residuals (the “leftover” variation in the response variable).

Residuals In most cases, no line will pass exactly through all the points in a scatterplot. Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatterplot. These vertical distances are called residuals (the “leftover” variation in the response variable).

Residuals In most cases, no line will pass exactly through all the points in a scatterplot. Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatterplot. These vertical distances are called residuals (the “leftover” variation in the response variable). A residual is the difference between the actual value of y and the value of y predicted by the regression line.

Residuals In most cases, no line will pass exactly through all the points in a scatterplot. Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatterplot. These vertical distances are called residuals (the “leftover” variation in the response variable). A residual is the difference between the actual value of y and the value of y predicted by the regression line. 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=𝑎𝑐𝑡𝑢𝑎𝑙 𝑦 −𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑 𝑦 =𝑦 − 𝑦

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles.

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles. Find the predicted price.

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles. Find the predicted price. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles. Find the predicted price. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 70583

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles. Find the predicted price. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 70583 𝑝𝑟𝑖𝑐𝑒 =$26,759

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles. Find the predicted price. Find the residual. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 70583 𝑝𝑟𝑖𝑐𝑒 =$26,759

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles. Find the predicted price. Find the residual. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=𝑝𝑟𝑖𝑐𝑒 − 𝑝𝑟𝑖𝑐𝑒 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 70583 𝑝𝑟𝑖𝑐𝑒 =$26,759

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles. Find the predicted price. Find the residual. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=𝑝𝑟𝑖𝑐𝑒 − 𝑝𝑟𝑖𝑐𝑒 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 70583 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=𝟐𝟏𝟗𝟗𝟒 −𝟐𝟔𝟕𝟓𝟗 𝑝𝑟𝑖𝑐𝑒 =$26,759

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles. Find the predicted price. Find the residual. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=𝑝𝑟𝑖𝑐𝑒 − 𝑝𝑟𝑖𝑐𝑒 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 70583 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=𝟐𝟏𝟗𝟗𝟒 −𝟐𝟔𝟕𝟓𝟗 𝑝𝑟𝑖𝑐𝑒 =$26,759 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=−$4765

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles. Find the predicted price. Find the residual. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=𝑝𝑟𝑖𝑐𝑒 − 𝑝𝑟𝑖𝑐𝑒 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 70583 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=𝟐𝟏𝟗𝟗𝟒 −𝟐𝟔𝟕𝟓𝟗 𝑝𝑟𝑖𝑐𝑒 =$26,759 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=−$4765 Interpret the residual.

Residuals A random sample of 16 used Ford F-150 SuperCrew 4 × 4s was selected from among those listed for sale at autotrader.com. The data are shown in the table. For these data, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Calculate and interpret the residual for the truck that was driven 70,583 miles. Find the predicted price. Find the residual. 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=𝑝𝑟𝑖𝑐𝑒 − 𝑝𝑟𝑖𝑐𝑒 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 70583 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=𝟐𝟏𝟗𝟗𝟒 −𝟐𝟔𝟕𝟓𝟗 𝑝𝑟𝑖𝑐𝑒 =$26,759 𝑟𝑒𝑠𝑖𝑑𝑢𝑎𝑙=−$4765 Interpret the residual. The actual price of this truck is $4765 less than the cost predicted by the regression line with x = miles driven.

Interpreting a Regression Line A regression line is a model for the data, much like the density curves of Chapter 2. The y intercept and slope of the regression line describe what this model tells us about the relationship between the response variable y and the explanatory variable x.

Interpreting a Regression Line A regression line is a model for the data, much like the density curves of Chapter 2. The y intercept and slope of the regression line describe what this model tells us about the relationship between the response variable y and the explanatory variable x. In the regression equation 𝑦 = 𝑏 0 + 𝑏 1 𝑥 : 𝑏 0 is the y intercept, the predicted value of y when x = 0 𝑏 1 is the slope, the amount by which the predicted value of y changes when x increases by 1 unit

Interpreting a Regression Line Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Interpret the slope of the regression line. Does the value of the y intercept have meaning in this context? If so, interpret the y intercept. If not, explain why.

Interpreting a Regression Line Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Interpret the slope of the regression line. Does the value of the y intercept have meaning in this context? If so, interpret the y intercept. If not, explain why. Interpret the slope.

Interpreting a Regression Line Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Interpret the slope of the regression line. Does the value of the y intercept have meaning in this context? If so, interpret the y intercept. If not, explain why. Interpret the slope. The predicted price of a used Ford F-150 goes down by $0.1629 (16.29 cents) for each additional mile that the truck has been driven.

Interpreting a Regression Line Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Interpret the slope of the regression line. Does the value of the y intercept have meaning in this context? If so, interpret the y intercept. If not, explain why. Interpret the slope. The predicted price of a used Ford F-150 goes down by $0.1629 (16.29 cents) for each additional mile that the truck has been driven. Interpret the y intercept.

Interpreting a Regression Line Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Interpret the slope of the regression line. Does the value of the y intercept have meaning in this context? If so, interpret the y intercept. If not, explain why. Interpret the slope. The predicted price of a used Ford F-150 goes down by $0.1629 (16.29 cents) for each additional mile that the truck has been driven. Interpret the y intercept. The predicted price (in dollars) of a used Ford F-150 that has been driven 0 miles. (The y intercept does have meaning in this case, as it is possible to have a number of miles driven near 0 miles.)

Interpreting a Regression Line Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . Interpret the slope of the regression line. Does the value of the y intercept have meaning in this context? If so, interpret the y intercept. If not, explain why. important to include the word predicted (or equivalent) in your response. Otherwise, it might appear that you believe the regression equation provides actual values of y. When asked to interpret the slope or y intercept, it is very CAUTION: Interpret the slope. The predicted price of a used Ford F-150 goes down by $0.1629 (16.29 cents) for each additional mile that the truck has been driven. Interpret the y intercept. The predicted price (in dollars) of a used Ford F-150 that has been driven 0 miles. (The y intercept does have meaning in this case, as it is possible to have a number of miles driven near 0 miles.)

The Least-Squares Regression Line There are many different lines we could use to model the association in a particular scatterplot. A good regression line makes the residuals as small as possible. The regression line we prefer is the one that minimizes the sum of the squared residuals.

The Least-Squares Regression Line There are many different lines we could use to model the association in a particular scatterplot. A good regression line makes the residuals as small as possible. The regression line we prefer is the one that minimizes the sum of the squared residuals.

The Least-Squares Regression Line There are many different lines we could use to model the association in a particular scatterplot. A good regression line makes the residuals as small as possible. The regression line we prefer is the one that minimizes the sum of the squared residuals. The least-squares regression line is the line that makes the sum of the squared residuals as small as possible.

p. 184 and 187 Using your Calculator We are going to practice using your calculator to make a scatter plot and residual plot.

Determining if a Linear Model Is Appropriate: Residual Plots One of the first principles of data analysis is to look for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between an explanatory variable and a response variable. We see departures from this pattern by looking at a residual plot.

Determining if a Linear Model Is Appropriate: Residual Plots One of the first principles of data analysis is to look for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between an explanatory variable and a response variable. We see departures from this pattern by looking at a residual plot. A residual plot is a scatterplot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis.

Determining if a Linear Model Is Appropriate: Residual Plots One of the first principles of data analysis is to look for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between an explanatory variable and a response variable. We see departures from this pattern by looking at a residual plot. A residual plot is a scatterplot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis.

Determining if a Linear Model Is Appropriate: Residual Plots A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns. If a regression model is appropriate: The residual plot should show no obvious patterns. The residuals should be relatively small in size.

Determining if a Linear Model Is Appropriate: Residual Plots A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns. If a regression model is appropriate: The residual plot should show no obvious patterns. The residuals should be relatively small in size.

Determining if a Linear Model Is Appropriate: Residual Plots A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns. If a regression model is appropriate: The residual plot should show no obvious patterns. The residuals should be relatively small in size. Pattern in residuals Linear model not appropriate

Determining if a Linear Model Is Appropriate: Residual Plots How to Interpret a Residual Plot To determine whether the regression model is appropriate, look at the residual plot. If there is no leftover curved pattern in the residual plot, the regression model is appropriate. If there is a leftover curved pattern in the residual plot, consider using a regression model with a different form.

Residual Plots Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the least-squares regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . For this model, technology produced the following residual plot. Is a linear model appropriate for these data? Explain.

Residual Plots Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the least-squares regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . For this model, technology produced the following residual plot. Is a linear model appropriate for these data? Explain. Because there is no obvious pattern left over in the residual plot, the linear model is appropriate.

How Well the Line Fits the Data: The Role of s and r2 in Regression Start here today.

How Well the Line Fits the Data: The Role of s and r2 in Regression To assess how well the line fits all the data, we need to consider the residuals for each observation, not just one. Using these residuals, we can estimate the “typical” prediction error when using the least-squares regression line.

How Well the Line Fits the Data: The Role of s and r2 in Regression To assess how well the line fits all the data, we need to consider the residuals for each observation, not just one. Using these residuals, we can estimate the “typical” prediction error when using the least-squares regression line. The standard deviation of the residuals s measures the size of a typical residual. That is, s measures the typical distance between the actual y values and the predicted y values.

How Well the Line Fits the Data: The Role of s and r2 in Regression The standard deviation of the residuals s gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the least-squares regression line predicts values of the response y.

How Well the Line Fits the Data: The Role of s and r2 in Regression The standard deviation of the residuals s gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the least-squares regression line predicts values of the response y. The coefficient of determination r2 measures the percent reduction in the sum of squared residuals when using the least-squares regression line to make predictions, rather than the mean value of y. In other words, r2 measures the percent of the variability in the response variable that is accounted for by the least-squares regression line.

How Well the Line Fits the Data: The Role of s and r2 in Regression The standard deviation of the residuals s gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the least-squares regression line predicts values of the response y. The coefficient of determination r2 measures the percent reduction in the sum of squared residuals when using the least-squares regression line to make predictions, rather than the mean value of y. In other words, r2 measures the percent of the variability in the response variable that is accounted for by the least-squares regression line. r2 tells us how much better the LSRL does at predicting values of y than simply guessing the mean y for each value in the dataset.

How Well the Line Fits the Data: The Role of s and r2 in Regression Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the least-squares regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . For this model, technology gives s = $5740, and r2 = 0.66. Interpret the value of s. Interpret the value of r2.

How Well the Line Fits the Data: The Role of s and r2 in Regression Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the least-squares regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . For this model, technology gives s = $5740, and r2 = 0.66. Interpret the value of s. Interpret the value of r2. Interpret s.

How Well the Line Fits the Data: The Role of s and r2 in Regression Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the least-squares regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . For this model, technology gives s = $5740, and r2 = 0.66. Interpret the value of s. Interpret the value of r2. Interpret s. The actual price of a Ford F-150 is typically about $5740 away from the price predicted by the least-squares regression line with x = miles driven.

How Well the Line Fits the Data: The Role of s and r2 in Regression Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the least-squares regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . For this model, technology gives s = $5740, and r2 = 0.66. Interpret the value of s. Interpret the value of r2. Interpret s. The actual price of a Ford F-150 is typically about $5740 away from the price predicted by the least-squares regression line with x = miles driven. Interpret r2.

How Well the Line Fits the Data: The Role of s and r2 in Regression Recall that for a random sample of 16 used Ford F-150 SuperCrew 4 × 4s, the least-squares regression equation is 𝑝𝑟𝑖𝑐𝑒 =38257−0.1629 𝑚𝑖𝑙𝑒𝑠 𝑑𝑟𝑖𝑣𝑒𝑛 . For this model, technology gives s = $5740, and r2 = 0.66. Interpret the value of s. Interpret the value of r2. Interpret s. The actual price of a Ford F-150 is typically about $5740 away from the price predicted by the least-squares regression line with x = miles driven. Interpret r2. About 66% of the variability in the price of a Ford F-150 is accounted for by the least-squares regression line with x = miles driven.

Interpreting Computer Regression Output A number of statistical software packages produce similar regression output. Be sure you can locate the slope b1 the y intercept b0 the values of s the value of r2

Interpreting Computer Regression Output A number of statistical software packages produce similar regression output. Be sure you can locate the slope b1 the y intercept b0 the values of s the value of r2

Interpreting Computer Regression Output A number of statistical software packages produce similar regression output. Be sure you can locate the slope b1 the y intercept b0 the values of s the value of r2

Calculating the Regression Equation from Summary Statistics Using technology is often the most convenient way to find the equation of a least-squares regression line. It is also possible to calculate the equation of the least-squares regression line using only the means and standard deviations of the two variables and their correlation.

Calculating the Regression Equation from Summary Statistics Using technology is often the most convenient way to find the equation of a least-squares regression line. It is also possible to calculate the equation of the least-squares regression line using only the means and standard deviations of the two variables and their correlation. How to Calculate the Least-squares Regression Line Using Summary Statistics We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means 𝑥 and 𝑦 and the standard deviations sx and sy of the two variables and their correlation r. The least-squares regression line is the line 𝑦 = 𝑏 0 + 𝑏 1 𝑥 with slope 𝑏 1 =𝑟∙ 𝑠 𝑦 𝑠 𝑥 and y intercept 𝑏 0 = 𝑦 − 𝑏 1 𝑥

Regression to the Mean

Regression to the Mean The scatterplot shows height versus foot length and the regression equation 𝑦 =103.34+2.75𝑥. We have added four more lines to the graph: a vertical line at the mean foot length x a vertical line at x + sx a horizontal line at the mean height y a horizontal line at y + sy

Regression to the Mean The scatterplot shows height versus foot length and the regression equation 𝑦 =103.34+2.75𝑥. We have added four more lines to the graph: a vertical line at the mean foot length x a vertical line at x + sx a horizontal line at the mean height y a horizontal line at y + sy For an increase of 1 standard deviation in the value of the explanatory variable x, the least-squares regression line predicts an increase of r standard deviations in the response variable y.

Regression to the Mean The scatterplot shows height versus foot length and the regression equation 𝑦 =103.34+2.75𝑥. We have added four more lines to the graph: a vertical line at the mean foot length x a vertical line at x + sx a horizontal line at the mean height y a horizontal line at y + sy For an increase of 1 standard deviation in the value of the explanatory variable x, the least-squares regression line predicts an increase of r standard deviations in the response variable y. This is called regression to the mean, because the values of y “regress” to their mean.

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations.

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. CORRELATION AND REGRESSION LINES DESCRIBE ONLY LINEAR RELATIONSHIPS

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. CORRELATION AND REGRESSION LINES DESCRIBE ONLY LINEAR RELATIONSHIPS

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. CORRELATION AND REGRESSION LINES DESCRIBE ONLY LINEAR RELATIONSHIPS r = 0.816

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations.

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. CORRELATION AND LEAST-SQUARES REGRESSION LINES ARE NOT RESISTANT

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. CORRELATION AND LEAST-SQUARES REGRESSION LINES ARE NOT RESISTANT

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. CORRELATION AND LEAST-SQUARES REGRESSION LINES ARE NOT RESISTANT

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations.

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. ASSOCIATION DOES NOT IMPLY CAUSATION

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. ASSOCIATION DOES NOT IMPLY CAUSATION When we study the relationship between two variables, we often hope to show that changes in the explanatory variable cause changes in the response variable.

Correlation and Regression Wisdom Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. ASSOCIATION DOES NOT IMPLY CAUSATION When we study the relationship between two variables, we often hope to show that changes in the explanatory variable cause changes in the response variable. CAUTION: A strong association between two variables is not enough to draw conclusions about cause and effect.

Section Summary MAKE predictions using regression lines, keeping in mind the dangers of extrapolation. CALCULATE and interpret a residual. INTERPRET the slope and y intercept of a regression line. DETERMINE the equation of a least-squares regression line using technology or computer output. CONSTRUCT and INTERPRET residual plots to assess whether a regression model is appropriate.

Section Summary INTERPRET the standard deviation of the residuals and r2 and use these values to assess how well a least-squares regression line models the relationship between two variables. DESCRIBE how the least-squares regression line, standard deviation of the residuals, and r2 are influenced by outliers. FIND the slope and y intercept of the least-squares regression line from the means and standard deviations of x and y and their correlation.

Assignment 3.2 p. 205-210 #56-68 EOE (Every Other Even) and 70-78 all (56, 60, 64, 68, 70, 71-78 all and Chapter 3 FRAPPY!) If you are stuck on any of these, look at the odd before or after and the answer in the back of your book. If you are still not sure text a friend or me for help (before 8pm). Tomorrow we will check homework and review for 3.2 Quiz.