Topic 13: Quantitative-Quantitative Association Part 1:


Topic 13: Quantitative-Quantitative Association, Part 1
Introduction to linear regression
Finding the best fit line by least squares regression
Linear Regression and Outliers

Introduction to linear regression

Poverty vs. high school graduation rate
The scatterplot shows the relationship between the high school graduation rate in all 50 US states and DC and the percentage of residents who live below the poverty line (income below $23,050 for a family of 4 in 2012).
Explanatory variable? % HS grad
Response variable? % in poverty
Relationship? Linear, negative, moderately strong

Quantifying the relationship The correlation coefficient (r) describes the strength of the linear association between two quantitative variables.
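As a rough illustration of how r can be computed, here is a minimal sketch using only the Python standard library. The function name `correlation` and the five data points are hypothetical; they are not the poverty data from the slides.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values chosen only to show a negative association.
hs_grad = [80.0, 82.0, 85.0, 88.0, 90.0]
poverty = [18.0, 15.0, 14.0, 11.0, 9.0]
r = correlation(hs_grad, poverty)
print(round(r, 3))
```

The sign of the result gives the direction of the association and its size gives the strength of the linear pattern.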

Strong, Moderate, or Weak?
Interpreting r requires knowledge of one’s field. A value of r that implies a strong relationship in one field may not in another. The table below serves as a starting point until you learn more about your field in particular.

Strength       Positive association   Negative association
Very strong    0.8 to 1.0             -0.8 to -1.0
Strong         0.6 to 0.8             -0.6 to -0.8
Moderate       0.4 to 0.6             -0.4 to -0.6
Weak           0.2 to 0.4             -0.2 to -0.4
Very weak      0.0 to 0.2             0.0 to -0.2
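The starting-point table can be written as a small lookup. This is only a sketch: the function name `describe_correlation` is hypothetical, and placing a boundary value such as 0.6 in the stronger band is an assumption, since the table lists each boundary in two bands.

```python
def describe_correlation(r):
    """Return (strength, direction) for r using the starting-point table.

    Assumption: a value exactly on a boundary goes in the stronger band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    if size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        strength = "very weak"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_correlation(-0.75))
print(describe_correlation(0.02))
```

Values outside the range -1 to 1 are rejected because no correlation coefficient can take them.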

Guessing the correlation
Which of the following is the best guess for the correlation coefficient between % in poverty and % HS grad?
(a) 0.6
(b) -0.75
(c) -0.1
(d) 0.02
(e) -1.5

Guessing the correlation
Which of the following is the best guess for the correlation between % in poverty and % female householder with no husband present?
(a) 0.1
(b) -0.6
(c) -0.4
(d) 0.9
(e) 0.5

Assessing the correlations
Which of the following has the strongest correlation, that is, the correlation coefficient closest to 1 or -1?

Finding the best fit line by least squares regression

Residuals
Residuals are the vertical distances of the observations from the line: each residual is the observed value of the response minus the value the line predicts.
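Taking a residual to be the observed response minus the response predicted by the line (the usual convention), the calculation can be sketched as follows. The three points and the line y_hat = 1 + 2x are hypothetical.

```python
def residuals(xs, ys, intercept, slope):
    """Residual for each point: observed y minus the y predicted by the line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical points and a hypothetical line y_hat = 1 + 2x.
xs = [1.0, 2.0, 3.0]
ys = [3.5, 4.0, 7.5]
res = residuals(xs, ys, intercept=1.0, slope=2.0)
print(res)  # [0.5, -1.0, 0.5]
```

A positive residual means the point lies above the line, and a negative residual means it lies below.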

Method of least squares
We find the line that minimizes the sum of the squares of the residuals. Consider the GeoGebra applet.

Conditions for the least squares line
(1) Linearity
(2) Nearly normal residuals
(3) Constant variability
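The line that minimizes the sum of squared residuals has a closed form for a single explanatory variable. The sketch below uses the standard formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x) on four hypothetical points, then compares the result with slightly shifted lines. The function names are illustrative only.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
print(round(a, 3), round(b, 3), round(best, 3))

# Shifting the intercept or slope a little should never reduce the sum.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(sum_squared_residuals(xs, ys, a + da, b + db) >= best)
```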

Conditions: (1) Linearity The relationship between the explanatory and the response variable should be approximately linear.

Conditions: (2) Nearly normal residuals The residuals should be nearly normal.

Conditions: (3) Constant variability The variability of the points around the least squares line should be roughly constant.

Checking conditions
What condition is this model obviously violating?
(a) Constant variability
(b) Linear relationship
(c) Normal residuals

Checking conditions
What condition is this model obviously violating?
(a) Constant variability
(b) Linear relationship
(c) Normal residuals

r²
The strength of the fit of a linear model is most commonly evaluated using r², that is, the square of the correlation coefficient. It tells us what percent of the variability in the response variable is explained by the model. The remainder of the variability is due to other variables not included in the model or to inherent randomness in the data.
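For a least squares line with one explanatory variable, the squared correlation coefficient equals the share of variability explained, 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total variability of the response around its mean. A minimal check on hypothetical data (the function name `fit_summary` is an assumption):

```python
import math

def fit_summary(xs, ys):
    """Return (r, explained_share) for the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total variability in y (SST)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variability left over after fitting the line (SSE).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    return r, 1 - sse / syy

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
r, explained = fit_summary(xs, ys)
print(round(r ** 2, 4), round(explained, 4))
```

The two printed numbers agree, which is why squaring r gives the percent of variability explained.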

Interpretation of r²
r = -0.62, r² = 0.38
38% of the variability in the percentage of residents living in poverty among the 50 states and DC is explained by the model.

Linear Regression and Outliers

Outliers and direction of the association
Data are available on the surface temperature and light intensity of 47 stars in the star cluster CYG OB1.

Outliers and association strength
r = 0.08, r² = 0.0064
r = 0.79, r² = 0.6241
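To see how much a single outlier can move r, the sketch below computes the correlation with and without one extreme point. The data are hypothetical and are not the star cluster measurements; the helper `correlation` is illustrative.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points following a clear upward trend.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
r_without = correlation(xs, ys)

# The same points plus one hypothetical outlier far from the trend.
r_with = correlation(xs + [10.0], ys + [0.0])

print(round(r_without, 2), round(r_with, 2))
```

In this example the outlier is enough to change both the strength and the sign of r, which is why outliers should be examined before a correlation is interpreted.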