Lecture 19: Tues., Nov. 11th. R-squared (8.6.1); Review


Lecture 19: Tues., Nov. 11th. R-squared (8.6.1); Review.
Midterm II on Thursday in class. Allowed: a calculator and two double-sided pages of notes.
Office hours: today after class; Wednesday, 1:30-2:30; and by appointment (I will be around Wednesday morning and Thursday morning before 10:30).

R-Squared
The R-squared statistic, also called the coefficient of determination, is the percentage of response variation explained by the explanatory variable.
Total sum of squares: $\text{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$, the best sum of squared prediction errors attainable without using x (i.e., predicting every response by $\bar{y}$).
Residual sum of squares: $\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, the sum of squared prediction errors from the least squares line.
$R^2 = \dfrac{\text{TSS} - \text{RSS}}{\text{TSS}} = 1 - \dfrac{\text{RSS}}{\text{TSS}}$.
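A minimal sketch in Python of how these quantities fit together, using invented (x, y) data rather than the course's:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])

# Least squares fit: the line minimizing the residual sum of squares.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

tss = np.sum((y - y.mean()) ** 2)   # total sum of squares (no use of x)
rss = np.sum((y - y_hat) ** 2)      # residual sum of squares (after regression)
r_squared = 1 - rss / tss

print(f"R-squared = {r_squared:.4f}")  # proportion of variation explained
```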

R-Squared example
$R^2 = 86.69\%$, read as: "86.69 percent of the variation in neuron activity was explained by linear regression on years played."

Interpreting R-squared
$R^2$ takes on values between 0% and 100%; higher values indicate a stronger linear association. If the residuals are all zero (a perfect fit), $R^2$ is 100%. If the least squares line has slope 0, $R^2$ is 0%. $R^2$ is useful as a unitless summary of the strength of the linear association.

Caveats about R-squared
$R^2$ is not useful for assessing model adequacy (e.g., linearity) or whether there is an association. What counts as a good $R^2$ depends on context: in precise laboratory work, $R^2$ values under 90% might be too low, but in the social sciences, where a single variable rarely explains a great deal of the variation in a response, an $R^2$ of 50% may be considered remarkably good.

Coverage of Second Midterm
- Transformations of the data for the two-group problem (Ch. 3.5)
- Welch t-test (Ch. 4.3.2)
- Comparisons among several samples (Ch. 5.1-5.3, 5.5.1)
- Multiple comparisons (Ch. 6.3-6.4)
- Simple linear regression (Ch. 7.1-7.4, 7.5.3)
- Assumptions for simple linear regression and diagnostics (Ch. 8.1-8.4, 8.6.1, 8.6.3)

Transformations for the two-group problem
Goal: find a transformation so that the two distributions have approximately equal spread. The log transformation might work when the distributions are skewed and the spread is greater in the distribution with the larger median.
Interpretation of the log transformation:
- For causal inference: let $\delta$ be the additive treatment effect on the log scale ($\log Y_{\text{treated}} = \log Y_{\text{control}} + \delta$). Then the effect of the treatment is to multiply the control outcome by $e^{\delta}$.
- For population inference: let $\mu_1$ and $\mu_2$ be the means of the logged values of populations 1 and 2, respectively. If the logged values of the populations are symmetric, then $e^{\mu_2 - \mu_1}$ equals the ratio of the median of population 2 to the median of population 1.
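A short simulation, using hypothetical log-normal data (not course data), illustrating why exponentiating a difference of log-scale means estimates a ratio of medians:

```python
import numpy as np

rng = np.random.default_rng(0)

# On the log scale, the two groups differ by an additive shift delta.
delta = 0.5
group1 = rng.lognormal(mean=1.0, sigma=0.4, size=100_000)
group2 = rng.lognormal(mean=1.0 + delta, sigma=0.4, size=100_000)

# The difference of means on the log scale estimates delta ...
log_diff = np.log(group2).mean() - np.log(group1).mean()

# ... and exponentiating it estimates the ratio of medians.
print(np.exp(log_diff))                       # close to e^delta, about 1.65
print(np.median(group2) / np.median(group1))  # also close to e^delta
```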

Review of the one-way layout
Assumptions of the ideal model:
- All populations have the same standard deviation $\sigma$.
- Each population is normal.
- Observations are independent.
Planned comparisons: use the usual t-test, but use all groups to estimate $\sigma$. If there are many planned comparisons, use Bonferroni to adjust for multiple comparisons.
Test of $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$ vs. the alternative that at least two means differ: the one-way ANOVA F-test.
Unplanned comparisons: use the Tukey-Kramer procedure to adjust for multiple comparisons.
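A minimal sketch of the one-way ANOVA F-test, on three invented groups:

```python
import numpy as np
from scipy import stats

# Hypothetical groups; the numbers are invented for illustration.
group_a = np.array([23.1, 25.4, 24.8, 26.0, 22.7])
group_b = np.array([27.9, 29.2, 28.4, 30.1, 27.3])
group_c = np.array([24.5, 26.1, 25.0, 25.8, 24.2])

# H0: all group means are equal, vs. at least two means differ.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# For unplanned pairwise comparisons, SciPy (>= 1.8) provides Tukey's
# procedure; with equal group sizes this matches Tukey-Kramer.
print(stats.tukey_hsd(group_a, group_b, group_c))
```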

Regression
Goal of regression: estimate the mean response of Y for subpopulations X = x, denoted $\mu(Y|X=x)$.
Applications: (i) description of the association between X and Y; (ii) passive prediction of Y given X; (iii) control, i.e., predicting what Y will be if X is changed. Application (iii) requires the x's to be randomly assigned.
Simple linear regression model: $\mu(Y|X) = \beta_0 + \beta_1 X$. Estimate $\beta_0$ and $\beta_1$ by least squares: choose the estimates to minimize the sum of squared residuals (prediction errors).
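The least squares estimates have closed forms; a short sketch on the same invented data as above:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1, 24.9])

# beta1_hat = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# beta0_hat = ybar - beta1_hat * xbar
beta0_hat = y.mean() - beta1_hat * x.mean()

print(f"fitted line: mu(Y|X) = {beta0_hat:.3f} + {beta1_hat:.3f} * X")
```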

Ideal Model
Assumptions of the ideal simple linear regression model:
- There is a normally distributed subpopulation of responses for each value of the explanatory variable.
- The means of the subpopulations fall on a straight-line function of the explanatory variable.
- The subpopulation standard deviations are all equal (to $\sigma$).
- The selection of an observation from any of the subpopulations is independent of the selection of any other observation.
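A small simulation of data satisfying all four assumptions (parameter values invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# For each x, Y is normal with mean beta0 + beta1*x and common sd sigma.
beta0, beta1, sigma = 2.0, 1.5, 0.8
x = np.repeat(np.arange(1, 6), 20)  # five subpopulations, 20 draws each
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)

# Subpopulation means fall near the line; each spread is near sigma.
for xv in np.unique(x):
    sub = y[x == xv]
    print(f"x={xv}: mean={sub.mean():.2f} "
          f"(line: {beta0 + beta1 * xv:.2f}), sd={sub.std(ddof=1):.2f}")
```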

The standard deviation $\sigma$
$\sigma$ is the standard deviation within each subpopulation; it measures the accuracy of predictions from the regression. If the simple linear regression model holds, then approximately:
- 68% of the observations fall within $\hat{\sigma}$ of the least squares line;
- 95% of the observations fall within $2\hat{\sigma}$ of the least squares line.
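A sketch checking the 68%/95% rule on simulated data; note $\hat{\sigma}$ is estimated with n - 2 degrees of freedom since two coefficients are fit:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data from an ideal model (parameters invented for illustration).
x = rng.uniform(0, 10, size=500)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.8, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# sigma_hat uses n - 2 degrees of freedom (two estimated coefficients).
sigma_hat = np.sqrt(np.sum(residuals ** 2) / (x.size - 2))

print(np.mean(np.abs(residuals) <= sigma_hat))      # roughly 0.68
print(np.mean(np.abs(residuals) <= 2 * sigma_hat))  # roughly 0.95
```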

Inference for Simple Linear Regression
Inference is based on the ideal simple linear regression model holding, and on taking repeated random samples (new values of Y) from the same subpopulations (the same values of x) as in the observed data.
Types of inference:
- Hypothesis tests for the intercept and slope
- Confidence intervals for the intercept and slope
- Confidence interval for the mean of Y at $X = x_0$
- Prediction interval for a future Y for which $X = x_0$
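A sketch of all four kinds of inference using Python's statsmodels package, on simulated data (all numbers invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.8, size=x.size)

X = sm.add_constant(x)           # design matrix with an intercept column
fit = sm.OLS(y, X).fit()

print(fit.summary())             # t-tests for the intercept and slope
print(fit.conf_int(alpha=0.05))  # 95% CIs for the intercept and slope

# CI for the mean of Y and prediction interval for a future Y at x0 = 5.
x0 = sm.add_constant(np.array([5.0]), has_constant='add')
pred = fit.get_prediction(x0)
print(pred.summary_frame(alpha=0.05)[["mean_ci_lower", "mean_ci_upper",
                                      "obs_ci_lower", "obs_ci_upper"]])
```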

Tools for model checking
- Scatterplot of Y vs. X (see Display 8.6)
- Scatterplot of residuals vs. fitted values (see Display 8.12); look for nonlinearity, nonconstant variance, and outliers
- Normal probability plot (Section 8.6.3), for checking the normality assumption
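A sketch of the three diagnostic plots with matplotlib and scipy, again on simulated data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(4)

x = rng.uniform(0, 10, size=100)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.8, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].scatter(x, y)
axes[0].set_title("Y vs. X")
axes[1].scatter(fitted, residuals)
axes[1].axhline(0, color="gray")
axes[1].set_title("Residuals vs. fitted")
stats.probplot(residuals, plot=axes[2])  # normal probability plot
plt.tight_layout()
plt.show()
```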

Outliers and Influential Observations
An outlier is an observation that lies outside the overall pattern of the other observations. A point can be an outlier in the x direction, in the y direction, or relative to the pattern of the scatterplot. For regression, the outliers of concern are those in the x direction and those relative to the pattern of the scatterplot; a point that is an outlier relative to the pattern of the scatterplot will have a large residual.
An observation is influential if removing it markedly changes the least squares regression line. A point that is an outlier in the x direction will often be influential. The least squares method is not resistant to outliers.
Follow the outlier examination strategy in Display 3.6 for dealing with outliers in the x direction and outliers relative to the pattern of the scatterplot.
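Leverage and Cook's distance are one way to quantify these ideas numerically (they supplement, not replace, the Display 3.6 strategy); a sketch using statsmodels' influence measures, with one deliberately planted x-outlier in invented data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# 30 well-behaved points plus one point far out in the x direction
# that also falls well off the line.
x = np.append(rng.uniform(0, 10, size=30), 25.0)
y = np.append(2.0 + 1.5 * x[:-1] + rng.normal(0.0, 0.8, size=30), 20.0)

fit = sm.OLS(y, sm.add_constant(x)).fit()
influence = fit.get_influence()

# High leverage flags outliers in the x direction; Cook's distance flags
# points whose removal would markedly change the fitted line.
print("leverage of last point:", influence.hat_matrix_diag[-1])
print("Cook's distance of last point:", influence.cooks_distance[0][-1])
```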

Transformations
Goal: find transformations f(y) and g(x) such that the simple linear regression model approximately describes the relationship between f(y) and g(x). Tukey's Bulging Rule can be used to find candidate transformations.
Related topics: prediction after transformation; interpreting log transformations.
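A sketch of prediction after a log transformation, on simulated data; back-transforming the fitted value estimates the median of Y when the log-scale errors are symmetric:

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated data in which log(Y) is linear in X (parameters invented).
x = rng.uniform(1, 10, size=200)
y = np.exp(0.5 + 0.3 * x + rng.normal(0.0, 0.4, size=x.size))

# Fit the simple linear regression model to log(y).
slope, intercept = np.polyfit(x, np.log(y), deg=1)

# Exponentiating the predicted mean of log(Y) estimates the *median*
# of Y at x0, not the mean, under symmetric log-scale errors.
x0 = 5.0
median_pred = np.exp(intercept + slope * x0)
print(f"predicted median of Y at x={x0}: {median_pred:.2f}")
```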