Simple Linear Regression. Data available: (X, Y). Goal: to predict the response Y, i.e. to obtain the fitted response function f(X), by least squares fitting.


Simple Linear Regression

Data available: (X, Y). Goal: to predict the response Y, i.e. to obtain the fitted response function f(X). Least Squares Fitting Method. How do we determine this regression function? (We need to estimate the parameters.)
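The model and the least squares criterion referred to here are not reproduced in the transcript; a minimal sketch in standard notation (an assumption, since the slide's own symbols are lost):

\[
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n, \qquad \varepsilon_i \ \text{i.i.d. with mean } 0 \text{ and variance } \sigma^2,
\]
and the least squares method estimates \(\beta_0, \beta_1\) by minimizing
\[
Q(\beta_0, \beta_1) = \sum_{i=1}^{n} \bigl( Y_i - \beta_0 - \beta_1 X_i \bigr)^2 .
\]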

Least Squares Regression Function : Least Squares Estimates
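The estimate formulas on this slide did not survive the transcript; the standard closed-form least squares estimates, stated here as a reference in the notation of the sketch above:

\[
\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i}(X_i - \bar X)(Y_i - \bar Y)}{\sum_{i}(X_i - \bar X)^2}, \qquad
\hat\beta_0 = \bar Y - \hat\beta_1 \bar X,
\]
giving the least squares regression function \(\hat Y = \hat\beta_0 + \hat\beta_1 X\).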

How do we know the two estimators can minimize Q?
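The slide's answer is not preserved; a brief sketch of the usual argument:

Setting the partial derivatives of \(Q\) to zero gives the normal equations
\[
\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i} (Y_i - \beta_0 - \beta_1 X_i) = 0, \qquad
\frac{\partial Q}{\partial \beta_1} = -2 \sum_{i} X_i (Y_i - \beta_0 - \beta_1 X_i) = 0,
\]
whose solution is \((\hat\beta_0, \hat\beta_1)\). Because \(Q\) is a quadratic function of \((\beta_0, \beta_1)\) with a positive semidefinite Hessian, strictly positive definite whenever the \(X_i\) are not all equal, this stationary point is the global minimum.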

Terminology: True model; Fitted model; Fitted regression function.
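The expressions attached to these terms are missing from the transcript; in the notation used above (an assumption):

True model: \( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \).
Fitted model: \( Y_i = \hat\beta_0 + \hat\beta_1 X_i + e_i \), where \( e_i = Y_i - \hat Y_i \) is the residual.
Fitted regression function: \( \hat Y = \hat\beta_0 + \hat\beta_1 X \).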

It can be shown that

[Figure 1.4: SAS PROC PRINT output for the grade data problem, listing Obs, MIDTERM, and FINAL for each observation; the values are not preserved in the transcript.]

TITLE 'REGRESSION ON MIDTERM GRADE';
DATA;
  INPUT MIDTERM FINAL;
CARDS;
;
/* List the raw data (Figure 1.4) */
PROC PRINT;
/* Fit FINAL on MIDTERM; the OUTPUT data set (with PRED and RESID) becomes the most recent data set */
PROC REG;
  MODEL FINAL=MIDTERM / P;
  OUTPUT PREDICTED=PRED RESIDUAL=RESID;
/* Overlay observed (O) and predicted (P) values against MIDTERM */
PROC PLOT;
  PLOT FINAL*MIDTERM='O' PRED*MIDTERM='P' / OVERLAY;
  LABEL FINAL='FINAL';
/* Normal scores of the residuals for a normal probability plot */
PROC RANK NORMAL=VW;
  VAR RESID;
  RANKS NSCORE;
PROC PLOT;
  PLOT RESID*NSCORE='R';
  LABEL NSCORE='NORMAL SCORE';
RUN;

[SAS PROC REG output, 'REGRESSION ON MIDTERM GRADE', Model: MODEL1, Dependent Variable: FINAL: Analysis of Variance table (Source, DF, Sum of Squares, Mean Square, F Value, Pr > F for Model, Error, and Corrected Total), fit summary (Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var), and Parameter Estimates for Intercept and MIDTERM (DF, Estimate, Standard Error, t Value, Pr > |t|); the numerical values are not preserved in the transcript.]

[PROC REG output statistics: for each Obs, the Dep Var FINAL, Predicted Value, and Residual, followed by Sum of Residuals (0), Sum of Squared Residuals, and Predicted Residual SS (PRESS); the numerical values are not preserved in the transcript.]

[Figure 1.6: Output for the first PROC PLOT step for the grade data problem: observed FINAL ('o') and predicted values ('p') overlaid against MIDTERM; NOTE: 6 obs hidden.]

[Figure 1.7: The remainder of the output from the first PROC PLOT step: residuals ('R') plotted against the Predicted Value of FINAL.]

[Output from the second PROC PLOT step: residuals ('R') plotted against NORMAL SCORE, a normal probability plot of the residuals.]

* Confidence Interval
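The interval formula on this slide is not reproduced in the transcript; a standard sketch, assuming normal errors and the notation above:

A \(100(1-\alpha)\%\) confidence interval for the slope \(\beta_1\) is
\[
\hat\beta_1 \pm t_{1-\alpha/2,\, n-2}\, s(\hat\beta_1), \qquad
s(\hat\beta_1) = \sqrt{\frac{\mathrm{MSE}}{S_{xx}}}, \qquad
\mathrm{MSE} = \frac{\sum_i e_i^2}{n-2},
\]
with an analogous interval \(\hat\beta_0 \pm t_{1-\alpha/2,\, n-2}\, s(\hat\beta_0)\) for the intercept.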

* Pearson's Correlation Coefficient * Goal: to measure the degree of linear correlation between two variables. Its range lies between –1 and 1.
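The defining formula is not shown in the transcript; the standard definition:

\[
r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
  = \frac{\sum_{i}(X_i - \bar X)(Y_i - \bar Y)}
         {\sqrt{\sum_{i}(X_i - \bar X)^2}\, \sqrt{\sum_{i}(Y_i - \bar Y)^2}} .
\]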

* Coefficient of Determination: the fraction of the variance in y that is explained by the regression on x; it may be used as an index of linearity for the relation of y to x.
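The definition referred to on this slide is missing from the transcript; in the usual notation:

\[
R^2 = \frac{\mathrm{SSR}}{\mathrm{SSTO}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SSTO}}, \qquad
\mathrm{SSTO} = \sum_i (Y_i - \bar Y)^2, \quad
\mathrm{SSE} = \sum_i (Y_i - \hat Y_i)^2, \quad
\mathrm{SSR} = \mathrm{SSTO} - \mathrm{SSE},
\]
and in simple linear regression \(R^2 = r^2\), the square of Pearson's correlation coefficient.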

[Figure 3.3: A plot of the air pressure data (an example of residual analysis): PRESSURE plotted against VOLUME.]

[Figure 3.4: The residual-on-fit plot after fitting the model P = a + bV + e to the air pressure data: residuals plotted against the Predicted Value of P.]

[Figure 3.5: The residual-on-fit plot using the model P = a + b/V + e for the air pressure data: residuals plotted against the Predicted Value of P.]

Weighted Regression. Problem: the error variances are unequal. Weighted regression minimizes a weighted sum of squared deviations from the model, whereas ordinary regression minimizes the unweighted sum of squares (see the sketch below).
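The two models and criteria on this slide are not preserved; a minimal sketch in standard notation (an assumption consistent with the discussion of weights that follows):

Ordinary regression model: \( y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \) with \( \mathrm{Var}(\varepsilon_i) = \sigma^2 \); minimize
\[
Q = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 .
\]
Weighted regression model: \( y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \) with unequal variances \( \mathrm{Var}(\varepsilon_i) = \sigma_i^2 \); minimize
\[
Q_w = \sum_{i=1}^{n} w_i\, (y_i - \beta_0 - \beta_1 x_i)^2 .
\]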

How do we determine the weights? The optimal weights turn out to be inversely proportional to the variances of the y's.
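The derivation is not reproduced in the transcript; the standard result, stated as a sketch:

\[
w_i \propto \frac{1}{\sigma_i^2}, \qquad \sigma_i^2 = \mathrm{Var}(y_i),
\]
so each observation is down-weighted in proportion to its variance. In practice the \(\sigma_i^2\) are unknown; the SAS program below uses W = 1/LSFIT, the reciprocal of the fitted values from a preliminary unweighted fit, which is appropriate when the variance is roughly proportional to the mean.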

DATA;
  INPUT V P;
  VI=1/V;
CARDS;
;
/* Preliminary unweighted fit to obtain fitted values for the weights */
PROC REG;
  MODEL P=VI;
  OUTPUT P=LSFIT;
DATA;
  SET;
  W=1/LSFIT;          /* weights: reciprocal of the unweighted fitted values */
/* Weighted fit using the weights W */
PROC REG;
  MODEL P=VI;
  WEIGHT W;
  OUTPUT P=FIT R=RES;
DATA;
  SET;
  WRES=SQRT(W)*RES;   /* weighted residuals */
PROC RANK NORMAL=VW;
  VAR WRES;
  RANKS NSCORE;
PROC PLOT;
  PLOT WRES*FIT='*' / VREF=0 VPOS=30;
  PLOT WRES*NSCORE='*' / VPOS=30;
  LABEL WRES='WEIGHTED RESIDUAL' NSCORE='NORMAL SCORE';
RUN;

[Figure 3.13: Weighted residual plot for a weighted fit of the model P = a + b/V + e to the air pressure data: WEIGHTED RESIDUAL plotted against the Predicted Value of P.]

[Figure 3.17: Residual-on-fit plot for the model -1/P = α + βV + e for the air pressure data: residuals plotted against the Predicted Value of PT.]

[Figure 3.18: Residual normal probability plot for the model -1/P = α + βV + e for the air pressure data: residuals plotted against NORMAL SCORE.]

[Figure 3.19: Residual-on-fit plot for the model -1/P = α + βV + e in Example 3.4 after deleting the first data point: residuals plotted against the Predicted Value of PT.]

[Figure 3.20: Residual normal probability plot for the model -1/P = α + βV + e in Example 3.4 after deleting the first data point: residuals plotted against NORMAL SCORE.]

How do we determine a transformation T such that the transformed response has (approximately) constant variance? (Assuming T is monotonic increasing.)
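The condition completing this slide did not survive the transcript; a sketch of the usual variance-stabilizing argument, which is assumed to be what the slide develops:

By a first-order Taylor (delta-method) expansion of \(T(y)\) about \(\mu = E(y)\),
\[
\mathrm{Var}\bigl(T(y)\bigr) \approx \bigl[T'(\mu)\bigr]^2 \, \mathrm{Var}(y) = \bigl[T'(\mu)\bigr]^2 \, \sigma^2(\mu),
\]
so the transformed response has approximately constant variance when
\[
T'(\mu) \propto \frac{1}{\sigma(\mu)}, \qquad \text{i.e.} \qquad T(\mu) = c \int \frac{d\mu}{\sigma(\mu)} .
\]
For example, \(\sigma(\mu) \propto \mu\) leads to \(T(y) = \log y\), while \(\sigma(\mu) \propto \mu^2\) leads to \(T(y) = -1/y\), the reciprocal-type transformation used for the air pressure data above.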