Use of regression analysis

Presentation transcript:

Use of regression analysis
Regression analysis:
– relation between dependent variable Y and one or more independent variables Xi
Use of regression model in general:
– making forecasts/predictions/estimates for Y
– investigation of the functional relationship between Y and Xi
– filling-in missing data in Y-series
– validation of Y-series
Use of regression model in data processing:
– validation and in-filling of missing data using a relation curve, and of discharges using an RR-relation
– transformation of water levels to discharges using a power type regression equation
– estimation of a rainfall/climatic variable on a catchment grid, as in kriging
OHS - 1

Linear and non-linear regression equations
Linear regression:
– simple linear regression (i = 1)
– multiple and stepwise regression (i > 1)
  * in stepwise regression the independent variables enter the model one by one, based on the largest reduction of unexplained variance (free variables); forced variables always enter the model
Non-linear regression
OHS - 2

Suitable regression model
Model depends on:
– variables considered
– physics of the processes
– range of the data of interest
A non-linear relation may well be described by a linear regression equation within a particular range of the variables in regression:
– the annual rainfall-runoff relation is in principle non-linear, but:
  * for low rainfall, abstractions vary strongly due to evaporation
  * for very high rainfall, evaporation has reached its potential and is almost constant
  * within a limited range, the assumption of linearity is often suitable
OHS - 3

General form of relation between annual rainfall and runoff
[Figure labels: Evaporation; Runoff = Rainfall]
OHS - 4

Use of regression model for discharge validation
Steps:
– develop a regression model in which runoff/discharge is regressed on rainfall: $Q_t = f(P_t, P_{t-1}, \dots)$
– test the stationarity of the relationship by investigating the time-wise behaviour of the residuals
– if rainfall is error free, deviations from stationarity may be due to:
  * change in drainage characteristics
  * incorrect runoff data due to errors in the water level data and/or in the stage-discharge relation
– visualise non-stationarity by double mass analysis of the observed discharge against the discharge computed by regression
OHS - 5

ii ŶiŶi Residual = part of Y not explained by regression Part of Y explained by regression Distribution of residuals Simple linear regression model Ŷ =  +  X Y =  +  X +  Y - Y =   Y 2 =  Y 2 +   2 Ŷ =  +  X Y =  +  X +  Y - Y =   Y 2 =  Y 2 +   2 Total variance = explained variance + unexplained variance Ŷ =  +  X OHS - 6

Direction of data vector for regression analysis
[Figure: 3-D plot of monthly rainfall with axes Years and Months, showing the direction for parameter estimation]
OHS - 7

Estimation of regression coefficients
Minimising the sum of squared errors M to obtain the Least Squares Estimators:
– the first derivatives of M with respect to a and b are set to zero, giving the normal equations
– the normal equations are solved for b and a
OHS - 8
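The equations of this slide did not survive in the transcript. As a sketch only, the standard textbook derivation for the line $\hat{Y} = a + bX$ is given below; the symbols $x_i$, $y_i$, $n$ and the sum notation are assumptions, not taken from the slide.

```latex
\begin{align*}
M &= \sum_{i=1}^{n} \left( y_i - a - b\,x_i \right)^2 \\
\frac{\partial M}{\partial a} &= -2 \sum_{i=1}^{n} \left( y_i - a - b\,x_i \right) = 0 \\
\frac{\partial M}{\partial b} &= -2 \sum_{i=1}^{n} x_i \left( y_i - a - b\,x_i \right) = 0 \\
\text{normal equations:}\quad
n\,a + b \sum x_i &= \sum y_i, \qquad
a \sum x_i + b \sum x_i^2 = \sum x_i y_i \\
b &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad
a = \bar{y} - b\,\bar{x}
\end{align*}
```

The slope can equally be written as $b = S_{XY} / \sigma_X^2$, which agrees with the form $b = r\,\sigma_Y / \sigma_X$ used on the goodness-of-fit slide.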

Measure for goodness of fit
Other forms of regression equation:
$(\hat{Y} - \bar{Y}) = b\,(X - \bar{X})$
Or, with correlation coefficient $r = S_{XY} / (\sigma_X \sigma_Y)$:
$(\hat{Y} - \bar{Y}) = r\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{X})$
By squaring the previous equation and averaging:
$\sigma_\varepsilon^2 = \sigma_Y^2\,(1 - r^2)$
$r^2$ = coefficient of determination
$r^2$ is a measure for the quality of the regression fit
NOTE: A high $r^2$ is not sufficient; the behaviour of the residuals about the regression line and their development with time are also extremely important
OHS - 9
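The relation between unexplained variance and r² can be checked numerically. The following is a minimal sketch, assuming divide-by-n moments; the function name `fit_simple` and the five data pairs are hypothetical and serve only as illustration.

```python
from math import sqrt

def fit_simple(x, y):
    """Least-squares line y = a + b*x plus the correlation coefficient r.

    Assumption: divide-by-n moments are used; r and b are the same
    whether n or n-1 is chosen.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    b = s_xy / var_x                 # slope, equal to r * sigma_Y / sigma_X
    a = mean_y - b * mean_x          # the line passes through the means
    r = s_xy / sqrt(var_x * var_y)
    return a, b, r, var_y

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r, var_y = fit_simple(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
var_eps = sum(e ** 2 for e in residuals) / len(x)

# The two printed numbers agree: sigma_eps^2 = sigma_Y^2 * (1 - r^2)
print(var_eps, var_y * (1 - r ** 2))
```

The `residuals` list is also what the slide's note asks to inspect: a high r² says nothing about whether the residuals show a pattern against X or against time.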

Confidence limits
– Error variance
– Confidence limits of the regression line
– Confidence limits of a prediction
MIND THE DIFFERENCE
OHS - 10
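The formulas of this slide are missing from the transcript. As a sketch under the usual assumptions (independent, normally distributed residuals with constant variance), the standard textbook expressions for simple linear regression are given below; $x_0$ is the value of X at which the limits are evaluated and $t_{n-2}$ is the Student t quantile for the chosen confidence level, both notational assumptions.

```latex
\begin{align*}
\hat{\sigma}_\varepsilon^2 &= \frac{1}{n-2} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
  && \text{(error variance)} \\
\hat{y}_0 &\pm t_{n-2}\,\hat{\sigma}_\varepsilon
  \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}
  && \text{(regression line)} \\
\hat{y}_0 &\pm t_{n-2}\,\hat{\sigma}_\varepsilon
  \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}
  && \text{(prediction)}
\end{align*}
```

The difference is the extra 1 under the square root: the limits for a prediction include the scatter of an individual observation about the line, so they are always wider than the limits of the line itself. Both widen as $x_0$ moves away from $\bar{x}$.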

Application of regression analysis for data validation
17 years of annual rainfall and runoff data
Procedure:
– Plotting of time series
– Fitting of regression equation R = f(P)
– Plot of residual versus P
– Plot of residual versus time
– Plot of accumulated residual with time
– Double mass analysis of observed versus regression based runoff
– Adjustment of runoff data
– Repetition of the above procedure and comparison with the first results
– Comparison of coefficients of determination
– Computation of confidence limits about the regression and for prediction
OHS - 11
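The series needed for the plots in this procedure can be sketched as follows. This is an illustration under stated assumptions, not the procedure's actual data: the eight rainfall values, the linear "true" relation and the 20% reduction applied to the last four years are all hypothetical, and the names `fit_line` and `validation_series` are invented here.

```python
from itertools import accumulate

def fit_line(p, r):
    """Least-squares coefficients (a, b) of R = a + b*P."""
    n = len(p)
    mean_p, mean_r = sum(p) / n, sum(r) / n
    b = (sum((pi - mean_p) * (ri - mean_r) for pi, ri in zip(p, r))
         / sum((pi - mean_p) ** 2 for pi in p))
    return mean_r - b * mean_p, b

def validation_series(rain, runoff):
    """Residuals, accumulated residuals and double-mass points."""
    a, b = fit_line(rain, runoff)
    computed = [a + b * p for p in rain]
    residual = [obs - comp for obs, comp in zip(runoff, computed)]
    return {
        "computed": computed,
        "residual": residual,                       # plot versus P and versus time
        "accumulated_residual": list(accumulate(residual)),
        # double mass: cumulative computed runoff against cumulative observed runoff
        "double_mass": list(zip(accumulate(computed), accumulate(runoff))),
    }

# Hypothetical annual rainfall (mm) and runoff (mm), for illustration only.
rain = [900, 1100, 1000, 1250, 950, 1150, 1050, 1200]
true_runoff = [0.6 * p - 300 for p in rain]
# Mimic a break: observed runoff is 20% too low from the fifth year onward.
runoff = [q if i < 4 else 0.8 * q for i, q in enumerate(true_runoff)]

out = validation_series(rain, runoff)
for year, acc in enumerate(out["accumulated_residual"], start=1):
    print(year, round(acc, 1))
```

With this artificial break the accumulated residual rises up to the fourth year and falls afterwards, returning to zero at the end because least-squares residuals sum to zero. The turning point marks the year of the break, which is the feature to look for in the accumulated-residual and double mass plots.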

Rainfall-runoff record
OHS - 12

Regression fit rainfall-runoff
OHS - 13

Plot of residual versus rainfall
OHS - 14

Plot of residual versus time
OHS - 15

Plot of accumulated residual
OHS - 16

Double mass analysis of observed versus computed runoff
[Figure annotation: break in measured runoff]
OHS - 17

Plot of rainfall versus corrected runoff
OHS - 18

Plot of rainfall-corrected runoff regression
OHS - 19

Plot of residual (corrected) versus rainfall
OHS - 20

Plot of residual (corrected) versus time
OHS - 21

Plot of regression line with confidence limits
OHS - 22

Extrapolation
Extrapolation of a regression equation beyond the measured range of X to obtain a value of Y is not recommended:
– confidence intervals become large
– the relation Y = f(X) may be non-linear for the full range of X
– extrapolate only if there is evidence that the relation is applicable
OHS - 23

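Multiple linear regression models
Model for monthly rainfall: $R(t) = \alpha + \beta_1 P(t) + \beta_2 P(t-1) + \dots$
General linear model: $Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$
Matrix form: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
where (all variables centered about the mean):
– $\mathbf{Y}$ = (n×1) data vector of $(y_i - \bar{y})$
– $\mathbf{X}$ = (n×p) data matrix of $(x_{i1} - \bar{x}_1), \dots, (x_{ip} - \bar{x}_p)$
– $\boldsymbol{\beta}$ = (p×1) column vector of regression coefficients
– $\boldsymbol{\varepsilon}$ = (n×1) column vector of residuals
OHS - 24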

Estimation of regression coefficients Minimisation of residual sum of squares  T  :  YX  YX   T  = (Y - X  ) T (Y - X  )   bDifferentiating with respect to  and replacing  by its estimate b normal equations: XXbXY X T Xb = X T Y bFor b it follows: bXXXY b = (X T X) -1 X T Y b  with: E[b] =  b =   2 (XX) Cov(b) =   2 (X T X) -1 OHS - 25

Analysis of variance table (ANOVA)
Total sum of squares about the mean = regression sum of squares + residual sum of squares
Coefficient of determination: $R_m^2 = S_R / S_Y = 1 - S_e / S_Y$
OHS - 26

Coefficient of determination
From the ANOVA table:
– coefficient of determination $R_m^2$:
  $R_m^2 = S_R / S_Y = 1 - S_e / S_Y$
– coefficient of determination adjusted for the number of independent variables in regression, $R_{ma}^2$:
  $R_{ma}^2 = 1 - MS_e / MS_Y = 1 - (1 - R_m^2)\,(n - 1)/(n - p - 1)$
OHS - 27
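The two expressions for the adjusted coefficient are equivalent, which the short sketch below verifies numerically. It assumes the mean squares are $MS_e = S_e/(n-p-1)$ and $MS_Y = S_Y/(n-1)$; the sums of squares used are hypothetical.

```python
def r2_from_anova(s_e, s_y):
    """Coefficient of determination R_m^2 = 1 - S_e / S_Y."""
    return 1.0 - s_e / s_y

def adjusted_r2(r2, n, p):
    """R_ma^2 = 1 - (1 - R_m^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def adjusted_r2_from_mean_squares(s_e, s_y, n, p):
    """R_ma^2 = 1 - MS_e / MS_Y, with the mean squares assumed as below."""
    ms_e = s_e / (n - p - 1)   # residual mean square
    ms_y = s_y / (n - 1)       # total mean square about the mean
    return 1.0 - ms_e / ms_y

# Hypothetical sums of squares: n = 17 observations, p = 2 independent variables
s_y, s_e, n, p = 100.0, 10.0, 17, 2

r2 = r2_from_anova(s_e, s_y)
print(r2, adjusted_r2(r2, n, p), adjusted_r2_from_mean_squares(s_e, s_y, n, p))
```

The adjusted value is always below $R_m^2$ for p ≥ 1, and the gap grows as more independent variables are added relative to the number of observations.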

Comments
Points of concern in using multiple regression:
– can a linear model be used
– what independent variables should be included
Independent variables may be mutually correlated:
– investigate through the correlation matrix
Retaining highly correlated variables in the regression complicates the interpretation of the regression coefficients and can give physically nonsensical values
Apply stepwise regression to select the “best” regression equation
In stepwise regression a distinction can be made between “free” and “forced” variables:
– free variables may enter the regression, dependent on correlation
– forced variables will enter the regression irrespective of correlation
OHS - 28

Non-linear models
By transformation, non-linear models can be transformed to linear models, e.g. $Y = \alpha X^{\beta}$ to:
$\ln Y = \ln \alpha + \beta \ln X$
or: $Y_T = \alpha_T + \beta_T X_T$
where: $Y_T = \ln Y$, $X_T = \ln X$, $\alpha_T = \ln \alpha$, $\beta_T = \beta$
Remarks:
– The transformed residual sum of squares is minimised rather than the residual sum of squares
– The error term is additive in the transformed state, i.e. multiplicative in the power model: $\varepsilon_T = \ln \varepsilon$
OHS - 29
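A minimal sketch of fitting the power model through the log transformation. The function name `fit_power` and the data are hypothetical; the example values are generated without noise from $Y = 2X^{1.5}$ so that the recovered parameters can be checked.

```python
from math import exp, log

def fit_power(x, y):
    """Fit Y = alpha * X**beta by least squares on ln Y = ln alpha + beta * ln X.

    All x and y values must be positive. The minimised quantity is the
    residual sum of squares of the transformed (log) variables.
    """
    xt = [log(v) for v in x]
    yt = [log(v) for v in y]
    n = len(xt)
    mean_x, mean_y = sum(xt) / n, sum(yt) / n
    beta = (sum((u - mean_x) * (v - mean_y) for u, v in zip(xt, yt))
            / sum((u - mean_x) ** 2 for u in xt))
    alpha = exp(mean_y - beta * mean_x)   # back-transform alpha_T = ln alpha
    return alpha, beta

# Hypothetical noise-free data from Y = 2 * X**1.5
x = [0.5, 1.0, 1.5, 2.0, 3.0]
y = [2.0 * v ** 1.5 for v in x]

alpha, beta = fit_power(x, y)
print(alpha, beta)
```

With noisy data the result generally differs from a direct least-squares fit of the power curve, because the logarithm gives relatively more weight to small values of Y.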

Filling-in missing data
Filling-in of missing water level and rainfall data was covered in previous modules
Filling-in of discharge data using a regression relation with rainfall is often suitable for monthly, seasonal or annual data
Monthly regression model (for month k, computing values for Q in year m), e.g.:
$Q_{k,m} = a_k + b_{1k} P_{k,m} + b_{2k} P_{k-1,m} + s_{e,k}\,e$
Addition of the random component, yes or no:
– Note: E[e] = 0, hence for a single value no random component is added
– For longer in-filling, adding it could be considered, depending on the use, as omitting it reduces the variance of the series
OHS - 30
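The monthly model with an optional random component can be sketched as below. The slide only states E[e] = 0; drawing e from a standard normal distribution is an assumption made here, and the function name and coefficient values are hypothetical.

```python
import random

def fill_discharge(a_k, b1_k, b2_k, s_e_k, p_current, p_previous, rng=None):
    """Estimate Q for month k from rainfall in month k and month k-1.

    If rng is None only the regression estimate is returned (suited to a
    single missing value). If a random.Random instance is passed, the
    term s_e_k * e is added, with e assumed standard normal.
    """
    q = a_k + b1_k * p_current + b2_k * p_previous
    if rng is not None:
        q += s_e_k * rng.gauss(0.0, 1.0)
    return q

# Hypothetical coefficients and rainfall values, for illustration only
coeffs = dict(a_k=5.0, b1_k=0.4, b2_k=0.1, s_e_k=6.0)

q_single = fill_discharge(p_current=120.0, p_previous=80.0, **coeffs)

rng = random.Random(42)   # seeded so the sketch is repeatable
q_random = [fill_discharge(p_current=120.0, p_previous=80.0, rng=rng, **coeffs)
            for _ in range(5)]

print(q_single)
print(q_random)
```

Values filled without the random term all lie exactly on the regression line, which is why a long filled stretch has a lower variance than the observed series. The sketch does not guard against negative discharges that the random term could produce.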

Type of regression model for filling-in missing flows
Previously the following rainfall-discharge relation was proposed:
$Q_{k,m} = a_k + b_{1k} P_{k,m} + b_{2k} P_{k-1,m} + s_{e,k}\,e$
Often regression coefficients do not vary much from month to month, but rather with the wetness of the month. Two sets of parameters are used in a regression model for all or a number of months:
– one set for dry conditions
– another set for wet conditions
In the latter approach the non-linear relationship is fitted by two linear models
OHS - 31
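A minimal sketch of the two-parameter-set approach. The slide does not say how dry and wet conditions are told apart; using a threshold on the current month's rainfall is an assumption, as are the function name, the threshold and the coefficient values. The random component is left out for brevity.

```python
def fill_two_regime(p_current, p_previous, dry_params, wet_params, wet_threshold):
    """Regression estimate of Q using separate (a, b1, b2) sets.

    Assumption: a month counts as wet when its rainfall is at or above
    wet_threshold; otherwise the dry parameter set is used.
    """
    a, b1, b2 = wet_params if p_current >= wet_threshold else dry_params
    return a + b1 * p_current + b2 * p_previous

# Hypothetical parameter sets (a, b1, b2) and threshold, for illustration only
dry = (0.0, 0.1, 0.05)
wet = (-30.0, 0.6, 0.1)
threshold = 100.0

q_dry_month = fill_two_regime(40.0, 20.0, dry, wet, threshold)
q_wet_month = fill_two_regime(200.0, 150.0, dry, wet, threshold)
print(q_dry_month, q_wet_month)
```

The two straight lines together approximate the curved rainfall-runoff relation: a flat response while most rainfall is lost to abstractions, and a steeper one once the catchment is wet.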