Linear Regression Modelling


Linear Regression Modelling
In statistics, linear regression is an approach for modelling the relationship between a scalar dependent variable (y) and one or more explanatory (or independent) variables, denoted by x. The dependent variable is treated as stochastic (subject to random error), while the explanatory variables are treated as non-stochastic (fixed, error-free).

OLS
Ordinary Least Squares (OLS) is a method for estimating the unknown parameters in a linear regression model. The goal is to minimise the sum of the squared differences between the observed responses (the values of the variable being predicted) in the given dataset and those predicted by a linear function of the explanatory variables.
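As a small illustration (with made-up data, for demonstration only), the line returned by R's lm() attains a smaller residual sum of squares than a slightly perturbed alternative:

```r
# Made-up illustrative data
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

fit <- lm(y ~ x)                      # OLS fit
rss_ols <- sum(residuals(fit)^2)      # residual sum of squares at the OLS solution

# Perturb the slope slightly: the RSS can only get worse
b <- coef(fit)
rss_perturbed <- sum((y - (b[1] + (b[2] + 0.1) * x))^2)

rss_ols < rss_perturbed               # TRUE: the OLS line minimises the RSS
```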

Least Squares Adjustment
1. We identify the vector of observations (dimension n):
S(n,1) = (s1, s2, …, sn)ᵀ
and the vector of unknown values (or estimates, dimension m):
X(m,1) = (x1, x2, …, xm)ᵀ
These variables are part of the functional model, which expresses observations + corrections (also called residuals) as functions of the unknown values. The functions applied here may be linear or non-linear with respect to the unknowns.
Redundancy: r = n − m

Least Squares Adjustment
2. We create our functional model. The functional model expresses the observations, plus corrections, as functions of the unknown values:
s1 + v1 = a11 x1 + a12 x2 + … + a1m xm
s2 + v2 = a21 x1 + a22 x2 + … + a2m xm
…
sn + vn = an1 x1 + an2 x2 + … + anm xm
Or, in matrix form: S + V = A X
…with the vector of corrections (or residuals, dimension n):
V(n,1) = (v1, v2, …, vn)ᵀ

Least Squares Adjustment
In this system the quantities s (observations) and a (coefficients) are known, while x (unknowns) and v (corrections) are unknown. First the estimate x̂ is determined; if v is also desired, it is calculated afterwards as v = A x̂ − s.

Least Squares Adjustment
3. Stochastic model. The stochastic model expresses assumptions about the stochastic properties of the data:
Type I: independent observations with a single common variance
Type II: independent observations with different variances
Type III: non-independent observations (covariances between observations)
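The three stochastic-model types translate into different weight matrices P. A minimal R sketch, with hypothetical variances and covariances chosen purely for illustration:

```r
n <- 4

# Type I: independent observations, equal variance -> identity weight matrix
P1 <- diag(n)

# Type II: independent observations with different variances sigma_i^2
# (hypothetical variances, for illustration only)
sigma2 <- c(1, 4, 2, 0.5)
P2 <- diag(1 / sigma2)

# Type III: non-independent observations -> full covariance matrix Q;
# the weight matrix is its inverse (hypothetical Q, for illustration only)
Q <- matrix(c(1,   0.3, 0, 0,
              0.3, 1,   0, 0,
              0,   0,   2, 0,
              0,   0,   0, 0.5), nrow = n, byrow = TRUE)
P3 <- solve(Q)
```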

Least Squares Adjustment
Type I: independent observations with a single common variance. In this case the weight matrix reduces to the identity, P = I.

Least Squares Adjustment
Type II: independent observations with different variances σi². Each observation is weighted by the inverse of its variance: P = diag(1/σ1², …, 1/σn²).

Least Squares Adjustment
x̂ = (Aᵀ P A)⁻¹ Aᵀ P s
V = A x̂ − s
x̂ → vector of estimated coefficients
A → coefficient matrix
P → weight matrix (Type II; the identity for Type I)
s → vector of observed results
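As a minimal sketch, with hypothetical observations of a straight line and Type I weights (so P is the identity), the estimate x̂ = (AᵀPA)⁻¹AᵀPs can be computed directly in R and checked against lm():

```r
# Hypothetical observations of a straight line s = x1 + x2*t (illustration only)
t <- c(0, 1, 2, 3)
s <- c(1.1, 2.9, 5.2, 6.8)

A <- cbind(1, t)          # coefficient (design) matrix
P <- diag(length(s))      # Type I: identity weight matrix

# x_hat = (A' P A)^-1 A' P s
x_hat <- solve(t(A) %*% P %*% A) %*% t(A) %*% P %*% s

# Corrections (residuals): v = A x_hat - s
v <- A %*% x_hat - s

# The same estimates from R's built-in OLS:
coef(lm(s ~ t))
```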

Least Squares Adjustment
Application of results: once the adjusted parameters (x1, x2, x3, …) have been calculated, the fitted function can be evaluated at any point, not only at the measured ones.
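For instance, a short sketch (with hypothetical fitted coefficients) of evaluating a fitted line z = a0 + a1*t at an unmeasured time:

```r
# Hypothetical fitted coefficients for z = a0 + a1*t (illustration only)
x_hat <- c(1.0, 0.5)

# Evaluate the fitted function at any new value of t
predict_z <- function(t_new) x_hat[1] + x_hat[2] * t_new

predict_z(2.5)   # value at an unmeasured time: 2.25
```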

OLS – Example 1
A variable s is measured 8 times, at 8 different times of the day (t). Our goal is to define a mathematical model that best represents the temporal distribution of this variable, so that its value can be calculated for any other time in between those measured. We try to fit a third-order polynomial to our dataset:
z = a0 + a1 t + a2 t² + a3 t³
z → measured parameter (stochastic value)
t → time (non-stochastic value)
t: 1 3 5 7 8 10 12 15
s: 6 2

OLS – Example 2
We have measured some heights with a GPS device, and we want to fit the following mathematical formula to this point cloud to obtain a continuous surface:
z = a0 + a1 x + a2 y, i.e. z = f(x, y)
z → measured heights (stochastic value)
x, y → coordinates (non-stochastic values)

# Times of day (from the slide) and measured values; the s values below are
# placeholders, since the measurements in the original code were garbled
t <- c(1, 3, 5, 7, 8, 10, 12, 15)
s <- c(6, 2, 5, 4, 3, 5, 7, 9)   # placeholder data
plot(t, s)

# Design matrix for z = a0 + a1*t + a2*t^2 + a3*t^3
A0 <- rep(1, length(t))
A1 <- t
A2 <- t^2
A3 <- t^3
A  <- cbind(A0, A1, A2, A3)

# x_hat = (A'A)^-1 A's
x_hat <- solve(t(A) %*% A) %*% t(A) %*% s

# Plot the measurements and the fitted polynomial
plot(t, s, type = "p", col = "black")
curve(x_hat[1] + x_hat[2]*x + x_hat[3]*x^2 + x_hat[4]*x^3,
      from = 0, to = 15, add = TRUE, col = "red")

OLS - Example
We measure heights at different positions and obtain the following results (the same data used in the code below):
x: 2 4 3 5
y: 1 5 2 5
z: 3 2 9 4
Which function, in terms of a0, a1, a2, best represents our available dataset, minimising the error of the adjustment (assuming that all measurements are subject to the same error)?

OLS - Example
x <- c(2, 4, 3, 5)
y <- c(1, 5, 2, 5)
rate <- c(3, 2, 9, 4)

# Model: rate = a0 + a1*x + a2*y
fit <- lm(rate ~ x + y)
attributes(fit)
fit$coefficients
residuals(fit)

Background
Linear regression is also called regression modelling, or Ordinary Least Squares (OLS) modelling. A variable responds (response, y) to changes in one or more explanatory variables (terms, x). E.g. crop yield increases as a result of an increase in fertilisation.
Assumptions of the linear model:
Independence of samples (objects) (violated by autocorrelation)
Normal distribution of residuals
Homoscedasticity (constant variance) of residuals
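These assumptions can be checked on a fitted model in R; a sketch using the built-in "trees" dataset:

```r
# Fit a simple model on the built-in 'trees' dataset
fit <- lm(Volume ~ Girth, data = trees)

# Normality of residuals: Q-Q plot and Shapiro-Wilk test
qqnorm(residuals(fit))
qqline(residuals(fit))
shapiro.test(residuals(fit))

# Homoscedasticity: residuals vs fitted values should show no pattern
plot(fitted(fit), residuals(fit))
abline(h = 0, lty = 2)
```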

Homo-/Heteroscedasticity
Heteroscedasticity implies that the fitted model cannot be evenly applied across the dataset: the residual variance changes with the fitted values. (Image sources: Wikipedia)

OLS – the model
http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-statistics-toolbox/regression-analysis-basics.htm

Exercise 2
Using the internal R dataset "trees", we will look at whether an increase in volume depends on the height of the trees, on their girth, or on an interaction of these two factors. Consider the formulation of a regression test, y = ax + b: which is the response variable, and which are the explanatory variables? Then perform a linear regression test.
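One possible formulation in R (a sketch of the kind of call intended, not the exercise's only valid answer):

```r
data(trees)

# Response: Volume; explanatory: Girth, Height, and their interaction
fit <- lm(Volume ~ Girth * Height, data = trees)

# Coefficients, significance and R-squared of the fitted model
summary(fit)
```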

Reference material https://en.wikipedia.org/wiki/Linear_model https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient