1
Linear Regression Modelling
2
Linear Regression Modelling
In statistics, linear regression is an approach for modelling the relationship between a scalar dependent variable (y) and one or more explanatory variables (or independent variables), denoted by x. The dependent variable is treated as stochastic, while the explanatory variables are treated as non-stochastic.
3
OLS
Ordinary Least Squares is a method for estimating the unknown parameters in a linear regression model. Its goal is to minimize the sum of the squared differences between the observed responses (the values of the variable being predicted) in the given dataset and the responses predicted by a linear function of the explanatory variables.
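As a minimal sketch of what "minimizing the sum of squares" means, the R snippet below fits a plane with `lm` and checks that moving away from the fitted coefficients only increases the sum of squared residuals. The data are the four points used in the `lm` example in these slides; the helper `ssr` and the size of the perturbation are assumptions made purely for illustration.

```r
# Four observations (taken from the lm example in these slides)
x    <- c(2, 4, 3, 5)
y    <- c(1, 5, 2, 5)
rate <- c(3, 2, 9, 4)

# Hypothetical helper: sum of squared residuals for coefficients a = (a0, a1, a2)
ssr <- function(a) sum((rate - (a[1] + a[2] * x + a[3] * y))^2)

fit   <- lm(rate ~ x + y)
a_hat <- unname(coef(fit))

ssr_hat       <- ssr(a_hat)                      # value at the OLS solution
ssr_perturbed <- ssr(a_hat + c(0.1, -0.1, 0.1))  # arbitrary nearby coefficients

print(c(ssr_hat = ssr_hat, ssr_perturbed = ssr_perturbed))
```

Any other perturbation vector could be tried in place of `c(0.1, -0.1, 0.1)`; the sum of squares at `a_hat` should remain the smallest.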
4
Least Squares Adjustment
1. We have to identify the vector of observations (dimension n):
$S_{n,1} = [s_1, s_2, \dots, s_n]^T$
and the vector of unknown values (or estimates, dimension m):
$X_{m,1} = [x_1, x_2, \dots, x_m]^T$
The variables are part of the functional model, which explains observations + corrections (also called residuals) as functions of the unknown values. The functions applied here may be linear or non-linear with respect to the unknown values.
Redundancy: $r = n - m$
5
Least Squares Adjustment
2. We create our functional model. The functional model expresses the observations as functions of the unknown values:
$s_1 + v_1 = a_{11} x_1 + a_{12} x_2 + \dots + a_{1m} x_m$
$s_2 + v_2 = a_{21} x_1 + a_{22} x_2 + \dots + a_{2m} x_m$
$\dots$
$s_n + v_n = a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nm} x_m$
Or in matrix form: $S + V = A X$, with the vector of corrections (or residuals)
$V_{n,1} = [v_1, v_2, \dots, v_n]^T$
6
Least Squares Adjustment
Following this, the quantities s and a are known, while x and v are unknown.
$s_1 + v_1 = a_{11} x_1 + a_{12} x_2 + \dots + a_{1m} x_m$
$s_2 + v_2 = a_{21} x_1 + a_{22} x_2 + \dots + a_{2m} x_m$
$\dots$
$s_n + v_n = a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nm} x_m$
First, the estimate $\hat{X}$ is determined. If V is desired, it is then calculated as $V = A \hat{X} - S$.
7
Least Squares Adjustment
3. Stochastic model
The stochastic model expresses assumptions about the stochastic properties of the data:
Type I. Independent observations with a single common variance
Type II. Independent observations with different variances
Type III. Non-independent observations (covariances)
8
Least Squares Adjustment
Type I. Independent observations with a single common variance
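When all observations are independent and share one variance, every observation carries the same weight, so the least squares estimate reduces to $\hat{X} = (A^T A)^{-1} A^T S$. A minimal sketch in R, assuming the plane model z = a0 + a1 x + a2 y and the four sample points from the `lm` example in these slides; the variable names are illustrative.

```r
# Observations S and design matrix A for the model S + V = A X
x <- c(2, 4, 3, 5)
y <- c(1, 5, 2, 5)
S <- c(3, 2, 9, 4)
A <- cbind(1, x, y)                  # columns correspond to a0, a1, a2

# Normal equations with equal weights (weight matrix = identity)
X_hat <- solve(t(A) %*% A, t(A) %*% S)

# Corrections (residuals) and redundancy r = n - m
V <- A %*% X_hat - S
r <- nrow(A) - ncol(A)

print(X_hat)
print(V)
print(r)
```

Using `solve(M, b)` rather than `solve(M) %*% b` avoids forming the explicit inverse, which is usually the more numerically stable choice.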
9
Least Squares Adjustment
Type II. Independent observations with different variances
10
Least Squares Adjustment
$\hat{X} = (A^T P A)^{-1} A^T P S$
$V = A \hat{X} - S$
$\hat{X}$ → matrix of estimated coefficients
$A$ → coefficient matrix
$P$ → weight matrix (Type II)
$S$ → observed results
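The weighted estimate can be sketched in R as follows. The standard deviations in `sigma` are invented for illustration only, and taking the weights as the inverse variances (a diagonal P) is a common convention assumed here, not something stated on the slide. The result is cross-checked against `lm` with the same weights.

```r
# Sample points from the lm example in these slides
x <- c(2, 4, 3, 5)
y <- c(1, 5, 2, 5)
S <- c(3, 2, 9, 4)
A <- cbind(1, x, y)

# Assumed standard deviations of the four observations (illustrative values only)
sigma <- c(1, 2, 1, 0.5)
P <- diag(1 / sigma^2)               # diagonal weight matrix for independent observations

# X_hat = (A^T P A)^-1 A^T P S and V = A X_hat - S
X_hat <- solve(t(A) %*% P %*% A, t(A) %*% P %*% S)
V     <- A %*% X_hat - S

# Cross-check against lm() with the same weights
fit_w <- lm(S ~ x + y, weights = 1 / sigma^2)
print(cbind(normal_eq = as.numeric(X_hat), lm = unname(coef(fit_w))))
```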
11
Least Squares Adjustment
Application of Results
Once you have calculated the results of the adjustment (x1, x2, x3, …), you can insert them into the functional model to calculate the value at any point.
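A short sketch of this step in R, assuming the plane model z = a0 + a1 x + a2 y and the sample points from the `lm` example in these slides. The new position (`x_new`, `y_new`) is hypothetical. The value is computed once by hand from the coefficients and once with `predict`.

```r
x <- c(2, 4, 3, 5)
y <- c(1, 5, 2, 5)
z <- c(3, 2, 9, 4)

fit <- lm(z ~ x + y)
a   <- unname(coef(fit))             # a[1] = a0, a[2] = a1, a[3] = a2

# Hypothetical new position at which no measurement was taken
x_new <- 3.5
y_new <- 3

# Evaluate the functional model by hand and via predict()
z_manual  <- a[1] + a[2] * x_new + a[3] * y_new
z_predict <- unname(predict(fit, newdata = data.frame(x = x_new, y = y_new)))

print(c(manual = z_manual, predict = z_predict))
```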
12
OLS - Example 1
$z = a_0 + a_1 t + a_2 t^2 + a_3 t^3$, $z = f(t)$
A specific variable (s) is measured 8 times, at 8 different times of the day (t). Our goal is to define a mathematical model that best represents the temporal distribution of this variable, so that we can calculate its value at any other time between those measured. We try to fit a third-order polynomial to our dataset:
$z = a_0 + a_1 t + a_2 t^2 + a_3 t^3$
z → measured parameter (stochastic value)
t → time (non-stochastic values)
t: 1 3 5 7 8 10 12 15
s: 6 2
13
OLS - Example 2
$z = a_0 + a_1 x + a_2 y$, $z = f(x, y)$
We have measured some heights with a GPS device and we want to fit the following mathematical formula to this point cloud to obtain a continuous surface:
$z = a_0 + a_1 x + a_2 y$
z → measured heights (stochastic value)
x, y → coordinates (non-stochastic values)
14
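# Cubic fit z = a0 + a1 t + a2 t^2 + a3 t^3 by ordinary least squares
# S needs one observation per entry of t (8 values) for the products below to conform;
# the values shown on the slide are incomplete.
S <- matrix(c(3, 2, 9.4), nrow = 1)
t <- matrix(c(1, 5, 2, 5, 3, 2, 9, 4), nrow = 2, byrow = TRUE)
# Design matrix: one column per power of t
A0 <- matrix(rep(1, length(t)))
A1 <- matrix(t)
A2 <- matrix(t^2)
A3 <- matrix(t^3)
A <- cbind(A0, A1, A2, A3)
# Normal equations: X_hat = (A^T A)^-1 A^T S
X_hat <- solve(t(A) %*% A) %*% t(A) %*% t(S)
# Plot the observations and overlay the fitted polynomial
plot(as.vector(t), as.vector(S), type = "p", col = "black", ylim = c(0, 6))
curve(X_hat[1] + X_hat[2] * x + X_hat[3] * x^2 + X_hat[4] * x^3, from = 0, to = 15, add = TRUE, col = "red")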
15
OLS - Example
We measure heights at 5 different positions and obtain the following results: x y z 2 1 3 4 5 9
Which function, in terms of a0, a1, a2, best represents our available dataset while minimizing the error of the adjustment (assuming that all measurements are subject to the same error)?
16
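OLS - Example
x <- c(2, 4, 3, 5)
y <- c(1, 5, 2, 5)
rate <- c(3, 2, 9, 4)
# Model: rate = a0 + a1 * x + a2 * y
fit <- lm(rate ~ x + y)
attributes(fit)      # components stored in the fitted model object
fit$coefficients     # estimated a0, a1, a2
residuals(fit)       # differences between observed and fitted values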
17
Background
Also called regression modelling
Ordinary Least Squares (OLS) modelling
A variable (the response, y) responds to changes in different explanatory variables (the terms, x).
E.g. crop yield increases as a result of an increase in fertilisation.
Assumptions (of the linear model):
Independence of samples (objects); autocorrelation violates this
Normal distribution of residuals
Homoscedasticity (constant variance of the residuals)
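These assumptions can be given a rough first check in base R. The sketch below uses the built-in `trees` dataset and a model with Girth and Height as terms, both chosen here only for illustration. The row order of `trees` is not known to be a time or spatial order, so the autocorrelation value is a demonstration of the call rather than a meaningful diagnostic, and the correlation used for the spread is an informal stand-in for a formal heteroscedasticity test.

```r
# Built-in dataset with columns Girth, Height, Volume
fit <- lm(Volume ~ Girth + Height, data = trees)
res <- residuals(fit)

# Normality of residuals: Shapiro-Wilk test (a small p-value suggests non-normality)
sw <- shapiro.test(res)

# Independence: lag-1 autocorrelation of the residuals in row order
ac1 <- acf(res, lag.max = 1, plot = FALSE)$acf[2]

# Homoscedasticity (informal): do absolute residuals change with the fitted values?
spread_cor <- cor(abs(res), fitted(fit))

print(sw$p.value)
print(ac1)
print(spread_cor)
```

Calling `plot(fit)` additionally produces the standard diagnostic plots (residuals vs fitted, normal Q-Q, scale-location), which are often easier to judge by eye.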
18
Homo/Hetero-scedasticity
Heteroscedasticity implies that the fitted model cannot be applied evenly across the dataset.
Image sources: Wikipedia
19
OLS the model
20
Exercise 2
Using the built-in R dataset "trees"
We will look at whether an increase in volume depends on the height of the trees, on the trees' girth, or on an interaction of these two factors.
Consider the formulation of a regression test: y = ax + b
Which is the response variable, and which are the explanatory variables?
Then perform a linear regression test.
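One possible way to set up the exercise, as a sketch rather than a model answer: treat Volume as the response and Girth and Height as explanatory variables, then compare a model without and with the interaction term. The object names `fit_add` and `fit_int` are arbitrary.

```r
# Additive model: Volume explained by Girth and Height separately
fit_add <- lm(Volume ~ Girth + Height, data = trees)

# Interaction model: Girth * Height expands to Girth + Height + Girth:Height
fit_int <- lm(Volume ~ Girth * Height, data = trees)

# Coefficient table (estimate, standard error, t value, p-value)
print(summary(fit_int)$coefficients)

# F-test of whether adding the interaction improves the fit
print(anova(fit_add, fit_int))
```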
21
Reference material https://en.wikipedia.org/wiki/Linear_model