Basics of regression analysis

Basics of regression analysis
- Purpose of linear models
- Least-squares solution for linear models
- Analysis of diagnostics

Reason for linear models The purpose of regression is to reveal statistical relations between input and output variables. Statistics cannot reveal functional relationships; that is the purpose of other scientific studies. Statistics can, however, help to validate various functional relationships (models). Let us assume that we suspect the functional relationship

y = f(x, β) + ε

where β is a vector of unknown parameters, x = (x1, x2, ..., xp) is a vector of controllable (input) variables, y is the output and ε is an error associated with the experiment. Then we can run experiments for various values of x and record the output (or response) for each of them. If the number of experiments is n then we will have n output values; denote them as a vector y = (y1, y2, ..., yn). The purpose of statistics is to estimate the parameter vector using the input and output values. If the function f is a linear function of the parameters and the errors are additive then we are dealing with a linear model. For this model we can write

y = β1 x1 + β2 x2 + ... + βp xp + ε

A linear model is linear in the parameters but not necessarily in the input variables. For example y = β0 + β1 x + β2 x^2 + ε is a linear model, but y = β0 + e^(β1 x) + ε is not.
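
A minimal R sketch (simulated data, so every variable name here is illustrative) of this point: a model can be quadratic in the input x and still be a linear model, because it is linear in the coefficients.

# A model that is linear in the parameters but quadratic in the input x
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x - 0.3 * x^2 + rnorm(50, sd = 1)

# lm() fits this by least squares; it is still a "linear model"
# because the model is linear in the coefficients.
fit <- lm(y ~ x + I(x^2))
summary(fit)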

Assumptions The basic assumptions for the analysis of a linear model are:
1) the model is linear in the parameters;
2) the error structure is additive;
3) random errors have zero mean, equal variances and are uncorrelated.
These assumptions are sufficient to deal with linear models. The equal-variance and uncorrelated-errors assumption (number 3) can be relaxed, but then the treatment becomes a little more complicated. Note that for the general solution the normality assumption is not used; it is needed only to design test statistics. If the normality assumption does not hold we can use the bootstrap to design test statistics. These assumptions can be written in vector form:

y = X β + ε,   E(ε) = 0,   V(ε) = σ^2 I

where y and ε are n-vectors, β is a p-vector and X is a matrix. This matrix is called the design matrix, input matrix etc. I is the n x n identity matrix.

One parameter case When we have one predictor variable (x) and one response variable (y) the problem becomes considerably simpler. In this case the model is:

y = β0 + β1 x + ε

Now let us assume that we have observations for n values of x, (x1, ..., xn), and the corresponding responses are (y1, ..., yn). If we assume that the errors are independent, have equal variances and are normally distributed then we can use least squares, i.e. minimise

S(β0, β1) = Σ (yi − β0 − β1 xi)^2

If we solve this minimisation problem we get the estimates of the coefficients:

β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)^2,   β̂0 = ȳ − β̂1 x̄
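
A short R sketch (simulated data; names are illustrative) computing the least-squares slope and intercept from the formulas above and checking them against lm:

# Simple linear regression "by hand" versus lm()
set.seed(2)
x <- rnorm(30)
y <- 1 + 2 * x + rnorm(30, sd = 0.5)

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

c(b0, b1)
coef(lm(y ~ x))   # should agree with b0, b1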

One parameter case If you divide the numerator and denominator by n and use the definitions of the correlation coefficient and standard deviations, then the slope equation can be written as:

β̂1 = r (sy / sx)

where r is the sample correlation between x and y, and sx, sy are their standard deviations. In some sense the slope is a description of the correlation between the input and output variables. If the slope is equal to 0 then the intercept becomes the mean value of the observations, β̂0 = ȳ.
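
This relation is easy to verify numerically in R (a self-contained sketch on simulated data; names are illustrative):

# The least-squares slope equals r * sy / sx
set.seed(2)
x <- rnorm(30)
y <- 1 + 2 * x + rnorm(30, sd = 0.5)

slope_ls  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
slope_cor <- cor(x, y) * sd(y) / sd(x)

all.equal(slope_ls, slope_cor)  # TRUE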

Solution for general case The least-squares solution for the linear model under the given assumptions is:

β̂ = (X^T X)^-1 X^T y

Let us show this. Using the form of the model we write the least-squares criterion (since we want the solution with the minimum least-squares error):

S(β) = (y − X β)^T (y − X β)

Taking the first derivative with respect to β and setting it to zero gives the normal equations X^T X β = X^T y, so we can see that the solution above is correct. If we use the formula for the solution and the expression for y then we can write:

E(β̂) = (X^T X)^-1 X^T E(y) = (X^T X)^-1 X^T X β = β

So the solution is unbiased. The variance of the estimator is:

V(β̂) = σ^2 (X^T X)^-1

Here we used the form of the solution and assumption number 3).
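
A minimal R sketch (simulated data; names are illustrative) solving the normal equations directly and comparing with lm:

# Least-squares solution via the normal equations, compared with lm()
set.seed(3)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))          # design matrix with an intercept column
beta <- c(1, 2, -0.5)
y <- drop(X %*% beta + rnorm(n))

beta_hat <- solve(t(X) %*% X, t(X) %*% y)  # solves X'X b = X'y without forming the inverse
cbind(beta_hat, coef(lm(y ~ X[, 2] + X[, 3])))   # the two columns should agree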

Variance To calculate the covariance matrix we need to be able to calculate σ^2. Since it is the variance of the error term we can find it using the form of the solution. For the estimated errors (the residuals, denoted by r) we can write:

r = y − X β̂ = (I − X (X^T X)^-1 X^T) y = M y,   M = I − H,   H = X (X^T X)^-1 X^T

If we use y = X β + ε it gives r = M ε. Since the matrix M is idempotent and symmetric, i.e. M^2 = M = M^T, we can write:

E(r^T r) = σ^2 tr(M) = σ^2 (n − p)

where n is the number of observations and p is the number of fitted parameters. Then an unbiased estimator of the residual variance is:

σ̂^2 = r^T r / (n − p)
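
A small R sketch (simulated data; names are illustrative) computing the hat matrix and the unbiased residual variance estimate, and checking it against the value reported by summary(lm(...)):

# Residual variance estimate r'r / (n - p) via the hat matrix
set.seed(4)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))
y <- drop(X %*% c(1, 2, -0.5) + rnorm(n, sd = 1.5))

H  <- X %*% solve(t(X) %*% X) %*% t(X)   # hat matrix
r  <- drop((diag(n) - H) %*% y)          # residuals
s2 <- sum(r^2) / (n - ncol(X))           # unbiased estimate of sigma^2

c(s2, summary(lm(y ~ X[, 2] + X[, 3]))$sigma^2)  # should agree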

Singular case: SVD and pseudoinversion The above form of the solution is valid if the matrices X and X^T X are non-singular, i.e. the rank of X is equal to the number of parameters. If that is not true then either singular value decomposition or eigenvalue filtering techniques are used. Fortunately most of the good properties of the linear model remain. Singular value decomposition (SVD): any n x p matrix can be decomposed as

X = U D V^T

where U is an n x n and V is a p x p orthogonal matrix (the inverse is equal to the transpose), and D is an n x p diagonal matrix of the singular values. If X is singular then the number of non-zero diagonal elements of D is less than p. Then for X^T X we can write:

X^T X = V D^T D V^T

D^T D is a p x p diagonal matrix. If the matrix is non-singular then we can write:

(X^T X)^-1 = V (D^T D)^-1 V^T

Since D^T D is a diagonal matrix its inverse is also a diagonal matrix. The main trick used in the SVD technique for solving the equations is that when a diagonal element is 0 or close to 0 then, instead of inverting it, zero is used, i.e. a pseudoinverse is calculated:

(X^T X)^+ = V (D^T D)^+ V^T,   where (D^T D)^+ has 1/d_i^2 on the diagonal when d_i is not (close to) zero and 0 otherwise
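
A minimal R sketch (simulated rank-deficient design; the variable names and the tolerance are illustrative choices) of the SVD-based pseudoinverse solution:

# Least squares via SVD with pseudoinversion of negligible singular values
set.seed(5)
n  <- 50
x1 <- rnorm(n)
X  <- cbind(1, x1, 2 * x1)                # third column duplicates the second: X is singular
y  <- drop(X %*% c(1, 2, 0) + rnorm(n))

s    <- svd(X)
tol  <- 1e-8 * max(s$d)
dinv <- ifelse(s$d > tol, 1 / s$d, 0)     # invert only non-negligible singular values
beta_hat <- s$v %*% (dinv * (t(s$u) %*% y))   # minimum-norm least-squares solution
beta_hat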

Singular case: Ridge regression Another technique to deal with the singular case is ridge regression. In this technique a constant value λ is added to the diagonal terms of X^T X before inverting it:

β̂_ridge = (X^T X + λ I)^-1 X^T y

Mathematically it is equivalent to Tikhonov regularisation for ill-posed problems. One of the problems with this technique is how to find the regularisation parameter λ; that is usually done by trial and error.
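
A short R sketch (simulated data; the value of lambda is arbitrary and only for illustration) of ridge regression computed literally as described above, together with a call to lm.ridge from MASS:

# Ridge regression: add lambda to the diagonal of X'X before inverting
library(MASS)
set.seed(6)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)             # nearly collinear predictors
y  <- 1 + 2 * x1 + rnorm(n)

X <- cbind(1, x1, x2)
lambda <- 0.1
beta_ridge <- solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
beta_ridge

# MASS::lm.ridge centres and scales the predictors internally,
# so its estimates will not exactly match the naive formula above.
lm.ridge(y ~ x1 + x2, lambda = 0.1)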

R-squared One of the statistics used for goodness of fit is R-squared. It is defined as:

R^2 = 1 − Σ (yi − ŷi)^2 / Σ (yi − ȳ)^2

Adjusted R-squared:

R^2_adj = 1 − (1 − R^2) (n − 1) / (n − p)

where n is the number of observations and p is the number of parameters.
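
A small R check (simulated data; names are illustrative) computing R-squared and adjusted R-squared from their definitions and comparing with summary(lm(...)):

# R-squared and adjusted R-squared from their definitions
set.seed(7)
x <- rnorm(40); y <- 1 + 0.8 * x + rnorm(40)
fit <- lm(y ~ x)

n <- length(y); p <- length(coef(fit))
r2     <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
r2_adj <- 1 - (1 - r2) * (n - 1) / (n - p)

c(r2, summary(fit)$r.squared, r2_adj, summary(fit)$adj.r.squared)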

Analysis of diagnostics Residuals and hat matrix: residuals are the differences between the observations and the fitted values:

r = y − ŷ = (I − H) y,   H = X (X^T X)^-1 X^T

H is called the hat matrix. The diagonal terms hi are the leverages of the observations. If a leverage is close to one then that fitted value is determined almost entirely by that observation. Sometimes hi' = hi / (1 − hi) is used to enhance high leverages. A Q-Q plot can be used to check the normality assumption. A Q-Q plot is a plot of the quantiles of two distributions against each other; if the assumption about the distribution is correct then this plot should be nearly linear. If the distribution is normal then tests designed for normal distributions can be used; otherwise the bootstrap can be used to derive the desired distributions.
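
A brief R sketch (simulated data; names are illustrative) of these diagnostics: leverages from the hat matrix and a Q-Q plot of the residuals:

# Leverages and a Q-Q plot for a fitted linear model
set.seed(8)
x <- rnorm(40); y <- 1 + 0.8 * x + rnorm(40)
fit <- lm(y ~ x)

h     <- hatvalues(fit)      # diagonal of the hat matrix (leverages)
h_enh <- h / (1 - h)         # "enhanced" leverages
head(cbind(h, h_enh))

qqnorm(residuals(fit))       # Q-Q plot against the normal distribution
qqline(residuals(fit))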

Analysis of diagnostics: Cont. Other analysis tools include the standardised and the studentised (deleted) residuals:

ri / (s √(1 − hi))   and   ri / (si √(1 − hi))

where hi is the leverage, hi' is the enhanced leverage, s^2 is the unbiased estimator of σ^2 and si^2 is the unbiased estimator of σ^2 after removal of the i-th observation.
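
In R these diagnostics are available directly (shown on a simulated fit; whether they match the slide's exact list is an assumption):

# Standardised residuals, studentised (deleted) residuals and Cook's distance
set.seed(9)
x <- rnorm(40); y <- 1 + 0.8 * x + rnorm(40)
fit <- lm(y ~ x)

head(cbind(standardised = rstandard(fit),
           studentised  = rstudent(fit),
           cooks        = cooks.distance(fit)))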

Bootstrap The simplest application of the bootstrap to this problem is as follows (a short R sketch of the procedure is given below):
1) Calculate the residuals using r = y − X β̂.
2) Sample with replacement from the residual vector and denote the result r_random.
3) Design new "observations" using y_new = X β̂ + r_random.
4) Estimate the parameters from the new observations.
5) Repeat steps 2, 3 and 4.
6) Use the bootstrap replicates to estimate parameters, variances, the covariance matrix or the distribution.
Another technique for bootstrapping is to resample the observations and the corresponding rows of the design matrix simultaneously, i.e. (yi, x1i, x2i, ..., xpi), i = 1, ..., n. It is meant to be less sensitive to misspecified models. Note that for some samples the matrix may become singular and the problem may become ill defined.
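
A compact R sketch of the residual bootstrap described above (simulated data; the number of replicates B is arbitrary):

# Residual bootstrap for a linear model
set.seed(10)
x <- rnorm(50); y <- 1 + 2 * x + rnorm(50)
fit  <- lm(y ~ x)
res  <- residuals(fit)
yhat <- fitted(fit)

B <- 1000
boot_coef <- replicate(B, {
  y_new <- yhat + sample(res, replace = TRUE)  # new "observations"
  coef(lm(y_new ~ x))                          # re-estimate parameters
})

apply(boot_coef, 1, sd)   # bootstrap standard errors of intercept and slope
cov(t(boot_coef))         # bootstrap covariance matrix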

Limitations
Statistical, not causal, link: regression analysis gives statistical links and may not say anything about causal links. An example: if you do a regression analysis where the number of fire-fighters is the predictor variable and the fire size is the response then you will certainly find a good fit, but it does not mean that bigger fires are caused by the number of fire-fighters.
Processes are rarely linear: in nature it is rarely the case that processes are linear. Have a look at the anscombe example in R (see the sketch below); it demonstrates this very nicely. Linearity works for explaining processes in the vicinity of a point or in small regions of the space. If linearity does not hold then non-linear regressions might be more useful.
Assumptions may not hold: some or all assumptions may not hold. If the violation concerns the normality assumption then it is usually not very serious; most statistics (F, t and others) are robust in this sense. If it is serious then maximum likelihood could be used. The independence and equal-variance assumptions should also be analysed carefully; violation of the independence assumption can be very serious.
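
The Anscombe quartet ships with R; a short sketch fitting the same simple linear model to each of the four data sets gives nearly identical coefficients even though the underlying relationships are very different:

# Anscombe's quartet: four data sets with (almost) the same linear fit
data(anscombe)
fits <- lapply(1:4, function(i) {
  d <- data.frame(x = anscombe[[paste0("x", i)]], y = anscombe[[paste0("y", i)]])
  lm(y ~ x, data = d)
})
sapply(fits, coef)        # coefficients are nearly identical across the four sets

# Plotting each set shows how different the relationships really are
op <- par(mfrow = c(2, 2))
for (i in 1:4) plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
                    xlab = paste0("x", i), ylab = paste0("y", i))
par(op)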

R commands
lm - general linear model
lm.ridge - ridge regression (in the MASS package)
