1 Linear Methods for Regression Lecture Notes for CMPUT 466/551 Nilanjan Ray

2 Assumption: Linear Regression Function
Model assumption: the output $Y$ is linear in the inputs $X = (X_1, X_2, X_3, \ldots, X_p)$:
  $f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$
Predict the output by $\hat{y} = X^T \beta$ (vector notation, with the constant 1 included in $X$), where $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$.
Also known as multiple regression when $p > 1$.

3 Least Squares Solution
Residual sum of squares: $\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2$, where $r_i = y_i - x_i^T \beta$ is the residual.
In matrix-vector notation: $\mathrm{RSS}(\beta) = (y - X\beta)^T (y - X\beta)$.
Vector differentiation gives $X^T (y - X\beta) = 0$.
Solution: $\hat{\beta} = (X^T X)^{-1} X^T y$, known as the least squares solution.
For a new input $x_0$, the regression output is $\hat{y}_0 = x_0^T \hat{\beta}$.
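As a concrete illustration (not from the slides), the closed-form solution can be computed in a few lines of NumPy; the synthetic data and all variable names below are my own.

```python
import numpy as np

# Synthetic data: N samples, p inputs, with a column of ones prepended for the intercept.
rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=N)

# Least squares solution beta_hat = (X^T X)^{-1} X^T y.
# np.linalg.lstsq is used instead of an explicit inverse for numerical stability.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual sum of squares and the prediction at a new input x0 (leading 1 included).
rss = np.sum((y - X @ beta_hat) ** 2)
x0 = np.array([1.0, 0.2, -0.4, 1.1])
y0_hat = x0 @ beta_hat
print(beta_hat, rss, y0_hat)
```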

4 Bias-Variance Decomposition
Model: $y = X\beta + \epsilon$, where the components of $\epsilon$ have zero expectation, the same variance $\sigma^2$, and are uncorrelated.
Estimator: $\hat{\beta} = (X^T X)^{-1} X^T y = \beta + (X^T X)^{-1} X^T \epsilon$, so $E[\hat{\beta}] = \beta$: an unbiased estimator! (Ex. Show the last step.)
Bias: $E[x_0^T \hat{\beta}] - x_0^T \beta = 0$ for the linear model.
Variance: $\mathrm{Var}(\hat{\beta}) = (X^T X)^{-1} \sigma^2$.
Decomposition of EPE: irreducible error $= \sigma^2$, squared bias $= 0$, and the variance term averages to $\sigma^2 (p/N)$ over the training inputs.
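A small Monte Carlo sketch (my addition, with synthetic data and arbitrary names) that checks the two claims numerically: the least squares estimator is unbiased, and its covariance is approximately $\sigma^2 (X^T X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 200, 0.5
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta = np.array([1.0, -2.0, 0.7])

# Repeat the regression over many noise realizations, with X held fixed.
estimates = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=sigma, size=N)
    estimates.append(np.linalg.lstsq(X, y, rcond=None)[0])
estimates = np.array(estimates)

print("mean of beta_hat:", estimates.mean(axis=0))      # close to beta (unbiased)
print("empirical covariance:\n", np.cov(estimates.T))   # close to sigma^2 (X^T X)^{-1}
print("theoretical covariance:\n", sigma**2 * np.linalg.inv(X.T @ X))
```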

5 Gauss-Markov Theorem
Gauss-Markov Theorem: the least squares estimate has the minimum variance among all linear unbiased estimators.
Interpretation: the estimator found by least squares, $\hat{f}(x_0) = x_0^T \hat{\beta} = x_0^T (X^T X)^{-1} X^T y$, is linear in $y$.
We have noticed that this estimator is unbiased, i.e., $E[x_0^T \hat{\beta}] = x_0^T \beta = f(x_0)$.
If we find any other unbiased estimator $g(x_0) = c_0^T y$ of $f(x_0)$ that is linear in $y$ too, i.e., $E[c_0^T y] = f(x_0)$, then $\mathrm{Var}(x_0^T \hat{\beta}) \le \mathrm{Var}(c_0^T y)$.
Question: Is least squares the best estimator for the given linear additive model?
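For the missing reasoning step, here is a standard proof sketch of the inequality above (my addition, not from the slides); it assumes the model and notation of slide 4.

```latex
% Any linear estimator of f(x_0) = x_0^T \beta has the form g(x_0) = c_0^T y.
% Unbiasedness for every \beta, i.e. E[c_0^T y] = c_0^T X \beta = x_0^T \beta, forces X^T c_0 = x_0.
% Writing c_0 = X (X^T X)^{-1} x_0 + d, the constraint X^T c_0 = x_0 gives X^T d = 0, hence
\operatorname{Var}(c_0^T y) = \sigma^2 c_0^T c_0
  = \sigma^2 \left( x_0^T (X^T X)^{-1} x_0 + d^T d \right)
  \ge \sigma^2 \, x_0^T (X^T X)^{-1} x_0
  = \operatorname{Var}(x_0^T \hat{\beta}).
```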

6 Subset Selection
The LS solution often has large variance (remember that the variance is proportional to the number of inputs $p$, i.e., to model complexity).
If we decrease the number of input variables $p$, we can decrease the variance; however, we then sacrifice the zero bias.
If this trade-off decreases the test error, the reduced model can be accepted.
This reasoning leads to subset selection: select a subset of the $p$ inputs for the regression computation (a minimal code sketch follows this slide).
Subset selection has another advantage: an easier, more focused interpretation of how the input variables influence the output.
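The minimal sketch promised above (my own illustration, not part of the slides): for each subset size it keeps the subset of inputs with the smallest training residual sum of squares; the size itself would then be chosen on a held-out test set or by cross-validation. The exhaustive search is only feasible for small p.

```python
import itertools
import numpy as np

def best_subsets(X, y):
    """For each subset size k, return the input subset with the smallest training RSS.

    X: (N, p) matrix of inputs WITHOUT the intercept column; y: (N,) outputs.
    Exhaustive search over all 2^p - 1 non-empty subsets.
    """
    N, p = X.shape
    best = {}
    for k in range(1, p + 1):
        for subset in itertools.combinations(range(p), k):
            Xs = np.column_stack([np.ones(N), X[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            if k not in best or rss < best[k][1]:
                best[k] = (subset, rss)
    return best
```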

7 Subset Selection…
Can we determine which $\beta_j$'s are insignificant?
Yes, we can, by statistical hypothesis testing!
However, we need a model assumption: $\epsilon$ is zero-mean Gaussian with standard deviation $\sigma$.

8 Subset Selection: Statistical Significance Test
The linear model with additive Gaussian noise has the following properties:
  $\hat{\beta} \sim N(\beta, (X^T X)^{-1} \sigma^2)$ and $(N - p - 1)\hat{\sigma}^2 \sim \sigma^2 \chi^2_{N-p-1}$. (Ex. Show this.)
So we can form a standardized coefficient, or Z-score, test for each coefficient:
  $z_j = \hat{\beta}_j / (\hat{\sigma} \sqrt{v_j})$, where $v_j$ is the $j$-th diagonal element of $(X^T X)^{-1}$.
The hypothesis testing principle says that a large Z-score value should retain the coefficient and a small value should discard it.
How large/small depends on the significance level.
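A short sketch of this computation (my addition; the function name z_scores and its arguments are hypothetical): $\hat{\sigma}^2$ is the usual unbiased estimate of the noise variance and $v_j$ the $j$-th diagonal entry of $(X^T X)^{-1}$.

```python
import numpy as np

def z_scores(X, y):
    """Z-scores z_j = beta_hat_j / (sigma_hat * sqrt(v_j)) for a linear model.

    X: (N, p+1) matrix including the column of ones; y: (N,) outputs.
    """
    N, q = X.shape                       # q = p + 1 parameters
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    residuals = y - X @ beta_hat
    sigma_hat = np.sqrt(residuals @ residuals / (N - q))   # unbiased noise variance estimate
    v = np.diag(XtX_inv)
    return beta_hat / (sigma_hat * np.sqrt(v))
```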

9 Case Study: Prostate Cancer
Output = log prostate-specific antigen.
Inputs = (log cancer volume, log prostate weight, age, log of benign prostatic hyperplasia, seminal vesicle invasion, log of capsular penetration, Gleason score, % of Gleason scores 4 or 5).
Goal: (1) predict the output given a novel input; (2) interpret the influence of the inputs on the output.

10 Case Study…
[Scatter plot of the prostate cancer variables]
From the scatter plot alone it is hard to tell which inputs are most influential.
Also, we want to find out how the inputs jointly influence the output.

11 Subset Selection on Prostate Cancer Data
[Table of Term, Coefficient, Std. Error, and Z-score for: Intercept, lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45; the numeric values are not preserved in this transcript.]
Z-scores with magnitude greater than 2 indicate variables that are significant at the 5% significance level.

12 Coefficient Shrinkage: Ridge Regression
Method: minimize the penalized residual sum of squares
  $\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \sum_{i=1}^{N} (y_i - x_i^T \beta)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$,
where $\lambda \ge 0$ is a non-negative penalty. The solution is $\hat{\beta}^{\mathrm{ridge}} = (X^T X + \lambda I)^{-1} X^T y$.
One computational advantage is that the matrix $X^T X + \lambda I$ is always invertible (for $\lambda > 0$).
If the $L_2$ norm penalty is replaced by an $L_1$ norm, the corresponding regression is called the LASSO (see [HTF]).
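A minimal NumPy sketch of the closed-form solution above (my own illustration, not from the slides); the function name ridge is arbitrary, and for brevity it omits the usual preprocessing of standardizing the inputs and leaving the intercept unpenalized.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X^T X + lam*I)^{-1} X^T y.

    X: (N, p) standardized inputs (no intercept column); y: (N,) centered outputs.
    lam: non-negative penalty; lam = 0 recovers ordinary least squares.
    """
    p = X.shape[1]
    # X^T X + lam*I is positive definite for lam > 0, hence always invertible.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```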

13 Ridge Regression…
[Figure: profiles of the ridge coefficients plotted against decreasing $\lambda$]
One way to determine $\lambda$ is cross-validation; we'll learn about it later.
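As an illustration of the coefficient-profile figure (again my own sketch, with synthetic data and arbitrary names), the profiles can be traced by solving the ridge problem on a grid of lambda values; plotting each column of path against the grid reproduces the usual shrinkage picture, and cross-validation over the same grid is one standard way to choose lambda.

```python
import numpy as np

# Synthetic, standardized inputs and a centered output (placeholders for real data).
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([2.0, 0.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=80)
y = y - y.mean()

def ridge(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam*I)^{-1} X^T y (same as the sketch above)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

lambdas = np.logspace(4, -2, 50)                 # from heavy shrinkage toward none
path = np.array([ridge(X, y, lam) for lam in lambdas])
# path[i, j] is the j-th coefficient at penalty lambdas[i]; as lambda decreases,
# the coefficients grow from near 0 toward the ordinary least squares solution.
```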