Classification and Prediction: Regression Analysis


Classification and Prediction: Regression Analysis
Bamshad Mobasher, DePaul University

What Is Numerical Prediction? (a.k.a. Estimation, Forecasting)
- (Numerical) prediction is similar to classification:
  - construct a model
  - use the model to predict a continuous or ordered value for a given input
- Prediction differs from classification:
  - classification predicts a categorical class label
  - prediction models continuous-valued functions
- Major method for prediction: regression
  - models the relationship between one or more independent (predictor) variables and a dependent (response) variable
- Regression analysis:
  - linear and multiple regression
  - non-linear regression
  - other regression methods: generalized linear models, Poisson regression, log-linear models, regression trees

Linear Regression
- Linear regression involves a response variable y and a single predictor variable x:
  y = w0 + w1 x
- Goal: using the data, estimate the weights (parameters) w0 and w1 of the line so that the prediction error is minimized.
[Figure: scatter plot of the (x, y) data with the fitted line]

Linear Regression
[Figure: the fitted line y = w0 + w1 x, with intercept w0 and slope w1; for an observed point (xi, yi), the error ei is the vertical distance between the observed value of y for xi and the value predicted by the line.]

Linear Regression (cont.)
- Linear regression involves a response variable y and a single predictor variable x:
  y = w0 + w1 x
- The weights w0 (y-intercept) and w1 (slope) are the regression coefficients.
- Method of least squares: estimates the best-fitting straight line.
  - w0 and w1 are obtained by minimizing the sum of the squared errors (a.k.a. residuals).
  - w1 can be obtained by setting the partial derivative of the SSE to 0 and solving, ultimately resulting in:
    w1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)^2,  w0 = ȳ − w1 x̄
    where x̄ and ȳ are the means of the x and y values.
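As a concrete illustration, here is a minimal NumPy sketch of these closed-form estimates; the data values are invented for the example:

```python
# Minimal sketch: closed-form least-squares fit of y = w0 + w1*x (NumPy only).
# The data arrays below are made up for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

x_mean, y_mean = x.mean(), y.mean()
# w1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
w0 = y_mean - w1 * x_mean  # the intercept follows from the two means

print(f"y = {w0:.3f} + {w1:.3f} x")
```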

Multiple Linear Regression
- Multiple linear regression involves more than one predictor variable.
- Training data is of the form (X1, y1), (X2, y2), …, (X|D|, y|D|), where each Xi is a vector of predictor values.
- Ex.: for 2-D data, we may have y = w0 + w1 x1 + w2 x2.
- Solvable by an extension of the least squares method.
- Many nonlinear functions can be transformed into the above form.
[Figure: regression plane fitted over predictors x1, x2 and response y]

Least Squares Generalization
- Simple least squares: determine the linear coefficients w0, w1 that minimize the sum of squared errors (SSE).
- Use standard (multivariate) differential calculus:
  - differentiate the SSE with respect to w0 and w1
  - find the zeros of each partial derivative
  - solve for w0 and w1
- One dimension:
  SSE = Σi (yi − (w0 + w1 xi))^2

Least Squares Generalization (cont.)
- Multiple dimensions: to simplify notation and derivation, treat the intercept w0 as the weight of a constant feature by adding x0 = 1 to every feature vector x, so that y = w^T x.
[Table: training data with the added x0 column; every row has x0 = 1 alongside its x1, x2, and y values.]

Least Squares Generalization (cont.)
- Multiple dimensions: calculate the SSE and determine the weight vector w:
  SSE = Σi (yi − w^T Xi)^2 = (y − Xw)^T (y − Xw)
- Setting the gradient with respect to w to zero gives the normal equations, with solution
  w = (X^T X)^-1 X^T y
  where X is the matrix whose rows are the feature vectors and y is the vector of responses.
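A minimal NumPy sketch of this solution (toy data invented here; np.linalg.lstsq performs the same minimization in a numerically stable way):

```python
# Sketch of the multivariate least-squares solution w = (X^T X)^-1 X^T y.
# Invented toy data; np.linalg.lstsq minimizes ||y - Xw||^2 directly.
import numpy as np

X = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.0, 3.0],
              [1.0, 4.0, 2.0],
              [1.0, 3.0, 5.0],
              [1.0, 5.0, 4.0]])  # first column is the constant feature x0 = 1
y = np.array([3.1, 4.8, 6.2, 9.0, 9.9])

w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # w[0] is the intercept, w[1:] are the feature weights
```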

Extending the Application of Linear Regression
- The inputs X for linear regression can be:
  - original quantitative inputs
  - transformations of quantitative inputs, e.g. log, exp, square root, square, etc.
  - polynomial transformations, e.g. y = w0 + w1 x + w2 x^2 + w3 x^3
  - dummy coding of categorical inputs
  - interactions between variables, e.g. x3 = x1 · x2
- This allows linear regression techniques to fit much more complicated, non-linear datasets.

Example of fitting a polynomial curve with a linear model
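In the same spirit, here is a small NumPy sketch that fits a cubic to noisy data using nothing but linear least squares on the expanded features [1, x, x^2, x^3]; the data are invented for illustration:

```python
# Sketch: fit y = w0 + w1*x + w2*x^2 + w3*x^3 by ordinary least squares.
# The model is non-linear in x but still linear in the weights w.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=20)  # noisy toy target

X = np.vander(x, N=4, increasing=True)  # columns: 1, x, x^2, x^3
w, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ w  # predictions from the fitted cubic
```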

Regularization
- Complex models (lots of parameters) are often prone to overfitting.
- Overfitting can be reduced by penalizing the overall magnitude of the parameters, i.e., by making the size of the coefficients part of the objective being minimized.
- Two common types of regularization in linear regression:
  - L2 regularization (a.k.a. ridge regression): find the w that minimizes
    Σi (yi − w^T Xi)^2 + λ Σj wj^2
    λ is the regularization parameter: bigger λ imposes more constraint.
  - L1 regularization (a.k.a. the lasso): find the w that minimizes
    Σi (yi − w^T Xi)^2 + λ Σj |wj|
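A minimal ridge-regression sketch using the closed form w = (X^T X + λI)^-1 X^T y; the data are invented, and for brevity the intercept weight is penalized along with the rest, which practical implementations usually avoid:

```python
# Ridge sketch: larger lam shrinks the weights toward zero (more constraint).
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -2.0]) + 0.1 * rng.normal(size=50)

for lam in (0.0, 1.0, 10.0):
    print(lam, np.round(ridge_fit(X, y, lam), 3))

# L1 (lasso) has no closed form; it is typically solved iteratively,
# e.g. by coordinate descent as in sklearn.linear_model.Lasso.
```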

Example: Web Traffic Data

1D Poly Fit: example of too much “bias” → underfitting

Example: 1D and 2D Poly Fit

Example: 1D Poly Fit: too much “variance” → overfitting

Bias-Variance Tradeoff

Bias-Variance Tradeoff (cont.)
- Possible ways of dealing with high bias:
  - get additional features
  - use a more complex model (e.g., adding polynomial terms such as x1^2, x2^2, x1·x2, etc.)
  - use a smaller regularization coefficient λ
  - note: getting more training data won't necessarily help in this case
- Possible ways of dealing with high variance (see the sketch after this list):
  - use more training instances
  - reduce the number of features
  - use simpler models
  - use a larger regularization coefficient λ
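A minimal NumPy sketch of how the two failure modes show up in practice (all data invented): high bias leaves both training and validation error high, while high variance drives training error down and validation error up:

```python
# Compare training vs. validation error across polynomial degrees.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=60)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

for degree in (1, 3, 9):
    X_tr = np.vander(x_tr, degree + 1, increasing=True)
    X_va = np.vander(x_va, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    mse = lambda X, t: np.mean((X @ w - t) ** 2)
    # High bias: both errors high. High variance: train low, validation high.
    print(degree, round(mse(X_tr, y_tr), 3), round(mse(X_va, y_va), 3))
```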

Other Regression-Based Models
- Generalized linear models:
  - the foundation on which linear regression can be applied to modeling categorical response variables
  - the variance of y is a function of the mean value of y, not a constant
  - logistic regression: models the log-odds of some event occurring as a linear function of a set of predictor variables
  - Poisson regression: models count data that exhibit a Poisson distribution
- Log-linear models (for categorical data):
  - approximate discrete multidimensional probability distributions
  - also useful for data compression and smoothing
- Regression trees and model trees:
  - trees that predict continuous values rather than class labels
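For a concrete sense of the GLM family, here is a hedged logistic-regression sketch using scikit-learn (assumed available); the data are invented for illustration:

```python
# Logistic regression: the log-odds of the event is linear in the inputs,
# so predicted probabilities pass through the logistic (sigmoid) function.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[1.0, 0.0]]))  # P(no event), P(event) for one input
```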

Regression Trees and Model Trees
- Regression tree: proposed in the CART system (Breiman et al., 1984)
  - CART: Classification And Regression Trees
  - each leaf stores a continuous-valued prediction: the average value of the predicted attribute over the training instances that reach the leaf
- Model tree: proposed by Quinlan (1992)
  - each leaf holds a regression model, a multivariate linear equation for the predicted attribute
  - a more general case than the regression tree
- Regression and model trees tend to be more accurate than linear regression when the instances are not represented well by simple linear models.
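A small sketch of a CART-style regression tree using scikit-learn (assumed installed); each leaf predicts the mean target of the training instances that reach it, and the step-shaped toy data are invented:

```python
# Regression tree on a 1-D step function: the tree recovers the two levels.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(100, 1))
y = np.where(X[:, 0] < 0.5, 1.0, 3.0) + 0.1 * rng.normal(size=100)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(tree.predict([[0.2], [0.8]]))  # approximately the two step levels
```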

Evaluating Numeric Prediction
- Prediction accuracy: the difference between the predicted scores and the actual results (on an evaluation set).
- Typically the accuracy of the model is measured by the average of the squared differences (the mean squared error).
- Common metrics (pi = predicted target value for test instance i, ai = actual target value for instance i, n = number of test instances):
  - Mean Absolute Error, the average absolute loss over the test set:
    MAE = (1/n) Σi |pi − ai|
  - Root Mean Squared Error, the square root of the mean of the squared differences between predicted and actual values:
    RMSE = sqrt((1/n) Σi (pi − ai)^2)
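Both metrics are a couple of lines in NumPy; the prediction and actual values below are invented for the example:

```python
# MAE and RMSE for a small set of (predicted, actual) pairs.
import numpy as np

p = np.array([2.5, 0.0, 2.1, 7.8])   # predicted values (invented)
a = np.array([3.0, -0.5, 2.0, 8.0])  # actual values (invented)

mae = np.mean(np.abs(p - a))          # Mean Absolute Error
rmse = np.sqrt(np.mean((p - a) ** 2)) # Root Mean Squared Error
print(mae, rmse)
```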