Regression Techniques


Linear Regression

A technique used to predict or model a continuous outcome from one or more continuous independent variables. The linear relationship can be written as:

y = mx + c

Here x is the independent variable that drives the dependent (outcome) variable y, m is the slope of the regression line, and c is the intercept term.

One variable linear regression: an example

Years of Experience   Salary ($K)
        3                 30
        8                 57
        9                 64
       13                 72
        —                 36
        6                 43
       11                 59
       21                 90
        1                 20
       16                 83

y = salary ($K) and x = experience (years). If an employee has 10 years of experience, what is the expected salary?

Target: minimize (actual salary − predicted salary), with y = mx + c (m = ?, c = ?)

Regression coefficient: m = Σᵢ₌₁ᴺ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ᴺ (xᵢ − x̄)²
Intercept: c = ȳ − m·x̄
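The slope and intercept formulas above can be computed directly. A minimal sketch in Python, using the nine complete (experience, salary) pairs from the table (the row with salary 36 is omitted because its experience value is missing):

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # m = sum((xi - x_bar) * (yi - y_bar)) / sum((xi - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar  # c = y_bar - m * x_bar
    return m, c

experience = [3, 8, 9, 13, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 72, 43, 59, 90, 20, 83]
m, c = fit_line(experience, salary)
# Expected salary for an employee with 10 years of experience
prediction = m * 10 + c
```

On these nine rows the fit gives m ≈ 3.58 and c ≈ 22.5, close to the slide's m = 3.5 and c = 23.6 (which presumably used all ten rows).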

One variable linear regression

The best model is the one with the least error: minimize (actual salary − predicted salary).

Residual at the i-th observation: εᵢ = yᵢ − m·xᵢ − c
With m = 3.5 and c = 23.6: ε₁ = y₁ − 3.5x₁ − 23.6 = 30 − 3.5·3 − 23.6 = −4.1

Sum of squared errors: SSE = Σᵢ₌₁ᴺ εᵢ²
Root-mean-squared error: RMSE = √(SSE / N)

Goodness of fit

The variability explained by the regression model is measured by R² = 1 − SSE/SST:

R² = 1 − Σᵢ₌₁ᴺ (yᵢ − f(xᵢ))² / Σᵢ₌₁ᴺ (yᵢ − ȳ)² = (Σᵢ₌₁ᴺ (xᵢ − x̄)(yᵢ − ȳ))² / (Σᵢ₌₁ᴺ (xᵢ − x̄)² · Σᵢ₌₁ᴺ (yᵢ − ȳ)²)

R² varies from 0 to 1:
R² = 1 means perfect prediction, with SSE = 0
R² = 0 means the model predicts no better than the average line (SSE = SST)

The correlation coefficient r measures the strength of the linear relationship between y and x:

r = Σᵢ₌₁ᴺ (xᵢ − x̄)(yᵢ − ȳ) / √(Σᵢ₌₁ᴺ (xᵢ − x̄)² · Σᵢ₌₁ᴺ (yᵢ − ȳ)²)

For simple linear regression, r² = R² (r carries the sign of the slope).
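The identity r² = R² for a one-variable model can be verified numerically. A sketch using the same nine complete rows of the example data, fitting the line first and then computing both quantities:

```python
import math

xs = [3, 8, 9, 13, 6, 11, 21, 1, 16]
ys = [30, 57, 64, 72, 43, 59, 90, 20, 83]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares fit
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
m = sxy / sxx
c = y_bar - m * x_bar

# R^2 = 1 - SSE/SST
sse = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - sse / syy

# Correlation coefficient r
r = sxy / math.sqrt(sxx * syy)

# For a one-variable model, r*r equals R^2 (up to floating-point rounding)
```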

Linear regression: interpreting the output

Estimate – the coefficient estimated by the regression
Std. Error – the expected sampling variability of the estimated coefficient
t value – ratio of the estimate to its standard error (a larger |t| is better)
Pr(>|t|) – probability of seeing such an estimate if the true coefficient were 0 (a smaller value is better)
Multiple R²
Adjusted R² – corrects R² for the number of explanatory variables:
R²adj = 1 − ((N − 1)/(N − d))(1 − R²), where d is the total number of estimated coefficients
P-value – significance of the model as a whole (the lower the better)
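The adjusted-R² correction can be applied directly. A sketch using the slide's formula; the numbers (R² = 0.83, N = 25 observations, d = 5 coefficients) are illustrative, not taken from the output shown:

```python
def adjusted_r_squared(r2, n, d):
    """Adjusted R^2 = 1 - ((N - 1) / (N - d)) * (1 - R^2)."""
    return 1 - ((n - 1) / (n - d)) * (1 - r2)

adj = adjusted_r_squared(0.83, 25, 5)  # 1 - (24/20) * 0.17 = 0.796

# Adding parameters without improving R^2 lowers the adjusted value
adj_bigger_model = adjusted_r_squared(0.83, 25, 10)
```

Note that the adjusted value (0.796) is below the raw R² (0.83): the correction penalizes every extra explanatory variable.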

Exploring the Predictor variables

Relation between dependent and independent variables: correlation and R²

Multiple Linear Regression

When a set of predictor variables controls the outcome y, the linear regression function extends to:

y = m₁x₁ + m₂x₂ + … + c
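The multi-variable fit can be sketched by solving the normal equations (XᵀX)β = Xᵀy directly. A minimal pure-Python version; the data are synthetic, generated exactly from y = 2x₁ + 3x₂ + 5, so the fit should recover those coefficients (the coefficients and points are illustrative, not from the slides):

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small system a*x = b."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(rows, ys):
    """Least squares for y = m1*x1 + m2*x2 + ... + c via the normal equations."""
    X = [list(r) + [1.0] for r in rows]  # append intercept column
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * ys[i] for i in range(len(X))) for a in range(p)]
    return solve(xtx, xty)

rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
ys = [2 * x1 + 3 * x2 + 5 for x1, x2 in rows]  # exact linear data
m1, m2, c = fit_multiple(rows, ys)
```

In practice a library routine (e.g. a QR-based least-squares solver) is preferable to forming XᵀX, but the normal-equations form matches the textbook derivation.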

Combining multiple variables for regression

Multicollinearity can affect R². Adding Year alongside Age, or WinterRain alongside Age, has a similar impact, but removing any of these variables can degrade model performance.

Considering all variables for regression

Combining all variables reduced the adjusted R², and not all variables are significant. The best model is the one with the highest R² and the lowest SSE. However, R² = 1 is not always an indicator of a good predictor of the dependent variable; further testing on an independent dataset can ensure the reliability of the regression model.

Independent testing of the model

The best model has 4 variables (AGST, Age, HarvestRain and WinterRain) and R² = 0.83. Testing this best-performing regression model on a new dataset:

SSE = 0.069, SST = 0.336
Out-of-sample R² = 1 − SSE/SST ≈ 0.79
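The out-of-sample score uses the same 1 − SSE/SST formula, now with SSE from the new dataset's predictions and SST from the variation around the new dataset's mean:

```python
sse = 0.069  # squared prediction error on the held-out dataset
sst = 0.336  # total variation of the held-out outcomes around their mean
r2_out = 1 - sse / sst  # out-of-sample R^2, about 0.79
```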

Regression with categorical variables

Linear regression is inapplicable when the response variable is categorical, such as yes/no, fail/pass, alive/dead, good/bad.

Logistic regression models the probability of occurrence of a categorical event as a function of a set of predictor variables: the log-odds of the response are a linear function of the predictors, so the probability Y follows a logistic ('S'-shaped) curve.

logₑ(Y / (1 − Y)) = β₀ + Σᵢ₌₁ᴺ βᵢxᵢ, which converts to Y = 1 / (1 + e^−(β₀ + Σᵢ₌₁ᴺ βᵢxᵢ))

N = number of predictor variables; Y varies between 0 and 1.
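The two forms above are inverses of each other: the logit maps a probability to the linear predictor, and the sigmoid maps it back. A sketch (the coefficient and predictor values are hypothetical, purely for illustration):

```python
import math

def sigmoid(z):
    """Y = 1 / (1 + e^(-z)): maps the linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(y):
    """log(Y / (1 - Y)): maps a probability back to the linear predictor."""
    return math.log(y / (1.0 - y))

# Linear predictor beta0 + sum(beta_i * x_i) with hypothetical coefficients
beta0 = -1.0
betas = [0.04, 0.3]   # hypothetical weights, e.g. for age and duration
xs = [35.0, 2.0]      # hypothetical predictor values
z = beta0 + sum(b * x for b, x in zip(betas, xs))
p = sigmoid(z)        # predicted probability of the event
```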

An example

Credit classification dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/

Attributes of the Dataset

Trends in the Good and Bad credit approval groups

Finding facts

Is there any trend or pattern within the dataset that can be linked to Good/Bad credit?

The logistic regression model

Considering Age, Loan amount and Duration of current credit account as predictors of the probability of a Good or Bad loan. AIC (Akaike Information Criterion) trades goodness of fit against model complexity: AIC = 2k − 2·logₑ(L), where k is the number of estimated parameters and L is the maximized likelihood; a lower AIC is better.
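AIC-based comparison can be sketched in a few lines. The log-likelihood values below are illustrative, not from a fit of the credit dataset:

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: AIC = 2k - 2*log(L); lower is better."""
    return 2 * k - 2 * log_likelihood

# Hypothetical fits: model B adds a predictor but barely improves the likelihood
aic_a = aic(-310.0, 4)  # e.g. Age, Amount, Duration + intercept
aic_b = aic(-309.8, 5)  # one extra parameter

# Model A is preferred here: the extra parameter costs more than it gains
```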