By Matt Bogard, M.S. May 12, 2011

 Single Variable Regression
 Multivariable Regression
 Logistic Regression
 Data Mining vs. Classical Statistics
 Decision Trees
 Neural Networks

 Can we describe this relationship with an equation for a line?
 Fitting a line to the data gives us the equation (the regression equation)
 How well does this line fit our data? How well does it describe the relationship between the variables (x and y)?
 Interpretation and inference

 The goal then is to minimize the sum of squared residuals. That is, minimize ∑eᵢ² = ∑(yᵢ − b₀ − b₁Xᵢ)² with respect to b₀ and b₁.
 This can be accomplished by taking the partial derivative of ∑eᵢ² with respect to each coefficient and setting it equal to zero:
 ∂∑eᵢ²/∂b₀ = 2∑(yᵢ − b₀ − b₁Xᵢ)(−1) = 0
 ∂∑eᵢ²/∂b₁ = 2∑(yᵢ − b₀ − b₁Xᵢ)(−Xᵢ) = 0
 Solving for b₀ and b₁ yields:
 b₀ = ȳ − b₁X̄
 b₁ = ∑(Xᵢ − X̄)(yᵢ − ȳ) / ∑(Xᵢ − X̄)² = SS(X,y) / SS(X)
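As a quick check of these formulas, here is a minimal R sketch (the toy data are invented for illustration, not from the slides) that computes b₀ and b₁ directly and compares them with lm():

# Toy data, invented for illustration
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9)

# Closed-form least-squares estimates from the slide
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # SS(x,y)/SS(x)
b0 <- mean(y) - b1 * mean(x)

c(b0 = b0, b1 = b1)
coef(lm(y ~ x))  # should match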

Larger R² → better fit; larger F → significance of β (the model)
 R² = SSR/SST
 MSR = SSR/df
 MSE = SSE/df
 F = MSR/MSE
 E(MSE) = σ²
 E(MSR) = σ² + β²∑(x − x̄)²
 If β = 0, then F estimates σ²/σ² = 1
 If β ≠ 0, then F = [σ² + β²∑(x − x̄)²]/σ² > 1
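To make these quantities concrete, a short R sketch (again with invented data) computes R² and F by hand and checks them against summary(lm):

# Toy data, invented for illustration
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9)

fit <- lm(y ~ x)
sst <- sum((y - mean(y))^2)   # total sum of squares
sse <- sum(resid(fit)^2)      # error (residual) sum of squares
ssr <- sst - sse              # regression sum of squares

r2  <- ssr / sst              # R-squared
msr <- ssr / 1                # df = 1 (one predictor)
mse <- sse / (length(y) - 2)  # df = n - 2
f   <- msr / mse

c(R2 = r2, F = f)
summary(fit)$r.squared        # should match r2
summary(fit)$fstatistic[1]    # should match f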

 VAR(bⱼ) and SE(bⱼ) (will discuss later)
 Test H₀: βⱼ = β₀
 t = (bⱼ − β₀) / SE(bⱼ)
 Note: if H₀: βⱼ = 0, then t = bⱼ / SE(bⱼ), which gives the same results as the F-test in a single variable regression (t² = F)

LIBNAME ADHOC 'file path'; /* GET DATA */

DATA LSD;
INPUT SCORE CONC;
CARDS;
;
RUN;

PROC REG;
MODEL SCORE=CONC;
PLOT SCORE*CONC; /* PLOTS REGRESSION LINE FIT TO DATA */
RUN;
QUIT;

PROC GLM DATA = LSD;
MODEL SCORE=CONC;
RUN;
QUIT;

 y = b₀ + b₁X₁ + b₂X₂ + e
 Often viewed in the context of matrices and represented by y = Xb + e
 b = (XᵀX)⁻¹Xᵀy, the matrix analogue of SS(X,y)/SS(X) from the single variable case
 In R: b <- solve( t(x) %*% x ) %*% ( t(x) %*% y )
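Expanding that one-liner into a runnable R sketch (the simulated data and variable names are illustrative), the normal-equations solution should agree with lm():

set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                    # design matrix with an intercept column
b <- solve(t(X) %*% X) %*% (t(X) %*% y)  # b = (X'X)^(-1) X'y

drop(b)
coef(lm(y ~ x1 + x2))                    # should match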

 (1) E(y|X) = Xβ: 'we are estimating a linear approximation to the conditional expectation of y'
 (2) E(e) = 0: 'white noise error terms'
 (3) VAR(e) = σ²I: 'constant variance' (no heteroskedasticity and no serial correlation)
 (4) Rank(X) = k: 'no perfect multicollinearity'

 Why are we concerned with the error terms?
 Recall b = (X′X)⁻¹X′y; hence our estimate b does not depend on e, and E(b) = β
 VAR(b) = s²(X′X)⁻¹, where s² = MSE = e′e/(n − k), analogous to ∑eᵢ²/(n − 1) for a sample variance
 Note SE(b) = √VAR(b) and t = (bⱼ − β₀) / SE(bⱼ)
 Note F = MSR/MSE and E(MSE) = σ²
 If we have σᵢ² rather than a constant σ², we run into issues related to hypothesis testing and making inferences
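The variance formula is easy to verify numerically. A minimal R sketch (simulated data, illustrative names) computes SE(b) from s²(X′X)⁻¹ and compares it with the standard errors reported by lm():

set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X  <- cbind(1, x1, x2)
k  <- ncol(X)
b  <- solve(t(X) %*% X) %*% (t(X) %*% y)
e  <- y - X %*% b                         # residuals
s2 <- drop(t(e) %*% e) / (n - k)          # s^2 = e'e/(n - k)
se <- sqrt(diag(s2 * solve(t(X) %*% X)))  # SE(b) from VAR(b) = s^2 (X'X)^(-1)

se
summary(lm(y ~ x1 + x2))$coefficients[, "Std. Error"]  # should match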

 Maybe rank(X) = k, but there is still some correlation between the X variables in the regression.

 Blue → b for x1
 Green → b for x2
 Red → correlation between x1 and x2
 As corr(x1, x2) increases, blue and green decrease and red increases (the circles overlap)
 Decreased information used to estimate the b's leads to increased variance in the estimates

 R²: the b's jointly can still explain variation in y
 Research: inferences about the specific relationship between X₁ and y rely on SE(b), which is inflated by multicollinearity
 Forecasting/prediction: we are more comfortable with multicollinearity (Greene, 1990; Kennedy, 2003; Studenmund, 2001)
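A small R sketch illustrates that SE inflation (the data-generating process and the se_b1 helper are invented for this example):

set.seed(42)
n  <- 200
x1 <- rnorm(n)

se_b1 <- function(rho) {
  x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)  # x2 correlated with x1 by construction
  y  <- 1 + 2 * x1 + 2 * x2 + rnorm(n)
  summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]
}

sapply(c(0, 0.5, 0.9, 0.99), se_b1)  # SE(b1) grows as corr(x1, x2) rises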

DATA REG;
INPUT INTEREST INFLATION INVESTMENT;
CARDS;
;
RUN;

PROC REG DATA = REG;
MODEL INVESTMENT = INTEREST INFLATION / VIF;
RUN;
QUIT;

 Example: retention = Y/N
 If y ∈ {0, 1}, then E[y|X] = Pᵢ, a probability interpretation
 Estimated probabilities can fall outside (0, 1)
 e ~ binomial
 var(e) = n·p·(1 − p), which violates the assumption of uniform variance

 Note, however, that despite these theoretical concerns, OLS is used quite often without practical implications
 Example: Statistical Alternatives for Studying College Student Retention: A Comparative Analysis of Logit, Probit, and Linear Regression. Dey & Astin. Research in Higher Education, Vol. 34, No. 5, 1993.

Dᵢ = probability(yᵢ = 1 | x) = e^(Xβ) / (1 + e^(Xβ))

 Choose the β's to maximize the likelihood of the sample being observed
 That is, maximize the likelihood that the data come from a 'real world' characterized by one set of β's vs. another

 L(β) = ∏(y=1) e^(Xβ)/(1 + e^(Xβ)) × ∏(y=0) 1/(1 + e^(Xβ))
 This is the product of the densities giving p(y = 1) and p(y = 0)
 Take the ln of both sides and choose β to maximize the log-likelihood → β_MLE
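As a sketch of what maximum likelihood estimation is doing here, the following R code (simulated data; the negll helper is mine) maximizes the logistic log-likelihood numerically and compares the result to glm():

set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))  # true model: P(y=1|x) = e^(Xb)/(1+e^(Xb))

X <- cbind(1, x)

negll <- function(beta) {                  # negative log-likelihood
  eta <- X %*% beta
  -sum(y * eta - log(1 + exp(eta)))        # ln L = sum[ y*Xb - ln(1 + e^(Xb)) ]
}

opt <- optim(c(0, 0), negll, method = "BFGS")
opt$par
coef(glm(y ~ x, family = binomial))        # should match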

 We are NOT minimizing sums of squares or fitting a line to the data
 There is NO R²
 Changes in the log-likelihood are compared for full vs. restricted models to provide measures of 'deviance'
 Deviance is used for fit statistics such as AIC, the chi-square test, and pseudo-R²

 Based on ratios of deviance for the full vs. restricted model; not directly comparable to R² from OLS (from Applied Choice Analysis, Hensher, Rose & Greene)

 % correct predictions
 % correct 1's
 % correct 0's
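These rates come from a confusion matrix of predicted vs. actual classes. A minimal R sketch (simulated data; the 0.5 cutoff is chosen for illustration):

set.seed(42)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(-0.5 + 1.5 * x))

fit  <- glm(y ~ x, family = binomial)
pred <- ifelse(fitted(fit) > 0.5, 1, 0)     # classify at a 0.5 cutoff

tab <- table(actual = y, predicted = pred)  # confusion matrix
tab
mean(pred == y)                             # % correct predictions
tab["1", "1"] / sum(tab["1", ])             # % correct 1's
tab["0", "0"] / sum(tab["0", ])             # % correct 0's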

 β = the change in the log odds of y given a one-unit change in X
 e^β = the odds ratio
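In R, for example, exponentiating the fitted coefficients of a logistic model gives the odds ratios (data simulated as above):

set.seed(42)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(-0.5 + 1.5 * x))

fit <- glm(y ~ x, family = binomial)
coef(fit)       # betas: change in the log odds per unit change in x
exp(coef(fit))  # odds ratios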

PROC LOGISTIC

ODS GRAPHICS ON;
ODS HTML;

PROC LOGISTIC DATA = ADHOC.LOGIT PLOTS = ROC OUTMODEL = MODEL1;
MODEL CLASS = X1 X2 / RSQ LACKFIT;
SCORE OUT = SCORE1 FITSTAT;
RUN;
QUIT;

 "There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown."
 "Approaching problems by looking for a data model imposes an a priori straight jacket that restricts the ability of statisticians to deal with a wide range of statistical problems." From Statistical Modeling: The Two Cultures. Statistical Science 2001, Vol. 16, No. 3, 199–231. Leo Breiman.

 Classical statistics: focus is on hypothesis testing of causes and effects and on interpretability of models. Model choice is based on parameter significance and in-sample goodness of fit.
 Examples: regression, logit/probit, duration models, discriminant analysis
 Machine learning: focus is on predictive accuracy, even in the face of a lack of interpretability. Model choice is based on cross-validation of predictive accuracy using partitioned data sets.
 Examples: classification and regression trees, neural nets, k-nearest neighbors, association rules, cluster analysis

 'Prediction error over an independent test sample'
 A function of the bias and variance a model exhibits across multiple data sets
 There is a bias-variance trade-off related to model complexity

 Partition the data into training, validation, and test samples (if the data are sufficient); a sketch follows below
 Other methods: k-fold cross-validation, random forests, ensemble models
 Choose the inputs (and model specification) that optimize model performance on the validation and test data
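A minimal R sketch of the partitioning idea (using the built-in mtcars data and an arbitrary 60/20/20 split):

set.seed(42)
n   <- nrow(mtcars)
idx <- sample(c("train", "valid", "test"), n, replace = TRUE,
              prob = c(0.6, 0.2, 0.2))

train <- mtcars[idx == "train", ]
valid <- mtcars[idx == "valid", ]
test  <- mtcars[idx == "test", ]

# Fit on train, compare candidate models on valid, report error once on test
fit  <- lm(mpg ~ wt + hp, data = train)
rmse <- function(m, d) sqrt(mean((d$mpg - predict(m, d))^2))
c(valid = rmse(fit, valid), test = rmse(fit, test))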

 “Tree-based methods partition the feature space into a set of rectangles, and then fit a simple model (like a constant) in each one.” (Trevor Hastie, Robert Tibshirani & Jerome Friedman, 2009)

 Each split creates a cross tabulation
 The split is evaluated with a chi-square test, e.g.:

Pearson's Chi-squared test
data: tab1
X-squared = , df = 1, p-value = 4.546e-13
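A sketch of that evaluation in R (the data and the split point x > 0 are invented):

set.seed(42)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(2 * x))   # outcome related to x

split <- x > 0                       # candidate split point
tab1  <- table(split, y)             # cross tabulation created by the split
tab1
chisq.test(tab1, correct = FALSE)    # Pearson's chi-squared test of the split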

 A nonlinear model of complex relationships composed of multiple 'hidden' layers (similar to composite functions):
 Y = f(g(h(x))), or x → hidden layers → Y

 ACTIVATION FUNCTION: the formula used for transforming values from the inputs and the outputs in a neural network
 COMBINATION FUNCTION: the formula used for combining transformed values from activation functions in a neural network
 HIDDEN LAYER: the layer between the input and output layers in a neural network
 RADIAL BASIS FUNCTION: a combination function based on the Euclidean distance between inputs and weights

 Hidden layer:
h₁ = logit(w₁₀ + w₁₁x₁ + w₁₂x₂)
h₂ = logit(w₂₀ + w₂₁x₁ + w₂₂x₂)
h₃ = logit(w₃₀ + w₃₁x₁ + w₃₂x₂)
h₄ = logit(w₄₀ + w₄₁x₁ + w₄₂x₂)
 Output layer:
Y = W₀ + W₁h₁ + W₂h₂ + W₃h₃ + W₄h₄
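A forward pass through this architecture is a few lines of R (the weights here are random, purely for illustration; the slide's 'logit' is the logistic sigmoid):

sigmoid <- function(z) 1 / (1 + exp(-z))   # logistic activation

forward <- function(x1, x2, w, W) {
  # w: 4 x 3 matrix of hidden-layer weights (bias, x1, x2 for each hidden unit)
  # W: length-5 vector of output weights (bias plus 4 hidden units)
  h <- sigmoid(w %*% c(1, x1, x2))         # h_j = sigmoid(w_j0 + w_j1*x1 + w_j2*x2)
  drop(W %*% c(1, h))                      # Y = W0 + W1*h1 + ... + W4*h4
}

set.seed(42)
w <- matrix(rnorm(12), nrow = 4)           # random weights for illustration
W <- rnorm(5)
forward(0.5, -1, w, W)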

 There is no 'theoretically sound' criterion for architecture selection in terms of the number of hidden units and hidden layers
 The Autoneural node 'automates' some of these choices to a limited extent
 At SAS Global Forum 2011 there was a presentation utilizing genetic algorithms
 Neural networks don't address model selection; inputs are typically pre-filtered via decision tree and regression nodes
 Interpretation is a challenge: finance companies employ them for marketing purposes but don't use them in areas subject to litigation (e.g., loan approvals)

 Selection of Target Sites for Mobile DNA Integration in the Human Genome. Berry C, Hannenhalli S, Leipzig J, Bushman FD. PLoS Comput Biol 2(11): e157, 2006. doi:10.1371/journal.pcbi.0020157 (supporting information Text S1)
 Econometric Analysis. William H. Greene.
 A Guide to Econometrics. Kennedy. 5th Ed.
 Statistical Alternatives for Studying College Student Retention: A Comparative Analysis of Logit, Probit, and Linear Regression. Dey & Astin. Research in Higher Education, Vol. 34, No. 5, 1993.
 Statistical Modeling: The Two Cultures. Leo Breiman. Statistical Science 2001, Vol. 16, No. 3, 199–231.
 Applied Choice Analysis. Hensher, Rose & Greene.
 The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Trevor Hastie, Robert Tibshirani & Jerome Friedman.
 A Course in Econometrics. Arthur S. Goldberger.
 SAS Enterprise Miner
 R Statistical Package