Chapter 4: Introduction to Predictive Modeling: Regressions

Name: Chapter 4: Introduction to Predictive Modeling: Regressions
Uploaded: 2017-09-06T07:09:57+00:00
Duration: PTM38S40
Channel: Elinor Douglas
Description: Chapter 4: Introduction to Predictive Modeling: Regressions

Chapter 4: Introduction to Predictive Modeling: Regressions
4.2 Selecting Regression Inputs
4.3 Optimizing Regression Complexity
4.4 Interpreting Regression Models
4.5 Transforming Inputs
4.6 Categorical Inputs
4.7 Polynomial Regressions (Self-Study)

Model Essentials – Regressions
Prediction formula: Predict new cases.
Sequential selection: Select useful inputs.
Best model from sequence: Optimize complexity.

Linear Regression Prediction Formula
$$\hat{y} = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$$
Here $\hat{y}$ is the prediction estimate, $\hat{w}_0$ is the intercept estimate, $\hat{w}_1$ and $\hat{w}_2$ are the parameter estimates, and $x_1$ and $x_2$ are the input measurements.
Choose intercept and parameter estimates to minimize the squared error function over the training data:
$$\sum_{\text{training data}} (y_i - \hat{y}_i)^2$$
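As a rough illustration of the least-squares idea on this slide, the sketch below fits an intercept and two parameter estimates by solving the normal equations. The function names (`solve`, `fit_linear`, `predict`) and the training data are made up for this example, and the normal-equations approach is an assumption; the slides do not say how the minimization is carried out.

```python
def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_linear(rows, targets):
    """Return [w0, w1, w2, ...] minimizing the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, x):
    """Prediction formula: w0 + w1*x1 + w2*x2 + ..."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Hypothetical training data generated from y = 1 + 2*x1 - 3*x2 with no noise.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
targets = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
w_hat = fit_linear(rows, targets)
```

Because the made-up targets contain no noise, the estimates in `w_hat` should recover the generating coefficients up to rounding; with real data they would only approximate the underlying relationship.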

Logistic Regression Prediction Formula
$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2 \quad \text{(logit scores)}$$

Logit Link Function
$$\operatorname{logit}(\hat{p}) = \log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$$
The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞).
To obtain prediction estimates, the logit equation is solved for $\hat{p}$:
$$\hat{p} = \frac{1}{1 + e^{-\operatorname{logit}(\hat{p})}}$$
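The two directions of the logit link can be sketched in a few lines of Python. The function names below are chosen for this example only, and any intercept and parameter estimates passed in are placeholders, not values from the course data.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to a logit score in (-inf, +inf)."""
    return math.log(p / (1.0 - p))


def inverse_logit(score):
    """Solve the logit equation for p: map a logit score back to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_probability(w0, w1, w2, x1, x2):
    """Logistic regression prediction: inverse logit of the linear score."""
    return inverse_logit(w0 + w1 * x1 + w2 * x2)
```

A probability of 0.5 corresponds to a logit score of 0, and applying `inverse_logit` to `logit(p)` returns `p` up to rounding.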

Simple Prediction Illustration – Regressions
[Figure: plot over x1 and x2 (both axes 0.0 to 1.0), with levels labeled 0.40, 0.50, 0.60, 0.70]
Predict dot color for each x1 and x2. You need intercept and parameter estimates.
Find parameter estimates by maximizing the log-likelihood function.
Using the maximum likelihood estimates, the prediction formula assigns a logit score to each x1 and x2.
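One way to picture "maximizing the log-likelihood" is plain gradient ascent, sketched below. This is an assumption made for illustration: the slides do not say which optimizer is used, and statistical software typically uses faster methods. The data points, step size, and iteration count are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w, rows, labels):
    """Sum of log P(observed label) under the logistic model with weights w."""
    total = 0.0
    for (x1, x2), y in zip(rows, labels):
        p = sigmoid(w[0] + w[1] * x1 + w[2] * x2)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(rows, labels, step=0.5, iterations=2000):
    """Gradient ascent on the average log-likelihood, starting from all zeros."""
    w = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(rows, labels):
            err = y - sigmoid(w[0] + w[1] * x1 + w[2] * x2)
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        w = [wi + step * g / n for wi, g in zip(w, grad)]
    return w


# Hypothetical dots in the unit square; label 1 stands for one color, 0 for the other.
rows = [(0.1, 0.2), (0.2, 0.7), (0.3, 0.4), (0.4, 0.9),
        (0.6, 0.1), (0.7, 0.6), (0.8, 0.3), (0.9, 0.8)]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
w_hat = fit_logistic(rows, labels)
logit_score = w_hat[0] + w_hat[1] * 0.5 + w_hat[2] * 0.5  # score for the point (0.5, 0.5)
```

The two classes in this toy data overlap, so the maximum likelihood estimates are finite; with perfectly separable data the estimates would grow without bound.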

4.01 Multiple Choice Poll
What is the logistic regression prediction for the indicated point?
0.243
0.56
yellow
It depends.
[Figure: plot over x1 and x2 (both axes 0.0 to 1.0), with levels labeled 0.40 to 0.70 and one indicated point]

4.01 Multiple Choice Poll – Correct Answer
What is the logistic regression prediction for the indicated point?
0.243
0.56
yellow
It depends.
[Figure: plot over x1 and x2 (both axes 0.0 to 1.0), with levels labeled 0.40 to 0.70 and one indicated point]

Regressions: Beyond the Prediction Formula
Manage missing values.
Interpret the model.
Handle extreme or unusual values.
Use nonnumeric inputs.
Account for nonlinearities.

Missing Values and Regression Modeling
[Figure: training data table with inputs and target columns]
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.
Consequence: Missing values can significantly reduce your amount of training data for regression modeling!

Missing Values and the Prediction Formula
Predict: (x1, x2) = (0.3, ?)
Problem 2: Prediction formulas cannot score cases with missing values.

Missing Value Issues
Manage missing values.
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.
Problem 2: Prediction formulas cannot score cases with missing values.

Missing Value Causes
Non-applicable measurement
No match on merge
Non-disclosed measurement

Missing Value Remedies
Synthetic distribution: Non-applicable measurement, No match on merge
Estimation, xi = f(x1, … ,xp): Non-disclosed measurement

Managing Missing Values
This demonstration illustrates how to impute synthetic data values and create missing value indicators.
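Outside the tool used in the demonstration, the same idea can be sketched in a few lines of Python: replace each missing value with a synthetic one and record where the replacement happened. The helper name `impute_with_indicator`, the use of `None` for missing values, and the choice of the mean as the replacement are assumptions for this example.

```python
from statistics import mean


def impute_with_indicator(values):
    """Replace missing values (None) with the mean of the observed values.

    Returns the imputed column and a 0/1 missing value indicator column.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


# Hypothetical input column with two missing measurements.
x2 = [0.2, None, 0.6, 0.4, None]
x2_imputed, x2_missing = impute_with_indicator(x2)
```

After imputation every case has a value for the input, so none are dropped from training and every case can be scored, while the indicator column keeps the information that a value was originally missing.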

Running the Regression Node
This demonstration illustrates using the Regression tool.

Prediction formula: Predict new cases.
Sequential selection: Select useful inputs.
Best model from sequence: Optimize complexity.

Sequential Selection – Forward
[Figure: input p-values compared with the entry cutoff]

Sequential Selection – Backward
[Figure: input p-values compared with the stay cutoff]

Sequential Selection – Stepwise
[Figure: input p-values compared with the entry cutoff and the stay cutoff]
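The control flow of the three sequential selection methods can be sketched as follows. This is only an outline under stated assumptions: `p_value(name, model)` is a hypothetical callback that returns the p-value of input `name` given the inputs currently in `model`, the 0.05 cutoffs are placeholders, and the toy callback below returns fixed numbers, whereas a real tool refits the regression and recomputes p-values at every step.

```python
def forward_selection(candidates, p_value, entry_cutoff=0.05):
    """Add the most significant remaining input until none beats the entry cutoff."""
    model, remaining = [], list(candidates)
    while remaining:
        best = min(remaining, key=lambda name: p_value(name, model))
        if p_value(best, model) >= entry_cutoff:
            break
        model.append(best)
        remaining.remove(best)
    return model


def backward_selection(candidates, p_value, stay_cutoff=0.05):
    """Start with all inputs and drop the least significant until all meet the stay cutoff."""
    model = list(candidates)
    while model:
        worst = max(model, key=lambda name: p_value(name, model))
        if p_value(worst, model) <= stay_cutoff:
            break
        model.remove(worst)
    return model


def stepwise_selection(candidates, p_value, entry_cutoff=0.05, stay_cutoff=0.05):
    """Forward steps, each followed by removal of inputs that fail the stay cutoff."""
    model = []
    for _ in range(2 * len(candidates)):  # bound the loop in case inputs cycle in and out
        remaining = [name for name in candidates if name not in model]
        if not remaining:
            break
        best = min(remaining, key=lambda name: p_value(name, model))
        if p_value(best, model) >= entry_cutoff:
            break
        model.append(best)
        model = [name for name in model if p_value(name, model) <= stay_cutoff]
    return model


# Toy p-values that do not depend on the current model (a simplification).
toy = {"x1": 0.001, "x2": 0.03, "x3": 0.40}
toy_p_value = lambda name, model: toy[name]

forward_model = forward_selection(toy, toy_p_value)
backward_model = backward_selection(toy, toy_p_value)
stepwise_model = stepwise_selection(toy, toy_p_value)
```

With these fixed toy p-values all three methods keep x1 and x2 and leave out x3. With real data, where p-values change as inputs enter and leave, the methods may or may not agree.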

4.02 Poll
The three sequential selection methods for building regression models can never lead to the same model for the same set of data.
True
False

4.02 Poll – Correct Answer
The three sequential selection methods for building regression models can never lead to the same model for the same set of data.
True
False (correct: the three methods can lead to the same model)

Selecting Inputs
This demonstration illustrates using stepwise selection to choose inputs for the model.

Prediction formula: Predict new cases.
Sequential selection: Select useful inputs.
Best model from sequence: Optimize complexity.

Model Fit versus Complexity
[Figure: model fit statistic on training and validation data at sequence steps 1 to 6]
Evaluate each sequence step.

Select Model with Optimal Validation Fit
[Figure: model fit statistic at sequence steps 1 to 6]
Evaluate each sequence step.
Choose simplest optimal model.

Optimizing Complexity
This demonstration illustrates tuning a regression model to give optimal performance on the validation data.
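The rule "evaluate each sequence step, then choose the simplest optimal model" can be written as a small helper. The function name, the validation numbers, and the assumption that later steps are more complex models are all illustrative; the fit statistic here is treated as an error measure where lower is better.

```python
def simplest_optimal_step(validation_fit, lower_is_better=True):
    """Return the 1-based step with the best validation fit.

    Steps are assumed to be ordered from simplest to most complex, so on ties
    the earliest (simplest) step is returned.
    """
    best = min(validation_fit) if lower_is_better else max(validation_fit)
    return validation_fit.index(best) + 1  # list.index finds the first occurrence


# Hypothetical validation error at sequence steps 1 through 6.
validation_error = [0.30, 0.24, 0.21, 0.19, 0.19, 0.22]
chosen_step = simplest_optimal_step(validation_error)
```

Steps 4 and 5 tie on validation error in this made-up sequence, so the helper picks step 4, the simpler of the two.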

Beyond the Prediction Formula

Manage missing values.
Interpret the model.
Handle extreme or unusual values.
Use nonnumeric inputs.
Account for nonlinearities.

Logistic Regression Prediction Formula
$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2 \quad \text{(logit scores)}$$

Odds Ratios and Doubling Amounts
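$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2 \quad \text{(logit scores)}$$
Odds ratio: Amount odds change with unit change in input.
$$\Delta x_i = 1 \;\Rightarrow\; \text{odds} \times \exp(\hat{w}_i)$$
Doubling amount: How much does an input have to change to double the odds?
$$\Delta x_i = \frac{0.69}{\hat{w}_i} \;\Rightarrow\; \text{odds} \times 2$$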
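Both quantities follow directly from a parameter estimate, as the short sketch below shows. The function names and the estimate 0.5 are placeholders; the 0.69 on the slide is the natural log of 2 rounded to two decimals, and the code uses the unrounded value.

```python
import math


def odds_ratio(w):
    """Factor by which the odds are multiplied when the input increases by 1."""
    return math.exp(w)


def doubling_amount(w):
    """Change in the input needed to double the odds: log(2) / w."""
    return math.log(2.0) / w


w1 = 0.5  # hypothetical parameter estimate for input x1
ratio = odds_ratio(w1)
delta = doubling_amount(w1)
odds_factor_after_delta = math.exp(w1 * delta)  # effect on the odds of changing x1 by delta
```

Raising x1 by `delta` multiplies the odds by `exp(w1 * delta)`, which comes out to 2 by construction. A negative estimate gives a negative doubling amount, meaning the input has to decrease for the odds to double.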

Interpreting a Regression Model
This demonstration illustrates interpreting a regression model using odds ratios.

Extreme Distributions and Regressions
[Figure: two panels, Original Input Scale and Regularized Scale, each showing the true association and the standard regression. Original scale: skewed input distribution, high leverage points. Regularized scale: more symmetric distribution.]

Regularizing Input Transformations
[Figure: Original Input Scale and Regularized Scale panels comparing the standard regression, the regularized estimate, and the true association. Original scale: skewed input distribution, high leverage points. Regularized scale: more symmetric distribution.]

4.03 Multiple Choice Poll
Which statement below is true about transformations of input variables in a regression analysis?
They are never a good idea.
They help model assumptions match the assumptions of maximum likelihood estimation.
They are performed to reduce the bias in model predictions.
They typically are done on nominal (categorical) inputs.

4.03 Multiple Choice Poll – Correct Answer
Which statement below is true about transformations of input variables in a regression analysis?
They are never a good idea.
They help model assumptions match the assumptions of maximum likelihood estimation.
They are performed to reduce the bias in model predictions.
They typically are done on nominal (categorical) inputs.

Transforming Inputs
This demonstration illustrates using the Transform Variables tool to apply standard transformations to a set of inputs.
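A log transformation is one standard way to regularize a skewed input. The sketch below measures skewness before and after the transformation on a made-up, strictly positive input; the `skewness` helper (a simple moment-based estimate) and the data are assumptions for illustration and are not part of the Transform Variables tool.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Moment-based skewness: the average cubed standardized deviation."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


# Hypothetical right-skewed input with a few extreme values (high leverage points).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log(v) for v in raw]  # requires strictly positive values

raw_skew = skewness(raw)
logged_skew = skewness(logged)
```

On this toy input the logged values are noticeably less skewed than the raw ones, which is the sense in which the regularized scale has a more symmetric distribution.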

Nonnumeric Input Coding
[Table: each level A through I of a nonnumeric input is coded with its own 0/1 indicator, DA through DI, set to 1 for that level.]

Coding Redundancy
[Table: levels A through I against indicators DA through DI, with the last indicator, DI, highlighted as redundant.]

Coding Consolidation
[Table: levels A through I against indicators DA through DI, before consolidation.]

Coding Consolidation
[Table: levels A through I against the consolidated indicators DABCD, DEF, DGH, and DI.]

Recoding Categorical Inputs
This demonstration illustrates using the Replacement tool to facilitate the process of combining input levels.
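The coding ideas from the preceding slides can be sketched in Python. The grouping of levels (ABCD, EF, GH, I) comes from the consolidation slide; the function name, the choice to drop the last indicator as the redundant one, and the example levels are assumptions for illustration and do not reproduce the Replacement tool.

```python
def indicator_coding(level, levels):
    """Code a level as 0/1 indicators, one per level except the last.

    The last level needs no indicator of its own: it is the case where all
    other indicators are 0, so a full set would be redundant.
    """
    return [1 if level == name else 0 for name in levels[:-1]]


# Original nine levels and the consolidated groups named on the slide.
levels = list("ABCDEFGHI")
consolidation = {"A": "ABCD", "B": "ABCD", "C": "ABCD", "D": "ABCD",
                 "E": "EF", "F": "EF", "G": "GH", "H": "GH", "I": "I"}
consolidated_levels = ["ABCD", "EF", "GH", "I"]

full_code_c = indicator_coding("C", levels)  # eight indicators, DA through DH
full_code_i = indicator_coding("I", levels)  # reference level: all zeros
small_code_f = indicator_coding(consolidation["F"], consolidated_levels)  # three indicators
```

Consolidation cuts the number of indicator inputs from eight to three in this sketch, which means fewer parameters for the regression to estimate.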

Standard Logistic Regression
[Figure: predictions over x1 and x2 (both axes 0.0 to 1.0), with levels labeled 0.40, 0.50, 0.60, 0.70]
$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$$

Polynomial Logistic Regression
[Figure: predictions over x1 and x2 (both axes 0.0 to 1.0), with levels labeled from 0.30 to 0.80]
$$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2 + \hat{w}_3 x_1^2 + \hat{w}_4 x_2^2 + \hat{w}_5 x_1 x_2$$
The last three terms are the quadratic terms.

Adding Polynomial Regression Terms Selectively
This demonstration illustrates how to add polynomial regression terms selectively.
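In code, adding polynomial terms amounts to expanding the inputs before applying the same logistic prediction formula. The sketch below uses the two inputs and the quadratic terms from the polynomial slide; the function names and the weights are placeholders, not estimates from the course data.

```python
import math


def quadratic_terms(x1, x2):
    """Expand two inputs into linear, squared, and interaction terms."""
    return [x1, x2, x1 ** 2, x2 ** 2, x1 * x2]


def polynomial_logit(w, x1, x2):
    """Logit score w0 + w1*x1 + w2*x2 + w3*x1^2 + w4*x2^2 + w5*x1*x2."""
    return w[0] + sum(wi * t for wi, t in zip(w[1:], quadratic_terms(x1, x2)))


def polynomial_probability(w, x1, x2):
    """Prediction estimate: inverse logit of the polynomial score."""
    return 1.0 / (1.0 + math.exp(-polynomial_logit(w, x1, x2)))


# Hypothetical weights [w0, w1, w2, w3, w4, w5].
w = [-0.2, 1.0, -1.5, 0.8, 0.4, -2.0]
p_hat = polynomial_probability(w, 0.3, 0.7)
```

Setting the last three weights to zero recovers the standard logistic regression, so the polynomial model contains the linear one as a special case.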

Adding Polynomial Regression Terms Autonomously (Self-Study)
This demonstration illustrates how to add polynomial regression terms autonomously.

Exercises
This exercise reinforces the concepts discussed previously.

Regression Tools Review
Replace missing values for interval (means) and categorical data (mode). Create a unique replacement indicator.
Create linear and logistic regression models. Select inputs with a sequential selection method and appropriate fit statistic. Interpret models with odds ratios.
Regularize distributions of inputs. Typical transformations control for input skewness via a log transformation.
Consolidate levels of a nonnumeric input using the Replacement Editor window.
Add polynomial terms to a regression either by hand or by an autonomous exhaustive search.
