Chapter 4: Introduction to Predictive Modeling: Regressions
4.2 Selecting Regression Inputs
4.3 Optimizing Regression Complexity
4.4 Interpreting Regression Models
4.5 Transforming Inputs
4.6 Categorical Inputs
4.7 Polynomial Regressions (Self-Study)
Model Essentials – Regressions
- Predict new cases: prediction formula
- Select useful inputs: sequential selection
- Optimize complexity: best model from sequence
Linear Regression Prediction Formula

ŷ = ŵ0 + ŵ1·x1 + ŵ2·x2 + ···

where ŷ is the prediction estimate, ŵ0 is the intercept estimate, ŵ1 and ŵ2 are parameter estimates, and x1 and x2 are input measurements. Choose the intercept and parameter estimates to minimize the squared error function over the training data:

∑ ( yi − ŷi )²
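Enterprise Miner estimates the intercept and parameters for you. Outside the tool, the same least-squares fit can be sketched in Python; the data below is made up so that y follows the formula exactly:

```python
import numpy as np

# made-up training data following y = 1 + 2*x1 + 3*x2 exactly
rng = np.random.default_rng(0)
x1, x2 = rng.random(50), rng.random(50)
y = 1.0 + 2.0 * x1 + 3.0 * x2

# design matrix with a leading column of ones for the intercept w0
X = np.column_stack([np.ones_like(x1), x1, x2])

# lstsq chooses w to minimize the squared error sum((y - X @ w)**2)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # ≈ [1. 2. 3.]
```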
Logistic Regression Prediction Formula

log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2 + ···

The left side of the equation is the logit score.
Logit Link Function

logit( p̂ ) = log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2 + ···

The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞).
Logit Link Function

To obtain prediction estimates, the logit equation is solved for p̂:

p̂ = 1 / (1 + e^(−logit( p̂ )))
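The logit and its inverse form a round trip between probabilities and logit scores; a minimal sketch:

```python
import numpy as np

def logit(p):
    # probability (0, 1) -> logit score (-inf, +inf)
    return np.log(p / (1 - p))

def inverse_logit(score):
    # logit score -> probability: p = 1 / (1 + e^(-score))
    return 1.0 / (1.0 + np.exp(-score))

# round trip: probability -> logit score -> probability
p_back = inverse_logit(logit(0.75))
print(p_back)  # 0.75 (up to rounding)
```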
Simple Prediction Illustration – Regressions
Predict the dot color for each (x1, x2) point in the unit square. You need intercept and parameter estimates.
Simple Prediction Illustration – Regressions
Find the parameter estimates by maximizing the log-likelihood function.
Simple Prediction Illustration – Regressions
Using the maximum likelihood estimates, the prediction formula assigns a logit score to each (x1, x2).
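Enterprise Miner uses its own optimizer to maximize the log-likelihood; the idea can be sketched with plain gradient ascent on made-up data (the true coefficients below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x1, x2 = rng.random(n), rng.random(n)

# made-up true model: logit = -1 + 3*x1 - 2*x2
p_true = 1 / (1 + np.exp(-(-1 + 3 * x1 - 2 * x2)))
y = (rng.random(n) < p_true).astype(float)

X = np.column_stack([np.ones(n), x1, x2])
w = np.zeros(3)
for _ in range(3000):
    p = 1 / (1 + np.exp(-X @ w))
    # gradient of the Bernoulli log-likelihood sum(y*log p + (1-y)*log(1-p))
    w += 1.0 * X.T @ (y - p) / n

print(w)  # estimates roughly recover the signs and sizes of (-1, 3, -2)
```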
4.01 Multiple Choice Poll
What is the logistic regression prediction for the indicated point?
- 0.243
- 0.56
- yellow
- It depends.
Regressions: Beyond the Prediction Formula
- Manage missing values.
- Interpret the model.
- Handle extreme or unusual values.
- Use nonnumeric inputs.
- Account for nonlinearities.
Missing Values and Regression Modeling
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.
Consequence: Missing values can significantly reduce the amount of training data available for regression modeling!
Missing Values and the Prediction Formula
Problem 2: Prediction formulas cannot score cases with missing values. For example, a case with (x1, x2) = (0.3, ?) cannot be scored.
Missing Value Issues
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.
Problem 2: Prediction formulas cannot score cases with missing values.
Missing Value Causes
- Non-applicable measurement
- No match on merge
- Non-disclosed measurement
Missing Value Remedies
- Synthetic distribution: replace missing values with a synthetic value, such as the input's mean (interval) or mode (categorical).
- Estimation: x̂i = f(x1, …, xp) — predict the missing value from the other inputs.
Managing Missing Values This demonstration illustrates how to impute synthetic data values and create missing value indicators.
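Outside Enterprise Miner's Impute node, synthetic-value imputation with missing-value indicators can be sketched in pandas (column names and values below are made up):

```python
import numpy as np
import pandas as pd

# made-up data with missing values in an interval and a categorical input
df = pd.DataFrame({
    "income": [50.0, np.nan, 80.0, 65.0],
    "region": ["N", "S", None, "S"],
})

for col in ["income", "region"]:
    # unique replacement indicator: 1 where the value was missing
    df["M_" + col] = df[col].isna().astype(int)
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].mean())     # interval input: mean
    else:
        df[col] = df[col].fillna(df[col].mode()[0])  # categorical input: mode
```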
Running the Regression Node This demonstration illustrates using the Regression tool.
4.2 Selecting Regression Inputs
Sequential Selection – Forward
Start with the intercept-only model. At each step, add the input with the smallest p-value; stop when no remaining input has a p-value below the entry cutoff.
Sequential Selection – Backward
Start with the full model. At each step, remove the input with the largest p-value; stop when every remaining input has a p-value below the stay cutoff.
Sequential Selection – Stepwise
Combine forward and backward selection: add inputs whose p-values fall below the entry cutoff, and after each addition remove any input whose p-value rises above the stay cutoff.
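Enterprise Miner runs this selection inside the Regression node. As a from-scratch illustration, forward selection for a least-squares model can be sketched as below; the p-values use a normal approximation to the coefficient t test (adequate at the sample sizes used here), and the data is made up:

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Two-sided coefficient p-values for an OLS fit.

    Uses the normal approximation to the t test, which is close for
    the few hundred cases used in this sketch."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)                       # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X))) # standard errors
    t = beta / se
    return np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])

def forward_select(X, y, entry_cutoff=0.01):
    """Add, at each step, the candidate with the smallest p-value,
    stopping when no candidate beats the entry cutoff."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    ones = np.ones((n, 1))
    while remaining:
        best_p, best_j = None, None
        for j in remaining:
            cols = np.column_stack([ones] + [X[:, [c]] for c in chosen + [j]])
            pv = ols_pvalues(cols, y)[-1]   # p-value of the newest input
            if best_p is None or pv < best_p:
                best_p, best_j = pv, j
        if best_p >= entry_cutoff:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen

# made-up data: y depends on inputs 0 and 2 only
rng = np.random.default_rng(3)
X = rng.random((200, 4))
y = 3 * X[:, 0] + 2 * X[:, 2] + 0.1 * rng.standard_normal(200)
chosen = forward_select(X, y)
print(chosen)  # includes inputs 0 and 2
```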
4.02 Poll
The three sequential selection methods for building regression models can never lead to the same model for the same set of data.
- True
- False
Correct answer: False. Forward, backward, and stepwise selection can all arrive at the same model on some data sets.
Selecting Inputs This demonstration illustrates using stepwise selection to choose inputs for the model.
4.3 Optimizing Regression Complexity
Model Fit versus Complexity
Evaluate the model fit statistic at each step (1–6) of the selection sequence, on both the training and validation data.
Select Model with Optimal Validation Fit
Evaluate each sequence step and choose the simplest model with optimal validation fit.
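Picking the best step reduces to an argmin over the validation fit statistic; the values below are hypothetical:

```python
import numpy as np

# hypothetical validation average squared error at steps 1..6 of a sequence
val_ase = np.array([0.40, 0.31, 0.27, 0.265, 0.266, 0.270])

# argmin returns the first minimum, i.e. the simplest model among exact ties
best_step = int(np.argmin(val_ase)) + 1
print(best_step)  # step 4
```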
Optimizing Complexity This demonstration illustrates tuning a regression model to give optimal performance on the validation data.
4.4 Interpreting Regression Models
Logistic Regression Prediction Formula

log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2 + ···  (logit scores)
Odds Ratios and Doubling Amounts

log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2 + ···

- Odds ratio: the amount the odds change with a unit change in input xi: exp(ŵi).
- Doubling amount: how much input xi must change to double the odds: 0.69 / ŵi (because log 2 ≈ 0.69).
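Both quantities follow directly from a parameter estimate; a short sketch with a hypothetical estimate ŵ = 0.35:

```python
import math

w = 0.35  # hypothetical parameter estimate for one input

odds_ratio = math.exp(w)           # factor the odds multiply by per unit increase
doubling_amount = math.log(2) / w  # input change that doubles the odds (log 2 ≈ 0.69)

# sanity check: increasing the input by the doubling amount multiplies the odds by 2
print(math.exp(w * doubling_amount))  # 2.0
```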
Interpreting a Regression Model This demonstration illustrates interpreting a regression model using odds ratios.
4.5 Transforming Inputs
Extreme Distributions and Regressions
Original input scale: a skewed input distribution produces high leverage points that pull the standard regression away from the true association.

Regularizing Input Transformations
Regularized scale: transforming the input to a more symmetric distribution yields a regularized estimate that follows the true association more closely.
4.03 Multiple Choice Poll
Which statement below is true about transformations of input variables in a regression analysis?
- They are never a good idea.
- They help model assumptions match the assumptions of maximum likelihood estimation.
- They are performed to reduce the bias in model predictions.
- They typically are done on nominal (categorical) inputs.
Correct answer: they help model assumptions match the assumptions of maximum likelihood estimation.
Transforming Inputs This demonstration illustrates using the Transform Variables tool to apply standard transformations to a set of inputs.
4.6 Categorical Inputs
Nonnumeric Input Coding
Each level of a categorical input is replaced by a dummy (indicator) variable:

Level  DA DB DC DD DE DF DG DH DI
  A     1  0  0  0  0  0  0  0  0
  B     0  1  0  0  0  0  0  0  0
  C     0  0  1  0  0  0  0  0  0
  D     0  0  0  1  0  0  0  0  0
  E     0  0  0  0  1  0  0  0  0
  F     0  0  0  0  0  1  0  0  0
  G     0  0  0  0  0  0  1  0  0
  H     0  0  0  0  0  0  0  1  0
  I     0  0  0  0  0  0  0  0  1
Coding Redundancy
The dummy variables are redundant: DI = 1 − (DA + DB + ··· + DH), so one of them (here DI) can be dropped without losing information.
Coding Consolidation
Levels with similar relationships to the target can share a single dummy variable, reducing the number of model parameters:

Level  DABCD DEF DGH
  A      1    0   0
  B      1    0   0
  C      1    0   0
  D      1    0   0
  E      0    1   0
  F      0    1   0
  G      0    0   1
  H      0    0   1
  I      0    0   0
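Outside Enterprise Miner, consolidated dummy coding can be sketched with pandas (the grouping of levels below is hypothetical, chosen to mirror the table above):

```python
import pandas as pd

levels = pd.Series(list("ABCDEFGHI"), name="level")

# hypothetical consolidation of levels with similar target behavior
groups = levels.map({"A": "ABCD", "B": "ABCD", "C": "ABCD", "D": "ABCD",
                     "E": "EF", "F": "EF", "G": "GH", "H": "GH", "I": "I"})

dummies = pd.get_dummies(groups, prefix="D")
dummies = dummies.drop(columns="D_I")  # drop the redundant reference-level column

print(dummies.columns.tolist())  # ['D_ABCD', 'D_EF', 'D_GH']
```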
Recoding Categorical Inputs This demonstration illustrates using the Replacement tool to facilitate the process of combining input levels.
4.7 Polynomial Regressions (Self-Study)
Standard Logistic Regression

log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2
Polynomial Logistic Regression

log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2 + ŵ3·x1² + ŵ4·x2² + ŵ5·x1·x2

The added quadratic terms (x1², x2², x1·x2) let the model account for nonlinearities.
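Adding the quadratic terms amounts to expanding the input columns before fitting; a minimal sketch:

```python
import numpy as np

def quadratic_expand(x1, x2):
    # columns for the quadratic model above: x1, x2, x1^2, x2^2, x1*x2
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

row = quadratic_expand(np.array([2.0]), np.array([3.0]))
print(row)  # [[2. 3. 4. 9. 6.]]
```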
Adding Polynomial Regression Terms Selectively This demonstration illustrates how to add polynomial regression terms selectively.
Adding Polynomial Regression Terms Autonomously (Self-Study) This demonstration illustrates how to add polynomial regression terms autonomously.
Exercises This exercise reinforces the concepts discussed previously.
Regression Tools Review
- Replace missing values for interval inputs (means) and categorical inputs (mode); create a unique replacement indicator.
- Create linear and logistic regression models.
- Select inputs with a sequential selection method and an appropriate fit statistic.
- Interpret models with odds ratios.
- Regularize distributions of inputs; a typical transformation controls for input skewness via a log transformation.
- Consolidate levels of a nonnumeric input using the Replacement Editor window.
- Add polynomial terms to a regression either by hand or by an autonomous exhaustive search.