Presentation on theme: "Chapter 4: Introduction to Predictive Modeling: Regressions" — Presentation transcript:

1 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4 Interpreting Regression Models 4.5 Transforming Inputs 4.6 Categorical Inputs 4.7 Polynomial Regressions (Self-Study)

3 Model Essentials – Regressions. Predict new cases: prediction formula. Select useful inputs: sequential selection. Optimize complexity: best model from sequence.

6 Linear Regression Prediction Formula: ŷ = ŵ0 + ŵ1·x1 + ŵ2·x2, where ŵ0 is the intercept estimate, ŵ1 and ŵ2 are parameter estimates, x1 and x2 are input measurements, and ŷ is the prediction estimate. Choose the intercept and parameter estimates to minimize the squared error function over the training data: Σi (yi − ŷi)².
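The squared-error criterion above can be sketched in a few lines of Python (the course itself uses SAS Enterprise Miner, so this is an illustration only; the data values are invented):

```python
import numpy as np

# Hypothetical training data: inputs x1, x2 and an interval target y.
X = np.array([[0.1, 0.9],
              [0.4, 0.5],
              [0.7, 0.2],
              [0.9, 0.8]])
y = np.array([1.0, 1.2, 1.5, 2.0])

# Prepend a column of ones so that w[0] acts as the intercept estimate.
X1 = np.hstack([np.ones((X.shape[0], 1)), X])

# lstsq chooses w to minimize the squared error sum((y - X1 @ w) ** 2).
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

y_hat = X1 @ w                          # prediction estimates
sse = float(np.sum((y - y_hat) ** 2))   # squared error function value
```

Any other choice of intercept and parameter estimates gives a larger squared error, which is exactly what "choose estimates to minimize" means.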

8 Logistic Regression Prediction Formula: log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2. The left-hand side is the logit score.

9 Logit Link Function: log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2. The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞).

11 Logit Link Function: to obtain prediction estimates, the logit equation is solved for p̂: p̂ = 1 / (1 + e^(−logit(p̂))).
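The link and its inverse can be written directly from the two equations above; this small Python sketch (illustrative only) round-trips a probability through the logit scale:

```python
import math

def logit(p):
    """Logit link: probability in (0, 1) -> logit score in (-inf, +inf)."""
    return math.log(p / (1.0 - p))

def inv_logit(score):
    """Inverse link, obtained by solving the logit equation for p."""
    return 1.0 / (1.0 + math.exp(-score))

score = logit(0.56)    # probability to logit score
p = inv_logit(score)   # and back again
```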

14 Simple Prediction Illustration – Regressions: predict dot color for each (x1, x2). You need intercept and parameter estimates. [Scatter plot of x2 versus x1, both on a 0.0–1.0 scale, with prediction contours labeled 0.40–0.70.]

16 Simple Prediction Illustration – Regressions: find the parameter estimates by maximizing the log-likelihood function. [Same scatter plot.]

18 Simple Prediction Illustration – Regressions: using the maximum likelihood estimates, the prediction formula assigns a logit score to each (x1, x2). [Same scatter plot.]
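One simple way to maximize the log-likelihood is gradient ascent. This toy Python sketch shows the idea only; it is not the course's SAS implementation, and the four cases are invented:

```python
import math

# Hypothetical cases: (x1, x2, dot color coded 0/1).
data = [(0.2, 0.8, 0), (0.4, 0.3, 0), (0.6, 0.7, 1), (0.9, 0.5, 1)]

def log_likelihood(w):
    """Bernoulli log-likelihood of the logistic model at weights w."""
    ll = 0.0
    for x1, x2, t in data:
        score = w[0] + w[1] * x1 + w[2] * x2
        # log p = -log(1 + e^-score); log(1 - p) = -log(1 + e^score)
        ll -= math.log(1.0 + math.exp(-score if t == 1 else score))
    return ll

# Gradient ascent: repeatedly step the intercept and parameter
# estimates in the direction that increases the log-likelihood.
w = [0.0, 0.0, 0.0]
for _ in range(500):
    grad = [0.0, 0.0, 0.0]
    for x1, x2, t in data:
        p = 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x1 + w[2] * x2)))
        for j, xj in enumerate((1.0, x1, x2)):
            grad[j] += (t - p) * xj
    w = [wj + 0.5 * gj for wj, gj in zip(w, grad)]
```

After fitting, the resulting estimates assign each (x1, x2) a logit score, just as the slide describes.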

20 4.01 Multiple Choice Poll: What is the logistic regression prediction for the indicated point? a. −0.243 b. 0.56 c. yellow d. It depends … [Scatter plot of x2 versus x1 with prediction contours labeled 0.40–0.70.]

21 4.01 Multiple Choice Poll – Correct Answer: What is the logistic regression prediction for the indicated point? a. −0.243 b. 0.56 c. yellow d. It depends … [Same scatter plot.]

22 Regressions: Beyond the Prediction Formula. Manage missing values. Interpret the model. Account for nonlinearities. Handle extreme or unusual values. Use nonnumeric inputs.

24 Missing Values and Regression Modeling (training data: inputs and target). Problem 1: training data cases with missing values on inputs used by a regression model are ignored. Consequence: missing values can significantly reduce your amount of training data for regression modeling!

27 Missing Values and the Prediction Formula. Predict: (x1, x2) = (0.3, ?). Problem 2: prediction formulas cannot score cases with missing values.

31 Missing Value Issues. Manage missing values. Problem 1: training data cases with missing values on inputs used by a regression model are ignored. Problem 2: prediction formulas cannot score cases with missing values.

33 Missing Value Causes. Manage missing values. Non-applicable measurement; no match on merge; non-disclosed measurement.

34 Missing Value Remedies. Manage missing values. Non-applicable measurement; no match on merge; non-disclosed measurement.

35 Managing Missing Values This demonstration illustrates how to impute synthetic data values and create missing value indicators.
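The demonstration performs imputation with SAS Enterprise Miner tools; the same idea can be sketched in Python (the function name and data column are my own, for illustration):

```python
def impute_with_indicator(values):
    """Replace missing (None) entries with the mean of the observed
    entries, and build a 0/1 indicator flagging the imputed cases."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    imputed = [mean if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator

# Hypothetical interval input column with one missing case.
x2 = [0.25, None, 0.75, 0.5]
filled, flag = impute_with_indicator(x2)
```

Imputation keeps the case in the training data (Problem 1) and lets the prediction formula score it (Problem 2), while the indicator preserves the fact that the value was missing.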

36 Running the Regression Node This demonstration illustrates using the Regression tool.

37 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4 Interpreting Regression Models 4.5 Transforming Inputs 4.6 Categorical Inputs 4.7 Polynomial Regressions (Self-Study)

38 Model Essentials – Regressions. Predict new cases: prediction formula. Select useful inputs: sequential selection. Optimize complexity: best model from sequence.

39 Sequential Selection – Forward: starting from no inputs, at each step the candidate input with the smallest p-value enters the model, as long as that p-value falls below the entry cutoff.

44 Sequential Selection – Backward: starting from the full model, at each step the input with the largest p-value is removed, as long as that p-value rises above the stay cutoff.

52 Sequential Selection – Stepwise: combines forward and backward. Inputs enter when their p-values fall below the entry cutoff, and previously entered inputs are removed when their p-values rise above the stay cutoff.
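The stepwise control flow can be sketched in Python. The p-values below are a stand-in lookup table of illustrative numbers, not a real significance test; the point is the entry/stay logic:

```python
def stepwise(inputs, p_value, entry_cutoff=0.05, stay_cutoff=0.10):
    """Stepwise selection: forward entry below the entry cutoff,
    followed by backward removal above the stay cutoff."""
    selected = []
    while True:
        candidates = [x for x in inputs if x not in selected]
        if not candidates:
            break
        best = min(candidates, key=lambda x: p_value(selected, x))
        if p_value(selected, best) >= entry_cutoff:
            break  # no remaining input clears the entry cutoff
        selected.append(best)
        # Backward step: drop inputs that no longer meet the stay cutoff.
        selected = [x for x in selected
                    if p_value([s for s in selected if s != x], x) < stay_cutoff]
    return selected

# Illustrative p-values: x2 looks useful alone, but once x3 enters,
# the p-value of x2 rises past the stay cutoff and x2 is removed.
TABLE = {
    ((), "x1"): 0.001, ((), "x2"): 0.020, ((), "x3"): 0.030,
    (("x1",), "x2"): 0.030, (("x1",), "x3"): 0.040,
    (("x2",), "x1"): 0.002,
    (("x1", "x2"), "x3"): 0.010,
    (("x2", "x3"), "x1"): 0.002,
    (("x1", "x3"), "x2"): 0.400,
}

def p_value(selected, x):
    return TABLE[(tuple(selected), x)]

chosen = stepwise(["x1", "x2", "x3"], p_value)
```

With these numbers, x1 enters, then x2, then x3; once x3 is in the model, x2 fails the stay cutoff and drops out, leaving x1 and x3.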

59 Selecting Inputs This demonstration illustrates using stepwise selection to choose inputs for the model.

60 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4 Interpreting Regression Models 4.5 Transforming Inputs 4.6 Categorical Inputs 4.7 Polynomial Regressions (Self-Study)

61 Model Essentials – Regressions. Predict new cases: prediction formula. Select useful inputs: sequential selection. Optimize complexity.

62 Model Fit versus Complexity. [Plot: a model fit statistic across sequence steps 1–6 for the training and validation data.]

63 Select Model with Optimal Validation Fit: evaluate each sequence step. [Plot: model fit statistic across steps 1–6.]
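The selection rule above is simple enough to state in code: keep the sequence step whose fit statistic is best on the validation data, not the training data. The fit values here are invented for illustration:

```python
# (step, fit on training data, fit on validation data); the fit statistic
# here is average squared error, where smaller is better.
sequence = [
    (1, 0.30, 0.31),
    (2, 0.24, 0.26),
    (3, 0.20, 0.23),
    (4, 0.17, 0.22),
    (5, 0.15, 0.25),  # training fit keeps improving...
    (6, 0.13, 0.28),  # ...while validation fit worsens: overfitting
]

# Pick the step with the best (smallest) validation fit.
best_step = min(sequence, key=lambda s: s[2])[0]
```

Training fit always improves as complexity grows, so judging by training fit would always pick the last, most complex step.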

64 Optimizing Complexity This demonstration illustrates tuning a regression model to give optimal performance on the validation data.

65 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4 Interpreting Regression Models 4.5 Transforming Inputs 4.6 Categorical Inputs 4.7 Polynomial Regressions (Self-Study)

66 Beyond the Prediction Formula. Manage missing values. Interpret the model. Account for nonlinearities. Handle extreme or unusual values. Use nonnumeric inputs.

68 Logistic Regression Prediction Formula: log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2 (the logit score).

69 Odds Ratios and Doubling Amounts. Odds ratio: the amount the odds change with a unit change in an input; a change of Δxi = 1 multiplies the odds by exp(ŵi). Doubling amount: the input change required to double the odds; Δxi = 0.69 / ŵi. Both follow from the prediction formula log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2.
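Both quantities follow from one parameter estimate; this Python sketch (with an invented estimate) checks that moving the input by the doubling amount really doubles the odds:

```python
import math

w_i = 0.35  # hypothetical parameter estimate for input x_i

odds_ratio = math.exp(w_i)            # odds multiplier for a unit change
doubling_amount = math.log(2) / w_i   # input change that doubles the odds
```

The 0.69 on the slide is log(2) rounded to two decimals, which is why the doubling amount is written as 0.69 / ŵi.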

70 Interpreting a Regression Model This demonstration illustrates interpreting a regression model using odds ratios.

71 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4 Interpreting Regression Models 4.5 Transforming Inputs 4.6 Categorical Inputs 4.7 Polynomial Regressions (Self-Study)

72 Beyond the Prediction Formula. Manage missing values. Interpret the model. Account for nonlinearities. Handle extreme or unusual values. Use nonnumeric inputs.

74 Extreme Distributions and Regressions – Original Input Scale: a skewed input distribution produces high leverage points, and the standard regression deviates from the true association. [Plots compare the standard regression with the true association.]

75 Extreme Distributions and Regressions – Regularized Scale: transforming the input yields a more symmetric distribution. [Same comparison on the regularized scale.]

77 Regularizing Input Transformations: on the regularized scale, the standard regression gives a regularized estimate that is closer to the true association. [Plots on the original and regularized input scales.]
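A log transform is the classic regularizing transformation for a right-skewed input. This sketch (invented data, simple population skewness formula) shows the skew shrinking on the transformed scale:

```python
import math

def skewness(xs):
    """Third standardized moment (population form)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 3 for x in xs) / n / s2 ** 1.5

# Hypothetical right-skewed input with a high leverage point.
skewed = [1, 2, 2, 3, 4, 5, 8, 20, 150]
transformed = [math.log(v) for v in skewed]
```

On the log scale the extreme case no longer dominates the fit, so the standard regression tracks the true association more closely.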

79 Transforming Inputs This demonstration illustrates using the Transform Variables tool to apply standard transformations to a set of inputs.

80 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4 Interpreting Regression Models 4.5 Transforming Inputs 4.6 Categorical Inputs 4.7 Polynomial Regressions (Self-Study)

81 Beyond the Prediction Formula. Manage missing values. Interpret the model. Account for nonlinearities. Handle extreme or unusual values. Use nonnumeric inputs.

83 Nonnumeric Input Coding: each level A–I of a categorical input is represented by 0/1 dummy variables DA–DH. Level A is coded (1,0,0,0,0,0,0,0), level B (0,1,0,0,0,0,0,0), and so on through level H (0,0,0,0,0,0,0,1); level I is coded with all zeros.

84 Coding Redundancy: a ninth dummy DI for level I would be redundant, because level I is already identified by DA = DB = … = DH = 0.

86 Coding Consolidation: levels with similar target behavior can share a single dummy, for example D ABCD for levels A–D, D EF for levels E and F, and D GH for levels G and H, with level I still coded as all zeros.
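The two coding schemes above can be sketched as small Python functions (names and the particular grouping are illustrative):

```python
def dummy_code(level, levels, reference):
    """One 0/1 indicator per non-reference level; the reference level
    is coded as all zeros."""
    return [1 if level == other else 0
            for other in levels if other != reference]

def consolidated_code(level, groups):
    """One indicator per consolidated group of levels."""
    return [1 if level in group else 0 for group in groups]

levels = list("ABCDEFGHI")
row_b = dummy_code("B", levels, reference="I")
row_i = dummy_code("I", levels, reference="I")

# Consolidation: A-D share one dummy, E-F another, G-H a third;
# level I stays the all-zeros reference.
groups = ["ABCD", "EF", "GH"]
row_b2 = consolidated_code("B", groups)
```

Consolidation shrinks eight dummies to three, which reduces the number of parameter estimates the regression must fit.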

87 Recoding Categorical Inputs This demonstration illustrates using the Replacement tool to facilitate the process of combining input levels.

88 Chapter 4: Introduction to Predictive Modeling: Regressions 4.1 Introduction 4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4 Interpreting Regression Models 4.5 Transforming Inputs 4.6 Categorical Inputs 4.7 Polynomial Regressions (Self-Study)

89 Beyond the Prediction Formula. Manage missing values. Interpret the model. Account for nonlinearities. Handle extreme or unusual values. Use nonnumeric inputs.

91 Standard Logistic Regression: log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2. [Scatter plot of x2 versus x1 with straight prediction contours labeled 0.40–0.70.]

92 Polynomial Logistic Regression: log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2 + ŵ3·x1² + ŵ4·x2² + ŵ5·x1·x2, where the quadratic terms allow curved prediction contours. [Scatter plot of x2 versus x1 with curved contours labeled 0.30–0.80.]
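The quadratic expansion can be sketched directly from the formula above; the weight values here are invented for illustration:

```python
import math

def logit_score(x1, x2, w):
    # w holds estimates for (1, x1, x2, x1^2, x2^2, x1*x2).
    return (w[0] + w[1] * x1 + w[2] * x2
            + w[3] * x1 ** 2 + w[4] * x2 ** 2 + w[5] * x1 * x2)

def predict(x1, x2, w):
    """Logit score through the inverse link gives a probability."""
    return 1.0 / (1.0 + math.exp(-logit_score(x1, x2, w)))

w = (-1.0, 2.0, 2.0, -3.0, -3.0, 1.0)  # hypothetical estimates
p_center = predict(0.5, 0.5, w)
```

Because the score is no longer linear in x1 and x2, the constant-probability contours in the (x1, x2) plane curve instead of being straight lines.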

93 Adding Polynomial Regression Terms Selectively This demonstration illustrates how to add polynomial regression terms selectively.

94 Adding Polynomial Regression Terms Autonomously (Self-Study) This demonstration illustrates how to add polynomial regression terms autonomously.

95 Exercises This exercise reinforces the concepts discussed previously.

96 Regression Tools Review. Replace missing values for interval (means) and categorical (mode) inputs, and create a unique replacement indicator. Create linear and logistic regression models. Select inputs with a sequential selection method and an appropriate fit statistic. Interpret models with odds ratios. Regularize distributions of inputs; a typical transformation controls for input skewness via a log transformation. continued...

97 Regression Tools Review. Consolidate levels of a nonnumeric input using the Replacement Editor window. Add polynomial terms to a regression either by hand or by an autonomous exhaustive search.

