Chapter 4: Introduction to Predictive Modeling: Regressions

1 Chapter 4: Introduction to Predictive Modeling: Regressions
4.2 Selecting Regression Inputs 4.3 Optimizing Regression Complexity 4.4 Interpreting Regression Models 4.5 Transforming Inputs 4.6 Categorical Inputs 4.7 Polynomial Regressions (Self-Study)

3 Model Essentials – Regressions
Prediction formula: predict new cases. Sequential selection: select useful inputs. Best model from sequence: optimize complexity. ...

6 Linear Regression Prediction Formula
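Prediction formula: $\hat{y} = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$, where $\hat{y}$ is the prediction estimate, $\hat{w}_0$ is the intercept estimate, $\hat{w}_1$ and $\hat{w}_2$ are the parameter estimates, and $x_1$ and $x_2$ are the input measurements. Choose the intercept and parameter estimates to minimize the squared error function over the training data: $\sum_{\text{training data}} (y_i - \hat{y}_i)^2$. ...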
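The least-squares criterion on this slide can be sketched in a few lines of Python. This is only an illustration, not the course software: the function names `fit_least_squares` and `predict` and the toy data are assumptions, and the fit solves the normal equations directly, which is adequate for a handful of inputs but not a production approach.

```python
def fit_least_squares(rows, targets):
    """Return [w0, w1, ..., wp] minimizing the sum of squared errors."""
    # Design matrix with a leading 1 for the intercept estimate w0.
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    # (assumes X^T X is nonsingular).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(w, x):
    """Prediction formula: y_hat = w0 + w1*x1 + w2*x2 + ..."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Toy training data (made up) generated from y = 1 + 2*x1 - 3*x2.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
targets = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in rows]
w = fit_least_squares(rows, targets)
```

Because the toy targets are noise-free, the fitted estimates reproduce the generating intercept and slopes up to rounding.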

8 Logistic Regression Prediction Formula
$\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$ (logit scores) ...

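9 Logit Link Function
$\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$ (logit scores). The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞). [Figure: logit link function, probabilities from 0 to 1 against logit scores from -5 to 5.] ...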

11 Logit Link Function
$\operatorname{logit}(\hat{p}) = \log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$ (logit scores). To obtain prediction estimates, the logit equation is solved for $\hat{p}$: $\hat{p} = \frac{1}{1 + e^{-\operatorname{logit}(\hat{p})}}$. ...
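As a small illustration of solving the logit equation for the probability, the sketch below implements the link and its inverse. The function names and the weights passed to `predict_probability` are hypothetical and chosen only for the example.

```python
import math


def logit(p):
    """Logit link: maps a probability in (0, 1) to a logit score."""
    return math.log(p / (1.0 - p))


def inverse_logit(score):
    """Solve the logit equation for p: p = 1 / (1 + exp(-score))."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_probability(w0, w1, w2, x1, x2):
    """Prediction estimate for a case with inputs x1 and x2."""
    return inverse_logit(w0 + w1 * x1 + w2 * x2)


# Hypothetical estimates, for illustration only.
p_hat = predict_probability(-1.0, 2.0, 0.5, 0.3, 0.6)
```

Applying `inverse_logit` to `logit(p)` returns `p`, and logit scores of -5 and 5 correspond to probabilities close to 0 and 1.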

13 Logit Link Function logit scores ...

14 Simple Prediction Illustration – Regressions
Predict dot color for each x1 and x2. You need intercept and parameter estimates. [Figure: training cases plotted by x1 and x2, both on a 0.0 to 1.0 scale.] ...

16 Simple Prediction Illustration – Regressions
Find parameter estimates by maximizing the log-likelihood function. [Figure: training cases plotted by x1 and x2, both on a 0.0 to 1.0 scale.] ...

18 Simple Prediction Illustration – Regressions
Using the maximum likelihood estimates, the prediction formula assigns a logit score to each x1 and x2. [Figure: training cases plotted by x1 and x2, both on a 0.0 to 1.0 scale.] ...
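One way to see what maximizing the log-likelihood involves is the sketch below. It uses plain gradient ascent on a made-up two-input data set; the optimizer, step size, iteration count, and data are all assumptions for illustration, and statistical software typically uses faster methods to reach the same estimates.

```python
import math


def probability(w, x):
    """Inverse logit of the logit score w0 + w1*x1 + w2*x2."""
    score = w[0] + w[1] * x[0] + w[2] * x[1]
    return 1.0 / (1.0 + math.exp(-score))


def log_likelihood(w, data):
    """Sum of log(p) for target 1 cases and log(1 - p) for target 0 cases."""
    total = 0.0
    for x, y in data:
        p = probability(w, x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def fit(data, step=0.5, iterations=5000):
    """Gradient ascent on the mean log-likelihood, starting from zeros."""
    w = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            err = y - probability(w, x)  # gradient contribution of one case
            grad[0] += err
            grad[1] += err * x[0]
            grad[2] += err * x[1]
        w = [wi + step * g / n for wi, g in zip(w, grad)]
    return w


# Made-up training cases ((x1, x2), target); the classes overlap.
data = [((0.1, 0.2), 0), ((0.2, 0.6), 0), ((0.3, 0.3), 0), ((0.5, 0.4), 0),
        ((0.45, 0.9), 0), ((0.4, 0.8), 1), ((0.6, 0.7), 1), ((0.7, 0.3), 1),
        ((0.8, 0.9), 1), ((0.9, 0.5), 1)]
w_hat = fit(data)
logit_scores = [w_hat[0] + w_hat[1] * x[0] + w_hat[2] * x[1] for x, _ in data]
```

The fitted estimates give a higher log-likelihood than the all-zero starting point, and `logit_scores` holds the score the prediction formula assigns to each training case.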

20 4.01 Multiple Choice Poll
What is the logistic regression prediction for the indicated point? Choices: 0.243; 0.56; yellow; It depends. [Figure: training cases plotted by x1 and x2, with one point indicated.]

22 Regressions: Beyond the Prediction Formula
Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities. ...

24 Missing Values and Regression Modeling
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored. Consequence: Missing values can significantly reduce the amount of training data available for regression modeling. ...

27 Missing Values and the Prediction Formula
Predict: (x1, x2) = (0.3, ? ) Problem 2: Prediction formulas cannot score cases with missing values. ...

31 Missing Value Issues Manage missing values.
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored. Problem 2: Prediction formulas cannot score cases with missing values. ...
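Both problems can be reproduced with a short sketch. The data, the weights, and the use of NaN as the missing-value marker are assumptions made for illustration.

```python
import math

MISSING = float("nan")  # assumed marker for a missing input value


def complete_cases(rows):
    """Keep only the cases with no missing input values."""
    return [r for r in rows if not any(math.isnan(v) for v in r)]


def predict(w, x):
    """Prediction formula: w0 + w1*x1 + w2*x2."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Made-up training inputs (x1, x2); half the cases have a missing value.
training = [(0.1, 0.4), (0.3, MISSING), (MISSING, 0.7),
            (0.8, 0.2), (MISSING, MISSING), (0.5, 0.5)]

# Problem 1: only the complete cases remain available for fitting.
usable = complete_cases(training)

# Problem 2: a missing input propagates through the formula,
# so the case cannot be scored (hypothetical weights).
score = predict([0.5, 1.0, -2.0], (0.3, MISSING))
```

Here three of the six cases are dropped before fitting, and the score for (0.3, ?) is itself NaN.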

33 Missing Value Causes
Manage missing values. Causes: non-applicable measurement; no match on merge; non-disclosed measurement. ...

34 Missing Value Remedies
Manage missing values. Synthetic distribution: for a non-applicable measurement or no match on merge. Estimation, $x_i = f(x_1, \ldots, x_p)$: for a non-disclosed measurement. ...

35 Managing Missing Values
This demonstration illustrates how to impute synthetic data values and create missing value indicators.
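A minimal sketch of the idea, assuming mean replacement for an interval input and NaN as the missing-value marker; the function name is hypothetical and this does not reproduce the tool used in the demonstration.

```python
import math


def impute_with_indicator(values):
    """Replace missing values with the observed mean and flag each replacement.

    Returns (imputed_values, indicator), where indicator is 1 for a case
    whose value was replaced and 0 otherwise.
    """
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    imputed = [mean if math.isnan(v) else v for v in values]
    indicator = [1 if math.isnan(v) else 0 for v in values]
    return imputed, indicator


nan = float("nan")
imputed, indicator = impute_with_indicator([1.0, nan, 3.0, nan])
```

The indicator can be offered to the regression as an extra input, so the model can still distinguish cases whose value was originally missing.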

36 Running the Regression Node
This demonstration illustrates using the Regression tool.

38 Model Essentials – Regressions
Prediction formula: predict new cases. Sequential selection: select useful inputs. Best model from sequence: optimize complexity.

39 Sequential Selection – Forward
Input p-value Entry Cutoff ...
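The forward procedure can be sketched as a loop that repeatedly adds the candidate input with the smallest p-value until none falls below the entry cutoff. Everything named here is hypothetical: `p_value` stands in for whatever significance test the modeling software runs, the 0.05 default is an assumed cutoff, and the toy p-values are fixed rather than recomputed against the current model as a real procedure would do.

```python
def forward_selection(candidates, p_value, entry_cutoff=0.05):
    """Add inputs one at a time while the best candidate beats the cutoff.

    p_value(selected, name) is a caller-supplied function giving the
    p-value of adding input `name` to a model that already contains
    the inputs in `selected`.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        best_p, best = min((p_value(selected, name), name) for name in remaining)
        if best_p >= entry_cutoff:
            break  # no remaining input is significant enough to enter
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy p-values that ignore the current model (illustration only).
toy_p = {"x1": 0.001, "x2": 0.03, "x3": 0.40, "x4": 0.20}
chosen = forward_selection(toy_p, lambda selected, name: toy_p[name])
```

Backward selection runs the same idea in reverse, starting from all inputs and removing the one with the largest p-value while it exceeds a stay cutoff.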

44 Sequential Selection – Backward
Input p-value Stay Cutoff ...

52 Sequential Selection – Stepwise
Input p-value Entry Cutoff Stay Cutoff ...

60 4.02 Poll
The three sequential selection methods for building regression models can never lead to the same model for the same set of data. True or False?

61 4.02 Poll – Correct Answer
The three sequential selection methods for building regression models can never lead to the same model for the same set of data. True or False? Correct answer: False.

62 Selecting Inputs
This demonstration illustrates using stepwise selection to choose inputs for the model.

64 Model Essentials – Regressions
Prediction formula: predict new cases. Sequential selection: select useful inputs. Best model from sequence: optimize complexity. ...

65 Model Fit versus Complexity
Evaluate each sequence step. [Figure: model fit statistic at sequence steps 1 to 6, shown for training and validation data.] ...

66 Select Model with Optimal Validation Fit
Evaluate each sequence step. Choose simplest optimal model. [Figure: model fit statistic at sequence steps 1 to 6.] ...
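Choosing the simplest optimal model can be sketched as picking the earliest sequence step that attains the best validation fit. The function name and the validation values are made up, and the sketch assumes that later steps are more complex, as in a forward sequence.

```python
def simplest_optimal_step(validation_fit, higher_is_better=False):
    """Return the 1-based step with the best validation fit statistic.

    Ties go to the earliest step, which is the simplest model when
    complexity grows along the sequence (an assumption of this sketch).
    """
    best = max(validation_fit) if higher_is_better else min(validation_fit)
    return validation_fit.index(best) + 1


# Made-up validation error at sequence steps 1 to 6 (lower is better).
validation_error = [0.30, 0.25, 0.21, 0.19, 0.19, 0.22]
step = simplest_optimal_step(validation_error)
```

Steps 4 and 5 tie on validation error in this toy sequence, so the earlier one is selected.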

67 Optimizing Complexity
This demonstration illustrates tuning a regression model to give optimal performance on the validation data.

69 Beyond the Prediction Formula
Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities. ...

71 Logistic Regression Prediction Formula
$\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$ (logit scores) ...

72 Odds Ratios and Doubling Amounts
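$\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$ (logit scores). Odds ratio: the amount the odds change with a unit change in an input. $\Delta x_i = 1 \Rightarrow \text{odds} \times \exp(\hat{w}_i)$. Doubling amount: how much does an input have to change to double the odds? $\Delta x_i = 0.69 / \hat{w}_i \Rightarrow \text{odds} \times 2$. ...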
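Both quantities follow directly from a parameter estimate, as the sketch below shows. The function names and the estimate 0.5 are hypothetical; the 0.69 on the slide is the natural log of 2, rounded.

```python
import math


def odds_ratio(w_i):
    """Factor by which the odds change for a unit change in the input."""
    return math.exp(w_i)


def doubling_amount(w_i):
    """Change in the input that doubles the odds: log(2) / w_i."""
    return math.log(2.0) / w_i


w_i = 0.5  # hypothetical parameter estimate
ratio = odds_ratio(w_i)
delta = doubling_amount(w_i)
# Changing the input by `delta` multiplies the odds by exp(w_i * delta).
odds_multiplier = math.exp(w_i * delta)
```

A negative estimate gives an odds ratio below 1 and a negative doubling amount, meaning the input must decrease for the odds to double.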

73 Interpreting a Regression Model
This demonstration illustrates interpreting a regression model using odds ratios.

75 Beyond the Prediction Formula
Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities. ...

76 Extreme Distributions and Regressions
Original input scale: skewed input distribution, high leverage points. Regularized scale: more symmetric distribution. [Figure: true association and standard regression, shown on each scale.] ...

78 Regularizing Input Transformations
Original input scale: skewed input distribution, high leverage points. Regularized scale: more symmetric distribution. [Figure: true association, standard regression, and regularized estimate, shown on the original input scale and on the regularized scale.] ...
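A log transformation is one common regularizing choice for a skewed, positive input. The sketch below compares the skewness of a made-up input before and after the transformation; the data and the moment-based skewness measure are assumptions for illustration.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed input with a few extreme values.
raw = [1, 2, 2, 3, 4, 5, 8, 20, 150, 1000]
logged = [math.log(v) for v in raw]  # requires strictly positive values

raw_skew = skewness(raw)
log_skew = skewness(logged)
```

On the log scale the extreme values sit much closer to the rest of the data, so they have less leverage on the fitted regression.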

82 4.03 Multiple Choice Poll
Which statement below is true about transformations of input variables in a regression analysis? Choices: They are never a good idea. They help model assumptions match the assumptions of maximum likelihood estimation. They are performed to reduce the bias in model predictions. They typically are done on nominal (categorical) inputs.

84 Transforming Inputs
This demonstration illustrates using the Transform Variables tool to apply standard transformations to a set of inputs.

86 Beyond the Prediction Formula
Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities. ...

88 Nonnumeric Input Coding
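Level  DA DB DC DD DE DF DG DH DI
A       1  0  0  0  0  0  0  0  0
B       0  1  0  0  0  0  0  0  0
C       0  0  1  0  0  0  0  0  0
D       0  0  0  1  0  0  0  0  0
E       0  0  0  0  1  0  0  0  0
F       0  0  0  0  0  1  0  0  0
G       0  0  0  0  0  0  1  0  0
H       0  0  0  0  0  0  0  1  0
I       0  0  0  0  0  0  0  0  1
...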

89 Coding Redundancy
Levels A to I are coded by the indicators DA to DI. The last indicator, DI, is redundant: a case with DA through DH all equal to 0 must be level I. ...

90 Coding Consolidation
Indicators are consolidated across levels A to I: DABCD replaces DA, DB, DC, and DD; DEF replaces DE and DF; DGH replaces DG and DH.
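The three coding slides can be summarized in a short sketch: one indicator per level, the same coding with the redundant last indicator dropped, and the consolidated coding. The function names are hypothetical, and treating level I as the all-zero reference level in the consolidated coding is an assumption.

```python
LEVELS = list("ABCDEFGHI")

# Consolidated groups taken from the slide: DABCD, DEF, DGH.
GROUPS = [set("ABCD"), set("EF"), set("GH")]


def indicator_coding(level, drop_last=False):
    """One 0/1 indicator per level; optionally drop the redundant last one."""
    used = LEVELS[:-1] if drop_last else LEVELS
    return [1 if level == name else 0 for name in used]


def consolidated_coding(level):
    """One 0/1 indicator per consolidated group of levels."""
    return [1 if level in group else 0 for group in GROUPS]


full = indicator_coding("A")                     # nine indicators
reduced = indicator_coding("I", drop_last=True)  # eight indicators, all zero
grouped = consolidated_coding("F")               # three indicators
```

Consolidation reduces the number of parameters the regression must estimate for the input from eight to three in this example.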

92 Recoding Categorical Inputs
This demonstration illustrates using the Replacement tool to facilitate the process of combining input levels.

94 Beyond the Prediction Formula
Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities. ...

96 Standard Logistic Regression
$\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$ [Figure: plot over x1 and x2, both on a 0.0 to 1.0 scale.]

97 Polynomial Logistic Regression
$\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2 + \hat{w}_3 x_1^2 + \hat{w}_4 x_2^2 + \hat{w}_5 x_1 x_2$, where the last three terms are the quadratic terms. [Figure: plot over x1 and x2, both on a 0.0 to 1.0 scale.] ...
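In practice, a polynomial logistic regression is an ordinary logistic regression fitted on an expanded set of inputs. The sketch below builds the quadratic terms and evaluates the logit score; the function names and weights are hypothetical.

```python
def quadratic_terms(x1, x2):
    """Expanded inputs: the two linear terms, two squares, and the cross product."""
    return [x1, x2, x1 * x1, x2 * x2, x1 * x2]


def polynomial_logit(w, x1, x2):
    """Logit score w0 + w1*x1 + w2*x2 + w3*x1^2 + w4*x2^2 + w5*x1*x2."""
    terms = quadratic_terms(x1, x2)
    return w[0] + sum(wi * t for wi, t in zip(w[1:], terms))


# Hypothetical estimates [w0, w1, w2, w3, w4, w5].
w = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
score = polynomial_logit(w, 2.0, 3.0)
```

Because the score is still linear in the estimates, the same maximum likelihood fitting applies; only the input list grows.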

98 Adding Polynomial Regression Terms Selectively
This demonstration illustrates how to add polynomial regression terms selectively.

99 Adding Polynomial Regression Terms Autonomously (Self-Study)
This demonstration illustrates how to add polynomial regression terms autonomously.

100 Exercises
This exercise reinforces the concepts discussed previously.

101 Regression Tools Review
Replace missing values for interval (means) and categorical data (mode). Create a unique replacement indicator. Create linear and logistic regression models. Select inputs with a sequential selection method and appropriate fit statistic. Interpret models with odds ratios. Regularize distributions of inputs. Typical transformations control for input skewness via a log transformation.

102 Regression Tools Review
Consolidate levels of a nonnumeric input using the Replacement Editor window. Add polynomial terms to a regression either by hand or by an autonomous exhaustive search.

