Chapter 6: Model Assessment

6.1 Model Fit Statistics
6.2 Statistical Graphics
6.3 Adjusting for Separate Sampling
6.4 Profit Matrices

6.1 Model Fit Statistics

Summary Statistics Summary
Prediction Type   Statistic
Decisions         Accuracy/Misclassification; Profit/Loss; Inverse prior threshold
Rankings          ROC Index (concordance); Gini coefficient
Estimates         Average squared error; SBC/Likelihood
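
As a minimal sketch, the Python below computes two of these statistics for a binary target: misclassification rate (a decision statistic) and average squared error (an estimate statistic). The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions, not SAS Enterprise Miner code.

```python
def misclassification_rate(y, p, threshold=0.5):
    """Fraction of cases whose decision (p >= threshold) disagrees with the outcome."""
    wrong = sum(1 for yi, pi in zip(y, p) if (pi >= threshold) != (yi == 1))
    return wrong / len(y)


def average_squared_error(y, p):
    """Mean of (outcome - predicted probability)^2 over all cases."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


# Toy data (hypothetical): 1 = primary outcome, 0 = secondary outcome
y = [1, 0, 1, 0, 0]
p = [0.9, 0.2, 0.4, 0.6, 0.1]  # predicted probability of the primary outcome

print(misclassification_rate(y, p))  # 2 of 5 cases misclassified
print(average_squared_error(y, p))
```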

Comparing Models with Summary Statistics
This demonstration illustrates the use of the Model Comparison tool, which collects assessment information from attached modeling nodes and enables you to easily compare model performance measures.

6.2 Statistical Graphics

Statistical Graphics – ROC Chart
[Figure: ROC chart. y-axis: captured response fraction (sensitivity); x-axis: false positive fraction (1 − specificity); both axes run from 0.0 to 1.0.]
The ROC chart illustrates a tradeoff between a captured response fraction and a false positive fraction.

Each point on the ROC chart corresponds to a specific fraction of cases, ordered by their predicted values.

For example, one point on the ROC chart corresponds to the top 40%, that is, the 40% of cases with the highest predicted values.

The y-coordinate (sensitivity) shows the fraction of all primary outcome cases that are captured in the top 40% of cases.

The x-coordinate (false positive fraction) shows the fraction of all secondary outcome cases that are captured in the top 40% of cases.

Repeating this calculation for all selection fractions traces out the ROC curve.
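
The construction above can be sketched in Python, assuming a binary target coded 1 for the primary outcome and 0 for the secondary outcome, and no tied predicted values. The function `roc_points` and the toy data are hypothetical.

```python
def roc_points(y, scores):
    """Return (false positive fraction, sensitivity) at every selection depth.

    Cases are ranked by predicted value, highest first; at depth k the top k
    cases are selected. Assumes no tied predicted values.
    """
    ranked = [yi for _, yi in sorted(zip(scores, y), reverse=True)]
    n_primary = sum(ranked)
    n_secondary = len(ranked) - n_primary
    points = [(0.0, 0.0)]  # depth 0: nothing selected
    hits_primary = hits_secondary = 0
    for yi in ranked:
        if yi == 1:
            hits_primary += 1
        else:
            hits_secondary += 1
        points.append((hits_secondary / n_secondary, hits_primary / n_primary))
    return points


# Toy data (hypothetical): 1 = primary outcome, 0 = secondary outcome
y = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1]

points = roc_points(y, scores)
print(points[2])  # top 40% = top 2 of 5 cases
```

With five cases, the top 40% is the first two ranked cases, so `points[2]` is (1/3, 0.5): half of the primary outcome cases and one third of the secondary outcome cases are captured.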

[Figure: ROC curves for a weak model and a strong model.]

Statistical Graphics – ROC Index
The ROC index is the area under the ROC curve. As a rule of thumb, a weak model has an ROC index below 0.6, and a strong model has an ROC index above 0.7.
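
The ROC index can also be computed as a concordance statistic, the interpretation named in the summary statistics table: the fraction of all (primary, secondary) case pairs in which the primary outcome case has the higher predicted value, with ties counted as one half. The sketch below also applies the common relation Gini = 2 × ROC index − 1; the function names and data are hypothetical.

```python
from itertools import product


def roc_index(y, scores):
    """Concordance: share of (primary, secondary) pairs ranked correctly; ties count 1/2."""
    primary = [s for s, yi in zip(scores, y) if yi == 1]
    secondary = [s for s, yi in zip(scores, y) if yi == 0]
    concordant = 0.0
    for sp, ss in product(primary, secondary):
        if sp > ss:
            concordant += 1.0
        elif sp == ss:
            concordant += 0.5
    return concordant / (len(primary) * len(secondary))


def gini(y, scores):
    """Gini coefficient under the common definition 2 * ROC index - 1."""
    return 2.0 * roc_index(y, scores) - 1.0


# Toy data (hypothetical): 1 = primary outcome, 0 = secondary outcome
y = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1]
print(roc_index(y, scores))  # 5 of 6 pairs concordant
print(gini(y, scores))
```

For the toy data, 5 of the 6 pairs are concordant, so the ROC index is about 0.83, above the 0.7 rule of thumb for a strong model.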

Comparing Models with ROC Charts
This demonstration illustrates the use of ROC charts to compare models.

Statistical Graphics – Response Chart
[Figure: response chart. y-axis: cumulative percent response, 0% to 100%; x-axis: percent selected, 0% to 100%.]
The response chart shows the expected response rate for various selection percentages.

Each point on the response chart corresponds to a specific fraction of cases, ordered by their predicted values.

For example, one point on the response chart corresponds to the top 40%, that is, the 40% of cases with the highest predicted values.

The x-coordinate shows the percentage of cases selected, here 40%.

The y-coordinate shows the cumulative percent response: the percentage of the cases in the top 40% that have the primary outcome.

Repeating this calculation for all selection fractions traces out the response chart.
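
A sketch of the cumulative percent response at a few selection percentages, assuming the same 1/0 target coding and rounding the selection depth to the nearest whole case. The function and data are hypothetical.

```python
def cumulative_percent_response(y, scores, percents):
    """Percent of primary outcome cases among the top `pct` percent of ranked cases."""
    ranked = [yi for _, yi in sorted(zip(scores, y), reverse=True)]
    n = len(ranked)
    result = {}
    for pct in percents:
        k = max(1, round(n * pct / 100))  # number of cases selected
        result[pct] = 100.0 * sum(ranked[:k]) / k
    return result


# Toy data (hypothetical): 1 = primary outcome, 0 = secondary outcome
y = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.30, 0.80, 0.55, 0.20, 0.65, 0.10, 0.40, 0.05, 0.25]
print(cumulative_percent_response(y, scores, [20, 40, 100]))
```

At 100% selected, the value equals the overall response rate of the data, 30% here.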

6.01 Poll
In practice, modelers often use several tools, sometimes both graphical and numerical, to choose a best model.
  True
  False

6.01 Poll – Correct Answer
True. In practice, modelers often use several tools, sometimes both graphical and numerical, to choose a best model.

Comparing Models with Score Rankings Plots
This demonstration illustrates comparing models with Score Rankings plots.

Adjusting for Separate Sampling
This demonstration illustrates how to adjust for separate sampling in SAS Enterprise Miner.

6.3 Adjusting for Separate Sampling

Outcome Overrepresentation
A common predictive modeling practice is to build models from a sample whose primary outcome proportion differs from that of the original population.

Separate Sampling
[Figure: the population divided into primary outcome cases and secondary outcome cases.]
Target-based samples are created by considering the primary outcome cases separately from the secondary outcome cases.

Select all of the primary outcome cases and only some of the secondary outcome cases.
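
A minimal Python sketch of separate sampling: keep every primary outcome case and a random fraction of the secondary outcome cases. The dictionary field `target`, the `secondary_fraction` parameter, the seed, and the toy population are assumptions for illustration.

```python
import random


def separate_sample(cases, secondary_fraction, seed=0):
    """Keep every primary outcome case and a random subset of secondary outcome cases."""
    rng = random.Random(seed)
    primary = [c for c in cases if c["target"] == 1]
    secondary = [c for c in cases if c["target"] == 0]
    k = round(len(secondary) * secondary_fraction)
    return primary + rng.sample(secondary, k)


# Hypothetical population: 1,000 cases, 50 (5%) with the primary outcome
population = [{"id": i, "target": 1 if i < 50 else 0} for i in range(1000)]
sample = separate_sample(population, secondary_fraction=0.10)
share = sum(c["target"] for c in sample) / len(sample)
print(len(sample), round(share, 3))
```

Keeping 10% of the 950 secondary outcome cases yields 145 cases, of which about 34.5% have the primary outcome, compared with 5% in the population.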

The Modeling Sample
+ Similar predictive power with smaller case count
− Must adjust assessment statistics and graphics
− Must adjust prediction estimates for bias
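
One common way to remove the bias in prediction estimates is Bayes-rule rescaling: weight the predicted probability of each outcome by the ratio of its population prior to its proportion in the modeling sample, then renormalize. The sketch assumes a binary target; the function and parameter names are hypothetical, and this is not necessarily how SAS Enterprise Miner implements the adjustment.

```python
def adjust_for_priors(p_sample, pop_prior, sample_prop):
    """Rescale a primary outcome probability from the sample to the population.

    p_sample    : predicted primary outcome probability from the model
    pop_prior   : primary outcome proportion in the original population
    sample_prop : primary outcome proportion in the modeling sample
    """
    w1 = p_sample * pop_prior / sample_prop
    w0 = (1.0 - p_sample) * (1.0 - pop_prior) / (1.0 - sample_prop)
    return w1 / (w1 + w0)


print(adjust_for_priors(0.5, pop_prior=0.05, sample_prop=0.5))
print(adjust_for_priors(0.8, pop_prior=0.05, sample_prop=0.5))
```

A case predicted at the sample's own primary outcome rate (0.5) maps back to the population rate (0.05).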

Adjusting for Separate Sampling (continued)
This demonstration illustrates how to adjust for separate sampling in SAS Enterprise Miner.

Creating a Profit Matrix
This demonstration illustrates how to create a profit matrix.

6.4 Profit Matrices

Profit Matrices
           primary outcome   secondary outcome
solicit    15.14             -0.68
ignore     0                 0
The solicit row gives the profit distribution for the solicit decision.

Decision Expected Profits
Expected profit (solicit) = $15.14\,\hat{p}_1 - 0.68\,\hat{p}_0$
Expected profit (ignore) = $0$
Choose the decision with the larger expected profit. Here $\hat{p}_1$ and $\hat{p}_0$ are the predicted probabilities of the primary and secondary outcomes.
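
The decision rule can be sketched in Python using the profit matrix above; the dictionary layout and the function name are assumptions for illustration.

```python
# Profit matrix from the slides: decision -> (primary outcome, secondary outcome)
PROFIT = {
    "solicit": (15.14, -0.68),
    "ignore": (0.0, 0.0),
}


def best_decision(p1):
    """Return the decision with the larger expected profit, plus all expected profits."""
    p0 = 1.0 - p1
    expected = {d: v1 * p1 + v0 * p0 for d, (v1, v0) in PROFIT.items()}
    return max(expected, key=expected.get), expected


print(best_decision(0.10))  # solicit: 15.14*0.10 - 0.68*0.90 > 0
print(best_decision(0.02))  # ignore: 15.14*0.02 - 0.68*0.98 < 0
```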

Decision Threshold
$\hat{p}_1 \ge 0.68/15.82 \Rightarrow$ Solicit
$\hat{p}_1 < 0.68/15.82 \Rightarrow$ Ignore
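
Assuming a binary target, so that $\hat{p}_0 = 1 - \hat{p}_1$, requiring the expected profit of soliciting to be at least the zero expected profit of ignoring gives the threshold; 15.82 is simply 15.14 + 0.68:

```latex
\begin{aligned}
15.14\,\hat{p}_1 - 0.68\,(1 - \hat{p}_1) &\ge 0 \\
15.82\,\hat{p}_1 - 0.68 &\ge 0 \\
\hat{p}_1 &\ge 0.68 / 15.82 \approx 0.043
\end{aligned}
```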

Average Profit
Average profit = $(15.14\,N_{PS} - 0.68\,N_{SS}) / N$, where
$N_{PS}$ = number of solicited primary outcome cases
$N_{SS}$ = number of solicited secondary outcome cases
$N$ = total number of assessment cases
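
A sketch of average profit over an assessment data set, soliciting every case whose predicted primary outcome probability meets the 0.68/15.82 threshold. The function name and toy data are hypothetical.

```python
THRESHOLD = 0.68 / 15.82  # solicit when p1 >= this value (about 0.043)


def average_profit(y, p1, threshold=THRESHOLD):
    """(15.14 * N_PS - 0.68 * N_SS) / N over an assessment data set."""
    solicited = [yi for yi, pi in zip(y, p1) if pi >= threshold]
    n_ps = sum(solicited)          # solicited primary outcome cases
    n_ss = len(solicited) - n_ps   # solicited secondary outcome cases
    return (15.14 * n_ps - 0.68 * n_ss) / len(y)


# Toy assessment data (hypothetical): 1 = primary outcome, 0 = secondary outcome
y = [1, 0, 0, 0, 1, 0]
p1 = [0.30, 0.05, 0.01, 0.02, 0.04, 0.10]
print(average_profit(y, p1))
```

Three cases are solicited (one primary, two secondary), so the average profit is (15.14 − 2 × 0.68) / 6, roughly 2.30. The fifth case has the primary outcome, but its predicted probability (0.04) falls just below the threshold, so it is ignored and contributes nothing.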

Evaluating Model Profit
This demonstration illustrates the consequences of incorporating a profit matrix.

Viewing Additional Assessments
This demonstration illustrates several other assessments of possible interest.

Optimizing with Profit (Self-Study)
This demonstration illustrates optimizing your model strictly on profit.

Exercises
These exercises reinforce the concepts discussed previously.

Assessment Tools Review
Compare model summary statistics and statistical graphics.
Create decision data; add prior probabilities and profit matrices.
Tune models with average squared error or an appropriate profit matrix.
Obtain means and other statistics on data source variables.