Data Science Credibility: Evaluating What’s Been Learned. Evaluating Numeric Prediction. WFH: Data Mining, Section 5.8. Rodney Nielsen.

Presentation transcript:

Data Science
Credibility: Evaluating What’s Been Learned
Evaluating Numeric Prediction
WFH: Data Mining, Section 5.8
Rodney Nielsen
Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall

Credibility: Evaluating What’s Been Learned
Issues: training, testing, tuning
Predicting performance
Holdout, cross-validation, bootstrap
Comparing schemes: the t-test
Predicting probabilities: loss functions
Cost-sensitive measures
Evaluating numeric prediction
The Minimum Description Length principle

Evaluating Numeric Prediction
Same strategies: independent training, validation and test sets, significance tests, etc. (avoid cross-validation and bootstrapping for reporting)
Difference: error measures
Actual target values: $y_1, y_2, \ldots, y_N$
Predicted target values: $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N$
Most popular measure: mean-squared error
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$
Easy to manipulate mathematically
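As a minimal sketch of the mean-squared error described on this slide, the function below averages the squared differences between predicted and actual target values. The function name and the toy data are hypothetical and chosen only for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between predicted and actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data (not from the slides).
actual = [500.0, 300.0, 200.0, 400.0]
predicted = [550.0, 280.0, 210.0, 390.0]

# Errors are 50, -20, 10, -10, so the squared errors sum to 3100.
mse = mean_squared_error(actual, predicted)
print(mse)  # 775.0
```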

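Other Measures
The root mean-squared error:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$
The mean absolute error is less sensitive to outliers than the mean-squared error:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|$$
Sometimes relative error values are more appropriate (e.g. 10% for an error of 50 when predicting 500)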
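The claim about outlier sensitivity can be checked with a small sketch. The two prediction lists below are hypothetical: they differ only in the last value, where one of them makes a single large error. The function names are illustrative, not taken from any library.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Square root of the mean of squared errors."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def mean_absolute_error(actual, predicted):
    """Mean of the absolute errors."""
    n = len(actual)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / n


# Hypothetical toy data (not from the slides).
actual = [10.0, 20.0, 30.0, 40.0]
close = [12.0, 18.0, 32.0, 38.0]    # every error has magnitude 2
outlier = [12.0, 18.0, 32.0, 60.0]  # the last error has magnitude 20

rmse_close = root_mean_squared_error(actual, close)
mae_close = mean_absolute_error(actual, close)
rmse_outlier = root_mean_squared_error(actual, outlier)
mae_outlier = mean_absolute_error(actual, outlier)

print(rmse_close, mae_close)      # 2.0 2.0
print(rmse_outlier, mae_outlier)  # about 10.15 and 6.5
```

On this toy data the single large error roughly quintuples the root mean-squared error but only about triples the mean absolute error.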

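Improvement on the Mean
How much does the scheme improve on simply predicting the average?
The relative squared error is:
$$\mathrm{RSE} = \frac{\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
where $\bar{y}$ is the average target value.
Root relative squared error:
$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}}$$
Relative absolute error:
$$\mathrm{RAE} = \frac{\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|}{\sum_{i=1}^{N}\left|y_i - \bar{y}\right|}$$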
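The relative measures can be sketched as follows. One assumption is made loudly here: the average is computed over the same actual values that are being scored, since the slide does not say which data the average comes from. The function name and toy data are hypothetical.

```python
import math


def relative_errors(actual, predicted):
    """Relative squared, root relative squared, and relative absolute error.

    Assumption: the baseline predictor is the mean of `actual` itself.
    """
    mean_actual = sum(actual) / len(actual)
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    sq_base = sum((a - mean_actual) ** 2 for a in actual)
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    abs_base = sum(abs(a - mean_actual) for a in actual)
    if sq_base == 0:
        raise ValueError("actual values are constant; relative errors are undefined")
    rse = sq_err / sq_base
    return {"rse": rse, "rrse": math.sqrt(rse), "rae": abs_err / abs_base}


# Hypothetical toy data (not from the slides).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 32.0, 38.0]

scores = relative_errors(actual, predicted)
print(scores)

# Always predicting the mean (25.0) scores exactly 1.0 on every measure,
# so values below 1.0 indicate an improvement on the mean.
baseline = relative_errors(actual, [25.0] * len(actual))
print(baseline)
```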

Correlation Coefficient
Measures the statistical correlation between the predicted values and the actual values
Pearson product-moment correlation coefficient, ρ (rho)
Scale independent, between –1 and +1
Good performance leads to large values (close to +1)
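A minimal sketch of the Pearson correlation between actual and predicted values is given below, using the standard covariance-over-standard-deviations form. The toy predictions are hypothetical and deliberately ten times too large, to illustrate what scale independence means in practice: the correlation is perfect even though the errors are large.

```python
import math


def pearson_correlation(actual, predicted):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined when either sequence is constant")
    return cov / math.sqrt(var_a * var_p)


# Hypothetical toy data (not from the slides).
actual = [1.0, 2.0, 3.0, 4.0]
scaled = [10.0, 20.0, 30.0, 40.0]     # right ordering, wrong scale
reversed_ = [40.0, 30.0, 20.0, 10.0]  # opposite ordering

r_scaled = pearson_correlation(actual, scaled)
r_reversed = pearson_correlation(actual, reversed_)
mae_scaled = sum(abs(p - a) for a, p in zip(actual, scaled)) / len(actual)

print(r_scaled, r_reversed)  # close to 1.0 and -1.0
print(mae_scaled)            # 22.5, despite the perfect correlation
```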

Pearson product-moment correlation coefficient
[Figure: examples of scatter diagrams with different values of the correlation coefficient (ρ)]

Pearson product-moment correlation coefficient
[Figure: several sets of (x, y) points, with the correlation coefficient of x and y for each set.]
Note that the correlation reflects the strength (noisiness) and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom).
Note: the figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.
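The three observations about the figure can be reproduced numerically. The sketch below uses hypothetical point sets and a hand-written helper that returns `None` when the coefficient is undefined; that return convention is a choice made for this illustration only.

```python
import math


def correlation_or_none(xs, ys):
    """Pearson correlation, or None when either variable has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined: division by zero variance
    return cov / math.sqrt(var_x * var_y)


# Hypothetical point sets (not read off the figure).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

steep = correlation_or_none(xs, [2.0 * x for x in xs])    # slope 2
shallow = correlation_or_none(xs, [0.5 * x for x in xs])  # slope 0.5
flat = correlation_or_none(xs, [3.0 for _ in xs])         # slope 0, constant y
parabola = correlation_or_none(xs, [x * x for x in xs])   # exact nonlinear relation

print(steep, shallow)  # both close to 1.0: the slope does not matter
print(flat)            # None: the variance of y is zero
print(parabola)        # 0.0: a perfect but nonlinear relationship goes undetected
```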

Which Measure?
Best to look at all of them
Often it doesn’t matter
Student Q: In what situations would we want to use the correlation coefficient as a performance measure for numeric prediction?
Example:

                               A        B        C        D
Root mean-squared error        67.8     91.7     63.3     57.4
Mean absolute error            41.3     38.5     33.4     29.2
Root relative squared error    42.2%    57.2%    39.4%    35.8%
Relative absolute error        43.1%    40.1%    34.8%    30.4%
Correlation coefficient        0.88     0.88     0.89     0.91

D best, C second-best; A, B arguable
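The conclusion drawn from the example can be checked by ranking the four schemes under each error measure. The sketch below copies the four error rows from the slide (lower is better) and leaves out the correlation row; the dictionary layout and function name are illustrative choices.

```python
# Error measures for schemes A-D, as listed in the example (lower is better).
results = {
    "A": {"rmse": 67.8, "mae": 41.3, "rrse": 42.2, "rae": 43.1},
    "B": {"rmse": 91.7, "mae": 38.5, "rrse": 57.2, "rae": 40.1},
    "C": {"rmse": 63.3, "mae": 33.4, "rrse": 39.4, "rae": 34.8},
    "D": {"rmse": 57.4, "mae": 29.2, "rrse": 35.8, "rae": 30.4},
}


def ranking(measure):
    """Scheme names ordered from best (smallest error) to worst."""
    return sorted(results, key=lambda scheme: results[scheme][measure])


rankings = {m: ranking(m) for m in ("rmse", "mae", "rrse", "rae")}
for measure, order in rankings.items():
    print(measure, order)
```

Every measure puts D first and C second. A beats B on the squared-error measures, while B beats A on the absolute-error measures, which is why the ordering of A and B is arguable.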