Data Science, Credibility: Evaluating What’s Been Learned. Evaluating Numeric Prediction (WFH: Data Mining, Section 5.8). Rodney Nielsen.

Presentation transcript:

Data Science
Credibility: Evaluating What’s Been Learned
Evaluating Numeric Prediction
WFH: Data Mining, Section 5.8
Rodney Nielsen
Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall

Credibility: Evaluating What’s Been Learned
- Issues: training, testing, tuning
- Predicting performance
- Holdout, cross-validation, bootstrap
- Comparing schemes: the t-test
- Predicting probabilities: loss functions
- Cost-sensitive measures
- Evaluating numeric prediction
- The Minimum Description Length principle

Evaluating Numeric Prediction
- Same strategies: independent training, validation, and test sets, significance tests, etc. (avoid cross-validation and bootstrapping for reporting)
- Difference: the error measures
- Actual target values: $y_1, y_2, \ldots, y_N$
- Predicted target values: $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N$
- Most popular measure: mean-squared error, $\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2$
- Easy to manipulate mathematically
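As a minimal sketch of the mean-squared error named on this slide, assuming the usual definition (the average of the squared differences between predicted and actual target values). The function name and the sample numbers are illustrative and do not come from the slides.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between predictions and actual targets."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical targets and predictions, for illustration only.
actual = [500.0, 200.0, 300.0]
predicted = [550.0, 190.0, 300.0]

# Errors are 50, -10 and 0, so the squared errors sum to 2600.
mse = mean_squared_error(actual, predicted)
print(mse)
```

Squaring means the single error of 50 contributes 2500 of the 2600 total, which is why one large miss can dominate this measure.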

Other Measures
- The root mean-squared error: $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}$
- The mean absolute error is less sensitive to outliers than the mean-squared error: $\frac{1}{N}\sum_{i=1}^{N}|\hat{y}_i - y_i|$
- Sometimes relative error values are more appropriate (e.g. 10% for an error of 50 when predicting 500)
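The outlier claim on this slide can be checked with a small sketch. It assumes the standard definitions of root mean-squared error and mean absolute error; the function names and the toy data are hypothetical.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared prediction error."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_absolute_error(actual, predicted):
    """Average magnitude of the prediction error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


# Toy data (illustrative only).
actual = [10.0, 20.0, 30.0, 40.0]
clean = [11.0, 19.0, 31.0, 39.0]    # every prediction is off by 1
outlier = [11.0, 19.0, 31.0, 60.0]  # the last prediction is off by 20

rmse_clean = root_mean_squared_error(actual, clean)
mae_clean = mean_absolute_error(actual, clean)
rmse_outlier = root_mean_squared_error(actual, outlier)
mae_outlier = mean_absolute_error(actual, outlier)

# The slide's relative-error example: an error of 50 when predicting 500.
relative_error = 50.0 / 500.0

print(rmse_clean, mae_clean)
print(rmse_outlier, mae_outlier)
print(relative_error)
```

On the clean predictions both measures equal 1. With the single outlier, the mean absolute error rises to 5.75 while the root mean-squared error rises to roughly 10, so the squared measure reacts more strongly to the one bad prediction.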

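Improvement on the Mean
- How much does the scheme improve on simply predicting the average?
- The relative squared error is: $\frac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$, where $\bar{y}$ is the average used as the baseline prediction
- Root relative squared error: $\sqrt{\frac{\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}}$
- Relative absolute error: $\frac{\sum_{i=1}^{N}|\hat{y}_i - y_i|}{\sum_{i=1}^{N}|y_i - \bar{y}|}$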
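A possible sketch of the three relative measures, assuming each one divides the scheme's total error by the total error of a baseline that always predicts one average value. The `baseline` parameter is a hypothetical convenience: it defaults to the mean of the supplied actual values, although in practice the average may be taken from the training data instead.

```python
import math


def relative_errors(actual, predicted, baseline=None):
    """Return relative squared, root relative squared, and relative absolute error.

    Each measure compares the scheme's error with that of always predicting
    `baseline`. Undefined (division by zero) if every actual value equals
    the baseline.
    """
    if baseline is None:
        # Assumption: use the mean of the actual values as the baseline.
        baseline = sum(actual) / len(actual)
    sq_scheme = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    sq_baseline = sum((a - baseline) ** 2 for a in actual)
    abs_scheme = sum(abs(p - a) for a, p in zip(actual, predicted))
    abs_baseline = sum(abs(a - baseline) for a in actual)
    rse = sq_scheme / sq_baseline
    return {"rse": rse, "rrse": math.sqrt(rse), "rae": abs_scheme / abs_baseline}


# Toy data (illustrative only).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

scheme_scores = relative_errors(actual, predicted)
# Predicting the average everywhere scores exactly 1 on every measure.
mean_scores = relative_errors(actual, [25.0] * len(actual))

print(scheme_scores)
print(mean_scores)
```

Values below 1 (or below 100% when written as percentages) indicate an improvement on predicting the average; values above 1 indicate the scheme does worse than that baseline.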

Correlation Coefficient
- Measures the statistical correlation between the predicted values and the actual values
- Pearson product-moment correlation coefficient, ρ (rho)
- Scale independent, between -1 and +1
- Good performance leads to large values (close to +1)
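The scale independence mentioned on this slide can be illustrated with a short sketch. It assumes the standard Pearson formula (covariance divided by the product of the standard deviations); the function name and data are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy data (illustrative only).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
# The same predictions multiplied by 10 and shifted by 5: far from the targets.
rescaled = [10.0 * p + 5.0 for p in predicted]

r = pearson_correlation(actual, predicted)
r_rescaled = pearson_correlation(actual, rescaled)

print(r, r_rescaled)
```

Both calls give the same coefficient even though the rescaled predictions are numerically far from the targets. A high correlation therefore shows that predictions move with the actual values, not that the errors are small, which is one reason to read it alongside the error measures.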

Pearson product-moment correlation coefficient
[Figure: examples of scatter diagrams with different values of the correlation coefficient (ρ)]

Pearson product-moment correlation coefficient
[Figure: several sets of (x, y) points, with the correlation coefficient of x and y for each set]
- The correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row).
- Note: the figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.
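The points made about the figure can be reproduced numerically. The sketch below is an illustration under the standard Pearson definition; the helper name is hypothetical, and returning `None` for the undefined case is a design choice made here, not something the slides prescribe.

```python
import math


def pearson_or_none(xs, ys):
    """Pearson correlation, or None when either variable has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined: the denominator would be zero
    return cov / math.sqrt(var_x * var_y)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

steep = pearson_or_none(xs, [5.0 * x for x in xs])     # steep rising line
shallow = pearson_or_none(xs, [0.1 * x for x in xs])   # shallow rising line
falling = pearson_or_none(xs, [-3.0 * x for x in xs])  # falling line
flat = pearson_or_none(xs, [3.0] * len(xs))            # slope 0, constant Y
parabola = pearson_or_none(xs, [x * x for x in xs])    # exact nonlinear relation

print(steep, shallow, falling, flat, parabola)
```

The two rising lines have very different slopes but the same coefficient, the falling line only flips the sign, the flat line has no defined coefficient, and the parabola scores zero despite Y being fully determined by X.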

Which Measure?
- Best to look at all of them
- Often it doesn’t matter
- Student Q: In what situations would we want to use the correlation coefficient as a performance measure for numeric prediction?
- Example:

                               A        B        C        D
Root mean-squared error        67.8     91.7     63.3     57.4
Mean absolute error            41.3     38.5     33.4     29.2
Root relative squared error    42.2%    57.2%    39.4%    35.8%
Relative absolute error        43.1%    40.1%    34.8%    30.4%
Correlation coefficient        0.88     0.88     0.89     0.91

- D best
- C second-best
- A, B arguable
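The conclusion of this slide can be checked by ranking the four schemes on each error measure. This is a rough sketch that assumes the flattened values map to schemes in the order D, C, B, A as listed on the slide; the dictionary layout and variable names are illustrative.

```python
# Error measures from the slide's example (lower is better for each).
# Assumption: values are assigned to schemes following the slide's D C B A order.
errors = {
    "Root mean-squared error": {"A": 67.8, "B": 91.7, "C": 63.3, "D": 57.4},
    "Mean absolute error": {"A": 41.3, "B": 38.5, "C": 33.4, "D": 29.2},
    "Root relative squared error": {"A": 42.2, "B": 57.2, "C": 39.4, "D": 35.8},
    "Relative absolute error": {"A": 43.1, "B": 40.1, "C": 34.8, "D": 30.4},
}

# For each measure, order the schemes from smallest error to largest.
rankings = {
    measure: sorted(scores, key=scores.get) for measure, scores in errors.items()
}

for measure, order in rankings.items():
    print(f"{measure}: {' < '.join(order)}")
```

Every measure places D first and C second. A beats B on the two squared-error measures, while B beats A on the two absolute-error measures, which is why the choice between A and B is arguable.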