Data Mining Practical Machine Learning Tools and Techniques By I. H. Witten, E. Frank and M. A. Hall Chapter 5: Credibility: Evaluating What’s Been Learned.


Data Mining: Practical Machine Learning Tools and Techniques, by I. H. Witten, E. Frank and M. A. Hall
Chapter 5: Credibility: Evaluating What’s Been Learned
Rodney Nielsen
Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall

Rodney Nielsen, Human Intelligence & Language Technologies Lab
Credibility: Evaluating What’s Been Learned
● Issues: training, testing, tuning
● Predicting performance
● Holdout, cross-validation, bootstrap
● Comparing schemes: the t-test
● Predicting probabilities: loss functions
● Cost-sensitive measures
● Evaluating numeric prediction
● The Minimum Description Length principle


Evaluating Numeric Prediction
● Same strategies: independent test set, cross-validation, significance tests, etc.
● Difference: error measures
● Actual target values: y1, y2, …, yN
● Predicted target values: ŷ1, ŷ2, …, ŷN
● Most popular measure: the mean-squared error, MSE = ((ŷ1 − y1)² + … + (ŷN − yN)²) / N
● Easy to manipulate mathematically
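The mean-squared error above can be sketched in a few lines of Python; the actual and predicted target values below are invented purely for illustration.

```python
# Hypothetical actual and predicted target values (illustrative only).
actual    = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5,  0.0, 2.0, 8.0]

def mean_squared_error(y, y_hat):
    """Average of squared differences between predictions and targets."""
    return sum((p - a) ** 2 for a, p in zip(y, y_hat)) / len(y)

print(mean_squared_error(actual, predicted))  # 0.375
```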

Other Measures
● The root mean-squared error: RMSE = sqrt(((ŷ1 − y1)² + … + (ŷN − yN)²) / N)
● The mean absolute error is less sensitive to outliers than the mean-squared error: MAE = (|ŷ1 − y1| + … + |ŷN − yN|) / N
● Sometimes relative error values are more appropriate (e.g. 10% for an error of 50 when predicting 500)

Improvement on the Mean
● How much does the scheme improve on simply predicting the average ȳ?
● The relative squared error is: RSE = Σ(ŷi − yi)² / Σ(ȳ − yi)²
● Root relative squared error: RRSE = sqrt(RSE)
● Relative absolute error: RAE = Σ|ŷi − yi| / Σ|ȳ − yi|
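The relative measures compare the scheme's errors to the baseline that always predicts the training mean. A minimal sketch, with invented values (the variable names are ours, not from the slides):

```python
import math

# Hypothetical actual and predicted target values (illustrative only).
actual    = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5,  0.0, 2.0, 8.0]
mean_a = sum(actual) / len(actual)  # the "just predict the average" baseline

# Squared errors of the model vs. the mean-predicting baseline.
sse_model    = sum((p - a) ** 2 for a, p in zip(actual, predicted))
sse_baseline = sum((mean_a - a) ** 2 for a in actual)
relative_squared_error      = sse_model / sse_baseline
root_relative_squared_error = math.sqrt(relative_squared_error)

# Absolute errors, same idea.
sae_model    = sum(abs(p - a) for a, p in zip(actual, predicted))
sae_baseline = sum(abs(mean_a - a) for a in actual)
relative_absolute_error = sae_model / sae_baseline
```

A value well below 1 means the scheme clearly beats the trivial mean predictor.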

Correlation Coefficient
● Measures the statistical correlation between the predicted values and the actual values
● Pearson product-moment correlation coefficient, ρ
● Scale independent, between −1 and +1
● Good performance leads to values close to +1
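A minimal sketch of the Pearson coefficient computed from its definition; the `pearson_r` helper and the sample data are ours, not from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Nearly linear data gives a coefficient close to +1.
print(pearson_r([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```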

Pearson product-moment correlation coefficient
[Figure: examples of scatter diagrams with different values of the correlation coefficient (ρ)]

Pearson product-moment correlation coefficient
[Figure: several sets of (x, y) points, with the correlation coefficient of x and y for each set.] Note that the correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row). N.B.: the figure in the center of the middle row has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.

Which Measure?
● Best to look at all of them
● Often it doesn’t matter
● Example:

                            A        B        C        D
Root mean-squared error
Mean absolute error
Root rel squared error    42.2%    57.2%    39.4%    35.8%
Relative absolute error   43.1%    40.1%    34.8%    30.4%
Correlation coefficient

● D best
● C second-best
● A, B arguable

Credibility: Evaluating What’s Been Learned
● Issues: training, testing, tuning
● Predicting performance
● Holdout, cross-validation, bootstrap
● Comparing schemes: the t-test
● Predicting probabilities: loss functions
● Cost-sensitive measures
● Evaluating numeric prediction
● The Minimum Description Length principle

The MDL Principle
● MDL stands for minimum description length
● The description length is defined as: space required to describe a theory + space required to describe the theory’s mistakes
● In our case the theory is the classifier and the mistakes are the errors on the training data
● Aim: we seek the classifier with minimal description length
● The MDL principle is a model selection criterion

Model Selection Criteria
● Model selection criteria attempt to find a good compromise between:
  ● The complexity of a model
  ● Its prediction accuracy on the training data
● Reasoning: a good model is a simple model that achieves high accuracy on the given data
● Occam’s Razor: the best theory is the smallest one that describes all the facts
William of Ockham, born in the village of Ockham in Surrey (England) about 1285, was the most influential philosopher of the 14th century and a controversial theologian.

Elegance vs. Errors
● Theory 1: a very simple, elegant theory that explains the data almost perfectly
● Theory 2: a significantly more complex theory that reproduces the data without mistakes
● Theory 1 is probably preferable
● Classical example: Kepler’s three laws of planetary motion, which were less accurate than Copernicus’s latest refinement of the Ptolemaic theory of epicycles

MDL and Compression
● The MDL principle relates to data compression:
● The best theory is the one that compresses the data the most
● I.e., to compress a dataset we generate a model and then store the model and its mistakes
● We need to compute (a) the size of the model, and (b) the space needed to encode the errors
● For (b), use the informational loss function
● For (a), we need a method to encode the model
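Point (b) can be illustrated with the informational loss function, which charges −log₂ p bits for an instance whose true class the model assigned probability p; the probabilities below are invented for illustration.

```python
import math

# Hypothetical probabilities the model assigned to each instance's
# true class (invented values, for illustration only).
predicted_prob_of_true_class = [0.9, 0.8, 0.6, 0.99]

# Informational loss: -log2 p bits per instance; confident, correct
# predictions cost few bits, uncertain ones cost more.
error_bits = sum(-math.log2(p) for p in predicted_prob_of_true_class)
```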

MDL and Bayes’ Theorem
● L(T) = “length” of the theory
● L(E|T) = training set encoded with respect to the theory
● Description length = L(T) + L(E|T)
● Bayes’ theorem gives the a posteriori probability of a theory given the data: Pr(T|E) = Pr(E|T) Pr(T) / Pr(E)
● Taking negative logarithms: −log Pr(T|E) = −log Pr(E|T) − log Pr(T) + log Pr(E), where log Pr(E) is a constant
● So maximizing Pr(T|E) is equivalent to minimizing L(E|T) + L(T)
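The equivalence can be checked numerically. The two toy "theories" below, together with their priors and likelihoods, are invented for illustration only.

```python
import math

# Two invented theories: name -> (prior Pr(T), likelihood Pr(E|T)).
theories = {
    "simple":  (0.60, 0.10),
    "complex": (0.05, 0.90),
}

def description_length_bits(prior, likelihood):
    """L(T) + L(E|T) in bits, i.e. -log2 Pr(T) - log2 Pr(E|T)."""
    return -math.log2(prior) - math.log2(likelihood)

# Minimizing description length picks the same theory as
# maximizing the (unnormalized) posterior Pr(E|T) * Pr(T).
best_mdl = min(theories, key=lambda t: description_length_bits(*theories[t]))
best_map = max(theories, key=lambda t: theories[t][0] * theories[t][1])
```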

MDL and MAP
● MAP stands for maximum a posteriori probability
● Finding the MAP theory corresponds to finding the MDL theory
● The difficult bit in applying the MAP principle: determining the prior probability Pr(T) of the theory
● This corresponds to the difficult part in applying the MDL principle: the coding scheme for the theory
● I.e., if we know a priori that a particular theory is more likely, we need fewer bits to encode it

Discussion of the MDL Principle
● Advantage: makes full use of the training data when selecting a model
● Disadvantage 1: an appropriate coding scheme/prior probabilities for theories are crucial
● Disadvantage 2: no guarantee that the MDL theory is the one that minimizes the expected error
● Note: Occam’s Razor is an axiom!
● Epicurus’ principle of multiple explanations: keep all theories that are consistent with the data

MDL and Clustering
● Description length of the theory: bits needed to encode the clusters (e.g., the cluster centers)
● Description length of the data given the theory: encode cluster membership and position relative to the cluster (e.g., distance to the cluster center)
● Works if the coding scheme uses less code space for small numbers than for large ones
● With nominal attributes, must communicate the probability distributions for each cluster
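As a toy illustration (the encoding costs here are invented, not a scheme from the book), a clustering description length might charge a fixed number of bits per cluster center, plus per-point bits that grow with the distance to the assigned center:

```python
import math

def clustering_dl(points, centers, bits_per_center=16):
    """Toy description length: theory bits + data-given-theory bits."""
    # Theory: a fixed (invented) cost per cluster center.
    theory_bits = bits_per_center * len(centers)
    data_bits = 0.0
    for p in points:
        d = min(abs(p - c) for c in centers)  # distance to nearest center
        # Membership costs log2(k) bits; small residuals cost fewer
        # bits than large ones, as the slide requires.
        data_bits += math.log2(len(centers)) + math.log2(1 + d)
    return theory_bits + data_bits
```

With this toy encoding, two tight, well-separated clusters compress the data better than a single center does, even though the theory itself is twice as long.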