INTRODUCTION TO Machine Learning 3rd Edition

Lecture Slides for INTRODUCTION TO Machine Learning, 3rd Edition. ETHEM ALPAYDIN © The MIT Press, 2014. alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml3e

CHAPTER 19: Design and Analysis of Machine Learning Experiments

Introduction
Questions:
- Assessment of the expected error of a learning algorithm: is the error rate of 1-NN less than 2%?
- Comparing the expected errors of two algorithms: is k-NN more accurate than MLP?
Training/validation/test sets
Resampling methods: K-fold cross-validation

Algorithm Preference
Criteria (application-dependent):
- Misclassification error, or risk (loss functions)
- Training time/space complexity
- Testing time/space complexity
- Interpretability
- Easy programmability
Cost-sensitive learning

Factors and Response
- The response function is based on the output to be maximized and depends on controllable factors.
- Uncontrollable factors introduce randomness.
- Goal: find the configuration of controllable factors that maximizes the response and is minimally affected by uncontrollable factors.

Strategies of Experimentation How to search the factor space? Response surface design for approximating and maximizing the response function in terms of the controllable factors

Guidelines for ML experiments
1. Aim of the study
2. Selection of the response variable
3. Choice of factors and levels
4. Choice of experimental design
5. Performing the experiment
6. Statistical analysis of the data
7. Conclusions and recommendations

Resampling and K-Fold Cross-Validation
The need for multiple training/validation sets: {X_i, V_i}_i are the training/validation sets of fold i.
K-fold cross-validation: divide X into K equal parts X_i, i = 1,...,K; fold i uses V_i = X_i for validation and T_i = X − X_i for training.
Any two training sets T_i share K − 2 of the K parts.
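A minimal sketch of K-fold cross-validation (assuming scikit-learn is available; the dataset, the 1-NN classifier, and K = 10 are toy placeholders, not from the slides):

```python
# K-fold cross-validation sketch: K train/validation splits, one error per fold.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # toy data
y = (X[:, 0] + X[:, 1] > 0).astype(int)

K = 10
kf = KFold(n_splits=K, shuffle=True, random_state=0)
fold_errors = []
for train_idx, val_idx in kf.split(X):
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(X[train_idx], y[train_idx])
    fold_errors.append(1.0 - clf.score(X[val_idx], y[val_idx]))   # error rate on fold i

print("per-fold error rates:", np.round(fold_errors, 3))
print("mean error:", np.mean(fold_errors))
```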

5×2 Cross-Validation
Five replications of 2-fold cross-validation (Dietterich, 1998): in each replication the data are split into two halves, each half serving once for training and once for validation, giving ten training/validation pairs.
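A sketch of the 5×2 scheme under the same toy assumptions (scikit-learn, synthetic data, 1-NN as the classifier): five replications, each a stratified 2-fold split, yielding ten error estimates:

```python
# 5x2 cross-validation sketch (Dietterich, 1998).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # toy data
y = (X[:, 0] + X[:, 1] > 0).astype(int)

errors = np.zeros((5, 2))                      # errors[i, j]: replication i, fold j
for i in range(5):
    skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=i)
    for j, (tr, va) in enumerate(skf.split(X, y)):
        clf = KNeighborsClassifier(n_neighbors=1).fit(X[tr], y[tr])
        errors[i, j] = 1.0 - clf.score(X[va], y[va])

print(errors)                                  # ten validation error estimates
```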

Bootstrapping
Draw N instances from a dataset of size N with replacement.
The probability that a given instance is not picked after N draws is (1 − 1/N)^N ≈ e^(−1) = 0.368; the bootstrap training set therefore contains about 63.2% of the distinct instances, so only 36.8% of the data is new (unseen) and available for validation.
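A quick numerical check of the 0.368 figure (plain NumPy; N = 10,000 is an arbitrary choice):

```python
# Bootstrap sketch: draw N instances with replacement and check what fraction
# of the original data never gets picked (should be close to e^-1 = 0.368).
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
sample = rng.integers(0, N, size=N)            # N draws with replacement
never_picked = N - len(np.unique(sample))
print("fraction never picked:", never_picked / N)
print("theory: (1 - 1/N)^N =", (1 - 1 / N) ** N)
```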

Performance Measures
- Error rate = # of errors / # of instances = (FN + FP) / N
- Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
- Precision = # of found positives / # of found = TP / (TP + FP)
- Specificity = TN / (TN + FP)
- False alarm rate = FP / (FP + TN) = 1 − specificity
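The same measures computed directly from hypothetical confusion-matrix counts (the TP, FP, TN, FN values below are made up for illustration):

```python
# Confusion-matrix based measures from the slide.
TP, FP, TN, FN = 40, 10, 45, 5
N = TP + FP + TN + FN

error_rate  = (FN + FP) / N
recall      = TP / (TP + FN)        # sensitivity, hit rate
precision   = TP / (TP + FP)
specificity = TN / (TN + FP)
false_alarm = FP / (FP + TN)        # = 1 - specificity

print(error_rate, recall, precision, specificity, false_alarm)
```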

ROC Curve
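A sketch of how an ROC curve is traced by sweeping the decision threshold over predicted scores (assuming scikit-learn and matplotlib; the labels and scores are synthetic):

```python
# ROC curve sketch: hit rate (TP rate) against false-alarm rate (FP rate)
# as the decision threshold varies.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
scores = y_true + rng.normal(scale=0.8, size=200)   # noisy classifier scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC:", auc(fpr, tpr))

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle="--")             # chance diagonal
plt.xlabel("False alarm rate (FP rate)")
plt.ylabel("Hit rate (TP rate)")
plt.show()
```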

Precision and Recall

Interval Estimation
X = {x^t}_t where x^t ~ N(μ, σ²); the sample mean m ~ N(μ, σ²/N).
100(1 − α) percent two-sided confidence interval:
P( m − z_{α/2} σ/√N < μ < m + z_{α/2} σ/√N ) = 1 − α

100(1 − α) percent one-sided confidence interval: bound μ from one side only, using z_α instead of z_{α/2}, e.g. P( μ < m + z_α σ/√N ) = 1 − α.
When σ² is not known: replace σ by the sample standard deviation S; then √N (m − μ)/S is t-distributed with N − 1 degrees of freedom, giving the two-sided interval m ± t_{α/2, N−1} S/√N.
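A sketch of both interval types with SciPy (toy sample; the "known" σ = 2.0 is an assumption of the example):

```python
# Confidence intervals for a sample mean: z-interval when sigma is known,
# t-interval when sigma is estimated from the sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)    # toy sample
N, alpha = len(x), 0.05
m, S = x.mean(), x.std(ddof=1)

# sigma known (assumed sigma = 2.0): m +/- z_{alpha/2} * sigma / sqrt(N)
sigma = 2.0
z = stats.norm.ppf(1 - alpha / 2)
print("z-interval:", (m - z * sigma / np.sqrt(N), m + z * sigma / np.sqrt(N)))

# sigma unknown: m +/- t_{alpha/2, N-1} * S / sqrt(N)
t = stats.t.ppf(1 - alpha / 2, df=N - 1)
print("t-interval:", (m - t * S / np.sqrt(N), m + t * S / np.sqrt(N)))
```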

Hypothesis Testing
Reject a null hypothesis if it is not supported by the sample with enough confidence.
X = {x^t}_t where x^t ~ N(μ, σ²).
Two-sided test: H0: μ = μ0 vs. H1: μ ≠ μ0.
Accept H0 with level of significance α if μ0 lies in the 100(1 − α) percent confidence interval, i.e., if √N (m − μ0)/σ ∈ (−z_{α/2}, z_{α/2}).

One-sided test: H0: μ ≤ μ0 vs. H1: μ > μ0. Accept if √N (m − μ0)/σ ∈ (−∞, z_{1−α}).
Variance unknown: use t instead of z. Accept H0: μ = μ0 if √N (m − μ0)/S ∈ (−t_{α/2, N−1}, t_{α/2, N−1}).
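A sketch of the variance-unknown case with SciPy's one-sample t test (toy data; μ0 = 0 and α = 0.05 are arbitrary choices):

```python
# Two-sided one-sample t test of H0: mu = mu0 when the variance is unknown.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=30)
mu0, alpha = 0.0, 0.05

t_stat, p_value = stats.ttest_1samp(x, popmean=mu0)
print("t =", t_stat, "p =", p_value)
print("reject H0" if p_value < alpha else "accept H0")
# one-sided H1: mu > mu0 -> stats.ttest_1samp(x, mu0, alternative="greater") in recent SciPy
```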

Assessing Error: H0: p ≤ p0 vs. H1: p > p0
Single training/validation set: binomial test.
If the error probability is p0, the probability of observing e errors or fewer in N validation trials is
P{X ≤ e} = Σ_{x=0}^{e} (N choose x) p0^x (1 − p0)^{N−x}
Accept H0 if this probability is less than 1 − α. (Example: N = 100, e = 20.)
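A sketch of the binomial test with SciPy, reusing the slide's N = 100, e = 20; the bound p0 = 0.25 is an assumed value for illustration:

```python
# Binomial test sketch: compare P(X <= e) under error probability p0 with 1 - alpha.
from scipy import stats

N, e, p0, alpha = 100, 20, 0.25, 0.05
prob = stats.binom.cdf(e, N, p0)   # P{X <= e} = sum_{x<=e} C(N,x) p0^x (1-p0)^(N-x)
print("P(X <= e) =", prob)
print("accept H0: p <= p0" if prob < 1 - alpha else "reject H0")
```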

Normal Approximation to the Binomial
The number of errors X is approximately normal with mean Np0 and variance Np0(1 − p0):
z = (e − Np0) / √(Np0(1 − p0))
Accept H0 if this value for X = e is less than z_{1−α}.

Paired t Test
Multiple training/validation sets: x_i^t = 1 if instance t is misclassified on fold i.
Error rate of fold i: p_i = Σ_t x_i^t / N.
With m and S² the average and variance of the p_i, we accept that the error is p0 or less if
√K (m − p0) / S
is less than t_{α, K−1}.
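A sketch of this fold-level t test (the per-fold error rates and the bound p0 = 0.25 are made-up numbers):

```python
# Test H0: error <= p0 from K per-fold error rates p_i.
import numpy as np
from scipy import stats

p = np.array([0.22, 0.18, 0.25, 0.20, 0.23, 0.19, 0.21, 0.24, 0.20, 0.22])
p0, alpha, K = 0.25, 0.05, len(p)

m, S = p.mean(), p.std(ddof=1)
t_stat = np.sqrt(K) * (m - p0) / S            # ~ t_{K-1} under H0
t_crit = stats.t.ppf(1 - alpha, df=K - 1)     # t_{alpha, K-1}
print("accept H0: error <= p0" if t_stat < t_crit else "reject H0")
```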

Comparing Classifiers: H0: μ0 = μ1 vs. H1: μ0 ≠ μ1
Single training/validation set: McNemar's test.
e01: number of instances misclassified by classifier 1 but not by 2; e10: misclassified by 2 but not by 1.
Under H0, we expect e01 = e10 = (e01 + e10)/2.
Accept H0 if
( |e01 − e10| − 1 )² / (e01 + e10)
is less than X²_{α,1}.
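A sketch of McNemar's test (the counts e01, e10 are made up):

```python
# McNemar's test on the disagreement counts of two classifiers.
from scipy import stats

e01, e10, alpha = 20, 8, 0.05
chi2 = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)   # ~ chi-square with 1 df under H0
crit = stats.chi2.ppf(1 - alpha, df=1)           # X^2_{alpha,1}
print("accept H0 (same error)" if chi2 < crit else "reject H0")
```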

K-Fold CV Paired t Test
Use K-fold cv to get K training/validation folds.
p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i; p_i = p_i^1 − p_i^2: paired difference on fold i.
The null hypothesis is that p_i has mean 0: with m and S² the mean and variance of the p_i, accept H0: μ = 0 if √K·m / S ∈ (−t_{α/2, K−1}, t_{α/2, K−1}).
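A sketch of the paired test on per-fold error differences, done both by the formula above and with SciPy's ttest_rel (the fold errors are made up):

```python
# K-fold cv paired t test on per-fold error differences p_i = p_i^1 - p_i^2.
import numpy as np
from scipy import stats

err1 = np.array([0.20, 0.22, 0.18, 0.25, 0.21, 0.19, 0.23, 0.20, 0.24, 0.22])
err2 = np.array([0.18, 0.21, 0.17, 0.22, 0.20, 0.18, 0.21, 0.19, 0.22, 0.20])
K, alpha = len(err1), 0.05

p = err1 - err2
t_stat = np.sqrt(K) * p.mean() / p.std(ddof=1)   # ~ t_{K-1} under H0: mean 0
t_crit = stats.t.ppf(1 - alpha / 2, df=K - 1)
print("manual t =", t_stat, "->", "accept H0" if abs(t_stat) < t_crit else "reject H0")
print("scipy:", stats.ttest_rel(err1, err2))
```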

5×2 cv Paired t Test
Use 5×2 cv to get 2 folds of 5 training/validation replications (Dietterich, 1998).
p_i^(j): difference between the errors of classifiers 1 and 2 on fold j = 1, 2 of replication i = 1,...,5.
p̄_i = (p_i^(1) + p_i^(2)) / 2, s_i² = (p_i^(1) − p̄_i)² + (p_i^(2) − p̄_i)²
t = p_1^(1) / √( Σ_{i=1}^{5} s_i² / 5 ) ~ t_5
Two-sided test: accept H0: μ0 = μ1 if t ∈ (−t_{α/2,5}, t_{α/2,5}).
One-sided test: accept H0: μ0 ≤ μ1 if t < t_{α,5}.

5×2 cv Paired F Test
f = ( Σ_{i=1}^{5} Σ_{j=1}^{2} (p_i^(j))² ) / ( 2 Σ_{i=1}^{5} s_i² ) ~ F_{10,5}
Two-sided test: accept H0: μ0 = μ1 if f < F_{α,10,5}.
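A sketch computing both the 5×2 cv t statistic (previous slide) and the F statistic from a 5×2 array of error differences (the differences are made up):

```python
# 5x2 cv paired tests: Dietterich's t statistic and the 5x2 cv F statistic,
# given p[i, j] = error difference on replication i, fold j.
import numpy as np
from scipy import stats

p = np.array([[ 0.02,  0.01],
              [ 0.03, -0.01],
              [ 0.01,  0.02],
              [ 0.00,  0.02],
              [ 0.02,  0.00]])
alpha = 0.05

p_bar = p.mean(axis=1)                                     # mean per replication
s2 = (p[:, 0] - p_bar) ** 2 + (p[:, 1] - p_bar) ** 2       # variance per replication

t_stat = p[0, 0] / np.sqrt(s2.mean())                      # ~ t_5 under H0
f_stat = (p ** 2).sum() / (2 * s2.sum())                   # ~ F_{10,5} under H0

print("t:", t_stat, "accept" if abs(t_stat) < stats.t.ppf(1 - alpha / 2, 5) else "reject")
print("F:", f_stat, "accept" if f_stat < stats.f.ppf(1 - alpha, 10, 5) else "reject")
```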

Comparing L>2 Algorithms: Analysis of Variance (ANOVA)
Errors of L algorithms on K folds: X_ij, i = 1,...,K, j = 1,...,L, assumed X_ij ~ N(μ_j, σ²). H0: μ1 = μ2 = ... = μL.
We construct two estimators of σ². One is valid only if H0 is true, the other is always valid. We reject H0 if the two estimators disagree.

ANOVA table (m_j: mean error of algorithm j over the K folds; m: overall mean):

Source of variation | Degrees of freedom | Sum of squares | Mean square | F
Between groups | L − 1 | SSb = K Σ_j (m_j − m)² | MSb = SSb / (L − 1) | MSb / MSw
Within groups | L(K − 1) | SSw = Σ_j Σ_i (X_ij − m_j)² | MSw = SSw / (L(K − 1)) |
Total | LK − 1 | SSb + SSw | |

If ANOVA rejects, we do pairwise posthoc tests.
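A sketch of the one-way ANOVA computation with SciPy (the per-fold error rates of three hypothetical algorithms are made up):

```python
# One-way ANOVA for comparing L > 2 algorithms over K folds.
from scipy import stats

errs = {
    "algo A": [0.20, 0.22, 0.18, 0.25, 0.21],
    "algo B": [0.18, 0.21, 0.17, 0.22, 0.20],
    "algo C": [0.25, 0.27, 0.24, 0.28, 0.26],
}
F, p_value = stats.f_oneway(*errs.values())
print("F =", F, "p =", p_value)
# If H0 (equal means) is rejected, follow up with pairwise posthoc tests.
```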

Comparison over Multiple Datasets
Comparing two algorithms: Sign test: count how many times A beats B over N datasets, and check whether this could have happened by chance if A and B really had the same error rate.
Comparing multiple algorithms: Kruskal-Wallis test: calculate the average rank of each algorithm over the N datasets, and check whether the rank differences could have arisen by chance if they all had equal error.
If KW rejects, we do pairwise posthoc tests to find which pairs have a significant rank difference.
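A sketch of both dataset-level tests (the win counts and per-dataset errors are made up; the sign-test p-value is computed from the binomial CDF):

```python
# Sign test for two algorithms over N datasets, and Kruskal-Wallis for several.
from scipy import stats

# Sign test: A beats B on 21 of 30 datasets; under H0 wins ~ Binomial(30, 0.5).
wins, N = 21, 30
k = min(wins, N - wins)
p_sign = min(1.0, 2 * stats.binom.cdf(k, N, 0.5))   # two-sided sign-test p-value
print("sign test p =", p_sign)

# Kruskal-Wallis on per-dataset errors of three algorithms.
errA = [0.20, 0.15, 0.30, 0.22, 0.18]
errB = [0.18, 0.14, 0.28, 0.20, 0.17]
errC = [0.25, 0.19, 0.35, 0.27, 0.24]
H, p_kw = stats.kruskal(errA, errB, errC)
print("Kruskal-Wallis p =", p_kw)
# If this rejects, do pairwise posthoc rank tests.
```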

Multivariate Tests
Instead of testing on a single performance measure, e.g., error, use multiple measures for better discrimination, e.g., [fp-rate, fn-rate].
Compare p-dimensional distributions.
Parametric case: assume p-variate Gaussians.

Multivariate Pairwise Comparison
Paired difference vectors d_i over the K folds; Hotelling's multivariate T² test:
T² = K m' S⁻¹ m, where m and S are the sample mean vector and covariance matrix of the d_i.
For p = 1, this reduces to the paired t test.
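A sketch of the one-sample Hotelling T² test on paired difference vectors (K = 7 folds and p = 2 measures, with made-up numbers), using the standard conversion of T² to an F statistic:

```python
# Hotelling's T^2 on paired multivariate differences d_i (e.g., [fp-rate, fn-rate]).
import numpy as np
from scipy import stats

d = np.array([[0.02, 0.01], [0.03, -0.01], [0.01, 0.02],
              [0.00, 0.03], [0.02, 0.00], [0.01, 0.01], [0.02, 0.02]])
K, p = d.shape
m = d.mean(axis=0)                         # mean difference vector
S = np.cov(d, rowvar=False)                # sample covariance matrix

T2 = K * m @ np.linalg.solve(S, m)         # Hotelling's T^2
F = (K - p) / (p * (K - 1)) * T2           # ~ F_{p, K-p} under H0: mean difference = 0
p_value = 1 - stats.f.cdf(F, p, K - p)
print("T2 =", T2, "F =", F, "p =", p_value)
```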

Multivariate ANOVA (MANOVA): comparison of L > 2 algorithms over multiple performance measures.