Model Assessment and Selection Florian Markowetz & Rainer Spang Courses in Practical DNA Microarray Analysis.

Presentation transcript:

Model Assessment and Selection Florian Markowetz & Rainer Spang Courses in Practical DNA Microarray Analysis

A short test on what you have learned so far:
1. What is overfitting, and what is the overfitting disaster?
2. What is the difference between prediction and separation?
3. How does regularization work?
4. How is regularization implemented in PAM and in SVM?

Open problems:
1. How much regularization is good? (adaptive model selection)
2. If I have found a signature, how do I know whether it is meaningful and predictive or not? (validation)

Model Selection & Validation: we only discuss cross-validation (Chapter 7).

10-Fold Cross-Validation: chop up the training data (don't touch the test data) into 10 sets. Train on 9 of them and predict the one left out. Iterate so that every set is left out once.
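A minimal sketch of this loop in Python (illustrative assumptions: simulated expression data and a linear SVM from scikit-learn as the classifier; the slides do not prescribe these choices):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))      # simulated: 100 training profiles, 2000 genes
y = rng.integers(0, 2, size=100)      # two classes, e.g. type A vs. type B

kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_errors = []
for train_idx, val_idx in kf.split(X):
    clf = SVC(kernel="linear")                      # train on 9 of the 10 sets ...
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[val_idx])                  # ... and predict the left-out set
    fold_errors.append(np.mean(pred != y[val_idx]))

print("10-fold CV error:", np.mean(fold_errors))
```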

Leave-One-Out Cross-Validation: essentially the same, but you leave only one sample out at a time and predict it using the others. Good for small training sets.
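The same idea with leave-one-out splitting, again only a sketch on simulated data (scikit-learn's LeaveOneOut holds out a single sample per iteration):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 2000)), rng.integers(0, 2, size=100)  # simulated profiles

# each of the 100 iterations trains on 99 samples and predicts the single left-out sample
acc = cross_val_score(SVC(kernel="linear"), X, y, cv=LeaveOneOut())
print("LOOCV error:", 1 - acc.mean())
```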

Model selection with separate data: split off some samples for model selection. Train the model on the training data with different choices of the regularization parameter, apply it to the selection data and optimize this parameter (adaptive model selection). Then test how well you are doing on the test data (validation).
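One way to code such a training/selection/test scheme; the split proportions, the linear SVM and its candidate C values are illustrative assumptions, not part of the slides:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 2000)), rng.integers(0, 2, size=100)  # simulated profiles

# split off a test set first, then a selection set from the remainder
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_train, X_sel, y_train, y_sel = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

best_C, best_err = None, np.inf
for C in [0.01, 0.1, 1, 10, 100]:                  # candidate regularization parameters
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    err = 1 - clf.score(X_sel, y_sel)              # adaptive model selection
    if err < best_err:
        best_C, best_err = C, err

final = SVC(kernel="linear", C=best_C).fit(X_rest, y_rest)   # refit on train + selection data
print("test error:", 1 - final.score(X_test, y_test))        # validation: touch the test set once
```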

How much shrinkage is good in PAM? Compute the CV performance for several values of the shrinkage parameter Δ and pick the Δ that gives you the smallest number of CV misclassifications (adaptive model selection). PAM does this routinely.
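PAM itself is the pamr package in R; as a rough Python stand-in, scikit-learn's NearestCentroid with a shrink_threshold behaves like a nearest-shrunken-centroid classifier, so the CV-based choice of Δ can be sketched as below (the Δ grid and the simulated data are assumptions):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 2000)), rng.integers(0, 2, size=100)  # simulated profiles

cv_error = {}
for delta in [0.0, 0.5, 1.0, 2.0, 4.0]:                  # candidate shrinkage values
    clf = NearestCentroid(shrink_threshold=delta or None)  # None = no shrinkage
    acc = cross_val_score(clf, X, y, cv=10)
    cv_error[delta] = 1 - acc.mean()

best_delta = min(cv_error, key=cv_error.get)             # fewest CV misclassifications
print(cv_error, "-> chosen shrinkage:", best_delta)
```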

Model selection output of PAM: a small Δ keeps many genes and gives poor performance due to overfitting; a high Δ keeps few genes and gives poor performance due to lack of information (underfitting). The optimal Δ is somewhere in the middle.

Adaptive model selection for SVM: SVMs optimize the margin of separation. There are theoretical results connecting the margin to an upper bound on the test error (V. Vapnik), known as structural risk minimization.
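In practice the amount of regularization of an SVM (its cost parameter C, which governs the margin) is tuned by the same cross-validation recipe; a sketch with an assumed C grid on simulated data:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 2000)), rng.integers(0, 2, size=100)  # simulated profiles

# GridSearchCV runs the CV loop over the candidate C values for us
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=10)
search.fit(X, y)
print("chosen C:", search.best_params_, " CV error:", 1 - search.best_score_)
```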

The overfitting/underfitting trade-off. Model complexity:
- maximum number of genes
- shrinkage parameter
- minimal margin
- etc.

Population mean: Genes have a certain mean expression and correlation in the population

Sample mean: We observe average expression and empirical correlation

Fitted model:

Regularization

Validation: How well did I do? Can I use my signature for clinical diagnosis? How well will it perform? How does it compare to traditional methods?

Validation
A. Internal: 1. independent test set, 2. nested CV
B. External (FDR): a completely new prospective study

Test and training data: split your profiles randomly into a training set (2/3) and a test set (1/3). Train your model only using the data in the training set (define centroids, calculate normal vectors for large-margin separators, ...). Apply the model to the test data ...
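A minimal sketch of this random 2/3 vs. 1/3 split (the linear SVM and the simulated data are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 2000)), rng.integers(0, 2, size=100)  # simulated profiles

# 2/3 of the profiles for training, 1/3 put aside as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

clf = SVC(kernel="linear").fit(X_train, y_train)      # train on the training set only
print("test error:", 1 - clf.score(X_test, y_test))   # apply the fixed model to the test set once
```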

Recall the idea of a test set: take some patients from the original training samples and blind their outcome; these are now called test samples. Only the remaining samples are still training samples; use them to learn how to predict. Predict the test samples and compare the predicted outcome to the true outcome.

Validation ideas: you want to validate the predictive performance of your signature. Validation is usually done using an independent test set; this mimics the prediction of new patients. Information on the outcome of the patients must not be used at any time before the final prediction.

Scenario 1: 1. You train an SVM using all genes and all patients and observe not a single misclassification. 2. You conclude that your signature makes no (or only very few) mistakes. What is wrong?

The most important consequence of understanding the overfitting disaster: if you find a separating signature, it does not (yet) mean that you have a top publication; in most cases it means nothing.

Scenario 2: 1. You find the 500 genes with the highest average fold change between all type A patients and all type B patients. 2. You split the patients into a test and a training set; using only the training set you fit an SVM, and applying it to both the test and training data you observe 5% errors. 3. You conclude that your signature will be wrong in only 5% of all future cases. What is wrong?

Gene selection is part of training and must not be separated from it. You cannot select 20 genes using all your data and then, with these 20 genes, split test and training data and evaluate your method. There is a difference between a model that restricts signatures to depend on only 20 genes and a data set that only contains 20 genes. Your validation result will look much better than it should (selection bias).

Out-of-loop and in-loop gene selection: the selection bias.
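The selection bias can be demonstrated directly on pure noise: selecting the "best" genes on the full data before cross-validating (out-of-loop) makes the CV error look far better than the honest procedure, where gene selection is refit inside every CV training fold (in-loop). A sketch, with scikit-learn's SelectKBest standing in for the gene filter:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 5000)), rng.integers(0, 2, size=50)   # pure noise, no real signal

# WRONG (out-of-loop): select 20 genes on ALL data, then cross-validate
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
err_wrong = 1 - cross_val_score(SVC(kernel="linear"), X_sel, y, cv=10).mean()

# RIGHT (in-loop): gene selection happens inside every CV training fold
pipe = make_pipeline(SelectKBest(f_classif, k=20), SVC(kernel="linear"))
err_right = 1 - cross_val_score(pipe, X, y, cv=10).mean()

print("out-of-loop (biased):", err_wrong, "  in-loop (honest):", err_right)
# On noise the honest error stays near 50%, while the biased estimate typically looks much smaller.
```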

Scenario 3: 1. You run PAM using adaptive model selection; the CV performance varies between 5% and 10%. 2. You choose the optimal Δ, which yields 5% misclassifications. 3. You conclude that your signature will be wrong in only 5% of all future cases. What is wrong?

Adaptive model selection is part of the training and not part of the validation. Choosing the optimal Δ always means choosing the optimal Δ for your training data; the performance on new patients is in general a little worse. You can see this using test data.

Scenario 4: 1. You split your data into test and training data. 2. Using only the training data you run PAM, including adaptive model selection; the optimal CV error is achieved for Δ = 3. 3. You apply the Δ = 3 signature to the test data and observe an error of 7%. 4. You conclude that your signature will be wrong in not more than 7% of all future cases. What is wrong?

What you get is an estimate of performance, and estimators have variance. If the test set is small, this variance can be big. (Figure: nested 10-fold CV, variance from 100 random partitions.)
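A nested cross-validation sketch: the inner loop does the adaptive model selection, the outer loop estimates performance, and repeating everything over several random partitions exposes the variance mentioned above (the C grid, fold counts and number of repeats are assumptions; the slide used 100 partitions):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 2000)), rng.integers(0, 2, size=100)  # simulated profiles

errors = []
for seed in range(10):                               # 10 random partitions (100 on the slide)
    inner = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1, 10]},
                         cv=StratifiedKFold(5, shuffle=True, random_state=seed))
    outer = StratifiedKFold(10, shuffle=True, random_state=seed)
    acc = cross_val_score(inner, X, y, cv=outer)     # model selection stays inside each outer fold
    errors.append(1 - acc.mean())

print("nested CV error: mean %.3f, sd %.3f" % (np.mean(errors), np.std(errors)))
```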

DOs AND DON'Ts:
1. Decide on your diagnosis model (PAM, SVM, etc.) and don't change your mind later on.
2. Split your profiles randomly into a training set and a test set.
3. Put the data in the test set away ... far away.
4. Train your model only using the data in the training set (select genes, define centroids, calculate normal vectors for large-margin separators, perform adaptive model selection, ...); don't even think of touching the test data at this time.
5. Apply the model to the test data; don't even think of changing the model at this time.
6. Do steps 1-5 only once and accept the result; don't even think of optimizing this procedure.

Thank you