Publications Reviewed Searched Medline Hand screening of abstracts & papers Original study on human cancer patients Published in English before December.

Slides:



Advertisements
Similar presentations
ROI measurement: Finding the Fallacies. ROI How ROI is calculated Some examples of what ROIs are How to know when it is calculated wrong, as it usually.
Advertisements

The Impact of Drug Benefit Caps Geoffrey Joyce, PhD.
Regulation of Consumer Tests in California AAAS Meeting June 1-2, 2009 Beatrice OKeefe Acting Chief, Laboratory Field Services California Department of.
On Comparing Classifiers : Pitfalls to Avoid and Recommended Approach
CHAPTER 16 Life Tables.
Feature Selection 1 Feature Selection for Image Retrieval By Karina Zapién Arreola January 21th, 2005.
Phase II/III Design: Case Study
One step equations Add Subtract Multiply Divide Addition X + 5 = -9 X = X = X = X = X = 2.
Shibing Deng Pfizer, Inc. Efficient Outlier Identification in Lung Cancer Study.
Relating Gene Expression to a Phenotype and External Biological Information Richard Simon, D.Sc. Chief, Biometric Research Branch, NCI
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
Clinical Trial Designs for the Evaluation of Prognostic & Predictive Classifiers Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer.
Lecture 22: Evaluation April 24, 2010.
Genetic algorithms applied to multi-class prediction for the analysis of gene expressions data C.H. Ooi & Patrick Tan Presentation by Tim Hamilton.
. Differentially Expressed Genes, Class Discovery & Classification.
Predictive Classifiers Based on High Dimensional Data Development & Use in Clinical Trial Design Richard Simon, D.Sc. Chief, Biometric Research Branch.
Statistical Comparison of Two Learning Algorithms Presented by: Payam Refaeilzadeh.
On Comparing Classifiers: Pitfalls to Avoid and Recommended Approach Published by Steven L. Salzberg Presented by Prakash Tilwani MACS 598 April 25 th.
Statistical Challenges for Predictive Onclogy Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute
Guidelines on Statistical Analysis and Reporting of DNA Microarray Studies of Clinical Outcome Richard Simon, D.Sc. Chief, Biometric Research Branch National.
Topics in the Development and Validation of Gene Expression Profiling Based Predictive Classifiers Richard Simon, D.Sc. Chief, Biometric Research Branch.
Ensemble Learning (2), Tree and Forest
Use of Genomics in Clinical Trial Design and How to Critically Evaluate Claims for Prognostic & Predictive Biomarkers Richard Simon, D.Sc. Chief, Biometric.
1 Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data Presented by: Tun-Hsiang Yang.
Clustering and Classification In Gene Expression Data Carlo Colantuoni Slide Acknowledgements: Elizabeth Garrett-Mayer, Rafael Irizarry,
Today Evaluation Measures Accuracy Significance Testing
CSCI 347 / CS 4206: Data Mining Module 06: Evaluation Topic 01: Training, Testing, and Tuning Datasets.
Evaluating Classifiers
Expression profiling of peripheral blood cells for early detection of breast cancer Introduction Early detection of breast cancer is a key to successful.
Identifying Computer Graphics Using HSV Model And Statistical Moments Of Characteristic Functions Xiao Cai, Yuewen Wang.
A Multivariate Biomarker for Parkinson’s Disease M. Coakley, G. Crocetti, P. Dressner, W. Kellum, T. Lamin The Michael L. Gargano 12 th Annual Research.
2015 AprilUNIVERSITY OF HAIFA, DEPARTMENT OF STATISTICS, SEMINAR FOR M.A 1 Hastie, Tibshirani and Friedman.The Elements of Statistical Learning (2nd edition,
Validation of Predictive Classifiers Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute
Gene Expression Profiling Illustrated Using BRB-ArrayTools.
Analysis of Molecular and Clinical Data at PolyomX Adrian Driga 1, Kathryn Graham 1, 2, Sambasivarao Damaraju 1, 2, Jennifer Listgarten 3, Russ Greiner.
Molecular Diagnosis Florian Markowetz & Rainer Spang Courses in Practical DNA Microarray Analysis.
Exagen Diagnostics, Inc., all rights reserved Biomarker Discovery in Genomic Data with Partial Clinical Annotation Cole Harris, Noushin Ghaffari.
Diagnosis of multiple cancer types by shrunken centroids of gene expression Course: Topics in Bioinformatics Presenter: Ting Yang Teacher: Professor.
Statistical Aspects of the Development and Validation of Predictive Classifiers for High Dimensional Data Richard Simon, D.Sc. Chief, Biometric Research.
Alexander Statnikov Discovery Systems Laboratory Department of Biomedical Informatics Vanderbilt University 10/3/
1 Critical Review of Published Microarray Studies for Cancer Outcome and Guidelines on Statistical Analysis and Reporting Authors: A. Dupuy and R.M. Simon.
Gene Expression Profiling. Good Microarray Studies Have Clear Objectives Class Comparison (gene finding) –Find genes whose expression differs among predetermined.
Manu Chandran. Outline Background and motivation Over view of techniques Cross validation Bootstrap method Setting up the problem Comparing AIC,BIC,Crossvalidation,Bootstrap.
Evolutionary Algorithms for Finding Optimal Gene Sets in Micro array Prediction. J. M. Deutsch Presented by: Shruti Sharma.
Using Predictive Classifiers in the Design of Phase III Clinical Trials Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute.
Evaluating Results of Learning Blaž Zupan
Molecular Classification of Cancer Class Discovery and Class Prediction by Gene Expression Monitoring.
Understanding lack of validity: Bias
Validation methods.
Ari Moesriami Barmawi. Choosing Reference The Structure of Paper.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Data Science Credibility: Evaluating What’s Been Learned
David Amar, Tom Hait, and Ron Shamir
Classification with Gene Expression Data
Kenneth I. Aston, Ph. D. , Philip J. Uren, Ph. D. , Timothy G
Introduction to translational and clinical bioinformatics Connecting complex molecular information to clinically relevant decisions using molecular.
Evaluating Results of Learning
Boosting For Tumor Classification With Gene Expression Data
Kenneth I. Aston, Ph. D. , Philip J. Uren, Ph. D. , Timothy G
Evaluating Models Part 2
Class Prediction Based on Gene Expression Data Issues in the Design and Analysis of Microarray Experiments Michael D. Radmacher, Ph.D. Biometric Research.
CSCI N317 Computation for Scientific Applications Unit Weka
Volume 5, Issue 6, Pages e3 (December 2017)
Recurrence-Associated Long Non-coding RNA Signature for Determining the Risk of Recurrence in Patients with Colon Cancer  Meng Zhou, Long Hu, Zicheng.
Cross-validation Brenda Thomson/ Peter Fox Data Analytics
Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer Yang et al Presented by Yves A. Lussier MD PhD The University.
Differential gene expression in whole blood from SJIA patients and healthy controls. A. Data were normalized in Beadstudio using the "average" method and.
Systematic review of atopic dermatitis disease definition in studies using routinely-collected health data M.P. Dizon, A.M. Yu, R.K. Singh, J. Wan, M-M.
EN1 expression in breast cancer and clinical outcome.
Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017
Presentation transcript:

Publications Reviewed Searched Medline Hand screening of abstracts & papers Original study on human cancer patients Published in English before December 31, 2004 Analyzed gene expression of more than 1000 probes Related gene expression to clinical outcome

Types of Clinical Outcome Survival or disease-free survival Response to therapy

90 publications identified that met criteria –Abstracted information for all 90 Performed detailed review of statistical analysis for the 42 papers published in 2004

Major Flaws Found in 40 Studies Published in 2004 Inadequate control of multiple comparisons in gene finding –9/23 studies had unclear or inadequate methods to deal with false positives 10,000 genes x.05 significance level = 500 false positives Misleading report of prediction accuracy –12/28 reports based on incomplete cross-validation Misleading use of cluster analysis –13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes 50% of studies contained one or more major flaws

Gene Finding Don’t use only fold-changes between groups to select genes Don’t use a.05 significance threshold to select the differentially expressed genes Do use a method to control the number of false positive differentially expressed genes

Major Flaws Found in 40 Studies Published in 2004 Inadequate control of multiple comparisons in gene finding –9/23 studies had unclear or inadequate methods to deal with false positives 10,000 genes x.05 significance level = 500 false positives Misleading report of prediction accuracy –12/28 reports based on incomplete cross-validation Misleading use of cluster analysis –13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes 50% of studies contained one or more major flaws

Supervised Prediction Do frame a therapeutically relevant question and select a homogeneous set of patients Don’t violate the fundamental principle of classifier validation, i.e. no preliminary use of the test samples

Do’s & Don’ts Separate Test Set Don’t use any information from the test set in developing the classifier –e.g. don’t use the test set to re-calibrate the classifier for a new platform Do access the test set once and only for testing the fully specified classifier developed with the training data

Cross Validation Cross-validation simulates the process of separately developing a model on one set of data and predicting for a test set of data not used in developing the model The cross-validated estimate of misclassification error is an estimate of the prediction error for model fit using specified algorithm to full dataset

Cross validation is only valid if the test set is not used in any way in the development of the model. Using the complete set of samples to select genes violates this assumption and invalidates cross- validation. With proper cross-validation, the model must be developed from scratch for each leave-one-out training set. This means that feature selection must be repeated for each leave-one-out training set.

Do’s & Don’ts Cross Validation Don’t use the same set of features for all iterations Do report error estimates for all classification methods tried, not just the one with the smallest error estimate Don’t consider that retaining a small separate test set adds value to a correctly cross-validated estimate of accuracy Do report the fully specified classifier with its parameters

Do’s & Don’ts Cross Validation Do report the permutation statistical significance of the cross-validation estimate of prediction error

Major Flaws Found in 40 Studies Published in 2004 Inadequate control of multiple comparisons in gene finding –9/23 studies had unclear or inadequate methods to deal with false positives 10,000 genes x.05 significance level = 500 false positives Misleading report of prediction accuracy –12/28 reports based on incomplete cross-validation Misleading use of cluster analysis –13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes 50% of studies contained one or more major flaws

Conclusion Do Use BRB-ArrayTools or a statistical collaborator Do Use BRB-ArrayTools if you are a statistical collaborator