The Impact of Functional Redundancy on Molecular Signatures


The Impact of Functional Redundancy on Molecular Signatures
Alexandru G. Floares
SAIA - Solutions of Artificial Intelligence Applications, Cluj, Romania

Signatures Uniqueness: Wishful Thinking? Discovering signatures from Big Omics Data using Machine Learning leads to Precision Medicine. The prevailing conception: for a given biomedical condition (e.g., cancer) and class of biomolecules (e.g., miRNA), there should be a unique signature. However, many proposed classifiers, built for similar data and biomedical problems, have almost non-overlapping lists of molecules. Alexandru Floares - Rome, October, 2016

The Fundamental Non-Uniqueness of Molecular Signatures If we want the signature to be the minimal list of the most relevant biomarkers, it cannot be unique. This is not a technical problem but a fundamental one: evolution seems to favor functional redundancy, which is at the foundation of robust complexity. The most obvious functional redundancy for miRNA: many miRNAs regulate the same mRNA, and mRNAs are functionally redundant too. This fundamental functional redundancy leads to equivalent but different minimal signatures.

Functional Redundancy Implications Many intriguing omics facts might be simple reflections of functional redundancy. E.g., cancer classifiers trained on similar data may have almost non-overlapping lists of molecules. The unexpected heterogeneity of cancer mutations: similar phenotypes but different mutation patterns. Redundancy manifests itself in all normal and pathological functions, in diagnosis, prognosis, treatment, and so on.

Accuracy, Robustness and Transparency What are realistic goals if uniqueness is impossible? Predictive models should be: Accurate (e.g., > 95%), Robust (generalize well to new, unseen cases), and Transparent (e.g., trees/rules). We will illustrate some aspects of this vision using the TCGA miRNA datasets.

The TCGA miRNA Datasets The largest NGS miRNA dataset: 9 Cancers (C) and Normal (N) tissue. Focus on two binary classifications: C vs N and BRCA vs N. Details in these papers (ResearchGate, Alexandru Floares): Bigger Data is Better for Molecular Diagnosis Tests Based on Decision Trees (Best Paper Award, DMBD 2016); Exploring the Functional Redundancy of miRNA in Cancer with Computational Intelligence.

Maximum Relevance, minimum Redundancy Using ML we can build powerful predictive biomedical models, with a strong bias toward the highest accuracy and the minimum number of variables. This is justified: we want accurate and cheap omics tests, and it follows the general principle of Maximum Relevance and minimum Redundancy (MRmR). However, this strategy alone is inadequate for exploring, understanding, and pragmatically exploiting redundancy. For redundant systems we should develop redundant models!
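The MRmR idea above can be sketched as a greedy search that scores each candidate feature by its relevance to the class minus its redundancy with the already-selected features. This is an illustrative stand-in, not the authors' actual pipeline: the scoring (mutual information for relevance, mean absolute correlation for redundancy) and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

def mrmr_select(X, y, k):
    """Greedy MRmR-style selection: relevance minus redundancy (a sketch)."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]  # start from the most relevant feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # redundancy = mean absolute correlation with already-chosen features
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            score = relevance[j] - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Synthetic stand-in for an omics matrix (samples x features)
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
subset = mrmr_select(X, y, k=5)
```

Swapping the `- red` penalty for a `+ red` bonus would turn the same loop into a crude MRMR-style explorer of redundant, equally relevant features.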

Unifying the MRmR and MRMR Principles We developed ML methodologies capable of producing both: the best omics tests (most accurate, minimum number of variables), based on the Maximum Relevance minimum Redundancy (MRmR) principle, and the best redundant models (most accurate, maximum number of relevant variables), based on the Maximum Relevance Maximum Redundancy (MRMR) principle.

The Ideal & Best Classifier Highly performant with minimal data preprocessing: resistant to outliers, resistant to a reasonable amount of missing values, resistant to correlated variables, and resistant to unbalanced classes. Capable of generating large sets of highly accurate models, both of the MRmR and of the MRMR class. Resistant to overfitting. We tested RF, boosted C5 DT, SVM, GP, DL NN, and SGB; SGB was the best.

Hyperparameter Optimization (Grid Search) I Not especially useful for miRNAs (AUC 0.99), but useful for other omics data. Hyperparameter optimization and ensemble methods increase the size of the model pool used to select the MRmR and MRMR models. Even if the accuracy is not much increased by parameter tuning, other aspects of the models can be improved. E.g., a larger ensemble of shorter trees could generalize better than a smaller ensemble of larger trees.

Hyperparameter Optimization (Grid Search) II Number of CV folds. Values: 5, 10, 20, 50. Best: 10-fold CV. Learning rate. Values: 0.001, 0.01, 0.1. Best: 0.01 (test AUC = 1). Max nodes. Values: 2, 4, 6, 8, 9. Best: 2 (AUC 1.000). Interestingly, this indicates that miRNA interactions are not essential, the ensemble being composed of stump trees. Min child. Values: 1, 2, 5, 10, 25, 50, 100, 200. Best: 5. Subsample. Values: 0.1, 0.2, 0.25, 0.3, 0.5, 0.75, and 0.9. Best: 0.5, which gives the best accuracy and prevents overfitting. Alexandru Floares - Rome, October, 2016
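A grid search in the spirit of the slide can be sketched with scikit-learn's stochastic gradient boosting. The grids below are trimmed for speed, the data are synthetic (not TCGA), and `cv=3` is used here although the slides found 10-fold CV best; the slide's winning values are noted in comments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the miRNA expression matrix
X, y = make_classification(n_samples=200, n_features=30, random_state=0)

grid = {
    "learning_rate": [0.01, 0.1],    # slide's best: 0.01
    "max_leaf_nodes": [2, 4],        # slide's best: 2 (stump trees)
    "min_samples_leaf": [1, 5],      # slide's best: 5 ("min child")
    "subsample": [0.5, 0.9],         # slide's best: 0.5
}
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    grid, cv=3, scoring="roc_auc")   # slides: 10-fold CV was optimal
search.fit(X, y)
best = search.best_params_           # best combination found on this data
```

With `max_leaf_nodes=2` the ensemble is made of stumps, which is exactly the configuration the slide interprets as evidence that miRNA interactions are not essential.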

miRNA Importance: A Redundancy Signature?

Long Tail & Redundancy A few variables with high importance and a very long tail of variables with slowly decreasing importance. For non-redundant systems, either there is no such long tail, or it represents mainly noise. Functional analysis shows that the long tail contains cancer-related miRNAs, not noise. Thus, for redundant systems, the long tail could be a redundancy mark.
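The long-tail pattern can be inspected by sorting the importances of any tree ensemble. This sketch uses a Random Forest on synthetic data with many weakly informative features to mimic the redundancy pattern (the real analysis used 644 miRNA features); the head/tail split at 5 features is an arbitrary illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Many informative features stand in for functionally redundant miRNAs
X, y = make_classification(n_samples=300, n_features=100,
                           n_informative=30, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

imp = np.sort(rf.feature_importances_)[::-1]  # descending importance
head = imp[:5].sum()   # importance mass in the top few features
tail = imp[5:].sum()   # mass spread over the long tail
```

For a redundant system the tail mass stays substantial and decays slowly, rather than collapsing to near-zero noise.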

Long Tail & Redundancy: Univariate Models First build a model with all 644 features. Select the features with importance > 3.5 (136 features; 21%). Build a univariate model for each of the 136 selected features. The misclassification error is surprisingly good for all univariate models: it ranges from 0.1068 to 0.2705, with a mean of 0.2206. This is further evidence of the underlying redundancy.
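The univariate-model check can be sketched as a loop that fits one tiny classifier per feature and records its cross-validated misclassification error. Synthetic data and a depth-1 tree stand in for the 136 high-importance miRNAs and the authors' actual learner.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 20 features instead of the 136 selected miRNAs
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=10, random_state=0)

errors = []
for j in range(X.shape[1]):
    # One univariate model per feature: a stump on a single column
    acc = cross_val_score(
        DecisionTreeClassifier(max_depth=1, random_state=0),
        X[:, [j]], y, cv=5).mean()
    errors.append(1 - acc)  # CV misclassification error for feature j
```

If many single-feature models all achieve a usable error, as on the slide, many features individually carry class-relevant signal, which is what redundancy predicts.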

Long Tail & Redundancy: In Silico miRNA Knockdown Eliminate one miRNA at a time from the bottom/top of the importance list. The results, too, are typical of a redundant system: more than 100 miRNAs can be removed from the 136 with importance > 3.5 (73%), and the AUC remains unchanged at 1.00. MRmR model: max AUC 1.00 with a minimum of 7 miRNAs. Candidates for an omics Dx test (AUC > 0.95): AUC 0.96 with 4 miRNAs, and AUC 0.96 with 3 miRNAs. MRMR model: max AUC 1.00 with a maximum of 136 miRNAs. Alexandru Floares - Rome, October, 2016
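The knockdown loop can be sketched as: drop the least important feature, refit, and track the AUC. Everything here is a synthetic illustration (in the slides the AUC stayed at 1.00 while over 100 of 136 miRNAs were removed); the stopping point of 3 features is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=15,
                           n_informative=8, random_state=0)

keep = list(range(X.shape[1]))  # indices of features still "expressed"
aucs = []
while len(keep) > 3:
    clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
    auc = cross_val_score(clf, X[:, keep], y, cv=3,
                          scoring="roc_auc").mean()
    aucs.append(auc)
    clf.fit(X[:, keep], y)
    # Knock down the currently least important feature and repeat
    keep.pop(int(np.argmin(clf.feature_importances_)))
```

A flat AUC trajectory over many removals is the redundancy signature the slide describes; a sharp drop after each removal would indicate a non-redundant system.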

Learning Curves BRCA vs NORM I Training size increases from 22 to 865 in steps of 9. Class proportions of the original dataset are preserved. Each dataset was partitioned into a 75% training set and a 25% fresh test set. Only the training set was used for 3-fold CV; CV was repeated 100 times.

Learning Curves BRCA vs NORM II By repeating CV 100 times, we mimic 100 studies with varying numbers of different patients. All remaining data were used again for testing the generalization capability, especially for the small-sample-size simulated studies. For example: a data set of 22 patients means ~16 (75%) for CV, 6 (25%) for test, and 843 (865 total − 22 data set) for generalization testing.
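The protocol of the two slides above can be sketched in miniature: grow the sampled dataset, draw it with class proportions preserved, use 75% of the draw for 3-fold CV, and repeat with different random draws (100 repeats and steps of 9 in the slides; 5 repeats and steps of 40 here, on synthetic data).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)

sizes = range(40, 201, 40)   # growing "study" sizes (slides: 22..865 step 9)
mean_scores = []
for n in sizes:
    scores = []
    for rep in range(5):     # slides: 100 repeats, mimicking 100 studies
        # Class-proportion-preserving draw of n patients
        sss = StratifiedShuffleSplit(n_splits=1, train_size=n, random_state=rep)
        idx, _ = next(sss.split(X, y))
        tr = idx[: int(0.75 * n)]   # 75% of the draw goes to CV
        s = cross_val_score(DecisionTreeClassifier(random_state=0),
                            X[tr], y[tr], cv=3, scoring="roc_auc").mean()
        scores.append(s)
    mean_scores.append(np.mean(scores))
```

Plotting `mean_scores` against `sizes` yields the AUC-vs-sample-size curve that the following slides fit with a power law.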

Algorithms for Dx Tests I We deliberately chose simple and transparent algorithms, to be useful for the biomedical community. We used the C5 and CART decision tree (DT) algorithms, whose advantages make them among the best choices for omics tests: they implicitly perform feature selection, and they discover nonlinear relationships and interactions.

Algorithms for Dx Tests II DT require relatively little effort from users for data preparation: they do not need variable scaling, they can deal with a reasonable amount of missing values, and they are not affected by outliers. They are easy to interpret and explain, and can generate rules helping experts formalize their knowledge. Usually, we use ensemble methods and hyperparameter optimization, with boosted C5, Random Forests, XGBoost, and Deep Learning.
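The transparency point can be made concrete with scikit-learn's CART implementation, which prints a fitted tree as human-readable rules. The data and depth limit are illustrative assumptions; C5 itself is a separate commercial/R implementation not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for an omics test dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A shallow CART tree: accurate enough to illustrate, small enough to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree)  # if/else rules an expert can audit
```

The printed rules are exactly the kind of transparent artifact the slide argues a Dx test should ship with, in contrast to opaque ensembles.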

C5 AUC vs Sample Size & Fitted Power Law AUC increases with the sample size: faster for small data sizes, slower for bigger data sizes. Min 0.8523, Mean 0.9646, Max 0.9873. Best-fit power law: AUC = −0.5636·X^(−0.5461) + 0.9931. Goodness of fit: SSE = 0.0018, R-squared = 0.968.
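The power-law form on this and the next two slides, AUC = a·X^b + c, can be fitted with `scipy.optimize.curve_fit`. The data points below are synthetic values generated from coefficients close to the slide's (a ≈ −0.56, b ≈ −0.55, c ≈ 0.99), purely to show the fitting step; they are not the TCGA results.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b, c):
    """The slide's fitted form: AUC = a * x**b + c."""
    return a * np.power(x, b) + c

# Illustrative AUC-vs-sample-size points on a known power-law curve
x = np.array([22, 50, 100, 200, 400, 865], dtype=float)
auc = -0.56 * x ** -0.55 + 0.99

params, _ = curve_fit(power_law, x, auc, p0=(-0.5, -0.5, 1.0))
a, b, c = params  # recovered coefficients
```

On real, noisy learning-curve data one would also report SSE and R-squared, as the slides do, to judge the fit.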

C5 AUC vs Number of Predictors & Fitted Power Law AUC increases with the number of predictors. Predictors: Min 1, Mean 6, Max 11. Best-fit power law: AUC = −0.07035·X^(−1.053) + 0.9845. Goodness of fit: SSE = 0.002932, R-squared = 0.9502.

CART AUC vs Sample Size: Results & Fitted Power Law CART AUC increases with the sample size, while the number of miRNAs stays constant at 6! AUC is roughly constant for ≥ 100 patients. Best-fit power law: AUC = −1078·X^(−2.514) + 0.954. Goodness of fit: SSE = 0.00189, R-squared = 0.9933.

Signatures Depend on the Sample Size and on the Classification Algorithm Accuracy increases, in various ways, with the sample size. The number of predictors either increases with the sample size (usually) or remains constant. The list of relevant biomarkers changes with the sample size, partially due to functional redundancy. Robustness: all C5 and CART classifiers generalize well on the remaining data, despite using different signatures, which appear to be equivalent.

Conclusion Functional redundancy is a fundamental, but scarcely investigated, property of living systems, related to their amazing robust complexity. We proposed the first ML methodology capable of developing models ranging from the best Dx tests to the best redundancy explorers. It unifies two general principles: Maximum Relevance & minimum Redundancy, and Maximum Relevance & Maximum Redundancy. The signatures need not be unique, but the models can be highly accurate, robust, and transparent.