K.U.Leuven Department of Computer Science Predicting gene functions using hierarchical multi-label decision tree ensembles Celine Vens, Leander Schietgat,

1 Predicting gene functions using hierarchical multi-label decision tree ensembles
Celine Vens, Leander Schietgat, Jan Struyf, Hendrik Blockeel, Dragi Kocev, Sašo Džeroski
K.U.Leuven Department of Computer Science

2 Hierarchical Multi-Label Classification (HMC) for Gene Function Prediction
Classification is a common machine learning task, e.g.:
– Given: genes with known functions
– Task: predict the functions of new genes
Special case: hierarchical multi-label classification (HMC)
– a gene can have multiple functions
– functions are organized in a hierarchy: a tree (e.g., MIPS FunCat) or a DAG (e.g., Gene Ontology)
– Hierarchy constraint: if a gene is labeled with function X, then it is also labeled with all parents of X
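The hierarchy constraint can be made concrete with a small sketch: given each class's direct parents, a gene's label set is closed upward so that every ancestor of an assigned function is also assigned. This is a minimal Python illustration under stated assumptions: the hierarchy is a hypothetical child-to-parents dictionary (one parent per class for a FunCat-like tree, possibly several for a GO-like DAG), and the class names and parent links below are illustrative only, not taken from MIPS FunCat or the Gene Ontology.

```python
from typing import Dict, Iterable, List, Set

# Hierarchy as a mapping child -> list of direct parents.
# A tree (FunCat-like) gives one parent per class; a DAG (GO-like) may give several.
Parents = Dict[str, List[str]]


def ancestors(cls: str, parents: Parents) -> Set[str]:
    """Return every ancestor of `cls` (direct parents, their parents, ...)."""
    seen: Set[str] = set()
    stack = list(parents.get(cls, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def close_labels(labels: Iterable[str], parents: Parents) -> Set[str]:
    """Add all ancestors of each label so the set obeys the hierarchy constraint."""
    closed: Set[str] = set()
    for c in labels:
        closed.add(c)
        closed |= ancestors(c, parents)
    return closed


def satisfies_constraint(labels: Set[str], parents: Parents) -> bool:
    """True if every label's direct parents are labels too (this implies all ancestors are)."""
    return all(p in labels for c in labels for p in parents.get(c, []))


# Hypothetical hierarchies for illustration only.
tree: Parents = {"5/1": ["5"], "40/3": ["40"], "40/16": ["40"]}
dag: Parents = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}

print(sorted(close_labels({"5/1", "40/16"}, tree)))  # ['40', '40/16', '5', '5/1']
print(sorted(close_labels({"d"}, dag)))              # ['a', 'b', 'c', 'd']
print(satisfies_constraint({"d", "b"}, dag))         # False
```

Checking only direct parents in `satisfies_constraint` is enough: if every label's parents are present, then by induction every ancestor is present as well.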

3 Predictions in Functional Genomics
– S. cerevisiae (13 datasets) and A. thaliana (12 datasets)
– two of biology's model organisms
– most genes are annotated, which makes them ideal for testing purposes
– the method can be applied to other organisms
– data based on sequence statistics, phenotype, secondary structure, homology, microarray data,…

4 Predictive Clustering Trees
– Our focus is on decision trees
– Advantages: fast to build, noise-resistant, fast to apply, accurate predictions, easy to interpret, …
– General framework: predictive clustering trees (PCTs)
[Figure: Input: genes with features and known functions (a table with rows G1 to G6, …, attribute columns A1, A2, …, An, and class columns 1, …, 5, 5/1, …, 40, 40/3, 40/16, …, where an x marks each function of a gene) → Algorithm: top-down induction of PCTs (PCT-algo) → Output: a PCT]
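Top-down induction of a PCT for HMC can be sketched in a few dozen lines. This is a minimal sketch, not Clus-HMC itself: it assumes each gene's functions are encoded as a 0/1 class vector, that a split is chosen to maximally reduce the variance of those vectors (sum of squared Euclidean distances to the mean class vector), and that a leaf predicts the mean class vector, i.e., the fraction of its training genes having each function. Binary threshold splits on numeric features, the `min_size` stopping rule, and all function names are hypothetical choices; a real system may, for instance, weight classes differently in the distance.

```python
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


def mean_vector(Y: List[Vector]) -> Vector:
    """Per-class fraction of genes that have each function."""
    return [sum(col) / len(Y) for col in zip(*Y)]


def sse(Y: List[Vector]) -> float:
    """Sum of squared Euclidean distances of class vectors to their mean."""
    if not Y:
        return 0.0
    m = mean_vector(Y)
    return sum(sum((y - mj) ** 2 for y, mj in zip(row, m)) for row in Y)


@dataclass
class Node:
    prediction: Vector                 # mean class vector of the training genes here
    feature: Optional[int] = None      # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None      # genes with x[feature] <= threshold
    right: Optional["Node"] = None     # genes with x[feature] > threshold


def induce(X: List[Vector], Y: List[Vector], min_size: int = 2) -> Node:
    """Grow a tree top-down, choosing the split with the largest variance reduction."""
    node = Node(prediction=mean_vector(Y))
    if len(Y) < 2 * min_size:
        return node
    parent_sse = sse(Y)
    best_gain, best = 1e-12, None  # require a strictly positive reduction (float tolerance)
    for f in range(len(X[0])):
        for t in sorted(set(x[f] for x in X))[:-1]:
            left = [i for i, x in enumerate(X) if x[f] <= t]
            right = [i for i, x in enumerate(X) if x[f] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            gain = parent_sse - sse([Y[i] for i in left]) - sse([Y[i] for i in right])
            if gain > best_gain:
                best_gain, best = gain, (f, t, left, right)
    if best is None:
        return node
    f, t, left, right = best
    node.feature, node.threshold = f, t
    node.left = induce([X[i] for i in left], [Y[i] for i in left], min_size)
    node.right = induce([X[i] for i in right], [Y[i] for i in right], min_size)
    return node


def predict(node: Node, x: Vector) -> Vector:
    """Follow the tests down to a leaf and return its mean class vector."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


X = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
# Class columns (hypothetical): "40", its child "40/3", and "5"
Y = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1]]
root = induce(X, Y)
print(root.feature, root.threshold)  # 0 0.3
print(predict(root, [0.15]))         # [1.0, 1.0, 0.0]
print(predict(root, [0.95]))         # [0.0, 0.0, 1.0]
```

Because every training vector satisfies the hierarchy constraint, the fraction of genes in a leaf that have a parent function is never lower than the fraction that have one of its children, so leaf predictions respect the constraint too.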

5 Decision Trees for HMC: Different Approaches
– Clus-SC: standard approach, learns one tree per class
– Clus-HSC: special-purpose approach, learns one tree per class + hierarchy constraint
– Clus-HMC: our approach, learns one single tree for all classes
Compared on: hierarchy constraint, identifies global features, predictive performance, model size, efficiency

6 Predictive Clustering Forests
Ensembles: better performance, less interpretability
Algorithm: Clus-HMC-Ens
[Figure: training set → 50 bootstrap replicates (1, 2, 3, …, n) → Clus-HMC applied to each → 50 PCTs → each PCT makes a prediction for the test set (L1, L2, L3, …, Ln) → the 50 predictions are combined into one prediction L]
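The ensemble on this slide follows the bagging pattern: draw bootstrap replicates of the training set, learn one model per replicate, and combine their predictions. This sketch assumes the combined prediction is the per-class average of the members' predicted class vectors, and it uses a hypothetical one-split learner (`stump_learner`) in place of Clus-HMC so that the example stays self-contained; only the number of replicates (50) comes from the slide.

```python
import random
from typing import Callable, List

Vector = List[float]
Predictor = Callable[[Vector], Vector]
Learner = Callable[[List[Vector], List[Vector]], Predictor]


def mean_vector(Y: List[Vector]) -> Vector:
    """Per-class average of 0/1 (or probability) class vectors."""
    return [sum(col) / len(Y) for col in zip(*Y)]


def stump_learner(X: List[Vector], Y: List[Vector]) -> Predictor:
    """Hypothetical stand-in for Clus-HMC: one split on feature 0 at the midpoint of its range."""
    t = (min(x[0] for x in X) + max(x[0] for x in X)) / 2
    left = [y for x, y in zip(X, Y) if x[0] <= t]
    right = [y for x, y in zip(X, Y) if x[0] > t]
    left_pred = mean_vector(left)  # never empty: the minimum is always <= t
    right_pred = mean_vector(right) if right else mean_vector(Y)
    return lambda x: left_pred if x[0] <= t else right_pred


def bag(X: List[Vector], Y: List[Vector], learn: Learner,
        n_models: int = 50, seed: int = 0) -> List[Predictor]:
    """Train one model per bootstrap replicate (genes sampled with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(learn([X[i] for i in idx], [Y[i] for i in idx]))
    return models


def combined_prediction(models: List[Predictor], x: Vector) -> Vector:
    """Average the class vectors predicted by all models (assumed combination rule)."""
    preds = [m(x) for m in models]
    return [sum(col) / len(preds) for col in zip(*preds)]


X = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
# Class columns (hypothetical): parent "40", its child "40/3", and "5"
Y = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1]]
forest = bag(X, Y, stump_learner)
print(combined_prediction(forest, [0.15]))
print(combined_prediction(forest, [0.95]))
```

Averaging suits HMC: if every member scores each parent class at least as high as its children, the average does as well, so the combined prediction still respects the hierarchy constraint.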

7 Decision Trees for HMC: Different Approaches
– Clus-SC: standard approach, learns one tree per class
– Clus-HSC: special-purpose approach, learns one tree per class + hierarchy constraint
– Clus-HMC: our approach, learns one single tree for all classes
– Clus-HMC-Ens: variant of our approach, learns a forest
Compared on: hierarchy constraint, identifies global features, predictive performance, model size, efficiency

8 Evaluation
Precision-recall (PR):
– precision: percentage of predicted functions that are correct, TP/(TP+FP)
– recall: percentage of actual functions that are predicted by the algorithm, TP/(TP+FN)
Average PR curve:
– consider (instance, class) couples
– a couple is true if the instance has the class, and predicted true if the instance is predicted to have the class
Confusion matrix: TP FN / FP TN
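The pooled PR curve described above can be computed by flattening the predictions into (instance, class) couples and sweeping a threshold over the predicted scores. A minimal sketch, assuming the model outputs one score in [0, 1] per (gene, class) pair and that a couple is predicted true when its score reaches the threshold; the data and threshold values are hypothetical, and thresholds at which no couple is predicted are skipped because precision is undefined there.

```python
from typing import List, Sequence, Tuple


def pooled_pr_points(
    scores: Sequence[Sequence[float]],
    truth: Sequence[Sequence[int]],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) over all pooled (instance, class) couples."""
    # Flatten the gene x class matrices into one list of (score, actual) couples.
    couples = [(s, t) for s_row, t_row in zip(scores, truth) for s, t in zip(s_row, t_row)]
    positives = sum(t for _, t in couples)
    assert positives > 0, "need at least one true (instance, class) couple"
    points = []
    for thr in thresholds:
        tp = sum(1 for s, t in couples if s >= thr and t == 1)
        fp = sum(1 for s, t in couples if s >= thr and t == 0)
        if tp + fp == 0:
            continue  # precision is undefined when nothing is predicted
        fn = positives - tp
        points.append((thr, tp / (tp + fp), tp / (tp + fn)))
    return points


# Hypothetical scores for 2 genes x 3 classes (column 1 is a child of column 0).
scores = [[0.9, 0.8, 0.1], [0.7, 0.3, 0.6]]
truth = [[1, 1, 0], [1, 0, 0]]
for thr, p, r in pooled_pr_points(scores, truth, [0.5, 0.75, 0.95]):
    print(f"threshold={thr}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.5: precision=0.75, recall=1.00
# threshold=0.75: precision=1.00, recall=0.67
```

Pooling the couples means that every (gene, function) pair counts equally, so frequent classes weigh more heavily in the curve than rare ones.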

9 [Figure: results for four datasets: S. cerevisiae-FunCat (hom), A. thaliana-GO (seq), S. cerevisiae-FunCat (expr) and A. thaliana-GO (interpro)]
– Clus-HMC-Ens is better than Clus-HMC (average AUC improvement of 7%)
– Clus-HMC is better than C4.5H, the state-of-the-art system for HMC (at the same recall as C4.5H, average precision improvement of 20.9%)
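The two comparisons on this slide rely on summaries of a PR curve: an area under the curve, and the precision a method reaches at a fixed recall (here, the recall of C4.5H). The sketch below shows one simple way to compute both from a list of (recall, precision) points. The trapezoidal rule and the "best precision at recall of at least the target" convention are assumptions, not necessarily what was used for these results; linear interpolation between PR points can overestimate the area.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (recall, precision)


def area_under_pr(points: List[Point]) -> float:
    """Trapezoidal area under a PR curve given as (recall, precision) points."""
    pts = sorted(points)  # order by recall
    return sum((r2 - r1) * (p1 + p2) / 2 for (r1, p1), (r2, p2) in zip(pts, pts[1:]))


def precision_at_recall(points: List[Point], target: float) -> float:
    """Highest precision reached at a recall of at least `target` (0.0 if never reached)."""
    candidates = [p for r, p in points if r >= target]
    return max(candidates) if candidates else 0.0


# Hypothetical PR points for one method.
curve = [(0.0, 1.0), (0.5, 0.8), (1.0, 0.4)]
print(area_under_pr(curve))             # approximately 0.75
print(precision_at_recall(curve, 0.6))  # 0.4
```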

11 Comparison with SVMs (Barutcuoglu et al.)
– learn one SVM per class
– correct for hierarchy constraint (HC) violations with a Bayesian model
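Models trained independently for each class, such as one SVM per class, can score a child class higher than its parent. The sketch below detects such HC violations and applies a deliberately simple correction that caps each class score at its parents' (already corrected) scores. This is not the Bayesian correction of Barutcuoglu et al.; it is a simpler post-processing step swapped in for illustration, and the scores and class names are hypothetical.

```python
from typing import Dict, List, Tuple

Parents = Dict[str, List[str]]  # child -> direct parents


def violations(scores: Dict[str, float], parents: Parents) -> List[Tuple[str, str]]:
    """(child, parent) pairs where the child is scored higher than its parent."""
    return [(c, p) for c, ps in parents.items() for p in ps if scores[c] > scores[p]]


def enforce_by_min(scores: Dict[str, float], parents: Parents) -> Dict[str, float]:
    """Cap each class score at the corrected scores of all its parents."""
    corrected: Dict[str, float] = {}

    def value(c: str) -> float:
        # Recursion visits ancestors first, so parents are corrected before children.
        if c not in corrected:
            corrected[c] = min([scores[c]] + [value(p) for p in parents.get(c, [])])
        return corrected[c]

    for c in scores:
        value(c)
    return corrected


# Hypothetical per-class scores (e.g., one classifier per class).
scores = {"40": 0.6, "40/3": 0.8, "40/16": 0.2, "5": 0.3, "5/1": 0.1}
parents: Parents = {"40/3": ["40"], "40/16": ["40"], "5/1": ["5"]}
print(violations(scores, parents))  # [('40/3', '40')]
fixed = enforce_by_min(scores, parents)
print(fixed["40/3"])                # 0.6
print(violations(fixed, parents))   # []
```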

12 Conclusions
– Clus-HMC outperforms (or is comparable to) state-of-the-art methods on functional genomics tasks
– Ensembles of Clus-HMC are able to boost performance, if the user is willing to give up interpretability
– "Revenge of the decision trees"

