Ensembles for predicting structured outputs

Ensembles for predicting structured outputs Dragi Kocev 1

Motivation Increasing amounts of structured data Vectors Hierarchies – trees, DAGs,... Sequences – time series 2

Motivation Increasing amounts of structured data Vectors Hierarchies – trees, DAGs,... Sequences – time series Success of ensemble methods in simple classification and regression 3

Outline Background (Structured outputs, Predictive Clustering Trees (PCTs), PCTs for HMLC) Ensembles of PCTs Experimental evaluation Application in functional genomics Conclusions 4

Structured outputs Target in supervised learning: a single discrete or continuous variable Target in structured prediction: a vector of discrete or continuous variables, a hierarchy (tree or DAG), or a sequence (time series) Solutions: decomposition into simpler problems, or exploitation of the structure 5
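The two solution strategies can be made concrete with a small scikit-learn sketch (scikit-learn only stands in for illustration here; the talk itself uses PCTs/CLUS): decomposition trains one independent model per target, while exploiting the structure trains a single model over the whole target vector.

```python
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.tree import DecisionTreeRegressor

# Toy multi-target regression problem: each example has 3 numeric targets.
X, Y = make_regression(n_samples=200, n_features=10, n_targets=3, random_state=0)

# Decomposition into simpler problems: one independent regressor per target.
per_target = MultiOutputRegressor(DecisionTreeRegressor(random_state=0)).fit(X, Y)

# Exploiting the structure: a single tree predicting the whole target vector
# (scikit-learn trees support multi-output targets natively).
joint = DecisionTreeRegressor(random_state=0).fit(X, Y)

print(per_target.predict(X[:1]))
print(joint.predict(X[:1]))
```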

Predictive Clustering Trees Standard Top-Down Induction of DTs Hierarchy of clusters Distance measure: minimization of intra-cluster variance Instantiation of the variance for different tasks 6
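As a rough illustration of the induction principle (a minimal sketch, not the actual CLUS implementation), the split selected at each node is the one that most reduces the intra-cluster variance of the output, here instantiated as the sum of per-target variances:

```python
import numpy as np

def output_variance(Y):
    """Intra-cluster variance of a (possibly multi-target) output matrix:
    sum of per-target variances, i.e. mean squared distance to the cluster prototype."""
    return 0.0 if len(Y) == 0 else float(np.sum(np.var(Y, axis=0)))

def best_split(X, Y):
    """Return the (feature, threshold) whose binary split gives the lowest
    size-weighted intra-cluster variance of the targets Y."""
    n, d = X.shape
    best = (None, None, output_variance(Y))
    for j in range(d):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            score = (left.sum() * output_variance(Y[left]) +
                     (~left).sum() * output_variance(Y[~left])) / n
            if score < best[2]:
                best = (j, float(t), score)
    return best
```

For a single numeric target this reduces to ordinary regression-tree variance reduction; swapping in a different instantiation of the variance gives the other tasks (multi-target classification, HMLC).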

CLUS The system where the PCT framework is implemented (KU Leuven & JSI) 7

PCTs – Multiple numeric targets Condition of vegetation in Victoria, Australia Habitat Hectares Index Large Trees, Tree Canopy Cover, Understorey, Litter, Logs, Weeds and Recruitment Euclidean distance Kocev et al., Ecological Modelling 220(8):1159-1168, 2009 8

PCTs – Multiple discrete targets Mediana – Slovenian media space 8000 questionnaires about the reading and TV habits and life style of Slovenians What types of people read: Delo, Dnevnik, Ekipa, Slovenske Novice, Večer Škrjanc et al., Informatica 25(3):357-363, 2001 9

PCTs - HMLC Metabolism Energy Respiration Fermentation Transcription 10

PCTs - HMLC Metabolism (1) Energy (2) Respiration (3) Fermentation (4) Transcription (5) 11

PCTs - HMLC Metabolism (1) Energy (2) Transcription (5) Respiration (3) Fermentation (4) Vens et al., Machine Learning 73(2):185-214, 2008 15

PCTs - HMLC A leaf stores the mean label: proportion of the examples belonging to each class Prediction is made by using a user-defined threshold 16

PCTs - HMLC A leaf stores the mean label: proportion of the examples belonging to each class Prediction is made by using a user-defined threshold FunCAT annotation 16. Protein with binding function or cofactor requirement 16.13. C-compound binding 16.13.1 sugar binding ….. 17
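A toy sketch of this prediction scheme (the class columns and numbers below are illustrative only, not taken from the slides):

```python
import numpy as np

# Label matrix of the training examples that reached one leaf.  Columns are the
# classes of a small toy hierarchy: c1, its children c1.1 and c1.2, and the
# children of c1.1, namely c1.1.1 and c1.1.2.
leaf_labels = np.array([
    # c1  c1.1  c1.2  c1.1.1  c1.1.2
    [  1,    1,    0,      1,      0],
    [  1,    1,    1,      0,      0],
    [  1,    0,    1,      0,      0],
    [  1,    1,    0,      1,      1],
])

mean_label = leaf_labels.mean(axis=0)   # proportion of leaf examples per class
threshold = 0.5                         # user-defined threshold
prediction = mean_label >= threshold    # predicted class set for examples in this leaf
print(mean_label, prediction)
```

Because an example annotated with a class is also annotated with all of its ancestors, a child's proportion can never exceed its parent's, so the thresholded predictions automatically respect the hierarchy.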

Outline Background (Structured outputs, Predictive Clustering Trees (PCTs), PCTs for HMLC) Ensembles of PCTs Experimental evaluation Application in functional genomics Conclusions 18

Ensemble Methods An ensemble is a set of predictive models Unstable base classifiers Voting schemes to combine the predictions into a single prediction Ensemble learning approaches: modification of the data, modification of the algorithm 19

Ensemble Methods An ensemble is a set of predictive models Unstable base classifiers Voting schemes to combine the predictions into a single prediction Ensemble learning approaches: modification of the data (e.g. Bagging), modification of the algorithm (e.g. Random Forest) 20

Ensemble Methods: Algorithm (diagram) The training set is resampled into n bootstrap replicates, a (randomized) decision tree algorithm learns one classifier per replicate, and the n classifiers' predictions on the test set are combined by a vote. Image from Van Assche, PhD Thesis, 2008 21

Ensemble Methods: Algorithm (diagram, instantiated with CLUS as the base learner on each bootstrap replicate) The training set is resampled into n bootstrap replicates, CLUS learns one classifier per replicate, and the n classifiers' predictions on the test set are combined by a vote. Image from Van Assche, PhD Thesis, 2008 22
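A minimal sketch of the bagging loop in the diagram; `learn_tree` is a hypothetical stand-in for the base learner (PCT induction with CLUS in this talk):

```python
import numpy as np

def bagging(X, Y, learn_tree, n_trees=100, seed=None):
    """Draw n bootstrap replicates of the training set, learn one (randomized)
    tree on each, and return the list of trees; their predictions on new data
    are later combined by a vote."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
        trees.append(learn_tree(X[idx], Y[idx]))
    return trees
```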

Ensembles for structured outputs PCTs as base classifiers Voting schemes for the structured outputs: MT classification – majority and probability distribution vote; MT regression and HMLC – average; for an arbitrary structure – a prototype calculation function Memory-efficient implementation 23
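Rough sketches of these voting schemes, assuming every tree returns its prediction for one example as a vector over the targets (component-wise average for MT regression and HMLC, per-target majority vote for MT classification):

```python
import numpy as np
from collections import Counter

def combine_average(per_tree_preds):
    """MT regression / HMLC: component-wise average of the trees' predicted
    vectors (for HMLC these are class proportions, thresholded afterwards)."""
    return np.mean(np.asarray(per_tree_preds, dtype=float), axis=0)

def combine_majority(per_tree_preds):
    """MT classification: majority vote carried out separately for each target."""
    preds = np.asarray(per_tree_preds)               # shape (n_trees, n_targets)
    return np.array([Counter(col.tolist()).most_common(1)[0][0] for col in preds.T])
```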

Outline Background (Structured outputs, Predictive Clustering Trees (PCTs), PCTs for HMLC) Ensembles of PCTs Experimental evaluation Application in functional genomics Conclusions 24

Experimental hypotheses How many base classifiers are enough? Do ensembles of PCTs lift the predictive performance of a single PCT? Are ensembles of MT/HMLC PCTs better than ensembles for each target/class separately? Which approach is more efficient in terms of time and size of the models? 25

Datasets

                      Datasets  Examples    Descriptive attributes  Size of output
  MT Regression       14        154..60607  4..160                  2..14
  MT Classification   11        154..10368  4..294
  HMLC                10        988..10000  80..74435               36..571

Datasets with multiple targets: mainly environmental data acquired in EU and Slovenian projects HMLC: image classification, text classification, functional genomics 26

Experimental design Types of models: PCTs and ST trees (with F-test pruning); ensembles of PCTs and ensembles of ST trees PCTs for HMLC: weight for the distance = 0.75 Number of base classifiers (unpruned): classification – 10, 25, 50, 75, 100, 250, 500, 1000; regression – 10, 25, 50, 75, 100, 150, 250; HMLC – 10, 25, 50, 75, 100 27

Experimental design (ctd.) Random forest feature subset size: multiple targets – log, HMLC – 10% 10-fold cross-validation Evaluation: MT classification – accuracy; MT regression – correlation coefficient, RMSE, RRMSE; HMLC – precision-recall (PR) curves, area under the PRCs Friedman and Nemenyi statistical tests 28
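The slides do not spell out RRMSE; a commonly used definition (assumed here, not taken from the talk) is the model's RMSE divided by the RMSE of the default model that always predicts the training-set mean of the target:

```python
import numpy as np

def rrmse(y_true, y_pred, y_train_mean):
    """Relative RMSE: RMSE of the model divided by the RMSE of the trivial
    predictor returning the training-set mean (values below 1 beat the default)."""
    rmse_model = np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
    rmse_default = np.sqrt(np.mean((np.asarray(y_true) - y_train_mean) ** 2))
    return float(rmse_model / rmse_default)
```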

Precision-Recall Curves The PR curve plots precision as a function of recall Combination of the curves per class: micro-averaging – area under the average PRC; macro-averaging – average area under the PRCs 29
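A sketch of the two averaging schemes with scikit-learn (an assumption about how they are computed, kept deliberately simple): micro-averaging pools all (example, class) decisions into one PR curve, macro-averaging computes one area per class and averages the areas.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

def micro_macro_auprc(y_true, y_score):
    """y_true: binary matrix (examples x classes); y_score: predicted class proportions."""
    # Micro: one PR curve over all pooled (example, class) decisions.
    p, r, _ = precision_recall_curve(y_true.ravel(), y_score.ravel())
    micro = auc(r, p)
    # Macro: area under the PR curve of each class, then averaged.
    areas = []
    for c in range(y_true.shape[1]):
        if y_true[:, c].sum() == 0:          # skip classes without positive examples
            continue
        p, r, _ = precision_recall_curve(y_true[:, c], y_score[:, c])
        areas.append(auc(r, p))
    return micro, float(np.mean(areas))
```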


Results – Regression (RRMSE): MT Bagging vs. ST Bagging 31

Results – Regression (RRMSE): MT Bagging vs. ST Bagging, MT Random forest vs. ST Random forest 32

Results – Regression (RRMSE) 33

Results – Regression (RRMSE) MT ensembles: perform better than a single MT tree (stat. sign.); are faster to learn than ST ensembles (stat. sign.); give smaller models than ST ensembles (stat. sign.) 34

Results – Classification: MT Bagging vs. ST Bagging 35

Results – Classification: MT Bagging vs. ST Bagging, MT Random forest vs. ST Random forest 36

Results – Classification 37

Results – Classification MT ensembles: perform better than a single MT tree (stat. sign.); are faster to learn than ST ensembles (stat. sign.); give smaller models than ST ensembles (stat. sign.) MT Bagging is better than ST Bagging, while MT random forest is worse than ST random forest 38

Results – HMLC: HMLC Bagging vs. HSLC Bagging 39

Results – HMLC: HMLC Bagging vs. HSLC Bagging, HMLC Random forest vs. HSLC Random forest 40

Results – HMLC 41

Results – HMLC HMLC ensembles: perform better than a single HMLC tree (stat. sign.); are faster to learn than HSLC ensembles (stat. sign.); give smaller models than HSLC ensembles (stat. sign.) HMLC Bagging is better than HSLC Bagging, while HMLC random forest is worse than HSLC random forest 42

Results – Summary Ensembles of PCTs: perform significantly better than a single PCT; perform better than ensembles for the sub-components; are smaller and faster to learn than the ensembles for the sub-components (for multiple targets the ratio is ~3, for HMLC the ratio is ~4.5) 43

Outline Background (Structured outputs, Predictive Clustering Trees (PCTs), PCTs for HMLC) Ensembles of PCTs Experimental evaluation Application in functional genomics Conclusions 44

Ensembles of PCTs – functional genomics Automatic prediction of the multiple functions of the ORFs in a genome Bagging of PCTs for HMLC compared with state-of-the-art approaches Datasets for three organisms: S. cerevisiae, A. thaliana and M. musculus Two annotation schemes: FunCAT and GO (Gene Ontology) Schietgat et al., BMC Bioinformatics 11:2, 2010 45

Background – network-based approaches Use of the known functions of genes nearby in a functional association network GeneFAS (Chen and Xu, 2004) GeneMANIA (Mostafavi et al., 2008) – combines multiple networks from genomic and proteomic data KIM (Kim et al., 2008) – combines predictions of a network with predictions from Naïve Bayes Funckenstein (Tian et al., 2008) – logistic regression to combine predictions from a network and a random forest 46

Background – kernel-based approaches KLR (Lee et al., 2006) – combination of Markov random fields and SVMs with diffusion kernels, used in kernel logistic regression CSVM (Obozinski et al., 2008) – an SVM for each function separately, then reconciled to enforce the hierarchy constraint BSVM (Barutcuoglu et al., 2006) – an SVM for each function separately, then predictions combined using a Bayesian network BSVM+ (Guan et al., 2008) – extension of BSVM, uses Naïve Bayes to combine results over the data sources 47

Datasets Saccharomyces cerevisiae (D0-D12): sequence statistics, phenotype, secondary structure, homology and expression Arabidopsis thaliana (D13-D18): sequence statistics, expression, predicted SCOP class, predicted secondary structure, InterPro and homology Each dataset annotated with FunCAT and GO Mus musculus (D19): MouseFunc challenge, various data sources 48

Experimental hypotheses Is bagging of PCTs better than single PCTs for HMLC and than C4.5H/M? (D1-D18) Compare bagging of PCTs with BSVM on D0 Compare bagging of PCTs with the approaches from the MouseFunc Challenge (D19) 49

Results – Bagging of PCTs vs. PCTs 50

Results – Bagging vs. PCTs vs. C4.5H 51

Results – Bagging of PCTs vs. BSVM Comparison by AUROC, averaged over the AUROC for all functions Bagging of PCTs: 0.871 BSVM: 0.854 Bagging of PCTs scores better than BSVM on 73 of the 105 GO functions (p = 4.37·10⁻⁵) 52

Results – MouseFunc Comparison: different teams use different features Division of the dataset: three branches of GO (BP, MF and CC); four ranges of specificity, i.e. the number of genes annotated with each term (3-10, 11-30, 31-100 and 101-300) 53

Results – MouseFunc (2) Bagging of PCTs is: significantly better (p < 0.01) than BSVM+, CSVM, GeneFAS and KIM; better than KLR and GeneMANIA; significantly worse (p < 0.01) than Funckenstein 54

Results – MouseFunc (3) Bagging of PCTs is: significantly better (p < 0.01) than KIM; not different from BSVM+, KLR, CSVM and GeneFAS; significantly worse (p < 0.01) than Funckenstein and GeneMANIA 55

Results – MouseFunc (4) Bagging of PCTs is: not different from KLR, CSVM, GeneFAS and KIM; significantly worse (p < 0.01) than Funckenstein, BSVM+ and GeneMANIA 56

Summary – Functional genomics Bagging of PCTs outperforms a single PCT and C4.5H/M (Clare, 2003) Bagging of PCTs outperforms a statistical learner based on SVMs (BSVM) for S. cerevisiae Bagging of PCTs is competitive with statistical and network-based methods on the M. musculus data 57

Outline Background (Structured outputs, Predictive Clustering Trees (PCTs), PCTs for HMLC) Ensembles of PCTs Experimental evaluation Application in functional genomics Conclusions 58

Conclusions Ensembles of PCTs: lift the predictive performance of a single PCT; are better than ensembles for each sub-component of the output; are competitive with state-of-the-art approaches in functional genomics Applicable to a wide range of problems: different types and sizes of outputs, small and large datasets 59

Questions? 60