Gene id GO0000166GO0000267... GO0001883 Feature_1Feature_2... Feature_t 962510 1 2.991-2.2960.68 2111 1 -0.013-1.0480.767 8451900 0 -0.8430.1551.396...

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

Decision trees for hierarchical multilabel classification A case study in functional genomics.
Microarray Data Analysis Day 2
Discovery Challenge Gene expression datasets On behalf of Olivier Gandrillon.
Clustering approaches for high- throughput data Sushmita Roy BMI/CS 576 Nov 12 th, 2013.
Learning rule-based models from gene expression time profiles annotated with Gene Ontology terms Jan Komorowski and Astrid Lägreid.
Gene ontology & hypergeometric test Simon Rasmussen CBS - DTU.
Modularity in Biological networks.  Hypothesis: Biological function are carried by discrete functional modules.  Hartwell, L.-H., Hopfield, J. J., Leibler,
Data Mining with Decision Trees Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island.
Lecture 5 (Classification with Decision Trees)
Decision Trees Chapter 18 From Data to Knowledge.
Analysis of GO annotation at cluster level by H. Bjørn Nielsen Slides from Agnieszka S. Juncker.
GUHA method in Data Mining Esko Turunen Tampere University of Technology Tampere, Finland.
K.U.Leuven Department of Computer Science Predicting gene functions using hierarchical multi-label decision tree ensembles Celine Vens, Leander Schietgat,
A REVIEW OF FEATURE SELECTION METHODS WITH APPLICATIONS Alan Jović, Karla Brkić, Nikola Bogunović {alan.jovic, karla.brkic,
Automatic methods for functional annotation of sequences Petri Törönen.
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
Chapter 13 Genetic Algorithms. 2 Data Mining Techniques So Far… Chapter 5 – Statistics Chapter 6 – Decision Trees Chapter 7 – Neural Networks Chapter.
Image recognition using analysis of the frequency domain features 1.
Funded by: European Commission – 6th Framework Project Reference: IST WP 2: Learning Web-service Domain Ontologies Miha Grčar Jožef Stefan.
Networks and Interactions Boo Virk v1.0.
Hierarchical multilabel classification trees for gene function prediction Leander Schietgat Hendrik Blockeel Jan Struyf Katholieke Universiteit Leuven.
Basic Data Mining Technique
Hierarchical Annotation of Medical Images Ivica Dimitrovski 1, Dragi Kocev 2, Suzana Loškovska 1, Sašo Džeroski 2 1 Department of Computer Science, Faculty.
Function first: a powerful approach to post-genomic drug discovery Stephen F. Betz, Susan M. Baxter and Jacquelyn S. Fetrow GeneFormatics Presented by.
Combining multiple learners Usman Roshan. Bagging Randomly sample training data Determine classifier C i on sampled data Goto step 1 and repeat m times.
Ensemble Learning Spring 2009 Ben-Gurion University of the Negev.
CLASSIFICATION: Ensemble Methods
Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.
Copyright R. Weber Machine Learning, Data Mining INFO 629 Dr. R. Weber.
Ensemble with Neighbor Rules Voting Itt Romneeyangkurn, Sukree Sinthupinyo Faculty of Computer Science Thammasat University.
Data Mining the Yeast Genome Expression and Sequence Data Alvis Brazma European Bioinformatics Institute.
Data Mining and Decision Trees 1.Data Mining and Biological Information 2.Data Mining and Machine Learning Techniques 3.Decision trees and C5 4.Applications.
Nuria Lopez-Bigas Methods and tools in functional genomics (microarrays) BCO17.
Bioinformatics and Computational Biology
Ivica Dimitrovski 1, Dragi Kocev 2, Suzana Loskovska 1, Sašo Džeroski 2 1 Faculty of Electrical Engineering and Information Technologies, Department of.
Lecture Notes for Chapter 4 Introduction to Data Mining
T HE S CREENERS WERE CREATED BY MAN. T HEY EVOLVED.
Classification using Decision Trees 1.Data Mining and Information 2.Data Mining and Machine Learning Techniques 3.Decision trees and C5 4.Applications.
Data Mining and Decision Support
Decision Trees IDHairHeightWeightLotionResult SarahBlondeAverageLightNoSunburn DanaBlondeTallAverageYesnone AlexBrownTallAverageYesNone AnnieBlondeShortAverageNoSunburn.
Tutorial 8 Gene expression analysis 1. How to interpret an expression matrix Expression data DBs - GEO Clustering –Hierarchical clustering –K-means clustering.
… Algo 1 Algo 2 Algo 3 Algo N Meta-Learning Algo.
Pathway Ranking Tool Dimitri Kosturos Linda Tsai SoCalBSI, 8/21/2003.
Tools in Bioinformatics Ontologies and pathways. Why are ontologies needed? A free text is the best way to describe what a protein does to a human reader.
Multi-cellular paradigm The molecular level can support self- replication (and self- repair). But we also need cells that can be designed to fit the specific.
Ontology Engineering and Feature Construction for Predicting Friendship Links in the Live Journal Social Network Author:Vikas Bahirwani 、 Doina Caragea.
Compiling Information and Inferring Useful Knowledge for Systems Biology by Text Mining the Literature Anália Lourenço IBB – Institute for Biotechnology.
Combining Bagging and Random Subspaces to Create Better Ensembles
Cell Image Analysis, Part 2
Networks and Interactions
Zaman Faisal Kyushu Institute of Technology Fukuoka, JAPAN
An Artificial Intelligence Approach to Precision Oncology
Inflamm Intest Dis 2016;1: DOI: /
Interferons: Type I José Ignacio Saldana, Imperial College London, UK
Research Areas Christoph F. Eick
Data Mining Functionalities (2)
Ensembles for predicting structured outputs
INTERLEUKIN 10 (IL-10) CATEGORY: RECEPTORS & MOLECULES
Unit III Information Essential to Life Processes
Statistical Learning Dong Liu Dept. EEIS, USTC.
Ensemble learning.
Introduction to Bioinformatic
Ensemble learning Reminder - Bagging of Trees Random Forest
Interpretation of Similar Gene Expression Reordering
Extracting Information from Diverse and Noisy Scanned Document Images
Clusters from the functional network
Semi-Automatic Data-Driven Ontology Construction System
Using Bayesian Network in the Construction of a Bi-level Multi-classifier. A Case Study Using Intensive Care Unit Patients Data B. Sierra, N. Serrano,
Evaluating Classifiers for Disease Gene Discovery
Presentation transcript:

Gene id GO GO GO Feature_1Feature_2... Feature_t Ensembles of PCTs and Rule ensembles Predictive clustering relates gene annotations to phenotype properties extracted from images Dragi Kocev, Bernard Ženko, Petra Paul, Coenraad Kuijl, Jacques Neefjes, and Sašo Džeroski Jožef Stefan Institute, Dept. of Knowledge Technologies, Ljubljana, Slovenia {dragi.kocev, bernard.zenko, Division of Cell Biology and Centre for Biomedical Genetics, NKI, Netherlands {p.paul, c.kuijl, IntroductionData description Results Methodology Conclusions  Induced by standard TDIDT algorithm  Able to make a prediction for given structured output  Heuristic score: minimization of intra-cluster variance  Definition of a distance and prototype function for a given output  Medoid is taken as a prototype in each leaf  Cosine similarity as distance measure Predictive clustering trees Training set … 1 2 n 3 (Randomized) Decision tree algorithm n classifiers … n bootstrap replicates (Randomized) Decision tree algorithm (Randomized) Decision tree algorithm (Randomized) Decision tree algorithm Vote Ensemble prediction Convert tree to rules Optimization and regularization Set of rules  Ensembles are able to lift the predictive performance of a single predictive clustering tree  Random forests are efficient to learn  Ensembles are not interpretable −conversion to fitted rule ensembles  Each tree from the ensembles is converted to a set of rules  Using some optimization techniques, select the rule set with best performance  Easy to interpret by domain experts C1C2C3C4C5C6C7C8 Mean Cells Intensity StdIntensity GFP Mean Cells Texture AngularSecondMoment GFP Mean Cytoplasm Intensity IntegratedIntensityE GFP Mean Cytoplasm Texture InfoMeas1 GFP Mean Means ClassII per Cells AreaShape Eccentricity Mean Means ClassII per Cells Texture Entropy GFP Mean Cells Children EE Count Mean Means EE per Cells AreaShape Perimeter Mean Nuclei AreaShape Solidity Mean Means Golgi per Cells Intensity IntegratedIntensity RFP Mean Means Golgi per Cells Intensity IntegratedIntensityE RFP Mean Means Golgi per Cells RadialIntensityDist FracAtD RFP Mean Means Golgi per Cells RadialIntensityDist FracAtD RFP Size of Cluster Genes involved in the defense response (GO ) and regulation of metabolic processes (GO ) Genes involved in receptor binding (GO ) and are present in the cytoplasm (GO )  siRNA screen performed on 269 genes, from which 20 were hypothetical  Each gene is described by: −its annotation with terms from the Gene Ontology −resulting phenotypes (images from confocal microscopy)  Only the GO terms that are used to annotate at least 1 gene from the ones analysed (in total - 334)  CellProfiler for extracting features from the images: in total 700 features, from which 13 most relevant to the study are used  Grouping genes with similar phenotypes upon siRNA mediated downregulation  Phenotypes are described by features extracted from images using some free general-purpose or custom-made software (e.g., CellProfiler)  siRNA screen designed to study MHC Class II antigen presentation −A major regulatory process in the immune system −Controls most aspects of the adaptive immune response −Strongly linked to almost all autoimmune diseases IF GO = 1 AND GO = 1 THEN Genes involved in regulation (GO ) and in particular cellular nucleobase, nucleoside, nucleotide and nucleic acid metabolic processes (GO )  Application of the predictive clustering paradigm for analysis of phenotype images from siRNA screen for MHC Class II antigen presentation  The clusters and their descriptions are obtained in a single step  Identified and described groups of genes which yield similar phenotypes upon siRNA mediated downregulation