Learning rule-based models from gene expression time profiles annotated with Gene Ontology terms Jan Komorowski and Astrid Lägreid.

Presentation transcript:


J. Komorowski and A. Lägreid Joint work with Torgeir R. Hvidsten, Herman Midelfart, Astrid Lægreid and Arne K. Sandvik

Selected Challenges in Gene-expression Analysis
Function similarity corresponds to expression similarity, but:
– Functionally correlated genes may be expression-wise dissimilar (e.g. anti-coregulated)
– Genes usually have multiple functions
– Measurements may be approximate and contradictory
Can we obtain clusters of biologically related genes?
Can we build models that classify unknown genes into functional classes, that are human legible, and that handle approximate and often contradictory data?
How can we re-use biological knowledge?

Data
Data material
– Serum starved fibroblasts, 8,613 genes
  Added serum to medium at time = 0
  Used starved fibroblasts as reference
  Measured gene activity at various time points
– 493 genes found to be differentially expressed
Results
– 278 genes known (3 repeats)
– 212 genes unknown (uncharacterized)
– 211 genes given hypothetical function with 88% quality

Fibroblast serum response
[Figure labels: quiescent, non-proliferating; proliferating; serum; samples for microarray analysis]

Processes
[Figure labels: quiescent, non-proliferating; proliferating; protein synthesis; lipid synthesis; stress response; cell motility; re-entry cell cycle; organelle biogenesis; transcription]

Dynamic processes
[Figure labels: quiescent, non-proliferating; proliferating; immediate early; delayed immediate early; intermediate; late; primary; secondary; tertiary]

Protein appears after the transcript
[Figure labels: quiescent, non-proliferating; proliferating; primary; secondary; tertiary]

Protein dynamics are not always similar to transcript dynamics
[Figure labels: gene; transcript; protein]

Molecular mechanisms of transcriptional response
[Figure labels: immediate early response genes; delayed immediate early response genes; intermediate/late response genes; effectors = cellular response; serum = signal; immediate early response factors; secondary transcription factors]

The dynamics of cellular processes
[Figure labels: quiescent, non-proliferating; proliferating; protein synthesis; DNA synthesis; energy metabolism; cell motility; stress response; cell motility; cell adhesion; DNA synthesis; lipid synthesis; cell cycle regulation; cell proliferation, negative regulation]

Methodology
1. Mining functional classes from an ontology
2. Extracting features for learning
3. Inducing minimal decision rules using rough sets
4. Predicting the function of unknown genes using the rules
Example rule: 0 - 4(Increasing) AND (Decreasing) AND (Constant) => GO(cell proliferation)

Gene Ontology

Biological processes from GO
– Energy pathways
– DNA metabolism
– Amino acid and derivative metabolism
– Protein targeting
– Lipid metabolism
– Transport
– Ion homeostasis
– Intracellular traffic
– Cell death
– Cell motility
– Stress response
– Organelle organization and biogenesis
– Oncogenesis
– Cell cycle
– Cell adhesion
– Cell surface receptor linked signal transduction
– Intracellular signaling cascade
– Developmental processes
– Blood coagulation
– Circulation

Hierarchical Clustering of the Fibroblast Data
It’s not a cluster!

Gene Ontology vs. Clusters found by Iyer et al.

Template-based feature synthesis
12 measurement points, 55 possible intervals of length >2
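The interval count on this slide can be checked with a short sketch. It assumes that "length >2" means an interval spans at least three consecutive measurement points, which is the reading that reproduces the figure of 55 for 12 points. The function name `candidate_intervals` and its `min_points` parameter are illustrative and do not come from the slides.

```python
from itertools import combinations

def candidate_intervals(n_points, min_points=3):
    """Return (start, end) index pairs spanning at least `min_points` measurements.

    Assumption: an interval is identified by the indices of its first and
    last measurement point, both inclusive.
    """
    return [
        (start, end)
        for start, end in combinations(range(n_points), 2)
        if end - start + 1 >= min_points
    ]

intervals = candidate_intervals(12)
print(len(intervals))  # 55: all 66 index pairs minus the 11 adjacent pairs
```

Each of these intervals is a candidate feature: a gene's sub-profile over the interval is compared against the templates.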

Examples of template definitions

Rule example 1
Rule: 0 - 4(Constant) AND (Increasing) => GO(protein metabolism and modification) OR GO(mesoderm development) OR GO(protein biosynthesis)
Covered genes: M35296, J02783, D13748, X05130, X60957, D13748, U90918 (unknown)

Rule example 2
Rule: 0 - 4(Increasing) AND (Decreasing) AND (Constant) => GO(cell proliferation) OR GO(cell-cell signaling) OR GO(intracellular signaling cascade) OR GO(oncogenesis)
Covered genes: Y07909, X58377, U66468, X58377, X85106, Y07909
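A rule of this form could be represented and matched against an expression profile roughly as follows. This is only a sketch: the actual template definitions are not preserved in the transcript, so the `THRESHOLD` value, the trend test in `template_of`, the `Rule` class, the interval indices and the example profile are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical minimum change in expression that counts as a trend.
# The slides do not give the real template definitions.
THRESHOLD = 1.0

def template_of(profile, start, end, threshold=THRESHOLD):
    """Label profile[start..end] (inclusive) with a coarse trend template."""
    segment = profile[start:end + 1]
    change = segment[-1] - segment[0]
    if change >= threshold:
        return "Increasing"
    if change <= -threshold:
        return "Decreasing"
    if max(segment) - min(segment) < threshold:
        return "Constant"
    return None  # no template applies to this interval

@dataclass(frozen=True)
class Rule:
    conditions: tuple  # ((start, end, template), ...), all must hold
    classes: tuple     # GO processes on the right-hand side

    def matches(self, profile):
        return all(
            template_of(profile, start, end) == template
            for start, end, template in self.conditions
        )

# Interval indices are made up for illustration; the second interval
# is missing from the transcript.
rule = Rule(
    conditions=((0, 4, "Constant"), (4, 8, "Increasing")),
    classes=(
        "protein metabolism and modification",
        "mesoderm development",
        "protein biosynthesis",
    ),
)

profile = [0.0, 0.1, -0.1, 0.0, 0.1, 0.5, 0.9, 1.3, 1.6, 1.6, 1.5, 1.6]
print(rule.matches(profile))  # True: flat over points 0-4, rising over points 4-8
```

A gene is "covered" by a rule when every condition on the left-hand side holds for its profile.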

Classification using template-based rules
IF … THEN …
IF 0 - 4(Constant) AND (Increasing) THEN GO(prot. met. and mod.) OR …
IF … THEN …
IF … THEN …
…
+4
Votes are normalized, and processes with vote fractions higher than a selection threshold are chosen as predictions.
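The voting step might look like the sketch below. It assumes that every rule that fires for a gene adds a number of votes to each process on its right-hand side (the "+4" on the slide suggests such a per-rule vote count, but the transcript does not say how it is derived). The function name `predict`, the vote counts and the threshold value of 0.3 are hypothetical.

```python
from collections import Counter

def predict(fired_rules, selection_threshold=0.3):
    """Turn the rules that fired for one gene into predicted processes.

    fired_rules: list of (classes, votes) pairs, where `classes` are the
    processes on a rule's right-hand side and `votes` is the number of
    votes that rule casts (an assumption, see the text above).
    """
    tally = Counter()
    for classes, votes in fired_rules:
        for go_class in classes:
            tally[go_class] += votes

    total = sum(tally.values())
    if total == 0:
        return {}

    # Normalize votes to fractions, then keep those above the threshold.
    fractions = {go_class: count / total for go_class, count in tally.items()}
    return {c: f for c, f in fractions.items() if f > selection_threshold}

fired = [
    (("protein metabolism and modification", "protein biosynthesis"), 4),
    (("cell proliferation",), 1),
]
predictions = predict(fired)
print(sorted(predictions))
```

Because several processes can exceed the threshold, a gene can receive more than one predicted function. Raising the threshold gives fewer, more confident predictions.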

Cross validation estimates, Iyer et al.
A: Coverage: 84%, Precision: 50%
B: Coverage: 71%, Precision: 60%
C: Coverage: 39%, Precision: 90%
Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)
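The two measures can be computed from sets of (gene, process) pairs, as in this minimal sketch. The function name and the example pairs are hypothetical and are not data from the study.

```python
def coverage_and_precision(annotated, predicted):
    """Compute coverage and precision from sets of (gene, process) pairs."""
    tp = len(annotated & predicted)   # predicted and annotated
    fn = len(annotated - predicted)   # annotated but not predicted
    fp = len(predicted - annotated)   # predicted but not annotated
    coverage = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return coverage, precision

# Made-up example pairs for illustration only.
annotated = {("g1", "cell cycle"), ("g2", "transport"), ("g3", "cell death")}
predicted = {("g1", "cell cycle"), ("g2", "cell death")}

coverage, precision = coverage_and_precision(annotated, predicted)
print(f"coverage={coverage:.2f} precision={precision:.2f}")
```

The trade-off in settings A to C on the slide follows from these definitions: stricter selection removes false positives, which raises precision, but also drops true positives, which lowers coverage.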

Cross validation estimates, Cho et al.
Coverage: 58%, Precision: 61%
Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)

Protein Metabolism and Modification
[Figure with regions labeled A to E]
A – annotations
B – false negatives
C – false positives
D – true positives
E – pred. unknown gene

Re-classification of the Known Genes

Co-classifications for the Unknown Genes

Conclusions
Our methodology
– Incorporates background biological knowledge
– Handles the noise and incompleteness in the microarray data well
– Can be objectively evaluated
– Predicts multiple functions per gene
– Can reclassify known genes and suggest possible new functions for them
– Can provide hypotheses about the function of unknown genes
Experimental work needs to be done to confirm our predictions.

Genomic ROSETTA: