Outline: a review of the major computational approaches that facilitate the biological interpretation of high-throughput microarray and RNA-Seq experiments.

Analysis workflow overview (diagram):
- Input: microarray or RNA-Seq data.
- Analyses that yield a gene of interest, a gene list, a gene pair or a gene-list pair:
  - DEG: differentially expressed genes
  - co-expression / clustering
  - gene set-wise differential expression analysis
  - differential co-expression analysis
- FAA (Functional Annotation Analysis): Gene Ontology (GO) or pathway analysis, producing a gene list with annotations.
- Visualization, semantic assembling and knowledge learning: concept lattice analysis (BioLattice).

Glossary

- FAA: Functional Annotation Analysis
- GO: Gene Ontology
- Pathway
- DEG: Differentially Expressed Genes
- GSEA: Gene Set Enrichment Analysis
- Biological interpretation and biological semantics
- Concept lattice analysis

Pathway and Ontology-Based Analysis

- GO- and biological pathway-based analysis is one of the most powerful methods for inferring the biological meaning of expression changes.
- It is applied to lists of genes obtained by:
  - differential expression analysis
  - co-expression analysis (or clustering)

- Attributes that can be used in FAA include:
  - transcription factor binding
  - clinical phenotypes such as disease associations
  - MeSH (Medical Subject Headings) terms
  - microRNA binding sites
  - protein family memberships
  - chromosomal bands
  - GO terms
  - biological pathways

- These features may have their own ontological structure; for example, GO is organized as a DAG (directed acyclic graph).
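Because GO is a DAG, a gene annotated to a specific term is implicitly annotated to every ancestor of that term (the GO "true path rule"), so many enrichment tools count the gene under all of those terms. The sketch below shows this upward propagation; the term fragment in `PARENTS` is hypothetical and simplified (real GO has many more terms and several relation types), and `GENE_X`, `ancestors` and `propagate` are illustrative names, not part of any library.

```python
# Hypothetical, simplified GO-like fragment: child term -> set of parent terms.
PARENTS = {
    "apoptotic process": {"programmed cell death"},
    "programmed cell death": {"cell death"},
    "cell death": {"biological_process"},
    "biological_process": set(),
}


def ancestors(term, parents):
    """Return every ancestor of `term` (not including the term itself)."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:  # depth-first walk up the DAG; `seen` avoids revisiting shared parents
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(annotations, parents):
    """Extend each gene's direct annotations with all of their ancestor terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


# Hypothetical direct annotation for one gene.
print(sorted(propagate({"GENE_X": {"apoptotic process"}}, PARENTS)["GENE_X"]))
# ['apoptotic process', 'biological_process', 'cell death', 'programmed cell death']
```

The same propagation is also why GO categories are not independent of each other, which matters for the enrichment tests discussed later.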

- Three common techniques for identifying DEGs:
  - t-test
  - Wilcoxon's rank-sum test
  - ANOVA
- Because thousands of genes are tested at once, the multiple-hypothesis-testing problem must be handled properly.
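One common way to handle the multiple-testing problem is to control the false discovery rate with the Benjamini-Hochberg procedure (the slide does not say which correction the reviewed work uses). A minimal pure-Python sketch, assuming per-gene p-values have already been computed with one of the tests above; the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    q_(r) = min over r' >= r of min(1, p_(r') * m / r'),
    where p_(1) <= ... <= p_(m) are the sorted p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps the adjusted values at 1
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values, e.g. from a t-test or a Wilcoxon rank-sum test.
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
deg_indices = [i for i, q in enumerate(qvals) if q < 0.05]  # genes called DEGs at FDR 5%
```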

- Co-expression analysis groups genes with similar expression profiles together and separates those with different profiles, returning groups of genes that are assumed to be co-regulated.
- Clustering algorithms:
  - hierarchical-tree clustering
  - partitional clustering
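A naive sketch of the hierarchical-tree option, assuming 1 - Pearson correlation as the distance and average linkage (common choices, not ones prescribed by the slide). Gene names and profiles are hypothetical; real analyses would use an optimized library implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, n_clusters):
    """Agglomerative (hierarchical) clustering with distance 1 - Pearson r.

    profiles: dict gene -> expression profile; merging stops at n_clusters.
    """
    clusters = [[gene] for gene in profiles]

    def cluster_distance(c1, c2):
        # Average linkage: mean pairwise distance between members of the two clusters.
        total = sum(1 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        _, i, j = min(
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i].extend(clusters[j])  # merge the closest pair
        del clusters[j]  # j > i, so index i is still valid
    return clusters


# Hypothetical profiles across four samples.
profiles = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8.5],
    "C": [4, 3, 2, 1],
    "D": [8, 6, 4, 2.5],
}
print(average_linkage(profiles, 2))  # [['A', 'B'], ['C', 'D']]
```

Correlation-based distance groups genes by the shape of their profiles rather than their absolute level, which fits the goal of finding putatively co-regulated genes.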

- Pathways are powerful resources for understanding shared biological processes, e.g. KEGG, MetaCyc and BioCarta (signaling pathways).

- MetaCyc is a non-redundant database of experimentally determined metabolic pathways. It is the largest such collection, containing over 1400 metabolic pathways.

- An ontology such as GO provides a shared understanding of a domain of information through controlled vocabularies.
- GO consists of three DAG-structured vocabularies:
  - Molecular Function (MF)
  - Cellular Component (CC)
  - Biological Process (BP)

- Other commonly used ontologies and vocabularies:
  - MIPS: an integrated source of protein properties for a variety of complete genomes
  - MeSH: clinical terms, including disease names
  - OMIM (Online Mendelian Inheritance in Man)
  - UMLS (Unified Medical Language System)

- GO enrichment test, for example: if 20% of the genes in a gene list are annotated with the GO term 'apoptosis', but only 1% of the genes in the whole human genome fall into this functional category, then apoptosis is over-represented (enriched) in the list.

- Common statistical tests: the chi-square, binomial and hypergeometric tests.
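For the hypergeometric test, the p-value of observing at least k annotated genes in a list of n genes, drawn from a background of N genes of which K carry the annotation, is P(X >= k) = sum over i = k..min(n, K) of C(K, i) * C(N - K, n - i) / C(N, n). A minimal sketch using only the Python standard library; the counts are hypothetical and chosen to mirror the 20% vs 1% apoptosis example.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background
    K: background genes annotated with the term
    n: genes in the list of interest
    k: list genes annotated with the term
    """
    # Sum the exact integer numerators first, then divide once (math.comb returns 0
    # when the second argument exceeds the first, so impossible terms drop out).
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


# Hypothetical counts: 20 of 100 listed genes vs 200 of 20,000 background genes
# carry the 'apoptosis' annotation.
p = hypergeom_upper_tail(N=20000, K=200, n=100, k=20)
print(p)  # far below any usual significance threshold
```

N and K come directly from the background gene set, so the result depends on how that background is chosen.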

- Pitfalls to avoid when using the hypergeometric test:
  - The choice of background has a substantial impact on the result. Candidate backgrounds include:
    - all genes having at least one GO annotation
    - all genes ever recorded in genome databases
    - all genes on the microarray
  - GO has a hierarchical (graph) structure, whereas the hypergeometric test assumes that categories are independent.

- Common tools: DAVID, ArrayXPath, Pathway Miner, EASE, GOFish, GOTree, etc.

Gene Set-Wise Differential Expression Analysis

- Gene set-wise differential expression analysis evaluates the coordinated differential expression of groups of genes.
- Gene Set Enrichment Analysis (GSEA):
  - the first method developed in this category
  - evaluates, for each predefined gene set, the significance of its association with the phenotypic classes

- Difference between FAA and GSEA:
  - FAA finds over-represented GO terms in a gene list of interest.
  - GSEA starts from predefined gene sets and tests their expression changes between conditions.

- Advantages of gene set-wise differential expression analysis:
  - It can identify modest but coordinated changes in gene expression that might be missed by conventional gene-wise differential expression analysis (many small expression changes can collectively add up to a large effect).
  - Biological interpretation is straightforward, because the gene sets are defined by biological knowledge.

- The Enrichment Score (ES) of a gene set S is computed on the ranked gene list L, in which the N genes are ordered by their correlation with the phenotype. At each position i, the fraction of genes in S ('hits') up to i, weighted by their correlation, is compared with the fraction of genes not in S ('misses') up to i; the ES is the maximum deviation from zero of this running difference.
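The running-sum computation can be sketched as follows, following the weighted statistic of the original GSEA method with weight exponent p (p = 1 by default). It assumes the list is already ranked by correlation and omits the permutation test and normalization that GSEA uses to assess significance; function and parameter names are illustrative.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """GSEA-style weighted running-sum enrichment score.

    ranked_genes: gene IDs ordered by decreasing correlation with the phenotype (list L)
    correlations: correlation r_j of each gene, in the same order
    gene_set:     gene set S
    p:            weight exponent; p = 0 gives an unweighted Kolmogorov-Smirnov-like statistic
    """
    n = len(ranked_genes)
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_r = sum(abs(r) ** p for r, hit in zip(correlations, in_set) if hit)
    if n_hits in (0, n) or n_r == 0:
        raise ValueError("S must contain some, but not all, ranked genes, with nonzero weights")
    p_hit = p_miss = 0.0
    es = 0.0
    for r, hit in zip(correlations, in_set):
        if hit:
            p_hit += abs(r) ** p / n_r     # weighted fraction of hits up to position i
        else:
            p_miss += 1.0 / (n - n_hits)  # fraction of misses up to position i
        if abs(p_hit - p_miss) > abs(es):
            es = p_hit - p_miss           # keep the maximum deviation from zero
    return es


# Hypothetical ranked list of four genes.
genes = ["g1", "g2", "g3", "g4"]
corr = [0.9, 0.5, -0.2, -0.8]
print(enrichment_score(genes, corr, {"g1", "g2"}))  # set at the top of the list: ES near +1
```

A gene set concentrated at the bottom of the list yields a negative ES, so the sign indicates the direction of the coordinated change.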

- Typical gene sets:
  - regulatory-motif sets
  - function-related sets
  - disease-related sets
- Database: MSigDB, with 6769 gene sets classified into five collections.
- Several interesting extensions of this approach have been developed.

Differential Co-Expression Analysis

- Co-expression analysis determines the degree of co-expression of a cluster of genes under a given condition.
- Differential co-expression analysis determines how the co-expression of a gene pair or a gene cluster differs across conditions.

- Three major types:
  - (a) differential co-expression of gene cluster(s)
  - (b) gene pair-wise differential co-expression
  - (c) differential co-expression of paired gene sets

- Type (a): identify gene cluster(s) that are differentially co-expressed between two conditions.
- Let I denote the set of genes and J the set of conditions. The mean squared residual of the model measures the co-expression of the genes: the lower the residual, the more coherently the genes are co-expressed.
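Assuming the score meant here is the Cheng-Church mean squared residue of a genes-by-conditions submatrix, it can be computed as below. The formula is given in the docstring; the matrices are hypothetical, and how the reviewed method combines the two conditions into a differential score is not reproduced here.

```python
def mean_squared_residual(matrix):
    """Mean squared residual H(I, J) of a genes x conditions submatrix (list of rows).

    H(I, J) = 1/(|I||J|) * sum over i in I, j in J of (a_ij - a_iJ - a_Ij + a_IJ)^2,
    where a_iJ is the mean of row i, a_Ij the mean of column j,
    and a_IJ the mean of the whole submatrix.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i, row in enumerate(matrix):
        for j, a in enumerate(row):
            residual = a - row_means[i] - col_means[j] + overall
            total += residual ** 2
    return total / (n_rows * n_cols)


# Hypothetical expression of the same 3 genes under two conditions (genes x samples).
condition_1 = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # shifted copies: perfectly coherent
condition_2 = [[1, 5, 2], [4, 1, 3], [2, 2, 6]]  # no shared pattern
print(mean_squared_residual(condition_1), mean_squared_residual(condition_2))
```

A cluster with a low residual in one condition and a high residual in the other is a natural candidate for differential co-expression.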

- Type (b): identify differentially co-expressed gene pairs.
- Techniques:
  - F-statistic
  - a meta-analytic approach
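The F-statistic and meta-analytic approaches are not reproduced here. As a simpler stand-in that conveys the idea of gene pair-wise differential co-expression, the sketch compares one gene pair's Pearson correlation between two independent conditions with Fisher's z-transformation test. This is a swapped-in technique, not one of the methods listed above, and the data and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_diff(x1, y1, x2, y2):
    """Two-sided test of r1 == r2 for one gene pair measured in two independent conditions.

    Requires more than 3 samples per condition and |r| < 1.
    Returns (r1, r2, z, p).
    """
    r1, r2 = pearson(x1, y1), pearson(x2, y2)
    se = math.sqrt(1 / (len(x1) - 3) + 1 / (len(x2) - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail probability
    return r1, r2, z, p


# Hypothetical profiles of genes X and Y in 8 samples per condition.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_cond1 = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9]  # X and Y rise together
y_cond2 = [7.9, 7.2, 6.1, 4.8, 4.2, 2.9, 2.3, 1.1]  # relationship reversed
print(fisher_z_diff(x, y_cond1, x, y_cond2))
```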

- Note that methods for identifying differentially co-expressed gene clusters or gene pairs usually do not use predefined gene sets or pairs. Their results can therefore be made easier to interpret with ontology- and pathway-based annotation analysis.

- Type (c): the dCoxS (differential co-expression of gene sets) algorithm identifies pairs of gene sets that are differentially co-expressed across conditions.
- Biological pathways can serve as the predefined gene sets, so that the differential co-expression of pathway pairs between conditions is analyzed.

- Type (c), continued: to measure the expression similarity between two gene sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between their sample-wise entropies. IS can be computed even when the two pathways contain different numbers of genes, because it uses only distances between samples.
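A rough sketch of the idea behind IS, under a loud assumption: the sample-wise entropy measure of dCoxS is replaced here by the Euclidean distance between pairs of samples, computed over the genes of one set. Each gene set is thus summarized as one vector of sample-pair distances whose length depends only on the number of samples, which is why two pathways of different sizes can still be correlated. Function names and data are hypothetical.

```python
import math
from itertools import combinations


def sample_pair_distances(expr):
    """expr: genes x samples matrix for ONE gene set.

    Returns one value per pair of samples. ASSUMPTION: Euclidean distance stands in
    for the sample-wise entropy measure used by dCoxS.
    """
    n_samples = len(expr[0])
    columns = [[gene[s] for gene in expr] for s in range(n_samples)]  # one vector per sample
    return [math.dist(columns[a], columns[b]) for a, b in combinations(range(n_samples), 2)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interaction_score(set1_expr, set2_expr):
    """IS-like score: correlation of the two sets' sample-pair distance vectors.

    Both matrices must cover the same samples in the same order; gene counts may differ.
    """
    return pearson(sample_pair_distances(set1_expr), sample_pair_distances(set2_expr))


# Hypothetical pathway P1 (3 genes) and pathway P2 (2 genes) over the same 4 samples.
p1 = [[0, 1, 5, 6], [1, 2, 6, 7], [0, 1, 5, 6]]
p2 = [[0, 2, 10, 12], [1, 3, 11, 13]]
print(interaction_score(p1, p2))
```

In dCoxS the score would be computed separately in each condition and the two values compared; that comparison step is not shown.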

Biological Interpretation and Biological Semantics

- Biomedical semantics provides rich descriptions of biomedical domain knowledge.
- Motivation for biological semantics: GO-based analysis has limitations:
  - The GO result is typically a long, unordered list of annotations.
  - Most analysis tools evaluate only one cluster at a time.
  - Reading massive annotation lists is time-consuming.
  - The annotations are hard to assemble manually.
  - Many annotations are redundant.

- Introducing BioLattice:
  - a mathematical framework based on concept lattice analysis
  - organizes traditional clusters and their associated annotations into a lattice of concepts
  - provides a graphical summary
  - treats gene expression clusters as objects and annotations as attributes
- As a result, the complex relations among clusters and annotations are clarified, ordered and visualized.

- Another advantage of BioLattice is that heterogeneous biological knowledge resources can be added.

- Tool to construct BioLattice: the Ganter algorithm.
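Assuming "the Ganter algorithm" refers to Ganter's NextClosure algorithm for enumerating formal concepts, a compact sketch follows. As in BioLattice, clusters play the role of objects and annotations the role of attributes; the cluster/annotation table is hypothetical, the naive closure computation is chosen for clarity rather than speed, and drawing the lattice (ordering concepts by inclusion) is not shown.

```python
def next_closure_concepts(objects, attributes, incidence):
    """Enumerate all formal concepts (extent, intent) with Ganter's NextClosure.

    objects:    list of object names (here: gene expression clusters)
    attributes: list of attribute names in a fixed order (here: annotations)
    incidence:  dict object -> set of its attributes
    Intents are generated in lectic order with respect to `attributes`.
    """
    all_attrs = set(attributes)

    def extent(attr_set):  # objects having every attribute in attr_set
        return {g for g in objects if attr_set <= incidence[g]}

    def intent_of(obj_set):  # attributes shared by every object in obj_set
        shared = set(all_attrs)
        for g in obj_set:
            shared &= incidence[g]
        return shared

    def closure(attr_set):
        return intent_of(extent(attr_set))

    concepts = []
    current = closure(set())
    while True:
        concepts.append((extent(current), current))
        if current == all_attrs:
            break  # the full attribute set is always the last closed set
        for i in range(len(attributes) - 1, -1, -1):
            m = attributes[i]
            if m in current:
                continue
            prefix = {a for a in attributes[:i] if a in current}
            candidate = closure(prefix | {m})
            # Accept only if the closure adds no attribute that precedes m.
            if {a for a in attributes[:i] if a in candidate} == prefix:
                current = candidate
                break
        else:
            break  # no successor found (only expected when current == all_attrs)
    return concepts


# Hypothetical clusters and their enriched annotations.
clusters = {
    "C1": {"apoptosis", "cell cycle"},
    "C2": {"cell cycle", "DNA repair"},
    "C3": {"apoptosis"},
}
annotations = ["apoptosis", "cell cycle", "DNA repair"]
for ext, attrs in next_closure_concepts(list(clusters), annotations, clusters):
    print(sorted(ext), sorted(attrs))
```

Each printed concept pairs a maximal group of clusters with exactly the annotations they all share, which is the kind of grouping that makes redundant annotation lists easier to read.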

Conclusion

- This presentation reviewed the major computational approaches that facilitate the biological interpretation of high-throughput microarray and RNA-Seq experiments.
