Outline
Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.
Workflow overview
Input: microarray / RNA-Seq data
DEG (Differentially Expressed Genes); co-expression / clustering; Gene Set-Wise Differential Expression Analysis; Differential Co-Expression Analysis
Gene of interest, gene list, gene pair or gene list pair
FAA (Functional Annotation Analysis): Gene Ontology (GO) or pathway analysis
Gene list with annotations
Visualization, semantic assembly and knowledge learning: concept lattice analysis (BioLattice)
Glossary
- FAA: Functional Annotation Analysis
- GO: Gene Ontology
- Pathway
- DEG: Differentially Expressed Genes
- GSEA: Gene Set Enrichment Analysis
- Biological Interpretation and Biological Semantics
- Concept lattice analysis
Pathway and Ontology-Based Analysis
GO and biological pathway-based analysis is one of the most powerful methods for inferring the biological meaning of expression changes.
It starts from a list of genes obtained by:
- differential expression analysis
- co-expression analysis (or clustering)
Attributes that can be used for FAA:
- transcription factor binding
- clinical phenotypes such as disease associations
- MeSH (Medical Subject Headings) terms
- microRNA binding sites
- protein family memberships
- chromosomal bands, etc.
- GO terms
- biological pathways
Features may have their own ontological structures.
GO is structured as a DAG (Directed Acyclic Graph).
DEGs: three techniques that help obtain DEGs:
- t-test
- Wilcoxon's rank sum test
- ANOVA
Note that the multiple-hypothesis-testing problem must be properly managed.
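The DEG step above can be sketched as follows. This is only a minimal illustration: the slides name the t-test and the need to handle multiple testing, but the choice of Welch's variant, the Benjamini-Hochberg correction, the 0.05 cutoff, the function names and the simulated data are all assumptions made for this example.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def find_degs(expr_a, expr_b, alpha=0.05):
    """expr_a, expr_b: genes x samples arrays for two conditions."""
    # Gene-wise Welch t-test (one test per row).
    _, pvals = stats.ttest_ind(expr_a, expr_b, axis=1, equal_var=False)
    qvals = benjamini_hochberg(pvals)
    return np.flatnonzero(qvals < alpha), qvals


# Simulated data (hypothetical): 200 genes, 8 samples per condition,
# with the first 10 genes shifted in condition B.
rng = np.random.default_rng(0)
expr_a = rng.normal(0.0, 1.0, size=(200, 8))
expr_b = rng.normal(0.0, 1.0, size=(200, 8))
expr_b[:10] += 6.0
deg_idx, qvals = find_degs(expr_a, expr_b)
print(deg_idx)
```

For non-normal data, `scipy.stats.mannwhitneyu` (the Wilcoxon rank sum test) could replace the t-test call; for more than two conditions, `scipy.stats.f_oneway` performs a one-way ANOVA.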
Co-expression analysis
Groups genes with similar expression profiles together and keeps dissimilar ones apart.
Returns genes that are assumed to be co-regulated.
Clustering algorithms:
- hierarchical-tree clustering
- partitional clustering
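A minimal sketch of hierarchical clustering of expression profiles is given below. The correlation distance, average linkage, the function name `cluster_genes` and the toy profiles are assumptions chosen for illustration; the slides do not prescribe a particular setting.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_genes(expr, n_clusters):
    """Cluster genes (rows of expr) into at most n_clusters groups."""
    # Correlation distance = 1 - Pearson correlation between profiles.
    dist = pdist(expr, metric="correlation")
    tree = linkage(dist, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy profiles (hypothetical): three rising and three falling genes.
t = np.linspace(0.0, 1.0, 6)
expr = np.array([t, t**2, np.sqrt(t), -t, -(t**2), -np.sqrt(t)])
labels = cluster_genes(expr, n_clusters=2)
print(labels)
```

A partitional alternative such as k-means would assign genes to a fixed number of clusters directly instead of cutting a tree.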
Pathways are powerful resources for understanding shared biological processes.
E.g.: KEGG, MetaCyc and BioCarta (signaling pathways)
MetaCyc: an experimentally determined, non-redundant metabolic pathway database.
It is the largest such collection, containing over 1400 metabolic pathways.
Ontology / GO: provides a shared understanding of a certain domain of information
- controlled vocabularies
- DAG structures
The 3 vocabularies of GO:
- Molecular Function (MF)
- Cellular Component (CC)
- Biological Process (BP)
Other common ontologies and vocabularies:
- MIPS: integrated source of protein properties for a variety of complete genomes
- MeSH: clinical terms, including disease names
- OMIM (Online Mendelian Inheritance in Man)
- UMLS (Unified Medical Language System)
GO enrichment test:
For example, if 20% of the genes in a gene list are annotated with the GO term 'apoptosis' while only 1% of the genes in the whole human genome fall into this functional category, the term is over-represented (enriched) in the list.
Common statistical tests:
- chi-square test
- binomial test
- hypergeometric test
Hypergeometric test:
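As a hedged illustration of the hypergeometric enrichment test, the sketch below computes the probability of seeing at least k annotated genes in a list of n genes drawn from a background of N genes, of which K carry the annotation. The function name, parameter names and the example counts (a 20,000-gene background mirroring the 20% versus 1% apoptosis example) are assumptions for this example.

```python
from scipy.stats import hypergeom


def enrichment_pvalue(background_size, annotated_in_background,
                      list_size, annotated_in_list):
    """Upper-tail hypergeometric p-value: P(X >= annotated_in_list)."""
    # sf(k - 1) gives P(X > k - 1) = P(X >= k).
    return hypergeom.sf(annotated_in_list - 1, background_size,
                        annotated_in_background, list_size)


# Hypothetical counts: 1% of a 20,000-gene background is annotated,
# while 20 of the 100 genes in the list (20%) carry the annotation.
p_value = enrichment_pvalue(20000, 200, 100, 20)
print(p_value)
```

The first argument is the background size, so changing the background (annotated genes only, all known genes, or all genes on the array) changes the p-value directly.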
Pitfalls to avoid when using the hypergeometric test:
- Choice of background, which has a substantial impact on the result. Candidates:
  - all genes having at least one GO annotation
  - all genes ever known in genome databases
  - all genes on the microarray
- GO has a hierarchical tree (or graph) structure, while the hypergeometric test assumes independence of categories.
Common tools:
- DAVID
- ArrayXPath
- Pathway Miner
- EASE
- GOFish
- GOTree, etc.
Gene Set-Wise Differential Expression Analysis
Evaluates coordinated differential expression of gene groups.
Gene Set Enrichment Analysis (GSEA):
- the first method developed in this category
- evaluates, for each pre-defined gene set, whether it is significantly associated with the phenotypic classes
Difference between FAA and GSEA:
- FAA: finds over-represented GO terms in a gene list of interest
- GSEA: starts from pre-defined gene sets and tests how they change under different conditions
Advantages of gene set-wise differential expression analysis:
- It has successfully identified modest but coordinated changes in gene expression that might have been missed by conventional 'individual gene-wise' differential expression analysis (many tiny expression changes can collectively create a big change).
- Biological interpretation is straightforward because the gene sets are defined by biological knowledge.
The Enrichment Score (ES) is calculated by walking down the ranked gene list L, in which the N genes are ordered by their correlation with the phenotype. At each position i, it compares the fraction of genes in S ("hits") present up to i, weighted by their correlation, with the fraction of genes not in S ("misses") present up to i.
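The running-sum idea behind the ES can be sketched as below. This follows the commonly published GSEA formulation, in which the ES is the maximum deviation from zero of the weighted hit fraction minus the miss fraction. The function name, the weighting exponent `p` and the toy gene list are assumptions, and the permutation step that assesses significance is omitted.

```python
import numpy as np


def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Return (ES, running sum) for gene_set along the ranked list."""
    in_set = np.array([g in gene_set for g in ranked_genes])
    weights = np.abs(np.asarray(correlations, dtype=float)) ** p
    hit_steps = np.where(in_set, weights, 0.0)
    hit_total = hit_steps.sum()
    n_miss = int((~in_set).sum())
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    # Fraction of correlation-weighted hits seen up to each position.
    p_hit = np.cumsum(hit_steps) / hit_total
    # Fraction of misses seen up to each position.
    p_miss = np.cumsum(~in_set) / n_miss
    running = p_hit - p_miss
    return running[np.argmax(np.abs(running))], running


# Toy ranked list (hypothetical), ordered by correlation with the phenotype.
genes = ["a", "b", "c", "d", "e", "f"]
corr = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top, _ = enrichment_score(genes, corr, {"a", "b"})
es_bottom, _ = enrichment_score(genes, corr, {"e", "f"})
print(es_top, es_bottom)
```

A gene set concentrated at the top of the list gives a positive ES, and one concentrated at the bottom gives a negative ES.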
Typical gene sets:
- regulatory-motif sets
- function-related sets
- disease-related sets
Database: MSigDB, with 6769 gene sets classified into five different collections
There are also some interesting extensions.
Differential Co-Expression Analysis
Co-expression analysis: determines the degree of co-expression of a cluster of genes under a certain condition.
Differential co-expression analysis: determines the degree of co-expression difference of a gene pair or a gene cluster across different conditions.
3 major types:
(a) differential co-expression of gene cluster(s)
(b) gene pair-wise differential co-expression
(c) differential co-expression of paired gene sets
Type (a): identify gene cluster(s) that are differentially co-expressed between two conditions.
Let conditions and genes be denoted by J and I, respectively. The mean squared residual of the model is a measure of the co-expression of the genes.
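The mean squared residual can be sketched as below, assuming the usual additive model in which each entry is explained by its gene (row) mean, its condition (column) mean and the overall mean, as in biclustering. The equation on the original slide is not available here, so this form, the function name and the toy matrices are assumptions. A score near zero indicates tightly co-expressed genes; a cluster would count as differentially co-expressed when its score differs markedly between two conditions.

```python
import numpy as np


def mean_squared_residual(expr):
    """expr: genes (I) x conditions (J) sub-matrix of expression values."""
    a = np.asarray(expr, dtype=float)
    row_mean = a.mean(axis=1, keepdims=True)   # mean of each gene over J
    col_mean = a.mean(axis=0, keepdims=True)   # mean of each condition over I
    residual = a - row_mean - col_mean + a.mean()
    return float((residual ** 2).mean())


# Toy data (hypothetical): the same three genes under two conditions.
# Under condition 1 the genes move together; under condition 2 they do not.
cond1 = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
cond2 = [[1, 3, 2], [4, 2, 3], [5, 6, 4]]
print(mean_squared_residual(cond1), mean_squared_residual(cond2))
```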
Type (b): identify differentially co-expressed gene pairs.
Techniques:
- F-statistic
- a meta-analytic approach
Note that identification of differentially co-expressed gene clusters or gene pairs usually does not use pre-defined gene sets or pairs. The interpretation may therefore be improved by ontology and pathway-based annotation analysis.
Type (c): the dCoxS (differential co-expression of gene sets) algorithm identifies gene set pairs that are differentially co-expressed across different conditions.
Biological pathways can be used as pre-defined gene sets, and the differential co-expression of biological pathway pairs between conditions is then analyzed.
Type (c), continued.
To measure the expression similarity between paired gene sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between the sample-wise entropies.
IS can always be obtained, even when the two pathways contain different numbers of genes, because it uses only sample-wise distances.
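The idea of correlating sample-wise distances can be sketched as below. Note the substitution: this example uses plain Euclidean distances between samples in place of the entropy-based measure that dCoxS uses, so it illustrates only why the score does not depend on gene-set size. The function name and the random data are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist


def interaction_score(pathway_a, pathway_b):
    """pathway_a, pathway_b: genes x samples arrays over the same samples."""
    # One distance per sample pair, computed within each pathway.
    # Euclidean distance is an assumed stand-in for the dCoxS entropy measure.
    d_a = pdist(np.asarray(pathway_a, dtype=float).T)
    d_b = pdist(np.asarray(pathway_b, dtype=float).T)
    # Both vectors have one entry per sample pair, whatever the gene counts.
    return float(np.corrcoef(d_a, d_b)[0, 1])


# Random data (hypothetical): pathways with 4 and 7 genes over 6 samples.
rng = np.random.default_rng(1)
pathway_a = rng.normal(size=(4, 6))
pathway_b = rng.normal(size=(7, 6))
print(interaction_score(pathway_a, pathway_b))
```

Computing this score separately for each condition and comparing the two values is one way to look for differential co-expression of the pathway pair.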
Biological Interpretation and Biological Semantics
Biomedical semantics provides rich descriptions for biomedical domain knowledge.
Motivation for Biological Semantics: GO has limitations:
- The result of GO analysis is typically a long unordered list of annotations.
- Most analysis tools evaluate only one cluster at a time.
- It is time-consuming to read the massive annotation lists.
- The results are hard to assemble manually.
- Many annotations are redundant.
Introducing BioLattice:
- a mathematical framework based on concept lattice analysis
- organizes traditional clusters and associated annotations into a lattice of concepts
- provides a graphical summary
- considers gene expression clusters as objects and annotations as attributes
Thus, complex relations among clusters and annotations are clarified, ordered and visualized.
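To make the object/attribute view concrete, the sketch below enumerates the formal concepts of a tiny context in which clusters are objects and annotations are attributes. It is a brute-force enumeration for illustration only, not BioLattice itself and not an efficient lattice-construction algorithm; the function name, cluster names and annotations are hypothetical.

```python
from itertools import combinations


def formal_concepts(context):
    """context: dict mapping each object (cluster) to its set of attributes."""
    objects = sorted(context)
    all_attrs = set().union(*context.values())
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Intent: the attributes shared by every object in the subset.
            intent = set(all_attrs)
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


# Hypothetical clusters annotated with GO-like terms.
context = {
    "cluster1": {"apoptosis", "cell cycle"},
    "cluster2": {"apoptosis"},
    "cluster3": {"cell cycle", "DNA repair"},
}
concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one node of the lattice: a maximal group of clusters together with exactly the annotations they share. Ordering the nodes by extent inclusion yields the lattice structure.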
Another advantage of BioLattice is that heterogeneous biological knowledge resources can be added.
Tool to construct BioLattice: the Ganter algorithm
Conclusion
Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.