Pathway and network analysis: Functional interpretation of gene lists


Pathway and network analysis
Functional interpretation of gene lists
Bing Zhang
Department of Biomedical Informatics, Vanderbilt University
bing.zhang@vanderbilt.edu

Omics studies generate lists of interesting genes
- Data sources: Microarray, RNA-Seq, Proteomics
- Analyses: Differential expression, Clustering
- Result: lists of genes with potential biological interest (e.g. 92546_r_at, 92545_f_at, 96055_at, 102105_f_at, 102700_at, 161361_s_at, 92202_g_at, 103548_at, 100947_at, 101869_s_at, 102727_at, 160708_at, …...)
[Figure: plot with axes log2(ratio) and -log10(p value)]

Organizing genes based on pathways

Advantages of pathway analysis
- Better interpretation: from interesting genes to interesting biological themes
- Improved robustness: robust against noise in the data
- Improved sensitivity: detecting minor but concordant changes in a pathway

Pathway databases
Databases
- BioCarta (http://www.biocarta.com/genes/index.asp)
- KEGG (http://www.genome.jp/kegg/pathway.html)
- MetaCyc (http://metacyc.org)
- Pathway Commons (http://www.pathwaycommons.org)
- Reactome (http://www.reactome.org)
- STKE (http://stke.sciencemag.org/cm)
- Signaling Gateway (http://www.signaling-gateway.org)
- WikiPathways (http://www.wikipathways.org)
Limitations
- Limited coverage
- Inconsistency among different databases
- Relationships between pathways are not defined

Gene Ontology
- Structured, precisely defined, controlled vocabulary for describing the roles of genes and gene products
- Three organizing principles: molecular function, biological process, and cellular component
- Example: dopamine receptor D2, the product of human gene DRD2
    molecular function: dopamine receptor activity
    biological process: synaptic transmission
    cellular component: plasma membrane
- Terms in GO are linked by several types of relationships
    is_a (e.g. plasma membrane is_a membrane)
    part_of (e.g. membrane part_of cell)
    has_part
    regulates
    occurs_in
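
Because GO terms are linked by relationships such as is_a and part_of, a gene annotated to a specific term is implicitly associated with that term's ancestors as well. The sketch below illustrates this with a toy fragment built only from the two examples on the slide. The dictionary layout and the function name `ancestors` are assumptions made for illustration, not part of any GO tool.

```python
from collections import deque

# Toy ontology fragment using only the slide's two examples.
# Each term maps to a list of (relationship, parent term) pairs.
GO_EDGES = {
    "plasma membrane": [("is_a", "membrane")],
    "membrane": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term, edges=GO_EDGES, relations=("is_a", "part_of")):
    """Return every term reachable from `term` by following the given relations."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# A gene annotated to "plasma membrane" is also associated with these terms.
print(sorted(ancestors("plasma membrane")))  # ['cell', 'membrane']
```

Restricting `relations` to `("is_a",)` stops the walk at "membrane", which shows why the relationship type matters when annotations are propagated.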

Gene Ontology

Annotating genes using GO terms
- Two types of GO annotations
    Electronic annotation
    Manual annotation
- All annotations must:
    be attributed to a source
    indicate what evidence was found to support the GO term-gene/protein association
- Types of evidence codes
    Experimental codes: IDA, IMP, IGI, IPI, IEP
    Computational codes: ISS, IEA, RCA, IGC
    Author statement: TAS, NAS
    Other codes: IC, ND
- Evidence code definitions
    IDA: inferred from direct assay
    IMP: inferred from mutant phenotype
    IGI: inferred from genetic interaction
    IPI: inferred from physical interaction
    IEP: inferred from expression pattern
    ISS: inferred from sequence or structural similarity
    IEA: inferred from electronic annotation
    RCA: inferred from reviewed computational analysis
    IGC: inferred from genomic context
    TAS: traceable author statement
    NAS: non-traceable author statement
    IC: inferred by curator
    ND: no biological data available
- ND is used when the curator has determined that there is no existing literature to support an annotation
    This is NOT the same as having no annotation at all
    No annotation means that no one has looked yet
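
Evidence codes are often used to restrict an analysis to better-supported annotations, for example experimental codes only. The sketch below groups the codes as on the slide and filters a small annotation list. The gene names, references, and function names are hypothetical placeholders, not real annotation records.

```python
# Evidence code groups exactly as listed on the slide.
EVIDENCE_GROUPS = {
    "experimental": {"IDA", "IMP", "IGI", "IPI", "IEP"},
    "computational": {"ISS", "IEA", "RCA", "IGC"},
    "author_statement": {"TAS", "NAS"},
    "other": {"IC", "ND"},
}

# Hypothetical annotations: (gene, GO term, evidence code, source).
# Every annotation carries a source and an evidence code, as the slide requires.
ANNOTATIONS = [
    ("GENE_A", "synaptic transmission", "IDA", "REF:1"),
    ("GENE_B", "synaptic transmission", "IEA", "REF:2"),
    ("GENE_C", "plasma membrane", "TAS", "REF:3"),
    ("GENE_D", "dopamine receptor activity", "ISS", "REF:4"),
]


def code_group(code):
    """Return the name of the group an evidence code belongs to."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise ValueError(f"unknown evidence code: {code}")


def filter_annotations(annotations, groups):
    """Keep only annotations whose evidence code falls in one of `groups`."""
    wanted = set(groups)
    return [a for a in annotations if code_group(a[2]) in wanted]


experimental_only = filter_annotations(ANNOTATIONS, ["experimental"])
print([gene for gene, _, _, _ in experimental_only])  # ['GENE_A']
```

Stricter filtering trades coverage for confidence, since many annotations rest on computational evidence.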

Annotating genes using GO terms
- Parent: Cell-cell signaling
- Child: Synaptic transmission (226 human genes)
    ……
    DLGAP1   discs, large (Drosophila) homolog-associated protein 1
    DLGAP2   discs, large (Drosophila) homolog-associated protein 2
    DNM1     dynamin 1
    DOC2A    double C2-like domains, alpha
    DRD1     dopamine receptor D1
    DRD1IP   dopamine receptor D1 interacting protein
    DRD2     dopamine receptor D2
    DRD3     dopamine receptor D3
    DRD4     dopamine receptor D4
    DRD5     dopamine receptor D5

Access GO
- Downloads (http://www.geneontology.org)
    Ontologies: http://www.geneontology.org/page/download-ontology
    Annotations: http://www.geneontology.org/page/download-annotations
- Web-based access
    AmiGO: http://www.godatabase.org
    QuickGO: http://www.ebi.ac.uk/QuickGO

Coverage of GO annotations
          Homo sapiens       Mus musculus
          #term   #gene      #term   #gene
GO/BP     6502    15228      6227    15709
GO/MF     3144    16389      2961    17287
GO/CC      947    16765       882    16801

Over-representation analysis: concept
- Development (1842 genes)
- Differentially expressed genes (581 genes)
- Observe: overlap of 98 genes
- Expect: overlap of 65 genes
- Compare: is the observed overlap significantly larger than the expected value?
- Example genes shown in the figure: Hoxa5, Hoxa11, Ltbp3, Sox4, Foxc1, Edn1, Ror2, Gnag, Smad3, Wdr5, Trp63, Sox9, Pax1, Acd, Rai1, Pitx1, ……; Sash1, Cd24a, Agt, Psrc1, Ctla2b, Angptl4, Depdc7, Sorbs1, Macrod1, Enpp2, Tmem176a, ……

Over-representation analysis: method
                          Significant genes   Non-significant genes   Total
  Genes in the group      k                   j-k                     j
  Other genes             n-k                 m-n-j+k                 m-j
  Total                   n                   m-n                     m
- Hypergeometric test: given a total of m genes where j genes are in the functional group, if we pick n genes randomly, what is the probability of having k or more genes from the group?
- Zhang et al. Nucleic Acids Res. 33:W741, 2005
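
The hypergeometric question on this slide can be written out directly, using the slide's symbols m, j, n and k. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the background size of 16465 genes is an assumption that does not appear in the slides: it was chosen so that the expected overlap comes out close to the 65 shown on the concept slide. Reading 98 as the observed overlap is likewise an interpretation of that figure.

```python
from math import comb


def expected_overlap(m, j, n):
    """Expected number of group genes among n genes drawn at random from m."""
    return n * j / m


def hypergeom_upper_tail(m, j, n, k):
    """P(X >= k): probability of k or more group genes in a random pick of n.

    m: total genes, j: genes in the functional group,
    n: genes picked (significant genes), k: observed overlap.
    """
    upper = min(j, n)
    favourable = sum(
        comb(j, i) * comb(m - j, n - i) for i in range(max(k, 0), upper + 1)
    )
    return favourable / comb(m, n)


# Small check by hand: m=10, j=4, n=3, k=2
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = (36 + 4) / 120 = 1/3
print(hypergeom_upper_tail(10, 4, 3, 2))

# Numbers from the concept slide, with an ASSUMED background size.
M_ASSUMED = 16465          # hypothetical, not from the slides
J, N, K = 1842, 581, 98    # group size, significant genes, observed overlap
print(round(expected_overlap(M_ASSUMED, J, N), 1))  # about 65
print(hypergeom_upper_tail(M_ASSUMED, J, N, K))
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for large gene counts. The result depends strongly on the choice of background m, so the background gene set should be stated alongside any reported p-value.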

Over-representation analysis: limitations
- Arbitrary thresholding
- Ignoring the order of genes in the significant gene list

Gene Set Enrichment Analysis: concept
- Do genes in a gene set tend to be located at the top or bottom of the ranked gene list?

Gene Set Enrichment Analysis: method
- Running-sum steps along the ranked gene list: +1/k for a gene in the gene set S, -1/(n-k) for a gene not in S
    k: number of genes in the gene set S
    n: number of all genes in the ranked gene list
- Subramanian et al. PNAS 102:15545, 2005
- http://www.broad.mit.edu/gsea/
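
The +1/k and -1/(n-k) increments on this slide define a running sum over the ranked gene list. The sketch below implements only that unweighted running sum and takes its largest deviation from zero as the enrichment score. It assumes every gene in the set appears in the ranked list, the gene names are placeholders, and the weighting and permutation-based significance testing of the published method are not reproduced here.

```python
def running_sum(ranked_genes, gene_set):
    """Running-sum profile: +1/k at a gene in the set, -1/(n-k) otherwise."""
    n = len(ranked_genes)
    hits = [gene in gene_set for gene in ranked_genes]
    k = sum(hits)  # genes of the set that are present in the ranked list
    if k == 0 or k == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    total = 0.0
    profile = []
    for hit in hits:
        total += 1.0 / k if hit else -1.0 / (n - k)
        profile.append(total)
    return profile


def enrichment_score(ranked_genes, gene_set):
    """Maximum deviation of the running sum from zero (sign is kept)."""
    return max(running_sum(ranked_genes, gene_set), key=abs)


ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]   # placeholder gene names
print(enrichment_score(ranked, {"g1", "g2"}))   # set at the top:    1.0
print(enrichment_score(ranked, {"g5", "g6"}))   # set at the bottom: -1.0
```

Because the increments sum to zero over the whole list, the profile always returns to zero at the end. A set concentrated at the top gives a positive score and one concentrated at the bottom gives a negative score.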

Pathway-based analysis
- Organizing genes by
    Pathways
    Gene Ontology
- Enrichment analysis methods
    Over-representation analysis
    Gene set enrichment analysis
- Major limitation
    Existing knowledge on pathways or gene functions is far from complete

Biological networks
Physical interaction networks
- Protein-protein interaction network. Nodes: proteins. Edges: physical interaction, undirected
- Signaling network. Edges: modification, directed
- Gene regulatory network. Nodes: TFs/miRNAs, target genes. Edges: physical interaction
- Metabolic network. Nodes: metabolites. Edges: metabolic reaction
Functional association networks
- Co-expression network. Nodes: genes/proteins. Edges: co-expression, undirected
- Genetic network. Nodes: genes. Edges: genetic interaction

Properties of complex networks
- Human protein-protein interaction network: 9,198 proteins and 36,707 interactions
- Scale-free (hubs)
- Hierarchical modular
- Small world (six degrees of separation)
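
Hubs can be spotted by counting how many interactions each protein has (its degree). The sketch below computes degrees and the degree distribution for a tiny, made-up undirected network. The protein names P1 to P6, the edges, and the function names are hypothetical, and six nodes are far too few to test for scale-free behavior. The code only illustrates the bookkeeping.

```python
from collections import Counter, defaultdict

# Hypothetical undirected interaction network (not real data).
EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"), ("P5", "P6")]


def degrees(edges):
    """Number of interaction partners per node in an undirected network."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def degree_distribution(edges):
    """How many nodes have each degree value."""
    return dict(Counter(degrees(edges).values()))


def hubs(edges, top=1):
    """The `top` nodes with the highest degree (ties broken by name)."""
    degree = degrees(edges)
    return sorted(degree, key=lambda node: (-degree[node], node))[:top]


print(degrees(EDGES)["P1"])        # 4
print(degree_distribution(EDGES))  # {4: 1, 1: 4, 2: 1}
print(hubs(EDGES))                 # ['P1']
```

In a scale-free network most nodes have low degree and a few hubs have very high degree, so the distribution has a long tail.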

Network visualization
- Network visualization tools
    Cytoscape (http://www.cytoscape.org)
- Gehlenborg et al. Nature Methods, 7:S56, 2010

Network distance vs functional similarity
- Proteins that lie closer to one another in a protein interaction network are more likely to have similar functions and to be involved in similar biological processes
- Network-based gene function prediction
- Network-based disease gene prediction
- Sharan et al. Mol Syst Biol, 3:88, 2007
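
One simple way to act on this observation is to measure network distance with a breadth-first search and to predict an unannotated protein's function from its direct neighbors. The sketch below uses a majority vote among annotated neighbors, which is only one of many possible network-based prediction schemes and is not claimed to be the method of the cited review. The network, protein names, and function labels are hypothetical.

```python
from collections import Counter, deque

# Hypothetical undirected interaction network and known annotations.
EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
KNOWN = {"P2": "function X", "P3": "function X", "P4": "function Y"}


def build_adjacency(edges):
    """Adjacency sets for an undirected network."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def network_distance(adjacency, source, target):
    """Shortest path length in edges, or None if the nodes are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def predict_function(adjacency, known, protein):
    """Majority vote over the annotated direct neighbours of `protein`."""
    votes = Counter(
        known[nb] for nb in sorted(adjacency.get(protein, ())) if nb in known
    )
    return votes.most_common(1)[0][0] if votes else None


adjacency = build_adjacency(EDGES)
print(network_distance(adjacency, "P2", "P5"))  # 3
print(predict_function(adjacency, KNOWN, "P1"))  # function X
```

The same idea carries over to disease gene prediction by voting on disease associations instead of function labels.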

Organizing genes based on network modules
- Protein-protein interaction modules
- Transcriptional regulatory modules
    Transcription factor targets
    miRNA targets
- Network module-based analysis
    Over-representation analysis
    GSEA

WebGestalt: http://www.webgestalt.org
- Gene list (e.g. 92546_r_at, 92545_f_at, 96055_at, 102105_f_at, 102700_at, ……)
- ~200 ID types
- ~60K gene sets
- Statistical analysis
- Jul. 1, 2013 – Jun. 30, 2014: 49,136 visits from 18,213 visitors
- Zhang et al. Nucleic Acids Res. 33:W741, 2005
- Wang et al. Nucleic Acids Res. 41:W77, 2013

WebGestalt output: enriched GO terms
- Example: Response to unfolded proteins, 12 genes, adjp=1.32e-08

WebGestalt output: an enriched pathway
[Figure: TGF Beta Signaling Pathway; label: Input genes]

WebGestalt output: enriched network modules

GSEA: http://www.broadinstitute.org/gsea

GSEA: output

Summary
- Organizing genes by “gene sets”
    Pathways
    Gene Ontology
    Network modules
- Enrichment analysis methods
    Over-representation analysis
    Gene set enrichment analysis
- Tools
    WebGestalt (over-representation analysis)
    GSEA (gene set enrichment analysis)
- Manuals for WebGestalt and GSEA are in the reading folder