Pathway and network analysis Functional interpretation of gene lists

1 Pathway and network analysis Functional interpretation of gene lists
Bing Zhang Department of Biomedical Informatics Vanderbilt University

2 Omics studies generate lists of interesting genes
[Figure: microarray, RNA-Seq, and proteomics data are analyzed by differential expression (plot axes: log2(ratio) vs. -log10(p value)) and by clustering, producing lists of genes with potential biological interest, e.g. the probe set IDs 92546_r_at, 92545_f_at, 96055_at, 102105_f_at, 102700_at, 161361_s_at, 92202_g_at, 103548_at, 100947_at, 101869_s_at, 102727_at, 160708_at, ……]

3 Organizing genes based on pathways

4 Advantages of pathway analysis
Better interpretation: from interesting genes to interesting biological themes
Improved robustness: robust against noise in the data
Improved sensitivity: detects minor but concordant changes in a pathway

5 Pathway databases
Databases: BioCarta, KEGG, MetaCyc, Pathway Commons, Reactome, STKE, Signaling Gateway, WikiPathways
Limitations: limited coverage; inconsistency among different databases; relationships between pathways are not defined

6 Gene Ontology
Structured, precisely defined, controlled vocabulary for describing the roles of genes and gene products
Three organizing principles: molecular function, biological process, and cellular component
Example: dopamine receptor D2, the product of human gene DRD2
  molecular function: dopamine receptor activity
  biological process: synaptic transmission
  cellular component: plasma membrane
Terms in GO are linked by several types of relationships:
  is_a (e.g. plasma membrane is_a membrane)
  part_of (e.g. membrane is part_of cell)
  has_part
  regulates
  occurs_in
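The relationships between GO terms can be pictured as links from a term to its parents. The sketch below is illustrative only: the tiny `PARENTS` table holds just the two examples from the slide, and the function name `ancestors` is hypothetical rather than part of any GO tool. It collects every term reachable by following is_a and part_of links upward.

```python
from collections import deque

# Toy ontology fragment built only from the two examples on the slide.
# Each term maps to a list of (relationship, parent term) pairs.
PARENTS = {
    "plasma membrane": [("is_a", "membrane")],
    "membrane": [("part_of", "cell")],
    "cell": [],
}

def ancestors(term, parents=PARENTS):
    """Return the set of terms reachable from `term` by following parent links."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(ancestors("plasma membrane")))  # ['cell', 'membrane']
```

A real ontology has many more terms and relationship types; which relationships are safe to follow when grouping genes is a design decision this sketch does not address.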

7 Gene Ontology

8 Annotating genes using GO terms
Two types of GO annotations: electronic annotation and manual annotation
All annotations must:
  be attributed to a source
  indicate what evidence was found to support the GO term-gene/protein association
Types of evidence codes:
  Experimental codes: IDA, IMP, IGI, IPI, IEP
  Computational codes: ISS, IEA, RCA, IGC
  Author statement: TAS, NAS
  Other codes: IC, ND
Evidence code definitions:
  IDA: inferred from direct assay
  IMP: inferred from mutant phenotype
  IGI: inferred from genetic interaction
  IPI: inferred from physical interaction
  IEP: inferred from expression pattern
  ISS: inferred from sequence or structural similarity
  IEA: inferred from electronic annotation
  RCA: inferred from reviewed computational analysis
  IGC: inferred from genomic context
  TAS: traceable author statement
  NAS: non-traceable author statement
  IC: inferred by curator
  ND: no biological data available
ND is used when the curator has determined that there is no existing literature to support an annotation. This is NOT the same as having no annotation at all: no annotation means that no one has looked yet.

9 Annotating genes using GO terms
Parent: Cell-cell signaling
Child: Synaptic transmission (226 human genes), including:
  ……
  DLGAP1   discs, large (Drosophila) homolog-associated protein 1
  DLGAP2   discs, large (Drosophila) homolog-associated protein 2
  DNM1     dynamin 1
  DOC2A    double C2-like domains, alpha
  DRD1     dopamine receptor D1
  DRD1IP   dopamine receptor D1 interacting protein
  DRD2     dopamine receptor D2
  DRD3     dopamine receptor D3
  DRD4     dopamine receptor D4
  DRD5     dopamine receptor D5

10 Access GO
Downloads (http://www.geneontology.org): ontologies, annotations
Web-based access: AmiGO, QuickGO

11 Coverage of GO annotations
         Homo sapiens        Mus musculus
         #term    #gene      #term    #gene
GO/BP    6502     15228      6227     15709
GO/MF    3144     16389      2961     17287
GO/CC    947      16765      882      16801

12 Over-representation analysis: concept
[Figure: overlap between Development (1842 genes) and differentially expressed genes (581 genes). Observed overlap: 98 genes; expected overlap: 65 genes. Gene names shown: Hoxa5, Hoxa11, Ltbp3, Sox4, Foxc1, Edn1, Ror2, Gnag, Smad3, Wdr5, Trp63, Sox9, Pax1, Acd, Rai1, Pitx1, ……; Sash1, Cd24a, Agt, Psrc1, Ctla2b, Angptl4, Depdc7, Sorbs1, Macrod1, Enpp2, Tmem176a, ……]
Is the observed overlap significantly larger than the expected value?

13 Over-representation analysis: method
                      Significant genes   Non-significant genes   Total
Genes in the group    k                   j-k                     j
Other genes           n-k                 m-n-j+k                 m-j
Total                 n                   m-n                     m

Hypergeometric test: given a total of m genes, of which j are in the functional group, if we pick n genes at random, what is the probability of having k or more genes from the group?
[Figure: observed overlap k between the n picked genes and the j genes in the group, out of m genes in total]
Zhang et al. Nucleic Acids Res. 33:W741, 2005
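The hypergeometric question above can be written out directly with binomial coefficients. The following is a minimal sketch using only the Python standard library and the slide's symbols (k, n, j, m). The function names and the toy numbers are illustrative assumptions, not taken from WebGestalt or from the slides.

```python
from math import comb

def expected_overlap(n, j, m):
    """Expected number of group genes among n genes picked at random."""
    return n * j / m

def hypergeom_upper_tail(k, n, j, m):
    """P(X >= k), where X is the number of group genes among n genes drawn
    without replacement from m genes, j of which belong to the group."""
    upper = min(n, j)  # X cannot exceed the draw size or the group size
    favourable = sum(comb(j, i) * comb(m - j, n - i) for i in range(k, upper + 1))
    return favourable / comb(m, n)

# Toy numbers (hypothetical): 20 genes in total, 5 in the group,
# 4 genes picked, 2 of them from the group.
print(expected_overlap(n=4, j=5, m=20))            # 1.0
print(hypergeom_upper_tail(k=2, n=4, j=5, m=20))   # probability of 2 or more
```

A small p-value from `hypergeom_upper_tail` indicates that the observed overlap is larger than would be expected from a random pick. When many gene sets are tested, the resulting p-values would normally also be adjusted for multiple testing, which this sketch does not do.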

14 Over-representation analysis: limitations
Arbitrary thresholding
Ignoring the order of genes in the significant gene list

15 Gene Set Enrichment Analysis: concept
Do genes in a gene set tend to be located at the top or bottom of the ranked gene list?

16 Gene Set Enrichment Analysis: method
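Running sum over the ranked gene list: add +1/k for each gene in the gene set S, and -1/(n-k) for each gene not in S
k: number of genes in the gene set S
n: number of all genes in the ranked gene list
Subramanian et al. PNAS 102:15545, 2005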
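The running sum with increments +1/k and -1/(n-k) can be sketched in a few lines. This is an unweighted illustration under stated assumptions: gene names are unique in the ranked list, the function name `enrichment_score` is hypothetical, and the weighting by correlation and the permutation-based significance testing of the published GSEA method are left out.

```python
def enrichment_score(ranked_genes, gene_set):
    """Walk down the ranked list, adding 1/k at genes in the set and
    subtracting 1/(n-k) otherwise; return the running-sum value that
    deviates most from zero."""
    members = set(gene_set) & set(ranked_genes)
    n = len(ranked_genes)
    k = len(members)
    if k == 0 or k == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit = 1.0 / k          # step up for a gene in the set
    miss = 1.0 / (n - k)   # step down for a gene outside the set
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += hit if gene in members else -miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranked list with hypothetical gene names.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))  # 1.0  (set concentrated at the top)
print(enrichment_score(ranked, {"g5", "g6"}))  # -1.0 (set concentrated at the bottom)
```

A positive score indicates that the set's genes cluster toward the top of the ranking, a negative score toward the bottom. The running sum always returns to zero at the end of the list.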

17 Pathway-based analysis
Organizing genes by: pathways, Gene Ontology
Enrichment analysis methods: over-representation analysis, gene set enrichment analysis
Major limitation: existing knowledge on pathways or gene functions is far from complete

18 Biological networks

Networks                               Nodes                      Edges
Physical interaction networks
  Protein-protein interaction network  Proteins                   Physical interaction, undirected
  Signaling network                                               Modification, directed
  Gene regulatory network              TFs/miRNAs, target genes   Physical interaction,
  Metabolic network                    Metabolites                Metabolic reaction,
Functional association networks
  Co-expression network                Genes/proteins             Co-expression, undirected
  Genetic network                      Genes                      Genetic interaction,

19 Properties of complex networks
Human protein-protein interaction network: 9,198 proteins and 36,707 interactions
Scale-free (hubs)
Hierarchical modular
Small world (six degrees of separation)

20 Network visualization
Network visualization tools: Cytoscape
Gehlenborg et al. Nature Methods, 7:S56, 2010

21 Network distance vs functional similarity
Proteins that lie closer to one another in a protein interaction network are more likely to have similar functions and to be involved in similar biological processes.
Applications: network-based gene function prediction, network-based disease gene prediction
Sharan et al. Mol Syst Biol, 3:88, 2007
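One simple way to make "closer in the network" concrete is the shortest-path length between two proteins in an undirected interaction network. The sketch below uses breadth-first search on a toy edge list. The protein names P1 to P6 and the function name `shortest_path_length` are hypothetical, and other distance measures are possible.

```python
from collections import deque

def shortest_path_length(edges, source, target):
    """Number of interactions on the shortest path between two proteins in an
    undirected network, or None if either is absent or they are not connected."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if source not in adjacency or target not in adjacency:
        return None
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return None

# Toy interaction list with hypothetical protein names.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
print(shortest_path_length(toy_edges, "P1", "P2"))  # 1
print(shortest_path_length(toy_edges, "P1", "P4"))  # 3
print(shortest_path_length(toy_edges, "P1", "P5"))  # None
```

In this picture, P2 is a closer neighbor of P1 than P4 is, so a distance-based prediction scheme would give P2's annotations more weight when guessing the function of P1.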

22 Organizing genes based on network modules
Protein-protein interaction modules
Transcriptional regulatory modules: transcription factor targets, miRNA targets
Network module-based analysis: over-representation analysis, GSEA

23 WebGestalt: http://www.webgestalt.org
Example input (probe set IDs): 92546_r_at, 92545_f_at, 96055_at, 102105_f_at, 102700_at, ……
~200 ID types; statistical analysis; ~60K gene sets
Usage, Jul. 1, 2013 to Jun. 30, 2014: 49,136 visits from 18,213 visitors
Zhang et al. Nucleic Acids Res. 33:W741, 2005
Wang et al. Nucleic Acids Res. 41:W77, 2013

24 WebGestalt output: Enriched GO terms
Example enriched term: Response to unfolded proteins (12 genes, adjp=1.32e-08)

25 WebGestalt output: an enriched pathway
[Figure: input genes highlighted on the TGF Beta Signaling Pathway]

26 WebGestalt output: enriched network modules

27 GSEA: http://www.broadinstitute.org/gsea

28 GSEA: output

29 Summary
Organizing genes by “gene sets”: pathways, Gene Ontology, network modules
Enrichment analysis methods: over-representation analysis, gene set enrichment analysis
Tools: WebGestalt (over-representation analysis), GSEA (gene set enrichment analysis)
Manuals for WebGestalt and GSEA are in the reading folder

