Published by Beverly Simpson. Modified over 8 years ago.
1
Canadian Bioinformatics Workshops www.bioinformatics.ca
3
Module 3 Pathway and Network Analysis
4
Module 3 bioinformatics.ca Classes of Gene Set Analysis (Khatri et al. PLOS Comp Bio. 8:1 2012)
[Figure: tools named on the slide: DAVID, GSEA, Reactome FI network, PARADIGM.]
5
Limitations of Gene Set Enrichment Analysis
Many possible gene sets: diseases, molecular function, biological process, cellular compartment, pathways...
Gene sets are heavily overlapping; need to sort through lists of enriched gene sets!
“Bags of genes” obscure regulatory relationships among them.
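One way to see how heavily gene sets overlap is to compute a pairwise Jaccard index. The sketch below is illustrative only: the gene sets, the function names, and the 0.5 threshold are assumptions made for the example, not part of any tool named in these slides.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(gene_sets, threshold=0.5):
    """Return (name1, name2, score) for every pair of sets at or above threshold."""
    pairs = []
    for (n1, g1), (n2, g2) in combinations(sorted(gene_sets.items()), 2):
        score = jaccard(g1, g2)
        if score >= threshold:
            pairs.append((n1, n2, score))
    # Most redundant pairs first.
    return sorted(pairs, key=lambda p: -p[2])


# Toy gene sets (hypothetical membership, for illustration only).
toy_sets = {
    "cell cycle": {"CDK1", "CCNB1", "AURKB", "BUB1"},
    "mitosis": {"CDK1", "CCNB1", "AURKB", "PLK1"},
    "apoptosis": {"TP53", "BAX", "CASP3"},
}

redundant = overlapping_pairs(toy_sets)
print(redundant)
```

Pairs flagged this way could be merged or reported together when reading a long list of enriched gene sets.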
6
Pathway Databases
Advantages:
– Usually curated.
– Biochemical view of biological processes.
– Cause and effect captured.
– Human-interpretable visualizations.
Disadvantages:
– Sparse coverage of genome.
– Different databases disagree on boundaries of pathways.
7
KEGG
8
Reactome
Hand-curated pathways in human.
Rigorous curation standards: every reaction traceable to primary literature.
Automatically projected pathways to non-human species.
22 species; 1112 human pathways; 5078 proteins.
Features:
– Google-map style reaction diagrams with overlays;
– Find pathways containing your gene list;
– Calculate gene overrepresentation in pathways;
– Find corresponding pathways in other species.
Open access.
9
Reactome
10
Pathway Commons
11
Pathway Colorization
Main feature offered by all pathway databases.
Upload a gene list.
The database calculates an enrichment score for each pathway and displays a ranked list.
Browse into pathways of interest; download colorized pictures.
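The slides do not say which statistic each database uses for its enrichment score. A common choice for overrepresentation analysis is the one-sided hypergeometric test, sketched below under that assumption. The gene universe, pathway names, and function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(universe_size, pathway_size, list_size, overlap):
    """P(X >= overlap) when drawing list_size genes without replacement."""
    max_k = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


def rank_pathways(gene_list, pathways, universe):
    """Return (pathway, overlap, p-value) tuples, most enriched first."""
    universe = set(universe)
    genes = set(gene_list) & universe  # ignore genes outside the universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(genes & members)
        p = hypergeom_pvalue(len(universe), len(members), len(genes), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Toy data: 20 placeholder genes and two made-up pathways.
universe = [f"G{i}" for i in range(1, 21)]
pathways = {
    "A": {f"G{i}" for i in range(1, 6)},
    "B": {f"G{i}" for i in range(6, 16)},
}
gene_list = ["G1", "G2", "G3", "G4", "G6"]

ranked = rank_pathways(gene_list, pathways, universe)
for name, overlap, p in ranked:
    print(name, overlap, round(p, 4))
```

In practice the p-values would also need multiple-testing correction, since hundreds of pathways are tested at once.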
12
Example from Reactome
13
Example from Reactome
15
Networks
Pathways capture only the “well understood” portion of biology.
Networks cover less well understood relationships:
– Genetic interactions
– Physical interactions
– Coexpression
– GO term sharing
– Adjacency in pathways
21
Network Databases
Can be built automatically or via curation.
Popular sources of curated networks:
– BioGRID – Curated interactions from literature; 529,000 genes, 167,000 interactions.
– IntAct – Curated interactions from literature; 60,000 genes, 203,000 interactions.
– MINT – Curated interactions from literature; 31,000 genes, 83,000 interactions.
22
Uncurated Interaction Sources
Text mining approaches
– Computationally extract gene relationships from text, such as PubMed abstracts.
– Much faster than hand curation.
– Not perfect:
  Problems recognizing gene names. Is hedgehog a gene or a species?
  Natural language processing is difficult.
– Popular resources: iHOP, PubGene
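A minimal sketch of the simplest text-mining strategy, sentence-level co-occurrence of gene names, shows both the idea and the name-recognition problem. This is not how iHOP or PubGene work internally; the sentences, the gene-name list, and the function name are made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurring_gene_pairs(sentences, gene_names):
    """Count how often two known gene names appear in the same sentence."""
    lookup = {name.lower(): name for name in gene_names}
    counts = Counter()
    for sentence in sentences:
        # Naive tokenization: lowercase alphanumeric runs, no disambiguation.
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence.lower()))
        found = sorted(lookup[t] for t in tokens if t in lookup)
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


# Toy sentences (invented). The second uses "hedgehog" as an animal.
sentences = [
    "Hedgehog signals through Smoothened to activate GLI1.",
    "The hedgehog is a small spiny mammal; p53 was not studied.",
]
gene_names = ["hedgehog", "Smoothened", "GLI1", "p53"]

pair_counts = cooccurring_gene_pairs(sentences, gene_names)
for pair, n in sorted(pair_counts.items()):
    print(pair, n)
```

The second sentence yields a spurious hedgehog/p53 pair, which is exactly the gene-versus-species ambiguity the slide warns about.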
23
Uncurated Interaction Sources
Experimental techniques
– Yeast two-hybrid (Y2H) protein interactions.
– Protein complex pulldowns/mass spec.
– Genetic screens, such as synthetic lethals and enhancer/suppressor screens.
– Not perfect:
  Y2H takes proteins out of their natural context; physical interaction != biological interaction.
  Protein complex pulldowns are plagued by “sticky” proteins such as actin.
  Genetic screens are highly sensitive to genetic background (“network effects”).
24
Integrative Approaches
Combine multiple sources of evidence to increase accuracy.
Simple example:
– “Party hubs” are Y2H interactions that have been filtered for those partners that share the same temporal-spatial location.
Complex example:
– Combine multiple sources of curated and uncurated evidence.
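The simple example above amounts to a filter: keep a Y2H pair only when both partners share at least one place-and-time annotation. A minimal sketch follows, with hypothetical protein IDs and a made-up "compartment:phase" annotation format.

```python
def filter_by_shared_context(interactions, context):
    """Keep pairs whose partners share at least one location/time annotation."""
    kept = []
    for a, b in interactions:
        shared = context.get(a, set()) & context.get(b, set())
        if shared:
            kept.append((a, b, sorted(shared)))
    return kept


# Toy Y2H pairs and annotations (placeholders, not real data).
y2h_pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
context = {
    "P1": {"nucleus:G1"},
    "P2": {"nucleus:G1", "cytoplasm:S"},
    "P3": {"membrane:G1"},
    "P4": {"cytoplasm:S"},
}

filtered = filter_by_shared_context(y2h_pairs, context)
print(filtered)
```

Pairs with no annotation for either partner are dropped here; whether to drop or keep unannotated proteins is a design choice the slide does not address.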
25
Example: Reactome FI Network
Curated Human Data – Version 35.
– 5078 proteins
– 4166 reactions
– 3870 complexes
– 1112 pathways
Only ~25% of genome!
Goal: add a “corona” of uncurated interaction data around a scaffold of curated pathway data.
26
Expanding Reactome’s Coverage (Wu et al. (2010) Genome Biology)
[Figure: curated pathways yield annotated functional interactions. Uncurated information (human PPI; PPI inferred from fly, worm & yeast; PPI from text mining; gene co-expression; GO annotation on biological processes; protein domain-domain interactions) feeds a naïve Bayes classifier, which yields predicted functional interactions. Other source labels in the figure: CellMap, TRED, GeneWays.]
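To make the classifier step concrete, here is a minimal Bernoulli naïve Bayes sketch that combines binary evidence for a protein pair into a probability of functional interaction. It is a toy under stated assumptions, not the implementation from Wu et al.: the three features, the training pairs, and the function names are all invented for illustration.

```python
from math import exp, log


def train_naive_bayes(examples, labels, alpha=1.0):
    """Estimate class priors and per-feature probabilities with Laplace smoothing."""
    n_features = len(examples[0])
    model = {}
    for cls in (True, False):
        rows = [x for x, y in zip(examples, labels) if y == cls]
        prior = (len(rows) + alpha) / (len(examples) + 2 * alpha)
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (prior, probs)
    return model


def predict_fi_probability(model, features):
    """Posterior probability that a pair is a functional interaction."""
    log_scores = {}
    for cls, (prior, probs) in model.items():
        score = log(prior)
        for value, p in zip(features, probs):
            score += log(p if value else 1.0 - p)
        log_scores[cls] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {cls: exp(s - top) for cls, s in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


# Hypothetical features per pair: (human PPI, co-expressed, shared GO term).
train_x = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0)]
# True = pair annotated as interacting in curated pathways (toy labels).
train_y = [True, True, True, False, False, False]

model = train_naive_bayes(train_x, train_y)
p_strong = predict_fi_probability(model, (1, 1, 1))
p_weak = predict_fi_probability(model, (0, 0, 0))
print(round(p_strong, 3), round(p_weak, 3))
```

The "naïve" part is the assumption that the evidence types are independent given the class, which lets each feature contribute a separate factor to the score.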
27
Integrated Functional Interaction (FI) Network
10,956 proteins (9,542 genes).
209,988 FIs.
~50% coverage of genome.
False (+) rate < 1%.
False (-) rate ~80%.
[Figure: 5% of the network shown.]
28
Active Network Extraction & Analysis
Workflow shown on the slide:
– Start from the Reactome Functional Interaction network.
– Extract mutated, overexpressed, underexpressed, expanded/deleted genes.
– Add linker genes to form the disease subnetwork.
– Apply community clustering algorithms to obtain disease “modules”.
– Use the modules for disease gene prediction, sample classification, and hypothesis generation.
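The extraction workflow can be sketched on a toy network. Two assumptions are made here that the slide does not specify: a "linker" is taken to be an unaltered gene directly connected to at least two altered genes, and connected components stand in for a real community clustering algorithm. Gene names and function names are placeholders.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency map from a list of (gene, gene) interactions."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def add_linkers(adj, altered):
    """Altered genes plus unaltered genes that touch two or more altered genes."""
    linkers = {
        gene
        for gene, neighbours in adj.items()
        if gene not in altered and len(neighbours & altered) >= 2
    }
    return altered | linkers


def connected_modules(adj, nodes):
    """Connected components of the subnetwork induced by nodes.

    A simple stand-in for community clustering, not a replacement for it.
    """
    seen, modules = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend((adj.get(node, set()) & nodes) - component)
        seen |= component
        modules.append(component)
    return modules


# Toy FI network and altered gene list (placeholders).
edges = [("A", "L"), ("L", "B"), ("B", "C"), ("D", "E"), ("X", "A")]
altered = {"A", "B", "C", "D"}

adjacency = build_adjacency(edges)
subnetwork_nodes = add_linkers(adjacency, altered)
disease_modules = connected_modules(adjacency, subnetwork_nodes)
print(disease_modules)
```

Here gene L joins the subnetwork because it bridges A and B, while E and X are left out because each touches only one altered gene.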
29
Pancreatic Cancer Module Map (43 Cases). Christina Yung.
[Figure: module labels: p53, SMAD, TGFβ, TNF signaling; KRAS, MAPK signaling; Integrin signaling; Heterotrimeric G-protein signaling; Rho GTPase signaling; Transcription & translation; Cell cycle; Wnt & Cadherin signaling; Hedgehog signaling; Transcription; Zinc fingers; Ca2+ signaling.]
Non-silent mutations:
– blue – in primary tumour only
– green – in xenograft only
– red – in primary & xenograft
30
Glioblastoma stem cells (GSC) in collaboration with Peter Dirks lab (SickKids) Irina Kalatskaya
31
Glioblastoma Stem Cell Network
[Figure: module labels: collagen; GPCR; Beta-catenin; complement; IL-1; BMP; TP53/RB1/JUN/SP1; CREB1; FGF; Small Rho proteins; Ribosomal proteins; HOX; GLI2.]
33
Network Classification of Disease
Traditional: associate active genes with clinical behavior to create gene-based prognostic signatures.
Limitations: too many genes reduce statistical power.
New idea: look for associations between active modules and clinical behavior.
34
Using the Reactome FI Network to Find a Breast Cancer Survival Signature (Guanming Wu)
Workflow shown on the slide:
– Expression analysis of tumours from multiple patients
– Disease module map
– Principal component analysis on modules
– Correlate principal components with clinical parameters
35
Module-Based Signatures of Breast Cancer Survival
NEJM: van de Vijver et al. 2002
– 295 samples, ~12,000 genes
– Event: death
GSE4922: Ivshina et al. Cancer Res. 2006
– 249 samples, ~13,000 genes
– Event: recurrence or death
36
Building the Network
Built based on the NEJM data set
– 27 modules selected based on a size cutoff of 7 and an average correlation cutoff of 0.25.
Validated using GSE4922.
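The selection rule can be sketched as a filter over candidate modules. The slide gives only the two cutoffs, so the sketch assumes that both are inclusive lower bounds and that "average correlation" means the mean pairwise Pearson correlation of expression profiles among a module's genes. Gene names, profiles, and function names are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def average_correlation(genes, expression):
    """Mean pairwise correlation among the genes of one module."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


def select_modules(modules, expression, min_size=7, min_avg_corr=0.25):
    """Names of modules passing both the size and the correlation cutoff."""
    return [
        name
        for name, genes in modules.items()
        if len(genes) >= min_size
        and average_correlation(genes, expression) >= min_avg_corr
    ]


# Toy expression profiles across five samples (made up).
up = [1.0, 2.0, 3.0, 4.0, 5.0]
down = up[::-1]
expression = {}
modules = {"tight": set(), "mixed": set(), "small": set()}
for i in range(7):
    expression[f"T{i}"] = [v + i for v in up]  # all rise together
    modules["tight"].add(f"T{i}")
    expression[f"M{i}"] = up if i % 2 == 0 else down  # half rise, half fall
    modules["mixed"].add(f"M{i}")
for i in range(3):
    expression[f"S{i}"] = up  # coherent but below the size cutoff
    modules["small"].add(f"S{i}")

selected = select_modules(modules, expression)
print(selected)
```

Only the "tight" module passes: "mixed" has enough genes but a negative average correlation, and "small" is coherent but has fewer than seven genes.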
37
PC Analysis Identifies Module 2 as Explaining Much of Variation in Survival
38
Same Signature Predicts Survival in Independent Data Set
39
And Three More Data Sets as Well…
40
Module 2: Kinetochore + Aurora B Signaling
41
Integration of Multiple Data Sets
Experimental samples can be interrogated many ways:
– RNA expression
– Genome/exome sequencing
– Copy number changes/loss of heterozygosity
– shRNA knockdown screens
Integrate multiple functional data types using network/pathway relationships?
42
PARADIGM (Vaske, Benz et al. Bioinformatics 26:i237 2010)
43
Vaske, Benz et al. Bioinformatics 26:i237 2010
Factor graph: directed graph connecting genes; each gene is activated, inactivated, or unchanged in a single patient.
44
Vaske, Benz et al. Bioinformatics 26:i237 2010
45
PARADIGM: The Bad News
Distributed in source code form only
– Requires several third-party math/graph libraries (all open source).
– I have not gotten it to compile yet!
No documentation.
No repositories of formatted pathway data.
No examples of converting experimental data into input files.
46
Take Home Messages
Pathway/network analysis can provide context to altered gene lists.
Pathway/network analysis methods differ greatly in complexity, power, and usability:
– SIMPLE: Pathway diagram colorization
– MODERATE: Reactome FI network extraction
– COMPLEX: PARADIGM
This type of analysis is a work in progress, but it promises the ability to integrate data across many dimensions.
47
URLs
KEGG – www.genome.jp/kegg
Biocarta – www.biocarta.com
WikiPathways – www.wikipathways.org
Reactome – www.reactome.org
NCI/PID – pid.nci.nih.gov
Ingenuity – www.ingenuity.com
Pathway Commons – www.pathwaycommons.org/pc/
PARADIGM – http://sbenz.github.com/Paradigm/
48
URLs
BioGRID – www.thebiogrid.org
IntAct – www.ebi.ac.uk/intact
MINT – mint.bio.uniroma2.it
iHOP – www.ihop-net.org/UniPub/iHOP
PubGene – www.pubgene.org
49
Module 2 bioinformatics.ca We are on a Coffee Break & Networking Session