1
Canadian Bioinformatics Workshops
3
Pathway and Network Analysis
Module 7 – Part III: Pathway and Network Analysis. Lincoln Stein. Bioinformatics for Cancer Genomics, May 27-31, 2013.
4
Classes of Gene Set Analysis
Example tools: DAVID, GSEA, Reactome FI network, PARADIGM. Khatri et al. PLOS Comp Bio. 8:1 2012
5
Limitations of Gene Set Enrichment Analysis
Many possible gene sets – diseases, molecular function, biological process, cellular compartment, pathways... Gene sets are heavily overlapping; need to sort through lists of enriched gene sets! “Bags of genes” obscure regulatory relationships among them.
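One way to sort through heavily overlapping gene sets is to measure how much each pair overlaps before reading the enriched list. The sketch below uses the Jaccard index for this; the gene set names, their members, and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared genes divided by all genes in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(gene_sets, threshold=0.5):
    """Return (name1, name2, score) for every pair of sets at or above the threshold."""
    pairs = []
    for (n1, s1), (n2, s2) in combinations(sorted(gene_sets.items()), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            pairs.append((n1, n2, score))
    return pairs


# Hypothetical gene sets, for illustration only.
gene_sets = {
    "cell_cycle": {"CDK1", "CCNB1", "AURKB", "PLK1"},
    "m_phase": {"CDK1", "CCNB1", "AURKB", "BUB1"},
    "dna_repair": {"BRCA1", "BRCA2", "ATM"},
}
print(overlapping_pairs(gene_sets))
```

Pairs that score above the threshold are candidates for merging or for reporting only the more significant member of the pair.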
6
Pathway Databases
Advantages: Usually curated. Biochemical view of biological processes. Cause and effect captured. Human-interpretable visualizations.
Disadvantages: Sparse coverage of genome. Different databases disagree on boundaries of pathways.
7
KEGG
8
Reactome
9
Reactome
Hand-curated pathways in human. Rigorous curation standards – every reaction traceable to primary literature. Automatically-projected pathways to non-human species. 22 species; 1112 human pathways; 5078 proteins.
Features: Google-map style reaction diagrams with overlays; find pathways containing your gene list; calculate gene overrepresentation in pathways; find corresponding pathways in other species. Open access.
10
Pathway Commons
11
Pathway Colorization
Main feature offered by all pathway databases. Upload a gene list. The database calculates an enrichment score for each pathway and displays a ranked list. Browse into pathways of interest; download colorized pictures.
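The slides do not say how each database computes its enrichment score. A common choice for gene overrepresentation is the one-sided hypergeometric test, sketched below under that assumption. The pathway names and memberships are hypothetical, the genome size of 22,000 is a round figure, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(hits, list_size, pathway_size, genome_size):
    """P(X >= hits) when list_size genes are drawn without replacement
    from a genome in which pathway_size genes belong to the pathway."""
    total = comb(genome_size, list_size)
    upper = min(list_size, pathway_size)
    tail = sum(
        comb(pathway_size, k) * comb(genome_size - pathway_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def rank_pathways(gene_list, pathways, genome_size):
    """Score every pathway against the uploaded list, most enriched first."""
    genes = set(gene_list)
    scored = []
    for name, members in pathways.items():
        hits = len(genes & members)
        p = hypergeom_pvalue(hits, len(genes), len(members), genome_size)
        scored.append((name, hits, p))
    return sorted(scored, key=lambda row: row[2])


# Hypothetical pathways and gene list, for illustration only.
pathways = {
    "wnt_signaling": {"CTNNB1", "APC", "AXIN1", "GSK3B", "WNT1"},
    "dna_repair": {"BRCA1", "BRCA2", "ATM", "ATR", "RAD51"},
}
gene_list = ["CTNNB1", "APC", "AXIN1", "TTN"]
for name, hits, p in rank_pathways(gene_list, pathways, genome_size=22000):
    print(name, hits, p)
```

With many pathways tested at once, the raw p-values would normally be adjusted (for example with a false discovery rate procedure) before ranking.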
12
Example from Reactome
13
Example from Reactome
15
Networks
Pathways capture only the “well understood” portion of biology. Networks cover less well understood relationships: genetic interactions; physical interactions; coexpression; GO term sharing; adjacency in pathways.
21
Biological Networks are Scale Free
Properties: The degree (# connections) of nodes follows a power law: the fraction of nodes with degree k falls off as k^-γ, so nodes become progressively rarer as their degree increases. The local clustering coefficient (tendency of nodes to interconnect) is independent of the degree of the node. Nature Reviews Genetics 5, (February 2004) | doi: /nrg1272
22
Biological Networks are Scale Free
Implications: A small number of genes have a large number of connections (chokepoints). A large number of genes have a small number of connections (leaves). Genes cluster (functional groups). The cluster sizes are also scale-free (many small clusters, few large clusters). Nature Reviews Genetics 5, (February 2004) | doi: /nrg1272
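The two quantities named on these slides, node degree and the local clustering coefficient, can be computed directly from an edge list. The following is a minimal sketch on a hypothetical five-node network; the function names are illustrative and not taken from any particular library.

```python
from collections import Counter


def build_adjacency(edges):
    """Undirected adjacency sets from (a, b) pairs; self-loops are ignored."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of nodes that have that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))


# Hypothetical network: one hub (H) with four partners, two of which interact.
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")]
adj = build_adjacency(edges)
print(dict(degree_distribution(adj)))
print(local_clustering(adj, "H"), local_clustering(adj, "A"))
```

On a real interaction network, plotting the degree counts on log-log axes is the usual quick check for an approximately straight line, which is what a power law predicts.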
23
Network Databases
Can be built automatically or via curation. Popular sources of curated networks:
BioGRID – Curated interactions from literature; 529,000 genes, 167,000 interactions.
IntAct – Curated interactions from literature; 60,000 genes, 203,000 interactions.
MINT – Curated interactions from literature; 31,000 genes, 83,000 interactions.
24
Uncurated Interaction Sources
Text mining approaches: Computationally extract gene relationships from text, such as PubMed abstracts. Much faster than hand curation.
Not perfect: Problems recognizing gene names (is hedgehog a gene or a species?). Natural language processing is difficult.
Popular resources: iHOP, PubGene.
25
Uncurated Interaction Sources
Experimental techniques: Yeast 2 hybrid (Y2H) protein interactions. Protein complex pulldowns/mass spec. Genetic screens, such as synthetic lethals and enhancer/suppressor screens.
Not perfect: Y2H interactions have taken proteins out of their natural context; physical interaction != biological interaction. Protein complex pulldowns are plagued by “sticky” proteins such as actin. Genetic screens are highly sensitive to genetic background (“network effects”).
26
Integrative Approaches
Combine multiple sources of evidence to increase accuracy. Simple example: “Party hubs” are Y2H interactions that have been filtered for those partners that share the same temporal-spatial location. Complex example: Combine multiple sources of curated and uncurated evidence.
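The simple example above amounts to filtering one evidence source by another. A minimal sketch of that filter follows, assuming each protein carries a set of compartment annotations; the protein names and locations are hypothetical, and temporal co-expression could be handled the same way with a second lookup.

```python
def filter_by_colocalization(interactions, location):
    """Keep interactions whose two partners share at least one annotated
    compartment; pairs with a missing annotation are dropped."""
    kept = []
    for a, b in interactions:
        if location.get(a, set()) & location.get(b, set()):
            kept.append((a, b))
    return kept


# Hypothetical Y2H pairs and compartment annotations, for illustration only.
y2h_pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
location = {
    "P1": {"nucleus"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"mitochondrion"},
}
print(filter_by_colocalization(y2h_pairs, location))
```

Dropping pairs with missing annotations is a conservative choice; keeping them would raise coverage at the cost of accuracy.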
27
Example: Reactome FI Network
Curated Human Data – Version 35: 5078 proteins; reactions; 3870 complexes; pathways. Only ~25% of genome! Goal: add a “corona” of uncurated interaction data around scaffold of curated pathway data.
28
Expanding Reactome’s Coverage
Curated pathways → annotated functional interactions.
Uncurated information (human PPI; PPI inferred from fly, worm & yeast; PPI from text mining; gene co-expression; GO annotation on biological processes; protein domain-domain interactions) → naïve Bayes classifier → predicted functional interactions.
Additional sources shown: GeneWays, CellMap, TRED.
Wu et al. (2010) Genome Biology
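A naïve Bayes classifier treats each evidence type as an independent vote and multiplies the resulting likelihood ratios. The sketch below illustrates the idea on binary evidence flags; it is not the published implementation, and the feature names, training pairs, and add-one smoothing are assumptions made for the example.

```python
from math import exp, log


def train(examples, feature_names):
    """Estimate the class prior and P(evidence present | class), with add-one smoothing.

    examples: list of (set of evidence names present, is_functional_interaction).
    """
    pos = [f for f, label in examples if label]
    neg = [f for f, label in examples if not label]
    model = {"prior": (len(pos) + 1) / (len(examples) + 2), "likelihood": {}}
    for name in feature_names:
        p_pos = (sum(name in f for f in pos) + 1) / (len(pos) + 2)
        p_neg = (sum(name in f for f in neg) + 1) / (len(neg) + 2)
        model["likelihood"][name] = (p_pos, p_neg)
    return model


def predict(model, features):
    """Posterior probability that a protein pair is a functional interaction."""
    log_odds = log(model["prior"] / (1 - model["prior"]))
    for name, (p_pos, p_neg) in model["likelihood"].items():
        if name in features:
            log_odds += log(p_pos / p_neg)
        else:
            log_odds += log((1 - p_pos) / (1 - p_neg))
    return 1 / (1 + exp(-log_odds))


# Hypothetical training pairs: positives would come from curated pathways,
# negatives from random protein pairs.
FEATURES = ["human_ppi", "coexpression", "go_bp_shared"]
examples = [
    ({"human_ppi", "coexpression"}, True),
    ({"human_ppi", "go_bp_shared"}, True),
    ({"coexpression", "go_bp_shared"}, True),
    ({"coexpression"}, False),
    (set(), False),
    (set(), False),
]
model = train(examples, FEATURES)
print(predict(model, {"human_ppi", "coexpression", "go_bp_shared"}))
print(predict(model, set()))
```

A pair supported by every evidence type scores close to 1, while a pair with no supporting evidence scores well below the prior.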
29
Integrated Functional Interaction (FI) Network
10,956 proteins (9,542 genes). 209,988 FIs. ~50% coverage of genome. False (+) rate < 1%. False (-) rate ~80%. 5% of network shown here.
30
Active Network Extraction
Curated pathway DBs + uncurated interaction evidence (machine learning) → Reactome Functional Interaction Network (~11,000 proteins; 200,000 interactions) → extract and cluster altered genes → disease “modules” (10-30).
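The extract-and-cluster step can be sketched as follows. The slide does not name the clustering algorithm, so this example substitutes the simplest possible stand-in: it keeps only FI edges between altered genes and reports connected components as modules. The edges and the altered gene list are hypothetical.

```python
def induced_subnetwork(edges, altered):
    """Keep only interactions where both partners are altered genes."""
    altered = set(altered)
    return [(a, b) for a, b in edges if a in altered and b in altered]


def connected_modules(edges, min_size=2):
    """Group genes into connected components, largest first."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, modules = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        if len(component) >= min_size:
            modules.append(component)
    return sorted(modules, key=len, reverse=True)


# Hypothetical FI edges and altered genes, for illustration only.
fi_edges = [
    ("KRAS", "BRAF"), ("BRAF", "MAP2K1"), ("MAP2K1", "MAPK1"),
    ("CTNNB1", "APC"), ("APC", "AXIN1"),
]
altered = ["KRAS", "BRAF", "CTNNB1", "APC", "TTN"]
print(connected_modules(induced_subnetwork(fi_edges, altered)))
```

Connected components become too coarse on dense networks, where nearly every altered gene lands in one giant component; a real analysis would typically use a community-detection method instead.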
31
Clustering of TCGA Breast Cancer Mutations
Clusters: Cadherin signaling; Signaling by Tyrosine Kinase receptors; NOTCH and Wnt signaling; Focal adhesion; ECM-Receptor interaction; Neuroactive ligand-receptor interaction; Mucin cluster; Cell adhesion molecules; Ubiquitin-mediated proteolysis; Metabolism of proteins; Signaling by Rho GTPases; DNA repair; Cell cycle; Axon guidance; M phase; G2/M Transition; Calcium signaling.
32
256 Pancreatic Cancer Mutations
Figure axes: patient samples, genes.
33
Pancreatic Mutation Modules
Module 0: MAPK, Hedgehog, TGFβ signaling
Module 4: ECM, focal adhesion, integrin signaling
Module 5: Wnt & Cadherin signaling
Module 3: Translation
Module 2: B-cell receptor, ERBB, FGFR, EGFR signaling
Module 9: Axon guidance
Module 10: Muscle contraction
Module 1: Heterotrimeric G-protein signaling
Module 7: Axon guidance
Module 6: Ca2+ signaling
Module 8: MHC class II antigen presentation
34
Modules After Hierarchical Clustering
Figure axes: patient samples, modules.
35
Network-Based Clustering Algorithms
Reactome FI network (Wu & Stein, Genome Biol (12):R112): Expression or SNV analysis. Online analysis via Cytoscape plugin (lab).
HotNet (Vandin et al. J Comput Biol Mar;18(3):507-22): Local installation with Python & MATLAB. Cytoscape visualization.
WGCNA (Langfelder et al. BMC Bioinformatics 9:559): Expression analysis. Local installation as R package.
36
Classification of Tumors via Molecular Phenotype
Test → Classify. Data types: proteomics, transcriptomics, genomics.
Through the research and discoveries we are making, eventually we will start to deliver tests that will allow physicians to selectively treat cancer patients with the best drugs. In this room we are using a very sophisticated machine to look at differences in genes so that we can identify those genes that correlate with response. We are able to examine about 8000 genotypes/day, or roughly 2 million/year. Our company has also taken a lead position in the emerging field of pharmacogenomics and personalized medicine. We believe pharmacogenomics in the clinic offers several advantages to drug development and to improved medicines for patients. Today it is becoming commonplace to examine a single gene on a tumor biopsy to select treatment. We believe that in the future, expanding that approach to scan large sets of genes will improve our ability to find the right drug for the right patient. Through our internal genomics effort, as well as through our alliances with Millennium, the Whitehead Genomics Center, Impath, Exelixis and major clinical centers, we are integrating this approach extensively into our oncology pipeline.
37
Risk Stratification
TEST → Low risk: reduce treatment (don’t treat). High risk: treat aggressively (treat). 10-20% progress. Outcomes: no relapse, relapse.
38
Challenges in Biomarker Discovery
Overtraining: 22,000 genes; any given cancer may show alterations in 1000s of them; patient cohorts are in the 100s. Can find a set of gene alterations that nicely predicts survival in a single cohort by chance. Field is littered with biomarkers that didn’t replicate in independent cohorts.
Disease heterogeneity: If there are many subtypes of disease, then even larger cohorts are needed.
Tumor heterogeneity: A single primary tumor may carry high-risk and low-risk subclones simultaneously.
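The overtraining problem can be demonstrated with pure noise. In the sketch below, gene alterations and patient outcomes are independent coin flips, yet the best of 2000 genes still appears to predict outcome in the discovery cohort. The cohort size, gene count, and seed are arbitrary assumptions chosen to keep the example fast.

```python
import random


def simulate_chance_biomarker(n_genes=2000, n_patients=100, seed=0):
    """Pick the gene that best 'predicts' outcome in one random cohort,
    then score the same gene in a second, independent random cohort."""
    rng = random.Random(seed)

    def cohort():
        outcome = [rng.random() < 0.5 for _ in range(n_patients)]
        genes = [
            [rng.random() < 0.5 for _ in range(n_patients)]
            for _ in range(n_genes)
        ]
        return genes, outcome

    def accuracy(alterations, outcome):
        return sum(a == o for a, o in zip(alterations, outcome)) / len(outcome)

    train_genes, train_outcome = cohort()
    test_genes, test_outcome = cohort()
    best = max(range(n_genes), key=lambda g: accuracy(train_genes[g], train_outcome))
    return (
        accuracy(train_genes[best], train_outcome),  # discovery cohort
        accuracy(test_genes[best], test_outcome),    # independent cohort
    )


discovery, replication = simulate_chance_biomarker()
print(f"discovery accuracy: {discovery:.2f}, replication accuracy: {replication:.2f}")
```

The discovery accuracy is inflated purely by selecting the best of many candidates, while the replication accuracy falls back toward the 0.5 expected of a coin flip. That gap is the reason independent validation cohorts matter.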
39
Using Network Architecture to Accelerate Biomarker Selection
Expression analysis of tumours from multiple patients → disease module map → principal component analysis on modules → correlate principal components with clinical parameters. Guanming Wu Genome Biol Dec 10;13(12):R112
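The last two steps of this workflow can be sketched in a few functions: summarize a module's expression by its first principal component, then correlate that score with a clinical variable. This is an illustration rather than the published pipeline. It uses power iteration instead of a full decomposition, plain Pearson correlation instead of a survival model, and hypothetical expression and survival values.

```python
from math import sqrt


def center_columns(matrix):
    """Subtract each gene's mean expression (rows are samples, columns are genes)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def first_pc_scores(matrix, iterations=200):
    """Project samples onto the first principal component.

    Power iteration on X^T X; the uniform start vector is assumed not to be
    orthogonal to the leading component.
    """
    x = center_columns(matrix)
    n_genes = len(x[0])
    v = [1.0 / sqrt(n_genes)] * n_genes
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]  # X v
        w = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(n_genes)]  # X^T X v
        norm = sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


# Hypothetical module: 5 patients x 3 co-expressed genes, plus survival in months.
expression = [
    [1.0, 1.2, 0.9],
    [2.0, 2.1, 1.8],
    [3.0, 2.9, 3.2],
    [4.0, 4.2, 3.9],
    [5.0, 5.1, 4.8],
]
survival_months = [60, 48, 35, 20, 12]
scores = first_pc_scores(expression)
print(pearson(scores, survival_months))
```

Reducing each module to one score per patient means the number of tests scales with the number of modules rather than the number of genes, which is the point of using network architecture here.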
40
Samples Used
Built the network using NEJM: van de Vijver et al. 2002. 295 samples, ~12,000 genes. Event: death.
Validated with GSE4922: Ivshina et al. Cancer Res. 2006. 249 samples, ~13,000 genes. Event: recurrence or death.
41
PC Analysis Identifies Module 2 as Explaining Much of Variation in Survival
42
Same Signature Predicts Survival in Independent Data Set
43
And Three More Data Sets as Well…
44
Module 2: Kinetochore + Aurora B Signaling
45
Integration of Multiple Data Sets
Experimental samples can be interrogated many ways: RNA expression Genome/exome sequencing Copy number changes/loss of heterozygosity shRNA knockdown screens Integrate multiple functional data types using network/pathway relationships?
46
PARADIGM Vaske, Benz et al. Bioinformatics 26:i
47
Factor graph: directed graph connecting genes; each gene is activated, inactivated, or unchanged in a single patient. Vaske, Benz et al. Bioinformatics 26:i
48
Vaske, Benz et al. Bioinformatics 26:i237 2010
49
PARADIGM: The Bad News
Distributed in source code form only. Requires several third-party math/graph libraries (all open source). Tedious to compile! Scant documentation. No repositories of formatted pathway data. No examples of converting experimental data into input files.
Good news: we are working on a web service implementation for a Reactome-based implementation.
50
Take Home Messages
Pathway/network analysis can provide context to altered gene lists.
Pathway/network analysis differs greatly in complexity, power, and usability:
SIMPLE: Pathway diagram colorization
MODERATE: Reactome FI network extraction
COMPLEX: PARADIGM
This type of analysis is a work in progress, but promises the ability to integrate data across many dimensions.
51
URLs
KEGG – www.genome.jp/kegg
Biocarta – www.biocarta.com
WikiPathways –
Reactome –
NCI/PID – pid.nci.nih.gov
Ingenuity –
Pathway Commons –
PARADIGM –
52
URLs
BioGRID – www.thebiogrid.org
IntAct – www.ebi.ac.uk/intact
MINT – mint.bio.uniroma2.it
iHOP –
PubGene –
53
We are on a Coffee Break & Networking Session