1
Pathweavers
Elizabeth McClellan Ribble, Ph.D.
31 July Joint Statistical Meetings
2
Weighted Averages for reconstructed pathways: a novel method for pathway-level analysis of gene expression profiles
Elizabeth McClellan Ribble, Ph.D., Associate Professor, Associate Chair, Department of Mathematical and Statistical Sciences, Metropolitan State University of Denver, Denver, Colorado
Monnie McGee, Ph.D., Associate Professor, Department of Statistical Science, Southern Methodist University, Dallas, Texas
Richard H. Scheuermann, Ph.D., Director, La Jolla Campus, J. Craig Venter Institute, La Jolla, California; Professor, Department of Pathology, University of California San Diego, San Diego, California
3
Pathway analysis: methods
Integrated analysis of gene expression and biological pathway data is crucial to the understanding of systems biology.
Methods that test for over-representation (or enrichment) of genes in pathways:
do not test for under-representation;
can suffer from sample size bias;
ignore the dependence structure of pathways;
lose information when using dichotomous significance markers.
4
Pathway analysis: hypergeometric test
The test for pathway enrichment rejects the null hypothesis of no over-representation of genes of interest in the pathway of interest when the p-value from the hypergeometric distribution is smaller than a specified threshold.
Implementations include:
PathExpress (Goffard and Weiller, 2007)
Pathway Processor (Beltrame et al., 2013)
Pathway Miner (Pandey et al., 2004)
Onto-Express (Khatri et al., 2002)
Reactome’s Skypainter (Matthews et al., 2009)
5
Pathway analysis: hypergeometric test
A negative log p-value is computed for every combination of a set of genes of interest (G) and an event/pathway of interest (E).
If the test addresses the correct hypothesis (more genes in an event is evidence of enrichment), the p-value should be small for large proportions of genes in events.
However, the test is subject to sample size bias: a small number of genes in a large event can lead to a small p-value, and a large number of genes in a small event can have a large p-value.
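The enrichment p-value described on this slide can be sketched as an upper-tail hypergeometric probability. This is a minimal illustration, not the code of any tool listed above: the function names, the parameter names, the toy counts, and the choice of a base-10 logarithm are all assumptions.

```python
from math import comb, log10


def hypergeom_enrichment_pvalue(n_universe, n_in_event, n_of_interest, n_overlap):
    """Upper-tail probability P(X >= n_overlap) for a hypergeometric X.

    n_universe    : total number of genes considered (hypothetical name)
    n_in_event    : genes annotated to the event/pathway E
    n_of_interest : genes in the list of interest G
    n_overlap     : genes that are in both G and E
    """
    upper = min(n_in_event, n_of_interest)
    total = comb(n_universe, n_of_interest)
    tail = sum(
        comb(n_in_event, k) * comb(n_universe - n_in_event, n_of_interest - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def neg_log_pvalue(p):
    """Negative log10 of a p-value (assumes p > 0)."""
    return -log10(p)


if __name__ == "__main__":
    # Toy counts, chosen only to show the call; they are not from the talk.
    p = hypergeom_enrichment_pvalue(n_universe=10, n_in_event=4,
                                    n_of_interest=3, n_overlap=2)
    print(p, neg_log_pvalue(p))
```

Because the tail probability depends on the absolute counts as well as the proportion of overlap, events of very different sizes with the same proportion of genes of interest can receive very different p-values.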
6
Pathway analysis: Gene Sets
Gene Set Enrichment Analysis (Subramanian et al., 2005) and its variations (Emmert-Streib and Glazko, 2011; Khatri et al., 2012) are a common alternative to hypergeometric-style tests.
Genes are grouped by biological theme (set E).
The list of significant genes from the experiment is ranked (set G).
If members of E appear randomly distributed throughout ranked list G, gene set E is considered unrelated to G.
If members of E are concentrated close to the top or bottom of ranked list G, gene set E is considered related to G.
Main issues:
Multiple inheritance (the same gene appears in multiple sets)
Dichotomous cutoffs used to form the list of significant genes
Pathway precursor-product structures not considered
Mitrea et al. (2013) dismiss methods based on gene sets because topological structures are completely ignored.
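The idea that a gene set is "related" to a ranked list when its members cluster at the top or bottom can be illustrated with an unweighted running-sum statistic. This is a simplified sketch in the spirit of gene set enrichment, not the weighted statistic of Subramanian et al. (2005); the function name and the toy gene labels are hypothetical.

```python
def running_sum_enrichment(ranked_genes, gene_set):
    """Signed maximum deviation of an unweighted running sum.

    Walk down the ranked list, stepping up for each member of the gene set
    and down for each non-member. A value near +1 means members cluster at
    the top, near -1 at the bottom, near 0 means they are spread out.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # toy ranked list G
    print(running_sum_enrichment(ranked, {"g1", "g2"}))  # members at the top
    print(running_sum_enrichment(ranked, {"g5", "g6"}))  # members at the bottom
```

Note that the statistic uses only set membership and rank position, so it carries no information about how the genes are connected within a pathway.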
7
Pathweavers Weighted Averages for Reconstructed Pathways (PathWeAveRs)
Pathway reconstruction finds optimal sub-groupings of reactions instead of using broadly defined groups of reactions that may contain unconnected reactions.
Takes into account missing information or bias in pathway selection within a given database.
Considers interconnections between pathways, which improves detection of salient pathways.
Does not encounter the sample size bias present in hypergeometric tests.
By default uses raw p-values or scores, but optionally allows for arbitrary dichotomous cutoffs.
8
Pathweavers
A pathway is reconstructed by reducing it to its next-lower-level elements, reactions, and then building “path-nodes”.
A path-node links an individual reaction to all reactions it is connected to in the original pathway (precursor-product in metabolic pathways, signal completion in signaling pathways).
The biologically meaningful units in pathways are reactions.
Reconstruction allows analysis at the level of reactions, giving a better understanding of the genes in the system.
Analysis at the higher level of pathways, which are often subjectively defined, is too general to infer much about genetic associations.
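One way to picture the reconstruction is to treat the pathway as a list of reaction-to-reaction links and build one path-node per reaction. The sketch below is an assumption-laden illustration: the function name, the reaction labels, and the choice to treat links as undirected and to include the reaction itself in its path-node are not stated in the slides.

```python
from collections import defaultdict


def build_path_nodes(links):
    """Map each reaction to the set holding it and its directly linked reactions.

    links: iterable of (reaction_a, reaction_b) pairs, e.g. precursor-product
    links. Links are treated as undirected here (an assumption).
    """
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {reaction: {reaction} | linked for reaction, linked in neighbors.items()}


if __name__ == "__main__":
    # Toy chain R1 -> R2 -> R3 with hypothetical reaction identifiers.
    toy_links = [("R1", "R2"), ("R2", "R3")]
    for reaction, node in sorted(build_path_nodes(toy_links).items()):
        print(reaction, sorted(node))
```

In this toy chain the middle reaction belongs to three path-nodes and the end reactions to two, which is the kind of overlap the scoring step has to account for.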
9
Pathweavers: Algorithm
Filtering: only genes present in both the microarray and the database are used.
Weights: genes are weighted by the number of reactions in which they appear (the importance of a gene depends on how rarely it appears in the database).
Reaction scores: reactions are scored based on the chosen gene values (e.g. p-values) and the gene weights.
Path-node scores: path-nodes are scored based on reaction scores; the score also depends on how many path-nodes each reaction appears in and on the number of reactions in the path-node.
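The four steps can be sketched as follows. The slides do not give the formulas, so every formula here is an assumption made for illustration: gene weight taken as 1 / (number of reactions containing the gene), gene value taken as -log10(p), and both scores taken as weighted averages. All function names and toy inputs are hypothetical.

```python
from math import log10


def filter_reactions(reactions, gene_pvalues):
    """Filtering: keep only genes found in both the database and the experiment."""
    kept = {r: {g for g in genes if g in gene_pvalues} for r, genes in reactions.items()}
    return {r: genes for r, genes in kept.items() if genes}


def gene_weights(reactions):
    """Weights: genes appearing in many reactions get smaller weights (assumed 1/count)."""
    counts = {}
    for genes in reactions.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    return {g: 1.0 / n for g, n in counts.items()}


def reaction_scores(reactions, gene_pvalues, weights):
    """Reaction score: weighted average of -log10(p) over the reaction's genes (assumed form)."""
    scores = {}
    for r, genes in reactions.items():
        total = sum(weights[g] for g in genes)
        scores[r] = sum(weights[g] * -log10(gene_pvalues[g]) for g in genes) / total
    return scores


def path_node_scores(path_nodes, r_scores):
    """Path-node score: average of reaction scores, down-weighting reactions
    that appear in many path-nodes (assumed form)."""
    membership = {}
    for members in path_nodes.values():
        for r in members:
            membership[r] = membership.get(r, 0) + 1
    scores = {}
    for node, members in path_nodes.items():
        scored = [r for r in members if r in r_scores]
        if not scored:
            continue
        total = sum(1.0 / membership[r] for r in scored)
        scores[node] = sum(r_scores[r] / membership[r] for r in scored) / total
    return scores


if __name__ == "__main__":
    # Toy inputs; gene "x" is in the database but was not measured.
    reactions = {"R1": {"a", "b", "x"}, "R2": {"b", "c"}}
    gene_pvalues = {"a": 0.01, "b": 0.1, "c": 1.0}
    path_nodes = {"R1": {"R1", "R2"}, "R2": {"R1", "R2"}}
    filtered = filter_reactions(reactions, gene_pvalues)
    weights = gene_weights(filtered)
    r_scores = reaction_scores(filtered, gene_pvalues, weights)
    print(r_scores)
    print(path_node_scores(path_nodes, r_scores))
```

Using averages rather than sums keeps a score from growing simply because a reaction or path-node is large. The sketch assumes strictly positive p-values, since -log10(0) is undefined.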
10
Pathweavers: permutation test
A large path-node score indicates that the path-node involves important genes.
Common genes are down-weighted and averages are used to avoid size bias.
Path-nodes with small gene counts but high proportions of important genes can still be statistically significant.
Statistical significance is determined by permutation test p-values.
The reference distribution is created from thousands of permutations of the gene labels under the null hypothesis that relevant genes appear at random in path-nodes.
The p-value is the proportion of path-node scores in the reference distribution that are at least as large as the observed path-node score.
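The permutation p-value defined on this slide can be sketched generically. The scoring function is passed in as a parameter, so the stand-in score used in the example (a plain sum over two genes) is only a placeholder and not the PathWeAveRs score; the function name, the default of 1000 permutations, and the fixed seed are assumptions.

```python
import random


def permutation_pvalue(score_fn, gene_values, n_perm=1000, seed=0):
    """Proportion of permuted scores at least as large as the observed score.

    score_fn    : function mapping {gene: value} to a single score
    gene_values : observed {gene: value} mapping
    Gene labels are permuted by shuffling values across the fixed gene names.
    """
    rng = random.Random(seed)
    genes = list(gene_values)
    values = [gene_values[g] for g in genes]
    observed = score_fn(gene_values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        permuted = dict(zip(genes, values))
        if score_fn(permuted) >= observed:
            at_least += 1
    return at_least / n_perm


if __name__ == "__main__":
    # Toy data: 20 genes with integer values; "g18" and "g19" hold the largest.
    toy_values = {f"g{i}": i for i in range(20)}

    def toy_score(values):
        # Placeholder score for a path-node containing g18 and g19.
        return values["g18"] + values["g19"]

    print(permutation_pvalue(toy_score, toy_values, n_perm=2000))
```

Some implementations add one to both the numerator and the denominator so that a p-value of exactly zero cannot occur; the version above follows the plain proportion stated on the slide.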
11
Pathweavers: Applications
Basso et al. (2004) studied the role of CD40 co-receptor signaling in B cell maturation by examining expression patterns of genes in stimulated B cells with and without CD40 ligation.
CD40 plays an essential role in various immune responses, such as survival and proliferation of B cells.
The significant path-nodes identified by PathWeAveRs include reactions involved in signaling in the immune system (co-stimulation by the CD28 family, toll-like receptor cascades, and signaling by interleukins), apoptosis, and the unfolded protein response.
For comparison, Reactome’s Skypainter found significant path-nodes associated with HIV infection, diabetes, DNA repair and replication, apoptosis, and the cell cycle (all of which are large, broadly defined, and have p-values highly correlated with the path-node size).
Beer et al. (2002) utilized gene expression profiles to predict survival of patients with early-stage lung adenocarcinoma.
Significant path-nodes involve Tie2 signaling, vitamin D signaling, and semaphorin signaling, all of which are known to play roles in lung carcinogenesis.
12
Pathweavers: Simulation experiments
Gene p-values from the B cell data are assumed to be representative of a generic gene expression microarray experiment.
Reconstructed path-nodes from Reactome are used to preserve the complex topological structure of the pathways.
Reactions are enriched with differentially expressed genes by sampling the genes to assign to reactions in a controlled manner (specific subgroups of reactions get small p-values while others should not).
13
Pathweavers: Simulation experiments
Simulation 1: a subset of path-nodes is enriched with differentially expressed genes and ranked (low rank is more enriched).
PathWeAveRs p-values had a Spearman correlation of [value missing] with the rank of the path-node.
Reactome’s Skypainter p-values had a Spearman correlation of [value missing] with the rank of the path-node.
Simulation 2: gene p-values are randomly assigned to gene labels (there should be no correlation between path-nodes and differentially expressed genes).
PathWeAveRs p-values had a Spearman correlation of [value missing] with the path-node rank.
Reactome’s Skypainter p-values had a Spearman correlation of [value missing] with the path-node rank.
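The comparison metric in these simulations, the Spearman correlation between a method's p-values and the path-node ranks, is the Pearson correlation of the two rank vectors. The sketch below is a plain-Python illustration with average ranks for ties; the function names and the toy numbers are hypothetical and are not results from the talk.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy values only: enrichment rank of four path-nodes and made-up p-values.
    toy_rank = [1, 2, 3, 4]
    toy_pvalues = [0.01, 0.02, 0.20, 0.90]
    print(spearman(toy_pvalues, toy_rank))
```

Under the ranking convention of Simulation 1 (low rank is more enriched), a method that tracks enrichment would be expected to give smaller p-values to lower ranks, that is, a positive correlation; under Simulation 2 a correlation near zero would be expected.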
14
Pathweavers
PathWeAveRs improves on existing statistical pathway analyses.
The algorithm is applied to reconstructed pathway data comprising all known links between reactions of any given pathway database.
Permutation tests find optimal subgroupings of pathways.
This method is unlike other pathway analysis methods in that it acknowledges gene multiple inheritance, reactions that appear in multiple pathways, and connections between reactions, such as precursor-product relationships in metabolic pathways and signaling cascades.
PathWeAveRs is applicable to any organism, pathway database, gene or protein network, and any experiment that generates a list of features and associated numeric values.
15
References
Basso K et al. (2004). Tracking CD40 signaling during germinal center development. Blood, 104:
Beltrame L et al. (2013). Pathway Processor 2.0: a web resource for pathway-based analysis of high-throughput data. Bioinformatics, 29(14):
Beer et al. (2002). Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine, 8:
Emmert-Streib F and Glazko G. (2011). Pathway analysis of expression data: deciphering functional building blocks of complex diseases. PLoS Comput Biol, 7.
Goffard N and Weiller G. (2007). PathExpress: a web-based tool to identify relevant pathways in gene expression data. Nucleic Acids Res., 35:W
Khatri P et al. (2002). Profiling gene expression using onto-express. Genomics, 79(2):
Khatri P et al. (2012). Ten years of pathway analysis: Current approaches and outstanding challenges. PLoS Comput Biol.
Matthews L et al. (2009). Reactome knowledgebase of biological pathways and processes. Nucleic Acids Res., 37:D619-D622.
Mitrea C et al. (2013). Methods and approaches in the topology-based analysis of biological pathways. Frontiers in Microbiology, 4(278).
Pandey R et al. (2004). Pathway Miner: extracting gene association networks from molecular pathways for predicting the biological significance of gene expression microarray data. Bioinformatics, 20(13):
Subramanian et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci., 102(43):