Pathweavers Elizabeth McClellan Ribble, Ph.D.


Pathweavers Elizabeth McClellan Ribble, Ph.D. 31 July 2018 Joint Statistical Meetings

Weighted Averages for reconstructed pathways: a novel method for pathway level analysis of gene expression profiles

Elizabeth McClellan Ribble, Ph.D.
Associate Professor, Associate Chair
Department of Mathematical and Statistical Sciences
Metropolitan State University of Denver, Denver, Colorado

Monnie McGee, Ph.D.
Associate Professor
Department of Statistical Science
Southern Methodist University, Dallas, Texas

Richard H. Scheuermann, Ph.D.
Director, La Jolla Campus
J. Craig Venter Institute, La Jolla, California
Professor
Department of Pathology
University of California San Diego, San Diego, California

Pathway analysis: methods
- Integrated analysis of gene expression and biological pathway data is crucial to understanding systems biology
- Methods that test for over-representation (or enrichment) of genes in pathways:
  - do not test for under-representation
  - can suffer from sample size bias
  - ignore the dependence structure of pathways
  - lose information when using dichotomous significance markers

Pathway analysis: hypergeometric test
- The test for pathway enrichment rejects the null hypothesis of no over-representation of genes of interest in the pathway of interest when the p-value from the hypergeometric distribution is smaller than a specified threshold
- Implementations include:
  - PathExpress (Goffard and Weiller, 2007)
  - Pathway Processor (Beltrame et al., 2013)
  - Pathway Miner (Pandey et al., 2004)
  - Onto-Express (Khatri et al., 2002)
  - Reactome’s Skypainter (Matthews et al., 2009)
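As a minimal sketch of the kind of test described on this slide (not the code of any of the listed tools), the upper-tail hypergeometric p-value can be computed with the Python standard library. The function name and the parameter names `k`, `N`, `K`, `n` are assumptions introduced for illustration.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are genes of interest (illustrative parameter names)."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Toy example: 10 genes in total, 3 of interest, a pathway of 2 genes,
# at least 1 of which is a gene of interest.
p_value = hypergeom_upper_tail(k=1, N=10, K=3, n=2)
print(p_value)
```

Enrichment would be declared when `p_value` falls below the chosen threshold.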

Pathway analysis: hypergeometric test
- A negative log p-value is computed for every combination of a set of genes of interest (G) and an event/pathway of interest (E)
- If the test addresses the correct hypothesis (more genes of interest in an event is evidence of enrichment), the p-value should be small when a large proportion of an event's genes are genes of interest
- However, the test is subject to sample size bias: a small number of genes in a large event can lead to a small p-value, and a large number of genes in a small event can have a large p-value
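The dependence on event size can be illustrated with made-up numbers (these are not values from the talk). In the sketch below, a small event with half of its genes of interest gets a larger p-value than a large event with only one eighth of its genes of interest. The universe size, the counts, and the function name are all assumptions chosen for illustration.

```python
from math import comb, log10

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for n draws from N genes, K of which are of interest."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total

N, K = 10_000, 500  # hypothetical universe: 5% of genes are of interest

# Small event: 2 of 4 genes are of interest (proportion 0.5).
p_small_event = hypergeom_upper_tail(2, N, K, 4)
# Large event: 25 of 200 genes are of interest (proportion 0.125).
p_large_event = hypergeom_upper_tail(25, N, K, 200)

print("small event: -log10(p) =", -log10(p_small_event))
print("large event: -log10(p) =", -log10(p_large_event))
```

Ranking events by these p-values would place the large event ahead of the small one even though its proportion of genes of interest is lower.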

Pathway analysis: gene sets
- Gene Set Enrichment Analysis (Subramanian et al., 2005) and its variations (Emmert-Streib and Glazko, 2011; Khatri et al., 2012) are a common alternative to hypergeometric-style tests
- Genes are grouped by biological theme (set E)
- The list of significant genes from the experiment is ranked (set G)
- If members of E appear randomly distributed throughout the ranked list G, then gene set E is not considered related to G
- If members of E are concentrated close to the top or bottom of the ranked list G, then gene set E is considered related to G
- Main issues:
  - Multiple inheritance (the same gene appears in multiple sets)
  - Dichotomous cutoffs to form the list of significant genes
  - Pathway precursor-product structures are not considered
- Mitrea et al. (2013) dismiss methods based on gene sets because topological structures are completely ignored
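The idea of members of E being concentrated at the top or bottom of the ranked list can be sketched with a simplified, unweighted running-sum statistic. This is an assumption-laden illustration rather than the published GSEA procedure, which weights hits by their correlation with the phenotype; the function name and the toy gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Signed maximum deviation of an unweighted running sum:
    step up at each member of gene_set, step down at each non-member."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need both members and non-members in the ranked list")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

ranked = [f"g{i}" for i in range(1, 11)]  # toy ranked list G

print(enrichment_score(ranked, {"g1", "g2", "g3"}))    # set at the top
print(enrichment_score(ranked, {"g8", "g9", "g10"}))   # set at the bottom
print(enrichment_score(ranked, {"g1", "g5", "g10"}))   # set spread out
```

A score near +1 or -1 corresponds to a set concentrated at one end of the list, and a score near 0 to a set scattered through it.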

Pathweavers
- Weighted Averages for Reconstructed Pathways (PathWeAveRs)
- Pathway reconstruction finds an optimal sub-grouping of reactions instead of using broadly defined groups of reactions that may contain unconnected reactions
- Takes into account missing information or bias in pathway selection within a given database
- Considers interconnections between pathways, which improves detection of salient pathways
- Does not encounter the sample size bias present in hypergeometric tests
- By default uses raw p-values or scores but optionally allows for arbitrary dichotomous cutoffs

Pathweavers
- A pathway is reconstructed by reducing it to its next-lower-level elements, reactions, and then building “path-nodes”
- A path-node links an individual reaction to all reactions it is connected to in the original pathway (precursor-product in metabolic pathways, signal completion in signaling pathways)
- Reactions are the biologically meaningful units in pathways
- Reconstruction allows analysis at the reaction level, giving a better understanding of the genes in the system
- Analysis at the higher level of pathways, which are often subjectively defined, is too general to infer much about genetic associations
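One way to picture path-node construction is the sketch below. It assumes that a path-node consists of a reaction plus every reaction directly linked to it, with links treated as undirected; the talk does not spell out these details, and the function name and reaction labels are hypothetical.

```python
from collections import defaultdict

def build_path_nodes(reactions, links):
    """Map each reaction to the set containing itself and its direct neighbours.
    `links` is an iterable of (reaction_a, reaction_b) pairs (assumed undirected)."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {r: {r} | neighbours[r] for r in reactions}

reactions = ["R1", "R2", "R3", "R4"]
links = [("R1", "R2"), ("R2", "R3")]  # e.g. precursor-product relationships

for reaction, members in build_path_nodes(reactions, links).items():
    print(reaction, sorted(members))
```

In this toy example `R4` has no links, so its path-node contains only itself.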

Pathweavers: Algorithm
- Filtering: only genes present in both the microarray and the database are used
- Weights: genes are weighted by the number of reactions in which they appear (the importance of a gene depends on how rarely it appears in the database)
- Reaction scores: reactions are scored based on the chosen gene values (e.g. p-values) and the gene weights
- Path-node scores: path-nodes are scored based on the reaction scores, taking into account how many path-nodes each reaction appears in and the number of reactions in each path-node
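The four steps can be sketched as follows. The slides do not give the exact formulas, so everything specific here is an assumption made for illustration: gene values are taken as -log10(p), gene weights as 1 / (number of reactions containing the gene), reaction weights as 1 / (number of path-nodes containing the reaction), and scores as weighted averages. The function and variable names are hypothetical, and p-values are assumed to be strictly positive.

```python
import math

def path_node_scores(gene_pvalues, reaction_genes, path_nodes):
    # Filtering: keep only genes that have a measured p-value.
    filtered = {r: {g for g in genes if g in gene_pvalues}
                for r, genes in reaction_genes.items()}
    filtered = {r: genes for r, genes in filtered.items() if genes}

    # Weights (assumed form): rarer genes get larger weights.
    gene_count = {}
    for genes in filtered.values():
        for g in genes:
            gene_count[g] = gene_count.get(g, 0) + 1
    gene_weight = {g: 1.0 / c for g, c in gene_count.items()}

    # Reaction scores (assumed form): weighted average of -log10(p).
    reaction_score = {}
    for r, genes in filtered.items():
        total = sum(gene_weight[g] for g in genes)
        reaction_score[r] = sum(
            gene_weight[g] * -math.log10(gene_pvalues[g]) for g in genes) / total

    # Path-node scores (assumed form): average of reaction scores,
    # down-weighting reactions that appear in many path-nodes.
    reaction_count = {}
    for members in path_nodes.values():
        for r in members:
            if r in reaction_score:
                reaction_count[r] = reaction_count.get(r, 0) + 1
    scores = {}
    for node, members in path_nodes.items():
        scored = [r for r in members if r in reaction_score]
        if scored:
            total = sum(1.0 / reaction_count[r] for r in scored)
            scores[node] = sum(
                reaction_score[r] / reaction_count[r] for r in scored) / total
    return scores

# Toy data: gene "X" is in the database but not on the array, so it is filtered out.
gene_pvalues = {"A": 0.001, "B": 0.5, "C": 0.01, "D": 0.9}
reaction_genes = {"R1": {"A", "C"}, "R2": {"B", "D", "X"}, "R3": {"D"}}
path_nodes = {"R1": {"R1", "R2"}, "R2": {"R1", "R2", "R3"}, "R3": {"R2", "R3"}}
print(path_node_scores(gene_pvalues, reaction_genes, path_nodes))
```

Because averages are used at both levels, a path-node's score does not grow simply because it contains more reactions or genes.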

Pathweavers: permutation test
- A large path-node score indicates that a pathway involves important genes
- Common genes are down-weighted and averages are used to avoid size bias
- Path-nodes with small gene counts but high proportions of important genes can still be statistically significant
- Statistical significance is determined by permutation test p-values
- The reference distribution is created by thousands of permutations of gene labels under the null hypothesis that relevant genes appear at random in path-nodes
- The p-value is the proportion of path-node scores in the reference distribution that are at least as large as the observed path-node score
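The permutation p-value described above can be sketched with a stand-in score. Here the score is simply the mean of -log10(p) over the genes in a path-node, which is an assumption for illustration and not the PathWeAveRs score; the function names, the number of permutations, and the toy data are hypothetical.

```python
import math
import random

def mean_neg_log_p(genes, gene_pvalues):
    """Stand-in path-node score: average of -log10(p) over its genes."""
    return sum(-math.log10(gene_pvalues[g]) for g in genes) / len(genes)

def permutation_pvalue(genes, gene_pvalues, n_perm=2000, seed=0):
    """Proportion of permuted scores at least as large as the observed score."""
    rng = random.Random(seed)
    labels = list(gene_pvalues)
    values = [gene_pvalues[g] for g in labels]
    observed = mean_neg_log_p(genes, gene_pvalues)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(values)                    # permute values across gene labels
        permuted = dict(zip(labels, values))
        if mean_neg_log_p(genes, permuted) >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_perm

# Toy data: 50 genes, three of which have very small p-values.
toy_pvalues = {f"g{i}": 0.5 for i in range(50)}
for g in ("g0", "g1", "g2"):
    toy_pvalues[g] = 1e-4

print(permutation_pvalue(["g0", "g1", "g2"], toy_pvalues))     # enriched path-node
print(permutation_pvalue(["g10", "g11", "g12"], toy_pvalues))  # unremarkable path-node
```

Note that a small path-node (three genes) can still reach a small permutation p-value when a high proportion of its genes are important.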

Pathweavers: Applications
- Basso et al. (2004) studied the role of CD40 co-receptor signaling in B cell maturation by examining expression patterns of genes in stimulated B cells with and without CD40 ligation
  - CD40 plays an essential role in various immune responses such as survival and proliferation of B cells
  - The significant path-nodes identified by PathWeAveRs include reactions involved in signaling in the immune system (co-stimulation by the CD28 family, toll-like receptor cascades, and signaling by interleukins), apoptosis, and the unfolded protein response
  - For comparison, Reactome’s Skypainter found significant path-nodes associated with HIV infection, diabetes, DNA repair and replication, apoptosis, and the cell cycle (all of which are large, broadly defined, and have p-values highly correlated with path-node size)
- Beer et al. (2002) used gene expression profiles to predict survival of patients with early-stage lung adenocarcinoma
  - Significant path-nodes involve Tie2 signaling, vitamin D signaling, and semaphorin signaling, all of which are known to play roles in lung carcinogenesis

Pathweavers: Simulation experiments
- Gene p-values from the B cell data are assumed to be representative of a generic gene expression microarray experiment
- Reconstructed path-nodes from Reactome are used to preserve the complex topological structure of the pathways
- Reactions are enriched with differentially expressed genes by sampling the genes to assign to reactions in a controlled manner (specific subgroups of reactions get small p-values while others should not)

Pathweavers: Simulation experiments
- Simulation 1: a subset of path-nodes is enriched with differentially expressed genes and ranked (a lower rank means more enriched)
  - PathWeAveRs p-values had a Spearman correlation of 0.567 with the rank of the path-node
  - Reactome’s Skypainter p-values had a Spearman correlation of -0.107 with the rank of the path-node
- Simulation 2: gene p-values are randomly assigned to gene labels (there should be no correlation between path-nodes and differentially expressed genes)
  - PathWeAveRs p-values had a Spearman correlation of -0.021 with the path-node rank
  - Reactome’s Skypainter p-values had a Spearman correlation of -0.138 with the path-node rank
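For readers unfamiliar with the summary statistic used here, a Spearman correlation is the Pearson correlation of ranks. The sketch below computes it with the standard library only. The p-values and ranks are made-up toy numbers, not the simulation data from the talk, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Toy example: enrichment ranks of five path-nodes and hypothetical method p-values.
enrichment_rank = [1, 2, 3, 4, 5]
method_pvalues = [0.001, 0.02, 0.01, 0.30, 0.45]
print(spearman(method_pvalues, enrichment_rank))
```

A value close to 1 means that more enriched path-nodes (lower rank) tend to receive smaller p-values; a value near 0 means the p-values carry little information about the enrichment ranking.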

Pathweavers
- PathWeAveRs improves on existing statistical pathway analyses
- The algorithm is applied to reconstructed pathway data composed of all known links between reactions in any given pathway database
- Permutation tests find optimal subgroupings of pathways
- Unlike other pathway analysis methods, this method acknowledges gene multiple inheritance, reactions that appear in multiple pathways, and connections between reactions such as precursor-product relationships in metabolic pathways and signaling cascades
- PathWeAveRs is applicable to any organism, pathway database, or gene or protein network, and to any experiment that generates a list of features and associated numeric values

References
Basso K et al. (2004). Tracking CD40 signaling during germinal center development. Blood, 104:4088-4096.
Beer DG et al. (2002). Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine, 8:816-824.
Beltrame L et al. (2013). Pathway Processor 2.0: a web resource for pathway-based analysis of high-throughput data. Bioinformatics, 29(14):1825-1826.
Emmert-Streib F and Glazko G. (2011). Pathway analysis of expression data: deciphering functional building blocks of complex diseases. PLoS Comput Biol, 7.
Goffard N and Weiller G. (2007). PathExpress: a web-based tool to identify relevant pathways in gene expression data. Nucleic Acids Res., 35:W176-W181.
Khatri P et al. (2002). Profiling gene expression using Onto-Express. Genomics, 79(2):266-270.
Khatri P et al. (2012). Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol.
Matthews L et al. (2009). Reactome knowledgebase of biological pathways and processes. Nucleic Acids Res., 37:D619-D622.
Mitrea C et al. (2013). Methods and approaches in the topology-based analysis of biological pathways. Frontiers in Physiology, 4:278.
Pandey R et al. (2004). Pathway Miner: extracting gene association networks from molecular pathways for predicting the biological significance of gene expression microarray data. Bioinformatics, 20(13):2156-2158.
Subramanian A et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci., 102(43):15545-15550.