Identifying Signaling Pathways

Slides:

Advertisements

Similar presentations

Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein Biological Networks Analysis Introduction and Dijkstras algorithm.

Advertisements

The multi-layered organization of information in living systems

D ISCOVERING REGULATORY AND SIGNALLING CIRCUITS IN MOLECULAR INTERACTION NETWORK Ideker Bioinformatics 2002 Presented by: Omrit Zemach April Seminar.

Linking Proteomic and Transcriptional Data through the Interactome and Epigenome Reveals a Map of Oncogene- induced Signaling Anthony Gitter Cancer Bioinformatics.

CISC667, F05, Lec26, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Genetic networks and gene expression data.

27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.

Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break 14:45 – 15:15Regulatory pathways lecture 15:15 – 15:45Exercise.

Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces cerevisiae Speaker: Zhu YANG 6 th step, 2006.

Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.

27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.

Data Mining Presentation Learning Patterns in the Dynamics of Biological Networks Chang hun You, Lawrence B. Holder, Diane J. Cook.

Biological networks Construction and Analysis. Recap Gene regulatory networks –Transcription Factors: special proteins that function as “keys” to the.

6. Gene Regulatory Networks

Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.

DEMO CSE fall. What is GeneMANIA GeneMANIA finds other genes that are related to a set of input genes, using a very large set of functional.

Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.

Bayesian integration of biological prior knowledge into the reconstruction of gene regulatory networks Dirk Husmeier Adriano V. Werhli.

Network Analysis and Application Yao Fu

Modeling and identification of biological networks Esa Pitkänen Seminar on Computational Systems Biology Department of Computer Science University.

ResponseNet revealing signaling and regulatory networks linking genetic and transcriptomic screening data CSE Fall.

Reconstruction of Transcriptional Regulatory Networks

Using Bayesian Networks to Analyze Whole-Genome Expression Data Nir Friedman Iftach Nachman Dana Pe’er Institute of Computer Science, The Hebrew University.

Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.

IMPROVED RECONSTRUCTION OF IN SILICO GENE REGULATORY NETWORKS BY INTEGRATING KNOCKOUT AND PERTURBATION DATA Yip, K. Y., Alexander, R. P., Yan, K. K., &

Introduction to biological molecular networks

341- INTRODUCTION TO BIOINFORMATICS Overview of the Course Material 1.

Biological Networks. Can a biologist fix a radio? Lazebnik, Cancer Cell, 2002.

Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,

PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS) LECTURE 13 ANALYSIS OF THE TRANSCRIPTOME.

Network Analysis Goal: to turn a list of genes/proteins/metabolites into a network to capture insights about the biological system 1.Types of high-throughput.

Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.

Network applications Sushmita Roy BMI/CS 576 Dec 9 th, 2014.

Comparative Network Analysis BMI/CS 776 Spring 2013 Colin Dewey

Inferring Regulatory Networks from Gene Expression Data BMI/CS 776 Mark Craven April 2002.

Journal club Jun , Zhen.

CSCI2950-C Lecture 12 Networks

Networks and Interactions

1. SELECTION OF THE KEY GENE SET 2. BIOLOGICAL NETWORK SELECTION

Representation, Learning and Inference in Models of Cellular Networks

Learning gene regulatory networks in Arabidopsis thaliana

Biological networks CS 5263 Bioinformatics.

System Structures Identification

Learning Sequence Motif Models Using Expectation Maximization (EM)

Inferring Models of cis-Regulatory Modules using Information Theory

Advanced Bioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2018 Anthony Gitter

Bud Mishra Professor of Computer Science and Mathematics 12 ¦ 3 ¦ 2001

Mass spectrometry-based proteomics

Interpolated Markov Models for Gene Finding

RNA Secondary Structure Prediction

Post-GWAS and Mechanistic Analyses

Eukaryotic Gene Finding

Large Scale Data Integration

Interpreting noncoding variants

1 Department of Engineering, 2 Department of Mathematics,

Reference based assembly

1 Department of Engineering, 2 Department of Mathematics,

Linking Genetic Variation to Important Phenotypes

CISC 841 Bioinformatics (Spring 2006) Inference of Biological Networks

1 Department of Engineering, 2 Department of Mathematics,

Schedule for the Afternoon

In these studies, expression levels are viewed as quantitative traits, and gene expression phenotypes are mapped to particular genomic loci by combining.

Analyzing Time Series Gene Expression Data

Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs BMI/CS Spring 2019 Colin Dewey

Volume 43, Issue 3, Pages (September 2015)

Principle of Epistasis Analysis

Algorithms (2IL15) – Lecture 7

Bioinformatics, Vol.17 Suppl.1 (ISMB 2001) Weekly Lab. Seminar

The MultiOmics Explainer

CISC 667 Intro to Bioinformatics (Spring 2007) Genetic networks and gene expression data CISC667, S07, Lec24, Liao.

Presentation transcript:

Identifying Signaling Pathways BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Anthony Gitter, Mark Craven, Colin Dewey

Goals for lecture Challenges of integrating high-throughput assays Connecting relevant genes/proteins with interaction networks ResponseNet algorithm Evaluating pathway predictions Classes of signaling pathway prediction methods

High-throughput screening Which genes are involved in which cellular processes? Hit: gene that affects the phenotype Phenotypes include: Growth rate Cell death Cell size Intensity of some reporter Many others

Types of screens Genetic screening Chemical screening Test genes individually or in parallel Knockout, knockdown (RNA interference), overexpression, CRISPR/Cas genome editing Chemical screening Which genes are affected by a stimulus?

Differentially expressed genes Compare mRNA transcript levels between control and treatment conditions Genes whose expression changes significantly are also involved in the cellular process Alternatively, differential protein abundance or phosphorylation

Differentially expressed genes Interpreting screens Differentially expressed genes Screen hits Very few genes detected in both

Assays reveal different parts of a cellular process Database representation of a “pathway” KEGG

Assays reveal different parts of a cellular process Differentially expressed genes Genetic screen hits

Pathways connect the disjoint gene lists Can’t rely on pathway databases High-quality, low coverage Instead learn condition-specific pathways computationally Combine data with generic physical interaction networks

Physical interactions Protein-protein interactions (PPI) Metabolic Protein-DNA (transcription factor-gene) Genes and proteins are different node types Prot A Prot B Appling Graz TF Gene Yeger-Lotem2009

Hairball networks Networks are highly connected Can’t use naïve strategy to connect screen hits and differentially expressed genes Yeger-Lotem2009

Identify connections within an interaction network Yeger-Lotem2009

How to define a computational “pathway” Given: Partially directed network of known physical interactions (e.g. PPI, kinase-substrate, TF-gene) Scores on source nodes Scores on target nodes Do: Return directed paths in the network connecting sources to targets

ResponseNet optimization goals Connect screen hits and differentially expressed genes Recover sparse connections Identify intermediate proteins missed by the screens Prefer high-confidence interactions Minimum cost flow formulation can meet these objectives

Construct the interaction network Protein Gene

Transform to a flow problem

Max flow on graphs Each edge can tolerate different level of flow or have different preference of sending flow along that edge S Pump flow from source Incoming and outgoing flow conserved at each node T Flow conserved to target

Weighting interactions Probability-like confidence of the interaction Example evidence: edge score of 1.0 16 distinct publications supporting the edge iRefWeb

Weights and capacities on edges cij= 1 Flow capacity (wij, cij) wij from interaction network confidence T

Find the minimum cost flow Prefer no flow on the low-weight edges if alternative paths exist Return the edges with non-zero flow T

Formal minimum cost flow Positive flow on an edge incurs a cost Cost is greater for low-weight edges Flow on an edge Parameter controlling the amount of flow from the source

Formal minimum cost flow Flow coming in to a node equals flow leaving the node

Formal minimum cost flow Flow leaving the source equals flow entering the target

Formal minimum cost flow Flow is non-negative and does not exceed edge capacity

Formal minimum cost flow

Linear programming Optimization problem is a linear program Canonical form Polynomial time complexity Many off-the-shelf solvers Practical Optimization: A Gentle Introduction Introduction to linear programming Simplex method Network flow Wikipedia

ResponseNet pathways Identifies pathway members that are neither hits nor differentially expressed Ste5 recovered when STE5 deletion is the perturbation

ResponseNet summary Advantages Disadvantages Computationally efficient Integrates multiple types of data Incorporates interaction confidence Identifies biologically plausible networks Disadvantages Direction of flow is not biologically meaningful Path length not considered Requires sources and targets Dependent on completeness and quality of input network

Evaluating pathway predictions Unlike PIQ, we don’t have a complete gold standard available for evaluation Can simulate “gold standard” pathways from a network Compare relative performance of multiple methods on independent data

Evaluating pathway predictions Ritz2016

Evaluating pathway predictions Ritz2016

Evaluating pathway predictions MacGilvray2018 PR curves can evaluate node or edge recovery but not the global pathway structure

Evaluation beyond pathway databases Natural language processing can also help semi-automated evaluation Literome Chilibot iHOP

Classes of pathway prediction algorithms Are edges important? No Network diffusion Yes Sources and targets? Spanning tree Steiner tree Next slide… Classes of pathway prediction algorithms

Classes of pathway prediction algorithms Have sources and targets What path properties are important? Total path length or score Shortest paths Total source-target connectivity Network flow Connectivity in minimum cost network Steiner tree Complex properties Integer program Symbolic solver Graphical model

Alternative pathway identification algorithms k-shortest paths Ruths2007 Shih2012 Random walks / network diffusion / circuits Tu2006 eQTL electrical diagrams (eQED) HotNet Integer programs Signaling-regulatory Pathway INferencE (SPINE) Chasman2014

Alternative pathway identification algorithms Path-based objectives Physical Network Models (PNM) Maximum Edge Orientation (MEO) Signaling and Dynamic Regulatory Events Miner (SDREM) Steiner tree Prize-collecting Steiner forest (PCSF) Belief propagation approximation (msgsteiner) Omics Integrator implementation Hybrid approaches PathLinker: random walk + shortest paths ANAT: shortest paths + Steiner tree

Recent developments in pathway discovery Multi-task learning: jointly model several related biological conditions ResponseNet extension: SAMNet Steiner forest extension: Multi-PCSF SDREM extension: MT-SDREM Temporal data ResponseNet extension: TimeXNet Steiner forest extension and ST-Steiner Temporal Pathway Synthesizer

Condition-specific genes/proteins used as input Genetic screen hits (as causes or effects) Differentially expressed genes Transcription factors inferred from gene expression Proteomic changes (protein abundance or post- translational modifications) Kinases inferred from phosphorylation Genetic variants or DNA mutations Enzymes regulating metabolites Receptors or sensory proteins Protein interaction partners Pathway databases or other prior knowledge