Joint analysis of regulatory networks and expression profiles. Ron Shamir, School of Computer Science, Tel Aviv University, April 2013.


Sources: Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007). Igor Ulitsky and Ron Shamir. Identifying functional modules using expression profiles and confidence-scored protein interactions. Bioinformatics, Vol. 25 (2009).

Outline
– Background
– Joint analysis of networks and expression profiles
  – MATISSE
  – CEZANNE

Background

DNA → RNA → protein (transcription, then translation). In the computer analogy, DNA is the hard disk, an RNA transcript is one program, and the protein is its output.

DNA microarrays / RNA-seq
– Simultaneous measurement of the expression levels of all genes/transcripts in one experiment
– Allow a global view of cellular processes
– Among the most important biotechnological breakthroughs of the last/current decade

The raw data
A genes × experiments matrix whose entries are expression levels (ratios/absolute values/…). Each row is the expression pattern of one gene; each column is the profile of one experiment (condition/sample/chip). The data needs normalization!
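Gene-gene expression similarity is computed from the rows of this matrix. A minimal sketch, assuming per-gene z-scoring as the normalization and Pearson correlation as the similarity measure (common choices, but the slides do not prescribe either here); the function names and the toy matrix are hypothetical.

```python
import math

def standardize_rows(matrix):
    """Z-score each gene's row: subtract its mean, divide by its (population) std."""
    out = []
    for row in matrix:
        n = len(row)
        mean = sum(row) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / n)
        # A flat profile carries no pattern; map it to zeros instead of dividing by 0.
        out.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy genes x experiments matrix (rows = genes, columns = conditions); values are made up.
raw = [
    [1.0, 2.0, 3.0, 4.0],  # g1
    [2.0, 4.0, 6.0, 8.0],  # g2: same pattern as g1, twice the scale
    [4.0, 3.0, 2.0, 1.0],  # g3: opposite trend
]
z = standardize_rows(raw)
print(round(pearson(raw[0], raw[1]), 3), round(pearson(raw[0], raw[2]), 3))
```

Standardizing a row does not change its Pearson correlations (g1 and g2 end up with identical z-scores); it matters mainly for distance-based similarities and clustering.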

EXPANDER: EXPression ANalyzer and DisplayER
– Clustering: identify clusters of co-expressed genes (CLICK, K-means, SOM, hierarchical)
– Biclustering: identify homogeneous submatrices (SAMBA)
– Functional enrichment: GO, TANGO
– Promoter analysis: analyze TF binding sites of co-regulated genes (PRIMA)
– microRNA function inference: FAME
– Visualization
References: A. Maron, R. Sharan, Bioinformatics 2003; A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R. Sharan, Y. Shiloh, R. Elkon, BMC Bioinformatics 2005; Ulitsky et al., Nature Protocols 2010.

Networks of protein-protein interactions (PPIs)
– A large, readily available resource
– Representation: a network with nodes = proteins/genes and edges = interactions
– Analysis methods: global properties, motif content analysis, complex extraction, cross-species comparison
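A minimal sketch of this representation: proteins as nodes, interactions as undirected edges stored in adjacency sets, with the degree distribution as one example of a global property. The edge list is made up, and `build_network` and `degree_distribution` are hypothetical helper names.

```python
from collections import Counter, defaultdict

def build_network(edges):
    """Undirected PPI network as adjacency sets; duplicate pairs collapse."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # this sketch ignores self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return adj

def degree_distribution(adj):
    """Map degree -> number of proteins with that degree (a simple global property)."""
    return Counter(len(nbrs) for nbrs in adj.values())

# Hypothetical interactions, for illustration only (("B", "A") repeats ("A", "B")).
ppis = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("B", "A")]
net = build_network(ppis)
print(sorted(degree_distribution(net).items()))  # [(1, 1), (2, 3), (3, 1)]
```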

The hairball syndrome

Potential inroad into pathways and function. Can the network help improve the analysis?

Analysis of gene expression profiles + a network

Goal
Challenge: detect active functional modules, i.e., connected subnetworks of proteins whose genes are co-expressed. “Where is the action in the network in a particular experiment?”
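The two requirements in this goal can be checked separately: connectivity of the gene set in the network (a breadth-first search that uses only edges inside the set) and co-expression of its genes. A crude sketch, assuming a plain average of pairwise similarities as the co-expression measure; MATISSE itself scores gene sets with a likelihood model rather than an average. The names and data are hypothetical.

```python
from collections import deque
from itertools import combinations

def is_connected(nodes, adj):
    """True if `nodes` induce a connected subnetwork (BFS using only edges inside the set)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes

def mean_similarity(nodes, sim):
    """Average pairwise similarity; `sim` maps frozenset({i, j}) -> value. Needs >= 2 nodes."""
    pairs = list(combinations(sorted(nodes), 2))
    return sum(sim[frozenset(p)] for p in pairs) / len(pairs)

# Hypothetical network (a path A-B-C plus an isolated D) and similarities.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
sim = {frozenset(p): s for p, s in [(("A", "B"), 0.9), (("A", "C"), 0.8), (("B", "C"), 0.7)]}
print(is_connected({"A", "B", "C"}, adj), is_connected({"A", "C"}, adj))  # True False
print(round(mean_similarity({"A", "B", "C"}, sim), 3))  # 0.8
```

Note that {A, C} is not a module candidate even though A and C are co-expressed: without B, they are not connected in the network.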

Ron Shamir, RNA Antalia, April 08


Ulitsky & Shamir, BMC Systems Biology 2007

MATISSE: Modular Analysis for Topology of Interactions and Similarity SEts
Input: expression data and a PPI network.
Output: a collection of modules, each with
– a connected PPI subnetwork
– correlated expression profiles
(Figure legend: interaction; high expression similarity.)

Probabilistic model
Event $M_{ij}$: genes $i$ and $j$ are mates (highly co-expressed).
$S_{ij} \mid M_{ij} \sim \mathcal{N}(\mu_m, \sigma_m^2)$ and $S_{ij} \mid \neg M_{ij} \sim \mathcal{N}(\mu_n, \sigma_n^2)$, where $S_{ij}$ is the expression similarity of $i$ and $j$.
$H_0$: $U$ is a set of unrelated genes.
$H_1$: $U$ is a module, i.e., a connected subnetwork with high internal similarity.
$R_i$: gene $i$ is transcriptionally regulated.
$\beta_m$: the fraction of mates among transcriptionally regulated module gene pairs, $\beta_m = P(M_{ij} \mid R_i \wedge R_j, H_1)$.
$p_m$: the fraction of mates among all transcriptionally regulated gene pairs.

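Probabilistic model (2)
Is a connected gene set $U$ a module? Assuming independence between pairs, define $\beta^m_{ij} = \beta_m P(R_i)P(R_j)$ and $\beta^n_{ij} = p_m P(R_i)P(R_j)$. The test statistic is the likelihood ratio $\Pr(\text{Data} \mid H_1)/\Pr(\text{Data} \mid H_0)$; taking the log turns it into a sum of pairwise terms $w_{ij}$ over the gene pairs of $U$.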
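Under the definitions above, a pair $i, j$ is a mate with probability $\beta^m_{ij}$ under $H_1$ and $\beta^n_{ij}$ under $H_0$ (this reading assumes only pairs of regulated genes can be mates), so its similarity follows a two-Gaussian mixture under either hypothesis. A reconstruction of the resulting log-likelihood ratio, writing $f_m$ and $f_n$ (our notation) for the $\mathcal{N}(\mu_m, \sigma_m^2)$ and $\mathcal{N}(\mu_n, \sigma_n^2)$ densities:

```latex
\begin{aligned}
\log \frac{\Pr(\text{Data} \mid H_1)}{\Pr(\text{Data} \mid H_0)}
  &= \sum_{\{i,j\} \subseteq U} w_{ij}, \\
w_{ij} &= \log \frac{\beta^m_{ij}\, f_m(S_{ij}) + (1 - \beta^m_{ij})\, f_n(S_{ij})}
                    {\beta^n_{ij}\, f_m(S_{ij}) + (1 - \beta^n_{ij})\, f_n(S_{ij})}.
\end{aligned}
```

A pair contributes positively when its similarity is better explained by the module hypothesis, and a pair with $\beta^m_{ij} = \beta^n_{ij}$ contributes nothing.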

Probabilistic model: summary
– Similarities are modeled as a mixture of two Gaussians.
– For a candidate group $U$, the module score $W_U$ is the log-likelihood ratio of $U$ originating from a module versus from the background; it is a sum over all gene pairs in $U$.
– Goal: find connected subgraphs $U$ with high $W_U$.
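A sketch of the module score $W_U$, assuming each pair weight is the log ratio of the two-Gaussian mixture likelihoods under $H_1$ and $H_0$, and that all model parameters ($\mu_m, \sigma_m, \mu_n, \sigma_n, \beta_m, p_m, P(R_i)$) are already known. The parameter values are illustrative, not taken from the slides, and the function names are hypothetical.

```python
import math
from itertools import combinations

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def pair_weight(s, b_m, b_n, mu_m, sd_m, mu_n, sd_n):
    """w_ij: log of mixture likelihood under H1 (mate prob b_m) over H0 (mate prob b_n)."""
    f_m = normal_pdf(s, mu_m, sd_m)  # density of S_ij if i, j are mates
    f_n = normal_pdf(s, mu_n, sd_n)  # density of S_ij if they are not
    return math.log((b_m * f_m + (1 - b_m) * f_n) / (b_n * f_m + (1 - b_n) * f_n))

def module_score(genes, sim, p_reg, beta_m, p_m, gauss):
    """W_U: sum of w_ij over all gene pairs in U, treating pairs as independent."""
    total = 0.0
    for i, j in combinations(sorted(genes), 2):
        prior = p_reg[i] * p_reg[j]  # P(R_i) P(R_j)
        total += pair_weight(sim[frozenset((i, j))], beta_m * prior, p_m * prior, *gauss)
    return total

# Illustrative parameters: mates ~ N(0.8, 0.1^2), non-mates ~ N(0, 0.3^2).
gauss = (0.8, 0.1, 0.0, 0.3)
p_reg = {"a": 1.0, "b": 1.0, "c": 1.0}
tight = {frozenset(p): 0.9 for p in combinations("abc", 2)}  # highly co-expressed
loose = {frozenset(p): 0.0 for p in combinations("abc", 2)}  # uncorrelated
print(module_score("abc", tight, p_reg, 0.5, 0.05, gauss) > 0)  # True
print(module_score("abc", loose, p_reg, 0.5, 0.05, gauss) < 0)  # True
```

Because the weights can be negative, adding a gene can lower $W_U$, which is what makes the search for high-scoring connected sets a nontrivial optimization.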

Complexity
Finding the heaviest connected subgraph is NP-hard, even without the connectivity constraint (edge weights may be positive or negative). A heuristic was devised for the problem.

MATISSE workflow: seed generation → greedy optimization → significance filtering

Finding seeds
Three seeding alternatives were tested; each builds a seed and deletes it from the network.
– Small seeds around single nodes, using the best neighbors
– Small seeds around single nodes, using all neighbors
– Approximating the heaviest subgraph: delete low-degree nodes and record the heaviest subnetwork found
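The third seeding option is described only briefly. A minimal sketch of a generic peeling heuristic in that spirit, assuming "degree" means weighted degree with respect to the pair weights among the remaining nodes; it ignores the connectivity requirement and the "delete the seed and repeat" step, so MATISSE's exact procedure may differ. The names and weights are hypothetical.

```python
from itertools import combinations

def peel_heaviest(nodes, w):
    """Repeatedly delete the node with the lowest weighted degree and return the
    heaviest node set seen along the way. `w` maps frozenset({i, j}) -> weight;
    missing pairs weigh 0. Connectivity is not enforced in this sketch."""
    def weight(i, j):
        return w.get(frozenset((i, j)), 0.0)

    def total(s):
        return sum(weight(i, j) for i, j in combinations(sorted(s), 2))

    current = set(nodes)
    best, best_score = set(current), total(current)
    while len(current) > 1:
        degree = {u: sum(weight(u, v) for v in current if v != u) for u in current}
        victim = min(sorted(current), key=degree.__getitem__)  # ties -> first by name
        current.remove(victim)
        score = total(current)
        if score > best_score:
            best, best_score = set(current), score
    return best, best_score

# Hypothetical pair weights: a heavy triangle A-B-C plus negatively weighted D and E.
w = {frozenset(p): x for p, x in [(("A", "B"), 2), (("A", "C"), 2), (("B", "C"), 2),
                                  (("C", "D"), -1), (("D", "E"), -3), (("A", "E"), -1)]}
print(peel_heaviest("ABCDE", w))
```

Here D and E are peeled first, and the heaviest set recorded is the triangle {A, B, C} with weight 6.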

Greedy optimization
All seeds are optimized simultaneously. The following steps are considered:
– node addition
– node removal
– assignment change
– module merge
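A simplified sketch of the first two moves (node addition and node removal) for a single module, where a node's gain is the change in $W_U$ from its pair weights. Assignment change and module merge are omitted, and MATISSE's actual move order and acceptance rules are not reproduced; names and data are hypothetical. Adding a network neighbor of a connected module keeps it connected, so only removals need a connectivity check.

```python
from collections import deque

def connected(nodes, adj):
    """BFS restricted to `nodes`; True if they induce a connected subnetwork."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for v in adj.get(queue.popleft(), ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes

def gain(u, module, w):
    """Sum of pair weights between u and the other members: W_U's change if u joins."""
    return sum(w.get(frozenset((u, v)), 0.0) for v in module if v != u)

def greedy_improve(module, adj, w, max_rounds=100):
    """Node addition / node removal moves for one module, until no move helps."""
    module = set(module)
    for _ in range(max_rounds):
        improved = False
        # Node addition: network neighbors of the module whose gain is positive.
        frontier = {v for u in module for v in adj.get(u, ()) if v not in module}
        for v in sorted(frontier):
            if gain(v, module, w) > 0:
                module.add(v)
                improved = True
        # Node removal: members with negative contribution, if the rest stays connected.
        for u in sorted(module):
            rest = module - {u}
            if rest and gain(u, rest, w) < 0 and connected(rest, adj):
                module = rest
                improved = True
        if not improved:
            break
    return module

# Hypothetical network and pair weights.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D", "X"}, "D": {"C"}, "X": {"C"}}
w = {frozenset(p): x for p, x in [(("A", "B"), 2), (("A", "C"), 1), (("B", "C"), 1),
                                  (("C", "D"), -2), (("A", "X"), -1), (("B", "X"), -1),
                                  (("C", "X"), -1)]}
print(sorted(greedy_improve({"A", "B"}, adj, w)))            # ['A', 'B', 'C']
print(sorted(greedy_improve({"A", "B", "C", "X"}, adj, w)))  # ['A', 'B', 'C']
```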

Front vs. back nodes
Only a fraction of the genes (front nodes) have meaningful similarity values; MATISSE can link them through other genes (back nodes). Back nodes correspond to:
– unmeasured transcripts
– post-translational regulation
– partially regulated pathways

Advantages of MATISSE
– No p-values needed for the measurements
– Works when only a fraction of the genes' expression patterns are informative
– Can handle any similarity data
– No prespecified number of modules

Test case: yeast osmotic shock
– Network: 65,990 PPIs and protein-DNA interactions among 6,246 genes
– Expression: 133 experimental conditions, measuring the response of perturbed strains to osmotic shock (O’Rourke & Herskowitz, 2004)
– Front nodes: the 2,000 genes with the highest variance
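Choosing front nodes by variance is a simple ranking. A minimal sketch, assuming each gene's profile is a list of values across conditions; in the test case above k would be 2,000. The function name and the toy profiles are hypothetical.

```python
import statistics

def top_variance_genes(expr, k):
    """Pick the k genes whose profiles vary most across conditions (front-node candidates).
    `expr` maps gene -> list of expression values, one per condition."""
    ranked = sorted(expr, key=lambda g: statistics.pvariance(expr[g]), reverse=True)
    return set(ranked[:k])

# Hypothetical profiles over three conditions.
expr = {"g1": [0.0, 0.0, 0.0], "g2": [1.0, -1.0, 1.0],
        "g3": [3.0, -3.0, 0.0], "g4": [0.1, 0.0, 0.1]}
print(sorted(top_variance_genes(expr, 2)))  # ['g2', 'g3']
```

Population and sample variance rank genes identically when all profiles have the same length, so the choice between them does not matter here.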

Pheromone response subnetwork (figure: back nodes and front nodes)

Performance comparison (figure axes: % of modules with category enrichment; % of annotations enriched at p < 10^-3 in modules)

GO and promoter analysis

Application to stem cells
– ~150 human stem cell lines of diverse types were profiled using microarrays
– The profiles were clustered into groups
– MATISSE was adjusted to seek subnetworks characteristic of each group
– The analysis focused on pluripotent stem cells
F. Müller, L. Laurent, D. Kostka, I. Ulitsky, R. Williams, C. Lu, I. Park, M. Rao, P. Schwartz, N. Schmidt, J. Loring, Nature 2008

Pluripotent stem cell network: highlights the key protein machinery underlying pluripotency.

Ulitsky & Shamir, Bioinformatics 2009

Accounting for PPI confidence
– PPI-based analysis is hampered by abundant false-positive and false-negative interactions
– Various methods can assign a confidence (probability) to individual edges
– Idea: seek modules that are connected with high probability

What is a confidently connected module?
With high probability, any split of the module into two parts is crossed by at least one edge.
▫ Accommodates both sparse and dense pathways
▫ Accommodates genes with low-confidence connections to many module genes
Confidently connected modules can be found efficiently.

Connected with high probability? Candidate definitions:
– Every two genes are connected by a confident path ▫ biased toward dense pathways
– There is a minimum spanning tree with high-confidence edges ▫ same as ignoring low-confidence edges
– Any two parts of the module are connected by an edge with high probability

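CEZANNE (Co-Expression Zone ANalysis using NEtworks)
– Edge probability $p(e)$ → edge weight $-\log(1-p(e))$.
– For every $W \subset U$ with $W \neq \emptyset$ and $W \neq U$, at least one edge connects $W$ with $U \setminus W$ with probability at least $q$ (e.g., 0.95) $\iff$ the weight of the minimum cut of $U$ is at least $-\log(1-q)$.
– Algorithm: among the subnetworks whose minimum cut is at least $-\log(1-q)$, find the one with the maximum co-expression score.
Example (figure with nodes A, B, C, D): $P(\{A\},\{B,C,D\}) = 1 - 0.3 \cdot 0.3 = 0.91$, $P(\{A,C,D\},\{B\}) = 0.94$, $P(\{A,B\},\{C,D\}) = 0.94$, $P(\{A,B,D\},\{C\}) = 0.994$. Of these, $\{A\}$ vs. $\{B,C,D\}$ has the lowest crossing probability and hence the smallest cut weight.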
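The equivalence holds when edges are present independently, as in the slide's own computation $1 - 0.3 \cdot 0.3$: no edge crosses a cut with probability $\prod_e (1 - p(e))$, so "crossed with probability at least $q$" is the same as $\sum_e -\log(1-p(e)) \ge -\log(1-q)$. A brute-force sketch of the legality test follows; the exhaustive cut enumeration is for toy sizes only, and the edge probabilities are hypothetical (chosen so that the {A} cut reproduces the slide's 0.91), not the slide's figure.

```python
import math
from itertools import combinations

def edge_weight(p):
    """CEZANNE edge weight -log(1 - p(e)); assumes 0 <= p < 1."""
    return -math.log(1.0 - p)

def min_cut_bruteforce(nodes, prob):
    """Exhaustive minimum cut of the subnetwork induced by `nodes`.
    Exponential in |nodes|: toy examples only. `prob` maps frozenset({u, v}) -> p(e).
    Returns (cut weight, the side of the cut that contains the first node)."""
    nodes = sorted(nodes)
    node_set = set(nodes)
    inside = [(e, p) for e, p in prob.items() if e <= node_set]
    first, rest = nodes[0], nodes[1:]
    best = (math.inf, None)
    # Keep `first` in W so that each bipartition {W, U \ W} is visited exactly once.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            w = sum(edge_weight(p) for e, p in inside if len(e & side) == 1)
            if w < best[0]:
                best = (w, side)
    return best

def confidently_connected(nodes, prob, q=0.95):
    """True iff every split of `nodes` is crossed by >= 1 edge with probability >= q,
    i.e. iff the minimum cut weight is at least -log(1 - q) (independent edges)."""
    cut, _ = min_cut_bruteforce(nodes, prob)
    return cut >= -math.log(1.0 - q)

# Hypothetical 4-node module.
prob = {frozenset(e): p for e, p in [("AB", 0.7), ("AC", 0.7), ("BC", 0.9),
                                     ("CD", 0.99), ("BD", 0.8)]}
cut, side = min_cut_bruteforce("ABCD", prob)
print(sorted(side), round(1 - math.exp(-cut), 3))  # ['A'] 0.91
print(confidently_connected("ABCD", prob, 0.95), confidently_connected("ABCD", prob, 0.90))  # False True
```

With $q = 0.95$ this module is not legal, because its weakest split is crossed with probability only 0.91; the same holds for the slide's example.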

How to find confidently connected modules?
– Seed identification: run MATISSE ignoring edge weights, then “slice” the modules using minimum cuts until all subnetworks are “legal”.
– Greedy optimization (how to find legal moves?):
▫ Adding nodes is easy to test (edge weights are positive)
▫ Merging modules is easy to test
▫ (Re)moving nodes requires maintaining the set of ‘crucial’ nodes in each module
– Runs in minutes on real-world examples.
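The slicing step for seeds can be sketched as a recursive split along minimum cuts. This sketch assumes brute-force cuts for brevity (a real implementation would use a polynomial global min-cut algorithm such as Stoer–Wagner) and assumes single-node pieces are discarded, which the slides do not specify. The names and edge probabilities are hypothetical.

```python
import math
from itertools import combinations

def min_cut_bruteforce(nodes, prob):
    """Exhaustive minimum cut (toy sizes only); weights are -log(1 - p(e)).
    Returns (cut weight, side containing the first node)."""
    nodes = sorted(nodes)
    node_set = set(nodes)
    inside = [(e, -math.log(1.0 - p)) for e, p in prob.items() if e <= node_set]
    best = (math.inf, None)
    for r in range(len(nodes) - 1):
        for extra in combinations(nodes[1:], r):
            side = {nodes[0], *extra}
            w = sum(wt for e, wt in inside if len(e & side) == 1)
            if w < best[0]:
                best = (w, side)
    return best

def slice_until_legal(module, prob, q=0.95):
    """Split a module along minimum cuts until every piece is confidently connected."""
    threshold = -math.log(1.0 - q)
    pending, legal = [set(module)], []
    while pending:
        part = pending.pop()
        if len(part) < 2:
            continue  # assumption of this sketch: singletons are dropped
        cut, side = min_cut_bruteforce(part, prob)
        if cut >= threshold:
            legal.append(part)
        else:
            pending.extend([side, part - side])  # both sides are re-checked
    return legal

# Hypothetical edge probabilities for a 4-node seed.
prob = {frozenset(e): p for e, p in [("AB", 0.7), ("AC", 0.7), ("BC", 0.9),
                                     ("CD", 0.99), ("BD", 0.8)]}
print(slice_until_legal("ABCD", prob))
```

At $q = 0.95$ the seed is cut between A and {B, C, D}, and {B, C, D} (weakest split crossed with probability 0.98) is kept; at $q = 0.90$ the whole seed is already legal.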

DNA damage response in S. cerevisiae
– Expression: 47 DNA damage response expression profiles (Gasch et al., 2001)
– Front nodes: 2,074 genes with at least a two-fold expression change
– Network and confidence values: purification enrichment (PE) scores (Collins et al., 2007)
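The slide does not say how the two-fold criterion was applied across the 47 profiles. A minimal sketch, assuming expression is given as log2 ratios against a reference and requiring |log2 ratio| ≥ 1 in at least `min_conditions` conditions (default one); the function name and data are hypothetical.

```python
def fold_change_genes(log2_ratios, min_conditions=1):
    """Genes with |log2 ratio| >= 1 (at least two-fold up or down) in >= min_conditions
    conditions. `log2_ratios` maps gene -> list of log2(sample / reference) values."""
    return {g for g, values in log2_ratios.items()
            if sum(abs(v) >= 1.0 for v in values) >= min_conditions}

# Hypothetical log2 ratios over two conditions.
data = {"g1": [0.2, -0.5], "g2": [1.3, 0.1], "g3": [-1.0, 0.0], "g4": [2.0, -1.5]}
print(sorted(fold_change_genes(data)))     # ['g2', 'g3', 'g4']
print(sorted(fold_change_genes(data, 2)))  # ['g4']
```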

DNA damage response modules
Figure: modules labeled Cytoplasmic ribosome biogenesis; Proteasome; Mitochondrial ribosome, small subunit; Mitochondrial ribosome, large subunit; Spliceosome; Novel actin-localized pathway?; Hsp90; PKA; Trehalose biosynthesis; Ribonuclease P. Annotations: suggests SWS2 as a novel member; novel pathway enriched with actin-localized proteins, supported in other datasets, with similar deletion phenotypes.
Enrichment table (columns: module size; GO biological process, p-value; GO-slim protein complexes, p-value; the first module has 346 genes): ribosome biogenesis and assembly 1.2·; ribosome 5.9·; translation 1.0·; eukaryotic 43S preinitiation complex 3.8·; rRNA processing 7.5·; small nucleolar ribonucleoprotein complex 1.5·; S primary transcript processing 4.6·; DNA-directed RNA polymerase III complex 3.1·; ribosome assembly 4.3·; exosome (RNase complex) 4.4·; ribosomal large subunit biogenesis 9.2·; DNA-directed RNA polymerase I complex 5.7·; rRNA modification 4.4·; Noc complex 3.2·; protein catabolism 1.8·; proteasome complex (sensu Eukaryota) 5.7·; proteolysis 9.0·; proteasome core complex (sensu Eukaryota) 9.4·; ubiquitin cycle 1.1·; histone acetylation 3.6·; histone acetyltransferase complex 2.1·; chromatin modification 5.9·; transcription from RNA polymerase II promoter 1.4·; translation 1.1·; ribosome 1.4·; nuclear mRNA splicing, via spliceosome 3.5·; spliceosome complex 3.5·; small nuclear ribonucleoprotein complex 2.5·; barbed-end actin filament capping 4.8·10^-6; F-actin capping protein complex 4.8·10^-6; endocytosis 1.1·10^-5; cytoskeleton organization and biogenesis 2.8·; establishment and/or maintenance of chromatin architecture 1.1·10^-5; chromatin remodeling complex 4.6·; glycogen metabolism 3.0·10^-8; protein phosphatase type 1 complex 3.3·10^-5; sporulation (sensu Fungi) 2.0·; translation 1.1·10^-7; ribosome 4.0·; tRNA processing 2.5·; ribonuclease P complex 9.2·10^-8; rRNA processing 2.2·; trehalose biosynthesis 6.8·; alpha,alpha-trehalose-phosphate synthase complex (UDP-forming) 6.8·; ubiquitin-dependent protein catabolism 5.2·; pseudohyphal growth 9.8·10^-7; cAMP-dependent protein kinase complex 9.6·; proteasome assembly 3.2·10^-6; protein folding 3.9·10^-6.

Comparison with prior work
Combined measure of sensitivity (% of annotations enriched) and specificity (% of modules enriched), both at p < 0.001, for four approaches:
– clustering of expression data only
– clustering of expression and network data (Hanisch et al., 2002)
– expression similarity + network connectivity (MATISSE)
– expression similarity + confident network connectivity (CEZANNE)


Summary
– Algorithms that use co-expression plus networks to detect functionally coherent modules
– Accommodate both sparse and dense subnetworks
– Subnetworks linked to osmotic shock and DNA damage
– A general framework for confident connectivity in PPI networks
The next steps:
▫ Co-expression is not the only interesting way to use gene expression (GE) data
▫ Scaling to complex human datasets