Inferring Quantitative Models of Regulatory Networks From Expression Data
Iftach Nachman (Hebrew University), Aviv Regev (Harvard), Nir Friedman (Hebrew University)
Goal: Reconstruct Cellular Networks
• Structure
• Function
• Dynamics
Common approach: Interaction Networks
Different semantics for networks
• Boolean, probabilistic, differential equations, …
[Figures: pathway diagram (Biocarta); expression matrix, genes x conditions]
A Major Assumption…
[Diagram labels: TF, G, mRNA, protein, active protein, activation signal, transcription rate, mRNA degradation; quantities marked as Hidden or Observed]
Realistic Regulation Modeling
• Model the closest connection
• Active protein levels are not measured
• Transcription rates are computed from expression data and mRNA decay rates
• Realistic biochemical model of transcription rates
[Diagram labels: TF, G, mRNA, protein, active protein, activation signal, transcription rate, mRNA degradation; quantities marked as Hidden or Observed]
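One way to read "transcription rates are computed from expression data and mRNA decay rates" is sketched below. It assumes first-order decay, dm/dt = rate(t) - decay_rate * m(t), and estimates dm/dt by finite differences. The slides do not state the exact formula, so the model, the function name `transcription_rates` and its arguments are illustrative assumptions.

```python
def transcription_rates(times, mrna, decay_rate):
    """Estimate rate(t) = dm/dt + decay_rate * m(t) for one gene.

    Assumes first-order mRNA decay (an assumption, not stated in the slides).
    dm/dt uses central differences inside the series and one-sided
    differences at the two ends. Requires at least two time points.
    """
    n = len(times)
    rates = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dm_dt = (mrna[hi] - mrna[lo]) / (times[hi] - times[lo])
        rates.append(dm_dt + decay_rate * mrna[i])
    return rates


# Toy usage: constant mRNA level, so production must balance decay.
steady = transcription_rates([0, 1, 2, 3], [5.0, 5.0, 5.0, 5.0], 0.1)
```

With a constant expression level the estimated rate equals decay_rate times the level, which is the steady-state balance between production and degradation.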
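Modeling Transcription Rate [McAdams & Arkin, 1997; Ronen et al, 2002]
Simplest case: one activator
The promoter of G is either Off (TF not bound) or On (TF bound); mRNA transcripts are produced at a state-dependent rate
Avg rate = rate(Off) × P(Off) + rate(On) × P(On)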
Modeling Transcription Rate
Steady state equations, written in terms of:
• Concentration of free promoters
• Concentration of bound promoters
• Concentration of TF
[Diagram: TF binding to and dissociating from the promoter of G]
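The steady state equations themselves did not survive in this text. A standard single-site binding derivation that fits the three quantities named above is sketched here. The symbols k_b and k_d (binding and dissociation constants), γ = k_d / k_b, and the rate symbols β_free and β_bound are notation assumed for illustration and may differ from the original slide.

```latex
% Assumed notation: [G] free promoters, [G \cdot TF] bound promoters, [TF] transcription factor
\begin{align}
k_b\,[G]\,[TF] &= k_d\,[G \cdot TF] && \text{(binding balances dissociation)} \\
[G] + [G \cdot TF] &= [G]_{\mathrm{total}} && \text{(promoter conservation)} \\
P(\text{bound}) = \frac{[G \cdot TF]}{[G]_{\mathrm{total}}} &= \frac{[TF]}{\gamma + [TF]},
  \qquad \gamma = \frac{k_d}{k_b} \\
\text{Avg rate} &= \beta_{\mathrm{free}}\,\bigl(1 - P(\text{bound})\bigr) + \beta_{\mathrm{bound}}\,P(\text{bound})
\end{align}
```

Under these assumptions the average rate is a saturating function of TF concentration, reaching half of its range when [TF] = γ.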
Modeling Transcription Rate
[Plots: transcription rate vs. TF activity, and TF activity and transcription rate vs. time; curves drawn for parameter values 1, 4, 20 and 250]
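The curve family in these plots can be reproduced with a small sketch. It assumes the varied parameter is the affinity constant of a one-activator model (called `gamma` here) and that the rate follows the saturating form TF / (gamma + TF). Both the interpretation of the values 1, 4, 20, 250 and the function name are assumptions.

```python
def one_activator_rate(tf, gamma, rate_bound=1.0, rate_free=0.0):
    """Average transcription rate for a single activator.

    p_bound is the probability that the promoter is occupied; the average
    rate mixes the free and bound rates by that probability.
    """
    p_bound = tf / (gamma + tf)
    return rate_free * (1.0 - p_bound) + rate_bound * p_bound


# Response at one TF activity level for each assumed affinity value.
tf_level = 10.0
responses = {g: one_activator_rate(tf_level, g) for g in (1, 4, 20, 250)}
```

A small `gamma` saturates quickly, so the rate tracks TF activity only at low levels; a large `gamma` gives a nearly linear, low-amplitude response over the same range.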
General Two Regulator Function [Buchler et al, 2003; Setty et al, 2002]
[Figure: promoter of G with binding sites for TF1 and TF2; P(State) for the four promoter states a, b, c, d]
Similar models for other modes of binding:
• Competitive binding
• Cooperative binding
General Two Regulator Function [Buchler et al, 2003; Setty et al, 2002]
Average rate = sum over promoter states a, b, c, d of Rate(state) × P(State)
Rate settings shown: a = 0, b = 0, c = 0, d = 1 for the “AND” gate; b = 1, c = 1 for the “OR” gate
Avg rate = function of TF concentrations
Few parameters:
• Affinity parameters
• Rate parameters
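A minimal sketch of the two-regulator function follows. It assumes independent binding of the two factors and labels the states as a = neither bound, b = TF1 only, c = TF2 only, d = both bound. That labeling, the affinity names `gamma1` and `gamma2`, and the OR-gate rates for states a and d are assumptions, since the slide's state table is not recoverable here.

```python
def state_probs(tf1, tf2, gamma1, gamma2):
    """Probabilities of the four promoter states under independent binding."""
    w1, w2 = tf1 / gamma1, tf2 / gamma2
    weights = {"a": 1.0, "b": w1, "c": w2, "d": w1 * w2}
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def two_regulator_rate(tf1, tf2, gamma1, gamma2, rates):
    """Average rate = sum over states of rate(state) * P(state)."""
    probs = state_probs(tf1, tf2, gamma1, gamma2)
    return sum(rates[state] * probs[state] for state in probs)


AND_GATE = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 1.0}
OR_GATE = {"a": 0.0, "b": 1.0, "c": 1.0, "d": 1.0}  # a and d values assumed

# Only TF1 present: the AND gate stays off, the OR gate turns on.
only_tf1_and = two_regulator_rate(1000.0, 0.0, 1.0, 1.0, AND_GATE)
only_tf1_or = two_regulator_rate(1000.0, 0.0, 1.0, 1.0, OR_GATE)
```

The same function covers both gates because only the four rate parameters change, which matches the slide's point that the model needs few parameters.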
Models of Regulatory Networks
[Figure: regulators TF1 to TF3 (activity) connected to target genes G1 to G7 (transcription rate); predicted rates plus noise are compared with observed rates. Plots: TF activity vs. time, transcription rate vs. time]
Learning From Data
Transcription rates (expression data + mRNA decay rates) + network (TF1, TF2 and targets G1 to G4) → Learning → Kinetic parameters
Learning method: gradient ascent
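The gradient ascent step can be illustrated on a deliberately reduced problem: one gene, one activator with known activity, and two kinetic parameters fitted by ascending a Gaussian log-likelihood. The full method also estimates regulator activities and handles many genes. The unit-variance noise model, the log-parameterization, the step size and all names below are assumptions made for this sketch.

```python
import math


def predict(x, beta, gamma):
    """One-activator rate: beta * x / (gamma + x)."""
    return beta * x / (gamma + x)


def log_likelihood(xs, ys, beta, gamma):
    """Gaussian log-likelihood with unit variance, up to a constant."""
    return -0.5 * sum((y - predict(x, beta, gamma)) ** 2 for x, y in zip(xs, ys))


def fit_gradient_ascent(xs, ys, beta=1.0, gamma=1.0, step=0.01, iters=20000):
    """Plain gradient ascent on log(beta), log(gamma) to keep both positive."""
    log_b, log_g = math.log(beta), math.log(gamma)
    for _ in range(iters):
        b, g = math.exp(log_b), math.exp(log_g)
        grad_b = grad_g = 0.0
        for x, y in zip(xs, ys):
            r = predict(x, b, g)
            resid = y - r
            grad_b += resid * r                   # d r / d log(beta) = r
            grad_g += resid * (-r * g / (g + x))  # d r / d log(gamma)
        log_b += step * grad_b
        log_g += step * grad_g
    return math.exp(log_b), math.exp(log_g)


# Toy data: 17 time points generated without noise from beta=2, gamma=4.
xs = [0.5 + 19.5 * i / 16 for i in range(17)]
ys = [predict(x, 2.0, 4.0) for x in xs]
beta_hat, gamma_hat = fit_gradient_ascent(xs, ys)
```

A fixed step size is used for brevity; a practical implementation would more likely use a line search or a conjugate-gradient variant, since the two parameters are strongly correlated and plain ascent converges slowly along that direction.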
Cell Cycle Experiment
Network: Biological Databases [YPD] + ChIP location [Lee et al]: 7 regulators & 141 target genes
Transcription rates: cell cycle gene expression [Spellman et al] + mRNA decay rates [Wang et al]
Network + transcription rates → Learning → Kinetic parameters
Cell Cycle Experiment
• Input: 17 x 141 = 2397 data points
• Parameters: 466 parameters
• 17 x 7 = 119 regulator activity values
[Heat maps: input and predictions, genes grouped by phase M/G1, G1, S, S/G2, G2/M]
Regulator Activity Profiles
• When are they active?
Known biology:
• SWI4 & MBP1: mid-late G1
• FKH1: S/G2
• FKH2: G2/M
• SWI5: M/G1
Reconstructed activity profiles match direct experimental knowledge
[Plots: reconstructed activity for MBP1 & SWI4, FKH1 & FKH2, SWI5 & ACE2; time axis marked G1, G2, G1, G2]
Regulator Activity Profiles
• When are they active?
• Could we reconstruct these from mRNA profiles?
Known biology:
• SWI5 is transcriptionally regulated
• MCM1 is not
Regulator’s own mRNA is not sufficient to reconstruct activity levels
[Plots: activity vs. mRNA profile for SWI5 and for MCM1]
Regulator Activity Profiles
• When are they active?
• Could we reconstruct these from mRNA profiles?
• Could we reconstruct these from target’s transcription rate?
[Plot: avg target rate]
Cell Cycle Experiment
How well are we doing?
[Heat maps: input, predictions and residual, genes grouped by phase M/G1, G1, S, S/G2, G2/M]
Ab initio Learning
Big assumption:
• Network topology is given
• Unrealistic, even for well understood systems
Challenge: Reconstruct network topology?
• Number of regulators
• Their joint effect on target genes
[Figure: transcription rates (expression data + mRNA decay rates) + network → Learning → kinetic parameters]
How Do We Learn Structure?
Standard approach: hill climbing search
Problem: Scoring structures is costly
• Requires non-linear parameter optimization
• Impractical on real data
[Figure: sequence of candidate networks over TF1, TF2 and G1 to G4, differing by single edge changes or an added regulator TF3]
Ideal Regulator Method
Goal: Consider adding edges
Idea: Score only promising candidates
Step 1: Compute optimal hypothetical regulator Y (the ideal regulator)
Step 2: Search for “similar” regulator
[Figure: target profile of G, Pred(G|TF) and Pred(G|TF,Y) over time; activity levels of candidate regulators TF1 to TF4 compared with the ideal regulator]
Ideal Regulator Method
Goal: Consider adding edges
Idea: Score only promising candidates
Step 1: Compute optimal hypothetical regulator Y (the ideal regulator)
Step 2: Search for “similar” regulator
Step 3: Add new parent and optimize parameters
Crucial point: Choice of similarity measure
• Principled approach, see [Nachman et al UAI04]
• Provides approximation to Δlikelihood
[Figure: target profile of G, Pred(G|TF,Y) and Predicted(G|TF,TF2) over time; parent activity and candidate regulators TF1 to TF4]
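The first two steps can be sketched for the simplest setting: a target with no existing parent and a one-activator rate model. Inverting the rate function gives the ideal regulator profile, and candidates are then ranked by similarity to it. Pearson correlation is used purely as a stand-in; the slides point to a principled measure that approximates the change in likelihood, which is not reproduced here. The function names, `beta`, `gamma` and the toy profiles are all assumptions.

```python
import math


def ideal_regulator(target_rates, beta, gamma):
    """Step 1: invert rate = beta * y / (gamma + y) at every time point."""
    ideal = []
    for r in target_rates:
        r = min(max(r, 0.0), 0.99 * beta)  # clip so the inverse stays finite
        ideal.append(gamma * r / (beta - r))
    return ideal


def pearson(u, v):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def best_candidate(ideal, candidates):
    """Step 2: rank candidate regulators by similarity to the ideal profile."""
    scores = {name: pearson(ideal, profile) for name, profile in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy example: the target was generated from TF1 with beta=2, gamma=3.
candidates = {
    "TF1": [1, 2, 4, 2, 1, 2, 4, 2],
    "TF2": [4, 2, 1, 2, 4, 2, 1, 2],
    "TF3": [2, 2, 2, 2, 2, 2, 2, 2],
}
target = [2.0 * x / (3.0 + x) for x in candidates["TF1"]]
chosen, scores = best_candidate(ideal_regulator(target, 2.0, 3.0), candidates)
```

Only the top-ranked candidates would then go through Step 3, the costly non-linear parameter optimization, which is what makes the search practical.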
Adding New Regulator
Idea: Introduce hidden regulator for genes with similar ideal regulator
New regulator: “centroid” of selected ideal regulators
[Figure: ideal regulators Y1 to Y5 of genes G1 to G5 over time; new regulator TF_new added as parent of G1, G2 and G4]
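A minimal sketch of the centroid idea: pick the genes whose ideal regulator profiles resemble a seed profile, then average those profiles to initialize the hidden regulator. The selection rule (correlation with a seed gene above a threshold), the threshold value and the toy profiles are assumptions; the slides do not specify how the genes are selected.

```python
import math


def pearson(u, v):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def select_similar(ideal_profiles, seed, threshold=0.9):
    """Genes whose ideal regulator correlates with the seed gene's profile."""
    ref = ideal_profiles[seed]
    return [g for g, prof in ideal_profiles.items() if pearson(ref, prof) >= threshold]


def centroid(profiles):
    """Point-wise mean of a list of equally long profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


# Toy ideal regulators: G1, G2 and G4 share a shape, G3 is inverted.
ideal_profiles = {
    "G1": [1.0, 2.0, 3.0, 2.0],
    "G2": [1.2, 2.2, 3.2, 2.2],
    "G3": [3.0, 2.0, 1.0, 2.0],
    "G4": [0.8, 1.8, 2.8, 1.8],
}
selected = select_similar(ideal_profiles, "G1")
new_regulator = centroid([ideal_profiles[g] for g in selected])
```

The centroid serves only as a starting activity profile; the new regulator's activities and the kinetic parameters of its targets would still be optimized afterwards.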
Ab initio Structure Learning
[Heat maps, genes grouped by phase M/G1, G1, S, S/G2, G2/M: input rates; curated prior knowledge (466 params); ab initio, from scratch (461 params)]
Regulators: ab initio vs. curated
• H2 and SWI5, H4 and SWI4: significant target overlap & correlated activity
• H1 and MBP1, H3 and FKH2: significant target overlap & weak correlation
[Figure: target genes x regulators, ab initio (H1 to H7) vs. curated (SWI4, MBP1, ACE2, FKH1, SWI5, MCM1, FKH2)]
Regulators: ab initio vs. curated
• Significant agreement with “known” topology, both in structure & dynamics
• Improved predictions
[Figure: target genes x regulators, ab initio (H1 to H7) vs. curated (SWI4, MBP1, ACE2, FKH1, SWI5, MCM1, FKH2)]
Conclusions
• Realistic model, based on first principles
• Learning procedure
  • Reconstruct unobserved activity profiles
  • Reconstruct network topology
• Insights into Structure & Dynamics, Function
[Figure: transcription rates + network (prior knowledge) → Model Learning → kinetic parameters]
Future Directions
• Prior knowledge
• ChIP location
• Cis-regulatory elements
• External perturbations
• Internal feedback
[Figure: network of regulators TF1 to TF3 and target genes G1 to G7]