Learning Regulatory Networks that Represent Regulator States and Roles
Keith Noto and Mark Craven

K. Noto and M. Craven, Learning Regulatory Network Models that Represent Regulator States and Roles. To appear in Lecture Notes in Bioinformatics.

Task
Given:
– Gene expression data
– Other sources of data, e.g. sequence data, transcription factor binding sites, transcription unit predictions
Do:
– Construct a model that captures regulatory interactions in a cell

Key Ideas: States and Roles
[Figure labels: Effector, Cellular Condition, Regulator Expression, Regulator State, Regulatee Expression (shown twice)]
Regulator states
– Cannot be observed
– Depend on more than regulator expression
– We use cellular conditions as surrogates/predictors of regulation effectors
Regulator roles
– Is a regulator an activator or a repressor?
– We use sequence analysis to predict these roles

Network Variables and Structure
– Hidden Regulator States: “activated” or “inactivated”
– Cellular Conditions: “stationary growth phase”, “heat shock”, ...
– Regulatees: expression states represented as a mixture of Gaussians
– Regulators: expression states represented as a mixture of Gaussians
– Connect where we have evidence of regulation
– Select relevant parents

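The mixture-of-Gaussians representation of expression states could be sketched roughly as below. This is only an illustration: it assumes two states per gene ("low" and "high"), and the weights, means, and standard deviations are hypothetical values, not parameters from the talk.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def state_posterior(x, components):
    """Posterior probability of each expression state given a measurement x.

    components maps a state name to (mixture weight, mean, std).
    """
    joint = {
        state: weight * gaussian_pdf(x, mean, std)
        for state, (weight, mean, std) in components.items()
    }
    total = sum(joint.values())
    return {state: p / total for state, p in joint.items()}

# Hypothetical two-state mixture for one gene (illustrative values only).
components = {"low": (0.5, -1.0, 0.5), "high": (0.5, 1.0, 0.5)}
posterior = state_posterior(0.8, components)
```

Here `posterior` gives the probability that a measurement of 0.8 came from each state, which is how a continuous expression value can be read as a soft discrete state.
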
Network Parameters: Hidden Nodes use CPD-Trees
– Parents selected from regulator expression, cellular conditions
– May contain context-sensitive independence
[Figure: CPD-tree for the hidden node "metJ state". Node labels: Growth Medium, Heat Shock, Growth Phase, metJ. Branch tests: Growth Phase = Log Phase, Growth Phase ≠ Log, metJ = Low expression, metJ ≠ Low expression.]
Leaf values, in the order shown:
P(metJ state = activated):
P(metJ state = activated): 0.994
P(metJ state = activated): 0.004

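A CPD-tree lookup can be sketched as a small nested structure. The tree shape below, and which branch each leaf value sits on, are assumptions made for illustration; 0.994 and 0.004 appear on the slide, while 0.5 is a hypothetical placeholder for the leaf whose value is not given.

```python
def make_leaf(p_activated):
    """Leaf holding P(regulator state = activated)."""
    return {"leaf": p_activated}

def make_split(variable, value, if_equal, if_not_equal):
    """Internal node testing whether a parent variable equals a value."""
    return {"variable": variable, "value": value,
            "eq": if_equal, "neq": if_not_equal}

def lookup(tree, assignment):
    """Walk from the root to a leaf using the parents' observed values."""
    node = tree
    while "leaf" not in node:
        matches = assignment[node["variable"]] == node["value"]
        node = node["eq"] if matches else node["neq"]
    return node["leaf"]

# Assumed tree: metJ expression only matters in log phase, which is an
# example of context-sensitive independence.
tree = make_split(
    "growth_phase", "log",
    make_split("metJ_expression", "low", make_leaf(0.004), make_leaf(0.994)),
    make_leaf(0.5),  # hypothetical placeholder value
)

p = lookup(tree, {"growth_phase": "log", "metJ_expression": "high"})
```

Outside log phase the lookup never consults `metJ_expression`, so the tree needs fewer parameters than a full table over both parents.
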
Initializing Roles
[Figure: the metA transcription unit on the DNA, running upstream to downstream, with the -35 position, the transcription start site*, and the regulator binding sites marked]
Binding sites
– metR binds upstream; considered an activator
– metJ binds downstream; considered a repressor
CPT for regulatee metA (parents: metR state, metJ state)
metR state     metJ state     P(Low)   P(High)
activated      activated
activated      inactivated
inactivated    activated
inactivated    inactivated
*Predicted transcription start sites from Bockhorst et al., ISMB ‘03

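One way the role initialization might look in code is sketched below. It is not the authors' procedure: the use of -35 as a hard boundary, the binding-site positions (-60 and +10), and the `strength` scheme for filling the initial CPT are all assumptions chosen for illustration.

```python
from itertools import product

def initial_role(site_position, boundary=-35):
    """Guess a regulator's role from its binding-site position.

    site_position is relative to the transcription start site (negative
    means upstream). Treating -35 as the boundary is an assumption.
    """
    return "activator" if site_position < boundary else "repressor"

def initial_cpt(roles, strength=0.9):
    """Initial P(regulatee = High) for every combination of regulator states.

    roles maps a regulator name to "activator" or "repressor". A regulator
    counts as favorable when it is an activated activator or an
    inactivated repressor.
    """
    names = sorted(roles)
    cpt = {}
    for states in product(("activated", "inactivated"), repeat=len(names)):
        favorable = sum(
            (roles[name] == "activator") == (state == "activated")
            for name, state in zip(names, states)
        )
        fraction = favorable / len(names)
        cpt[states] = (1.0 - strength) + (2.0 * strength - 1.0) * fraction
    return names, cpt

# Hypothetical binding-site positions for the two regulators of metA.
roles = {"metR": initial_role(-60), "metJ": initial_role(10)}
names, cpt = initial_cpt(roles)
```

Keys of `cpt` follow the order in `names` (here metJ, then metR), so the entry for metJ inactivated and metR activated starts out favoring high metA expression.
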
Training the Model
Initialize the parameters
– Activators tend to bind more upstream than repressors
Use an EM algorithm to set parameters
– E-Step: Determine expected states of regulators
– M-Step: Update CPDs
Repeat until convergence

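The E-step/M-step loop can be illustrated on a much smaller model than the one in the talk. The sketch below assumes a single hidden binary regulator state per experiment and regulatees discretized to low/high (0/1); the function name, the smoothing term `eps`, and the toy data are hypothetical.

```python
import math

def em_hidden_state(data, pi, theta, max_iter=100, tol=1e-8, eps=1e-3):
    """EM for one hidden binary regulator state with binary regulatees.

    data:  list of tuples, one per experiment; entry j is 1 if regulatee j
           is in its high expression state, else 0.
    pi:    initial P(regulator state = activated).
    theta: theta[h][j] = initial P(regulatee j is high | state h),
           with h = 0 (inactivated) or 1 (activated).
    """
    theta = [list(row) for row in theta]  # do not modify the caller's lists
    n, m = len(data), len(data[0])
    prev_loglik = None
    for _ in range(max_iter):
        # E-step: expected state of the regulator in each experiment.
        resp, loglik = [], 0.0
        for x in data:
            joint = []
            for h, prior in ((0, 1.0 - pi), (1, pi)):
                p = prior
                for j in range(m):
                    p *= theta[h][j] if x[j] else 1.0 - theta[h][j]
                joint.append(p)
            total = joint[0] + joint[1]
            resp.append(joint[1] / total)
            loglik += math.log(total)
        # Stop once the log likelihood no longer improves.
        if prev_loglik is not None and loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
        # M-step: re-estimate the CPD entries from expected counts
        # (eps is a small smoothing term keeping probabilities off 0 and 1).
        n1 = sum(resp)
        n0 = n - n1
        pi = (n1 + eps) / (n + 2 * eps)
        for j in range(m):
            high1 = sum(r * x[j] for r, x in zip(resp, data))
            high0 = sum((1.0 - r) * x[j] for r, x in zip(resp, data))
            theta[1][j] = (high1 + eps) / (n1 + 2 * eps)
            theta[0][j] = (high0 + eps) / (n0 + 2 * eps)
    return pi, theta, loglik

# Toy data: two regulatees that are high or low together.
data = [(1, 1), (1, 1), (0, 0), (0, 0), (1, 1), (0, 0)]
pi, theta, loglik = em_hidden_state(data, 0.5, [[0.2, 0.2], [0.8, 0.8]])
```

The starting `theta` encodes an activator-style guess (high expression is more likely when the regulator is activated), which plays the role of the informed initialization; a symmetric start would leave EM with no way to tell the two states apart.
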
Experimental Data and Procedure
Expression measurements from Affymetrix microarrays (Fred Blattner’s lab, University of Wisconsin-Madison)
Regulator binding site predictions from TRANSFAC, EcoCyc, cross-species comparison (McCue et al., Genome Research 12, 2002)
Experimental data consist of:
– 90 experiments
– 6 cellular condition variables (each with between two and seven values)
– 296 regulatees
– 64 regulators
Cross-fold validation
– Microarrays held aside for testing
– Conditions from test microarrays do not appear in training set

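Keeping test conditions out of the training set amounts to splitting by condition rather than by individual microarray. The sketch below shows one possible way to do that; it assumes each microarray carries a single hashable condition label (for example, a tuple of its six condition values), and the fold count and example labels are hypothetical.

```python
from collections import defaultdict

def condition_folds(conditions, n_folds):
    """Build train/test index splits in which no condition is shared.

    conditions holds one condition label per microarray. All microarrays
    with the same label are placed in the same fold.
    """
    groups = defaultdict(list)
    for index, label in enumerate(conditions):
        groups[label].append(index)

    # Assign the largest groups first, each to the currently smallest fold.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(groups, key=lambda c: (-len(groups[c]), str(c)))
    for label in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(groups[label])

    splits = []
    for f in range(n_folds):
        test = sorted(folds[f])
        train = sorted(i for g in range(n_folds) if g != f for i in folds[g])
        splits.append((train, test))
    return splits

# Hypothetical condition labels for seven microarrays.
conditions = ["heat", "heat", "log", "log", "stationary", "stationary", "log"]
splits = condition_folds(conditions, 3)
```

Each `(train, test)` pair can then be used to fit the model on the training microarrays and score it on conditions it has never seen.
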
Log Likelihood   Average Squared Error   Classification Error   Model
-12,                                     %                      Our Model (3 iterations of adding missing TFs)
-12,                                     %                      Baseline #2 (No hidden nodes, using cellular conditions)
-13,                                     %                      Baseline #1 (No hidden nodes, no cellular conditions)
-11,                                     %                      Random Initialization (3 iterations of adding missing TFs)