Computational Biology Lecture #11: Inferring Regulatory Networks from Gene Expression Data. Bud Mishra, Professor of Computer Science and Mathematics, 12/3/2001. 9/16/2018 ©Bud Mishra, 2001
Regulatory Networks: All cells in an organism have the same genomic data, but the proteins synthesized in each vary according to cell type, time, and environmental factors. There is a network of interactions among the various biochemical entities in a cell (DNA, RNA, proteins, small molecules). Can we infer the network of interactions among genes?
Gene Regulation [diagram; labels: DNA, transcription, mRNA, transport to cytosol, nonphosphorylated protein, post-translational modifications, transport to nucleus]
Regulatory Networks: Many regulatory interactions occur after transcription, but we will focus on transcriptional regulation, because it plays a major role in the regulation of protein synthesis and because mRNA levels are relatively easy to measure.
Transcriptional Regulation: Example: The lac Operon [diagram; labels: regulatory regions (P, O), regions coding for proteins (lacI, lacZ, lacY, lacA), diffusible regulatory proteins, RNA polymerase, mRNA + ribosomes, protein products I, Z, Y, A]
Transcriptional Regulation: Example: The lac Operon [diagram; labels: regulatory regions (P, O), regions coding for proteins (lacI, lacZ, lacY, lacA), diffusible regulatory proteins, RNA polymerase (binds but cannot move to transcribe), mRNA + ribosomes, I, no mRNA] When lactose is absent, the protein encoded by lacI represses transcription of the lac operon.
Transcriptional Regulation: Example: The lac Operon [diagram; labels: regulatory regions (P, O), regions coding for proteins (lacI, lacZ, lacY, lacA), diffusible regulatory proteins, RNA polymerase, mRNA + ribosomes, protein products I, Z, Y, A, lactose, conformational change, blocked]
Inferring Regulatory Networks. Given: temporal expression data for a set of genes. Infer: the network of regulatory relationships among the genes.
Regulatory Network Models: Boolean networks (Kauffman ’93; Liang, Fuhrman & Somogyi ’98); differential equations (Chen, He & Church ’99); Bayesian networks (Friedman et al. ’99); weight matrices (Weaver, Workman & Stormo ’99).
Inferring Regulatory Networks with Weight Matrices. Overview: Assume discrete time steps. u(t) is a vector representing the expression levels of n genes at time t. Build a model for predicting u(t+1) given u(0), u(1), …, u(t).
Overview of the model: u(t) → r(t) → x(t) → u(t+1). u(t): input expression levels at time t. r(t): net regulation of each gene at time t. x(t): response of each gene at time t. u(t+1): predicted expression levels at time t+1.
Determining the Net Regulation of Each Gene: Model regulatory interactions among genes with a weight matrix: $r_i(t) = \sum_j w_{ij}\, u_j(t)$, where $r_i(t)$ is the regulatory input to gene i, $w_{ij}$ is the regulatory influence of gene j on gene i, and $u_j(t)$ is the expression level of gene j.
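As a minimal sketch of this weighted sum, assuming a hypothetical 3-gene network (the weights and expression levels below are made up for illustration and do not come from the lecture):

```python
import numpy as np

# Hypothetical weight matrix: W[i, j] is the influence of gene j on gene i.
W = np.array([
    [0.0,  2.0, -1.0],
    [0.5,  0.0,  0.0],
    [0.0, -3.0,  1.0],
])

# Hypothetical expression levels u_j(t) of the three genes at time t.
u_t = np.array([1.0, 0.5, 2.0])

# r_i(t) = sum_j w_ij * u_j(t), computed for all genes at once.
r_t = W @ u_t
print(r_t)
```

A positive entry in `W` acts as activation and a negative entry as repression; a zero entry means no direct regulatory link in the model.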
Determining the Response of Each Gene: r(t) → x(t+1). The regulatory input to each gene determines its response through a sigmoid-like (“squashing”) function: $x_i(t+1) = \left[1 + \exp(-r_i(t) - b_i)\right]^{-1}$.
Determining the Response of Each Gene: The $b_i$ parameter represents the predisposition of gene i in the absence of any regulatory input (its basal rate). We can represent it as just another weight, connected to a “gene” that is always completely on: $x_i(t+1) = \left[1 + \exp\{-(\sum_j w_{ij}\, u_j(t) + b_i)\}\right]^{-1}$.
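The equivalence between a separate basal term and an extra always-on "gene" can be checked numerically. The sketch below uses hypothetical values for the weights, basal terms, and expression levels; the function names are illustrative only.

```python
import numpy as np

def response(W, b, u_t):
    """x_i(t+1) = 1 / (1 + exp(-(sum_j w_ij u_j(t) + b_i)))."""
    return 1.0 / (1.0 + np.exp(-(W @ u_t + b)))

def response_augmented(W_aug, u_t):
    """Same response, with b folded into the last column of W_aug."""
    u_aug = np.append(u_t, 1.0)  # extra "gene" that is always completely on
    return 1.0 / (1.0 + np.exp(-(W_aug @ u_aug)))

W = np.array([[0.0, 2.0], [-1.0, 0.0]])  # hypothetical weights
b = np.array([0.5, -0.5])                # hypothetical basal terms
u_t = np.array([1.0, 0.25])              # hypothetical expression levels

W_aug = np.hstack([W, b[:, None]])       # append b as one more weight column
x_a = response(W, b, u_t)
x_b = response_augmented(W_aug, u_t)
```

Both calls give the same responses, which is why the basal rate can be learned with the same machinery as the other weights.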
Predicting the Expression Level of Each Gene at Time t+1: The response of each gene is a value in [0,1]. Convert this relative level into a real unit of expression, allowing a different level of maximal expression for each gene.
Predicting the Expression Level of Each Gene at Time t+1: $u_i(t+1) = m_i\, x_i(t+1)$, where $u_i$ is the expression level of gene i, $m_i$ is the maximal expression level for gene i, and $x_i$ is the response of gene i.
Putting it Together: $u_i(t+1) = \dfrac{m_i}{1 + \exp\{-(\sum_j w_{ij}\, u_j(t) + b_i)\}}$, where $u_i(t+1)$ is the expression level of gene i at time t+1, $m_i$ is the maximal expression level for gene i, and $\sum_j w_{ij}\, u_j(t)$ is the regulatory input to gene i.
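The combined update can be sketched as a single step function and iterated to produce a trajectory. The function names `step` and `simulate` and the demo parameters are assumptions made for this illustration, not part of the lecture.

```python
import numpy as np

def step(u_t, W, b, m):
    """One model step: u(t) -> r(t) -> x(t+1) -> u(t+1)."""
    r_t = W @ u_t                               # net regulation
    x_next = 1.0 / (1.0 + np.exp(-(r_t + b)))   # squashed response
    return m * x_next                           # scale by maximal expression

def simulate(u0, W, b, m, n_steps):
    """Iterate the model from u(0); returns an (n_steps + 1) x n array."""
    trajectory = [np.asarray(u0, dtype=float)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1], W, b, m))
    return np.array(trajectory)

# Hypothetical 2-gene demo with no regulation and zero basal terms:
# every response is 0.5, so each gene sits at half its maximal level.
demo = simulate([1.0, 1.0], np.zeros((2, 2)), np.zeros(2),
                np.array([10.0, 4.0]), n_steps=3)
```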
Including Environmental Variables: One can represent environmental variables (e.g., the concentration of lactose) as follows. Extend the input vector to include n genes and p environmental variables. Extend the weight matrix so that each gene is connected to the p environmental variables.
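A minimal sketch of this extension, assuming the environmental inputs are simply concatenated to the gene expression vector and given their own block of weights. The names `W_env` and `e_t`, and the one-gene "lactose" demo values, are hypothetical.

```python
import numpy as np

def step_with_environment(u_t, e_t, W_genes, W_env, b, m):
    """Model step with n genes and p environmental variables."""
    z_t = np.concatenate([u_t, e_t])       # extended input, length n + p
    W_ext = np.hstack([W_genes, W_env])    # extended weights, n x (n + p)
    return m / (1.0 + np.exp(-(W_ext @ z_t + b)))

# Hypothetical single gene that is induced by one environmental variable.
W_genes = np.array([[0.0]])
W_env = np.array([[4.0]])
b = np.array([-2.0])
m = np.array([1.0])
u_t = np.array([0.5])

u_without = step_with_environment(u_t, np.array([0.0]), W_genes, W_env, b, m)
u_with = step_with_environment(u_t, np.array([1.0]), W_genes, W_env, b, m)
```

Only the genes are predicted at time t+1; the environmental variables are inputs supplied from outside the model at each step.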
Learning the Parameters of the Model. Given: a time series of expression measurements u(0), …, u(t), u(t+1), taken as pairs ⟨u(t), u(t+1)⟩. Find: the $w_{ij}$ parameters such that the data are closely modeled. The model can be fit with the “back-propagation” algorithm, as in a feed-forward neural network.
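Because the model has a single layer, back-propagation reduces here to plain gradient descent on the squared prediction error. The sketch below is one possible setup, not the procedure used in the lecture: the learning rate, iteration count, zero initialization, and toy data are all assumptions, and the training inputs are independent random vectors rather than a real time series.

```python
import numpy as np

def predict(U, W, b, m):
    """Rows of U are u(t); returns the model's u(t+1) for each row."""
    return m / (1.0 + np.exp(-(U @ W.T + b)))

def fit_gradient_descent(U, U_next, m, lr=0.5, n_iter=2000):
    """Minimize mean squared prediction error by gradient descent."""
    n = U.shape[1]
    W = np.zeros((n, n))
    b = np.zeros(n)
    for _ in range(n_iter):
        X = 1.0 / (1.0 + np.exp(-(U @ W.T + b)))  # responses x_i(t+1)
        err = m * X - U_next                       # prediction error
        delta = err * m * X * (1.0 - X)            # error passed back through the sigmoid
        W -= lr * (delta.T @ U) / len(U)
        b -= lr * delta.mean(axis=0)
    return W, b

# Hypothetical 3-gene model, used only to generate toy training pairs.
rng = np.random.default_rng(0)
W_true = np.array([[0.0, 2.0, -2.0], [1.0, 0.0, 0.0], [0.0, -1.5, 0.0]])
b_true = np.array([0.0, -0.5, 0.5])
m = np.ones(3)
U = rng.uniform(0.0, 1.0, size=(50, 3))
U_next = predict(U, W_true, b_true, m)

mse_before = np.mean((predict(U, np.zeros((3, 3)), np.zeros(3), m) - U_next) ** 2)
W_fit, b_fit = fit_gradient_descent(U, U_next, m)
mse_after = np.mean((predict(U, W_fit, b_fit, m) - U_next) ** 2)
```

Comparing `mse_after` with `mse_before` shows whether the fit improved on the untrained model; how closely `W_fit` matches `W_true` depends on the amount and noise level of the data.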
Learning the Parameters: Linear Algebra Approach. Weaver et al.: an example of a linear algebraic approach. The model for each gene is independent, so one can determine the best weights for gene i, then the best weights for gene j, etc. Set up a linear problem for determining the weights of each gene i.
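Linear Algebra. Learning the parameters, alternatively: $U w_i = r_i \Rightarrow w_i = U^{-1} r_i$, that is,
$$\begin{bmatrix} u_1(0) & \cdots & u_n(0) \\ \vdots & \ddots & \vdots \\ u_1(t) & \cdots & u_n(t) \end{bmatrix} \begin{bmatrix} w_{i1} \\ \vdots \\ w_{in} \end{bmatrix} = \begin{bmatrix} r_i(0) \\ \vdots \\ r_i(t) \end{bmatrix}$$
Use singular value decomposition to calculate the inverse of U (a pseudo-inverse when U is not square).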
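One way to set up this linear problem is to invert the squashing function, which gives $r_i(t) + b_i = \ln\big(u_i(t+1) / (m_i - u_i(t+1))\big)$, and then solve for the weights with an SVD-based pseudo-inverse. The sketch below assumes that $m_i$ is known, that every observed level lies strictly between 0 and $m_i$, and that the basal term is handled as an extra all-ones column. The function name and toy data are hypothetical; `np.linalg.pinv` computes the pseudo-inverse via SVD.

```python
import numpy as np

def fit_gene_weights(U, u_next_i, m_i):
    """Solve for gene i's weights and basal term from <u(t), u_i(t+1)> pairs."""
    # Invert the sigmoid: r_i(t) + b_i = ln(u / (m - u)).
    target = np.log(u_next_i / (m_i - u_next_i))
    # Append the always-on "gene" so that b_i is learned as one more weight.
    U_aug = np.hstack([U, np.ones((U.shape[0], 1))])
    w_aug = np.linalg.pinv(U_aug) @ target
    return w_aug[:-1], w_aug[-1]

# Noise-free toy data from a hypothetical gene with known parameters.
rng = np.random.default_rng(0)
U = rng.uniform(0.1, 1.0, size=(20, 3))
w_true = np.array([1.5, 0.0, -2.0])
b_true = 0.3
m_i = 10.0
u_next_i = m_i / (1.0 + np.exp(-(U @ w_true + b_true)))

w_hat, b_hat = fit_gene_weights(U, u_next_i, m_i)
```

With noise-free data and more time points than unknowns, the recovered weights match the generating ones; with noisy data the pseudo-inverse returns the least-squares solution instead.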
Experimental Methodology: Generate random weight matrix models. Use each model to generate data as ⟨u(t), u(t+1)⟩ pairs. See how well the method recovers the “correct” model.
Experimental Methodology: Generate random regulatory networks. The number of genes (n) ranged from 10 to 200. Each gene had a set maximal expression level. Several parameters control the distribution of weights: the average percentage of non-zero weights in a row, and the maximum and minimum of the absolute value of the weights. Normally distributed noise is introduced into the inputs.
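A rough sketch of how such random networks might be generated. The slide does not specify the distributions, so the choices below (uniform magnitudes, equally likely signs, independent Bernoulli sparsity per entry, additive Gaussian noise) and all parameter names are assumptions.

```python
import numpy as np

def random_weight_matrix(n, frac_nonzero, w_min, w_max, rng):
    """Sparse n x n weight matrix with |w| in [w_min, w_max) where non-zero."""
    mask = rng.random((n, n)) < frac_nonzero          # which weights are non-zero
    magnitude = rng.uniform(w_min, w_max, size=(n, n))
    sign = rng.choice([-1.0, 1.0], size=(n, n))       # activation or repression
    return mask * sign * magnitude

def add_input_noise(U, sigma, rng):
    """Add normally distributed noise to the input expression levels."""
    return U + rng.normal(0.0, sigma, size=U.shape)

rng = np.random.default_rng(1)
W = random_weight_matrix(10, frac_nonzero=0.2, w_min=0.5, w_max=2.0, rng=rng)
U_noisy = add_input_noise(np.ones((5, 10)), sigma=0.05, rng=rng)
```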
Experimental Methodology: The method was evaluated according to how well it identified non-zero weights (i.e., correctly identified gene interactions). Specifically, consider: Sensitivity = TP/(TP+FP), where TP (true positives) is the number of correctly predicted non-zero weights and FP (false positives) is the number of incorrectly predicted non-zero weights. (This ratio is more commonly called precision or positive predictive value; sensitivity usually denotes TP/(TP+FN).)
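Counting these quantities from a true and an estimated weight matrix can be sketched as follows. The magnitude threshold used to call an estimated weight "non-zero" is an assumption, as are the toy matrices.

```python
import numpy as np

def count_tp_fp_fn(W_true, W_est, threshold=0.0):
    """Count correctly and incorrectly predicted non-zero weights."""
    true_nz = W_true != 0
    pred_nz = np.abs(W_est) > threshold
    tp = int(np.sum(true_nz & pred_nz))    # predicted non-zero, truly non-zero
    fp = int(np.sum(~true_nz & pred_nz))   # predicted non-zero, truly zero
    fn = int(np.sum(true_nz & ~pred_nz))   # truly non-zero, missed
    return tp, fp, fn

# Hypothetical true and estimated weights for a 2-gene network.
W_true = np.array([[0.0, 1.0], [2.0, 0.0]])
W_est = np.array([[0.05, 0.9], [0.0, 0.3]])

tp, fp, fn = count_tp_fp_fn(W_true, W_est, threshold=0.1)
ratio = tp / (tp + fp)   # the TP/(TP+FP) ratio defined on the slide
```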
Results: More training data ⇒ more accurate models. Sparse networks ⇒ more accurate models. False positive (non-zero) weights were about 10 times smaller than true positive… Sensitivity > 90%.
Limitations of Approach: Assumption that all gene interactions are independent of one another. Assumption of regular, discrete time evolution. Assumption that a gene’s maximal expression level is known or can be estimated. The model accounts only for transcriptional regulation.
Bayesian Networks: Friedman, Linial, Nachman & Pe’er (2000) learned Bayesian network models from Stanford yeast cell-cycle data: 76 measurements of 6177 genes. They focused on 800 genes whose expression varied over the cell-cycle stages.
Bayesian Networks [diagram: network over nodes A, B, C, D, E]. Nodes represent gene activities; edges represent dependencies. Conditional probability table for B given E and A:
E  A  Pr[B|E,A]  Pr[¬B|E,A]
0  0  0.3        0.7
0  1  0.4        0.6
1  0  0.7        0.3
1  1  0.1        0.9
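To illustrate how such a table is used, the sketch below stores Pr[B|E,A] as a lookup and marginalizes over E and A. The priors on E and A, and the assumption that they are independent of each other, are hypothetical; the slide gives only the conditional table.

```python
# Pr[B = 1 | E, A], copied from the table; keys are (E, A).
p_b_given_ea = {(0, 0): 0.3, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 0.1}

# Hypothetical priors, not from the slide; E and A are assumed independent.
p_e = 0.5
p_a = 0.5

def prob(value, p_one):
    """Probability that a binary variable takes `value`, given Pr[variable = 1]."""
    return p_one if value == 1 else 1.0 - p_one

# Pr[B = 1] = sum over (e, a) of Pr[E = e] * Pr[A = a] * Pr[B = 1 | e, a].
p_b = sum(prob(e, p_e) * prob(a, p_a) * p for (e, a), p in p_b_given_ea.items())
print(p_b)
```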
Representing Partial Models: Since there is little data and many variables, focus on finding “features” common to many of the models that could explain the data. Markov relations: Is Y in the Markov blanket of X? (X, given its Markov blanket, is independent of the other variables in the network.) Order relations: Is X an ancestor of Y?
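Both kinds of feature can be read off a directed acyclic graph. In the sketch below the Markov blanket of a node is taken to be its parents, its children, and its children's other parents; the edge set is an assumed example over five nodes, not necessarily the network drawn in the lecture.

```python
def markov_blanket(parents, x):
    """Parents, children, and co-parents of x in a DAG given as a parent map."""
    children = {c for c, ps in parents.items() if x in ps}
    blanket = set(parents[x]) | children
    for child in children:
        blanket |= set(parents[child])   # other parents of x's children
    blanket.discard(x)
    return blanket

def is_ancestor(parents, x, y):
    """True if there is a directed path from x to y."""
    stack = list(parents[y])
    seen = set()
    while stack:
        node = stack.pop()
        if node == x:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

# Assumed example DAG: A -> B, E -> B, A -> D, B -> C.
parents = {"A": [], "E": [], "B": ["A", "E"], "D": ["A"], "C": ["B"]}

blanket_of_a = markov_blanket(parents, "A")
blanket_of_b = markov_blanket(parents, "B")
a_before_c = is_ancestor(parents, "A", "C")
```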
Estimating Confidence in Features. Bootstrap method: For i = 1 to m: sample (with replacement) expression experiments, then learn a Bayesian network from this sample. The confidence in a feature is the fraction of the m models in which it was represented…
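The bootstrap loop can be sketched independently of the learning procedure. Here `learn_features` is a placeholder for whatever structure learner is used; the `toy_learner` below is NOT a Bayesian network learner, only a correlation-threshold stand-in so that the example runs. All names and data are hypothetical.

```python
import numpy as np

def bootstrap_confidence(data, learn_features, m, rng):
    """Fraction of m bootstrap models in which each feature appears."""
    n_experiments = data.shape[0]
    counts = {}
    for _ in range(m):
        # Sample experiments (rows) with replacement.
        idx = rng.integers(0, n_experiments, size=n_experiments)
        for feature in learn_features(data[idx]):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: count / m for feature, count in counts.items()}

def toy_learner(sample, threshold=0.8):
    """Stand-in learner: report gene pairs with strongly correlated columns."""
    corr = np.corrcoef(sample, rowvar=False)
    n = corr.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) > threshold}

# Toy data: 30 experiments, 3 genes; gene 1 is a scaled copy of gene 0.
rng = np.random.default_rng(0)
g0 = rng.normal(size=30)
data = np.column_stack([g0, 2.0 * g0, rng.normal(size=30)])

confidence = bootstrap_confidence(data, toy_learner, m=100, rng=rng)
```

Features that appear in nearly every resampled model receive confidence close to 1, while features that depend on a few particular experiments receive low confidence.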
Biological Analysis: Using confidence in order relations, the approach identified “dominant genes”: several are known to be involved in cell-cycle control, several have non-viable null mutants, and many encode proteins involved in replication, sporulation, or budding. Assessing confident Markov relations: most pairs are functionally related.