Reconstruction of Transcriptional Regulatory Networks
Adapted from Chapter 4 of “Systems Biology: Properties of Reconstructed Networks” by Bernhard O. Palsson
What is transcriptional regulation?
Transcriptional regulatory networks (TRNs) are the on-off switches at the gene level.
[Figure (schematic): input signals act on a regulating component, leading to changed RNA and protein output and, in turn, changed cell behavior and structures. Adapted from http://genomicsgtl.energy.gov/science/generegulatorynetwork.shtml]
Why do we care about regulation?
Regulation has a significant effect on cell behavior.
Example: E. coli
– An estimated 400 regulatory genes
– 178 regulatory and putative regulatory genes found in the genome
– 690 transcription units (contiguous genes with a common expression condition, promoter and terminator) identified in RegulonDB
– Regulation will have a major effect on model predictions of cellular behavior
The basic functional block of a regulatory network is the promoter region of a gene or operon, which contains the cis-regulatory binding sites for the transcription factors (TFs) that regulate its expression. The TRN is then defined by which TFs bind to which promoters and by the integrated effect of these TFs on the expression of genes. TRNs can be decomposed into a small set of commonly occurring structural “motifs”, and the behavior of prototypical examples of these motifs can be studied to gain insight into their role in the TRN.
To begin: why do we care about regulation? When building a mathematical model, it is as important to know what one can neglect as it is to know what to include. Can we neglect regulation? Perhaps under some conditions, but certainly not in all cases, because regulation has a significant effect on cell behavior. As an example, the table on this slide shows E. coli proteins distributed among 22 functional groups (from the first-draft K-12 annotation). Metabolism and transport account for a substantial number of the known genes. Additionally, it is estimated that 400 (~10%) of the genes in E. coli have transcriptional regulatory functions; of these, 178 regulatory and putative regulatory genes have been found in the genome. According to RegulonDB, a database we will discuss in more detail later, 690 transcription units (i.e. regulated genes or operons) have been identified in this organism. Modeling these units will have a major effect on model predictions wherever regulatory effects have a dominant influence on metabolism. I’ll add in passing that the effect of regulation is generally much greater in eukaryotes.
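The definition of a TRN above (which TFs bind which promoters, and with what effect) maps naturally onto a small data structure. The sketch below is a minimal illustration, assuming a TRN stored as a mapping from each target to the TFs that bind its promoter, with +1 for activation and -1 for repression; all TF and gene names are hypothetical placeholders. It enumerates one commonly occurring motif, the feed-forward loop (X regulates Y, and both X and Y regulate Z), and calls a loop coherent when the direct path X→Z has the same overall sign as the indirect path X→Y→Z.

```python
from itertools import permutations

# Hypothetical TRN: target -> {TF: sign}; +1 = activation, -1 = repression.
# Each inner dict lists the TFs whose binding sites lie in the target's promoter.
TRN = {
    "tfY": {"tfX": +1},
    "geneZ": {"tfX": +1, "tfY": +1},
    "geneW": {"tfY": -1},
}


def regulates(trn, tf, target):
    """True if `tf` binds the promoter of `target`."""
    return tf in trn.get(target, {})


def feed_forward_loops(trn):
    """Return (X, Y, Z) triples with edges X->Y, X->Z and Y->Z."""
    nodes = set(trn) | {tf for regs in trn.values() for tf in regs}
    return [
        (x, y, z)
        for x, y, z in permutations(sorted(nodes), 3)
        if regulates(trn, x, y) and regulates(trn, x, z) and regulates(trn, y, z)
    ]


def is_coherent(trn, x, y, z):
    """Coherent if the sign of X->Z equals the sign of the path X->Y->Z."""
    return trn[z][x] == trn[y][x] * trn[z][y]


for loop in feed_forward_loops(TRN):
    print(loop, "coherent" if is_coherent(TRN, *loop) else "incoherent")
# ('tfX', 'tfY', 'geneZ') coherent
```

Brute-force enumeration over all ordered triples is fine for a toy network; a genome-scale search would iterate over edges instead.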
Hierarchy in Transcriptional Regulation
[Figure: nested levels of regulation: genes, operon, regulon, stimulon]
For transcriptional regulation there are several levels of abstraction to consider, and these are some terms you will need to know. First, there are the genes. Some genes are constitutive, meaning that they are always transcribed at relatively constant levels in the cell. Other genes are regulated: a stimulus is sensed and activates a regulator, which either induces or represses transcription of these genes. Sometimes a group of neighboring genes is transcribed together as a unit called an operon. Operons are found primarily in prokaryotes, presumably because they are a “cheap” way of regulating several related genes at once. An alternative means of regulating related genes together is a regulon, in which a single regulatory protein binds to multiple locations on the DNA, causing induction or repression (or a combination of both) of related genes or operons. This is the method of choice in eukaryotes. At the highest level of abstraction is the stimulon, which includes all of the regulons affected by a particular stimulus. For example, suppose that amino acids are present in the extracellular medium. This stimulus is sensed by the cell, resulting in the activation or inactivation of certain proteins. These proteins in turn affect transcription of various genes and operons, producing a new set of available metabolic or other proteins in the cell. The end result is altered behavior: in this case, most likely, the cell stops synthesizing amino acids de novo and takes them up from the medium instead.
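The hierarchy above can be pictured as nested lookups: operons group genes, regulons group operons under one regulator, and a stimulon groups the regulons affected by one stimulus. The following is a minimal sketch under that assumption; every operon, regulator and gene name is a hypothetical placeholder, and the stimulus key simply echoes the amino acid example from the notes. Note that operon opB belongs to two regulons, which is why duplicates are removed at the stimulon level.

```python
# Hypothetical hierarchy; all names are placeholders, not real annotations.
OPERONS = {                      # operon -> co-transcribed neighboring genes
    "opA": ["geneA1", "geneA2", "geneA3"],
    "opB": ["geneB1"],
    "opC": ["geneC1", "geneC2"],
}
REGULONS = {                     # regulatory protein -> operons it binds
    "RegR": ["opA", "opB"],
    "RegS": ["opB", "opC"],
}
STIMULONS = {                    # stimulus -> regulators whose activity it changes
    "amino_acids_present": ["RegR", "RegS"],
}


def regulon_genes(regulator):
    """All genes in the operons controlled by one regulator."""
    return [g for op in REGULONS[regulator] for g in OPERONS[op]]


def stimulon_genes(stimulus):
    """All genes in all regulons affected by one stimulus (duplicates removed)."""
    genes = [g for reg in STIMULONS[stimulus] for g in regulon_genes(reg)]
    return list(dict.fromkeys(genes))  # keeps first-seen order


print(regulon_genes("RegR"))
# ['geneA1', 'geneA2', 'geneA3', 'geneB1']
print(stimulon_genes("amino_acids_present"))
# ['geneA1', 'geneA2', 'geneA3', 'geneB1', 'geneC1', 'geneC2']
```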
The lac Operon
[Figure: lac operon regulation. Proteins: lac repressor, CAP, RNA polymerase. DNA regions: CAP site, promoter, operator, structural genes for lactose-metabolizing enzymes. Transcription produces mRNA.]
Carbon source            Operon status
Glucose only             OFF
Lactose only             ON
Neither                  OFF
Glucose and lactose      OFF
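The status table reduces to a single Boolean rule: the operon is ON only when lactose is present and glucose is absent. The sketch below encodes that rule, assuming the standard textbook mechanism (lactose, via allolactose, relieves repression at the operator; in the absence of glucose, cAMP-CAP binds the CAP site and helps recruit RNA polymerase). cAMP and allolactose are not shown on the slide, and basal (leaky) expression is ignored.

```python
from itertools import product


def lac_operon_on(glucose: bool, lactose: bool) -> bool:
    """Simplified Boolean rule for the lac operon (basal/leaky expression ignored)."""
    # Lactose (via allolactose) inactivates the lac repressor, freeing the operator.
    repressor_bound = not lactose
    # With glucose absent, cAMP-CAP binds the CAP site and recruits RNA polymerase.
    cap_bound = not glucose
    return cap_bound and not repressor_bound


# Regenerate the status table from the rule.
for glucose, lactose in product([True, False], repeat=2):
    status = "ON" if lac_operon_on(glucose, lactose) else "OFF"
    print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {status}")
```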
The GAL Regulon
[Figure: GAL1 regulation. Proteins: Gal4, Gal80, Mig1, Tup, RNA polymerase. DNA regions: UASG (upstream activating sequence), Mig1 site, GAL1 gene required for galactose metabolism. Transcription produces mRNA.]
Carbon source                     GAL1 status
Neither glucose nor galactose     OFF (basal)
Galactose only                    ON
Glucose and galactose             OFF
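Unlike the lac table, the GAL table distinguishes a basal OFF state from a repressed OFF state, so a three-valued output is a more faithful summary than a single Boolean. A minimal sketch, assuming glucose repression via Mig1/Tup dominates and that galactose relieves Gal80 inhibition of Gal4 bound at UASG; the glucose-only case is not shown on the slide, and returning "OFF" for it is an assumption.

```python
def gal1_status(glucose: bool, galactose: bool) -> str:
    """Simplified three-state rule for GAL1 expression."""
    if glucose:
        # Mig1 (with Tup) binds the Mig1 site and represses GAL1.
        # The glucose-only case is not on the slide; OFF is assumed here.
        return "OFF"
    if galactose:
        # Galactose relieves Gal80 inhibition of Gal4 bound at UASG.
        return "ON"
    # Gal4 is masked by Gal80: only basal transcription.
    return "OFF (basal)"


print(gal1_status(glucose=False, galactose=True))  # ON
```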
Fundamental data types for TRNs
Component data
– Binding sites, transcription factor (TF) molecules, etc.
Interaction data
– Links are formed by chemical interactions: DNA-protein, protein-protein, metabolite-RNA
– Positive and negative controls
Network state data
– Reconstructed networks have functional states
– Controls for network states are assessed by perturbation experiments (genetic, environmental, systemic)
Regulatory vs. Metabolic Circuits
Regulatory circuits are poorly characterized
– Less well understood
– Qualitative statements vs. “hard” stoichiometry
– Not mechanistically conserved across different organisms
Regulatory circuits are more complex
– Multiple effects per transcription factor (TF)
– Multiple regulators per gene
How are regulatory circuits different from metabolic circuits? First, regulatory circuits are not nearly as well characterized as metabolic circuits. They are less well understood and have so far been characterized more qualitatively in terms of function (e.g. “protein X has an inhibitory effect on gene abcD”), whereas metabolic reactions have quantitative stoichiometric statements like “Glucose + ATP → Glucose-6-phosphate + ADP”, which provides an unambiguous description of the glucokinase catalytic function. Also, the actual mechanism of many of these circuits differs between organisms, which makes it more difficult to infer regulation from the genome of an organism in the way that can be done for metabolic genes. Second, regulatory circuits are more complex than metabolic networks, as illustrated by the sea urchin regulatory network shown on this slide. Each transcription factor can have multiple effects (as described in a regulon), and each gene may have multiple regulators. All of these factors add to the difficulty of reconstructing regulatory networks.
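The two kinds of complexity named above, multiple effects per TF and multiple regulators per gene, correspond to the out-degree of TF nodes and the in-degree of gene nodes in the regulatory graph. A minimal sketch, assuming the network is available as a list of (TF, target) pairs; the pairs below are hypothetical and not taken from any database.

```python
from collections import defaultdict

# Hypothetical (TF, target gene) interactions, not taken from any database.
INTERACTIONS = [
    ("tf1", "geneA"), ("tf1", "geneB"), ("tf1", "geneC"),
    ("tf2", "geneA"), ("tf3", "geneA"), ("tf3", "geneC"),
]


def degree_summary(interactions):
    """Count targets per TF (out-degree) and regulators per gene (in-degree)."""
    targets_per_tf = defaultdict(set)
    regulators_per_gene = defaultdict(set)
    for tf, gene in interactions:
        targets_per_tf[tf].add(gene)
        regulators_per_gene[gene].add(tf)
    out_degree = {tf: len(genes) for tf, genes in targets_per_tf.items()}
    in_degree = {gene: len(tfs) for gene, tfs in regulators_per_gene.items()}
    return out_degree, in_degree


out_deg, in_deg = degree_summary(INTERACTIONS)
print(out_deg)  # {'tf1': 3, 'tf2': 1, 'tf3': 2}
print(in_deg)   # {'geneA': 3, 'geneB': 1, 'geneC': 2}
```

Sets are used so that a TF-gene pair reported twice (e.g. from two literature sources) is only counted once.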
Key Considerations for TRN Reconstruction
How to represent regulatory information?
– Is transcriptional regulation Boolean (switch-like) or continuous?
– Should transcription be thought of as a stochastic or a deterministic process?
What constitutes significant regulation?
– Many extracellular signals can affect the expression level of a gene.
– Which signals are actually physiologically significant?
Problems with experimental data in the literature:
– Experiments done under different conditions (e.g. strain background)
– Experimentalists typically concentrate on studying well-known TF/target pairs in great detail
– In vivo vs. in vitro
Because reconstructing regulatory networks at the genome scale (or in fact at any scale) is such a young field, there are still many unresolved issues. A major issue is how regulatory information should be represented; this is especially relevant to the type of model that is built from the regulatory information. In these slides, as in most of the genetics literature, the assumption is usually made that a gene is either transcribed or not, i.e. regulation is assumed to be switch-like (or Boolean). However, it is well known by biochemists (but maybe not by geneticists?) that regulatory processes are in essence no different from other biochemical processes, and that different magnitudes of incoming signals can cause different levels of transcriptional activity. This issue is illuminated in the following paper: Biggar SR, Crabtree GR. Cell signaling can direct either binary or graded transcriptional responses. EMBO J. 2001 Jun 15;20(12):3167-76. Another issue relevant to regulatory reconstruction is what counts as a significant regulatory interaction. In many cases multiple signals can regulate the transcription of one gene, but some of these signals play only a modulatory role: they cannot change transcriptional activity on their own and act only in the presence of other, stronger signals.
A third problem is how experimental data in the literature should be interpreted. For example, in vitro studies of transcriptional regulation may be of limited use because they do not account for other regulatory signals present in vivo.
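The Boolean-versus-continuous question can be made concrete by comparing a step function with a graded response curve. The sketch below uses a Hill function for the graded case; this is a common modeling choice assumed here for illustration, not something the slide prescribes, and the parameter names `k` (half-maximal signal) and `n` (steepness) are ours. As `n` grows, the graded curve approaches the switch.

```python
def boolean_response(signal: float, threshold: float = 1.0) -> float:
    """Switch-like (Boolean) view: transcription is fully on or fully off."""
    return 1.0 if signal >= threshold else 0.0


def graded_response(signal: float, k: float = 1.0, n: float = 1.0) -> float:
    """Graded view: Hill function giving the fraction of maximal transcription.

    k is the signal level giving half-maximal output; n sets the steepness.
    """
    return signal**n / (k**n + signal**n)


# Compare the switch with a shallow (n=1) and a steep (n=10) graded response.
for s in (0.5, 1.0, 3.0):
    print(s, boolean_response(s), round(graded_response(s), 3),
          round(graded_response(s, n=10), 3))
```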
Bottom-up Reconstruction
– Pool genomic, biochemical and physiological data, inferring functions where necessary.
– Include regulatory rules.
– Represent rules using Boolean logic, kinetic theory and the like.
– Analyze separately or together with the metabolic network as a metabolic/regulatory model.
– Use the model to make predictions about the behavior and emergent properties of the system; predictions should be seen as hypotheses that must be tested experimentally.
However, it is possible to reconstruct regulatory networks with the information we have, and the process is conceptually similar to metabolic network reconstruction. Again, we need to draw together the genomic, biochemical and physiological data, inferring functions where necessary. However, in this case (for bottom-up reconstruction) we will rely mostly on biochemically characterized regulatory proteins and their corresponding genes. Rather than metabolic reactions, we include regulatory rules, for example “gene abcD is transcribed if regulatory protein ProT is active” and “regulatory protein ProT is active if there is oxygen in the extracellular environment”. These rules can be represented using Boolean logic, kinetic theory and the like. The metabolic network can be reconstructed as usual, and the two networks can then be analyzed separately or together, as a metabolic/regulatory model. Once again, such a model will make predictions about the behavior and emergent properties of the system, which should be seen as hypotheses that must be tested experimentally.
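The two example rules from the notes can be evaluated directly as Boolean logic. The sketch below is one simple way to do it, assuming the rules are listed in dependency order so that a single pass suffices. The second gene `xyzE` (repressed by ProT), the reaction names, and the gene-to-reaction map are hypothetical additions that show how regulatory output could switch reactions off in a coupled metabolic/regulatory model; treating OFF genes as zero-flux reactions is an assumption made for illustration.

```python
# Rules from the text: ProT is active if oxygen is present, and abcD is
# transcribed if ProT is active. Gene xyzE and the reaction names are hypothetical.
RULES = {  # listed in dependency order so one pass suffices
    "ProT": lambda s: s["oxygen"],
    "abcD": lambda s: s["ProT"],
    "xyzE": lambda s: not s["ProT"],
}
GENE_TO_REACTION = {"abcD": "R_ABCD", "xyzE": "R_XYZE"}


def evaluate(environment, rules=RULES):
    """Apply each Boolean rule once, in order, starting from the environment."""
    state = dict(environment)
    for name, rule in rules.items():
        state[name] = rule(state)
    return state


def active_reactions(state):
    """Reactions whose enzyme gene is transcribed; others could be set to zero flux."""
    return sorted(r for g, r in GENE_TO_REACTION.items() if state[g])


aerobic = evaluate({"oxygen": True})
anaerobic = evaluate({"oxygen": False})
print(active_reactions(aerobic), active_reactions(anaerobic))  # ['R_ABCD'] ['R_XYZE']
```

Networks with feedback loops have no dependency order; there the rules would need to be iterated until the state stops changing (or found to oscillate).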
Top-down Reconstruction
Problems with bottom-up reconstruction:
– Many (most?) TF targets are not characterized
– Tedious process, because informative databases are rare
Alternative approach: utilize data from well-designed high-throughput experiments to reverse-engineer (or “back-calculate”) regulatory circuits
– Gene expression profiles for wild-type and deletion strains under appropriate conditions (genetic perturbation)
– Promoter sequence data and possibly consensus binding sites for TFs
– Location analysis (ChIP-chip) data on transcription factor binding sites
In addition to the issues discussed on the previous slide, which are common to any approach to reconstructing and modeling regulatory networks, there are a couple of problems specific to the bottom-up (literature-based) approach. The first is that, until very recently, targets of transcription factors were usually identified one gene at a time. While this ensures very low false-positive rates, it also means that most relevant targets of many transcription factors have not been identified. The second problem is the general lack of structured databases on transcriptional regulation, which makes the actual reconstruction process exceedingly time-consuming. An alternative to the bottom-up approach has emerged in the last few years, primarily due to the availability of large-scale gene expression profiling data. This approach, which we call top-down reconstruction, is based on the idea that since we know the “outputs” of the complex regulatory circuit in the form of gene expression profiles, we should be able to reconstruct the underlying circuit solely from this data. Although gene expression data is clearly useful for this task, it is not the only type of data that could potentially be utilized.
Additional useful data sets are location analysis (or ChIP-chip) data, which describes genome-wide binding sites of TFs, and promoter sequence data (the region upstream of the transcription start site), which can be used to computationally identify binding sites for transcription factors.
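A very simple version of the genetic-perturbation idea compares expression in a TF deletion strain with wild type and flags genes whose expression changes at least two-fold; intersecting those hits with location-analysis data then separates likely direct targets from possibly indirect effects. The sketch below assumes exactly that scheme, with a hypothetical two-fold cutoff (`min_abs_log2=1.0`) and made-up expression values and binding data; real top-down methods use far more elaborate statistical models, and the output should be read only as hypotheses about potential targets.

```python
import math

# Hypothetical expression levels (arbitrary units) and binding data; not real measurements.
WILD_TYPE = {"geneA": 100.0, "geneB": 50.0, "geneC": 80.0}
TF_DELETION = {"geneA": 20.0, "geneB": 52.0, "geneC": 330.0}
CHIP_BOUND = {"geneA"}  # promoters bound by the TF in a location-analysis experiment


def candidate_targets(wt, mutant, min_abs_log2=1.0):
    """Genes whose expression changes at least 2-fold when the TF is deleted."""
    hits = {}
    for gene, wt_level in wt.items():
        lfc = math.log2(mutant[gene] / wt_level)
        if abs(lfc) >= min_abs_log2:
            # Deleting an activator lowers expression; deleting a repressor raises it.
            hits[gene] = "activated" if lfc < 0 else "repressed"
    return hits


hits = candidate_targets(WILD_TYPE, TF_DELETION)
direct = {g: effect for g, effect in hits.items() if g in CHIP_BOUND}
print(hits)    # {'geneA': 'activated', 'geneC': 'repressed'}
print(direct)  # {'geneA': 'activated'}
```

In this toy case geneC responds to the deletion but shows no binding, so it may be regulated indirectly through another factor.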
Issues with Top-down Reconstruction
Very complex models and algorithms are required to reverse-engineer regulatory circuits
– Computational issues: explosion in the number of structures
– Model complexity issues: explosion in the number of parameters
– Optimality issues: only locally optimal circuits can be found
Data is not usually available in sufficient quantities or with appropriate quality
– Computational and experimental people usually don’t work together
Currently these methods are primarily used to create hypotheses about potential targets of TFs
In addition to the issues mentioned in conjunction with the bottom-up reconstruction approach, the top-down approach suffers from its own weaknesses. The major problem is that since the underlying regulatory circuit is potentially very complex, the models and algorithms required for top-down reconstruction also tend to be very complex (in fact, these are probably some of the most complex statistical models ever constructed). While this complexity is not a problem as such, it results in a few practical problems in fitting the model, which are detailed on the slide. A central problem is the explosion in the number of alternative models (both structures and parameters) to be considered, which requires both large amounts of sufficiently high-quality experimental data and efficient methods for learning models from data. Recently, however, some of the best statistical modeling people have started collaborating with some of the best biology groups capable of generating large quantities of high-quality data. This should lead to rapid improvement both in the methods for top-down reconstruction and in new biological knowledge gained from the reconstruction efforts.
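A back-of-the-envelope count shows why the number of candidate structures explodes. Assuming any gene may regulate any gene (including itself), each of the n² ordered pairs is an optional edge, giving 2^(n²) possible wiring diagrams, or 3^(n²) if each edge can also be activating or repressing. The function below is a hypothetical helper for this arithmetic only; it ignores parameters entirely, which would make the search space larger still.

```python
def num_structures(n_genes: int, signed: bool = False, self_loops: bool = True) -> int:
    """Number of possible regulatory wiring diagrams on n_genes nodes.

    Each ordered pair (regulator, target) is an optional edge; with signed=True an
    edge can be absent, activating or repressing (3 choices instead of 2).
    """
    edges = n_genes * n_genes if self_loops else n_genes * (n_genes - 1)
    return (3 if signed else 2) ** edges


print(num_structures(3))               # 512
print(num_structures(3, signed=True))  # 19683
print(len(str(num_structures(100))))   # 3011 (a number with 3011 digits)
```

Even 100 genes, a small fraction of the E. coli genome, yield far too many structures to enumerate, which is why practical algorithms resort to heuristic searches that find only locally optimal circuits.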
Combining knowledge-based and data-based regulatory network reconstruction strategies(a)
Regulatory networks can be reconstructed by collecting individual regulatory interactions from relevant databases and the primary literature (knowledge). Alternatively, networks can be derived directly from high-throughput experimental data and promoter sequence analysis through various data-mining methods.
(a) Herrgård M.J. et al., Current Opinion in Biotechnology, 2004, 15:70-77
Graphical Representation of Boolean TRNs in E. coli
[Figure: network linking stimuli (102) to transcription factors (104), and transcription factors to metabolic genes (479). Edges show stimuli affecting TF activity and TFs regulating gene expression.]
Summary
– Transcriptional regulatory networks determine the expression state of a genome
– These networks are presently incompletely defined
– Approaches to regulatory reconstruction are still being developed (especially top-down)
– Models of TRNs will help unravel the “logic” of gene circuits
Thank you!