Evolution of regulatory interactions in bacteria Mikhail Gelfand Research and Training Center “Bioinformatics”, Institute for Information Transmission.

Slides:



Advertisements
Similar presentations
Periodic clusters. Non periodic clusters That was only the beginning…
Advertisements

Combined analysis of ChIP- chip data and sequence data Harbison et al. CS 466 Saurabh Sinha.
Regulation and Control of Metabolism in Bacteria
Warm up Mon 11/3/14 Adv Bio 1. What does the phrase “gene regulation” mean? 2. If the lac operon cannot bind to the repressor.. What would be the outcome?
From computer to the wet lab and back Comparative genomics in the discovery of a family of bacterial transporters with a new mode of action Mikhail Gelfand.
Bioinformatics Motif Detection Revised 27/10/06. Overview Introduction Multiple Alignments Multiple alignment based on HMM Motif Finding –Motif representation.
Gene regulation. Gene expression models  Prokaryotes and Eukaryotes employ common and different methods of gene regulation  Prokaryotic models 1. Trp.
Regulatory Motifs. Contents Biology of regulatory motifs Experimental discovery Computational discovery PSSM MEME.
1 Molecular genetics of bacteria Emphasis: ways that bacteria differ from eukaryotes DNA structure and function; definitions. DNA replication Transcription.
CHAPTER 8 Metabolic Respiration Overview of Regulation Most genes encode proteins, and most proteins are enzymes. The expression of such a gene can be.
Negative regulatory proteins bind to operator sequences in the DNA and prevent or weaken RNA polymerase binding.
12-5 Gene Regulation.
E.coli aerobic/anaerobic switch study Chao Wang, Mar
Systems Biology Existing and future genome sequencing projects and the follow-on structural and functional analysis of complete genomes will produce an.
. Class 1: Introduction. The Tree of Life Source: Alberts et al.
Evolution of minimal metabolic networks WANG Chao April 11, 2006.
Adaptive evolution of bacterial metabolic networks by horizontal gene transfer Chao Wang Dec 14, 2005.
Functional annotation and network reconstruction through cross-platform integration of microarray data X. J. Zhou et al
BACKGROUND E. coli is a free living, gram negative bacterium which colonizes the lower gut of animals. Since it is a model organism, a lot of experimental.
Genome projects and model organisms Level 3 Molecular Evolution and Bioinformatics Jim Provan.
Tasha A. Desai, Dmitry A. Rodionov, Mikhail S. Gelfand, Eric J. Alm, and Christopher V. Rao 1 Alvin Chen April 14, 2010.
Evolution of bacterial regulatory systems Mikhail Gelfand Research and Training Center “Bioinformatics” Institute for Information Transmission Problems.
Draw 8 boxes on your paper
Laboratory of mathematical methods and models in bioinformatics Institute for Information Transmission Problems, Russian Academy of Sciences
Comparative genomics: functional characterization of new genes and regulatory interactions using computer analysis Mikhail Gelfand Institute for Information.
발표자 석사 2 년 김태형 Vol. 11, Issue 3, , March 2001 Comparative DNA Sequence Analysis of Mouse and Human Protocadherin Gene Clusters 인간과 마우스의 PCDH 유전자.
* only 17% of SNPs implicated in freshwater adaptation map to coding sequences Many, many mapping studies find prevalent noncoding QTLs.
Evolution of bacterial regulatory systems Mikhail Gelfand Institute for Information Transmission Problems, RAS BGRS-2004, Novosibirsk.
Microbial Models I: Genetics of Viruses and Bacteria 7 November, 2005 Text Chapter 18.
Identification of specificity-determining positions in protein alignments Mikhail Gelfand Research and Training Center “Bioinformatics” Institute for Information.
Adaptive evolution in prokaryotic transcriptional regulatory networks M. Madan Babu, PhD NCBI, NLM National Institutes of Health.
Molecular biology in silico Mikhail Gelfand Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems, RAS AlBio06,
Life without Fur.
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Ch. 21 Genomes and their Evolution. New approaches have accelerated the pace of genome sequencing The human genome project began in 1990, using a three-stage.
Evolution of regulatory interactions in bacteria Mikhail Gelfand Institute for Information Transmission Problems, RAS 4th Bertinoro Computational Biology.
Subsystem: Succinate dehydrogenase The super-macromolecular respiratory complex II (succinate:quinone oxidoreductase) couples the oxidation of succinate.
Life without Fur. Mikhail Gelfand Research and Training Center of Bioinformatics, Institute for Information Transmission Problems, RAS Genome Dynamics:
Computational Genomics and Proteomics Lecture 8 Motif Discovery C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E.
Evolution of bacterial regulatory systems Mikhail Gelfand Research and Training Center “Bioinformatics” Institute for Information Transmission Problems.
GENE EXPRESSION.
Structure and evolution of prokaryotic transcriptional regulatory networks Group Leader MRC Laboratory of Molecular Biology Cambridge M. Madan Babu.
MEME homework: probability of finding GAGTCA at a given position in the yeast genome, based on a background model of A = 0.3, T = 0.3, G = 0.2, C = 0.2.
Data Mining the Yeast Genome Expression and Sequence Data Alvis Brazma European Bioinformatics Institute.
Using structure in protein function annotation: predicting protein interactions Donald Petrey, Cliff Qiangfeng Zhang, Raquel Norel, Barry Honig Howard.
Anis Karimpour-Fard 1, Corrella Detweiler 2, Ryan T. Gill 3, and Lawrence Hunter 1 1 University of Colorado School of Medicine 2 MCD-Biology, University.
Genome annotation and search for homologs. Genome of the week Discuss the diversity and features of selected microbial genomes. Link to the paper describing.
Introduction to biological molecular networks
1 From Mendel to Genomics Historically –Identify or create mutations, follow inheritance –Determine linkage, create maps Now: Genomics –Not just a gene,
Last Class 1. Transcription 2. RNA Modification and Splicing
SDPpred: a method for identification of amino acid residues that determine differences in functional specificity of homologous proteins and application.
Evolution of bacterial regulatory systems Mikhail Gelfand Research and Training Center “Bioinformatics” Institute for Information Transmission Problems.
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
1 Computational functional genomics Lital Haham Sivan Pearl.
Figure S1 Figure S1. Phylogenetic tree of LexA binding sites in cyanobacteria, B.subtilis,  - proteobacteria and E.coli. Binding sites of cyanobacteria.
Ribonucleotide reductases (RNRs) catalyse the reduction of ribonucleotides to their corresponding 2`-deoxyribonucleotides and therefore play an essential.
Transcription factor binding motifs (part II) 10/22/07.
Finding Motifs Vasileios Hatzivassiloglou University of Texas at Dallas.
Evolution of bacterial regulatory systems Mikhail Gelfand Research and Training Center “Bioinformatics” Institute for Information Transmission Problems.
Intersubunit contacts are often facilitated by specificity-determining positions Computational identification of protein positions that possibly account.
Regulation of Gene Expression
System Structures Identification
Summary and Recommendations
Mikhail Gelfand Research and Training Center of Bioinformatics
Protein Occupancy Landscape of a Bacterial Genome
Genome-wide Reconstruction of OxyR and SoxRS Transcriptional Regulatory Networks under Oxidative Stress in Escherichia coli K-12 MG1655  Sang Woo Seo,
From Mendel to Genomics
Volume 8, Issue 7, Pages (July 2015)
Summary and Recommendations
DNA AND RNA 12-5 Gene Regulation.
Presentation transcript:

Evolution of regulatory interactions in bacteria Mikhail Gelfand Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems, RAS Moscow, Russia Singapore, July 2006

Comparative genomics of regulation Why –Functional annotation of genes –Metabolic modeling –Practical applications in genetic engineering, drug targeting etc. How –Close genomes: phylogenetic footprinting. Regulatory sites are seen as conservation islands in alignments of gene upstream regions –Distant genomes: consistency filtering. Candidate sites in one genome may be unreliable, but independent occurrence upstream of orthologous genes in many genomes yields reliable predictions Caveats –Presense of (predicted) binding sites does not immediately imply functional regulation –Operon structure –Need to verify presence of orthologous transcription factors in the studied genomes –Orthologous factors may have different binding motifs –One functional system may be regulated by different factors within and between genomes Many genomes –Taxon-specific regulation –Evolution individual sites transcription-fator families transcription factors and their binding motifs simple and complex regulatory systems

How it works: Two simple examples Biotin regulator of alpha-proteobacteria Universal regulator of ribonucleotide reductases: reconstruction of the regulatory system and the mechanism of regulation

BirA (biotin regulator in eubacteria and archaea): conserved signal, changed spacing Profile 2: Gram-negative bacteriaProfile 1: Gram-positive bacteria, Archaea

BirA (biotin regulator in eubacteria and archaea): conserved signal, changed spacing Profile 2: Gram-negative bacteriaProfile 1: Gram-positive bacteria, Archaea BirA of alpha-proteobacteria: no DNA-binding domain

Identification of the candidate regulator (BioR) in alpha- proteobacteria 1.Candidate binding sites: similar palindromes upstream of biotin biosynthesis and transport genes in different genomes TTATAGATAA TTATCTATAA TTATAGATAg TTATCTATAA TTATAGATAg TTATCTATAA TcATATATtA TcATAGATAg TTATCTATAA TTATCTATtA TTATCTAcAA TTATCTATAA TcATAGATtA cTATAGATAA TTATCTAcAA

1. 2.Positional clustering: candidate transcription factor from the GntR family is often found in the same loci (black arrows) 3.Phyletic patterns: phyletic distribution of candidate sites (red cirsles) exactly coincides with the phyletic distribution of the candidate regulator 4.Autoregulation: in many cases there are candidate sites upstream of the bioR gene itself

Conserved signal upstream of nrd genes

Identification of the candidate regulator by the analysis of phyletic patterns COG1327: the only COG with exactly the same phylogenetic pattern as the signal –“large scale” on the level of major taxa –“small scale” within major taxa: absent in small parasites among alpha- and gamma- proteobacteria absent in Desulfovibrio spp. among delta-proteobacteria absent in Nostoc sp. among cyanobacteria absent in Oenococcus and Leuconostoc among Firmicutes present only in Treponema denticola among four spirochetes

COG1327 “Predicted transcriptional regulator, consists of a Zn-ribbon and ATP-cone domains”: regulator of the riboflavin pathway?

Additional evidence – 1 nrdR is sometimes clustered with nrd genes or with replication genes dnaB, dnaI, polA

Additional evidence – 2 In some genomes, candidate NrdR-binding sites are found upstream of other replication- related genes –dNTP salvage –topoisomerase I, replication initiator dnaA, chromosome partitioning, DNA helicase II

Multiple sites (nrd genes): FNR, DnaA, NrdR

Mode of regulation Repressor (overlaps with promoters) Co-operative binding: –most sites occur in tandem (> 90% cases) –the distance between the copies (centers of palindromes) equals an integer number of DNA turns: mainly (94%) bp, in 84% bp – 3 turns 21 bp (2 turns) in Vibrio spp bp (4 turns) in some Firmicutes experimental confirmation in Streptomyces (Borovok et al., 2004)

Evolutionary processes that shape regulatory systems Expansion and contraction of regulons Duplications of regulators with or without regulated loci Loss of regulators with or without regulated loci Re-assortment of regulators and structural genes … especially in complex systems Horizontal transfer

Loss of regulators, and cryptic sites Loss of the RbsR in Y. pestis (ABC-transporter also is lost) Start codon of rbsD RbsR binding site

Regulon expansion: how FruR has become CRA icdA aceA aceB aceEF pckA ppsApykF adhE gpmApgk tpiA gapA pfkA fbp Fructose fruKfruBA eda edd epd Glucose ptsHI-crr Mannose manXYZ mtlD mtlA Mannitol Gamma-proteobacteria

Common ancestor of Enterobacteriales icdA aceA aceB aceEF pckA ppsApykF adhE gpmApgk tpiA gapA pfkA fbp Fructose fruKfruBA eda edd epd Glucose ptsHI-crr Mannose manXYZ mtlD mtlA Mannitol Gamma-proteobacteria Enterobacteriales

Common ancestor of Escherichia and Salmonella icdA aceA aceB aceEF pckA ppsApykF adhE gpmApgk tpiA gapA pfkA fbp Fructose fruKfruBA eda edd epd Glucose ptsHI-crr Mannose manXYZ mtlD mtlA Mannitol Gamma-proteobacteria Enterobacteriales E. coli and Salmonella spp.

Trehalose/maltose catabolism, alpha-proteobacteria Duplicated LacI-family regulators: lineage-specific post-duplication loss

The binding signals are very similar (the blue branch is somewhat different: to avoid cross-recognition?)

Utilization of an unknown galactoside, gamma-proteobacteria Loss of regulator and merger of regulons: It seems that laci-X was present in the common ancestor (Klebsiella is an outgroup) Yersinia and Klebsiella: two regulons, GalR (not shown, includes genes galK and galT) and Laci-X Erwinia: one regulon, GalR

Utilization of maltose/maltodextrin, Firmicutes Two different ABC transporters (shades of red) PTS (pink) Glucoside hydrolases (shades of green) Two regulators (black and grey)

Modularity of the functional subsystem Two different ABC systems Three hydrolases in one operon (E. faecalis) or separately

Changes of regulation Displacement: invasion of a regulator from a different subfamily (horizontal transfer from a related species?) – blue sites

Orthologous TFs with completely different regulons (alpha-proteobaceria and Xanthomonadales)

Catabolism of gluconate, proteobacteria

extreme variability of regulation of “marginal” regulon members γ Pseudomonas spp. β

Combined regulatory network for iron homeostasis genes in  -proteobacteria [- Fe] [+Fe] [- Fe] [+Fe] [ Fe]- FeS FeS status of cell The connecting line denote regulatory interactions, which the thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of operation is shown by dead-end and arrow-end of the line.

Rhizobiales Bradyrhizobiaceae Rhizobiaceae Rhodo- bacterales Hyphomonadaceae Rhodo- bacteraceae Rickettsiales Rhodo- spirillales Sphingomo- nadales - p r o t e o b a c t e r i a Organism Irr MntR Sinorhizobium meliloti Rhizobium leguminosarum Rhizobium etli Agrobacterium tumefaciens Mesorhizobium loti Mesorhizobiumsp. BNC1 Brucella melitensis Bartonellaquintana andspp. Bradyrhizobium japonicum Rhodopseudomonas palustris Nitrobacter hamburgensis Nitrobacter winogradskyi Rhodobacter capsulatus Rhodobacter sphaeroides Silicibacter sp. TM1040 Silicibacter pomeroyi Jannaschia sp.CC51 Rhodobacterales bacterium HTCC2654 Roseobactersp. MED193 Roseovarius nubinhibens ISM Roseovarius sp.217 Loktanella vestfoldensis SKA53 Sulfitobacter sp. EE-36 Oceanicola batsensis HTCC2597 Oceanicaulis alexandrii HTCC2633 Caulobacter crescentus Parvularcula bermudensis HTCC2503 Erythrobacter litoralis Novosphingobium aromaticivorans Sphinopyxis alaskensisgRB2256 Zymomonas mobilis Gluconobacter oxydans Rhodospirillum rubrum Magnetospirillum magneticum Pelagibacter ubique HTCC1002 SM + MUR / FUR RirAIscR RL RHE AGR ML MBNC BME BQ BJ RPA Nham Nwi RC Rsph STM SPO Jann RB2654 MED193 ISM ROS217 SKA53 EE36 OB2597 OA2633 CC PB2503 ELI Saro Sala ZM GOX Rrub Amb Abb. PU #? Group Caulobacterales Parvularculales Rickettsia and Ehrlichia species SAR11 cluster A. B. C. D. Fe and Mn regulons Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria #?' in RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.

Phylogenetic tree of the Fur family of transcription factors in  -proteobacteria - I Fur in  - and  - proteobacteria Fur in  - proteobacteria Fur in Firmicutes in  proteobacteria Fur MBNC RB AGR C 620 RL mur Nwi 0013 RPA0450 BJ fur ROS Jann 1799 SPO2477 STM1w MED OB SKA Rsph ISM GOX0771 ZM01411 Saro Sala 1452 ELI1325 OA PB CC0057 Rrub Amb1009 Amb4460 SM mur MBNC BQ fur2 BMEI0375 Mesorhizobium sp. BNC1(I) Sinorhizobium meliloti Bartonella quintana Rhodopseudomonas palustris Bradyrhizobium japonicum Caulobacter crescentus Zmomonas mobilisy Rhodobacter sphaeroides Silicibacter sp. TM1040 Silicibacter pomeroyi Agrobacterium tumefaciens Rhizobium leguminosarum Brucella melitensis Mesorhizobium sp. BNC1(II) Rhodobacterales bacterium HTCC2654 Nitrobacter winogradskyi Nham 0990 Nitrobacter hamburgensis X14 Jannaschia sp. CC51 Roseovarius sp.217 Roseobacter sp. MED193 Oceanicola batsensis HTCC2597 Loktanella vestfoldensis SKA53 Roseovarius nubinhibens ISM Gluconobacter oxydans Erythrobacter litoralis Novosphingobium aromaticivorans Sphinopyxis alaskensis RB2256 Oceanicaulis alexandrii HTCC2633 Rhodospirillum rubrum Parvularcula bermudensisHTCC2503 Magnetospirillum magneticum (I) EE Sulfitobacter sp. EE-36 ECOLI PSEAE NEIMA HELPY BACSU Helicobacter pylori : sp|O25671 Bacillus subtilis: P54574sp| Neisseria meningitidis : sp|P0A0S7 Pseudomonas aeruginosa : sp|Q03456 Escherichia coli : P0A9A9sp| Mur Fur  Magnetospirillum magneticum (II) RHE_CH00378 Rhizobiumetli  PU Pelagibacter ubique HTCC1002  Irr in  proteobacteria  proteobacteria Regulator of manganese uptake genes (sit, mntH) Regulator of iron uptake and metabolism genes

The A, B, and C groups of - proteobacteria -  Mur Caulobacter crescentus Zymomonas mobilis Gluconobacter oxydans Erythrobacter litoralis Novosphingobium aromaticivorans Rhodospirillum rubrum Magnetospirillum magneticum Escherichia coli Sphinopyxis alaskensis Parvularcula bermudensis - Oceanicaulis alexandrii Bacillus subtilis Sequence logos for the identified Fur-binding sites in the D group of  proteobacteria Sequence logos for the known Fur-binding sites in Escherichia coli and Bacillus subtilis Identified Mur-binding sites

Phylogenetic tree of the Fur family of transcription factors in  -proteobacteria - II Fur in  - and  - proteobacteria Fur in  - proteobacteria Fur in Firmicutes Irr in  proteo- bacteria regulator of iron homeostasis  proteobacteria Fur ECOLI PSEAE NEIMA HELPY BACSU Helicobacter pylori: sp|O25671 Bacillus subtilis: P54574sp| Neisseria meningitidis : sp|P0A0S7 Pseudomonas aeruginosa : sp|Q03456 Escherichia coli : P0A9A9sp| Mur / Fur  I rr- AGR C 249 SM irr RL irr1 RL irr2 MLr5570 MBNC BQ fur1 BMEI1955 BMEI1563 BJ blr1216 RB SKA ROS ISM OB Jann 1652 Rsph EE STM1w MED SPOA0445 RC irr RPA2339 RPA0424* BJ irr* Nwi 0035* Nham 1013* Nitrobacter hamburgensisX14 Nitrobacter winogradskyi Bradyrhizobium japonicum (I) Agrobacterium tumefaciens Rhizobium leguminosarum (I) Mesorhizobium sp. BNC1 Sinorhizobium meliloti Mesorhizobiumloti Bartonella quintana Brucella melitensis (I) Bradyrhizobium japonicum (II) Rhodobacter sphaeroides Rhodobactercapsulatus Silicibacter pomeroyi Silicibacter sp. TM1040 Roseobacter sp. MED193 Sulfitobacter sp. EE-36 Jannaschia sp. CC51 Oceanicola batsensis HTCC2597 Roseovarius nubinhibens ISM Roseovariussp.217 Loktanella vestfoldensis SKA53 Rhodobacterales bacterium HTCC2654  Rhizobiumetli RHE CH00106 Rhizobium leguminosarum (II) Brucella melitensis (II) Rhodopseudomonas palustris (II) Rhodopseudomonas palustris (I) PU Pelagibacter ubique HTCC1002

Sequence logos for the identified Irr binding sites in  -proteobacteria (8 species) - IrrThe A group The B group (4 species) - Irr The C group (12 species) - Irr

Phylogenetic tree of the Rrf2 family of transcription factors in  -proteobacteria proteins with the conserved C-X(6-9)-C(4-6)-C motif within effector-responsive domain proteins without a cysteine triad motif Iron repressor RirA (Rhizobium leguminosarum) Nitrite/NO-sensing regulator NsrR (Nitrosomonas europeae, Escherichia coli) Cysteine metabolism repressor CymR (Bacillus subtilis) Iron-Sulfur cluster synthesis repressor IscR (Escherichia coli) Positional clustering of rrf2-like genes with: iron uptake and storage genes; Fe-S cluster synthesis operons; genes involved in nitrosative stress protection; sulfate uptake/assimilation genes; thioredoxin reductase; carboxymuconolactone decarboxylase-family genes; hmc cytochrome operon Cytochrome complex regulator Rrf2 (Desulfovibrio vulgaris)

The A group - RirA(8 species) (12 species)The C group - RirA Sequence logos for the identified RirA-binding sites in  -proteobacteria

Genes Functions: Iron uptake Iron storage FeS synthesis Iron usage Heme biosynthesis Regulatory genes Manganese uptake Distribution of the conserved members of the Fe- and Mn-responsive regulons and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in  -proteobacteria

An attempt to reconstruct the history

Regulators and their signals Subtle changes at close evolutionary distances Cases of motif conservation at surprisingly large distances Correlation between contacting nucleotides and amino acid residues

DNA signals and protein-DNA interactions CRPPurR IHFTrpR Entropy at aligned sites and the number of contacts (heavy atoms in a base pair at a distance <cutoff from a protein atom)

Specificity-determining positions in the LacI family Training set: 459 sequences, average length: 338 amino acids, 85 specificity groups 10 residues contact NPF (analog of the effector) 6 residues in the intersubunit contacts 7 residues contact the operator sequence 7 residues in the effector contact zone (5Ǻ<d min <10Ǻ) 5 residues in the intersubunit contact zone (5Ǻ<d min <10Ǻ) 6 residues in the operator contact zone (5Ǻ<d min <10Ǻ) – 44 SDPs LacI from E.coli

The LacI family: subtle changes in signals at close distances G A n CGCG GnGn GCGC

CRP/FNR family of regulators

Correlation between contacting nucleotides and amino acid residues CooA in Desulfovibrio spp. CRP in Gamma-proteobacteria HcpR in Desulfovibrio spp. FNR in Gamma-proteobacteria DD COOA ALTTEQLSLHMGATRQTVSTLLNNLVR DV COOA ELTMEQLAGLVGTTRQTASTLLNDMIR EC CRP KITRQEIGQIVGCSRETVGRILKMLED YP CRP KXTRQEIGQIVGCSRETVGRILKMLED VC CRP KITRQEIGQIVGCSRETVGRILKMLEE DD HCPR DVSKSLLAGVLGTARETLSRALAKLVE DV HCPR DVTKGLLAGLLGTARETLSRCLSRMVE EC FNR TMTRGDIGNYLGLTVETISRLLGRFQK YP FNR TMTRGDIGNYLGLTVETISRLLGRFQK VC FNR TMTRGDIGNYLGLTVETISRLLGRFQK TGTCGGCnnGCCGACA TTGTgAnnnnnnTcACAA TTGTGAnnnnnnTCACAA TTGATnnnnATCAA Contacting residues: REnnnR TG: 1 st arginine GA: glutamate and 2 nd arginine

The correlation holds for other factors in the family

Open problems Model the evolution of regulatory systems (a catalog of elementary events, estimates of probabilities) –Birth of a binding site; what are the mechanisms? –Loss of a binding site –Duplication of a regulated gene and/or a regulator –Horizontal transfer of a regulated gene and/or a regulator –Loss of structural a gene and/or a regulator –General properties? Distribution of TF family and regulon sizes Stable cores and flexible margins of functional systems (in terms of gene presence and regulation) Co-evolution of TFs and DNA sites: –“Neutral” model for the evolution of binding sites (with invariant functional pressure from the bound protein) –How do the signals evolve? What is the driving force – changes in TFs? –TF-family, position-specific protein-DNA recognition code? All that needs to take into account the incompleteness and noise in the data

Acknowledgements Andrei A. Mironov (algorithms and software) Alexandra B. Rakhmaninova (SDPs) Dmitry Rodionov (now at Burnham Institute) (BioR, NrdR, iron) Olga Laikova (LacI, sugars) Dmitry Ravcheev (FruR) Olga Kalinina (SDPs/LacI) Leonid Mirny, MIT (protein/DNA contacts, SDPs) Andy Johnston, University of East Anglia (iron) Howard Hughes Medical Institute Russian Fund of Basic Research Russian Academy of Sciences, program “Molecular and Cellular Biology” INTAS