Evolution of bacterial regulatory systems Mikhail Gelfand Research and Training Center “Bioinformatics” Institute for Information Transmission Problems.


Evolution of bacterial regulatory systems
Mikhail Gelfand
Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems, Moscow, Russia
CASB-20, UCSD, La Jolla, III.2009

Plan
Co-evolution of transcription factors and their binding motifs
Evolution of regulatory systems and regulons

Regulators and their motifs
Cases of motif conservation at surprisingly large distances
Subtle changes at close evolutionary distances
Correlation between contacting nucleotides and amino acid residues

NrdR (regulator of ribonucleotide reductases and some other replication-related genes): conservation at large distances

DNA motifs and protein-DNA interactions
CRP, PurR, IHF, TrpR
Entropy at aligned sites and the number of contacts (heavy atoms in a base pair at a distance <cutoff from a protein atom)
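The "entropy at aligned sites" on this slide can be illustrated with a minimal sketch, assuming the usual Shannon entropy in bits per alignment column. The function name `column_entropies` and the four toy sites are invented for illustration and are not taken from the talk. The contact counts from the structures are not modelled here.

```python
import math
from collections import Counter

def column_entropies(sites):
    """Shannon entropy (bits) of every column in a list of aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        total = sum(counts.values())
        # A fully conserved column gives 0 bits; four equiprobable bases give 2 bits.
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies

# Toy input (hypothetical): two conserved columns, two columns split 50/50.
toy_sites = ["ACGT", "ACGA", "ACTT", "ACTA"]
print(column_entropies(toy_sites))
```

Low-entropy columns are the conserved motif positions. The slide compares these values with the number of protein contacts per base pair.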

The CRP/FNR family of regulators

Correlation between contacting nucleotides and amino acid residues
Regulators: CooA in Desulfovibrio spp.; CRP in Gamma-proteobacteria; HcpR in Desulfovibrio spp.; FNR in Gamma-proteobacteria
DD COOA  ALTTEQLSLHMGATRQTVSTLLNNLVR
DV COOA  ELTMEQLAGLVGTTRQTASTLLNDMIR
EC CRP   KITRQEIGQIVGCSRETVGRILKMLED
YP CRP   KXTRQEIGQIVGCSRETVGRILKMLED
VC CRP   KITRQEIGQIVGCSRETVGRILKMLEE
DD HCPR  DVSKSLLAGVLGTARETLSRALAKLVE
DV HCPR  DVTKGLLAGLLGTARETLSRCLSRMVE
EC FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
YP FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
VC FNR   TMTRGDIGNYLGLTVETISRLLGRFQK
Motifs: TGTCGGCnnGCCGACA; TTGTgAnnnnnnTcACAA; TTGTGAnnnnnnTCACAA; TTGATnnnnATCAA
Contacting residues: REnnnR
TG: 1st arginine
GA: glutamate and 2nd arginine
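The REnnnR pattern can be read directly off the aligned helix sequences on this slide. The sketch below is illustrative only: the sequences are copied from the slide, while the 0-based column indices (14, 15, 19) are an assumption obtained by locating "RETVGR" in the E. coli CRP row. The helper name `contact_signature` is hypothetical.

```python
# Aligned recognition-helix sequences copied from the slide (27 residues each).
helices = {
    "DD COOA": "ALTTEQLSLHMGATRQTVSTLLNNLVR",
    "DV COOA": "ELTMEQLAGLVGTTRQTASTLLNDMIR",
    "EC CRP":  "KITRQEIGQIVGCSRETVGRILKMLED",
    "YP CRP":  "KXTRQEIGQIVGCSRETVGRILKMLED",
    "VC CRP":  "KITRQEIGQIVGCSRETVGRILKMLEE",
    "DD HCPR": "DVSKSLLAGVLGTARETLSRALAKLVE",
    "DV HCPR": "DVTKGLLAGLLGTARETLSRCLSRMVE",
    "EC FNR":  "TMTRGDIGNYLGLTVETISRLLGRFQK",
    "YP FNR":  "TMTRGDIGNYLGLTVETISRLLGRFQK",
    "VC FNR":  "TMTRGDIGNYLGLTVETISRLLGRFQK",
}

# Assumption: columns of the R, E and second R in "REnnnR", taken from EC CRP.
CONTACT_POSITIONS = (14, 15, 19)

def contact_signature(seq, positions=CONTACT_POSITIONS):
    """Return the residues found at the assumed DNA-contacting columns."""
    return "".join(seq[i] for i in positions)

signatures = {name: contact_signature(seq) for name, seq in helices.items()}
for name, sig in signatures.items():
    print(f"{name:8s} {sig}")
```

At these columns the CRP and HcpR rows carry R, E, R, the FNR rows carry V, E, R, and the CooA rows carry R, Q, T. This is the kind of residue-to-motif covariation the slide points to.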

The correlation holds for other factors in the family

The LacI family: subtle changes in motifs at close distances
Motif logo labels: G A n CGCG GnGn GCGC

The LacI family: systematic analysis
1369 DNA-binding domains in 200 orthologous rows =35%, =71 aa
binding sites, L=20 nt, =45%
Calculate mutual information between columns of TF and site alignments
Set threshold on mutual information of correlated pairs
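The two computational steps on this slide can be sketched as follows, assuming the standard definition of mutual information between two paired alignment columns (one from the TF alignment, one from the alignment of its sites). The function names, the threshold value and the toy alignments are illustrative assumptions, not the parameters used in the study.

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (bits) between two paired columns of equal length."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must be paired one-to-one")
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in pab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi

def correlated_pairs(proteins, sites, threshold):
    """List (protein_column, site_column, MI) for every pair with MI >= threshold.

    proteins[k] and sites[k] must belong to the same regulator/site pair.
    """
    pairs = []
    for i in range(len(proteins[0])):
        p_col = [p[i] for p in proteins]
        for j in range(len(sites[0])):
            s_col = [s[j] for s in sites]
            mi = mutual_information(p_col, s_col)
            if mi >= threshold:
                pairs.append((i, j, mi))
    return pairs

# Toy data (hypothetical): protein column 1 co-varies with site column 0.
toy_proteins = ["AR", "AR", "AK", "AK"]
toy_sites = ["GT", "GT", "AT", "AT"]
print(correlated_pairs(toy_proteins, toy_sites, threshold=0.5))
```

A fixed threshold is the simplest choice. Raw mutual information is biased upward for small samples, which is one reason to also judge each pair against a shuffled background.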

Definitions
Protein alignment:
LAFDHDQILQMAQERLQGKVRYQP-IGFELLPEKFSLRQLQRMYETVLGRS---LDKRNF
LAFDHNQILDYGYQRLRNKLEYSP-IAFEVLPELFTLNDLFQLYTTVLGED--FADYSNF
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF
LAFDHSKILAYGHRRLCNKLEYSP-VAFDVLPEYFTLNDLYQFYSTVLGAN--FSDYSNF
LAFDHSKILAYGHRRLCNKLEYSP-VAFDVLPEYFTLNDLYQFYSTVLGAN--FSDYSNF
LAFDHSKILAYGHRRLCNKLEYSP-VAFDVLPEYFTLNDLYQFYSTVLGAN--FSDYSNF
LAFDHNQILDYGYQRLRNKLEYSP-IAFEVLPELFTLNDLFQLYTTVLGED--FADYSNF
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF
LSFDHNEILAYGHRRLRNKLEYSP-VAFEVLPEMFTLNDLYQLYTTVLGEN--FSDYSNF
Sites:
tTAaTGgCTTTAtGcCACTAT
TTAaaGTAAtAaTTACCATAA
AaAtTGTCTTTAtGcCACTAT
TTATGGTAAATTcTACCATAA
TTATgGTCAgTTTcACcAaAA
tTAaTGgCTTTAtGcCACTAT
TTaGTCgAAATAaccaACtAA
TTATCGTCAtCtcGACGACAA
TttAGGTAAgTTATACTTTTA
Computed for each pair of columns: mutual information, Z-score
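The slide names a Z-score next to the mutual information but the transcript does not say how it was defined. One common way, shown here purely as an assumption, is to compare the observed value with the distribution obtained after shuffling one of the two columns. The function names, the number of shuffles and the seed are illustrative choices.

```python
import math
import random
import statistics
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (bits) between two paired columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum((c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pab.items())

def mi_z_score(col_a, col_b, n_shuffles=200, seed=0):
    """Z-score of the observed MI against MI values after shuffling col_b.

    Shuffling breaks the pairing between the columns but keeps their composition.
    Returns 0.0 when the shuffled values do not vary (e.g. a constant column).
    """
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(mutual_information(col_a, shuffled))
    sd = statistics.pstdev(null)
    if sd == 0:
        return 0.0
    return (observed - statistics.mean(null)) / sd

# Toy columns (hypothetical): a perfectly co-varying amino acid / nucleotide pair.
print(mi_z_score("RRRRKKKK", "GGGGAAAA"))
```

Simple shuffling ignores the phylogenetic relatedness of the sequences, so a high score on its own does not rule out a purely historical signal.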

Correlated pairs

Higher order correlations
Protein: -ATIKDVAKRANVSTTTV-
Site: AATTGTGAGCGCTCACT
Labels: SLSQ, TL, TQ

Not a phylogenetic trace

NrtR (regulator of NAD metabolism)

Comparison with the recently solved structure: correlated positions indeed bind the DNA (more exactly, form a hydrophobic cluster)

Catalog of events
Expansion and contraction of regulons
New regulators (where from?)
Duplications of regulators with or without regulated loci
Loss of regulators with or without regulated loci
Re-assortment of regulators and structural genes
… especially in complex systems
Horizontal transfer

Regulon expansion, or how FruR has become CRA
CRA (a.k.a. FruR) in Escherichia coli:
  – global regulator
  – well-studied in experiment (many regulated genes known)
Going back in time: looking for candidate CRA/FruR sites upstream of (orthologs of) genes known to be regulated in E. coli
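Searching for candidate sites upstream of genes is commonly done with a position weight matrix (PWM) built from known sites. The talk does not state which tool or scoring scheme was used, so the following is a generic sketch under stated assumptions. The training sites are the nine sites from the "Definitions" slide, reused only as example input (they are not CRA/FruR sites). The pseudocount, the uniform background, the function names and the flanking sequence in the usage lines are all hypothetical.

```python
import math

# Example input only: the nine aligned sites listed on the "Definitions" slide.
TRAINING_SITES = [
    "tTAaTGgCTTTAtGcCACTAT", "TTAaaGTAAtAaTTACCATAA", "AaAtTGTCTTTAtGcCACTAT",
    "TTATGGTAAATTcTACCATAA", "TTATgGTCAgTTTcACcAaAA", "tTAaTGgCTTTAtGcCACTAT",
    "TTaGTCgAAATAaccaACtAA", "TTATCGTCAtCtcGACGACAA", "TttAGGTAAgTTATACTTTTA",
]

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds matrix: one {base: score} dict per motif position."""
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2(((column.count(b) + pseudocount) / total) / background)
                    for b in "ACGT"})
    return pwm

def scan(sequence, pwm):
    """Score every window of the forward strand; returns a list of (offset, score)."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Bases outside ACGT (e.g. N) contribute a neutral score of 0.
        hits.append((start, sum(col.get(base, 0.0) for col, base in zip(pwm, window))))
    return hits

def best_hit(sequence, pwm):
    """Return the (offset, score) of the highest-scoring window."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])

pwm = build_pwm(TRAINING_SITES)
# Hypothetical upstream region: one training site embedded in GC-rich flanks.
upstream = "GCGCGCGC" + TRAINING_SITES[3].upper() + "GCGCGCGC"
print(best_hit(upstream, pwm))
```

In a comparative setting the same matrix would be applied to the upstream regions of orthologous genes in each genome. A real analysis would also scan the reverse strand and calibrate a score cutoff.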

Common ancestor of gamma-proteobacteria
Genes: icdA aceA aceB aceEF pckA ppsA pykF adhE gpmA pgk tpiA gapA pfkA fbp fruK fruBA eda edd epd ptsHI-crr manXYZ mtlD mtlA
Sugars: Fructose, Glucose, Mannose, Mannitol
Legend: Gamma-proteobacteria

Common ancestor of the Enterobacteriales
Genes: icdA aceA aceB aceEF pckA ppsA pykF adhE gpmA pgk tpiA gapA pfkA fbp fruK fruBA eda edd epd ptsHI-crr manXYZ mtlD mtlA
Sugars: Fructose, Glucose, Mannose, Mannitol
Legend: Gamma-proteobacteria, Enterobacteriales

Common ancestor of Escherichia and Salmonella
Genes: icdA aceA aceB aceEF pckA ppsA pykF adhE gpmA pgk tpiA gapA pfkA fbp fruK fruBA eda edd epd ptsHI-crr manXYZ mtlD mtlA
Sugars: Fructose, Glucose, Mannose, Mannitol
Legend: Gamma-proteobacteria, Enterobacteriales, E. coli and Salmonella spp.

Regulation of amino acid biosynthesis in the Firmicutes
Interplay between regulatory RNA elements and transcription factors
Expansion of T-box systems (normally – RNA structures regulating aminoacyl-tRNA synthetases)

Recent duplications and bursts: ARG-T-box in Clostridium difficile

… caused by loss of transcription factor AhrC

Duplications and changes in specificity: ASN/ASP/HIS T-boxes

Blow-up 1

Blow-up 2. Prediction
Regulators lost in lineages with expanded HIS-T-box regulon?

… and validation
conserved motifs upstream of HIS biosynthesis genes
candidate transcription factor yerC co-localized with the his genes
present only in genomes with the motifs upstream of the his genes
genomes with neither YerC motif nor HIS-T-boxes: attenuators
Taxa shown: Bacillales (his operon), Clostridiales, Thermoanaerobacteriales, Halanaerobiales, Bacillales

The evolutionary history of the his genes regulation in the Firmicutes

T-boxes: Summary / History

Life without Fur

Regulation of iron homeostasis (the Escherichia coli paradigm)
Iron:
  – essential cofactor (limiting in many environments)
  – dangerous at large concentrations
FUR (responds to iron):
  – synthesis of siderophores
  – transport (siderophores, heme, Fe2+, Fe3+)
  – storage
  – iron-dependent enzymes
  – synthesis of heme
  – synthesis of Fe-S clusters
Similar in Bacillus subtilis

Regulation of iron homeostasis in α-proteobacteria
Experimental studies:
  – FUR/MUR: Bradyrhizobium, Rhizobium and Sinorhizobium
  – RirA (Rrf2 family): Rhizobium and Sinorhizobium
  – Irr (FUR family): Bradyrhizobium, Rhizobium and Brucella
Diagram labels: transcription factors (Fur, RirA, Irr, IscR); signals (Fe, FeS, heme; [+Fe], [-Fe]; RirA degraded; FeS status of cell); targets (iron uptake systems: siderophore uptake, Fe2+/Fe3+ uptake; iron storage ferritins; FeS synthesis; heme synthesis; iron-requiring enzymes [iron cofactor])

Distribution of transcription factors in genomes
Search for candidate motifs and binding sites using standard comparative genomic techniques

Regulation of genes in functional subsystems
Rhizobiales; Bradyrhizobiaceae; Rhodobacteriales; The Zoo (likely ancestral state)

Reconstruction of history
Appearance of the iron-Rhodo motif
Frequent co-regulation with Irr
Strict division of function with Irr

All logos and Some Very Tempting Hypotheses:
1. Cross-recognition of FUR and IscR motifs in the ancestor.
2. When FUR had become MUR, and IscR had been lost in Rhizobiales, emerging RirA (from the Rrf2 family, with a rather different general consensus) took over their sites.
3. Iron-Rhodo boxes are recognized by IscR: directly testable

Summary and open problems
Regulatory systems are very flexible
  – easily lost
  – easily expanded (in particular, by duplication)
  – may change specificity
  – rapid turnover of regulatory sites
With more stories like these, we can start thinking about a general theory
  – catalog of elementary events; how frequent?
  – mechanisms (duplication, birth e.g. from enzymes, horizontal transfer)
  – conserved (regulon cores) and non-conserved (marginal regulon members) genes in relation to metabolic and functional subsystems/roles
  – (TF family-specific) protein-DNA recognition code
  – distribution of TF families in genomes; distribution of regulon sizes; etc.

People
Andrei A. Mironov – software, algorithms
Alexandra Rakhmaninova – SDP, protein-DNA correlations
Anna Gerasimova (now at LBNL) – NadR
Olga Kalinina (on loan to EMBL) – SDP
Yuri Korostelev – protein-DNA correlations
Olga Laikova – LacI
Dmitry Ravcheev – CRA/FruR
Dmitry Rodionov (on loan to Burnham Institute) – iron etc.
Alexei Vitreschak – T-boxes and riboswitches
Andy Johnston (U. of East Anglia) – experimental validation (iron)
Leonid Mirny (MIT) – protein-DNA, SDP
Andrei Osterman (Burnham Institute) – experimental validation
Howard Hughes Medical Institute
Russian Foundation of Basic Research
Russian Academy of Sciences, program “Molecular and Cellular Biology”