Bioinformatics Methods in University of Nebraska - Lincoln

Bioinformatics Methods in Redox Biology
Dmitri Fomenko
Redox Biology Center, University of Nebraska - Lincoln

Algorithm for identification of thiol oxidoreductases

Cysteines identification

Functional categories of cysteines
1. Cysteines with redox-catalytic activity. These cysteines are directly involved in catalysis and occur in oxidoreductases. Examples: thioredoxins, glutaredoxins, glutathione peroxidases, peroxiredoxins, methionine sulfoxide reductases.
2. Regulatory cysteines. Protein activity is regulated by the redox state of these non-catalytic cysteines. Examples: the transcription factors OxyR and Yap1, the chaperone Hsp33, mitochondrial branched-chain aminotransferase.
3. Structural cysteines. These cysteines form intramolecular and intermolecular disulfide bonds during oxidative folding and occur in various protein types.
4. Metal-coordinating cysteines. These residues coordinate divalent metal ions. Examples: iron-sulfur clusters, zinc-binding proteins, calcium-binding proteins.
5. Catalytic cysteines that do not change their redox state during catalysis. Examples: cysteine proteases, GAPDH.
Cysteine is one of the two least abundant amino acid residues in proteins, yet it is the most conserved amino acid. Functional cysteines are highly conserved even in distantly related organisms.

Major redox motif: CxxC (x, any amino acid)
CxxC-derived redox motifs: CxxS, CxxT, SxxC, TxxC
Cysteines in the CxxC redox motif may be replaced with selenocysteine (U).
Redox-active cysteines are accessible for interactions and are located on the protein surface.
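A minimal sketch of scanning a protein sequence for the CxxC-derived motifs listed above. The motif table and the choice to accept U wherever the slide allows a cysteine (since Sec may replace Cys in the motif) are assumptions made for illustration; a motif hit by itself says nothing about redox activity.

```python
import re

# Motif classes from the slide; x = any residue. U (selenocysteine) is
# accepted in cysteine positions, an illustrative assumption.
MOTIFS = {
    "CxxC": "[CU]..[CU]",
    "CxxS": "[CU]..S",
    "CxxT": "[CU]..T",
    "SxxC": "S..[CU]",
    "TxxC": "T..[CU]",
}


def find_redox_motifs(seq: str) -> list[tuple[str, int, str]]:
    """Return (motif_class, 0-based start, matched text) for every hit.

    A zero-width lookahead lets overlapping hits (e.g. in CxxCxxC) be reported.
    """
    seq = seq.upper()
    hits = []
    for name, core in MOTIFS.items():
        for m in re.finditer("(?=(" + core + "))", seq):
            hits.append((name, m.start(), m.group(1)))
    # Sort by position, then by motif class name.
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

For the N-terminal thioredoxin fragment `KGYTVFEFTAGWCPDCKVIEPDLPKLEKKY` (taken from the sequences shown later in this talk), the only hit is the CPDC active site at 0-based position 12.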

Thioredoxin fold (b-a-b-a-b-b-a)
Major representatives: thioredoxins, glutaredoxins, peroxiredoxins, glutathione peroxidases, protein disulfide isomerases (PDI).
More than 60% of known thiol oxidoreductases are thioredoxin-fold proteins.

Amino acid distribution around redox Cys

Cysteine vs. selenocysteine: identification of redox cysteines in protein sequences by homology to sporadic selenoproteins
Cysteine: pKa = 8.3; codons TGT, TGC
Selenocysteine: pKa = 5.2; codon TGA
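Because TGA normally means "stop", gene annotation pipelines tend to truncate selenoprotein genes at the Sec codon. The sketch below translates an ORF with the standard genetic code but reads any in-frame TGA that is not the final codon as U. This simple rule is an assumption for illustration only: real Sec assignment also requires evidence such as a SECIS element.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_sec_aware(orf: str) -> str:
    """Translate an ORF, reading internal in-frame TGA codons as Sec (U).

    Assumes the ORF starts at its start codon. Translation stops at TAA/TAG,
    or at a TGA only when it is the last codon. RNA input (U) is accepted.
    """
    orf = orf.upper().replace("U", "T")
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    protein = []
    for i, codon in enumerate(codons):
        aa = CODON_TABLE.get(codon, "X")  # X for ambiguous or unknown codons
        if aa == "*":
            if codon == "TGA" and i < len(codons) - 1:
                protein.append("U")  # internal in-frame TGA -> selenocysteine
                continue
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate_sec_aware("ATGTGTTGCTGAGGTTAA")` gives `"MCCUG"`: both TGT and TGC become cysteine, and the internal TGA becomes selenocysteine.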

Selenocysteine is a better catalytic group in proteins than cysteine.
Selenocysteine occurs much less frequently than cysteine.
All known selenoproteins are redox proteins, and selenocysteine is located in their catalytic centers.
Most selenoproteins have cysteine-containing homologs.
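One commonly cited contributor to Sec's catalytic advantage is its lower pKa. The Henderson-Hasselbalch equation gives the fraction of side chains that are deprotonated (thiolate or selenolate) as 1 / (1 + 10^(pKa - pH)). A minimal sketch, using the pKa values from the previous slide and an assumed physiological pH of 7.4 (the pH is not from the slides):

```python
def anion_fraction(pka: float, ph: float) -> float:
    """Fraction of the side chain in the deprotonated (anionic) form.

    Henderson-Hasselbalch: [A-] / ([HA] + [A-]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10 ** (pka - ph))


PH = 7.4  # assumed physiological pH, not taken from the slides
for name, pka in [("Cys", 8.3), ("Sec", 5.2)]:
    print(f"{name}: pKa {pka} -> {anion_fraction(pka, PH):.1%} ionized at pH {PH}")
```

Under this assumption, roughly 11% of cysteine side chains but over 99% of selenocysteine side chains are ionized at pH 7.4.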

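Selenocysteine incorporation system (schematic of an mRNA, 5' to 3': AUG start codon, in-frame UGA decoded as selenocysteine, stop codon UAA/UAG/UGA, SECIS element)
Eukaryotes: the SECIS element lies in the 3' untranslated region, downstream of the stop codon and upstream of the poly(A) tail (distance indicated: 100-5000 bp).
Bacteria: the SECIS element lies within the coding region, immediately downstream of the selenocysteine UGA codon (distance indicated: 20-50 bp).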

Major sequence alignment tools
One of the most popular sequence analysis and alignment tools is BLAST (Basic Local Alignment Search Tool). BLAST is a set of sequence comparison programs used to search sequence databases for high-scoring local alignments to a query sequence. BLAST programs combine good sensitivity with reasonable speed. They are available as standalone versions for most computer platforms and operating systems, and as a web service at http://www.ncbi.nlm.nih.gov/BLAST/. NCBI sequence databases are updated daily and contain publicly available protein and nucleotide sequences. PSI-BLAST (Position-Specific Iterated BLAST) is a more sensitive tool for protein sequence alignment and can be extremely useful for detecting similarity between distantly related proteins.
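When BLAST is run locally, tabular output is the easiest to post-process. BLAST+ programs (blastp, tblastn, and others) write it with `-outfmt 6`, and legacy blastall writes it with `-m 8`. By default it has 12 tab-separated columns. The sketch below parses that format, applies an E-value cutoff, and keeps the best hit per query by bit score. The cutoff value and the sample hits are invented for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (BLAST+ -outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_outfmt6(text: str, max_evalue: float = 1e-5) -> list[dict]:
    """Parse tabular BLAST output and keep hits with E-value <= max_evalue."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (-outfmt 7)
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


def best_hit_per_query(hits: list[dict]) -> dict[str, dict]:
    """Keep the highest-bit-score hit for each query."""
    best = {}
    for h in hits:
        if h["qseqid"] not in best or h["bitscore"] > best[h["qseqid"]]["bitscore"]:
            best[h["qseqid"]] = h
    return best


# Hypothetical hits for illustration only (IDs and scores are invented).
SAMPLE = (
    "q1\ts1\t45.2\t120\t60\t2\t1\t118\t5\t124\t3e-20\t85.1\n"
    "q1\ts2\t30.0\t100\t68\t3\t10\t105\t1\t98\t0.01\t40.0\n"
    "q2\ts3\t60.0\t90\t36\t0\t1\t90\t1\t90\t1e-30\t120.5\n"
)
for query, hit in best_hit_per_query(parse_outfmt6(SAMPLE)).items():
    print(query, hit["sseqid"], hit["evalue"])
```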

tblastn - output for AhpD hydroperoxide reductase

Rhodanese superfamily – 28 protein families CD01448 CD00158 CD01444 CD01524

Arsenic methyltransferases

General algorithm for identification of thiol oxidoreductases by searching for Sec/Cys pairs
Resources: http://genomics.unl.edu/REDOX/REDOXCysSearch/, http://www.selenodb.org/, http://genome.unl.edu/SECISearch.html
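The core of the Sec/Cys pair idea is an alignment check: a Cys in a homolog that aligns to the Sec (U) of a selenoprotein is a candidate redox-active cysteine. A minimal sketch, assuming the two rows of a pairwise alignment are already available as gapped strings (for example, from a BLAST alignment). The function name and the sample alignment are hypothetical.

```python
def sec_cys_pairs(aln_sec: str, aln_other: str) -> list[tuple[int, int]]:
    """Return (sec_pos, cys_pos) 1-based residue numbers where a Sec (U) in
    the first aligned sequence is matched by a Cys (C) in the second.

    Both inputs are rows of the same pairwise alignment ('-' marks a gap).
    """
    if len(aln_sec) != len(aln_other):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    pos_a = pos_b = 0  # residue counters that skip gaps
    for a, b in zip(aln_sec.upper(), aln_other.upper()):
        if a != "-":
            pos_a += 1
        if b != "-":
            pos_b += 1
        if a == "U" and b == "C":
            pairs.append((pos_a, pos_b))
    return pairs
```

With the hypothetical alignment rows `"MKT-CGGUQL"` and `"MKTACPGCQL"`, the Sec at residue 7 of the first sequence pairs with the Cys at residue 8 of the second.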

Identification of conserved cysteines and redox motifs
Redox-active cysteines are typically conserved even in distantly related proteins. The blastall and PSI-BLAST programs from the BLAST package can be used for conservation profile analysis.
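Once homologs have been collected and aligned, a conservation profile for each cysteine is just a per-column count. The sketch below assumes a multiple alignment given as equal-length strings, with the protein of interest as the first row. It reports reference cysteines for which at least `min_fraction` of the sequences carry C or U in the same column. The 0.9 default is an arbitrary illustrative threshold, not a value from this talk.

```python
def conserved_cysteines(msa: list[str], min_fraction: float = 0.9) -> list[tuple[int, float]]:
    """Find conserved cysteines of the first (reference) alignment row.

    For each column where the reference has C (or Sec, U), compute the
    fraction of rows that also have C or U there (gaps count as not
    conserved). Return (1-based reference residue number, fraction) for
    columns at or above min_fraction.
    """
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all alignment rows must have the same length")
    ref = msa[0].upper()
    result = []
    ref_pos = 0
    for col, ref_aa in enumerate(ref):
        if ref_aa == "-":
            continue
        ref_pos += 1
        if ref_aa not in ("C", "U"):
            continue
        column = [row[col].upper() for row in msa]
        frac = sum(aa in ("C", "U") for aa in column) / len(column)
        if frac >= min_fraction:
            result.append((ref_pos, frac))
    return result


# Toy alignment (invented); row 4 has Sec in place of the second Cys.
MSA = ["MCGPCKAC", "MCGPCRAS", "LCAPCKA-", "MCGPUKAA"]
print(conserved_cysteines(MSA))
```

Here the two cysteines of the CGPC motif are fully conserved, while the C-terminal cysteine is present in only one of four sequences and is filtered out.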

Metal-binding cysteines
Major metal-coordinating residues: cysteine and histidine.
Major metal-binding motif: CxxC.
Metal-binding cysteines are conserved even in distantly related organisms.
More than 90% of CxxC motifs are involved in metal coordination.
Metal-binding proteins are the major source of false-positive hits when identifying thiol oxidoreductases.
Some metal-binding cysteines are involved in redox regulation.

Metal-binding protein patterns and profiles in the PROSITE database (http://ca.expasy.org/prosite/)
There are 77 zinc-binding protein patterns, including 36 patterns that contain one cysteine; 30 of these 36 contain one CxxC motif. Of 74 iron-binding protein patterns, 31 contain one cysteine; 15 of these 31 contain one CxxC motif.
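To apply PROSITE patterns as a filter, they first have to be turned into something a program can match. A minimal sketch that converts common PROSITE pattern syntax into a Python regular expression. It supports `x`, `[..]`, `{..}`, the repeat counts `(n)` and `(n,m)`, and the `<`/`>` terminal anchors. The rarer `>` inside brackets (as in `[G>]`) is deliberately not handled. The converter is an illustration, not an official PROSITE tool.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE pattern (e.g. 'C-x(2)-C') into a Python regex."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        # Split an element into its residue part and an optional (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} = any residue except these
        # [..] and single residue letters are already valid regex syntax.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)
```

For example, the CxxC motif written as the pattern `C-x(2)-C` becomes `C.{2}C`, which matches `CGPC` in `AACGPCAA`.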

Peroxiredoxins (thiol/disulfide oxidoreductases); galactitol-1-phosphate dehydrogenases (Zn)

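The CxxC motif conservation filter
For a CxxC motif (C-X1-X2-C), the amino acids found at positions X1 and X2 in homologous sequences form conservation-profile distributions, sorted by decreasing count:
$$X_1 = \{N_1(AA_1), N_2(AA_2), N_3(AA_3), \ldots, N_n(AA_n)\}, \quad N_1 > N_2 > N_3 > \cdots > N_n$$
$$X_2 = \{P_1(AA_1), P_2(AA_2), P_3(AA_3), \ldots, P_n(AA_n)\}, \quad P_1 > P_2 > P_3 > \cdots > P_n$$
where $AA$ denotes an amino acid and $N_i$, $P_i$ are the numbers of occurrences of each amino acid at X1 and X2, respectively.
Metal-binding proteins:
$$\frac{N_1 + N_2}{\sum_{i=1}^{n} N_i} + \frac{P_1 + P_2}{\sum_{i=1}^{n} P_i} \le 1$$
Redox proteins:
$$\frac{N_1 + N_2}{\sum_{i=1}^{n} N_i} + \frac{P_1 + P_2}{\sum_{i=1}^{n} P_i} > 1$$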
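The conservation filter reduces to a short computation. For each of the two positions inside the motif, take the share of the two most frequent amino acids (N1 + N2 over the total, and likewise for P), add the two shares, and compare the sum with 1. A minimal sketch, assuming the residues observed at X1 and X2 across aligned homologs have already been collected (gaps excluded). The function names and the toy inputs are hypothetical.

```python
from collections import Counter


def top_two_fraction(residues: list[str]) -> float:
    """(N1 + N2) / sum(Ni): share of the two most frequent amino acids."""
    counts = sorted(Counter(residues).values(), reverse=True)
    return sum(counts[:2]) / sum(counts)


def cxxc_conservation_score(x1: list[str], x2: list[str]) -> float:
    """Sum of the top-two shares at positions X1 and X2 of C-X1-X2-C."""
    return top_two_fraction(x1) + top_two_fraction(x2)


def classify_cxxc(x1: list[str], x2: list[str]) -> str:
    """Score > 1 -> redox; score <= 1 -> metal-binding (rule from the slide)."""
    return "redox" if cxxc_conservation_score(x1, x2) > 1 else "metal-binding"


# Toy data: a conserved CGPC-like motif vs. highly variable intervening positions.
print(classify_cxxc(list("GGGGGGGGGA"), list("PPPPPPPPPH")))  # score 2.0
print(classify_cxxc(list("AKEGSTRQNL"), list("DEKRSTAGHY")))  # score 0.4
```

In the first case, each position is dominated by one or two residues, so each share equals 1 and the motif is classified as redox. In the second case, ten different residues appear at each position, so each share is 0.2 and the motif is classified as metal-binding.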

The CxxC motif distance filter (two CxxC motifs in one protein: CxxCxxxxxx…xxxxxxCxxC)
Metal-binding proteins: distance between the motifs ≤ 80 amino acids
Redox proteins: distance between the motifs > 80 amino acids
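A minimal sketch of the distance rule. The slide does not say exactly how the distance is measured, so this code assumes it is the number of residues between the start positions of consecutive CxxC motifs. It also returns "undetermined" when a protein has fewer than two motifs, because the rule does not apply then; that label is an addition for illustration.

```python
import re


def cxxc_positions(seq: str) -> list[int]:
    """0-based start positions of CxxC motifs (overlapping motifs allowed)."""
    return [m.start() for m in re.finditer(r"(?=C..C)", seq.upper())]


def classify_by_cxxc_distance(seq: str, max_metal_distance: int = 80) -> str:
    """Apply the slide's distance rule to a protein sequence.

    Assumption: distance = difference between start positions of
    consecutive CxxC motifs. 'metal-binding' if any such distance is
    <= max_metal_distance, 'redox' if all are larger, and 'undetermined'
    when fewer than two motifs are present.
    """
    starts = cxxc_positions(seq)
    if len(starts) < 2:
        return "undetermined"
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return "metal-binding" if min(gaps) <= max_metal_distance else "redox"
```

For example, two motifs 24 residues apart are classified as metal-binding. Two CGHC motifs separated by 100 alanines (a toy stand-in for a two-domain, PDI-like protein) are classified as redox.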

Distribution of conserved amino acids flanking conserved cysteines in Saccharomyces cerevisiae

Filters:
1. PROSITE pattern-based filter
2. Amino acid conservation-based filter
3. CxxC motif distance-based filter

Distribution of conserved amino acids around conserved cysteines in Saccharomyces cerevisiae proteins after filtering out metal-binding proteins

Structure-based methods for prediction of thiol oxidoreductases
Secondary structure prediction: PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/), SSpro (http://www.igb.uci.edu/tools/scratch/)

Secondary structure distribution (thioredoxin-fold proteins included): in thioredoxin-fold proteins, the redox motif is flanked by a beta strand and an alpha helix (beta-CxxC-alpha).
Secondary structure distribution (non-thioredoxin-fold proteins only): in most redox proteins, the active cysteine is followed by an alpha helix (CxxC-alpha).
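The beta-CxxC-alpha observation can be checked against any per-residue secondary-structure prediction. The code expects strings in the style shown in the following slides: H for helix, E for strand, anything else for coil. A minimal sketch; the 4-residue windows used to decide "preceded by a strand" and "followed by a helix" are assumptions, not values from this talk. The example uses a 30-residue N-terminal fragment of the thioredoxin sequence shown later, with a secondary-structure string written to mirror its prediction.

```python
import re


def cxxc_ss_context(seq: str, ss: str, window: int = 4) -> list[dict]:
    """Describe predicted secondary structure around each CxxC/CxxU motif.

    helix_after: any H among the `window` residues after the first Cys.
    strand_before: any E among the `window` residues before it.
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary-structure strings differ in length")
    out = []
    for m in re.finditer(r"(?=C..[CU])", seq.upper()):
        i = m.start()
        after = ss[i + 1:i + 1 + window]
        before = ss[max(0, i - window):i]
        out.append({
            "pos": i + 1,  # 1-based position of the first Cys
            "motif": seq[i:i + 4],
            "helix_after": "H" in after,
            "strand_before": "E" in before,
        })
    return out


SEQ = "KGYTVFEFTAGWCPDCKVIEPDLPKLEKKY"
SS = "___" + "E" * 6 + "_" * 4 + "H" * 16 + "_"
print(cxxc_ss_context(SEQ, SS))
```

For this fragment, the CPDC motif at position 13 is preceded by a strand and followed by a helix, which is the beta-CxxC-alpha arrangement described above.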

Predicted secondary structure (H, alpha helix; E, beta strand; _, coil). Fold topology in parentheses (b, beta strand; a, alpha helix).
Selenoprotein M (b-a-b-b-b-a)
MSILLSPPSLLLLLAALVAPATSTTNYRPDWNRLRGLARGRVETCGGUQLNRLKEVKAFVTEDIQLYHNLVMKHLPGADPELVLLSRNYQELERIPLSQMTRDEINALVQELGFYRKS
________________________________________EEEEE________HHHHH___________EEEE_______EEEEEE______EEEEEE___HHHHHHHHH________
15 kDa Protein (b-a-b-b-b-a)
LDQQPAAQRTYAKAILEVCTUKFRAYPQIQAFIQSGRPAKFPNLQIKYVRGLDPVVKLLDASGKVQETLSITKWNTDTVEEFFETHLAKDGAGKNS
____________EEEEEEE_______HHHHHHHHH_______EEEEEEE_____EEEEEE_____EEEE________HHHHHHHHH__________

Selenoprotein W (b-a-b-b-b-a)
MTETKPRIAIRYCTQUNWLLRAGWMAQEILQTFASDIGEVSLIPSTGGLFEITVDGTIIWERKRDGGFPGPKELKQRIRDLIDPERDLGHVDR
______EEEEEEE_____HHHHHHHHHHHHHHHHHH___EEEE_____EEEEEE__EEEEEE________HHHHHHHHHHH____________
Rdx12 (b-a-b-b-b-a)
MSGEPGQVSVVPPPGEVEAGSGVHIVVEYCKPCGFEATYLELASSLEEEYPGIEIESRLGGTGAFEIEINGQLVFSKLENGGFPYEKDLMEAIRRASNGEPLEKITNSRPP
_______________________EEEEEEE_____HHHHHHHHHHHHHH___EEEEE______EEEEEE__EEEEEEE_______HHHHHHHHHHH_____HHHHH_____
Selenoprotein H (a-b-a-b-b-a-a)
AAVVAVAEKREKLANGGEGMEEATVVIEHCTSCRVYGRNAAALSQALRLEAPELPVKVNPTKPRRGSFEVTLLRPDGSSAELWTGIKKGPPRKLKFPEPQEVVEELKKYLS
_HHHHHH________________EEEEEE____HHHHHHHHHHHHHHHHH____EEEE_________EEEEE_______HHHHHHH____________HHHHHHHHHHHH_
Selenoprotein T (b-b-a-b-a-a-a-a-b-b)
GGVPSKRLKMQYATGPLLKFQICVSUGYRRVFEEYMRVISQRYPDIRIEGENYLPQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQAPSIWQWGQENKVYACMMVFFLSNMIENQCMSTGAFEITLNDVPVWSKLES
______EEEEEE____EEEEEEEEE___HHHHHHHHHHHHHH____EEE______HHHHHHHHHHHHHHHHHHHHHH_____HHHH______HHHHHH___HHHHHHHHHHHHHHHHHHH___EEEEEEE__EEEEEEE__
Selenoprotein M (b-a-b-b-b-a)
LLLAALVAPATSTTNYRPDWNRLRGLARGRVETCGGUQLNRLKEVKAFVTEDIQLYHNLVMKHLPGADPELVLLSRNYQELERIPLSQMTRDEINALVQELGFYRKS
_____________________________EEEEE________HHHHH___________EEEE_______EEEEEE______EEEEEE___HHHHHHHHH________
15 kDa Protein (b-a-b-b-b-a)
LDQQPAAQRTYAKAILEVCTUKFRAYPQIQAFIQSGRPAKFPNLQIKYVRGLDPVVKLLDASGKVQETLSITKWNTDTVEEFFETHLAKDGAGKNS
____________EEEEEEE_______HHHHHHHHH_______EEEEEEE_____EEEEEE_____EEEE________HHHHHHHHH__________
Thioredoxin (b-a-b-a-b-b-a)
KGYTVFEFTAGWCPDCKVIEPDLPKLEKKYSQLQFVSVDRDQFIDICVQNDILGIPSFLIFKEGQLLGSYIGKERKSIDQIDQFLSQHI
___EEEEEE____HHHHHHHHHHHHHHHH___EEEEEE_HHHHHHHHHH_______EEEEEE__EEEEEEE___HHHHHHHHHHHHH__

Annotation of proteins: sequence or structural similarity to known proteins; functional associations

Prediction of protein 3D structure from primary sequence: 3D-Jury system (http://bioinfo.pl/meta/)

Distribution of SelR and MsrA across sequenced genomes (table columns: SelR, MsrA)
Eukaryotes: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae
Archaea: Aeropyrum pernix, Sulfolobus solfataricus, Sulfolobus tokodaii, Archaeoglobus fulgidus, Halobacterium sp. NRC-1, Methanothermobacter thermautotrophicus, Methanococcus jannaschii, Pyrococcus abyssi, Pyrococcus horikoshii, Thermoplasma acidophilum, Thermoplasma volcanium
Bacteria: Aquifex aeolicus, Chlamydia muridarum, Chlamydia trachomatis, Chlamydophila pneumoniae AR39, Chlamydophila pneumoniae CWL029, Chlamydophila pneumoniae J138, Synechocystis sp. PCC 6803, Mycobacterium leprae, Mycobacterium tuberculosis CDC1551, Mycobacterium tuberculosis H37Rv, Bacillus halodurans, Bacillus subtilis, Clostridium acetobutylicum, Mycoplasma genitalium, Mycoplasma pneumoniae, Mycoplasma pulmonis, Ureaplasma urealyticum, Lactococcus lactis subsp. lactis, Staphylococcus aureus subsp. aureus Mu50, Staphylococcus aureus subsp. aureus N315, Streptococcus pneumoniae R6, Streptococcus pneumoniae TIGR4, Streptococcus pyogenes M1 GAS, Caulobacter crescentus, Agrobacterium tumefaciens, Mesorhizobium loti, Sinorhizobium meliloti, Rickettsia conorii, Rickettsia prowazekii, Neisseria meningitidis MC58, Neisseria meningitidis Z2491, Campylobacter jejuni, Helicobacter pylori 26695, Helicobacter pylori J99, Escherichia coli K12, Escherichia coli O157:H7, Escherichia coli O157:H7 EDL933, Yersinia pestis, Buchnera sp. APS, Vibrio cholerae, Xylella fastidiosa 9a5c, Haemophilus influenzae Rd, Pasteurella multocida, Pseudomonas aeruginosa, Borrelia burgdorferi, Treponema pallidum, Thermotoga maritima, Deinococcus radiodurans

Prediction of protein functional associations: STRING (Search Tool for the Retrieval of Interacting Genes/Proteins), http://string.embl.de/
Example: yeast mitochondrial glutaredoxin Grx5 (COG0278)
>gi|6325198| Grx5p [Saccharomyces cerevisiae]
MFLPKFNPIRSFSPILRAKTLLRYQNRMYLSTEIRKAIEDAIESAPVVLFMKGTPEFPKCGFSRATIGLL
GNQGVDPAKFAAYNVLEDPELREGIKEFSEWPTIPQLYVNKEFIGGCDVITSMARSGELADLLEEAQALV
PEEEEETKDR

Genome neighborhood for Grx (COG0278)

Co-occurrence prediction for Grx (COG0278)
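Co-occurrence (phylogenetic profile) prediction rests on a simple idea: gene families that are consistently present or absent together across genomes are likely to be functionally linked. STRING uses its own scoring scheme, which is not reproduced here. The sketch below is a generic illustration that compares two presence/absence profiles by simple agreement and by the Jaccard index. The genome IDs and profiles are hypothetical.

```python
def profile_similarity(a: dict[str, bool], b: dict[str, bool]) -> dict[str, float]:
    """Compare two presence/absence phylogenetic profiles over shared genomes.

    agreement: fraction of genomes in which both profiles have the same state.
    jaccard: |present in both| / |present in either|.
    """
    genomes = sorted(set(a) & set(b))
    if not genomes:
        raise ValueError("profiles share no genomes")
    agree = sum(a[g] == b[g] for g in genomes) / len(genomes)
    both = sum(a[g] and b[g] for g in genomes)
    either = sum(a[g] or b[g] for g in genomes)
    jaccard = both / either if either else 0.0
    return {"agreement": agree, "jaccard": jaccard}


# Hypothetical profiles over five genomes (g1..g5), for illustration only.
GRX = {"g1": True, "g2": True, "g3": False, "g4": True, "g5": False}
PARTNER = {"g1": True, "g2": True, "g3": False, "g4": False, "g5": False}
print(profile_similarity(GRX, PARTNER))
```

Agreement rewards shared absences as well as shared presences. The Jaccard index ignores genomes in which neither family is found, which matters when most genomes lack both families.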