Bioinformatics Methods in Redox Biology
Dmitri Fomenko, Redox Biology Center, University of Nebraska - Lincoln
Algorithm for identification of thiol oxidoreductases
Cysteines identification
Functional categories of cysteines
1. Cysteines with redox-catalytic activity. Such cysteines are directly involved in catalysis and occur in oxidoreductases. Examples: thioredoxins, glutaredoxins, glutathione peroxidases, peroxiredoxins, methionine sulfoxide reductases.
2. Regulatory cysteines. Protein activity is regulated by the redox state of these non-catalytic cysteines. Examples: transcription factors OxyR and Yap1, chaperone Hsp33, mitochondrial branched-chain aminotransferase.
3. Structural cysteines. These cysteines form intramolecular and intermolecular disulfide bonds during oxidative folding and occur in various protein types.
4. Metal-coordinating cysteines. These residues are involved in coordination of divalent metal ions. Examples: iron-sulfur clusters, zinc-binding proteins, calcium-binding proteins.
5. Catalytic cysteines that do not change their redox state during catalysis. Examples: cysteine proteases, GAPDH.
Cysteine is one of the two least abundant amino acid residues in proteins, but it is the most conserved amino acid. Functional cysteines are highly conserved even in distantly related organisms.
Major redox motif - CxxC (x - any amino acid)
CxxC-derived redox motifs: CxxS, CxxT, SxxC, TxxC
Cysteines in the CxxC redox motif may be replaced with selenocysteine (U).
Redox-active cysteines are accessible for interactions and located on the protein surface.
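The motif list above translates directly into a sequence scan. The following is a minimal sketch, not the method used in the talk: the function name `find_redox_motifs` is hypothetical, and allowing U in place of C in the derived motifs (not only in CxxC) is an assumption made for illustration.

```python
import re

# CxxC and the derived motifs; U (selenocysteine) is allowed wherever C appears.
# Applying the C/U substitution to the derived motifs is an assumption.
MOTIF_PATTERNS = {
    "CxxC": "[CU]..[CU]",
    "CxxS": "[CU]..S",
    "CxxT": "[CU]..T",
    "SxxC": "S..[CU]",
    "TxxC": "T..[CU]",
}


def find_redox_motifs(sequence):
    """Return sorted (start, motif_name, matched_text) tuples; start is 0-based."""
    sequence = sequence.upper()
    hits = []
    for name, pattern in MOTIF_PATTERNS.items():
        # The lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)
```

A hit only marks a candidate: as later slides note, most CxxC motifs bind metals, so the scan needs the downstream filters before any redox assignment.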
Thioredoxin (b-a-b-a-b-b-a)
Major representatives: thioredoxins, glutaredoxins, peroxiredoxins, glutathione peroxidases, protein disulfide isomerases (PDI).
More than 60% of known thiol oxidoreductases are thioredoxin-fold proteins.
Amino acid distribution around redox Cys
Identification of redox cysteines in protein sequences by homology to sporadic selenoproteins
Cysteine: pKa = 8.3; codons TGT, TGC
Selenocysteine: pKa = 5.2; codon TGA
Selenocysteine is a better catalytic group in proteins than cysteine.
Selenocysteine has a much lower occurrence than cysteine.
All known selenoproteins are redox proteins, and selenocysteine is located in their catalytic centers.
Most selenoproteins have cysteine-containing homologs.
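Selenocysteine incorporation system
[Diagram of mRNA, 5' to 3'. Eukaryotes: initiation of translation (AUG), in-frame UGA decoded as selenocysteine, STOP codon (UAA, UAG, or UGA), SECIS element, poly(A) tail; distance label 100 - 5000 bp. Bacteria: initiation of translation (AUG), in-frame UGA, SECIS element, STOP codon (UAA, UAG, or UGA); distance label 20 - 50 bp.]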
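Because selenocysteine is encoded by UGA, a first pass over a coding sequence can simply list in-frame TGA codons upstream of the terminal codon. The sketch below does only that; the helper name `in_frame_tga_positions` is hypothetical, and it does not look for a SECIS element, which is required for a UGA to be decoded as selenocysteine.

```python
def in_frame_tga_positions(cds):
    """Return 0-based codon indices of in-frame TGA codons before the last codon.

    `cds` is a coding sequence starting at the first base of the start codon.
    RNA input is accepted (U is converted to T). Trailing bases that do not
    complete a codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # The final codon is treated as the terminal stop and is not reported.
    return [i for i, codon in enumerate(codons[:-1]) if codon == "TGA"]
```

Each reported index is only a candidate Sec position that still needs a SECIS check and, ideally, a cysteine-containing homolog aligned at the same site.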
Major sequence alignment tools
One of the most popular sequence analysis and alignment tools is BLAST (Basic Local Alignment Search Tool). BLAST is a set of sequence comparison programs used to search sequence databases for optimal local alignments to a query sequence. BLAST programs have good sensitivity and are reasonably fast. They are available as standalone versions for most computer platforms and operating systems, and as a web version at http://www.ncbi.nlm.nih.gov/BLAST/. NCBI sequence databases are updated daily and contain all protein and nucleotide sequences from open resources. A specialized, more sensitive tool for protein sequence searches is PSI-BLAST (Position-Specific Iterative BLAST), which can be extremely useful for identifying similarity between distantly related proteins.
tblastn - output for AhpD hydroperoxide reductase
Rhodanese superfamily – 28 protein families CD01448 CD00158 CD01444 CD01524
Arsenic methyltransferases
General algorithm for identification of thiol oxidoreductases by searching for Sec/Cys pairs http://genomics.unl.edu/REDOX/REDOXCysSearch/ http://www.selenodb.org/ http://genome.unl.edu/SECISearch.html
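The core test in a Sec/Cys search is whether a selenocysteine in one sequence aligns with a cysteine in its homolog. A minimal sketch of that check on an already computed pairwise alignment follows; the function name `sec_cys_pairs` is hypothetical, and producing the alignment itself (for example with tblastn) is outside the sketch.

```python
def sec_cys_pairs(aligned_a, aligned_b):
    """Return 0-based alignment columns where one sequence has U and the other C.

    Both inputs are rows of the same pairwise alignment (gaps as '-'),
    so they must have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    for column, (a, b) in enumerate(zip(aligned_a.upper(), aligned_b.upper())):
        if {a, b} == {"U", "C"}:
            pairs.append(column)
    return pairs
```

For instance, pairing the Selenoprotein W fragment `RYCTQUNWLL` with the Rdx12 fragment `EYCKPCGFEA` without gaps (an illustrative pairing, not an alignment taken from the talk) reports column 5, where U faces C.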
Identification of conserved cysteines and redox motifs
Redox-active cysteines are typically conserved even in distantly related proteins. The blastall and PSI-BLAST programs from the BLAST package can be used for conservation profile analyses.
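Once homologs are aligned, cysteine conservation can be read off column by column. The sketch below assumes the alignment is supplied as a list of equal-length strings; the function name, the 0.9 default threshold, and the choice to count U together with C are illustrative assumptions, not values from the talk.

```python
def conserved_cysteine_columns(alignment, min_fraction=0.9):
    """Return 0-based columns where at least `min_fraction` of rows have C or U."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have equal length")
    columns = []
    for column in range(length):
        # Gaps ('-') and any other residue count as non-cysteine.
        count = sum(1 for row in alignment if row[column].upper() in ("C", "U"))
        if count / len(alignment) >= min_fraction:
            columns.append(column)
    return columns
```

Called on the three toy rows `["WCPDCK", "YCTQUN", "YCKPCG"]`, it returns columns 1 and 4, the two positions of the CxxC motif.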
Metal-binding cysteines
Major metal-coordinating residues: cysteine and histidine.
Major metal-binding motif: CxxC.
Metal-binding cysteines are conserved even in distantly related organisms.
More than 90% of CxxC motifs are involved in metal coordination.
Metal-binding proteins are the major false-positive hits in the identification of thiol oxidoreductases.
Some metal-binding cysteines are involved in redox regulation.
Metal-binding protein patterns and profiles in the PROSITE database http://ca.expasy.org/prosite/ There are 77 zinc-binding protein patterns, including 36 patterns that contain one cysteine. 30 of these 36 patterns contain one CxxC motif. Of 74 iron-binding protein patterns, 31 contain one cysteine. 15 of these 31 contain one CxxC motif.
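To screen candidates against patterns like these, a PROSITE-style pattern can be translated into a regular expression. The converter below is a sketch covering a common subset of the pattern syntax (`x`, `[...]`, `{...}`, repeat counts, and the `<` and `>` anchors) and assumes well-formed input; the patterns in the usage note are illustrative and are not actual PROSITE entries.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as 'C-x(2)-C' into a Python regex."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count: x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            core = "."                       # any amino acid
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any amino acid except those listed
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)
```

With this, `prosite_to_regex("C-x(2)-C")` yields a regex that `re.search` can run over a protein sequence; proteins matching a known metal-binding pattern would then be set aside as likely false positives.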
Peroxiredoxins – thiol/disulfide oxidoreductase Galactitol-1-phosphate dehydrogenases (Zn)
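The CxxC motif conservation filter
Conservation-profile-based distribution of amino acids at positions X1 and X2 of the C-X1-X2-C motif:

$$X_1 = \{N_1(AA_1),\ N_2(AA_2),\ N_3(AA_3),\ \dots,\ N_n(AA_n)\}, \qquad N_1 > N_2 > N_3 > \dots > N_n$$

$$X_2 = \{P_1(AA_1),\ P_2(AA_2),\ P_3(AA_3),\ \dots,\ P_n(AA_n)\}, \qquad P_1 > P_2 > P_3 > \dots > P_n$$

AA - amino acid; $N_i$, $P_i$ - number of occurrences of amino acid $AA_i$ at position X1 and X2, respectively.

Metal-binding proteins:

$$\frac{N_1(AA_1) + N_2(AA_2)}{N_1(AA_1) + N_2(AA_2) + \dots + N_n(AA_n)} + \frac{P_1(AA_1) + P_2(AA_2)}{P_1(AA_1) + P_2(AA_2) + \dots + P_n(AA_n)} \le 1$$

Redox proteins:

$$\frac{N_1(AA_1) + N_2(AA_2)}{N_1(AA_1) + N_2(AA_2) + \dots + N_n(AA_n)} + \frac{P_1(AA_1) + P_2(AA_2)}{P_1(AA_1) + P_2(AA_2) + \dots + P_n(AA_n)} > 1$$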
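The conservation filter sums, for the two x positions of the motif, the share of observations covered by the two most frequent amino acids, and compares that sum with 1. A minimal sketch of the calculation follows; the function names are hypothetical, and the input is assumed to be the residues observed at X1 and at X2 across a set of homologs (both non-empty).

```python
from collections import Counter


def top2_fraction(residues):
    """Fraction of observations covered by the two most frequent amino acids."""
    counts = Counter(residues)
    top_two = sum(n for _, n in counts.most_common(2))
    return top_two / sum(counts.values())


def classify_cxxc(x1_residues, x2_residues):
    """Return (score, label) using the threshold from the slide.

    score <= 1 is labeled 'metal-binding'; score > 1 is labeled 'redox'.
    """
    score = top2_fraction(x1_residues) + top2_fraction(x2_residues)
    return score, ("redox" if score > 1 else "metal-binding")
```

For example, X1 residues `PPPPGG` and X2 residues `DDDHHY` give 6/6 + 5/6, above the threshold, while fully variable positions such as `AGPSTV` and `DEKRHN` give 2/6 + 2/6 and fall below it.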
The CxxC motif distance filter
Metal-binding proteins: CxxC motifs separated by <= 80 amino acids (CxxCxxxxxx...xxxxxxCxxC)
Redox proteins: CxxC motifs separated by > 80 amino acids (CxxCxxxxxx...xxxxxxCxxC)
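The distance rule can be sketched as a check on the spacing between CxxC motifs in one sequence. The slide does not say how the distance is measured, so the sketch assumes it runs from the start of one motif to the start of the next and uses the smallest such spacing; the function name and the `None` result for sequences with fewer than two motifs are likewise assumptions.

```python
import re


def cxxc_distance_class(sequence, max_metal_distance=80):
    """Classify a sequence by the spacing of its CxxC motifs.

    Returns 'metal-binding' if any two consecutive CxxC motifs start within
    `max_metal_distance` residues of each other, 'redox' if all are farther
    apart, and None if the sequence has fewer than two CxxC motifs.
    """
    starts = [m.start() for m in re.finditer(r"(?=C..C)", sequence.upper())]
    if len(starts) < 2:
        return None
    min_gap = min(b - a for a, b in zip(starts, starts[1:]))
    return "metal-binding" if min_gap <= max_metal_distance else "redox"
```

Sequences with a single CxxC motif pass through this filter unchanged and are left to the pattern and conservation filters.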
Distribution of conserved amino acids flanking conserved cysteines in Saccharomyces cerevisiae
PROSITE pattern-based filter
Amino acid conservation-based filter
CxxC motif distance-based filter
Distribution of conserved amino acids around conserved cysteines in Saccharomyces cerevisiae proteins after filtering out metal binding-proteins
Structure-based methods for prediction of thiol oxidoreductases
Secondary structure prediction:
PSIPRED - http://bioinf.cs.ucl.ac.uk/psipred/
SSpro - http://www.igb.uci.edu/tools/scratch/
The active cysteine is followed by an alpha helix in most redox proteins.
Secondary structure distribution (thioredoxin-fold proteins included): in thioredoxin-fold proteins the redox motif is flanked by a beta strand and an alpha helix (beta-CxxC-alpha).
Secondary structure distribution (non-thioredoxin-fold proteins only): CxxC-alpha.
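The CxxC-alpha observation can be checked against a predicted secondary structure string of the same length as the sequence (H for helix, E for strand, as produced by tools such as PSIPRED). The sketch below is illustrative only: the function name, the 6-residue window after the motif, and the 3-residue minimum helix length are assumptions, not parameters from the talk.

```python
import re


def cxxc_followed_by_helix(sequence, ss, window=6, min_helix=3):
    """For each CxxC motif, report whether a helix starts in or just after it.

    Returns a list of (start, followed_by_helix) tuples with 0-based starts.
    `ss` is a per-residue secondary structure string (H = helix) and must
    be the same length as `sequence`. U is accepted in place of C.
    """
    if len(sequence) != len(ss):
        raise ValueError("sequence and secondary structure must have equal length")
    results = []
    for match in re.finditer(r"(?=[CU]..[CU])", sequence.upper()):
        start = match.start()
        # Inspect the motif itself (4 residues) plus `window` residues after it.
        region = ss[start:start + 4 + window].upper()
        results.append((start, "H" * min_helix in region))
    return results
```

A motif reported as `False` is not excluded by this check alone; the slide states the helix follows the active cysteine in most, not all, redox proteins.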
Selenoprotein M (b-a-b-b-b-a)
MSILLSPPSLLLLLAALVAPATSTTNYRPDWNRLRGLARGRVETCGGUQLNRLKEVKAFVTEDIQLYHNLVMKHLPGADPELVLLSRNYQELERIPLSQMTRDEINALVQELGFYRKS
________________________________________EEEEE________HHHHH___________EEEE_______EEEEEE______EEEEEE___HHHHHHHHH________

15 kDa Protein (b-a-b-b-b-a)
LDQQPAAQRTYAKAILEVCTUKFRAYPQIQAFIQSGRPAKFPNLQIKYVRGLDPVVKLLDASGKVQETLSITKWNTDTVEEFFETHLAKDGAGKNS
____________EEEEEEE_______HHHHHHHHH_______EEEEEEE_____EEEEEE_____EEEE________HHHHHHHHH__________

Selenoprotein W (b-a-b-b-b-a)
MTETKPRIAIRYCTQUNWLLRAGWMAQEILQTFASDIGEVSLIPSTGGLFEITVDGTIIWERKRDGGFPGPKELKQRIRDLIDPERDLGHVDR
______EEEEEEE_____HHHHHHHHHHHHHHHHHH___EEEE_____EEEEEE__EEEEEE________HHHHHHHHHHH____________

Rdx12 (b-a-b-b-b-a)
MSGEPGQVSVVPPPGEVEAGSGVHIVVEYCKPCGFEATYLELASSLEEEYPGIEIESRLGGTGAFEIEINGQLVFSKLENGGFPYEKDLMEAIRRASNGEPLEKITNSRPP
_______________________EEEEEEE_____HHHHHHHHHHHHHH___EEEEE______EEEEEE__EEEEEEE_______HHHHHHHHHHH_____HHHHH_____

Thioredoxin (b-a-b-a-b-b-a)
KGYTVFEFTAGWCPDCKVIEPDLPKLEKKYSQLQFVSVDRDQFIDICVQNDILGIPSFLIFKEGQLLGSYIGKERKSIDQIDQFLSQHI
___EEEEEE____HHHHHHHHHHHHHHHH___EEEEEE_HHHHHHHHHH_______EEEEEE__EEEEEEE___HHHHHHHHHHHHH__

Selenoprotein H (a-b-a-b-b-a-a)
AAVVAVAEKREKLANGGEGMEEATVVIEHCTSCRVYGRNAAALSQALRLEAPELPVKVNPTKPRRGSFEVTLLRPDGSSAELWTGIKKGPPRKLKFPEPQEVVEELKKYLS
_HHHHHH________________EEEEEE____HHHHHHHHHHHHHHHHH____EEEE_________EEEEE_______HHHHHHH____________HHHHHHHHHHHH_

Selenoprotein T (b-b-a-b-a-a-a-a-b-b)
GGVPSKRLKMQYATGPLLKFQICVSUGYRRVFEEYMRVISQRYPDIRIEGENYLPQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQAPSIWQWGQENKVYACMMVFFLSNMIENQCMSTGAFEITLNDVPVWSKLES
______EEEEEE____EEEEEEEEE___HHHHHHHHHHHHHH____EEE______HHHHHHHHHHHHHHHHHHHHHH_____HHHH______HHHHHH___HHHHHHHHHHHHHHHHHHH___EEEEEEE__EEEEEEE__
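The topology labels in parentheses (for example b-a-b-b-b-a) can be approximated from the per-residue strings, where E marks a beta strand and H an alpha helix. The sketch below collapses each run of E or H into one element; the function name and the 3-residue minimum run length are assumptions, and the slide labels may reflect manual curation that this naive collapse does not reproduce.

```python
import re


def ss_topology(ss, min_length=3):
    """Collapse a per-residue string (E = strand, H = helix) into e.g. 'b-a-b'.

    Runs shorter than `min_length` residues are ignored; any other
    character is treated as coil.
    """
    labels = {"E": "b", "H": "a"}
    elements = []
    for match in re.finditer(r"E+|H+", ss.upper()):
        run = match.group()
        if len(run) >= min_length:
            elements.append(labels[run[0]])
    return "-".join(elements)
```

Applied to the Selenoprotein W and Thioredoxin strings above, this reproduces their labels. For Rdx12 it reports one extra trailing helix (b-a-b-b-b-a-a), because the short C-terminal helix in the prediction is counted as well.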
Annotation of proteins
Sequence or structural similarity to known proteins
Functional associations
Prediction of protein 3D structure from primary sequence
3D-Jury system http://bioinfo.pl/meta/
Occurrence of SelR and MsrA in Eukaryotes, Archaea, and Bacteria
Eukaryotes: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae
Archaea: Aeropyrum pernix, Sulfolobus solfataricus, Sulfolobus tokodaii, Archaeoglobus fulgidus, Halobacterium sp. NRC-1, Methanothermobacter thermautotrophicus, Methanococcus jannaschii, Pyrococcus abyssi, Pyrococcus horikoshii, Thermoplasma acidophilum, Thermoplasma volcanium
Bacteria: Aquifex aeolicus, Chlamydia muridarum, Chlamydia trachomatis, Chlamydophila pneumoniae AR39, Chlamydophila pneumoniae CWL029, Chlamydophila pneumoniae J138, Synechocystis sp. PCC 6803, Mycobacterium leprae, Mycobacterium tuberculosis CDC1551, Mycobacterium tuberculosis H37Rv, Bacillus halodurans, Bacillus subtilis, Clostridium acetobutylicum, Mycoplasma genitalium, Mycoplasma pneumoniae, Mycoplasma pulmonis, Ureaplasma urealyticum, Lactococcus lactis subsp. lactis, Staphylococcus aureus subsp. aureus Mu50, Staphylococcus aureus subsp. aureus N315, Streptococcus pneumoniae R6, Streptococcus pneumoniae TIGR4, Streptococcus pyogenes M1 GAS, Caulobacter crescentus, Agrobacterium tumefaciens, Mesorhizobium loti, Sinorhizobium meliloti, Rickettsia conorii, Rickettsia prowazekii, Neisseria meningitidis MC58, Neisseria meningitidis Z2491, Campylobacter jejuni, Helicobacter pylori 26695, Helicobacter pylori J99, Escherichia coli K12, Escherichia coli O157:H7, Escherichia coli O157:H7 EDL933, Yersinia pestis, Buchnera sp. APS, Vibrio cholerae, Xylella fastidiosa 9a5c, Haemophilus influenzae Rd, Pasteurella multocida, Pseudomonas aeruginosa, Borrelia burgdorferi, Treponema pallidum, Thermotoga maritima, Deinococcus radiodurans
Prediction of protein functional associations
STRING - Search Tool for the Retrieval of Interacting Genes/Proteins http://string.embl.de/
Example - yeast mitochondrial glutaredoxin Grx5 (COG0278)
>gi|6325198| Grx5p [Saccharomyces cerevisiae]
MFLPKFNPIRSFSPILRAKTLLRYQNRMYLSTEIRKAIEDAIESAPVVLFMKGTPEFPKCGFSRATIGLL
GNQGVDPAKFAAYNVLEDPELREGIKEFSEWPTIPQLYVNKEFIGGCDVITSMARSGELADLLEEAQALV
PEEEEETKDR
Genome neighborhood for Grx (COG0278)
Co-occurrence prediction for Grx (COG0278)