Bioinformatics Methods in Redox Biology


1 Bioinformatics Methods in Redox Biology
Dmitri Fomenko, Redox Biology Center, University of Nebraska - Lincoln

2 Algorithm for identification of thiol oxidoreductases

3 Cysteines identification

4 Functional categories of cysteines
1. Cysteines with redox-catalytic activity. Such cysteines are directly involved in catalysis and occur in oxidoreductases. Examples: thioredoxins, glutaredoxins, glutathione peroxidases, peroxiredoxins, methionine sulfoxide reductases.
2. Regulatory cysteines. Protein activity is regulated by the redox state of these non-catalytic cysteines. Examples: the transcription factors OxyR and Yap1, the chaperone Hsp33, mitochondrial branched-chain aminotransferase.
3. Structural cysteines. These cysteines are involved in the formation of intramolecular and intermolecular disulfide bonds during oxidative folding and occur in various protein types.
4. Metal-coordinating cysteines. These residues are involved in the coordination of divalent metal ions. Examples: iron-sulfur clusters, zinc-binding proteins, calcium-binding proteins.
5. Catalytic cysteines that do not change their redox state during catalysis. Examples: cysteine proteases, GAPDH.
Cysteine is one of the two least abundant amino acid residues in proteins, but it is the most conserved amino acid. Functional cysteines are highly conserved even in distantly related organisms.

5 Major redox motif - CxxC
x – any amino acid
CxxC-derived redox motifs: CxxS, CxxT, SxxC, TxxC
Cysteines in the CxxC redox motif may be replaced with selenocysteine (U).
Redox-active cysteines are accessible for interactions and located on the protein surface.
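The motif definitions above translate directly into a pattern search. The following is a minimal sketch in Python, not the author's implementation; the function name `find_redox_motifs` is hypothetical, and allowing U in the derived motifs (for example UxxS) is an assumption that goes slightly beyond the slide, which only mentions selenocysteine replacing cysteine in CxxC.

```python
import re

# Lookahead so that overlapping motifs are all reported.
# C = cysteine, U = selenocysteine, S = serine, T = threonine.
REDOX_MOTIF = re.compile(r"(?=([CUST]..[CUST]))")


def find_redox_motifs(sequence):
    """Return (0-based position, motif) for CxxC and CxxC-derived motifs."""
    hits = []
    for match in REDOX_MOTIF.finditer(sequence.upper()):
        motif = match.group(1)
        # Keep the motif only if at least one end is Cys or Sec (assumption:
        # U is accepted wherever C is); this drops SxxS, SxxT, TxxS, TxxT.
        if motif[0] in "CU" or motif[3] in "CU":
            hits.append((match.start(), motif))
    return hits


# Fragments taken from the thioredoxin and selenoprotein W sequences
# that appear later in this transcript.
print(find_redox_motifs("TAGWCPDCKV"))  # [(4, 'CPDC')]
print(find_redox_motifs("RYCTQUNW"))    # [(2, 'CTQU')]
```

A plain sequence scan like this cannot tell redox motifs from metal-binding ones, which is why the later slides add conservation and distance filters.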

6 Thioredoxin (b-a-b-a-b-b-a)
Major representatives: thioredoxins, glutaredoxins, peroxiredoxins, glutathione peroxidases, protein disulfide isomerases (PDI).
More than 60% of known thiol oxidoreductases are thioredoxin-fold proteins.

7 Amino acid distribution around redox Cys

8 Cysteine Selenocysteine
Identification of redox cysteines in protein sequences by homology to sporadic selenoproteins
Cysteine: pKa = 8.3; codons TGT, TGC
Selenocysteine: pKa = 5.2; codon TGA
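To see what the two pKa values imply, the Henderson-Hasselbalch relation gives the fraction of each side chain that is deprotonated at a given pH. The sketch below is only an illustration: the choice of pH 7.0 and the function name `deprotonated_fraction` are assumptions, and the pKa values are the ones quoted on the slide for the free amino acids (values inside a protein can differ).

```python
def deprotonated_fraction(pka, ph=7.0):
    """Fraction of a single ionizable group that is deprotonated at `ph`
    (Henderson-Hasselbalch relation)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


cys = deprotonated_fraction(8.3)  # cysteine thiol, pKa from the slide
sec = deprotonated_fraction(5.2)  # selenocysteine selenol, pKa from the slide

print(f"Cys deprotonated at pH 7: {cys:.1%}")
print(f"Sec deprotonated at pH 7: {sec:.1%}")
```

Under these assumptions roughly 5% of cysteine but more than 98% of selenocysteine is in the reactive, deprotonated form at neutral pH.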

9 Selenocysteine is a better catalytic group in proteins than cysteine
Selenocysteine occurs much less frequently than cysteine.
All known selenoproteins are redox proteins, and selenocysteine is located in their catalytic centers.
Most selenoproteins have cysteine-containing homologs.

10 Selenocysteine incorporation system
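[Figure: mRNA schematics of selenocysteine incorporation in eukaryotes and bacteria. Labels in both panels: 5' end, initiation of translation (AUG), SECIS element, stop codon (UAA, UAG, or UGA), 3' end. The eukaryotic panel also labels the selenocysteine codon (UGA) and the poly(A) tail (AAAAAAA). Distances in bp are not preserved in the transcript.]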

11

12 Major sequence alignment tools
One of the most popular tools for sequence analysis and alignment is BLAST (Basic Local Alignment Search Tool). BLAST is a set of sequence comparison programs that search sequence databases for optimal local alignments to a query sequence. BLAST programs have good sensitivity and are reasonably fast. BLAST is available as a standalone version for most computer platforms and operating systems, and as a web version at NCBI. The NCBI sequence databases are updated daily and contain all protein and nucleotide sequences from open resources. A specialized, more sensitive tool for protein sequence alignment is PSI-BLAST (Position-Specific Iterative BLAST), which can be extremely useful for identifying similarity between distantly related proteins.

13 tblastn - output for AhpD hydroperoxide reductase

14 Rhodanese superfamily – 28 protein families
CD01448 CD00158 CD01444 CD01524

15 Arsenic methyltransferases

16 General algorithm for identification of thiol oxidoreductases by searching for Sec/Cys pairs
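The core test in a Sec/Cys pair search is whether selenocysteine (U) in one sequence is aligned with cysteine (C) in a homolog. The sketch below assumes the two sequences are already aligned (equal length, gaps as `-`); the function name `sec_cys_pairs` is hypothetical, and the example fragments are selenoprotein W and Rdx12 residues from this transcript, lined up by eye purely for illustration.

```python
def sec_cys_pairs(aligned_query, aligned_hit):
    """Return (column, query residue, hit residue) for every alignment
    column where U in one sequence faces C in the other."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    for column, (q, h) in enumerate(zip(aligned_query.upper(), aligned_hit.upper())):
        if {q, h} == {"U", "C"}:
            pairs.append((column, q, h))
    return pairs


# Selenoprotein W fragment vs. Rdx12 fragment (ungapped, aligned by eye).
print(sec_cys_pairs("RYCTQUNWLL", "EYCKPCGFEA"))  # [(5, 'U', 'C')]
```

In translated searches such as tblastn, an in-frame TGA is normally reported as a stop symbol rather than U, so real alignments would need that symbol mapped to U before a check like this; that step is not shown here.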

17 Identification of conserved cysteines and redox motifs
Redox-active cysteines are typically conserved even in distantly related proteins. The blastall and PSI-BLAST programs from the BLAST package can be used for conservation profile analyses.
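Once homologs have been collected and aligned, conserved cysteines can be picked out column by column. This is a minimal sketch that assumes a gapped multiple alignment supplied as equal-length strings; the function name `conserved_cys_columns` and the 90% threshold are illustrative choices, not values from the slides.

```python
def conserved_cys_columns(alignment, min_fraction=0.9):
    """Return 0-based alignment columns in which Cys (C) or Sec (U)
    occupies at least `min_fraction` of the sequences."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have equal length")
    columns = []
    for i in range(length):
        hits = sum(1 for row in alignment if row[i] in ("C", "U"))
        if hits / len(alignment) >= min_fraction:
            columns.append(i)
    return columns


# Toy alignment of four CxxC-type motifs with one flanking residue.
print(conserved_cys_columns(["WCPDC", "YCKPC", "YCTQU", "FCGPC"]))  # [1, 4]
```

Counting U together with C is an assumption that follows the Sec/Cys pairing idea used earlier in the deck.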

18 Metal binding cysteines
Major metal-coordinating residues: cysteine and histidine
Major metal-binding motif: CxxC
Metal-binding cysteines are conserved even in distantly related organisms.
More than 90% of CxxC motifs are involved in metal coordination.
Metal-binding proteins are the major false-positive hits when identifying thiol oxidoreductases.
Some metal-binding cysteines are involved in redox regulation.

19 Metal-binding protein patterns and profiles in the PROSITE database
There are 77 zinc-binding protein patterns, including 36 patterns that contain one cysteine. 30 of these 36 patterns contain one CxxC motif. Of 74 iron-binding protein patterns, 31 contain one cysteine. 15 of these 31 contain one CxxC motif.

20 Peroxiredoxins – thiol/disulfide oxidoreductase
Galactitol-1-phosphate dehydrogenases (Zn)

21 The CxxC motif conservation filter
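Conservation-profile-based distribution for X1 and X2, the two x positions of the CxxC motif:

$$X_1:\ \{N_1(AA_1),\ N_2(AA_2),\ N_3(AA_3),\ \dots,\ N_n(AA_n)\}, \qquad N_1 > N_2 > N_3 > \dots > N_n$$

$$X_2:\ \{P_1(AA_1),\ P_2(AA_2),\ P_3(AA_3),\ \dots,\ P_n(AA_n)\}, \qquad P_1 > P_2 > P_3 > \dots > P_n$$

AA - amino acid; N_n, P_n - number of amino acids.

Metal-binding proteins:

$$\frac{N_1(AA_1) + N_2(AA_2)}{N_1(AA_1) + N_2(AA_2) + \dots + N_n(AA_n)} + \frac{P_1(AA_1) + P_2(AA_2)}{P_1(AA_1) + P_2(AA_2) + \dots + P_n(AA_n)} \le 1$$

Redox proteins:

$$\frac{N_1(AA_1) + N_2(AA_2)}{N_1(AA_1) + N_2(AA_2) + \dots + N_n(AA_n)} + \frac{P_1(AA_1) + P_2(AA_2)}{P_1(AA_1) + P_2(AA_2) + \dots + P_n(AA_n)} > 1$$

(The operator joining the two fractions is not preserved in the transcript; a sum is assumed here.)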
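The conservation filter on this slide can be sketched as follows: for each of the two x positions of aligned CxxC motifs, take the share of the column held by its two most frequent amino acids, then compare the combined value with 1. The transcript does not preserve how the two fractions are combined, so the code assumes a sum; the function names and the toy motif lists are hypothetical.

```python
from collections import Counter


def top_two_fraction(residues):
    """Share of a column taken by its two most frequent amino acids."""
    if not residues:
        raise ValueError("column is empty")
    counts = Counter(residues)
    top_two = sum(n for _, n in counts.most_common(2))
    return top_two / len(residues)


def classify_cxxc(motifs):
    """Classify aligned 4-letter CxxC motifs from homologous proteins.

    ASSUMPTION: the two per-position fractions are added; the operator
    is not legible in the source slide.
    """
    x1 = [m[1] for m in motifs]
    x2 = [m[2] for m in motifs]
    score = top_two_fraction(x1) + top_two_fraction(x2)
    label = "redox" if score > 1 else "metal-binding"
    return label, score


print(classify_cxxc(["CGPC", "CGPC", "CGHC", "CPYC"]))          # conserved x positions
print(classify_cxxc(["CAKC", "CDEC", "CFGC", "CHIC", "CLMC"]))  # variable x positions
```

With the sum assumption, the first toy family scores 1.75 and is labelled redox, while the second scores 0.8 and is labelled metal-binding.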

22 The CxxC motif distance filter
Metal-binding proteins: CxxC motifs <=80 amino acids apart (CxxC...CxxC)
Redox proteins: CxxC motifs >80 amino acids apart (CxxC...CxxC)
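A distance filter of this kind can be sketched by locating every CxxC motif and checking the spacing of neighbouring motifs against the 80-residue cutoff from the slide. Two points are assumptions: the distance is measured from the start of one motif to the start of the next (the slide does not say how it is counted), and a protein with a single CxxC motif is treated as passing the filter. The function name is hypothetical.

```python
import re

CXXC = re.compile(r"(?=C..C)")  # lookahead so overlapping motifs are found


def cxxc_distance_filter(sequence, max_distance=80):
    """Return 'metal-binding' if two CxxC motifs lie within `max_distance`
    residues of each other, 'redox' if not, and None if no CxxC is present."""
    starts = [m.start() for m in CXXC.finditer(sequence.upper())]
    if not starts:
        return None
    # Checking neighbouring motifs is enough: if any pair is close,
    # some consecutive pair is at least as close.
    for first, second in zip(starts, starts[1:]):
        if second - first <= max_distance:  # start-to-start distance (assumption)
            return "metal-binding"
    return "redox"


print(cxxc_distance_filter("AACGPCAAAACAACAA"))            # metal-binding
print(cxxc_distance_filter("CGPC" + "A" * 100 + "CGPC"))   # redox
```

The labels mean only "flagged by this filter" or "not flagged"; on their own they do not establish function.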

23 Distribution of conserved amino acids flanking conserved cysteines in Saccharomyces cerevisiae

24 Prosite pattern based filter
Amino acid conservation based filter
CxxC motifs distance based filter

25 Distribution of conserved amino acids around conserved cysteines in Saccharomyces cerevisiae proteins after filtering out metal binding-proteins

26 Structure-based methods for prediction of thiol oxidoreductases
Secondary structure prediction: PSIPRED, SSpro

27 The active cysteine is followed by an alpha helix in most redox proteins
Secondary structure distribution (thioredoxin-fold proteins included): in thioredoxin-fold proteins the redox motif is flanked by a beta strand and an alpha helix (beta-CxxC-alpha).
Secondary structure distribution (non-thioredoxin-fold proteins only): the active cysteine is followed by an alpha helix in most redox proteins (CxxC-alpha).
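Given a sequence and a per-residue secondary structure prediction (E for strand, H for helix, as in the predictions shown on the following slides), the beta-CxxC-alpha context can be checked mechanically. This is a sketch only: the 10-residue window and the function name `motif_context` are assumptions, and the example data are modelled on the start of the thioredoxin prediction in this transcript.

```python
import re


def motif_context(sequence, structure, window=10):
    """For each CxxC motif return (start, strand_before, helix_after).

    strand_before: an E occurs within `window` residues upstream of the motif.
    helix_after:   an H occurs inside the motif or within `window` residues
                   downstream of it.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    results = []
    for match in re.finditer(r"(?=C..C)", sequence.upper()):
        start = match.start()
        before = structure[max(0, start - window):start]
        after = structure[start:start + 4 + window]
        results.append((start, "E" in before, "H" in after))
    return results


# N-terminal part of the thioredoxin example; the structure string is
# built from run lengths to keep it readable.
sequence = "KGYTVFEFTAGWCPDCKVIEPDLPKLEKK"
structure = "_" * 3 + "E" * 6 + "_" * 4 + "H" * 16
print(motif_context(sequence, structure))  # [(12, True, True)]
```

A result of `(start, True, True)` corresponds to the beta-CxxC-alpha arrangement, and `(start, False, True)` to CxxC-alpha.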

28 Selenoprotein M (b-a-b-b-b-a)
MSILLSPPSLLLLLAALVAPATSTTNYRPDWNRLRGLARGRVETCGGUQLNRLKEVKAFVTEDIQLYHNLVMKHLPGADPELVLLSRNYQELERIPLSQMTRDEINALVQELGFYRKS
________________________________________EEEEE________HHHHH___________EEEE_______EEEEEE______EEEEEE___HHHHHHHHH________
15 kDa Protein (b-a-b-b-b-a)
LDQQPAAQRTYAKAILEVCTUKFRAYPQIQAFIQSGRPAKFPNLQIKYVRGLDPVVKLLDASGKVQETLSITKWNTDTVEEFFETHLAKDGAGKNS
____________EEEEEEE_______HHHHHHHHH_______EEEEEEE_____EEEEEE_____EEEE________HHHHHHHHH__________
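The fold labels such as b-a-b-b-b-a can be derived from the prediction strings by collapsing each run of E into one strand (b) and each run of H into one helix (a). A minimal sketch, with a hypothetical function name:

```python
from itertools import groupby


def topology(structure):
    """Collapse a per-residue string (E = strand, H = helix, any other
    character = coil) into a fold label such as 'b-a-b-b-b-a'."""
    symbols = {"E": "b", "H": "a"}
    elements = [symbols[state] for state, _ in groupby(structure) if state in symbols]
    return "-".join(elements)


# Shortened stand-in with the same order of elements as the
# selenoprotein M prediction above.
print(topology("____EEEEE___HHHHH____EEEE___EEEEEE___EEEEEE__HHHHHHHHH___"))
# b-a-b-b-b-a
```

Every run counts here regardless of its length, so a very short predicted helix or strand adds an element; a minimum run length could be introduced if such elements should be ignored.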

29 Selenoprotein W (b-a-b-b-b-a)
Selenoprotein M (b-a-b-b-b-a)
MSILLSPPSLLLLLAALVAPATSTTNYRPDWNRLRGLARGRVETCGGUQLNRLKEVKAFVTEDIQLYHNLVMKHLPGADPELVLLSRNYQELERIPLSQMTRDEINALVQELGFYRKS
________________________________________EEEEE________HHHHH___________EEEE_______EEEEEE______EEEEEE___HHHHHHHHH________
15 kDa Protein (b-a-b-b-b-a)
LDQQPAAQRTYAKAILEVCTUKFRAYPQIQAFIQSGRPAKFPNLQIKYVRGLDPVVKLLDASGKVQETLSITKWNTDTVEEFFETHLAKDGAGKNS
____________EEEEEEE_______HHHHHHHHH_______EEEEEEE_____EEEEEE_____EEEE________HHHHHHHHH__________
Selenoprotein W (b-a-b-b-b-a)
MTETKPRIAIRYCTQUNWLLRAGWMAQEILQTFASDIGEVSLIPSTGGLFEITVDGTIIWERKRDGGFPGPKELKQRIRDLIDPERDLGHVD
______EEEEEEE_____HHHHHHHHHHHHHHHHHH___EEEE_____EEEEEE__EEEEEE________HHHHHHHHHHH___________
Rdx12 (b-a-b-b-b-a)
MSGEPGQVSVVPPPGEVEAGSGVHIVVEYCKPCGFEATYLELASSLEEEYPGIEIESRLGGTGAFEIEINGQLVFSKLENGGFPYEKDLMEAIRRASNGEPLEKITNSRPP
_______________________EEEEEEE_____HHHHHHHHHHHHHH___EEEEE______EEEEEE__EEEEEEE_______HHHHHHHHHHH_____HHHHH_____

30 Selenoprotein W (b-a-b-b-b-a), Rdx12 (b-a-b-b-b-a)
Selenoprotein M (b-a-b-b-b-a)
LLLLLAALVAPATSTTNYRPDWNRLRGLARGRVETCGGUQLNRLKEVKAFVTEDIQLYHNLVMKHLPGADPELVLLSRNYQELERIPLSQMTRDEINALVQELGFYRKS
_______________________________EEEEE________HHHHH___________EEEE_______EEEEEE______EEEEEE___HHHHHHHHH________
15 kDa Protein (b-a-b-b-b-a)
LDQQPAAQRTYAKAILEVCTUKFRAYPQIQAFIQSGRPAKFPNLQIKYVRGLDPVVKLLDASGKVQETLSITKWNTDTVEEFFETHLAKDGAGKNS
____________EEEEEEE_______HHHHHHHHH_______EEEEEEE_____EEEEEE_____EEEE________HHHHHHHHH__________
Selenoprotein W (b-a-b-b-b-a)
MTETKPRIAIRYCTQUNWLLRAGWMAQEILQTFASDIGEVSLIPSTGGLFEITVDGTIIWERKRDGGFPGPKELKQRIRDLIDPERDLGHVDR
______EEEEEEE_____HHHHHHHHHHHHHHHHHH___EEEE_____EEEEEE__EEEEEE________HHHHHHHHHHH____________
Rdx12 (b-a-b-b-b-a)
MSGEPGQVSVVPPPGEVEAGSGVHIVVEYCKPCGFEATYLELASSLEEEYPGIEIESRLGGTGAFEIEINGQLVFSKLENGGFPYEKDLMEAIRRASNGEPLEKITNSRPP
_______________________EEEEEEE_____HHHHHHHHHHHHHH___EEEEE______EEEEEE__EEEEEEE_______HHHHHHHHHHH_____HHHHH_____
Thioredoxin (b-a-b-a-b-b-a)
KGYTVFEFTAGWCPDCKVIEPDLPKLEKKYSQLQFVSVDRDQFIDICVQNDILGIPSFLIFKEGQLLGSYIGKERKSIDQIDQFLSQHI
___EEEEEE____HHHHHHHHHHHHHHHH___EEEEEE_HHHHHHHHHH_______EEEEEE__EEEEEEE___HHHHHHHHHHHHH__

31 15 kDa Protein (b-a-b-b-b-a)
Selenoprotein W (b-a-b-b-b-a)
MTETKPRIAIRYCTQUNWLLRAGWMAQEILQTFASDIGEVSLIPSTGGLFEITVDGTIIWERKRDGGFPGPKELKQRIRDLIDPERDLGHVDR
______EEEEEEE_____HHHHHHHHHHHHHHHHHH___EEEE_____EEEEEE__EEEEEE________HHHHHHHHHHH____________
Rdx12 (b-a-b-b-b-a)
MSGEPGQVSVVPPPGEVEAGSGVHIVVEYCKPCGFEATYLELASSLEEEYPGIEIESRLGGTGAFEIEINGQLVFSKLENGGFPYEKDLMEAIRRASNGEPLEKITNSRPP
_______________________EEEEEEE_____HHHHHHHHHHHHHH___EEEEE______EEEEEE__EEEEEEE_______HHHHHHHHHHH_____HHHHH_____
Selenoprotein H (a-b-a-b-b-a-a)
AAVVAVAEKREKLANGGEGMEEATVVIEHCTSCRVYGRNAAALSQALRLEAPELPVKVNPTKPRRGSFEVTLLRPDGSSAELWTGIKKGPPRKLKFPEPQEVVEELKKYLS
_HHHHHH________________EEEEEE____HHHHHHHHHHHHHHHHH____EEEE_________EEEEE_______HHHHHHH____________HHHHHHHHHHHH_
Selenoprotein T (b-b-a-b-a-a-a-a-b-b)
GGVPSKRLKMQYATGPLLKFQICVSUGYRRVFEEYMRVISQRYPDIRIEGENYLPQPIYRHIASFLSVFKLVLIGLIIVGKDPFAFFGMQAPSIWQWGQENKVYACMMVFFLSNMIENQCMSTGAFEITLNDVPVWSKLES
______EEEEEE____EEEEEEEEE___HHHHHHHHHHHHHH____EEE______HHHHHHHHHHHHHHHHHHHHHH_____HHHH______HHHHHH___HHHHHHHHHHHHHHHHHHH___EEEEEEE__EEEEEEE__
Selenoprotein M (b-a-b-b-b-a)
LLLAALVAPATSTTNYRPDWNRLRGLARGRVETCGGUQLNRLKEVKAFVTEDIQLYHNLVMKHLPGADPELVLLSRNYQELERIPLSQMTRDEINALVQELGFYRKS
_____________________________EEEEE________HHHHH___________EEEE_______EEEEEE______EEEEEE___HHHHHHHHH________
15 kDa Protein (b-a-b-b-b-a)
LDQQPAAQRTYAKAILEVCTUKFRAYPQIQAFIQSGRPAKFPNLQIKYVRGLDPVVKLLDASGKVQETLSITKWNTDTVEEFFETHLAKDGAGKNS
____________EEEEEEE_______HHHHHHHHH_______EEEEEEE_____EEEEEE_____EEEE________HHHHHHHHH__________
Thioredoxin (b-a-b-a-b-b-a)
KGYTVFEFTAGWCPDCKVIEPDLPKLEKKYSQLQFVSVDRDQFIDICVQNDILGIPSFLIFKEGQLLGSYIGKERKSIDQIDQFLSQHI
___EEEEEE____HHHHHHHHHHHHHHHH___EEEEEE_HHHHHHHHHH_______EEEEEE__EEEEEEE___HHHHHHHHHHHHH__

32

33 Annotation of protein
Sequence or structural similarity to known proteins
Functional associations

34 Primary-sequence-based prediction of protein 3D structure
3D-Jury system

35

36 SelR MsrA
[Table with columns SelR and MsrA and one row per organism; the cell values are not preserved in the transcript. Organisms are listed by domain.]
Eukaryotes: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae
Archaea: Aeropyrum pernix, Sulfolobus solfataricus, Sulfolobus tokodaii, Archaeoglobus fulgidus, Halobacterium sp. NRC, Methanothermobacter thermautotrophicus, Methanococcus jannaschii, Pyrococcus abyssi, Pyrococcus horikoshii, Thermoplasma acidophilum, Thermoplasma volcanium
Bacteria: Aquifex aeolicus, Chlamydia muridarum, Chlamydia trachomatis, Chlamydophila pneumoniae AR39, Chlamydophila pneumoniae CWL029, Chlamydophila pneumoniae J138, Synechocystis sp. PCC, Mycobacterium leprae, Mycobacterium tuberculosis CDC, Mycobacterium tuberculosis H37Rv, Bacillus halodurans, Bacillus subtilis, Clostridium acetobutylicum, Mycoplasma genitalium, Mycoplasma pneumoniae, Mycoplasma pulmonis, Ureaplasma urealyticum, Lactococcus lactis subsp. lactis, Staphylococcus aureus subsp. aureus Mu50, Staphylococcus aureus subsp. aureus N, Streptococcus pneumoniae R6, Streptococcus pneumoniae TIGR4, Streptococcus pyogenes M1 GAS, Caulobacter crescentus, Agrobacterium tumefaciens, Mesorhizobium loti, Sinorhizobium meliloti, Rickettsia conorii, Rickettsia prowazekii, Neisseria meningitidis MC, Neisseria meningitidis Z, Campylobacter jejuni, Helicobacter pylori, Helicobacter pylori J, Escherichia coli K, Escherichia coli O157:H7, Escherichia coli O157:H7 EDL, Yersinia pestis, Buchnera sp. APS, Vibrio cholerae, Xylella fastidiosa 9a5c, Haemophilus influenzae Rd, Pasteurella multocida, Pseudomonas aeruginosa, Borrelia burgdorferi, Treponema pallidum, Thermotoga maritima, Deinococcus radiodurans

37 Prediction of protein functional associations
STRING - Search Tool
Example - yeast mitochondrial glutaredoxin Grx5
>gi| | Grx5p [Saccharomyces cerevisiae]
COG0278
MFLPKFNPIRSFSPILRAKTLLRYQNRMYLSTEIRKAIEDAIESAPVVLFMKGTPEFPKCGFSRATIGLL
GNQGVDPAKFAAYNVLEDPELREGIKEFSEWPTIPQLYVNKEFIGGCDVITSMARSGELADLLEEAQALV
PEEEEETKDR

38 Genome neighborhood for Grx (COG0278)

39 Co-occurrence prediction for Grx (COG0278)

