Introducing Database Mining to Molecular Genetics Students (Juniors & Seniors)
Karl Wilson
Objectives: Introduce students to online protein and nucleotide databases (via GenBank at the NCBI website).
Specific operations:
– Use of BLAST to find similar sequences (protein & nucleotide)
– Downloading and saving sequences
– Comparison of sequences and alignment with ClustalW
– Interpretation of phylogenetic data
The “test” protein sequence:
AAA92063 cysteinyl endopep...[gi: ]
LOCUS       AAA  aa  linear  PLN 22-AUG-2002
DEFINITION  cysteinyl endopeptidase [Vigna radiata].
ACCESSION   AAA92063
VERSION     AAA  GI:
DBSOURCE    locus VRU49445 accession U
KEYWORDS    .
SOURCE      Vigna radiata
  ORGANISM  Vigna radiata
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids; eurosids I; Fabales; Fabaceae; Papilionoideae; Phaseoleae; Vigna.
REFERENCE   1 (residues 1 to 362)
  AUTHORS   Lee,K., Tan-Wilson,A.L. and Wilson,K.A.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-FEB-1996) K. Lee, Department of Biological Sciences, State University of New York at Binghamton, P.O. Box 6000, Binghamton, NY, USA
1. Student is given the VRU49445 sequence (only), e.g. via Blackboard.
2. Find the sequence via Entrez and download it in FASTA format.
3. Submit the VRU49445 sequence to Protein-Protein BLAST (BLASTP).
4. BLASTP results: related sequences.
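The Entrez lookup and FASTA download can also be scripted. The sketch below is a minimal illustration, not part of the original exercise: it builds a request URL for NCBI's E-utilities `efetch` endpoint and parses FASTA text into a dictionary. The function names `efetch_fasta_url` and `read_fasta` are hypothetical helpers, the example record header is made up, and no network request is made here.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for fetching records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="protein"):
    """Build a URL that asks efetch for one record as plain FASTA text."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def read_fasta(text):
    """Parse FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Accession taken from the slide; the URL is only built, not fetched.
url = efetch_fasta_url("AAA92063")
print(url)

# Hypothetical header; the residues are one 50-residue row from the
# alignment slide, split over two lines to mimic a downloaded file.
example = ">example_fragment\nMAMKKLLWVVLSLSLVLGVANSFDF\nHEKDLASEESLWDLYERWRSHHTVS\n"
records = read_fasta(example)
print(len(records["example_fragment"]))
```

Opening the printed URL in a browser (or with `urllib.request.urlopen`) should return the same FASTA text that students would copy from the Entrez page.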
Sequences producing significant alignments:

Sequence                          Description                           Score (bits)  E value
gi| |gb|AAA |                     cysteinyl endopeptidase [Vigna ra     705
gi|118158|sp|P12412|CYSP_VIGMU    Vignain precursor (Bean endo          686
gi|445927|prf|| A                 Cys endopeptidase                     684
gi| |pir||S22502                  cysteine proteinase (EC )
gi|544129|sp|P25803|CYSP_PHAVU    Vignain precursor (Bean endo          674
gi| |emb|CAA |                    endopeptidase (EP-C1) [Phaseolus      673
gi| |dbj|BAC |                    cysteine proteinase [Glycine ma       657
gi| |dbj|BAC |                    cysteine proteinase [Glycine ma       653
gi| |pir||T08122                  cysteine endopeptidase (EC )                        e-164
gi|600111|emb|CAA |               cysteine proteinase [Vicia sativa]    540           e-152
gi| |emb|CAA |                    pre-pro-TPE4A protein [Pisum sat      539           e-152
gi| |ref|NP_ |                    cysteine proteinase [Arabidops        521           e-147
gi| |dbj|BAC |                    cysteine protease-2 [Helianthus       516           e-145
gi| |pir||S49166                  cysteine proteinase (EC ) pr                        e-143
gi| |pir||T06708                  cysteine proteinase (EC ) T                         e-137
gi| |sp|P43156|CYSP_HEMSP         Thiol protease SEN102 precu           490           e-137
gi| |pir||JC7787                  carrot seed cysteine proteinase (EC                 e-136
gi| |ref|NP_ |                    cysteine proteinase, putative         483           e-135
gi| |gb|AAB |                     cysteine proteinase                   470           e-131
gi| |gb|AAD |AF133839_1           papain-like cysteine pr               462           e-129
gi| |ref|NP_ |                    cysteine proteinase, putative         462           e
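When reading a hit list like this, note that the report abbreviates very small E values: "e-152" stands for 1e-152. The sketch below shows one way to turn those strings into numbers and filter hits by a cutoff. It uses only three rows whose score and E value both survive on the slide; the helper name `parse_evalue`, the short row labels, and the 1e-140 cutoff are illustrative assumptions, not part of the exercise.

```python
def parse_evalue(text):
    """Convert a report E value such as 'e-152' or '0.0' to a float."""
    text = text.strip()
    if text.startswith("e"):
        # The report drops the leading "1" from values like 1e-152.
        text = "1" + text
    return float(text)


# (label, score in bits, E value as printed), copied from the hit list.
HITS = [
    ("gi|600111 cysteine proteinase [Vicia sativa]", 540, "e-152"),
    ("CYSP_HEMSP Thiol protease SEN102", 490, "e-137"),
    ("gb|AAB cysteine proteinase", 470, "e-131"),
]

CUTOFF = 1e-140  # arbitrary threshold chosen for illustration

strong = [label for label, score, e in HITS if parse_evalue(e) < CUTOFF]
for label in strong:
    print(label)
```

A smaller E value means the match is less likely to have arisen by chance, so tightening the cutoff keeps only the closest relatives.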
1. BLASTP results: related sequences.
2. Copy the most similar cDNA sequences (in FASTA format): cDNA sequences from P. vulgaris, V. mungo, G. max, V. sativa, etc.
3. Submit the sequences to CLUSTALW at the Biology Workbench website.
gi_118158_sp_P12412_CYSP_VIG  MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS
gi_ _gb_AAA __cy              MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS
gi_ _dbj_BAC __               MAMKKLLWVVLSLSLVLGSANSFDFHDKDLASEESFWDLYERWRSHHTVS
gi_ _dbj_BAC __               MAMKKFLWVVLSLSLVLGVANSFDFHDKDLESEESLWDLYERWRSHHTVS
gi_600111_emb_CAA __cy        MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT

gi_118158_sp_P12412_CYSP_VIG  RSLGEKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_ _gb_AAA __cy              RSLTEKHKRFNVFKENVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_ _dbj_BAC __               RSLGDKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_ _dbj_BAC __               RSLGDKHKRFNVFKANMMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_600111_emb_CAA __cy        RNLDEKHNRFNVFKANVMHVHNTNKLDKPYKLKLNKFGDMTNYEFRRIYA

gi_118158_sp_P12412_CYSP_VIG  GSKVNHHKMFRGSQHGSGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
gi_ _gb_AAA __cy              GSKVNHHKMFRGTQHGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
gi_ _dbj_BAC __               GSKVNHHRMFQGTPRGNGTFMYEKVGSVPPSVDWRKNGAVTGVKDQGQCG
gi_ _dbj_BAC __               GSKVNHHRMFRDMPRGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGHCG
gi_600111_emb_CAA __cy        DSKISHHRMFRGMSHENGTFMYENAVDVPSSIDWRNKGAVTGVKDQGQCG

Alignment of the Cysteine Proteases from Vigna, Phaseolus, Glycine, and Vicia.
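To make "how similar are these sequences?" concrete, students can compute pairwise percent identity from the aligned rows. The sketch below uses the first 50 columns of the alignment above. The short labels are made up here because the slide's identifiers are incomplete, and the identity definition (identical columns divided by columns with no gap in either sequence) is only one of several in common use.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned, ungapped columns where two sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


# First alignment block from the slide; labels are illustrative only.
ALIGNED = {
    "P12412": "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS",
    "gb_AAA": "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS",
    "dbj_BAC_1": "MAMKKLLWVVLSLSLVLGSANSFDFHDKDLASEESFWDLYERWRSHHTVS",
    "dbj_BAC_2": "MAMKKFLWVVLSLSLVLGVANSFDFHDKDLESEESLWDLYERWRSHHTVS",
    "emb_CAA_600111": "MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(ALIGNED.items(), 2):
    print(f"{name_a} vs {name_b}: {percent_identity(seq_a, seq_b):.1f}%")
```

The pairs with the highest identity should be the ones that sit closest together in the tree built from the same alignment.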
Unrooted Phylogenetic Tree
Possible Additions:
– Add more sequences (e.g. of non-legumes) and see how the tree changes.
– Repeat all of the above with the nucleotide (cDNA) sequences of the same proteins, and compare the results with those from the protein sequences.
Compare the nucleotide sequences of the cDNA and gene pairs where available – exons/introns?

ACGTGTGACGAATCAAAGGTGCATGTTAGGCCAAACATATTTTCCAATGA
ACGTGTGACGAATCAAAGGTG
ACCTGTGATGCATCAAAGGTGCATGTTCGGCCAAACTTTTTTTTTTTT--
ACCTGTGATGCATCAAAGGTG

AACCACTATAATTAATAGATAACTTGAGAAACT--AAAGTGCCAAAAATC
TTTAATGAAACCAATA--TAACTTGAGAAATCTAAAATTGCCAAAAATC

TTTCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGTCATGAAA
                AATGACCTAGCTGTGTCAATTGATGGTCATGAAA
TTGCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGCCATGAGA
                AATGACCTAGCTGTGTCAATTGATGGCCATGAGA
                ************************** ***** *
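The exon/intron question can be explored with plain string matching: the cDNA segments should appear in the gene, and whatever lies between them is a candidate intron. The sketch below assumes that the first row of each alignment block on the slide belongs to one genomic sequence and that the two shorter rows beneath it are the flanking cDNA segments. `locate_intron` is a hypothetical helper name.

```python
def locate_intron(gene, exon_left, exon_right):
    """Return (start, end) of the gene region between two cDNA segments."""
    left_at = gene.find(exon_left)
    if left_at == -1:
        raise ValueError("left cDNA segment not found in gene")
    intron_start = left_at + len(exon_left)
    intron_end = gene.find(exon_right, intron_start)
    if intron_end == -1:
        raise ValueError("right cDNA segment not found in gene")
    return intron_start, intron_end


# Genomic rows from the slide, with alignment gap characters removed.
gene = (
    "ACGTGTGACGAATCAAAGGTGCATGTTAGGCCAAACATATTTTCCAATGA"
    "AACCACTATAATTAATAGATAACTTGAGAAACT--AAAGTGCCAAAAATC"
    "TTTCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGTCATGAAA"
).replace("-", "")

cdna_left = "ACGTGTGACGAATCAAAGGTG"
cdna_right = "AATGACCTAGCTGTGTCAATTGATGGTCATGAAA"

start, end = locate_intron(gene, cdna_left, cdna_right)
print(f"candidate intron: positions {start}-{end}, length {end - start}")
print(gene[start:end])
```

Boundaries found this way follow the alignment and may be off by a few bases when the same bases occur on both sides of the intron (here GTG precedes both boundaries), so it is worth checking them against the usual GT...AG intron ends.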
Examine targeting of cysteine protease – e.g. with TargetP or PSORT.
PSORT with AAA92063 (Vigna radiata cysteine protease):
endoplasmic reticulum (lumen) --- Certainty= 0.910(Affirmative)
outside --- Certainty= 0.719(Affirmative)
lysosome (lumen) --- Certainty= 0.190(Affirmative)
endoplasmic reticulum (membrane) --- Certainty= 0.100(Affirmative)
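If students collect PSORT predictions for several proteins, a small parser makes the results easier to compare. The sketch below reads the four result lines quoted above and reports the top-scoring location. The regular expression is written against this quoted text only and is an assumption about the output layout, not a description of PSORT's full format.

```python
import re

# The four PSORT result lines quoted on the slide.
PSORT_OUTPUT = """\
endoplasmic reticulum (lumen) --- Certainty= 0.910(Affirmative)
outside --- Certainty= 0.719(Affirmative)
lysosome (lumen) --- Certainty= 0.190(Affirmative)
endoplasmic reticulum (membrane) --- Certainty= 0.100(Affirmative)
"""

# location, then "---", then "Certainty=" and a decimal score.
LINE = re.compile(r"^\s*(?P<location>.+?)\s*---\s*Certainty=\s*(?P<score>[0-9.]+)")


def parse_psort(text):
    """Return a list of (location, certainty) tuples from PSORT-style lines."""
    predictions = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            predictions.append((match.group("location"), float(match.group("score"))))
    return predictions


predictions = parse_psort(PSORT_OUTPUT)
best = max(predictions, key=lambda p: p[1])
print(f"top prediction: {best[0]} (certainty {best[1]})")
```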