Introducing Database Mining to Molecular Genetics Students (Juniors & Seniors)
Karl Wilson

Objectives: Introduce students to online protein and nucleotide databases (via GenBank at the NCBI website).
Specific operations:
– Use of BLAST to find similar sequences (protein & nucleotide)
– Downloading and saving sequences
– Comparison of sequences and alignment with ClustalW
– Interpretation of phylogenetic data

The “test” protein sequence: AAA92063 AAA cysteinyl endopep...[gi: ]
LOCUS       AAA aa linear PLN 22-AUG-2002
DEFINITION  cysteinyl endopeptidase [Vigna radiata].
ACCESSION   AAA92063
VERSION     AAA GI:
DBSOURCE    locus VRU49445 accession U U
KEYWORDS    .
SOURCE      Vigna radiata
  ORGANISM  Vigna radiata
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;
            eurosids I; Fabales; Fabaceae; Papilionoideae; Phaseoleae; Vigna.
REFERENCE   1  (residues 1 to 362)
  AUTHORS   Lee,K., Tan-Wilson,A.L. and Wilson,K.A.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-FEB-1996) K. Lee, Department of Biological Sciences,
            State University of New York at Binghamton, P.O. Box 6000,
            Binghamton, NY , USA
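GenBank flat files like the record above place each keyword (LOCUS, DEFINITION, ACCESSION, ...) in the first 12 columns and its value from column 13 onward. Sub-keywords such as ORGANISM are indented two spaces, and continuation lines leave the keyword area blank. Below is a minimal sketch, using only the Python standard library (not Biopython or an NCBI tool), of a hypothetical helper `parse_genbank_header` that collects those header fields. The sample string reuses fields shown in the record above.

```python
def parse_genbank_header(text):
    """Collect GenBank header keywords into {keyword: [values]}.

    Assumes the standard flat-file layout: keyword in columns 1-12,
    value from column 13, continuation lines with a blank keyword area.
    Stops at FEATURES or ORIGIN, where the header ends.
    """
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(("FEATURES", "ORIGIN")):
            break
        tag, value = line[:12].strip(), line[12:].strip()
        if tag:
            # New keyword or sub-keyword (e.g. ORGANISM). Keywords such as
            # REFERENCE can repeat, so each occurrence gets its own entry.
            key = tag
            fields.setdefault(key, []).append(value)
        elif key is not None:
            # Continuation line: extend the most recent value.
            fields[key][-1] += " " + value
    return fields


record = (
    "DEFINITION  cysteinyl endopeptidase [Vigna radiata].\n"
    "ACCESSION   AAA92063\n"
    "SOURCE      Vigna radiata\n"
    "  ORGANISM  Vigna radiata\n"
    "            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;\n"
    "            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;\n"
    "            eurosids I; Fabales; Fabaceae; Papilionoideae; Phaseoleae; Vigna.\n"
)
fields = parse_genbank_header(record)
print(fields["ACCESSION"], fields["ORGANISM"][0])
```

Values are stored as lists because some keywords, such as REFERENCE, appear more than once in a real record.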

Student is given the VRU49445 sequence (only) via or Blackboard
– Find the sequence via Entrez and download it in FASTA format
– Submit the VRU49445 sequence to Protein-Protein BLAST (BLASTP)
– BLASTP results: related sequences
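The Entrez download step can also be scripted. This sketch assumes NCBI's E-utilities `efetch` endpoint with `db=protein`, `rettype=fasta` and `retmode=text`. Check NCBI's current usage guidelines before sending many requests. `efetch_fasta_url` and `parse_fasta` are hypothetical helper names. Only the URL building and the FASTA parsing run here. The actual download is left commented out because it needs network access.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="protein"):
    """Build an E-utilities efetch URL that returns a record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# To download for real (needs network access):
# from urllib.request import urlopen
# fasta_text = urlopen(efetch_fasta_url("AAA92063")).read().decode()
# header, seq = parse_fasta(fasta_text)[0]
print(efetch_fasta_url("AAA92063"))
```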

                                                            Score  E
Sequences producing significant alignments:                (bits)  Value

gi| |gb|AAA | cysteinyl endopeptidase [Vigna ra               705
gi|118158|sp|P12412|CYSP_VIGMU Vignain precursor (Bean endo   686
gi|445927|prf|| A Cys endopeptidase                           684
gi| |pir||S22502 cysteine proteinase (EC )
gi|544129|sp|P25803|CYSP_PHAVU Vignain precursor (Bean endo   674
gi| |emb|CAA | endopeptidase (EP-C1) [Phaseolus               673
gi| |dbj|BAC | cysteine proteinase [Glycine ma                657
gi| |dbj|BAC | cysteine proteinase [Glycine ma                653
gi| |pir||T08122 cysteine endopeptidase (EC )                      e-164
gi|600111|emb|CAA | cysteine proteinase [Vicia sativa]        540  e-152
gi| |emb|CAA | pre-pro-TPE4A protein [Pisum sat               539  e-152
gi| |ref|NP_ | cysteine proteinase [Arabidops                 521  e-147
gi| |dbj|BAC | cysteine protease-2 [Helianthus                516  e-145
gi| |pir||S49166 cysteine proteinase (EC ) pr                      e-143
gi| |pir||T06708 cysteine proteinase (EC ) T                       e-137
gi| |sp|P43156|CYSP_HEMSP Thiol protease SEN102 precu         490  e-137
gi| |pir||JC7787 carrot seed cysteine proteinase (EC               e-136
gi| |ref|NP_ | cysteine proteinase, putative                  483  e-135
gi| |gb|AAB | cysteine proteinase                             470  e-131
gi| |gb|AAD |AF133839_1 papain-like cysteine pr               462  e-129
gi| |ref|NP_ | cysteine proteinase, putative                  462  e
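The Score (bits) and E value columns are linked. In the Karlin-Altschul statistics used by BLAST, a hit with bit score S' is expected to occur by chance about E = m · n · 2^(-S') times, where m and n are the (effective) lengths of the query and the database. The sketch below only evaluates that formula. The database length in the usage loop is a made-up placeholder. NCBI's reported E values use effective lengths corrected for edge effects, so they will not match this calculation exactly.

```python
import math


def evalue_from_bits(bit_score, query_len, db_len):
    """Expected number of chance hits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def log10_evalue(bit_score, query_len, db_len):
    """Same quantity on a log10 scale (avoids float underflow for huge scores)."""
    return math.log10(query_len) + math.log10(db_len) - bit_score * math.log10(2)


# Hypothetical example: a 362-residue query against a database of
# 5e8 residues (placeholder size, not NCBI's actual database length).
for bits in (705, 540, 462):
    print(bits, f"E ~ 10^{log10_evalue(bits, 362, 5e8):.1f}")
```

Each extra bit halves E. This is why the E values in the list above grow larger further down, as the bit scores fall.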

BLASTP results: related sequences
– Copy the most similar cDNA sequences (in FASTA format): cDNA sequences from P. vulgaris, V. mungo, G. max, V. sativa, etc.
– Submit the sequences to CLUSTALW at the Biology Workbench website.
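If BLAST is run locally with BLAST+ tabular output (`-outfmt 6`, whose default columns end with `evalue` and `bitscore`), choosing the most similar sequences for ClustalW can be scripted. `top_hits` and `to_fasta` are hypothetical helpers. The E-value cutoff and the number of hits kept are arbitrary choices, not values taken from the exercise.

```python
def top_hits(tabular_text, max_evalue=1e-10, n=10):
    """Pick subject IDs from BLAST+ -outfmt 6 output.

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore. Hits are kept if E <= max_evalue,
    ranked by bit score (highest first), and each subject is reported once.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        sseqid, evalue, bits = cols[1], float(cols[10]), float(cols[11])
        if evalue <= max_evalue and bits > best.get(sseqid, float("-inf")):
            best[sseqid] = bits
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:n]


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping at `width`."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

The resulting FASTA text can be pasted into the ClustalW form or saved to a file.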

gi_118158_sp_P12412_CYSP_VIG   MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS
gi_ _gb_AAA __cy               MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS
gi_ _dbj_BAC __                MAMKKLLWVVLSLSLVLGSANSFDFHDKDLASEESFWDLYERWRSHHTVS
gi_ _dbj_BAC __                MAMKKFLWVVLSLSLVLGVANSFDFHDKDLESEESLWDLYERWRSHHTVS
gi_600111_emb_CAA __cy         MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT

gi_118158_sp_P12412_CYSP_VIG   RSLGEKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_ _gb_AAA __cy               RSLTEKHKRFNVFKENVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_ _dbj_BAC __                RSLGDKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_ _dbj_BAC __                RSLGDKHKRFNVFKANMMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_600111_emb_CAA __cy         RNLDEKHNRFNVFKANVMHVHNTNKLDKPYKLKLNKFGDMTNYEFRRIYA

gi_118158_sp_P12412_CYSP_VIG   GSKVNHHKMFRGSQHGSGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
gi_ _gb_AAA __cy               GSKVNHHKMFRGTQHGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
gi_ _dbj_BAC __                GSKVNHHRMFQGTPRGNGTFMYEKVGSVPPSVDWRKNGAVTGVKDQGQCG
gi_ _dbj_BAC __                GSKVNHHRMFRDMPRGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGHCG
gi_600111_emb_CAA __cy         DSKISHHRMFRGMSHENGTFMYENAVDVPSSIDWRNKGAVTGVKDQGQCG

Alignment of the Cysteine Proteases from Vigna, Phaseolus, Glycine, and Vicia.
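Percent identity between two rows of an alignment like this one can be computed column by column. This sketch counts only the columns where neither sequence has a gap. That is one of several common conventions, and ClustalW's own identity scores may be computed differently. `percent_identity` is a hypothetical helper. The two sequences are the first block of the P12412 (Vigna mungo) and AAA (Vigna radiata) rows above.

```python
def percent_identity(a, b, gap="-"):
    """Percent identical residues between two aligned, equal-length sequences.

    Columns where either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


vig_mungo = "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS"
vig_radiata = "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS"
print(percent_identity(vig_mungo, vig_radiata))  # 98.0 (49 of 50 columns)
```

One minus the fraction identical (a "p-distance") is a simple input for distance-based tree building.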

Unrooted Phylogenetic Tree
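Unrooted trees like this one are usually built from a matrix of pairwise distances, for example 1 - (percent identity / 100) for each pair of aligned sequences. ClustalW offers neighbor-joining trees. The sketch below is a generic, minimal neighbor-joining implementation, not ClustalW's code. It runs on the small five-taxon textbook matrix often used to illustrate the method, not on the cysteine protease data. `neighbor_joining` is a hypothetical helper name.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree (Newick string) by neighbor joining.

    names: taxon labels (assumed free of Newick special characters);
    dist: symmetric matrix of pairwise distances, as a list of lists.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Pick the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node u.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        parent = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distances from u to each remaining node k.
        keep = [k for k in range(n) if k not in (i, j)]
        to_parent = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [to_parent[x]] for x, a in enumerate(keep)]
        d.append(to_parent + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Join the final three nodes at one central node (unrooted tree).
    ab, ac, bc = d[0][1], d[0][2], d[1][2]
    la, lb, lc = (ab + ac - bc) / 2, (ab + bc - ac) / 2, (ac + bc - ab) / 2
    return f"({nodes[0]}:{la:g},{nodes[1]}:{lb:g},{nodes[2]}:{lc:g});"


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, dist))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

When Q values tie, the first pair found wins. A different tie-break can change the order of the Newick string but not the branch lengths in this example.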


Possible additions:
– Add more sequences (e.g. from non-legumes) and see how the tree changes.
– Repeat all of the above, but this time with the nucleotide (cDNA) sequences of the same proteins. Compare the results with those from the protein sequences.
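For the nucleotide version of the exercise, a quick sanity check is to translate each cDNA and confirm that it matches the protein used earlier. This sketch assumes the standard genetic code (NCBI translation table 1) and a cDNA already trimmed to start at its ATG. `translate` is a hypothetical helper. Biopython's `Seq.translate` does the same job with more options.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... (NCBI table 1).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cdna, to_stop=True):
    """Translate a coding sequence codon by codon with the standard code.

    Unknown codons (e.g. containing N) become 'X'. With to_stop=True,
    translation ends at the first stop codon ('*').
    """
    seq = cdna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCTATGAAGAAG"))  # MAMKK
```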

Compare the nucleotide sequences of the cDNA and gene pairs where available: exons/introns?

ACGTGTGACGAATCAAAGGTGCATGTTAGGCCAAACATATTTTCCAATGA
ACGTGTGACGAATCAAAGGTG
ACCTGTGATGCATCAAAGGTGCATGTTCGGCCAAACTTTTTTTTTTTT--
ACCTGTGATGCATCAAAGGTG

AACCACTATAATTAATAGATAACTTGAGAAACT--AAAGTGCCAAAAATC
TTTAATGAAACCAATA--TAACTTGAGAAATCTAAAATTGCCAAAAATC

TTTCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGTCATGAAA
                AATGACCTAGCTGTGTCAATTGATGGTCATGAAA
TTGCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGCCATGAGA
                AATGACCTAGCTGTGTCAATTGATGGCCATGAGA
                AATGACCTAGCTGTGTCAATTGATGGCCATGAGA
                ************************** ***** *
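The row of asterisks under the last block marks columns where every aligned sequence has the same base. A space marks a column that differs. Here is a minimal sketch of that rule for nucleotide alignments. For protein alignments, ClustalW also uses ':' and '.' to mark similar residues, and this sketch ignores those. `conservation_line` is a hypothetical helper, and any column containing a gap is treated as not conserved.

```python
def conservation_line(rows, gap="-"):
    """Return a ClustalW-style line: '*' where all rows share the same base."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be the same length")
    line = []
    for column in zip(*rows):
        residues = set(column)
        line.append("*" if len(residues) == 1 and gap not in residues else " ")
    return "".join(line)


# The two cDNA segments from the last block above.
rows = [
    "AATGACCTAGCTGTGTCAATTGATGGTCATGAAA",
    "AATGACCTAGCTGTGTCAATTGATGGCCATGAGA",
]
print(conservation_line(rows))
```

The two spaces in the output fall on the T/C and A/G differences near the end of the segments.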

Examine targeting of the cysteine protease, e.g. with TargetP or PSORT.
PSORT results for AAA92063 (Vigna radiata cysteine protease):
endoplasmic reticulum (lumen) --- Certainty= 0.910(Affirmative)
outside --- Certainty= 0.719(Affirmative)
lysosome (lumen) --- Certainty= 0.190(Affirmative)
endoplasmic reticulum (membrane) --- Certainty= 0.100(Affirmative)
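When comparing predictions for several proteins, it can help to extract the site names and certainty values from PSORT's text output. This sketch assumes lines shaped exactly like the ones above (`site --- Certainty= value(Affirmative)`). Other PSORT versions and TargetP format their output differently. `parse_psort` is a hypothetical helper.

```python
import re

LINE = re.compile(
    r"^(?P<site>.+?)\s+---\s+Certainty=\s*(?P<score>[0-9.]+)\((?P<call>\w+)\)"
)


def parse_psort(text):
    """Return (site, certainty, call) tuples sorted by certainty, highest first."""
    hits = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            hits.append((m["site"], float(m["score"]), m["call"]))
    return sorted(hits, key=lambda h: h[1], reverse=True)


psort_output = """\
endoplasmic reticulum (lumen) --- Certainty= 0.910(Affirmative)
outside --- Certainty= 0.719(Affirmative)
lysosome (lumen) --- Certainty= 0.190(Affirmative)
endoplasmic reticulum (membrane) --- Certainty= 0.100(Affirmative)
"""
best_site, best_score, _ = parse_psort(psort_output)[0]
print(best_site, best_score)
```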