Genome Center of Wisconsin, UW-Madison


Making Sense of DNA and protein sequence analysis tools (course #2)
http://www.ncbi.nlm.nih.gov/Class/minicourses/
Dave Baumler, Genome Center of Wisconsin, UW-Madison
dbaumler@wisc.edu

Today's session: an overview
You have been given a 5 kb piece of DNA sequence.
- GENSCAN: find any exons in the DNA sequence and generate a predicted protein sequence
- ScanProsite: scan the protein sequence for domains, motifs, and patterns found in the PROSITE database
- BLASTP: run a BLASTP search against the SWISS-PROT database, find some of the best matches (hits), and copy each protein sequence into a Word document for the alignment
- MultAlin: align the protein sequences found in the BLASTP search

In this session you will try out 4 different tools. Many other tools exist: http://bioinformatics.ca/links_directory/

Where are the coding regions? TCAGCGAAGATGAGATAGTTTTTAAAGGTGGGATTTCCCCACCTTTAAAAAGCGAGAAGTCCCGGTTTTAAAGAGGAGTAAAATCCTCTTTTTCTAGCCCACTCAGGTGGTTTTTTTGGTTTTCGCTCCTTGCCGCATCTTCTGTGCCTTTGATGGCGGCTGGTTGGGGTGAAAGGCTGCATATTCCAGAATTTCAGACAGTAGATTGTTTTTGAAATCTTCCGTTTTATCGTTGACGAACTTAACCATCCTGTTGAAATCATCTTCCTTTGATACACCTTCAGGAAATGCCTTAGGAACTGATGTTTGGCTATCCAAGGCATCTTGCAATATCTGCACGATCTCCGAATTCATTGATCGCCCATTGGCCTTTGCTCTGGCGGCAACTGCGTCACGCATACCGTCAGGCATCCTAACTGTAAATCTCTCAATGAAAGCTGGATCTTCTTTTTCAGTCATCATCTTAAACCATAAAAATTTATACAAAACACACTAGCATCATATTGACATTACCCACAATGACATCATAATGGTGTCAGGCATCAAAATGATGTCATCATGACAAGGGGAAAGTAAATGCAAGATGTTCTCTATACAGGTCGTAAGAACGACAGCTTTCAGCTTCGTCTGCCTGAGCGAATGAAAGAAGAGATCCGTCGCATGGCAGAGATGGACGGCATTTCGATTAATTCTGCAATCGTGCAGCGCCTTGCTAAAAGCTTGCGTGAGGAAAGAGTTAATGGGCAGTAAAAACAGCGAAGCCCGGAAGTGTGGGGACACTAACCGGGCTTCTAATGTCAGTTACCTAGCGGGAAACCAACAATGACCAGTATAGCAATCTTTGAAGCAGTAAACACTATCTCTCTTCCATTCCACGGACAGAAGATCATAACTGCGATGGTGGCGGGTGTGGCGTATGTGGCAATGAAGCCCATCGTGGAAAACATCGGTTTAGACTGGAAGAGCCAGTATGCCAAGCTCGTTAGTCAGCGTGAAAAGTTCGGGTGTGGTGATATCACCATACCTACCAAAGGTGGTGTTCAGCAGATGCTTTGCATCCCTTTGAAGAAACTGAATGGATGGCTCTTCAGCATTAACCCAGCAAAAGTACGTGATGCAGTTCGTGAAGGTTTAATTCGCTATCAAGAAGAGTGTTTTACAGCTTTGCACGATTACTGGAGCAAAGGTGTTGCAACGAATCCCCGGACACCGAAGAAACAGGAAGACAAAAAGTCACGCTATCACGTTCGCGTTATTGTCTATGACAACCTGTTTGGTGGATGCGTTGAATTTCAGGGGCGTGCGGATACGTTTCGGGGGATTGCATCGGGTGTAGCAACCGATATGGGATTTAAGCCAACAGGATTTATCGAGCAGCCTTACGCTGTTGAAAAAATGAGGAAGGTCTACTGATTGGCGTATTGGAAGGCGCAAAAAGAAAAGCCAGCAGATGGGCTGCTGGCATTCATTGGGTATATGAACTTTCGGAGAACATATGAAGTCAATTATCAAGCATTTTGAGTTTAAGTCAAGTGAAGGGCATGTAGTGAGCCTTGAGGCTGCAAGCTTTAAAGGCAAGCCAGTTTTTTTAGCAATTGATTTGGCTAAGGCTCTCGGGTACTCAAATCCGTCA

GeneMark.hmm: a statistical model

Exon prediction in eukaryotic DNA using GENSCAN
- The net result is a predicted protein sequence
- GENSCAN looks for start and stop codons, promoters, splice sites, and poly-A signals, and provides statistics for coding potential
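To make the idea of "looking for start and stop codons" concrete, here is a minimal sketch of a naive open reading frame (ORF) scan. It is not how GENSCAN works: GENSCAN uses a probabilistic gene model and handles introns, whereas this sketch only reports uninterrupted ATG-to-stop stretches on both strands. The function names and the `min_codons` threshold are assumptions chosen for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=30):
    """Naive ORF scan: ATG ... stop, in all three frames of both strands.

    Returns a list of (strand, start, end, n_codons) tuples. Coordinates are
    0-based, end-exclusive, and for the "-" strand they refer to positions on
    the reverse-complemented sequence. n_codons counts the start codon but
    not the stop codon.
    """
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the reading frame
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None  # look for the next ATG in this frame
    return orfs
```

A scan like this is a reasonable first look at prokaryotic sequence, but for eukaryotic DNA with introns it will miss or truncate genes, which is why a dedicated gene predictor is used in this session.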

GENSCAN results

I have a protein sequence, now what?
- Amos Bairoch (creator of SWISS-PROT) built a collection of small, well-conserved segments (patterns) to classify and analyze new proteins
- PROSITE is the name he gave to this pattern database
- PROSITE also contains profiles, which describe every position of a protein family
- ScanProsite is a server that compares your protein to the PROSITE database
- If your protein contains a PROSITE pattern, that can give you a fairly clear indication of its function

ScanProsite: what does it look for on the protein sequence?
- profiles of protein families
- conserved patterns in the sequence, e.g. [RK]-x-[ST]
- cofactor binding motifs
- substrate binding motifs
Around the world there are ~8 other major collections of domains, searchable with tools such as InterProScan, CD server, or Pfscan
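A pattern such as [RK]-x-[ST] reads "R or K, then any residue, then S or T". As a rough illustration of what a pattern scan does, the sketch below converts a simple PROSITE-style pattern into a Python regular expression and reports every match. This is an assumption-laden toy, not ScanProsite's implementation: it handles only `x`, `[...]`, `{...}`, `(n)` / `(n,m)` repeats, and leading `<` or trailing `>` anchors, and it does not score profiles. The function names and the example peptide are invented for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {ED} -> [^ED]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x = any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


def scan(pattern, protein):
    """Return (position, matched_text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(protein.upper())]


# Hypothetical peptide, used only to show the output shape.
hits = scan("[RK]-x-[ST]", "MKASRRT")
```

Short patterns like this one match by chance very often, so a hit is a hint about function rather than proof.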

ScanProsite results continued

Sequence Similarity Searches using BLAST
- You have a region of sequenced DNA and want to know what the encoded protein does
- If you can find similar sequences you can say, “if something is true for that sequence, it is probably true for mine as well.”
- Characterizing a protein in the lab could take years; searching a database for similarity takes only seconds
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
This is the unknown gene sequence used in the next few slides:
ATGGAACTGACTCCAAGAGAAAAAGACAAACTATTACTGTTTACCGCTGCACTGCTGGCAGAGCGTCGTCTGGCCCGCGGCCTGAAACTTAACTATCCCGAATCCGTGGCCCTGATTAGCGCTTTTATAATGGAGGGCGCTCGCGACGGCAAAAGCGTCGCTGCGCTGATGGAAGAAGGACGGCATGTCCTGAGTCGCGAGCAGGTCATGGAAGGCATACCAGAAATGATCCCCGATATCCAGGTCGAAGCCACCTTTCCGGACGGCTCCAAGCTGGTTACCGTCCATAATCCGATAATCTGA

The different types of BLAST
BLAST = Basic Local Alignment Search Tool, “the most popular data mining tool ever”
- BLASTN: DNA sequence vs. DNA sequence database
- BLASTP: protein sequence vs. protein sequence database
- BLASTX: DNA sequence translated in 6 reading frames vs. protein sequence database
- TBLASTX: DNA sequence translated in 6 reading frames vs. DNA sequence database translated in 6 reading frames
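BLASTX and TBLASTX both rely on translating DNA in six reading frames: three offsets on the given strand and three on its reverse complement. The sketch below shows that translation step with the standard genetic code. It is only an illustration of the concept, not BLAST's code; the function names and the frame labels ("+1" to "-3") are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from position 0; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame label ('+1'..'-3') to protein string."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[strand + str(offset + 1)] = translate(seq[offset:])
    return frames


# The first 18 bases of the unknown gene shown earlier in this session.
print(translate("ATGGAACTGACTCCAAGA"))  # MELTPR
```

Only one of the six frames usually encodes the real protein, which is why a translated search compares all six against the database and lets the alignment scores pick the frame.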

Steps to use BLAST
#1) Paste your sequence into the query box
#2) Choose the search set (either nucleotide collection or Protein Data Bank)
#3) Select the program to use
#4) Push the BLAST button

BLAST output #1
- The number of sequences in the database
- The number of letters (base pairs) in the database
- The length of your query (in this case, in nucleotides)
- Red, pink, and green are good matches

How good is your BLAST hit?
- The bit score: a normalized alignment score from which statistical significance is calculated. The higher the score the better; matches scoring below 50 are unreliable.
- E-value: the number of matches with a score at least this good that you would expect to find by chance in a database of this size. The lower (closer to zero) the better; matches above 0.001 are approaching the “twilight zone”.
Next, click through to the GenBank entry for the hit.
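The bit score and the E-value are tied together by the Karlin-Altschul statistics that BLAST uses: a raw score S becomes a bit score S' = (λS − ln K) / ln 2, and the E-value is E = m · n · 2^(−S'), where m is the query length and n the database length. The sketch below just evaluates those two formulas. The parameter values in the example are made up for illustration, and real BLAST applies "effective" lengths and matrix-specific λ and K, so do not expect the numbers to reproduce a report exactly.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Hypothetical numbers: a 300-letter query against a 1e9-letter database.
for bits in (30, 50, 100):
    print(bits, e_value(bits, 300, 1e9))
```

Each additional bit halves the E-value, and the same bit score gives a larger E-value in a bigger database, which is why E-values from searches against different databases are not directly comparable.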

A GenBank file
- Organism from which the sequence was characterized
- List of annotated features
- Product
- Structural annotation
- Function
- Name of the gene (ureC)

Once you find some protein sequences with BLAST, copy and paste them into Word or a text editor.
Note: each sequence needs a FASTA header as its first line: a ">" followed by a name, such as the organism name.
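If you collect many hits, assembling the FASTA file by hand gets tedious. The following is a small sketch of a helper that writes (header, sequence) pairs in FASTA layout; the function name, the 60-column line width, and the example records are assumptions for illustration, not real BLAST hits.

```python
def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence,
    wrapped at `width` characters per line.
    """
    lines = []
    for header, sequence in records:
        sequence = "".join(sequence.split())  # drop spaces and line breaks
        lines.append(">" + header)
        lines.extend(
            sequence[i:i + width] for i in range(0, len(sequence), width)
        )
    return "\n".join(lines) + "\n"


# Placeholder records: the names and residues are invented.
example = to_fasta([
    ("Organism_A", "MELTPREKDKLLLFTAALLA"),
    ("Organism_B", "MELTPREKDKLLLFTAALVA"),
])
print(example)
```

The resulting text can be pasted straight into an alignment tool's input box; save it as plain text rather than a word-processor format so no hidden formatting is added.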

MultAlin: conduct protein sequence alignments from the BLASTP search
Ambiguity codes:
- B (Asx): aspartic acid or asparagine
- Z (Glx): glutamine or glutamic acid

It's your turn
http://www.ncbi.nlm.nih.gov/Class/minicourses/
Choose Course #2: Making sense of DNA and protein sequences
Questions to consider as you work through these exercises:
#1) What aspects of the tools/resources are confusing or problematic? What questions do you think your students would have?
#2) How can we design similar exercises for our classes that are more compelling? How can we make the students more engaged, invested and motivated to learn?
#3) As a group, compile additional resources/websites that might be even better or more intuitive than the NCBI tools.