Introduction to Bioinformatics - Tutorial no. 2 BLAST.

BLAST: Outline
- Sequence alignment
- Complexity and indexing
- BLASTN and BLASTP: basic parameters
- PAM and BLOSUM matrices
- Affine gap model
- E values (once again)

Advanced BLAST
- Databases
- BLAST options
- BLAST output
- Taxonomic BLAST
- Pairwise BLAST

BLAST Variations
Name      Query type           Database
blastn    Genomic              Genomic
blastp    Protein              Protein
blastx    Translated genomic   Protein
tblastn   Protein              Translated genomic
tblastx   Translated genomic   Translated genomic
Genomic translations test all 6 possibilities: 3x for codon frames, 2x for reverse complement.
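The six translated frames mentioned above can be illustrated with a short sketch. It assumes the standard genetic code and marks any codon containing an ambiguous base (such as N) with X; the function names are made up for this example and are not part of BLAST.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate both strands at the three possible codon offsets."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # First 18 bases of query3 from the exercises.
    for name, protein in six_frame_translations("ATGTCTGCTCCACAAGCC").items():
        print(name, protein)
```

Programs such as blastx and tblastn compare against all six of these translations, which is why a coding query can match a protein regardless of strand or reading frame.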

BLASTN Databases
nr       GenBank, EMBL, DDBJ, PDB and NCBI reference sequences (RefSeq)
htgs     High-throughput genomic sequences (draft)
pat      Patented nucleotide sequences
mito     Mitochondrial sequences
vector   Vector subset of GenBank
month    GenBank, EMBL, DDBJ, PDB from the last 30 days
chrom    Contigs and chromosomes from RefSeq

BLASTP Databases
nr          GenBank CDS translations, RefSeq, PDB, SWISS-PROT, PIR, PRF
swissprot   SWISS-PROT
pat         Patented protein sequences
pdb         Protein Data Bank
month       GenBank CDS translations, PDB, SWISS-PROT, PIR, PRF from the last 30 days

BLASTN/P Options (1)
- Search only part of the database, using NCBI Entrez query format
- Search a specific organism
- Remove regions of low information content, e.g. short repeats or stretches rich in only 2 nucleotides
- Remove known human repeats (LINEs, SINEs)

BLASTN/P Options (2)
- Threshold for the significance of results
- Use an index based on words of 7, 11 or 15 nucleotides
- Costs to open and extend a gap; score for a nucleotide match or mismatch
- Allowed gap scores: 10/1, 10/2, 11/1, 8/2, 9/2
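The word index behind the word-size option can be sketched roughly as follows. This is a toy illustration of exact word lookup only, not BLAST's actual implementation; `build_word_index` and `find_seeds` are hypothetical names, and real BLAST goes on to extend each seed into a longer alignment.

```python
from collections import defaultdict


def build_word_index(db_seq, word_size=11):
    """Map every word of length word_size to its start positions in db_seq."""
    index = defaultdict(list)
    for i in range(len(db_seq) - word_size + 1):
        index[db_seq[i:i + word_size]].append(i)
    return index


def find_seeds(query, index, word_size=11):
    """Return (query_pos, db_pos) pairs where a query word occurs in the index."""
    seeds = []
    for q in range(len(query) - word_size + 1):
        for d in index.get(query[q:q + word_size], []):
            seeds.append((q, d))
    return seeds


if __name__ == "__main__":
    db = "ACGTACGTTTGACCA"
    index = build_word_index(db, word_size=7)
    print(find_seeds("GGACGTTTGAGG", index, word_size=7))
```

A smaller word size finds more seeds and is therefore more sensitive but slower; a larger word size is faster but can miss weaker matches.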

BLASTP Options
- Scoring matrix: PAM, etc.
- Search for a motif (PSI-BLAST)
- Costs to open and extend a gap
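As a small worked example of the affine gap model, the sketch below computes the penalty for a gap of a given length. It assumes the convention in which every gapped position pays the extension cost and the gap pays the opening cost once (so a gap of length 1 with costs 11/1 is penalized 12); some other tools count the first position differently. The default values are simply one of the open/extend pairs a user might select.

```python
def affine_gap_cost(length, gap_open=11, gap_extend=1):
    """Penalty for one gap of the given length under an affine model.

    Assumed convention: cost = gap_open + gap_extend * length.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, affine_gap_cost(k))
```

With these costs one gap of length 10 is penalized 21, whereas ten separate gaps of length 1 are penalized 120, so the model favors a few long gaps over many short ones.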

BLASTN/P Formatting (1)
- Show colored bar chart
- Number of sequences listed
- Number of alignments shown
- Other (less important) options on what to show

BLASTN/P Formatting (2)
- How to display alignments
- Only show results which match an Entrez search or are from a specific organism
- Only show results with E values in this range

BLASTN Results
- Query sequence representation
- Matched areas of database sequences

BLAST Output Header
- Request ID for later retrieval
- Query sequence details
- Database details
- Tax BLAST

BLAST Alignments (1)
- Sequence identifier
- Sequence description
- Score and E value

BLAST Alignments (2)
- Normalized score of the alignment
- Expected number of such hits (2e-11 = 2 × 10^-11)
- Number of exact matches
- Number of matches with positive score
- Number of insertions / deletions
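The identity and gap counts reported for each alignment can be reproduced from the two aligned strings. The sketch below is an illustration only: it assumes gaps are written as "-" and it does not compute the "positives" count, which additionally requires a scoring matrix such as BLOSUM62.

```python
def alignment_stats(aligned_query, aligned_subject):
    """Count alignment columns, exact matches and gapped columns."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_query, aligned_subject))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return {"length": len(pairs), "identities": identities, "gaps": gaps}


if __name__ == "__main__":
    stats = alignment_stats("ACG-TA", "ACGTTT")
    print(stats)
    print(f"Identities = {stats['identities']}/{stats['length']}")
```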

BLAST Alignments (3)
- Query sequence
- Exact match
- Insertion / deletion
- Matched sequence
- Mismatch with positive score
- Position within sequence
- Masked low complexity region

Expectation Values
- Increases linearly with length of query sequence
- Increases linearly with length of database
- Decreases exponentially with score of alignment
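These three properties follow from the Karlin-Altschul formula E = K · m · n · e^(-λS), where m is the query length, n the database length and S the raw alignment score. The sketch below evaluates it; the default values of K and λ are illustrative assumptions (they depend on the scoring matrix and gap costs), and the function names are made up for this example.

```python
import math


def expect_value(raw_score, query_len, db_len, lam=0.267, K=0.041):
    """E = K * m * n * exp(-lambda * S); lam and K are illustrative defaults."""
    return K * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam=0.267, K=0.041):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


if __name__ == "__main__":
    m, n = 100, 1_000_000
    for score in (40, 60, 80):
        print(score, round(bit_score(score), 1), expect_value(score, m, n))
```

The normalized (bit) score folds K and λ into the score itself, so the same E value can be written as m · n · 2^(-S').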

Tax BLAST
- Lineage of the organism with the strongest hit
- Score of the organism's strongest hit
- Number of hits per organism
- Shared ancestry in the taxonomic tree

BLAST2SEQ
This tool produces the alignment of two given sequences using the BLAST engine for local alignment.
- Type of program
- Scoring scheme
- Scoring matrix
- Gap model, expect value, advanced options
- Sequences
- GO! button

Questions
You have two query sequences: query1 and query2.
>query1
CCGTCCGTCCGTCGTCCTCCTCGCTTGCGGGGCGCCGGGCCCGTCCTCGAGCCCCCNNNNNCCGTCCGGC
CGCGTCGGGGCCTCGCCGCGCTCTACCTACCTACCTGGTTGATCCTGCCAGTAGCATATGCTTGTCTCAA
AGATTAAGCCATGCATGTCTAAGTACGCACGGCCGGTACAGTGAAACTGCGAATGGCTCATTAAATCAGT
TATGGTTCCTTTGGTCGCTCGCTCCTCTCCTACTTGGATAACTGTGGTAATTCTAGAGCTAATACATGCC
GACGGGCGCTGACCCCCTTCGCGGGGGGGATGCGTGCATTTATCAGATCAAAACCAACCCGGTCAGCCCC
TCTCCGGCCCCGGCCGGGGGGCGGGCCGCGGCGGCTTTGGTGACTCTAGATAACCTCGGGCCGATCGCAC
GCCCCCCGTGGCGGCGACGACCCATTCGAACGTCTGCCCTATCAACTTTCGATGGTAGTCGCCGTGCCTA
CCATGGTGACCACGGGTGACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCAC
ATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCCGACCCGGGGAGGTAGTGACGAAAAATAACAA
TACAGGACTCTTTCGAGGCCCTGTAATTGGAATGAGTCCACTTTAAATCCTTTAACGAGGATCCATTGGA
GGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGCTGCAGTTAA
AAAGCTCGTAGTTGGATCTTGGGAGCGGGCGGGCGGTCCGCCGCGAGGCGAGCCACCGCCCGTCCCCGCC
CCTTGCCTCTCGGCGCCCCCTCGATGCTCTTAGCTGAGTGTCCCGCGGGGCCCGAAGCGTTTACTTTGAA
AAAATTAGAGTGTTCAAAGCAGGCCCGAGCCGCCTGGATACCGCAGCTAGGAATAATGGAATAGGACCGC
GGTTCTATTTTGTTGGTTTTCGGAACTGAGGCCATGATTAAGAGGGACGGCCGGGGGCATTCGTATTGCG
CCGCTAGAGGTGAAATTCTTGGACCGGCGCAAGACGGACCAGAGCGAAAGCATTTGCCAAGAATGTTTTC
ATTAATCAAGAACGAAAGTCGGAGGTTCGAAGACGATCAGATACCGTCGTAGTTCCGACCATAAACGATG
CCGACCGGCGATGCGGCGGCGTTATTCCCATGACCCGCCGGGCAGCTTCCGGGAAACCAAAGTCTTTGGG
TTCCGGGGGGAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAG
CCTGCGGCTTAATTTGACTCAACACGGGAAACCTCACCCGGCCCGGACACGGACAGGATTGACAGATTGA
TAGCTCTTTCTCGATTCCGTGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGCGATTTGTCTGGTTA
ATTCCGATAACGAACGAGACTCTGGCATGCTAACTAGTTACGCGACCCCCGAGCGGTCGGCGTCCCCCAA
CTTCTTAGAGGGACAAGTGGCGTTCAGCCACCCGAGATTGAGCAATAACAGGTCTGTGATGCCCTTAGAT
GTCCGGGGCTGCACGCGCGCTACACTGACTGGCTCAGCGTGTGCCTACCCTACGCCGGCAGGCGCGGGTA
ACCCGTTGAACCCCATTCGTGATGGGGATCGGGGATTGCAATTATTCCCCATGAACGAGGAATTCCCAGT
AAGTGCGGGTCATAAGCTTGCGTTGATTAAGTCCCTGCCCTTTGTACACACCGCCCGTCGCTACTACCGA
TTGGATGGTTTAGTGAGGCCCTCGGATCGGCCCCGCCGGGGTCGGCCCACGGCCTGGCGGAGCGCTGAGA
AGACGGTCGAA

Questions
>query2
TACGAACGCTGGCGGCATGCTAATACATGCAAGTCGAACGAGACCTTCGGGTCTAGTGGCGCACGGGTGG
CTAACGCGTGGGAATCTGCCCTTGGGTTCGGAATAACTTCGGGAAACTGAAGCTAATACCGGATGATGAC
GAAAGTCCAAAGATTTATCGCCCAGGGATGAGCCCGCGTAGGATTAGCTAGTTGGTGGGGTAAAGGCTCA
CCAAGGCAACGATCCTTAGCTGGTCTGAGAGGATGATCAGCCACACTGGGACTGAGACACGGCCCAGACT
CCTACGGGAGGCAGCAGTAGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCAATGCCGCGTGAGTG
ATGAAGGCCTTAGGGTTGTAAAGCTCTTTTACCCGAGATGATAATGACAGTATCGGGAGAATAAGCTCCG
GCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGAGCTAGCGTTGTTCGGAATTACTGGGCGTAAAG
CGCACGTAGGCGGCGATTTAAGTCAGAGGTGAAAGCCCGGGCTCAACCCCGAACTGCCTTTGAGACTGGA
TTGCTAGAATCTTGGAGAGGCGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAAGAAC
ACCAGTGCGAAGGCGGCTCGCTGGACAAGTATTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACAGGAT
TAGATACCCTGGTAGTCCACGCCGTAAACGATGATAACTAGCTGCCGGGGCACATGGTGTTTCGGTGGCG
CACGTAACGCATTAAGTTATCCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGG
GCCTGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGCAGAACCTTACCAGCGTTTGACATC
CTCATCGCGGATTTCAGAGATGATTTCCTTCAGTTCGGCTGGATGAGTGACAGGTGCTGCATGGCTGTCG
TCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGCCTTTAGTTGCCAGCAT
TTAGTTGGGTACTCTAAAGGAACCGCCGGTGATAAGCCGGAGAAGGTGGGGATGACGTCAAGTCCTCATG
GCCCTTACGCGCTGGGCTACACACGTGCTACAATGGCGACTACAGTGGGCTGCAACCGTGCGAGCGGTAG
CTAATCTCCAAAAGTCGTCTCAGTTCGGATTGTTCTCTGCAACTCGAGAGCATGAAGGCGGAATCGCTAG
TAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCAGGCCTTGTACACACCGCCCGTCACACCATGGG
ATTTGGATTCACCCGAAGGCACTGCGTTAACCCGCAAGGGAGACAGGTGACCACGGTGGGTTTAGAGACT
GGGGTGAA

Questions
Using BLASTN, find out what each of these sequences codes for.

Questions
- To which organism is each sequence related?
- Do these sequences code for proteins?
- Suppose the information you used to answer the previous questions were not available. Could you suggest another way to answer them?
Answer: BLASTX

Questions
- Look carefully at the E-value column of the first 50 results of each query. What can you learn about these sequences?
- Are these sequences generally conserved across other organisms?
5 last answers

Questions
- Use bl2seq to align the two query sequences. What can you say about the relation between them?
- Does this last result make sense?

Questions
You have two query sequences.
>query3
ATGTCTGCTCCACAAGCCAAGATTTTGTCTCAAGCTCCAACTGAATTGGAATTACAAGTT
GCTCAAGCTTTCGTTGAATTGGAAAATTCTTCTCCAGAATTGAAAGCTGAGTTGAGACCT
TTGCAATTCAAGTCCATCAGAGAAGT
>query4
GTATGTTATTAATTTGAATCTAAACTTAAGAATAATGGAGAGTAACAAAGGAAAAAAGTG
TGAACGGGACGATACCAGAATGTTTCAATCTAGAAAAGTATAAAAGATAAGGACTAGGAC
TCAAATGTATTTGGCTGACTATCGCCTGAACCTTGATGCTAAGCAAATACCATATCTTCA
AGAAAAAGCCTACTCCAGTGTTTAAGAAGAAGGGAACGATTTACTAGATCATGCTATACG
CAGTAAGGTTCTGATAGTTAATTACAATCGGTCCAAGTTCTAAGCGGTGTCGTCCATGCA
TATATCATTTACAAGTTACTGGCGTCAACTCTTCAAATATTCAAAATATCACCTAATCAA
ACTTACTAACATTTTCCTTTTTTGTTTTCCTTCTTTTATAG
Now use BLASTX. Which protein does each of these sequences code for? Are these proteins conserved in other organisms?

Questions
Now use BLASTX. Which protein does each of these sequences code for? Are these proteins conserved in other organisms?
Query 3: a conserved protein component of the small (40S) ribosomal subunit of S. cerevisiae.
Query 4: no protein (e-value 3.2).

Questions You are told that the sequences were extracted from the same gene. How could you explain the above results? Answer: query4 is extracted from a non-coding region (intron) and thus doesn’t code for any protein.
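One way to support the intron explanation without a database search is to look for the sequence signals commonly reported for S. cerevisiae introns: a GTATGT 5' splice site, a TACTAAC branch point, and an AG at the 3' end. The sketch below is a rough heuristic under those assumptions, not a gene-finding method; `intron_signals` is a hypothetical helper, and only the first and last printed lines of query4 are used.

```python
def intron_signals(seq):
    """Check for the consensus motifs often seen in yeast introns."""
    seq = seq.upper().replace(" ", "").replace("\n", "")
    return {
        "starts_with_GTATGT": seq.startswith("GTATGT"),
        "has_TACTAAC_branch_point": "TACTAAC" in seq,
        "ends_with_AG": seq.endswith("AG"),
    }


# First and last lines of query4 as printed in the exercise
# (the middle of the sequence is omitted here for brevity).
QUERY4_ENDS = (
    "GTATGTTATTAATTTGAATCTAAACTTAAGAATAATGGAGAGTAACAAAGGAAAAAAGTG"
    "ACTTACTAACATTTTCCTTTTTTGTTTTCCTTCTTTTATAG"
)

if __name__ == "__main__":
    for signal, present in intron_signals(QUERY4_ENDS).items():
        print(signal, present)
```

All three signals are present in query4, which is consistent with it being an intron, while query3 starts with ATG and matches a protein in BLASTX.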