Tutorial 3: BLAST


BLAST tutorial
- How to use BLAST
- Score vs. E-value
- Exercise
- Cool story of the day: how Alzheimer's disease is studied in yeast

What is BLAST?
BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs for exploring sequence databases. A BLAST program takes a query and a database as input.

Why perform a similarity search?
- Find genes/proteins with possibly similar function.
- Find the origin of a sequence (what organism it is taken from).
- Different degrees of similarity can be found in a database search.

BLAST databases

Program   Query type           Database type
blastn    Genomic              Genomic
blastp    Proteomic            Proteomic
blastx    Translated genomic   Proteomic
tblastn   Proteomic            Translated genomic
tblastx   Translated genomic   Translated genomic

Sequence types:
Genomic: A T G C
Proteomic: G A S T C V L I M P F Y W D E N Q H K R
Translated genomic: a genomic sequence translated to protein using the 6 possible reading frames. For example, the three forward frames of ATGCCGTTC give MPF, CR, and AV.
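The translated-genomic idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard genetic code, expects an uppercase or lowercase A/C/G/T string with no ambiguity codes, and the function names are made up for this example. The first three frames reproduce the MPF, CR, AV example from the slide; the last three come from the reverse complement.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; a trailing partial codon is dropped."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frame_translation(dna):
    """Return the three forward frames, then the three reverse-strand frames."""
    dna = dna.upper()
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


print(six_frame_translation("ATGCCGTTC"))
# ['MPF', 'CR', 'AV', 'ERH', 'NG', 'TA']
```

Programs such as blastx and tblastn perform this translation internally, so it never has to be done by hand before a search.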


Query and DB parameters
- Place the query.
- Job title: helpful when running multiple runs.
- Choose the database.
- Restrict the search to a specific organism, if you want to.
- Exclude specific sequences, if you want to.

How to choose the database?
Depends on what you're looking for…
nr/nt (non-redundant nucleotide): a good place to start if you don't know what you're looking for.

Alignment parameters
The selected option optimizes the search parameters for the desired level of similarity.

Alignment parameters
- Threshold for results significance
- Primary word match (16-64 nt)
- Scores of matching and mismatching bases
- Cost to create and extend a gap
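The same parameters exist outside the web form. The sketch below assumes a local installation of the NCBI BLAST+ command-line tools, which the slides do not cover, and only builds the argument list without running anything. The helper name, the file names, and the values are hypothetical. Note that BLAST+ accepts only certain combinations of match, mismatch, and gap costs, so those options are left out unless given explicitly.

```python
def build_blastn_command(query_fasta, database, evalue=1e-6, word_size=28,
                         reward=None, penalty=None,
                         gapopen=None, gapextend=None,
                         out_file="results.tsv"):
    """Build a blastn argument list (hypothetical helper, nothing is executed)."""
    command = [
        "blastn",
        "-query", query_fasta,         # FASTA file holding the query
        "-db", database,               # name of a formatted BLAST database
        "-evalue", str(evalue),        # threshold for results significance
        "-word_size", str(word_size),  # length of the primary word match
        "-outfmt", "6",                # tabular output, one hit per line
        "-out", out_file,
    ]
    # Scoring options are optional; omitted ones fall back to BLAST+ defaults.
    optional = {
        "-reward": reward,        # score of a matching base
        "-penalty": penalty,      # score of a mismatching base (negative)
        "-gapopen": gapopen,      # cost to create a gap
        "-gapextend": gapextend,  # cost to extend a gap
    }
    for flag, value in optional.items():
        if value is not None:
            command += [flag, str(value)]
    return command


print(" ".join(build_blastn_command("query.fasta", "nt")))
```

The resulting list could be handed to `subprocess.run` on a machine where BLAST+ and the database are available.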

How to interpret BLAST results?

Search for homologs of the chick “olfactory receptor 6” gene

Search results

Graphic Summary
Shows the query sequence and the matched sequences from the DBs.

Descriptions
Columns: sequence identifier (with link), sequence description, score (bits), % coverage, % identity, E-value.

Descriptions: reading one hit
- Query coverage = 55%: only 55% of the query is covered by the alignment (about 230 bp).
- Identity = 71%: of the 230 aligned bp, only 71% are matches.
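As a rough illustration of how these two percentages are computed, here is a small Python sketch. The function names and the toy alignment are made up for this example, and the 418 bp query length is an assumption back-calculated from "55% is about 230 bp", not a value given in the slides.

```python
def query_coverage(query_start, query_end, query_length):
    """Percentage of the query spanned by the alignment (1-based, inclusive)."""
    return 100.0 * (query_end - query_start + 1) / query_length


def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns where both sequences carry the same base."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        q == s and q != "-"
        for q, s in zip(aligned_query, aligned_subject)
    )
    return 100.0 * matches / len(aligned_query)


# Assumed 418 bp query with an alignment spanning positions 1..230.
print(round(query_coverage(1, 230, 418)))        # 55

# Toy alignment: 8 columns, one gap, one mismatch, 6 matches.
print(percent_identity("ACGT-ACG", "ACGTTACC"))  # 75.0
```

Coverage is measured against the whole query, while identity is measured only within the aligned region, so a hit can score high on one and low on the other.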

Alignments
Each entry shows the query info, the alignment info, and the alignment itself.

It is possible to get multiple hits per sequence.

E-values and scores

Score vs. E-value
The score is a measure of the similarity of the query to a sequence from the database.
The E-value is a measure of the reliability of the score. It is defined as the number of alignments with the observed score or higher that are expected due to chance.

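Score vs. E-value

Score (S): $S = \sum(\text{identities} + \text{mismatches}) - \sum \text{gaps}$

E-value: $E = K m n e^{-\lambda S}$
- $m n$ depends on the search space: $m$ is the query length (bp) and $n$ is the effective length (total number of bases) of the database (bp).
- $K$ and $\lambda$ depend on the scoring system.
- $S$ is the score.

Bit score (S'): $S' = (\lambda S - \ln K) / \ln 2$, so that $E = m n 2^{-S'}$.

E-values cannot be compared across different DBs, even if the score is the same.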
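The relation between raw score, bit score, and E-value can be tried out numerically. The sketch below uses the standard Karlin-Altschul form, E = K·m·n·e^(−λS), with bit score S' = (λS − ln K) / ln 2. Everything concrete in it is an assumption for illustration: the scoring values, the gap-cost convention (one opening cost per gap plus one extension cost per gapped position), and especially the λ and K placeholders, which in a real search are reported by BLAST for the chosen scoring system.

```python
import math


def raw_score(matches, mismatches, gap_opens, gap_positions,
              reward=1, penalty=-3, gap_open_cost=5, gap_extend_cost=2):
    """Sum of match/mismatch scores minus gap costs (illustrative scoring values)."""
    return (matches * reward
            + mismatches * penalty
            - gap_opens * gap_open_cost
            - gap_positions * gap_extend_cost)


def bit_score(raw, lam, k):
    """Normalize a raw score so it no longer depends on the scoring system."""
    return (lam * raw - math.log(k)) / math.log(2)


def e_value(raw, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least `raw`."""
    return k * query_len * db_len * math.exp(-lam * raw)


LAMBDA, K = 1.3, 0.7  # placeholder statistical parameters, not real BLAST output

s = raw_score(matches=50, mismatches=2, gap_opens=1, gap_positions=2)
print(s)  # 35

small_db = e_value(s, query_len=400, db_len=1e9, lam=LAMBDA, k=K)
large_db = e_value(s, query_len=400, db_len=2e9, lam=LAMBDA, k=K)
print(large_db / small_db)  # doubling the database doubles the E-value
```

The last two lines show why the same score gives different E-values in different databases: the search space m·n enters the formula directly.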

Intuition for “significance”
Think of the query as a ball in which each color represents a part of the sequence. The DB is a pool of colored balls.
- If the ball has many colors (a longer query), there is a higher probability of seeing the same color in the pool by chance.
- If the pool of balls is very big, there is a higher probability of seeing one of the ball's colors in the pool.

E-value threshold
The typical threshold for a good E-value from a BLAST search is E = 10^-6 (written 1e-06 in BLAST output) or lower. This does not mean that hits with higher E-values necessarily lack biological significance.
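Applying such a threshold to a result list is a simple filter. The sketch below assumes the hits were saved in the BLAST+ tabular format (`-outfmt 6`) with its default twelve columns; the function name and the three example rows are invented for illustration and are not real search results.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def significant_hits(tabular_text, max_evalue=1e-6):
    """Keep only the hits whose E-value is at or below the threshold."""
    reader = csv.DictReader(
        io.StringIO(tabular_text),
        fieldnames=OUTFMT6_COLUMNS,
        delimiter="\t",
    )
    return [row for row in reader if float(row["evalue"]) <= max_evalue]


# Made-up rows, for illustration only.
example = "\n".join([
    "query1\thitA\t98.5\t200\t3\t0\t1\t200\t51\t250\t1e-100\t361",
    "query1\thitB\t80.0\t150\t30\t0\t10\t159\t5\t154\t2e-20\t95.3",
    "query1\thitC\t71.0\t60\t17\t1\t40\t99\t300\t358\t0.002\t32.1",
])

for hit in significant_hits(example):
    print(hit["sseqid"], hit["evalue"])
```

A filter like this is a first pass only; as the slide notes, a hit above the threshold may still be biologically meaningful.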

E-value vs. P-value
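One standard way to relate the two quantities: if the number of chance hits follows a Poisson distribution with mean E, then the probability of seeing at least one such hit is P = 1 − e^(−E). The short Python sketch below only evaluates that formula; the function name is made up for this example.

```python
import math


def p_value_from_e_value(e_value):
    """P(at least one chance hit) = 1 - exp(-E), assuming Poisson-distributed hits."""
    # expm1 keeps precision when E is tiny, where 1 - exp(-E) would round badly.
    return -math.expm1(-e_value)


for e in (1e-6, 0.01, 1.0, 10.0):
    print(e, p_value_from_e_value(e))
```

For small values the two are nearly identical (E = 0.01 gives P of about 0.00995), while for large E the P-value saturates near 1 and the E-value keeps growing.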

Exercise

Find homologs of the CFTR gene in human
- You can enter the gene ID rather than the sequence.
- Human DB only.
- We'll start with high similarity.


Now change to more dissimilar sequences.

We get more results.

Find homologs of the CFTR gene in other organisms
Not only human sequences.


Which program to use for a nucleotide sequence: blastn or blastx?
- blastn (genomic vs. genomic): for example, ncRNA.
- blastx (translated genomic vs. proteomic): if you know your sequence codes for a protein, blastx is better, since you will get more reliable results.

Cool story of the day: how Alzheimer's disease is studied in yeast

Alzheimer's disease (AD)
Alzheimer's disease leads to nerve cell death and tissue loss throughout the brain. Symptoms can include confusion, aggression, trouble with language, and long-term memory loss. Gradually, bodily functions are lost, ultimately leading to death. There are no available treatments that stop or reverse the progression of the disease. The disease is associated with plaques and tangles in the brain.

How can AD be studied in yeast?
Yeast cells lack the specialized processes of neuronal cells and the cell-cell communications that modulate neuropathology. However, the most fundamental features of eukaryotic cell biology evolved before the split between yeast and metazoans.
Treusch et al. Science (2011)

Thinakaran et al. Journal of Biological Chemistry (2008)

Susan Lindquist's lab showed that Aβ was toxic when expressed in yeast. Later they tested the effect of this protein on rat neurons and on C. elegans neurons.
“To recapitulate this multicompartment trafficking in yeast, we fused an endoplasmic reticulum (ER) targeting signal to the N terminus of Aβ”
Treusch et al. Science (2011)

Wild-type worms invariably have five glutamatergic neurons in their tails.
Treusch et al. Science (2011)