Identifying templates for protein modeling:


Lecture 2. Identifying templates for protein modeling: Sequence alignment with BLAST and PSI-BLAST

Sources and additional information. Images and other material in this presentation are taken from Bioinformatics and Functional Genomics, third edition, by Jonathan Pevsner (2015, John Wiley & Sons, Inc.; http://pevsnerlab.kennedykrieger.org/). The lecture closely follows chapter 4 of Pevsner's book, which discusses the issues covered here in depth. For additional material, see the book website: http://www.bioinfbook.org

Sequence alignment with BLAST. Outline: webserver interface; aspects of the algorithm; alignment strategies; detection of distant evolutionary relationships: PSI-BLAST.

BLAST (Basic Local Alignment Search Tool) scans large databases of sequences for sequences similar to a query.

Typical uses: identifying orthologs and paralogs; discovering variant proteins; exploring structure-function relations.

BLAST requires four choices: (1) choose the query sequence; (2) select the BLAST program; (3) choose a database; (4) select optional parameters.

Web interface

How to get FASTA format for the query sequence
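A FASTA record is a header line starting with ">" followed by one or more lines of sequence. The following is a minimal sketch of reading such text; the record shown is a hypothetical placeholder made up for illustration, not the real database entry for any accession.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record, for illustration only.
example = """>example_query hypothetical protein
KENFDKARFSGTW
YAMAKKDPEG
"""
records = parse_fasta(example)
```

The sequence lines are joined, so line wrapping in the input does not matter.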

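Five distinct BLAST programs: blastn (nucleotide BLAST: nucleotide query against a nucleotide database); blastp (protein BLAST: protein query against a protein database); blastx (translated BLAST: translated nucleotide query against a protein database); tblastn (translated BLAST: protein query against a translated nucleotide database); tblastx (translated BLAST: translated nucleotide query against a translated nucleotide database).

Some optional search parameters: organism, algorithm.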
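The choice of program follows from the molecule type of the query and of the database. A small sketch of that lookup; the dictionary and the helper name `candidate_programs` are illustrative, not part of any BLAST distribution.

```python
# (query, database) as seen by each program. "translated nucleotide" means a
# nucleotide sequence that is translated into protein before comparison.
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def candidate_programs(query_is_protein, db_is_protein):
    """Return the programs usable for the given query and database types."""
    names = []
    for name, (query, database) in BLAST_PROGRAMS.items():
        query_ok = (query == "protein") == query_is_protein
        db_ok = (database == "protein") == db_is_protein
        if query_ok and db_ok:
            names.append(name)
    return names
```

For a nucleotide query against a nucleotide database, two programs qualify: blastn compares the sequences directly, while tblastx compares their translations.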

Why a low-complexity filter? (a) Query: human insulin NP_000198. Program: blastp. Database: C. elegans RefSeq. Default settings: unfiltered (“composition-based statistics”).

Why a low-complexity filter? (d) Query: human insulin NP_000198. Program: blastp. Database: C. elegans RefSeq. Option: filter low-complexity regions. The bit score is different!

BLAST search output



BLAST: what kind of alignment? Global alignment (Needleman & Wunsch, 1970) uses dynamic programming. Gaps are inserted so that the two sequences are aligned over their entire lengths (“global”).
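A minimal sketch of the dynamic-programming recurrence behind global alignment. It returns only the optimal score, and the match, mismatch and linear gap values are illustrative assumptions rather than values from the lecture.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score with a linear gap penalty."""
    # Row 0: aligning an empty prefix of `a` with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] aligned against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    # The bottom-right cell covers both sequences end to end.
    return prev[-1]
```

Because every residue of both sequences must be placed, unmatched ends are paid for with gaps, which is what makes the alignment global.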

BLAST: what kind of alignment? Local alignment (Smith & Waterman, 1981): just a portion of either sequence is aligned. Useful for finding matching domains in two sequences. BLAST finds a local alignment through a heuristic approach.
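For comparison with the heuristic, here is a minimal sketch of exact local alignment by dynamic programming. It returns only the best score, and the scoring values are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score; every cell is floored at zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 lets an alignment start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            # The maximum over all cells lets it end anywhere.
            best = max(best, score)
        prev = curr
    return best
```

The work grows with the product of the two lengths, which is why a database search uses a heuristic instead of running this against every sequence.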

How BLAST works: three phases. Phase 1: compile a list of word pairs (w=3) above threshold T. Example: for a human RBP query …FSGTWYA… (the query word is shown in yellow on the slide), a list of words (w=3) is:
FSG SGT GTW TWY WYA
YSG TGT ATW SWY WFA
FTG SVT GSW TWF WYS

Phase 1: compile a list of words (w=3) and score them according to BLOSUM matrices. Scores of candidate words against the query word GTW (threshold T=11):

Word   Per-position scores   Total
GTW    6, 5, 11              22
GSW    6, 1, 11              18
ATW    0, 5, 11              16
NTW    0, 5, 11              16
GTY    6, 5, 2               13
(neighborhood word hits, above threshold)
GNW                          10
GAW                          9
(neighborhood words below threshold)
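Phase 1 can be sketched as follows. The pair scores are only the handful that can be read off the slide for the query word GTW, not a full substitution matrix. Treating a score equal to T as a hit is an assumption; the slide does not show that case.

```python
def query_words(seq, w=3):
    """All overlapping words of length w in the query."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


# (query residue, candidate residue) -> score, as read off the slide.
PAIR_SCORES = {
    ("G", "G"): 6, ("T", "T"): 5, ("W", "W"): 11,
    ("T", "S"): 1, ("G", "A"): 0, ("G", "N"): 0, ("W", "Y"): 2,
}


def word_score(query_word, candidate):
    """Sum of per-position scores between two words of equal length."""
    return sum(PAIR_SCORES[(q, c)] for q, c in zip(query_word, candidate))


T = 11  # threshold from the slide
candidates = ["GTW", "GSW", "ATW", "NTW", "GTY"]
scores = {c: word_score("GTW", c) for c in candidates}
hits = [c for c, s in scores.items() if s >= T]  # assumption: ties count
```

All five candidates reach the threshold, in agreement with the upper part of the table.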

BLAST second phase Phase 2: Scan the database to find matches for the compiled list.
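A minimal sketch of the scanning step, assuming all words in the list have the same length. The subject sequence is the hit from the extension example in this lecture; the function name is illustrative.

```python
def find_word_hits(words, subject):
    """Return (word, offset) for each occurrence of a listed word in subject."""
    word_set = set(words)
    w = len(next(iter(word_set)))  # assumes all words share one length
    hits = []
    for i in range(len(subject) - w + 1):
        window = subject[i:i + w]
        if window in word_set:  # set lookup, constant time on average
            hits.append((window, i))
    return hits


neighborhood = ["GTW", "GSW", "ATW", "NTW", "GTY"]
subject = "MKGLDIQKVAGTWYSLAMAASD"
word_hits = find_word_hits(neighborhood, subject)
```

Here the only hit is GTW at offset 10 (0-based) in the subject.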

BLAST third phase. Phase 3: extend the hit in either direction (with Smith-Waterman and a scoring matrix). Stop when the score drops below some cutoff.
KENFDKARFSGTWYAMAKKDPEG 50 query
MKGLDIQKVAGTWYSLAMAASD. 44 hit
The hit (GTW in both sequences) is extended to the left and to the right.
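The extension step can be sketched with a simpler stand-in technique: an ungapped extension that stops once the running score falls too far below the best score seen. This is not the Smith-Waterman extension named above. The match and mismatch scores and the `x_drop` cutoff are illustrative assumptions in place of a substitution matrix.

```python
def extend_hit(query, subject, q_pos, s_pos, w=3,
               match=5, mismatch=-4, x_drop=10):
    """Ungapped extension of a seed word; returns (score, q_start, q_end).

    q_end is exclusive. Each direction stops once the running score falls
    more than x_drop below the best score seen in that direction.
    """
    def score(a, b):
        return match if a == b else mismatch

    seed = sum(score(query[q_pos + k], subject[s_pos + k]) for k in range(w))

    def extend(step, qi, si):
        best = run = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            run += score(query[qi], subject[si])
            length += 1
            if run > best:
                best, best_len = run, length  # remember the best extension
            elif best - run > x_drop:
                break  # score dropped too far: give up in this direction
            qi += step
            si += step
        return best, best_len

    right, r_len = extend(+1, q_pos + w, s_pos + w)
    left, l_len = extend(-1, q_pos - 1, s_pos - 1)
    return seed + left + right, q_pos - l_len, q_pos + w + r_len


query = "KENFDKARFSGTWYAMAKKDPEG"
subject = "MKGLDIQKVAGTWYSLAMAASD"
result = extend_hit(query, subject, q_pos=10, s_pos=10)
```

With these assumed scores, the seed GTW grows by one residue to GTWY and then stops in both directions.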

How to interpret a BLAST search: expect value It is important to assess the statistical significance of search results. For local alignments (including BLAST search results), the statistics are well understood. The scores follow an extreme value distribution (EVD) rather than a normal distribution.

E-value from the extreme value distribution (the number of high-scoring segment pairs expected to occur with a score of at least S):
$E = K m n \, e^{-\lambda S}$
S = the score; m, n = the lengths of the two sequences; $\lambda$, K = Karlin-Altschul statistics (empirical).
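The formula translates directly into code. In this sketch the values of K and lambda are illustrative assumptions only; in practice they depend on the scoring matrix and gap penalties and are reported by the search program.

```python
import math


def expect_value(S, m, n, K, lam):
    """E = K * m * n * exp(-lambda * S) for a raw score S."""
    return K * m * n * math.exp(-lam * S)


# Illustrative parameter values (assumed, not taken from the lecture).
K, lam = 0.041, 0.267
e_small_db = expect_value(S=100, m=250, n=1_000_000, K=K, lam=lam)
e_large_db = expect_value(S=100, m=250, n=2_000_000, K=K, lam=lam)
e_higher_score = expect_value(S=110, m=250, n=1_000_000, K=K, lam=lam)
```

E grows linearly with the search space m*n and falls exponentially with the score, so doubling n doubles E while a modest gain in S shrinks it sharply.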

How to interpret BLAST: E values and p values. Very small E values are very similar to p values. E values of about 1 to 10 are far easier to interpret than the corresponding p values.

E        p
10       0.99995460
5        0.99326205
2        0.86466472
1        0.63212056
0.1      0.09516258 (about 0.1)
0.05     0.04877058 (about 0.05)
0.001    0.00099950 (about 0.001)
0.0001   0.0001000
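The table values follow from the relation p = 1 - e^(-E), the probability of at least one chance hit when E hits are expected. A short sketch that reproduces them:

```python
import math


def p_from_e(E):
    """Probability of at least one chance hit, given E expected hits."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-E)


table = {E: p_from_e(E) for E in (10, 5, 2, 1, 0.1, 0.05, 0.001, 0.0001)}
```

For small E the two quantities nearly coincide, while for E of about 1 to 10 the p value saturates toward 1 and stops being informative.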


A real match might have an E value > 1. Where do we stop? Running BLAST with a putative hit as the query might help to establish a threshold.

Sometimes a similar E value occurs for a short, nearly exact match and for a long, less exact match (for example, one with only 31% identity).


PSI-BLAST is performed in five steps. [1] Scan the protein database with a query. [2] PSI-BLAST uses the hits to generate a multiple sequence alignment, which is used to initialize a position-specific scoring matrix (PSSM).

Inspect the blastp output to identify empirical “rules” regarding the amino acids tolerated at each position, for example (one group per position): R,I,K / C / D,E,T / K,R,T / N,L,Y,G
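To illustrate how such per-position observations turn into scores, here is a deliberately simplified sketch: log-odds against a uniform background with a flat pseudocount. This is not the PSI-BLAST procedure itself, which handles weighting and pseudocounts differently. The scaling and the pseudocount are assumptions.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def column_scores(observed, pseudocount=1.0):
    """Rounded log-odds scores for one alignment column.

    `observed` is a string of the residues seen at this position.
    Assumes a uniform background frequency of 1/20 per amino acid.
    """
    total = len(observed) + pseudocount * len(AMINO_ACIDS)
    background = 1.0 / len(AMINO_ACIDS)
    scores = {}
    for aa in AMINO_ACIDS:
        freq = (observed.count(aa) + pseudocount) / total
        scores[aa] = round(2 * math.log2(freq / background))
    return scores


# Residues tolerated at five positions, as listed on the slide.
columns = ["RIK", "C", "DET", "KRT", "NLYG"]
pssm = [column_scores(col) for col in columns]
```

Each row of the result scores all 20 amino acids for one position, and residues actually observed there score higher than the rest.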

Position-specific scoring matrix (excerpt). Columns: the 20 amino acids. Rows: all the amino acids from position 1 to the end of your PSI-BLAST query protein.

         A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
...
37 S     2  -1   0  -1  -1   0   0   0  -1  -2  -3   0  -2  -3  -1   4   1  -3  -2  -2
38 G     0  -3  -1  -2  -3  -2  -2   6  -2  -4  -4  -2  -3  -4  -2   0  -2  -3  -3  -4
39 T     0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   1   5  -3  -2   0
40 W    -3  -3  -4  -5  -3  -2  -3  -3  -3  -3  -2  -3  -2   1  -4  -3  -3  12   2  -3
41 Y    -2  -2  -2  -3  -3  -2  -2  -3   2  -2  -1  -2  -1   3  -3  -2  -2   2   7  -1
42 A     4  -2  -2  -2  -1  -1  -1   0  -2  -2  -2  -1  -1  -3  -1   1   0  -3  -2   0

Note that a given amino acid (such as alanine) in your query protein can receive different scores for matching alanine, depending on the position in the protein.

Likewise, a given amino acid (such as tryptophan) in your query protein can receive different scores for matching tryptophan, depending on the position in the protein.
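Scoring a database segment against the PSSM means looking up, for each position, the column of the residue found there and summing. A minimal sketch using the six rows of the excerpt (positions 37 to 42); the function name is illustrative.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Rows 37-42 of the PSSM excerpt, one score per amino acid column.
PSSM_ROWS = {
    37: [2, -1, 0, -1, -1, 0, 0, 0, -1, -2, -3, 0, -2, -3, -1, 4, 1, -3, -2, -2],
    38: [0, -3, -1, -2, -3, -2, -2, 6, -2, -4, -4, -2, -3, -4, -2, 0, -2, -3, -3, -4],
    39: [0, -1, 0, -1, -1, -1, -1, -2, -2, -1, -1, -1, -1, -2, -1, 1, 5, -3, -2, 0],
    40: [-3, -3, -4, -5, -3, -2, -3, -3, -3, -3, -2, -3, -2, 1, -4, -3, -3, 12, 2, -3],
    41: [-2, -2, -2, -3, -3, -2, -2, -3, 2, -2, -1, -2, -1, 3, -3, -2, -2, 2, 7, -1],
    42: [4, -2, -2, -2, -1, -1, -1, 0, -2, -2, -2, -1, -1, -3, -1, 1, 0, -3, -2, 0],
}


def pssm_score(segment, start):
    """Score an ungapped segment aligned to query positions start, start+1, ..."""
    return sum(
        PSSM_ROWS[start + k][AMINO_ACIDS.index(aa)]
        for k, aa in enumerate(segment)
    )


self_score = pssm_score("SGTWYA", 37)     # the query residues themselves
variant_score = pssm_score("AGTFYA", 37)  # S->A at 37 and W->F at 40
```

The score depends on where a substitution falls: replacing W at position 40 costs far more than replacing S at position 37.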

PSI-BLAST is performed in five steps (continued). [3] The PSSM is used to score the alignments of the query with the database. [4] Statistical significance (E values) is re-estimated on the basis of the new raw scores (from the PSSM).

Note the new entries: some hits became statistically significant with the PSSM.

PSI-BLAST is performed in five steps (continued). [5] Iterate through [3] and [4] until convergence (only in principle; in practice, two or three times).
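The iteration can be sketched as a loop that stops when the set of hits no longer changes or when an iteration limit is reached. Here `search_with_hits` is a hypothetical callable standing in for the PSSM construction and database scoring; it is not a real BLAST API.

```python
def psi_blast_loop(initial_hits, search_with_hits, max_iterations=3):
    """Iterate until the hit set is stable or max_iterations is reached.

    search_with_hits (hypothetical) takes the current set of hits and
    returns the hits found with the PSSM built from them.
    Returns (hits, iterations_run, converged).
    """
    hits = set(initial_hits)
    for iteration in range(1, max_iterations + 1):
        new_hits = set(search_with_hits(hits))
        if new_hits == hits:
            return hits, iteration, True  # no change: converged
        hits = new_hits
    return hits, max_iterations, False


# Toy stand-in for a search: each round finds one more "hit", up to 3.
def toy_search(hits):
    return hits | {min(max(hits) + 1, 3)}


converged_run = psi_blast_loop({1}, toy_search, max_iterations=10)
capped_run = psi_blast_loop({1}, toy_search, max_iterations=2)
```

The cap on iterations reflects the practical advice above: in practice only two or three rounds are run.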

“Rate of convergence” of PSI-BLAST searches

Iteration   # hits   # hits > threshold
1           104      49
2           173      96
3           236      178
4           301      240
5           344      283
6           342      298
7           378      310
8           382      320
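One way to read the table is to look at how many additional hits pass the threshold at each iteration. A short sketch using the numbers above:

```python
# Hits above threshold at iterations 1..8, from the table.
above_threshold = [49, 96, 178, 240, 283, 298, 310, 320]

# Additional hits above threshold gained at iterations 2..8.
gains = [later - earlier
         for earlier, later in zip(above_threshold, above_threshold[1:])]

# Iteration with the largest gain (gains[0] belongs to iteration 2).
peak_iteration = gains.index(max(gains)) + 2
```

The gains are 47, 82, 62, 43, 15, 12 and 10: most new significant hits arrive in the first few iterations, and later rounds add progressively fewer.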