(PSI-)BLAST & MSA via Max-Planck

General Issues
Where? (to find homologues)
- Structural templates: search against the PDB
- Sequence homologues: search against SwissProt or UniProt (recommended!)
How many?
- As many as possible, as long as the MSA looks good (next week…)

General Issues
How long? (length of homologues)
- Fragments: short homologues (less than 50-60% of the query's length) give a bad alignment
- Ensure your sequences contain the wanted domain(s)
- N- and C-termini tend to vary in length between homologues
How close? (distance from the query sequence)
- All too close: no information
- Too many too far: bad alignment
- Ensure that you have a balanced collection!
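
The length rule can be sketched as a simple filter. This is a minimal illustration, assuming hits are held in a dictionary of name to sequence; the function name `filter_fragments` and the 0.6 cutoff (the upper end of the 50-60% range on the slide) are illustrative choices, not part of any tool mentioned here.

```python
def filter_fragments(query_len, hits, min_fraction=0.6):
    """Keep hits whose length is at least min_fraction of the query length."""
    kept = {}
    for name, seq in hits.items():
        if len(seq) >= min_fraction * query_len:
            kept[name] = seq
    return kept


# Toy data: a near full-length homologue and a short fragment.
hits = {
    "full_length": "M" * 270,
    "fragment": "M" * 90,
}
kept = filter_fragments(273, hits)
print(sorted(kept))
```

Lowering `min_fraction` keeps more sequences at the cost of a gappier alignment.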

General Issues
From whom? (which species the sequence belongs to)
- Don't care, all homologues are welcome
- Orthologues/paralogues may be helpful
- Sequences from distant/close species provide different types of information
Which method? (BLAST/PSI-BLAST)
- Depends on the protein, the available homologues, the goal in mind…

General Issues: Rules For Choosing Sequences
- Very similar sequences carry little information
- Very different sequences cause trouble… (<30% identical to more than half of the other sequences in the set)
- Choose sequences as distantly related as possible
- Sequences should be 30-80% identical to more than half of the sequences in the set
- The more sequences the better
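
The 30-80% rule can be checked on an existing alignment with a few lines of code. This is a rough sketch, assuming the sequences are already aligned (equal length, `-` for gaps); the helper names are hypothetical, and identity is computed here over columns where both sequences have a residue, which is only one of several common conventions.

```python
def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def well_placed(name, aligned, low=30.0, high=80.0):
    """True if `name` is low-high% identical to more than half of the other sequences."""
    others = [n for n in aligned if n != name]
    in_range = sum(
        1
        for n in others
        if low <= percent_identity(aligned[name], aligned[n]) <= high
    )
    return in_range > len(others) / 2


# Toy alignment: three near-identical sequences and one more distant one.
aligned = {
    "query": "ACDEFGHIKL",
    "close1": "ACDEFGHIKV",
    "close2": "ACDEFGHIKI",
    "distant": "ACDEFWWWWW",
}
for name in aligned:
    print(name, well_placed(name, aligned))
```

In this toy set the three close sequences fail the rule ("all too close"), while the distant one passes because it is 50% identical to each of the others.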

Overall work steps
1. Run the search:
   1. Select database
   2. E-value threshold
   3. BLAST or PSI-BLAST: how many rounds?
2. Take out sequences: HSP (slider region) or full sequences
3. Align sequences: choose alignment program
4. View alignment with BioEdit or another program
5. Calculate trees, conservation scores (ConSurf) etc…

(PSI-)BLAST via Max-Planck
- Databases: SwissProt, TrEMBL, NR, env, PDB or any combination for proteins, but only NT for DNA
- All BLAST programs
- Main advantage: you can easily extract and filter the HSPs, on top of full sequences

The Query Protein
Name: Dihydrodipicolinate reductase
Enzyme reaction:
Molecular process: Lysine biosynthesis (early stages)
Organism: E. coli
Sequence length: 273 aa

The Query Protein
Query: DAPB_ECOLI
>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAV
KDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLL
EKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATV
RAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
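
Before uploading a query it can help to confirm that the FASTA record parses cleanly and has the expected length. The parser below is a minimal sketch written for illustration (it is not the parser used by any of the servers mentioned here), and the demo record is a shortened toy version of the query, not the full sequence.

```python
def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after ">" as the identifier.
            words = line[1:].split()
            header = words[0] if words else "unnamed"
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Toy record: only the first residues of the query, wrapped over two lines.
demo = ">DAPB_ECOLI demo\nMHDAN\nIRVAI\n"
records = parse_fasta(demo)
for name, seq in records.items():
    print(name, len(seq))
```

Running the same function on the full record allows the parsed length to be compared with the length stated on the slide.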

(PSI-)BLAST via Max-Planck
- Choose a database or several databases (select more than one using CTRL)
- Upload a sequence or an MSA

(PSI-)BLAST via Max-Planck

(PSI-)BLAST via Max-Planck
- The E-value threshold can be assessed using the distribution
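
One way to look at such a distribution offline is to bin the hit E-values by order of magnitude. This is only a sketch: it assumes that "the distribution" refers to the E-values of the reported hits, and the E-values below are made-up toy numbers.

```python
import math
from collections import Counter


def evalue_histogram(evalues):
    """Count hits per order of magnitude of E-value (floor of log10)."""
    counts = Counter()
    for e in evalues:
        # An E-value of 0.0 can be reported for extremely significant hits;
        # clamp it to a tiny positive number to avoid log10(0).
        exponent = math.floor(math.log10(max(e, 1e-300)))
        counts[exponent] += 1
    return dict(sorted(counts.items()))


# Toy E-values: two strong hits, one intermediate, two weak.
histogram = evalue_histogram([3e-50, 4e-50, 2e-10, 0.5, 5.0])
for exponent, count in histogram.items():
    print(f"1e{exponent:+d}: {'#' * count}")
```

A wide gap between two populated bins is a natural place to consider putting the threshold.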

Forward results to MSA

Forward results to MSA
- All marked hits, or filter by E-value
- HSP (slider region) or full sequences
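
The two choices on this slide (which hits, and HSP versus full sequence) can be sketched as follows. The `Hit` record and its field names are hypothetical, invented for this illustration; they do not reflect the server's output format, and HSP coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    evalue: float
    full_sequence: str
    hsp_start: int  # 1-based, inclusive (assumed convention)
    hsp_end: int    # 1-based, inclusive (assumed convention)


def select_sequences(hits, max_evalue=1e-3, use_hsp=True):
    """Keep hits at or below max_evalue; return the HSP region or the full sequence."""
    selected = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        if use_hsp:
            selected[hit.name] = hit.full_sequence[hit.hsp_start - 1:hit.hsp_end]
        else:
            selected[hit.name] = hit.full_sequence
    return selected


# Toy hits: one significant hit and one weak hit.
hits = [
    Hit("a", 1e-20, "MKTAYIAKQR", 3, 7),
    Hit("b", 0.5, "MKTAY", 1, 5),
]
print(select_sequences(hits))
print(select_sequences(hits, use_hsp=False))
```

Using HSPs avoids aligning unrelated flanking regions; using full sequences keeps domains that the local alignment did not cover.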

Align via Max-Planck
- Alignment results: save the alignment

Alignment viewing & editing
BioEdit
- Easy-to-use sequence alignment editor
- View and manipulate alignments of up to 20,000 sequences
- Four modes of manual alignment: select and slide, dynamic grab and drag, gap insert and delete by mouse click, and on-screen typing which behaves like a text editor
- Reads and writes GenBank, FASTA, Phylip 3.2, Phylip 4, and NBRF/PIR formats
- Also reads GCG and Clustal formats

Alignment viewing & editing
Easiest Using BioEdit

Alignment viewing & editing
Easiest Using BioEdit
- Find a specific sequence: “Edit-> search -> in titles”
- Erase/add sequences: “Edit-> cut/paste/delete sequence”
- “Sequence Identity matrix” under “Alignment”: useful for a rough evaluation of distances within the alignment
- After taking out sequences, “Minimize Alignment” under “Alignment” removes unnecessary gaps
- Save an image using “File -> Graphic View” and then “Edit -> Copy page as BITMAP”
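
What "Minimize Alignment" achieves can be illustrated outside BioEdit. The sketch below is not BioEdit's implementation; it assumes the usual meaning of the operation, namely dropping columns that consist only of gaps after sequences have been removed.

```python
def minimize_alignment(aligned):
    """Drop alignment columns that contain only gap characters ('-')."""
    names = list(aligned)
    rows = [aligned[n] for n in names]
    if not rows:
        return {}
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column if at least one sequence has a residue there.
    keep = [i for i in range(length) if any(row[i] != "-" for row in rows)]
    return {n: "".join(row[i] for i in keep) for n, row in zip(names, rows)}


# Toy alignment left with two all-gap columns after a sequence was removed.
trimmed = minimize_alignment({"a": "AC--DE", "b": "A---DE"})
print(trimmed)
```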

A little of ConSurf
- Computes conservation scores
- Give it an MSA, or it will compute one for you (given a FASTA sequence, it runs BLAST & MSA)
- Main advantage: filters short HSPs, removes redundant sequences
- Shows conservation scores on the sequence or on a protein structure (if available)
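
To get a feel for what a per-position conservation score is, a much simpler measure can be computed directly from an MSA. The sketch below uses Shannon entropy per column, which is a stand-in chosen for illustration and not the evolutionary-rate calculation that ConSurf performs; low entropy means a conserved column.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return None  # all-gap column: no score
    counts = Counter(residues)
    total = len(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conservation_profile(aligned_seqs):
    """Entropy for every column of a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned_seqs)]


# Toy MSA: column 1 is fully conserved, columns 2 and 3 vary.
profile = conservation_profile(["ACD", "ACE", "AGE", "A-E"])
for position, score in enumerate(profile, start=1):
    print(position, score)
```

Unlike this entropy score, a phylogeny-aware method accounts for how the sequences are related, so the two should not be expected to agree in detail.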

ConSurf

ConSurf
- MSA colored by conservation
- PSI-BLAST result
- MSA
- Phylogenetic tree
- Sequences used
- Sequence conservation

Jmol- Easy web-based viewer

WebLogo

No “Miracle solution”
Each sequence is a different story -> adjust the parameters:
- BLAST: E-value, substitution matrix, gap penalties, database, minimum length, redundancy level, fragment overlap…
- PSI-BLAST: BLAST parameters + PSSM inclusion threshold (or choose manually), number of rounds…
- Try using HSPs or full sequences, different MSA programs…