Multiple sequence alignment: conserved blocks are recognized, and different degrees of similarity are marked.


Multiple Sequence Alignment

VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--

The purpose of a multiple sequence alignment is to place homologous positions of homologous sequences into the same column.
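To make the "same column" idea concrete, here is a minimal Python sketch that reads the eight aligned sequences from this slide column by column and reports the columns in which every sequence carries the same residue. The function name `conserved_columns` is illustrative only and does not come from any alignment package.

```python
# The eight aligned sequences from the slide (gaps are written as "-").
ALIGNMENT = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWESNG--",
]


def conserved_columns(alignment):
    """Return (index, residue) for every column where all rows agree."""
    hits = []
    # zip(*alignment) walks the alignment one column at a time.
    for index, column in enumerate(zip(*alignment)):
        residues = set(column)
        # A column of identical gap characters does not count as conserved.
        if len(residues) == 1 and "-" not in residues:
            hits.append((index, column[0]))
    return hits


print(conserved_columns(ALIGNMENT))
```

For this alignment the only fully conserved columns are the cysteine at index 4 and the tryptophan at index 20 (0-based). Columns that are merely similar, rather than identical, would need a substitution matrix to detect.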

ClustalW
Based on phylogenetic analysis.
A phylogenetic (guide) tree is created from a pairwise distance matrix using the neighbor-joining algorithm.
The most closely related pairs of sequences are aligned using dynamic programming.
Each of these alignments is analyzed and a profile of it is created.
Alignment profiles are then aligned progressively to build the complete alignment.

Progressive multiple alignment
Perform pairwise alignments for all sequences.
Assume a match gives a score of 1, a mismatch is -0.25, and an indel is [value missing].
Total score: 4.75
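The scoring scheme on this slide can be sketched in a few lines of Python. The match and mismatch values come from the slide; the indel value is missing from the transcript, so the default of -0.5 used below is purely an assumption for illustration, and the example sequences are made up.

```python
def score_alignment(a, b, match=1.0, mismatch=-0.25, indel=-0.5):
    """Score two aligned sequences of equal length, column by column.

    match and mismatch follow the slide; indel=-0.5 is an assumed
    placeholder because the slide's indel value was not preserved.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += indel      # gap in either sequence
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # substitution
    return total


# Toy example: 4 matches, 1 mismatch, 1 indel.
print(score_alignment("ACGT-A", "ACCTGA"))
```

With the assumed indel penalty this toy pair scores 4(1) + 1(-0.25) + 1(-0.5) = 3.25. Computing this score for every pair of sequences gives the table of pairwise scores that the next step uses.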

Progressive multiple alignment
Create a guide tree from the pairwise alignments.
Use the tree to build the multiple sequence alignment.
Align the most similar sequences first (they give the most reliable alignments).
Align the profile to the next closest sequence.
Align profiles to each other.
The multiple sequence alignment will be at the root of the tree.
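The order in which a guide tree joins sequences can be illustrated with a small clustering sketch. This uses simple average-linkage clustering on a made-up distance table, chosen here only because it is short; it is not the tree-building method ClustalW itself uses, and the names `guide_tree`, `NAMES` and `DIST` are hypothetical.

```python
from itertools import combinations


def guide_tree(names, dist):
    """Build a guide tree as nested tuples by average-linkage clustering.

    dist maps frozenset({name1, name2}) to a pairwise distance.
    The closest clusters are merged first, mirroring "align the most
    similar sequences first".
    """
    # Each cluster key is a (sub)tree; the value lists its leaf names.
    clusters = {name: [name] for name in names}

    def average_distance(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        members = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = members
    return next(iter(clusters))


# Made-up distances: A/B are closest, then C/D, then the two pairs.
NAMES = ["A", "B", "C", "D"]
DIST = {
    frozenset("AB"): 1.0, frozenset("CD"): 2.0,
    frozenset("AC"): 6.0, frozenset("AD"): 6.0,
    frozenset("BC"): 6.0, frozenset("BD"): 6.0,
}
print(guide_tree(NAMES, DIST))
```

The nested tuple is read from the inside out: the innermost pairs are aligned first, and the outermost tuple corresponds to the root, where the full multiple alignment is assembled.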

Progressive multiple alignment

Web ClustalW2 options
Operational options and output options.
Output options, matrix choice, gap opening penalty.
Gap penalties, output tree type.
File input in GCG, FASTA, EMBL, GenBank, Phylip, or several other formats.

Choose to run ClustalW interactively or wait for results by email. Interactive runs may take some time, so be patient.

Give your alignment a title.

You can choose between a fast and a full alignment. Full is more accurate and is what we will be using.

We will use this option, and this one.

Alignment - considerations
The programs simply try to maximize the number of matches.
–The “best” alignment may not be the biologically correct one.
Multiple alignments are done progressively.
–Such alignments get progressively worse as you add sequences.
–Mistakes that occur during the alignment process are frozen in.
You will sometimes have to correct the alignment manually.

Problem and what to do:
Many sequences: Start with fewer sequences, and avoid aligning more than 50 sequences.
Very different sequences: Sequences that are less than 30% identical to more than half of the other sequences in the set often cause trouble.
Identical sequences: They never help. Unless you have a very good reason to do so, avoid including in your MSA any sequence that is more than 90% identical to another sequence in the set.
Partial sequences: MSA programs prefer sequences that are roughly the same length. Programs often have difficulty comparing a mixture of complete sequences and shorter fragments.
Repeated domains: Sequences with repeated domains cause trouble for most MSA programs, especially if the number of domains differs.
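The 90% rule of thumb above can be applied with a simple filter. The sketch below assumes the sequences are already aligned to the same length and defines percent identity over columns where neither sequence has a gap, which is only one of several definitions in use. The function names and the toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def flag_redundant(seqs, max_identity=90.0):
    """Return names of sequences more than max_identity% identical
    to a sequence that was already kept (input order decides which stays)."""
    kept, dropped = [], []
    for name, seq in seqs.items():
        if any(percent_identity(seq, seqs[k]) > max_identity for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return dropped


# Toy input: "b" duplicates "a"; "c" is exactly 90% identical to "a".
TOY = {
    "a": "ACDEFGHIKL",
    "b": "ACDEFGHIKL",
    "c": "ACDEFGHIKV",
    "d": "MMMMMMMMMM",
}
print(flag_redundant(TOY))
```

Only "b" is flagged here, because the threshold is strict ("more than 90%"). The same `percent_identity` helper could be reused to look for sequences that fall below 30% identity with most of the set.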

Need more accuracy than ClustalW for low-identity sequences?

PSI-BLAST

Position-Specific Iterated BLAST: PSI-BLAST
The purpose of PSI-BLAST is to look deeper into the database for matches to your query protein sequence by employing a scoring matrix that is customized to your query.

PSI-BLAST is performed in five steps
[1] Select a query and search it against a protein database (a regular BLAST search).
[2] PSI-BLAST constructs a multiple sequence alignment, then creates a “profile”, or specialized position-specific scoring matrix (PSSM). This step is user-assisted: you can help choose the candidates.
[3] The PSSM is used as a query against the database.
[4] PSI-BLAST estimates statistical significance (E values).
[5] Repeat steps [3] and [4] iteratively, typically 5 times. At each new search, a new profile is used as the query.
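Step [2] can be illustrated with a heavily simplified PSSM builder. The sketch below assumes a uniform background frequency and a flat pseudocount of 1 per residue; PSI-BLAST itself weights sequences and derives its pseudocounts differently, so treat this only as a picture of what a position-specific log-odds matrix looks like. All names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumptions for illustration: uniform background (1/20) and a flat
    pseudocount; gaps and non-standard letters are ignored.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_sequence(pssm, seq):
    """Sum the position-specific scores of seq against the PSSM."""
    return sum(column[res] for column, res in zip(pssm, seq))


# Toy alignment of three short sequences.
PROFILE = build_pssm(["ACD", "ACE", "ACD"])
print(round(score_sequence(PROFILE, "ACD"), 2))
print(round(score_sequence(PROFILE, "WWW"), 2))
```

A sequence that matches the residues seen in the alignment scores higher than an unrelated one, and the score for a given residue now depends on its position, which is what distinguishes a PSSM from a general substitution matrix.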

PSSM

PSI-BLAST: self-positives
PSI-BLAST is useful for detecting weak but biologically meaningful relationships between proteins. The main source of false positives is the erroneous amplification of sequences not related to the query. For instance, a query with a coiled-coil motif may detect thousands of other proteins with this motif that are not homologous. Once even a single unrelated protein is included in a PSI-BLAST search above threshold, it will not go away.

One way to check the results: take the newly found sequences and run PSI-BLAST with each of them as the query, then examine whether the search ‘fishes out’ the original sequence (reciprocal identification).
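The reciprocal check can be sketched independently of any particular search tool. In the Python below, `search` is a hypothetical callable that takes a sequence identifier and returns the set of identifiers it hits; in practice it would wrap a real PSI-BLAST run, but here it is backed by a made-up lookup table so the logic can be followed.

```python
def reciprocal_hits(query_id, search):
    """Split the hits of query_id into those that find the query back
    (confirmed) and those that do not (unconfirmed)."""
    confirmed, unconfirmed = [], []
    for hit in sorted(search(query_id)):
        if hit == query_id:
            continue  # ignore the trivial self-hit
        if query_id in search(hit):
            confirmed.append(hit)
        else:
            unconfirmed.append(hit)
    return confirmed, unconfirmed


# Made-up search results: A finds Q back, B does not.
TOY_RESULTS = {
    "Q": {"A", "B"},
    "A": {"Q", "B"},
    "B": {"C"},
}


def toy_search(seq_id):
    """Stand-in for a real database search (hypothetical)."""
    return TOY_RESULTS.get(seq_id, set())


print(reciprocal_hits("Q", toy_search))
```

Hits that end up in the unconfirmed list are the ones worth inspecting by hand, since they may be unrelated sequences that entered the profile during iteration.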