“Homology-enhanced probabilistic consistency” multiple sequence alignment : a case study on transmembrane protein Jia-Ming Chang 2013-July-09 Chang, J-M,

Presentation transcript:

“Homology-enhanced probabilistic consistency” multiple sequence alignment: a case study on transmembrane protein. Jia-Ming Chang, 2013-July-09. Chang, J-M, P Di Tommaso, J-F Taly, C Notredame: Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13.

Transmembrane protein Membrane proteins are likely to constitute 20-30% of all ORFs contained in genomes. Odorant receptors Richard Benton, “Eppendorf winner. Evolution and revolution in odor detection,” Science (New York, N.Y.) 326, no (October 16, 2009):

Transmembrane protein multiple sequence alignment
1994: alignment of transmembrane proteins is first addressed
- Cserzo M, Bernassau JM, Simon I, Maigret B: New alignment strategy for transmembrane proteins. J Mol Biol 1994, 243(3):
Few multiple sequence alignment programs for transmembrane proteins exist to date => 3
- Shafrir Y, Guy HR: STAM: simple transmembrane alignment method. Bioinformatics 2004, 20(5):
- Forrest LR, Tang CL, Honig B: On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins. Biophys J 2006, 91(2):
- Pirovano W, Feenstra KA, Heringa J: PRALINE TM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):

BAliBASE 2.0 reference 7 Pirovano W, Feenstra KA, Heringa J: PRALINE TM : a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):

We need an accurate Transmembrane MSA!

Homology-extended Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):

Pair-hidden Markov Model. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 2005, 15(2): Emission probabilities, which correspond to traditional substitution scores, are based on the BLOSUM62 matrix.

Probabilistic consistency transformation
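Only the slide title survives here. As a reminder of what the transformation does in ProbCons, the sketch below re-estimates the match probabilities between sequences x and y by averaging over every sequence z in the set S: P'(x_i ~ y_j) = (1/|S|) Σ_z Σ_k P(x_i ~ z_k) P(z_k ~ y_j), where the terms z = x and z = y each contribute P(x_i ~ y_j) itself. The function names and the toy matrices are illustrative assumptions, not taken from the slides.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def consistency_transform(p_xy, via):
    """One round of the consistency transformation for a single pair (x, y).

    p_xy : posterior match probabilities, p_xy[i][j] = P(x_i ~ y_j)
    via  : list of (p_xz, p_zy) matrix pairs, one per third sequence z
    """
    n_seqs = len(via) + 2                      # x, y and every z
    rows, cols = len(p_xy), len(p_xy[0])
    # z = x and z = y both contribute p_xy (identity matrix times p_xy).
    total = [[2.0 * p_xy[i][j] for j in range(cols)] for i in range(rows)]
    for p_xz, p_zy in via:
        prod = matmul(p_xz, p_zy)              # evidence routed through z
        for i in range(rows):
            for j in range(cols):
                total[i][j] += prod[i][j]
    return [[value / n_seqs for value in row] for row in total]


# Toy example: two residues per sequence, one intermediate sequence z.
p_xy = [[0.9, 0.0], [0.0, 0.3]]
p_xz = [[1.0, 0.0], [0.0, 1.0]]
p_zy = [[0.8, 0.0], [0.0, 0.9]]
relaxed = consistency_transform(p_xy, [(p_xz, p_zy)])
```

In the toy example the weakly supported match between the second residues gains support because both residues align confidently to the same residue of z.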

Homology-extended probabilistic consistency. The new emission probabilities are defined in terms of the following quantities: α_m is the frequency with which residue m appears at position i, β_n is the frequency with which residue n appears at position j, and p(A.A._m, A.A._n) is the original emission probability in ProbCons.
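The formula itself is not preserved in this transcript. One reading consistent with the definitions above is a frequency-weighted sum of the original pairwise emission probabilities, e(i, j) = Σ_m Σ_n α_m β_n p(A.A._m, A.A._n). The sketch below implements that reading; the formula, the function name and the two-letter toy table are assumptions made for illustration, not the slide's own content.

```python
def profile_emission(alpha, beta, pair_prob):
    """Emission probability for aligning profile column i with column j.

    alpha     : dict residue -> frequency in column i (the alpha_m values)
    beta      : dict residue -> frequency in column j (the beta_n values)
    pair_prob : dict (residue, residue) -> original pairwise emission
                probability, standing in for the ProbCons table
    """
    return sum(a * b * pair_prob[(m, n)]
               for m, a in alpha.items()
               for n, b in beta.items())


# Toy two-letter alphabet; the values are made up and only sum to 1.
toy_pair_prob = {
    ("A", "A"): 0.4, ("A", "L"): 0.1,
    ("L", "A"): 0.1, ("L", "L"): 0.4,
}

single = profile_emission({"A": 1.0}, {"A": 1.0}, toy_pair_prob)
mixed = profile_emission({"A": 0.5, "L": 0.5}, {"A": 1.0}, toy_pair_prob)
```

With one-hot columns the function falls back to the original pairwise emission probability, which is the behaviour one would expect when no homologs are found.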

Homology-extended probabilistic consistency, where α_i, β_j and r_k are the profile frequencies.

Homology-extended Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3): Que1: how to build a profile? Que2: how to score profiles?

Que1: how to build a profile?
- Database size
- Searching parameters
  - E-value: the most used; anything else?
  1. Matrix file: -M
  2. Filter the query sequence for low-complexity subsequences: -F
  3. Neighborhood word threshold: -f
  4. Truncate the report to a given number of alignments: -b

Word hit & Neighborhood

Searching parameters
Fast, insensitive search
- High percent identity
- blastp -F "m S" -f 999 -M BLOSUM80 -G 9 -E 2 -e 1e-5
Slow, sensitive search
- Increase sensitivity, decrease specificity
- blastp -F "m S" -f 9 -M BLOSUM45 -e 100 -b -v
Source: the book "BLAST", pages 146 and 147.
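The two parameter sets can be kept as named presets and expanded into a command line when needed. The sketch below assumes the legacy NCBI `blastall` program (invoked as `blastall -p blastp`), whose single-letter options match those on the slide; the function name, preset names and file names are hypothetical. The slide gives no values for -b and -v, so the sketch leaves them to the caller instead of guessing.

```python
# Option sets copied from the slide; -b and -v are omitted because the
# slide does not give their values.
PRESETS = {
    "fast": ["-F", "m S", "-f", "999", "-M", "BLOSUM80",
             "-G", "9", "-E", "2", "-e", "1e-5"],
    "sensitive": ["-F", "m S", "-f", "9", "-M", "BLOSUM45", "-e", "100"],
}


def blast_command(preset, query, database, extra=()):
    """Build an argument list for a legacy blastall protein search.

    preset   : "fast" or "sensitive"
    query    : path to the query FASTA file (-i)
    database : name of the formatted protein database (-d)
    extra    : additional options supplied by the caller, e.g. -b and -v
    """
    return (["blastall", "-p", "blastp", "-i", query, "-d", database]
            + PRESETS[preset] + list(extra))


# Hypothetical file and database names, for illustration only.
fast_cmd = blast_command("fast", "query.fasta", "uniref50_tm")
slow_cmd = blast_command("sensitive", "query.fasta", "uniref50_tm")
```

Returning a list rather than a shell string keeps the quoted filter argument "m S" intact when the list is passed to `subprocess.run`.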

Different databases: UniProt (release – 2010), NCBI non-redundant (NR), UniRef50, UniRef90, UniRef100. The transmembrane subsets (UniRef50-TM, UniRef90-TM, UniRef100-TM, UniProt-TM) are selected with keyword:"Transmembrane [KW-0812]".

Database size

Data set        No. of sequences
UniRef50-TM               87,989
UniRef90-TM              263,306
UniRef100-TM             613,015
UniProt-TM               818,635
UniRef50               3,077,464
UniRef90               6,544,144
UniRef100              9,865,668
UniProt               11,009,767
NCBI NR               10,565,004

Databases: UniProt (release – 2010), NCBI non-redundant (NR), UniRef50, UniRef90, UniRef100; the TM subsets are selected with keyword:"Transmembrane [KW-0812]".

Performance comparison of different database sizes on BAliBASE2-ref7. UniRef50-TM contains about 100 times fewer sequences than the full UniProt. Its level of accuracy is comparable, and even superior, to that achieved with the default PSI-Coffee, while CPU time requirements decrease by a factor of 10.
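The "about 100 times fewer" figure can be checked directly against the sequence counts listed on the database-size slide. A minimal sketch, with the counts copied from that slide and a hypothetical helper name:

```python
# Sequence counts as listed on the database-size slide.
COUNTS = {
    "UniRef50-TM": 87_989,
    "UniRef90-TM": 263_306,
    "UniRef100-TM": 613_015,
    "UniProt-TM": 818_635,
    "UniRef50": 3_077_464,
    "UniRef90": 6_544_144,
    "UniRef100": 9_865_668,
    "UniProt": 11_009_767,
    "NCBI NR": 10_565_004,
}


def reduction_factor(full, subset):
    """How many times smaller `subset` is than `full`."""
    return COUNTS[full] / COUNTS[subset]


uniprot_vs_uniref50_tm = reduction_factor("UniProt", "UniRef50-TM")
print(f"UniProt / UniRef50-TM = {uniprot_vs_uniref50_tm:.1f}")
```

The ratio comes out at roughly 125, which agrees with the order of magnitude quoted on the slide.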

10% more columns are correctly aligned when compared with PRALINE TM. The rows Pairs and Cols denote the total numbers of correctly aligned pairs and columns, respectively. The reference alignments contain 3,294,102 pairs and 1,781 columns.

BAliBASE 3.0. The performance figures for the other methods are taken from Rausch et al. The SP and TC scores of the full-length sequences are evaluated on the core blocks (by xml).
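For readers unfamiliar with the two scores, the sketch below computes them under their usual definitions: SP (sum-of-pairs) is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and TC (total column) is the fraction of reference columns reproduced exactly. It is a simplification that scores every reference column with at least two residues; restricting the evaluation to core blocks, as done on the slide, is not implemented. All function names are illustrative.

```python
from itertools import combinations


def column_signatures(alignment):
    """Describe each column by the residue index of every sequence.

    alignment : list of equal-length gapped strings, '-' marks a gap.
    Returns one tuple per column; a gap is recorded as None.
    """
    counters = [0] * len(alignment)
    columns = []
    for c in range(len(alignment[0])):
        column = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                column.append(None)
            else:
                column.append(counters[s])
                counters[s] += 1
        columns.append(tuple(column))
    return columns


def aligned_pairs(columns):
    """Set of residue pairs ((seq, idx), (seq, idx)) sharing a column."""
    pairs = set()
    for column in columns:
        present = [(s, i) for s, i in enumerate(column) if i is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sp_tc(reference, test):
    """Return (SP, TC) of `test` measured against `reference`."""
    ref_cols = column_signatures(reference)
    test_cols = column_signatures(test)
    ref_pairs = aligned_pairs(ref_cols)
    sp = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)
    # Score only reference columns that align at least two residues.
    scored = [c for c in ref_cols if sum(i is not None for i in c) >= 2]
    test_set = set(test_cols)
    tc = sum(c in test_set for c in scored) / len(scored)
    return sp, tc


# Toy alignments of the same three sequences (ACGT, AGT, ACG).
reference = ["ACGT", "A-GT", "ACG-"]
test = ["ACGT", "AG-T", "ACG-"]
sp, tc = sp_tc(reference, test)
```

In the toy case one misplaced gap in the second sequence breaks two of the four reference columns, so TC drops further than SP; this is why the two scores are usually reported together.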

Que2: how to score profiles? Edgar RC, Sjolander K: A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 2004, 20(8):

Prediction mode: -template_file PSITM. Output: -output tm_html. This output was obtained for Or94b of D. melanogaster and its orthologs in other Drosophila species. Notably, the predicted topology of the Or94b set is consistent with the conclusion of Benton et al.

Paolo Di Tommaso