Accuracy of structure-based sequence alignment of automatic (structure-alignment) methods
Changhoon Kim and BK Lee
Laboratory of Molecular Biology, CCR/NCI/NIH

What we did …
- Evaluated 7 structure alignment programs, selected for their availability and popularity: CE, DaliLite, FAST, LOCK2, MATRAS, SHEBA, VAST
- Standard of truth: NCBI’s CDD alignments
  - The Conserved Domain Database (CDD) is manually curated.
  - It contains only the conserved core residues.
- Performance Measure

Average fraction of correctly aligned residues (Average FCAR)
[Figure: Average FCAR plotted against the maximum allowed shift error]
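The slide names the measure but does not spell out a formula, so the following is only a minimal sketch of how such a score might be computed. It assumes that FCAR(k) is the fraction of residue pairs in the reference (CDD) alignment that the test alignment reproduces to within k positions, and that the average is an unweighted mean over structure pairs. The function names `fcar` and `average_fcar`, the pair/dictionary data layout, and the toy alignment are all hypothetical, not taken from the presentation.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[int, int]  # (residue index in protein A, residue index in protein B)


def fcar(reference: Iterable[Pair], test: Dict[int, int], max_shift: int = 0) -> float:
    """Fraction of reference pairs reproduced by the test alignment.

    A reference pair (i, j) counts as correct when the test alignment maps
    residue i to a residue within max_shift positions of j (assumed definition).
    """
    ref_pairs = list(reference)
    if not ref_pairs:
        raise ValueError("reference alignment is empty")
    correct = 0
    for i, j_ref in ref_pairs:
        j_test = test.get(i)  # None when residue i is unaligned in the test alignment
        if j_test is not None and abs(j_test - j_ref) <= max_shift:
            correct += 1
    return correct / len(ref_pairs)


def average_fcar(cases: List[Tuple[List[Pair], Dict[int, int]]], max_shift: int = 0) -> float:
    """Unweighted mean of fcar over (reference, test) alignment cases (assumed averaging)."""
    values = [fcar(ref, test, max_shift) for ref, test in cases]
    return sum(values) / len(values)


# Hypothetical toy data: four core residue pairs, test alignment shifted for three of them.
reference = [(1, 1), (2, 2), (3, 3), (4, 4)]
test = {1: 1, 2: 3, 3: 4, 4: 8}

print(fcar(reference, test, max_shift=0))  # 0.25
print(fcar(reference, test, max_shift=1))  # 0.75
```

Under this assumed definition, FCAR(0) counts only exact matches, and raising the allowed shift forgives small register errors, which is why the score can only stay the same or increase as the allowed shift grows.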

Sequence similarity dependence of Average FCAR(0)
[Figure: Average FCAR(0) and number of superfamilies plotted against sequence similarity (% identity)]

Variation of fCAR for individual structure pairs
[Figure: Correctly aligned fractions (fCAR), with alignments sorted by fCAR(0) within each superfamily]

An example of alignment errors
[Figure panels: CDD alignment, CE alignment, DaliLite alignment]