Seminar in structural bioinformatics Multiple structural alignment of proteins By Elad Kaspani.



Multiple structural alignment

Outline
- Introduction
  - What is multiple structural alignment?
  - Why do we need multiple structural alignment?
  - Pairwise vs. multiple structural alignment
- MASS - Multiple structural alignment by secondary structures
  - Problem definition
  - General strategy
  - Algorithm description

Outline Cont.
- MASS - Multiple structural alignment by secondary structures
  - Algorithm outline
  - Complexity
- Results and discussion
- Summary & Conclusions

Introduction
- Proteins sharing a common substructure may have a similar function.
- What is multiple structural alignment?
- Discussion: we already have pairwise alignment, isn't that enough?

Pairwise vs. multiple structural alignment
- Many algorithms exist for the pairwise structural alignment task
- Only a few methods are available for aligning multiple structures
- Most of them are based on a series of pairwise comparisons:
  - SSAPm (Taylor et al., 1994)
  - Prism (Yang and Honig, 2000b)
  - STAMP (Russell and Barton, 1992)

What do we want?
- Classification of existing and newly discovered proteins
- Gaining insights into evolutionary relations between proteins
- Detecting motifs common to a group of proteins that share a certain function
- Structure prediction algorithms

What's wrong with methods based on a series of pairwise comparisons?

Multiple structural alignment
- These methods are limited!
- In each pairwise comparison, the only information available is about the two molecules being compared
- Alignments that are optimal for the whole set can be disregarded
- Dynamic programming disadvantage: it depends on the sequence order of the polypeptide chain
- We can't see the woods

WHAT DO WE DO THEN?
MASS: multiple structural alignment by secondary structures

MASS
- Considers all the given structures at the same time
- Exploits the secondary structure representation, which reduces time complexity
- Does not require that all the input molecules be aligned
- Capable of detecting structural motifs shared only by a subset of the molecules

MASS
- Can find non-sequential and even non-topological structural motifs
- Suitable for a broad range of applications
  - filters noisy results
  - highly efficient and robust
- Other multiple-based methods
  - (Escalier et al., 1988)
  - MUSTA (Leibowitz et al., 2001)
  - MultiProt (Shatsky et al., 2002)

Secondary structure elements (SSE)

Basic terms
- Rigid transformation
  - Q - a subset
  - T(Q) = R(Q) + t, where R is a 3x3 rotation matrix and t is a translation vector
- ε-congruent
  - For ε > 0, find the two largest subsets of the input sets, P and Q, and a rigid transformation, T, so that distance(P, T(Q)) < ε
- How do we measure distance?
  - RMSD
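
The definitions above can be made concrete with a minimal sketch. It assumes the two point lists are already paired one-to-one, which the slide does not state, and the function names `apply_rigid`, `rmsd` and `is_eps_congruent` are illustrative, not taken from the MASS paper.

```python
import math

def apply_rigid(points, rotation, translation):
    """Apply T(q) = R q + t to every 3D point in `points`."""
    moved = []
    for q in points:
        moved.append(tuple(
            sum(rotation[i][j] * q[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return moved

def rmsd(p, q):
    """Root mean square deviation between two paired, equally long point lists."""
    if len(p) != len(q) or not p:
        raise ValueError("point lists must be non-empty and of equal length")
    total = sum((a - b) ** 2 for u, v in zip(p, q) for a, b in zip(u, v))
    return math.sqrt(total / len(p))

def is_eps_congruent(p, q, rotation, translation, eps):
    """True if distance(P, T(Q)) < eps, with RMSD as the distance."""
    return rmsd(p, apply_rigid(q, rotation, translation)) < eps

# Toy data: Q rotated by 90 degrees about z and shifted by 5 along z gives P.
ROT_Z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
SHIFT = (0.0, 0.0, 5.0)
Q = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
P = [(0.0, 1.0, 5.0), (-1.0, 0.0, 5.0)]
print(is_eps_congruent(P, Q, ROT_Z90, SHIFT, eps=0.5))
```

Finding the subsets and the transformation T is the hard part of the problem; this sketch only checks a candidate solution.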

Problem Definition
The pairwise case:
- given two proteins, each represented by a set of points in 3D space
- each point is associated with an atom's position
- find the largest set that is congruent to two subsets of points, one from each protein
- In computational geometry this is the largest common point set (LCP) problem

Problem Definition
The multiple case:
- given a collection of m point sets,
- find the largest set of points of which an ε-congruent copy appears in each of the input sets
- Unfortunately, it's NP-hard...
- We want not only the largest set of points, but also smaller common substructures

Problem Definition
The multiple subset case:
- find solutions where only a subset of the input proteins is well aligned
- this complicates the problem! (why?)
  - the number of subsets is exponential
  - trade-off between the size of the subset and the size of its core (match list)
- scoring function (core size L, number of proteins k): $f(L, k) = k \cdot L^2$

The algorithm:

Method
Input:
- a set of m proteins $P_1, P_2, \ldots, P_m$
- for each protein:
  - the sequence of the 3D coordinates of its atoms
  - an assignment of SSE types to each residue
Output:
- the multiple alignments with the largest cores, according to the scoring function

General strategy
- We want multiple alignments with at least two SSEs
- Bases: ordered pairs of SSEs whose ε-congruent copies appear in several proteins
- We look for a set of ε-congruent bases $\{b_1, b_2, \ldots, b_k\}$, from proteins $P_{i_1}, P_{i_2}, \ldots, P_{i_k}$ respectively
- The first base ($b_1$) is our pivot

General strategy – cont.
- Compute all the k − 1 rigid transformations between this base and the others
- Result: $(T_{12}, T_{13}, \ldots, T_{1k})$ defines a multiple alignment between $P_{i_1}, P_{i_2}, \ldots, P_{i_k}$
- The core may contain more than one base
  - we will get several alignments with almost the same transformations (one alignment per base in the core)

General strategy – cont.
- Cluster the initial multiple base alignments
  - Merge the alignments: the core of the new alignment is the union of the cores of the original alignments
  - We get a smaller set of multiple alignments
- Extend the clustered alignments
  - Find additional matching residues
- Give a score to each alignment
- Report the highest scoring alignments

Algorithm outline

Algorithm outline - stage 1
Representation of secondary structure elements:
- Axis representation for SSEs
- The least-squares line fitted to all the Cα atoms
- Direction and length are determined by the protein structure
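
One way to obtain such an axis is sketched below. The least-squares line through a point cloud passes through the centroid along the dominant eigenvector of the scatter matrix; power iteration is used here only to keep the example dependency-free. The name `fit_sse_axis`, the choice to orient the axis from the first residue to the last, and taking the end points from the extreme projections are all assumptions for illustration, not details confirmed by the slides.

```python
import math

def fit_sse_axis(ca_coords, iterations=100):
    """Fit a least-squares line to C-alpha coordinates; return (start, end)."""
    n = len(ca_coords)
    if n < 2:
        raise ValueError("need at least two C-alpha atoms")
    centroid = [sum(p[i] for p in ca_coords) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in ca_coords]
    # 3x3 scatter matrix of the centered coordinates.
    scatter = [[sum(c[i] * c[j] for c in centered) for j in range(3)]
               for i in range(3)]
    # Power iteration for the dominant eigenvector, seeded with the chain direction.
    chain = [ca_coords[-1][i] - ca_coords[0][i] for i in range(3)]
    direction = chain if any(chain) else [1.0, 1.0, 1.0]
    for _ in range(iterations):
        nxt = [sum(scatter[i][j] * direction[j] for j in range(3))
               for i in range(3)]
        length = math.sqrt(sum(x * x for x in nxt))
        if length == 0.0:
            raise ValueError("degenerate input; no axis can be fitted")
        direction = [x / length for x in nxt]
    # Keep the axis pointing from the first residue towards the last one.
    if sum(d * c for d, c in zip(direction, chain)) < 0:
        direction = [-d for d in direction]
    # Project the atoms onto the axis to obtain its two end points.
    proj = [sum(c[i] * direction[i] for i in range(3)) for c in centered]
    start = tuple(centroid[i] + min(proj) * direction[i] for i in range(3))
    end = tuple(centroid[i] + max(proj) * direction[i] for i in range(3))
    return start, end

axis_start, axis_end = fit_sse_axis([(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 2.0, 0.0)])
print(axis_start, axis_end)
```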

Algorithm outline – stage 2
Detection of multiple base alignments:
- Use Geometric Hashing to detect bases whose ε-congruent copies appear in several proteins
- Each base has a fingerprint that is invariant to a 3D rigid transformation:
  - the types of the two SSEs
  - the angle between their axial vectors
  - the midpoint-to-midpoint distance
  - their line distance
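
A fingerprint of this kind could be computed as in the sketch below. Each SSE is given as a hypothetical `(type, axis_start, axis_end)` triple, and "line distance" is interpreted as the shortest distance between the two infinite axis lines; that interpretation is an assumption, since the slide does not define the term.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def base_fingerprint(sse1, sse2):
    """Return (type1, type2, angle, midpoint distance, line distance) for a base."""
    t1, s1, e1 = sse1
    t2, s2, e2 = sse2
    d1, d2 = _sub(e1, s1), _sub(e2, s2)
    # Angle between the axial vectors, clamped against rounding error.
    cos_angle = _dot(d1, d2) / (_norm(d1) * _norm(d2))
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    # Distance between the midpoints of the two axes.
    m1 = tuple((a + b) / 2.0 for a, b in zip(s1, e1))
    m2 = tuple((a + b) / 2.0 for a, b in zip(s2, e2))
    mid_dist = _norm(_sub(m1, m2))
    # Shortest distance between the two infinite axis lines.
    normal = _cross(d1, d2)
    offset = _sub(s2, s1)
    if _norm(normal) < 1e-9:  # parallel axes: point-to-line distance
        line_dist = _norm(_cross(offset, d1)) / _norm(d1)
    else:
        line_dist = abs(_dot(offset, normal)) / _norm(normal)
    return (t1, t2, angle, mid_dist, line_dist)

helix = ("H", (0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
strand = ("E", (5.0, 0.0, 4.0), (5.0, 6.0, 4.0))
print(base_fingerprint(helix, strand))
```

All five components depend only on relative geometry, so they are unchanged when both SSEs are rotated and translated together.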

Base fingerprint

Algorithm outline – stage 2
Almost-congruent bases have similar fingerprints:
- the types of their SSEs are the same
- the difference between their midpoint-to-midpoint distances and between their line distances is up to 1.5 Å
- the difference between their angles is up to 0.3 radians
- they reside close to each other in the grid

Algorithm outline – stage 2
- For each grid bin, extract all the bases of the bin and of adjacent bins
- Group them together in the same base bucket
- Base bucket: stores bases in columns according to the protein they belong to
- Bases derived from the same protein are stored in the same column
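
The binning and bucket construction might look like the following sketch. It assumes the grid cell size equals the stated tolerances (0.3 radians and 1.5 Å), that SSE types must match exactly, and that a bucket is only useful when it has at least two columns; none of these choices is confirmed by the slides, and `grid_key` and `build_base_buckets` are made-up names.

```python
from collections import defaultdict
from itertools import product

ANGLE_TOL = 0.3  # radians, tolerance quoted on the slide
DIST_TOL = 1.5   # angstroms, tolerance quoted on the slide

def grid_key(fingerprint):
    """Map a fingerprint to (SSE types, integer grid cell)."""
    t1, t2, angle, mid_dist, line_dist = fingerprint
    cell = (int(angle // ANGLE_TOL),
            int(mid_dist // DIST_TOL),
            int(line_dist // DIST_TOL))
    return (t1, t2), cell

def build_base_buckets(bases):
    """bases: iterable of (protein_id, base_id, fingerprint).

    Returns {(types, cell): {protein_id: [base_id, ...]}}, one base bucket per
    occupied grid bin, filled from that bin and its adjacent bins.
    """
    grid = defaultdict(list)
    for protein_id, base_id, fingerprint in bases:
        grid[grid_key(fingerprint)].append((protein_id, base_id))
    buckets = {}
    for types, cell in list(grid):
        columns = defaultdict(list)  # one column per protein
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for protein_id, base_id in grid.get((types, neighbour), []):
                columns[protein_id].append(base_id)
        if len(columns) >= 2:  # a bucket needs bases from two or more proteins
            buckets[(types, cell)] = dict(columns)
    return buckets

example_bases = [
    ("P1", "a", ("H", "H", 1.00, 10.0, 5.0)),
    ("P2", "b", ("H", "H", 1.10, 10.4, 5.2)),
    ("P3", "c", ("H", "E", 1.00, 10.0, 5.0)),
]
print(build_base_buckets(example_bases))
```

Looking at adjacent bins matters because two fingerprints within tolerance can still fall on opposite sides of a bin boundary.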

Base bucket Almost-congruent bases are stored in the same base bucket

Stage 2 cont.
- A collection of almost-congruent bases, each belonging to a different column, induces a local multiple alignment between the respective proteins
  - the core consists of at least two SSEs
- One base is selected as a pivot
  - the rest of the bases are superimposed on it
- Selection of the pivot may influence the alignment
  - Optional: try each base as the pivot

Stage 2 cont.
- A multiple alignment is defined by an underlying set of pairwise alignments
- For each base bucket we compute all the alignments between two bases taken from two different columns
  - find the transformation between two bases that aligns the maximal number of atoms with minimal RMSD

Cα atomic level

Stage 3 - Clustering
- For a pair of proteins that share more than one base
  - we get several alignments with almost the same transformation, but a different local SSE core
- Cluster all the local base alignments to find the ones with similar transformations
  - merge them into a new global alignment
- The match list (core) of the new global alignment
  - is the union of the original local match lists
  - its transformation is the one that aligns the SSEs with minimal RMSD

Stage 4 - Global extension
- Now the core of each pairwise alignment is a set of SSEs
- We then extend these alignments by finding additional matching residues
  - these residues do not necessarily belong to SSEs
- We want to extend the cores of these alignments by detecting corresponding Cα atoms
- We want to transform the second protein so that it is fully superimposed onto the pivot protein

Stage 4 - Global extension
- Detect, in linear time, close pairs of Cα atoms, one atom from each protein
- These atom pairs are added to the alignment's match list
- The transformation of the alignment is refined by employing the Least-Squares Fitting method
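
Close pairs can be found in expected linear time with a spatial hash, as in the sketch below. It assumes the second protein has already been transformed onto the pivot, uses a caller-supplied distance cutoff (the slides give no value), and matches greedily one-to-one; `close_ca_pairs` is an illustrative name, and the actual MASS matching rule may differ.

```python
import math
from collections import defaultdict
from itertools import product

def close_ca_pairs(pivot_ca, moved_ca, cutoff):
    """Return (pivot_index, moved_index) pairs of C-alpha atoms within `cutoff`.

    Pivot atoms are hashed into cubic cells of side `cutoff`, so each query
    only inspects the 27 surrounding cells.
    """
    def cell_of(point):
        return tuple(int(math.floor(c / cutoff)) for c in point)

    cells = defaultdict(list)
    for i, p in enumerate(pivot_ca):
        cells[cell_of(p)].append(i)

    pairs, used = [], set()
    for j, q in enumerate(moved_ca):
        cx, cy, cz = cell_of(q)
        best = None  # (distance, pivot index) of the nearest unused pivot atom
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for i in cells.get((cx + dx, cy + dy, cz + dz), ()):
                if i in used:
                    continue
                d = math.dist(pivot_ca[i], q)
                if d <= cutoff and (best is None or d < best[0]):
                    best = (d, i)
        if best is not None:
            used.add(best[1])
            pairs.append((best[1], j))
    return pairs

pivot = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
moved = [(0.5, 0.0, 0.0), (30.0, 0.0, 0.0), (10.0, 1.0, 0.0)]
print(close_ca_pairs(pivot, moved, cutoff=3.0))
```

The linear bound relies on atoms being spread out so that each cell holds a bounded number of them, which is reasonable for Cα atoms in a folded protein. The resulting pairs would then be passed to a least-squares fitting step to refine the transformation.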

Stage 5 – Filtering & Scoring
- Computing the best global multiple alignments
- What are the best global multiple alignments?
  - number of aligned molecules vs. core size
  - core size vs. size of the smallest molecule
- The number of possible multiple alignments defined by the base buckets is exponential
  - we do not compute all of them

Stage 5 – Filtering & Scoring
Heuristic solution:
- For each base bucket (BB), compute the set of best multiple alignments recursively over the columns
- For a set of multiple base alignments obtained in the previous stage, $(b_1, \ldots, b_k)$:
  - check whether there is a base, $b_{k+1}$, from the current column that improves the alignment's score
  - $\mathrm{Core}(b_1, \ldots, b_{k+1}) = \mathrm{Core}(b_1, \ldots, b_k) \cap \mathrm{Core}(b_1, b_{k+1})$
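
The core update rule can be illustrated with a simplified, single-pass greedy version of this heuristic. Cores are modelled as sets of pivot residue indices, the scoring function is passed in by the caller, and `toy_score` is a placeholder rather than the scoring function used by MASS. The slides describe a recursive search over columns, so this sketch shows only the intersection step and the "does it improve the score" test.

```python
def extend_over_columns(pivot_core, columns, score):
    """Greedily add one base per column while the score improves.

    pivot_core: set of pivot residue indices (the starting core, k = 1)
    columns:    list of columns; each column is a list of pairwise cores
                Core(b_1, b_j), given as sets of pivot residue indices
    score:      callable score(core_size, n_proteins)
    Returns (core, n_proteins, chosen) with chosen = [(column, base), ...].
    """
    core, k, chosen = set(pivot_core), 1, []
    for col_index, column in enumerate(columns):
        current = score(len(core), k)
        best = None  # (score, base index, new core)
        for base_index, pair_core in enumerate(column):
            # Core(b_1..b_{k+1}) = Core(b_1..b_k) ∩ Core(b_1, b_{k+1})
            candidate = core & pair_core
            value = score(len(candidate), k + 1)
            if value > current and (best is None or value > best[0]):
                best = (value, base_index, candidate)
        if best is not None:
            _, base_index, core = best
            k += 1
            chosen.append((col_index, base_index))
    return core, k, chosen

def toy_score(core_size, n_proteins):
    """Placeholder score for the demo only; not the MASS scoring function."""
    return core_size * n_proteins

demo_columns = [[set(range(8)), {0, 1}], [{8, 9}]]
print(extend_over_columns(set(range(10)), demo_columns, toy_score))
```

In the demo the first column contributes a base that keeps 8 of the 10 pivot residues, while the second column is skipped because intersecting with it would empty the core.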

Stage 5 – Filtering & Scoring
- Our scoring function
  - core size: L
  - number of proteins: k
  - $f(L, k) = k \cdot L^2$
- Report the highest scoring alignments
- Finish!

Complexity
Worst-case complexity:
- (i) m is the number of proteins
- (ii) k is the number of residues in an SSE
- (iii) s and n are the number of SSEs and the number of residues found in each protein, respectively
- n ~ 300, k ~ 10, s ~ 15
- The number of bases for each protein is $O(s^2)$

Complexity
- For each pair of proteins we construct, cluster and extend $O(s^4)$ pairwise alignments
- This results in $O(m^2 (s^4 k^3 + s^8 \log s + s^4 n))$ time, where $O(m^2)$ is the number of ways of pairing two proteins
- In practice, the complexity is much smaller:
  - we only construct the pairwise alignments defined by the BBs
  - the clustering reduces their number even more

Complexity
- The number of evaluated multiple alignments is linear in the number of bases
  - each base can be a pivot for only one multiple alignment
  - we have $O(m s^2)$ bases
- It takes $O(m s^2 n)$ time to construct a single multiple alignment and $O(m^2 s^4 n)$ time to construct all of them
- The running time for the entire algorithm is bounded by $O(m^2 s^4 (k^3 + s^2 \log s + n))$, but experiments show that the actual running time is significantly lower
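
To get a feel for these magnitudes, the terms can be evaluated for the typical values quoted earlier (n ~ 300, k ~ 10, s ~ 15). This is only an order-of-magnitude illustration: big-O bounds hide constant factors, the logarithm base (base 2 here) is an assumption, and m = 12 is borrowed from the size of the first experiment's ensemble.

```python
import math

def mass_worst_case_terms(m, s, k, n):
    """Evaluate the quoted worst-case terms, ignoring constant factors."""
    bases_per_protein = s ** 2      # O(s^2) ordered SSE pairs
    alignments_per_pair = s ** 4    # O(s^4) pairwise alignments per protein pair
    # Overall bound O(m^2 s^4 (k^3 + s^2 log s + n)) from the last complexity slide.
    bound = m ** 2 * s ** 4 * (k ** 3 + s ** 2 * math.log2(s) + n)
    return bases_per_protein, alignments_per_pair, bound

bases, alignments, bound = mass_worst_case_terms(m=12, s=15, k=10, n=300)
print(bases, alignments, f"{bound:.2e}")
```

The gap between such a worst-case figure and the reported running times of well under a minute is consistent with the remark that the bound is rarely approached in practice.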

Algorithm outline (reminder)

Results and Discussion

Experiment 1
- Example 1: detection of subset alignments and their use for structural classification
- We used MASS to align a set of 12 structures from two families:
  - Cofilin-like (CL)
  - Gelsolin-like (GL)
- The two families are related structurally but not sequentially

Experiment 1
- The 12-molecule ensemble contains:
  - four CL structures
  - eight GL structures
- The running time of MASS on this ensemble was 36 sec. (Pentium MHz processor)

Experiment 1: core Vs. # Molecules

Experiment 1: Results (A) The structural alignment of all 12 proteins of the ensemble. (B) A subset alignment between only the eight GL proteins.

Experiment 1: Results (C) A subset alignment between only the four CL structures. (D) A subset alignment between only three out of the four CL structures.

Results Discussion
- As expected, the maximal core size decreases as the number of aligned molecules increases
- The dependence is not linear. There are large decreases:
  - between three and four molecules
  - between four and five molecules
  - between eight and nine molecules

Experiment 2
- Non-topological motif detection
- The ensembles share a common SSE motif, but have different topologies
- In topological motifs, the order and the direction of the corresponding SSEs along the polypeptide chain are conserved, while in non-topological motifs they are not

Experiment 2
- Helix bundle ensemble: the ten proteins in this ensemble belong to four different folds and six different superfamilies
- Running time: 48 seconds
- Also aligned by MUSTA
- MASS detected two additional conserved α-helices (why?)
  - MASS is secondary structure oriented
  - it is directed to find solutions that contain more SSEs

Common core is shown by assigning a different color to each conserved helix.

The schematic TOPS representation Triangles represent strands and circles helices. Corresponding secondary structure regions are drawn in the same color. As one can see the solution is non-topological.

Large-scale structural alignments
- MASS can be applied to on the order of tens of proteins in practical running times on a standard PC
- Three SCOP ensembles:
  - (i) Serine proteases: all structures from the 'Prokaryotic trypsin-like serine protease' SCOP family (68 molecules)
  - (ii) PK beta barrel: all structures from the 'Pyruvate kinase beta-barrel domain' SCOP family (66 molecules)
  - (iii) Unrelated proteins (80 proteins)

Large-scale structural alignments

Results Discussion
- The results show that the running time is influenced by:
  - (i) the number of molecules
  - (ii) the average molecular size (and the average number of SSEs in a molecule)
  - (iii) the structural variance among the molecules
- How do you think they influence the running time?

Results Discussion
- The effect of the first two parameters is expected: the running time increases as they grow
- The more structurally variable the ensemble is, the shorter the running time! (?)
- Why?
  - The more structurally homogeneous the input is, the more SSE bases are stored in the same grid bin

Summary & Conclusions - MASS
- Pairwise comparisons are not enough
- A novel method for aligning multiple protein structures and detecting their shared core
- Also capable of detecting cores common only to subsets of the input proteins
- Exploits a secondary structure representation of proteins
  - many noisy solutions are filtered out
  - highly efficient: capable of aligning tens of protein molecules

Summary & Conclusions - MASS
- Disregards the sequence order of SSEs along the polypeptide chain
  - can find non-sequential and non-topological structural motifs
  - an advantage over dynamic-programming based methods
- The MASS program can be run in two modes:
  - (i) using SSE information only (reduced running time) (when will we use this option?)
  - (ii) using both SSE and atomic information

Questions?

Lecture was based on article: MASS: multiple structural alignment by secondary structures By O. Dror, H. Benyamini, R. Nussinov, and H. Wolfson (School of Computer Science, Tel Aviv University, 2003 )