Presentation is loading. Please wait.

Presentation is loading. Please wait.

Topics in bioinformatics CS697 Spring 2011 Class 12 – Mar-22-2011 Molecular distance measurements Molecular transformations.

Similar presentations


Presentation on theme: "Topics in bioinformatics CS697 Spring 2011 Class 12 – Mar-22-2011 Molecular distance measurements Molecular transformations."— Presentation transcript:

1 Topics in bioinformatics CS697 Spring 2011 Class 12 – Mar-22-2011 Molecular distance measurements Molecular transformations

2 Rotation in 3D - matrices

3 Rotations in 3D – Euler angles * Rotate the XYZ- system about the Z- axis by α. The X-axis now lies on the line of nodes. * Rotate the XYZ- system again about the now rotated X-axis by β. The Z-axis is now in its final orientation, and the x- axis remains on the line of nodes. * Rotate the XYZ- system a third time about the new Z-axis by γ.

4 axis-angle

5 Quaternions Extension of complex numbers. h=a+bi+cj+dk, a,b,c,d real numbers. i,j,k : imaginary components s.t.: i 2 =j 2 =k 2 =-1 ij=k, jk=i, ki=j ij = -ji, jk = -kj, ki = -ik

6 Unit quaternions Unit quaternions : Group under quaternion multiplication. can be mapped to:

7 Quaternions as axis-angle rotation

8 Some resources http://mathworld.wolfram.com/RotationMatrix.html http://mathworld.wolfram.com/EulerAngles.html http://mathworld.wolfram.com/Quaternion.html

9 Molecular distance measurements

10 Measuring protein structure similarity Visual comparison Dihedral angle comparison Distance matrix RMSD (root mean square distance) Given two “shapes” or structures A and B, we are interested in defining a distance, or similarity measure between A and B. Is the resulting distance (similarity measure) D a metric? D(A,B) ≤ D(A,C) + D(C,B)

11 Comparing dihedral angles Torsion angles (  ) are: - local by nature - invariant upon rotation and translation of the molecule - compact (O(n) angles for a protein of n residues) Add 1 degree To all  But…

12 Internal distance matrix 1 2 3 4 6.0 8.1 5.9 1234 103.86.08.1 23.80 5.9 36.03.80 48.15.93.80

13 Internal distance matrix (2) Advantages - invariant with respect to rotation and translation - can be used to compare proteins Disadvantages - the distance matrix is O(n 2 ) for a protein with n residues - comparing distance matrix is a hard problem - insensitive to chirality

14 Quality Assessment through RMSD ▆ RMSD: Root Mean Squared Deviation one of the simplest measures to quantify how different two protein conformations really are advantage: simple to compute by representing conformations as 3N vectors (N atoms)‏ limitations: 1) limited to conformations of the same protein chain 2) atom-atom correspondence needed on different-length chains 3) not very descriptive if changes are localized ▆ RMSD: Average atomic distance given two conformations of a chain of N atoms represent the conformations as two 3N vectors x and y RMSD(x,y) is the euclidean distance between x and y, averaged over the N atoms ▆ lRMSD: least RMSD same conformation rigidly transformed in space (translated or rotated) should give an RMSD of 0 before computing RMSD(x,y) one needs to remove changes due to rigid-body transformations

15 Protein Structure Superposition A rigid-body transformation T is a combination of a translation t and a rotation R: T(x) = Rx+t The quantity to be minimized is : Where a and b are the two point sets.

16 The translation part E is minimum with respect to t when: Then: If both data sets A and B have been centered on 0, then t = 0 ! Step 1: Translate point sets A and B such that their centroids coincide at the origin of the framework

17 The rotation part Let  A and  B be the centroids of A' and B', and A and B the matrices containing the coordinates of the points of A and B centered on 0: Build covariance matrix: x= 3x3 3xN Nx3

18 The rotation part U and V are orthogonal matrices, and D is a diagonal matrix containing the singular values. U, V and D are 3x3 matrices Compute SVD (Singular Value Decomposition) of C: Define S by: Then

19 2. Build covariance matrix: The algorithm 1. Center the two point sets A and B 3. Compute SVD (Singular Value Decomposition) of C: 5. Compute rotation matrix 4. Define S: O(N) in time! 6. Compute lRMSD:

20 Some reading material http://cnx.org/content/m11608/latest/#RMSD

21 lRMSD has Shortcomings ▆ lRMSD cannot capture localized changes: if a small perturbation occurs in a part of the structure, e.g. rotation of a hinge connecting two domains, lRMSD will report a large value ▆ Main reason: lRMSD does not know how to attribute changes to specific atoms of the chain ▆ lRMSD distributes change equally (through the averaging) to all atoms in a protein chain ▆ Measuring conformational similarity is an active research area

22 Other Quality Assessment: Shape Similarity ▆ Sometimes assessment of cavities on the surface of a protein is more important than description of the rest of the structure, especially when the goal is prediction of a binding site rather than of the entire structure (which can be thought of as a scaffold)‏ ▆ Methods that assess surface area, solvent accessible surface area, that compute volumes, and detect cavities on proteins are very important in the context of binding and docking Model each atom as a vdw sphere, the union of which gives the molecular surface Not all molecular surface is accessible to solvent. Rolling a solvent ball over the vdw spheres traces out the solvent accessible surface area (SASA). SASA is important to quantitatively determine interactions of the protein

23 Solvent-accessible Solvent Area (SASA)‏ ▆ Computational geometry methods that use Delaunay triangulations and alpha shapes assess SASA and other geometric descriptors of molecular surfaces, volumes, and cavities ▆ We will come back to this topic in the context of molecular docking – further reading about shape computing at http://cnx.org/content/m11616/latest/ SASA for a 1.4 Å ball SASA for a 1.5 Å ball. Increasing the radius reduces the SASA due to more cavities that a bulkier ball cannot penetrate

24 Ultrafast Shape Recognition (USR) Drug design – Screening a number of potential compounds. Find a set of molecules which closely resemble a lead molecule from a HUGE database. Shape similarity may indicate similar binding properties and similar activity.

25 Overview Efficient global comparison of molecular shapes. The molecules are represented as feature vectors, representing the relative positions of the atoms. Does not require alignment of the molecules. Suitable for large database search.

26 Feature vector representation The shape of a molecule is uniquely determined by the relative positions of the atoms. … Which are determined by the inter-atomic distances. The set of distances can be constrained due to forces that hold the atoms together.

27 Strategic feature points The molecule is described as 4 sets of atomic distance distributions from feature points: Center of mass - ctd Point closest to ctd - cst Point farthest from ctd – fct Point farthest from fct – ftf The moments of the distributions are calculated and stored as a feature vector. Estimate of the size, compactness and symmetry of the molecule.

28 Feature vector for a molecule

29 Similarity score

30

31 Advantages and disadvantages Extremely fast due to calculation of only 4N distances and distributions. Very sensitive to small changes in the molecule shape. Does not directly account for chemical interactions and atom types.

32 Quality Assessment through LGA ▆ Local-Global-Alignment (LGA) introduced by Adam Zemla in 2003 is being used as a more accurate similarity assessment than lRMSD in CASP ▆ LGA generates many different local superpositions to find regions where two conformations are similar: combines longest continuous segment (LCS) and global distance test (GDT) to find local and global similarities ▆ LCS superimposes the longest segments that fit under a selected RMSD cutoff ▆ GDT complements evaluations made with LCS searching for the largest (not necessary continuous) set of ‘equivalent’ residues that deviate by no more than a specified distance cutoff ▆ Further reading: Zemla A., "LGA - a Method for Finding 3D Similarities in Protein Structures", Nucleic Acids Research, 2003, Vol. 31, No. 13, pp. 3370-3374 http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=12824330


Download ppt "Topics in bioinformatics CS697 Spring 2011 Class 12 – Mar-22-2011 Molecular distance measurements Molecular transformations."

Similar presentations


Ads by Google