Finding Functionally Significant Structural Motifs in Proteins
Jennifer Hicks
Introduction
  Goal of Protein Classification:
    Evolutionary Relationships
    Functional Similarities
  Techniques:
    Sequence Matching
    Whole Structure Matching
    Motif Matching
  [Figures: Trypsin; Trypsin Active Site]
Project Goal
  Find a structural motif in an example protein
  Input (3D, labeled coordinates):
    Known Motif or Pattern
    Example Protein
  Output (Best Match):
    Optimal Transform and Correspondence Set
    Partial Match RMSD
Algorithm Overview
  1. Generate a small set of interest regions and correspondences from the example protein
  2. Align the input pattern to each interest region
  3. Choose the alignment with the lowest RMSD
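The three steps above can be read as a simple selection loop. The sketch below is illustrative only: `best_alignment` and the caller-supplied `align` function are hypothetical names, and the `(rmsd, transform, correspondences)` result tuple is an assumed convention, not something the slides specify.

```python
def best_alignment(pattern, interest_regions, align):
    """Align `pattern` to every interest region and keep the lowest-RMSD result.

    align(pattern, region) is a caller-supplied (hypothetical) function that
    returns a (rmsd, transform, correspondences) tuple.
    Returns None when there are no interest regions.
    """
    best = None
    for region in interest_regions:
        result = align(pattern, region)
        # Keep the result whose RMSD (first tuple entry) is smallest so far.
        if best is None or result[0] < best[0]:
            best = result
    return best
```

Because the interest-region step keeps the candidate set small, this outer loop stays cheap compared with the alignment done inside it.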
Regions of Interest
  Input:
    3 pivot elements of the pattern motif
    Error Threshold
  Output:
    Correspondence sets that satisfy the constraints and labels
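One way to picture this step is as a search for triples of example-protein elements whose labels match the three pivots and whose pairwise distances agree with the pivot distances to within the error threshold. The sketch below assumes exactly that constraint; the function name `interest_regions`, the `(label, (x, y, z))` tuple layout, and the use of pairwise distances are assumptions for illustration, not details given on the slide.

```python
from itertools import permutations
from math import dist

def interest_regions(pivots, protein, tol):
    """Return index triples (i, j, k) into `protein` that could match the pivots.

    pivots  : three (label, (x, y, z)) tuples taken from the pattern motif
    protein : list of (label, (x, y, z)) tuples for the example protein
    tol     : error threshold on each pairwise distance (same units as coords)
    """
    (la, pa), (lb, pb), (lc, pc) = pivots
    d_ab, d_ac, d_bc = dist(pa, pb), dist(pa, pc), dist(pb, pc)

    # Only elements whose label matches some pivot label can take part.
    candidates = [i for i, (lab, _) in enumerate(protein) if lab in (la, lb, lc)]

    regions = []
    # Ordered triples of candidates: cubic in the number of candidates.
    for i, j, k in permutations(candidates, 3):
        (li, pi), (lj, pj), (lk, pk) = protein[i], protein[j], protein[k]
        if (li, lj, lk) != (la, lb, lc):
            continue  # label constraint
        if (abs(dist(pi, pj) - d_ab) <= tol
                and abs(dist(pi, pk) - d_ac) <= tol
                and abs(dist(pj, pk) - d_bc) <= tol):
            regions.append((i, j, k))  # distance constraints satisfied
    return regions
```

A distance-only test like this also accepts mirror-image triples; the later alignment step would have to reject those through their higher RMSD.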
Alignment: Approach 1
  1. Find an initial transform
  2. Create a full correspondence set using a closest neighbor method
  3. Find the optimal transform by aligning small random subsets of the current correspondence set
  4. Repeat steps 2 and 3 until the RMSD stabilizes
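Step 2 of Approach 1 can be sketched as a brute-force closest-neighbor search. The version below assumes that a motif element may only be paired with a protein element carrying the same label, and that the motif coordinates have already been moved by the current transform; both are assumptions, as are the function name and the tuple layout.

```python
from math import dist

def closest_neighbor_correspondences(motif, protein):
    """Pair each motif element with the nearest protein element of the same label.

    motif, protein : lists of (label, (x, y, z)) tuples, with the motif already
                     moved by the current transform.
    Returns a list of (motif_index, protein_index, distance) triples.
    """
    pairs = []
    for mi, (m_label, m_xyz) in enumerate(motif):
        best = None  # (protein_index, distance) of the closest match so far
        for pi, (p_label, p_xyz) in enumerate(protein):
            if p_label != m_label:
                continue  # assumed label constraint
            d = dist(m_xyz, p_xyz)
            if best is None or d < best[1]:
                best = (pi, d)
        if best is not None:
            pairs.append((mi, best[0], best[1]))
    return pairs
```

The double loop visits every motif/protein pair once, which is the O(mn) correspondence update.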
Alignment: Approach 2
  1. Find the transform for the current Correspondence Set with SVD
  2. Transform the motif and update the Correspondence Set to include sufficiently close elements
  3. Repeat steps 1 and 2 until the Correspondence Set stops growing
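A minimal sketch of this loop, assuming NumPy is available. The SVD step follows the standard least-squares rigid fit (rotation plus translation, no scaling). The names `fit_rigid_transform` and `align_by_growing`, the `cutoff` parameter, and the `max_iter` guard are illustrative assumptions. For brevity the sketch ignores labels and does not enforce a one-to-one correspondence.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src[i] + t ~ dst[i]."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def align_by_growing(motif, protein, seed_pairs, cutoff, max_iter=50):
    """Grow a correspondence set from seed pairs until it stops growing.

    motif      : (m, 3) array of pattern coordinates
    protein    : (n, 3) array of example protein coordinates
    seed_pairs : list of (motif_index, protein_index), e.g. the three pivots
    cutoff     : assumed distance below which an element counts as close enough
    """
    pairs = dict(seed_pairs)
    for _ in range(max_iter):
        mi = np.array(list(pairs.keys()))
        pi = np.array(list(pairs.values()))
        R, t = fit_rigid_transform(motif[mi], protein[pi])   # step 1
        moved = motif @ R.T + t                              # step 2
        # Distance from every moved motif point to every protein point.
        dists = np.linalg.norm(moved[:, None, :] - protein[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        new_pairs = dict(pairs)
        for m_idx in range(len(motif)):
            if m_idx not in new_pairs and dists[m_idx, nearest[m_idx]] <= cutoff:
                new_pairs[m_idx] = int(nearest[m_idx])
        if len(new_pairs) == len(pairs):
            break                                            # step 3: no growth
        pairs = new_pairs
    return R, t, pairs
```

Motif elements that never come within `cutoff` of a protein element simply stay out of the returned set, which is what makes a partial match possible.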
Partial Match RMSD
  Approach 1: Tukey function T(x) to allow for outliers
  Approach 2: Penalty for items not included in the final correspondence set
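The two scoring ideas can be sketched as follows. The slide does not give the exact form of T(x), so the code assumes the standard Tukey biweight loss with a cutoff `c`. The penalty scheme, which charges a fixed distance for every unmatched pattern element, is likewise an assumption about how such a penalty might be applied. All function and parameter names are hypothetical.

```python
from math import sqrt

def tukey_biweight(x, c):
    """Tukey biweight loss: about x^2/2 near zero, constant c^2/6 for |x| >= c."""
    if abs(x) >= c:
        return c * c / 6.0
    u = 1.0 - (x / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)

def robust_score(distances, c):
    """Mean Tukey loss over matched distances, so outliers stop adding cost."""
    return sum(tukey_biweight(d, c) for d in distances) / len(distances)

def penalized_rmsd(matched_distances, n_pattern, penalty):
    """RMSD over all pattern elements, charging `penalty` per unmatched one."""
    missing = n_pattern - len(matched_distances)
    total = sum(d * d for d in matched_distances) + missing * penalty ** 2
    return sqrt(total / n_pattern)
```

With the Tukey loss, a badly placed element contributes at most c^2/6 no matter how far off it is. With the penalty, a smaller correspondence set is never scored as a better match just because it has fewer terms.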
Running Time
  Let m = number of pattern elements
  Let n = number of example protein elements
  Regions of Interest:
    Cubic in the number of example protein elements that match a pivot label
  Each iteration of Alignment:
    O(m) to compute the transform
    O(mn) to update correspondences
Testing
  Generated a 3D, labeled motif
  Randomly perturbed the motif
  Aligned the motif to generated proteins with and without the perturbed motif
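A test setup along these lines could be generated as in the sketch below. The slide does not say how the motif was perturbed, so the code assumes a rotation about the z axis, a translation, and Gaussian jitter on each coordinate. The function names, label alphabet, noise level, and box size are all illustrative choices.

```python
import math
import random

def random_labeled_points(n, labels, extent, rng):
    """n points with random labels, uniform in a cube of side `extent`."""
    return [(rng.choice(labels), tuple(rng.uniform(0, extent) for _ in range(3)))
            for _ in range(n)]

def perturb(points, angle, shift, noise, rng):
    """Rotate about the z axis, translate, then jitter each coordinate."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for label, (x, y, z) in points:
        moved = (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        out.append((label, tuple(v + rng.gauss(0, noise) for v in moved)))
    return out

def make_test_protein(motif, n_background, labels, extent, rng, embed=True):
    """Random background points, optionally with a perturbed copy of the motif."""
    protein = random_labeled_points(n_background, labels, extent, rng)
    if embed:
        angle = rng.uniform(0, 2 * math.pi)
        protein += perturb(motif, angle, (extent / 2,) * 3, 0.1, rng)
    rng.shuffle(protein)
    return protein
```

Passing `embed=False` gives the negative case, where a good alignment procedure should report a noticeably worse score than it does for the positive case.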
Results
Future Work
  Regions of Interest:
    Grid Method
    Choose pivots based on biological significance
    Initial Alignment
  Protein Description:
    Labeling technique
    Orientation vectors
  Partial Match:
    Determine biological significance
References
  Singh, Rohit. Computational study of protein structure and folding: some interesting problems. Master's Thesis, Stanford University, 2002.
  Umeyama, Shinji. Least-Squares Estimation of Transformation Parameters Between Two Point Patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4), 1991.
  Protein Data Bank