1
TEXTAL: Applications of Pattern Recognition to Macromolecular Crystallography
Dr. Thomas R. Ioerger, Department of Computer Science, Texas A&M University
Collaboration with: Dr. James C. Sacchettini, Center for Structural Biology, Texas A&M
7/2/2019
2
Automating Structure Determination
Typical steps:
1) obtain crystals
2) collect data (e.g., MAD, at a synchrotron)
3) determine an initial set of phases
4) generate an electron density map
5) density modification/phase refinement
6) construct a model (atomic coordinates)
3
Automating Structure Determination
Existing computational routines: heavy-atom search, Patterson correlation, solvent flattening, maximum-likelihood phase combination
Few methods exist to interpret electron density maps
- interpretation requires humans: a potential bottleneck
- difficulties: low resolution, phase errors, weak density
Must automate for structural genomics and rational drug design
4
Overview of TEXTAL
Apply pattern recognition techniques
Exploit a database of previously solved maps
Model molecular structures in local regions (e.g., spheres of 5 Å radius)
Intuitive principles:
1) Have I ever seen a region with a pattern of density like this before?
2) If so, what were the previous local atomic coordinates?
5
Overview (cont’d)
Divide-and-conquer:
1) identify alpha-carbon positions (chain tracing)
2) model regions around alpha-carbons (CAs), including backbone and side-chain atoms
3) concatenate the local models back together and resolve any conflicts
The database contains many regions centered on CAs from previous maps
A radius of ~5 Å is about the right scale for capturing "structural repetition"
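Step 2 operates on spherical regions around each CA. The slides do not show how a region is cut out of the map; a minimal sketch, assuming an orthogonal grid with uniform spacing in Å and ignoring unit-cell geometry and crystal symmetry, could collect the lattice points within 5 Å of a CA as follows. `extract_region` is a hypothetical helper, not TEXTAL code.

```python
import numpy as np

def extract_region(density, spacing, center, radius=5.0):
    """Return (offsets, values) for grid points within `radius` Å of `center`.

    Hypothetical helper. Assumes an orthogonal grid whose point (i, j, k)
    lies at (i*spacing, j*spacing, k*spacing) Å; a real map would also need
    the unit-cell geometry and symmetry to be handled.
    """
    idx = np.indices(density.shape).reshape(3, -1).T      # every (i, j, k), C order
    coords = idx * spacing                                 # Cartesian positions (Å)
    offsets = coords - np.asarray(center, dtype=float)     # positions relative to the CA
    inside = np.linalg.norm(offsets, axis=1) <= radius     # keep the sphere only
    return offsets[inside], density.reshape(-1)[inside]    # same C order as idx
```

The returned offsets are relative to the CA, which is the frame the rotation-invariant features below are computed in.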
6
Overview (cont’d)
Database: ~10^5 regions from ~100 maps
How can the closest match be identified efficiently?
- calculate numerical features that represent the pattern in each region
- the features must be rotation-invariant
- search can then be very fast: just compare features
7
Overview (cont’d)
8
Database Construction
Ideally would use solved MAD/MIR maps
Using "back-transformed" maps works well:
- calculate structure factors from PDB structures (including B-factors)
- keep reflections down to 2.8 Å
- Fourier-transform them into an electron density map
50 non-homologous proteins from PDBSelect, giving about 50,000 regions
Feature extraction is done offline
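The "keep reflections down to 2.8 Å" step amounts to a low-pass filter in reciprocal space. The sketch below shows only that step, under simplifying assumptions: the input is a density grid covering one cubic P1 unit cell (a real cell needs its reciprocal metric and space-group symmetry), and computing structure factors from PDB coordinates with B-factors is not shown. `truncate_resolution` is a hypothetical name.

```python
import numpy as np

def truncate_resolution(density, cell_edge, d_min=2.8):
    """Drop reflections with resolution d < d_min from a map on a cubic P1 cell.

    density:   (n, n, n) grid covering one cubic cell of edge `cell_edge` Å.
    Keeps only reflections with d = a / sqrt(h^2 + k^2 + l^2) >= d_min.
    """
    n = density.shape[0]
    F = np.fft.fftn(density)                        # Fourier coefficients indexed by (h, k, l)
    h = np.fft.fftfreq(n, d=1.0 / n)                # integer Miller indices in FFT order
    H, K, L = np.meshgrid(h, h, h, indexing="ij")
    s = np.sqrt(H**2 + K**2 + L**2) / cell_edge     # |s| = 1/d, in 1/Å
    keep = s <= 1.0 / d_min                         # F(000) has s = 0, so it is kept
    return np.fft.ifftn(np.where(keep, F, 0)).real  # back to a real-space map
```

Because F(000) is retained, the mean density is unchanged; only detail finer than `d_min` is removed.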
9
Rotation-Invariant Features
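Average density: $m = \frac{1}{n}\sum_{i=1}^{n} \rho_i$, where $\rho_i$ is the density at lattice point $i$ of the $n$ points in the region
Other statistical features: standard deviation, kurtosis…
Distance to center of mass (coordinates measured relative to the region center):
$\langle x_c, y_c, z_c \rangle = \frac{1}{n}\left\langle \sum_i \frac{x_i \rho_i}{m}, \sum_i \frac{y_i \rho_i}{m}, \sum_i \frac{z_i \rho_i}{m} \right\rangle$
$d_{cen} = \sqrt{x_c^2 + y_c^2 + z_c^2}$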
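These statistics can be computed directly from the lattice points of a region. A NumPy sketch, where `statistical_features` is hypothetical and kurtosis is taken as the plain fourth standardized moment (the slides do not define it); it also assumes the mean density m is nonzero, since the centroid formula divides by it.

```python
import numpy as np

def statistical_features(offsets, values):
    """Rotation-invariant summary features [m, sd, kurtosis, d_cen] of one region.

    offsets: (n, 3) lattice-point positions relative to the region center (Å).
    values:  (n,) electron density rho_i at those points; mean assumed nonzero.
    """
    m = values.mean()                                   # average density
    sd = values.std()                                   # standard deviation
    kurt = np.mean((values - m) ** 4) / sd**4           # fourth standardized moment (assumed definition)
    # (1/n) * sum(x_i * rho_i / m), i.e. the density-weighted centroid
    centroid = (offsets * values[:, None]).sum(axis=0) / (len(values) * m)
    d_cen = np.linalg.norm(centroid)                    # distance from center to center of mass
    return np.array([m, sd, kurt, d_cen])
```

Each feature depends only on the density values and on distances from the region center, so rotating the region leaves the feature vector unchanged.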
10
More Features: Moments of Inertia
- measure dispersion around the axes of symmetry in a density distribution
- calculate the 3x3 inertia matrix
- diagonalize it to get the eigenvalues
- sort them from largest to smallest
- take magnitudes and ratios of the moments
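A minimal sketch of the inertia features, assuming density is clipped at zero so it can act as mass, that moments are taken about the density-weighted center of mass, and that the ratios used are I1/I2 and I2/I3 (the slide says only "magnitudes and ratios"). `inertia_features` is a hypothetical name.

```python
import numpy as np

def inertia_features(offsets, values):
    """Principal moments of inertia (largest first) plus two of their ratios.

    offsets: (n, 3) positions relative to the region center (Å).
    values:  (n,) density; negative values are clipped to zero and the
             remaining values are treated as point masses (assumption).
    """
    w = np.clip(values, 0.0, None)
    com = (offsets * w[:, None]).sum(axis=0) / w.sum()    # density-weighted center of mass
    r = offsets - com
    # I = sum_i w_i (|r_i|^2 * Id - r_i r_i^T)
    inertia = (np.eye(3) * np.sum(w * (r**2).sum(axis=1))
               - np.einsum("i,ij,ik->jk", w, r, r))
    moments = np.linalg.eigvalsh(inertia)[::-1]            # eigvalsh returns ascending order
    return np.array([moments[0], moments[1], moments[2],
                     moments[0] / moments[1], moments[1] / moments[2]])
```

Rotating the region turns I into Q I Q^T, which has the same eigenvalues; that is what makes the sorted moments rotation-invariant.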
11
More Features
Spoke angles:
- if the region is centered on a CA, it should have 3 "spokes" of density emanating from the center
- find best-fit vectors; calculate the angles among them
Surface area of contours
Connectivity of density/bones in the region
Other geometrical features...
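The slides do not say how the best-fit spoke vectors are found. As one hypothetical stand-in, the sketch clusters the directions of above-threshold lattice points into three groups with spherical k-means, then reports the three pairwise angles between the cluster directions, sorted so the result depends neither on cluster order nor on the orientation of the region. `spoke_angles` and its `threshold` parameter are assumptions.

```python
import numpy as np

def spoke_angles(offsets, values, threshold, n_iter=20):
    """Sorted pairwise angles (degrees) between three fitted spoke directions.

    Hypothetical fit: take lattice points with density >= threshold, convert them
    to unit direction vectors from the center, and group those directions into
    three clusters with spherical k-means.
    """
    keep = (values >= threshold) & (np.linalg.norm(offsets, axis=1) > 1e-9)
    dirs = offsets[keep] / np.linalg.norm(offsets[keep], axis=1, keepdims=True)
    # Farthest-point initialization: start from the first direction, then
    # repeatedly add the direction least similar to those already chosen.
    centers = [dirs[0]]
    for _ in range(2):
        similarity = (dirs @ np.array(centers).T).max(axis=1)
        centers.append(dirs[np.argmin(similarity)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmax(dirs @ centers.T, axis=1)      # assign each point to the closest direction
        for c in range(3):
            members = dirs[labels == c]
            if len(members) > 0:
                mean = members.mean(axis=0)
                centers[c] = mean / np.linalg.norm(mean)  # re-project the mean onto the unit sphere
    pairs = [(0, 1), (0, 2), (1, 2)]
    cosines = np.clip([centers[i] @ centers[j] for i, j in pairs], -1.0, 1.0)
    return np.sort(np.degrees(np.arccos(cosines)))
```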
12
Details of Matching Process
Feature-based matching: weighted Euclidean distance between feature vectors:
$\mathrm{dist}(R_1, R_2) = \sum_i w_i \left(F_i(R_1) - F_i(R_2)\right)^2$
Features must be weighted by relevance, because less-relevant features add noise
Slider algorithm: optimize the weights by comparing features in matching regions versus mismatches
Verify selections by density correlation, which requires a search for the optimal rotation
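A sketch of the feature-based lookup with this weighted distance. The weights would come from the Slider algorithm, which is not reproduced here, so they are plain inputs; in practice the features would likely be put on comparable scales first (an assumption, not stated on the slide). Function names are hypothetical.

```python
import numpy as np

def weighted_sq_distance(query, database, weights):
    """dist(R1, R2) = sum_i w_i (F_i(R1) - F_i(R2))^2, one query against every row."""
    diff = database - query                    # (n_regions, n_features)
    return (weights * diff**2).sum(axis=1)

def top_k_matches(query, database, weights, k):
    """Indices of the k database regions with the smallest weighted distance, nearest first."""
    d = weighted_sq_distance(query, database, weights)
    k = min(k, len(d))
    idx = np.argpartition(d, k - 1)[:k]        # the k smallest, in arbitrary order
    return idx[np.argsort(d[idx])]             # order them by distance
```

Using `argpartition` avoids fully sorting a database of ~10^5 distances when only the top K are needed.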
13
Experiments
Goal: evaluate the potential of pattern matching
Assumption: CA positions are known
Procedure:
1. extract features for each region
2. collect the top K=400 feature-based matches in the DB
3. calculate the density correlation and take the best match
4. rotate the backbone+side-chain atoms into position
~30 s/residue on an SGI Origin 2000
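Steps 2 and 3 can be sketched as a two-stage search: shortlist K = 400 regions by weighted feature distance, then keep the one whose density correlates best with the query. To stay short, the sketch assumes each database density has already been resampled at the query's lattice points in a common orientation, i.e. it skips the rotation search that the real verification step requires. All names are hypothetical.

```python
import numpy as np

def density_correlation(a, b):
    """Pearson correlation between two density samples taken at the same points."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def best_match(query_feat, query_density, db_feats, db_densities, weights, k=400):
    """Stage 1: shortlist k regions by weighted feature distance.
    Stage 2: return the shortlisted index with the highest density correlation.
    Assumes db_densities are already aligned with the query (no rotation search)."""
    d = (weights * (db_feats - query_feat) ** 2).sum(axis=1)
    shortlist = np.argsort(d)[:k]
    scores = [density_correlation(query_density, db_densities[i]) for i in shortlist]
    return int(shortlist[int(np.argmax(scores))])
```

The cheap feature comparison prunes the database so that the more expensive correlation is evaluated only K times per residue.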
14
Feature Weights
15
Results
- 1gcn = glucagon
- 1fnb = ferredoxin reductase
- 1tup = p53 tumor suppressor
- IFABP = intestinal fatty acid binding protein
- BT = back-transformed
16
Results
Structural similarity groups:
- Ala
- Asp, Asn, Leu
- Gly
- Glu, Gln
- Pro
- Arg, Lys, Met
- Cys, Ser
- Phe, Trp, Tyr, His
- Ile, Val, Thr
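One way to use these groups, assumed here rather than stated on the slides, is to score side-chain identification both by exact identity and by group membership. The groups are copied from the slide; `accuracy` is a hypothetical helper.

```python
# Structural similarity groups from the slide (three-letter residue codes).
SIMILARITY_GROUPS = [
    {"ALA"}, {"ASP", "ASN", "LEU"}, {"GLY"}, {"GLU", "GLN"}, {"PRO"},
    {"ARG", "LYS", "MET"}, {"CYS", "SER"}, {"PHE", "TRP", "TYR", "HIS"},
    {"ILE", "VAL", "THR"},
]
# Map each residue to the index of its group.
GROUP_OF = {res: g for g, members in enumerate(SIMILARITY_GROUPS) for res in members}

def accuracy(predicted, true):
    """Return (exact-identity accuracy, similarity-group accuracy) over paired residues."""
    pairs = list(zip(predicted, true))
    exact = sum(p == t for p, t in pairs) / len(pairs)
    group = sum(GROUP_OF[p] == GROUP_OF[t] for p, t in pairs) / len(pairs)
    return exact, group
```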
17
Results
18
Example: Portion of 1tup
19
Example: Glucagon
20
Post-Processing Routines
- Concatenate the local models (one per amino acid) into a PDB file
- Detect and repair flips by majority chain direction
- Utilize amino acid sequence information:
  - map chains into the known sequence (alignment)
  - re-look up residues based on identity
- Real-space refinement
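The slides do not say which alignment method maps chains onto the sequence. As a deliberately simplified sketch (no gaps, identity-only scoring), a chain of predicted residues can be slid along the known sequence and placed at the best-scoring offset; the matched segment then supplies the true identity of each residue for the re-lookup step. `map_chain_to_sequence` is hypothetical, and a real implementation would more likely use a gapped alignment with substitution scores.

```python
def map_chain_to_sequence(chain, sequence):
    """Ungapped placement of a predicted chain (one-letter codes) on the known sequence.

    Slides the chain along the sequence, scoring +1 per identical residue, and
    returns (best_offset, score, assigned), where `assigned` is the sequence
    segment that would be used to re-look up each residue by its true identity.
    """
    if len(chain) > len(sequence):
        raise ValueError("chain is longer than the sequence")
    best_offset, best_score = 0, -1
    for offset in range(len(sequence) - len(chain) + 1):
        window = sequence[offset:offset + len(chain)]
        score = sum(a == b for a, b in zip(chain, window))
        if score > best_score:                 # ties keep the earliest offset
            best_offset, best_score = offset, score
    return best_offset, best_score, sequence[best_offset:best_offset + len(chain)]
```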
21
CAPRA
Need to find CAs automatically and accurately
- Bones doesn’t identify CAs (except at branch points)
- use pattern recognition again
- extract features for all lattice points inside the 1σ contour, or along the trace
- use a neural net to predict the distance to the true CA
- training set: examples of {<F1, F2, …>, Di}
Status: currently 1 Å RMS, need to get
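The slide gives only the idea of regressing distance-to-CA from feature vectors. A minimal NumPy sketch of such a network follows; the single tanh hidden layer, full-batch gradient descent, learning rate, and synthetic training data in the check are all assumptions, not CAPRA's actual design.

```python
import numpy as np

def train_distance_net(X, D, hidden=16, lr=0.05, epochs=2000, seed=0):
    """Fit a one-hidden-layer tanh network predicting distance-to-CA D from features X.

    X: (n, f) feature vectors <F1, F2, ...>; D: (n,) distances (Å).
    Full-batch gradient descent on mean squared error. Returns (params, losses).
    """
    rng = np.random.default_rng(seed)
    n, f = X.shape
    W1 = rng.normal(scale=1.0 / np.sqrt(f), size=(f, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden activations (n, hidden)
        pred = H @ W2 + b2                    # predicted distances (n,)
        err = pred - D
        losses.append(float(np.mean(err**2)))
        # Backpropagate the mean squared error.
        g_pred = 2.0 * err / n
        gW2 = H.T @ g_pred
        gb2 = g_pred.sum()
        gH = np.outer(g_pred, W2) * (1.0 - H**2)   # derivative of tanh
        gW1 = X.T @ gH
        gb1 = gH.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses

def predict_distance(params, X):
    """Apply the trained network to new feature vectors."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2
```

In use, lattice points with the smallest predicted distance would be natural candidates for CA positions (an assumption about how the predictions are consumed).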
22
Example
23
See our forthcoming paper in: Acta Cryst. D
Acknowledgements
Dr. James C. Sacchettini, Center for Structural Biology, Texas A&M
Graduate students/post-docs: Dr. Jon Christopher, Tom Holton, Lydia Tapia
Funding provided by: NIH (GM-59398)