1
A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps. Frank DiMaio and Jude Shavlik (Computer Sciences Department), George Phillips (Biochemistry Department), University of Wisconsin – Madison, USA. Presented at the Fourteenth Conference on Intelligent Systems for Molecular Biology (ISMB 2006), Fortaleza, Brazil, August 7, 2006.
2
X-ray Crystallography (diagram: X-ray beam, protein crystal, collection plate, FFT, electron density map (“3D picture”))
3
Given: Sequence + Electron Density Map
4
Find: Each Atom’s Coordinates
5
Our Subtask: Backbone Trace (figure: Cα positions marked along the chain)
6
The Unit Cell: 3D density function ρ(x,y,z) provided over unit cell. Unit cell may contain multiple copies of the protein.
8
Density Map Resolution (figure: resolution scale from 2Å to 4Å showing the ranges addressed by ARP/wARP (Perrakis et al. 1997), TEXTAL (Ioerger et al. 1999), Resolve (Terwilliger 2002), and our focus)
9
Overview of ACMI (our method). Local Match: algorithm searches for sequence-specific 5-mers centered at each amino acid; many false positives. Global Consistency: use probabilistic model to filter false positives; find most probable backbone trace.
10
5-mer Lookup and Cluster: look up each sequence 5-mer (… VKH V LVSPEKIEELIKGY …) in the PDB and cluster the matching structures (Cluster 1, wt=0.67; Cluster 2, wt=0.33). NOTE: can be done in precompute step.
11
5-mer Search: 6D search (rotation + translation) for representative structures in density map. Compute “similarity”, computed by Fourier convolution (Cowtan 2001). Use tuneset to convert similarity score to probability.
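The translational half of such a search can be sketched with an FFT. This is a minimal illustration only: it uses plain cross-correlation as the score, which is an assumption and not the similarity function of Cowtan (2001), and the function name `translation_search` is hypothetical. A full 6D search would repeat this for each sampled rotation of the template.

```python
import numpy as np

def translation_search(density, template):
    """Score every translation of `template` in a periodic `density` grid.

    Returns (corr, best), where corr[t] = sum_x density[x + t] * template[x]
    (indices wrap around the unit cell) and best is the argmax translation.
    Plain cross-correlation is an assumed stand-in for the talk's similarity.
    """
    f_density = np.fft.fftn(density)
    # Zero-pad the template to the size of the density grid.
    f_template = np.fft.fftn(template, s=density.shape)
    # Cross-correlation theorem: one inverse FFT scores all translations.
    corr = np.fft.ifftn(f_density * np.conj(f_template)).real
    best = tuple(int(i) for i in np.unravel_index(np.argmax(corr), corr.shape))
    return corr, best
```

With a grid of X points this costs O(X log X) per rotation, instead of O(X) work for each of X translations.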
12
Convert Scores to Probabilities: search the density map with the 5-mer representative to obtain scores t_i(u_i); match the scores to the tuneset score distributions (POS, NEG); apply Bayes’ rule to obtain a probability distribution over the unit cell, P(5-mer at u_i | Map).
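A minimal sketch of the score-to-probability step, assuming the POS and NEG tuneset score distributions are each modeled as a Gaussian and that a prior probability of a true match is supplied. Both the Gaussian form and the names `score_to_probability` and `prior_pos` are assumptions for illustration; the slides do not say how the distributions are represented.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def score_to_probability(scores, pos_scores, neg_scores, prior_pos):
    """Bayes' rule: P(match | score) from tuneset POS/NEG score samples.

    Assumes (hypothetically) Gaussian score distributions for both classes.
    """
    pos_like = gaussian_pdf(scores, pos_scores.mean(), pos_scores.std())
    neg_like = gaussian_pdf(scores, neg_scores.mean(), neg_scores.std())
    numerator = pos_like * prior_pos
    return numerator / (numerator + neg_like * (1.0 - prior_pos))
```

Applied to the score at every location and orientation, this yields an unnormalized map of match probabilities over the unit cell.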
13
In This Talk… Where we are now: for each amino acid in the protein, we have a probability distribution over the unit cell. Where we are headed: find the backbone layout maximizing the probability of the trace given the density map.
14
Pairwise Markov Field Models: a type of undirected graphical model. Represent joint probabilities as product of vertex and edge potentials. Similar to (but more general than) Bayesian networks. (Figure: vertices u1, u2, u3 with evidence y)
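In general form, a pairwise Markov field over labels u_1 … u_N with evidence y factorizes as shown below. The symbols φ (vertex potential) and ψ (edge potential) are notation assumed here for illustration; the slide's own formula did not survive in this transcript.

```latex
P(u_1, \dots, u_N \mid y) \;\propto\;
  \prod_{i=1}^{N} \phi_i(u_i \mid y)
  \prod_{i < j} \psi_{ij}(u_i, u_j)
```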
15
Protein Backbone Model (figure: chain ALA, GLY, LYS, LEU). Each vertex is an amino acid. Each label is location + orientation. Evidence y is the electron density map. Each vertex (or observational) potential comes from the 5-mer matching.
16
Protein Backbone Model: two types of edge (or structural) potentials. Adjacency constraints ensure adjacent amino acids are ~3.8 Å apart and in the proper orientation. (Figure: chain ALA, GLY, LYS, LEU)
17
Protein Backbone Model: two types of structural (edge) potentials. Adjacency constraints ensure adjacent amino acids are ~3.8 Å apart and in the proper orientation. Occupancy constraints ensure nonadjacent amino acids do not occupy same 3D space. (Figure: chain ALA, GLY, LYS, LEU)
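The two kinds of edge potential can be sketched as simple functions of Cα positions. The functional forms here (a Gaussian around 3.8 Å and a hard step for occupancy), the constants `ADJ_SIGMA` and `MIN_SEPARATION`, and the function names are all hypothetical choices for illustration. The orientation term mentioned on the slide is left out.

```python
import numpy as np

CA_CA_DISTANCE = 3.8  # Å between adjacent Cα atoms (value from the slide)
ADJ_SIGMA = 0.2       # Å, hypothetical tolerance around that distance
MIN_SEPARATION = 3.0  # Å, hypothetical minimum gap for nonadjacent residues

def _distance(x_i, x_j):
    return float(np.linalg.norm(np.asarray(x_i, float) - np.asarray(x_j, float)))

def adjacency_potential(x_i, x_j):
    """High when sequence-adjacent residues sit about 3.8 Å apart."""
    d = _distance(x_i, x_j)
    return float(np.exp(-0.5 * ((d - CA_CA_DISTANCE) / ADJ_SIGMA) ** 2))

def occupancy_potential(x_i, x_j):
    """Zero when two nonadjacent residues would occupy the same space."""
    return 0.0 if _distance(x_i, x_j) < MIN_SEPARATION else 1.0
```

Because the occupancy potential connects every nonadjacent pair, the resulting graph is fully connected rather than a simple chain.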
18
Backbone Model Potential: constraints between adjacent amino acids (equation not preserved)
19
Backbone Model Potential: constraints between nonadjacent amino acids (equation not preserved)
20
Backbone Model Potential: observational (“amino-acid-finder”) probabilities (equation not preserved)
21
Probabilistic Inference: exact methods are intractable. Use belief propagation (BP) to approximate marginal distributions. Want to find the backbone layout that maximizes the probability of the trace given the density map.
22
Belief Propagation (BP): iterative, message-passing method (Pearl 1988). A message from amino acid i to amino acid j indicates where i expects to find j. An approximation to the marginal (or belief) is given as the product of incoming messages.
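To make the message-passing idea concrete, here is a minimal sum-product sketch on a toy chain with a small discrete label set. It is not the talk's implementation: it assumes only adjacency edges (so the graph is a chain and BP is exact), a single shared edge potential, and a hypothetical function name `chain_bp`. The belief here also multiplies in each vertex's own observation potential.

```python
import numpy as np

def chain_bp(obs, pair):
    """Sum-product belief propagation on a chain.

    obs:  (N, K) vertex potentials, one row per residue.
    pair: (K, K) edge potential, indexed [label of i, label of i+1].
    Returns (N, K) normalized beliefs (marginals).
    """
    n, k = obs.shape
    fwd = np.ones((n, k))  # fwd[i]: message arriving at i from i-1
    bwd = np.ones((n, k))  # bwd[i]: message arriving at i from i+1
    for i in range(1, n):
        m = (obs[i - 1] * fwd[i - 1]) @ pair  # where i-1 expects to find i
        fwd[i] = m / m.sum()
    for i in range(n - 2, -1, -1):
        m = pair @ (obs[i + 1] * bwd[i + 1])  # where i+1 expects to find i
        bwd[i] = m / m.sum()
    beliefs = obs * fwd * bwd  # observation times incoming messages
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

With occupancy edges added the graph contains loops, and the same updates are iterated to give approximate marginals.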
23
Belief Propagation Example (figure: messages passed between ALA and GLY)
24
Technical Challenges. Representation of potentials: store Fourier coefficients in Cartesian space; at each location x, store a single orientation r. Speeding up the O(N²X²) naïve implementation, where X = the unit cell size (# Fourier coefficients) and N = the number of residues in the protein.
25
Speeding Up the O(N²X²) Implementation. O(X²) computation for each occupancy message: each message must integrate over the unit cell; O(X log X) as multiplication in Fourier space. O(N²) messages computed & stored: approximate N-3 occupancy messages with a single message; O(N) messages using a message product accumulator. Improved implementation: O(NX log X).
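One plausible reading of the message product accumulator is sketched below: the product of all messages is accumulated once, and each residue's own contribution is then divided out, instead of recomputing a separate product for every pair. The log-space formulation and the name `leave_one_out_products` are assumptions for illustration, and the sketch requires strictly positive messages.

```python
import numpy as np

def leave_one_out_products(log_messages):
    """Combine N messages for each residue, excluding that residue's own.

    log_messages: (N, X) array; row i is the log of the single message
                  sent by residue i over the X grid points.
    Returns (N, X): row j is the sum over i != j of log_messages[i].
    """
    total = log_messages.sum(axis=0)      # accumulator, computed once
    return total[None, :] - log_messages  # divide out each own message
```

This replaces N separate products of N-1 messages with one accumulation and N subtractions.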
26
1XMT at 3Å Resolution: 1.12Å RMSd, 100% coverage (color scale: prob(AA at location), from 0.17 LOW to 0.82 HIGH)
27
1VMO at 4Å Resolution: 3.63Å RMSd, 72% coverage (color scale: prob(AA at location), from 0.02 LOW to 0.25 HIGH)
28
1YDH at 3.5Å Resolution: 1.47Å RMSd, 90% coverage (color scale: prob(AA at location), from 0.02 LOW to 0.27 HIGH)
29
Experiments: tested ACMI against other map interpretation algorithms, TEXTAL and Resolve. Used ten model-phased maps. Smoothly diminished reflection intensities, yielding 2.5, 3.0, 3.5, and 4.0 Å resolution maps.
30
RMS Deviation (plot: Cα RMS deviation vs. density map resolution for ACMI, TEXTAL, and Resolve)
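For reference, Cα RMS deviation between a predicted trace and the true structure can be computed as below. This sketch assumes both coordinate sets are already in the same frame and matched residue by residue; the slides do not state how residues were paired or whether any superposition was applied, and the name `ca_rmsd` is hypothetical.

```python
import numpy as np

def ca_rmsd(predicted, reference):
    """Root-mean-square deviation between matched Cα coordinates (in Å)."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if predicted.shape != reference.shape:
        raise ValueError("coordinate arrays must have the same shape")
    squared = np.sum((predicted - reference) ** 2, axis=1)  # per-residue
    return float(np.sqrt(np.mean(squared)))
```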
31
Model Completeness (plots: % chain traced and % residues identified vs. density map resolution for ACMI, TEXTAL, and Resolve)
32
Per-protein RMS Deviation (plot axes: ACMI RMS error, TEXTAL RMS error, Resolve RMS error)
33
Conclusions: ACMI effectively combines weakly-matching templates to construct a full model. Produces an accurate trace even with poor-quality density map data. Reduces computational complexity from O(N²X²) to O(NX log X). Inference possible for even large unit cells.
34
Future Work: improve “amino-acid-finding” algorithm. Incorporate sidechain placement / refinement. Manage missing data: disordered regions; only exterior visible (e.g., in CryoEM).
35
Acknowledgements: Ameet Soni, Craig Bingman. NLM grants 1R01 LM008796 and 1T15 LM007359.