Published by Josephine Carter. Modified over 9 years ago.
1
Predicting ligand binding sites on protein surfaces
Zengming Zhang
2010-5-12
2
What is a binding site?
A concave, cleft-shaped or hole-shaped region on the protein surface.
A key into a lock!
Key: ligand
Lock: protein
Lock hole: binding site
3
Why do we need to find binding sites?
It is the first step in many structure analyses:
Functional/catalytic site prediction
Comparisons of protein atomic configurations
Docking calculations
Structure-based drug design
…
4
Algorithms for finding binding sites
Grid-based: The protein is mapped onto a 3D grid. Empty grid points are then defined as pockets if they satisfy a number of geometric or energetic conditions.
Sphere-based: A set of probe spheres is placed on the protein surface. Pocket spheres are those probe spheres that satisfy a number of geometric conditions.
α-shape based: The α-shape is a subset of the Delaunay tessellation of the protein atoms, obtained by omitting edges longer than the sum of the radii of the two atoms.
5
Algorithms for finding binding sites
Grid-based: POCKET, LIGSITE, LIGSITEcs, LIGSITEcsc, ConCavity, PocketPicker and GHECOM
Sphere-based: SURFNET, PASS, Q-SiteFinder, PHECOM
α-shape based: CAST, Fpocket
6
α-shape
The α-shape: the shape enclosed by the black line.
The edges of the Delaunay tessellation.
7
No edge is longer than the sum of the radii of its two atoms.
8
α-shape based: CAST
Computes a triangulation of the protein's surface atoms using α-shapes. Triangles are then grouped by letting small triangles flow toward neighboring larger triangles, which act as sinks.
9
Grid-based
The protein is projected onto a 3D grid. The method focuses on PSP (protein-solvent-protein) events at the grid points. When a straight line drawn through a grid point is enclosed on both sides by protein atoms, that line is termed a PSP event for the grid point. Grid points having more than a threshold number of PSP events are defined as pockets.
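The PSP counting rule can be sketched in a few lines of Python. This is a toy illustration and not the code of any published program: the seven scan directions (three axes plus four cube diagonals), the set-of-cells protein representation, and the names `hits_protein`, `psp_count` and `find_pocket_points` are all assumptions made for the example.

```python
from itertools import product

# Scan directions: 3 axes + 4 cube diagonals (an assumption of this sketch).
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]

def hits_protein(start, step, protein, size):
    """Walk from `start` along `step`; True if a protein cell is met before leaving the grid."""
    x, y, z = start
    while True:
        x, y, z = x + step[0], y + step[1], z + step[2]
        if not (0 <= x < size and 0 <= y < size and 0 <= z < size):
            return False
        if (x, y, z) in protein:
            return True

def psp_count(point, protein, size):
    """Number of scan lines through `point` that are closed by protein on both sides."""
    count = 0
    for d in DIRECTIONS:
        opposite = (-d[0], -d[1], -d[2])
        if hits_protein(point, d, protein, size) and hits_protein(point, opposite, protein, size):
            count += 1
    return count

def find_pocket_points(protein, size, threshold):
    """Empty grid points with more than `threshold` PSP events."""
    return {p for p in product(range(size), repeat=3)
            if p not in protein and psp_count(p, protein, size) > threshold}

# Toy "protein": the hollow shell of a 5x5x5 box inside a 7x7x7 grid.
SIZE = 7
protein = {p for p in product(range(1, 6), repeat=3) if min(p) == 1 or max(p) == 5}
pockets = find_pocket_points(protein, SIZE, threshold=3)
print(len(pockets))  # the 27 enclosed cells
```

A fully enclosed cavity is the simplest case: every enclosed cell scores all seven PSP events, while cells outside the box score none. Real surface pockets score somewhere in between, which is why a threshold is needed.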
10
Sphere-based
SURFNET: Places a sphere (called a gap sphere) between two protein atoms. If the sphere contains any other atoms, its radius is reduced until it just touches the nearest of them. The set of these gap spheres defines the pockets.
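A minimal sketch of the gap-sphere construction for a single atom pair is given below. The function name `gap_sphere`, the tuple-based atom representation and the `min_radius` cutoff are assumptions for illustration; they are not taken from the SURFNET program.

```python
import math

def gap_sphere(a, b, ra, rb, others, min_radius=0.0):
    """Fit a gap sphere between atoms at `a` and `b` (radii `ra`, `rb`).

    `others` is a list of (center, radius) tuples for the remaining atoms.
    `min_radius` is a hypothetical cutoff below which the sphere is discarded.
    Returns (center, radius) or None.
    """
    d = math.dist(a, b)
    gap = d - ra - rb
    if gap <= 0:
        return None  # the two atoms overlap, so there is no gap to fill
    # Center the sphere midway between the two atom surfaces.
    t = (ra + gap / 2) / d
    center = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
    radius = gap / 2
    # Shrink until no other atom penetrates the sphere.
    for c, rc in others:
        radius = min(radius, math.dist(center, c) - rc)
    return (center, radius) if radius > min_radius else None

# Two atoms 10 units apart, with a third atom intruding from the side.
sphere = gap_sphere((0, 0, 0), (10, 0, 0), 2.0, 2.0, [((5, 4, 0), 2.0)])
print(sphere)
```

Without the third atom the sphere would have radius 3 (half of the 6-unit gap); the intruding atom forces it to shrink.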
11
Grid-based: GHECOM
By Takeshi Kawabata
Kawabata T. (2010) Detection of multi-scale pockets on protein surfaces using mathematical morphology. Proteins, 78, 1195-1211.
Aim: to define pocket regions on the protein surface.
12
Primary points:
1. A new definition of pockets using the basic operations of mathematical morphology
2. A proposed algorithm for finding pockets
3. Construction of a useful dataset for algorithm testing
4. A new method for evaluating binding site predictions
5. Some useful discoveries about how ligands bind to binding sites
13
Some background:
Multiscale pockets: deep and shallow pockets are calculated simultaneously.
"Multiscale pockets" need "multiscale probes": many probes of different sizes are used to define pockets.
"Size" and "depth" of pockets: two properties of pockets.
A definition of pockets using small and large spherical probes comes from the author's previous work, PHECOM.
A pocket region: a space into which a small spherical probe can enter but a large spherical probe cannot.
14
Pocket definition
Mathematical morphology
It is a theory, based on rigorous set theory, used in the analysis of geometric features of digital images. Morphology can provide boundaries of objects, their skeletons, and their convex hulls. It is also useful for many pre- and post-processing techniques, especially edge thinning and pruning.
15
Mathematical morphology (cont.)
Four operations: dilation, erosion, opening, closing
a: Molecular shape X
b: The shape of the probe P
c: X ⊕ P: dilation of X by P
d: X ⊖ P: erosion of X by P
e: X ○ P: opening of X by P
f: X • P: closing of X by P
The shape X is the vdW volume of a protein.
16
Mathematical morphology (cont.)
The language of mathematical morphology:
The translation of the shape X by the vector p (p-translated X) is denoted by (X)_p and is defined by:
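In standard mathematical-morphology notation, the translation and the four operations can be written as follows. This is the textbook form, assumed here to match the slides; for symmetric probes such as spheres the reflection conventions that differ between textbooks make no difference.

```latex
\begin{align*}
(X)_p &= \{\, x + p \mid x \in X \,\} && \text{(translation)}\\
X \oplus P &= \bigcup_{p \in P} (X)_p && \text{(dilation)}\\
X \ominus P &= \bigcap_{p \in P} (X)_{-p} && \text{(erosion)}\\
X \circ P &= (X \ominus P) \oplus P && \text{(opening)}\\
X \bullet P &= (X \oplus P) \ominus P && \text{(closing)}\\
X \bullet P &= \left( X^{c} \circ P \right)^{c} && \text{(closing as the dual of opening, symmetric } P\text{)}
\end{align*}
```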
17
Mathematical morphology (cont.)
where X^c is the complement of shape X: X^c = E^3 − X.
In other words, the closing of X by P is the space that the probe P cannot enter when any overlap between X and P is prohibited. The closing of X by P is called the "molecular volume" of molecule X defined by probe P.
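The four operations are easy to try on a small 2D grid with shapes stored as sets of integer points. The sketch below is only an illustration under that assumption (the function names and the toy shape are not from the talk); it shows that closing fills the part of a narrow gap that the probe cannot enter.

```python
def dilate(X, P):
    """X ⊕ P: union of the translates of X by every point of P."""
    return {(x + i, y + j) for (x, y) in X for (i, j) in P}

def erode(X, P):
    """X ⊖ P: points c such that c + p lies in X for every p in P."""
    i0, j0 = next(iter(P))
    candidates = {(x - i0, y - j0) for (x, y) in X}
    return {(x, y) for (x, y) in candidates
            if all((x + i, y + j) in X for (i, j) in P)}

def opening(X, P):
    """X ○ P: erosion followed by dilation."""
    return dilate(erode(X, P), P)

def closing(X, P):
    """X • P: dilation followed by erosion."""
    return erode(dilate(X, P), P)

# Probe: a digitized disc of radius 1 (a "plus" of five grid points).
P = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

# Shape: two 3x5 blocks separated by a one-column gap at x = 3.
X = {(x, y) for x in (0, 1, 2, 4, 5, 6) for y in range(5)}

filled = closing(X, P) - X
print(sorted(filled))  # gap cells the probe cannot reach: [(3, 1), (3, 2), (3, 3)]
```

The two end cells of the gap, (3, 0) and (3, 4), stay open because the probe can still touch them from outside. Opening does the reverse and trims the block corners that the probe cannot fill from the inside.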
18
Pocket definition (cont.)
Eq. (12) was introduced by Masuya and Doi using mathematical morphological operations:
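One way to write such a definition, consistent with the verbal description of a pocket as a space a small probe S can enter but a large probe P cannot, is shown below. The exact form is an assumption based on that description and is not copied from the slide.

```latex
\[
\mathrm{Pocket}(X, P, S) \;=\; \bigl[\, (X \bullet P) \cap X^{c} \,\bigr] \circ S
\]
```

Here $(X \bullet P) \cap X^{c}$ is the part of the molecular volume defined by the large probe that lies outside the protein itself, and the opening by $S$ keeps only the part of it that the small probe can fill.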
19
Pocket definition (cont.)
20
Algorithm:
Multiscale closing or multiscale molecular volume: K types of large probe spheres P_1, P_2, …, P_K and one small probe S are used. The large probes must satisfy:
The opening condition means that a large probe P_j can be reconstructed from a set of translated smaller probes P_i.
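Assuming Eq. (16) is the usual opening-invariance condition, it and the nesting of molecular volumes that follows from it can be written as below. The indexing, with probes ordered from small to large, is an assumption.

```latex
\begin{align*}
P_j \circ P_i &= P_j \qquad \text{for all } i \le j\\
\Longrightarrow\quad X \bullet P_1 \;\subseteq\; X \bullet P_2 \;&\subseteq\; \cdots \;\subseteq\; X \bullet P_K
\end{align*}
```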
21
Algorithm (cont.)
If the opening condition [Eq. (16)] is satisfied for all the probes {P_i}, then the following relation will hold:
But …
22
Algorithm (cont.)
Eq. (16) is not satisfied.
23
Algorithm (cont.)
Is the assumption wrong? No! The assumption of Eq. (16) is still safe, because digitized pseudo-spheres are used as approximations of real spheres in continuous space, and the digitized pseudo-spheres should therefore have the properties of real spheres.
24
Algorithm (cont.)
Only one index I(x) per 3D grid point is necessary to store the K types of dilations, molecular volumes and pockets:
x is a 3D point; I_D(x), I_C(x) and I_P(x) are integers determined by x.
I_D(x): multiscale dilation
I_C(x): multiscale closing, or multiscale molecular volume
I_P(x): multiscale pocket
26
Algorithm (cont.)
R_inaccess: the minimum inaccessible radius, i.e. the minimum radius of the spheres that cannot touch the point x. It serves as a measure of shallowness for probes on the protein surface.
R_pocket: the minimum pocket radius, i.e. the minimum radius of the spheres for which the point x lies within the pocket.
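The R_pocket measure can be illustrated on a toy 2D grid. The sketch below recomputes the pocket for each large probe, which is the naive approach rather than the efficient shell-based algorithm of the paper. The pocket formula (closing by the large probe, minus the protein, opened by the small probe), the integer radii and all names are assumptions for illustration.

```python
def disc(radius):
    """Digitized 2D probe: grid offsets within `radius` of the origin."""
    r = int(radius)
    return {(i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)
            if i * i + j * j <= radius * radius}

def dilate(X, P):
    return {(x + i, y + j) for (x, y) in X for (i, j) in P}

def erode(X, P):
    i0, j0 = next(iter(P))
    candidates = {(x - i0, y - j0) for (x, y) in X}
    return {(x, y) for (x, y) in candidates
            if all((x + i, y + j) in X for (i, j) in P)}

def pocket(X, P, S):
    """Space the small probe S can fill but the large probe P cannot reach."""
    closed = erode(dilate(X, P), P)   # closing of X by the large probe
    free = closed - X                 # part of the closing outside the protein
    return dilate(erode(free, S), S)  # opening by the small probe

def r_pocket_map(X, radii, S):
    """For each grid point, the smallest large-probe radius that puts it in a pocket."""
    r_pocket = {}
    for r in sorted(radii):
        for point in pocket(X, disc(r), S):
            r_pocket.setdefault(point, r)
    return r_pocket

# Toy "protein": two walls, three columns wide and nine rows tall, with a cleft between.
X = {(x, y) for x in (0, 1, 2, 6, 7, 8) for y in range(9)}
r_pocket = r_pocket_map(X, radii=[1, 2, 3], S=disc(1))
print(r_pocket[(4, 4)])  # 2: a radius-1 probe still fits in the cleft, a radius-2 probe does not
```

Points that only become pocket points for larger probes sit in wider, shallower regions, which is how the measure separates deep pockets from shallow ones.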
27
Algorithm (cont.)
Eqs. (17)-(19) suggest an efficient algorithm for calculating multiscale dilations, molecular volumes and pockets. To implement it, a shell H_k is defined as the difference between the k-th and (k-1)-th probes as follows:
28
Algorithm (cont.)
A general strategy for an efficient algorithm is to process a shape X using a series of shells, progressing in size from the smallest to the largest shell (H_1, H_2, …, H_K). The algorithm is shown in Figure 4.
In this study, the grid width was set to 0.8 Å, the radius of the probe S was set to 1.87 Å, and 17 different large probes P_k were used, with radii of 2.0, 2.5, 3.0, 3.5, … and 10.0 Å.
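The probe set and the shells can be generated in a few lines. The radii, grid width and small-probe radius come from the slide; the digitization rule (a grid offset belongs to the pseudo-sphere if it lies within the radius of the origin), the choice H_1 = P_1 and all names are assumptions of this sketch.

```python
GRID_WIDTH = 0.8            # Å, from the slide
SMALL_PROBE_RADIUS = 1.87   # Å, from the slide
LARGE_PROBE_RADII = [2.0 + 0.5 * k for k in range(17)]  # 2.0, 2.5, ..., 10.0 Å

def pseudo_sphere(radius, grid_width=GRID_WIDTH):
    """Digitized sphere: integer grid offsets whose distance from the origin is <= radius."""
    n = int(radius / grid_width)
    limit = (radius / grid_width) ** 2
    return {(i, j, k)
            for i in range(-n, n + 1)
            for j in range(-n, n + 1)
            for k in range(-n, n + 1)
            if i * i + j * j + k * k <= limit}

small_probe = pseudo_sphere(SMALL_PROBE_RADIUS)
probes = [pseudo_sphere(r) for r in LARGE_PROBE_RADII]

# Shell H_k: grid offsets in the k-th probe that are not in the (k-1)-th probe.
shells = [probes[0]] + [probes[k] - probes[k - 1] for k in range(1, len(probes))]

print(len(LARGE_PROBE_RADII), len(small_probe))
```

Because the digitized probes are nested, the shells are disjoint and their union is the largest probe, so each grid offset is visited once when the shells are processed in order.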
29
Algorithm (cont.)
Calculation of R_inaccess for ligand atoms
A measure of pocket shallowness for probes or atoms of bound ligands is useful for characterizing binding pockets.
|L| is the number of points in the shape L of the ligand.
Examples:
A: 1/((1/3 + 1/4 + 1/4)/3) = 3.6 Å
B: 1/((1/6 + 1/5 + 1/5)/3) = 5.3 Å
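The two example values are consistent with a harmonic mean of the per-point R_inaccess values over the ligand shape L. A short check, assuming that reading (the function name is hypothetical):

```python
def ligand_r_inaccess(point_values):
    """Harmonic mean of per-point R_inaccess values: |L| / sum(1 / R_inaccess(x))."""
    return len(point_values) / sum(1.0 / v for v in point_values)

ligand_a = ligand_r_inaccess([3, 4, 4])
ligand_b = ligand_r_inaccess([6, 5, 5])
print(round(ligand_a, 1), round(ligand_b, 1))  # 3.6 5.3
```

A harmonic mean is dominated by the smallest values, so a ligand with even a few points in a deep part of a pocket gets a low R_inaccess.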
30
Algorithm (cont.)
Calculation of R_inaccess and pocketness for protein atoms and residues
A measure that characterizes the depth of a protein atom or residue is useful for analyzing the relationship between ligand types and the types of the surrounding protein atoms. To characterize the depth of protein atoms, the concept of an "accessible shell volume" around a part Y of the protein was introduced:
where Y is a part of the protein shape X (Y ⊂ X), and S is a spherical probe.
32
Algorithm (cont.)
The pocketness of a protein atom or residue indicates how much it contributes to binding ligands. Generally speaking, deep and large pockets tend to bind ligands. The pocketness measure reflects both the size and the depth of a pocket:
A residue in a deeper and larger pocket has a larger value of pocketness.
33
Algorithm (cont.)
Clustering grids and filtering out small clusters
Most ligands are bound in the largest pockets. The procedure of clustering pocket grid points and extracting only the large pocket clusters has been widely used by researchers. In this study, using multiscale pocket boundaries requires a threshold value of the R_pocket measure for the boundary between the pocket and the open outer space (shown in the Results section).
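Clustering pocket grid points and dropping small clusters can be sketched with a breadth-first search. The 6-neighbour connectivity, the `min_size` cutoff and the function names are assumptions for illustration; the slides do not state which connectivity or size cutoff was used.

```python
from collections import deque

# Face-sharing neighbours on a 3D grid (6-connectivity, an assumption).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def cluster_points(points):
    """Group grid points into connected clusters, largest first."""
    remaining = set(points)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS:
                n = (x + dx, y + dy, z + dz)
                if n in remaining:
                    remaining.remove(n)
                    cluster.add(n)
                    queue.append(n)
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)

def keep_large(clusters, min_size):
    """Discard clusters with fewer than `min_size` grid points."""
    return [c for c in clusters if len(c) >= min_size]

pocket_points = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 10, 10)}
clusters = cluster_points(pocket_points)
large = keep_large(clusters, min_size=2)
print([len(c) for c in clusters], len(large))  # [3, 1] 1
```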
34
Dataset
Prepared from the SCOP database, version 1.73.
Included protein chains with mutual sequence identities of 40% or less.
Excluded:
  Small proteins with fewer than 40 residues
  Protein chains with domains of class f, h, i, j or k
Total: 7375 chains.
Extracted the chains bound to "proper" small molecules, excluding:
  Tiny molecules
  Unnatural precipitants: BOG, DTT, EPE, GOL, MES, MPD, MRD, PG4 and TRS
  DNA, RNA (>= 3 nucleotides) and proteins (>= 10 amino acids)
  Chains with more than 10,000 heavy atoms
Result: 1817 chains were included, each of which contacted at least one proper small molecule.
Only bound chains were used.
35
Evaluation of binding site predictions using recall-precision plots
For the purpose of comparison, calculated pockets and bound ligands were both represented on grids with 0.8 Å width; each grid point was checked to determine whether it was inside the pockets or the bound ligands.
N_P is the number of grid points in pockets, N_L is the number of grid points overlapping with ligands, and N_PL is the number of grid points in pockets that overlap with ligands.
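With these counts, recall and precision follow from their usual definitions, recall = N_PL / N_L and precision = N_PL / N_P. A minimal sketch with grid points stored as sets (the function name and the toy points are assumptions):

```python
def recall_precision(pocket_points, ligand_points):
    """Recall = N_PL / N_L and precision = N_PL / N_P for two sets of grid points."""
    n_pl = len(pocket_points & ligand_points)
    recall = n_pl / len(ligand_points) if ligand_points else 0.0
    precision = n_pl / len(pocket_points) if pocket_points else 0.0
    return recall, precision

pocket_points = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
ligand_points = {(3, 0, 0), (4, 0, 0)}
recall, precision = recall_precision(pocket_points, ligand_points)
print(recall, precision)  # 0.5 0.25
```

Repeating the calculation while varying a pocket threshold, such as the R_pocket cutoff, gives one (recall, precision) pair per threshold, which is what a recall-precision plot displays.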
36
Results 1dwd
37
Results
40
Useful discoveries
The majority of molecules binding in deep pockets were coenzymes.
In contrast, adenine and guanine mononucleotides tend to bind in medium-to-shallow pockets.
Macromolecules tend to bind in shallow pockets or protruding regions.
41
Useful discoveries
In the typical binding pose of the HEM molecule in the dataset, the aromatic atoms CBB and CMC face the protein, whereas the carboxyl atoms O1A and O2A face the water.
In the ADP molecule, the atom N6 in the adenine ring and the atoms O1B, O2B and O3B of the phosphate group favored deep pockets, while the sugar atoms, such as O2' and O3', favored shallow pockets. The N6 side of the adenine ring and the phosphate termini face the protein, while the sugar atoms face the water.
42
Summary:
1. A new definition of pockets using the basic operations of mathematical morphology
2. A proposed efficient algorithm for finding pockets
3. Construction of a useful dataset for algorithm testing
4. A new method for evaluating binding site predictions with precision and recall
5. Some useful discoveries
43
Thanks! Any questions? Please feel free to ask me!