1
A Geometric Clustering Algorithm and Its Applications to Structural Data
Guido Muscioni, Lorenzo Semeria, Shuangxi Zhu
2
Outline
Context introduction
Dataset description
The algorithm
Results
From: "A Geometric Clustering Algorithm and Its Applications to Structural Data", Shutan Xu, Shuxue Zou, and Lincong Wang
3
Context
Incredible growth of complex structured data in molecular biology
Nuclear Magnetic Resonance (NMR) spectroscopy
Protein-ligand docking
Molecular Dynamics (MD) simulations
4
Nuclear Magnetic Resonance spectroscopy
NMR machine at the École polytechnique fédérale de Lausanne
Based on the magnetic properties of atomic nuclei
Determines properties of:
  Atoms
  Molecules
Produces a unique signature for each molecule
5
Protein-ligand docking
Protein modelling technique
Predicts the structure of a protein bound to a ligand
6
Molecular dynamics simulation
Based on the forces between molecules
Computes and updates positions based on the interactions between molecules
Uses N-body simulation techniques
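The "compute and update the position based on the interactions" loop on this slide can be illustrated with one velocity Verlet step, a common integrator for N-body simulations. This is only a sketch: the slides do not name an integrator or a force field, so `pair_forces` (a harmonic pair force with spring constant `k` and rest length `r0`), `velocity_verlet_step`, and the unit-less quantities are all assumptions made for illustration.

```python
import math


def pair_forces(positions, k=1.0, r0=1.0):
    """Hypothetical harmonic pair force: every pair is pulled toward distance r0."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            delta = [positions[j][a] - positions[i][a] for a in range(3)]
            dist = math.sqrt(sum(d * d for d in delta))
            if dist == 0.0:
                continue  # coincident particles: direction undefined, skip the pair
            magnitude = k * (dist - r0)  # > 0 pulls together, < 0 pushes apart
            for a in range(3):
                f = magnitude * delta[a] / dist
                forces[i][a] += f  # force on i points toward j when stretched
                forces[j][a] -= f  # equal and opposite force on j
    return forces


def velocity_verlet_step(positions, velocities, masses, dt, force_fn=pair_forces):
    """Advance all particles by one time step dt; returns (positions, velocities)."""
    n = len(positions)
    forces = force_fn(positions)
    # Half-step velocity update using the current forces.
    half_v = [[velocities[i][a] + 0.5 * dt * forces[i][a] / masses[i]
               for a in range(3)] for i in range(n)]
    # Full-step position update using the half-step velocities.
    new_pos = [[positions[i][a] + dt * half_v[i][a]
                for a in range(3)] for i in range(n)]
    # Second half-step velocity update using forces at the new positions.
    new_forces = force_fn(new_pos)
    new_v = [[half_v[i][a] + 0.5 * dt * new_forces[i][a] / masses[i]
              for a in range(3)] for i in range(n)]
    return new_pos, new_v
```

Calling `velocity_verlet_step` repeatedly yields a trajectory; each saved frame is one structure of the kind a clustering algorithm would later group.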
7
Datasets
NMR dataset
  Based on SiR5
  Two sets of intermediates:
    Computed on 101 residues
    Large amount of them
Protein-ligand dataset
  22 sets of poses (retrieved by GOLD suite)
  500 poses available for each initial protein-ligand complex
8
Algorithm
Cluster-based algorithm
RMSD: similarity measure between two structures in the same cluster.
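Assuming the similarity measure named on this slide is the root-mean-square deviation (RMSD), it can be computed as below. The sketch assumes the two structures list the same atoms in the same order and have already been superimposed; the function name `rmsd` and the coordinate-tuple representation are illustrative choices, not taken from the paper.

```python
import math


def rmsd(structure_a, structure_b):
    """RMSD between two structures given as equal-length lists of (x, y, z) atoms.

    Assumes the structures are already superimposed; no alignment is performed.
    """
    if len(structure_a) != len(structure_b) or not structure_a:
        raise ValueError("structures must be non-empty and have the same number of atoms")
    squared = sum(
        (atom_a[k] - atom_b[k]) ** 2
        for atom_a, atom_b in zip(structure_a, structure_b)
        for k in range(3)
    )
    return math.sqrt(squared / len(structure_a))
```

A smaller RMSD means the two structures are more alike; identical structures give 0.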
9
Steps (flowchart, box labels as extracted)
Initialize two clusters
Check against dmax: is dcc < dmax? (yes / no)
Cluster the former
Cluster the latter
Create two more seeds
Seed another cluster
Recluster based on d
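The flowchart labels on this slide (seeds, two clusters, a check against dmax) suggest a seed-and-split loop. The sketch below is not the authors' algorithm: it is a simplified divisive clustering that uses the two farthest members of a cluster as seeds and keeps splitting until every pairwise distance inside each cluster is below `dmax`. The function name `cluster_with_dmax`, the default Euclidean distance, and the nearest-seed assignment rule are all assumptions.

```python
import math
from itertools import combinations


def cluster_with_dmax(points, dmax, dist=math.dist):
    """Split points into clusters whose pairwise distances are all below dmax.

    Returns a sorted list of clusters, each a list of indices into `points`.
    """
    if dmax <= 0:
        raise ValueError("dmax must be positive")
    if not points:
        return []
    pending = [list(range(len(points)))]  # clusters still to be checked
    done = []
    while pending:
        cluster = pending.pop()
        if len(cluster) < 2:
            done.append(cluster)
            continue
        # The farthest pair defines the cluster diameter and the two seeds.
        diameter, seed_a, seed_b = max(
            ((dist(points[i], points[j]), i, j) for i, j in combinations(cluster, 2)),
            key=lambda t: t[0],
        )
        if diameter < dmax:
            done.append(cluster)  # every pair is already within dmax
            continue
        # Assign each member to the nearer seed (ties go to seed_a).
        former = [i for i in cluster
                  if dist(points[i], points[seed_a]) <= dist(points[i], points[seed_b])]
        former_set = set(former)
        latter = [i for i in cluster if i not in former_set]
        pending.append(former)
        pending.append(latter)
    return sorted(done)
```

Each split puts one seed in each half, so both halves are non-empty and strictly smaller, which guarantees termination. Finding the farthest pair is quadratic in the cluster size, so this sketch is meant for illustration rather than large datasets.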
10
Results
Metrics:
Number of structures Ns
van der Waals (VDW) energy
Nuclear Overhauser effect (NOE) violation
11
Baseline
Geometric clustering vs. complete link, average link, K-medoid
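For readers unfamiliar with the baselines on this slide, the sketch below shows the standard definitions they rest on: complete link merges clusters by their largest cross-cluster distance, average link by the mean cross-cluster distance, and K-medoid represents each cluster by its medoid (the member with the smallest total distance to the others). The function names and the default Euclidean distance are illustrative assumptions; the presentation does not say which implementations were used.

```python
import math
from itertools import product


def complete_link(cluster_a, cluster_b, dist=math.dist):
    """Complete-link distance: the largest distance between any cross-cluster pair."""
    return max(dist(a, b) for a, b in product(cluster_a, cluster_b))


def average_link(cluster_a, cluster_b, dist=math.dist):
    """Average-link distance: the mean distance over all cross-cluster pairs."""
    total = sum(dist(a, b) for a, b in product(cluster_a, cluster_b))
    return total / (len(cluster_a) * len(cluster_b))


def medoid(cluster, dist=math.dist):
    """Medoid: the member with the smallest summed distance to all members."""
    return min(cluster, key=lambda c: sum(dist(c, other) for other in cluster))
```

For structural data, `dist` would typically be replaced by a structure-to-structure measure instead of the Euclidean distance between points.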
12
Results: NMR
Geometric clustering vs. complete link, average link, K-medoid
13
Results: protein-ligand
Geometric clustering vs. GOLD score
Better identification of clusters
Sometimes fails in recognizing the best cluster
Both show problems; the level of confidence of the results is not high
14
Discussion
Pros & What we liked
It is simple (it is a "standard" iterative clustering algorithm)
The results are better than those of other clustering algorithms for these problems
It not only implements a new algorithm but also proposes a relatively new scoring function for best-cluster selection
Cons & What we disliked
The algorithm is not clearly explained (elements are assigned to new clusters with no apparent criteria)
Complexity is high (O(n^2 log n))
Prior knowledge is needed to get a good result
The datasets may not be big enough to be realistic
The datasets may not represent real data
The second part of the results section is not well described
15
Questions