A Geometric Clustering Algorithm and Its Applications to Structural Data

Guido Muscioni, Lorenzo Semeria, Shuangxi Zhu

Outline
Context introduction
Datasets description
The algorithm
Results

From: “A Geometric Clustering Algorithm and Its Applications to Structural Data”, Shutan Xu, Shuxue Zou, and Lincong Wang

Context
Rapid growth of complex structured data in molecular biology:
Nuclear Magnetic Resonance (NMR) spectroscopy
Protein-ligand docking
Molecular Dynamics (MD) simulations

Nuclear Magnetic Resonance spectroscopy
Image: NMR machine at the École polytechnique fédérale de Lausanne
Based on the magnetic properties of atoms
Determines properties of atoms and molecules
Produces a unique signature for each molecule

Protein-ligand docking
Protein modelling technique
Predicts the structure of a protein-ligand complex

Molecular dynamics simulation
Based on the forces between molecules
Computes and updates positions based on the interactions between molecules
Uses N-body simulation techniques

Datasets
NMR dataset:
Based on SiR5
Two sets of intermediates
Computed on 101 residues
Large number of structures
Protein-ligand dataset:
22 sets of poses (retrieved with the GOLD suite)
500 poses available for each initial protein-ligand complex

Algorithm
Cluster-based algorithm
RMSD: similarity measure between two structures in the same cluster
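Assuming the similarity measure named on this slide is the root-mean-square deviation (RMSD) between atom coordinates, a minimal sketch looks like the following. It assumes the two structures are already superimposed and that atoms are paired by index; the function name `rmsd` is illustrative and not taken from the paper.

```python
import math


def rmsd(a, b):
    """RMSD between two structures given as lists of (x, y, z) tuples.

    Assumption: the structures are already superimposed and atom i of `a`
    corresponds to atom i of `b`. No alignment is performed here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        # Squared Euclidean distance between the paired atoms.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))
```

A smaller value means the two structures are more alike; identical structures give 0.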

Steps (flowchart)
Check for dmax
Seed another cluster
Reclustering based on d
Create two more seeds
Initializing two clusters
Decision: dcc < dmax (yes / no), with branches "cluster the former" and "cluster the latter"
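The flowchart does not spell out how elements are assigned, so the following is only one plausible reading of seed-based clustering with a distance threshold, not the method from the paper. It starts from the two most distant items as seeds, assigns every item to its nearest seed, and seeds another cluster whenever some item is farther than `d_max` from its seed. The names `threshold_seed_clustering`, `dist`, and `d_max` are hypothetical.

```python
def threshold_seed_clustering(items, dist, d_max):
    """Return clusters as lists of item indices.

    Illustrative sketch only: seeds are added until every item lies
    within d_max of its nearest seed.
    """
    if d_max < 0:
        raise ValueError("d_max must be non-negative")
    n = len(items)
    if n == 0:
        return []
    if n == 1:
        return [[0]]
    # Find the two most distant items; they become the initial seeds.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist(items[p[0]], items[p[1]]),
    )
    if dist(items[i0], items[j0]) <= d_max:
        return [list(range(n))]  # everything already fits in one cluster
    seeds = [i0, j0]
    while True:
        # Recluster: assign every item to its nearest seed.
        nearest = [
            min(seeds, key=lambda s: dist(items[k], items[s])) for k in range(n)
        ]
        # Locate the item farthest from its own seed.
        worst = max(range(n), key=lambda k: dist(items[k], items[nearest[k]]))
        if dist(items[worst], items[nearest[worst]]) <= d_max:
            break
        seeds.append(worst)  # seed another cluster and repeat
    return [[k for k in range(n) if nearest[k] == s] for s in seeds]
```

For structural data, `dist` could be an RMSD function over coordinate lists. The all-pairs search for the initial seeds already costs O(n^2) distance evaluations, which is why threshold methods of this kind become expensive on large sets.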

Results
Metrics:
Number of structures Ns
VDW energy
NOE violation

Baselines: geometric clustering vs. complete link, average link, and K-medoid
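For reference, the two hierarchical baselines differ only in how they define the distance between two clusters. The sketch below shows the standard definitions; the function names and the `dist` parameter are illustrative.

```python
def complete_link(cluster_a, cluster_b, dist):
    """Complete link: the largest pairwise distance between the clusters."""
    return max(dist(a, b) for a in cluster_a for b in cluster_b)


def average_link(cluster_a, cluster_b, dist):
    """Average link: the mean of all pairwise distances between the clusters."""
    total = sum(dist(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))
```

Agglomerative clustering repeatedly merges the pair of clusters with the smallest such distance, so complete link tends to favor compact clusters while average link is less sensitive to single outliers.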

Results (NMR): geometric clustering vs. complete link, average link, and K-medoid

Results (protein-ligand): geometric clustering vs. GOLD score
Better identification of clusters
Sometimes fails to recognize the best cluster
Both show problems; the level of confidence in the results is not high

Discussion

Pros & what we liked:
It’s simple (it is a “standard” iterative clustering algorithm)
The results are better than those of other clustering algorithms for these problems
Beyond implementing a new algorithm, the paper also proposes a relatively new scoring function for best-cluster selection

Cons & what we disliked:
The algorithm is not clearly explained (elements are assigned to new clusters with no apparent criteria)
Complexity is high (O(n^2 log n))
Prior knowledge is needed to get a good result
The datasets may not be big enough to be realistic
The datasets may not represent real data
The second part of the results section is not well described

Questions
