Published by Chase Burns. Modified over 11 years ago.
1
UGM 2006
Miklós Vargyas
What's new in JKlustor
2
Overview
An introduction to JKlustor
– Brief history of the product
– Main features
– Usage examples
– Performance
LibMCS, an alternative approach to clustering chemical structures
– Concepts, motivation
– Features
– Performance
Future of JKlustor
3
Brief history of JKlustor
First discovery tool in the JChem package
– Jarp released in version 1.5.2 (March 22, 2001)
– Compr 1.5.7 (May 27, 2001)
– Ward 1.5.9 (Jun 25, 2001)
API released in JChem 1.6.2 (May 16, 2002)
Experimental LibMCS first released in JChem 3.0 (Dec 1, 2004)
New JKlustor GUI to be released in JChem 3.?
4
JKlustor features
Similarity based clustering
– ChemAxon's topological fingerprint
– External data points, arbitrary dimension
– Tanimoto, weighted Euclidean
Hierarchical clustering: Ward
– Reciprocal nearest neighbor algorithm
– Kelley method
Non-hierarchical clustering: Jarvis-Patrick
Diversity calculation: Compr
Structure based clustering: LibMCS
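The two metrics named on this slide can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions, not JChem code: the function names are hypothetical, fingerprints are assumed to be bit strings held in Python integers, and the slides do not say whether JKlustor normalizes its weighted Euclidean distance.

```python
import math


def tanimoto_similarity(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two bit fingerprints stored as integers."""
    common = bin(fp_a & fp_b).count("1")   # bits set in both fingerprints
    total = bin(fp_a | fp_b).count("1")    # bits set in at least one
    # Convention chosen here (an assumption): two empty fingerprints are identical.
    return common / total if total else 1.0


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance between two equal-length data points."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


fp1 = 0b10110100
fp2 = 0b10010110
print(tanimoto_similarity(fp1, fp2))      # 3 common bits / 5 set bits
print(weighted_euclidean([1.0, 2.0], [4.0, 6.0], [1.0, 1.0]))
```

A dissimilarity for clustering is commonly taken as one minus the Tanimoto coefficient; whether a given threshold flag refers to similarity or dissimilarity is tool specific.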
5
JKlustor usage
Command line tools
– Pipelining commands
– Option flags
– Structure file/database input
– Manual creation of cluster views
Input SDFile → GenerateMD → NNeib → JarvisPatrick → CreateView → MarvinView → Picture
6
JKlustor usage
Prepare data and run clustering:
generatemd c input.sdf -k CF -c cfp.xml -D -o fingerprints.txt
nneib -f 512 -t 0.1 -g -i fingerprints.txt -o neighborlists.txt
jarp -c 0.2 -y -i neighborlists.txt -o clusters.txt
View first cluster:
crview -i id -c "clid=1" -s input.sdf -t clusters.txt -o jarp_cluster1.sdf
mview -c 3 -r 3 jarp_cluster1.sdf
View centroids, display cluster id and size:
crview -i "centr:2" -c "size>=20" -d "clid:size" -s input.sdf -t clusters.txt -o jarp_centroids.sdf
mview -c 3 -r 3 -f "clid:size" jarp_centroids.sdf
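The clustering step of this pipeline works on nearest-neighbor lists. As a rough illustration of the idea, here is a sketch of the textbook Jarvis-Patrick rule: two items join the same cluster when each appears in the other's neighbor list and they share enough neighbors. It is not the jarp implementation, and the parameters `k` and `k_min` are hypothetical names that are not mapped to any of the command line flags shown on the slide.

```python
def nearest_neighbors(dist, k):
    """For each item, the indices of its k nearest other items."""
    n = len(dist)
    return [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]


def jarvis_patrick(dist, k, k_min):
    """Cluster items from a full distance matrix with the textbook rule."""
    n = len(dist)
    neighbors = [set(nb) for nb in nearest_neighbors(dist, k)]
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbors[i] and i in neighbors[j]
            shared = len(neighbors[i] & neighbors[j])
            if mutual and shared >= k_min:
                parent[find(j)] = find(i)  # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data: six points on a line forming two well separated groups.
points = [0, 1, 2, 50, 51, 52]
dist = [[abs(a - b) for b in points] for a in points]
print(jarvis_patrick(dist, k=2, k_min=1))  # → [[0, 1, 2], [3, 4, 5]]
```

The sketch builds the full distance matrix for simplicity, which costs quadratic memory; a production tool that stores only the neighbor lists avoids that.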
7
JKlustor usage
8
JKlustor performance
Memory: O(n)
Time: Jarvis-Patrick O(n^1.5), Ward O(n^2)
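To get a feel for what these exponents mean in practice, the short calculation below shows how much the running time grows when the input becomes ten times larger. It only uses the asymptotic orders quoted on the slide; constant factors and real timings are unknown, so the numbers are growth ratios, not predictions. The helper name `growth_factor` is hypothetical.

```python
def growth_factor(exponent: float, scale: float) -> float:
    """Factor by which an O(n**exponent) cost grows when n is multiplied by scale."""
    return scale ** exponent


for name, exponent in [("Jarvis-Patrick", 1.5), ("Ward", 2.0)]:
    factor = growth_factor(exponent, 10)
    print(f"{name}: 10x more structures -> about {factor:.1f}x the time")
```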
9
What is MCS? The Maximum Common Substructure of two chemical structures
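The definition can be made concrete with a small brute-force search. The sketch below is not LibMCS and does not describe its algorithm, which the slides do not detail. It assumes molecules are given as lists of element symbols plus bonds between atom indices, ignores bond orders and hydrogens, and looks for the largest connected set of atoms that can be matched so that bonded and non-bonded pairs agree. The search is exponential and only suitable for toy inputs.

```python
def adjacency(n_atoms, bonds):
    """Neighbor sets for an undirected molecular graph."""
    adj = [set() for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def max_common_substructure(atoms1, bonds1, atoms2, bonds2):
    """Largest connected atom mapping {index in 1: index in 2} found exhaustively."""
    adj1 = adjacency(len(atoms1), bonds1)
    adj2 = adjacency(len(atoms2), bonds2)
    best = {}
    seen = set()  # partial mappings already explored

    def compatible(u, v, mapping):
        # Same element, and bonded/non-bonded status agrees with every mapped pair.
        if atoms1[u] != atoms2[v]:
            return False
        return all((x in adj1[u]) == (y in adj2[v]) for x, y in mapping.items())

    def extend(mapping):
        nonlocal best
        key = frozenset(mapping.items())
        if key in seen:
            return
        seen.add(key)
        if len(mapping) > len(best):
            best = dict(mapping)
        used2 = set(mapping.values())
        # Only grow through neighbors of mapped atoms, so the match stays connected.
        frontier = {u for x in mapping for u in adj1[x] if u not in mapping}
        for u in frontier:
            for v in range(len(atoms2)):
                if v not in used2 and compatible(u, v, mapping):
                    mapping[u] = v
                    extend(mapping)
                    del mapping[u]

    for a in range(len(atoms1)):
        for b in range(len(atoms2)):
            if atoms1[a] == atoms2[b]:
                extend({a: b})
    return best


# Ethanol (C-C-O) against 1-propanol (C-C-C-O), heavy atoms only.
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
print(max_common_substructure(*ethanol, *propanol))  # → {0: 1, 1: 2, 2: 3}
```

Here the whole C-C-O chain of ethanol maps onto the last three heavy atoms of 1-propanol, so the common substructure has three atoms.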
10
Clustering by MCS? Find the MCS of a group of structures
11
Very brief history of LibMCS
Reaction automapper, based on Maximum Common Subgraph Search
MCS class API made public
Customer requested MCS based clustering
– More intuitive than similarity based
– Focused set analysis
  screens: 2000 to 10000 structures
  lead optimization: 3000 to 5000 structures
– Should be hierarchical (outliers)
– Ultimate goal: cluster 5000 compounds in 5 seconds
12
LibMCS features
MCS based hierarchical clustering
Flexible search options
Hierarchy browser
Filtering by chemical properties
Cluster statistics
No size limitation
Fast operation
13
LibMCS – Dendrogram view
14
LibMCS – Molecule view
15
LibMCS – Table view
16
LibMCS – Statistics
17
LibMCS – Selections
18
LibMCS – Property filters
19
LibMCS – Output files
20
CCCN1CC(=O)SCC(C)C1=O CC1CSC(=O)CN(C2CCCC2)C1=O 0 21 0
CCCN1CC(=O)SCC(C)C1=O CC1CSC(=O)C2CCCN2C1=O 0 21 0
OC(=O)C1CCCN1C(=O)CCS CC(CS)C(=O)N1CCCC1C(O)=O 0 19 0
OC(=O)C1CCCN1C(=O)CCS [H]C1(CCCN1C(=O)CCS)C(O)=O 0 19 0
OC(=O)C1CCCN1C(=O)CCS OC(=O)C1CCCN1C(=O)C2CCCC2SC(=O)C3=CC=CC=C3 0 19 0
OC(=O)C1CCCN1C(=O)CCS OC(=O)C1CCCN1C(=O)C2CCCCC2S 0 19 0
CCC(=O)N(CC1=CC=CC=C1)C(C)C=O CC1SC(=O)C2(C)CC3=CC=CC=C3CN2C1=O 0 20 0
CCC(=O)N(CC1=CC=CC=C1)C(C)C=O CC1CSC(=O)C2CC3=C(CN2C1=O)C=CC=C3 0 20 0
CC1SC(=O)C2CCCN2C1=O CC1SC(=O)C2CCCN2C1=O 0 30 0
CC1SC(=O)CNC1=O CC1SC(=O)CNC1=O 0 29 0
OC(=O)C1CSCCCCCCCCC(CS)C(=O)N1 OC(=O)C1CSCCCCCCCCC(CS)C(=O)N1 0 31 0
CC(S)C(=O)NCC(O)=O CC(S)C(=O)NCC(O)=O 0 24 0
CCC1=CC=CC=C1 CC(NC(CCC1=CC=CC=C1)C(O)=O)C(=O)N2CCCC2C(O)=O 0 22 0
CCC1=CC=CC=C1 CCOC(=O)C(CC1=CC=CC=C1)NC(=O)NC(CC2=CC=CC=C2)C(=O)OCC 0 22 0
OC(=O)C1CCCN1C(=O)NC2=CC=CC=C2 OC(=O)C1CCCN1C(=O)NC2=CC=CC=C2 0 23 0
C\C(Cl)=N/OC(N)=O C\C(Cl)=N/OC(N)=O 0 27

> 1163 > 1 > 1
$$$$
Marvin 05290619172D
23 24 0 0 0 0 999 V2000
2.4230 -0.3587 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
3.1375 0.0538 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.1375 0.8788 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.4349 -1.1837 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-1.1494 -1.5962 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.8638 -1.1837 0.0000 N 0 0 3 0 0 0 0 0 0 0 0 0
21
LibMCS – RGroup decomposition
23
LibMCS – Performance
Depends on
– average structure size
– total diversity
– minimal required MCS size
– atom/bond constraints
Scales linearly
Maximum speed achieved
– 1 000 structures in 3 seconds
Memory requirements
– 100 000 structures occupy 200 MB
24
LibMCS – Performance
25
LibMCS – Further applications
Find the MCS of existing clusters
Data retrieval
Assay analysis
Compound acquisition
Combinatorial library profiling
26
Development plans
Disconnected MCS
Multi-group clustering
More chemical sense (e.g. avoid opening rings, consider chirality)
Performance tuning (e.g. NN)
Integrate Ward/Jarp into new GUI
Additive clustering
Clustering million compound libraries
Integrate Chemical Terms
Integrate molecular descriptors, optimized metrics
27
Summary
New tool in JKlustor based on MCS
More plausible grouping
Hierarchical with dendrogram browser
Statistics
Filtering, coloring, selection
28
Acknowledgements
Developers
– Ferenc Csizmadia, Árpád Tamási, András Volford, Szilárd Doránt
– Péter Vadász, Nóra Máté
Special thanks