UGM 2006 Miklós Vargyas What's new in JKlustor. Overview An introduction to JKlustor –Brief history of the product –Main features –Usage examples –Performance.

1 UGM 2006 Miklós Vargyas What's new in JKlustor

2 Overview
An introduction to JKlustor
–Brief history of the product
–Main features
–Usage examples
–Performance
LibMCS, an alternative approach to clustering chemical structures
–Concepts, motivation
–Features
–Performance
Future of JKlustor

3 Brief history of JKlustor
First discovery tool in the JChem package
–Jarp released in version 1.5.2 (March 22, 2001)
–Compr 1.5.7 (May 27, 2001)
–Ward 1.5.9 (Jun 25, 2001)
API released in JChem 1.6.2 (May 16, 2002)
Experimental LibMCS first released in JChem 3.0 (Dec 1, 2004)
New JKlustor GUI to be released in JChem 3.?

4 JKlustor features
Similarity based clustering
–ChemAxon's topological fingerprint
–External data points, arbitrary dimension
–Tanimoto, weighted Euclidean
Hierarchical clustering: Ward
–Reciprocal nearest neighbor algorithm
–Kelley method
Non-hierarchical clustering: Jarvis-Patrick
Diversity calculation: Compr
Structure based clustering: LibMCS
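The Tanimoto metric listed above can be illustrated with a minimal sketch. It assumes a binary fingerprint stored as a Python integer (one bit per fingerprint position), which is an illustrative representation and not ChemAxon's fingerprint format. Treating two empty fingerprints as identical is also an assumption.

```python
def tanimoto_similarity(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two bit fingerprints held as Python ints."""
    common = bin(fp_a & fp_b).count("1")   # bits set in both fingerprints
    union = bin(fp_a | fp_b).count("1")    # bits set in at least one
    if union == 0:
        return 1.0                         # assumption: two empty fingerprints are identical
    return common / union


def tanimoto_dissimilarity(fp_a: int, fp_b: int) -> float:
    """Distance form used for clustering: 0.0 means identical bit patterns."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


if __name__ == "__main__":
    # Two toy 4-bit fingerprints sharing two of three set bits.
    print(tanimoto_similarity(0b1011, 0b0011))
```

A dissimilarity threshold on such a distance is the kind of cut-off a neighbor search would apply before clustering.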

5 JKlustor usage
Command line tools
–Pipelining commands
–Option flags
–Structure file/database input
–Manual creation of cluster views
Input SDFile → GenerateMD → NNeib → JarvisPatrick → CreateView → MarvinView → Picture

6 JKlustor usage
Prepare data and run clustering
generatemd c input.sdf -k CF -c cfp.xml -D -o fingerprints.txt
nneib -f 512 -t 0.1 -g -i fingerprints.txt -o neighborlists.txt
jarp -c 0.2 -y -i neighborlists.txt -o clusters.txt
View first cluster
crview -i id -c "clid=1" -s input.sdf -t clusters.txt -o jarp_cluster1.sdf
mview -c 3 -r 3 jarp_cluster1.sdf
View centroids, display cluster id and size
crview -i "centr:2" -c "size>=20" -d "clid:size" -s input.sdf -t clusters.txt -o jarp_centroids.sdf
mview -c 3 -r 3 -f "clid:size" jarp_centroids.sdf
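The clustering step that consumes neighbor lists can be sketched in a few lines. This is the textbook Jarvis-Patrick rule (two items join the same cluster when each appears in the other's neighbor list and they share at least a minimum number of neighbors), and is not necessarily the exact criterion or option semantics of the jarp tool. The function name, the `min_shared` parameter, and the dictionary input are all hypothetical.

```python
def jarvis_patrick(neighbors, min_shared):
    """Cluster items from nearest-neighbor lists (classic Jarvis-Patrick rule).

    neighbors: dict mapping each item id to a list of its nearest neighbors
               (the item itself excluded).
    min_shared: minimum number of common neighbors required to merge two items.
    """
    parent = {i: i for i in neighbors}      # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    neighbor_sets = {i: set(n) for i, n in neighbors.items()}
    for i, near_i in neighbors.items():
        for j in near_i:
            mutual = j in neighbor_sets and i in neighbor_sets[j]
            if mutual and len(neighbor_sets[i] & neighbor_sets[j]) >= min_shared:
                parent[find(i)] = find(j)   # merge the two clusters

    clusters = {}
    for i in neighbors:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    toy = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 2], 4: [3, 2]}
    print(jarvis_patrick(toy, min_shared=1))
```

In the toy input, item 3 lists item 2 as a neighbor but not the other way round, so the two groups stay separate.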

7 JKlustor usage

8 JKlustor performance
Memory: O(n)
Time: Jarvis-Patrick O(n^1.5), Ward O(n^2)

9 What is MCS? The Maximum Common Substructure of two chemical structures
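To make the definition concrete, here is a small brute-force sketch for toy molecules. It assumes atoms are given as element symbols and bonds as index pairs, ignores bond orders, and does not require the common part to be connected. It is an exhaustive search for illustration only, not the LibMCS algorithm, and the function name is hypothetical.

```python
def max_common_subgraph(atoms1, bonds1, atoms2, bonds2):
    """Return the largest atom mapping {index in mol 1: index in mol 2} such that
    mapped atoms have equal symbols and are bonded in one molecule exactly when
    their images are bonded in the other (induced common subgraph)."""
    adj1 = {frozenset(b) for b in bonds1}
    adj2 = {frozenset(b) for b in bonds2}
    best = {}

    def compatible(u, v, mapping):
        if atoms1[u] != atoms2[v]:
            return False
        # Bonds to every already mapped pair must agree in both molecules.
        return all(
            (frozenset((u, a)) in adj1) == (frozenset((v, b)) in adj2)
            for a, b in mapping.items()
        )

    def extend(i, mapping):
        nonlocal best
        # Prune: even mapping all remaining atoms cannot beat the best so far.
        if len(mapping) + (len(atoms1) - i) <= len(best):
            return
        if i == len(atoms1):
            best = dict(mapping)
            return
        used = set(mapping.values())
        for v in range(len(atoms2)):
            if v not in used and compatible(i, v, mapping):
                mapping[i] = v
                extend(i + 1, mapping)
                del mapping[i]
        extend(i + 1, mapping)              # also try leaving atom i unmapped

    extend(0, {})
    return best


if __name__ == "__main__":
    # Ethanol (C-C-O) against 1-propanol (C-C-C-O), heavy atoms only.
    ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
    propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
    print(max_common_subgraph(*ethanol, *propanol))
```

The search is exponential in the number of atoms, so it is only practical for very small inputs.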

10 Clustering by MCS? Find the MCS of a group of structures

11 Very brief history of LibMCS
Reaction automapper, based on Maximum Common Subgraph Search
MCS class API made public
Customer requested MCS based clustering
–More intuitive than similarity based
–Focused set analysis
  screens: 2000 – 10000 structures
  lead optimization: 3000 – 5000 structures
–Should be hierarchical (outliers)
–Ultimate goal: cluster 5000 compounds in 5 seconds

12 LibMCS features
MCS based hierarchical clustering
Flexible search options
Hierarchy browser
Filtering by chemical properties
Cluster statistics
No size limitation
Fast operation

13 LibMCS – Dendrogram view

14 LibMCS – Molecule view

15 LibMCS – Table view

16 LibMCS – Statistics

17 LibMCS – Selections

18 LibMCS – Property filters

19 LibMCS – Output files

20
CCCN1CC(=O)SCC(C)C1=O CC1CSC(=O)CN(C2CCCC2)C1=O 0 21 0
CCCN1CC(=O)SCC(C)C1=O CC1CSC(=O)C2CCCN2C1=O 0 21 0
OC(=O)C1CCCN1C(=O)CCS CC(CS)C(=O)N1CCCC1C(O)=O 0 19 0
OC(=O)C1CCCN1C(=O)CCS [H]C1(CCCN1C(=O)CCS)C(O)=O 0 19 0
OC(=O)C1CCCN1C(=O)CCS OC(=O)C1CCCN1C(=O)C2CCCC2SC(=O)C3=CC=CC=C3 0 19 0
OC(=O)C1CCCN1C(=O)CCS OC(=O)C1CCCN1C(=O)C2CCCCC2S 0 19 0
CCC(=O)N(CC1=CC=CC=C1)C(C)C=O CC1SC(=O)C2(C)CC3=CC=CC=C3CN2C1=O 0 20 0
CCC(=O)N(CC1=CC=CC=C1)C(C)C=O CC1CSC(=O)C2CC3=C(CN2C1=O)C=CC=C3 0 20 0
CC1SC(=O)C2CCCN2C1=O CC1SC(=O)C2CCCN2C1=O 0 30 0
CC1SC(=O)CNC1=O CC1SC(=O)CNC1=O 0 29 0
OC(=O)C1CSCCCCCCCCC(CS)C(=O)N1 OC(=O)C1CSCCCCCCCCC(CS)C(=O)N1 0 31 0
CC(S)C(=O)NCC(O)=O CC(S)C(=O)NCC(O)=O 0 24 0
CCC1=CC=CC=C1 CC(NC(CCC1=CC=CC=C1)C(O)=O)C(=O)N2CCCC2C(O)=O 0 22 0
CCC1=CC=CC=C1 CCOC(=O)C(CC1=CC=CC=C1)NC(=O)NC(CC2=CC=CC=C2)C(=O)OCC 0 22 0
OC(=O)C1CCCN1C(=O)NC2=CC=CC=C2 OC(=O)C1CCCN1C(=O)NC2=CC=CC=C2 0 23 0
C\C(Cl)=N/OC(N)=O C\C(Cl)=N/OC(N)=O 0 27

> 1163 > 1 > 1
$$$$
Marvin 05290619172D
23 24 0 0 0 0 999 V2000
2.4230 -0.3587 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
3.1375 0.0538 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.1375 0.8788 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.4349 -1.1837 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-1.1494 -1.5962 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.8638 -1.1837 0.0000 N 0 0 3 0 0 0 0 0 0 0 0 0

21 LibMCS – RGroup decomposition

22

23 LibMCS – Performance
Depends on
–average structure size
–total diversity
–minimal required MCS size
–atom/bond constraints
Scales linearly
Maximum speed achieved
–1 000 structures in 3 seconds
Memory requirements
–100 000 structures occupy 200MB
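As a rough worked example, the figures above can be turned into per-structure estimates. The sketch assumes that the stated linear scaling holds at the peak rate for other set sizes and that 1 MB is 10^6 bytes. Both are simplifications, and the helper names are hypothetical.

```python
REF_STRUCTURES = 1_000        # from the slide: 1 000 structures ...
REF_SECONDS = 3.0             # ... clustered in 3 seconds at maximum speed
MEM_STRUCTURES = 100_000      # from the slide: 100 000 structures ...
MEM_BYTES = 200 * 10**6       # ... occupy 200 MB (assuming 1 MB = 10^6 bytes)


def estimated_seconds(n_structures: int) -> float:
    """Linear extrapolation of the peak clustering time to n structures."""
    return n_structures * REF_SECONDS / REF_STRUCTURES


def estimated_bytes(n_structures: int) -> float:
    """Linear extrapolation of the memory footprint to n structures."""
    return n_structures * MEM_BYTES / MEM_STRUCTURES


if __name__ == "__main__":
    print(estimated_seconds(5_000))   # time estimate for a 5000-compound set
    print(estimated_bytes(1))         # approximate bytes per structure
```

Under these assumptions a 5000-compound set would take about 15 seconds, compared with the goal of 5 seconds stated on slide 11, and each structure would occupy roughly 2 kB.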

24 LibMCS – Performance

25 LibMCS – Further applications
Find the MCS of existing clusters
Data retrieval
Assay analysis
Compound acquisition
Combinatorial library profiling

26 Development plans
Disconnected MCS
Multi-group clustering
More chemical sense (e.g. avoid opening rings, consider chirality)
Performance tuning (e.g. NN)
Integrate Ward/Jarp into new GUI
Additive clustering
Clustering million compound libraries
Integrate Chemical Terms
Integrate molecular descriptors, optimized metrics

27 Summary
New tool in JKlustor based on MCS
More plausible grouping
Hierarchical with dendrogram browser
Statistics
Filtering, coloring, selection

28 Acknowledgements
Developers
–Ferenc Csizmadia, Árpád Tamási, András Volford, Szilárd Doránt
–Péter Vadász, Nóra Máté
Special thanks

