Download presentation
Presentation is loading. Please wait.
Published byWillis Foster Modified over 9 years ago
2
Distributed Ligand and Monomer Object Database Milorad To s ic, John Westbrook, Helen Berman Rutgers, The State University of New Jersey Department of Chemistry
3
Introduction Key concepts –Use hyper-graph as a data model for abstract representation of unstructured data –Examine hyper-graph data model for subsimilarity search of chemical structures –Use graph-based database models for data with features that can not be numerically represented –Apply metric-based indexing for unstructured data Key applications –Bioinformatics, –Multimedia, –Query systems for World-Wide Web, –Geographic Information Systems (GIS) Software architecture –Client-Server –Distributed object database (CORBA) –Metadata description of software methodology
4
Application Areas Problems dealing with data that does not conform to traditional data models, like traditional relational or object oriented model. Bioinformatics –Structure subsimilarity search Introduction of topology based searching criteria improves efficiency of searching as well as gives more accurate results. Instead of comparing structures on the atom-bond level, it is much more promising to make comparison on different levels of basic building substructures hierarchy. Multimedia –Features of multimedia data (sound, picture, movie) can not be completely represented by numerical values. Usually, set of feature values and relation between them appears as necessary. Query systems for World-Wide Web –Physical data representation is diverse as well as connection pattern. Geographic Information Systems (GIS) –Nature of both data and query is topology-oriented (similar or exact paths between two points, similar areas, …)
5
LIMA (LIgand MAtching) Search Engine for Maximal Common Substructure Search Strategy Back-tracking –Back-tracking is used as an background algorithm for finding maximal common subgraph Topology-based comparison criteria –Topology-based features of chemical structures are employed for structure efficient description –Topological queries and indexing in collection of distributed objects can be generalized to other applications –Heuristics for reducing average searching time postpone the computational explosion for the larger structures based on a substructure-by-substructure rather than atom-by-atom search Distributed objects –Distributed computing is explored for increasing processing speed –Persistent objects are essential for robustness of the searching engine [XUJ96], [EST98], [WAN98] [PSV99]
6
LIMA software architecture LIMA version 1 is a client-server application –Employs a completely in-memory database model –high resource impact LIMA version 2 is a distributed object-oriented application –CORBA implementation for interoperability –Consists of CORBA interfaces representing: Agents - objects that obey Composite and Chain of responsibility design patterns –Agent object stores and searches a number of structures. –Agent objects can be created by independent creators. –Creator of agent is freed of any administration responsibilities. Agents check in to the SystemAdministration object which performs further administration. SystemAdministration - object that obeys Decorator design pattern –Keeps track of all agents in the system, and provides administrative and user services to outside users. Administrator - application providing user interface for administration purposes Client - application providing user interface for basic user services provided by the system as a whole.
7
Details of the LIMA software architecture - a CORBA based implementation
8
Details of the CORBA reference model Use case overview: IDL interface is defined and compiled Servant containing application code for defined interface is made. Object is created by linking code generated from IDL interface to Servant on both programming language and compiler level Object is connected to ORB by Object Adapter Client is created by linking code generated from IDL interface to user code on both programming language and compiler level Client gets generated Object’s reference and accesses the Object by reference as they are in the same program space. Object Management Group, The Common Object Request Broker: Architecture and Specification, 2.2 ed., Feb.1998.
9
LIMA - topology-based comparison criteria LIMA is used for experimental evaluation of topology-based comparison criteria –LIMA has been used for subsimilarity search of the small molecule components of the macromolecular-ligand complexes in the PDB and NDB. –LIMA was used to classify and check the topologies of the 2500 ligands and modified residues contained in the PDB that makes it very useful tool for experimental evaluation of our assumptions in the early system development phase. Is there any searching speed-up due to introduction of topology-based comparison criteria ? –Compare searching time with and without topology-based criteria. –The topology criterion based on ring number is used: An atom X matches atom Y iff they have the same atom types and number of rings that X belongs to is not greater than that Y belongs to.. –In order to examine how atom types influence searching process, the same set of target structures is applied including as well as excluding hydrogens. Is there any improvement in quality of the searching results due to introduction of topology-based comparison criteria ? –Does topology-based comparison criteria improve substructure similarity measure? –Compare resulting structures obtained by searching with and without topology-based criteria.
10
Experimental results from LIMA Is there any searching speed-up due to introduction of topology-based comparison criteria ? - YES Searching speed-up is evident if topology-based criteria are applied. Oscillations in searching time indicate further potential for improving speed. Exponential complexity remains (both curves have the same growing tendency), but by introducing topology-based criteria point of the run-time explosion is translated into the area of much more complex structures. Relative improvement is higher for the case where structures without hydrogens are considered.
11
Eliminated Experimental results from LIMA Is there any improvement in quality of the searching results due to introduction of topology-based comparison criteria ? - YES Topological criteria based on rings successfully decreases number of resulting structures. Increased probability for expected structures to be found in the set of resulting structures. Target Matching
12
Hyper-graph: definitions Definition: A hyper-graph HG is an ordered two-tuple HG = (C,E), where C is set of hyper-graphs that are containers of HG, and E is a set of hyper-graphs that are elements of HG : C = { c | c > HG }, E = { e | e < HG } Definition: An undirected hyper-graph HG is an ordered two-tuple HG = ((C, E), I), where ( C,E) is hyper-graph, and I is set of undirected hyper-graphs that are neighbors of the HG. We say that HG is in undirected connection relation with its neighbors. Definition: The undirected connection relation is an equivalence relation. Definition: An directed hyper-graph HG is an ordered three-tuple HG = ((C, E), I, O), where ( C,E) is hyper-graph, I is set of directed hyper-graphs that are input neighbors of the HG, and O is set of directed hyper-graphs that are output neighbors of the HG. We say that HG is in directed connection relation with its neighbors. Definition: The directed connection relation is an order relation. Note: We use the undirected hyper-graph in MCS.
13
Hyper-graph: example v1 v5 v7 v8 v6 v4 v2 v3 e23 e12 e45 e24 e35 e57 e46 e67 e68 v1: id = v1; type = VERTEX; Container = {G1}; Elements = {}; InElements = {e12}; v2: id = v2; type = VERTEX; Container = {G1}; Elements = {}; InElements = {e12, e23, e24}; G1: id = G1; type = GRAPH; Container = {}; Elements = {v1, …, v8, e12, e23, …,e68}; InElements = {};... e12: id = e12; type = EDGE; Container = {G1}; Elements = {}; InElements = {v1,v2}; e23: id = e23; type = EDGE; Container = {G1}; Elements = {}; InElements = {v2, v3};...
14
Hyper-graph: example (con’t) After simple-loop reduction v5 v7 v6 v4 e45 e57 e46 e67 G2: id = G2; type = GRAPH; Container = {}; Elements = {g1,g2,g3,g4, e1,e2,e3,e4}; InElements = {}; v1 v2 e12v5 v4 v2 v3 e23 e45 e24 e35 v8 v6 e68 g1g2g3g4 e1e2e3 g1: id = g1; type = GRAPH; Container = {G2}; Elements = {v1,v2,e12}; InElements = {e1}; g2: id = g2; type = LOOP; Container = {G2}; Elements = {v2,v3,v4,v5,e23,e24,e35,e45}; InElements = {e1, e2}; e1: id = e1; type = EDGE; Container = {G2}; Elements = {v2}; InElements = {g1,g2}; e2: id = e2; type = EDGE; Container = {G2}; Elements = {v4,v5,e45}; InElements = {g2, g3};
15
Conclusions Experimental analysis proved that topological abstraction of chemical structure data (e.g. rings in chemical structures) can improve both searching efficiency and accuracy of searching results. New hyper-graph model –is able to efficiently represent topology features of a chemical structure, in a hierarchical way. –is applicable to any application dealing with unstructured data –can be efficiently implemented by a distributed software architecture Hyper-graph model and distributed object technology used here generalize to other applications areas which use unstructured data
16
Related work Graph-based data models and query languages –Peter Buneman, Susan Davidson, Mary Fernandez, and Dan Suciu, Adding structure to unstructured data. In Proceedings of the International Conference on Database Theory, pages 3360350, Delphi, Greece, 1997, Springer Verlag. –Tova Milo, and Dan Suciu, Index Structures for Path Expressions, In ICDT’99, LNCS 1540, pp.277-295,1998, Springer Verlag –Ralf Hartmut G üting, GraphDB: A data model and query language for graphs in databases, VLDB 1994, pp.297-308. Similarity search in metric instead of vector space –Paolo Ciaccia, Marco Patella, Pavel Zezula, M-tree: An efficient access method for similarity search in metric spaces, In Proceedings of the 23rd VLDB Conference Athens, Greece, 1997. –Monika R. Henzinger, Thomas A. Henzinger, and Peter W. Kopke, Computing simulations on finite and infinite graphs, In Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science (FOCS95), October 1995, pp.453-462. –Tolga Bozkaya, Nasser Yazdani, and Meral Özsoyoglu, Matching and indexing sequences of different lengths, Proc. 1997 ACM CIKM, Sixth International Conference on Information and Knowledge Management, Las Vegas, Nevada, Nov. 1997.
17
References [DOW96]Downs, G.M., and Willett, P. (1995), Similarity searching in databases of chemical structures., Rev. Comput. Chem., 7, 1-66. [GWW96]Gillet, V.J., Wild, D.J., Willet, P., and Bradshaw, J. (1998), Similarity and dissimilarity methods for processing chemical structure databases., The Computer Journal, 41, No. 8, 547- 558. [HAG92]Hagadone, T.R., (1992), Molecule substructure similarity searching: Efficient retrival in two- dimensional structure databases., J. Chem. Inf. Comput. Sci., 32, 515-521. [WAN98]Wang, T., and Zhou, J., (1998), 3DFS: A new 3D flexible searching system for use in drug design., J. Chem. Inf. Comput. Sci., 38, 71-77. [XUJ96]Xu, J., (1996), GMA: A generic match algorithm for structural homomorphism, isomorphism, and maximal common substructure match and its applications., J. Chem. Inf. Comput. Sci., 36, 25-34. [PSV99]Papadimitriou, C.H., Suciu, D., and Vianu, V., (1999), Topological queries in spatial databases., Journal of Comput. and Sys. Sci., 58, 29-53. [ART92]Artymiuk, J., et. all., (1992), Similarity searching of three-dimensional molecules and macromolecules., J. Chem. Inf. Comput. Sci., 32, 617-630. [BAR93]Barnard, J.M., (1993), Substructure searching methods: Old and New., J. Chem. Inf. Comput. Sci., 33, 532-538. [EST98]Estrada, E., (1998), Spectral moments of the edge adjacency matrix in molecular graphs., J. Chem. Inf. Comput. Sci., 38, 23-27.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.