Distributed Ligand and Monomer Object Database
Milorad Tosic, John Westbrook, Helen Berman
Rutgers, The State University of New Jersey, Department of Chemistry

Introduction
Key concepts
– Use a hyper-graph as a data model for the abstract representation of unstructured data
– Examine the hyper-graph data model for subsimilarity search of chemical structures
– Use graph-based database models for data with features that cannot be represented numerically
– Apply metric-based indexing to unstructured data
Key applications
– Bioinformatics
– Multimedia
– Query systems for the World-Wide Web
– Geographic Information Systems (GIS)
Software architecture
– Client-server
– Distributed object database (CORBA)
– Metadata description of software methodology

Application Areas
Problems dealing with data that does not conform to traditional data models, such as the relational or object-oriented model.
Bioinformatics
– Structure subsimilarity search: introducing topology-based search criteria improves search efficiency and gives more accurate results. Instead of comparing structures at the atom-bond level, it is more promising to compare them at different levels of a hierarchy of basic building substructures.
Multimedia
– Features of multimedia data (sound, pictures, movies) cannot be completely represented by numerical values. Usually a set of feature values and the relations between them are needed.
Query systems for the World-Wide Web
– Both the physical data representation and the connection pattern are diverse.
Geographic Information Systems (GIS)
– Both the data and the queries are topology-oriented (similar or exact paths between two points, similar areas, …)

LIMA (LIgand MAtching): Search Engine for Maximal Common Substructure
Search strategy
Back-tracking
– Back-tracking is used as the underlying algorithm for finding the maximal common subgraph
Topology-based comparison criteria
– Topology-based features of chemical structures are employed for efficient structure description
– Topological queries and indexing in a collection of distributed objects can be generalized to other applications
– Heuristics for reducing the average search time postpone the computational explosion for larger structures by searching substructure-by-substructure rather than atom-by-atom
Distributed objects
– Distributed computing is explored for increasing processing speed
– Persistent objects are essential for the robustness of the search engine
[XUJ96], [EST98], [WAN98], [PSV99]
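The back-tracking strategy named above can be illustrated with a minimal Python sketch. It is not LIMA's implementation: it searches atom-by-atom for a maximum common induced subgraph of two small labeled graphs, does not require the result to be connected, and omits the substructure-level heuristics mentioned on the slide. The function name and the dictionary-based graph representation are assumptions made for illustration.

```python
def max_common_subgraph(labels_a, adj_a, labels_b, adj_b):
    """Return the largest node mapping A -> B that preserves labels and adjacency.

    labels_*: dict node -> label (e.g. atom type)
    adj_*:    dict node -> set of neighboring nodes
    """
    nodes_a = list(labels_a)
    best = {}

    def compatible(a, b, mapping):
        # Labels must agree, and a/b must relate to already-mapped pairs identically.
        if labels_a[a] != labels_b[b]:
            return False
        for a2, b2 in mapping.items():
            if (a2 in adj_a[a]) != (b2 in adj_b[b]):
                return False
        return True

    def extend(i, mapping, used_b):
        nonlocal best
        # Bound: even matching every remaining node cannot beat the best so far.
        if len(mapping) + (len(nodes_a) - i) <= len(best):
            return
        if i == len(nodes_a):
            best = dict(mapping)  # strictly larger, guaranteed by the bound above
            return
        a = nodes_a[i]
        for b in labels_b:
            if b not in used_b and compatible(a, b, mapping):
                mapping[a] = b
                used_b.add(b)
                extend(i + 1, mapping, used_b)
                del mapping[a]          # back-track
                used_b.remove(b)
        extend(i + 1, mapping, used_b)  # alternative: leave node a unmatched

    extend(0, {}, set())
    return best


# Hypothetical toy structures: A is the chain C-C-O, B is the chain C-O-N.
labels_a = {"a1": "C", "a2": "C", "a3": "O"}
adj_a = {"a1": {"a2"}, "a2": {"a1", "a3"}, "a3": {"a2"}}
labels_b = {"b1": "C", "b2": "O", "b3": "N"}
adj_b = {"b1": {"b2"}, "b2": {"b1", "b3"}, "b3": {"b2"}}
result = max_common_subgraph(labels_a, adj_a, labels_b, adj_b)
```

The running time of this sketch grows exponentially with structure size, which is the computational explosion the heuristics on the slide aim to postpone.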

LIMA software architecture
LIMA version 1 is a client-server application
– Employs a completely in-memory database model
– High resource impact
LIMA version 2 is a distributed object-oriented application
– CORBA implementation for interoperability
– Consists of CORBA interfaces representing:
  Agents: objects that follow the Composite and Chain of Responsibility design patterns
  – An Agent object stores and searches a number of structures.
  – Agent objects can be created by independent creators.
  – The creator of an agent is freed of any administration responsibilities. Agents check in to the SystemAdministration object, which performs further administration.
  SystemAdministration: an object that follows the Decorator design pattern
  – Keeps track of all agents in the system and provides administrative and user services to outside users.
  Administrator: an application providing a user interface for administration purposes
  Client: an application providing a user interface for the basic user services provided by the system as a whole.
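The check-in relationship between Agents and the SystemAdministration object can be sketched in plain Python. This is an in-process stand-in only: the real system exposes these roles as CORBA interfaces, and every method name below (`check_in`, `register`, `search_all`, and so on) is hypothetical rather than taken from LIMA's IDL.

```python
class Agent:
    """Stores a number of structures and searches them (hypothetical interface)."""

    def __init__(self, name, structures):
        self.name = name
        self.structures = list(structures)

    def search(self, predicate):
        # Return the locally stored structures that satisfy the query predicate.
        return [s for s in self.structures if predicate(s)]

    def check_in(self, admin):
        # The agent registers itself, so its creator has no administrative duties.
        admin.register(self)


class SystemAdministration:
    """Keeps track of all agents and offers services to outside users."""

    def __init__(self):
        self._agents = {}

    def register(self, agent):
        self._agents[agent.name] = agent

    def agent_names(self):
        return sorted(self._agents)

    def search_all(self, predicate):
        # Forward the query to every registered agent and merge the hits.
        hits = []
        for name in sorted(self._agents):
            hits.extend(self._agents[name].search(predicate))
        return hits


# Hypothetical usage: two independently created agents check in, then a client queries.
admin = SystemAdministration()
Agent("agent-1", ["benzene", "pyridine"]).check_in(admin)
Agent("agent-2", ["phenol"]).check_in(admin)
hits = admin.search_all(lambda name: name.startswith("p"))
```

In a distributed deployment each `Agent` would live in its own process and be reached through an object reference; the sketch only shows who registers with whom and where queries are fanned out.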

Details of the LIMA software architecture - a CORBA based implementation

Details of the CORBA reference model
Use case overview:
– An IDL interface is defined and compiled.
– A Servant containing the application code for the defined interface is written.
– The Object is created by linking the code generated from the IDL interface to the Servant, at both the programming-language and the compiler level.
– The Object is connected to the ORB by the Object Adapter.
– The Client is created by linking the code generated from the IDL interface to the user code, at both the programming-language and the compiler level.
– The Client obtains the Object's reference and accesses the Object through that reference as if both were in the same program space.
Object Management Group, The Common Object Request Broker: Architecture and Specification, 2.2 ed., Feb. 1998.

LIMA - topology-based comparison criteria
LIMA is used for experimental evaluation of topology-based comparison criteria
– LIMA has been used for subsimilarity search of the small-molecule components of the macromolecule-ligand complexes in the PDB and NDB.
– LIMA was used to classify and check the topologies of the 2500 ligands and modified residues contained in the PDB, which makes it a very useful tool for experimental evaluation of our assumptions in the early system development phase.
Is there any search speed-up due to the introduction of topology-based comparison criteria?
– Compare search times with and without topology-based criteria.
– The topology criterion used is based on ring number: an atom X matches an atom Y iff they have the same atom type and the number of rings that X belongs to is not greater than the number of rings that Y belongs to.
– To examine how atom types influence the search process, the same set of target structures is searched both with and without hydrogens.
Is there any improvement in the quality of the search results due to the introduction of topology-based comparison criteria?
– Do topology-based comparison criteria improve the substructure similarity measure?
– Compare the resulting structures obtained by searching with and without topology-based criteria.
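The ring-number criterion stated above is simple enough to write down directly. The following Python sketch assumes each atom carries its type and a precomputed count of the rings it belongs to; the `Atom` class, its field names, and the `use_topology` switch are illustrative assumptions, not LIMA's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    element: str     # atom type, e.g. "C", "N", "O"
    ring_count: int  # number of rings the atom belongs to (assumed precomputed)


def atoms_match(x: Atom, y: Atom, use_topology: bool = True) -> bool:
    """X matches Y iff the atom types agree and rings(X) <= rings(Y)."""
    if x.element != y.element:
        return False
    if not use_topology:
        # Baseline comparison: atom type only, as in the run without the criterion.
        return True
    return x.ring_count <= y.ring_count
```

Note that the criterion is asymmetric: a chain carbon (ring count 0) in a query matches a ring carbon in a target, but not the other way round. Switching `use_topology` off gives the baseline used for the timing comparison.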

Experimental results from LIMA
Is there any search speed-up due to the introduction of topology-based comparison criteria? YES
– Search speed-up is evident when topology-based criteria are applied.
– Oscillations in search time indicate further potential for improving speed.
– Exponential complexity remains (both curves show the same growth trend), but introducing topology-based criteria shifts the point of run-time explosion toward much more complex structures.
– The relative improvement is higher when structures without hydrogens are considered.

Experimental results from LIMA
Is there any improvement in the quality of the search results due to the introduction of topology-based comparison criteria? YES
– Topological criteria based on rings successfully decrease the number of resulting structures.
– The probability that expected structures are found in the result set increases.
Figure labels: Target, Matching, Eliminated

Hyper-graph: definitions
Definition: A hyper-graph HG is an ordered two-tuple HG = (C, E), where C is the set of hyper-graphs that are containers of HG, and E is the set of hyper-graphs that are elements of HG: C = { c | c > HG }, E = { e | e < HG }
Definition: An undirected hyper-graph HG is an ordered two-tuple HG = ((C, E), I), where (C, E) is a hyper-graph and I is the set of undirected hyper-graphs that are neighbors of HG. We say that HG is in an undirected connection relation with its neighbors.
Definition: The undirected connection relation is an equivalence relation.
Definition: A directed hyper-graph HG is an ordered three-tuple HG = ((C, E), I, O), where (C, E) is a hyper-graph, I is the set of directed hyper-graphs that are input neighbors of HG, and O is the set of directed hyper-graphs that are output neighbors of HG. We say that HG is in a directed connection relation with its neighbors.
Definition: The directed connection relation is an order relation.
Note: We use the undirected hyper-graph in MCS.
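As a rough illustration of the undirected definition, the tuple ((C, E), I) can be held in a small Python record that stores the identifiers of its containers, elements, and neighbors. The class names, field names, and the `Registry` helper are assumptions for this sketch; the only point is that containment and connection are each recorded on both sides.

```python
from dataclasses import dataclass, field


@dataclass
class HyperGraph:
    id: str
    kind: str                                      # e.g. "GRAPH", "VERTEX", "EDGE"
    containers: set = field(default_factory=set)   # C: ids of containing hyper-graphs
    elements: set = field(default_factory=set)     # E: ids of contained hyper-graphs
    neighbors: set = field(default_factory=set)    # I: ids of undirected neighbors


class Registry:
    """Holds hyper-graphs by id and keeps both sides of each relation in sync."""

    def __init__(self):
        self.nodes = {}

    def add(self, id, kind):
        self.nodes[id] = HyperGraph(id, kind)
        return self.nodes[id]

    def contain(self, container_id, element_id):
        self.nodes[container_id].elements.add(element_id)
        self.nodes[element_id].containers.add(container_id)

    def connect(self, a_id, b_id):
        # The undirected connection is recorded symmetrically.
        self.nodes[a_id].neighbors.add(b_id)
        self.nodes[b_id].neighbors.add(a_id)


# A fragment of a graph: two vertices joined by one edge, all inside G1.
reg = Registry()
reg.add("G1", "GRAPH")
for vid in ("v1", "v2"):
    reg.add(vid, "VERTEX")
reg.add("e12", "EDGE")
for member in ("v1", "v2", "e12"):
    reg.contain("G1", member)
reg.connect("e12", "v1")
reg.connect("e12", "v2")
```

Here an edge is itself a hyper-graph whose neighbors are the vertices it joins, so vertices, edges, and whole graphs share one record type.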

Hyper-graph: example
Figure: graph G1 with vertices v1, v2, v3, v4, v5, v6, v7, v8 and edges e12, e23, e24, e35, e45, e46, e57, e67, e68.
v1: id = v1; type = VERTEX; Container = {G1}; Elements = {}; InElements = {e12};
v2: id = v2; type = VERTEX; Container = {G1}; Elements = {}; InElements = {e12, e23, e24};
G1: id = G1; type = GRAPH; Container = {}; Elements = {v1, …, v8, e12, e23, …, e68}; InElements = {};
...
e12: id = e12; type = EDGE; Container = {G1}; Elements = {}; InElements = {v1, v2};
e23: id = e23; type = EDGE; Container = {G1}; Elements = {}; InElements = {v2, v3};
...

Hyper-graph: example (cont'd)
After simple-loop reduction
Figure: reduced graph G2 with nodes g1, g2, g3, g4 connected by edges e1, e2, e3. Node labels in the figure: g1 (v1, v2, e12); g2 (v2, v3, v4, v5, e23, e24, e35, e45); g3 (v4, v5, v6, v7, e45, e46, e57, e67); g4 (v6, v8, e68).
G2: id = G2; type = GRAPH; Container = {}; Elements = {g1, g2, g3, g4, e1, e2, e3, e4}; InElements = {};
g1: id = g1; type = GRAPH; Container = {G2}; Elements = {v1, v2, e12}; InElements = {e1};
g2: id = g2; type = LOOP; Container = {G2}; Elements = {v2, v3, v4, v5, e23, e24, e35, e45}; InElements = {e1, e2};
e1: id = e1; type = EDGE; Container = {G2}; Elements = {v2}; InElements = {g1, g2};
e2: id = e2; type = EDGE; Container = {G2}; Elements = {v4, v5, e45}; InElements = {g2, g3};
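In the records above, each connecting edge of G2 holds exactly the elements shared by the two groups it joins (e1 holds v2, shared by g1 and g2). The Python sketch below derives such links from the groups' element sets. It assumes the grouping is already known and does not perform the loop detection itself; the element sets for g3 and g4 are read off the figure labels and are therefore an assumption, and the function name is hypothetical.

```python
from itertools import combinations


def shared_element_links(groups):
    """Map each pair of group ids to the elements the two groups have in common."""
    links = {}
    for id_a, id_b in combinations(sorted(groups), 2):
        common = groups[id_a] & groups[id_b]
        if common:
            links[(id_a, id_b)] = common
    return links


groups = {
    "g1": {"v1", "v2", "e12"},
    "g2": {"v2", "v3", "v4", "v5", "e23", "e24", "e35", "e45"},
    # g3 and g4 are inferred from the figure labels, not from explicit records.
    "g3": {"v4", "v5", "v6", "v7", "e45", "e46", "e57", "e67"},
    "g4": {"v6", "v8", "e68"},
}
links = shared_element_links(groups)
```

With these sets, the link between g1 and g2 is {v2} and the link between g2 and g3 is {v4, v5, e45}, which agrees with the Elements fields of e1 and e2 in the records.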

Conclusions
Experimental analysis showed that topological abstraction of chemical structure data (e.g. rings in chemical structures) can improve both search efficiency and the accuracy of search results.
The new hyper-graph model
– is able to efficiently represent the topological features of a chemical structure in a hierarchical way
– is applicable to any application dealing with unstructured data
– can be efficiently implemented by a distributed software architecture
The hyper-graph model and the distributed object technology used here generalize to other application areas that use unstructured data.

Related work
Graph-based data models and query languages
– Peter Buneman, Susan Davidson, Mary Fernandez, and Dan Suciu, Adding structure to unstructured data, In Proceedings of the International Conference on Database Theory, Delphi, Greece, 1997, Springer Verlag.
– Tova Milo and Dan Suciu, Index Structures for Path Expressions, In ICDT'99, LNCS 1540, 1998, Springer Verlag.
– Ralf Hartmut Güting, GraphDB: A data model and query language for graphs in databases, VLDB 1994.
Similarity search in metric instead of vector space
– Paolo Ciaccia, Marco Patella, and Pavel Zezula, M-tree: An efficient access method for similarity search in metric spaces, In Proceedings of the 23rd VLDB Conference, Athens, Greece.
– Monika R. Henzinger, Thomas A. Henzinger, and Peter W. Kopke, Computing simulations on finite and infinite graphs, In Proceedings of the 36th Annual IEEE Symposium on Foundations of Computer Science (FOCS95), October 1995.
– Tolga Bozkaya, Nasser Yazdani, and Meral Özsoyoglu, Matching and indexing sequences of different lengths, Proc. ACM CIKM, Sixth International Conference on Information and Knowledge Management, Las Vegas, Nevada, Nov

References
[DOW96] Downs, G.M., and Willett, P. (1995), Similarity searching in databases of chemical structures. Rev. Comput. Chem., 7.
[GWW96] Gillet, V.J., Wild, D.J., Willett, P., and Bradshaw, J. (1998), Similarity and dissimilarity methods for processing chemical structure databases. The Computer Journal, 41, No. 8.
[HAG92] Hagadone, T.R. (1992), Molecule substructure similarity searching: Efficient retrieval in two-dimensional structure databases. J. Chem. Inf. Comput. Sci., 32.
[WAN98] Wang, T., and Zhou, J. (1998), 3DFS: A new 3D flexible searching system for use in drug design. J. Chem. Inf. Comput. Sci., 38.
[XUJ96] Xu, J. (1996), GMA: A generic match algorithm for structural homomorphism, isomorphism, and maximal common substructure match and its applications. J. Chem. Inf. Comput. Sci., 36.
[PSV99] Papadimitriou, C.H., Suciu, D., and Vianu, V. (1999), Topological queries in spatial databases. Journal of Comput. and Sys. Sci., 58.
[ART92] Artymiuk, J., et al. (1992), Similarity searching of three-dimensional molecules and macromolecules. J. Chem. Inf. Comput. Sci., 32.
[BAR93] Barnard, J.M. (1993), Substructure searching methods: Old and new. J. Chem. Inf. Comput. Sci., 33.
[EST98] Estrada, E. (1998), Spectral moments of the edge adjacency matrix in molecular graphs. J. Chem. Inf. Comput. Sci., 38.