Searching and ranking similar clusters of polyhedra in inorganic crystal structures Hans-Joachim Klein Institut f. Informatik Christian-Albrechts-Universität.

Presentation transcript:

Searching and ranking similar clusters of polyhedra in inorganic crystal structures. Hans-Joachim Klein, Institut f. Informatik, Christian-Albrechts-Universität Kiel, Germany

Definition: A crystal is an anisotropic homogeneous body consisting of a three-dimensional periodic ordering of atoms, ions, or molecules.
Anisotropic: direction-dependent physical properties.
Homogeneous: same behaviour in parallel directions.
Three-dimensional, periodic: basic units (atoms, ions, molecules), repeating in all directions.


Abstraction by graphs (the simple case). [Figure labels: ligands (O), central atom (Si).]

Abstraction by graphs

Periodicity. [Figure: unit cell (0,0) and its eight neighbouring cells, labelled (-1,-1) to (1,1) along the axes X and Y.]

Part of a layer in semenovite. [Figure: the layer drawn across neighbouring cells labelled (-1,-1) to (1,1).]

Abstraction by graphs. [Figure: cells (-1,-1) to (1,1) along the axes X and Y, with an edge labelled +(-10)-.]

Labeled quotient graph / direction-labeled graph (Chung/Hahn/Klee 1984; Goetzke/Klein 1987). [Figure labels (edge directions): +(-10)-, -(-10)+, +(0-1)-; ≙ (10), (-10).]

Some properties of direction-labeled graphs:
n-colourability is decidable for n ≤ 2 but undecidable for n > 2.
Decomposition into fundamental chains (Liebau method) is NP-complete.
Isomorphism problem?

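Dimensionality
Path direction, where $\delta(e)$ is the direction label of edge $e$: $\delta(v_1, e_1, v_2, \ldots, v_{l-1}, e_{l-1}, v_l) =_{df} \sum_{i=1}^{l-1} \delta(e_i)$
Set of directions of repetition: $DR(G_{dl}) =_{df} \langle \{\delta(c) \mid c \text{ cycle in } G_{dl}\} \rangle \setminus \{(0,0,0)\}$
Dimension of a graph $G_{dl}$: rank (dimension) of $DR(G_{dl})$.
Generating set (figure: paths $w_i$ and $w_j$ from a vertex $v$ to $v_i$ and $v_j$, joined by an edge $e$): $\langle \{\delta(w_i) + \delta(e) - \delta(w_j) \mid e \in E,\ e \text{ from } v_i \text{ to } v_j\} \rangle$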
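The dimension computation can be sketched as follows. This is a minimal illustration, not the author's implementation: the edge format `(tail, head, label)` and the names `rank` and `dimension` are assumptions made here. Each vertex gets an offset along a spanning tree, every edge then contributes the vector "offset of tail + edge label - offset of head", and the dimension is the rank of these vectors.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of integer 3-vectors (exact Gaussian elimination)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(3):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def dimension(vertices, edges):
    """edges: (tail, head, label), label = cell offset of head relative to tail."""
    adj = {v: [] for v in vertices}
    for tail, head, lab in edges:
        adj[tail].append((head, lab))
        adj[head].append((tail, tuple(-x for x in lab)))  # reversed edge
    offset = {}
    for root in vertices:  # one spanning tree per connected component
        if root in offset:
            continue
        offset[root] = (0, 0, 0)
        stack = [root]
        while stack:
            u = stack.pop()
            for w, lab in adj[u]:
                if w not in offset:
                    offset[w] = tuple(a + b for a, b in zip(offset[u], lab))
                    stack.append(w)
    # Tree edges give the zero vector; every other edge closes a cycle.
    gens = [tuple(offset[t][k] + lab[k] - offset[h][k] for k in range(3))
            for t, h, lab in edges]
    return rank(gens)

# Hypothetical examples: a framework, a chain, and a finite cluster.
framework = dimension(["a"], [("a", "a", (1, 0, 0)), ("a", "a", (0, 1, 0)),
                              ("a", "a", (0, 0, 1))])
chain = dimension(["a", "b"], [("a", "b", (0, 0, 0)), ("b", "a", (1, 0, 0))])
cluster = dimension(["a", "b"], [("a", "b", (0, 0, 0))])
```

Exact rational arithmetic is used so that the rank of integer label vectors is not affected by rounding.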

Organic databases: Relatively small number of rigid substructures, mapping of 2D chemical structure onto 3D molecular structure can be used for substructure searches. Example: Cambridge Structural Database (organic and metal-organic compounds); more than 500,000 single-crystal X-ray structures. Search facilities: Structural search by drawing all or part of a molecule, selection by providing chemical, bibliographic, .... information. Problem description

Inorganic databases: Large variety of chemical elements and patterns ⇒ problem of fixing a suitable set of substructures for indexing. Example: Inorganic Crystal Structure Database; more than 140,000 entries. Search facilities: Selection in the categories cell, chemistry, symmetry, crystal chemistry, structure type, bibliography.

Inorganic databases (continued). Example: Pearson's Crystal Data; about 212,500 entries. Search facilities: interatomic distances, phase information, chemical composition, atomic environment (coordination number, atom coordination), ...

Common level of description: coordination polyhedra and their connections. [Figure: part of the tetrahedral network of quartz.]

Vertex and edge sharing of octahedra and tetrahedra in zoisite Problem description

The problem
Given: the structural pattern of a part of a (real or hypothetical) chemical compound.
Find: all compounds in a given set of models with similar structural patterns.

+ Combination of graphical and textual query parameters Cations: Cu Anions: O Bonds: 'strong' Pyramids: inclination > 55° + addition of 'independent' polyhedra doesn't work Problem description

Difficulties
- Definition of coordination polyhedra (linear/quadratic gap, bond valence > 0.02 vu, ... ?).
- Regular convex polyhedra as well as distorted polyhedra.
Examples: sodium chloride (Na, Cl); spodumene (Li, Al, Si, O); .....

Search for substructures should be flexible!
Similar: "The use of the term 'similar', ...., arises from the inherent difficulty in defining a priori limits on the similarity of geometrical configurations or physical/chemical characteristics." (In: Terms that define different degrees of similarity between inorganic structures, Nomenclature Commission, International Union of Crystallography)
Two phases:
1. Determine topologically equivalent substructures.
2. Check possible embeddings for geometrical conformity.

Modelling of polyhedral networks
Problems to be solved: large variety of polyhedra (regular or distorted); three kinds of connections between polyhedra (vertex, edge, or face); infinite periodic structures.
Topological view of convex polyhedra: three-connected planar graphs (theorem of Steinitz).
Equivalence of polyhedra: isomorphic topological views (allowing distortions). [Figure: two polyhedra that are topologically equivalent.]

Clusters of polyhedra: connected units of polyhedra.
Equivalence of clusters? We are dealing with experimental data! Transformation should be possible without breaking connections between polyhedra (but allowing distortions). [Figures: an example of equivalent clusters and an example of clusters that are not equivalent.]

Graph representation. Ordered face representation of a polyhedron, e.g. 1432 154 125 235 345. Edge labels in the two figures: (4,1)(3,2) and (4,2)(3,1).
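As an illustration of how an ordered face representation determines the graph of a polyhedron, the sketch below derives vertices and edges from the face list on this slide and checks Euler's formula V - E + F = 2. The function names and the assumption of single-digit vertex labels are choices made for this example only.

```python
from collections import Counter

def parse_faces(text):
    """'1432 154 ...' -> list of vertex tuples (single-digit labels assumed)."""
    return [tuple(int(ch) for ch in word) for word in text.split()]

def polyhedron_graph(faces):
    """Return (vertices, edge_use): edge_use counts the faces sharing each edge."""
    edge_use = Counter()
    for face in faces:
        # Consecutive vertices of a face, cyclically, form its boundary edges.
        for a, b in zip(face, face[1:] + face[:1]):
            edge_use[frozenset((a, b))] += 1
    vertices = {v for face in faces for v in face}
    return vertices, edge_use

def euler_characteristic(faces):
    vertices, edge_use = polyhedron_graph(faces)
    return len(vertices) - len(edge_use) + len(faces)

pyramid = parse_faces("1432 154 125 235 345")
vertices, edge_use = polyhedron_graph(pyramid)
chi = euler_characteristic(pyramid)
```

For a closed polyhedron every edge should belong to exactly two faces, which gives a cheap sanity check on face lists read from experimental data.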

Sodium chloride. Polyhedra graph over the unit cell (axes x, z).
Ordered face representation of the octahedron: 145 152 123 134 465 256 263 364.
Edge labels in the figure: (5,2)(4,3); (5,3); -(0,0,1)+; (2,3)(5,4); -(100)+; (1,3)(5,6); (4,1)(6,2); (5,1)(6,2); (1,2)(4,6); (6,1); +(0,-1,0)-.
Neighbourhood (figure labels): 2, 5, 3, 4.

Topological search: indexing of polyhedra graphs
Subgraph isomorphism problem: computationally hard. Hence preprocessing of the model structures in the database.
Procedure:
1. Consider paths up to some fixed limit length.
2. Extract the information relevant for topological search.
3. Organize this information as an index.
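The indexing procedure can be illustrated with a deliberately simplified sketch. It assumes finite graphs (periodicity and connection types are ignored), uses only the polyhedron kind as "relevant information", and all names and model data are hypothetical.

```python
def paths_up_to(adj, max_len):
    """Yield all simple paths with at most max_len vertices."""
    def extend(path):
        yield tuple(path)
        if len(path) < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:  # keep paths simple
                    yield from extend(path + [nxt])
    for start in adj:
        yield from extend([start])

def build_index(models, max_len):
    """Map each sequence of polyhedron kinds to the models containing it."""
    index = {}
    for model_id, (kind, adj) in models.items():
        for path in paths_up_to(adj, max_len):
            key = tuple(kind[v] for v in path)
            index.setdefault(key, set()).add(model_id)
    return index

# Hypothetical model structures: vertex -> kind, vertex -> neighbours.
models = {
    "m1": ({0: "oct", 1: "tet"}, {0: [1], 1: [0]}),
    "m2": ({0: "tet", 1: "tet"}, {0: [1], 1: [0]}),
}
index = build_index(models, max_len=2)
```

A query pattern is decomposed into the same kind of paths; only models that appear under every path key of the query remain candidates for the expensive isomorphism test.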

Retrieval of substructures

Substructure marked for search

Search chain coding (topological information). [Figure labels: cis, trans.]

Polyhedron    Immediate neighbours   Connection
octahedron    6                      1
tetrahedron   4                      1, cis
octahedron    6                      1, trans
tetrahedron   4                      -

General information on the chain: max, extendable, no precycle.

Acmite Zoisite Precycle jump

Organization of search chains: ordered prefix tree. [Figure entries: 1 4 3; 104, 1 4 3; 306, (not max, extendable, precycle).]
p-node: polyhedron identifier, kind of connection to successor, or general information on the search chain (in the case of the last polyhedron). Exact match.
l-node: number of immediate neighbours. Matching:
i-node: ordered list of identifiers of model graphs.
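A prefix tree over search chains might look like the sketch below. It is an assumption-laden simplification: only exact matching on p-node keys is modelled, the l-node matching rule is left out, and the class, function names, chain steps, and the use of 104 and 306 as model identifiers are illustrative.

```python
import bisect

class Node:
    def __init__(self):
        self.children = {}  # step -> Node
        self.models = []    # ordered list of model graph identifiers

def insert(root, chain, model_id):
    """Store model_id at the node reached by following the chain."""
    node = root
    for step in chain:
        node = node.children.setdefault(step, Node())
    if model_id not in node.models:
        bisect.insort(node.models, model_id)

def candidates(root, chain):
    """Models having a stored chain that starts with the given chain."""
    node = root
    for step in chain:
        node = node.children.get(step)
        if node is None:
            return []
    found, stack = set(), [node]
    while stack:  # collect identifiers in the whole subtree
        current = stack.pop()
        found.update(current.models)
        stack.extend(current.children.values())
    return sorted(found)

# Steps are (polyhedron kind, connection to successor); values are hypothetical.
root = Node()
insert(root, (("oct", 1), ("tet", 1)), 104)
insert(root, (("oct", 1), ("tet", 1), ("oct", 1)), 306)
short_query = candidates(root, (("oct", 1),))
long_query = candidates(root, (("oct", 1), ("tet", 1), ("oct", 1)))
```

Sharing prefixes keeps the index compact, and a query chain is resolved in time proportional to its length plus the size of the matching subtree.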

Determination of isomorphic substructures
1. Compute search chains for the input structure.
2. Collect candidates by inspecting the prefix tree.
3. Determine an edge covering of the input structure.
4. For every candidate model graph, compute instances for all chains from this covering.
5. Try to find a subset of instances such that the corresponding subgraph is isomorphic to the input structure up to vertex labels.

Topological search: sketch of the search
1. Compute annotated paths for the input structure.
2. Use the index to determine candidate model structures.
3. Try to locate substructures in these model structures having the same path cover as the input structure.
4. Check for permutations of polyhedra vertices to get isomorphic graphs.

Answer complexity: number of isomorphic substructures in a single model structure? Example: 44 symmetrically non-equivalent 15-membered rings in leifite.

Embedding

Permutations 1234 1465 3864 2783 1572 5687 1234 1465 3864 2783 1572 5687 Topological search

Geometric similarity. Two levels:
1. Polyhedra.
2. Relative positioning of polyhedra. [Figure label: hinge.]

To solve: The problem of absolute orientation. Geometric similarity

$CS = \{c_1, \ldots, c_n\}$, $CS' = \{c_1', \ldots, c_n'\}$: sets of the coordinates of the central atoms of isomorphic structures $S$ and $S'$, respectively.
Consider $CS$ and $CS'$ as rigid subsets of $\mathbb{R}^3$. Look for a motion $T$ in the group of proper Euclidean motions solving the following least-squares problem:
$U = \sum_{i=1}^{n} \| c_i' - T(c_i) \|_2^2 \to \min$
Measuring similarity: $\sqrt{U/n}$ (root mean square).
Implementation: closed-form solution using unit quaternions (algorithm of B.K.P. Horn, 1987).
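The RMS measure can be sketched in plain Python along the lines of Horn's quaternion method. This is an illustrative reconstruction, not the code behind the slides: the function name is hypothetical, and the largest eigenvalue of Horn's symmetric 4x4 matrix is found here by shifted power iteration rather than by a closed-form solver. The minimum of U equals the sum of the squared centred coordinates of both sets minus twice that eigenvalue.

```python
import math

def rms_after_alignment(cs, cs_prime, iterations=500):
    """RMS deviation of two matched 3D point lists after optimal rigid motion."""
    n = len(cs)
    def centred(pts):
        m = [sum(p[k] for p in pts) / n for k in range(3)]
        return [[p[k] - m[k] for k in range(3)] for p in pts]
    a, b = centred(cs), centred(cs_prime)  # optimal translation matches centroids
    s = [[sum(x[i] * y[j] for x, y in zip(a, b)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric matrix; its largest eigenvalue maximises sum c_i' . R(c_i).
    N = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    total = sum(v * v for p in a + b for v in p)
    shift = math.sqrt(sum(v * v for row in N for v in row))  # >= spectral radius
    lam = 0.0
    if shift > 0.0:
        q = [1.0, 0.5, 0.25, 0.125]  # arbitrary start vector (assumption)
        for _ in range(iterations):  # power iteration on N + shift * I
            q = [sum(N[i][j] * q[j] for j in range(4)) + shift * q[i]
                 for i in range(4)]
            norm = math.sqrt(sum(v * v for v in q))
            q = [v / norm for v in q]
        lam = sum(q[i] * N[i][j] * q[j] for i in range(4) for j in range(4))
    u = max(0.0, total - 2.0 * lam)  # clamp tiny negative rounding errors
    return math.sqrt(u / n)

points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
# Rotate by 90 degrees about z and translate: a proper Euclidean motion.
moved = [(-y + 1, x + 2, z + 3) for x, y, z in points]
rms_rigid = rms_after_alignment(points, moved)
```

Power iteration converges slowly when the two largest eigenvalues are close (for example, nearly collinear point sets); a production version would use a proper symmetric eigenvalue solver.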

The result
Given: a structural pattern of a part of a chemical compound (real or hypothetical) and a database with structure data (including polyhedra graphs) and index.
Answer: compounds with isomorphic structural patterns and their RMS values.
Example: Jadeite: 0.680581; Zoisite: 0.789256; Spodumene: 0.110646.

Search structure in aminoffite. Results: Epididymite: RMS 0.162449; Paracelsian: RMS 0.485688; Merrihueite: RMS 0.711812.

Future work
Searching: improve the embedding algorithm (permutations, symmetries); allow more than one connected component.
Ranking: include measures of distortion in the description of coordination polyhedra.
General: investigate the realization space of polyhedra graphs (subspace of generalized hinge motions, generators, ...?). [Figure: hinge motion with interpenetration.]