The Red Pill. Roger Sayle, Geoff Skillman, Matthew Stahl, Robert Tolbert. OpenEye Scientific Software
Integration: The process of computing an integral; the inverse of differentiation.
Integration: The organization of the psychological or social traits and tendencies of a personality into a harmonious whole.
Data Integration: Merge (data) into a [harmonious] whole; chaining data generation; extensible data storage
OEChem: Programming toolkit; Python/C++ APIs; public API; precise handling of chemistry; multiple models of chemistry (aromaticity, atom types, valence models, query semantics)
Perception: Kekulé form; aromaticity (Daylight, Tripos, Merck, MDL, OpenEye); atom types; topological symmetry; stereochemistry (tetrahedral, cis/trans); partial charges; biomonomer recognition; bond orders from coordinates
Aromaticity Models: [Table: example ring systems marked aromatic (Yes / No / N/A) under the OpenEye, Daylight, MMFF, Tripos, and MDL models]
Chaining Data Generation: [Diagram: Software A, Software B, Data]. Challenging in a heterogeneous software environment; lossless data conversion; feature perception
Extensible Data Storage Source Data
Question: How often do people (mis)use SD files for attaching data to molecules?
Extensible Data Structures
Python:
    atom.SetStringData("Spam", "Eggs")
    atom.GetStringData("Spam")
C++:
    class Foo {};
    Foo foo;
    mol.SetData("VeryNiceData", foo);
    mol.GetData<Foo>("VeryNiceData");
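The idea of attaching arbitrary named data to an atom or molecule can be illustrated without the toolkit itself. The following is a minimal sketch in plain Python, not OEChem: the class name `TaggedData` and its internals are hypothetical, and the method names are borrowed from the slide only to show the pattern of storing values under string tags.

```python
class TaggedData:
    """Hypothetical stand-in for an atom or molecule that carries named data.

    This is NOT the OEChem implementation; it only mimics the call pattern
    shown on the slide using an ordinary dictionary.
    """

    def __init__(self):
        self._data = {}

    def SetStringData(self, tag, value):
        # Restrict this entry point to strings, as the method name suggests.
        if not isinstance(value, str):
            raise TypeError("value must be a string")
        self._data[tag] = value

    def GetStringData(self, tag):
        # Assumption of this sketch: a missing or non-string tag yields "".
        value = self._data.get(tag, "")
        return value if isinstance(value, str) else ""

    def SetData(self, tag, value):
        # Generic variant: any Python object can be stored under a tag.
        self._data[tag] = value

    def GetData(self, tag):
        # Raises KeyError when the tag was never set.
        return self._data[tag]

    def HasData(self, tag):
        return tag in self._data


atom = TaggedData()
atom.SetStringData("Spam", "Eggs")
atom.SetData("VeryNiceData", {"score": 1.5})
```

Because the tags are plain strings, a new program can add its own entries without changing the class or disturbing data written by earlier programs.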
Chemical EXchange (CEX): An interchange language to enable components to communicate. Model similar to Unix pipes and single-purpose commands. A CEX stream contains objects (molecule, message). Extensible named property/value pairs. Each component in the CEX pipeline can read some objects and properties from the input stream and add new ones to the output stream.
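The pipeline model described here can be sketched with Python generators. This is only an illustration of the pattern, not the CEX format or any CEX tool: the object layout (dictionaries with a `type` key) and the component names `add_smiles_length` and `add_size_class` are assumptions made for the example.

```python
def source():
    # A stream of objects, each a set of named property/value pairs.
    yield {"type": "molecule", "smiles": "c1ccccc1"}
    yield {"type": "message", "text": "end of batch"}


def add_smiles_length(stream):
    # Component 1: reads the "smiles" property of molecules and adds a new
    # property; every other object passes through untouched.
    for obj in stream:
        if obj.get("type") == "molecule":
            obj = {**obj, "smiles_length": len(obj["smiles"])}
        yield obj


def add_size_class(stream):
    # Component 2: relies only on the property added upstream and knows
    # nothing else about the objects it sees.
    for obj in stream:
        if "smiles_length" in obj:
            label = "small" if obj["smiles_length"] < 10 else "large"
            obj = {**obj, "size_class": label}
        yield obj


# Chain the components the way single-purpose commands are chained with pipes.
result = list(add_size_class(add_smiles_length(source())))
```

Each component reads only the properties it understands and forwards everything else, so inserting a new stage does not require changes to the existing ones.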
OEBinary V2: Extensible tag/data format; hierarchical; persistent objects (automatic for POD types); dynamic data parsing; efficient storage of conformers; ideal for storage as BLOB; lossless data storage possible; definition publicly available
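To make "extensible tag/data format" and "hierarchical" concrete, here is a minimal sketch of a generic tag-length-value encoding in Python. It is not the OEBinary layout: the byte layout, the type codes (`i`, `d`, `s`, `r`), and the function names `encode` and `decode` are all assumptions chosen for illustration.

```python
import struct


def encode(record):
    """Serialize a dict as: tag length, tag, type code, payload length, payload."""
    out = bytearray()
    for tag, value in record.items():
        if isinstance(value, int):
            kind, payload = b"i", struct.pack("<q", value)
        elif isinstance(value, float):
            kind, payload = b"d", struct.pack("<d", value)
        elif isinstance(value, str):
            kind, payload = b"s", value.encode("utf-8")
        elif isinstance(value, dict):
            # Hierarchy: a nested record is encoded recursively as a payload.
            kind, payload = b"r", encode(value)
        else:
            raise TypeError("unsupported type for tag %r" % tag)
        name = tag.encode("utf-8")
        out += struct.pack("<H", len(name)) + name
        out += kind + struct.pack("<I", len(payload)) + payload
    return bytes(out)


def decode(blob):
    """Parse bytes produced by encode(); entries of unknown type are skipped."""
    record, pos = {}, 0
    while pos < len(blob):
        (name_len,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        tag = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        kind = blob[pos:pos + 1]
        pos += 1
        (size,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        payload = blob[pos:pos + size]
        pos += size  # the stored length lets a reader step over any entry
        if kind == b"i":
            record[tag] = struct.unpack("<q", payload)[0]
        elif kind == b"d":
            record[tag] = struct.unpack("<d", payload)[0]
        elif kind == b"s":
            record[tag] = payload.decode("utf-8")
        elif kind == b"r":
            record[tag] = decode(payload)
    return record


molecule = {"title": "example", "charge": -1, "score": 1.5,
            "conformers": {"count": 3}}
blob = encode(molecule)
restored = decode(blob)
```

Because every entry carries its own length, a reader that does not recognize a type code can skip it and continue, which is what allows new data to be added without breaking older readers. The result is a single byte string, which is the shape of data that fits a database BLOB column.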
Conclusions: I have no idea what 'data integration' really means. OEChem maintains the integrity of chemical data. Extensible persistent data structures likely facilitate data integration. OEChem provides extensible persistent data structures. OEChem likely facilitates data integration.
Acknowledgments: Geoff Skillman, Bob Tolbert, Roger Sayle, AstraZeneca Pharmaceuticals, Vertex Pharmaceuticals