Building Hypotheses and Searching Databases
Two ways of creating an hypothesis. Automatically create a chemical feature-based hypothesis from a set of compounds with respect to a type of activity. Alternatively, build an hypothesis by assembling substructures and chemical functions and specifying the geometric constraints between them.
Components of an Hypothesis. Assemble the components from known data, such as atomic coordinates from X-ray crystallographic data. Express the hypothesis as a collection of particular chemical substructures, a collection of chemical functions (such as hydrogen bond donors and hydrophilic groups), or a combination of substructures and chemical functions.
Chemical Substructures Available. The feature dictionary contains a large library of chemical functional groups, such as primary, secondary, and tertiary amines, hydroxyl, carbonyl, acridyl, acetoamido, 1-beta-glucopyranosyl, and amino acids.
Chemical Functions Available. The chemical functions available include HB ACCEPTOR, HB ACCEPTOR lipid, HB DONOR, HYDROPHOBIC, HYDROPHOBIC aliphatic, HYDROPHOBIC aromatic, NEG CHARGE, NEG IONIZABLE, POS CHARGE, POS IONIZABLE, and RING AROMATIC.
Chemical Functions Available. Distances, angles, and/or torsions between items in an hypothesis, the preferred location of a chemical feature, and a range of elements per atom position may be specified within the hypothesis or within a substructure of the hypothesis. Excluded volumes may also be specified.
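The constraints listed above can be pictured as a small data structure. The following is a minimal sketch in Python, not the modelling package's own representation: the class names, the `satisfies` function, and the example coordinates are all hypothetical, and only distance constraints and excluded volumes are covered (angles and torsions are omitted).

```python
from dataclasses import dataclass, field
from math import dist

Point = tuple[float, float, float]


@dataclass
class DistanceConstraint:
    feature_a: str    # name of the first feature in the hypothesis
    feature_b: str    # name of the second feature
    target: float     # ideal separation, in angstroms
    tolerance: float  # allowed deviation from the target, in angstroms


@dataclass
class ExcludedVolume:
    center: Point
    radius: float     # no atom of a matching compound may lie inside this sphere


@dataclass
class Hypothesis:
    # feature name -> chemical function label, e.g. "HB DONOR"
    features: dict[str, str]
    distances: list[DistanceConstraint] = field(default_factory=list)
    excluded: list[ExcludedVolume] = field(default_factory=list)


def satisfies(hyp: Hypothesis, placed: dict[str, Point], atoms: list[Point]) -> bool:
    """Check one candidate placement of features against the hypothesis.

    placed maps each feature name to its position in a candidate conformer;
    atoms holds the coordinates of every atom in that conformer.
    """
    if set(placed) != set(hyp.features):
        return False  # every feature must be placed exactly once
    for c in hyp.distances:
        separation = dist(placed[c.feature_a], placed[c.feature_b])
        if abs(separation - c.target) > c.tolerance:
            return False
    for volume in hyp.excluded:
        if any(dist(atom, volume.center) < volume.radius for atom in atoms):
            return False
    return True


# Illustrative (made-up) hypothesis: a donor 5.0 +/- 0.5 angstroms from a ring.
hyp = Hypothesis(
    features={"donor": "HB DONOR", "ring": "RING AROMATIC"},
    distances=[DistanceConstraint("donor", "ring", target=5.0, tolerance=0.5)],
    excluded=[ExcludedVolume(center=(2.5, 3.0, 0.0), radius=1.0)],
)
good = {"donor": (0.0, 0.0, 0.0), "ring": (5.2, 0.0, 0.0)}
bad = {"donor": (0.0, 0.0, 0.0), "ring": (7.0, 0.0, 0.0)}
print(satisfies(hyp, good, list(good.values())))
print(satisfies(hyp, bad, list(bad.values())))
```

A real search would also have to enumerate conformers and candidate feature placements; this sketch only checks a single placement.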
Using the hypothesis. Once the hypothesis has been built, databases may be searched with it to find compounds that match the hypothesis.
Building a Substructure Hypothesis and Searching Databases
Hydrogen count set to “anything”
Specifying atom range per atom position
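One way to read the two settings above (an atom position that accepts a range of elements, and a hydrogen count of "anything") is as a per-position matching rule. The sketch below assumes that reading; `AtomPosition` and its fields are hypothetical names, not part of any real package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AtomPosition:
    allowed_elements: frozenset            # element symbols accepted at this position
    hydrogen_count: Optional[int] = None   # None stands for "anything"

    def matches(self, element: str, hydrogens: int) -> bool:
        """Return True if an atom with this element and H count fits the position."""
        if element not in self.allowed_elements:
            return False
        return self.hydrogen_count is None or hydrogens == self.hydrogen_count


# A position that accepts nitrogen or oxygen with any number of hydrogens.
flexible = AtomPosition(frozenset({"N", "O"}))
# A position that accepts only nitrogen carrying exactly one hydrogen.
strict = AtomPosition(frozenset({"N"}), hydrogen_count=1)

print(flexible.matches("O", 0), flexible.matches("C", 3))
print(strict.matches("N", 1), strict.matches("N", 2))
```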
Searching a Database with an Hypothesis. Once the hypothesis has been designed and built, it may be used to search a database for compounds that contain the defined features.
Results of Database search. Default 300 hits.
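The search step can be sketched as a filter over the database that stops once the hit limit is reached. This is an illustration only: it assumes the default of 300 is a cap on the number of hits returned, and the `search` function, the compound dictionaries, and the `matches` predicate are hypothetical stand-ins for the real matching machinery.

```python
from itertools import islice
from typing import Callable, Iterable

DEFAULT_MAX_HITS = 300  # default hit limit quoted above, assumed to be a cap


def search(
    database: Iterable[dict],
    matches: Callable[[dict], bool],
    max_hits: int = DEFAULT_MAX_HITS,
) -> list[dict]:
    """Return up to max_hits compounds for which matches(compound) is true."""
    hits = (compound for compound in database if matches(compound))
    # islice stops reading the database as soon as the cap is reached.
    return list(islice(hits, max_hits))


# Made-up database: every second compound carries an HB DONOR function.
database = (
    {"id": i, "functions": {"HB DONOR"} if i % 2 == 0 else set()}
    for i in range(1000)
)
hits = search(database, lambda c: "HB DONOR" in c["functions"])
print(len(hits), hits[0]["id"], hits[-1]["id"])
```

Because the database is consumed lazily, the search never reads further than it needs to fill the hit list.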
Hit example 1
Compound data
Compound sort. Handling large datasets.
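When the dataset is too large to sort in full, one common approach is to keep only the best few compounds while streaming through the data. The sketch below uses Python's `heapq.nsmallest` for this; the `top_compounds` helper, the `mol_weight` property, and the generated records are hypothetical and are not taken from the software shown.

```python
import heapq
from typing import Callable, Iterable


def top_compounds(compounds: Iterable[dict], key: Callable[[dict], float], n: int = 10) -> list[dict]:
    """Return the n compounds with the smallest key value.

    Only n records are held in memory at a time, so the input can be a
    generator over a very large dataset.
    """
    return heapq.nsmallest(n, compounds, key=key)


# Made-up stream of 10,000 compound records with an invented property.
stream = (
    {"name": f"cmpd-{i}", "mol_weight": 100.0 + (i * 37) % 400}
    for i in range(10_000)
)
lightest = top_compounds(stream, key=lambda c: c["mol_weight"], n=3)
print([c["mol_weight"] for c in lightest])
```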
Compound property report.
Managing Databases: Coping with extremely large numbers. Are scientists going to manage in the new world without an in-depth knowledge of mathematics and statistics? Are we training scientists to cope with tools such as cluster analysis, discriminant analysis, cross-validation techniques, neural networks, and Fourier transforms? Do we need courses in the design, building, and interrogation of databases?
Building a Feature-Based Hypothesis and Searching Databases
Setting distance constraints.
Hybrid hypothesis of chemical functions and fragments.
Using the generic β-adrenergic agonist to search a database.
Databases? Global structural databases, e.g. CCSD. Structure-specific databases. Therapeutic-area-based databases. Multi-conformer databases. Composite databases that encapsulate the information of multi-gigabyte files. QSAR-based databases. Commercially available versus problem-specific databases.