Use of Machine Learning in Chemoinformatics

Irene Kouskoumvekaki, Associate Professor, February 15th, 2013

Major Aspects of Chemoinformatics
• Databases: development of databases for the storage and retrieval of small-molecule structures and their properties.
• Machine learning: training of decision trees, neural networks, self-organizing maps, etc. on molecular data.
• Predictions: molecular properties relevant to drugs, virtual screening of chemical libraries, systems chemical biology networks…

Machine Learning
An attempt to teach the computer to draw conclusions based on previous experience.

Akinator, the Web Genius
Akinator the Genius can read your mind and tell you who you are thinking of after asking you a few questions.

Machine learning classifiers

Clustering: Self-Organizing Maps
Distinguishing molecules with different biological activities and finding new lead structures
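A self-organizing map places molecules on a small grid of nodes so that molecules with similar descriptors end up on the same or neighbouring nodes. The sketch below is a minimal 1-D SOM in plain Python, assuming a chain of 4 nodes (studies usually use a larger 2-D grid), a linearly decaying learning rate and radius, a Gaussian neighbourhood, and two made-up clusters of 2-D descriptor vectors. The names `train_som` and `best_matching_unit` are hypothetical, not from any library.

```python
import math


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som(data, n_nodes=4, epochs=30, lr0=0.5, radius0=1.0):
    """Train a 1-D self-organizing map (n_nodes >= 2) on equal-length descriptor vectors."""
    dim = len(data[0])
    lo = [min(x[k] for x in data) for k in range(dim)]
    hi = [max(x[k] for x in data) for k in range(dim)]
    # Deterministic start: nodes evenly spaced from the per-feature minimum to maximum
    weights = [
        [lo[k] + (hi[k] - lo[k]) * j / (n_nodes - 1) for k in range(dim)]
        for j in range(n_nodes)
    ]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink linearly during training
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = radius0 * frac
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood measured along the 1-D chain of nodes
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Toy 2-D descriptor vectors for two hypothetical activity classes
class_a = [[0.0, 0.2], [0.3, 0.1], [0.1, 0.4]]
class_b = [[9.8, 9.9], [9.5, 10.0], [10.0, 9.6]]
som = train_som(class_a + class_b)
print("class A nodes:", [best_matching_unit(som, x) for x in class_a])
print("class B nodes:", [best_matching_unit(som, x) for x in class_b])
```

After training, the two toy classes are mapped to different nodes, which is the property used to separate molecules with different activities on the map.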

Machine Learning

[Diagram: molecular structures and their properties are encoded as molecular descriptors; machine learning on the descriptors supports QSAR, virtual screening, clustering and classification]

Different descriptor types
• Simple feature counts (such as the number of rotatable bonds or the molecular weight)
• Fragmental descriptors, which indicate the presence or absence (or count) of groups of atoms and substructures
• Physicochemical properties (density, solubility, van der Waals volume)
• Topological indices (size, branching, overall shape)
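To make the descriptor types concrete, here is a hedged sketch that computes a simple count (heavy atoms), the molecular weight, and one topological index, the Wiener index (sum of shortest-path bond counts over all pairs of heavy atoms), which is sensitive to branching. It assumes a hypothetical hand-written molecule format (heavy atoms with implicit hydrogen counts plus a bond list); real work would parse structures with a chemoinformatics toolkit.

```python
from collections import deque

# Average atomic masses (g/mol) for the few elements used in this sketch
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(atoms):
    """Sum of heavy-atom masses plus their implicit hydrogens.

    `atoms` is a list of (element, implicit_h_count) tuples (hypothetical format).
    """
    return sum(ATOMIC_MASS[el] + n_h * ATOMIC_MASS["H"] for el, n_h in atoms)


def wiener_index(n_atoms, bonds):
    """Sum of shortest-path (bond-count) distances over all heavy-atom pairs."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    total = 0
    for start in range(n_atoms):
        # Breadth-first search gives topological distances from `start`
        dist = {start: 0}
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in neighbours[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        total += sum(dist.values())
    # Every pair was counted twice (once from each end)
    return total // 2


def describe(atoms, bonds):
    return {
        "heavy_atoms": len(atoms),
        "mol_weight": molecular_weight(atoms),
        "wiener_index": wiener_index(len(atoms), bonds),
    }


# n-butane vs. isobutane: same formula (C4H10), different branching
n_butane = ([("C", 3), ("C", 2), ("C", 2), ("C", 3)], [(0, 1), (1, 2), (2, 3)])
isobutane = ([("C", 3), ("C", 1), ("C", 3), ("C", 3)], [(0, 1), (1, 2), (1, 3)])
print(describe(*n_butane))
print(describe(*isobutane))
```

Both isomers get the same heavy-atom count and molecular weight (about 58.12 g/mol), but the Wiener index falls from 10 for the linear chain to 9 for the branched isomer, which is why topological indices are used to capture branching and shape.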


Quantitative Structure-Activity Relationships (QSAR)
In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or another given property, P).
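As a minimal illustration of fitting a descriptor to a property, the sketch below fits P = a + b·x by ordinary least squares for a single descriptor x. The single descriptor is an assumption made for brevity; QSAR models generally combine several descriptors and may use multiple linear regression, partial least squares or machine-learning models. The data points are made up and lie exactly on a line so the result is easy to check.

```python
def fit_linear_qsar(x, y):
    """Ordinary least-squares fit of P = a + b * x for a single descriptor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Coefficient of determination r^2 on the training data
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return a, b, r2


# Made-up descriptor values and measured property values for four compounds
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [2.5, 3.0, 3.5, 4.0]
a, b, r2 = fit_linear_qsar(descriptor, activity)
print(f"P = {a:.2f} + {b:.2f} * x   (r^2 = {r2:.2f})")
print("predicted P for x = 5.0:", a + b * 5.0)
```

For this toy data the fit recovers a = 2.0, b = 0.5 and r² = 1.0, and predicts P = 4.5 for a new compound with x = 5.0. With real data, predictive quality should be judged on compounds not used for fitting.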

Prediction of Solubility, ADME & Toxicity
• Give guidelines
• Filter compounds that enter the lab
• Speed up the drug discovery process
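The slide does not say which rules are used to filter compounds. As one familiar example (a swapped-in choice, not necessarily the one used in the lecture), the sketch below applies Lipinski's rule of five: molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors, rejecting compounds with more than one violation. The compound records and their property values are hypothetical; in practice these properties would themselves be computed or predicted.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float  # g/mol
    logp: float        # octanol/water partition coefficient (log scale)
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(c):
    """Count how many of Lipinski's four rule-of-five limits a compound exceeds."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def filter_library(compounds, max_violations=1):
    """Keep compounds with at most `max_violations` rule-of-five violations."""
    return [c for c in compounds if rule_of_five_violations(c) <= max_violations]


# Hypothetical property values, for illustration only
library = [
    Compound("cpd-1", 350.4, 2.1, 2, 5),
    Compound("cpd-2", 612.7, 6.3, 1, 8),
    Compound("cpd-3", 540.2, 4.2, 3, 7),
]
print([c.name for c in filter_library(library)])
```

Here cpd-2 exceeds both the weight and logP limits and is filtered out, while cpd-3 has a single violation and passes.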

Virtual Screening
Computational techniques for the rapid assessment of large libraries of chemical structures, used to guide the selection of likely drug candidates.

Similarity Search
Similar Property Principle: molecules with similar structures and properties are expected to exhibit similar biological activity. Thus, molecules that lie close together in chemical space are often considered to be functionally related.
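The similar property principle is the assumption behind nearest-neighbour prediction. A minimal sketch, assuming molecules are already described by numeric descriptor vectors and that Euclidean distance is a reasonable measure of closeness in that chemical space: a query molecule takes the majority activity label of its k nearest training molecules. The vectors, labels and k = 3 are illustrative choices, not values from the lecture.

```python
import math
from collections import Counter


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(query, training, k=3):
    """Majority label of the k training molecules closest to `query`.

    `training` is a list of (descriptor_vector, label) pairs.
    """
    neighbours = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    labels = [label for _, label in neighbours]
    # most_common orders equal counts by first appearance, i.e. nearest first here
    return Counter(labels).most_common(1)[0][0]


# Hypothetical 2-D descriptor vectors with known activity labels
training = [
    ([0.1, 0.2], "active"), ([0.2, 0.1], "active"), ([0.15, 0.3], "active"),
    ([5.0, 5.2], "inactive"), ([5.1, 4.9], "inactive"), ([4.8, 5.0], "inactive"),
]
print(knn_predict([0.3, 0.25], training))
print(knn_predict([4.9, 5.1], training))
```

The same idea works with fingerprints instead of descriptor vectors, using a fingerprint similarity such as the Tanimoto coefficient in place of Euclidean distance.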

Fingerprint-based Similarity Search
• Widely used similarity search tool
• Consists of descriptors encoded as bit strings
• The bit strings of the query and of the database compounds are compared using a similarity metric such as the Tanimoto coefficient
• MACCS fingerprints: 166 structural keys that answer questions of the type "Is there a ring of size 4?" or "Is at least one F, Br, Cl, or I present?", where the answer is either TRUE (1) or FALSE (0)
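The key-based idea can be sketched in a few lines: each key is a yes/no question, the answers form the bit string, and two bit strings are compared with the Tanimoto coefficient c / (a + b − c), where a and b are the numbers of bits set in each fingerprint and c is the number set in both. This assumes a hypothetical five-key list (the first two keys paraphrase the slide's examples) and molecules summarised as hand-written records of ring sizes and elements. It is not the real 166-key MACCS definition, which toolkits such as RDKit provide (`rdkit.Chem.MACCSkeys.GenMACCSKeys`).

```python
# Hypothetical structural keys: (description, yes/no test on a molecule record)
KEYS = [
    ("ring of size 4", lambda m: 4 in m["ring_sizes"]),
    ("F, Br, Cl or I present", lambda m: bool(m["elements"] & {"F", "Br", "Cl", "I"})),
    ("nitrogen present", lambda m: "N" in m["elements"]),
    ("ring of size 6", lambda m: 6 in m["ring_sizes"]),
    ("oxygen present", lambda m: "O" in m["elements"]),
]


def key_fingerprint(mol):
    """Answer every key question: 1 for TRUE, 0 for FALSE."""
    return [int(test(mol)) for _, test in KEYS]


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two equal-length bit lists."""
    a = sum(fp_a)
    b = sum(fp_b)
    c = sum(x & y for x, y in zip(fp_a, fp_b))
    union = a + b - c
    # Convention chosen for this sketch: two empty fingerprints score 0.0
    return c / union if union else 0.0


# Hypothetical molecule records (sets of ring sizes and elements)
query = {"ring_sizes": {6}, "elements": {"C", "N", "O"}}
candidate = {"ring_sizes": {6}, "elements": {"C", "N", "Cl"}}
fp_q = key_fingerprint(query)
fp_c = key_fingerprint(candidate)
print(fp_q, fp_c, tanimoto(fp_q, fp_c))
```

The two records share two of their set bits (nitrogen, 6-membered ring) out of four distinct set bits, giving a Tanimoto coefficient of 0.5.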

Tanimoto Similarity
[Figure: example of a Tanimoto coefficient of 0.9, or 90% similarity]
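In formula form, with a and b the numbers of bits set in fingerprints A and B and c the number of bits set in both, the Tanimoto coefficient ranges from 0 (no common bits) to 1 (identical bit strings). A worked example with hypothetical bit counts that gives 0.9, i.e. 90% similarity:

```latex
T_c(A, B) = \frac{c}{a + b - c}
\qquad
a = 10,\; b = 9,\; c = 9:\quad
T_c = \frac{9}{10 + 9 - 9} = \frac{9}{10} = 0.9
```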

Similarity Search

Example: Virtual Screening of PubChem
• Go to …
• Search PubChem for compounds that are similar to this structure with Tc > 0.95. How many similar compounds do you find?
• Click on BioActivity Analysis. How many of them are biologically active? In how many bioassays have they been tested?
• Click on Structure-Activity. Which compound is your query? Which compounds are most similar to your query? Are they active in the same bioassays?
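The core of the similarity search in this exercise, keeping database compounds whose Tanimoto coefficient to the query is above 0.95 and ranking them, can be sketched as below. PubChem computes its scores with its own fingerprints and infrastructure; this sketch does not reproduce that service. It assumes fingerprints stored as Python integers used as bitmasks and a made-up library of compound names.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitmasks."""
    common = bin(a & b).count("1")  # c: bits set in both
    union = bin(a | b).count("1")   # a + b - c: bits set in either
    return common / union if union else 0.0


def similarity_screen(query: int, library: dict, threshold: float = 0.95):
    """Return (name, Tc) pairs with Tc above `threshold`, most similar first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, tc) for name, tc in hits if tc > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical 40-bit fingerprints
query = (1 << 40) - 1          # all 40 bits set
library = {
    "near": query & ~1,        # one bit missing: Tc = 39/40
    "mid": query & ~0b111,     # three bits missing: Tc = 37/40
    "same": query,             # identical: Tc = 1.0
    "far": 0b1111,             # four bits: Tc = 4/40
}
for name, tc in similarity_screen(query, library):
    print(f"{name}: Tc = {tc:.3f}")
```

With a threshold of 0.95, the compound missing one bit (Tc = 0.975) is kept, while the one missing three bits (Tc = 0.925) is dropped, which shows how strict a cutoff like Tc > 0.95 is.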

Questions?
