1
Use of Machine Learning in Chemoinformatics
Irene Kouskoumvekaki, Associate Professor
February 15th, 2013
2
Major Aspects of Chemoinformatics
• Databases: Development of databases for storage and retrieval of small molecule structures and their properties.
• Machine learning: Training of Decision Trees, Neural Networks, Self Organizing Maps, etc. on molecular data.
• Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, system chemical biology networks…
3
Machine Learning
Tries to teach the computer to draw conclusions based on previous experience.
4
Akinator, the Web Genius
Akinator the Genius can read your mind and tell you who you're thinking of after you answer a few questions.
18
Machine learning classifiers
19
Clustering: Self Organizing Maps
Distinguishing molecules of different biological activities and finding a new lead structure
23
Machine Learning
24
Machine Learning
[Figure: Molecular Structures, Molecular Descriptors, Properties; methods shown: QSAR, Virtual Screening, Clustering, Classification]
25
Different descriptor types
• Simple feature counts (such as number of rotatable bonds or molecular weight)
• Fragmental descriptors, which indicate the presence or absence (or count) of groups of atoms and substructures
• Physicochemical properties (density, solubility, van der Waals volume)
• Topological indices (size, branching, overall shape)
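As a small illustration of the first descriptor type, the sketch below computes a few simple feature counts from a molecular formula. The function name `simple_descriptors`, the dictionary input format, and the short atomic-mass table are assumptions made for this example, not part of the presentation.

```python
# Illustrative sketch only: simple feature-count descriptors from a formula
# given as {element: count}. Names and input format are assumptions.

# Rounded average atomic masses in g/mol; only a few elements are covered.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def simple_descriptors(formula):
    """Return a small dict of count-based descriptors for one molecule."""
    weight = sum(ATOMIC_MASS[el] * n for el, n in formula.items())
    heavy_atoms = sum(n for el, n in formula.items() if el != "H")
    return {
        "molecular_weight": round(weight, 3),
        "heavy_atom_count": heavy_atoms,
        "oxygen_count": formula.get("O", 0),
    }


# Ethanol, C2H6O
ethanol = simple_descriptors({"C": 2, "H": 6, "O": 1})
```

Each molecule becomes a fixed set of numbers, which is the form a machine learning method needs as input.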
26
Major Aspects of Chemoinformatics
• Databases: Development of databases for storage and retrieval of small molecule structures and their properties.
• Machine learning: Training of Decision Trees, Neural Networks, Self Organizing Maps, etc. on molecular data.
• Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, system chemical biology networks…
27
Quantitative Structure-Activity Relationships (QSAR)
In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or another given property, P).
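The fitting step can be sketched with the simplest possible case: one descriptor and a straight line fitted by ordinary least squares. This is an assumption chosen for illustration; the presentation does not say which model form is used, and the data values below are made up.

```python
# Minimal QSAR-style sketch: fit P = slope * descriptor + intercept.
# The linear form and all numbers are illustrative assumptions.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training set: one descriptor value and one measured property P
# per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
measured_p = [2.1, 3.9, 6.1, 7.9]

slope, intercept = fit_line(descriptor, measured_p)


def predict(x):
    """Predict P for a new molecule from its descriptor value."""
    return slope * x + intercept
```

Once fitted, `predict` estimates P for a molecule that has not been measured, which is the point of a QSAR model.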
28
Prediction of Solubility, ADME & Toxicity
• Give guidelines
• Filter compounds that enter the lab
• Speed up the drug discovery process
29
Virtual screening
Computational techniques for a rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates.
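In its simplest form, this amounts to scoring every compound in a library and keeping the best-ranked ones. The sketch below assumes a hypothetical library of descriptor vectors and a stand-in scoring function `toy_score`; in practice the score would come from a trained model or a similarity measure.

```python
# Illustrative virtual-screening sketch: rank a library by a score and keep
# the top candidates. Library contents and scoring weights are hypothetical.

def screen(library, score, top_k):
    """Return the names of the top_k compounds, highest score first."""
    ranked = sorted(library, key=lambda name: score(library[name]), reverse=True)
    return ranked[:top_k]


# Hypothetical library: compound name -> descriptor vector (placeholder values)
library = {
    "cmpd_A": [0.2, 1.0],
    "cmpd_B": [0.9, 0.5],
    "cmpd_C": [0.6, 0.1],
    "cmpd_D": [0.1, 0.1],
}


def toy_score(desc):
    """Stand-in for a trained model; the weights are arbitrary."""
    return 0.7 * desc[0] + 0.3 * desc[1]


hits = screen(library, toy_score, top_k=2)
```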
30
Similarity Search
Similar Property Principle: Molecules having similar structures and properties are expected to exhibit similar biological activity. Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.
31
Fingerprints-based Similarity Search
• Widely used similarity search tool
• Consists of descriptors encoded as bit strings
• Bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient
• MACCS fingerprints: 166 structural keys that answer questions of the type:
  Is there a ring of size 4?
  Is at least one F, Br, Cl, or I present?
  where the answer is either TRUE (1) or FALSE (0)
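A minimal sketch of comparing two such bit strings with the Tanimoto coefficient follows. The 8-bit fingerprints are toy values invented for the example (real MACCS fingerprints have 166 keys), and returning 0.0 when neither fingerprint has any bit set is a convention assumed here.

```python
# Illustrative sketch: Tanimoto coefficient on toy binary fingerprints.

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length lists of 0/1 bits."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)    # bits set in both
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)   # bits set in at least one
    # Assumed convention: two all-zero fingerprints get similarity 0.0.
    return both / either if either else 0.0


# Toy 8-bit fingerprints, one bit per structural key
query     = [1, 1, 0, 1, 0, 0, 1, 0]
candidate = [1, 1, 0, 0, 0, 0, 1, 1]

similarity = tanimoto(query, candidate)  # 3 shared bits / 5 bits set overall
```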
32
Tanimoto Similarity
[Worked example from the slide not preserved; its result is given as 90% similarity]
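For reference, the standard definition of the Tanimoto coefficient for bit-string fingerprints is given below. The symbols a, b, and c are assumptions, since the slide's own notation did not survive, and the numbers in the example are illustrative values chosen to give 90% similarity.

```latex
% a = number of bits set in fingerprint A
% b = number of bits set in fingerprint B
% c = number of bits set in both A and B
T_c(A, B) = \frac{c}{a + b - c}

% Illustrative example: a = 10, b = 9, c = 9
T_c = \frac{9}{10 + 9 - 9} = \frac{9}{10} = 0.9
```

The coefficient ranges from 0 (no bits in common) to 1 (identical fingerprints).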
33
Similarity Search
34
Example: Virtual Screening of PubChem
• Search PubChem for compounds that are similar to this structure (shown on the original slide) with Tc > 0.95:
  How many similar compounds do you find?
• Click on BioActivity Analysis
  How many of them are biologically active?
  On how many bioassays have they been tested?
• Click on Structure-Activity
  Which compound is your query?
  Which compounds are most similar to your query?
  Are they active on the same bioassays?
40
Questions?