Published by Christopher Bell. Modified over 9 years ago.
1
Use of Machine Learning in Chemoinformatics
Irene Kouskoumvekaki, Associate Professor
December 12th, 2012
Biological Sequence Analysis course
2
CBS, Department of Systems Biology
Major Aspects of Chemoinformatics
Databases: Development of databases for storage and retrieval of small-molecule structures and their properties.
Machine learning: Training of decision trees, neural networks, self-organizing maps, etc. on molecular data.
Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, systems chemical biology networks…
3
Machine Learning
18
Machine learning classifiers
19
Clustering: Self-Organizing Maps
Distinguishing molecules of different biological activities and finding a new lead structure
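A self-organizing map can be sketched in a few lines. The version below is a minimal one-dimensional map under stated assumptions: the two-component descriptor vectors, the `actives` and `inactives` groups, the deterministic initial weights, and the linear decay of learning rate and neighbourhood width are all hypothetical choices made for illustration, not taken from the slides.

```python
import math

def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(data, n_units=3, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a 1-D self-organizing map on descriptor vectors scaled to [0, 1]."""
    dim = len(data[0])
    # Deterministic start (an assumption): units spread evenly along the diagonal.
    weights = [[(j + 1) / (n_units + 1)] * dim for j in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays linearly towards 0
        lr = lr0 * frac                      # learning rate
        sigma = max(sigma0 * frac, 0.1)      # neighbourhood width on the map
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Units close to the winner on the map are pulled more strongly.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Hypothetical descriptor vectors for two activity classes.
actives = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05)]
inactives = [(0.9, 1.0), (1.0, 0.9), (0.95, 0.95)]

weights = train_som(actives + inactives)
active_units = {best_matching_unit(weights, x) for x in actives}
inactive_units = {best_matching_unit(weights, x) for x in inactives}
print(active_units, inactive_units)
```

Molecules that land on the same unit as known actives would be the natural candidates to inspect when looking for a new lead structure.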
23
Machine Learning
24
Machine Learning
Diagram labels: Molecular Structures; Properties; Molecular Descriptors; QSAR; Virtual Screening; Clustering; Classification
25
Different descriptor types
– Simple feature counts (such as number of rotatable bonds or molecular weight)
– Fragmental descriptors, which indicate the presence or absence (or count) of groups of atoms and substructures
– Physicochemical properties (density, solubility, van der Waals volume)
– Topological indices (size, branching, overall shape)
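As an illustration of the simplest descriptor type, the sketch below derives molecular weight and a heavy-atom count from a flat molecular formula. The helper names and the short table of rounded atomic weights are assumptions for this example; a real workflow would compute descriptors from the full structure with a cheminformatics toolkit.

```python
import re

# Standard atomic weights in g/mol, rounded; only a few common elements are listed.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "F": 18.998, "S": 32.06, "Cl": 35.45, "Br": 79.904}

def element_counts(formula):
    """Parse a flat formula such as 'C9H8O4' into {'C': 9, 'H': 8, 'O': 4}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """Sum of atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in element_counts(formula).items())

def heavy_atom_count(formula):
    """Number of non-hydrogen atoms."""
    return sum(n for el, n in element_counts(formula).items() if el != "H")

aspirin = "C9H8O4"
descriptors = [round(molecular_weight(aspirin), 2), heavy_atom_count(aspirin)]
print(descriptors)  # [180.16, 13]
```

A descriptor vector like this is what a machine learning model receives in place of the structure itself.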
27
Quantitative Structure-Activity Relationships (QSAR)
In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or another given property, P).
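The fitting step can be sketched with the simplest possible model, a straight line P = a + b·x through one descriptor, solved by ordinary least squares. The descriptor values and activities below are made up for illustration, and the linear form is an assumption; the slides do not say which model was used.

```python
def fit_linear_qsar(x, p):
    """Least-squares fit of P = intercept + slope * x for a single descriptor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_p = sum(p) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxp = sum((xi - mean_x) * (pi - mean_p) for xi, pi in zip(x, p))
    slope = sxp / sxx
    intercept = mean_p - slope * mean_x
    return intercept, slope

# Hypothetical training set: one descriptor value and one measured property per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
measured_p = [4.1, 4.9, 6.1, 6.9]

intercept, slope = fit_linear_qsar(descriptor, measured_p)

def predict(x):
    """Predicted property for a new molecule with descriptor value x."""
    return intercept + slope * x

print(round(intercept, 2), round(slope, 2))  # 3.1 0.96
```

With several descriptors the same idea becomes multiple linear regression, and non-linear learners such as neural networks or SVMs replace the straight line.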
28
Prediction of Solubility, ADME & Toxicity
29
hERG Classification with SVM
30
Evaluation of the data set
31
Performance of SVM
32
Performance of SVM
33
Virtual screening
Computational techniques for the rapid assessment of large libraries of chemical structures in order to guide the selection of likely drug candidates.
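In code, a screening run reduces to scoring every library member and keeping the best few. The sketch below assumes the scores already exist (the compound names and values are hypothetical); in practice they would come from a QSAR model, a classifier, or a similarity search.

```python
import heapq

def virtual_screen(library, score, top_n):
    """Return the top_n library members ranked by descending score."""
    return heapq.nlargest(top_n, library, key=score)

# Hypothetical library: compound name -> predicted score (higher is better).
predicted_scores = {
    "cmpd_1": 0.12,
    "cmpd_2": 0.87,
    "cmpd_3": 0.45,
    "cmpd_4": 0.91,
    "cmpd_5": 0.30,
}

hits = virtual_screen(predicted_scores, score=predicted_scores.get, top_n=2)
print(hits)  # ['cmpd_4', 'cmpd_2']
```

Using `heapq.nlargest` avoids sorting the whole library, which matters when the library is much larger than the number of hits kept.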
34
Similarity Search
Similar Property Principle – Molecules having similar structures and properties are expected to exhibit similar biological activity. Thus, molecules that are located close together in chemical space are often considered to be functionally related.
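One way to act on this principle is a nearest-neighbour lookup in descriptor space: the activity of a query molecule is guessed from the molecules closest to it. The sketch below is a minimal version under stated assumptions; the molecule names, two-component descriptor vectors, and activity labels are hypothetical, and Euclidean distance is only one possible choice of metric.

```python
import math

def nearest_neighbors(query, library, k=2):
    """Return the k library entries whose descriptor vectors are closest to query."""
    ranked = sorted(library, key=lambda entry: math.dist(query, entry[1]))
    return ranked[:k]

# Hypothetical library: (name, descriptor vector, activity label).
library = [
    ("mol_A", (1.0, 0.2), "active"),
    ("mol_B", (1.1, 0.3), "active"),
    ("mol_C", (4.0, 3.5), "inactive"),
    ("mol_D", (3.8, 3.9), "inactive"),
]

query = (1.2, 0.25)
neighbors = nearest_neighbors(query, library, k=2)

# Majority vote over the neighbours' labels gives the predicted activity.
labels = [label for _, _, label in neighbors]
predicted = max(set(labels), key=labels.count)
print([name for name, _, _ in neighbors], predicted)  # ['mol_B', 'mol_A'] active
```

Descriptors on very different scales should be normalised first, otherwise the largest one dominates the distance.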
35
Fingerprint-based Similarity Search
– Widely used similarity search tool
– Consists of descriptors encoded as bit strings
– Bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient
MACCS fingerprints: 166 structural keys that answer questions of the type:
– Is there a ring of size 4?
– Is at least one F, Br, Cl, or I present?
where the answer is either TRUE (1) or FALSE (0)
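The key-based encoding can be sketched as a list of yes/no questions applied to a molecule. Everything here is a toy: the four keys only imitate the style of the questions above and are not the real MACCS definitions, and the dictionary representation of a molecule (element set plus ring sizes) is an assumption made to keep the example self-contained.

```python
HALOGENS = {"F", "Cl", "Br", "I"}

# Hypothetical structural keys, each a (description, test) pair.
# These are NOT the actual MACCS key definitions.
STRUCTURAL_KEYS = [
    ("ring of size 4 present", lambda mol: 4 in mol["ring_sizes"]),
    ("at least one F, Br, Cl, or I", lambda mol: bool(HALOGENS & mol["elements"])),
    ("nitrogen present", lambda mol: "N" in mol["elements"]),
    ("any ring present", lambda mol: len(mol["ring_sizes"]) > 0),
]

def fingerprint(mol):
    """Encode a molecule as a bit list: 1 where the key's answer is TRUE, else 0."""
    return [1 if test(mol) else 0 for _, test in STRUCTURAL_KEYS]

# Toy molecule records: set of element symbols and set of ring sizes.
chlorobenzene = {"elements": {"C", "H", "Cl"}, "ring_sizes": {6}}
cyclobutylamine = {"elements": {"C", "H", "N"}, "ring_sizes": {4}}

print(fingerprint(chlorobenzene))    # [0, 1, 0, 1]
print(fingerprint(cyclobutylamine))  # [1, 0, 1, 1]
```

A real MACCS fingerprint works the same way in principle, with 166 keys evaluated by substructure matching on the full molecular graph.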
36
Tanimoto Similarity
[Worked example figure; only the result label "or 90% similarity" survives]
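For two bit-string fingerprints, the Tanimoto coefficient is commonly defined as c / (a + b − c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The sketch below implements that definition; the two 12-bit fingerprints are hypothetical and were chosen only so that the result comes out at 0.9, they are not the example from the slide.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length bit lists."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)                                        # bits set in A
    b = sum(fp_b)                                        # bits set in B
    c = sum(1 for x, y in zip(fp_a, fp_b) if x and y)    # bits set in both
    if a + b - c == 0:
        return 0.0  # assumption: two all-zero fingerprints score 0
    return c / (a + b - c)

# Hypothetical fingerprints: 10 and 9 bits set, 9 of them shared.
query = [1] * 10 + [0] * 2
candidate = [1] * 9 + [0] * 3

similarity = tanimoto(query, candidate)
print(similarity)  # 0.9
```

The value ranges from 0 (no shared bits) to 1 (identical bit strings), so a similarity search typically keeps database molecules above a chosen cutoff.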
37
Similarity Search
38
Questions?
39
Molecular editors and viewers
http://www.chemaxon.com/products/marvin/
40
Molecular editors and viewers
http://jmol.sourceforge.net/
41
Format conversion
http://cactus.nci.nih.gov/translate/