Published by Lynn Mathews. Modified over 9 years ago.
1
Predicting patterns of biological performance using chemical substructure features
Diego Borges-Rivera
08/04/08
2
Introduction
- cheminformatics allows us to describe similarity computationally
- synthetic chemists describe similarity through visual inspection
- we will describe compounds by the presence of chemical substructures
- we will attempt to identify sets of substructures that predict biological performance
[Figure: example binary fingerprints, e.g. 1011101000101 and 0101000101101]
3
Previous work
- Clemons/Kahne/Wagner et al.: disaccharide profiling in multiple cell states
- found sets of substructures relevant to biological activity patterns
- substructures highly specific to disaccharides
[Figure: axis labeled "substructures", ticks from 10 to 60]
4
Biological performance profile
- 400 compounds, 8 assays in duplicate
- tested for cell proliferation in 8 different cell lines
- class labels are active (A) or inactive (I)
[Figure: performance profile with an active compound marked]
5
What are fingerprints?
- compound collection fed into commercial software
- each substructure = 1 bit
- the fingerprint shows which substructures are present
[Figure: example substructures #1725, #886 and #7017]
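To make the "one substructure = one bit" idea concrete, here is a minimal sketch. The function name `make_fingerprint`, the 8-bit length and the example indices are assumptions for illustration only; the slides do not say how the commercial software encodes its output.

```python
def make_fingerprint(present, n_bits):
    """Return a list of 0/1 bits; bit i is 1 when substructure i is present."""
    bits = [0] * n_bits
    for index in present:
        bits[index] = 1
    return bits


# Hypothetical compound that contains substructures 1, 4 and 6 out of 8.
fingerprint = make_fingerprint({1, 4, 6}, n_bits=8)
print(fingerprint)
```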
6
Overview of cheminformatic methods
- produced fingerprints
- 7700 total substructures
- filtered set left 2166 substructures
7
Overview of computational methods
- feature (substructure) selection to find predictive subsets
- evaluate methods for predictive value
- two steps independent of each other
8
ReliefF: substructure selection
[Figure: ReliefF weights for the 2166 substructures, with the top 5 and bottom 5 shown]
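The slide gives no formula for the weights, so the following is only a rough sketch of a ReliefF-style update for binary fingerprints and two classes. It assumes every compound is visited once (no random sampling), that neighbors are ranked by Tanimoto similarity, and that each weight is lowered when a substructure differs from a same-class neighbor and raised when it differs from an other-class neighbor. The names `tanimoto` and `relieff_weights` and the toy data are hypothetical.

```python
def tanimoto(a, b):
    """Shared 'on' bits divided by bits that are 'on' in either fingerprint."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def relieff_weights(fps, labels, k):
    """Simplified two-class ReliefF: one weight per substructure (bit)."""
    n, n_bits = len(fps), len(fps[0])
    weights = [0.0] * n_bits
    for i in range(n):
        # Rank all other compounds, most similar first.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: tanimoto(fps[i], fps[j]),
            reverse=True,
        )
        hits = [j for j in others if labels[j] == labels[i]][:k]
        misses = [j for j in others if labels[j] != labels[i]][:k]
        for f in range(n_bits):
            for j in hits:    # differing from a same-class neighbor lowers the weight
                weights[f] -= abs(fps[i][f] - fps[j][f]) / (n * k)
            for j in misses:  # differing from an other-class neighbor raises it
                weights[f] += abs(fps[i][f] - fps[j][f]) / (n * k)
    return weights


# Toy data (not from the study): bit 0 separates the classes, bit 2 is constant.
toy_fps = [[1, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1], [0, 1, 1]]
toy_labels = ["A", "A", "A", "I", "I", "I"]
toy_weights = relieff_weights(toy_fps, toy_labels, k=1)
print(toy_weights)
```

In this toy set the class-separating bit receives the largest weight and the constant bit receives zero, which is the ranking behavior the "Top 5 / Bottom 5" figure relies on.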
9
K nearest neighbors (knn): predictive accuracy
- Examples: k = 2, 5
[Figure: neighbors of the compound being classified, marked "?"]
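A minimal sketch of the classification step, assuming a majority vote among the k most Tanimoto-similar training compounds. The function names and the toy fingerprints are hypothetical, and the tie-breaking behavior (a tie goes to the label met first among the ranked neighbors) is a property of this sketch, not something the slides specify.

```python
from collections import Counter


def tanimoto(a, b):
    """Shared 'on' bits divided by bits that are 'on' in either fingerprint."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def knn_predict(query, train_fps, train_labels, k):
    """Label the query by majority vote among its k most similar neighbors."""
    ranked = sorted(
        range(len(train_fps)),
        key=lambda i: tanimoto(query, train_fps[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (not from the study): A = active, I = inactive.
train_fps = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
train_labels = ["A", "A", "I", "I", "I"]
query = [1, 1, 0, 1]

print(knn_predict(query, train_fps, train_labels, k=2))
print(knn_predict(query, train_fps, train_labels, k=5))
```

With this toy data the two calls disagree: the two nearest neighbors are both active, while all five compounds together are mostly inactive. That is why the number of neighbors is treated as a parameter to tune.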
10
Similarity between compounds
- similarity between two fingerprints: Tanimoto coefficient
- this is used twice: (1) in ReliefF, (2) in knn
Example:
Compound a: 0 0 1
Compound b: 1 0 1
Tanimoto coefficient = 1 / 2 = 0.5
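The worked example can be checked with a few lines of code. This sketch uses the standard definition for binary fingerprints (bits set in both compounds divided by bits set in either); the function name and the choice to return 0.0 for two all-zero fingerprints are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in a AND b
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in a OR b
    # Assumption: two empty fingerprints are given similarity 0.0.
    return both / either if either else 0.0


compound_a = [0, 0, 1]
compound_b = [1, 0, 1]
print(tanimoto(compound_a, compound_b))  # 1 shared bit / 2 set bits = 0.5
```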
11
Cross-validation: predictive accuracy
- 10 subsets
- test set: one of the subsets
- training set: the remaining subsets
[Figure: data split into a test set and a training set]
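The split described above can be sketched as follows. The shuffling, the fixed seed and the function name `kfold_splits` are assumptions; the slides only state that there are 10 subsets, each of which serves once as the test set.

```python
import random


def kfold_splits(n_items, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds subsets of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# With the 400 compounds from the study, each fold tests on 40 and trains on 360.
for train, test in kfold_splits(400):
    print(len(train), len(test))
```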
12
Picking parameters for methods
- which parameters produce the best predictive accuracies?
- number of neighbors used in ReliefF {1, 2, 4, etc}
- number of neighbors used in knn {1, 2, 4, etc}
- number of ReliefF substructures used to predict classes in knn {1, 20, 100, etc}
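One simple way to compare parameter settings is an exhaustive grid search, sketched below. Only the values actually listed on the slide are used; the rest of each range ("etc") is not known. The scoring function `toy_accuracy` is a made-up placeholder that peaks at an arbitrary setting, not a result from the study. In practice it would be replaced by the cross-validated knn accuracy.

```python
from itertools import product


def grid_search(evaluate, relieff_neighbors, knn_neighbors, n_substructures):
    """Try every parameter combination and keep the best-scoring one."""
    best_params, best_score = None, None
    for params in product(relieff_neighbors, knn_neighbors, n_substructures):
        score = evaluate(*params)
        if best_score is None or score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_accuracy(relieff_k, knn_k, n_sub):
    # Placeholder only, NOT real results: highest at (2, 4, 20) by construction.
    return 1.0 / (1 + abs(relieff_k - 2) + abs(knn_k - 4) + abs(n_sub - 20))


best_params, best_score = grid_search(
    toy_accuracy,
    relieff_neighbors=[1, 2, 4],
    knn_neighbors=[1, 2, 4],
    n_substructures=[1, 20, 100],
)
print(best_params, best_score)
```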
13
Picking number of substructures
[Figure: predictive accuracy (0.0 to 1.0) against the number of substructures used to predict (1, 20, all)]
14
Group of substructures best able to predict
15
Future work
- multi-class
- different feature selection
16
Acknowledgements
- Computational Chemical Biology: Joshua Gilbert, Paul Clemons, Hyman Carrinski
- Summer Research Program in Genomics: Shawna Young, Lucia Vielma, Maura Silverstein