Predicting patterns of biological performance using chemical substructure features Diego Borges-Rivera 08/04/08.


Introduction
• cheminformatics – allows us to computationally describe similarity
• synthetic chemists – describe similarity through visual inspection
• we will describe compounds by the presence of chemical substructures
• we will attempt to identify sets of substructures that predict biological performance
[Figure: example binary fingerprints, e.g. 1011101000101 and 0101000101101]

Previous work
• Clemons/Kahne/Wagner et al. – disaccharide profiling in multiple cell states
• found sets of substructures relevant to biological activity patterns
• substructures highly specific to disaccharides
[Figure: axis labeled "substructures", ticks from 10 to 60]

Biological performance profile
• 400 compounds, 8 assays in duplicate
• tested for cell proliferation in 8 different cell lines
• class labels are active (A) or inactive (I)
[Figure: performance profile with an active compound marked]

What are fingerprints?
• compound collection fed into commercial software
• each substructure = 1 bit
• the fingerprint shows which substructures are present
[Figure: example substructures #1725, #886, and #7017]
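As a rough illustration of "each substructure = 1 bit", the sketch below builds a fingerprint from a set of substructure numbers. The function name `to_fingerprint` and the 1-based numbering are assumptions made for this example; the slides do not say how the commercial software indexes its substructures.

```python
def to_fingerprint(present_ids, n_substructures):
    """Return a list of 0/1 bits, one per substructure.

    present_ids: substructure numbers found in the compound
                 (assumed here to be numbered from 1).
    """
    bits = [0] * n_substructures
    for sid in present_ids:
        if not 1 <= sid <= n_substructures:
            raise ValueError(f"substructure #{sid} is out of range")
        bits[sid - 1] = 1  # bit position = substructure number - 1
    return bits


# Hypothetical compound containing the three substructures named on the slide,
# out of the 7700 total substructures mentioned in the talk.
fingerprint = to_fingerprint({1725, 886, 7017}, 7700)
print(sum(fingerprint))  # number of substructures present
```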

Overview of cheminformatic methods
• produced fingerprints
• 7700 total substructures
• filtered set
• left 2166 substructures

Overview of computational methods
• two steps independent of each other
(1) feature (substructure) selection to find predictive subsets
(2) evaluate methods for predictive value

ReliefF: substructure selection
[Figure: 2166 substructure weights; panels labeled Top 5 and Bottom 5]
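The slide does not show how the weights are computed, so the following is only a minimal sketch of a two-class ReliefF-style update, assuming Tanimoto distance is used to find neighbors (the talk says the Tanimoto coefficient is used in ReliefF). The names `tanimoto_distance` and `relieff_weights` are hypothetical, and the actual implementation used in the talk may differ.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto coefficient for two equal-length bit lists."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two all-zero fingerprints are treated as identical.
    return 1.0 - (both / either if either else 1.0)


def relieff_weights(fingerprints, labels, k):
    """Return one weight per bit; higher means more class-predictive."""
    n = len(fingerprints)
    n_bits = len(fingerprints[0])
    weights = [0.0] * n_bits
    for i in range(n):
        # Rank every other compound by distance to compound i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: tanimoto_distance(fingerprints[i], fingerprints[j]),
        )
        hits = [j for j in others if labels[j] == labels[i]][:k]    # same class
        misses = [j for j in others if labels[j] != labels[i]][:k]  # other class
        for f in range(n_bits):
            # Penalize bits that differ from same-class neighbors.
            for j in hits:
                diff = abs(fingerprints[i][f] - fingerprints[j][f])
                weights[f] -= diff / (n * len(hits))
            # Reward bits that differ from other-class neighbors.
            for j in misses:
                diff = abs(fingerprints[i][f] - fingerprints[j][f])
                weights[f] += diff / (n * len(misses))
    return weights


# Toy data (not from the talk): bit 0 separates the classes,
# bit 1 varies within each class, bit 2 is constant.
toy_fps = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
toy_labels = ["A", "A", "I", "I"]
toy_weights = relieff_weights(toy_fps, toy_labels, k=1)
ranking = sorted(range(len(toy_weights)), key=lambda f: toy_weights[f], reverse=True)
print(ranking)  # bit indices from highest to lowest weight
```

Sorting the weights and keeping the top-ranked bits gives the kind of "Top 5 / Bottom 5" view the slide refers to.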

K nearest neighbors (knn): predictive accuracy
• Examples: k = 2, 5
[Figure: the compound being classified, marked "?", among its neighbors]
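A minimal sketch of knn classification on fingerprints, assuming neighbors are ranked by Tanimoto similarity and the class is decided by majority vote. The function names, the tie-breaking rule, and the toy data are assumptions for illustration, not details taken from the talk.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto coefficient for two equal-length bit lists."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def knn_predict(query, training, k):
    """Classify `query` by majority vote among its k most similar compounds.

    training: list of (fingerprint, label) pairs, label 'A' or 'I'.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    # On a tied vote, the label that appeared first among the neighbors wins
    # (an arbitrary choice made for this sketch).
    return votes.most_common(1)[0][0]


# Toy training set (not from the talk).
training = [
    ([1, 1, 0, 0], "A"),
    ([1, 1, 1, 0], "A"),
    ([0, 0, 1, 1], "I"),
    ([0, 1, 1, 1], "I"),
    ([0, 0, 0, 1], "I"),
]
query = [1, 1, 0, 1]
print(knn_predict(query, training, k=2))  # A
print(knn_predict(query, training, k=5))  # I
```

The two calls mirror the slide's k = 2 and k = 5 examples: the predicted class can change with the number of neighbors consulted.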

Similarity between compounds
• similarity between two fingerprints
• Tanimoto coefficient
• this is used twice: (1) in ReliefF, (2) in knn
Example:
  Compound a: 0 0 1
  Compound b: 1 0 1
  Tanimoto coefficient = 1 / 2 = 0.5 (1 bit set in both compounds / 2 bits set in at least one)
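The slide's example can be reproduced with a short function. This is a sketch rather than the code used in the talk; the handling of two all-zero fingerprints is an assumption, since the slides do not cover that case.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two equal-length bit lists."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Assumption: two all-zero fingerprints count as identical.
    return both / either if either else 1.0


compound_a = [0, 0, 1]
compound_b = [1, 0, 1]
print(tanimoto(compound_a, compound_b))  # 0.5
```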

Cross-validation: predictive accuracy
• 10 subsets
• test set: one of the subsets
• training set: the remaining subsets
[Figure: test set and training set]
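The split described above can be sketched as follows. The random shuffle, the fixed seed, and the function name `k_fold_splits` are assumptions; the slides do not say how compounds were assigned to subsets or whether the split was stratified by class.

```python
import random


def k_fold_splits(n_items, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed: random assignment to subsets
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        # Training set = every subset except the one held out for testing.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With the 400 compounds from the talk, each test set holds 40 compounds
# and each training set holds the remaining 360.
for train, test in k_fold_splits(400):
    print(len(train), len(test))
```

Predictive accuracy would then be the fraction of test-set compounds whose predicted class matches the known class, averaged over the 10 folds.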

Picking parameters for methods
• which parameters produce the best predictive accuracies?
• number of neighbors used in ReliefF {1, 2, 4, etc.}
• number of neighbors used in knn {1, 2, 4, etc.}
• number of ReliefF substructures used to predict classes in knn {1, 20, 100, etc.}
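One simple way to search these three parameters is an exhaustive grid, sketched below. The `evaluate` callable stands in for "run ReliefF, keep the top substructures, and measure cross-validated knn accuracy"; it and the toy scoring function are hypothetical. Only the values listed on the slide are used here, although the slide's lists continue past them.

```python
from itertools import product


def grid_search(evaluate, relieff_ks, knn_ks, n_substructure_options):
    """Return (accuracy, relieff_k, knn_k, n_substructures) for the best setting."""
    best = None
    for relieff_k, knn_k, n_sub in product(relieff_ks, knn_ks, n_substructure_options):
        accuracy = evaluate(relieff_k, knn_k, n_sub)
        if best is None or accuracy > best[0]:
            best = (accuracy, relieff_k, knn_k, n_sub)
    return best


def toy_evaluate(relieff_k, knn_k, n_sub):
    """Hypothetical stand-in for a real cross-validated accuracy."""
    return 1.0 / (1 + abs(relieff_k - 2) + abs(knn_k - 4) + abs(n_sub - 20))


best = grid_search(toy_evaluate, [1, 2, 4], [1, 2, 4], [1, 20, 100])
print(best)  # (1.0, 2, 4, 20)
```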

Picking number of substructures
[Figure: predictive accuracy (0.0 to 1.0) versus number of substructures used to predict (ticks at 1, 20, and all)]

Group of substructures best able to predict

Future work
• multi-class
• different feature selection

Acknowledgements
Computational Chemical Biology: Joshua Gilbert, Paul Clemons, Hyman Carrinski
Summer Research Program in Genomics: Shawna Young, Lucia Vielma, Maura Silverstein
