Published by Kelsey Kellam. Modified over 10 years ago.
1
The Use of Graph Matching Algorithms to Identify Biochemical Substructures in Synthetic Chemical Compounds: Application to Metabolomics
Mai Hamdalla, David Grant, Ion Mandoiu, Dennis Hill, Sanguthevar Rajasekaran, and Reda Ammar
University of Connecticut
2
[Figure: from genome to phenotype/function]
Genome: DNA
Transcriptome: RNA
Proteome: Proteins
Metabolome: Metabolites (Sugars, Nucleotides, Lipids, Amino Acids)
Phenotype/Function
3
Mammalian Metabolite Identifier
Identification Process
List of Candidate Chemical Structures, given as SMILES (simplified molecular-input line-entry system):
– C8H7N: C1=CC=C2C(=C1)C=CN2
– C9H18O8: C(C1C(C(C(C(O1)OCC(CO)O)O)O)O)O
– C6H12O6: C(C1C(C(C(O1)(CO)O)O)O)O
Ranked list of Candidate Structures with mammalian substructures
4
[Figure: identification pipeline]
List of Candidate Compound Structures
Filtration
List of Filtered Candidate Compounds
Structure Matching
Ranked list of identified Compounds
Scaffold lists: Mammalian Scaffolds List (Sugars, Nucleotides, Lipids, Amino Acids); non-Biological Scaffolds
5
Collection and Curation of Scaffolds
Retrieve all compounds in a metabolic pathway in the KEGG database.
Keep participants of mammalian metabolic pathway groups (91 KEGG pathways).
– Carbohydrate, Energy, Lipid, Nucleotide, Amino Acid, Glycan, and Cofactors and Vitamins metabolism
Remove compounds that did not have an entry in the PubChem database.
Remove entries that were single elements, metals, or inorganic.
1,987 compounds (30 – 1,000 Da)
6
Identification Process
List of Candidate Compound Structures
Filtration
List of Filtered Candidate Compounds
Structure Matching
List of Identified Compounds
Scaffold lists: Mammalian Scaffolds List (Sugars, Nucleotides, Lipids, Amino Acids); non-Biological Scaffolds
7
Structure Matching
The SMSD (Small Molecule Sub-graph Detector) toolkit is used for molecule similarity searches.
Similarity Score = $N_{SBS} / N_{SPR}$
Where: $N_{SBS}$ is the number of atoms in the substructure and $N_{SPR}$ is the number of atoms in the superstructure.
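As a minimal sketch of the score defined on this slide, the ratio of substructure atoms to superstructure atoms can be computed as below. The function name `similarity_score` is hypothetical, and the sketch assumes the atom counts have already been obtained from a substructure search (for example with SMSD); it does not perform the graph matching itself.

```python
def similarity_score(n_sbs: int, n_spr: int) -> float:
    """Return N_SBS / N_SPR: the fraction of superstructure atoms
    covered by the matched substructure (hypothetical helper)."""
    if n_spr <= 0:
        raise ValueError("superstructure must have at least one atom")
    if not 0 <= n_sbs <= n_spr:
        raise ValueError("substructure cannot be larger than superstructure")
    return n_sbs / n_spr


# Atom counts taken from the examples in the slides: 4/14 and 6/14.
print(round(similarity_score(4, 14), 2))  # 0.29
print(round(similarity_score(6, 14), 2))  # 0.43
```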
8
Scaffolds-Structure Matching
Candidate Structure: C10H7NO3, C1=CC=C2C(=C1)C(=O)C=C(N2)C(=O)O
Mammalian Scaffolds matched as substructures, with similarity scores: 0.29, 0.43, 0.29, 0.43, 0.36, 0.29
Examples: Similarity Score = 0.29 (4/14); Similarity Score = 0.43 (6/14)
9
Union Scaffold Structure
Candidate Structure
Mammalian Scaffolds, with similarity scores: 0.29, 0.43, 0.29, 0.43, 0.36, 0.29
Union Scaffold: Similarity Score = 0.71 (10/14)
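One plausible reading of the union scaffold, sketched here under stated assumptions: each matched scaffold covers a set of atoms in the candidate, and the union scaffold score is the fraction of candidate atoms covered by at least one scaffold. The slides do not spell out the computation, so the function `union_scaffold_score` and the atom index sets below are hypothetical; they are chosen only so that the sizes reproduce the 4/14, 6/14, and 10/14 figures.

```python
def union_scaffold_score(matches, n_candidate_atoms):
    """Fraction of candidate atoms covered by the union of all
    scaffold matches. `matches` is an iterable of sets of candidate
    atom indices (hypothetical representation of the matches)."""
    if n_candidate_atoms <= 0:
        raise ValueError("candidate must have at least one atom")
    covered = set().union(*matches)  # empty set if there are no matches
    return len(covered) / n_candidate_atoms


# Hypothetical atom indices for a 14-atom candidate.
matches = [
    {0, 1, 2, 3},           # 4 atoms  -> 0.29 on its own
    {0, 1, 2, 3, 4, 5},     # 6 atoms  -> 0.43 on its own
    {4, 6, 7, 8, 9},        # 5 atoms  -> 0.36 on its own
]
print(round(union_scaffold_score(matches, 14), 2))  # 0.71 (10/14)
```

Overlapping matches are counted once, which is why the union score (0.71) is lower than the sum of the individual scores.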
10
Superstructure Scaffolds Matching
About 30% of the mammalian structures were missed (false negatives, FN).
Example candidate: Union Scaffold Score = 0, but found to be a substructure of 38 scaffolds!
Scores against superstructure scaffolds: 0.9 (9/10), 0.45, 0.75 (9/12), 0.6 (9/15)
Similarity Score = 0.9
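The superstructure case can be sketched the same way. Here the candidate is the substructure and each scaffold that contains it is the superstructure, so the score is candidate atoms divided by scaffold atoms. Taking the maximum over all containing scaffolds is an assumption, inferred from the slide reporting 0.9 as the final score; the function name is hypothetical, and the scaffold size of 20 is inferred from the 0.45 score rather than stated on the slide.

```python
def superstructure_score(n_candidate_atoms, scaffold_sizes):
    """Best candidate/scaffold atom ratio over all scaffolds that
    contain the candidate as a substructure (hypothetical helper).
    `scaffold_sizes` holds the atom counts of those scaffolds."""
    scores = [
        n_candidate_atoms / size
        for size in scaffold_sizes
        if size >= n_candidate_atoms  # a superstructure cannot be smaller
    ]
    return max(scores, default=0.0)


# 9-atom candidate against scaffolds of 10, 20 (assumed), 12, and 15 atoms.
print(superstructure_score(9, [10, 20, 12, 15]))  # 0.9
```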
11
Scoring Methods
Candidate Structure
Union Scaffold Structure: 0.71
Superstructure Scaffold Structure: 0.93
US: Union Scaffold Score = 0.71
MS: Maximum Score (Union Scaffold Score, Superstructure Score) = 0.93
SS: Sum of Scores (Union Scaffold Score, Superstructure Score) = 1.64
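The three scoring methods follow directly from the two component scores. The sketch below reproduces the numbers on this slide; the function name `combined_scores` and the dictionary layout are illustrative choices, not part of the original system.

```python
def combined_scores(union_score, superstructure_score):
    """Return the three scores named on the slide:
    US (union only), MS (maximum), and SS (sum)."""
    return {
        "US": union_score,
        "MS": max(union_score, superstructure_score),
        "SS": union_score + superstructure_score,
    }


scores = combined_scores(0.71, 0.93)
print({name: round(value, 2) for name, value in scores.items()})
```

Note that SS can exceed 1, so it ranks candidates but is not a fraction of matched atoms.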
12
Collection and Curation of Synthetic Compounds
Retrieve synthetic compounds from ChemBridge and ChemSynthesis databases.
– restricted to the 6 biological elements C, H, N, O, P, and S.
The mass distribution
– ChemBridge (150 – 700 Da)
– ChemSynthesis (50 – 300 Da)
1,400 compounds were randomly selected for training and 5,320 compounds were randomly chosen for testing.
Mammalian scaffold list reduced to 1,400 compounds (50 – 700 Da)
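The element restriction can be illustrated with a small formula check. This is only a sketch: the slides do not say how the restriction was implemented, the helper names are hypothetical, and the parser handles plain formulas such as C10H7NO3 without parentheses, charges, or isotopes.

```python
import re

BIOLOGICAL_ELEMENTS = frozenset({"C", "H", "N", "O", "P", "S"})
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")  # element symbol + optional count


def elements_in_formula(formula):
    """Return the set of element symbols in a simple molecular formula."""
    if not formula:
        raise ValueError("empty formula")
    elements = set()
    pos = 0
    while pos < len(formula):
        match = _TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse formula at position {pos}")
        elements.add(match.group(1))
        pos = match.end()
    return elements


def uses_only_biological_elements(formula):
    """True if every element in the formula is one of C, H, N, O, P, S."""
    return elements_in_formula(formula) <= BIOLOGICAL_ELEMENTS


print(uses_only_biological_elements("C10H7NO3"))  # True
print(uses_only_biological_elements("C6H5Cl"))    # False
```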
13
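Cross Validation Average Accuracy Results
SENS: sensitivity; SPEC: specificity; MCC: Matthews correlation coefficient
US, MS, SS
– SENS AVG: 70%, 59%, 88%; STDEV: 2%
– SPEC AVG: 65%, 71%, 57%; STDEV: 3%
– MCC AVG: 0.36, 0.3, 0.47; STDEV: 2%
5US, 5MS, 5SS
– SENS AVG: 83%, 84%, 86%; STDEV: 1%
– SPEC AVG: 75%, 76%, 78%; STDEV: 2%
– MCC AVG: 0.57, 0.6, 0.64; STDEV: 2%, 1%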
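For reference, the three metrics reported here have standard definitions in terms of true/false positives and negatives. The sketch below uses those standard formulas; the confusion-matrix counts are hypothetical and are not the study's data.

```python
import math


def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
tp, fn, tn, fp = 90, 10, 80, 20
print(sensitivity(tp, fn), specificity(tn, fp), round(mcc(tp, tn, fp, fn), 3))
```

MCC is useful alongside sensitivity and specificity because it summarizes all four counts in one value between -1 and 1.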
14
Leave-One-Out Accuracy
Sensitivity = 96%
15
Prospective Results of Synthetic Compounds
54% eliminated as non-mammalian
16
Conclusions
A novel way of utilizing known mammalian metabolites (scaffolds database) to identify synthetic chemical compounds with mammalian substructures.
The results show a sensitivity of 96% in the mammalian scaffolds leave-one-out experiments.
The system was able to eliminate 54% of a random set of synthetic compounds.
17
Ongoing Work
Exploring further improvements in accuracy by using known biological pathway information.
Annotating PubChem
Annotating existing and potential drugs
Database-independent compound search
– Generate all possible structures of a given formula and rank them
18
Candidate Structures
Filtration
List of Filtered Candidate Compounds
Structure Matching
Ranked Compounds
Scaffold lists: Mammalian Scaffolds List (Sugars, Nucleotides, Lipids, Amino Acids); non-Biological Scaffolds
Thank you!