The Use of Graph Matching Algorithms to Identify Biochemical Substructures in Synthetic Chemical Compounds Application to Metabolomics Mai Hamdalla, David.

Presentation transcript:

The Use of Graph Matching Algorithms to Identify Biochemical Substructures in Synthetic Chemical Compounds: Application to Metabolomics
Mai Hamdalla, David Grant, Ion Mandoiu, Dennis Hill, Sanguthevar Rajasekaran and Reda Ammar
University of Connecticut

Genome: DNA
Transcriptome: RNA
Proteome: Proteins
Metabolome: Metabolites (sugars, nucleotides, lipids, amino acids)
Phenotype/Function

Identification Process
Mammalian Metabolite Identifier
Input: list of candidate chemical structures in SMILES (simplified molecular-input line-entry system)
C8H7N     C1=CC=C2C(=C1)C=CN2
C9H18O8   C(C1C(C(C(C(O1)OCC(CO)O)O)O)O)O
C6H12O6   C(C1C(C(C(O1)(CO)O)O)O)O
Output: ranked list of candidate structures with mammalian substructures
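The formula and SMILES pairs on this slide can be cross-checked with a small heavy-atom counter. This is a minimal sketch and not part of the presentation: the name `heavy_atom_counts` is hypothetical, and the parser only handles unbracketed organic-subset SMILES like the three strings shown here (no charges, isotopes, or bracket atoms). Hydrogens are implicit in these strings, so only non-hydrogen atoms are counted.

```python
import re
from collections import Counter

# Organic-subset atom symbols. Two-letter symbols are listed first so that
# "Cl" is not read as "C" followed by an unknown character.
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms in an unbracketed organic-subset SMILES string."""
    if "[" in smiles:
        raise ValueError("bracket atoms are not supported in this sketch")
    # Aromatic atoms are written in lowercase; normalise them to element symbols.
    return Counter(symbol.capitalize() for symbol in _ATOM.findall(smiles))


for smiles in (
    "C1=CC=C2C(=C1)C=CN2",
    "C(C1C(C(C(C(O1)OCC(CO)O)O)O)O)O",
    "C(C1C(C(C(O1)(CO)O)O)O)O",
):
    print(smiles, dict(heavy_atom_counts(smiles)))
```

The counts agree with the carbon, nitrogen, and oxygen subscripts of C8H7N, C9H18O8, and C6H12O6. A real pipeline would rely on a cheminformatics toolkit rather than a regular expression.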

List of Candidate Compound Structures -> Filtration -> List of Filtered Candidate Compounds -> Structure Matching -> Ranked list of identified Compounds
Reference lists: Mammalian Scaffolds (sugars, nucleotides, lipids, amino acids); List of non-Biological Scaffolds

Collection and Curation of Scaffolds
– Retrieve all compounds in a metabolic pathway in the KEGG database.
– Keep participants of mammalian metabolic pathway groups (91 KEGG pathways): Carbohydrate, Energy, Lipid, Nucleotide, Amino Acid, Glycan, and Cofactors and Vitamins metabolism.
– Remove compounds that did not have an entry in the PubChem database.
– Remove entries that were single elements, metals, or inorganic.
Result: 1,987 compounds (30 – 1,000 Da)
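The removal rules in this curation step can be sketched as a simple predicate. Everything in the block is an assumption made for illustration: the `Compound` record, the `keep_as_scaffold` name, the short `METALS` subset, and the reading of "single element" as "one distinct element" and "inorganic" as "contains no carbon". The presentation does not say how these checks were implemented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Compound:
    """Hypothetical record for a compound retrieved from a pathway database."""
    name: str
    pubchem_cid: Optional[int]      # None when no PubChem entry exists
    element_counts: Dict[str, int]  # e.g. {"C": 6, "H": 12, "O": 6}


# Illustrative subset only; a real list of metals would be longer.
METALS = frozenset({"Li", "Na", "K", "Mg", "Ca", "Fe", "Zn", "Cu", "Mn", "Co", "Ni", "Mo"})


def keep_as_scaffold(compound: Compound) -> bool:
    """Apply the removal rules from the slide to one compound."""
    if compound.pubchem_cid is None:
        return False  # no PubChem entry
    if len(compound.element_counts) == 1:
        return False  # assumption: "single element" means one distinct element
    if METALS.intersection(compound.element_counts):
        return False  # contains a metal
    if "C" not in compound.element_counts:
        return False  # assumption: no carbon is treated as inorganic
    return True


candidates = [
    Compound("glucose", 5793, {"C": 6, "H": 12, "O": 6}),
    Compound("water", 962, {"H": 2, "O": 1}),
    Compound("unlisted compound", None, {"C": 2, "H": 6}),
]
scaffolds = [c.name for c in candidates if keep_as_scaffold(c)]
print(scaffolds)
```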

Identification Process
List of Candidate Compound Structures -> Filtration -> List of Filtered Candidate Compounds -> Structure Matching -> List of Identified Compounds
Reference lists: Mammalian Scaffolds (sugars, nucleotides, lipids, amino acids); List of non-Biological Scaffolds

Structure Matching
The SMSD (Small Molecule Sub-graph Detector) toolkit is used for molecule similarity searches.
$$\text{Similarity Score} = \frac{N_{SBS}}{N_{SPR}}$$
where $N_{SBS}$ is the number of atoms in the substructure and $N_{SPR}$ is the number of atoms in the superstructure.
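The ratio of substructure atoms to superstructure atoms can be written as a one-line function. This is a sketch under the assumption that the score is exactly N_SBS / N_SPR, which matches the worked values quoted in the transcript (4/14 and 6/14). The function name `similarity_score` is hypothetical, and the atom counts are passed in directly instead of being obtained from SMSD.

```python
def similarity_score(n_sbs: int, n_spr: int) -> float:
    """Fraction of the superstructure's atoms that the substructure covers.

    n_sbs: number of atoms in the substructure
    n_spr: number of atoms in the superstructure
    """
    if n_spr <= 0:
        raise ValueError("the superstructure must contain at least one atom")
    if not 0 <= n_sbs <= n_spr:
        raise ValueError("a substructure cannot have more atoms than its superstructure")
    return n_sbs / n_spr


print(round(similarity_score(4, 14), 2))  # 0.29
print(round(similarity_score(6, 14), 2))  # 0.43
```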

Scaffolds-Structure Matching
Candidate structure: C10H7NO3, C1=CC=C2C(=C1)C(=O)C=C(N2)C(=O)O (14 non-hydrogen atoms)
Mammalian scaffolds matched as substructures of the candidate:
– Similarity Score = 0.29 (4/14)
– Similarity Score = 0.43 (6/14)

Union Scaffold Structure
The mammalian scaffolds matched to the candidate structure are combined into a union scaffold.
Union Scaffold Similarity Score = 0.71 (10/14)
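One way to read the union scaffold score is as the fraction of candidate atoms covered by at least one matched scaffold. The sketch below assumes that interpretation. The function name `union_scaffold_score` and the atom indices are invented for illustration; in practice the matched atom sets would come from the sub-graph matcher.

```python
from typing import Iterable, Set


def union_scaffold_score(matched_atom_sets: Iterable[Set[int]], n_candidate_atoms: int) -> float:
    """Fraction of candidate atoms covered by the union of all scaffold matches.

    Each set holds the indices of candidate atoms that one scaffold mapped onto.
    """
    if n_candidate_atoms <= 0:
        raise ValueError("the candidate must contain at least one atom")
    covered = set().union(*matched_atom_sets)
    if any(i < 0 or i >= n_candidate_atoms for i in covered):
        raise ValueError("matched atom index outside the candidate structure")
    return len(covered) / n_candidate_atoms


# Hypothetical matches on a 14-atom candidate: one 4-atom and one 6-atom scaffold
# that cover different atoms, so the union covers 10 atoms.
scaffold_a = {0, 1, 2, 3}
scaffold_b = {4, 5, 6, 7, 8, 9}
print(round(union_scaffold_score([scaffold_a, scaffold_b], 14), 2))  # 0.71
```

Overlapping matches are counted once, so the union score can never exceed 1 regardless of how many scaffolds match.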

Superstructure Scaffolds Matching
About 30% of the mammalian structures were missed (false negatives, FN).
Example: a candidate with Union Scaffold Score = 0 was found to be a substructure of 38 scaffolds.
Scores against superstructure scaffolds: 0.9 (9/10), 0.75 (9/12), 0.6 (9/15)
Similarity Score = 0.9
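When the candidate is itself a substructure of known scaffolds, the roles in the ratio swap: the candidate supplies the numerator and each scaffold the denominator. The sketch below assumes the reported score is the best ratio over all matching superstructure scaffolds, which is consistent with the slide reporting 0.9 from the ratios 9/10, 9/12 and 9/15. The function name `superstructure_score` is hypothetical.

```python
from typing import Iterable


def superstructure_score(n_candidate_atoms: int, superstructure_sizes: Iterable[int]) -> float:
    """Best candidate/scaffold atom ratio over scaffolds that contain the candidate.

    superstructure_sizes: atom counts of the scaffolds found to contain the candidate.
    Returns 0.0 when no scaffold contains the candidate.
    """
    if n_candidate_atoms <= 0:
        raise ValueError("the candidate must contain at least one atom")
    ratios = [
        n_candidate_atoms / size
        for size in superstructure_sizes
        if size >= n_candidate_atoms  # a superstructure cannot be smaller than the candidate
    ]
    return max(ratios, default=0.0)


# Atom counts taken from the slide: a 9-atom candidate inside 10-, 12- and 15-atom scaffolds.
print(round(superstructure_score(9, [10, 12, 15]), 2))  # 0.9
```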

Scoring Methods
– US: Union Scaffold Score = 0.71
– MS: Maximum Score (Union Scaffold Score, Superstructure Score) = 0.93
– SS: Sum of Scores (Union Scaffold Score, Superstructure Score) = 1.64
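The three scoring methods differ only in how they combine the two component scores. A minimal sketch follows, using the slide's example values (a union scaffold score of 0.71 and a superstructure score of 0.93, the latter implied by MS = 0.93 and SS = 1.64). The function name `combine_scores` is hypothetical.

```python
from typing import Dict


def combine_scores(union_score: float, superstructure_score: float) -> Dict[str, float]:
    """Return the three candidate scores named on the slide."""
    return {
        "US": union_score,                             # union scaffold score alone
        "MS": max(union_score, superstructure_score),  # maximum of the two scores
        "SS": union_score + superstructure_score,      # sum of the two scores
    }


scores = combine_scores(0.71, 0.93)
print({name: round(value, 2) for name, value in scores.items()})
```

Note that SS ranges from 0 to 2 while US and MS range from 0 to 1, so any cut-off chosen for one method does not transfer directly to another.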

Collection and Curation of Synthetic Compounds
Retrieve synthetic compounds from the ChemBridge and ChemSynthesis databases.
– Restricted to the 6 biological elements C, H, N, O, P, and S.
Mass distribution:
– ChemBridge (150 – 700 Da)
– ChemSynthesis (50 – 300 Da)
1,400 compounds were randomly selected for training and 5,320 compounds were randomly chosen for testing.
The mammalian scaffold list was reduced to 1,400 compounds (50 – 700 Da).
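The element restriction and the random train/test selection can be sketched with the standard library. The helper names, the fixed seed, and the use of integer placeholders for compounds are all assumptions for illustration; the presentation does not describe how the sampling was performed.

```python
import random
from typing import Iterable, List, Sequence, Tuple

BIOLOGICAL_ELEMENTS = frozenset({"C", "H", "N", "O", "P", "S"})


def uses_only_biological_elements(elements: Iterable[str]) -> bool:
    """True when every element symbol is one of C, H, N, O, P, S."""
    return set(elements) <= BIOLOGICAL_ELEMENTS


def split_train_test(compounds: Sequence, n_train: int, n_test: int, seed: int = 0) -> Tuple[List, List]:
    """Draw disjoint random training and testing sets without replacement."""
    if n_train + n_test > len(compounds):
        raise ValueError("not enough compounds for the requested split")
    picked = random.Random(seed).sample(list(compounds), n_train + n_test)
    return picked[:n_train], picked[n_train:]


print(uses_only_biological_elements(["C", "H", "N", "O"]))  # True
print(uses_only_biological_elements(["C", "H", "Cl"]))      # False

# Placeholder identifiers standing in for retrieved synthetic compounds.
train, test = split_train_test(range(10_000), n_train=1_400, n_test=5_320)
print(len(train), len(test))  # 1400 5320
```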

Cross Validation Average Accuracy Results

Metric        US     MS     SS     5US    5MS    5SS
SENS AVG      70%    59%    88%    83%    84%    86%
SENS STDEV    2%                   1%
SPEC AVG      65%    71%    57%    75%    76%    78%
SPEC STDEV    3%                   2%
MCC AVG       [missing]            [missing]
MCC STDEV     [missing]            1%
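The three reported metrics follow from a confusion matrix using their standard definitions: sensitivity (SENS), specificity (SPEC), and the Matthews correlation coefficient (MCC). The sketch below uses made-up counts, not data from the presentation, and the function name `classification_metrics` is hypothetical.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)  # fraction of true mammalian compounds recovered
    specificity = tn / (tn + fp)  # fraction of non-mammalian compounds rejected
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"SENS": sensitivity, "SPEC": specificity, "MCC": mcc}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, fn=2, tn=6, fp=4)
print({name: round(value, 3) for name, value in metrics.items()})
```

MCC stays informative when the positive and negative classes differ in size, which may be why it is reported alongside sensitivity and specificity.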

Leave-One-Out Accuracy
Sensitivity = 96%

Prospective Results of Synthetic Compounds
54% eliminated as non-mammalian

Conclusions
– A novel way of utilizing known mammalian metabolites (the scaffolds database) to identify synthetic chemical compounds with mammalian substructures.
– The results show a sensitivity of 96% in the mammalian scaffolds leave-one-out experiments.
– The system was able to eliminate 54% of a random set of synthetic compounds.

Ongoing Work
– Exploring further improvements in accuracy by using known biological pathway information.
– Annotating PubChem.
– Annotating existing and potential drugs.
– Database-independent compound search: generate all possible structures of a given formula and rank them.

Thank you!