05/02/2008. Jae Hyun Kim. Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor. Faulon, J. L., M. Misra, et al. (2008), Bioinformatics 24(2):
Contents: Terminology; Motivation; Method (Molecular Signature, Signature Kernel, Signature Product Kernel); Results; Conclusion.
Terminology (1): Catalyst: increases the rate of a chemical reaction or biological process and remains unchanged itself. Enzyme: a biomolecule that catalyzes chemical reactions; usually a protein. Metabolite: an intermediate or product of metabolism; the term is restricted to small molecules.
Terminology (2): Inhibitor: a molecule that decreases enzyme activity. Inhibitors may compete with substrates. Many drugs and poisons are enzyme inhibitors.
Enzyme Commission (EC) Number: a numerical classification scheme for enzyme-catalyzed reactions with four levels of hierarchy. Example: EC 3.4.11.4 (tripeptide aminopeptidases). EC 3: hydrolases (enzymes that use water to break up some other molecule). EC 3.4: hydrolases that act on peptide bonds. EC 3.4.11: hydrolases that cleave off the amino-terminal amino acid from a polypeptide. EC 3.4.11.4: hydrolases that cleave off the amino-terminal end from a tripeptide.
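The four-level hierarchy can be illustrated with a short Python sketch that expands an EC number into its successive prefixes. The function name `ec_levels` is hypothetical, and EC 3.4.11.4 (tripeptide aminopeptidases) is used only as an illustrative input.

```python
def ec_levels(ec_number: str) -> list[str]:
    """Return the four hierarchical prefixes of an EC number, broadest first."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    # "3.4.11.4" -> "3", "3.4", "3.4.11", "3.4.11.4"
    return [".".join(parts[:i]) for i in range(1, 5)]


print(ec_levels("3.4.11.4"))  # ['3', '3.4', '3.4.11', '3.4.11.4']
```

Each prefix names one level of the classification, so a classifier can be trained per level (class, subclass, sub-subclass, full number).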
Motivation: Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor. Key points: protein-chemical interaction; large scale; machine-learning technique.
Molecular Signature: $G = (V, E)$ is the molecular graph, where $V$ is the vertex (atom) set and $E$ is the edge (bond) set. Atomic signature: a canonical representation of the subgraph surrounding a particular atom, including atoms and bonds up to a predefined distance (height). ${}^{h}\sigma_G(x)$ is the atomic signature in $G$ rooted at $x$ of height $h$. Molecular signature of $G$: ${}^{h}\sigma(G) = \sum_{x \in V} {}^{h}\sigma_G(x)$. Height: chemicals 0~6; proteins 6~18 (amino acid residues 1~7).
Molecular Signature: Example: leucine, isoleucine, glycine. Signatures are written by a depth-first search up to "height" deep: '(' going down, ')' going back up. Atom types: c_, n_: sp3 carbon/nitrogen atom; c=, o=: sp2 (double-bond) carbon/oxygen atom; h_: hydrogen.
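The depth-first construction can be sketched in Python as follows. This is a simplified illustration, not the canonicalization used in the paper: it unfolds the neighborhood as a tree, excludes only the atom it just came from, sorts branches as strings, and ignores bond types. The molecular signature is assumed here to be the multiset of atomic signatures over all atoms. The function names, the hydrogen-suppressed glycine graph, and the `o_` label (assumed by analogy for an sp3 oxygen) are all hypothetical.

```python
from collections import Counter


def atomic_signature(atoms, bonds, root, height):
    """Simplified atomic signature: depth-first walk from `root`, `height` bonds deep."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def walk(node, parent, depth):
        if depth == 0:
            return atoms[node]
        # '(' when going down a bond, ')' when coming back up.
        branches = sorted(
            walk(n, node, depth - 1) for n in neighbors[node] if n != parent
        )
        return atoms[node] + "".join(f"({b})" for b in branches)

    return walk(root, None, height)


def molecular_signature(atoms, bonds, height):
    """Assumed definition: occurrence counts of the atomic signatures of all atoms."""
    return Counter(
        atomic_signature(atoms, bonds, x, height) for x in range(len(atoms))
    )


# Hydrogen-suppressed glycine: N - CH2 - C(=O) - O
glycine_atoms = ["n_", "c_", "c=", "o=", "o_"]
glycine_bonds = [(0, 1), (1, 2), (2, 3), (2, 4)]

print(atomic_signature(glycine_atoms, glycine_bonds, 1, 1))  # c_(c=)(n_)
print(molecular_signature(glycine_atoms, glycine_bonds, 1))
```

Sorting the branches makes the string independent of the order in which bonds are listed, which is the role the canonical representation plays in the definition above.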
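Reaction Signature: General form of an enzymatic reaction $R$: $s_1 S_1 + s_2 S_2 + \dots + s_n S_n \rightarrow p_1 P_1 + p_2 P_2 + \dots + p_m P_m$, with substrates $S_i$, products $P_j$, and stoichiometric coefficients $s_i$, $p_j$. The height $h$ signature of reaction $R$ is built from the height $h$ signatures of these molecules.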
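The formula for the reaction signature did not survive in this transcript. A minimal Python sketch, assuming the reaction signature is the stoichiometry-weighted sum of the product signatures minus that of the substrate signatures, could look like this. The function name and the toy feature names are hypothetical.

```python
from collections import Counter


def reaction_signature(substrates, products):
    """Assumed form: sum_j p_j * sig(P_j) - sum_i s_i * sig(S_i).

    `substrates` and `products` are lists of (coefficient, signature) pairs,
    where each signature maps an atomic-signature string to its count.
    """
    total = Counter()
    for coeff, sig in products:
        for feature, count in sig.items():
            total[feature] += coeff * count
    for coeff, sig in substrates:
        for feature, count in sig.items():
            total[feature] -= coeff * count
    # Features untouched by the reaction cancel out and are dropped.
    return {f: c for f, c in total.items() if c != 0}


# Toy reaction 2 S -> P with made-up features "a", "b", "c".
toy = reaction_signature(
    substrates=[(2, {"a": 1, "b": 1})],
    products=[(1, {"a": 2, "c": 1})],
)
print(toy)
```

Under this assumption, only the atomic environments that a reaction creates or destroys remain in its signature.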
Pairwise Kernel: Goal: predict/classify protein-protein interactions. This requires measuring the similarity between two pairs of proteins with a kernel function $K((X_1, X_2), (X'_1, X'_2))$. How should similarity between pairs be measured?
Kernel Types: (1) Pairwise similarity by component similarity: if $X_1 \sim X_1'$ and $X_2 \sim X_2'$, then $(X_1, X_2) \sim (X_1', X_2')$. (2) Assess similarity between pairs directly: $x_{12} = (x_{1i} x_{2j} + x_{2i} x_{1j})$ is the pairwise representation of $(X_1, X_2)$. Two notions are contrasted: similarity inside the pair and similarity between pairs. From Ben-Hur, A. and W. S. Noble (2005). "Kernel methods for predicting protein-protein interactions." Bioinformatics 21 Suppl 1: i38-46.
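The two views can be related with a small Python sketch. It assumes a linear base kernel on plain feature vectors and the symmetric pairwise kernel $K(X_1, X_1') K(X_2, X_2') + K(X_1, X_2') K(X_2, X_1')$; the function names are hypothetical. In this sketch the inner product of the explicit pairwise representations equals twice the pairwise kernel.

```python
def dot(x, y):
    """Linear base kernel on equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def pairwise_kernel(x1, x2, y1, y2, base=dot):
    """Similarity between the pairs (x1, x2) and (y1, y2) from component similarities."""
    return base(x1, y1) * base(x2, y2) + base(x1, y2) * base(x2, y1)


def explicit_pair_features(x1, x2):
    """Pairwise representation with entries x1[i]*x2[j] + x2[i]*x1[j]."""
    n = len(x1)
    return [x1[i] * x2[j] + x2[i] * x1[j] for i in range(n) for j in range(n)]


x1, x2 = [1, 0], [0, 1]
y1, y2 = [1, 1], [0, 1]

k_pair = pairwise_kernel(x1, x2, y1, y2)
k_explicit = dot(explicit_pair_features(x1, x2), explicit_pair_features(y1, y2))
print(k_pair, k_explicit)  # 1 2
```

The kernel form avoids building the explicit representation, whose length grows with the square of the number of features.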
Signature Kernel: Definition. Applies to chemicals, proteins, and reactions.
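The definition itself is not preserved in this transcript. A minimal Python sketch, assuming the signature kernel is the inner product of two signature vectors of occurrence counts, is given below. The normalized variant is an additional assumption, and all names and toy counts are hypothetical.

```python
import math
from collections import Counter


def signature_kernel(sig_a, sig_b):
    """Assumed definition: inner product of two signature count vectors."""
    return sum(count * sig_b.get(feature, 0) for feature, count in sig_a.items())


def normalized_signature_kernel(sig_a, sig_b):
    """Cosine-style normalization so that k(A, A) = 1 (an assumption, not from the slides)."""
    denom = math.sqrt(signature_kernel(sig_a, sig_a) * signature_kernel(sig_b, sig_b))
    return signature_kernel(sig_a, sig_b) / denom if denom else 0.0


# Toy height-1 signatures given as occurrence counts.
sig_a = Counter({"c_(c_)": 2, "c_(c_)(c_)": 1})
sig_b = Counter({"c_(c_)": 1, "o_(c_)": 1})

print(signature_kernel(sig_a, sig_b))  # 2
```

Because the inputs are just count vectors, the same function can be applied to chemicals, proteins, or reactions once each has a signature.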
Signature Product Kernel (1/2): P: protein, C: chemical. Definition: signature of the complex P-C. Two pairs of P-C interaction: (P, C) and (Q, D).
Signature Product Kernel (2/2): the derivation continues ("Similarly, ..." and "Therefore, ...") for the two pairs (P, C) and (Q, D).
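The formulas on these two slides are not preserved. A minimal Python sketch, assuming the signature of a complex is the tensor product of the protein and chemical signatures, shows why the kernel between two complexes then factors into a product of two signature kernels. All names and toy counts are hypothetical.

```python
from collections import Counter


def dot(u, v):
    """Inner product of two sparse count vectors."""
    return sum(count * v.get(feature, 0) for feature, count in u.items())


def complex_signature(protein_sig, chemical_sig):
    """Assumed definition: tensor product of the two signatures.

    Each feature is a (protein feature, chemical feature) pair whose value
    is the product of the two counts.
    """
    return Counter(
        {
            (fp, fc): cp * cc
            for fp, cp in protein_sig.items()
            for fc, cc in chemical_sig.items()
        }
    )


def signature_product_kernel(p, c, q, d):
    """Kernel between complexes (P, C) and (Q, D) as k(P, Q) * k(C, D)."""
    return dot(p, q) * dot(c, d)


P, Q = {"a": 1, "b": 2}, {"a": 3, "b": 1}
C, D = {"x": 2}, {"x": 1, "y": 4}

explicit = dot(complex_signature(P, C), complex_signature(Q, D))
print(explicit, signature_product_kernel(P, C, Q, D))  # 10 10
```

The factored form never materializes the tensor product, which matters when protein and chemical signatures each have many features.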
Signature Kernel: Example (height 1), based on the number of occurrences of each atomic signature.
Signature Product Kernel: Example
Signature Similarity vs. Sequence Alignment Scores: computed for every pair of amino acids. Correlation: chemically similar amino acids have a high BLOSUM62 score.
EC Number Classification: Positive examples: downloaded from KEGG (more than 50, max 500). Negative examples: equal number, randomly selected. Signature kernel, 5-fold cross-validation (CV). Two settings: using only reactions; using only protein sequences.
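The balanced sampling and 5-fold split can be sketched in plain Python. How the negatives were actually drawn is not stated on the slide, so this sketch simply assumes uniform sampling from candidates that are not positives; the function names and the integer stand-ins for examples are hypothetical.

```python
import random


def balanced_dataset(positives, candidates, seed=0):
    """Pair every positive with an equal number of randomly chosen negatives."""
    rng = random.Random(seed)
    positive_set = set(positives)
    pool = [c for c in candidates if c not in positive_set]
    negatives = rng.sample(pool, len(positives))  # ValueError if the pool is too small
    return [(x, 1) for x in positives] + [(x, 0) for x in negatives]


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


data = balanced_dataset(positives=list(range(10)), candidates=list(range(100)))
splits = list(k_fold_indices(len(data), k=5))
print(len(data), [len(test) for _, test in splits])  # 20 [4, 4, 4, 4, 4]
```

Each example lands in exactly one test fold, so every prediction used for scoring comes from a model that did not see that example.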
EC Classification: using both sequences and reactions with the signature product kernel. Results are shown for Class 1, Class 1.1, Class 1.1.1, and a fourth-level class.
Comparison with other Methods: Accuracy = (TP+TN)/(TP+TN+FP+FN); AUC = area under the curve; Precision = TP/(TP+FP); Sensitivity = TP/(TP+FN); Specificity = TN/(TN+FP); Jaccard coefficient = TP/(TP+FP+FN). A larger number indicates better results.
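The count-based measures can be computed directly from a confusion matrix, as in this short Python sketch. The function name and the example counts are hypothetical, and AUC is left out because it needs ranked prediction scores rather than counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Measures from the slide, computed from confusion-matrix counts.

    No guard against empty denominators is included in this sketch.
    """
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "jaccard": tp / (tp + fp + fn),
    }


metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The Jaccard coefficient ignores true negatives, which makes it informative when negatives are abundant and easy to classify.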
Predicting New Enzyme Interactions: Test set: EC numbers accepted in September 2006. Task: predict whether or not a given enzyme will catalyze a given reaction, using the signature product kernel.
Predict DRUGBANK Using KEGG: signature product kernel; area under ROC = 0.74. Classes: Class I: both in training set; Class II: different partners; Class III: only target; Class IV: only drug; Class V: none.
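One possible reading of the five classes is sketched below in Python. It assumes Class I means the pair itself occurs in the training set, Class II means both the target and the drug occur in training but only with different partners, Classes III and IV mean only the target or only the drug occurs, and Class V means neither does. This interpretation, the function name, and the toy identifiers are assumptions.

```python
def interaction_class(target, drug, training_pairs):
    """Assign a test (target, drug) pair to Class I-V under the assumed reading."""
    training_pairs = set(training_pairs)
    known_targets = {t for t, _ in training_pairs}
    known_drugs = {d for _, d in training_pairs}

    if (target, drug) in training_pairs:
        return "I"    # both seen, interacting with each other
    if target in known_targets and drug in known_drugs:
        return "II"   # both seen, but with different partners
    if target in known_targets:
        return "III"  # only the target seen
    if drug in known_drugs:
        return "IV"   # only the drug seen
    return "V"        # neither seen


training = [("T1", "D1"), ("T2", "D2")]
for pair in [("T1", "D1"), ("T1", "D2"), ("T1", "D9"), ("T9", "D1"), ("T9", "D9")]:
    print(pair, interaction_class(*pair, training))
```

Grouping test pairs this way separates cases the model could memorize from cases that require generalizing to unseen targets or drugs.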
Conclusion: A unified method for predicting protein-chemical interactions. The atomistic structure representation of proteins encompasses information stored in substitution matrices.