Cheminformatics in Drug Discovery and Chemical Genomics Research
Weifan Zheng, Ph.D.
Associate Professor, Department of Pharmaceutical Sciences, BRITE Institute, NC Central University
Adjunct Associate Professor, Department of Medicinal Chemistry, University of North Carolina at Chapel Hill
UKY Seminar
Topics to Be Covered
- Biotech/Pharma; Orphan Disease; Chemical Genomics
- Computational Needs
- Compound Collection; Docking and Scoring; Data Analytics
- CECCR Cheminformatics Center
Drug Discovery & Development Pipeline
Phases and Costs of Drug Discovery
Drug Discovery Process and the Roles of CADD
- GR: Genetic Research; DR: Discovery Research; DD: Drug Discovery
- CADD: computer-assisted drug discovery
- ADMET: absorption, distribution, metabolism, elimination, toxicity
[Figure: pipeline stages GR, DR, DD, Preclin, IND, and clinical trials (I, II, III); CADD stages TH, LC, H2L, LO, T2H]
Orphan Drug Discovery
[Figure: Basic Science, then Drug Discovery / Translational Research (LG, LO, Pre-C), then Clinical Trials; cost markers $ and $$; e.g., CHDI, ALS, SMA]
Informatics in the Core of the Discovery Activities
[Figure: SMA; Compound Management Center; Compound Vendors; Contract Chem Labs; Assay/Screening Centers; Informatics System; Lead Dev Team; Steering Committee. Legend: material + info; data flow; minimize interactions by IS]
Human Genome Project Success “Genome announcement 'technological triumph' Milestone in genetics ushers in new era of discovery, responsibility” CNN, June 26, 2000
Chemogenomics/Chemical Genomics Chris Austin F. Collins
Chemical Genomics: Google hit counts (Oct. 16, 2006)
- Chemogenomics: 69,000
- Chemical genomics: 113,000
- Chemical biology: 4,210,000
- Chemical genetics: 104,000
Chemical genetics is a research method that uses small molecules to change the way proteins work, directly and in real time rather than indirectly by manipulating their genes. It is used to identify which proteins regulate different biological processes, to understand in molecular detail how proteins perform their biological functions, and to identify small molecules that may be of medical value.
to create a national resource in chemical probe development. The center uses the latest industrial-scale technologies to collect data that are useful for defining the cross-section between chemical space and biological activity, and it does so on a genomic scale.
NIH Molecular Library Initiative (MLI)
- Chemical Synthesis Centers: CombiChem, parallel synthesis, DOS; 4 centers + DPI; 100K – 1M compounds
- MLSCN (9+1): 9 centers, 1 NIH intramural; 20 x 10 = 200 assays
- ECCR (6): Exploratory Centers
- PubChem (NLM)
- Result: compounds x 200 assays SAR matrix
Biological Assay Data
- Biochemical assays
- Cell-based functional assays
- Phenotypic assays
- Databases: PubChem, ChemBank, WOMBAT, Jubilant, Gvk/Bio
High Throughput Chemistry and Screening: Informatics
[Figure: Virtual Libraries; Diverse Lib Design; Targeted Lib Design; Combinatorial Synthesis; Real Libraries; HTS; SAR Data; KDD (QSAR, P.R.); Rules. Applications: Drug Discovery, Chemical Genomics. Informatics roles: Logistics, Scientific]
Challenges in Combinatorial Chemistry
- Three substituent positions, R1 (3000), R2 (3000), R3 (3000)
- 3,000^3 / 1,000 per week = ~0.5 million years!
- Library Design: rational selection of a subset of building blocks to obtain a maximum amount of information
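The estimate on this slide can be checked with a few lines of arithmetic. This is a minimal sketch: the function name and the 52-weeks-per-year conversion are assumptions made for illustration and do not come from the slides.

```python
def years_to_enumerate(blocks_per_position, positions, compounds_per_week,
                       weeks_per_year=52):
    """Return (library size, years needed to make every compound once)."""
    library_size = blocks_per_position ** positions
    weeks = library_size / compounds_per_week
    return library_size, weeks / weeks_per_year


# 3,000 building blocks at each of 3 positions, 1,000 compounds per week.
size, years = years_to_enumerate(3000, 3, 1000)
print(f"{size:,} compounds, about {years:,.0f} years")
# prints "27,000,000,000 compounds, about 519,231 years"
```

The full library is 2.7 x 10^10 compounds, which is why a designed subset is needed.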
Design for Activity: Similarity
- If we know a compound is active, and we want to design a set of compounds that may be active against the same target, we may select a set of compounds that are similar to the active compound
- The similarity principle: similar compounds should have similar biological activity
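Similarity-based selection can be sketched as a ranking by distance in descriptor space. The sketch below assumes each compound is represented by a numeric descriptor vector and uses Euclidean distance as the (dis)similarity measure; the compound names and descriptor values are made up for illustration, and the slides do not say which metric was used.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(active, candidates, n):
    """Return the names of the n candidates closest to the active compound."""
    ranked = sorted(candidates, key=lambda name: euclidean(active, candidates[name]))
    return ranked[:n]


# Hypothetical descriptor vectors (illustrative values only).
active = [1.0, 0.5, 2.0]
candidates = {
    "cpd_A": [1.1, 0.4, 2.1],
    "cpd_B": [3.0, 2.5, 0.0],
    "cpd_C": [0.9, 0.6, 1.8],
    "cpd_D": [5.0, 5.0, 5.0],
}
selected = most_similar(active, candidates, 2)
print(selected)  # prints ['cpd_A', 'cpd_C']
```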
Molecular Identity and Molecular Similarity
[Figure: structures (Str) described by descriptors X1, X2, X3, through X20; plot axes X1 and X2]
Design for General Application: Diversity
Similarity and Diversity
- MaxiMin
- Minimize Sum(1 / (Dij*Dij))
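Both diversity criteria can be sketched in a few lines. The code below is a greedy MaxiMin picker plus a function that evaluates the sum of 1/Dij^2 over a chosen subset. It assumes Euclidean distances between descriptor vectors and made-up 2D points; the greedy strategy is one common way to apply MaxiMin and is not necessarily the procedure used in the talk.

```python
import math


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def maximin_select(points, n, start=0):
    """Greedy MaxiMin: repeatedly add the point whose nearest
    already-selected neighbour is farthest away."""
    selected = [start]
    while len(selected) < n:
        best = max(
            (i for i in range(len(points)) if i not in selected),
            key=lambda i: min(dist(points[i], points[j]) for j in selected),
        )
        selected.append(best)
    return selected


def inverse_square_sum(points, subset):
    """Sum of 1 / Dij^2 over all pairs in the subset (lower = more diverse)."""
    return sum(
        1.0 / dist(points[i], points[j]) ** 2
        for k, i in enumerate(subset)
        for j in subset[k + 1:]
    )


# Hypothetical 2D descriptor points: three crowded near the origin, two far away.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (0.1, 0.1)]
diverse = maximin_select(points, 3)
clustered = [0, 1, 4]
print(diverse, inverse_square_sum(points, diverse), inverse_square_sum(points, clustered))
```

The MaxiMin subset avoids the crowded points, so its 1/Dij^2 sum is far lower than that of the clustered subset.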
Cluster Hits Obtained by SAGE and Random Sampling
Drug Discovery & Development Failures
Venkatesh & Lipper, J. Pharm. Sci. 89, (2000)
[Chart values: 39%, 29%, 21%, 6%]
Multi-Factorial Design
Total Score is the Weighted Sum of Individual Terms
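A weighted-sum score of this kind can be sketched as follows. The term names echo the penalty terms shown on the next slide (Lipinski properties, P450 activity, diversity), but the weights and penalty values are hypothetical and chosen only to illustrate the calculation.

```python
def total_score(terms, weights):
    """Weighted sum of individual penalty terms (lower is better here)."""
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical weights and penalty values (illustrative only).
weights = {"lipinski": 1.0, "p450": 2.0, "diversity": 0.5}
initial_library = {"lipinski": 4.0, "p450": 3.0, "diversity": 6.0}
designed_library = {"lipinski": 1.0, "p450": 0.5, "diversity": 2.0}

print(total_score(initial_library, weights))   # prints 13.0
print(total_score(designed_library, weights))  # prints 3.0
```

Changing the weights shifts the balance among the design criteria without changing the optimization procedure itself.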
[Figure: penalty scores vs. iteration, from Initial Library to Better Library to Optimal Library; penalty terms: Lipinski Properties, P450 Activity, Diversity; building-block sets R1 and R2]
Designed Library Has a Better MW-clogP Distribution
[Figure panels: the initial ten solutions (undesigned); the final ten solutions (well designed); axis: clogP]
SPE Algorithm (Agrafiotis)
- Iterative random sampling of point pairs (a, b)
- D(a,b): distance in the original space; D'(a,b): distance in the embedding space (2D)
- If D' > D, move a, b closer
- If D' < D, move a, b apart
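The pairwise update rule can be sketched as below. This is a simplified illustration of stochastic proximity embedding, not the published implementation: the linear learning-rate schedule, the step count, the `stress` measure, and the example points are all assumptions.

```python
import math
import random


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def stress(points, coords):
    """Normalized squared mismatch between original (D) and embedded (D') distances."""
    num = den = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d_orig = dist(points[i], points[j])
            num += (dist(coords[i], coords[j]) - d_orig) ** 2
            den += d_orig ** 2
    return num / den


def spe_embed(points, init, steps=20000, lam_start=1.0, lam_end=0.01, seed=0):
    """Refine the initial low-dimensional coordinates by random pairwise updates."""
    rng = random.Random(seed)
    coords = [list(c) for c in init]
    eps = 1e-9  # avoids division by zero for coincident points
    for step in range(steps):
        lam = lam_start + (lam_end - lam_start) * step / steps  # decreasing rate
        a, b = rng.sample(range(len(points)), 2)
        d_orig = dist(points[a], points[b])
        d_emb = dist(coords[a], coords[b])
        # Negative when D' > D (pull together), positive when D' < D (push apart).
        shift = lam * 0.5 * (d_orig - d_emb) / (d_emb + eps)
        for k in range(len(coords[a])):
            delta = shift * (coords[a][k] - coords[b][k])
            coords[a][k] += delta
            coords[b][k] -= delta
    return coords


# Four corners of a unit square, given in 3D, embedded into 2D.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
init_rng = random.Random(1)
init = [(init_rng.random(), init_rng.random()) for _ in points]
embedded = spe_embed(points, init)
print(stress(points, init), stress(points, embedded))
```

Because each step touches only one pair, the cost per step does not depend on the size of the collection, which is what makes the approach attractive for large compound sets.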
Chemical Space - Compound Collection Comparison
SPE Embedding of ChemSpace
Quantitative Structure-Activity Relationship (QSAR)
- Input table: structures str1 to str10 paired with activities a1 to a10
- Plots: predicted vs. actual (q2 = 0.8; R2 = 0.75)
- Methods: multiple linear regression (MLR); partial least squares (PLS); artificial neural nets; k-nearest neighbor (kNN)
Basic Assumptions of KNN-QSAR Method
- Structurally similar compounds should have similar biological activities
- Biological similarities are often due to similarities of substructures (pharmacophore)
- Biological activities can be estimated from molecular similarities, which are calculated with pharmacophore-specific descriptors
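The core idea, estimating a compound's activity from its nearest neighbours in descriptor space, can be sketched as follows. This is a deliberately reduced illustration: it uses an unweighted mean over k neighbours, Euclidean distance, and a toy one-descriptor data set, and it omits the descriptor (variable) selection that a full kNN-QSAR procedure would include.

```python
import math


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(query, descriptors, activities, k, exclude=None):
    """Mean activity of the k compounds nearest to the query."""
    order = sorted(
        (i for i in range(len(descriptors)) if i != exclude),
        key=lambda i: dist(query, descriptors[i]),
    )
    nearest = order[:k]
    return sum(activities[i] for i in nearest) / len(nearest)


def loo_q2(descriptors, activities, k):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS."""
    mean_y = sum(activities) / len(activities)
    press = sum(
        (y - knn_predict(x, descriptors, activities, k, exclude=i)) ** 2
        for i, (x, y) in enumerate(zip(descriptors, activities))
    )
    ss = sum((y - mean_y) ** 2 for y in activities)
    return 1.0 - press / ss


# Toy data: one descriptor, activity equal to the descriptor value.
descriptors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
activities = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
q2 = loo_q2(descriptors, activities, k=2)
print(round(q2, 3))  # prints 0.743
```

On this toy set the interior compounds are predicted exactly; the loss in q2 comes from the two end points, whose neighbours all lie on one side.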
Comparison of CoMFA, GA-PLS, and KNN-QSAR
QSAR Based Virtual Screening for GPCR Ligand Design
Docking and Scoring
- In the early 1980s, I.D. Kuntz developed the first computerized molecular docking program: DOCK
- Other programs: GOLD, FRED, GLIDE, FLEXX, AutoDock, ICM
[Figure: X-ray structure]
Our Approach to Derive DT-SCORE
1. Use Delaunay tessellation to derive geometrical chemical descriptors of the protein-ligand interface
2. Establish a correlation between the geometrical chemical descriptors and protein-ligand binding affinity using the Perceptron Learning algorithm
Flowchart to Derive DT-SCORE
1. Receptor-ligand complexes
2. Descriptor generation: tessellation of the receptor-ligand interface
3. Model generation & prediction: Perceptron Learning algorithm
4. Output: binding constant, DT-SCORE
Delaunay Tessellation in 2D
- Delaunay tessellation gives a rigorous definition of nearest neighbors in 2D & 3D space
- Nearest neighbors are unambiguously defined in sets of three (in 2D) and in sets of four (in 3D)
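In 2D, the "sets of three" are the triangles whose circumcircle contains no other point. The brute-force sketch below applies that empty-circumcircle test to every triple. It is meant only to show the definition on a handful of points: it is far too slow for real structures, and it is not the code used for DT-SCORE. In practice a library routine such as `scipy.spatial.Delaunay` would be used, and in 3D the same test is applied to circumspheres of sets of four points.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if the points are collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_2d(points):
    """Keep every triangle whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-12
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        ):
            triangles.append((i, j, k))
    return triangles


# Three outer points and one point inside the triangle they form.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0), (1.0, 0.5)]
triangles = delaunay_2d(points)
print(triangles)  # prints [(0, 1, 3), (0, 2, 3), (1, 2, 3)]
```

The large outer triangle (0, 1, 2) is rejected because its circumcircle contains point 3, so every retained triangle joins the inner point to two of its neighbours.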
Delaunay Tessellation of the Receptor-Ligand Interface
A Detailed View of Active Site Tessellation
- An atom is shared by several tetrahedra
[Figure: tetrahedron vertices labeled R and L]
3 Types of Tetrahedra at the Receptor-Ligand Interface
- RLLL: formed by 1 receptor atom and 3 ligand atoms
- RRLL: formed by 2 receptor atoms and 2 ligand atoms
- RRRL: formed by 3 receptor atoms and 1 ligand atom
Each of the above tetrahedron types is further discriminated by the atom types on the vertices
Geometrical Descriptors According to Tetrahedron Types
- RRRL: NCNO, ONOS, ……
- RRLL: CNOO, NOCS, ……
- RLLL: COSC, OSXN, ……
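Turning a tessellated interface into descriptor counts can be sketched as below. The sketch assumes each tetrahedron is given as four (origin, element) pairs, with origin "R" for receptor or "L" for ligand and single-letter element symbols. The rule for building the atom-type key (sorted receptor elements followed by sorted ligand elements) is an assumption made for illustration; the slides do not specify how the four-letter labels are ordered.

```python
from collections import Counter


def tetrahedron_descriptor(vertices):
    """Return (class, atom-type key) for an interface tetrahedron, else None.

    vertices: four (origin, element) pairs, origin 'R' (receptor) or 'L' (ligand).
    """
    receptor = sorted(el for origin, el in vertices if origin == "R")
    ligand = sorted(el for origin, el in vertices if origin == "L")
    if not receptor or not ligand:
        return None  # all-receptor or all-ligand: not at the interface
    cls = "R" * len(receptor) + "L" * len(ligand)
    return cls, "".join(receptor + ligand)


def descriptor_counts(tetrahedra):
    """Count occurrences of each (class, atom-type key) descriptor."""
    counts = Counter()
    for tet in tetrahedra:
        desc = tetrahedron_descriptor(tet)
        if desc is not None:
            counts[desc] += 1
    return counts


# Hypothetical tetrahedra (illustrative only).
tetrahedra = [
    [("R", "N"), ("R", "C"), ("R", "N"), ("L", "O")],
    [("R", "C"), ("R", "N"), ("R", "N"), ("L", "O")],
    [("R", "C"), ("R", "N"), ("L", "O"), ("L", "O")],
    [("R", "C"), ("R", "C"), ("R", "C"), ("R", "N")],
]
counts = descriptor_counts(tetrahedra)
print(counts[("RRRL", "CNNO")], counts[("RRLL", "CNOO")])  # prints 2 1
```

Each complex then contributes one row of such counts to the descriptor table.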
“QSAR” Input Table (R·L Interaction Pattern - Binding Affinity Relationship Table)
Receptor-Ligand Complex   Binding Affinity   RLLL (NCNO ONOS …)   RRLL (CNOO NOCS …)   RRRL (COSC OSXN …)
(R L)1                    y1                 0 3 …                2 8 …                1 3 …
(R L)2                    y2                 1 7 …                3 1 …                0 3 …
…                         …                  …                    …                    …
(R L)m-1                  ym-1               3 4 …                0 5 …                4 6 …
(R L)m                    ym                 2 0 …                2 2 …                1 0 …
Single-Layer Perceptron Network
- Input layer: N inputs x1, x2, x3, through xN, with weights w1, w2, w3, through wN
- Output layer: y
- xi = input of neuron
- wi = weight associated with the input xi
- fn(.) = activation function of output neuron
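A single-layer network of this shape computes the output as the activation function applied to the weighted sum of the inputs. The sketch below assumes an identity activation and trains the weights with the delta rule; both are stand-ins chosen for illustration, since the slides do not specify the activation function or the exact learning procedure. The data are made up.

```python
def predict(weights, x, activation=lambda s: s):
    """y = f(sum_i w_i * x_i); identity activation by default (an assumption)."""
    return activation(sum(w * xi for w, xi in zip(weights, x)))


def train(samples, targets, lr=0.1, epochs=500):
    """Delta-rule training for a single linear output neuron."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            error = y - predict(weights, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Toy data generated by y = 2*x1 + 1*x2 (illustrative only).
samples = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [2.0, 1.0, 3.0]
weights = train(samples, targets)
print([round(w, 3) for w in weights])  # prints [2.0, 1.0]
```

In the DT-SCORE setting the inputs would be the tetrahedron descriptor counts for a complex and the output its predicted binding affinity.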
Training Vs. Test Set Selection and Validation
- Entire dataset (264 complexes)
- Training set: 80% (214 complexes); model development (q2)
- Test set: 20% (50 complexes); prediction of the test set (R2)
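A split with these set sizes can be sketched as below. The slides do not say how the 50 test complexes were chosen, so the seeded random shuffle and the complex identifiers are assumptions made for illustration.

```python
import random


def split_dataset(items, n_test, seed=0):
    """Shuffle a copy of the items and hold out n_test of them as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


complexes = [f"complex_{i}" for i in range(264)]  # hypothetical identifiers
training_set, test_set = split_dataset(complexes, n_test=50)
print(len(training_set), len(test_set))  # prints 214 50
```

Keeping the test set out of model development is what allows the test-set R2 to serve as an external check on the cross-validated q2.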
Model Stability
- Average value from multiple (ca. 80) models
Actual vs. Predicted Binding Affinity for the Training Set
- 214 complexes: q2 = 0.73
Actual vs. Predicted Binding Affinity for the Test Set
- 50 complexes: R2 = 0.61
Acknowledgements
NCCU and UNC
- Jerry Ebalunode, Ph.D., BRITE
- Min Shen, Ph.D., Lexicon
- Alex Tropsha, Ph.D., Chair of MedChem, UNC-Chapel Hill
GSK
- Sunny Hung (GSK)
- George Seibel (JNJ)
- Ken Kopple (retired)
- Jeff Wiseman (Locus)
Lilly
- Minmin Wang
- Greg Durst
- Jim Wikel (retired)
Funding
- NIH P20HG
- NIH R21GM
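Comparison with Published Scoring Functions
- BLEEP: training set size 351; test set size 90; training set q2 N/A; test set R2 0.53
- PMF: training set size 697; test set size 77; training set q2 N/A; test set R2 0.61
- SMoG: training set q2 N/A; test set R2 0.304
- SMoG: training set q2 N/A; test set R2 0.435
- Wang
- DT-Score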