Cheminformatics in Drug Discovery and Chemical Genomics Research Weifan Zheng, Ph.D. Associate Professor Department of Pharmaceutical Sciences BRITE Institute,

Presentation transcript:

Cheminformatics in Drug Discovery and Chemical Genomics Research. Weifan Zheng, Ph.D., Associate Professor, Department of Pharmaceutical Sciences, BRITE Institute, NC Central University; Adjunct Associate Professor, Department of Medicinal Chemistry, University of North Carolina at Chapel Hill. UKY Seminar, Weifan Zheng, Ph.D.

Topics to Be Covered: Biotech/Pharma; Orphan Disease; Chemical Genomics; Computational Needs; Compound Collection; Docking and Scoring; Data Analytics; CECCR Cheminformatics Center

Drug Discovery & Development Pipeline

Phases and Costs of Drug Discovery

Drug Discovery Process and the Roles of CADD. Abbreviations: GR, genetic research; DR, discovery research; DD, drug discovery; CADD, computer-assisted drug discovery; ADMET, absorption, distribution, metabolism, elimination, toxicity. [Figure labels: GR, DR, DD, Preclin, IND, clinical trials (I, II, III); THLC; H2L, LO, T2H; CADD]

Orphan Drug Discovery. [Figure labels: Basic Science; Translational Research; Drug Discovery (LG, LO, Pre-C); Clinical Trials; $, $$; e.g., CHDI, ALS, SMA]

Informatics in the Core of the Discovery Activities. [Figure labels: SMA; Compound Management Center; Compound Vendors; Contract Chem Labs; Assay/Screening Centers; Informatics System (IS); Lead Dev Team; Steering Committee; material + info; data flow; minimize interactions by IS]

Human Genome Project Success: "Genome announcement 'technological triumph': Milestone in genetics ushers in new era of discovery, responsibility" (CNN, June 26, 2000)

Chemogenomics/Chemical Genomics: Chris Austin, F. Collins

Chemical Genomics. Google hit counts (Oct. 16, 2006): chemogenomics, 69,000; chemical genomics, 113,000; chemical biology, 4,210,000; chemical genetics, 104,000.

Chemical genetics is a research method that uses small molecules to change the way proteins work, directly in real time rather than indirectly by manipulating their genes. It is used to identify which proteins regulate different biological processes, to understand in molecular detail how proteins perform their biological functions, and to identify small molecules that may be of medical value.

to create a national resource in chemical probe development. The center uses the latest industrial-scale technologies to collect data that is useful for defining the cross-section between chemical space and biological activity (and do so on genomic scale).

NIH Molecular Library Initiative (MLI). Chemical Synthesis Centers: CombiChem, parallel synthesis, DOS; 4 centers + DPI; 100K – 1M compounds. MLSCN (9+1): 9 centers, 1 NIH intramural; 20 x 10 = 200 assays. PubChem (NLM). ECCR (6): Exploratory Centers. [Figure labels: compounds, 200 assays, SAR matrix]

Biological Assay Data. Assay types: biochemical assays; cell-based functional assays; phenotypic assays. Databases: PubChem; ChemBank; WOMBAT; Jubilant; Gvk/Bio.

High Throughput Chemistry and Screening: Informatics. [Figure labels: Virtual Libraries; Diverse Lib Design; Targeted Lib Design; Combinatorial Synthesis; Real Libraries; HTS; SAR Data; KDD (QSAR, P.R.); Rules; Drug Discovery; Chemical Genomics; Logistics; Scientific]

Topics to Be Covered: Biotech/Pharma; Orphan Disease; Chemical Genomics; Computational Needs; Compound Collection; Docking and Scoring; Data Analytics; CECCR Cheminformatics Center

Challenges in Combinatorial Chemistry. Three variable sites, R1, R2 and R3, with 3,000 building blocks each: 3,000^3 compounds / 1,000 per week = ~0.5 million years! Library design: rational selection of a subset of building blocks to obtain the maximum amount of information.
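
The estimate on this slide can be checked with a few lines of arithmetic. This is only a sketch: the figure of 52 working weeks per year is an assumption, not something stated on the slide.

```python
# Back-of-the-envelope check of the enumeration estimate on the slide.
building_blocks_per_site = 3000   # R1, R2 and R3 each have 3,000 candidates
sites = 3
throughput_per_week = 1000        # compounds made and tested per week
weeks_per_year = 52               # assumption: synthesis and screening run every week

library_size = building_blocks_per_site ** sites
weeks_needed = library_size / throughput_per_week
years_needed = weeks_needed / weeks_per_year

print(f"{library_size:.2e} compounds, about {years_needed / 1e6:.2f} million years")
```

The full library holds 2.7 x 10^10 compounds, which is why selecting a subset of building blocks is unavoidable.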

Design for Activity: Similarity. If we know a compound is active and want to design a set of compounds that may be active against the same target, we may select a set of compounds that are similar to the active compound. The similarity principle: similar compounds should have similar biological activity.
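
A minimal sketch of similarity-based selection follows. The slide does not say which similarity measure is used; the Tanimoto coefficient on binary fingerprints is assumed here because it is a common choice, and the fingerprints, compound names and threshold are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints stored as sets of 'on' bit positions."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_similar(query, candidates, threshold=0.7):
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in candidates.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])


# Toy fingerprints (hypothetical bit positions, not real compounds)
active = {1, 4, 7, 9, 12}
library = {
    "cpd_A": {1, 4, 7, 9, 13},
    "cpd_B": {2, 3, 5},
    "cpd_C": {1, 4, 7, 9, 12, 15},
}
selected = select_similar(active, library, threshold=0.6)
print(selected)
```

Compounds that share most bits with the known active are kept; the dissimilar one is dropped.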

Molecular Identity and Molecular Similarity. [Figure: structures (Str) described by descriptors X1, X2, X3, ..., X20; plot with axes X1 and X2]

Design for General Application: Diversity

Similarity and Diversity. Diversity criteria: MaxiMin; minimize Sum(1 / (Dij * Dij)).
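
The two diversity criteria can be sketched as follows, assuming Euclidean distance in a descriptor space and a greedy MaxiMin heuristic; the slide names the criteria but not the optimizer or the distance measure, and the 2D points are hypothetical.

```python
import math
from itertools import combinations


def maximin_select(points, k, start=0):
    """Greedy MaxiMin: repeatedly add the point whose nearest chosen neighbor is farthest away."""
    chosen = [start]
    while len(chosen) < k:
        best = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: min(math.dist(points[i], points[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


def inverse_square_sum(points, subset):
    """Sum of 1 / Dij^2 over all pairs in the subset; smaller means more spread out."""
    return sum(
        1.0 / math.dist(points[i], points[j]) ** 2
        for i, j in combinations(subset, 2)
    )


# Hypothetical 2D descriptor coordinates
pts = [(0, 0), (0.1, 0), (1, 0), (0, 1), (1, 1), (0.9, 1)]
diverse = maximin_select(pts, 4)
clumped = [0, 1, 4, 5]
print(sorted(diverse), inverse_square_sum(pts, diverse), inverse_square_sum(pts, clumped))
```

The greedy pick covers the four corners, and its inverse-square sum is far smaller than that of the clumped subset, so both criteria agree on which selection is more diverse.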

Cluster Hits Obtained by SAGE and Random Sampling

Drug Discovery & Development Failures (Venkatesh & Lipper, J. Pharm. Sci. 89 (2000)). [Figure: failure categories at 39%, 29%, 21% and 6%; category labels not recoverable]

Multi-Factorial Design

Total Score is the Weighted Sum of Individual Terms

Penalty Scores. [Figure: penalty scores vs. iteration, from initial library to better library to optimal library; term labels: Lipinski Properties P450 Activity Diversity; building blocks R1, R2]
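
A weighted-sum score of this kind might look like the sketch below. The term names, weights and penalty values are hypothetical; only the rule-of-five thresholds are standard, and the deck does not show how each individual term is actually computed.

```python
def lipinski_penalty(mw, clogp, hbd, hba):
    """Count rule-of-five violations: MW > 500, clogP > 5, H-bond donors > 5, acceptors > 10."""
    return sum([mw > 500, clogp > 5, hbd > 5, hba > 10])


def total_score(terms, weights):
    """Weighted sum of individual penalty terms; lower is better."""
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical weights and per-library penalty values, for illustration only
weights = {"lipinski": 1.0, "p450": 2.0, "diversity": 0.5}
initial_library = {"lipinski": 2.0, "p450": 0.4, "diversity": 3.0}
better_library = {"lipinski": 0.5, "p450": 0.1, "diversity": 1.0}

print(total_score(initial_library, weights), total_score(better_library, weights))
```

An optimizer would then swap building blocks and keep changes that lower the total score, which is the trend the penalty-versus-iteration figure describes.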

Designed Library Has a Better MW-clogP Distribution. [Figure: MW vs. clogP for the initial ten solutions (undesigned) and the final ten solutions (well designed)]

Molecular Identity and Molecular Similarity. [Figure: structures (Str) described by descriptors X1, X2, X3, ..., X20; plot with axes X1 and X2]

SPE Algorithm (Agrafiotis). Iterative random sampling: pick a pair of points a and b; D(a,b) is their distance in the original space and D’(a,b) is their distance in the embedding space (2D). If D’ > D, move a and b closer; if D’ < D, move a and b apart.
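
The update rule on this slide can be sketched in a few lines. This is a simplified illustration of stochastic proximity embedding under stated assumptions: Euclidean distances, a linearly decreasing learning rate, and hypothetical parameter values and input points. It is not the published implementation.

```python
import math
import random


def spe_embed(points, dim=2, cycles=50, steps=500, lam=1.0, eps=1e-9, seed=0):
    """Embed points into `dim` dimensions by repeatedly adjusting random pairs."""
    rng = random.Random(seed)
    n = len(points)
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    decay = lam / cycles
    for _ in range(cycles):
        for _ in range(steps):
            a, b = rng.sample(range(n), 2)
            d_orig = math.dist(points[a], points[b])   # D(a, b)
            d_emb = math.dist(coords[a], coords[b])    # D'(a, b)
            # Negative when D' > D (pull closer), positive when D' < D (push apart)
            scale = lam * 0.5 * (d_orig - d_emb) / (d_emb + eps)
            for k in range(dim):
                delta = scale * (coords[a][k] - coords[b][k])
                coords[a][k] += delta
                coords[b][k] -= delta
        lam -= decay  # shrink the step size after every cycle
    return coords


# Three hypothetical compounds described by four descriptors each
original = [(0, 0, 0, 0), (3, 0, 0, 0), (0, 4, 0, 0)]
embedded = spe_embed(original)
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(math.dist(original[i], original[j]), round(math.dist(embedded[i], embedded[j]), 3))
```

Because each step touches only one pair, the cost per update does not depend on the size of the collection, which is what makes the approach attractive for large compound sets.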

Chemical Space - Compound Collection Comparison

Chemical Space - Compound Collection Comparison

Chemical Space - Compound Collection Comparison

SPE Embedding of ChemSpace

Topics to Be Covered: Biotech/Pharma; Orphan Disease; Chemical Genomics; Computational Needs; Compound Collection; Docking and Scoring; Data Analytics; CECCR Cheminformatics Center

Quantitative Structure-Activity Relationship (QSAR). [Table: structures str1 to str10 paired with activity values a1, a2, ...] [Plots: predicted vs. actual; q2 = 0.8, R2 = 0.75] Methods: multiple linear regression (MLR); partial least squares (PLS); artificial neural nets; k-nearest neighbor (kNN).

Basic Assumptions of KNN-QSAR Method. Structurally similar compounds should have similar biological activities. Biological similarities are often due to similarities of substructures (pharmacophore). Biological activities can be estimated from molecular similarities, which are calculated with pharmacophore-specific descriptors.
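
The third assumption, estimating activity from molecular similarity, can be illustrated with a bare k-nearest-neighbor predictor. This sketch assumes Euclidean distance and inverse-distance weighting, uses hypothetical descriptor vectors and activities, and leaves out the descriptor selection step that a full kNN-QSAR workflow would include.

```python
import math


def knn_predict(query, training, k=3):
    """Predict activity as the inverse-distance-weighted mean of the k nearest training compounds."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    weights = [1.0 / (math.dist(query, x) + 1e-9) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Hypothetical (descriptor vector, activity) pairs
training = [
    ((0.0, 0.0), 5.0),
    ((0.1, 0.0), 5.2),
    ((0.0, 0.1), 5.1),
    ((2.0, 2.0), 8.0),
    ((2.1, 2.0), 8.3),
]
prediction = knn_predict((0.05, 0.05), training, k=3)
print(round(prediction, 3))
```

The query sits among the three low-activity compounds, so its predicted activity is their average and the distant high-activity compounds have no influence.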

Comparison of CoMFA, GA-PLS, and KNN-QSAR

QSAR Based Virtual Screening for GPCR Ligand Design

Topics to Be Covered: Biotech/Pharma; Orphan Disease; Chemical Genomics; Computational Needs; Compound Collection; Docking and Scoring; Data Analytics; CECCR Cheminformatics Center

Docking and Scoring. In the early 1980s, I. D. Kuntz developed the first computerized molecular docking program, DOCK. Other docking programs: GOLD, FRED, GLIDE, FLEXX, AutoDock, ICM. [Figure label: X-ray structure]

Our Approach to Derive DT-SCORE. 1. Use Delaunay tessellation to derive geometrical chemical descriptors of the protein-ligand interface. 2. Establish a correlation between the geometrical chemical descriptors and protein-ligand binding affinity using the perceptron learning algorithm.

Flowchart to Derive DT-SCORE. Receptor-ligand complexes -> tessellation of the receptor-ligand interface -> descriptor generation -> model generation and prediction (perceptron learning algorithm) -> binding constant (DT-SCORE).

Delaunay Tessellation in 2D. Delaunay tessellation gives a rigorous definition of nearest neighbors in 2D and 3D space: nearest neighbors are unambiguously defined in sets of three (in 2D) and in sets of four (in 3D).
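
The 2D case can be illustrated with the empty-circumcircle property: three points form a Delaunay triangle when no other point lies inside the circle through them. The brute-force sketch below is for small illustrative inputs only, assumes points in general position, and uses hypothetical coordinates; production code would use a computational geometry library.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared) of the circle through a, b, c, or None if collinear."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return ux, uy, (a[0] - ux) ** 2 + (a[1] - uy) ** 2


def delaunay_2d(points):
    """Brute force: keep every triple whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all(
            (p[0] - ux) ** 2 + (p[1] - uy) ** 2 >= r2 - 1e-12
            for m, p in enumerate(points)
            if m not in (i, j, k)
        ):
            triangles.append((i, j, k))
    return triangles


# Hypothetical atom positions: three outer points and one interior point
atoms = [(0, 0), (4, 0), (0, 3), (1, 1)]
print(delaunay_2d(atoms))
```

The large outer triangle is rejected because the interior point falls inside its circumcircle; the three triangles that share the interior point remain, so each point's neighbors are defined without a distance cutoff.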

Delaunay Tessellation of the Receptor-Ligand Interface

A Detailed View of Active Site Tessellation. An atom is shared by several tetrahedra. [Figure: tetrahedra with vertices labeled R (receptor atom) and L (ligand atom)]

3 Types of Tetrahedra at the Receptor-Ligand Interface. RRRL: formed by 3 receptor atoms and 1 ligand atom. RRLL: formed by 2 receptor atoms and 2 ligand atoms. RLLL: formed by 1 receptor atom and 3 ligand atoms. Each of the above tetrahedron types is further discriminated by the atom types on the vertices.

Geometrical Descriptors According to Tetrahedron Types. Tetrahedron types: RRRL, RRLL, RLLL. Example atom-type quadruplets: NCNO, ONOS, ...; CNOO, NOCS, ...; COSC, OSXN, ...
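
One way such descriptors could be counted is sketched below. The input format (four (origin, element) pairs per tetrahedron) and the canonical ordering of atom types within a quadruplet are assumptions made for illustration; the slides do not specify either.

```python
from collections import Counter


def tetrahedron_descriptor(tetrahedra):
    """Count interface tetrahedra by composition class and vertex atom types.

    Each tetrahedron is four (origin, element) pairs, where origin is
    'R' (receptor atom) or 'L' (ligand atom).
    """
    counts = Counter()
    for vertices in tetrahedra:
        n_ligand = sum(1 for origin, _ in vertices if origin == "L")
        if n_ligand in (0, 4):
            continue  # all-receptor or all-ligand tetrahedra are not at the interface
        composition = "R" * (4 - n_ligand) + "L" * n_ligand
        # Assumed canonical form: receptor elements sorted, then ligand elements sorted
        receptor_types = "".join(sorted(e for o, e in vertices if o == "R"))
        ligand_types = "".join(sorted(e for o, e in vertices if o == "L"))
        counts[(composition, receptor_types + ligand_types)] += 1
    return counts


# Hypothetical tetrahedra from a tessellated binding site
tetrahedra = [
    [("R", "N"), ("R", "C"), ("R", "O"), ("L", "O")],
    [("R", "O"), ("R", "C"), ("R", "N"), ("L", "O")],
    [("R", "C"), ("L", "N"), ("L", "O"), ("R", "O")],
    [("R", "C"), ("R", "C"), ("R", "N"), ("R", "O")],
]
descriptor = tetrahedron_descriptor(tetrahedra)
print(dict(descriptor))
```

Each distinct (composition, atom types) key becomes one descriptor column, and its count is the value for that receptor-ligand complex.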

"QSAR" Input Table (R·L Interaction Pattern – Binding Affinity Relationship Table)

Complex     Binding affinity   RLLL: NCNO ONOS …   RRLL: CNOO NOCS …   RRRL: COSC OSXN …
(R L)1      y1                 0 3 …               2 8 …               1 3 …
(R L)2      y2                 1 7 …               3 1 …               0 3 …
…           …                  …                   …                   …
(R L)m-1    ym-1               3 4 …               0 5 …               4 6 …
(R L)m      ym                 2 0 …               2 2 …               1 0 …

Single-Layer Perceptron Network. Input layer: N inputs x1, x2, x3, ..., xN with weights w1, w2, w3, ..., wN; output layer: y. xi = input of the neuron; wi = weight associated with the input xi; fn(.) = activation function of the output neuron.
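
A single output neuron of this form can be sketched as follows. The identity activation, the delta-rule update, the learning rate and the toy data are all assumptions for illustration; the deck does not give the training details used for DT-SCORE.

```python
def perceptron_output(weights, inputs, activation=lambda s: s):
    """y = f(sum_i w_i * x_i) for a single output neuron."""
    return activation(sum(w * x for w, x in zip(weights, inputs)))


def train_perceptron(samples, n_inputs, rate=0.05, epochs=500):
    """Fit the weights with the delta rule: w_i += rate * (target - y) * x_i."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - perceptron_output(weights, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return weights


# Hypothetical data generated from y = 2*x1 - 1*x2 + 0.5*x3
samples = [
    ((1.0, 0.0, 0.0), 2.0),
    ((0.0, 1.0, 0.0), -1.0),
    ((0.0, 0.0, 1.0), 0.5),
    ((1.0, 1.0, 1.0), 1.5),
]
weights = train_perceptron(samples, n_inputs=3)
print([round(w, 3) for w in weights])
```

In the setting of the input table, each x would be a tetrahedron-type count for one complex and y its binding affinity.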

Training vs. Test Set Selection and Validation. The entire dataset (264 complexes) is split into a training set, 80% (214 complexes), used for model development (q2), and a test set, 20% (50 complexes), used for prediction of the test set (R2).
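
The validation scheme can be sketched as below. It assumes a random split, leave-one-out cross-validation for q2, and the 1 - SSE/SST form for both statistics; the slide does not state which definitions or selection procedure were used, and the `fit_predict` callable is a hypothetical stand-in for any model.

```python
import random


def split_dataset(items, n_test, seed=0):
    """Randomly split items into (training, test) with n_test items in the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def determination(actual, predicted):
    """1 - SSE / SST, where SST is taken about the mean of the actual values."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


def loo_q2(training, fit_predict):
    """Leave-one-out q2: predict each training item from a model built on the others."""
    predicted = []
    for i, (x, _) in enumerate(training):
        rest = training[:i] + training[i + 1:]
        predicted.append(fit_predict(rest, x))
    return determination([y for _, y in training], predicted)


# The slide's split: 264 complexes into 214 training and 50 test
train_ids, test_ids = split_dataset(range(264), n_test=50)
print(len(train_ids), len(test_ids))
```

The test-set R2 is then `determination(actual, predicted)` on the 50 held-out complexes, using a model built from the training set only.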

Model Stability: average value from multiple (ca. 80) models

Actual vs. Predicted Binding Affinity for the Training Set (214 complexes): q2 = 0.73

Actual vs. Predicted Binding Affinity for the Test Set (50 complexes): R2 = 0.61

Acknowledgements. NCCU and UNC: Jerry Ebalunode, Ph.D., BRITE; Min Shen, Ph.D., Lexicon; Alex Tropsha, Ph.D., Chair of MedChem, UNC-Chapel Hill. GSK: Sunny Hung (GSK); George Seibel (JNJ); Ken Kopple (retired); Jeff Wiseman (Locus). Lilly: Minmin Wang; Greg Durst; Jim Wikel (retired). Funding: NIH P20HG; NIH R21GM.

Comparison with Published Scoring Functions

Method     Training set size   Test set size   Training set q2   Test set R2
BLEEP      351                 90              N/A               0.53
PMF        697                 77              N/A               0.61
SMoG                                           N/A               0.304
SMoG                                           N/A               0.435
Wang
DT-Score

[Empty cells were not recoverable from the slide]