Hot Topics in Chemoinformatics in the Pharmaceutical Industry

Hot Topics in Chemoinformatics in the Pharmaceutical Industry David J. Wild, Ph.D. Scientific Computing Consultant, and Adjunct Professor of Pharmaceutical Engineering at the University of Michigan david@wild-ideas.org www.WildIdeasConsulting.com

About me: B.Sc. Computer Science; Ph.D. Chemoinformatics (Willett Lab). Worked for 5 years in Scientific Computing leadership at Pfizer, responsible for the development of computational tools for scientists. Now run a consulting firm based in Ann Arbor, Mich., and am also an Adjunct Professor at the University of Michigan, doing some research. Wild Ideas Consulting: www.WildIdeasConsulting.com University of Michigan: www-personal.engin.umich.edu/~wildd

What we’ll cover today: an overview of early-stage drug discovery and the big industry concerns; using information and technology together to improve the chances of finding a new drug; an example (High-Throughput Screening); and some other examples of “hot” areas: Genomics & Proteomics Information Handling, Virtual Screening, Combinatorial Chemistry, and Design of scientific software.

Characteristics of the pharmaceutical industry: Very segmented market: the largest company (Pfizer) has only an 11% market share. High risk, long term: it takes 10-20 years to develop a drug, and most drugs fail to get to market. Highly regulated (by the FDA). High profit margins for drugs that do make it. Investors traditionally expect a high return on investment. Four main phases: discovery, development, clinical trials and marketing.

R&D spending up, new drugs down Taken from http://www.newscientistjobs.com/biotech/ernstyoung/blues.jsp

Drug Discovery & Development:
1. Identify disease
2. Isolate protein involved in disease (2-5 years)
3. Find a drug effective against disease protein (2-5 years)
4. Preclinical testing (1-3 years)
5. File IND
6. Human clinical trials (2-10 years)
7. Formulation & Scale-up
8. File NDA
9. FDA approval (2-3 years)

Impact of new technology on drug discovery: The last few years have seen a number of “revolutionary” new technologies: gene chips, genomics and the HGP; bioinformatics & molecular biology; more protein structures; high-throughput screening & assays; virtual screening and library design; docking; combinatorial chemistry; in-vitro ADME testing; other computational methods. How do we make it all work for us?

Pipeline stages: Identify disease, Isolate protein, Find drug, Preclinical testing. Technologies applied along the pipeline: GENOMICS, PROTEOMICS & BIOPHARM.: potentially producing many more targets and “personalized” targets. HIGH THROUGHPUT SCREENING: screening up to 100,000 compounds a day for activity against a target protein. VIRTUAL SCREENING: using a computer to predict activity. COMBINATORIAL CHEMISTRY: rapidly producing vast numbers of compounds. MOLECULAR MODELING: computer graphics & models help improve activity. IN VITRO & IN SILICO ADME MODELS: tissue and computer models begin to replace animal testing.

There is little “hard data” on using the new technologies. In a sense, the drug design process is becoming a big experiment. Do we continue as before, and carefully introduce new technologies as we deem appropriate, or do we radically change the way things are done? There is lots of pressure for the new technologies to yield results quickly. How do we measure the results?

Some questions being asked: Is our increasing spending on R&D and new technologies really going to pay off? Or was it a red herring? Is the paucity of drugs in the pipeline because we’re not doing things right, or are we just hitting limits on the number of major diseases with potential treatments still to be found (“all the low-hanging fruit has gone”)? Should we be looking in new areas (e.g. “life enhancement” rather than “life saving” or “quality of life”)?

What’s being done: Trying to get the right attrition (= drugs dropping out of the pipeline): aim to increase early-stage attrition and reduce late-stage attrition. Risk analysis: look ideally for low-risk, high-payoff drugs. Using metrics to monitor successes and failures.

Analyzing risk [Figure: chart of risk (low to high) against payoff (low to high)]

Using metrics to monitor improvement: Split the discovery process into discrete units, with key decisions at the end of each unit. Come up with measurable properties that can be used to gauge success. Look for good and bad decisions and why they were made.
Stage: Decision Point
Target exploration: Go with this target?
HTS: Was the screen successful?
HTS Analysis: Follow up these 5-10 series
Series Followup: Produce 2-3 lead compounds
ADME study: Are compounds safe?
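One way to make the idea of stage-by-stage metrics concrete is to record, for each decision point, how many items entered the stage and how many were given a "go". The sketch below is only an illustration: the `StageRecord` class, the `attrition_rates` function and all of the counts are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass
class StageRecord:
    stage: str    # name of the discovery stage
    entered: int  # items (targets, series, compounds) entering the stage
    passed: int   # items given a "go" at the stage's decision point


def attrition_rates(records):
    """Return {stage name: fraction of items dropped at that stage}."""
    rates = {}
    for record in records:
        if record.entered <= 0:
            raise ValueError(f"stage {record.stage!r} has no entries")
        rates[record.stage] = 1.0 - record.passed / record.entered
    return rates


# Hypothetical counts, for illustration only.
pipeline = [
    StageRecord("HTS Analysis", entered=40, passed=8),
    StageRecord("Series Followup", entered=8, passed=2),
    StageRecord("ADME study", entered=2, passed=1),
]

rates = attrition_rates(pipeline)
for stage, rate in rates.items():
    print(f"{stage}: {rate:.0%} attrition")
```

Tracking the same numbers project after project would show whether attrition is shifting toward the earlier, cheaper stages.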

Summary: The pharmaceutical industry is a high-risk industry with very long development times and short product lifespans. There has been a lot of investment in new technologies for early-stage drug discovery, but so far these are not resulting in more drug candidates (or profits). Companies are looking at ways to address this problem, including managing attrition, risk analysis and metrics.

How Chemoinformatics can help out: Producing and managing information for metrics. In-silico analysis to reduce risk, e.g. virtual screening, library design, docking. Cost/benefit analyses. Making information available at the right time and the right place. Needs to be integrated into processes.

An example: High-Throughput Screening Screening perhaps millions of compounds in a corporate collection to see if any show activity against a certain disease protein

High-Throughput Screening: Drug companies now have millions of samples of chemical compounds. High-throughput screening can test 100,000 compounds a day for activity against a protein target. Maybe tens of thousands of these compounds will show some activity for the protein. The chemist needs to intelligently select the 2-3 classes of compounds that show the most promise for being drugs to follow up.

Informatics Implications: Need to be able to store chemical structure and biological data for millions of datapoints: computational representation of 2D structure. Need to be able to organize thousands of active compounds into meaningful groups: use cluster analysis or machine learning methods to group similar structures together and relate them to activity. Need to learn as much information as possible from the data (data mining): apply statistical methods to the structures and related information.
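Grouping similar structures can be sketched with a similarity coefficient and a simple clustering pass. The example below assumes each compound is represented as a set of structural feature labels, uses the Tanimoto coefficient, and applies a basic "leader" clustering. The compound ids, feature names and the 0.6 threshold are hypothetical, and a real system would use proper chemical fingerprints and likely a more robust clustering method.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of structural features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(fingerprints, threshold=0.6):
    """Assign each compound to the first cluster whose leader is similar enough.

    fingerprints maps compound id -> set of feature labels.
    Returns a list of clusters; each is a list of ids with the leader first.
    """
    clusters = []
    for compound_id, features in fingerprints.items():
        for cluster in clusters:
            leader_features = fingerprints[cluster[0]]
            if tanimoto(features, leader_features) >= threshold:
                cluster.append(compound_id)
                break
        else:
            # No existing leader was similar enough: start a new cluster.
            clusters.append([compound_id])
    return clusters


# Hypothetical active compounds described by made-up feature labels.
actives = {
    "cmpd-1": {"amide", "phenyl", "chloro"},
    "cmpd-2": {"amide", "phenyl", "chloro", "methoxy"},
    "cmpd-3": {"sulfonamide", "pyridine"},
    "cmpd-4": {"sulfonamide", "pyridine", "fluoro"},
}

clusters = leader_cluster(actives, threshold=0.6)
print(clusters)
```

Leader clustering depends on input order, which is one reason it is shown here only as a stand-in for the cluster analysis methods the slide refers to.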

HTS Tools – Tripos SAR Navigator SAR Navigator is © Tripos, inc., www.tripos.com

BioReason ClassPharmer: Clusters actives into groups representing series. Attempts to find a scaffold using an MCS algorithm. Recovers inactives back into series. Presents series as rows in a “spreadsheet” view. Gives other statistics on series, such as activity distribution. http://www.bioreason.com

BioReason ClassPharmer [screenshots] www.bioreason.com

Strategy for “HTS Triage”:
1. Run HTS
2. Decide which compounds are “active” and which are “inactive”
3. Cluster the actives to put them into series
4. Visualize clusters of actives (showing 2D structures) and pick series of interest
5. Identify a “scaffold” for each series
6. Use similarity or substructure search on inactives to find inactives related to these series
7. Use SAR techniques to discover differences between actives and inactives in a series
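Two of the triage steps, deciding which compounds are active and recovering related inactives, can be sketched in a few lines. This is a minimal illustration under stated assumptions: activity is a single number where higher means more active, compounds are described by sets of made-up feature labels, and a feature-subset test stands in for a real substructure search. All ids, values and the cutoff are hypothetical, and the series is assumed to have been picked by hand.

```python
def split_by_activity(results, cutoff):
    """Split {compound id: measured activity} into actives and inactives.

    Assumes higher values mean more active (e.g. percent inhibition).
    """
    actives = {c for c, value in results.items() if value >= cutoff}
    inactives = set(results) - actives
    return actives, inactives


def recover_inactives(scaffold_features, inactives, features):
    """Return inactives whose features include every scaffold feature.

    This subset test is a crude stand-in for a substructure search.
    """
    return sorted(c for c in inactives if scaffold_features <= features[c])


# Hypothetical screening results and feature sets.
results = {"A1": 92.0, "A2": 81.5, "B1": 12.0, "B2": 7.3, "C1": 55.0}
features = {
    "A1": {"indole", "amide", "chloro"},
    "A2": {"indole", "amide", "methyl"},
    "B1": {"indole", "amide", "nitro"},
    "B2": {"pyridine", "ester"},
    "C1": {"pyridine", "amide"},
}

actives, inactives = split_by_activity(results, cutoff=50.0)

# Suppose A1 and A2 were picked as a series; their shared features
# play the role of the series "scaffold".
scaffold = features["A1"] & features["A2"]
related = recover_inactives(scaffold, inactives, features)
print(sorted(actives), sorted(inactives), related)
```

Comparing `A1` and `A2` with the recovered inactive `B1` is then the starting point for the SAR step: the three share the scaffold features and differ in one substituent.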

Information generated at different points in the Drug Design process: gene chip experiments; protein structures; project selection decisions; assay protocols; HTS results; series selection decisions; SAR studies; combinatorial expts.; pharmacophores; ADME studies; lead cmpd decisions; toxicology studies; scaleup reactions; File IND; File NDA; clinical trials data; doctor/patient studies; marketing, surveys, etc.

Information generated at different sites

Distributed goals model

Shared goals model

Information storage breakdowns: Large amounts of information are generated: some is not kept at all, and some is kept but loses its meaning. Often data is kept, but not semantics or decisions: e.g. we keep “the HVX2 assay result for this compound was 3.2”, but not what the assay protocol was, whether the compound was considered “active”, nor whether it was followed up on. “Bigger picture” or derivative information is usually not stored, e.g. “all the compounds with a tri-methyl group seemed to have much lower activity for this project”.
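The difference between keeping a bare number and keeping its context can be illustrated with a small record type. The sketch below is an assumption-laden example, not a description of any real schema: the `AssayResult` class, its field names, the compound id, the units and the protocol reference are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssayResult:
    compound_id: str
    assay_name: str
    value: float
    units: Optional[str] = None         # what the number is measured in
    protocol_ref: Optional[str] = None  # pointer to the assay protocol used
    active: Optional[bool] = None       # was the compound judged active?
    followed_up: Optional[bool] = None  # was it taken forward?
    observations: List[str] = field(default_factory=list)  # "bigger picture" notes

    def missing_context(self):
        """List the context fields that were never recorded."""
        names = ("units", "protocol_ref", "active", "followed_up")
        return [name for name in names if getattr(self, name) is None]


# The bare data point: a value with no semantics or decisions attached.
bare = AssayResult("CMPD-0001", "HVX2", 3.2)

# The same data point with hypothetical context filled in.
full = AssayResult(
    "CMPD-0001", "HVX2", 3.2,
    units="uM",
    protocol_ref="HVX2-protocol-v1",
    active=True,
    followed_up=False,
    observations=["tri-methyl analogues seemed much less active"],
)

print(bare.missing_context())
print(full.missing_context())
```

A check like `missing_context` could be run when data is stored, so that gaps in semantics are flagged while the people who know the answers are still on the project.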

Information access breakdowns: Some information is only available in one physical location. Some information is only available within one part of the discovery process. Often information is not “contextualized” for use outside a particular domain. Sometimes someone is clear about the piece of information they need, and that information exists, but they don’t know how to access it: e.g. what system to use, what Oracle table it’s in, or even whether that piece of information exists at all!

“Missed opportunities”: Not a specific breakdown, but if the right piece of information had been available at the right time, better decisions could have been made. Examples: A group of compounds is being followed up as potential drugs, but a rival company has just applied for a patent on the compounds. A large amount of money is being spent developing an HTS assay for a target, but marketing research shows any drug is unlikely to be a success. A group of compounds is selected from an HTS as good candidates for follow-up, but 20 years ago they were followed up for a similar project and had severe solubility problems.

Information use breakdowns: The meaning of data is incorrectly interpreted. A single piece of information is used, whilst using a wider range of information would lead to different conclusions. Lessons learned from one project are incorrectly applied to another. “Fuzzy” information is taken as concrete information.

What do we do? No large company has really solved the problem, but ongoing attempts include: defining information produced and needed at each stage of the discovery process; improving processes to be more consistent, especially across different sites; improving information flow between departments and sites; harmonizing terminology across disciplines and sites; defining needed “management information” as well as raw data; looking for “quick win” opportunities. This will presumably impact what is stored in databases and what software is used. Oracle Chemistry Cartridges help.

Some Other Examples: Genomics & Proteomics Information Handling; Virtual Screening; Combinatorial Chemistry; Design of scientific software

Genomics & Proteomics Information Handling Understanding the link between diseases, genetic makeup and expression of proteins

Genomics: Genomics is fast-forwarding our understanding of how DNA, genes, proteins and protein function are related, in both normal and disease conditions. The Human Genome Project has mapped the genes in human DNA. The hope is that this understanding will provide many more potential protein targets. It allows potential “personalization” of therapies. [Figure: DNA sequence (ATACGGAT / TATGCCTA) linked to functions]

Gene Chips: “Gene chips” allow us to look for changes in protein expression for different people with a variety of conditions, and to see if the presence of drugs changes that expression. This makes possible the design of drugs to target different phenotypes. [Figure: people / conditions (e.g. obese, cancer, caucasian) and compounds administered, leading to an expression profile (screen for 35,000 genes)]

“Chemogenomics” from Vertex Video: http://www.vrtx.com/Chemogenonone.html

Virtual Screening: Build a computational model of activity for a particular target. Use the model to score compounds from “virtual” or real libraries. Use the scores to decide which to make, or pass through a real screen.

Computational Models of Activity: Machine learning methods (e.g. neural nets, Bayesian nets, SVMs, Kohonen nets): train with compounds of known activity, then predict the activity of “unknown” compounds. Scoring methods: profile compounds based on properties related to the target. Fast docking: rapidly “dock” 3D representations of molecules into 3D representations of proteins, and score according to how well they bind.

Present molecules to model: We may want to virtually screen all of a company’s in-house compounds, to see which to screen first; a compound collection that could be purchased; or a potential combinatorial chemistry library, to see if it is worth making, and if so which compounds to make. The model will come out with either a prediction of how well each molecule will bind, or a score for each molecule.
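The train, score and rank loop of virtual screening can be sketched with a very simple model. The example below deliberately swaps in a k-nearest-neighbour vote over Tanimoto similarity rather than any of the machine learning methods named in the slides, because it fits in a few lines of plain Python. Compounds are represented as sets of made-up feature labels, and every name and data value is hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of structural features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_score(candidate, training, k=3):
    """Fraction of actives among the k training compounds most similar
    to the candidate. training is a list of (feature set, is_active)."""
    if not training:
        raise ValueError("training set is empty")
    ranked = sorted(training, key=lambda t: tanimoto(candidate, t[0]), reverse=True)
    nearest = ranked[:k]
    return sum(1 for _, is_active in nearest if is_active) / len(nearest)


def virtual_screen(library, training, k=3):
    """Score every library compound and return (name, score), best first."""
    scores = {name: knn_score(fp, training, k) for name, fp in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical compounds of known activity.
training = [
    ({"indole", "amide", "chloro"}, True),
    ({"indole", "amide", "methyl"}, True),
    ({"indole", "ester"}, False),
    ({"pyridine", "ester"}, False),
    ({"pyridine", "nitro"}, False),
]

# Hypothetical "virtual" library that has not been made or screened.
library = {
    "virt-1": {"indole", "amide", "fluoro"},
    "virt-2": {"pyridine", "ester", "nitro"},
}

ranking = virtual_screen(library, training, k=3)
print(ranking)
```

The ranked list is what feeds the decision described above: the top-scoring compounds are the ones to make, buy or screen first.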

Combinatorial Chemistry By combining molecular “building blocks”, we can create very large numbers of different molecules very quickly. Usually involves a “scaffold” molecule, and sets of compounds which can be reacted with the scaffold to place different structures on “attachment points”.

Example Combinatorial Library: a scaffold with attachment points R1, R2 and R3 [scaffold and example structure drawings not recoverable; R-group lists may be incomplete]. “R”-groups: R1 = OH, OCH3, NH2, Cl, COOH; R2 = phenyl, OH, Br, F, CN; R3 = CF3, NO2, phenoxy, OH. For this small library, the number of possible compounds is 5 x 6 x 5 = 150.
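The size of a combinatorial library is the product of the number of choices at each attachment point, and the members can be enumerated as a Cartesian product. The sketch below reproduces the 5 x 6 x 5 arithmetic using placeholder labels such as `R1-1` instead of real chemical groups. The function names and labels are hypothetical.

```python
from itertools import product
from math import prod


def library_size(r_group_sets):
    """Number of compounds: product of the choices at each attachment point."""
    return prod(len(groups) for groups in r_group_sets)


def enumerate_library(scaffold, r_group_sets):
    """List every (scaffold, R1, R2, R3, ...) combination."""
    return [(scaffold,) + combo for combo in product(*r_group_sets)]


# Placeholder labels standing in for real R-groups: 5, 6 and 5 choices.
r1 = [f"R1-{i}" for i in range(1, 6)]
r2 = [f"R2-{i}" for i in range(1, 7)]
r3 = [f"R3-{i}" for i in range(1, 6)]

size = library_size([r1, r2, r3])
library = enumerate_library("scaffold", [r1, r2, r3])
print(size, len(library), library[0])
```

Adding one more choice at any attachment point multiplies the library size, so it grows by a whole layer of compounds rather than by one. This is why even modest R-group lists produce large virtual libraries.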

Combinatorial Chemistry Issues: Which R-groups to choose? Which libraries to make: “fill out” the existing compound collection, target a particular protein, or make as many compounds as possible? Computational profiling of libraries can help. “Virtual libraries” can be assessed on computer.

Design of Scientific Software: Problems with scientific software tend to occur because of deficiencies in one of three areas: software relevance, software usability, and software management.

Software Relevance: Making software relevant requires the software designer to understand the science (i.e. the domain), the scientific computing techniques that are used in the domain, and the possibilities and limitations of software development. Even with this, it’s hard to match the things we can do with the things that people want or need to do. Techniques like personas and contextual inquiry help us understand the people who use the software, their goals, and the tasks they want to do.

Software relevance: Bridge between computation & science [Figure: on the chemoinformatics side, tools (clustering, sim. searching, activity models, scaffold detection, docking, logP calculation) and tasks (“doing a cluster analysis”, “identifying activity-related fragments”); on the science side, goals (e.g. produce compounds that have high biological activity) and tasks (work out a chemical synthesis, choose good reagents, try and document some reactions); a “?” marks the gap between the two]

Software Usability: We tend to focus on the method and the science, but not on how easy it is for people to get their job done using the software. Programmers tend to make software intuitive for themselves, but not necessarily for the people it is designed for. A usability lab and other techniques can make a HUGE difference to the satisfaction of users and programmers alike!

Software Management: Disparate set of tools & platforms. Disparate programming styles and languages. A variety of people tend to be writing software: trained software developers, enthusiastic scientists, scientific computing specialists. Focus on the science tends to mean software management is neglected. Everyone hates traditional software management “rules”, but there are ways of making everything work better and having more fun doing it! Have a recommended basic setup that should help a lot.

Foundation reading: “The Inmates Are Running the Asylum” by Alan Cooper; “Contextual Design” by Hugh Beyer and Karen Holtzblatt; “Usability Engineering” by Jakob Nielsen; “The Visual Display of Quantitative Information” by Edward Tufte; “Don’t Make Me Think!” by Steve Krug. See also www.WildIdeasConsulting.com

Summary: R&D in the pharmaceutical industry is undergoing a lot of technological change, and there is pressure to make the investment pay off. There is a big need to sensibly use the large amounts of chemical and biological information produced in the process. Thoughtful use of chemoinformatics methods and software is becoming crucial to the success of drug discovery.