Analysis of High-Throughput Screening Data C371 Fall 2004.

Drug Discovery Process
The key steps of drug discovery are:
–research - average 2 to 3 years
–pre-clinical testing - average 1 year
–clinical trial testing (involving human patients) - average 10 years
–regulatory approval - average 2 years

Drug Discovery Process: Web Sites
http://akosgmbh.de/Drug_discovery_process.htm
http://www.ppdi.com/PPD_U7.htm

INTRODUCTION
HTS allows hundreds of thousands of compounds to be assayed very quickly
HTS data is characterized by:
–High volume
–High level of noise
–Diverse nature of the chemical classes involved
–Possible presence of multiple binding modes

INTRODUCTION
Select the most potent compounds to progress to the next stage
Problems:
–Functional groups that interfere with the assay (e.g., fluoresce)
–Functional groups that react with biological systems
–Catch these with substructure and “drug-likeness” filters
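The filtering step above can be sketched roughly as follows. This is a minimal illustration, not the filters used in the course: the flagged group names and the compound record layout are hypothetical, and Lipinski's rule of five is swapped in as one common example of a "drug-likeness" filter (the slides do not name a specific rule).

```python
# Hypothetical labels for groups a substructure search might flag as
# reactive or assay-interfering; a real list would be much longer.
FLAGGED_GROUPS = {"acyl_halide", "aldehyde", "michael_acceptor"}


def rule_of_five_violations(props):
    """Count violations of Lipinski's rule of five for one compound."""
    checks = [
        props["mol_weight"] > 500,   # molecular weight above 500
        props["logP"] > 5,           # too lipophilic
        props["h_donors"] > 5,       # too many hydrogen-bond donors
        props["h_acceptors"] > 10,   # too many hydrogen-bond acceptors
    ]
    return sum(checks)


def passes_filters(compound, max_violations=1):
    """Reject flagged substructures first, then apply the drug-likeness rule."""
    if FLAGGED_GROUPS & set(compound["groups"]):
        return False
    return rule_of_five_violations(compound) <= max_violations


# Made-up example records (property values are illustrative only).
clean_hit = {"groups": ["amide", "phenyl"], "mol_weight": 342.0,
             "logP": 2.1, "h_donors": 2, "h_acceptors": 5}
reactive_hit = {"groups": ["aldehyde", "phenyl"], "mol_weight": 210.0,
                "logP": 1.4, "h_donors": 0, "h_acceptors": 2}
greasy_hit = {"groups": ["phenyl"], "mol_weight": 712.0,
              "logP": 6.3, "h_donors": 1, "h_acceptors": 4}

survivors = [c for c in (clean_hit, reactive_hit, greasy_hit) if passes_filters(c)]
```

Here only `clean_hit` survives: `reactive_hit` carries a flagged group, and `greasy_hit` violates two of the four property limits.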

Techniques for Analysis of HTS Data
Can’t use multiple linear regression or partial least squares as statistical tests
–Data sets are too large
Data visualization
Data reduction
Data mining (if activity data is known)

HTS Methodology
Procedure:
–Measure activity at different concentrations for a subset of compounds
–Define IC50 (Inhibitory Concentration 50): the concentration of a material estimated to inhibit the biological endpoint of interest (e.g., cell growth, ATP levels) by 50%
–Solid pure sample that tests positively gets structure determined (hits-to-leads phase)
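As a rough illustration of the IC50 definition above, the sketch below estimates the 50% point by interpolating between the two measured concentrations that bracket it, on a log10 concentration scale. This is an assumption made for brevity: in practice a sigmoidal dose-response curve is usually fitted to all points. The function name and the data are invented for the example.

```python
import math


def estimate_ic50(concentrations, inhibitions):
    """Interpolate the concentration giving 50% inhibition.

    concentrations: positive values in ascending order.
    inhibitions: percent inhibition measured at each concentration.
    Returns None if no adjacent pair of points brackets 50%.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 50.0 <= y_hi and y_hi > y_lo:
            # Fraction of the way from the lower to the upper point.
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Made-up dose-response data (concentration in micromolar, % inhibition).
conc = [0.01, 0.1, 1.0, 10.0, 100.0]
inhib = [5.0, 20.0, 40.0, 60.0, 95.0]
ic50 = estimate_ic50(conc, inhib)
```

With this data the 50% point falls halfway (in log units) between 1 and 10, so the estimate is about 3.16.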

DATA VISUALIZATION
Need to display simultaneously large data sets with many thousands of molecules and their properties
Typical software packages:
–Draw various kinds of graphs
–Color selected properties
–Calculate simple statistics
HTS data sets may be divided into subsets to aid navigation

SpotFire DecisionSite
DecisionSite Examples: http://www.spotfire.com/

Features of Data Visualization
Often combined with structure searching to find compounds with certain features
Unsupervised methods – don’t use activity data
Supervised methods – incorporate activity data
Use of molecular descriptors

Non-Linear Mapping
Descriptors:
–Physicochemical properties
–Fingerprints: a Boolean array with the meaning of each bit not predefined
  List of patterns is generated for each:
  –Atom, pair of adjacent atoms, bonds connecting them
  –Each group of atoms joined by longer pathways
–Substructural fragments
–Known activity against related targets
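The fingerprint idea above (patterns generated from atoms and paths, then mapped to bits with no predefined meaning) can be sketched as below. This is a toy version under stated assumptions: molecules are plain atom-label lists plus bond index pairs, bond orders and hydrogens are ignored, and `zlib.crc32` stands in for whatever hash a real toolkit uses. The function name and parameters are hypothetical.

```python
import zlib


def path_fingerprint(atoms, bonds, max_bonds=2, n_bits=64):
    """Hash every linear atom path of up to max_bonds bonds into a bit array."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    bits = [False] * n_bits

    def walk(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        # A path and its reverse describe the same pattern.
        label = min(forward, backward)
        bits[zlib.crc32(label.encode()) % n_bits] = True
        if len(path) - 1 < max_bonds:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return bits


# Heavy-atom graphs only: ethanol is C-C-O, ethane is C-C.
ethanol = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
ethane = path_fingerprint(["C", "C"], [(0, 1)])
```

Because every path in ethane also occurs in ethanol, each bit set for ethane is also set for ethanol, which is the property that makes such fingerprints useful as a fast substructure pre-screen.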

Non-Linear Mapping (cont’d)
Non-Linear Mapping takes multidimensional data to a lower space (2- or 3-dimensional)
Multidimensional scaling
–Generate initial set of coordinates in the low-dimensional space
–Modify the coordinates using optimization procedures
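The two multidimensional scaling steps above can be sketched as follows: start from random low-dimensional coordinates, then adjust them by gradient descent so that distances in the map approach the original distances. The stress function (sum of squared distance errors), the step size, and the iteration count are assumptions chosen for the example; published variants such as Sammon mapping weight the errors differently.

```python
import math
import random


def stress(coords, dist):
    """Sum of squared differences between map distances and target distances."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total += (math.dist(coords[i], coords[j]) - dist[i][j]) ** 2
    return total


def mds(dist, dims=2, steps=500, rate=0.05, seed=0):
    """Step 1: random initial coordinates. Step 2: gradient descent on stress."""
    rng = random.Random(seed)
    n = len(dist)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(coords[i], coords[j]) or 1e-12  # avoid division by zero
                scale = 2.0 * (d - dist[i][j]) / d
                for k in range(dims):
                    grads[i][k] += scale * (coords[i][k] - coords[j][k])
        for i in range(n):
            for k in range(dims):
                coords[i][k] -= rate * grads[i][k]
    return coords


# Four made-up compounds described by three descriptors each.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
layout = mds(dist)  # 2-D coordinates, one pair per compound
```

Calling `mds(dist, steps=0)` returns the unoptimized starting coordinates for the same seed, which makes it easy to compare the stress before and after optimization.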

DATA MINING METHODS
Construct models that enable the establishment of relationships between the structures and the observed activity
Simple division of structures is desirable:
–Active vs. inactive
–High, medium, or low activity classes

Data Mining Methods: Techniques
Substructural analysis: weight each aspect of the structure according to a pre-assigned activity designation

$$W_i = \frac{act_i}{act_i + inact_i}$$

where act_i and inact_i are the numbers of active and inactive molecules that contain substructure i
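The weight formula for substructural analysis can be computed directly from a screened set. In the sketch below, each molecule is represented as a set of fragment names plus an active/inactive flag; the fragment names and the four-molecule data set are invented for illustration.

```python
from collections import Counter


def fragment_weights(molecules):
    """Return W_i = act_i / (act_i + inact_i) for every fragment seen.

    molecules: iterable of (fragments, is_active) pairs.
    """
    act, inact = Counter(), Counter()
    for fragments, is_active in molecules:
        # Count each fragment once per molecule, however often it occurs.
        (act if is_active else inact).update(set(fragments))
    return {f: act[f] / (act[f] + inact[f]) for f in act.keys() | inact.keys()}


# Made-up screening results: (fragments present, tested active?)
screen = [
    ({"phenyl", "amide"}, True),
    ({"phenyl", "nitro"}, False),
    ({"amide"}, True),
    ({"phenyl"}, False),
]
weights = fragment_weights(screen)
```

Here `amide` gets weight 1.0 (seen only in actives), `nitro` gets 0.0, and `phenyl` gets 1/3 (one active out of three occurrences).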

Data Mining Techniques
Discriminant Analysis: aims to separate the molecules into constituent classes
–In the simplest case, linear discriminant analysis works with two variables and two activity classes
  A straight line separates the data into areas where the maximum number of correct activities is found
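For the two-variable, two-class case, one common way to obtain the separating line is Fisher's linear discriminant, sketched below in plain Python. The choice of Fisher's criterion, the midpoint threshold, the function names, and the descriptor values are all assumptions made for the example.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """Within-class scatter terms (sxx, sxy, syy) about the class mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_lda(actives, inactives):
    """Return (w, c) so that w[0]*x + w[1]*y > c predicts 'active'."""
    ma, mi = mean(actives), mean(inactives)
    sa, si = scatter(actives, ma), scatter(inactives, mi)
    sxx, sxy, syy = sa[0] + si[0], sa[1] + si[1], sa[2] + si[2]
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mi[0], ma[1] - mi[1]
    # w = (pooled scatter matrix)^-1 * (mean difference), written out for 2x2.
    w = [(syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det]
    # Place the line halfway between the two class means.
    c = w[0] * (ma[0] + mi[0]) / 2 + w[1] * (ma[1] + mi[1]) / 2
    return w, c


def predict_active(w, c, point):
    return w[0] * point[0] + w[1] * point[1] > c


# Made-up values of two descriptors for each molecule.
actives = [(4.0, 3.0), (5.0, 4.5), (6.0, 3.5), (5.5, 5.0)]
inactives = [(1.0, 1.0), (2.0, 2.5), (1.5, 0.5), (2.5, 1.5)]
w, c = fit_lda(actives, inactives)
```

The line `w[0]*x + w[1]*y = c` is the straight-line boundary; for this well-separated toy data it classifies all eight training molecules correctly.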

Data Mining Techniques
Neural Networks – need a training set of data
Once trained, the program predicts values for new molecules
Examples: feed-forward network and Kohonen network (self-organizing map)
Problem: over-training, which gives excellent results on the training data but poor results on unseen data

Data Mining Techniques
Decision Trees
–Rules associate specific molecular and/or descriptor values with the activity or property of interest
–Start with the entire data set and identify the descriptor or variable that gives the best split
–Follow the procedure until no more splits are possible or desirable
–Some consider multiple splits at each node
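The "identify the descriptor that gives the best split" step can be sketched as below. The slides do not say how split quality is measured; this example assumes Gini impurity and binary splits at midpoints between observed values. The descriptor names and data are invented.

```python
def gini(labels):
    """Gini impurity of a list of True/False activity labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (weighted impurity, descriptor, threshold) of the best binary split.

    rows: list of dicts mapping descriptor name to numeric value.
    labels: list of booleans (True = active), same order as rows.
    """
    n = len(rows)
    best = None
    for name in rows[0]:
        values = sorted({r[name] for r in rows})
        # Candidate thresholds lie midway between neighboring observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for r, l in zip(rows, labels) if r[name] <= t]
            right = [l for r, l in zip(rows, labels) if r[name] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, t)
    return best


# Made-up descriptor table for four molecules.
rows = [
    {"logP": 1.0, "mol_weight": 300.0},
    {"logP": 1.5, "mol_weight": 450.0},
    {"logP": 3.5, "mol_weight": 320.0},
    {"logP": 4.0, "mol_weight": 480.0},
]
labels = [False, False, True, True]
score, descriptor, threshold = best_split(rows, labels)
```

For this data the best split is on `logP` at 2.5, which separates actives from inactives completely. Growing a full tree would apply `best_split` again to each resulting subset until no further split is possible or worthwhile.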

SUMMARY
Much interest and research on HTS analysis
New techniques being applied (e.g., support vector machines)
Analysis of large diverse data sets needs the most work
Results need to feed into subsequent analysis
