Luddite: An Information Theoretic Library Design Tool Jennifer L. Miller, Erin K. Bradley, and Steven L. Teig July 18, 2002.

Similar presentations
Yinyin Yuan and Chang-Tsun Li Computer Science Department

Learning deformable models Yali Amit, University of Chicago Alain Trouvé, CMLA Cachan.
Decoding of Convolutional Codes • Let C_m be the set of allowable code sequences of length m. • Not all sequences in {0,1}^m are allowable code sequences!
Mining Compressed Frequent- Pattern Sets Dong Xin, Jiawei Han, Xifeng Yan, Hong Cheng Department of Computer Science University of Illinois at Urbana-Champaign.
CS6800 Advanced Theory of Computation
Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
2806 Neural Computation Self-Organizing Maps Lecture Ari Visa.
BRISK (Presented by Josh Gleason)
Greedy Algorithms Amihood Amir Bar-Ilan University.
Greedy Algorithms Greed is good. (Some of the time)
ICCV 2007 tutorial Part III Message-passing algorithms for energy minimization Vladimir Kolmogorov University College London.
Bounds on Code Length Theorem: Let l*_1, l*_2, ..., l*_m be optimal codeword lengths for a source distribution p and a D-ary alphabet, and let L* be.
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
Convergent and Correct Message Passing Algorithms Nicholas Ruozzi and Sekhar Tatikonda Yale University TexPoint fonts used in EMF. Read the TexPoint manual.
KEG PARTY!!!!!  Keg Party tomorrow night  Prof. Markov will give out extra credit to anyone who attends* *Note: This statement is a lie.
HCS Clustering Algorithm
Finding Compact Structural Motifs Presented By: Xin Gao Authors: Jianbo Qian, Shuai Cheng Li, Dongbo Bu, Ming Li, and Jinbo Xu University of Waterloo,
MAE 552 – Heuristic Optimization Lecture 26 April 1, 2002 Topic:Branch and Bound.
Data Partitioning for Reconfigurable Architectures with Distributed Block RAM Wenrui Gong Gang Wang Ryan Kastner Department of Electrical and Computer.
Code and Decoder Design of LDPC Codes for Gbps Systems Jeremy Thorpe Presented to: Microsoft Research
Ch 13 – Backtracking + Branch-and-Bound
Active Learning Strategies for Drug Screening 1. Introduction At the intersection of drug discovery and experimental design, active learning algorithms.
Interconnect Efficient LDPC Code Design Aiman El-Maleh Basil Arkasosy Adnan Al-Andalusi King Fahd University of Petroleum & Minerals, Saudi Arabia Aiman.
Relevance Feedback Content-Based Image Retrieval Using Query Distribution Estimation Based on Maximum Entropy Principle Irwin King and Zhong Jin The Chinese.
Collecting Correlated Information from a Sensor Network Micah Adler University of Massachusetts, Amherst.
RAPID: Randomized Pharmacophore Identification for Drug Design PW Finn, LE Kavraki, JC Latombe, R Motwani, C Shelton, S Venkatasubramanian, A Yao Presented.
Information Theory and Security
Monte Carlo Simulation MGS 3100 – Chapter 9. Simulation Defined A computer-based model used to run experiments on a real system. • Typically done on a.
Combinatorial Chemistry and Library Design
Active Learning for Class Imbalance Problem
Template attacks Suresh Chari, Josyula R. Rao, Pankaj Rohatgi IBM Research.
GATree: Genetically Evolved Decision Trees Dept. of Electronic, Electrical, and Computer Engineering, Database Lab, Taejong Kim.
Boltzmann Machine (BM) (§6.4) Hopfield model + hidden nodes + simulated annealing BM Architecture –a set of visible nodes: nodes can be accessed from outside.
1 ECE-517 Reinforcement Learning in Artificial Intelligence Lecture 7: Finite Horizon MDPs, Dynamic Programming Dr. Itamar Arel College of Engineering.
Data Reduction. 1.Overview 2.The Curse of Dimensionality 3.Data Sampling 4.Binning and Reduction of Cardinality.
Maximum Network Lifetime in Wireless Sensor Networks with Adjustable Sensing Ranges Cardei, M.; Jie Wu; Mingming Lu; Pervaiz, M.O.; Wireless And Mobile.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Optimization Flow Control—I: Basic Algorithm and Convergence Present : Li-der.
Wireless Mobile Communication and Transmission Lab. Theory and Technology of Error Control Coding Chapter 5 Turbo Code.
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, David W. Cheung, Ben Kao The University of Hong.
CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.
SemiBoost : Boosting for Semi-supervised Learning Pavan Kumar Mallapragada, Student Member, IEEE, Rong Jin, Member, IEEE, Anil K. Jain, Fellow, IEEE, and.
Selecting Diverse Sets of Compounds C371 Fall 2004.
An Algorithm for Construction of Error-Correcting Symmetrical Reversible Variable Length Codes Chia-Wei Lin, Ja-Ling Wu, Jun-Cheng Chen Presented by Jun-Cheng.
Robust Estimation With Sampling and Approximate Pre-Aggregation Author: Christopher Jermaine Presented by: Bill Eberle.
MAXIMALLY INFORMATIVE K-ITEMSETS. Motivation • Subgroup Discovery typically produces very many patterns with high levels of redundancy • Grammatically.
Ramakrishna Lecture#2 CAD for VLSI Ramakrishna
When is Key Derivation from Noisy Sources Possible?
Balanced Billing Cycles and Vehicle Routing of Meter Readers by Chris Groër, Bruce Golden, Edward Wasil University of Maryland, College Park American University,
Metaheuristics for the New Millennium by Bruce L. Golden, RH Smith School of Business, University of Maryland. Presented at the University of Iowa, March.
Non-parametric Methods for Clustering Continuous and Categorical Data Steven X. Wang Dept. of Math. and Stat. York University May 13, 2010.
Computational Approach for Combinatorial Library Design Journal club-1 Sushil Kumar Singh IBAB, Bangalore.
ENTROPY Entropy measures the uncertainty in a random experiment. Let X be a discrete random variable with range S_X = {1, 2, 3, ..., k} and pmf p_k = P_X.
Journal of Computational and Applied Mathematics Volume 253, 1 December 2013, Pages 14–25 Reporter : Zong-Dian Lee A hybrid quantum inspired harmony search.
Molecular Modeling in Drug Discovery: an Overview
Introduction to Algorithms: Brute-Force Algorithms.
Bayesian Optimization. Problem Formulation Goal • Discover the X that maximizes Y • Global optimization Active experimentation • We can choose which values.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Intelligent Exploration for Genetic Algorithms Using Self-Organizing.
Page 1 Computer-aided Drug Design —Profacgen. Page 2 The most fundamental goal in the drug design process is to determine whether a given compound will.
Task: It is necessary to choose the most suitable variant from some set of objects by those or other criteria.
Objective of This Course
≠ Particle-based Variational Inference for Continuous Systems
Boltzmann Machine (BM) (§6.4)
Jongik Kim, Dong-Hoon Choi, and Chen Li
Uncoordinated Optical Multiple Access using IDMA and Nonlinear TCM
Information Theoretical Analysis of Digital Watermarking
Error Correction Coding
Presentation transcript:

Luddite: An Information Theoretic Library Design Tool Jennifer L. Miller, Erin K. Bradley, and Steven L. Teig July 18, 2002

Outline
• Overview
• Search Strategy
• Cost Function
• Algorithms
• Algorithm Extensions
• Implementation Details
• Results

Overview
• Genomics and proteomics provide many novel targets
• Need to find drugs for these targets
  • Which compound to screen? Against which target?
  • Methods to answer these questions (e.g., QSAR) have been debated for many years
• Recently, combinatorial and parallel synthesis techniques have transformed the question of which single compound to analyze into one of which collection of compounds (library) to analyze

Overview
• Develop algorithms for designing libraries
  • Discrete – collections of individual compounds
  • Combinatorial – collections of compounds synthesized in a parallel or combinatorial fashion
• Based on information theoretic techniques

Overview
• Idea – use molecules to “interrogate” the target receptor about which chemical features are required for binding
• Objective – compose a library that maximizes the conclusions that can be drawn from the “answers” across all possible experimental outcomes
• Goal – design the library that allows discovery of the most information about the optimization target

Search Strategy
• Strategies used in “20 Questions” are applicable
• Binary Search
  • With every guess, eliminate half of the search space
• Codeword Search
  • Every outcome corresponds to a single codeword
  • The optimal set of questions can be asked simultaneously
  • The same set of optimal questions can be used every time
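The codeword search can be made concrete with a small sketch. In the illustration below (a toy example, not from the paper), each candidate answer's codeword is simply its binary representation, so the same fixed set of yes/no questions identifies any secret without adaptive questioning:

```python
# A toy codeword search: the questions "is bit k of your number set?"
# are fixed in advance and can all be asked at once; the pattern of
# answers is itself the codeword that decodes the secret.
import math

def num_questions(num_candidates):
    # log2(N) yes/no questions suffice to distinguish N candidates.
    return math.ceil(math.log2(num_candidates))

def ask_all_at_once(secret, num_candidates):
    # Answer every bit-question simultaneously.
    return [(secret >> k) & 1 for k in range(num_questions(num_candidates))]

def decode(answers):
    # Recover the secret directly from the answer pattern.
    return sum(bit << k for k, bit in enumerate(answers))

answers = ask_all_at_once(42, 128)  # 7 questions, asked in parallel
assert decode(answers) == 42
```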

Search Strategy

• Library design is analogous to “20 Questions”
  • Searching for the features required for ligand binding, a desired phenotype, and/or good pharmacokinetic properties instead of for a number
  • Here a “feature” is a four-point pharmacophore

Search Strategy - Example

Search Strategy - Assumptions
• The “20 Questions” analogy is useful, but it assumes:
1. Every compound tests half of the possible features
2. Any compound in the design space can be synthesized
3. Every assay value is accurate
4. The goal is a single feature

Search Strategy - Remedies
• Eliminating the assumptions:
1. A minimum of log2(F) bits is needed to decode F outcomes
  • This gives a loose lower bound on the number of compounds required (worked example below)
2. The ability of a set of questions to decode a message is invariant to column reordering – therefore it is not necessary that every compound in the design space be obtainable in order to find a maximally efficient set of questions
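As a quick worked illustration (the feature count here is hypothetical, chosen only for the arithmetic): decoding F = 10,000 pharmacophore features requires at least ceil(log2(10,000)) = 14 compounds, since each assay answer contributes at most one bit.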

Search Strategy - Remedies
3. Error-correcting codes (ECC) based on Hamming distance (sketched below)
4. Adjust the probabilities of the features in an iterative process and prune unlikely features
  • Will probably lead to convergence
  • Enhances efficiency
  • Improves the probability of success
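Remedy 3 can be sketched in a few lines. In the toy example below the codewords are made up for illustration; every pair differs in at least three positions, so a single corrupted assay reading is corrected by decoding to the nearest codeword:

```python
# Nearest-codeword decoding: with minimum pairwise Hamming distance d,
# up to floor((d-1)/2) flipped bits can be corrected.

def hamming(a, b):
    # Number of positions at which two equal-length bit strings differ.
    return sum(x != y for x, y in zip(a, b))

def nearest_codeword(observed, codebook):
    # Decode a (possibly corrupted) answer pattern to the closest codeword.
    return min(codebook, key=lambda cw: hamming(observed, cw))

codebook = ["00000", "01011", "10101", "11110"]  # pairwise distance >= 3
noisy = "01111"                                  # "01011" with one bit flipped
assert nearest_codeword(noisy, codebook) == "01011"
```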

Cost Function
• Given a set of features, search for a set of compounds that allows decoding of each individual feature
  • If that is not possible, seek to decode as many features as possible, with the flattest possible distribution of feature-class sizes
• Feature class – a subset of features that all have the same codeword
• Entropy is well suited to this calculation

Cost Function - Entropy
• Entropy – a measure of uncertainty
  • All codewords the same – no uncertainty -> minimal entropy
  • All codewords different -> maximal entropy
• Wish to optimize the following equation (the slide showed an image; a plausible reconstruction from the symbol definitions below is):

  M = H = -Σ_{i=1..C} (||c_i|| / F) · log2(||c_i|| / F)

  • M is the library measure
  • H is the entropy of the feature classes
  • C is the number of distinct classes
  • ||c_i|| is the size of feature class i
  • F is the number of features
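A minimal sketch of this measure, assuming the reconstruction above (the feature/codeword assignments are invented for illustration):

```python
# Entropy of the feature classes: features sharing a codeword form one
# class, and the measure is maximal when all codewords are distinct.
import math
from collections import Counter

def library_measure(codewords):
    F = len(codewords)
    classes = Counter(codewords)  # codeword -> class size ||c_i||
    return sum(-(n / F) * math.log2(n / F) for n in classes.values())

# All features share one codeword: nothing can be distinguished.
print(library_measure(["101", "101", "101", "101"]))  # 0.0
# All codewords distinct: every feature is decodable; entropy = log2(F).
print(library_measure(["000", "011", "101", "110"]))  # 2.0
```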

Cost Function – Entropy Example

Algorithm - Overview
• Start with a list of synthesized compounds
• Goal – select the subset that maximizes entropy
• State – the set of compounds whose entropy can be calculated
• Note: from the entropy calculation, the state is a function of the classes, but moves through the state space are a function of the compounds. In general the measure cannot be calculated incrementally and must be completely reevaluated whenever the state changes (see the sketch below)
• Stark contrast with other library design methods
  • Despite this seeming limitation, the method is very efficient
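The reason the measure cannot be updated incrementally can be seen in a small sketch (compound and feature data invented): each library compound contributes one bit to every feature's codeword, so adding or swapping a compound rewrites all codewords, and the classes must be rebuilt from scratch:

```python
def feature_codewords(library, exhibits):
    # exhibits[compound] = set of features that compound tests positive for.
    features = sorted(set().union(*exhibits.values()))
    return {f: "".join("1" if f in exhibits[c] else "0" for c in library)
            for f in features}

exhibits = {"cpd1": {"f1", "f3"}, "cpd2": {"f2", "f3"}, "cpd3": {"f1", "f2"}}
print(feature_codewords(["cpd1", "cpd2"], exhibits))
# {'f1': '10', 'f2': '01', 'f3': '11'} -- adding cpd3 rewrites every codeword
```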

Algorithm - Details
• The approaches to discrete and combinatorial designs are very similar
• Both use a greedy build-up of the library to the desired number of compounds
  • Greedy – a technique that makes locally optimal choices in the hope of reaching a global optimum
• This is followed by a second phase that reevaluates each of the library components, looking for a better selection
• Repeat until no improvement is found (see the sketch below)
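A minimal sketch of the two-phase selection, with measure standing in for the entropy of the feature classes (the function names and structure are assumptions, not the paper's implementation):

```python
def design_library(pool, size, measure):
    # Phase 1: greedy build-up -- at each step add the compound that
    # most increases the library measure.
    library = []
    for _ in range(size):
        candidates = [c for c in pool if c not in library]
        library.append(max(candidates, key=lambda c: measure(library + [c])))
    # Phase 2: revisit each slot, looking for a better selection;
    # sweep repeatedly until a full pass yields no improvement.
    improved = True
    while improved:
        improved = False
        for i in range(size):
            rest = library[:i] + library[i + 1:]
            candidates = [c for c in pool if c not in rest]
            best = max(candidates, key=lambda c: measure(rest + [c]))
            if measure(rest + [best]) > measure(library):
                library = rest + [best]
                improved = True
    return library
```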

Algorithm - Extensions
1. It is often desirable to guarantee that certain items are included in the library
2. The ability to subsample the source pool during the build-up and optimization phases
  • Dramatically decreases run time
  • Only slightly impacts the quality of designs
3. A minimum Tanimoto fingerprint similarity between any two compounds in a discrete library (see the sketch below)
• Extension 1 is implemented for both the discrete and combinatorial algorithms
• Extensions 2 and 3 are implemented only for the discrete algorithm
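Extension 3's similarity test can be sketched as follows, modeling fingerprints as sets of on-bit positions (the threshold and bit values are made up):

```python
def tanimoto(fp_a, fp_b):
    # |A & B| / |A | B| for two fingerprints given as sets of on-bits.
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def meets_threshold(library_fps, candidate_fp, min_sim=0.3):
    # Enforce a pairwise minimum-similarity constraint as in extension 3.
    return all(tanimoto(fp, candidate_fp) >= min_sim for fp in library_fps)

a, b = {1, 4, 9, 12}, {1, 4, 7, 12, 15}
print(tanimoto(a, b))  # 3 shared on-bits / 6 total on-bits = 0.5
```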

Implementation Details
• C++
• Microsoft Windows NT
• 500 MHz Intel Pentium III
• 500 MB RAM

Results
• 9 different libraries were selected with the algorithm
  • 273,373-compound source pool
  • 3-component reaction A + B + C -> D
  • Monomer lists of length 33
  • Four-point pharmacophore signatures calculated for all compounds in the source pool
• Compared the final measures to the optimal result and to a random result

Results

Results - Entropy
• The combinatorial algorithm lags behind the discrete one in performance
  • A discrete library of 91 compounds has the same measure as the optimal combinatorial library of 250 compounds
  • It may still be more cost-effective to synthesize the combinatorial library
• General rule – roughly twice as many compounds are required in a combinatorial library to achieve the same information as a discrete library
• In an iterative setting:
  • Use the combinatorial algorithm early in discovery
  • Use the discrete algorithm later to cherry-pick specific compounds