Scientific & technical presentation Fragmenter Nóra Máté Sept 2005.



Contents
Fragmenter: creating fragments by cleavage rules
Fragment statistics: sorting fragments by activity values
R-group decomposition: finding a scaffold with attached ligands

Fragmenter Basics
Fragmenter cleaves single bonds to generate molecular fragments, based on predefined cleavage rules. The cleavage rules correspond to chemical reactions in order to enhance synthetic accessibility, and they are given as reaction molecules in the configuration XML. By default, all non-ring bonds matching the cleavage bonds in the rules are cleaved. However, it is possible to provide a revision algorithm that forbids certain cuts depending on predefined criteria (e.g. the resulting fragment size, the structural environment of the bond, the number of cleaved bonds in the resulting fragments). Currently one such algorithm is implemented: the RECAP method.

The RECAP Method
All non-ring bonds matching the cleavage bonds in the rules are allowed to be cleaved by default. The RECAP algorithm forbids the cleavage of some bonds according to the following rules:
1. Never cut a hydrogen-connecting bond.
2. Never cut a bond connecting a ring carbon and a heteroatom (optional).
3. Never cut ring bonds. (Fragmenter always keeps this rule; it is listed here for completeness.)
4. Refuse a cut if any of the resulting fragments is on the specified Notlist.
5. Refuse a cut if the number of open bonds in any of the resulting fragments exceeds the specified limit.
6. Refuse a cut if the number of atoms in any of the resulting fragments is less than the predefined minimal atom count.
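The six rules above can be read as a single yes/no check applied to each candidate cut. The following is a minimal Python sketch of that check, not Fragmenter's actual implementation: the `Fragment`, `Cut` and `RecapParams` classes, their field names and the default limits are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Fragment:
    smiles: str       # fragment identifier (hypothetical representation)
    atom_count: int   # number of atoms in the fragment
    open_bonds: int   # number of cleaved (open) bonds on the fragment


@dataclass
class Cut:
    is_ring_bond: bool           # the bond is part of a ring
    touches_hydrogen: bool       # one end of the bond is a hydrogen
    ring_carbon_to_hetero: bool  # bond connects a ring carbon and a heteroatom
    fragments: List[Fragment]    # fragments that would result from this cut


@dataclass
class RecapParams:
    # All defaults below are assumptions for this sketch, not Fragmenter defaults.
    keep_ring_carbon_hetero: bool = True   # rule 2 is optional
    notlist: FrozenSet[str] = frozenset()
    max_open_bonds: int = 3
    min_atom_count: int = 2


def cut_allowed(cut: Cut, params: RecapParams) -> bool:
    """Return True if none of the six RECAP rules forbids the cut."""
    if cut.touches_hydrogen:                                       # rule 1
        return False
    if params.keep_ring_carbon_hetero and cut.ring_carbon_to_hetero:  # rule 2
        return False
    if cut.is_ring_bond:                                           # rule 3
        return False
    for frag in cut.fragments:
        if frag.smiles in params.notlist:                          # rule 4
            return False
        if frag.open_bonds > params.max_open_bonds:                # rule 5
            return False
        if frag.atom_count < params.min_atom_count:                # rule 6
            return False
    return True
```

Evaluating the bond-level rules first means the fragment-level checks run only for cuts that are still candidates.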

Fragmenter Cleavage Rules The following rules are typically used with the RECAP algorithm, but Fragmenter accepts any custom cleavage rules described by reaction equations. The cleavage points on the fragments are labeled with the cleavage rules:

Configuration
(figure labels: Standardization; Cleavage reactions; Fragmenter parameters; RECAP parameters)

Fragmentation Example I. An example fragmentation of tamoxifen (left), an oestrogen antagonist, and atenolol (right), an antihypertensive drug:

Fragmentation Example II. An example fragmentation with amine-type cleavage bonds (figure labels: input molecule; amine cleavage; fragments)

Fragmentation Example III. All fragments of the same input molecule (extensive fragmentation):

Fragment Statistics Basics FragmentStatistics creates statistical results from the output of Fragmenter. The simplest usage is to remove duplicate fragments and sort fragments by occurrence, but FragmentStatistics can also sort fragments by a scoring function based on molecule activity or other data read from the input molecules and stored together with the generated fragments.

Fragment Statistics Input / Output
The input of FragmentStatistics is the output of Fragmenter in cxsmiles format with the following fields:
1. SMILES string
2. atom labels storing fragment cleavage data
3. unique ID (used for fragment duplicate check)
4. input molecule data read from SDFile tag (optional, e.g. molecule activity)
The output of FragmentStatistics is a sorted cxsmiles table with the following data:
1. SMILES string
2. atom labels storing fragment cleavage data
3. atom count
4. fragment counts per activity category (number of identical fragments in each activity category, one field for each)
5. score

The Scoring Function
Fragments are sorted by activity, which is calculated as a scoring function:
score = ac^x * (w1*c1 + w2*c2 + ... + wN*cN)
where
ac is the heavy atom count
w1, w2, ..., wN are the category weights in descending order (default: from +1 to -1, equidistant)
c1, c2, ..., cN are the fragment counts in each category, in descending activity order
x is the exponent of the heavy atom count (default: 1)
If there is no activity data, FragmentStatistics simply removes duplicate fragments and sorts the fragments by ac^x * c1, where c1 is the fragment count. With the default exponent of 1 the score is thus ac*c1. With two activity categories the default scoring function is ac*(c1 - c2); with three categories it is ac*(c1 - c3), because the middle category has weight 0.
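As an illustration of the scoring function, here is a small Python sketch that follows the definitions above. The function names `default_weights` and `fragment_score` are made up for this example and are not part of FragmentStatistics.

```python
def default_weights(n):
    """Equidistant category weights from +1 down to -1 (a single category gets +1)."""
    if n == 1:
        return [1.0]
    step = 2.0 / (n - 1)
    return [1.0 - i * step for i in range(n)]


def fragment_score(atom_count, counts, weights=None, exponent=1.0):
    """Compute ac^x * (w1*c1 + ... + wN*cN).

    atom_count -- heavy atom count of the fragment (ac)
    counts     -- fragment counts per category, in descending activity order
    weights    -- category weights; defaults to the equidistant +1..-1 weights
    exponent   -- exponent x of the heavy atom count
    """
    if weights is None:
        weights = default_weights(len(counts))
    if len(weights) != len(counts):
        raise ValueError("need exactly one weight per activity category")
    weighted_sum = sum(w * c for w, c in zip(weights, counts))
    return atom_count ** exponent * weighted_sum


# No activity data: a single count, score = ac * c1
print(fragment_score(6, [5]))        # 30.0
# Two categories: ac * (c1 - c2)
print(fragment_score(6, [5, 2]))     # 18.0
# Three categories: ac * (c1 - c3), the middle weight is 0
print(fragment_score(6, [5, 4, 2]))  # 18.0
```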

Scoring Function Example – single cutoff value Two activity ranges with cutoff value 0.5 :

Scoring Function Example – discrete activity range Discrete activity values:

Generating Fragment Statistics I. We start with a large set of input molecules with activity data (figure labels: Activity = 4; Activity = 0.05; Activity = 5; Activity = 50)

Generating Fragment Statistics II. For the purpose of fragment statistics, start by generating a broad set of fragments without the RECAP (or any other) restrictions (figure labels: Standardization; Cleavage reactions; Extensive fragmentation)

Generating Fragment Statistics III. The generated fragments inherit the activity values from the parent molecule. Fragments (with cleavage and activity data) are stored in cxsmiles format. The activity data is stored in field_1 (figure label: field_1 = 50).

Generating Fragment Statistics IV. First make statistics with duplicate filtering and sorting. These are the 4 highest-scoring fragments (by score = atom count * occurrence):
field_0: atom count
field_1: fragment occurrence
field_2: score (field_0 * field_1)
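The duplicate filtering and sorting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is taken to be a list of `(unique_id, atom_count)` pairs, which is a simplification of the cxsmiles records, and `occurrence_statistics` is a hypothetical name.

```python
from collections import Counter


def occurrence_statistics(fragments):
    """Remove duplicates and sort fragments by atom count * occurrence.

    fragments -- iterable of (unique_id, atom_count) pairs, one per generated
                 fragment (a simplified stand-in for the cxsmiles records)
    Returns rows of (unique_id, atom_count, occurrence, score), best first.
    """
    occurrences = Counter()
    atom_counts = {}
    for unique_id, atom_count in fragments:
        occurrences[unique_id] += 1          # duplicates collapse onto one ID
        atom_counts[unique_id] = atom_count
    rows = [
        (uid, atom_counts[uid], n, atom_counts[uid] * n)
        for uid, n in occurrences.items()
    ]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


fragments = [("a", 3), ("b", 10), ("a", 3), ("a", 3)]
for row in occurrence_statistics(fragments):
    print(row)
# ('b', 10, 1, 10)
# ('a', 3, 3, 9)
```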

Generating Fragment Statistics V. Next include activity data in the scoring, with cutoff value 1. This means that molecules with an activity value of at least 1 are considered active, while all others are inactive. These are the 4 most active fragments (by score = atom count * (active occurrence - inactive occurrence)):
field_0: atom count
field_1: fragment occurrence in the active set (activity >= 1)
field_2: fragment occurrence in the inactive set (activity < 1)
field_3: score (atom count * (active occurrence - inactive occurrence))
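A rough Python sketch of the same statistics with a single activity cutoff follows. The record layout `(unique_id, atom_count, parent_activity)` and the function name `activity_statistics` are assumptions made for this example; the real tool reads these values from cxsmiles fields.

```python
from collections import defaultdict


def activity_statistics(fragments, cutoff=1.0):
    """Count each fragment in the active and inactive sets and score it.

    fragments -- iterable of (unique_id, atom_count, parent_activity) triples
    cutoff    -- parents with activity >= cutoff count as active
    Returns rows of (unique_id, atom_count, active, inactive, score), best first.
    """
    stats = defaultdict(lambda: [0, 0, 0])  # atom_count, active, inactive
    for unique_id, atom_count, activity in fragments:
        entry = stats[unique_id]
        entry[0] = atom_count
        if activity >= cutoff:
            entry[1] += 1
        else:
            entry[2] += 1
    rows = [
        (uid, ac, active, inactive, ac * (active - inactive))
        for uid, (ac, active, inactive) in stats.items()
    ]
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows


fragments = [
    ("x", 5, 4), ("x", 5, 0.05), ("x", 5, 50),  # 2 active parents, 1 inactive
    ("y", 8, 0.05), ("y", 8, 5),                # 1 active parent, 1 inactive
]
for row in activity_statistics(fragments, cutoff=1.0):
    print(row)
# ('x', 5, 2, 1, 5)
# ('y', 8, 1, 1, 0)
```

A fragment that occurs equally often in both sets scores 0 regardless of its size, so large fragments rank highly only when they are enriched among the active molecules.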

R-group Decomposition – the query
R-group decomposition is a special kind of substructure search that aims to find a central structure (the scaffold) and identify its ligands at certain attachment positions. The query molecule consists of the scaffold and ligand attachment points represented by R-groups. By default, the two R1 nodes must match identical structures, but this behaviour can be changed by setting the -p (--skip-same-structure-check) parameter.

R-group Decomposition – the targets
Our sample targets all contain the query (scaffold with R-group attachment points), but not all of them can satisfy the condition of identical R1 ligands:
single hit: identical R1 ligands
single hit: the same R1 ligand (R-bridge)
more hits: all with different R1 ligands
more hits: one with identical R1 ligands

R-group Decomposition – decomposition I.
Attachment points can be denoted by different symbols, depending on the -a (--attachment-symbol) option:
N: none
P: attachment point
A: any-atom (default)
M: atom map
L: atom label

R-group Decomposition – decomposition II.
SMILES table: the output is written as a SMILES table if the -f (--format) option is omitted. Otherwise the output is written as a series of molecules in the specified output format, with atom color codes (separated by ; characters) stored in the DMAP property (SDF/MRV tag). The codes are:
0: scaffold atom
n: Rn ligand atom (n > 0)
-: non-hit atom
Example: 0;0;1;1;1;1;2;2;1;0;-;-
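To make the DMAP encoding concrete, the following Python sketch splits a DMAP string into scaffold atoms, ligand atoms per R-group, and non-hit atoms. The function name `parse_dmap` is made up for this example, and the sketch assumes that the codes are listed in atom order, reported here as 0-based positions.

```python
def parse_dmap(dmap):
    """Split a DMAP string into scaffold, ligand and non-hit atom positions.

    Positions are 0-based indexes into the ';'-separated code list
    (assumed here to follow the atom order of the molecule).
    Returns (scaffold, ligands, non_hit), where ligands maps n -> positions
    of the Rn ligand atoms.
    """
    scaffold = []
    ligands = {}
    non_hit = []
    for position, code in enumerate(dmap.split(";")):
        code = code.strip()
        if code == "-":
            non_hit.append(position)
        elif code == "0":
            scaffold.append(position)
        else:
            ligands.setdefault(int(code), []).append(position)
    return scaffold, ligands, non_hit


scaffold, ligands, non_hit = parse_dmap("0;0;1;1;1;1;2;2;1;0;-;-")
print(scaffold)  # [0, 1, 9]
print(ligands)   # {1: [2, 3, 4, 5, 8], 2: [6, 7]}
print(non_hit)   # [10, 11]
```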

R-group Decomposition – decomposition III.
The DMAP property can be used in mview to color the atoms according to a color-map file that maps the color codes to colors. Set the -p (--skip-same-structure-check) option to allow the two R1 nodes to match different ligands. Finally, use the -A (--allhits) option to see all possible decompositions. In this way our last target will also have two decompositions:

Visit other technical presentations MarvinSketch/View MarvinSpace Calculator Plugins JChem Base JChem Cartridge Standardizer Screen JKlustor Fragmenter Reactor

References
Fragmenter, fragment statistics:
R-group decomposition:
1. RECAP - Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry. J. Chem. Inf. Comput. Sci. 1998,
2. Schneider, G. et al. De novo design of molecular architectures by evolutionary assembly of drug-derived building blocks. J. Comput.-Aided Mol. Des. 2000, 14,