ACS Symposium – Challenges In Structure Searching


April 8, 2007
Comparing Merged Markush Service (MMS) and Marpat Search Results: Two Case Studies
Joe Terlizzi, Questel, Inc.
jterlizzi@questel.com

Background
In October 2007, the JFA pharmaceutical group, a group of professional searchers from the pharmaceutical industry in Japan, asked me to compare searches for two chemical structures in both MMS and Marpat. Their own searches had produced inconclusive results. The two case studies that follow, conducted at that time, illustrate many of the similarities and differences between the two systems. They show my search procedure, results, and conclusions.

The Merged Markush Service (MMS), jointly produced by Thomson and the French Patent Office (INPI), is a database containing both Markush structures and specific compounds from patents. It is based on the Markush Darc system and is hosted exclusively on Questel. Marpat, produced by Chemical Abstracts Service and available only on STN, contains Markush structures from patents. It can be searched together with the REGISTRY file (for specific structures) using the CASLINK cluster on STN.

Both MMS and Marpat are usually recommended for a basic chemical structure search strategy covering Markush structures in patents. Derwent’s Chemical Fragmentation Code system (available only to Derwent subscribers) can also yield unique answers, but because it is not a graphical system and was not requested by the JFA, it was not used in this study.

Comparing MMS and Marpat
CASE 1: The following chemical structure is from an EP patent document published in 1987. In a JFA study, this document was retrieved from MMS but not from Marpat. Why not?
X: N or CR2
Y: N or CR2
R: C1-5 alkyl, C3-6 cycloalkyl
R1: halogen, amino, -N=CH-R2
R2: independently H, halogen, hydroxyl, C1-5 straight or branched alkyl, or an optionally substituted aromatic or heteroaromatic residue
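As a rough illustration of how broad this Markush definition is, the variable groups can be written as a plain mapping and the top-level combinations counted. This is a minimal sketch in Python: the dictionary layout and the `count_top_level_combinations` helper are assumptions made for illustration and do not reflect how MMS or Marpat store or index structures. Each entry is a generic class label, not a specific compound.

```python
from math import prod

# Variable groups from the Case 1 definition, written as class labels.
# The layout is illustrative only; it is not an MMS or Marpat format.
MARKUSH_CASE_1 = {
    "X": ["N", "CR2"],
    "Y": ["N", "CR2"],
    "R": ["C1-5 alkyl", "C3-6 cycloalkyl"],
    "R1": ["halogen", "amino", "-N=CH-R2"],
    "R2": [
        "H",
        "halogen",
        "hydroxyl",
        "C1-5 straight or branched alkyl",
        "optionally substituted aromatic or heteroaromatic residue",
    ],
}


def count_top_level_combinations(groups, variables=("X", "Y", "R", "R1")):
    """Count combinations of class labels for the chosen variables.

    R2 is left out by default because it is nested inside other groups
    and is chosen independently at each occurrence.
    """
    return prod(len(groups[name]) for name in variables)


if __name__ == "__main__":
    print(count_top_level_combinations(MARKUSH_CASE_1))
```

Even before R2 is expanded, the four top-level variables give 24 label combinations, and each label itself covers many specific structures. That breadth is why small differences in query defaults can change what is retrieved.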

Case 1: The query was created in MMS in the following way:
X = G1
G0
Y = G2

R = G3
R1 = G4

A free site was put on the carbon in G1, G2, and G4 to cover R2.

In MMS, the AA search resulted in 19 answers and no RX candidates:
1 CN = 87060014-01
2 CN = 90010268-01
3 CN = 97085103-01
4 CN = 8722-02601
5 CN = 8751-17501
6 CN = 9004-11601
7 CN = 9044-07401
8 CN = 9048-37001
9 CN = 9048-43801
10 CN = 9144-00701
11 CN = 9631-16901
12 CN = RAITC9
13 CN = RAITCA
14 CN = RAITCM
15 CN = RAITCN
16 CN = RAITCP
17 CN = RAITD3
18 CN = RAITD8
19 CN = RAJLWV
The first CN listed corresponds to the EP document:
AN - 87060014
CN - 87060014-01-N; 87060014-01-T
PN - EP224121 - 19870603 [EP-224121]
AP - EP86115667 19861112 [1986EP-0115667]
PR - IT2288785 19851119 [1985IT-0022887]
   - IT2288885 19851119 [1985IT-0022888]
RL - US4758567 - 19880719 [US4758567]
PA - RORER ITALIANA S.p.A. / Via Valosa di Sopra, 9 / I-20052 S.Fruttuoso di Monza (Milan) (IT)
   - ROTTAPHARM S.p.A. / Via Dandolo, 4 / I-21100 Varese (IT) (Updated 8825)
IC1 - C07D-215/56
IC2 - C07D-471/04; A61K-031/47; A61K-031/53; A61K-031/495
ET - 7- 4-amino-piperazinyl - or 7- 4-chloro-piperazinyl quinolinone and azaquinolinone derivatives, a process for the preparation thereof and pharmaceutical compositions containing them.
EAB - 4-oxo-7-piperazino-(quinoline or azaquinoline)-3-carboxylic acid derivatives. Process of preparation thereof. These compounds are antibacterial agents
PHCN- 11 : INFECTION - 08 : NEPHROLOGY, UROLOGY

CASE 1 results in MMS: There were 12 unique patent records retrieved: 3 from PHARM and 10 from DWPI, with 1 overlap. The most recent record was US7256187, from August 2007. The EP record from the JFA study was only in PHARM, since it was indexed in the BACKF (backfile) segment.
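The record count above follows from a set union: a record found in both files is counted once. A minimal sketch in Python, using placeholder identifiers (the names below are invented for illustration and are not the actual answer set; only the counts come from the talk):

```python
# Placeholder record identifiers. Only the counts (3 in PHARM, 10 in DWPI,
# 1 shared) are taken from the talk; the names themselves are hypothetical.
pharm = {"shared-1", "pharm-only-1", "pharm-only-2"}
dwpi = {"shared-1"} | {f"dwpi-only-{i}" for i in range(1, 10)}

unique_records = pharm | dwpi  # union counts the shared record once
overlap = pharm & dwpi         # records present in both files

print(len(pharm), len(dwpi), len(overlap), len(unique_records))
```

The same arithmetic by hand is 3 + 10 - 1 = 12.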

The query was created for Marpat, with G groups G1, G2, and G3 (structure drawings not reproduced in this transcript).

There were few differences between creating this query in MMS and in Marpat. Some differences were:
- G1 could be repeated in Marpat; a G group cannot be repeated in MMS.
- In Marpat, rings default to possible non-hydrogen substitution (see next slide), whereas in MMS no free sites were placed on the ring.

Non-Hydrogen Attachments
In Marpat, the default allows non-hydrogen attachments. Searchers can choose to override the defaults.

Some other differences between MMS and Marpat for this query: the Atom/Class match level in Marpat (translation in MMS) was set to CLASS (equivalent to BT in MMS) on the G group substituents. The MMS structure defaults to equal translation.

Case 1 results using CASLINK (Marpat/Registry/MarpatPrev): The CASLINK search had 25 results. There were unique patent results in both MMS and Marpat for this structure. The most recent patent, US7256187, was only in MMS. The EP patent missing from the JFA study was found in Marpat!

Why didn’t the JFA study retrieve this result in Marpat? The JFA query used for Marpat was not as broad as my query, so the EP patent was not retrieved.
What were the possible differences between my query and the JFA’s? I am not sure, but it could be that my query allowed for substitution on the rings.
Why the difference between the Marpat and MMS results? Open substitution made the Marpat query broader than the MMS query, so Marpat returned a greater number of results.

CASE 1 Conclusions:
- Default levels in both systems must always be taken into account.
- Atom/Class in Marpat corresponds to Translation Level in MMS.
- The defaults differ considerably between the two systems, with Marpat having broader search defaults.
- Non-hydrogen attachment defaults (Marpat) and free sites (MMS) must also be taken into account.
- Unique results are often obtained in both systems.

CASE 2: The following structure is from a US patent document published in 1990. Is there unique retrieval in MMS or Marpat in a freedom-to-operate search? (Do not take into account specific structures in MMS or differences in patent coverage.)
X: N or CH
Y: O, S, or NH
Z: O, S, or NH
R1: an unsubstituted carbocyclic or heterocyclic aromatic group, or a carbocyclic or heterocyclic aromatic group substituted with at least one lower alkyl, lower alkoxy, halogen, lower alkylthio or nitro group
R2: C1-3 alkyl
R3: C1-3 alkyl, or R2 together with R3 may form a heterocyclic ring that includes at least one heteroatom selected from O, N, or S.

Case 2 Using MMS
X = G1

Y = G2
Z = G3
G2 and G3 are identical.

R1 = G4
5 free sites have been applied to the superatoms.

R2 and R3 are substituted with free sites. Because of MMS tautomer rules, unspecified bonds are applied.

CASE 2 RESULTS
There was 1 answer in MMS; it was the 1990 US patent.
AN - 1990-375408 [50]
TI - New penta:cyclic furo:benzoxazine derivs. having antimicrobial and antitumour activity and useful as intermediates to known tetra:cyclic antitumour cpds.
PN - US4973693 A 19901127 DW1990-50 Eng *
     AP: 1989US-0401746 19890901
   - JP03223292 A 19911002 DW1991-46 Jpn
     AP: 1990JP-0231832 19900901
AP - 1989US-0401746 19890901; 1990JP-0231832 19900901
PR - 1989US-0401746 19890901
PA - (FUJI) FUJISAWA PHARM CO LTD
   - (RIPT) RI PATENTS INC
   - (RICV) UNIV RICE
CN - 9050-34901-N 9050-34902-N

Case 2 Using Marpat
The query was drawn similarly in Marpat, but there was one problem: G groups in Marpat cannot be attached to more than two nodes, and the system would not prepare the query. MMS allows attachments to more than two nodes. Since X has three attachments in the original query, a different solution had to be sought.
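The attachment limit described above can be expressed as a simple check on a query graph. This is a sketch under stated assumptions: the adjacency-dictionary representation, the node names, and the `over_attached_g_groups` helper are hypothetical, and the limit of two attachments is taken from the talk rather than from STN documentation.

```python
MAX_G_ATTACHMENTS = 2  # limit reported in the talk for Marpat G groups


def over_attached_g_groups(graph, g_groups, limit=MAX_G_ATTACHMENTS):
    """Return G-group nodes bonded to more neighbours than the limit allows."""
    return sorted(node for node in g_groups if len(graph[node]) > limit)


# Hypothetical fragment of the Case 2 query: G1 (variable X) sits at a
# position with three attachments, while G2 has two.
query = {
    "G1": {"C1", "C2", "C3"},
    "G2": {"C3", "C4"},
    "C1": {"G1"},
    "C2": {"G1"},
    "C3": {"G1", "G2"},
    "C4": {"G2"},
}

print(over_attached_g_groups(query, {"G1", "G2"}))
```

A check like this flags G1, the three-attachment position that could not be drawn as a G group in Marpat.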

The following query was searched using CASLINK.
- The A atom (any atom except H) was substituted for the G group. This allowed for a three-node attachment.
- The Cy variable was used for cyclic substitution at R1.
- All nodes were open for substitution.

Another problem: this query had too many iterations in Marpat and would not run. The system limit was exceeded.
S L2 SSS SAM FILE=MARPAT
SAMPLE SEARCH INITIATED 16:22:46 FILE 'MARPAT'
SAMPLE SCREEN SEARCH COMPLETED - 8566 TO ITERATE
23.3% PROCESSED 2000 ITERATIONS 0 ANSWERS
INCOMPLETE SEARCH (SYSTEM LIMIT EXCEEDED)
SEARCH TIME: 00.00.02
FULL FILE PROJECTIONS: ONLINE **INCOMPLETE**
                       BATCH **INCOMPLETE**
PROJECTED ITERATIONS: 167698 TO 174942
PROJECTED ANSWERS: 0 TO 0
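The figures in the sample-search log above can be cross-checked with a few lines of Python. The regular expressions below are written against the exact wording of this one transcript; treating that wording as a stable format is an assumption, not documented STN behavior, and `summarize_search_log` is a hypothetical helper.

```python
import re

# The relevant lines of the sample-search log, copied from the slide.
SAMPLE_LOG = (
    "SAMPLE SCREEN SEARCH COMPLETED - 8566 TO ITERATE "
    "23.3% PROCESSED 2000 ITERATIONS 0 ANSWERS "
    "INCOMPLETE SEARCH (SYSTEM LIMIT EXCEEDED)"
)


def summarize_search_log(text):
    """Extract iteration counts and recompute the processed percentage."""
    to_iterate = int(re.search(r"(\d+) TO ITERATE", text).group(1))
    match = re.search(r"([\d.]+)% PROCESSED (\d+) ITERATIONS (\d+) ANSWERS", text)
    processed = int(match.group(2))
    return {
        "to_iterate": to_iterate,
        "processed": processed,
        "answers": int(match.group(3)),
        "reported_percent": float(match.group(1)),
        "computed_percent": round(100 * processed / to_iterate, 1),
        "limit_exceeded": "SYSTEM LIMIT EXCEEDED" in text,
    }


print(summarize_search_log(SAMPLE_LOG))
```

Here 2000 of 8566 screened candidates is about 23.3%, matching the log, and the search was reported incomplete before the remaining candidates were examined.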

Replacing A with C and N, with the intention of running two queries, also exceeded the system limits! Another tactic was tried: R2 and R3 were substituted with Ak (alkyl). I received the following message:
STRUCTURE TOO LARGE - SEARCH ENDED
A structure in your query is too large. You may delete attributes or atoms to reduce the size of the structure and try again.

The query shown was entered:
- The bonds coming off the fused ring were denoted as ring bonds.
- Match level was ATOM for all nodes except the Cy variable, which was CLASS.
This query ran!

=> sss l1 full
FULL SEARCH INITIATED 17:13:38 FILE 'MARPAT'
FULL SCREEN SEARCH COMPLETED - 2868 TO ITERATE
100.0% PROCESSED 2868 ITERATIONS 1 ANSWERS
SEARCH TIME: 00.00.06
L3 1 SEA SSS FUL L1
=> dis bib
L3 ANSWER 1 OF 1 MARPAT COPYRIGHT 2007 ACS on STN
AN 114:164261 MARPAT Full-text
TI Preparation of novel pentacyclic compounds as antimicrobial and antitumor agents
IN Goto, Shunsuke; Fukuyama, Tohru
PA Japan
SO U.S., 11 pp. CODEN: USXXAM
DT Patent
LA English
FAN.CNT 1
PATENT NO. KIND DATE APPLICATION NO. DATE
--------------- ---- -------- --------------- --------
PI US 4973693 A 19901127 US 1989-401746 19890901
JP 03223292 A 19911002 JP 1990-231832 19900901
PRAI US 1989-401746 19890901
The full search had 1 answer: the US patent from 1990! No other answers were obtained in Marpat.

Comparing MMS and Marpat
Conclusions:
Is there unique retrieval for this case in Marpat or MMS? No.
- Complex queries may not always run in MMS or Marpat, but they can usually be adjusted to run.
- Features such as bond attributes, unspecified bonds, and unspecified atoms can often help a structure to process.
- Speed and number of iterations will differ between the two systems.

Summary:
- Know the defaults in both MMS and Marpat, especially translation and free sites (MMS) and class level and non-hydrogen attachment (Marpat).
- Familiarize yourself with the bond normalization rules in both systems; they are different!
- Learn the unique features of both systems (ring isolation, subset searching, JOIN command, etc.).
- Use both systems for a comprehensive search!

HELP
MMS documentation and information: www.questel.com/mms
Marpat documentation and information: www.cas.org
Questel Help Desk: help@questel.com
STN Help Desk: help@cas.org

Finally, many thanks to:
- Sandy Burcham (Service Is Our Business)
- Judy Philipsen (Philipsen Search Services)
- Kyoko Kaji (Pfizer)
for allowing me to use these JFA examples.
Thank You!