Published by Ralf Blake, modified over 9 years ago
1
Mining chemical structural information from the drug literature… finding the right stuff
Debra L. Banville, Ph.D.
2
September 2006 ACS Meeting San Francisco, CA
BACKGROUND: The problem…
Total number of GPCR publications to date (articles + patents), chart values: 9; 290 (1992); >14,000 (2005)
3
BACKGROUND: Document Retrieval (DR): gathering → Information Extraction (IE): mining
Example full-text passage: "…would complicate interpretation of the results. Therefore, phosphorylation of sphingosine was measured in membranes prepared from mCB1-CHO cells and mouse cerebellum. No detectable levels of S1P were formed by any of the membrane preparations (Fig. 4). In contrast, formation of S1P from sphingosine was readily detected in membranes from HEK cells transfected with SphK1 only in the presence of added ATP, suggesting that membranes from CHO cells and cerebellum do not phosphorylate sphingosine in the binding assays. These results also suggest that…"
[Figure: Fig. 4]
4
DEFINE: How can a researcher…
– Filter, without losing information?
– Discover new information, infusing knowledge & experience?
– Manage, over time?
5
DEFINE: The first step => Document Retrieval
Fields used for Document Retrieval: TITLE, AUTHOR/AFFILIATION, ABSTRACT, INDEXING/KEYWORDS
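To make the first step concrete, here is a minimal sketch of Boolean retrieval restricted to the bibliographic fields named above. The `Record` class, the `retrieve` function and the two sample records are hypothetical illustrations, not any database's API; the sketch ignores ranking and controlled vocabularies.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical record holding the bibliographic fields named on the slide.
    title: str
    authors: str
    abstract: str
    keywords: list[str] = field(default_factory=list)


def retrieve(records, terms, fields=("title", "abstract", "keywords")):
    """Boolean AND retrieval: keep records whose searched fields contain every term.

    Matching is case-insensitive substring matching, so "GPCR" also matches "GPCRs".
    """
    hits = []
    for rec in records:
        parts = []
        for name in fields:
            value = getattr(rec, name)
            parts.append(" ".join(value) if isinstance(value, list) else value)
        haystack = " ".join(parts).lower()
        if all(term.lower() in haystack for term in terms):
            hits.append(rec)
    return hits


# Two made-up records for illustration.
records = [
    Record("Cannabinoid CB1 receptor signalling", "A. Author",
           "CB1 is a GPCR expressed in cerebellum.", ["GPCR", "cannabinoid"]),
    Record("Sphingosine kinase assays", "B. Author",
           "SphK1 activity in HEK cell membranes.", ["sphingosine"]),
]

print([r.title for r in retrieve(records, ["GPCR"])])  # ['Cannabinoid CB1 receptor signalling']
```

Substring matching keeps the sketch short but over-matches; a real system would tokenize and consult an index vocabulary.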
6
DEFINE: The second step => Information Extraction (gathering, mining)
Chart: 290 GPCR publications in 1992; >14,000 in 2005
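The chart values imply how fast this literature grows. A short calculation, assuming the 1992 and 2005 figures are comparable counts, gives the implied compound annual growth rate; because the 2005 value is ">14,000", the result is a lower bound.

```python
def implied_annual_growth(start_count, end_count, years):
    """Compound annual growth rate that turns start_count into end_count over `years` years."""
    return (end_count / start_count) ** (1 / years) - 1


# Chart values from the slides: 290 (1992) and >14,000 (2005).
# Because 14,000 is a lower bound, the computed rate is also a lower bound.
rate = implied_annual_growth(290, 14_000, 2005 - 1992)
print(f"at least {rate:.0%} per year")  # at least 35% per year
```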
7
SOLUTIONS: Pharmaceutical Information Mining
Ideally, the marriage of biological & chemical information needs to be the ultimate focus of information mining applications.
[Figure: a full-length protein amino-acid sequence, representing the biological information]
8
The Major Barrier to information mining: lack of universal publication standards & structure
– Terminology
– Indexing policies
– Etc…
9
Terminology Barriers… How many ways can you say aspirin?
– Abbreviations
– Systematic: acetylsalicylic acid; salicylic acid, acetate; 2-acetyloxybenzoate; 2-carboxyl phenylacetate
– Common/generic
– Company codes
– Trade names
– Index & reference anaphors: Compound 10…
– Generic & fragmented chemical structures
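One common way to tame this variety is a synonym dictionary that maps every surface form to a single canonical key. The sketch assumes a small, hand-built, hypothetical table (the slide's systematic variants plus the common abbreviation ASA); in practice the canonical key would more likely be a structure identifier than a name, and company codes, trade names and anaphors need additional sources.

```python
import re

# Hypothetical synonym table: each surface form (lower-case) -> one canonical key.
SYNONYMS = {
    "aspirin": "aspirin",
    "asa": "aspirin",  # common abbreviation for acetylsalicylic acid
    "acetylsalicylic acid": "aspirin",
    "salicylic acid, acetate": "aspirin",
    "2-acetyloxybenzoate": "aspirin",
    "2-carboxyl phenylacetate": "aspirin",
}


def normalize(name):
    """Map one name to its canonical key, ignoring case and extra whitespace."""
    return SYNONYMS.get(" ".join(name.lower().split()))


def find_mentions(text):
    """Return the canonical keys of all synonyms found as whole words in text."""
    lowered = text.lower()
    found = set()
    for form, canonical in SYNONYMS.items():
        # The lookarounds stop short forms such as "asa" matching inside longer words.
        if re.search(r"(?<![\w-])" + re.escape(form) + r"(?![\w-])", lowered):
            found.add(canonical)
    return sorted(found)


print(normalize("Acetylsalicylic  Acid"))         # aspirin
print(find_mentions("Patients took ASA daily."))  # ['aspirin']
```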
10
Indexing Barriers… Mapping interleukin-8 receptor to G-protein coupled receptors
– EMBASE or Medline indexing? No
– CAS indexing? Yes
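Mapping a specific receptor to a broad class is, in effect, a hierarchy lookup: a search on the broad term should also retrieve documents indexed only under narrower terms. The sketch assumes a tiny, hypothetical thesaurus fragment and document index; the interleukin-8 receptor does sit under chemokine receptors, which are G-protein coupled receptors, but the rest of the data is illustrative only.

```python
# Hypothetical broader -> narrower thesaurus fragment.
NARROWER = {
    "G-protein coupled receptor": ["chemokine receptor", "cannabinoid receptor"],
    "chemokine receptor": ["interleukin-8 receptor"],
    "cannabinoid receptor": ["CB1 receptor", "CB2 receptor"],
}


def expand(term):
    """Return term followed by every narrower term beneath it (depth-first)."""
    terms = [term]
    for child in NARROWER.get(term, []):
        terms.extend(expand(child))
    return terms


def search(index, term):
    """Return ids of documents indexed under term or any narrower term."""
    wanted = set(expand(term))
    return [doc_id for doc_id, doc_terms in index.items() if wanted & set(doc_terms)]


# Toy index: doc-1 was indexed only under the specific receptor name.
index = {"doc-1": ["interleukin-8 receptor"], "doc-2": ["sphingosine kinase"]}
print(search(index, "G-protein coupled receptor"))  # ['doc-1']
```

If an indexing policy never links the specific term to the broad one, no amount of query expansion on the user's side recovers doc-1, which is the barrier the slide describes.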
11
Formatting & Copyright/Licensing Barriers…
– Text and/or images mixed together
– Diverse document formats
– Some are images only!
– Access to full text restricted
12
Other challenges… AMBIGUITY: Methyl + Ethyl + Malonate
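The ambiguity can be stated mechanically: a run of n name tokens can be grouped into entities in 2^(n-1) ways, and only chemical knowledge tells which grouping is meant. This sketch simply enumerates the groupings; choosing among them (for example, recognizing "methyl ethyl malonate" as a single mixed diester) is assumed to happen in a later step.

```python
def segmentations(tokens):
    """Yield every way to split a token sequence into contiguous groups."""
    if not tokens:
        yield []
        return
    for i in range(1, len(tokens) + 1):
        head = " ".join(tokens[:i])
        for rest in segmentations(tokens[i:]):
            yield [head] + rest


for parts in segmentations(["methyl", "ethyl", "malonate"]):
    print(" | ".join(parts))
```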
13
Other challenges… Structured & unstructured text
[Figure label: UNSTRUCTURED]
14
Barriers in identifying the "right" set of sources & managing diverse output…
MULTIPLE SOURCES → Document Retrieval → the "right" content
– Bibliographic & indexed citations: Medline/MeSH, Embase/EMTREE, CAS/CT, etc…
– Full text: Scirus, HighWire, Journals @ OVID, eJournals, recent USPTO, internal reports
– Patent authorities, e.g. USPTO, WO, EP, JP, etc…
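When the same article arrives from several of these sources, the outputs have to be merged. A minimal sketch, assuming each source yields dictionaries with a `title` key, deduplicates on a normalized title and records every source a record came from. The function names are hypothetical; stable identifiers such as DOIs would be a better merge key where available.

```python
import re


def normalize_title(title):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def merge_sources(*sources):
    """Merge (source_name, records) pairs: one entry per normalized title, listing every source."""
    merged = {}
    for source_name, records in sources:
        for rec in records:
            key = normalize_title(rec["title"])
            entry = merged.setdefault(key, {"title": rec["title"], "sources": []})
            entry["sources"].append(source_name)
    return list(merged.values())


result = merge_sources(
    ("Medline", [{"title": "CB1 receptor signalling."}]),
    ("Embase", [{"title": "CB1 Receptor Signalling"}, {"title": "SphK1 in HEK cells"}]),
)
print([(r["title"], r["sources"]) for r in result])
```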
15
Lowering these barriers
Chemical reading capability:
– Recognition: phenylacetate
– Extraction: …interaction of phenylacetate with…
– Conversion to searchable form
– Annotation/tagging of text to entity: …interaction of phenylacetate with…
Dream or reality…
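The annotation/tagging step can be sketched as inline markup around recognized names, in the spirit of SGML/XML tagging. The `tag_entities` helper, the `<chem>` tag name and the fixed name list used as a "recognizer" are all assumptions for illustration; real chemical name recognizers rely on grammars and statistics rather than a closed list.

```python
import re


def tag_entities(text, names, tag="chem"):
    """Wrap each recognized name in <tag>...</tag>, preferring the longest match."""
    if not names:
        return text
    # Longest-first alternation so "acetylsalicylic acid" wins over "salicylic acid".
    alternatives = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(n) for n in alternatives) + r")(?![\w-])",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"<{tag}>{m.group(1)}</{tag}>", text)


print(tag_entities("…interaction of phenylacetate with…", ["phenylacetate"]))
# …interaction of <chem>phenylacetate</chem> with…
```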
16
Historical solution, 1958: Chemical name recognition & extraction
Eugene Garfield, "Recognizing Chemical Names by Machine"
"…Opler estimated that at least ten man-years would be required just to write the necessary computer programs for display any type of chemical diagram after suitable linguistic analyses… (Opler, private communication to Garfield, 1959)… Subsequently, I turned to the possibility of calculating molecular formulas…"
Essays of an Information Scientist (1984) 7, 441
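Garfield's observation was that a systematic name carries enough information to compute a molecular formula. A toy version, assuming a hypothetical table of the atoms each name fragment contributes, sums the fragments and writes the result in Hill order (C, then H, then other elements alphabetically; purely alphabetical when there is no carbon). It works only for simple additive names such as the mixed ester methyl ethyl malonate and is not Garfield's actual procedure.

```python
from collections import Counter

# Hypothetical fragment table: atoms each name part contributes in a simple ester name.
FRAGMENTS = {
    "methyl": Counter({"C": 1, "H": 3}),
    "ethyl": Counter({"C": 2, "H": 5}),
    "malonate": Counter({"C": 3, "H": 2, "O": 4}),  # the -OOC-CH2-COO- core
}


def formula_from_parts(parts):
    """Sum the atom counts of each name fragment (purely additive)."""
    total = Counter()
    for part in parts:
        total += FRAGMENTS[part]
    return total


def hill_formula(counts):
    """Write atom counts in Hill order: C, H, then others alphabetically; all alphabetical if no C."""
    present = {el: n for el, n in counts.items() if n > 0}
    if "C" in present:
        order = ["C"] + (["H"] if "H" in present else [])
        order += sorted(el for el in present if el not in ("C", "H"))
    else:
        order = sorted(present)
    return "".join(el + (str(present[el]) if present[el] > 1 else "") for el in order)


print(hill_formula(formula_from_parts(["methyl", "ethyl", "malonate"])))  # C6H10O4
```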
17
Historical solutions, 30+ years later: Chemical name recognition & extraction
– Standard Generalized Markup Language (SGML)
– 1980s: Gail Hodge et al., extraction and conversion of names from text fields
– 1990s: Chowdhury & Lynch, then Kemp & Lynch: extraction from abstract summaries (segmentation algorithms & statistics; chemical names), then from full-text patents
– 2000s: focus on full unstructured text
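Segmenting systematic names into morphemes is one of the techniques listed above. The sketch uses dynamic programming to split a name into the fewest pieces drawn from a deliberately tiny, hypothetical lexicon. It is one plausible approach, not necessarily the algorithm used by Chowdhury & Lynch or Kemp & Lynch, and it ignores locants, brackets and punctuation.

```python
from functools import lru_cache

# Hypothetical, deliberately tiny morpheme lexicon.
MORPHEMES = {"acet", "acetyl", "oxy", "benzo", "ate", "phenyl",
             "salicyl", "ic", "methyl", "ethyl", "malon"}


def segment(word, lexicon=MORPHEMES):
    """Split word into the fewest lexicon morphemes; return None if no split exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Best segmentation of word[i:] as a tuple of morphemes, or None.
        if i == len(word):
            return ()
        candidates = []
        for j in range(i + 1, len(word) + 1):
            piece = word[i:j]
            if piece in lexicon:
                rest = best(j)
                if rest is not None:
                    candidates.append((piece,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None


print(segment("acetyloxybenzoate"))  # ['acetyl', 'oxy', 'benzo', 'ate']
```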
18
Other Solutions: Chemical name recognition & extraction
Commercial developments:
– Reel2's SureChem
– MDL/Temis Reading Machine
– SimBioSys' CLiDE
– Etc…
In-house/AZ example
19
SureChem Example
20
MDL Reading Machine
Identify: formal structure, physical properties
Name candidates: labels, abbreviations, anaphors
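Resolving anaphors such as "compound 10" means linking a label back to the name it was defined with. A minimal sketch, assuming definitions appear as numbered lines ("2. 3-Hydroxy-…") and references as "compound N", builds a label table and annotates each reference with its name. Both patterns and the helper names are hypothetical; real documents use far more varied conventions.

```python
import re

LABEL_LINE = re.compile(r"^\s*(\d+)\.\s*(\S.*)$")            # e.g. "2. 3-Hydroxy-..."
ANAPHOR = re.compile(r"\bcompound\s+(\d+)\b", re.IGNORECASE)  # e.g. "Compound 2"


def build_label_table(lines):
    """Map each numeric label to the name defined on its line."""
    table = {}
    for line in lines:
        m = LABEL_LINE.match(line)
        if m:
            table[m.group(1)] = m.group(2).strip()
    return table


def resolve_anaphors(text, table):
    """Append [name] after 'compound N' when N is a known label; leave unknown labels untouched."""
    def repl(m):
        name = table.get(m.group(1))
        return f"{m.group(0)} [{name}]" if name else m.group(0)
    return ANAPHOR.sub(repl, text)


table = build_label_table([
    "1. 3'-Methylspiro[1-azabicyclo[2.2.2]octane-3,5'-oxazolidin]-2'-one monohydrochloride",
    "2. 3-Hydroxy-1-azabicyclo[2.2.2]octane-3-acetic acid",
])
print(resolve_anaphors("Compound 2 was used.", table))
```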
21
SimBioSys CLiDE™ example: table with chemical images
22
Chemical Image Extraction: WO patent application
23
Business rationale at AZ…
AstraZeneca patent applications describing alpha-7 agonists

Patent Number     Cmpds
WO1996006098        24
WO1997030998        36
WO1999003859       118
WO2000042044        82
WO2001029034       126
WO2001036417       110
WO2001060821       261
Totals             757 named compounds, including reagents & intermediates

Example names:
1. 3'-Methylspiro[1-azabicyclo[2.2.2]octane-3,5'-oxazolidin]-2'-one monohydrochloride
2. 3-Hydroxy-1-azabicyclo[2.2.2]octane-3-acetic acid
3. (3S)-Spiro[1-azabicyclo[2.2.2]octane-3,5'-oxazolidin]-2'-one monohydrochloride
4. 3-Hydroxy-1-azabicyclo[2.2.1]heptane-3-acetic acid hydrazide
5. Spiro[1-azabicyclo[2.2.2]octane-3,5'-oxazolidin]-2'-one monohydrochloride
6. Etc.
Nonstandard IUPAC name: Spiro[1-azabicyclo[2.2.2]octane-3,2'(1'H)-furo[2,3-c]isoquinoline]
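Names like these are easily damaged in extraction, for example split at a line break ("-2'- one") or truncated mid-bracket. Two small, hypothetical repair checks are sketched here: rejoining a hyphen followed by whitespace, and verifying that parentheses and brackets are properly nested. The rejoin rule is a heuristic and would wrongly merge legitimate forms such as "2- and 3-methyl".

```python
import re

CLOSERS = {")": "(", "]": "[", "}": "{"}


def rejoin_broken_name(name):
    """Heuristic: remove whitespace after a hyphen, e.g. "-2'- one" -> "-2'-one"."""
    return re.sub(r"-\s+(?=\w)", "-", name)


def brackets_balanced(name):
    """True if every (, [ and { in a name is closed in the right order."""
    stack = []
    for ch in name:
        if ch in "([{":
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                return False
    return not stack


raw = "Spiro[1-azabicyclo[2.2.2]octane-3,2'(1'H)- furo[2,3-c]isoquinoline]"
fixed = rejoin_broken_name(raw)
print(fixed, brackets_balanced(fixed))
```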
24
Defining a vision: from >14,000 publications (2005) to contextual, extracted summaries
25
Undiscovered Public Knowledge: David R. Swanson
"Science may be better served by a new image of its literature as a… vast mosaic of undiscovered connections… – a world with its own endless frontier."
"…a potential source of countless recombinant ideas"
Reference: Swanson DR. Medical literature as a potential source of new knowledge. Bulletin of the Medical Library Association 1990; 78(1): 29–37.
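Swanson's "undiscovered connections" are usually formalized as his ABC model: if one literature links A to some intermediate terms B, a separate literature links the same B terms to C, and A and C never appear together, the shared B terms suggest an A–C hypothesis worth testing. The sketch treats each document as a set of index terms; the function name is hypothetical, and the toy data echo Swanson's well-known fish oil / Raynaud's example without representing real literature counts.

```python
def abc_candidates(a_docs, c_docs, a_term, c_term):
    """Return B terms shared by the A and C literatures, or an empty set if A and C already co-occur."""
    def terms_with(docs, term):
        # All other terms that appear alongside `term` in any document.
        found = set()
        for doc in docs:
            if term in doc:
                found |= doc
        return found - {term}

    if any(a_term in d and c_term in d for d in a_docs + c_docs):
        return set()  # the connection is already public, not "undiscovered"
    return terms_with(a_docs, a_term) & terms_with(c_docs, c_term)


# Illustrative toy literatures (each document = a set of index terms).
a_docs = [{"fish oil", "blood viscosity"}, {"fish oil", "platelet aggregation"},
          {"fish oil", "triglycerides"}]
c_docs = [{"Raynaud's disease", "blood viscosity"}, {"Raynaud's disease", "platelet aggregation"},
          {"Raynaud's disease", "vasoconstriction"}]

print(sorted(abc_candidates(a_docs, c_docs, "fish oil", "Raynaud's disease")))
```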
26
The End
Acknowledgments: James Rosamond, James Damewood, Bob Stumpo, Jessica Pfennig and… many thanks to you for your kind attention!