Published by Dinah Davis. Modified over 9 years ago.
2
Automatically Identifying Candidate Treatments from Existing Medical Literature Catherine Blake (cblake@ics.uci.edu) Information & Computer Science University of California, Irvine Wanda Pratt (wpratt@u.washington.edu) Information School and Division of Biomedical & Health Informatics University of Washington
3
Motivation Information overload –MEDLINE = 11 million citations –additional 8,000 each week Specialization of research –low communication between scientific areas –little focus on ‘big picture’
4
Goal Provide scientists with promising new treatment strategies Assumptions –Medical literature has implicit links –Deductive logic can identify these links: if A → B and B → C, then A → C
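The deductive step on this slide can be sketched in a few lines of Python. This is only an illustration of chaining A → B and B → C into A → C; the function name `infer_links` and the example pairs are assumptions made for the sketch, not the authors' system.

```python
from collections import defaultdict


def infer_links(a_to_b, b_to_c):
    """Chain A -> B and B -> C pairs into A -> C links.

    Returns a dict mapping each inferred (a, c) pair to the sorted list
    of intermediate B-terms that support it.
    """
    # Index the B -> C pairs by their B-term for quick lookup.
    b_index = defaultdict(set)
    for b, c in b_to_c:
        b_index[b].add(c)

    inferred = defaultdict(set)
    for a, b in a_to_b:
        for c in b_index.get(b, ()):
            inferred[(a, c)].add(b)
    return {pair: sorted(bs) for pair, bs in inferred.items()}


# Illustrative pairs only (hypothetical input, not extracted from MEDLINE).
a_to_b = [
    ("magnesium", "calcium channel blockers"),
    ("magnesium", "platelet activity"),
]
b_to_c = [
    ("calcium channel blockers", "migraine"),
    ("platelet activity", "migraine"),
    ("serotonin", "migraine"),
]
links = infer_links(a_to_b, b_to_c)
```

Only B-terms that appear on both sides support an inferred link, so `serotonin` contributes nothing here because no A → serotonin pair was supplied.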
5
Previous Approach Swanson and Smalheiser (1997) Target Literature A Magnesium Source Literature C Migraine B-Calcium Channel Blockers B-Platelet Activity B-Serotonin...
6
Current Pruning
                  Words     Distinct Words
No Pruning        14,051    2,762
Stemmed           13,112    2,492
Manual Pruning              150 - 200
Remove ‘redundancies and non-useful terms’
~92-94% of B-terms are manually pruned!
7
Our Approach Semantic representation –Unify synonymous text expressions –e.g. Serotonin = {5-HT, 5HT, Enteramine, 5-Hydroxytryptamine, 3-(2-Aminoethyl)-1H-indol-5-ol} Prune using semantic types –e.g. Serotonin is a {Organic Chemical, Pharmacologic Substance, Neuroreactive Substance or Biogenic Amine}
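A minimal sketch of unifying synonymous expressions, assuming a hand-built synonym table. In the approach described here the synonyms would come from the UMLS; the names `SYNONYMS`, `build_lookup` and `unify` are hypothetical and exist only for this example.

```python
# Hypothetical synonym table; the serotonin entries are the ones listed on the slide.
SYNONYMS = {
    "serotonin": [
        "Serotonin",
        "5-HT",
        "5HT",
        "Enteramine",
        "5-Hydroxytryptamine",
        "3-(2-Aminoethyl)-1H-indol-5-ol",
    ],
}


def build_lookup(synonyms):
    """Map every lower-cased surface expression to its concept name."""
    lookup = {}
    for concept, expressions in synonyms.items():
        for expression in expressions:
            lookup[expression.lower()] = concept
    return lookup


LOOKUP = build_lookup(SYNONYMS)


def unify(expression):
    """Return the concept for a text expression, or None if it is unknown."""
    return LOOKUP.get(expression.strip().lower())
```

With this table, `unify("5-HT")` and `unify("enteramine")` both resolve to the same concept, so a title using either expression contributes to one feature instead of two.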
8
Unified Medical Language System (UMLS) (1) Metathesaurus 311 vocabularies 776,940 concepts ~11 million relationships 2.10 million strings (2) Semantic Network 134 semantic types 54 semantic relations (3) SPECIALIST lexicon POS + morphological 163,899 entries 133,945 nouns 13,179 verbs
9
Methodology Collect migraine citations Generate alternative features –word –concept –semantically pruned concepts Evaluate C → B connections
10
Word Representation Domain independent Common choice Title words (to compare with Swanson) Removed –417 generic stopwords* e.g. a, and, between, their, really, room, said, think, the,... –31 medical stopwords e.g. clinical, observed, provide, selection, study, therapy, test,... * Source: Sanderson, M. (1999) Available at http://www.dcs.gla.ac.uk/idom/ir_resources
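The word representation can be sketched as tokenizing a title and dropping stopwords. The two sets below hold only the example words shown on the slide, not the full 417 and 31 word lists; the tokenizer regex and the sample title are assumptions for illustration.

```python
import re

# Only the examples given on the slide; the real lists are much longer.
GENERIC_STOPWORDS = {"a", "and", "between", "their", "really", "room", "said", "think", "the"}
MEDICAL_STOPWORDS = {"clinical", "observed", "provide", "selection", "study", "therapy", "test"}
STOPWORDS = GENERIC_STOPWORDS | MEDICAL_STOPWORDS


def title_words(title):
    """Lower-case a title, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
    return [token for token in tokens if token not in STOPWORDS]


# Hypothetical title used only to exercise the function.
words = title_words("A clinical study: magnesium and the migraine")
```

Here `words` keeps only `magnesium` and `migraine`, because every other token is in one of the two stopword sets.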
11
Concept Representation Medical specific Titles mapped to UMLS concepts Mapped automatically (1) partition title sentences into phrases (2) for each phrase (2a) direct concept match (UMLS API) (2b) if not found, approximate match (UMLS API) and select the best concept
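The mapping steps above can be sketched as follows. This does not call the UMLS API: a small made-up dictionary stands in for the Metathesaurus, the concept labels are invented, the phrase splitter is deliberately naive, and `difflib` string similarity is swapped in for the UMLS approximate match.

```python
import difflib
import re

# Hypothetical stand-in for the Metathesaurus: phrase -> made-up concept label.
CONCEPTS = {
    "timolol maleate": "C_TIMOLOL_MALEATE",
    "beta blocker": "C_BETA_BLOCKER",
    "migraine headache": "C_MIGRAINE",
}


def split_phrases(title):
    """Step 1: naively partition a title on punctuation and a few function words."""
    parts = re.split(r"[,;:]|\b(?:in|of|the|a|an|and|for|with)\b", title.lower())
    return [part.strip() for part in parts if part.strip()]


def map_phrase(phrase, concepts=CONCEPTS, cutoff=0.6):
    """Step 2: try a direct match (2a), then fall back to the closest phrase (2b)."""
    if phrase in concepts:
        return concepts[phrase]
    close = difflib.get_close_matches(phrase, list(concepts), n=1, cutoff=cutoff)
    return concepts[close[0]] if close else None


def map_title(title):
    """Return (phrase, concept or None) for every phrase in the title."""
    return [(phrase, map_phrase(phrase)) for phrase in split_phrases(title)]


mapped = dict(map_title(
    "Timolol maleate, a beta blocker, in the treatment of common migraine headache"
))
```

In this toy run `timolol maleate` and `beta blocker` match directly, while `common migraine headache` is resolved through the approximate step. The 0.6 cutoff is an arbitrary choice for the sketch.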
12
Semantically Pruned Concept Used 37 of 134 semantic types in UMLS Substance Chemical Hormone Gene or Genome Enzyme Cell Amino Acid, Peptide or Protein Neuroreactive Substance or Biogenic Amine... Goal: generalize semantic types, not blinded to B-terms
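Semantic pruning can be sketched as a filter that keeps a concept only if at least one of its semantic types is in the allowed set. The allowed set below lists only the types named on the slide (not all 37), and the type assignments for the non-serotonin concepts are illustrative assumptions.

```python
# Subset of the allowed semantic types, as named on the slide.
ALLOWED_TYPES = {
    "Substance",
    "Chemical",
    "Hormone",
    "Gene or Genome",
    "Enzyme",
    "Cell",
    "Amino Acid, Peptide or Protein",
    "Neuroreactive Substance or Biogenic Amine",
}

# Concept -> semantic types. The serotonin entry follows the earlier slide;
# the other two assignments are assumptions made for this example.
CONCEPT_TYPES = {
    "Serotonin": {
        "Organic Chemical",
        "Pharmacologic Substance",
        "Neuroreactive Substance or Biogenic Amine",
    },
    "Migraine": {"Disease or Syndrome"},
    "Patient": {"Patient or Disabled Group"},
}


def prune(concepts, concept_types=CONCEPT_TYPES, allowed=ALLOWED_TYPES):
    """Keep concepts that have at least one allowed semantic type."""
    return [c for c in concepts if concept_types.get(c, set()) & allowed]


kept = prune(["Serotonin", "Migraine", "Patient"])
```

Concepts with no known semantic types are dropped as well, since an empty set never intersects the allowed types.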
13
Evaluation Number of Relevant Items Step 1: Find potentially relevant titles –any representation + synonyms –e.g. calcium channel blockers → any word in { calcium, channel, blockers, blocker } Step 2: Verify each title –Not all relevant B-terms indicated relevant links –e.g. Timolol maleate, a beta blocker, in the treatment of common migraine headache calcium channel blocker 461 366
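Step 1 can be sketched as an any-word match between a B-term's word set and each title. The function name and the second and third titles are assumptions for the sketch; the first title is the one quoted on the slide.

```python
import re


def candidate_titles(titles, term_words):
    """Return titles that contain at least one word from term_words."""
    wanted = {word.lower() for word in term_words}
    hits = []
    for title in titles:
        tokens = set(re.findall(r"[a-z0-9]+", title.lower()))
        if tokens & wanted:
            hits.append(title)
    return hits


titles = [
    "Timolol maleate, a beta blocker, in the treatment of common migraine headache",
    "Magnesium and migraine",                          # hypothetical title
    "Calcium entry blockers in migraine prophylaxis",  # hypothetical title
]
term_words = {"calcium", "channel", "blockers", "blocker"}
hits = candidate_titles(titles, term_words)
```

The timolol title is returned because it contains `blocker`, even though it concerns a beta blocker. That is the kind of false candidate the manual verification in Step 2 is meant to remove.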
14
Evaluation - Metrics
(1) Precision = (Number of relevant B-terms) / (Number of B-terms returned)
(2) Recall = (Number of relevant B-terms) / (Number of relevant titles)
(3) Number of C → B links identified
(4) Feature space dimensionality
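The two ratios can be written as small helper functions. The denominators follow the slide's definitions; the counts in the example are made up and are not results from the study.

```python
def precision(relevant_b_terms, b_terms_returned):
    """Relevant B-terms divided by all B-terms returned."""
    if b_terms_returned == 0:
        return 0.0
    return relevant_b_terms / b_terms_returned


def recall(relevant_b_terms, relevant_titles):
    """Relevant B-terms divided by the number of relevant titles."""
    if relevant_titles == 0:
        return 0.0
    return relevant_b_terms / relevant_titles


# Hypothetical counts, only to show the calls.
p = precision(8, 50)
r = recall(8, 40)
```

Returning 0.0 for an empty denominator is a convenience chosen for this sketch; an implementation could just as reasonably raise an error.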
15
Interpolated Precision
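The slide does not say which interpolation was used. A common information retrieval definition takes the interpolated precision at a recall level to be the highest precision observed at any recall at or above that level. The sketch below follows that definition as an assumption; the ranked relevance flags are invented.

```python
def pr_points(ranked_relevance, total_relevant):
    """Return (recall, precision) after each rank of a ranked result list."""
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        hits += bool(is_relevant)
        points.append((hits / total_relevant, hits / rank))
    return points


def interpolated_precision(points, level):
    """Highest precision among points whose recall is at least `level`."""
    candidates = [p for r, p in points if r >= level]
    return max(candidates, default=0.0)


# Hypothetical ranking: relevant items at ranks 1 and 3, two relevant in total.
points = pr_points([True, False, True, False, False], total_relevant=2)
curve = [interpolated_precision(points, level) for level in (0.0, 0.5, 1.0)]
```

Interpolating this way makes the curve non-increasing in recall, which is why it is often preferred for comparing ranked lists.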
16
Number of Links Identified
17
Dimensionality
18
Future Work Extend to B → A connections Use abstracts –dimensionality consequences Generalize –Raynaud’s disease and fish oil –other research questions
19
Conclusions Concept vs Words –improved precision and recall –more of the 11 connections in top 50 B-terms Semantic Pruning vs Concept –degraded recall –improved precision –more of the 11 connections in top 50 B-terms
20
http://www.ics.uci.edu/~cblake Catherine Blake (cblake@ics.uci.edu) Wanda Pratt (wpratt@u.washington.edu)
21
References
Davis, R. (1989). The Creation of New Knowledge by Information Retrieval and Classification. Journal of Documentation 45(4): 273-301.
Lindsay, R. K. and M. D. Gordon (1999). Literature-Based Discovery by Lexical Statistics. Journal of the American Society for Information Science 50(7): 574-587.
Sanderson, M. (1999). Stop word list. Available at: http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/
Swanson, D. R. (1988). Migraine and magnesium: eleven neglected connections. Perspect. Biol. Med. 31: 526-557.
Swanson, D. R. and N. R. Smalheiser (1997a). An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artificial Intelligence: 183-203.
Weeber, M., Klein, H., Mork, J. G., Jong-van den Berg, L., Vos, R. (2000). Text-Based Discovery in Biomedicine: The Architecture of the DAD-system. AMIA.