Published by Micah Poulton
©2013 MFMER | slide-1 An Incremental Approach to MEDLINE MeSH Indexing. Presenter: Hongfang Liu. BioASQ 2013. Team members: Mayo Clinic: Wu Stephen, James Masanz, and Hongfang Liu; University of Delaware: Dongqing Zhu, Ben Carterette
©2013 MFMER | slide-2 Outline Motivation & Task Incremental Systems MetaMap-based Search-based LLDA-based Experiment Setup Evaluation Conclusion
©2013 MFMER | slide-3 Motivation of BioASQ Task Reduce human effort in MeSH indexing Increasing number of new articles Low consistency among annotators [Funk and Reid] Automatic MeSH indexing Suggest MeSH terms for a given new article
©2013 MFMER | slide-4 Motivation of Mayo’s Participation Information retrieval (IR)-based ontology annotation Traditional approach has been information extraction-based Three levels of intelligence in artificial intelligence Knowledge-base intelligence Data intelligence User intelligence > Explore the use of topic modeling and distant supervision for ontology annotation
©2013 MFMER | slide-5 Proposed Approaches MetaMap-based Search-based LLDA-based Three approaches can work either independently or together in an incremental way DUI
©2013 MFMER | slide-6 MetaMap-based System Title: Age-period-cohort effect on mortality from cervical cancer. Abstract: to estimate the effect of age, period and birth cohort … MetaMap, restricted to the MeSH ontology, returns CUI candidates with scores: C0007847 (1000), C0302592 (1000), C0998265 (861), … A ranked list of CUI => a ranked list of DUI
©2013 MFMER | slide-7 MetaMap-based System Parameter Tuning Title concepts are more important Low threshold roughly leads to high precision/recall Tradeoff between P/R
©2013 MFMER | slide-8 Search-based System Retrieval Model, DUI Aggregation. Retrieved docs: D01, D02, D03 …; D08, D03, D01 …; D02, D03, D01 … DUI ranked by tf * score(Q, D)
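One possible reading of "DUI ranked by tf * score(Q, D)" is that each DUI accumulates the retrieval scores of the top-ranked training documents that carry it. The Python sketch below illustrates that reading only. The function name, the `top_k` cutoff, and the sample DUIs and scores are hypothetical, and it assumes non-negative scores (raw log-probabilities would need to be rescaled first).

```python
from collections import defaultdict

def aggregate_duis(ranked_docs, doc_to_duis, top_k=20):
    """Rank DUIs by summing the scores of the top_k documents that carry them.

    ranked_docs: list of (doc_id, score) pairs, best first (scores assumed >= 0).
    doc_to_duis: dict mapping doc_id -> iterable of DUIs assigned to that document.
    """
    totals = defaultdict(float)
    for doc_id, score in ranked_docs[:top_k]:
        for dui in set(doc_to_duis.get(doc_id, ())):
            totals[dui] += score
    # Highest aggregated score first; ties broken by DUI for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical retrieval result and DUI assignments.
ranked = [("D01", 0.9), ("D02", 0.5), ("D03", 0.4)]
labels = {"D01": ["DUI_A", "DUI_B"], "D02": ["DUI_B"], "D03": ["DUI_C"]}
ranking = aggregate_duis(ranked, labels)
```

In this toy input, DUI_B is carried by two retrieved documents and therefore outranks DUI_A, which appears only in the single best document.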
©2013 MFMER | slide-9 Search-based System
#weight(2.0 examination 2.0 cow 2.0 ultrasonographic 3.0 navel 3.0 urachal 3.0 extra-abdominal 2.0 pathologic 2.0 abscess)
#weight(3.5 #uw2(hiv-1 infection) 4.5 #uw2(differential susceptibility) 2.0 #uw2(actin dynamics) 2.0 actin 4.5 #uw2(cortical actin) 4.5 #uw3(naive t cells) 2.5 dichotomy 3.5 #uw2(human memory) 3.5 #uw3(chemotactic actin activity) 2.0 cd45ro)
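Queries of the shape shown above can be assembled mechanically from weighted terms. The sketch below is only an illustration: the helper name is hypothetical, and the rule that a multi-word term becomes `#uwN(...)` with N equal to its word count is inferred from the examples, not stated in the slides. How the weights themselves were chosen is not shown in the source.

```python
def build_weight_query(weighted_terms):
    """Assemble an Indri #weight query string from (weight, term) pairs.

    Multi-word terms are wrapped in an unordered-window operator #uwN,
    with N set to the number of words (an assumption based on the examples).
    """
    parts = []
    for weight, term in weighted_terms:
        words = term.split()
        if len(words) > 1:
            term = "#uw{}({})".format(len(words), " ".join(words))
        parts.append("{} {}".format(weight, term))
    return "#weight({})".format(" ".join(parts))

term_query = build_weight_query([(2.0, "examination"), (2.0, "cow"), (3.0, "navel")])
phrase_query = build_weight_query([(3.5, "hiv-1 infection"), (2.0, "actin")])
```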
©2013 MFMER | slide-10 Search-based System Parameter Tuning Less smoothing => better performance A small set of highly relevant documents Tradeoff between P/R
©2013 MFMER | slide-11 Systems LLDA-based LDA Process Each document is a mixture of topics Each topic is a multinomial word distribution Labeled LDA Incorporate label information
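The generative view above can be made concrete with a toy sampler. This is a minimal sketch and not the system's implementation: it assumes that Labeled LDA restricts each document's topic mixture to that document's labels, and the function names, the symmetric prior `alpha`, and the tiny vocabulary are all hypothetical.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet(alpha) sample via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(labels, topic_words, n_words, alpha=0.5, seed=0):
    """Generate words for one document whose topics are restricted to its labels.

    labels: list of the document's labels (each label acts as one topic).
    topic_words: dict label -> dict word -> probability (the topic's multinomial).
    """
    rng = random.Random(seed)
    theta = sample_dirichlet(alpha, len(labels), rng)  # document-topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choices(labels, weights=theta, k=1)[0]  # topic for this word
        vocab, probs = zip(*topic_words[z].items())
        words.append(rng.choices(vocab, weights=probs, k=1)[0])  # word from that topic
    return words

# Hypothetical topics keyed by label.
topics = {
    "Anatomy": {"navel": 0.5, "urachus": 0.5},
    "Diseases": {"abscess": 1.0},
}
doc = generate_document(["Anatomy", "Diseases"], topics, n_words=10)
```

Plain LDA would draw the mixture over all topics; the only change here is that the mixture ranges over the document's own labels.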
©2013 MFMER | slide-12 Systems LLDA-based Top categories in MeSH: top-level categories serve as topics (e.g., Anatomy Category, Chemicals and Drugs Category, etc.). Each label below the top level is converted to its corresponding top-level labels
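The conversion to top-level labels can be sketched using MeSH tree numbers, whose leading letter identifies the top-level category. The snippet below is illustrative only: the mapping is deliberately partial, the function name is hypothetical, the tree numbers in the example are placeholders, and the slides do not say how the conversion was actually implemented.

```python
# Partial map from the leading letter of a MeSH tree number to its top-level category.
TOP_LEVEL = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
}

def top_level_categories(tree_numbers):
    """Return the sorted top-level categories covering a descriptor's tree numbers.

    Tree numbers whose leading letter is not in the partial map are skipped.
    """
    return sorted({TOP_LEVEL[t[0]] for t in tree_numbers if t and t[0] in TOP_LEVEL})

# Placeholder tree numbers for a descriptor that sits in two branches.
categories = top_level_categories(["C04.588", "A05.360"])
```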
©2013 MFMER | slide-13 Systems LLDA-based DUI candidate list pruning: for each doc, the search-based system proposes DUI candidates and the LLDA-based system proposes categories; combining them yields a pruned ranked list
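The pruning step can be pictured as a category filter over the search-based ranking. The following sketch assumes that a DUI is kept when it shares at least one top-level category with the LLDA prediction; that rule, the function name, and the sample data are assumptions made for illustration.

```python
def prune_by_categories(ranked_duis, dui_categories, predicted_categories):
    """Keep ranked DUIs that share a top-level category with the LLDA prediction.

    ranked_duis: list of (dui, score) pairs from the search-based system, best first.
    dui_categories: dict dui -> iterable of top-level categories for that DUI.
    predicted_categories: categories suggested by the LLDA model for the document.
    """
    allowed = set(predicted_categories)
    return [
        (dui, score)
        for dui, score in ranked_duis
        if allowed & set(dui_categories.get(dui, ()))
    ]

# Hypothetical candidates and category assignments.
candidates = [("DUI_A", 1.4), ("DUI_B", 0.9), ("DUI_C", 0.4)]
cats = {"DUI_A": ["Diseases"], "DUI_B": ["Organisms"], "DUI_C": ["Anatomy", "Diseases"]}
pruned = prune_by_categories(candidates, cats, ["Diseases"])
```

The filter preserves the original rank order, so the search-based scores still decide the final ordering of the surviving DUIs.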
©2013 MFMER | slide-14 Data Training -- Testing -- input: output:
©2013 MFMER | slide-15 Evaluation MM: MetaMap-based system Mi: micro LCA: lowest common ancestor
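For readers unfamiliar with micro-averaged measures, the sketch below pools true positives, false positives, and false negatives over all documents before computing precision, recall, and F1. It is a generic illustration with hypothetical names and data, and it does not reproduce the official evaluation script or the LCA-based hierarchical measures.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall, and F1 over a collection of documents.

    gold, predicted: dicts mapping doc_id -> set of labels.
    Documents missing from `predicted` count as having no predicted labels;
    documents that appear only in `predicted` are ignored.
    """
    tp = fp = fn = 0
    for doc_id, gold_labels in gold.items():
        pred_labels = predicted.get(doc_id, set())
        tp += len(gold_labels & pred_labels)
        fp += len(pred_labels - gold_labels)
        fn += len(gold_labels - pred_labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold and predicted label sets.
gold = {"d1": {"DUI_A", "DUI_B"}, "d2": {"DUI_C"}}
pred = {"d1": {"DUI_A"}, "d2": {"DUI_C", "DUI_D"}}
p, r, f = micro_prf(gold, pred)
```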
©2013 MFMER | slide-16 Conclusion and Future Work Three Systems MetaMap-based, search-based, LLDA-based Research findings Explored impact of various parameters on performance Promising results from search-based labeling Future Direction Better concept weighting strategies E.g., corpus-level statistics, external resources Comprehensive comparisons with existing methods A better strategy for incorporating hierarchical information into LLDA
©2013 MFMER | slide-17 Questions & Discussion
©2013 MFMER | slide-18 Baseline: MetaMap-based Labeling
CONCEPT DETECTION
1. Concepts (K): phrases or terms mapping to UMLS CUI
2. List (L) of CUI (c) with confidence scores (S_c)
3. Negation information for each K
CONCEPT WEIGHTING
1. Select non-negated CUI (c) with score higher than threshold h
2. Merge & rank c with weighted scores as follows: α -> weights assigned to T(itle), β -> weights assigned to A(bstract)
3. β fixed to 1.0 while optimizing α
Convert high-ranked list of c to MeSH Descriptor Unique Identifiers (DUI)
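The weighting steps can be sketched as follows. The exact scoring formula is not preserved in this transcript, so the sketch assumes a simple linear combination in which a title concept contributes α times its score and an abstract concept contributes β times its score. The function name, the default α, the default threshold, and the sample scores are illustrative assumptions.

```python
from collections import defaultdict

def rank_cuis(title_concepts, abstract_concepts, alpha=2.0, beta=1.0, threshold=600):
    """Merge title and abstract concepts into one ranked CUI list.

    Each concept is a (cui, score, negated) triple. Negated concepts and those
    whose score is not above the threshold are dropped; the rest contribute
    alpha * score (title) or beta * score (abstract) to the CUI's total.
    """
    totals = defaultdict(float)
    for weight, concepts in ((alpha, title_concepts), (beta, abstract_concepts)):
        for cui, score, negated in concepts:
            if not negated and score > threshold:
                totals[cui] += weight * score
    # Highest combined score first; ties broken by CUI for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

# Illustrative concepts: the scores and negation flags are made up for the example.
title = [("C0007847", 1000, False)]
abstract = [("C0007847", 861, False), ("C0302592", 1000, True), ("C0998265", 500, False)]
ranked_cuis = rank_cuis(title, abstract)
```

The final conversion from CUI to DUI would need a UMLS-to-MeSH lookup, which is outside this sketch.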
©2013 MFMER | slide-19 Incremental Labeling: Search-based Labeling 1
Index training set with Indri: filter out words with a medical stoplist; extract stems with Porter stemmer; indexing fields including titles and abstracts
Retrieve MeSH for testing set: Retrieval Model, Query Formulation, Result Aggregation
Retrieval Model:
w_i -> weight for the ith matched query term q_i
f(q_i, D) -> the query term matching function defined as:
|D| and |C|: length of documents and collections
tf_{q_i, D} & tf_{q_i, C}: document & collection term frequencies of q_i
μ: the Dirichlet smoothing parameter
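The symbols listed above match the standard Dirichlet-smoothed query-likelihood model, in which f(q_i, D) = log((tf_{q_i,D} + μ · tf_{q_i,C} / |C|) / (|D| + μ)) and the document score is the weighted sum of f over the query terms. The slide's own equation is not preserved here, so the sketch below assumes that standard form. The function names and the default μ are assumptions, and Indri's own weight normalization is not modeled.

```python
import math

def dirichlet_term_score(tf_d, doc_len, tf_c, coll_len, mu=2500.0):
    """Log of the Dirichlet-smoothed probability of one query term in a document."""
    return math.log((tf_d + mu * tf_c / coll_len) / (doc_len + mu))

def weighted_query_score(weighted_terms, doc_tf, doc_len, coll_tf, coll_len, mu=2500.0):
    """Sum of w_i * f(q_i, D) over query terms that occur in the collection.

    weighted_terms: list of (w_i, q_i) pairs.
    doc_tf / coll_tf: term-frequency dicts for the document and the collection.
    """
    return sum(
        w * dirichlet_term_score(doc_tf.get(q, 0), doc_len, coll_tf[q], coll_len, mu)
        for w, q in weighted_terms
        if coll_tf.get(q, 0) > 0
    )

# Hypothetical counts: "cow" occurs 5 times in a 100-word document
# and 10 times in a 1000-word collection.
light = dirichlet_term_score(5, 100, 10, 1000, mu=0.0)
heavy = dirichlet_term_score(5, 100, 10, 1000, mu=1000.0)
score = weighted_query_score([(2.0, "cow")], {"cow": 5}, 100, {"cow": 10}, 1000, mu=0.0)
```

With larger μ the document estimate is pulled toward the collection frequency, so documents that match a term strongly are rewarded less.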
©2013 MFMER | slide-20 Search-based Labeling 2
Index training set with Indri
Retrieve MeSH for testing set: Retrieval Model, Query Formulation, Result Aggregation
Query Formulation:
K_T: terms in title extracted by MetaMap
K_A: terms in abstract likewise
Query types: Long Query (LQ), Phrase Query (PQ), Term Query (TQ)
TQ Example: PQ Example:
Longer query than phrase, order & proximity considered
PQ: consider collocations
©2013 MFMER | slide-21 Parameter Explorations 2 Parameter setting for MetaMap-based Labeling
a) Figure a: the higher the weight for the title, the better the results
b) Figure b: the best CI threshold is 600
c) Figure c: recall increases with the number of DUI returned, while precision decreases
©2013 MFMER | slide-22 Parameter Explorations 3 Parameter setting for Search-based Labeling
a) Figure d: more smoothing hurts performance
b) Figure e: the best results are obtained when the number of top documents is 20
c) Figure f: as in figure c, recall increases with the number of DUI returned, while precision decreases
©2013 MFMER | slide-23 Incremental filtering with Labeled Latent Dirichlet Allocation (LLDA)
[Plate diagram with nodes α, θ_d, z, w, γ, ψ, L_mesh and plates N, D]
Generative Story:
1) A generative topic model
2) Both α and ψ play the role of prior for topic generation
3) θ_d generates document topics tuned by both α and MeSH labels L
4) Word topic distribution γ and doc topic z_d generate word w_i
Training and Testing
Training: parameter estimation with Gibbs sampling for θ and γ using 10% of the provided PubMed corpus
Testing: the trained model suggests multiple MeSH terms for testing data
Filtering: use the suggested MeSH term sets to filter the results obtained from search-based labeling