Published by Rudolph Sherman. Modified over 6 years ago.
1
Quality Assurance of NCI Thesaurus by Mining Structural-Lexical Patterns
Terminology Quality Evaluation S60
Rashmie Abeysinghe
Joint work with Michael A. Brooks, Jeffery Talbert, Licong Cui
University of Kentucky
2
Disclosure
Licong Cui is part of a startup called Synamtics Inc.
AMIA | amia.org
3
Outline
NCI Thesaurus
Terminology Quality Assurance
Non-lattice Subgraphs
Structural-Lexical Features
  Containment
  Union
  Intersection
  Union-Intersection
  Inference-Union
  Inference-Contradiction
Results
Evaluation
Conclusion and Future Directions
4
NCI Thesaurus (NCIt)
National Cancer Institute (NCI) Thesaurus
First published in 2000
Contains over 118,000 concepts
Hierarchically organized in 19 domains, for example:
  Abnormal Cell
  Anatomic Structure, System, or Substance
  Biological Process
  Disease, Disorder or Finding
  Molecular Abnormality
Maintained by a multidisciplinary team of editors
900 concepts added each month
Covers terminology for clinical care, translational and basic research, public information, and administrative activities
5
Terminology Quality Assurance (TQA)
Essential part of the terminology management lifecycle
Manual review is labor-intensive and time-consuming
Automating TQA is an active area of research
[Figure: hierarchy fragment with the callout "Missing Relation!"]
6
Non-lattice Subgraphs
The lattice property is desirable for a well-formed terminology*
Lattice: a DAG in which any two nodes have a unique maximal common descendant and a unique minimal common ancestor
[Figure: a non-lattice subgraph, with its upper bounds (U) and lower bounds (L) labeled]
*Zhang GQ, Bodenreider O. Large-scale, exhaustive lattice-based structural auditing of SNOMED CT. AMIA Annual Symposium Proc. 2010;
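The lattice condition above can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes the hierarchy is given as a hypothetical `{parent: [children]}` dictionary, treats each node as its own descendant, and flags a pair only when it has more than one maximal common descendant (pairs with no common descendant are ignored, which is an assumption of this sketch).

```python
from itertools import combinations


def descendants(children, node):
    """Return all strict descendants of `node` in a DAG given as {parent: [children]}."""
    seen = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


def maximal_common_descendants(children, a, b):
    """Common descendants of a and b that do not lie below another common descendant."""
    # Each node is counted as a descendant of itself for this check.
    common = (descendants(children, a) | {a}) & (descendants(children, b) | {b})
    return {
        c for c in common
        if not any(c in descendants(children, other) for other in common if other != c)
    }


def non_lattice_pairs(children):
    """List node pairs that share more than one maximal common descendant."""
    nodes = set(children) | {c for kids in children.values() for c in kids}
    return [
        (a, b) for a, b in combinations(sorted(nodes), 2)
        if len(maximal_common_descendants(children, a, b)) > 1
    ]


# Hypothetical toy hierarchy: A and B both have C and D as children.
hierarchy = {"A": ["C", "D"], "B": ["C", "D"]}
print(non_lattice_pairs(hierarchy))  # [('A', 'B')]

# Adding an intermediate node E below A and B and above C and D
# gives the pair a single maximal common descendant.
repaired = {"A": ["E"], "B": ["E"], "E": ["C", "D"]}
print(non_lattice_pairs(repaired))  # []
```

In the toy hierarchy, A and B play the role of the upper bounds and C and D the lower bounds of a non-lattice subgraph.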
7
Structural-Lexical Features
Considering the label of a concept as a set of words in lower case:
Containment*: $U_i \subset U_j$ or $L_i \subset L_j$
Union*: $U_i \cup U_j = L_k$
Intersection*: $L_i \cap L_j = U_k$
Union-Intersection*: $U_i \cup U_j = L_s \cap L_t$
Inference-Union: $U_s \cup (L_i \cap L_j) = L_k$
Inference-Contradiction
*Cui L, Zhu W, Tao S, Case JT, Bodenreider O, Zhang GQ. Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT. JAMIA Jul 1;24(4):
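The three simplest patterns (Containment, Union, Intersection) reduce to ordinary set comparisons once each label is turned into a lower-case word set. The sketch below is illustrative only: the function names, the whitespace tokenization, and the example labels are assumptions rather than anything taken from the talk.

```python
from itertools import combinations


def words(label):
    """Treat a concept label as a set of lower-case words (whitespace split is an assumption)."""
    return frozenset(label.lower().split())


def find_simple_patterns(upper_labels, lower_labels):
    """Return (pattern, indices) tuples for Containment, Union and Intersection."""
    U = [words(label) for label in upper_labels]
    L = [words(label) for label in lower_labels]
    found = []

    # Containment: one upper (or lower) bound's words form a proper subset of another's.
    for side, group in (("U", U), ("L", L)):
        for i, j in combinations(range(len(group)), 2):
            if group[i] < group[j]:
                found.append(("containment", (side, i, j)))
            elif group[j] < group[i]:
                found.append(("containment", (side, j, i)))

    # Union: the words of two upper bounds together equal a lower bound.
    for i, j in combinations(range(len(U)), 2):
        for k, lower in enumerate(L):
            if U[i] | U[j] == lower:
                found.append(("union", (i, j, k)))

    # Intersection: the words shared by two lower bounds equal an upper bound.
    for i, j in combinations(range(len(L)), 2):
        for k, upper in enumerate(U):
            if L[i] & L[j] == upper:
                found.append(("intersection", (i, j, k)))

    return found


# Hypothetical labels, chosen so that the upper bounds combine into the first lower bound.
uppers = ["Testicular Non-Seminomatous Germ Cell Tumor",
          "Malignant Testicular Germ Cell Tumor"]
lowers = ["Malignant Testicular Non-Seminomatous Germ Cell Tumor",
          "Testicular Embryonal Carcinoma"]
print(find_simple_patterns(uppers, lowers))  # [('union', (0, 1, 0))]
```

Using `frozenset` keeps the comparison order-insensitive, so labels that differ only in word order match each other.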
8
Containment
$U_i \subset U_j$ or $L_i \subset L_j$
[Figure: non-lattice subgraph with lower bounds $L_i$ and $L_j$, where $L_j \subset L_i$]
9
Containment
$U_i \subset U_j$ or $L_i \subset L_j$
[Figure: suggested fix]
10
Union
$U_i \cup U_j = L_k$
[Figure: non-lattice subgraph with upper bounds $U_i$, $U_j$ and lower bound $L_k$]
$U_i \cup U_j = L_k$ = {malignant, testicular, non-seminomatous, germ, cell, tumor}
11
Union
$U_i \cup U_j = L_k$
[Figure: suggested fix]
12
Intersection
$L_i \cap L_j = U_k$
[Figure: non-lattice subgraph with upper bound $U_k$ and lower bounds $L_i$, $L_j$]
$L_i \cap L_j = U_k$ = {splenic, lymphoblastic, lymphoma}
13
Intersection
$L_i \cap L_j = U_k$
[Figure: suggested fix]
14
Union-Intersection
$U_i \cup U_j = L_s \cap L_t$
[Figure: non-lattice subgraph with upper bounds $U_i$, $U_j$ and lower bounds $L_s$, $L_t$]
$U_i \cup U_j$ = {localized, adult, liver, carcinoma}
$L_s \cap L_t$ = {localized, adult, liver, carcinoma}
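The Union-Intersection condition compares the combined words of two upper bounds with the shared words of two lower bounds. Here is a minimal sketch under stated assumptions: the helper names are hypothetical, and the four labels are illustrative strings constructed to reproduce the slide's word set, not labels confirmed to appear in NCIt.

```python
from itertools import combinations


def words(label):
    """Treat a concept label as a set of lower-case words (whitespace split is an assumption)."""
    return frozenset(label.lower().split())


def union_intersection(upper_labels, lower_labels):
    """Find upper pairs (i, j) and lower pairs (s, t) with U_i | U_j == L_s & L_t."""
    U = [words(label) for label in upper_labels]
    L = [words(label) for label in lower_labels]
    hits = []
    for i, j in combinations(range(len(U)), 2):
        for s, t in combinations(range(len(L)), 2):
            if U[i] | U[j] == L[s] & L[t]:
                hits.append(((i, j), (s, t), U[i] | U[j]))
    return hits


# Illustrative labels only.
uppers = ["Adult Liver Carcinoma", "Localized Liver Carcinoma"]
lowers = ["Localized Resectable Adult Liver Carcinoma",
          "Localized Non-Resectable Adult Liver Carcinoma"]

for upper_pair, lower_pair, shared in union_intersection(uppers, lowers):
    print(upper_pair, lower_pair, sorted(shared))
# (0, 1) (0, 1) ['adult', 'carcinoma', 'liver', 'localized']
```

The matched word set is a natural candidate label for a concept sitting between the two upper bounds and the two lower bounds.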
15
Union-Intersection
$U_i \cup U_j = L_s \cap L_t$
[Figure: suggested fix]
16
Inference-Union
$U_s \cup (L_i \cap L_j) = L_k$
[Figure: non-lattice subgraph with upper bound $U_s$ and lower bounds $L_i$, $L_j$]
$L_i \cap L_j$ = {gallbladder, papillary}
$U_s \cup (L_i \cap L_j)$ = {gallbladder, papillary, neoplasm} = $L_i$ (in this example, $L_k$ is $L_i$)
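The Inference-Union condition can be sketched the same way: take the words shared by two lower bounds, add the words of one upper bound, and test whether the result equals some lower bound (which may be one of the two being intersected, as in the slide's example). The labels below are hypothetical and were picked only to reproduce the slide's word sets; skipping empty intersections is also an assumption of this sketch.

```python
from itertools import combinations


def words(label):
    """Treat a concept label as a set of lower-case words (whitespace split is an assumption)."""
    return frozenset(label.lower().split())


def inference_union(upper_labels, lower_labels):
    """Find (s, (i, j), k) such that U_s | (L_i & L_j) == L_k."""
    U = [words(label) for label in upper_labels]
    L = [words(label) for label in lower_labels]
    hits = []
    for i, j in combinations(range(len(L)), 2):
        shared = L[i] & L[j]
        if not shared:
            continue  # assumption: an empty intersection carries no lexical evidence
        for s, upper in enumerate(U):
            for k, lower in enumerate(L):
                if upper | shared == lower:
                    hits.append((s, (i, j), k))
    return hits


# Hypothetical labels.
uppers = ["Gallbladder Neoplasm", "Papillary Neoplasm"]
lowers = ["Gallbladder Papillary Neoplasm", "Gallbladder Papillary Carcinoma"]
print(inference_union(uppers, lowers))
# [(0, (0, 1), 0), (1, (0, 1), 0)]
```

Both upper bounds match here because each contributes the single missing word "neoplasm" to the shared words {gallbladder, papillary}.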
17
Inference-Union
$U_s \cup (L_i \cap L_j) = L_k$
[Figure: suggested fix]
18
Inference-Contradiction
[Figure: non-lattice subgraph, annotated "anaplastic : neoplastic large"]
19
Inference-Contradiction
[Figure: suggested fix]
20
Five Patterns!
Union, Union-Intersection, Inference-Union, Inference-Contradiction, Containment
21
Results
In total, 8,143 non-lattice subgraphs were identified
809 of those exhibited lexical patterns:
  678 with a single pattern
  131 with multiple patterns
22
Evaluation
23
Evaluation
Single-pattern non-lattice subgraphs: 44%
Multiple-pattern non-lattice subgraphs: 88%
Overall: 66%
24
Conclusion
We investigated a hybrid approach to identifying potential errors in NCIt
Remediations were suggested automatically
The approach is an effective way to detect and correct errors
It is applicable to other biomedical terminologies
25
Future Work
Investigate larger non-lattice subgraphs for evaluation
Use concept synonyms to complement concept labels
Find new patterns to uncover more errors
26
Acknowledgement
This work was supported by:
National Institutes of Health National Center for Advancing Translational Sciences through grant UL1TR001998
National Science Foundation through grant IIS
I would like to thank Dr. Licong Cui for the guidance
27
Thank you!
Email me at: rashmie.abeysinghe@uky.edu