1 KMeD: A Knowledge-Based Multimedia Medical Database System Wesley W. Chu Computer Science Department University of California, Los Angeles
2 KMeD: A Knowledge-Based Multimedia Medical Distributed Database System. A Cooperative, Spatial, Evolutionary Medical Database System. Knowledge-Based Image Retrieval with Spatial and Temporal Constructs. Wesley W. Chu, Computer Science Department; Alfonso F. Cardenas, Computer Science Department; Ricky K. Taira, Department of Radiological Sciences. October 1, 1991 to September 30, 1993; July 1, 1993 to June 30, 1997; May 1, 1997 to April 30, 2001
3 Research Team. Students: John David N. Dionisio, Chih-Cheng Hsu, David Johnson, Christine Chih. Collaborators: Computer Science Department: Alfonso F. Cardenas. UCLA Medical School: Denise Aberle, MD; Robert Lufkin, MD; Ricky K. Taira, MD
4 An NIH Grant at UCLA ( ). A Medical Digital Library: A Digital File Room for Patient Care, Education, and Research. Wesley W. Chu, PhD; Hooshang Kangarloo, MD; Usha Sinha, PhD; David B. Johnson, PhD; Bernard Churchill, MD
5 Significance Query multimedia data based on image content and spatial predicates Use domain knowledge to relax and interpret medical queries Present integrated view of multiple temporal and evolutionary data in a timeline metaphor Retrieve Scenario Specific Free-text documents in a Medical Digital Library
6 Overview Image retrieval by feature and content Query relaxation Spatial query answering Similarity query answering Visual query interface Timeline interface Retrieval of scenario specific free text medical documents
7 Image Retrieval by Content. Features: size, shape, texture, density, histology. Spatial relations: angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction. Evolution of object growth: fusion, fission
12 Characteristics of Medical Queries Multimedia Temporal Evolutionary Spatial Imprecise
13 Representation of Temporal and Evolution Objects. Evolution: object O evolves into a new object O’. Fusion: objects O1, …, Om fuse into a new object O. Fission: object O splits into objects O1, …, On
14 Representation of Temporal and Evolution Objects (cont.) Case a: The object exists with its supertype or aggregated type. Case b: The life span of the object starts after and ends with its supertype or aggregated type. Case c: The life span of the object starts with and ends before its supertype or aggregated type. Case d: The life span of the object starts after and ends before its supertype or aggregated type.
15 An Example of a Temporal and Evolution Object [figure labels: Lesion, Micro-Lesion, Micro-Lesion]
18 Spatial Distance and Angle of Coverage of Two Objects
20 Query Modification Techniques Relaxation Generalization Specialization Association
21 Generalization and Specialization [figure labels: More Conceptual Query, Conceptual Query, Specific Query; arrows labeled Generalization and Specialization]
22 Type Abstraction Hierarchy. Presents an abstract view of: types, attribute values, image features, temporal and evolutionary behavior, spatial relationships among objects. Provides multi-level knowledge representation
23 TAH Generation for Numerical Attribute Values. Relaxation error: the difference between the exact value and the returned approximate value. The expected error is weighted by the probability of occurrence of each value. DISC (Distribution Sensitive Clustering) is based on the attribute values and frequency distribution of the data
24 TAH Generation for Numerical Attribute Values (cont.) Computation complexity: O(n^2), where n is the number of distinct values in a cluster. DISC performs better than the Biggest Cap (value only) or Max Entropy (frequency only) methods. MDISC is developed for multiple-attribute TAHs. Computation complexity: O(mn^2), where m is the number of attributes
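The relaxation-error idea can be sketched in a few lines. This is a minimal illustration, not the DISC implementation: it assumes the error of a cluster is the average absolute difference between a requested value and a returned value, with both weighted by their frequency in the cluster, and it tries every binary cut of the sorted values. The names `relaxation_error` and `best_binary_cut` are hypothetical.

```python
from collections import Counter


def relaxation_error(values):
    """Expected |x_i - x_j| when x_i is requested and x_j is returned.

    Both values are weighted by their relative frequency in the cluster
    (an assumed form of the probability-weighted error on the slide).
    """
    counts = Counter(values)
    total = sum(counts.values())
    err = 0.0
    for xi, ci in counts.items():
        for xj, cj in counts.items():
            err += (ci / total) * (cj / total) * abs(xi - xj)
    return err


def best_binary_cut(values):
    """Try every cut between adjacent distinct values.

    Returns (cut_value, weighted_error); values < cut_value go left.
    Returns None when there are fewer than two distinct values.
    """
    distinct = sorted(set(values))
    total = len(values)
    best = None
    for cut in distinct[1:]:
        left = [v for v in values if v < cut]
        right = [v for v in values if v >= cut]
        err = (len(left) / total) * relaxation_error(left) \
            + (len(right) / total) * relaxation_error(right)
        if best is None or err < best[1]:
            best = (cut, err)
    return best


# Illustrative (made-up) tumor sizes: two natural groups.
sizes = [1, 1, 2, 10, 11]
cut = best_binary_cut(sizes)
```

Applying the cut recursively to each side would yield a hierarchy whose leaves are the original values; because frequencies enter the error, a skewed distribution shifts the cut in a way that a value-only or frequency-only split would not.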
25 Query Relaxation [flowchart labels: Query, Database, Answers (Yes / No), Display, Relax Attribute, TAHs, Query Modification]
26 A Cooperative Query Answering Example. Query: Find the treatment used for the tumor similar-to (loc, size) X1 on 12-year-old Korean males. Relaxed query: Find the treatment used for the tumor Class X on preteen Asians. Association: The success rate, side effects, and cost of the treatment.
27 Type Abstraction Hierarchies for Medical Domain
Age: Preteens, Teen, Adult
Ethnic Group: Asian (Korean, Chinese, Japanese, Filipino), African, European
Tumor (location, size): Class X [loc1 loc3] [s1 s3] (X1 [loc1 s1], X2 [loc2 s2], X3 [loc3 s3]), Class Y [locY sY]
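A minimal sketch of how a TAH could drive relaxation, assuming the hierarchy is stored as a child-to-parent map. The dictionaries below copy only the node names shown on the slide; the functions `generalize`, `specialize`, `relax`, and `answer`, and the record layout, are hypothetical.

```python
# TAH fragments from the slide, stored as child -> parent.
ETHNIC_TAH = {
    "Korean": "Asian",
    "Chinese": "Asian",
    "Japanese": "Asian",
    "Filipino": "Asian",
}
TUMOR_TAH = {"X1": "Class X", "X2": "Class X", "X3": "Class X"}


def generalize(value, tah):
    """Move one level up the hierarchy (no-op if the value has no parent)."""
    return tah.get(value, value)


def specialize(node, tah):
    """List every value directly below a node."""
    return sorted(child for child, parent in tah.items() if parent == node)


def relax(value, tah):
    """Generalize one level, then specialize to all values under that node."""
    return specialize(generalize(value, tah), tah) or [value]


def answer(ethnic, records):
    """Return (matches, relaxed): try the exact value first, relax if empty."""
    exact = [r for r in records if r["ethnic"] == ethnic]
    if exact:
        return exact, False
    candidates = relax(ethnic, ETHNIC_TAH)
    return [r for r in records if r["ethnic"] in candidates], True


# Made-up record for illustration only.
records = [{"ethnic": "Chinese", "treatment": "T1"}]
matches, relaxed = answer("Korean", records)
```

Here the query for "Korean" finds no exact match, so it is widened to every value under "Asian"; the `relaxed` flag lets an interface tell the user that the answer is approximate.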
28 Knowledge-Based Image Model. Representation level (features and contents). Schema level: Brain, Tumor, Lateral Ventricle, with spatial relations SR(t,b) and SR(t,l). Knowledge level: TAH SR(t,b), TAH Tumor Size, TAH SR(t,l), TAH Lateral Ventricle. Legend: SR: Spatial Relation; b: Brain; t: Tumor; l: Lateral Ventricle
29 Knowledge-based Query Processing: Queries -> Query Analysis and Feature Selection -> Knowledge-Based Content Matching via TAHs -> Query Relaxation -> Query Answers
30 User Model: to customize query conditions and knowledge-based query processing. User type. Default parameter values. Feature and content matching policies: complete match, partial match
31 User Model (cont.) Relaxation Control Policies Relaxation Order Unrelaxable Object Preference List Measure for Ranking
34 Query Preprocessing Segment and label contours for objects of interest Determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects Organize the features and spatial relationships of objects into a feature database Classify the feature database into a Type Abstraction Hierarchy (TAH)
35 Similarity Query Answering Determine relevant features based on query input Select TAH based on these features Traverse through the TAH nodes to match all the images with similar features in the database Present the images and rank their similarity (e.g., by mean square error)
36 Spatial Query Answering Preprocessing Draw and label contours for objects of interest Determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects Organize the features and spatial relationships of objects into a feature database Classify the feature database into a type abstraction hierarchy (TAH)
37 Spatial Query Answering (cont.) Processing: Select TAH based on the query conditions and context. Search nodes to match the query conditions. Return images linked to the TAH node
38 Similarity Query Answering Preprocessing Select objects and specify features of interest in the image Create a feature database of the selected objects for all images Classify the feature databases as type abstraction hierarchies
39 Similarity Query Answering (cont.) Processing Determine relevant features based on query input Select TAH based on these features (interact with user to resolve ambiguity) Traverse through the TAH nodes to match all the images with similar features in the databases Present the images and rank their similarity (e.g., by mean square error)
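The final ranking step can be sketched as follows. This is an illustration under stated assumptions: each image is reduced to a fixed-length numeric feature vector, the features are already on comparable scales, and the candidates are the images reached through the matched TAH node. The names `mean_square_error` and `rank_by_similarity` are hypothetical.

```python
def mean_square_error(a, b):
    """Mean of squared differences between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rank_by_similarity(query, candidates):
    """Rank candidate images by MSE to the query, most similar first.

    candidates: dict mapping an image id to its feature vector.
    Returns a list of (image_id, mse) pairs.
    """
    scored = [(img, mean_square_error(query, feats))
              for img, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Made-up feature vectors, e.g. (size, distance to a landmark).
query = [1.0, 2.0]
candidates = {"a": [1.0, 2.0], "b": [3.0, 2.0], "c": [1.0, 3.0]}
ranking = rank_by_similarity(query, candidates)
```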
41 Visual Query Language and Interface Point-click-drag interface Objects may be represented iconically Spatial relationships among objects are represented graphically
42 Visual Query Example Retrieve brain tumor cases where a tumor is located in the region as indicated in the picture
49 A Visual Query Example
50 A Visual Temporal Query Example
53 Implementation Sun Sparc 20 workstations (128 MB RAM, 24-bit frame buffer) Oracle Database Management System X/Motif Development Environment, C++ Mass Storage of Images (9 GB)
58 Summary I Image retrieval by feature and content Matching and relaxation images based on features Processing of queries based on spatial relationships among objects Answering of imprecise queries Expression of queries via visual query language Integrated view of temporal multimedia data in a timeline metaphor
59 A Knowledge-based Approach to Retrieve Scenario Specific Free-text in a Medical Digital Library
60 NIH Program Project Grant ( ). A 5-year, $10M joint interdisciplinary project between Medical School and CS faculty. Project 1: teleradiology infrastructure. Project 2: neuroradiology workstation. Project 3: multimedia information architecture. Project 4: natural language processing for medical reports. Project 5: medical digital library
61 Project 5 Personnel. Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou. Consultants: Hooshang Kangarloo, M.D.; Denise Aberle, M.D. Project leader: Wesley W. Chu
62 Data in a Medical Digital Library. Structured data (patient lab data, demographic data, …): CoBase. Images (X rays, MRI, CT scans): KMeD. Free text: patient reports, teaching files, literature, news articles
63 System Overview Patient reports Medical literature Medical Digital Library (MDL) Teaching materials Query results Ad-hoc query Patient report for content correlation News Articles
64 Scenario Specific Retrieval. Patient report excerpt: … Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II. … How to treat the disease? Treatment-related articles. How to diagnose the disease? Diagnosis-related articles.
65 Challenge I: Indexing Extracting domain-specific key concepts in the free text for indexing Free-text: Lung cancer, small cell, stage II Concept terms in knowledge source: stage II small cell lung cancer Conventional methods use NLP Not scalable Cannot adapt to various forms of word permutation
66 Challenge II: Terms used in the query are too general. Expand the general terms in the query to the specific terms that are used in the document. Query: lung cancer, diagnosis options (?). Document: … the effectiveness of chest x-ray and bronchography on patients with lung cancer … Expanded query: lung cancer, chest x-ray, bronchography, … (√)
67 Challenge III: Mismatch between terms used in the query and the documents. Example query: … lung cancer, … Document 1: … lung carcinoma … (?) Document 2: … lung neoplasm … (?) Document 3: … anti-cancer drug combinations … (?)
68 Challenge I: Indexing Challenge II: Terms in the query are too general Challenge III: Mismatch between terms in the query and the documents
69 IndexFinder: Extracting domain-specific key concepts Technique Permute words from text to generate concept candidates. Use knowledge base to select the valid candidates. Problem Valid candidates may be irrelevant to specific domain indexing.
70 Eliminating irrelevant concepts Syntactic filter: Limit permutation of words within a sentence. Semantic filter: Use the semantic type (e.g. body part, disease, treatment, diagnosis) to filter out irrelevant concepts Use ISA relationship to filter out general concepts and yield specific concepts.
71 IndexFinder Performance. Two orders of magnitude faster than conventional approaches: no NLP; the knowledge base (UMLS) and index files reside in main memory; time complexity is linear in the number of distinct words in the text. Preliminary evaluation: IndexFinder generates 4% more concepts than conventional approaches (using a single noun phrase); all concepts are relevant
73 Query Expansion (QE) Queries in the following form benefit from expansion: + e.g. lung cancer e.g. diagnosis options + e.g. lung cancer e.g. chest x-ray, bronchography expansion
74 Traditional QE Appends all terms that statistically co-occur with the key terms in the query Not semantically focused Original Query: lung cancer, diagnosis options expansion Expanded Query: lung cancer, radiotherapy, chemotherapy, antineoplastic agents, survival rate
75 Knowledge-based QE. Knowledge source: UMLS, by the NLM (Metathesaurus and Semantic Network). Key concept: lung cancer (semantic type: Disease or Syndrome). Specific supporting concepts: e.g. chest x-ray (semantic type: Diagnostic Procedure), linked to the key concept by the "diagnoses" relation. A semantic type is a class of concepts; other semantic types shown: Sign or Symptom, Pharmacologic Substance, Body Parts, Injury or Poisoning
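A toy sketch of scenario-focused expansion, assuming the knowledge source can be reduced to two lookups: the semantic type of each concept and the set of concepts related to a key concept. All tables below are hand-made stand-ins, not UMLS data or a UMLS API, and the mapping from scenario to semantic types is an assumption for illustration.

```python
# Hypothetical miniature knowledge source (not real UMLS content).
SEMANTIC_TYPE = {
    "lung cancer": "Disease or Syndrome",
    "chest x-ray": "Diagnostic Procedure",
    "bronchography": "Diagnostic Procedure",
    "radiotherapy": "Therapeutic or Preventive Procedure",
    "chemotherapy": "Therapeutic or Preventive Procedure",
    "cisplatin": "Pharmacologic Substance",
    "survival rate": "Quantitative Concept",
}

# Concepts assumed to be related to each key concept.
RELATED = {
    "lung cancer": [
        "chest x-ray", "bronchography", "radiotherapy",
        "chemotherapy", "cisplatin", "survival rate",
    ],
}

# Assumed mapping from a scenario to the semantic types it accepts.
SCENARIO_TYPES = {
    "diagnosis": {"Diagnostic Procedure"},
    "treatment": {"Therapeutic or Preventive Procedure",
                  "Pharmacologic Substance"},
}


def expand(key_concept, scenario):
    """Append only related concepts whose semantic type fits the scenario."""
    wanted = SCENARIO_TYPES[scenario]
    supporting = [c for c in RELATED.get(key_concept, [])
                  if SEMANTIC_TYPE.get(c) in wanted]
    return [key_concept] + supporting


diagnosis_query = expand("lung cancer", "diagnosis")
treatment_query = expand("lung cancer", "treatment")
```

Filtering by semantic type is what keeps the expansion focused: a term such as "survival rate" is related to the key concept but is dropped for both scenarios because its type is not accepted by either.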
77 Phrase-based Vector Space Model (VSM). Query: … lung cancer, … Document: … lung carcinoma … (√: in the knowledge source, lung cancer = lung carcinoma). Document: … lung neoplasm … (√: lung neoplasm is parent_of lung cancer). Document: … anti-cancer drug combinations … (??: the relation is missing from the knowledge source)
78 Phrase-based VSM Examples. Query: “lung cancer …” Phrases: [(C ); “lung” “cancer”] … Document: “anti-cancer drug combinations …” Phrases: [(C ); “anti” “cancer” “drug” “combin”] …
79 Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS) 16% 100 queries vs. 5% 50 queries
81 Application: Query Answering via Templates Sample templates: “, treatment,” “, diagnosis ” Query Expansion … Template: “, treatment” lung cancer radiotherapy chemotherapy cisplatin relevant documents IndexFinder lung cancer, treatment Phrase-based VSM
82 Application: Scenario Specific Content Correlation [figure labels: Patient Report, IndexFinder, Scenario Selection (e.g. treatment, diagnosis, etc.), Query Templates, Query Expansion, Phrase-based VSM, relevant documents]
83 Summary of MDL. The knowledge-based (UMLS) approach provides scenario-specific medical free-text retrieval. IndexFinder: uses word permutation as well as syntactic and semantic filtering to extract domain-specific key concepts in the free text for indexing. Knowledge-based query expansion: transforms general terms in the query into the scenario-specific terms used in the documents, giving the query a higher probability of matching the relevant documents. Phrase-based indexing: transforms document indexing into the phrase paradigm (a concept and its word stems) to improve retrieval effectiveness
84 Acknowledgement This research is supported in part by NIC/NIH Grant#
85 Indexing of free text. The problem: extract key terms from free text and represent them in standard concept terms (e.g. UMLS concepts). Clinical text: "Prostate, right (biopsy) - fibromuscular and glandular hyperplasia". Concepts >> concept types: C :biopsy prostate >> T060: Diagnostic Procedure; C :prostate hyperplasia >> T046: Pathologic Function; C :right >> T080: Qualitative Concept; C :hyperplasia fibromuscular >> T046: Pathologic Function; C :hyperplasia glandular >> T046: Pathologic Function
86 Extracting domain-specific key concepts Conventional approach Use NLP to discover noun phrases. Map each noun phrase into concepts. Problems A concept that is contained in a noun phrase will not be discovered. Difficult to scale to large text.
87 Generate concept candidates from free text Sort the concept terms (phrases) in the knowledge base (UMLS) by their length and assign each phrase a unique ID. Create an inverted index for the word(s) used in the phrases; each word has a list of phrase IDs. To generate a concept candidate: Remove replicated words. Based on the list of phrase IDs of each word, aggregate the occurrence of each phrase ID. The phrases with ID occurrences that are equal to their phrase lengths are the concept candidates.
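The candidate-generation steps above can be sketched as follows. This is a minimal reading of the slide, not the IndexFinder code: phrase length is taken to be the number of distinct words in the phrase, tokenization is a simple lowercase alphanumeric split, and the syntactic and semantic filters are omitted. The function names and the sample phrase list are hypothetical; only "stage II small cell lung cancer" comes from the slides.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercase alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(phrases):
    """Sort phrases by length, assign IDs, and build a word -> phrase IDs index."""
    ordered = sorted(phrases, key=lambda p: len(tokens(p)))
    word_sets = [set(tokens(p)) for p in ordered]
    inverted = defaultdict(list)
    for pid, words in enumerate(word_sets):
        for word in words:
            inverted[word].append(pid)
    return ordered, word_sets, inverted


def concept_candidates(text, ordered, word_sets, inverted):
    """Return phrases whose every word occurs in the text."""
    hits = defaultdict(int)
    for word in set(tokens(text)):          # remove replicated words
        for pid in inverted.get(word, ()):  # aggregate phrase ID occurrences
            hits[pid] += 1
    # A phrase is a candidate when its hit count equals its length.
    return [ordered[pid] for pid in sorted(hits)
            if hits[pid] == len(word_sets[pid])]


phrases = ["stage II small cell lung cancer", "lung cancer", "prostate hyperplasia"]
ordered, word_sets, inverted = build_index(phrases)
found = concept_candidates("Lung cancer, small cell, stage II",
                           ordered, word_sets, inverted)
```

Because matching is done on word sets rather than word order, the permuted report text still maps to the knowledge-base phrase, and the work per text grows with its number of distinct words.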
88 Demo Test Texts Technically successful left lower lobe nodule biopsy. Preliminary localization CT images again demonstrate a left lower lobe nodule adjacent to the posterior segmental bronchus. CT scans obtained during biopsy demonstrate the coaxial cannula adjacent to the proximal aspect of the nodule. Surrounding pulmonary parenchymal hemorrhage as a result of the biopsy is also noted. There may be a tiny left apical air collection in the pleural space lateral to the apical bulla. Formal cytologic evaluation of the withdrawn specimen is pending at this time, although abnormal appearing "spindle" cells were identified during on-site cytopathologic evaluation of specimen adequacy.
90 Performance Comparison Corpus: OHSUMED, 41 queries
91 Traditional QE Statistical-based Any terms that statistically co-occur with the original query terms are appended Not semantically focused May expand terms irrelevant to the “treatment” of “lung cancer” e.g. “survival,” “survival rate,” …
92 Document Retrieval Find free-text documents to answer queries like: “Hyperthermia, leukocytosis, increased intracranial pressure, and central herniation.” “Cerebral edema secondary to infection, diagnosis and treatment.”
93 Vector Space Model (VSM): words as terms [figure: document vector d and query vector q plotted against the term axes Leukocytosis and Hyperthermia]
94 Stem-based VSM Morphological variants bear similar content E.g., “edema” and “edemas” Use stemmer to extract stems Lovins stemmer and Porter stemmer Query: “Hyperthermia, leukocytosis, increased intracranial pressure”… Stems: “hypertherm”, “leukocytos”, “increas”, “intracran”, “pressur”… Baseline of comparison
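A small sketch of stem-based matching with cosine similarity. The suffix stripper below is a crude stand-in for the Lovins or Porter stemmer, written only to keep the example self-contained; its stems will not match the ones on the slide. Term weights are raw counts, which is also an assumption.

```python
import math
import re
from collections import Counter

# Crude suffix list, checked longest first. NOT the Porter or Lovins rules.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one known suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def stem_vector(text):
    """Bag of stems with raw term counts."""
    return Counter(crude_stem(w) for w in re.findall(r"[a-z]+", text.lower()))


def cosine(q, d):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(q[t] * d[t] for t in q if t in d)
    norm = math.sqrt(sum(v * v for v in q.values())) \
        * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


same_concept = cosine(stem_vector("edemas"), stem_vector("edema"))
synonyms = cosine(stem_vector("hyperthermia"), stem_vector("fever"))
```

Morphological variants collapse onto one stem and score as identical, while two words that share no stem score zero no matter how close their meanings are.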
95 Shortcomings of Stem-based VSM. Inability to capture multi-word concepts, e.g. “increased intracranial pressure”. Inability to utilize the relations between concepts: synonyms (“hyperthermia” and “fever”); IS-A relation (“hyperthermia” and “body temperature elevation”)
96 Concept-based VSM Uses concepts in knowledge base (KB) as terms KB: Metathesaurus in UMLS Captures multi-word concepts Captures synonyms Query: “Hyperthermia, leukocytosis, increased intracranial pressure”… CUIs: (C ), (C ), (C )…
97 Shortcomings of Concept-based VSM Concepts may be related: E.g. “hyperthermia” and “body temperature elevation” are not identical but related concepts Need to quantify conceptual relations Knowledge bases are often incomplete, which reduces the retrieval effectiveness
98 Shortcomings of Concept-based VSM (cont’d). Concepts may be related: the conceptual similarity measure, s(c_i, c_j), quantifies relations between concepts. Knowledge bases are often incomplete, which reduces the retrieval effectiveness.
99 Incompleteness of the Knowledge Bases. Missing concepts in KB, e.g., “Infiltrative small bowel process”: (), (C ), (). Missing links between related concepts, e.g., (cerebral edema) and (cerebral lesion). In general, concept-based VSM cannot outperform stem-based VSM
100 To Compare Retrieval Effectiveness The test set: OHSUMED 106 queries, 14K documents Expert relevance judgment: R or N Retrieval effectiveness: Recall – the percentage of relevant documents retrieved so far Precision – the percentage of retrieved documents that are relevant
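The two measures can be computed at any cutoff in a ranked result list. A minimal sketch, assuming binary relevance judgments (R or N) given as a set of relevant document ids; the function name and the sample ids are hypothetical.

```python
def precision_recall(ranked, relevant, k):
    """Precision and recall after the first k retrieved documents.

    ranked:   list of document ids, best first.
    relevant: set of document ids judged relevant (R).
    """
    retrieved = ranked[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up ranking and judgments for illustration.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
p_at_2, r_at_2 = precision_recall(ranked, relevant, 2)
p_at_4, r_at_4 = precision_recall(ranked, relevant, 4)
```

Evaluating at increasing cutoffs gives the points of a precision-recall curve, which is the usual way to compare retrieval models on a test set such as OHSUMED.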
101 Evaluation of Phrase-based Document Similarity. Two contributions: one due to the conceptual similarity s(c_i, c_j) between concepts in p_q and p_d; one due to the stem overlap in p_q and p_d
104 Semi-Automatic Segmentation of Lung Tumors [figure labels: interesting area, classification, seed estimation, region growing, adaptive fusion, tumor segment]