1 KMeD: A Knowledge-Based Multimedia Medical Database System Wesley W. Chu Computer Science Department University of California, Los Angeles

2 KMeD. Project titles: A Knowledge-Based Multimedia Medical Distributed Database System; A Cooperative, Spatial, Evolutionary Medical Database System; Knowledge-Based Image Retrieval with Spatial and Temporal Constructs. Investigators: Wesley W. Chu, Computer Science Department; Alfonso F. Cardenas, Computer Science Department; Ricky K. Taira, Department of Radiological Sciences. Project periods: October 1, 1991 to September 30, 1993; July 1, 1993 to June 30, 1997; May 1, 1997 to April 30, 2001.

3 Research Team. Students: John David N. Dionisio, Chih-Cheng Hsu, David Johnson, Christine Chih. Collaborators: Alfonso F. Cardenas (Computer Science Department); Denise Aberle, MD, Robert Lufkin, MD, and Ricky K. Taira, MD (UCLA Medical School).

4 An NIH Grant at UCLA ( ): A Medical Digital Library, a Digital File Room for Patient Care, Education, and Research. Wesley W. Chu, PhD; Hooshang Kangarloo, MD; Usha Sinha, PhD; David B. Johnson, PhD; Bernard Churchill, MD.

5 Significance: query multimedia data based on image content and spatial predicates; use domain knowledge to relax and interpret medical queries; present an integrated view of multiple temporal and evolutionary data in a timeline metaphor; retrieve scenario-specific free-text documents in a medical digital library.

6 Overview: image retrieval by feature and content; query relaxation; spatial query answering; similarity query answering; visual query interface; timeline interface; retrieval of scenario-specific free-text medical documents.

7 Image Retrieval by Content. Features: size, shape, texture, density, histology. Spatial relations: angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction. Evolution of object growth: fusion, fission.
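
A minimal sketch of how some of these spatial relations could be computed, assuming each segmented object is given as a set of (row, col) pixel coordinates. The function names and the exact definitions used here (overlapping ratio as shared pixels over the smaller object, relative direction as the angle between centroids) are illustrative assumptions, not KMeD's definitions; angle of coverage and contact ratio are omitted.

```python
import math
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col); rows grow downward as in an image


def shortest_distance(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Minimum Euclidean distance between any pixel of a and any pixel of b."""
    return min(math.dist(p, q) for p in a for q in b)


def overlapping_ratio(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Shared pixels as a fraction of the smaller object (assumed definition)."""
    return len(a & b) / min(len(a), len(b))


def centroid(obj: Set[Pixel]) -> Tuple[float, float]:
    rows, cols = zip(*obj)
    return sum(rows) / len(obj), sum(cols) / len(obj)


def relative_direction(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Direction of b's centroid seen from a's centroid, in degrees
    counterclockwise from "right" (assumed definition)."""
    (ra, ca), (rb, cb) = centroid(a), centroid(b)
    return math.degrees(math.atan2(ra - rb, cb - ca)) % 360
```

For two 2x2 objects in the same rows, three columns apart, the shortest distance is 3.0, the overlapping ratio is 0.0, and the relative direction is 0 degrees (directly to the right).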

12 Characteristics of Medical Queries: multimedia, temporal, evolutionary, spatial, and imprecise.

13 Representing Temporal and Evolutionary Objects. Evolution: object O evolves into a new object O′. Fusion: objects O1, …, Om fuse into a new object O. Fission: object O splits into objects O1, …, On.

14 Representing Temporal and Evolutionary Objects (cont.). The life span of an object is compared with that of its supertype or aggregated type. Case a: the object exists with its supertype or aggregated type (same start and end). Case b: the life span of the object starts after and ends with that of its supertype or aggregated type. Case c: the life span of the object starts with and ends before that of its supertype or aggregated type. Case d: the life span of the object starts after and ends before that of its supertype or aggregated type.
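
The four cases can be checked mechanically from start and end times. A minimal sketch, assuming life spans are closed intervals of integer time stamps and that an object's life span always lies within that of its supertype or aggregated type; `LifeSpan` and `classify_life_span` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifeSpan:
    start: int  # time stamp; the unit (e.g., days since first study) is assumed
    end: int


def classify_life_span(obj: LifeSpan, parent: LifeSpan) -> str:
    """Return 'a', 'b', 'c', or 'd' following the four cases above."""
    if not (parent.start <= obj.start <= obj.end <= parent.end):
        raise ValueError("object must live within its supertype or aggregated type")
    starts_with = obj.start == parent.start
    ends_with = obj.end == parent.end
    if starts_with and ends_with:
        return "a"  # exists with its supertype
    if ends_with:
        return "b"  # starts after, ends with
    if starts_with:
        return "c"  # starts with, ends before
    return "d"      # starts after, ends before
```

For a supertype alive from time 0 to 10, an object alive from 3 to 10 falls under case b.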

15 An Example of a Temporal and Evolutionary Object (figure: a lesion and micro-lesions).

18 Spatial Distance and Angle of Coverage of Two Objects

20 Query Modification Techniques: relaxation, generalization, specialization, and association.

21 Generalization and Specialization: generalization turns a specific query into a more conceptual query; specialization turns a conceptual query into a more specific query.

22 Type Abstraction Hierarchy (TAH): presents an abstract view of types, attribute values, image features, temporal and evolutionary behavior, and spatial relationships among objects; provides multi-level knowledge representation.

23 TAH Generation for Numerical Attribute Values. Relaxation error: the difference between the exact value and the returned approximate value; the expected error is weighted by the probability of occurrence of each value. DISC (Distribution Sensitive Clustering) is based on both the attribute values and the frequency distribution of the data.
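
One way to make the relaxation error concrete, as an assumption for illustration: for a cluster C whose distinct values x_i occur with probability P(x_i) within C, take RE(C) = Σ_i Σ_j P(x_i) P(x_j) |x_i − x_j|, the expected gap between a requested value and a value returned from the same cluster. The sketch then picks the single cut point that minimizes the size-weighted relaxation error of the two sub-clusters, the kind of split a distribution-sensitive method would repeat recursively to build a TAH. The function names are hypothetical, and this naive version is not tuned to the complexity stated for DISC.

```python
from typing import Dict, List, Tuple


def relaxation_error(freq: Dict[float, int]) -> float:
    """Expected |x_i - x_j| with both values drawn from the cluster's
    frequency distribution (assumed formalization)."""
    total = sum(freq.values())
    return sum(
        (fi / total) * (fj / total) * abs(xi - xj)
        for xi, fi in freq.items()
        for xj, fj in freq.items()
    )


def best_split(freq: Dict[float, int]) -> Tuple[List[float], List[float], float]:
    """Try every cut between sorted distinct values; return both sides and the
    size-weighted relaxation error of the best cut."""
    values = sorted(freq)
    if len(values) < 2:
        raise ValueError("need at least two distinct values to split")
    total = sum(freq.values())
    best = None
    for k in range(1, len(values)):
        left = {v: freq[v] for v in values[:k]}
        right = {v: freq[v] for v in values[k:]}
        weighted = (sum(left.values()) / total * relaxation_error(left)
                    + sum(right.values()) / total * relaxation_error(right))
        if best is None or weighted < best[2]:
            best = (values[:k], values[k:], weighted)
    return best
```

With value frequencies {1: 5, 2: 5, 10: 1, 11: 1}, the best cut separates [1, 2] from [10, 11]: it uses both the gaps between values and how often each value occurs.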

24 TAH Generation for Numerical Attribute Values (cont.). Computation complexity: O(n²), where n is the number of distinct values in a cluster. DISC performs better than the Biggest Gap (value only) or Max Entropy (frequency only) methods. MDISC is developed for multiple-attribute TAHs; computation complexity: O(mn²), where m is the number of attributes.

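25 Query Relaxation (flowchart): the query is run against the database; if answers are found (yes), they are displayed; if not (no), query modification relaxes an attribute using the TAHs and the modified query is run again.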

26 A Cooperative Query Answering Example. Query: find the treatment used for the tumor similar-to (loc, size) X1 on 12-year-old Korean males. Relaxed query: find the treatment used for tumor Class X on preteen Asians. Association: the success rate, side effects, and cost of the treatment.

27 Type Abstraction Hierarchies for the Medical Domain. Age: Preteens, Teen, Adult. Ethnic Group: Asian (Korean, Chinese, Japanese, Filipino), African, European. Tumor (location, size): Class X [loc1..loc3] [s1..s3], with children X1 [loc1, s1], X2 [loc2, s2], X3 [loc3, s3]; Class Y [locY, sY].
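
The relaxation in the cooperative query example can be sketched by walking up these TAHs: a condition that returns no answers is replaced by its parent node, and the query then accepts every value under that node. A minimal sketch, assuming each TAH is stored as a child-to-parent map, that attributes are relaxed in a fixed user-given order (each as far as its TAH allows before moving on), and that age and sex are left out for brevity; the records, names, and treatments are hypothetical.

```python
from typing import Dict, List, Optional

# Child -> parent links for two TAHs from slide 27 (illustrative subset).
TAH_PARENT: Dict[str, str] = {
    "Korean": "Asian", "Chinese": "Asian", "Japanese": "Asian", "Filipino": "Asian",
    "X1": "Class X", "X2": "Class X", "X3": "Class X",
}


def covers(node: str, value: str) -> bool:
    """True if value equals node or lies below node in its TAH."""
    while value != node:
        if value not in TAH_PARENT:
            return False
        value = TAH_PARENT[value]
    return True


def relaxed_query(records: List[dict], conditions: Dict[str, str],
                  relax_order: List[str]) -> Optional[List[dict]]:
    """Return matching records, relaxing attributes one TAH level at a time."""
    conditions = dict(conditions)
    order = list(relax_order)
    while True:
        answers = [r for r in records
                   if all(covers(node, r[attr]) for attr, node in conditions.items())]
        if answers:
            return answers
        # Skip attributes already at the top of their TAH.
        while order and conditions[order[0]] not in TAH_PARENT:
            order.pop(0)
        if not order:
            return None  # nothing left to relax
        attr = order[0]
        conditions[attr] = TAH_PARENT[conditions[attr]]  # move up one level
```

With only an X2 tumor from a Japanese patient in the hypothetical records, the query (X1, Korean) first relaxes X1 to Class X, then Korean to Asian, and then matches that record.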

28 Knowledge-Based Image Model. Levels: representation level (features and contents), schema level, and knowledge level. Objects: brain, tumor, lateral ventricle. Spatial relations: SR(t,b) and SR(t,l). TAHs: tumor size, lateral ventricle, SR(t,b), and SR(t,l). Legend: SR, spatial relation; b, brain; t, tumor; l, lateral ventricle.

29 Knowledge-Based Query Processing: queries pass through query analysis and feature selection, knowledge-based content matching via TAHs, and query relaxation to produce query answers.

30 User Model: customizes query conditions and knowledge-based query processing. Components: user type; default parameter values; feature and content matching policies (complete match or partial match).

31 User Model (cont.): relaxation control policies, including relaxation order, unrelaxable objects, preference list, and measure for ranking.

34 Query Preprocessing: segment and label contours for objects of interest; determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects; organize the features and spatial relationships of objects into a feature database; classify the feature database into a Type Abstraction Hierarchy (TAH).

35 Similarity Query Answering: determine relevant features based on the query input; select a TAH based on these features; traverse the TAH nodes to match all the images with similar features in the database; present the images and rank their similarity (e.g., by mean square error).

36 Spatial Query Answering. Preprocessing: draw and label contours for objects of interest; determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects; organize the features and spatial relationships of objects into a feature database; classify the feature database into a type abstraction hierarchy (TAH).

37 Spatial Query Answering (cont.). Processing: select a TAH based on the query conditions and context; search its nodes to match the query conditions; return the images linked to the matching TAH node.

38 Similarity Query Answering. Preprocessing: select objects and specify features of interest in the image; create a feature database of the selected objects for all images; classify the feature databases into type abstraction hierarchies.

39 Similarity Query Answering (cont.). Processing: determine relevant features based on the query input; select a TAH based on these features (interacting with the user to resolve ambiguity); traverse the TAH nodes to match all the images with similar features in the databases; present the images and rank their similarity (e.g., by mean square error).
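
The last two processing steps (collect the images under the matched TAH node, then rank them by mean square error) can be sketched as follows. This assumes each image is summarized by a fixed-length numeric feature vector already normalized to comparable scales, and that the matching TAH node has already been located; `TAHNode`, `rank_similar`, and the image IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class TAHNode:
    label: str
    children: List["TAHNode"] = field(default_factory=list)
    image_ids: List[str] = field(default_factory=list)  # images linked to this node

    def all_images(self) -> List[str]:
        """Images linked to this node or to any node beneath it."""
        ids = list(self.image_ids)
        for child in self.children:
            ids.extend(child.all_images())
        return ids


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rank_similar(node: TAHNode, features: Dict[str, Sequence[float]],
                 query: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank every image under the node by MSE to the query (lower = more similar)."""
    scored = [(img, mse(features[img], query)) for img in node.all_images()]
    return sorted(scored, key=lambda pair: pair[1])
```

Ranking by error rather than requiring an exact match is what lets the answer include images that are only similar to the query.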

41 Visual Query Language and Interface: point-click-drag interface; objects may be represented iconically; spatial relationships among objects are represented graphically.

42 Visual Query Example: retrieve brain tumor cases where a tumor is located in the region indicated in the picture.

49 A Visual Query Example

50 A Visual Temporal Query Example

53 Implementation: Sun SPARC 20 workstations (128 MB RAM, 24-bit frame buffer); Oracle database management system; X/Motif development environment, C++; mass storage of images (9 GB).

58 Summary I: image retrieval by feature and content; matching and relaxation of images based on features; processing of queries based on spatial relationships among objects; answering of imprecise queries; expression of queries via a visual query language; integrated view of temporal multimedia data in a timeline metaphor.

59 A Knowledge-based Approach to Retrieve Scenario Specific Free-text in a Medical Digital Library

60 NIH Program Project Grant ( ): a 5-year, $10M joint interdisciplinary project between Medical School and CS faculty. Project 1: teleradiology infrastructure; Project 2: neuroradiology workstation; Project 3: multimedia information architecture; Project 4: natural language processing for medical reports; Project 5: medical digital library.

61 Project 5 Personnel. Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou. Consultants: Hooshang Kangarloo, M.D.; Denise Aberle, M.D. Project leader: Wesley W. Chu.

62 Data in a Medical Digital Library: structured data (patient lab data, demographic data, …): CoBase; images (X-rays, MRI, CT scans): KMeD; free text: patient reports, teaching files, literature, news articles.

63 System Overview: the Medical Digital Library (MDL) contains patient reports, medical literature, teaching materials, and news articles. Inputs are ad-hoc queries and patient reports (for content correlation); the output is query results.

64 Scenario-Specific Retrieval. Given a patient report such as "… Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II. …", retrieve treatment-related articles (how to treat the disease?) and diagnosis-related articles (how to diagnose the disease?).

65 Challenge I: Indexing. Extract domain-specific key concepts from the free text for indexing. Free text: "Lung cancer, small cell, stage II"; concept term in the knowledge source: "stage II small cell lung cancer". Conventional methods use NLP, which is not scalable and cannot adapt to the various forms of word permutation.

66 Challenge II: Terms Used in the Query Are Too General. Expand the general terms in the query into the specific terms used in the documents. Query: "lung cancer, diagnosis options". Document: "… the effectiveness of chest x-ray and bronchography on patients with lung cancer …". The document is a doubtful match (?) for the original query but a match (√) for the expanded query "lung cancer, chest x-ray, bronchography, …".

67 Challenge III: Mismatch between Terms Used in the Query and the Documents. Query: "… lung cancer …". Document 1: "… lung carcinoma …"; Document 2: "… lung neoplasm …"; Document 3: "… anti-cancer drug combinations …". Which of these documents match the query (?)

68 Challenge I: indexing. Challenge II: terms in the query are too general. Challenge III: mismatch between terms in the query and the documents.

69 IndexFinder: Extracting Domain-Specific Key Concepts. Technique: permute words from the text to generate concept candidates, and use the knowledge base to select the valid candidates. Problem: valid candidates may be irrelevant to domain-specific indexing.

70 Eliminating Irrelevant Concepts. Syntactic filter: limit the permutation of words to within a sentence. Semantic filters: use the semantic type (e.g., body part, disease, treatment, diagnosis) to filter out irrelevant concepts, and use the ISA relationship to filter out general concepts and keep specific concepts.
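
The two semantic filters can be sketched as a post-processing step over the concept candidates of one sentence (the syntactic filter is assumed to have been applied already). The sketch assumes each candidate carries a semantic type label and that ISA links are available as a child-to-parent map; the concepts and types echo the prostate example on slide 85, but the ISA link, data structures, and function name are hypothetical and are not UMLS APIs.

```python
from typing import Dict, List, Set


def filter_concepts(candidates: Dict[str, str], wanted_types: Set[str],
                    isa_parent: Dict[str, str]) -> List[str]:
    """candidates maps concept -> semantic type; isa_parent maps concept -> broader concept."""
    # Semantic filter: keep concepts whose semantic type is relevant to the domain.
    kept = [c for c, t in candidates.items() if t in wanted_types]
    # ISA filter: a kept concept is too general if it is an ancestor of another kept concept.
    ancestors: Set[str] = set()
    for concept in kept:
        node = concept
        while node in isa_parent:
            node = isa_parent[node]
            ancestors.add(node)
    return [c for c in kept if c not in ancestors]
```

Given "biopsy prostate", "prostate hyperplasia", "hyperplasia", and "right", with only procedures and pathologic functions wanted, the type filter drops "right" and the (illustrative) link "prostate hyperplasia" ISA "hyperplasia" drops the more general "hyperplasia".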

71 IndexFinder Performance: two orders of magnitude faster than conventional approaches, because it uses no NLP, the knowledge base (UMLS) and index files reside in main memory, and its time complexity is linear in the number of distinct words in the text. Preliminary evaluation: IndexFinder generates 4% more concepts than conventional approaches (which use a single noun phrase), and all concepts are relevant.

73 Query Expansion (QE). Queries of the form key concept + general term benefit from expansion into key concept + specific terms: e.g., "lung cancer" + "diagnosis options" expands to "lung cancer" + "chest x-ray, bronchography".

74 Traditional QE: appends all terms that statistically co-occur with the key terms in the query, so it is not semantically focused. Original query: lung cancer, diagnosis options. Expanded query: lung cancer, radiotherapy, chemotherapy, antineoplastic agents, survival rate (terms about treatment and outcome rather than diagnosis).

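75 Knowledge-based QE. Knowledge source: UMLS (by the NLM), consisting of the Semantic Network and the Metathesaurus. A semantic type is a class of concepts (e.g., Disease or Syndrome, Diagnostic Procedure, Sign or Symptom, Pharmacologic Substance, Body Parts, Injury or Poisoning), and semantic types are linked by relations such as "diagnoses" (Diagnostic Procedure diagnoses Disease or Syndrome). For the key concept lung cancer (Disease or Syndrome) in the diagnosis scenario, the specific supporting concepts, such as chest x-ray, come from the semantic type linked by "diagnoses".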
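
A toy version of this expansion: the scenario selects a relation, the relation selects the semantic types linked to the key concept's type, and only related concepts of those types are appended. The miniature knowledge source is hand-written for illustration (the unfiltered related-concept list stands in for statistical co-occurrence); it is not the UMLS data model, and all dictionary and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical miniature knowledge source (illustrative entries only).
SEMANTIC_TYPE: Dict[str, str] = {
    "lung cancer": "Disease or Syndrome",
    "chest x-ray": "Diagnostic Procedure",
    "bronchography": "Diagnostic Procedure",
    "radiotherapy": "Therapeutic or Preventive Procedure",
    "chemotherapy": "Therapeutic or Preventive Procedure",
    "survival rate": "Quantitative Concept",
}
# Semantic-network style links: (source type, relation, target type).
SEMANTIC_LINKS: Set[Tuple[str, str, str]] = {
    ("Diagnostic Procedure", "diagnoses", "Disease or Syndrome"),
    ("Therapeutic or Preventive Procedure", "treats", "Disease or Syndrome"),
}
# Concepts related to a key concept, before any semantic filtering.
RELATED: Dict[str, List[str]] = {
    "lung cancer": ["radiotherapy", "chest x-ray", "chemotherapy",
                    "bronchography", "survival rate"],
}
SCENARIO_RELATION = {"diagnosis": "diagnoses", "treatment": "treats"}


def expand(key_concept: str, scenario: str) -> List[str]:
    relation = SCENARIO_RELATION[scenario]
    key_type = SEMANTIC_TYPE[key_concept]
    # Semantic types linked to the key concept's type by the scenario's relation.
    types = {src for src, rel, dst in SEMANTIC_LINKS
             if rel == relation and dst == key_type}
    supporting = [c for c in RELATED.get(key_concept, [])
                  if SEMANTIC_TYPE.get(c) in types]
    return [key_concept] + supporting
```

Unlike the statistical expansion on slide 74, the diagnosis scenario here keeps chest x-ray and bronchography and leaves out radiotherapy, chemotherapy, and survival rate.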

77 Phrase-based Vector Space Model (VSM). Query: "… lung cancer …". Documents: "… lung carcinoma …", "… lung neoplasm …", "… anti-cancer drug combinations …". With the knowledge source, lung cancer = lung carcinoma (√), and lung neoplasm is parent_of lung cancer (√); but the link between lung cancer and anti-cancer drug combinations is missing from the knowledge source, so that document is not matched (??).

78 Phrase-based VSM Examples. Each phrase is represented as a concept ID plus its word stems. Query: "lung cancer …"; phrases: [(C ); "lung" "cancer"] … Document: "anti-cancer drug combinations …"; phrases: [(C ); "anti" "cancer" "drug" "combin"] … The query and document phrases share the stem "cancer" even though their concepts differ.

79 Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS) 16% 100 queries vs. 5% 50 queries

81 Application: Query Answering via Templates. Sample templates: "…, treatment" and "…, diagnosis". Example: for the query "lung cancer, treatment", IndexFinder extracts the key concept lung cancer; query expansion with the treatment template adds radiotherapy, chemotherapy, and cisplatin; the phrase-based VSM then retrieves the relevant documents.

82 Application: Scenario-Specific Content Correlation. IndexFinder extracts key concepts from a patient report; scenario selection (e.g., treatment, diagnosis) chooses the query template; query expansion and the phrase-based VSM then retrieve the relevant documents.

83 Summary of MDL. The knowledge-based (UMLS) approach provides scenario-specific medical free-text retrieval. IndexFinder: uses word permutation together with syntactic and semantic filtering to extract domain-specific key concepts from the free text for indexing. Knowledge-based query expansion: transforms general terms in the query into the scenario-specific terms used in the documents, giving the query a higher probability of matching the relevant documents. Phrase-based indexing: transforms document indexing into a phrase paradigm (concept and its word stems) to improve retrieval effectiveness.

84 Acknowledgement This research is supported in part by NIC/NIH Grant#

85 Indexing of Free Text. The problem: extract key terms from free text and represent them as standard concept terms (e.g., UMLS concepts) with their concept types. Clinical text: "Prostate, right (biopsy) - fibromuscular and glandular hyperplasia". Concepts (ID: concept >> semantic type): C : biopsy prostate >> T060: Diagnostic Procedure; C : prostate hyperplasia >> T046: Pathologic Function; C : right >> T080: Qualitative Concept; C : hyperplasia fibromuscular >> T046: Pathologic Function; C : hyperplasia glandular >> T046: Pathologic Function.

86 Extracting Domain-Specific Key Concepts. Conventional approach: use NLP to discover noun phrases, then map each noun phrase into concepts. Problems: a concept that is contained in a noun phrase will not be discovered, and the approach is difficult to scale to large text.

87 Generate Concept Candidates from Free Text. Sort the concept terms (phrases) in the knowledge base (UMLS) by length and assign each phrase a unique ID. Create an inverted index for the words used in the phrases; each word has a list of phrase IDs. To generate concept candidates: remove replicated words; using each word's list of phrase IDs, count the occurrences of each phrase ID; the phrases whose ID occurrence count equals their phrase length are the concept candidates.
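
The steps above can be sketched with an inverted index from words to phrase IDs. A minimal sketch, assuming a tiny hand-written phrase list in place of UMLS, lowercase whitespace tokenization without stemming, and that the syntactic filter is applied by calling `concept_candidates` once per sentence; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_index(phrases: List[str]) -> Tuple[List[str], Dict[int, int], Dict[str, List[int]]]:
    """Assign phrase IDs in order of phrase length and build a word -> phrase-ID index."""
    ordered = sorted(phrases, key=lambda p: len(p.split()))
    # Length is counted in distinct words, matching the de-duplicated input words.
    lengths = {pid: len(set(p.split())) for pid, p in enumerate(ordered)}
    inverted: Dict[str, List[int]] = defaultdict(list)
    for pid, phrase in enumerate(ordered):
        for word in set(phrase.split()):
            inverted[word].append(pid)
    return ordered, lengths, inverted


def concept_candidates(sentence: str, ordered: List[str], lengths: Dict[int, int],
                       inverted: Dict[str, List[int]]) -> List[str]:
    """Return knowledge-base phrases whose words all occur in the sentence."""
    words = set(sentence.lower().replace(",", " ").split())  # remove replicated words
    hits = Counter(pid for w in words for pid in inverted.get(w, []))
    # Candidate: occurrence count of the phrase ID equals the phrase length.
    return [ordered[pid] for pid, n in sorted(hits.items()) if n == lengths[pid]]
```

With the phrases "cell", "lung cancer", "breast cancer", "small cell lung cancer", and "stage ii small cell lung cancer", the sentence "Lung cancer, small cell, stage II" yields every phrase except "breast cancer", even though its words appear in a different order from the knowledge-base term; the filters of slide 70 would then prune the general or irrelevant candidates.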

88 Demo Test Texts Technically successful left lower lobe nodule biopsy. Preliminary localization CT images again demonstrate a left lower lobe nodule adjacent to the posterior segmental bronchus. CT scans obtained during biopsy demonstrate the coaxial cannula adjacent to the proximal aspect of the nodule. Surrounding pulmonary parenchymal hemorrhage as a result of the biopsy is also noted. There may be a tiny left apical air collection in the pleural space lateral to the apical bulla. Formal cytologic evaluation of the withdrawn specimen is pending at this time, although abnormal appearing "spindle" cells were identified during on-site cytopathologic evaluation of specimen adequacy.

90 Performance Comparison Corpus: OHSUMED, 41 queries

91 Traditional QE: statistics-based; any terms that statistically co-occur with the original query terms are appended. It is not semantically focused and may add terms irrelevant to the "treatment" of "lung cancer", e.g., "survival," "survival rate," …

92 Document Retrieval Find free-text documents to answer queries like: “Hyperthermia, leukocytosis, increased intracranial pressure, and central herniation.” “Cerebral edema secondary to infection, diagnosis and treatment.”

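93 Vector Space Model (VSM): words are used as terms, so a document d and a query q are vectors over terms such as "leukocytosis" and "hyperthermia"; their similarity is the cosine of the angle between them, sim(d, q) = (d · q) / (|d| |q|).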
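
A minimal sketch of this word-based VSM, assuming raw term-frequency weights (no tf-idf) and lowercase whitespace tokenization; a production system would normally weight terms and remove stop words.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Term-frequency vector with words as terms (lowercased, commas dropped)."""
    return Counter(text.lower().replace(",", " ").split())


def cosine(d: Counter, q: Counter) -> float:
    """Cosine of the angle between document vector d and query vector q."""
    dot = sum(d[t] * q[t] for t in q)
    norm = math.sqrt(sum(v * v for v in d.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0
```

A document reading "fever and leukocytosis" scores only 1/√6 ≈ 0.41 against the query "Hyperthermia, leukocytosis", because "fever" and "hyperthermia" are different words: the synonym problem raised on slide 95.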

94 Stem-based VSM: morphological variants bear similar content, e.g., "edema" and "edemas", so a stemmer (the Lovins stemmer or the Porter stemmer) is used to extract stems. Query: "Hyperthermia, leukocytosis, increased intracranial pressure"… Stems: "hypertherm", "leukocytos", "increas", "intracran", "pressur"… This serves as the baseline for comparison.

95 Shortcomings of Stem-based VSM. Inability to capture multi-word concepts: 1. "increased intracranial pressure". Inability to utilize the relations between concepts: 2. synonyms, e.g., "hyperthermia" and "fever"; 3. IS-A relations, e.g., "hyperthermia" and "body temperature elevation".

96 Concept-based VSM: uses concepts in a knowledge base (KB) as terms (KB: the UMLS Metathesaurus); captures multi-word concepts and synonyms. Query: "Hyperthermia, leukocytosis, increased intracranial pressure"… CUIs: (C ), (C ), (C )…

97 Shortcomings of Concept-based VSM. Concepts may be related: e.g., "hyperthermia" and "body temperature elevation" are not identical but related concepts, so conceptual relations need to be quantified. Knowledge bases are often incomplete, which reduces retrieval effectiveness.

98 Shortcomings of Concept-based VSM (cont'd). Concepts may be related: the conceptual similarity measure s(c_i, c_j) quantifies relations between concepts. Knowledge bases are often incomplete, which reduces retrieval effectiveness.

99 Incompleteness of the Knowledge Bases. Missing concepts in the KB, e.g., "Infiltrative small bowel process": (), (C ), (). Missing links between related concepts, e.g., (cerebral edema) and (cerebral lesion). In general, therefore, concept-based VSM cannot outperform stem-based VSM.

100 To Compare Retrieval Effectiveness. Test set: OHSUMED, 106 queries, 14K documents. Expert relevance judgment: R (relevant) or N (not relevant). Retrieval effectiveness: recall, the percentage of relevant documents retrieved so far; precision, the percentage of retrieved documents that are relevant.
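
Recall and precision at a cutoff follow directly from a ranked result list and the set of documents judged R. A minimal sketch; the function name and document IDs are hypothetical.

```python
from typing import List, Set, Tuple


def recall_precision_at(ranked: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Recall and precision over the top-k retrieved documents."""
    retrieved = ranked[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision
```

For the ranking d3, d7, d1, d9 with relevant set {d3, d1, d5}, recall rises from 1/3 at k = 2 to 2/3 at k = 4 while precision stays at 0.5.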

101 Evaluation of Phrase-based Document Similarity. Similarity between a query phrase p_q and a document phrase p_d arises from the conceptual similarity s(c_i, c_j) between the concepts in p_q and p_d, and from the stem overlap between p_q and p_d.
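
One way to combine the two sources of similarity, purely as an assumption for illustration (the formula used in the original work is not given here): take the larger of the conceptual similarity s(c_q, c_d) and the fraction of query stems that also appear in the document phrase. Phrases follow the [(concept ID); stems] form of slide 78; `Phrase`, `phrase_similarity`, the placeholder concept IDs, and the similarity table are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Phrase:
    concept: str           # concept ID (placeholder IDs, not real CUIs)
    stems: FrozenSet[str]  # word stems of the phrase


def phrase_similarity(pq: Phrase, pd: Phrase,
                      concept_sim: Dict[Tuple[str, str], float]) -> float:
    """Assumed combination: max of conceptual similarity and stem overlap."""
    s = 1.0 if pq.concept == pd.concept else concept_sim.get((pq.concept, pd.concept), 0.0)
    stem_overlap = len(pq.stems & pd.stems) / len(pq.stems)
    return max(s, stem_overlap)
```

Under this assumption, "lung cancer" still scores 0.5 against "anti-cancer drug combinations" through the shared stem "cancer", even though the knowledge base has no link between the two concepts.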

104 Semi-Automatic Segmentation of Lung Tumors (flowchart). Stages: interesting area, classification, seed estimation, region growing, tumor segment, and adaptive fusion; the seed estimation, region growing, and tumor segment stages each appear twice, in two parallel branches.
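
The region-growing stage can be illustrated with a basic seeded region grower on a 2-D intensity grid: starting from a seed pixel, it adds 4-connected neighbors whose intensity is within a tolerance of the seed's intensity. This is a generic textbook variant assumed for illustration; the slide's classification, seed estimation, and adaptive fusion stages are not shown, and `region_grow` and its tolerance parameter are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple


def region_grow(image: List[List[float]], seed: Tuple[int, int],
                tolerance: float) -> Set[Tuple[int, int]]:
    """Breadth-first growth from seed over 4-connected pixels near the seed intensity."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

Comparing each pixel with the seed value, rather than with its neighbor, keeps the region from drifting through gradual intensity ramps; a fusion step such as the one on the slide could then reconcile segments grown from different seeds.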
