1
A Knowledge-based Medical Digital Library
Wesley W. Chu, Computer Science Dept, UCLA. 9/16/2018, SEDE06
2
Data in a Medical Digital Library
Structured data (patient lab data, demographic data, …): CoBase
Images (X-rays, MRI, CT scans): KMeD
Free text: KMeX
- Patient reports
- Teaching files
- Literature
- News articles
3
Medical Digital Library
System Overview (diagram): a query goes into the Medical Digital Library, which returns relevant information drawn from:
- free-text data (e.g., medical literature, news articles, etc.)
- image data (e.g., X-ray images, CT images, etc.)
- structured data (e.g., lab results, patient demographic data)
4
Benefits of a Knowledge-Based Medical Digital Library
- Content-based information retrieval
- Transforms patient records into a sea of information sources
- Provides scenario-specific information for patient care, medical research, and education
5
Characteristics of Medical Queries
Multimedia, temporal, evolutionary, spatial, imprecise
6
CoBase: Cooperative Database www.cobase.cs.ucla.edu
Uses a knowledge base to:
- Derive approximate answers
- Answer conceptual queries
- Provide associative query answers
7
KB: Type Abstraction Hierarchy (TAH)
Uses a clustering technique to group similar:
- Attribute values
- Image features
- Spatial relationships among objects
Provides a multi-level (conceptual) knowledge representation
8
Data mining for KB--TAH
Clustering the values of an attribute considers:
- Value: the difference between the exact value and the returned approximate value
- Frequency: the probability of occurrence of each value
Can be extended to multiple attributes
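The two factors above can be combined into a single "relaxation error" that a clustering step tries to minimize. The following is a minimal sketch under stated assumptions: the error is taken to be the frequency-weighted expected difference between an exact value and the value returned in its place, and the function names `relaxation_error` and `best_binary_cut` are illustrative, not taken from CoBase.

```python
from collections import Counter

def relaxation_error(values):
    """Expected |x - y| when exact value x is answered with value y from the
    same cluster; both are weighted by their frequency of occurrence."""
    counts = Counter(values)
    n = len(values)
    return sum(
        (cx / n) * (cy / n) * abs(x - y)
        for x, cx in counts.items()
        for y, cy in counts.items()
    )

def best_binary_cut(values):
    """Split the values into two clusters that minimize the size-weighted
    relaxation error. Returns None if there are fewer than two distinct values."""
    n = len(values)
    best = None
    for cut in sorted(set(values))[1:]:
        left = [v for v in values if v < cut]
        right = [v for v in values if v >= cut]
        cost = (len(left) / n) * relaxation_error(left) \
             + (len(right) / n) * relaxation_error(right)
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best

# Hypothetical ages; the split separates the preteens from the teens.
ages = [9, 10, 11, 12, 15, 16, 17]
cost, younger, older = best_binary_cut(ages)
print(younger, older)
```

Applying the cut recursively to each part would produce a multi-level hierarchy of the kind a TAH represents.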
9
Type Abstraction Hierarchies for Medical Domain
Tumor (location, size)
  Class X [loc1 loc3] [s1 s3]
    X1 [loc1 s1]
    X2 [loc2 s2]
    X3 [loc3 s3]
  Class Y [locY sY]
Age
  Preteens: 9, 10, 11, 12
  Teen
  Adult
Ethnic Group
  Asian: Korean, Chinese, Japanese, Filipino
  African
  European
10
Generalization and Specialization in TAH
Diagram: generalization moves up the TAH from a specific query to a conceptual query and then to a more conceptual query; specialization moves back down.
11
Query Relaxation
Flowchart: the query is run against the database. If there are answers (yes), they are displayed. If not (no), an attribute is relaxed using the TAHs, the query is modified, and it is run again.
12
Cooperative Querying for Medical Applications
Query: Find the treatment used for the tumor similar-to (loc, size) X1 on 12-year-old Korean males.
Relaxed query: Find the treatment used for the tumor Class X on preteen Asians.
Association: The success rate, side effects, and cost of the treatment.
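The relaxation in this example can be sketched as generalizing each value one level up its TAH and then specializing to every value under that node. This is only an illustration: the hierarchies are written as child-to-parent maps, the patient records are hypothetical, and the tumor and gender conditions are omitted for brevity.

```python
# Toy TAHs as leaf -> parent maps, following the Age and Ethnic Group trees.
AGE_TAH = {9: "Preteen", 10: "Preteen", 11: "Preteen", 12: "Preteen"}
ETHNIC_TAH = {"Korean": "Asian", "Chinese": "Asian",
              "Japanese": "Asian", "Filipino": "Asian"}

# Hypothetical patient records.
PATIENTS = [
    {"age": 11, "ethnic": "Chinese", "treatment": "treatment A"},
    {"age": 12, "ethnic": "Japanese", "treatment": "treatment B"},
    {"age": 35, "ethnic": "Korean", "treatment": "treatment C"},
]

def relax(value, tah):
    """Generalize one level up, then specialize to all values under that node."""
    parent = tah.get(value)
    if parent is None:
        return {value}
    return {leaf for leaf, p in tah.items() if p == parent}

def query(ages, ethnics):
    return [p["treatment"] for p in PATIENTS
            if p["age"] in ages and p["ethnic"] in ethnics]

def cooperative_query(age, ethnic):
    answers = query({age}, {ethnic})
    if answers:  # exact answers exist, so no relaxation is needed
        return answers
    return query(relax(age, AGE_TAH), relax(ethnic, ETHNIC_TAH))

result = cooperative_query(12, "Korean")
print(result)
```

No record matches a 12-year-old Korean patient exactly, so the query is relaxed to preteen Asians and returns the two nearby cases.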
13
Medical Digital Library
System Overview (diagram): a query goes into the Medical Digital Library, which returns relevant information drawn from:
- free-text data (e.g., medical literature, news articles, etc.)
- image data (e.g., X-ray images, CT images, etc.)
- structured data (e.g., lab test results, patient demographic data)
14
KMeD: Retrieval of images by content www.kmed.cs.ucla.edu
PI: Wesley Chu, Ph.D., Computer Science Department
Co-PIs: A. Cardenas, Ph.D., Computer Science Department; Ricky Taira, Ph.D., School of Medicine
Consultants: Denise Aberle, M.D.; C.M. Breant, Ph.D.
Graduate students: Alex Bui, Christina Chu, John Dionisio, T. Plattner, D. Johnson, C. Hsu, T. Ieong
15
KMeD Objectives
- Matching images based on features
- Processing queries based on spatial relationships among objects
- Answering imprecise queries
- Visual query interface
16
KMeD: Retrieval of Images by Features & Content
Features: size, shape, texture, density, histology
Spatial relations: angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction
Evolution of object growth: fusion, fission
20
Knowledge-Based Image Model
Diagram with three levels:
- Knowledge level: TAHs for SR(t,b), tumor size, and SR(t,l)
- Schema level: brain, tumor, lateral ventricle
- Representation level: features and content
Legend: SR: spatial relation; b: brain; t: tumor; l: lateral ventricle
21
Knowledge-Based Query Processing
Queries → Query Analysis and Feature Selection → Knowledge-Based Content Matching via TAHs → Query Relaxation → Query Answers
22
User Model
Customizes the system to the user's interests and preferences, needs, and goals (e.g., query conditions, relaxation control).
- User type
- Default parameter values
- Feature and content matching policies: complete match, partial match
23
User Model (cont.)
- Relaxation control policies: relaxation order, unrelaxable object, preference list
- Measure for ranking
- Triggering conditions
25
Visual Query Language and Interface
- Point-click-drag interface
- Objects may be represented by icons
- Spatial relationships among objects are represented graphically
27
Visual Query Example
Retrieve brain tumor cases where a tumor is located in the region indicated in the picture
32
A KB Medical Digital Library www.cobase.cs.ucla.edu
Diagram: a query goes into the Medical Digital Library, which returns relevant information drawn from:
- free-text data (e.g., medical literature, news articles, etc.)
- image data (e.g., X-ray images, CT images, etc.)
- structured data (e.g., lab test results, patient demographic data)
33
KMeX www.cobase.cs.ucla.edu
Project leader: Wesley W. Chu
Consultants: Hooshang Kangarloo, M.D.; Denise Aberle, M.D.
Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou
34
A Sample Patient Report
…
Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
- LUNG CANCER, SMALL CELL, STAGE II.
…
35
Scenario-Specific Queries
Queries that mention one or more scenarios, e.g.:
- keratoconus treatment
- lung cancer diagnosis and complications
A scenario (e.g., treatment) is a repeating healthcare situation
More than 60% of medical queries are scenario-specific [HMW90, HPH96, EOE99, EOG00, WMH01]
36
Scenario Specific Retrieval
Given a patient report:
… Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II.
Which articles should be retrieved?
- Diagnosis-related articles: how to diagnose the disease
- Treatment-related articles: how to treat the disease
37
Challenge II: Terms in the query are too general
Challenge I: Indexing
Challenge II: Terms in the query are too general
Challenge III: Mismatch between terms in the query and the documents
38
IndexFinder http://fargo.homedns.org/umls/demo.aspx
Extract key information from clinical free text in order to:
- Search relevant reports
- Search similar patients
The medical KB (UMLS) provides standard medical concepts
IndexFinder extracts UMLS concepts from clinical texts
Diagram: clinical texts → extract key info → standard terms
Notes: Clinical texts are important information sources; they include clinical notes, surgical notes, discharge summaries, radiology reports, etc. In many situations, a doctor needs to search for the relevant reports of a patient or find a similar patient. To improve the quality of free-text search, we need to extract key information from the free text and represent it in standard terms. So what is the key information in a free-text report? Fortunately, the Unified Medical Language System (UMLS) provides the answer. UMLS, a collection of more than 100 biomedical sources, defines the key medical concepts. Therefore, we are interested in extracting UMLS concepts from clinical texts.
39
Previous Approaches
Pipeline: free text → NLP parser → noun phrases → UMLS mapping → UMLS concepts
Example: the parse tree of “lambs will eat oats” yields the noun phrases “lambs” and “oats”
Notes: Conventional approaches to extracting concepts from free text work as follows: start from the free text, use natural language processing to get a parse tree, take the list of noun phrases from the tree, and then map each noun phrase against UMLS to get concepts.
40
Problems of Previous Approaches
- Concepts cannot be discovered if they are not in a single noun phrase. E.g., in “second, third, and fourth ribs”, “second rib” cannot be discovered.
- Difficult to scale to large volumes of text: natural language processing requires significant computing resources.
Notes: The previous approaches have two main problems. First, concepts cannot be discovered if they are not in a single noun phrase. For example, in the text “second, third, and fourth ribs”, the concept “second rib” cannot be discovered because the words “second” and “ribs” are in different noun phrases. Second, it is difficult to scale to large volumes of text because natural language processing requires significant computing resources.
41
Our Approach: IndexFinder (Zou et al. 03)
Previous: free text → UMLS
Our approach: UMLS → free text
Diagram (previous): free text → NLP parser → noun phrases → UMLS mapping → concepts
Diagram (ours): index phase (offline): UMLS (2 GB) → indexing → index data (~80 MB); search phase (real time): free text → extracting → filtering → concepts
Suppose UMLS contains only “lung cancer”: we would discard all words in the text except “lung” and “cancer”.
Notes: We proposed a new technique called IndexFinder. The previous approach goes from free text to UMLS, as shown in the graph: from free text, to natural language processing, to noun phrases, and then each noun phrase is mapped against UMLS to get concepts. Can we do better? Suppose UMLS contains only the single concept “lung cancer”. What would we do? Do we need all of these processes? We would discard all words in the free text except the two words “lung” and “cancer”. Our approach goes from UMLS to free text. First comes the offline index phase, which indexes the relevant part of UMLS into compact index data that can be loaded into main memory to answer queries without using any database. Second comes the real-time search phase, which first extracts concept candidates and then applies filters.
42
Knowledge-based approach
- Uses the compact index data without any database system.
- Permutes words in a sentence to generate UMLS concept candidates.
- Uses filters to eliminate irrelevant concepts.
Notes: IndexFinder is a knowledge-based approach. It uses the compact index data to answer queries directly without any database system, permutes the words in a sentence to generate UMLS concept candidates, and uses filters to eliminate irrelevant concepts.
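The index-then-search idea can be illustrated with a small in-memory sketch. It is not the IndexFinder implementation: the concept ids and the tiny concept table are made up, `normalize` is a crude stand-in for real lexical normalization, and candidates are found by checking which concepts have all of their words present in the sentence rather than by literally permuting words.

```python
import re
from collections import defaultdict

# Hypothetical mini concept table; the ids are not real UMLS identifiers.
CONCEPTS = {"X1": "lung cancer", "X2": "second rib", "X3": "third rib"}

def normalize(word):
    # Crude singularization, standing in for real lexical normalization.
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def build_index(concepts):
    """Offline phase: map each word to the concepts that contain it."""
    word_to_concepts = defaultdict(set)
    concept_words = {}
    for cid, name in concepts.items():
        words = {normalize(w) for w in name.split()}
        concept_words[cid] = words
        for w in words:
            word_to_concepts[w].add(cid)
    return word_to_concepts, concept_words

def extract(sentence, word_to_concepts, concept_words):
    """Search phase: keep the concepts whose words all occur in the sentence."""
    words = {normalize(w) for w in re.findall(r"[A-Za-z]+", sentence)}
    candidates = set()
    for w in words:
        candidates |= word_to_concepts.get(w, set())
    return sorted(cid for cid in candidates if concept_words[cid] <= words)

word_index, concept_words = build_index(CONCEPTS)
found = extract("Fracture of the second, third, and fourth ribs",
                word_index, concept_words)
print([CONCEPTS[cid] for cid in found])
```

Because matching works on the set of words in the sentence, “second rib” is found even though “second” and “ribs” do not sit in the same noun phrase.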
43
Eliminate irrelevant concepts
Syntactic filter:
- Limits the number of word combinations within a sentence.
Semantic filter:
- Uses semantic types (e.g., body part, disease, treatment, diagnosis).
- Uses the ISA relationship to filter out general terms and keep the specific ones.
Notes: After generating concept candidates, we use filters to eliminate irrelevant concepts. We proposed two kinds of filters. The first is the syntactic filter, which limits word combinations within a sentence. The second is the semantic filter: we can filter concepts by semantic type (body part, disease, treatment, etc.), and we can also use the ISA relationship to remove general concepts and keep the more specific ones.
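A rough sketch of the semantic filter follows. The semantic-type labels and the single ISA edge are hypothetical examples, and only direct ISA links are checked; a fuller version would follow ISA chains transitively. The syntactic filter is not shown.

```python
# Hypothetical candidates and their semantic types (labels are illustrative).
SEMANTIC_TYPE = {
    "lung": "body part",
    "mass": "finding",
    "left lung mass": "finding",
    "small": "qualitative concept",
}
# Hypothetical ISA edges: specific concept -> more general concept.
ISA = {"left lung mass": "mass"}

def semantic_filter(candidates, allowed_types):
    """Keep only candidates whose semantic type is of interest."""
    return [c for c in candidates if SEMANTIC_TYPE.get(c) in allowed_types]

def isa_filter(candidates):
    """Drop a concept when a more specific candidate ISA that concept."""
    general = {ISA[c] for c in candidates if c in ISA}
    return [c for c in candidates if c not in general]

candidates = ["lung", "mass", "left lung mass", "small"]
kept = isa_filter(semantic_filter(candidates, {"body part", "finding"}))
print(kept)
```

Here “small” is removed by its semantic type, and “mass” is removed because the more specific “left lung mass” is also a candidate.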
44
Comparison of IndexFinder with MetaMap
Input: A small mass was found in the left hilum of the lung.
Figure: MetaMap output and IndexFinder output for this input
Notes: MetaMap is a well-known natural language processing approach to extracting UMLS concepts. We compared IndexFinder with MetaMap; here is an example. For the input text “A small mass was found in the left hilum of the lung.”, MetaMap found four concepts: Mass, Small, Left hilum, and Lung (shown in blue). IndexFinder returned a ranked list of four concepts. The top three in the list (lung left hilum, left lung mass, and a mass) cannot be discovered by MetaMap.
45
Topic Directory
Using indexing for document retrieval cannot provide:
- Standard vocabulary
- Cross-references among topics
- Scenario-specific search
A topic directory resolves these shortcomings by dynamically clustering documents into knowledge-based topics according to user-specified scenarios
46
The Mismatch Problem
Scenario concepts are too general to match the specialized ones in relevant documents
Query: keratoconus, treatment
Expanded query: keratoconus, treatment, contact lens, epikeratoplasty, epikeratophakia …
Document 1: … The use of contact lens after keratoconic epikeratoplasty …
Document 2: … Epikeratophakia for aphakia, keratoconus, and myopia …
47
Basic Idea
- Start from pairs of frequently co-occurring concepts [Qiu03, Jing94, Xu96]
- Apply knowledge structures to filter out pairs that are “irrelevant” to a given scenario, e.g., treatment
48
Sample Co-Occurring Pairs
Concepts most frequently co-occurring with keratoconus: griffonia, contact lens, acute hydrops, central cornea, corneal, penetrating keratoplasty, epikeratoplasty
49
UMLS – The Knowledge Source
Three major components:
- The Metathesaurus: more than 800K medical concepts, each stored as <ID, multiple string forms>, e.g., <“C ,” {“Keratoconus,” “Cornea conical”}>. Used for detecting concepts in free text.
- The Semantic Network: ~100 semantic types and ~50 relations among types, e.g., “Disease or Syndrome,” containing 44,000 concepts. Used for deriving scenario-specific relationships.
- The SPECIALIST Lexicon
50
Structure of The Knowledge Source
Diagram: the Semantic Network holds semantic types and the relations among them (e.g., Pharmacological Substance treats Disease or Syndrome); the Metathesaurus holds the concepts that belong to those types (e.g., keratoconus, acute hydrops, insulin, lactase).
51
Fragment of The Semantic Network for Each Scenario
E.g., the treatment scenario:
- Therapeutic or Preventive Procedure treats Disease or Syndrome
- Medical Device treats Disease or Syndrome
- Pharmacological Substance treats Disease or Syndrome
52
Filtering
Diagram: the concepts co-occurring with keratoconus (griffonia, contact lens, acute hydrops, central cornea, corneal, penetrating keratoplasty, epikeratoplasty) are placed under their semantic types. Only pairs linked by a “treats” relation in the treatment fragment (Therapeutic or Preventive Procedure, Pharmacological Substance, or Medical Device treats Disease or Syndrome) are kept.
53
Knowledge-Based Query Expansion
Original query: <ckey, {cs}>
- ckey: a key concept, e.g., keratoconus
- {cs}: a set of scenario concepts, e.g., treatment
{ce}: concepts having scenario-specific relationships with ckey
- relationship (ckey, cs, ce), e.g., (keratoconus, treats, contact lens)
Expanded query: <ckey, {cs}, {ce}>
- E.g., keratoconus, treatment, contact lens, epikeratoplasty, epikeratophakia …
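The expansion step can be sketched as a filter over co-occurring concepts, keeping those whose semantic type is linked to the key concept's type in the scenario fragment. The type assignments below are illustrative assumptions, only a subset of the co-occurring concepts is included, and the function name `expand` is hypothetical.

```python
# Assumed semantic types for a few of the co-occurring concepts.
SEMANTIC_TYPE = {
    "keratoconus": "Disease or Syndrome",
    "contact lens": "Medical Device",
    "penetrating keratoplasty": "Therapeutic or Preventive Procedure",
    "epikeratoplasty": "Therapeutic or Preventive Procedure",
    "acute hydrops": "Disease or Syndrome",
    "central cornea": "Body Part",
}

# Treatment-scenario fragment: (source type, relation, target type).
TREATMENT_FRAGMENT = {
    ("Therapeutic or Preventive Procedure", "treats", "Disease or Syndrome"),
    ("Medical Device", "treats", "Disease or Syndrome"),
    ("Pharmacological Substance", "treats", "Disease or Syndrome"),
}

def expand(c_key, scenario_concepts, cooccurring, fragment):
    """Return <c_key, {c_s}, {c_e}> flattened into one list of query terms."""
    key_type = SEMANTIC_TYPE[c_key]
    allowed = {src for (src, _rel, dst) in fragment if dst == key_type}
    c_e = [c for c in cooccurring if SEMANTIC_TYPE.get(c) in allowed]
    return [c_key, *scenario_concepts, *c_e]

cooccurring = ["contact lens", "acute hydrops", "central cornea",
               "penetrating keratoplasty", "epikeratoplasty"]
expanded = expand("keratoconus", ["treatment"], cooccurring, TREATMENT_FRAGMENT)
print(expanded)
```

Concepts such as “acute hydrops” co-occur with keratoconus but are dropped because no “treats” relation connects their type to Disease or Syndrome.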
54
Need for Weight Adjustments
Weight adjustments are needed to compensate for the filtering

Statistical expansion              Knowledge-based expansion
c'e                      v'e       ce                        ve
fuchs dystrophy          0.289     penetrating keratoplasty  0.247
epikeratoplasty          0.230     epikeratophakia           0.119
corneal ectasia          0.168     keratoplasty              0.103
acute hydrops            0.165     contact lens              0.101
keratometry              0.133     thermokeratoplasty        0.092
corneal topography       0.132     button                    0.067
corneal                  0.130     secondary lens implant    0.057
aphakic corneal edema    0.122     fittings adapters         0.048
                                   esthesiometer             0.043
55
The OHSUMED Testbed
A testbed consists of a benchmark query set, a corpus, and relevance judgments for each query
OHSUMED [HBL94]:
- 57 scenario-specific queries, e.g., keratoconus treatment; thrombocytosis treatment and diagnosis; diagnostic and therapeutic work up of breast mass
- 348K MEDLINE articles (title + abstract), 1988–1992
Notes: How do we identify the scenario concepts in each query? The dataset is not the same as the one in the metasearching problem.
56
Comparison Under Different Expansion Sizes
Figure: retrieval performance versus expansion size (s: expansion size; metric: avgp)
Notes: Why can appending co-occurring terms be helpful in the first place? Why would we consider the difference between the top two curves a significant improvement?
57
Summary of Query Expansion
- The knowledge-based approach selects more scenario-specific terms than the statistical approach and achieves better performance
- The differing “quality” of the knowledge structure across scenarios yields different performance improvements
58
Challenge II: Terms in the query are too general
Challenge I: Indexing
Challenge II: Terms in the query are too general
Challenge III: Mismatch between terms used in the query and the documents causes problems in ranking of results
59
Challenge III: Mismatching between terms used in query and documents
Example query: … lung cancer, …
Which of these documents should match?
- Document 1: … lung carcinoma …
- Document 2: … lung neoplasm …
- Document 3: anti-cancer drug combinations …
60
Ranking Query Results
Traditional approaches:
- Word-stem-based Vector Space Model (VSM)
- Concept-based VSM
New approach:
- Phrase-based (word + concept) VSM
61
Phrase-based Vector Space Model (VSM)
Diagram: for the query “… lung cancer, …”, the knowledge source records that lung cancer = lung carcinoma and links it to lung neoplasm through parent_of, so documents containing “lung carcinoma” or “lung neoplasm” can be matched (√). The link to “anti-cancer drug combinations” is missing from the knowledge source, so that document remains uncertain (?).
62
Phrase-based VSM Examples
Query: “lung cancer …”
Phrases: [(C ); “lung” “cancer”] …
Document: “anti-cancer drug combinations …”
Phrases: [(C ); “anti” “cancer” “drug” “combin”] …
Notes: To resolve these two problems, we use a “phrase” to keep track of both a concept and its word stems. For example, our query becomes (bullet 1); here we use a pair of brackets to enclose a phrase. Just as in the concept-based VSM, we first detect concepts (C etc.), but this time we keep all the stems together with the CUIs. “Infiltrative small bowel process” becomes (bullet 2). Notice how the stems provide useful information for the “unknown” concepts. As for “cerebral edema” and “cerebral lesion”, although there is no relation between their CUIs, the shared stem “cerebr” shows that they are actually related.
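One way to picture a phrase is as a pair of a concept id and a set of word stems, compared on both parts. The sketch below is an assumption-laden illustration, not the published scoring formula: the concept ids are placeholders rather than real CUIs, the 0.7 concept similarity is invented, and taking the maximum of the concept and stem similarities is just one plausible combination rule.

```python
import math

# A phrase is (concept id, set of word stems). The ids are placeholders.
QUERY = ("c_lung_cancer", frozenset({"lung", "cancer"}))
DOCS = {
    "lung carcinoma": ("c_lung_cancer", frozenset({"lung", "carcinoma"})),
    "lung neoplasm": ("c_lung_neoplasm", frozenset({"lung", "neoplasm"})),
    "anti-cancer drug combinations": (
        "c_anticancer_drugs", frozenset({"anti", "cancer", "drug", "combin"})),
}
# Hypothetical similarity between related concepts (e.g., via parent_of).
CONCEPT_SIM = {frozenset({"c_lung_cancer", "c_lung_neoplasm"}): 0.7}

def concept_sim(c1, c2):
    if c1 == c2:
        return 1.0
    return CONCEPT_SIM.get(frozenset({c1, c2}), 0.0)

def stem_sim(s1, s2):
    # Cosine similarity of two binary stem vectors.
    return len(s1 & s2) / math.sqrt(len(s1) * len(s2))

def phrase_sim(p, q):
    # Assumed combination rule: take the stronger of the two signals.
    return max(concept_sim(p[0], q[0]), stem_sim(p[1], q[1]))

scores = {text: phrase_sim(QUERY, phrase) for text, phrase in DOCS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The synonym and the related concept are matched through the knowledge source, while the document about anti-cancer drugs still receives a smaller, non-zero score through the shared stem “cancer”.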
63
Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
Figure: precision at the 11 recall points for each model. Annotation: 16% over 100 queries vs. 5% over 50 queries.
Notes: The baseline for comparison is the stem-based VSM, as we said before. Here we plot the precision values at the 11 recall points. If we use concepts as terms and treat different concepts as unrelated, we arrive at the (Concepts Unrelated) line; the result is significantly worse than the baseline (28%). Taking the concept interrelationships into consideration (Concepts), we achieve a significant improvement over (Concepts Unrelated), and the average effectiveness is similar to that of the baseline. On the other hand, if we consider the contribution of both the stems and the concept in a phrase but treat different concepts as unrelated (Phrases, Concepts Unrelated), we also achieve a significant improvement over the (Concepts Unrelated) line, although the improvement over the baseline is not significant. Considering both the stem contribution and the concept interrelationships (Phrases), we achieve a 16% improvement over the baseline. Recall that in information retrieval a 5% improvement in average precision over 50 queries is considered significant, so the 16% improvement shown here warrants a paradigm change from the stem-based VSM to the phrase-based VSM.
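For readers unfamiliar with the metric, precision at the 11 standard recall points (0.0, 0.1, …, 1.0) is commonly computed with interpolation, as in the sketch below. The function name and the toy ranking are illustrative; the exact evaluation procedure used for the figure is not stated in the slides.

```python
def eleven_point_precision(ranked_relevance, total_relevant):
    """ranked_relevance: booleans, True where the ranked document is relevant.
    Returns the interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    points = []  # (recall, precision) measured at each relevant hit
    hits = 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    levels = [i / 10 for i in range(11)]
    # Interpolation: the best precision reached at any recall >= the level.
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]

# Toy ranking: relevant documents at ranks 1 and 3, two relevant in total.
curve = eleven_point_precision([True, False, True, False, False], total_relevant=2)
average_precision_11pt = sum(curve) / len(curve)
print(curve, average_precision_11pt)
```

Averaging such curves over all queries gives one line per retrieval model, and the relative difference between two averages is the kind of percentage improvement quoted in the notes.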
64
Experimental Results
- Knowledge-based query expansion (KQE) is superior to statistical query expansion.
- The knowledge-based phrase vector space model (PVSM) is superior to the stem-based vector space model (SVSM).
- KQE + PVSM can yield 15-20% improvements in precision/recall over SVSM.
65
KMeX Demo
Inputs: an ad-hoc query, or a patient report for content correlation
Medical Digital Library (free-text documents): patient reports, medical literature, teaching materials, news articles
Output: query results
66
Query Answering via Templates
Sample templates: “<disease>, treatment,” “<disease>, diagnosis”
Diagram: IndexFinder detects the concept “lung cancer”; the template “<disease>, treatment” yields the query “lung cancer, treatment”; query expansion adds terms such as radiotherapy, chemotherapy, … cisplatin; the phrase-based VSM then retrieves the relevant documents.
72
Future Applications
- Patients: search for relevant literature and specialists regarding the treatment of their specific disease.
- Healthcare providers: identify other individuals with similar demography and disease, and discover the success rates and side effects of the treatment methods used.
- Medical researchers: study the characteristics of new diseases and the effectiveness of treatment methods for those diseases.
73
Acknowledgments
This research was supported by:
- DARPA F30602-94-C-0207
- NSF grant # IIS
- NIC/NIH Grant #