Knowledge-based Information Management for Biomedical Applications Wesley Chu Computer Science Department University of California Los Angeles, CA


1 Knowledge-based Information Management for Biomedical Applications Wesley Chu Computer Science Department University of California Los Angeles, CA wwc@cs.ucla.edu www.kmed.cs.ucla.edu

2 Outline: Data types; Uses of knowledge bases to enhance information management; Sample systems (structured data, multi-media, free-text); Conclusion

3 Information Formats Used in Biomedical Applications: Structured data; Multi-media images; Semi-structured data; Free-text

4 Uses of Knowledge Bases to Enhance Information Management: Approximate matching of query conditions, image features, and similar conceptual terms

5 Uses of Knowledge Bases to Enhance Information Management: KB query processing (similarity query answering, associative query answering, scenario-specific query answering); Sentinel (triggering and alerting)

6 Examples of KB Information Systems. CoBase (1990-1998), DARPA: a database that cooperates with the user, for structured data. KMeD (1991-2000), NSF: a knowledge-based medical multi-media database. Medical Digital Library (2001-2005), NIH: a knowledge-based digital file room for patient care, education, and research.

7 CoBase www.cobase.cs.ucla.edu Graduate students : K. Chiang C. Larson R. Lee M. Merzbacher M. Minock Frank Meng Wenlei Mao Mark Yang K. Zhang Staff: Q. Chen Gladys Chow Hua Yang Project leader: Wesley W. Chu

8 CoBase: Cooperative Databases. Conventional query answering: the user needs to know the detailed database schema; cannot get approximate answers; cannot answer conceptual queries. Cooperative query answering: derives approximate answers; answers conceptual queries; provides additional relevant answers that the user does not (or does not know how to) ask for.

9 CoBase Servers (diagram): cooperative queries go to the CoBase servers, which draw on domain knowledge and heterogeneous information sources. CoBase provides: relaxation, approximation, association, explanation. Example cooperative queries: Find a seaport with railway facility in Los Angeles. Find a nearby friendly airport that can land F-15. Find hospitals with facility similar to St. John’s near LAX.

10 Generalization and Specialization (diagram): generalization moves from a specific query to a conceptual query and on to a more conceptual query; specialization moves in the opposite direction, from a more conceptual query down to specific queries.

11 Cooperative Querying for Medical Applications. Query: Find the treatment used for the tumor similar-to (loc, size) X1 on 12-year-old Korean males. Relaxed query: Find the treatment used for the tumor Class X on preteen Asians. Association: the success rate, side effects, and cost of the treatment.

12 Type Abstraction Hierarchies for Medical Domain. Age: Preteens (9, 10, 11, 12), Teen, Adult. Ethnic Group: Asian (Korean, Chinese, Japanese, Filipino), African, European. Tumor (location, size): Class X [loc1, loc3] [s1, s3], with members X1 [loc1, s1], X2 [loc2, s2], X3 [loc3, s3]; Class Y [locY, sY].
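
The ethnic-group hierarchy on this slide can be sketched as a small child-to-parent table. This is only an illustrative sketch: the names `TAH_PARENT`, `generalize`, `specialize`, and `relax` are hypothetical and are not taken from CoBase. Relaxing a value means moving one level up (generalization) and then collecting every leaf under that node (specialization).

```python
# Hypothetical encoding of the ethnic-group TAH from the slide: child -> parent.
TAH_PARENT = {
    "Korean": "Asian", "Chinese": "Asian", "Japanese": "Asian", "Filipino": "Asian",
    "Asian": "Ethnic Group", "African": "Ethnic Group", "European": "Ethnic Group",
}

def generalize(value):
    """Move one level up the hierarchy; None if value is the root or unknown."""
    return TAH_PARENT.get(value)

def specialize(concept):
    """Return every leaf value underneath a concept (the concept itself if it is a leaf)."""
    children = [child for child, parent in TAH_PARENT.items() if parent == concept]
    if not children:
        return [concept]
    leaves = []
    for child in children:
        leaves.extend(specialize(child))
    return leaves

def relax(value):
    """Relax a query value: generalize one level, then specialize back to leaves."""
    parent = generalize(value)
    return specialize(parent) if parent else [value]

print(relax("Korean"))
```

Under these assumptions, a condition on "Korean" relaxes to the four values grouped under "Asian", which mirrors the relaxed query on slide 11.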

13 KB: Type Abstraction Hierarchy. Uses a clustering technique to group similar attribute values, image features, and spatial relationships among objects. Provides a multi-level (conceptual) knowledge representation.

14 Data Mining for TAH for Numerical Attribute Values. Clustering metric: relaxation error, the difference between the exact value and the returned approximate value. Relaxation error is weighted by the probability of occurrence of each value. Can be extended to multiple attributes.
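
One way to read the relaxation-error metric on this slide is as the expected absolute difference between an exact value and an approximate value drawn from the same cluster, with both weighted by how often they occur. The sketch below assumes that reading; the function name `relaxation_error` and the sample clusters are illustrative only.

```python
from collections import Counter

def relaxation_error(values):
    """Expected |exact - approximate| within one cluster of numeric values.

    Assumption: each value's probability is its relative frequency in the cluster.
    """
    counts = Counter(values)
    total = len(values)
    prob = {v: c / total for v, c in counts.items()}
    return sum(prob[x] * prob[y] * abs(x - y) for x in prob for y in prob)

# A tight cluster has a lower relaxation error than a loose one.
print(relaxation_error([9, 10, 11, 12]))
print(relaxation_error([9, 12, 30, 65]))
```

A clustering procedure could then prefer the partition of values that keeps this error small, which is what makes the resulting hierarchy useful for approximate answers.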

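15 Query Relaxation (flowchart): the query is run against the database. If there are answers, they are displayed. If not, an attribute is relaxed using the TAHs and the modified query is run again.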
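
The relaxation loop in this flowchart can be sketched as follows. Everything here is a hypothetical stand-in: `PATIENTS` plays the role of the database, and `AGE_TAH_PATH` lists progressively broader nodes on the path from an exact age to the root of an age hierarchy.

```python
# Hypothetical records standing in for the database.
PATIENTS = [
    {"id": 1, "age": 10, "treatment": "A"},
    {"id": 2, "age": 11, "treatment": "B"},
    {"id": 3, "age": 15, "treatment": "C"},
]

# Hypothetical path up an age TAH: exact value first, then broader nodes.
AGE_TAH_PATH = [
    ("12", range(12, 13)),
    ("Preteen", range(9, 13)),
    ("Any age", range(0, 120)),
]

def answer_with_relaxation(records, tah_path):
    """Try the exact condition first; relax one TAH level at a time until answers appear."""
    for label, allowed_ages in tah_path:
        answers = [r for r in records if r["age"] in allowed_ages]
        if answers:
            return label, answers
    return None, []

level, answers = answer_with_relaxation(PATIENTS, AGE_TAH_PATH)
print(level, [r["id"] for r in answers])
```

In this sketch no patient is exactly 12, so the condition is relaxed once to the preteen range before any answer is returned.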

16 Summary: CoBase Derive Approximate Answers Answer Conceptual Queries Provide Associative Query Answers

17 KMeD www.kmed.cs.ucla.edu Graduate students: Alex Bui, Christina Chu, John Dionisio, T. Plattner, D. Johnson, C. Hsu, T. Ieong. Consultants: Denise Aberle, M.D.; C.M. Breant, Ph.D. PI: Wesley Chu, Ph.D., Computer Science Department. Co-PIs: A. Cardenas, Ph.D., Computer Science Department; Ricky Taira, Ph.D., School of Medicine

18 KMeD Goal: Retrieval of Images by Features & Content. Features: size, shape, texture, density, histology. Spatial relations: angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction. Evolution of object growth: fusion, fission.

19

20

21

22 Characteristics of Medical Queries Multimedia Temporal Evolutionary Spatial Imprecise

23

24

25 Knowledge-Based Image Model (diagram). Levels: representation level (features and content), schema level, knowledge level. Objects: Brain, Tumor, Lateral Ventricle. Spatial relations: SR(t,b), SR(t,l). TAHs: TAH SR(t,b), TAH Tumor Size, TAH SR(t,l), TAH Lateral Ventricle. Legend: SR: spatial relation; b: brain; t: tumor; l: lateral ventricle.

26 Knowledge-Based Query Processing: Queries; Query Analysis and Feature Selection; Knowledge-Based Content Matching via TAHs; Query Relaxation; Query Answers

27 User Model: customizes query answering to the user's interests, preferences, needs, and goals (e.g. query conditions, relaxation control). User type; Default parameter values; Feature and content matching policies (complete match, partial match)

28 User Model (cont.) Relaxation Control Policies Relaxation Order Unrelaxable Object Preference List Measure for Ranking Triggering conditions

29 Query Preprocessing: (1) Segment and label contours for objects of interest. (2) Determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects. (3) Organize the features and spatial relationships of objects into a feature database. (4) Classify the feature database into a Type Abstraction Hierarchy (TAH).

30

31 Similarity Query Answering: (1) Determine relevant features based on query input. (2) Select a TAH based on these features. (3) Traverse the TAH nodes to match all the images with similar features in the database. (4) Present the images and rank their similarity (e.g., by mean square error).
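
The final ranking step can be sketched as below, assuming each image has already been reduced to a numeric feature vector. The image names and feature values are made up for illustration; only the use of mean square error comes from the slide.

```python
def mean_square_error(a, b):
    """Mean of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def rank_by_similarity(query_features, image_features):
    """Return image names ordered from most to least similar to the query."""
    return sorted(
        image_features,
        key=lambda name: mean_square_error(query_features, image_features[name]),
    )

# Hypothetical feature vectors, e.g. (tumor size, distance to lateral ventricle).
candidates = {
    "case_a": [4, 9],
    "case_b": [2, 3],
    "case_c": [2, 5],
}
print(rank_by_similarity([2, 4], candidates))
```

In practice the candidates would be the images found under the selected TAH node, so the ranking only has to order an already similar set.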

32 Visual Query Language and Interface Point-click-drag interface Objects may be represented by icons Spatial relationships among objects are represented graphically

33

34 Visual Query Example Retrieve brain tumor cases where a tumor is located in the region as indicated in the picture

35

36

37

38 Implementation Sun Sparc 20 workstations (128 MB RAM, 24-bit frame buffer) Oracle Database Management System C++ Mass Storage of Images (9 GB)

39

40

41

42 Summary: KMeD Image retrieval by feature and content Matching images based on features Processing of queries based on spatial relationships among objects Answering of imprecise queries Expression of queries via visual query language Integrated view of temporal multimedia data in a timeline metaphor

43

44 Medical Digital Library www.kmed.cs.ucla.edu Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou. Consultants: Hooshang Kangarloo, M.D.; Denise Aberle, M.D. Project leader: Wesley W. Chu

45 Data Types Used in a Medical Digital Library. Structured data (patient lab data, demographic data, …): CoBase. Images (X-rays, MRI, CT scans): KMeD. Free-text (patient reports, teaching files, literature, news articles): FTRS (free-text retrieval system).

46 A Free-Text Retrieval System (FTRS) (diagram). Sources: patient reports, medical literature, teaching materials, news articles. Inputs to the knowledge-based FTRS: an ad hoc query, or a patient report for content correlation. Output: query results.

47 A Sample Patient Report … Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II. …

48 Scenario-Specific Retrieval. Patient report: … Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II. … How to treat the disease? Treatment-related articles. How to diagnose the disease? Diagnosis-related articles.

49 Challenge I: Indexing for Free-Text. Extracting key concepts in the free-text for indexing. Free-text: Lung cancer, small cell, stage II. Concept term in knowledge source: stage II small cell lung cancer. Conventional methods use NLP and are not scalable.

50 Challenge II: Mismatch between terms used in query and documents. Example query: … lung cancer, … Document 1: … lung carcinoma … Document 2: … lung neoplasm … Document 3: anti-cancer drug combinations … It is unclear (?) whether any of the three should match.

51 Challenge III: Terms used in the query are too general. Expand the general terms in the query to the specific terms that are used in the document. Query: lung cancer, diagnosis options (? no match). Document: … the effectiveness of chest x-ray and bronchography on patients with lung cancer … Expanded query: lung cancer, chest x-ray, bronchography, … (√ match).

52 A Medical KB: Unified Medical Language System (UMLS). Meta-thesaurus: controlled vocabulary (1.6M biomedical phrases, representing 800K concepts). Semantic Network: classifies concepts into classes (e.g. disease and syndrome, treated by, therapeutic procedure, etc.). Specialized Lexicon.

53 Using knowledge sources to resolve these challenges Challenge I: Automatic indexing of free text Challenge II : Mismatch between terms in the query and the documents Challenge III: Terms in the query are too general

54 IndexFinder: Extracting domain-specific key concepts. Technique: permute words from the text to generate concept candidates; use the knowledge base to select the valid candidates. Problems: valid candidates may be irrelevant to the document; redundant concepts.

55 Filtering out Irrelevant Concepts. Syntactic filter: limit permutation of words to within a sentence. Semantic filter: use the semantic type (e.g. body part, disease, treatment, diagnosis) to filter out irrelevant concepts; use the ISA relationship to filter out general concepts and yield specific concepts.
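
The idea on slides 54 and 55 can be sketched without enumerating permutations: a knowledge-base phrase is a valid candidate whenever all of its words occur in the same sentence, regardless of order. The sketch below is not the IndexFinder implementation. The concept identifiers `K1` to `K4` are made up, and the word-subset test used to drop general concepts is a simplification standing in for the ISA filter.

```python
import re

# Hypothetical knowledge base: made-up concept id -> phrase.
KB = {
    "K1": "lung cancer",
    "K2": "small cell lung cancer",
    "K3": "stage II small cell lung cancer",
    "K4": "heart disease",
}

def words(text):
    """Lower-cased set of word tokens, so word order and punctuation are ignored."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))

def extract_concepts(sentence, kb):
    """Return ids of the most specific KB concepts whose words all occur in the sentence."""
    present = words(sentence)
    # Syntactic filter: matching is confined to a single sentence.
    matched = {cid: words(phrase) for cid, phrase in kb.items() if words(phrase) <= present}
    # Stand-in for the ISA filter: drop a concept if a more specific match contains its words.
    return sorted(
        cid for cid, w in matched.items()
        if not any(w < other for other in matched.values())
    )

print(extract_concepts("LUNG CANCER, SMALL CELL, STAGE II.", KB))
```

With the sample report text from slide 49, only the most specific concept survives, which is the behaviour the filtering slide describes.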

56 IndexFinder Performance. Two orders of magnitude faster than conventional approaches; no NLP; time complexity is linear in the number of distinct words in the text. Preliminary evaluation: IndexFinder generates more valid terms than NLP (using a single noun phrase); filtering is effective in eliminating irrelevant terms.

57 Using knowledge sources to resolve these challenges Challenge I: Automatic indexing of free text Challenge II : Mismatch between terms in the query and the documents Challenge III: Terms in the query are too general

58 Phrase-based Vector Space Model (VSM). Query: … lung cancer, … Documents: … lung carcinoma …; … lung neoplasm …; … anti-cancer drug combinations … Knowledge source: lung cancer = lung carcinoma (√); lung neoplasm, related through parent_of (√); anti-cancer drug combinations, relation missing in the knowledge source (??).

59 Phrase-based VSM Examples. Query: “lung cancer …” Phrases: [(C0242379); “lung” “cancer”] … Document: “anti-cancer drug combinations …” Phrases: [(C0003393); “anti” “cancer” “drug” “combin”] …
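
A phrase in this model pairs a concept identifier with its word stems, so two phrases can match either through the knowledge source (same concept) or through shared stems. The scoring below is a simplified sketch, not the weighting used in the actual system: identical concepts score 1.0, otherwise the Jaccard overlap of stems is used. Attaching C0242379 to "lung carcinoma" is an assumption based on the "lung cancer = lung carcinoma" entry on slide 58.

```python
def phrase_similarity(p, q):
    """Similarity of two (concept_id, stems) phrases: concept match first, stem overlap second."""
    (cid_p, stems_p), (cid_q, stems_q) = p, q
    if cid_p == cid_q:
        return 1.0
    sp, sq = set(stems_p), set(stems_q)
    return len(sp & sq) / len(sp | sq)

def score(query_phrases, doc_phrases):
    """Sum, over query phrases, of the best-matching document phrase."""
    return sum(
        max((phrase_similarity(q, d) for d in doc_phrases), default=0.0)
        for q in query_phrases
    )

query = [("C0242379", ["lung", "cancer"])]
doc_same_concept = [("C0242379", ["lung", "carcinoma"])]  # assumed to share the concept
doc_shared_stem = [("C0003393", ["anti", "cancer", "drug", "combin"])]

print(score(query, doc_same_concept), score(query, doc_shared_stem))
```

The stem component is what lets the "anti-cancer drug combinations" document receive a non-zero score even though the knowledge source has no relation linking it to the query concept.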

60 Using knowledge sources to resolve these challenges Challenge I: Automatic indexing of free text Challenge II : Mismatch between terms in the query and the documents Challenge III: Terms in the query are too general

61 Query Expansion (QE). Queries in the following form benefit from expansion: a concept (e.g. lung cancer) plus a general term (e.g. treatment options), which expansion turns into the concept (e.g. lung cancer) plus specific terms (e.g. chemotherapy, radiotherapy).

62 Statistical expansion vs. knowledge-based scenario-specific expansion (figure). Terms statistically related to lung cancer: result, study, patient, survive, mediastinoscopy, bronchoscopy, chemotherapy, radiotherapy, increase. Knowledge source: Therapeutic or Preventive Procedure (e.g. heart surgery) treats Disease or Syndrome (e.g. heart disease). Statistical + knowledge-based = scenario-specific expansion.
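
The combination shown in this figure can be sketched as a filter: start from terms that co-occur with the query concept, then keep only those whose semantic type fits the chosen scenario. The type assignments and the scenario-to-type table below are hypothetical stand-ins for what a knowledge source such as UMLS would supply.

```python
# Hypothetical semantic-type assignments for a few co-occurring terms.
SEMANTIC_TYPE = {
    "chemotherapy": "Therapeutic or Preventive Procedure",
    "radiotherapy": "Therapeutic or Preventive Procedure",
    "mediastinoscopy": "Diagnostic Procedure",
    "bronchoscopy": "Diagnostic Procedure",
}

# Hypothetical mapping from a scenario to the semantic type it asks for.
SCENARIO_TYPE = {
    "treatment": "Therapeutic or Preventive Procedure",
    "diagnosis": "Diagnostic Procedure",
}

def expand(query_concept, scenario, cooccurring_terms):
    """Keep the query concept and add only co-occurring terms that fit the scenario."""
    wanted = SCENARIO_TYPE[scenario]
    kept = [t for t in cooccurring_terms if SEMANTIC_TYPE.get(t) == wanted]
    return [query_concept] + kept

statistical_terms = [
    "result", "study", "patient", "survive", "mediastinoscopy",
    "bronchoscopy", "chemotherapy", "radiotherapy", "increase",
]
print(expand("lung cancer", "treatment", statistical_terms))
```

Generic terms such as "study" or "patient" have no matching type in this sketch and are dropped, which is the point of adding the knowledge-based step to purely statistical expansion.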

63 Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS) Overall improvement: 33%, 100 queries vs. 5%, 50 queries

64 FTRS: Scenario-specific Query Answering (diagram). Sample templates: “, treatment,” “, diagnosis ”. Example: the query lung cancer with the template “, treatment” gives lung cancer, treatment. Components: IndexFinder; Query Expansion (… lung cancer, radiotherapy, chemotherapy, cisplatin); Phrase-based VSM Engine. Output: relevant documents.

65 FTRS: Scenario-specific content correlation (diagram). IndexFinder extracts key concepts from the free-text for content correlation. Inputs: patient report; scenario selection (e.g. treatment, diagnosis, etc.) through query templates. Components: IndexFinder; Query Expansion; Phrase-based VSM Engine. Output: relevant documents.

66 Summary: KB Free-text Retrieval Technologies. IndexFinder: extracts key concepts from the free-text. Phrase-based VSM: a new document indexing paradigm (a concept and its word stems) to improve retrieval effectiveness. Knowledge-based query expansion: matches the query with scenario-specific documents. Together these provide scenario-specific free-text retrieval.

67 Conclusions. Knowledge sources provide: Approximate matching (query conditions, image features); Query processing (similarity query answering, user modeling, associative answering, triggering and alerting); Document retrieval (converting ad hoc free-text into controlled vocabulary, phrase-based VSM, content correlation, scenario-specific retrieval). They increase the capabilities and effectiveness of information management.

68 Acknowledgement: This research is supported by DARPA, NSF Grant # 9619345, and NIC/NIH Grant # 4442511-33780

