Knowledge-based Information Management for Biomedical Applications Wesley Chu Computer Science Department University of California Los Angeles, CA

Outline
- Data types
- Uses of knowledge bases to enhance information management
- Sample systems: structured data, multimedia, free text
- Conclusion

Information Formats Used in Biomedical Applications
- Structured data
- Multimedia: images
- Semi-structured data and free text

Uses of Knowledge Bases to Enhance Information Management
- Approximate matching of query conditions, image features, and similar conceptual terms

Uses of Knowledge Bases to Enhance Information Management (cont.)
- KB query processing: similarity query answering, associative query answering, scenario-specific query answering
- Sentinel: triggering and alerting

Examples of KB Information Systems
- CoBase ( ), DARPA: a database that cooperates with the user, for structured data
- KMeD ( ), NSF: a knowledge-based medical multimedia database
- Medical Digital Library ( ), NIH: a knowledge-based digital file room for patient care, education, and research

CoBase
Graduate students: K. Chiang, C. Larson, R. Lee, M. Merzbacher, M. Minock, Frank Meng, Wenlei Mao, Mark Yang, K. Zhang
Staff: Q. Chen, Gladys Chow, Hua Yang
Project leader: Wesley W. Chu

CoBase: Cooperative Databases
Conventional query answering:
- Users need to know the detailed database schema
- Cannot return approximate answers
- Cannot answer conceptual queries
Cooperative query answering:
- Derives approximate answers
- Answers conceptual queries
- Provides additional relevant answers that the user did not ask for (or did not know how to ask for)

Cooperative Queries
Example queries:
- Find a seaport with railway facilities in Los Angeles
- Find a nearby friendly airport that can land an F-15
- Find hospitals with facilities similar to St. John's near LAX
CoBase servers use domain knowledge over heterogeneous information sources to provide relaxation, approximation, association, and explanation.

Generalization and Specialization
- Generalization turns a specific query into a more conceptual query
- Specialization turns a conceptual query into a more specific query

Cooperative Querying for Medical Applications
- Query: Find the treatment used for tumors similar to X1 (in location and size) in 12-year-old Korean males.
- Relaxed query: Find the treatment used for tumors of Class X in preteen Asians.
- Association: Return the success rate, side effects, and cost of the treatment.

Type Abstraction Hierarchies for the Medical Domain
- Age: Preteen, Teen, Adult
- Ethnic group: Asian (Korean, Chinese, Japanese, Filipino), African, European
- Tumor (location, size): Class X [loc1-loc3, s1-s3] with members X1 [loc1, s1], X2 [loc2, s2], X3 [loc3, s3]; Class Y [locY, sY]
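The hierarchies above can be captured with a small tree structure. The Python sketch below is illustrative only: `TAHNode` and `relax` are hypothetical names, not part of CoBase. Relaxation is modeled as moving a value up to an ancestor (generalization) and then expanding that ancestor back into all of its leaf values (specialization).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TAHNode:
    """One node of a type abstraction hierarchy (hypothetical structure)."""
    label: str
    children: list[TAHNode] = field(default_factory=list)
    # repr/compare disabled so printing or comparing nodes cannot recurse upward
    parent: Optional[TAHNode] = field(default=None, repr=False, compare=False)

    def add(self, *labels: str) -> list[TAHNode]:
        """Attach child nodes with the given labels and return them."""
        new = [TAHNode(lab, parent=self) for lab in labels]
        self.children.extend(new)
        return new

    def leaves(self) -> list[str]:
        """Specialization: every most-specific value under this node."""
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]

    def find(self, label: str) -> Optional[TAHNode]:
        """Depth-first search for the node carrying `label`."""
        if self.label == label:
            return self
        for child in self.children:
            hit = child.find(label)
            if hit is not None:
                return hit
        return None


def relax(root: TAHNode, value: str, levels: int = 1) -> list[str]:
    """Generalize `value` up to `levels` steps, then return all leaf values
    under the resulting ancestor (the relaxed query condition)."""
    node = root.find(value)
    if node is None:
        raise KeyError(value)
    for _ in range(levels):
        if node.parent is not None:
            node = node.parent
    return node.leaves()


# Ethnic-group TAH from the slide.
ethnic = TAHNode("Ethnic Group")
asian = ethnic.add("Asian", "African", "European")[0]
asian.add("Korean", "Chinese", "Japanese", "Filipino")

print(relax(ethnic, "Korean"))  # ['Korean', 'Chinese', 'Japanese', 'Filipino']
```

With `levels=2`, "Korean" is relaxed all the way to the root, so African and European cases also qualify; a real system would let the user model cap how far each attribute may be relaxed.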

KB: Type Abstraction Hierarchy (TAH)
- Uses clustering techniques to group similar attribute values, image features, and spatial relationships among objects
- Provides a multi-level (conceptual) knowledge representation

Data Mining for TAHs of Numerical Attribute Values
- Clustering metric: relaxation error, the difference between the exact value and the returned approximate value
- Relaxation error is weighted by the probability of occurrence of each value
- Can be extended to multiple attributes
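One way to read the clustering metric: for a cluster C whose values x_i occur with probability P(x_i), take the relaxation error as RE(C) = sum_i sum_j P(x_i) P(x_j) |x_i - x_j|, the expected difference between an exact value and another value from the same cluster returned in its place. The sketch below assumes that definition and builds a numeric TAH by repeatedly choosing the binary cut that minimizes the size-weighted relaxation error of the two halves. It is a simplified illustration; CoBase's own clustering algorithm may differ in its details.

```python
from collections import Counter


def relaxation_error(values: list[float]) -> float:
    """RE(C) = sum_i sum_j P(x_i) P(x_j) |x_i - x_j| over the distinct values
    in the cluster, with P estimated from value frequencies."""
    n = len(values)
    probs = {v: c / n for v, c in Counter(values).items()}
    return sum(pi * pj * abs(vi - vj)
               for vi, pi in probs.items()
               for vj, pj in probs.items())


def best_split(values: list[float]) -> float:
    """Cut point that minimizes the size-weighted RE of the two halves."""
    distinct = sorted(set(values))
    n = len(values)
    best_score, best_cut = None, None
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        left = [v for v in values if v <= cut]
        right = [v for v in values if v > cut]
        score = ((len(left) / n) * relaxation_error(left)
                 + (len(right) / n) * relaxation_error(right))
        if best_score is None or score < best_score:
            best_score, best_cut = score, cut
    return best_cut


def build_tah(values: list[float], max_leaf: int = 2) -> dict:
    """Recursively split values into a tree of {"range", "children"} nodes."""
    distinct = sorted(set(values))
    node = {"range": (distinct[0], distinct[-1]), "children": []}
    if len(distinct) <= max(max_leaf, 1):
        return node
    cut = best_split(values)
    node["children"] = [build_tah([v for v in values if v <= cut], max_leaf),
                        build_tah([v for v in values if v > cut], max_leaf)]
    return node


ages = [8, 9, 10, 11, 15, 16, 17, 30, 35, 40]  # hypothetical attribute values
print(build_tah(ages))
```

Weighting each half by its share of the values keeps the algorithm from favoring cuts that isolate a single outlier while leaving one large, loose cluster.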

Query Relaxation
- Submit the query to the database
- If answers are found, display them
- If not, relax an attribute using the TAHs, modify the query, and resubmit it to the database
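The relaxation loop can be sketched as follows, assuming a toy in-memory table (`CASES`) and per-attribute TAHs stored as child-to-parent maps (`PARENT`, with a hypothetical root value `"Any"`). Both are made up for illustration; a real system would issue modified SQL queries against the database instead.

```python
# Hypothetical TAHs, stored as child -> parent maps; "Any" is an assumed root.
PARENT = {
    "ethnic": {"Korean": "Asian", "Chinese": "Asian", "Japanese": "Asian",
               "Filipino": "Asian", "Asian": "Any", "African": "Any",
               "European": "Any"},
    "age_group": {"preteen": "Any", "teen": "Any", "adult": "Any"},
}

# Hypothetical case table.
CASES = [
    {"ethnic": "Chinese", "age_group": "preteen", "treatment": "T1"},
    {"ethnic": "European", "age_group": "adult", "treatment": "T2"},
]


def leaves_under(tah: dict, node: str) -> set:
    """All most-specific values at or below `node`."""
    kids = [child for child, parent in tah.items() if parent == node]
    if not kids:
        return {node}
    return set().union(*(leaves_under(tah, k) for k in kids))


def run(db: list, query: dict) -> list:
    """Return rows whose every constrained attribute is in the allowed set."""
    return [row for row in db
            if all(row[a] in allowed for a, allowed in query.items())]


def cooperative_query(db, conditions, relax_order, max_steps=5):
    """Query; if no answers, relax the next relaxable attribute via its TAH,
    modify the query, and resubmit, until answers appear or nothing is left."""
    current = dict(conditions)  # attribute -> current TAH node
    for _ in range(max_steps + 1):
        query = {a: leaves_under(PARENT[a], v) if a in PARENT else {v}
                 for a, v in current.items()}
        answers = run(db, query)
        if answers:
            return current, answers
        for attr in relax_order:
            parent = PARENT.get(attr, {}).get(current.get(attr))
            if parent is not None:
                current[attr] = parent  # generalize one level
                break
        else:
            break  # no attribute can be relaxed further
    return current, []


print(cooperative_query(CASES, {"ethnic": "Korean", "age_group": "preteen"},
                        ["ethnic", "age_group"]))
```

Here "Korean" is relaxed to "Asian", which admits the Chinese preteen case; attributes left out of `relax_order` are never relaxed.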

Summary: CoBase
- Derives approximate answers
- Answers conceptual queries
- Provides associative query answers

KMeD
Graduate students: Alex Bui, Christina Chu, John Dionisio, T. Plattner, D. Johnson, C. Hsu, T. Ieong
Consultants: Denise Aberle, M.D.; C. M. Breant, Ph.D.
PI: Wesley Chu, Ph.D., Computer Science Department
Co-PIs: A. Cardenas, Ph.D., Computer Science Department; Ricky Taira, Ph.D., School of Medicine

KMeD Goal: Retrieval of Images by Features and Content
- Features: size, shape, texture, density, histology
- Spatial relations: angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction
- Evolution of object growth: fusion, fission
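As a rough illustration of the spatial-relation features, the sketch below treats each segmented object as a set of `(row, col)` pixel coordinates and computes size, overlapping ratio, shortest distance, and relative direction. The representation and the exact definitions (e.g., overlapping ratio as the fraction of object `a` that also lies in object `b`) are assumptions; KMeD's definitions may differ, and angle of coverage and contact ratio are omitted.

```python
import math

Pixel = tuple[int, int]


def size(obj: set[Pixel]) -> int:
    return len(obj)


def centroid(obj: set[Pixel]) -> tuple[float, float]:
    n = len(obj)
    return (sum(r for r, _ in obj) / n, sum(c for _, c in obj) / n)


def overlapping_ratio(a: set[Pixel], b: set[Pixel]) -> float:
    """Assumed definition: fraction of a's pixels that also lie in b."""
    return len(a & b) / len(a)


def shortest_distance(a: set[Pixel], b: set[Pixel]) -> float:
    """Minimum Euclidean distance between the pixel sets (0 if they overlap)."""
    if a & b:
        return 0.0
    return min(math.dist(p, q) for p in a for q in b)


def relative_direction(a: set[Pixel], b: set[Pixel]) -> float:
    """Direction of a's centroid as seen from b's centroid, in degrees
    counterclockwise from the +column axis (rows grow downward in images)."""
    (ra, ca), (rb, cb) = centroid(a), centroid(b)
    return math.degrees(math.atan2(rb - ra, ca - cb)) % 360


# Two hypothetical 2x2 objects on the same rows, separated by two empty columns.
tumor = {(r, c) for r in range(2, 4) for c in range(2, 4)}
ventricle = {(r, c) for r in range(2, 4) for c in range(6, 8)}
print(size(tumor), overlapping_ratio(tumor, ventricle),
      shortest_distance(tumor, ventricle), relative_direction(tumor, ventricle))
# 4 0.0 3.0 180.0
```

The brute-force shortest distance is quadratic in the object sizes; it is fine for a sketch but would need a distance transform or spatial index on full-resolution images.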

Characteristics of Medical Queries
- Multimedia
- Temporal
- Evolutionary
- Spatial
- Imprecise

Knowledge-Based Image Model
- Schema level: objects Brain (b), Tumor (t), and Lateral Ventricle (l), and their spatial relations SR(t,b) and SR(t,l)
- Representation level: features and content of these objects
- Knowledge level: TAHs for tumor size, lateral ventricle, SR(t,b), and SR(t,l)
SR: spatial relation; b: brain; t: tumor; l: lateral ventricle

Knowledge-Based Query Processing
Queries → query analysis and feature selection → knowledge-based content matching via TAHs → query relaxation → query answers

User Model
- Customizes the system to users' interests, preferences, needs, and goals (e.g., query conditions, relaxation control)
- User type
- Default parameter values
- Feature and content matching policies: complete match, partial match

User Model (cont.)
- Relaxation control policies: relaxation order, unrelaxable objects, preference list, measure for ranking
- Triggering conditions
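A user model could be represented as a simple record of these policies. The `UserModel` class and its field names below are hypothetical, chosen to mirror the slide. They show how a relaxation order and a set of unrelaxable attributes might constrain which attributes the system is allowed to relax, and how a preference list could order the answers.

```python
from dataclasses import dataclass, field


@dataclass
class UserModel:
    """Hypothetical user-model record mirroring the slide's policies."""
    user_type: str
    match_policy: str = "partial"          # "complete" or "partial" match
    relaxation_order: list[str] = field(default_factory=list)
    unrelaxable: set[str] = field(default_factory=set)
    preference: list[str] = field(default_factory=list)  # preferred values, best first
    max_answers: int = 10

    def relaxable_in_order(self, attributes: list[str]) -> list[str]:
        """Attributes the system may relax: the user's order first, then the
        remaining attributes alphabetically, never the unrelaxable ones."""
        ordered = [a for a in self.relaxation_order if a in attributes]
        rest = sorted(set(attributes) - set(ordered))
        return [a for a in ordered + rest if a not in self.unrelaxable]

    def rank(self, answers: list[dict], key: str) -> list[dict]:
        """Order answers by the preference list (stable for ties), then truncate."""
        pos = {v: i for i, v in enumerate(self.preference)}
        ranked = sorted(answers, key=lambda ans: pos.get(ans[key], len(pos)))
        return ranked[: self.max_answers]


clinician = UserModel("clinician", relaxation_order=["age", "ethnic"],
                      unrelaxable={"tumor_location"}, preference=["surgery"])
print(clinician.relaxable_in_order(["tumor_location", "ethnic", "age", "size"]))
# ['age', 'ethnic', 'size']
```

Keeping the policies in one record lets different user types (e.g., clinician vs. researcher) share the same relaxation engine with different defaults.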

Query Preprocessing
- Segment and label the contours of objects of interest
- Determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects
- Organize the features and spatial relationships of the objects into a feature database
- Classify the feature database into a type abstraction hierarchy (TAH)

Similarity Query Answering
- Determine the relevant features from the query input
- Select a TAH based on these features
- Traverse the TAH nodes to match all images in the database with similar features
- Present the images and rank them by similarity (e.g., by mean square error)
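These steps can be sketched with a TAH stored as nested dictionaries: internal nodes hold `children`, each child carries a `centroid`, and leaf clusters map image IDs to feature vectors. The structure and feature values are hypothetical, and this sketch descends to a single leaf cluster; a fuller implementation would also relax to sibling clusters when a leaf yields too few images.

```python
def mse(a: list[float], b: list[float]) -> float:
    """Mean square error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def similar_images(tah: dict, query: list[float], k: int = 3) -> list[tuple[str, float]]:
    """Descend the TAH toward the child whose centroid is closest to the query,
    then rank the images in the reached leaf cluster by MSE (smaller = more similar)."""
    node = tah
    while "children" in node:
        node = min(node["children"], key=lambda child: mse(child["centroid"], query))
    ranked = sorted(node["images"].items(), key=lambda item: mse(item[1], query))
    return [(image_id, mse(features, query)) for image_id, features in ranked[:k]]


# Hypothetical two-cluster TAH over (tumor size, distance to ventricle), scaled to [0, 1].
tah = {"children": [
    {"centroid": [0.2, 0.8],
     "images": {"img01": [0.25, 0.75], "img02": [0.10, 0.90], "img03": [0.30, 0.70]}},
    {"centroid": [0.8, 0.2],
     "images": {"img04": [0.85, 0.15], "img05": [0.70, 0.30]}},
]}
for image_id, err in similar_images(tah, [0.22, 0.78]):
    print(image_id, round(err, 4))
```

Descending by centroid means only one cluster's images are compared exhaustively, which is the point of organizing the feature database as a TAH.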

Visual Query Language and Interface
- Point-click-drag interface
- Objects may be represented by icons
- Spatial relationships among objects are represented graphically

Visual Query Example
Retrieve brain tumor cases in which a tumor is located in the region indicated in the picture.

Implementation
- Sun SPARCstation 20 workstations (128 MB RAM, 24-bit frame buffer)
- Oracle database management system
- C++
- Mass storage of images (9 GB)

Summary: KMeD
- Image retrieval by feature and content
- Matching of images based on features
- Processing of queries based on spatial relationships among objects
- Answering of imprecise queries
- Expression of queries via a visual query language
- Integrated view of temporal multimedia data in a timeline metaphor

Medical Digital Library
Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou
Consultants: Hooshang Kangaloo, M.D.; Denise Aberle, M.D.
Project leader: Wesley W. Chu

Data Types Used in a Medical Digital Library
- Structured data (patient lab data, demographic data, …): CoBase
- Images (X-rays, MRI, CT scans): KMeD
- Free text (patient reports, teaching files, literature, news articles): FTRS (Free-Text Retrieval System)

A Free-Text Retrieval System (FTRS)
- Sources indexed by the knowledge-based FTRS: patient reports, medical literature, teaching materials, news articles
- Input: an ad hoc query, or a patient report for content correlation
- Output: query results

A Sample Patient Report
…
Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
…
FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
- LUNG CANCER, SMALL CELL, STAGE II.
…

Scenario-Specific Retrieval
For the sample patient report (final diagnosis: LUNG CANCER, SMALL CELL, STAGE II), retrieve:
- Treatment-related articles: how to treat the disease?
- Diagnosis-related articles: how to diagnose the disease?

Challenge I: Indexing Free Text
- Extract key concepts from the free text for indexing
- Free text: "Lung cancer, small cell, stage II"
- Concept term in the knowledge source: "stage II small cell lung cancer"
- Conventional methods use NLP, which is not scalable

Challenge II: Mismatch Between Terms Used in Queries and Documents
Example query: "… lung cancer, …"
- Document 1: "… lung carcinoma …"
- Document 2: "… lung neoplasm …"
- Document 3: "… anti-cancer drug combinations …"
Which of these documents should match the query?

Challenge III: Terms Used in the Query Are Too General
- Expand the general terms in the query into the specific terms used in the documents
- Query: "lung cancer, diagnosis options"
- Document: "… the effectiveness of chest x-ray and bronchography on patients with lung cancer …" does not match the original query (?)
- Expanded query: "lung cancer, chest x-ray, bronchography, …" matches the document (√)

A Medical KB: Unified Medical Language System (UMLS)
- Metathesaurus: a controlled vocabulary (1.6M biomedical phrases representing 800K concepts)
- Semantic Network: classifies concepts into semantic types (e.g., Disease or Syndrome, Therapeutic or Preventive Procedure) linked by relationships (e.g., treated by)
- SPECIALIST Lexicon

Using Knowledge Sources to Resolve These Challenges
- Challenge I: automatic indexing of free text
- Challenge II: mismatch between terms in the query and the documents
- Challenge III: terms in the query are too general

IndexFinder: Extracting Domain-Specific Key Concepts
Technique:
- Permute words from the text to generate concept candidates
- Use the knowledge base to select the valid candidates
Problems:
- Valid candidates may be irrelevant to the document
- Redundant concepts may be generated

Filtering Out Irrelevant Concepts
- Syntactic filter: limit the permutation of words to within a sentence
- Semantic filter: use the semantic type (e.g., body part, disease, treatment, diagnosis) to filter out irrelevant concepts
- ISA filter: use ISA relationships to filter out general concepts and keep the specific ones
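The extraction and filtering steps can be sketched against a toy lexicon standing in for UMLS. The concept IDs, names, semantic types, and ISA links below are hypothetical. Word permutation is approximated by order-free matching: a concept is a candidate when every one of its words appears in the same sentence (the syntactic filter), which needs only one pass over the sentence's distinct words.

```python
import re
from collections import defaultdict

# Hypothetical miniature knowledge source standing in for UMLS:
# concept id -> (preferred name, semantic type)
CONCEPTS = {
    "C_LUNG": ("lung", "Body Part"),
    "C_CELL": ("cell", "Cell"),
    "C_CANCER": ("cancer", "Neoplastic Process"),
    "C_LUNG_CANCER": ("lung cancer", "Neoplastic Process"),
    "C_SCLC": ("small cell lung cancer", "Neoplastic Process"),
    "C_STAGE2_SCLC": ("stage ii small cell lung cancer", "Neoplastic Process"),
}
# Hypothetical ISA links: child -> parent
ISA = {"C_STAGE2_SCLC": "C_SCLC", "C_SCLC": "C_LUNG_CANCER", "C_LUNG_CANCER": "C_CANCER"}

WORDS = {cid: set(name.split()) for cid, (name, _) in CONCEPTS.items()}
BY_WORD = defaultdict(set)  # word -> concepts containing that word
for cid, words in WORDS.items():
    for w in words:
        BY_WORD[w].add(cid)


def ancestors(cid: str) -> set[str]:
    out = set()
    while cid in ISA:
        cid = ISA[cid]
        out.add(cid)
    return out


def index_sentence(sentence: str, keep_types: set[str]) -> list[str]:
    """Return the specific concepts found in one sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    hits = defaultdict(int)
    for w in words:  # single pass over the distinct words
        for cid in BY_WORD.get(w, ()):
            hits[cid] += 1
    # Candidate: every word of the concept occurs in the sentence, in any order.
    found = {cid for cid, n in hits.items() if n == len(WORDS[cid])}
    # Semantic filter: keep only the semantic types of interest.
    found = {cid for cid in found if CONCEPTS[cid][1] in keep_types}
    # ISA filter: drop concepts that are ancestors of another found concept.
    general = set().union(*(ancestors(c) for c in found))
    return sorted(found - general)


print(index_sentence("Lung cancer, small cell, stage II", {"Neoplastic Process"}))
# ['C_STAGE2_SCLC']
```

Without the semantic filter, the stray concepts for "lung" and "cell" survive alongside the specific concept, which is the kind of irrelevant or redundant candidate the filters are meant to remove.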

IndexFinder Performance
- Two orders of magnitude faster than conventional approaches
- No NLP
- Time complexity is linear in the number of distinct words in the text
Preliminary evaluation:
- IndexFinder generates more valid terms than NLP (using a single noun phrase)
- Filtering effectively eliminates irrelevant terms

Phrase-based Vector Space Model (VSM)
Query: "… lung cancer, …"
- Document: "… lung carcinoma …": matched through the knowledge source (lung cancer = lung carcinoma) √
- Document: "… lung neoplasm …": matched through the knowledge source (related by parent_of) √
- Document: "… anti-cancer drug combinations …": missed, because the knowledge source has no link to it (??)

Phrase-based VSM Examples
Each phrase is indexed by its concept ID and its word stems:
- Query: "lung cancer …" → phrases: [(C ); "lung" "cancer"] …
- Document: "anti-cancer drug combinations …" → phrases: [(C ); "anti" "cancer" "drug" "combin"] …
The shared stem "cancer" lets the document partially match the query even though the two phrases map to different concepts.
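The idea can be illustrated by giving each phrase a vector component for its concept and smaller components for its word stems, then comparing query and document with cosine similarity. The weighting scheme (`stem_weight`), the concept IDs, and the stems below are assumptions made for this sketch, not the published model.

```python
import math
from collections import Counter

Phrase = tuple[str, frozenset[str]]  # (concept id, word stems)


def phrase(concept_id: str, *stems: str) -> Phrase:
    return (concept_id, frozenset(stems))


def to_vector(phrases: list[Phrase], stem_weight: float = 0.5) -> Counter:
    """One unit component per concept plus stem components that share
    `stem_weight` (an assumed parameter) across the phrase's stems."""
    vec = Counter()
    for concept_id, stems in phrases:
        vec["concept:" + concept_id] += 1.0
        for s in stems:
            vec["stem:" + s] += stem_weight / len(stems)
    return vec


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(query: list[Phrase], doc: list[Phrase]) -> float:
    return cosine(to_vector(query), to_vector(doc))


# Hypothetical concept ids; stems follow the slide's example.
query = [phrase("C_LUNG_CANCER", "lung", "cancer")]
docs = {
    "lung carcinoma": [phrase("C_LUNG_CANCER", "lung", "carcinoma")],
    "anti-cancer drug combinations": [phrase("C_ANTICANCER_COMBO", "anti", "cancer", "drug", "combin")],
    "heart disease": [phrase("C_HEART_DISEASE", "heart", "diseas")],
}
for text, doc in docs.items():
    print(f"{text}: {score(query, doc):.3f}")
```

A synonym such as "lung carcinoma" shares the concept component and scores high; "anti-cancer drug combinations" has a different concept but still earns a small score through the shared stem "cancer"; an unrelated phrase scores zero.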

Query Expansion (QE)
Queries that combine a general term with a general scenario benefit from expansion:
- Before expansion: "lung cancer" + "treatment options"
- After expansion: "lung cancer" + "chemotherapy, radiotherapy"

Query expansion approaches:
- Statistical: terms that co-occur with "lung cancer": result, study, patient, survive, mediastinoscopy, bronchoscopy, chemotherapy, radiotherapy, increase
- Knowledge source: relationships between semantic types, e.g. Therapeutic or Preventive Procedure (heart surgery) treats Disease or Syndrome (heart disease)
- Statistical + knowledge-based scenario-specific expansion: the knowledge source selects, from the statistically co-occurring terms, those related to the query by the scenario's relationship (e.g., treats)
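Scenario-specific expansion can be sketched as a filter over statistically co-occurring terms: keep a term only if its semantic type is linked to the query concept's type by the relation the scenario calls for (e.g., treats for treatment). The co-occurrence list comes from the slide; the semantic-type assignments, the relation table, and the `diagnoses` relation for the diagnosis scenario are hypothetical stand-ins for UMLS lookups.

```python
# Statistically co-occurring terms for "lung cancer" (term list taken from the slide).
COOCCUR = {"lung cancer": ["result", "study", "patient", "survive", "mediastinoscopy",
                           "bronchoscopy", "chemotherapy", "radiotherapy", "increase"]}

# Hypothetical semantic-type assignments (stand-ins for UMLS lookups).
SEMTYPE = {
    "lung cancer": "Disease or Syndrome",
    "chemotherapy": "Therapeutic or Preventive Procedure",
    "radiotherapy": "Therapeutic or Preventive Procedure",
    "mediastinoscopy": "Diagnostic Procedure",
    "bronchoscopy": "Diagnostic Procedure",
}

# Hypothetical semantic-network triples: (source type, relation, target type).
RELATIONS = {
    ("Therapeutic or Preventive Procedure", "treats", "Disease or Syndrome"),
    ("Diagnostic Procedure", "diagnoses", "Disease or Syndrome"),
}
SCENARIO_RELATION = {"treatment": "treats", "diagnosis": "diagnoses"}


def expand(concept: str, scenario: str) -> list[str]:
    """Keep only co-occurring terms whose semantic type is linked to the
    concept's type by the relation that the scenario requires."""
    relation = SCENARIO_RELATION[scenario]
    target_type = SEMTYPE[concept]
    kept = [t for t in COOCCUR.get(concept, [])
            if (SEMTYPE.get(t), relation, target_type) in RELATIONS]
    return [concept] + kept


print(expand("lung cancer", "treatment"))  # ['lung cancer', 'chemotherapy', 'radiotherapy']
print(expand("lung cancer", "diagnosis"))  # ['lung cancer', 'mediastinoscopy', 'bronchoscopy']
```

Generic co-occurring words such as "study" or "result" have no semantic type in the table, so the knowledge-based filter drops them; purely statistical expansion would have kept them.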

Retrieval Effectiveness Comparison (Corpus: OHSUMED; KB: UMLS)
- Overall improvement: 33% (100 queries) vs. 5% (50 queries)

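FTRS: Scenario-Specific Query Answering
- Sample templates: "…, treatment", "…, diagnosis"
- Example: the query "lung cancer" combined with the template "…, treatment" becomes "lung cancer, treatment"; IndexFinder extracts its key concepts, query expansion adds scenario-specific terms (lung cancer, radiotherapy, chemotherapy, cisplatin, …), and the phrase-based VSM engine returns the relevant documents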

FTRS: Scenario-Specific Content Correlation
- IndexFinder extracts key concepts from a free-text patient report for content correlation
- Scenario selection (e.g., treatment, diagnosis) chooses the query templates
- The resulting queries pass through query expansion and the phrase-based VSM engine, which returns the relevant documents
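Template instantiation for content correlation might look like the sketch below. It assumes the key concepts have already been extracted from the patient report (the `report_concepts` list is written by hand as a stand-in for IndexFinder output) and that templates are filled with concepts of disease-like semantic types; the template strings, type set, and function name are hypothetical. Each generated query would then go through query expansion and the phrase-based VSM engine, which are not shown.

```python
# Hypothetical templates keyed by scenario; "{disease}" is the slot to fill.
TEMPLATES = {"treatment": "{disease}, treatment", "diagnosis": "{disease}, diagnosis"}
DISEASE_TYPES = {"Disease or Syndrome", "Neoplastic Process"}


def scenario_queries(key_concepts: list[tuple[str, str]], scenarios: list[str]) -> list[str]:
    """Fill each selected scenario template with every disease-like concept
    extracted from the report; other concepts are ignored."""
    diseases = [term for term, sem_type in key_concepts if sem_type in DISEASE_TYPES]
    return [TEMPLATES[s].format(disease=d) for d in diseases for s in scenarios]


# Hand-written stand-in for the concepts IndexFinder might extract from the sample report.
report_concepts = [
    ("lung", "Body Part"),
    ("fine needle aspiration", "Diagnostic Procedure"),
    ("small cell lung cancer", "Neoplastic Process"),
]
print(scenario_queries(report_concepts, ["treatment", "diagnosis"]))
# ['small cell lung cancer, treatment', 'small cell lung cancer, diagnosis']
```

Filtering by semantic type before filling the template keeps body parts and procedures mentioned in the report from generating irrelevant queries.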

Summary: KB Free-Text Retrieval Technologies
- IndexFinder: extracts key concepts from free text
- Phrase-based VSM: a new document indexing paradigm (a concept plus its word stems) that improves retrieval effectiveness
- Knowledge-based query expansion: matches queries with scenario-specific documents, providing scenario-specific free-text retrieval

Conclusions
Knowledge sources increase the capabilities and effectiveness of information management by providing:
- Approximate matching of query conditions and image features
- Query processing: similarity query answering, user modeling, associative answering, triggering and alerting
- Document retrieval: converting ad hoc free text into controlled vocabulary, phrase-based VSM, content correlation, scenario-specific retrieval

Acknowledgement
This research is supported by DARPA, NSF Grant # , and NIC/NIH Grant #