
Automated Classification of Medical Questions Using Semantic Parsing Techniques Paul E. Pancoast, MD Arthur B. Smith, MS Chi-Ren Shyu, PhD University of Missouri-Columbia

Physicians Have Questions When They Treat Patients
–What is the best treatment for migraines in patients who are diabetic?
–How often should I repeat the TSH for this patient who is on Synthroid?
–When should I get an X-ray for this patient with low back pain?

Observational Studies of Physician Information Needs
–Covell – 1985 – Annals of Internal Medicine. Oct 1985;103(4):
–Osheroff – 1992 – Annals of Internal Medicine. Apr ;114(7):
–Gorman – 1994 – Medical Decision Making. Apr-Jun 1995;15(2):
–Ely – 1999 – BMJ. Aug ;321(7258):

Common Themes from Observational Studies
–Physicians have questions for 45-65% of all patients they see
–Physicians pursue only about 30% of those questions
–Physicians find answers to 80% of the questions they pursue

Collections of Questions
–Over 10,000 question strings collected
  –NLM, Ely, Vanderbilt, Duke, FPIN, U of Washington, Britain, Australia
–No good way to classify the questions
–No automated method of finding duplicate questions

Reasons to Automate Classification
–Organize collections of questions
–Improve accuracy of existing classification
–Find redundancy (duplicate questions)
–Find frequency of occurrence

Research Goal: Automate Classification of Medical Questions
–Question Type: based on semantic and syntactic information (this experiment)
–Question Meaning: based on the specific instantiations of semantic and syntactic information (subsequent experiments)
–Ultimately: to match questions directly with structured medical information

Study Overview
[Diagram: MU and Ely; 1101 specific questions; domain experts; 64 categories; 170 generic question strings; Semantic Group Sequence patterns; automated classification of specific questions]

Ely Taxonomy
–Generic categories (64 total)
–Example: category 1111 and its Generic Question Strings (GQS):
  –What is the cause of symptom x?
  –What is the differential diagnosis of symptom x?
  –Could symptom x be condition y or be a result of condition y?
  –What is the likelihood that symptom x is coming from condition y?

Methods for this study (overview)
1. Extracted medical concepts from question strings using the UMLS MRXNS table
2. Assigned the concept unique identifiers (CUIs) to Semantic Groups
3. Found Semantic Group Sequence (SGS) patterns using a modified Apriori algorithm
4. Matched SGS patterns from specific questions to SGS patterns in Ely’s generic question strings to assign the generic category

1. Extracted CUIs from question strings
A parser slid 3-word, 2-word, and 1-word windows across each question and matched each window's string against MRXNS:
–[How should I] treat acute pharyngitis?
–How [should I treat] acute pharyngitis?
–How should [I treat acute] pharyngitis?
–How should I [treat acute pharyngitis]?

1. Extracted CUIs from question strings
How should I treat [acute pharyngitis]?
–Acute pharyngitis => UMLS semantic type T047 Disease or Syndrome
How should I [treat] acute pharyngitis?
–Treat (treatment) => UMLS semantic type T061 Therapeutic or Preventive Procedure
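The window parser in step 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. The toy `LEXICON` dictionary is a hypothetical stand-in for the UMLS MRXNS table, and it maps phrases straight to semantic types rather than going through CUIs. Real MRXNS lookups use UMLS normalized strings, not simple lowercasing. Longer windows are tried first, so "acute pharyngitis" is claimed before its individual words are considered.

```python
# Hypothetical toy lexicon standing in for UMLS MRXNS; keys are lowercased
# phrases, values are semantic types (CUIs omitted for brevity).
LEXICON = {
    "acute pharyngitis": "T047",  # Disease or Syndrome
    "treat": "T061",              # Therapeutic or Preventive Procedure
}


def tokenize(question):
    """Lowercase, drop a trailing question mark, split on whitespace."""
    return question.lower().rstrip("?").split()


def window_parse(question, lexicon=LEXICON, sizes=(3, 2, 1)):
    """Match 3-, 2-, then 1-word windows against the lexicon.

    A token already covered by a longer match is not reused, so the
    longest matching phrase wins. Returns (phrase, semantic type) pairs
    in the order they appear in the question.
    """
    tokens = tokenize(question)
    covered = [False] * len(tokens)
    matches = []
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            if any(covered[i:i + n]):
                continue  # overlaps a longer phrase matched earlier
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                matches.append((i, phrase, lexicon[phrase]))
                covered[i:i + n] = [True] * n
    return [(phrase, stype) for _, phrase, stype in sorted(matches)]


print(window_parse("How should I treat acute pharyngitis?"))
# [('treat', 'T061'), ('acute pharyngitis', 'T047')]
```

The 3-word window "treat acute pharyngitis" finds nothing in the toy lexicon. The parser then falls back to the 2-word match and, after that, the single word, which mirrors the bracketed examples on the slide.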

2. Assigned CUIs to Semantic Groups
–Semantic Groups are aggregations of similar semantic types
–27 Semantic Groups (from UMLS Semantic Network)
  –T047 is in 017 (PATH-PROC)
  –T061 is in 027 (THER)
–39 additional, non-medical Semantic Groups (derived from general thesauri)

3. Found Semantic Group Sequence (SGS) patterns
Example question:
–How should I treat acute pharyngitis?
–253 | 250 | 242 | 27 | 17
  253 – How/Why
  250 – Does/Can/Could/Should
  242 – I/You/He/She/We
  27 – treat (treatment)
  17 – acute pharyngitis
Ran 3000 question strings through the parser and looked for recurrent patterns
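The example sequence 253 | 250 | 242 | 27 | 17 can be reproduced with a small sketch that combines steps 1 and 2. It assumes the question has already been split into segments, for instance by the window parser. Medical segments go through semantic type to Semantic Group, and function words map directly to a non-medical group. The tables below are hypothetical and contain only the entries shown on the slides. The study itself used 27 UMLS-derived groups plus 39 non-medical groups.

```python
# Hypothetical lookup tables built only from the slide's example.
PHRASE_TO_TYPE = {"treat": "T061", "acute pharyngitis": "T047"}
TYPE_TO_GROUP = {"T047": 17, "T061": 27}   # semantic type -> Semantic Group
WORD_GROUPS = {                             # non-medical Semantic Groups
    253: {"how", "why"},
    250: {"does", "can", "could", "should"},
    242: {"i", "you", "he", "she", "we"},
}


def group_of(segment):
    """Return the Semantic Group ID for one parsed segment, or None."""
    seg = segment.lower()
    if seg in PHRASE_TO_TYPE:              # medical concept: type, then group
        return TYPE_TO_GROUP[PHRASE_TO_TYPE[seg]]
    for group, words in WORD_GROUPS.items():
        if seg in words:                   # function word: direct group
            return group
    return None                            # segment not covered by the tables


def to_sgs(segments):
    """Map parsed segments to a Semantic Group Sequence, skipping unknowns."""
    return [g for g in (group_of(s) for s in segments) if g is not None]


print(to_sgs(["How", "should", "I", "treat", "acute pharyngitis"]))
# [253, 250, 242, 27, 17]
```

In this sketch, unknown segments are silently dropped. Another option would be to keep a placeholder group for them, which would preserve sequence positions at the cost of noisier patterns.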

3. Found Semantic Group Sequence (SGS) patterns
Example question:
–How should I treat acute pharyngitis?
Matching patterns: [table with columns Semantic types and Support / Confidence; values not recoverable]
–Thresholds: 3% occurrence for support, 50% incidence for confidence

3. Found Semantic Group Sequence (SGS) patterns
–Support: the SGS pattern occurs in at least 3% of all the questions parsed
–Confidence: when the first part of a pattern is found, the complete pattern occurs at least 50% of the time
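The slides do not say how Apriori was modified for sequences. The sketch below assumes one plausible reading:
- Patterns are contiguous runs of Semantic Group IDs.
- Support is the fraction of questions that contain a run.
- Candidates grow one group at a time, with Apriori-style pruning.
- Confidence is measured for the rule "all but the last group -> last group".

The 3% and 50% thresholds come from the slides. The function names, the `max_len` cap, and the toy data are assumptions made for illustration.

```python
def contains(seq, pattern):
    """True if `pattern` occurs as a contiguous run inside `seq`."""
    n = len(pattern)
    return any(tuple(seq[i:i + n]) == pattern for i in range(len(seq) - n + 1))


def support(questions, pattern):
    """Fraction of questions whose SGS contains `pattern`."""
    return sum(contains(q, pattern) for q in questions) / len(questions)


def frequent_patterns(questions, min_support=0.03, max_len=5):
    """Level-wise search for contiguous SGS patterns meeting min_support."""
    level = {(g,) for q in questions for g in q}
    level = {p for p in level if support(questions, p) >= min_support}
    frequent = {p: support(questions, p) for p in level}
    k = 1
    while level and k < max_len:
        candidates = set()
        for q in questions:
            for i in range(len(q) - k):
                cand = tuple(q[i:i + k + 1])
                # Apriori pruning: both length-k sub-runs must be frequent
                if cand[:-1] in level and cand[1:] in level:
                    candidates.add(cand)
        level = {c for c in candidates if support(questions, c) >= min_support}
        frequent.update({c: support(questions, c) for c in level})
        k += 1
    return frequent


def confidence(frequent, pattern):
    """Confidence of the rule pattern[:-1] -> pattern[-1] (len >= 2)."""
    return frequent[pattern] / frequent[pattern[:-1]]


def strong_patterns(frequent, min_confidence=0.5):
    """Keep multi-group patterns whose confidence meets the threshold."""
    return {p: confidence(frequent, p) for p in frequent
            if len(p) > 1 and confidence(frequent, p) >= min_confidence}


questions = [                # toy SGS data, not from the study
    [253, 250, 242, 27, 17],
    [253, 250, 242, 27, 17],
    [253, 250, 242, 17],
    [250, 242, 27],
]
freq = frequent_patterns(questions)
print(freq[(253, 250, 242, 27, 17)], confidence(freq, (253, 250, 242, 27, 17)))
# 0.5 1.0
```

Pruning on both the prefix and the suffix guarantees that every surviving pattern's prefix is already in `frequent`, so `confidence` never looks up a missing key.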

4. Matched SGS patterns in generic and specific questions
Generic question:
–How should I treat condition y?
Specific questions with some matching SGS patterns:
–How do I treat depression?
–How do I manage Parkinsonism?
–How do I treat acne?
–How do I treat conjunctivitis?
–How do I treat dementia?
–How do I treat STDs?
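The slides do not give the matching rule used in step 4. As a hypothetical sketch, a specific question could be scored against each generic question string by counting the frequent SGS patterns the two share. The question would then be assigned every category that ties for the highest score. Keeping ties is what allows one specific question to map to more than one category. The category labels, the pattern set, and the made-up SGS are all illustrative. The example also assumes that "do" falls in group 250 and "acne" in group 17.

```python
def contains(seq, pattern):
    """True if `pattern` occurs as a contiguous run inside `seq`."""
    n = len(pattern)
    return any(tuple(seq[i:i + n]) == pattern for i in range(len(seq) - n + 1))


def classify(specific, generic_strings, patterns):
    """Return the generic categories sharing the most SGS patterns.

    generic_strings: list of (category, sgs) pairs; a category may have
    several generic question strings, and its best string counts. Ties
    are all returned, so one question can map to several categories.
    """
    shared = {p for p in patterns if contains(specific, p)}
    scores = {}
    for category, sgs in generic_strings:
        n = sum(contains(sgs, p) for p in shared)
        scores[category] = max(scores.get(category, 0), n)
    best = max(scores.values(), default=0)
    if best == 0:
        return []                      # no generic category matched
    return sorted(c for c, s in scores.items() if s == best)


# Hypothetical category labels and group IDs, for illustration only.
generic_strings = [
    ("treat_condition", [253, 250, 242, 27, 17]),  # How should I treat condition y?
    ("cause_of_symptom", [260, 251, 33, 18]),      # made-up SGS
]
patterns = {(253, 250), (242, 27), (27, 17), (250, 242, 27)}
specific = [253, 250, 242, 27, 17]                 # e.g. "How do I treat acne?"
print(classify(specific, generic_strings, patterns))  # ['treat_condition']
```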

Results
–1101 specific questions containing 20,710 total words
–867 distinct words (2804 instances) did not match in MRXNS or in MRCON (MRXNW gave too many hits)
–The majority of these unmatched strings were mapped to an existing semantic type using ad-hoc stemming techniques
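The slides do not describe the "ad-hoc stemming techniques". As one hypothetical illustration, a crude suffix stripper can be applied both to the lexicon entries and to an unmatched question word. That way, "treat" in a question finds the concept string "treatment". `LEXICON`, `SUFFIXES`, and `crude_stem` are illustrative names. A real system would more likely use a proper stemmer or the NLM Lexical Tools (LVG).

```python
# Toy lexicon (hypothetical entries): concept string -> semantic type.
LEXICON = {"treatment": "T061", "migraine": "T047"}

# Checked in order; a suffix is stripped only if at least 3 letters remain.
SUFFIXES = ("ment", "ing", "ed", "s")


def crude_stem(word):
    """Strip the first matching suffix (a deliberately naive stemmer)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Index lexicon entries by stem so both sides are reduced the same way.
STEM_INDEX = {crude_stem(term): stype for term, stype in LEXICON.items()}


def lookup(word):
    """Exact match first; fall back to the stem index; None if no match."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return STEM_INDEX.get(crude_stem(word))


print(lookup("treat"), lookup("migraines"), lookup("aspirin"))
# T061 T047 None
```

Stemming both sides with the same function keeps the matching consistent even when a stem is not a real word. For example, "diabetes" becomes "diabete" on both sides.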

Results
–7183 SGS patterns matched in specific and generic questions
–204 (18%) specific questions had potential matches with generic questions (using SGS)
–97 (10%) actual matches between specific and generic questions (judged by a domain expert)
–67 of these (using SGS) matched the same category assigned by Dr. Ely

Discussion
–6% of specific question strings (67/1101) mapped to the generic category assigned by Ely
–33% of the questions predicted to match by SGS patterns (67/204) had matching generic categories
–73% of specific question strings did not map to any generic category
–45% of the specific question strings that did map to a generic category mapped to more than one
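The first two Discussion percentages follow from the counts in Results. The short calculation below makes that derivation explicit. The variable names are ours. The second ratio is, in effect, the precision of the SGS predictions against Ely's categories.

```python
SPECIFIC_QUESTIONS = 1101   # all specific questions
SGS_PREDICTED = 204         # potential matches found with SGS patterns
SAME_CATEGORY = 67          # SGS matches that agreed with Ely's category

overall = SAME_CATEGORY / SPECIFIC_QUESTIONS   # share of all questions
precision = SAME_CATEGORY / SGS_PREDICTED      # share of SGS predictions

print(f"{overall:.1%} of all specific questions")   # 6.1% of all specific questions
print(f"{precision:.1%} of SGS-predicted matches")  # 32.8% of SGS-predicted matches
```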

Discussion
Automatic Classification of Questions
–SGS pattern matching can cluster questions with similar semantic and syntactic information
–These clustered questions often have the same meaning
Discrepancy in classification between SGS and Ely
–Our model needs work
–Ely classifications are not semantically based
–Ambiguity in Ely classifications

Why Questions Didn’t Match Categories
Generic Category | Assigned Category | Specific Question
Diagnosis | 1111 | What is the differential diagnosis of a rash?
Diagnosis | 1121 | What is the differential diagnosis of a rash?
35 questions are assigned to more than one category

Future Work
Improve accuracy of model
–Refine Semantic Groups
–Use relevance feedback and Semantic Group weighting
–Include part-of-speech tagging and syntactic parsing
–Incorporate WordNet for non-medical terms
Develop an indexing schema that represents the semantic groups and syntactic information as vectors in a high-level feature space model
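One hypothetical starting point for the proposed vector representation treats each question as a bag of SGS n-grams and compares questions by cosine similarity. This is only a sketch of the general idea. The feature choice (unigrams and bigrams of Semantic Group IDs) is our assumption, and it leaves out the syntactic features the slide mentions.

```python
import math
from collections import Counter


def features(sgs, max_n=2):
    """Count contiguous n-grams (n = 1..max_n) of Semantic Group IDs."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(sgs) - n + 1):
            feats[tuple(sgs[i:i + n])] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


q1 = features([253, 250, 242, 27, 17])   # How should I treat acute pharyngitis?
q2 = features([253, 250, 242, 27, 17])   # e.g. How do I treat acne?
q3 = features([253, 250, 242, 17])       # toy sequence
print(round(cosine(q1, q2), 3), round(cosine(q1, q3), 3))
# 1.0 0.756
```

Questions with identical SGS score 1.0. Partial overlap gives intermediate values, which could support ranking rather than the all-or-nothing category assignment used in this study.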

Thank you!
Acknowledgements: This research was supported in part by National Library of Medicine Biomedical and Health Informatics Research Training Grant 2-T15-LM. Thanks also to Dr. John Ely for his willingness to share his raw questions and classification data.