Automated Classification of Medical Questions Using Semantic Parsing Techniques Paul E. Pancoast, MD Arthur B. Smith, MS Chi-Ren Shyu, PhD University of.


1 Automated Classification of Medical Questions Using Semantic Parsing Techniques Paul E. Pancoast, MD Arthur B. Smith, MS Chi-Ren Shyu, PhD University of Missouri-Columbia

2 Physicians Have Questions when they treat patients What is the best treatment for migraines in patients who are diabetic? How often should I repeat the TSH for this patient who is on synthroid? When should I get an X-ray for this patient with low back pain?

3 Observational Studies of Physician Information Needs
Covell – 1985 – Annals of Internal Medicine. Oct 1985;103(4):596-599.
Osheroff – 1992 – Annals of Internal Medicine. Apr 1 1991;114(7):576-581.
Gorman – 1994 – Medical Decision Making. Apr-Jun 1995;15(2):113-119.
Ely – 1999 – BMJ. Aug 12 2000;321(7258):429-432.

4 Common Themes from Observational Studies Physicians have questions for 45-65% of all patients they see Physicians pursue only about 30% of those questions Physicians find answers to 80% of the questions they pursue

5 Collections of Questions Over 10,000 question strings collected –NLM, Ely, Vanderbilt, Duke, FPIN, U of Washington, Britain, Australia No good way to classify the questions No automated method of finding duplicate questions

6 Reasons to Automate Classification Organize collections of questions Improve accuracy of existing classification Find redundancy (duplicate questions) Find frequency of occurrence

7 Research Goal Automate Classification of Medical Questions Question Type – based on semantic and syntactic information (this experiment) Question Meaning – based on the specific instantiations of semantic and syntactic information (subsequent experiments) Ultimately – to match questions directly with structured medical information

8 Study Overview
[Flow diagram; box labels: MU, Ely | 1101 Specific Questions | Domain Experts | 64 Categories | 170 Generic Question Strings | Semantic Group Sequence Patterns | Automated Classification of Specific Questions]

9 Ely Taxonomy Generic Category (64 total) – 1111 Generic Question Strings (GQS) –What is the cause of symptom x? –What is the differential diagnosis of symptom x? –Could symptom x be condition y or be a result of condition y? –What is the likelihood that symptom x is coming from condition y?

10 Methods for this study (overview) 1.Extracted medical concepts from question strings using UMLS MRXNS table 2.Assigned concept unique identifier (CUI) to Semantic Groups 3.Found Semantic Group Sequence (SGS) patterns using Apriori Algorithm (modified) 4.Matched SGS from specific questions to SGS in Ely’s generic question strings to assign the generic category

11 1. Extracted CUIs from question strings 3 word, 2 word, 1 word window parser matching strings to MRXNS –[How should I] treat acute pharyngitis? –How [should I treat] acute pharyngitis? –How should [I treat acute] pharyngitis? –How should I [treat acute pharyngitis]?

12 1. Extracted CUIs from question strings
How should I treat [acute pharyngitis]?
–Acute pharyngitis => UMLS semantic type T047 Disease or Syndrome
How should I [treat] ------------?
–Treat (treatment) => UMLS semantic type T061 Therapeutic or Preventive Procedure
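The window parser on slides 11 and 12 can be sketched as follows. This is a minimal illustration, not the authors' implementation: `LOOKUP` is a hypothetical two-entry stand-in for the UMLS MRXNS table, `normalize` is a crude placeholder for UMLS string normalization, and the rule that a word already claimed by a wider window is not matched again is an assumption.

```python
# Hypothetical stand-in for the UMLS normalized-string table (MRXNS).
# Only the two entries shown on the slides are included.
LOOKUP = {
    "acute pharyngitis": "T047",  # Disease or Syndrome
    "treat": "T061",              # Therapeutic or Preventive Procedure
}


def normalize(word):
    """Crude placeholder for UMLS normalization: lowercase, drop punctuation."""
    return word.lower().strip("?.,!")


def extract_concepts(question, lookup=LOOKUP, max_window=3):
    """Slide a 3-word, then 2-word, then 1-word window across the question."""
    words = [normalize(w) for w in question.split()]
    used = [False] * len(words)  # words already claimed by a wider window
    found = []
    for size in range(max_window, 0, -1):
        for start in range(len(words) - size + 1):
            span = range(start, start + size)
            if any(used[k] for k in span):
                continue
            phrase = " ".join(words[start:start + size])
            if phrase in lookup:
                found.append((start, phrase, lookup[phrase]))
                for k in span:
                    used[k] = True
    found.sort()  # report matches in question order
    return [(phrase, sem_type) for _, phrase, sem_type in found]


print(extract_concepts("How should I treat acute pharyngitis?"))
```

Trying the widest window first lets "acute pharyngitis" match as one concept before "acute" or "pharyngitis" could match on their own.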

13 2. Assigned CUIs to Semantic Groups Semantic Groups are aggregations of similar semantic types 27 Semantic Groups (from UMLS Semantic Network) –T047 is in 017 (PATH-PROC) –T061 is in 027 (THER) 39 additional, non-medical Semantic Groups (derived from general thesauri)

14 3. Found Semantic Group Sequence (SGS) patterns
Example question:
–How should I treat acute pharyngitis
–253 | 250 | 242 | 27 | 17
253 – How/Why
250 – Does/Can/Could/Should
242 – I/You/He/She/We
27 – treat (treatment)
17 – acute pharyngitis
Ran 3000 question strings through the parser and looked for recurrent patterns
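As a rough sketch of how the example question becomes a Semantic Group Sequence, the snippet below maps already-chunked tokens to the group codes listed on this slide. The dictionaries contain only the words and codes shown in the example; the function name `to_sgs` and the choice to silently skip unknown tokens are assumptions made for illustration.

```python
# Group codes copied from the slide; word lists are only the examples shown.
NON_MEDICAL_GROUPS = {
    253: ["how", "why"],
    250: ["does", "can", "could", "should"],
    242: ["i", "you", "he", "she", "we"],
}
MEDICAL_GROUPS = {
    "treat": 27,              # T061 is in group 027 (THER)
    "acute pharyngitis": 17,  # T047 is in group 017 (PATH-PROC)
}

# Flatten both tables into a single token -> group code lookup.
TOKEN_TO_GROUP = {w: g for g, ws in NON_MEDICAL_GROUPS.items() for w in ws}
TOKEN_TO_GROUP.update(MEDICAL_GROUPS)


def to_sgs(tokens):
    """Map chunked tokens to group codes, skipping tokens with no known group."""
    return [TOKEN_TO_GROUP[t] for t in tokens if t in TOKEN_TO_GROUP]


tokens = ["how", "should", "i", "treat", "acute pharyngitis"]
print(to_sgs(tokens))
```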

15 3. Found Semantic Group Sequence (SGS) patterns
Example question:
–How should I treat acute pharyngitis
–253 250 242 27 17
Matching patterns:
Semantic types     Support   Confidence
253 250 242 27     0.0398    0.5588
253 242 27         0.0409    0.5612
253 250 27         0.0477    0.5084
253 250 242        0.0712    0.7598
236 17             0.0613    0.5493
253 27             0.0691    0.5077
253 242            0.0728    0.5346
253 250            0.0938    0.6885
Thresholds: 3% occurrence for support, 50% incidence for confidence

16 3. Found Semantic Group Sequence (SGS) patterns
Support: the SGS pattern occurs in at least 3% of all the questions parsed
Confidence: for example, 253 250 242 27 occurs at least 50% of the time when 253 250 242 is found
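The two measures can be illustrated with a small sketch. It assumes that a pattern matches a question when its codes appear in order (gaps allowed), and that confidence is the support of the full pattern divided by the support of the pattern without its last code. That reading agrees with the slide 15 table: 0.0398 / 0.0712 ≈ 0.559 for 253 250 242 27 given 253 250 242. The corpus below is invented, and the code only scores given patterns; it does not implement the modified Apriori search itself.

```python
def is_subsequence(pattern, sequence):
    """True if the codes in pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(code in remaining for code in pattern)


def support(pattern, corpus):
    """Fraction of sequences in the corpus that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in corpus) / len(corpus)


def confidence(pattern, corpus):
    """support(pattern) / support(pattern without its last code)."""
    antecedent = support(pattern[:-1], corpus)
    return support(pattern, corpus) / antecedent if antecedent else 0.0


# Invented toy corpus of Semantic Group Sequences (not data from the study).
CORPUS = [
    [253, 250, 242, 27, 17],
    [253, 250, 242, 17],
    [253, 242, 27, 17],
    [250, 242, 27],
]

pattern = (253, 250, 242, 27)
print(support(pattern, CORPUS), confidence(pattern, CORPUS))

# Check against the slide's own figures for the same pattern.
print(round(0.0398 / 0.0712, 3))
```

A pattern would be kept only if its support is at least 0.03 and its confidence at least 0.5, the thresholds stated on the slides.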

17 4. Matched SGS patterns in generic and specific questions
Generic question:
–How should I treat condition y?
Specific questions with some matching SGS patterns:
–How do I treat depression?
–How do I manage Parkinsonism?
–How do I treat acne?
–How do I treat conjunctivitis?
–How do I treat dementia?
–How do I treat STDs?
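One way to picture the matching step is to give each generic category a set of SGS patterns and count how many of them appear in a specific question's sequence. The sketch below is an assumption about how such matching could work, not the study's actual procedure: the category labels are hypothetical, code 999 is a placeholder that does not come from the slides, and ranking by hit count is an illustrative choice.

```python
def is_subsequence(pattern, sequence):
    """True if the codes in pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(code in remaining for code in pattern)


# Hypothetical categories and patterns. The treatment patterns reuse codes
# from the worked example; 999 is a placeholder code, not from the slides.
GENERIC_PATTERNS = {
    "treatment": [(253, 250, 242, 27), (253, 27)],
    "other": [(253, 999)],
}


def candidate_categories(sgs, generic_patterns=GENERIC_PATTERNS):
    """Return (category, hits) pairs for categories with at least one match."""
    scored = []
    for category, patterns in generic_patterns.items():
        hits = sum(is_subsequence(p, sgs) for p in patterns)
        if hits:
            scored.append((category, hits))
    # Most matching patterns first; ties broken alphabetically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# SGS of "How should I treat acute pharyngitis" from the earlier slides.
print(candidate_categories([253, 250, 242, 27, 17]))
```

Because a question can share patterns with several categories, a scheme like this returns candidates rather than a single answer, which fits the later observation that many questions mapped to more than one generic category.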

18 Results
1101 specific questions
20,710 total words
867 strings (2804 instances) did not match in MRXNS or in MRCON (MRXNW gave too many hits)
The majority of these strings were mapped to an existing semantic type using ad hoc stemming techniques

19 Results 7183 SGS patterns matched in specific and generic questions 204 (18%) specific questions had potential matches with generic questions (using SGS) 97 (10%) actual matches between specific and generic questions (using domain expert) 67 of these (using SGS) matched the same category assigned by Dr. Ely

20 Discussion 6% of specific question strings mapped to the generic category assigned by Ely (67/1101) 33% of those predicted to match by SGS patterns had matching generic categories 73% of specific question strings didn’t map to any generic category 45% of specific question strings (that did map to a generic category) mapped to more than one generic category

21 Discussion Automatic Classification of Questions –SGS pattern matching can cluster questions with similar semantic and syntactic information –These clustered questions often have the same meaning Discrepancy in classification – SGS and Ely –Our model needs work –Ely classifications are not semantically-based –Ambiguity in Ely classifications

22 Why Questions Didn’t Match Categories
Generic Category   Assigned Category   Specific Question
Diagnosis          1111                What is the differential diagnosis of a rash?
Diagnosis          1121                What is the differential diagnosis of a rash?
35 questions are assigned to more than one category

23 Future Work Improve accuracy of model –Refine Semantic Groups –Use relevance feedback and Semantic Group weighting –Include part-of-speech tagging and syntactic parsing –Incorporate WordNet for non-medical terms Develop an indexing schema that represents the semantic groups and syntactic information as vectors in a high-level feature space model

24 Thank you! Acknowledgements: This research was supported in part by National Library of Medicine Biomedical and Health Informatics Research Training Grant 2-T15-LM07089-11. Thanks also to Dr. John Ely for his willingness to share his raw questions and classification data.

