Consumer Health Question Answering Systems Rohit Chandra 2011090 Sourabh Singh 2011112.


2 Plan  Introduction  Architecture  Stages  Question Processing  Query formulation  Document Retrieval  Sentence Extraction  Answer ranking  Corpus  Existing health QA system  Questions

3 Introduction  A system which gives automated answers to consumer health queries  Can answer questions like –  What treatment to get ?  Symptoms – Disease  Prescribed medicines for an ailment  Prognosis of a disease  Many QA systems exist in the clinical domain  Clinical QA systems take well formed queries in pre-defined templates  Consumer health queries are ill formed and many times have lots of unnecessary detail

Architecture

5 Stages  Receiving user query  Question processing and analysis (NLP)  Question classification for search in specific DBs  Keyword extraction and query formulation  Document retrieval (TF-IDF)  Sentence extraction (SemRep)  Ranking of candidate answers  Displaying the answers

6 Question processing  Fixing grammatical errors, stemming and tokenization, POS tagging  Stanford NLP tools  Anaphora and Ellipsis resolution  Useful in decomposition  Divides the complex query into independent meaningful sentences  Focus term determination  Svm-light & weka  Metamap for UMLS entities  Question classification  SVM Light, ClinQues, Stanford NLP Tools  12 categories viz. diagnosis, treatment, prognosis, medication etc.

Question decomposition
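Decomposition, described under question processing, splits a long consumer query into independent sentences. A minimal sketch, assuming that sentences ending in "?" or starting with a question word are sub-questions and everything else is background context; real decomposition also relies on anaphora and ellipsis resolution, which this sketch does not attempt. The `WH_WORDS` list is a hypothetical heuristic.

```python
import re
from typing import List, Tuple

# Hypothetical heuristic: words that typically open an English question.
WH_WORDS = ("what", "which", "how", "why", "when", "where", "who",
            "can", "should", "is", "are", "do", "does")

def decompose(query: str) -> Tuple[List[str], List[str]]:
    """Split a consumer query into (sub_questions, background_sentences)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", query.strip()) if s.strip()]
    questions: List[str] = []
    background: List[str] = []
    for s in sentences:
        first = s.split()[0].lower().strip(",")
        if s.endswith("?") or first in WH_WORDS:
            questions.append(s)
        else:
            background.append(s)
    return questions, background
```

Each sub-question can then be classified and answered separately, while the background sentences supply context such as age or symptom duration.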

8 Features  Unigrams (UMLS entities)  Semantic groupings  UMLS semantic type  Sentence offset  Lexicon Offset  POS tags  Bigrams  Parse tree tags

Query expansion and keywords
- UMLS Metathesaurus (online)
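Query expansion adds synonyms of each keyword so that documents using a different term for the same concept are still retrieved. A minimal sketch, assuming a small hypothetical synonym table in place of live UMLS Metathesaurus lookups; the output is joined with `OR` and quoted phrases, which is valid Lucene query syntax.

```python
from typing import Dict, List

# Hypothetical synonym table standing in for UMLS Metathesaurus lookups.
SYNONYMS: Dict[str, List[str]] = {
    "heart attack": ["myocardial infarction", "mi"],
    "high blood pressure": ["hypertension"],
    "flu": ["influenza"],
}

def expand_query(keywords: List[str]) -> List[str]:
    """Return each keyword followed by its synonyms, without duplicates."""
    expanded: List[str] = []
    for kw in keywords:
        for term in [kw] + SYNONYMS.get(kw.lower(), []):
            if term not in expanded:
                expanded.append(term)
    return expanded

def to_lucene_query(terms: List[str]) -> str:
    # Quote each term so multi-word concepts are matched as phrases.
    return " OR ".join(f'"{t}"' for t in terms)
```

For example, `to_lucene_query(expand_query(["flu"]))` yields `"flu" OR "influenza"`.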

10 Document retrieval  Tf-IDF using Apache Lucene – simple keyword based retrieval  Apache Solr – searching of locally indexed documents  OAQA – keyword search along with semantic and expansion information using UMLS and Wordnet  Bing API

11 Sentence extraction  This is done using Metamap and SemRep  Both tools provide Java and Web APIs  Takes paragraphs with upto 10000 words  Output in XML format

Candidate answer ranking
- Keyword frequency
- TF-IDF score
- Lesk score
- Longest common subsequence
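Three of the listed measures can be combined into a single ranking score. A minimal sketch, assuming equal hypothetical weights, word-level tokens, and a simplified Lesk-style score computed as the word overlap between question and candidate (classic Lesk compares dictionary glosses, which this sketch does not use). The TF-IDF score could be added as a fourth term in the same way.

```python
import re
from typing import List, Sequence

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_frequency(keywords: List[str], candidate: str) -> int:
    # Total number of keyword occurrences in the candidate.
    toks = tokenize(candidate)
    return sum(toks.count(k) for k in keywords)

def lesk_overlap(question: str, candidate: str) -> int:
    # Simplified Lesk-style score: distinct words shared by the two texts.
    return len(set(tokenize(question)) & set(tokenize(candidate)))

def lcs_length(a: List[str], b: List[str]) -> int:
    # Dynamic-programming longest common subsequence over word tokens.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rank_candidates(question: str, keywords: List[str], candidates: List[str],
                    weights: Sequence[float] = (1.0, 1.0, 1.0)) -> List[str]:
    w_kf, w_lesk, w_lcs = weights  # hypothetical equal weights
    q = tokenize(question)
    def score(c: str) -> float:
        return (w_kf * keyword_frequency(keywords, c)
                + w_lesk * lesk_overlap(question, c)
                + w_lcs * lcs_length(q, tokenize(c)))
    return sorted(candidates, key=score, reverse=True)
```

The weights would normally be tuned on held-out questions; equal weights are only a starting point.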

13 Corpus  Online and offline documents for fetching answers and for training data  UMLS Metathesaurus  MeSH  Medline Plus  ClinQues  Yahoo Answers  Steps Tools – Illness & prescription database

Existing health QA systems
- askHERMES: UW Milwaukee
- QANUS: NUS
- MiPACQ: University of Colorado Boulder
- Watson: IBM
- MD Consult: illness and prescription database

15 References  NIH health QA system   Metamap, SemRep and UMLS metathesaurus   Athenikos, Sofia J., and Hyoil Han. "Biomedical question answering: A survey."Computer methods and programs in biomedicine 99.1 (2010): 1-24.  Cairns, Brian L., et al. "The MiPACQ clinical question answering system."AMIA Annual Symposium Proceedings. Vol. 2011. American Medical Informatics Association, 2011.

Questions?
Thanks

