
1 Automatic Question Generation from Queries
Natural Language Computing, Microsoft Research Asia
Chin-Yew LIN cyl@microsoft.com

2 Generating Questions from Queries
"Where is the next Hannah Montana concert?"
Q2Q as a question generation shared task

3 Remember Ask Jeeves?
 "How large is British Columbia?"

4 Live Search QnA (English)

5 Naver Knowledge iN (Korea)
 Naver "Knowledge iN" service
 Opened in October 2002
 70 million Knowledge iN entries collected (as of June 2007)
 Number of users: 12 million
   Upper-level users (above "Kosu", i.e., expert rank): 6,648 (0.05%)
 Distribution of knowledge
   Education, Learning: 17.78%
   Computer, Communication: 12.89%
   Entertainments, Arts: 11.42%
   Business, Economy: 11.42%
   Home, Life: 7.44%

6 Baidu Zhidao (China)
 17,012,767 resolved questions in two years of operation
 8,921,610 of them are knowledge-related
 96.7% of questions are resolved
 10,000,000 daily visitors
 71,308 new questions per day
 3.14 answers per question
 Source: http://www.searchlab.com.cn (Chinese Search Behavior Research / User Research Lab of Chinese Search)

7 Yahoo! Answers (Global; Marciniak)
 Launched in December 2005
 20 million users in the U.S. (> 90 million worldwide)
 33,557,437 resolved questions (US; April 2008)
 ~70,000* new questions per day (US)
 6.76* answers per question (US)

8 Question Taxonomy
 ISI's question answer typology (Hovy et al. 2001 & 2002)
   Results of analyzing over 20K online questions
   140 different question types with examples
   http://www.isi.edu/natural-language/projects/webclopedia/Taxonomy/taxonomy_toplevel.html
 Liu et al.'s (COLING 2008) cQA question taxonomy
   Derived from Broder's (SIGIR Forum 2002) web search taxonomy
   Results of analyzing 100 randomly sampled questions from the top 4 Yahoo! Answers categories: Entertainment & Music, Society & Culture, Health, and Computer & Internet

9 Main Task: Q2Q
 Generate questions given a query
   Query: "Hannah Montana concert"
   Questions:
     "How do I get Hannah Montana concert tickets for a really good price?"
     "What should i wear to a hannah montana concert?"
     "How long is the Hannah Montana concert?"
     …
 Subtasks (see the sketch below)
   Predict user goals
   Learn question templates
   Normalize questions
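A minimal sketch of the template subtask, assuming a hypothetical goal inventory and hand-written templates in place of whatever "predict user goals" and "learn question templates" would actually produce:

```python
# Hypothetical templates keyed by predicted user goal; a real system would
# learn both the goal inventory and the templates from a mapped Q2Q corpus.
TEMPLATES = {
    "buy_tickets": "How do I get {query} tickets for a really good price?",
    "attire": "What should I wear to a {query}?",
    "duration": "How long is the {query}?",
}

def generate_questions(query, goals):
    """Instantiate one template per predicted user goal."""
    return [TEMPLATES[g].format(query=query) for g in goals if g in TEMPLATES]

print(generate_questions("Hannah Montana concert",
                         ["buy_tickets", "attire", "duration"]))
```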

10 Data Preparation
 cQA archives
   Live Search QnA
   Yahoo! Answers
   Ask.com
   Other sources
 Query logs
   MSN/Live Search
   Yahoo!
   Ask.com
   TREC and other sources
 Possible process (see the matching sketch below)
   Sample queries from search engine query logs
   Ensure broad topic coverage
   Find candidate questions from cQA archives given the queries
   Create a mapped Q2Q corpus for training and testing
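One plausible way to implement the "find candidate questions" step is simple term-overlap matching between sampled queries and archived cQA questions; the archive and the min_overlap threshold below are illustrative, not part of the proposal:

```python
import re

def tokens(text):
    """Lowercased word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def candidate_questions(query, archive, min_overlap=1.0):
    """Return archived questions covering at least min_overlap of the query terms."""
    q_toks = tokens(query)
    return [question for question in archive
            if len(q_toks & tokens(question)) / len(q_toks) >= min_overlap]

archive = [
    "How long is the Hannah Montana concert?",
    "Where is the next Hannah Montana concert?",
    "How do I fix my laptop?",
]
print(candidate_questions("hannah montana concert", archive))
```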

11 Intrinsic Evaluation
 Given a query term
 Generate a ranked list of questions related to the query term
 Open set – use a pooling approach (see the scoring sketch below)
   Pool all questions from participants
   Rate each pooled question as relevant or not
   Compute recall/precision/F1 scores
 Closed set – use the test set data as the gold standard
 Metrics
   Diversity, interestingness, utility, and so on
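A sketch of how the open-set pooling scores could be computed once every pooled question has been judged; the questions and judgments below are illustrative:

```python
def prf1(system_questions, judged_relevant):
    """Precision/recall/F1 of one participant against the judged-relevant pool."""
    retrieved, relevant = set(system_questions), set(judged_relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

system = ["How long is the concert?", "Where is the concert?"]
relevant_pool = ["How long is the concert?", "What time does it start?"]
print(prf1(system, relevant_pool))  # (0.5, 0.5, 0.5)
```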

12 Extrinsic Evaluation
 A straw-man scenario
 Task – online information seeking
 Setup (see the metric sketch below)
   1. A user selects a topic (T) she is interested in.
   2. Generate a set of N queries given T and a query log.
   3. The user selects a query (q) from the set.
   4. Generate a set of M questions given q.
   5. The user selects the question (Q) that she has in mind.
   6. If the user does not select any question, record the session as unsuccessful.
   7. Send q to a search engine (S); get results X.
   8. Send q, Q, and anything inferred from Q to S; get results Y.
   9. Compare results X and Y using standard IR relevance metrics.
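For step 9, any standard IR relevance metric applies; this sketch scores X and Y with DCG over hypothetical graded judgments:

```python
import math

def dcg(relevance_grades):
    """Discounted cumulative gain of a ranked result list."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevance_grades))

# Hypothetical graded judgments: X from q alone, Y from q enriched with Q.
X = [1, 0, 2, 0, 0]
Y = [2, 2, 1, 0, 1]
print(dcg(X), dcg(Y))  # a higher score for Y means the question helped
```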

13 Summary
 Task: question generation from queries
 Data
   Search engine query logs
   cQA question-answer archives
   Question taxonomies
 Evaluation
   Intrinsic – evaluate specific technology areas
   Extrinsic – evaluate the effect on real-world scenarios
 Real data, real task, and real impact

14 Analyze cQA Questions (Liu et al. COLING 08)
[Taxonomy diagram: a cQA Question is Navigational, Informational, Transactional, or Social; Informational questions further divide into Constant, Dynamic, Opinion, Context-Dependent, and Open.]
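The reconstructed tree, written as a nested mapping; the parent-child structure is inferred from the order of the flattened diagram labels:

```python
# Liu et al.'s cQA question taxonomy as a nested dict (structure inferred
# from the slide's diagram; leaves are empty dicts).
CQA_TAXONOMY = {
    "cQA Question": {
        "Navigational": {},
        "Informational": {
            "Constant": {},
            "Dynamic": {},
            "Opinion": {},
            "Context-Dependent": {},
            "Open": {},
        },
        "Transactional": {},
        "Social": {},
    },
}

print(sorted(CQA_TAXONOMY["cQA Question"]["Informational"]))
```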

