Relevance Models for QA: Project Update
University of Massachusetts, Amherst
AQUAINT meeting, December 2002
Bruce Croft and James Allan, PIs

UMass AQUAINT Project Status

Question answering using language models
- Carried out more experiments using the basic LM approach
- Developed new model(s); starting more experiments
- Moved experiments to the LEMUR toolkit

Query triage
- Studied the Clarity measure for questions

Question answering with semi-structured data
- Developed HMM- and CRF-based table extractors
- More experiments on question answering with table structure

Answer updating
- Experiments with time-based questions

QA using LM

P(Answer|Question) can be estimated many ways
- Could be done directly, but usually will involve intermediate steps such as documents or question classes
- Initially focused on answer passages, but "extracted" answers can be modeled
- Can model "templates" as well as n-gram answer models
- Can also introduce cross-lingual QA through P(A_lang1 | Q_lang2)

Every approach requires training data
- "answer mining" for answer models/templates
- incorporating user feedback
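As a hedged illustration of the "intermediate steps" point, one standard way to bring documents (or passages) into the estimate is to marginalize over retrieved documents D, assuming the answer depends on the question only through the document; this is a generic factorization, not necessarily the project's specific model:

    P(A \mid Q) \;\approx\; \sum_{D} P(A \mid D)\, P(D \mid Q) \;\propto\; \sum_{D} P(A \mid D)\, P(Q \mid D)\, P(D)

where P(Q|D) is the usual query (question) likelihood under the document language model.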

Query Triage

Given a question, what can we infer from it?
- Query vs. question
- Quality (does it need to be made more precise?)
- Type (likely form of answers and granularity)
- Human intermediation (should it be directed to a human expert?)

Previous work developed the "Clarity" measure for queries and tested it on TREC ad-hoc data
- Demonstrated high correlation with performance
- Threshold can be set automatically

Current research focuses on TREC QA data

Predicting Question Performance

Basic result: we can predict question performance (with some qualifications)
- Did not work for some TREC question classes

For example:
- What is the date of Bastille Day? (TREC-9P, Clarity score 2.49)
- What time of year do most people fly? (TREC-9P, Clarity score 0.76)

“the” “do”, “day”, “what” “celebrate” “paris” “bastille” “assmann” terms Log P Clarity score computation Question Q, text Question Q, text... Passages, A... Passages ranked by P(A|Q) retrieve model passage collection language model passage collection language model question- related language model question- related language Compute divergence Clarity Score

Clarity Example (for queries)

"What adjustments should be made when federal action occurs?" (clarity 0.37)
- Top 6 terms in query model: 1. "adjust" 2. "federal" 3. "action" 4. "land" 5. "occur" 6. "hyundai"

"Show me predictions for changes in the prime lending rate and any changes made in the prime lending rates" (clarity 2.85)
- Top 6 terms in query model: 1. "bank" 2. "hong" 3. "kong" 4. "rate" 5. "lend" 6. "prime"

[Plot: log_2(p_q / p_c) against term rank for the two query models.]

Test System

Passages:
- Two sentences, overlapping
- From top retrieved docs for all questions

Measuring performance:
- Question likelihood used to rank passages
- Average precision (rather than MRR)
- Top 8 documents to estimate Clarity scores
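A minimal sketch of question-likelihood passage ranking, assuming whitespace tokenization and Jelinek-Mercer smoothing (the smoothing choice and parameter are illustrative assumptions, not taken from the slides):

    import math
    from collections import Counter

    def score_passage(question_terms, passage_terms, collection_counts, collection_size, lam=0.5):
        """Log question likelihood, log P(Q | passage), under a smoothed unigram model."""
        tf = Counter(passage_terms)
        plen = max(len(passage_terms), 1)
        score = 0.0
        for w in question_terms:
            p_passage = tf[w] / plen
            p_coll = collection_counts.get(w, 0) / collection_size
            p = lam * p_passage + (1 - lam) * p_coll
            score += math.log(p) if p > 0 else math.log(1e-12)  # floor for unseen terms
        return score

    def rank_passages(question, passages, collection_counts, collection_size):
        q = question.lower().split()
        scored = [(score_passage(q, p.lower().split(), collection_counts, collection_size), p)
                  for p in passages]
        return sorted(scored, key=lambda x: x[0], reverse=True)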

Precision vs. Clarity (Time Qs)

[Scatter plot of average precision against Clarity score for time questions; labeled points include "What is the date of Bastille Day?", "What time of year do most people fly?", and "What is Martin Luther King Jr.'s real birthday?"]

Correlation by Question Type

Question Type    # of Qs    Rank Correlation (R)    P-Value
Amount           33         negative (R < 0)
Famous
Location
Person           93         negative (R < 0)
Time
Miscellaneous
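A hedged sketch of how a per-type rank correlation of this kind could be computed (e.g. Spearman's rank correlation), assuming per-question Clarity scores and average precision values are available; the input format and field names are illustrative assumptions:

    from scipy.stats import spearmanr

    def correlation_by_type(questions):
        """questions: list of dicts with 'qtype', 'clarity', 'avg_precision' keys (assumed format)."""
        by_type = {}
        for q in questions:
            by_type.setdefault(q['qtype'], []).append((q['clarity'], q['avg_precision']))
        results = {}
        for qtype, pairs in by_type.items():
            clarity, ap = zip(*pairs)
            r, p_value = spearmanr(clarity, ap)   # rank correlation and its p-value
            results[qtype] = (len(pairs), r, p_value)
        return results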

Correlation Analysis

Correlation is strong on average
- Allows prediction of question performance

Variation with question type
- Two bad (R < 0) cases: Amount and Person
  - Amount: only 33 questions, and only a few bad questions
  - Person: 93 questions, plenty of bad questions to analyze

What's going on?

Predictive Mistakes

Two kinds of mistakes:

High clarity, low average precision
- E.g. What is Martin Luther King Jr.'s real birthday?
- Answerless, but a coherent, very likely context in the collection
- Rare (a good thing for the method)

Low clarity, high average precision
- Various kinds of bad luck
- Often coupled with few relevant passages
- Many examples in the Person case...

[Scatter plot of average precision vs. Clarity score.]

Precision vs. Clarity (Person Qs)

15 "really bad" mistakes
- "Really bad" ≡ clarity score 70 %-ile
- 8 with many relevant answer passages (> 50)
  - 5 (one-third) are slight variants of "Who created 'The Muppets'?"
  - 2 are variants of "What king signed the Magna Carta?"
  - 1 other question with plenty of relevant passages
- 7 with few relevant answer passages
  - E.g. "Silly Putty was invented by whom?" (2 relevant passages)

[Scatter plot of average precision vs. Clarity score for Person questions.]

QA using Tables

Developed and tested the QuASM demonstration system using non-LM techniques
- extraction of tabular structure
- answer passages constructed from extracted data and metadata
- extension of question types for "statistical" data
- failure analysis

Major focus now is to develop a probabilistic framework for the whole process
- tabular structure extraction
- answer passage representation
- P(Answer|Question)

QuASM – Lessons Learned

- Much harder to find answers in tables than in text
- Table extraction is the key issue
- Representation of answer passages also very important
  - What is an answer passage for tables?
  - E.g. too much metadata can cause poor retrieval

Table Extraction

- Heuristics do a good job of identifying tables
  - 97.8% of lines labeled correctly as in or out of a table
- Small labeling errors, however, can lead to poor retrieval
- Current algorithm for extracting header information is too permissive

Text Table Transformation

Example input: a plain-text Census table, "Number and Percent of Children under 19 Years of Age, at or below 200 Percent of Poverty, by State: Three-Year Averages for 1997, 1998, and 1999 (Numbers in Thousands)". The header spans several lines, grouping columns into "AT OR BELOW 200% OF POVERTY" and "AT OR BELOW 200% OF POVERTY WITHOUT HEALTH INSURANCE" (each reporting Number, Standard error, Pct., and Standard error), plus "Total children under 19 years, all income levels"; data rows follow for each state (Alabama, Alaska, Arizona, Arkansas, ...).

Text Table Transformation – Problems

Running the transformation on the example above shows two problems:
- Part of the title is missed due to lack of indentation (only "(Numbers in Thousands)" is picked up)
- Extraneous text: header fragments such as "AT OR BELOW 200% OF POVERTY ... Standard Number" are attached to a data value (e.g. Alabama ... 499)

New Labeling

Features (per line):
- 3 Cells
- 2 Gaps
- Mostly Letters
- Mostly Digits
- Header Like
- Dashes
- Starts with Spaces
- Consecutive Spaces
- All White Space

Line tags:
- NONTABLE
- BLANKLINE
- TITLE
- SUPERHEADER
- TABLEHEADER
- SUBHEADER
- DATAROW
- SEPARATOR
- SECTIONHEADER
- SECTIONDATAROW
- TABLEFOOTNOTE
- TABLECAPTION
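A minimal sketch of extracting line features of this kind from raw text; the exact feature definitions and thresholds below are assumptions for illustration, not the project's:

    import re

    def line_features(line):
        """Illustrative per-line features for table extraction."""
        stripped = line.strip()
        # "Cells" and "gaps": runs of text separated by two or more consecutive spaces.
        cells = [c for c in re.split(r'\s{2,}', stripped) if c]
        gaps = max(len(cells) - 1, 0)
        letters = sum(ch.isalpha() for ch in stripped)
        digits = sum(ch.isdigit() for ch in stripped)
        total = max(len(stripped), 1)
        return {
            'num_cells': len(cells),
            'num_gaps': gaps,
            'mostly_letters': letters / total > 0.5,
            'mostly_digits': digits / total > 0.5,
            'header_like': digits == 0 and bool(re.match(r'^[A-Za-z][A-Za-z .,%/()-]*$', stripped)),
            'dashes': '--' in stripped or '__' in stripped,
            'starts_with_spaces': line.startswith('  '),
            'consecutive_spaces': bool(re.search(r'\s{2,}', stripped)),
            'all_whitespace': stripped == '',
        }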

Text Table Extraction Model

Finite state machine (a hidden Markov process) with states such as Non-Table, Title, Super Header, Table Header, Subheader, and Data Row. The visible feature vectors are used to probabilistically infer the state sequence.
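As an illustration of inferring the hidden label sequence from per-line observations, a minimal Viterbi decoder in log space; the transition and emission probabilities would be estimated from labeled documents, and everything here is a generic sketch rather than the project's implementation:

    def viterbi(observations, states, log_start, log_trans, log_emit):
        """observations: list of observation symbols (e.g. discretized feature vectors).
        log_start[s], log_trans[s][t], log_emit[s][o]: log probabilities."""
        V = [{s: log_start[s] + log_emit[s].get(observations[0], -1e9) for s in states}]
        back = [{}]
        for t in range(1, len(observations)):
            V.append({})
            back.append({})
            for s in states:
                # Best previous state leading into s.
                best_prev, best_score = max(
                    ((p, V[t - 1][p] + log_trans[p][s]) for p in states),
                    key=lambda x: x[1])
                V[t][s] = best_score + log_emit[s].get(observations[t], -1e9)
                back[t][s] = best_prev
        # Backtrack from the best final state.
        last = max(states, key=lambda s: V[-1][s])
        path = [last]
        for t in range(len(observations) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))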

Features for Table Extraction

These features (the same per-line features listed above) are not independent:
- Many correlations
- Overlapping and long-distance dependencies
- Observations from the past and future

Hidden Markov Models

[State diagram: Non-Table, Title, Super Header, Table Header, Data Row]

Observations are conditioned on the state
- HMMs are the standard sequence model
- They are a generative model of the sequence
- Generative models do not easily handle non-independent features

Conditional Random Fields

[State diagram: Non-Table, Title, Super Header, Table Header, Data Row]

The state sequence is conditioned on the entire observation sequence. A conditional model:
- Can examine features, but is not responsible for generating them
- Doesn't have to explicitly model their dependencies
- Has the ability to handle many arbitrary features with the full power of finite state automata
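A hedged sketch of training a linear-chain CRF for line labeling, assuming the sklearn-crfsuite package and the line_features() helper sketched earlier; this is a stand-in for illustration, not the project's CRF implementation:

    import sklearn_crfsuite

    def doc_to_features(lines):
        # One feature dict per line; values are stringified for use as discrete features.
        return [{k: str(v) for k, v in line_features(line).items()} for line in lines]

    def train_crf(train_docs, train_labels):
        """train_docs: list of documents (each a list of raw text lines);
        train_labels: per-line tag sequences (e.g. 'TITLE', 'DATAROW', ...)."""
        X = [doc_to_features(doc) for doc in train_docs]
        crf = sklearn_crfsuite.CRF(algorithm='lbfgs', c1=0.1, c2=0.1, max_iterations=100)
        crf.fit(X, train_labels)
        return crf

    # Usage: tags = train_crf(docs, labels).predict([doc_to_features(new_doc_lines)])[0]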

Results

Labeled six test documents, a total of 5,817 lines:

Experiment                       Percentage of lines labeled correctly
Random, Training Data MLE        11.4%
HMM                              83.0%
Fully Connected CRF              93.3%
Original Heuristic (4 labels)    77.0%

Summary of Plans

- Testing a probabilistic model for QA
- Refining the Clarity measure for questions
- Finer-grain table extraction and QA tests
- Time-dependent language models