Answering List Questions using Co-occurrence and Clustering Majid Razmara and Leila Kosseim Concordia University

Introduction

Question Answering; the TREC QA track: question series and corpora.

Example series. Target: American Girl dolls
- FACTOID: In what year were American Girl dolls first introduced?
- LIST: Name the historical dolls.
- LIST: Which American Girl dolls have had TV movies made about them?
- FACTOID: How much does an American Girl doll cost?
- FACTOID: How many American Girl dolls have been sold?
- FACTOID: What is the name of the American Girl store in New York?
- FACTOID: What corporation owns the American Girl company?
- OTHER: Other

Hypothesis

Answer instances:
1. have the same semantic entity class,
2. co-occur within sentences, or
3. occur in different sentences sharing similar context.

Based on the Distributional Hypothesis: "Words occurring in the same contexts tend to have similar meanings" [Harris, 1954].

Example. Target 232: "Dulles Airport"
Question 232.6: "Which airlines use Dulles?"

Ltw_Eng_ (AQUAINT-2): United, which operates a hub at Dulles, has six luggage screening machines in its basement and several upstairs in the ticket counter area. Delta, Northwest, American, British Airways and KLM share four screening machines in the basement.

Ltw_Eng_ (AQUAINT-2): Independence said its last flight Thursday will leave White Plains, N.Y., bound for Dulles Airport. Flyi suffered from rising jet fuel costs and the aggressive response of competitors, led by United and US Airways.

New York Times (Web): Continental Airlines sued United Airlines and the committee that oversees operations at Washington Dulles International Airport yesterday, contending that recently installed baggage-sizing templates inhibited competition.

Wikipedia (Web): At its peak of 600 flights daily, Independence, combined with service from JetBlue and AirTran, briefly made Dulles the largest low-cost hub in the United States.

Our Approach

1. Create an initial candidate list:
   - Answer type recognition
   - Document retrieval
   - Candidate answer extraction
   - The list may also be imported from an external source (e.g. a factoid QA system)
2. Extract co-occurrence information
3. Cluster candidates based on their co-occurrence

Answer Type Recognition

9 types: Person, Country, Organization, Job, Movie, Nationality, City, State, and Other.

Lexical patterns:
- ^ (Name | List | What | Which) (persons | people | men | women | players | contestants | artists | opponents | students) → PERSON
- ^ (Name | List | What | Which) (countries | nations) → COUNTRY

Syntagmatic patterns for the Other types:
- ^ (WDT | WP | VB | NN) (DT | JJ)* (NNS | NNP | NN | JJ)* (NNS | NNP | NN | NNPS) (VBN | VBD | VBZ | WP | $)
- ^ (WDT | WP | VB | NN) (VBD | VBP) (DT | JJ | JJR | PRP$ | IN)* (NNS | NNP | NN)* (NNS | NNP | NN)
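The lexical patterns above can be sketched as simple regular expressions. This is a minimal, illustrative version, not the authors' exact patterns: the regexes are loosened to allow words between the trigger and the keyword, and only two types are covered.

```python
import re

# Loosened, illustrative versions of the slide's lexical patterns.
LEXICAL_PATTERNS = [
    (re.compile(r"^(?:Name|List|What|Which)\b.*?"
                r"\b(?:persons|people|men|women|players|contestants|"
                r"artists|opponents|students)\b", re.I), "PERSON"),
    (re.compile(r"^(?:Name|List|What|Which)\b.*?\b(?:countries|nations)\b",
                re.I), "COUNTRY"),
]

def recognize_answer_type(question: str) -> str:
    """Return the expected answer type, falling back to OTHER."""
    for pattern, answer_type in LEXICAL_PATTERNS:
        if pattern.search(question):
            return answer_type
    return "OTHER"

print(recognize_answer_type("Name the countries bordering Iraq."))  # COUNTRY
print(recognize_answer_type("Which players won the cup?"))          # PERSON
```

Questions that match neither pattern fall through to OTHER and are handled by type resolution (next slide).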

Type Resolution

Resolves the answer subtype to one of the main types using WordNet's hypernym hierarchy:
- "List previous conductors of the Boston Pops."
- Type: OTHER; subtype: Conductor → PERSON
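The hypernym walk can be sketched as follows. The tiny hierarchy below is hypothetical, standing in for WordNet's real hypernym relation (in practice one would query WordNet, e.g. via NLTK):

```python
# Hypothetical hypernym links, standing in for WordNet's hierarchy.
HYPERNYM = {
    "conductor": "musician",
    "musician": "performer",
    "performer": "person",
    "person": "organism",
    "country": "region",
}

MAIN_TYPES = {"person": "PERSON", "country": "COUNTRY"}

def resolve_subtype(subtype: str) -> str:
    """Climb the hypernym chain until a main type is reached."""
    node = subtype.lower()
    while node is not None:
        if node in MAIN_TYPES:
            return MAIN_TYPES[node]
        node = HYPERNYM.get(node)
    return "OTHER"

print(resolve_subtype("Conductor"))  # PERSON
```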

Document Retrieval

Two document collections:
- Source collection: few documents, used to extract candidate answers.
- Domain collection: many documents, used to extract co-occurrence information.

Query generation:
- Google query on the Web
- Lucene query on the corpora

Candidate Answer Extraction

Term extraction: extract all terms that conform to the expected answer type.
- Person, Organization, Job: intersection of several NE taggers (LingPipe, the Stanford tagger and GATE NE), for better precision.
- Country, State, City, Nationality: gazetteer, for better precision.
- Movie, Other: capitalized and quoted terms.

Verification of Movie and Other candidates via Web hit counts:
  numHits("SubType Term" OR "Term SubType") / numHits("Term")
  numHits(GoogleQuery intitle:Term site:
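The hit-count ratio can be sketched as below, assuming a hypothetical num_hits(query) function that returns a search engine's hit count for a query; the "Ozawa"/"conductor" counts are made up for illustration:

```python
def subtype_ratio(term, subtype, num_hits):
    """How often the term appears adjacent to its subtype,
    relative to the term's overall frequency on the Web."""
    joint = num_hits(f'"{subtype} {term}"') + num_hits(f'"{term} {subtype}"')
    total = num_hits(f'"{term}"')
    return joint / total if total else 0.0

# Toy (made-up) hit counts to illustrate:
fake_hits = {'"conductor Ozawa"': 40, '"Ozawa conductor"': 10, '"Ozawa"': 100}
print(subtype_ratio("Ozawa", "conductor", lambda q: fake_hits.get(q, 0)))  # 0.5
```

A candidate whose ratio falls below some threshold would be dropped from the list.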

Co-occurrence Information Extraction

On the domain collection:
- Documents are split into sentences.
- Each sentence is checked for the candidate answers it contains.
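Collecting the sentence-level counts can be sketched as follows (a simplified version using case-insensitive substring matching; the sentences are abridged from the Dulles example above):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences, candidates):
    """Count, per candidate and per candidate pair, the number of
    sentences they occur in (the counts behind the similarity measure)."""
    occur, cooccur = Counter(), Counter()
    for sentence in sentences:
        present = sorted(c for c in candidates if c.lower() in sentence.lower())
        occur.update(present)
        cooccur.update(combinations(present, 2))
    return occur, cooccur

sentences = [
    "United and Delta share machines at Dulles.",
    "Delta, Northwest and KLM share four screening machines.",
]
occur, cooccur = cooccurrence_counts(sentences, {"United", "Delta", "KLM"})
print(occur["Delta"], cooccur[("Delta", "KLM")])  # 2 1
```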

Hierarchical Agglomerative Clustering

Steps:
1. Put each candidate term t_i in a separate cluster C_i.
2. Compute the similarity between each pair of clusters (average linkage).
3. Merge the two clusters with the highest inter-cluster similarity.
4. Update all relations between this new cluster and the other clusters.
5. Go to step 3 until there are only N clusters, or the similarity falls below a threshold.
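The steps above can be sketched as a minimal average-linkage procedure over a pairwise similarity dict (the similarity values below are made up for illustration):

```python
def pair_sim(x, y, sim):
    """Look up similarity for an unordered term pair (default 0)."""
    return sim.get((x, y), sim.get((y, x), 0.0))

def average_linkage(a, b, sim):
    return sum(pair_sim(x, y, sim) for x in a for y in b) / (len(a) * len(b))

def hac(terms, sim, n_clusters=1, threshold=0.0):
    clusters = [[t] for t in terms]                          # step 1
    while len(clusters) > n_clusters:
        score, i, j = max((average_linkage(a, b, sim), i, j)  # step 2
                          for i, a in enumerate(clusters)
                          for j, b in enumerate(clusters) if i < j)
        if score < threshold:                                # stopping criterion
            break
        clusters[i] += clusters[j]                           # step 3: merge
        del clusters[j]                                      # step 4: update
    return clusters

sim = {("united", "delta"): 0.9, ("klm", "jetblue"): 0.8}
print(hac(["united", "delta", "klm", "jetblue"], sim, n_clusters=2))
# [['united', 'delta'], ['klm', 'jetblue']]
```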

The Similarity Measure

Similarity between each pair of candidates, based on co-occurrence within sentences, using chi-square (χ²). For a candidate pair (term_i, term_j), the 2x2 contingency table over sentences is:

                 term_i         ¬term_i        Total
  term_j         O_11           O_12           O_11 + O_12
  ¬term_j        O_21           O_22           O_21 + O_22
  Total          O_11 + O_21    O_12 + O_22    N

Shortcoming: χ² does not work well with sparse data.
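From the table, the χ² statistic for a 2x2 contingency table has the standard closed form, which can be computed directly:

```python
def chi_square(o11, o12, o21, o22):
    """Chi-square statistic for a 2x2 contingency table, using the
    standard shortcut formula N * (O11*O22 - O12*O21)^2 / (row and
    column marginal products)."""
    n = o11 + o12 + o21 + o22
    num = n * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    return num / den if den else 0.0

print(chi_square(10, 10, 10, 10))  # 0.0 (independent: no association)
print(chi_square(20, 5, 5, 20))    # 18.0 (strong association)
```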

Pinpointing the Right Cluster

Question and target keywords are used as "spies". Spies are:
- inserted into the list of candidate answers, and
- treated as candidate answers: their similarity to one another and to the candidates is computed, and they are clustered along with the candidates.

The cluster containing the most spies is returned, with the spies removed.
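The spy-based selection can be sketched in a few lines (the clusters below are a condensed version of the Pakistan-earthquake example on the next slide):

```python
def pick_cluster(clusters, spies):
    """Return the cluster holding the most spies, spies removed."""
    best = max(clusters, key=lambda c: sum(t in spies for t in c))
    return [t for t in best if t not in spies]

clusters = [["oman"], ["spain", "japan"],
            ["pakistan", "india", "earthquake", "2005", "affected"]]
spies = {"earthquake", "2005", "affected"}
print(pick_cluster(clusters, spies))  # ['pakistan', 'india']
```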

Example. Target 269: "Pakistan earthquakes of October 2005"
Question 269.2: What countries were affected by this earthquake?

Cluster-31: oman
Cluster-2: spain, bangladesh, japan, germany, haiti, nepal, china, sweden, iran, mexico, vietnam, belgium, lebanon, iraq, russia, turkey
Cluster-9: pakistan, 2005, afghanistan, octob, u.s, india, affect, earthquak

Recall = 2/3, Precision = 2/3, F-score = 2/3

Results in TREC 2007

[Chart: per-question scores against the best, median (0.085) and worst (0.000) TREC 2007 results; F = 14.5.]

Evaluation of Clustering

- Baseline: the list of candidate answers prior to clustering.
- Our approach: the list of candidate answers filtered by the clustering.
- Theoretical maximum: the best possible output of clustering given the initial list.

[Table: Precision, Recall and F-score of Baseline, Our Approach and Theoretical Max on two corpora (TREC, TREC 2007); the cell values did not survive extraction.]

Evaluation of each Question Type

Future Work

- Develop a module that verifies whether each candidate is a member of the answer type (for the Movie and Other types).
- Use co-occurrence at the paragraph level rather than the sentence level; anaphora resolution can be used.
- Try another similarity measure: χ² does not work well with sparse data (for example, use Yates' correction for continuity).
- Try different clustering approaches.
- Try different similarity measures, e.g. Mutual Information.

Questions?