QUALIFIER in TREC-12 QA Main Task
Hui Yang, Hang Cui, Min-Yen Kan, Mstislav Maslennikov, Long Qiu, Tat-Seng Chua
School of Computing, National University of Singapore
Email: yangh@comp.nus.edu.sg

Outline
- Introduction
- Factoid Subsystem
- List Subsystem
- Definition Subsystem
- Result
- Conclusion and Future Work

Introduction
- Given a question and a large text corpus, return an “answer” rather than relevant “documents”
- QA is at the intersection of IR + IE + NLP
- Our system: QUALIFIER
  - Consists of 3 subsystems
  - External resources: Web, WordNet, Ontology
  - Event-based Question Answering
  - New modules introduced

Factoid System Overview

Factoid Subsystem
- Detailed Question Analysis
- QA Event Construction
- QA Event Mining
- Answer Selection
- Answer Justification
- Fine-grained Named Entity Recognition
- Anaphora Resolution
- Canonicalization Coreference
- Successive Constraint Relaxation

Why Event-based QA - I
- The world consists of two basic types of things, entities and events, and people often ask questions about them.
- From question answering's point of view: questions = “enquiries about entities or events”.

Why Event-based QA - II
- QA Entities: “Anything having existence (living or nonliving)”
  - E.g. “What is the democratic party symbol?”
- QA Events: “Something that happens at a given place and time”
  - E.g. “How did donkey become democratic party symbol?”
  - Thomas Nast, 1870, Harper’s Weekly cartoon

Why Event-based QA - III
- Entity Questions: ask about properties of entities, or about the entities themselves (definition questions).
- Event Questions: ask about elements of events (Location, Time, Subject, Object, Quantity, Description, Action, etc.).

Table 1: Correspondence of WH-Questions & Event Elements
WH-Question   QA Event Elements
Who           Subject, Object
Where         Location
When          Time
What          Subject, Object, Description, Action
Which         Subject, Object
How           Quantity, Description

question        :== event | event_element | entity | entity_property
event           :== { event_element }
event_element   :== time | location | subject | object | quantity | description | action | other
entity          :== object | subject
entity_property :== quantity | description | other
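
The correspondence in Table 1 can be read as a simple lookup from a question's WH-word to the event elements it may be asking for. The following is a minimal sketch under that reading; the function name `expected_elements` and the "first WH-word wins" rule are illustrative assumptions, not part of QUALIFIER.

```python
# Table 1 from the slide, encoded as a lookup table.
WH_TO_EVENT_ELEMENTS = {
    "who": ["subject", "object"],
    "where": ["location"],
    "when": ["time"],
    "what": ["subject", "object", "description", "action"],
    "which": ["subject", "object"],
    "how": ["quantity", "description"],
}


def expected_elements(question):
    """Return the event elements suggested by the first WH-word in the question.

    Hypothetical helper: a real question analyser would use far richer cues.
    """
    for token in question.lower().split():
        word = token.strip("?,.\"'")
        if word in WH_TO_EVENT_ELEMENTS:
            return WH_TO_EVENT_ELEMENTS[word]
    return []  # no WH-word found


print(expected_elements("When did Bob Marley die?"))
```

A question without a WH-word (for example an imperative "Name the ...") returns an empty list in this sketch.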

Event-based QA Hypothesis
- Equivalency: $\forall$ QA events $E_i, E_j$, if all_elements($E_i$) = all_elements($E_j$), then $E_i = E_j$, and vice versa.
- Generality: if all_elements($E_i$) is a subset of all_elements($E_j$), then $E_i$ is more general than $E_j$.
- Cohesiveness: if elements $a, b$ both belong to an event $E_i$, and $a, c$ do not belong to a known event, then co-occurrence($a, b$) is greater than co-occurrence($a, c$).
- Predictability: if elements $a, b$ both belong to an event $E_i$, then $a \Rightarrow b$ and $b \Rightarrow a$.
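
The Equivalency and Generality statements translate directly into set comparisons. The sketch below assumes an event is represented as a set of (element type, value) pairs; that representation and the function names are illustrative assumptions.

```python
def is_equivalent(event_i, event_j):
    """Equivalency: two events are equal exactly when their element sets are equal."""
    return set(event_i) == set(event_j)


def is_more_general(event_i, event_j):
    """Generality: event_i is more general if its elements are a subset of event_j's."""
    return set(event_i) <= set(event_j)


# Assumed representation: (element_type, value) pairs.
partial = {("subject", "Bob Marley"), ("action", "die")}
full = partial | {("time", "1981")}

print(is_more_general(partial, full))  # the partial event constrains less
print(is_equivalent(partial, full))
```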

QA Event Space
- Consider an event to be a point in a multi-dimensional QA event space.
- If we know all the elements of an event, we can easily answer different questions about it, e.g. “When did Bob Marley die?”
- Because elements of the same event have innate associations (Cohesiveness), we can use what is already known:
  - to narrow the search scope
  - to find the remaining unknown event elements, i.e. the answer (Predictability)

Problems to be Solved
- In most cases, however, it is difficult to find the correct unknown element(s), i.e., the correct answer.
- Two major problems:
  - Insufficient known elements
  - Inexact known elements
- Solution:
  - Explore the use of world knowledge (Web and WordNet glosses) to find more known elements
  - Exploit lexical knowledge from WordNet (synsets and morphemics) to find exact forms

How to Find a QA Event
- Using the Web
  - From the original query terms $q^{(0)}$, retrieve the top N web documents.
  - $\forall q_i^{(0)} \in q^{(0)}$, extract nearby non-trivial words (in the same sentence or within n words) into $C_q$, and rank them by their probability of correlation with $q_i^{(0)}$.
- Using WordNet
  - $\forall q_i^{(0)} \in q^{(0)}$, extract terms that are lexically related to $q_i^{(0)}$ by locating them in the gloss $G_q$ and the synset $S_q$.
- Combine the external knowledge resources to form the term collection: $K_q = C_q + (G_q \cup S_q)$
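
The Web step (collecting non-trivial words near each query term) can be sketched as a windowed co-occurrence count. The slide does not give the correlation formula, so ranking by raw co-occurrence count is an assumption here, as are the stopword list, the window size, and the function name `context_terms`.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "in", "of", "and", "to", "is", "was"}


def context_terms(query_terms, documents, window=3, top_k=5):
    """Collect non-trivial words within `window` tokens of a query term, per sentence.

    Terms are ranked by raw co-occurrence count, a stand-in for the
    unspecified "probability of correlation" on the slide.
    """
    query = {t.lower() for t in query_terms}
    counts = Counter()
    for doc in documents:
        for sentence in re.split(r"[.!?]+", doc):
            tokens = re.findall(r"[a-z0-9]+", sentence.lower())
            positions = [i for i, tok in enumerate(tokens) if tok in query]
            for i, tok in enumerate(tokens):
                if tok in query or tok in STOPWORDS:
                    continue
                if any(abs(i - p) <= window for p in positions):
                    counts[tok] += 1
    return [term for term, _ in counts.most_common(top_k)]


docs = [
    "Hernando de Soto discovered the Mississippi River in 1541.",
    "The Spanish explorer de Soto reached the Mississippi in 1541.",
]
print(context_terms(["Mississippi"], docs, window=3, top_k=2))
```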

QA Event Construction
- Structured Query Formulation
  - We perform structural analysis on $K_q$ to form semantic groups of terms.
  - Given any two distinct terms $t_i, t_j \in K_q$, we compute their:
    - Lexical correlation
    - Co-occurrence correlation
    - Distance correlation

QA Event Construction
- For example: “What Spanish explorer discovered the Mississippi River?”
- The final Boolean query becomes: “(Mississippi) & (French | Spanish) & (Hernando & Soto & De) & (1541) & (explorer) & (first | European | river)”.
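
Once terms are grouped, assembling the Boolean query is mechanical: terms inside a group are joined by that group's operator, and the groups are AND-ed together. The sketch below reproduces the example query from the slide; the `(terms, operator)` group representation and the function name are assumptions made for illustration.

```python
def build_boolean_query(groups):
    """Join each group's terms with its operator, then AND the groups together."""
    parts = []
    for terms, operator in groups:
        joined = f" {operator} ".join(terms)
        parts.append(f"({joined})")
    return " & ".join(parts)


# Semantic groups taken from the slide's example query.
groups = [
    (["Mississippi"], "|"),
    (["French", "Spanish"], "|"),
    (["Hernando", "Soto", "De"], "&"),
    (["1541"], "|"),
    (["explorer"], "|"),
    (["first", "European", "river"], "|"),
]
print(build_boolean_query(groups))
```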

QA Event Mining
- Extract important association rules among the elements by using data mining techniques.
- Given a QA event $E_i$, we define $X, Y$ as two sets of event elements.
- Event mining studies rules of the form $X \rightarrow Y$, where $X, Y$ are QA event element sets, $X \cap Y = \emptyset$, and $Y \cap \{element_{original}\} = \emptyset$.
  - If $X \cap Y \neq \emptyset$, ignore $X \rightarrow Y$.
  - If cardinality($Y$) > 1, ignore $X \rightarrow Y$.
  - If $Y \cap \{element_{original}\} \neq \emptyset$, ignore $X \rightarrow Y$.
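
The three "ignore" conditions act as a filter over candidate rules. A minimal sketch follows, reading them as violations of the constraints stated for the rule form (X and Y disjoint, Y a single element, Y containing no element from the original question); the function name and the use of plain Python sets are assumptions.

```python
def is_valid_rule(x, y, original_elements):
    """Return True if the candidate rule X -> Y survives all three filters."""
    x, y, original = set(x), set(y), set(original_elements)
    if x & y:            # X and Y must be disjoint
        return False
    if len(y) > 1:       # the consequent may hold at most one element
        return False
    if y & original:     # the consequent must not be an element already in the question
        return False
    return True


original = {"Bob Marley", "die"}
print(is_valid_rule({"Bob Marley", "die"}, {"1981"}, original))
print(is_valid_rule({"Bob Marley"}, {"die"}, original))
```

The second call is rejected because its consequent is already a known element of the question, so it cannot be the answer.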

Passage & Answer Selection
- Select passages from the relevant documents in the QA corpus based on the Answer Event Score (AES):
  - Support($X \rightarrow Y$) =
  - Confidence($X \rightarrow Y$) =
- The weight for answer candidate j is defined as:
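
The slide's own formulas are not preserved in this transcript. As a point of reference only, the sketch below uses the textbook association-rule definitions: support is the fraction of passages containing all of X and Y, and confidence is the fraction of passages containing X that also contain Y. Whether QUALIFIER uses exactly these definitions is an assumption, and treating each passage as a set of terms is an illustrative simplification.

```python
def support(x, y, passages):
    """Fraction of passages that contain every element of X and Y (textbook definition)."""
    if not passages:
        return 0.0
    items = set(x) | set(y)
    return sum(1 for p in passages if items <= set(p)) / len(passages)


def confidence(x, y, passages):
    """Among passages containing X, the fraction that also contain Y (textbook definition)."""
    x = set(x)
    with_x = [p for p in passages if x <= set(p)]
    if not with_x:
        return 0.0
    return sum(1 for p in with_x if set(y) <= set(p)) / len(with_x)


# Each passage is reduced to a set of terms for illustration.
passages = [
    {"marley", "die", "1981"},
    {"marley", "die", "miami"},
    {"marley", "born", "1945"},
    {"reggae", "music"},
]
print(support({"marley", "die"}, {"1981"}, passages))
print(confidence({"marley", "die"}, {"1981"}, passages))
```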

Related Modules: Fine-grained Named Entity Recognition
- Fine-grained NE Tagging
- Non-ASCII Character Remover
- Number Format Converter
  - E.g. “one hundred eleven” => 111
- Rule Conflict Resolver
  - Longer Length
  - Ontology
  - Handcrafted Priorities
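
The number format converter turns spelled-out numbers into digits, as in the “one hundred eleven” => 111 example. The following is a minimal sketch of one way to do this for English cardinals; it is not the module used in QUALIFIER, and the function name `words_to_number` is hypothetical.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def words_to_number(text):
    """Convert an English cardinal such as 'one hundred eleven' to an int."""
    total = 0      # completed scale groups (thousands, millions, ...)
    current = 0    # value of the group being read
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_number("one hundred eleven"))
```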

Related Modules: Answer Justification
- We generate axioms based on our manually constructed ontology.
- For example, q1425: What is the population of Maryland?
  - Sentence: “Maryland's population is 50,000 and growing rapidly.”
  - Ontology Axiom (OA): Maryland(c1) & population(c1, c2) -> 5000000(c2)
- In this way, we can identify “50,000” as a wrong answer, even though it is the surface text of the sentence.
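
One way to picture this check is as a plausibility test of a candidate value against the value the ontology expects. The sketch below is an illustration only: the dictionary-based ontology, the relative tolerance of 50%, and the function names are all assumptions; only the Maryland figure comes from the slide's axiom.

```python
# Expected value taken from the slide's axiom; the storage format is an assumption.
ONTOLOGY = {("Maryland", "population"): 5_000_000}


def parse_number(text):
    """Parse a surface number such as '50,000' into an int."""
    return int(text.replace(",", ""))


def is_plausible(entity, prop, candidate, tolerance=0.5):
    """Accept a candidate if it lies within `tolerance` (relative) of the expected value.

    Candidates for which the ontology has no axiom cannot be rejected.
    """
    expected = ONTOLOGY.get((entity, prop))
    if expected is None:
        return True
    return abs(candidate - expected) <= tolerance * expected


candidate = parse_number("50,000")
print(is_plausible("Maryland", "population", candidate))
```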

Factoid Results

List System Overview

List Subsystem
- Multiple Answers from Same Paragraph
- Canonicalization Resolution
  - Unique answer: “the States”, “USA”, “United States”, etc.
- Pattern-based Answer Extraction
  - , and + verb …
  - … include:,, …
  - “list of …”
  - “top” + number + adj-superlative
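
Canonicalization matters for list questions because variants such as “the States” and “USA” must count as one answer. A minimal sketch of variant-to-canonical mapping with order-preserving deduplication follows; the mapping table and the function name are hypothetical, and a real system would derive the variants from richer resources.

```python
# Hypothetical variant table built from the slide's example.
CANONICAL_FORMS = {
    "the states": "United States",
    "usa": "United States",
    "united states": "United States",
}


def canonicalize_answers(answers):
    """Map each answer to its canonical form and drop duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for answer in answers:
        cleaned = answer.strip()
        canonical = CANONICAL_FORMS.get(cleaned.lower(), cleaned)
        key = canonical.lower()
        if key not in seen:
            seen.add(key)
            unique.append(canonical)
    return unique


print(canonicalize_answers(["the States", "USA", "Japan", "United States", "japan"]))
```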

List Results

System Overview

Definition Subsystem

Pre-processing
- Document filter
- Anaphora resolution
- Sentence “positive set” and “negative set”
Sentence Ranking
- Sentence weighting in Corpus
- Sentence weighting in Web
- Overall weighting:

Definition Subsystem
- Answer Generation (Progressive Maximal Marginal Relevance)
  1. All sentences are ordered in descending order by weight.
  2. Add the first sentence to the summary.
  3. Examine the following sentences. If Weight(stc) - Weight(next_stc) > avg_sim(stc), add next_stc to the summary.
  4. Repeat Step 3 until the length limit of the target summary is reached.
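
For orientation, the sketch below implements standard greedy Maximal Marginal Relevance, not the progressive variant on the slide, whose avg_sim term is not defined in this transcript. Each step picks the sentence with the best trade-off between its weight and its similarity to what is already selected. Jaccard word overlap as the similarity measure, the trade-off parameter `lam`, and a length limit counted in sentences are all assumptions.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def mmr_select(sentences, weights, max_sentences, lam=0.5):
    """Greedy standard MMR: balance sentence weight against redundancy with the summary."""
    selected = []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < max_sentences:
        def score(i):
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in selected),
                default=0.0,
            )
            return lam * weights[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [sentences[i] for i in selected]


sentences = [
    "nast drew the donkey cartoon",
    "nast drew the donkey cartoon in 1870",
    "the donkey is the democratic party symbol",
]
weights = [1.0, 0.9, 0.6]
print(mmr_select(sentences, weights, max_sentences=2))
```

In this toy run the second sentence is skipped despite its higher weight, because it largely repeats the first one.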

Definition Results We empirically set the length of the summary for People and Objects based on question classification results.

Overall Performance

Conclusion and Future Work
- Conclusion
  - Event-based Question Answering
  - Factoid and list questions explore the power of event-based QA
  - Definition question answering combines IR and summarization
  - Use the ontology to boost the performance of our NE and answer justification modules
- Future Work
  - Give a formal proof of our QA event hypothesis
  - Work towards an online question answering system
  - Interactive QA
  - Analysis and opinion questions
  - VideoQA: question answering on news video
