
Towards Solving Problems Using Textual Entailment over Identified Sources 2015/9/14 Weixi Zhu.


1 Towards Solving Problems Using Textual Entailment over Identified Sources
2015/9/14 Weixi Zhu

2 Background
Problem solving is a long-term goal of artificial intelligence research. Of the single-choice questions in the history tests of the NHEEE (National Higher Education Entrance Examination, including mock exams), 21.10% (123/583) can be answered based only on Wikipedia pages and common sense. Human experts have identified the Wikipedia pages needed to answer those questions (2.80 per question on average).

3 Problem definition
Input: a question stem; four options, exactly one of which is correct; Wikipedia pages
Output: the answer (one of the options)
Goal: maximize accuracy

4 Approach
Source Identification -> retrieve Wikipedia pages
Textual Entailment -> rank the four options

5 Source Identification
Entity Page Retrieval
Entity Mention Identification: leftmost-longest matching principle; stop words are filtered out (the HIT stop-word list plus 5 domain-specific stop words)
Entity Disambiguation and Redirect Resolution
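A minimal Python sketch of leftmost-longest mention identification follows. It assumes the question is already a list of tokens (for Chinese, single characters work with `sep=""`), that the entity dictionary is a plain set of Wikipedia page titles, and that `identify_mentions` and its `max_len` cap are hypothetical names. Disambiguation and redirect resolution are not shown.

```python
def identify_mentions(tokens, titles, stop_words, sep="", max_len=10):
    """Leftmost-longest matching: scan left to right and, at each position,
    take the longest span (up to max_len tokens) that is a known title and
    not a stop word. Matched spans do not overlap."""
    mentions = []
    i = 0
    while i < len(tokens):
        end = None
        # Try the longest candidate first, so the first hit is the longest one.
        for j in range(min(len(tokens), i + max_len), i, -1):
            span = sep.join(tokens[i:j])
            if span in titles and span not in stop_words:
                end = j
                mentions.append(span)
                break
        # Resume after the matched span, or move one token right if none matched.
        i = end if end is not None else i + 1
    return mentions


tokens = ["the", "Xinhai", "Revolution", "ended", "the", "Qing", "dynasty"]
titles = {"the", "Xinhai Revolution", "Qing dynasty", "Qing"}
print(identify_mentions(tokens, titles, {"the"}, sep=" "))
```

Because the longest span wins, "Qing dynasty" is preferred over the shorter title "Qing", and "the" is skipped even though it is a title, because it is a stop word.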

6 Source Identification
Quotation Page Retrieval
Quotations form the queries (Lucene's phrase query, with word order preserved), scored with a TF-IDF scheme: the more often a word w occurs in a page p, and the fewer pages in the whole database contain w, the larger weight(w, p) is.
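The weighting above can be sketched in Python as below. The formula `tf * log(N / df)` is one common TF-IDF variant chosen here as an assumption; Lucene's built-in scoring differs in details. `contains_phrase` stands in for a phrase query (the quotation must occur with its word order preserved), and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_weight(word, page_tokens, pages):
    """weight(w, p) = tf(w, p) * log(N / df(w)): grows with the frequency of
    w in p and shrinks as more of the N pages contain w."""
    tf = Counter(page_tokens)[word]
    df = sum(1 for p in pages if word in p)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(pages) / df)


def contains_phrase(page_tokens, phrase_tokens):
    """True if the phrase occurs contiguously with its word order preserved."""
    n = len(phrase_tokens)
    return any(page_tokens[i:i + n] == phrase_tokens
               for i in range(len(page_tokens) - n + 1))


def retrieve_quotation_pages(quote_tokens, pages):
    """Indices of pages containing the quotation, ranked by the summed
    TF-IDF weight of the quotation's distinct words (highest first)."""
    hits = [i for i, p in enumerate(pages) if contains_phrase(p, quote_tokens)]

    def score(i):
        return sum(tfidf_weight(w, pages[i], pages) for w in set(quote_tokens))

    return sorted(hits, key=score, reverse=True)
```

A page with the words in the wrong order (for example `["b", "a"]` for the quotation `["a", "b"]`) is never returned, however often it contains them.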

7 Source Identification
Page Filtering
Domain-Based Filtering: keep a page only if one of its categories falls, hierarchically, under history.
Theme-Based Filtering: a question always focuses on one theme, so the retrieved pages $p_1, \dots, p_n$ should converge in a vector space $F$. Their centroid is $c = \frac{1}{n}\sum_{i=1}^{n} \frac{F(p_i)}{\lvert F(p_i) \rvert}$, and we rank each $p_i$ by $\cos(c, F(p_i))$.
Question-Based Filtering: pages should also be useful for answering the question, so a page retrieved based on an option text $o$ should be relevant to the stem text $s$, and vice versa. In a word vector space $W$ (weighted by TF-IDF), we rank a page $p$ by $\cos(W(p), W(o))$, by $\cos(W(p), W(s))$, or by $\cos(W(p), W(o)) + \cos(W(p), W(s))$.
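The three filters can be sketched in Python with sparse vectors stored as dicts. The following are all assumptions: the category graph is a dict mapping each category to its parent categories, `"History"` is used as the root category, the upward search stops after `max_depth` levels, and `|F(p_i)|` is read as the Euclidean norm. The function names are hypothetical.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def is_historical(page_categories, parents, root="History", max_depth=5):
    """Domain-based filter: breadth-first search up the (hypothetical)
    category graph; True if `root` is reached within `max_depth` levels."""
    seen = set(page_categories)
    queue = deque((c, 0) for c in page_categories)
    while queue:
        cat, depth = queue.popleft()
        if cat == root:
            return True
        if depth < max_depth:
            for parent in parents.get(cat, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, depth + 1))
    return False


def theme_rank(vectors):
    """Theme-based filter: centroid c = (1/n) * sum_i F(p_i) / |F(p_i)|,
    then page indices sorted by cos(c, F(p_i)), most central first."""
    n = len(vectors)
    centroid = {}
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec.values()))
        if norm == 0:
            continue  # an empty page vector contributes nothing
        for k, x in vec.items():
            centroid[k] = centroid.get(k, 0.0) + x / (norm * n)
    return sorted(range(n), key=lambda i: cosine(centroid, vectors[i]), reverse=True)


def question_score(page_vec, stem_vec, option_vec, mode="both"):
    """Question-based filter on TF-IDF word vectors W(.)."""
    if mode == "option":
        return cosine(page_vec, option_vec)
    if mode == "stem":
        return cosine(page_vec, stem_vec)
    return cosine(page_vec, option_vec) + cosine(page_vec, stem_vec)
```

In `theme_rank`, an off-topic page (for example one only about football among pages about the Qing dynasty) ends up last, because it shares no features with the other pages that dominate the centroid.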

8 Textual Entailment
Rank the options; the top-ranked option is the answer.
To what extent can “the stem text + the identified pages retrieved based on it” or “the option text + the identified pages retrieved based on it” entail “the stem + the option”? To what extent can the identified pages retrieved based on the stem entail the option? To what extent can the identified pages retrieved based on the option entail the stem?

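9 Textual Entailment
For each option $o$, two kinds of scores describe the extent of entailment: OptionScore(o) and StemScore(o). Let $P(s)$ and $P(o)$ denote the identified pages retrieved based on the stem $s$ and on the option $o$, respectively. If $P(s) = \varnothing$, then $\mathrm{OptionScore}(o) = \cos(W(o), W(s))$; if $P(o) = \varnothing$, then $\mathrm{StemScore}(o) = \cos(W(s), W(o))$.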
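A hedged Python sketch of option ranking follows. Only the empty-set fallbacks come from the slide. The non-empty cases are not spelled out here, so the code assumes, purely for illustration, that OptionScore(o) is the best cosine between the option and any page in P(s), that StemScore(o) is the best cosine between the stem and any page in P(o), and that the two scores are combined by simple addition. Word vectors `W(.)` are given as dicts, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {word: weight}."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def option_score(option_vec, stem_vec, stem_pages):
    """How well the pages retrieved for the stem, P(s), entail the option."""
    if not stem_pages:  # P(s) is empty: fallback given on the slide
        return cosine(option_vec, stem_vec)
    # ASSUMPTION: best match between the option and any page in P(s)
    return max(cosine(p, option_vec) for p in stem_pages)


def stem_score(option_vec, stem_vec, option_pages):
    """How well the pages retrieved for the option, P(o), entail the stem."""
    if not option_pages:  # P(o) is empty: fallback given on the slide
        return cosine(stem_vec, option_vec)
    # ASSUMPTION: best match between the stem and any page in P(o)
    return max(cosine(p, stem_vec) for p in option_pages)


def pick_answer(stem_vec, stem_pages, options):
    """options: list of (option_vec, option_pages) pairs.
    Returns the index of the top-ranked option."""
    # ASSUMPTION: the two scores are combined by simple addition
    scores = [
        option_score(vec, stem_vec, stem_pages) + stem_score(vec, stem_vec, pages)
        for vec, pages in options
    ]
    return max(range(len(options)), key=lambda i: scores[i])
```

Other combinations (weighted sums, or using only one of the two scores) drop in by changing the expression inside `pick_answer`.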

10 Experiment
Comparison of theme-based filtering with different feature spaces F: words, links, categories

11 Result
Data Set: 123/583 questions from the history tests of the NHEEE adopted in Beijing (…) and the mock NHEEE adopted in Beijing (…), 6 of which were excluded; Wikipedia pages dumped on 2014/11/05.
Gold Standard: 345 pages identified by human experts for answering the 123 questions.

12 Result
Entity Page Retrieval: 2,383 entity mentions (… per question); a random sample of 200 of them showed 6.50% incorrect. 151 (6.34%) were disambiguation pages, of which 91 (60.26%) were disambiguated correctly and 23 (15.23%) lacked a replacement. 798 (33.49%) were redirect pages.
Quotation Page Retrieval: 251 quotations (marked by “” or 《》) formed 615 disjunctive queries, 244 of which successfully retrieved one quotation page.

13 Result
Different theme-based filtering methods (the F-score peaks at k = 6)
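How a value such as k = 6 can be chosen is sketched below in Python. The sketch assumes k is the number of top-ranked pages kept per question, that the kept pages are compared with the human-identified gold-standard pages, and that F-scores are averaged over questions (macro-averaging is an assumption). The function names are hypothetical, and the toy data in the usage line is made up.

```python
def f_score_at_k(ranked_pages, gold, k):
    """F1 of the top-k ranked pages against the gold-standard page set."""
    top = set(ranked_pages[:k])
    tp = len(top & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(top)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f_at_k(runs, k):
    """runs: one (ranked_pages, gold_set) pair per question."""
    return sum(f_score_at_k(r, g, k) for r, g in runs) / len(runs)


def best_global_k(runs, max_k):
    """Smallest k in 1..max_k with the highest mean F-score."""
    return max(range(1, max_k + 1), key=lambda k: mean_f_at_k(runs, k))


# Toy usage: one question, gold pages {"a", "c"}.
print(best_global_k([(["a", "b", "c", "d"], {"a", "c"})], max_k=4))
```

A single global k trades precision against recall across all questions at once. Choosing k per question, as the weaknesses slide suggests, would need a criterion that does not rely on the gold standard.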

14 Result
Option ranking and combination

15 Weaknesses and Future Work
k should be chosen automatically for each individual question. Pages covering a broad spectrum of topics are ranked high by theme-based ranking. Textual entailment needs more sophisticated techniques than the vector space model. Knowledge representation and logical reasoning, in particular temporal reasoning, are essential.

16 Questions, the more the better

