
1 Using Maximal Embedded Subtrees for Textual Entailment Recognition
Sophia Katrenko & Pieter Adriaans
Adaptive Information Disclosure project, Human Computer Studies Laboratory, IvI, University of Amsterdam
katrenko@science.uva.nl

2 Outline
- Task statement
- Tree mining: methods
- Experiments
- Discussion

3 Why trees?
What do these two pictures have in common? Complex structure!
[Figure: 17th-century Scottish handwriting]

4 Motivation
- Idea: trees can be compared in order to find highly similar structures
- Tree mining is an intermediate step which allows for frequent subtree discovery
- When looking for the most frequent subtrees, we can relax the restrictions on how similar two subtrees should be

5 What type of trees? (1)
In tree mining, the following kinds of subtrees are distinguished:
- Bottom-up subtrees
- Induced subtrees
- Embedded subtrees
We use embedded tree mining as described in M. Zaki (2005), "Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications". An embedded subtree preserves ancestor-descendant relationships rather than only parent-child edges (see the sketch below).
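For illustration, here is a minimal brute-force check (not the authors' miner, and not Zaki's TreeMiner algorithm) of whether a small pattern tree occurs as an embedded subtree of a target tree: every pattern node must map, injectively and label-preservingly, to a target node, and every pattern edge to an ancestor-descendant pair. Sibling order is ignored for brevity; all names are hypothetical.

```python
from itertools import product

def ancestors(parent, v):
    """Yield all proper ancestors of node v, given a child -> parent map."""
    while parent[v] is not None:
        v = parent[v]
        yield v

def is_embedded(p_labels, p_parent, t_labels, t_parent):
    """Brute-force embedded-subtree test for small trees.
    Trees are given as {node: label} and {node: parent or None} maps."""
    p_nodes = list(p_labels)
    # Candidate images for each pattern node: target nodes with the same label.
    cands = [[t for t in t_labels if t_labels[t] == p_labels[p]] for p in p_nodes]
    for phi in product(*cands):
        if len(set(phi)) != len(phi):  # the mapping must be injective
            continue
        m = dict(zip(p_nodes, phi))
        # Each pattern parent must map to an ancestor of its child's image.
        if all(p_parent[v] is None or m[p_parent[v]] in ancestors(t_parent, m[v])
               for v in p_nodes):
            return True
    return False

# Pattern A-C is embedded in the chain A-B-C: C is a descendant of A.
t_labels, t_parent = {0: "A", 1: "B", 2: "C"}, {0: None, 1: 0, 2: 1}
p_labels, p_parent = {0: "A", 1: "C"}, {0: None, 1: 0}
print(is_embedded(p_labels, p_parent, t_labels, t_parent))  # True
```

An induced-subtree test would instead require m[p_parent[v]] to be the direct parent of m[v], which is exactly the restriction the embedded variant relaxes.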

6 What type of trees? (2)
[Figure: two example trees (Tree 1 and Tree 2); red marks embedded subtrees, yellow marks bottom-up subtrees]

7 Methodology
- Data
- Dependency parsing
- Depth-first search (DFS, preorder)
- Rooted ordered embedded tree mining
- Setting thresholds
- Evaluation

8 Data preprocessing
- Each pair of sentences has been parsed with Minipar (Dekang Lin)
- Each dependency tree has been transformed by incorporating edge labels into node labels
- Each transformed tree has been represented in preorder (DFS), as sketched below
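The slide does not spell out the transformation, so the sketch below assumes a merged node label of the form relation:word and a Zaki-style preorder string encoding with -1 as the backtrack marker; the Minipar-like edges are hypothetical.

```python
def preorder_encoding(root, children, label):
    """Preorder (DFS) string encoding of a rooted ordered tree: emit each
    node's label on the way down and "-1" when backtracking."""
    out = [label[root]]
    for c in children.get(root, []):
        out.extend(preorder_encoding(c, children, label))
        out.append("-1")
    return out

# Hypothetical dependency edges for "John likes Mary": (head, relation, dependent)
edges = [("likes", "subj", "John"), ("likes", "obj", "Mary")]

# Incorporate the edge label into the dependent's node label, e.g. "subj:John".
children, label = {}, {"likes": "likes"}
for head, rel, dep in edges:
    node = f"{rel}:{dep}"
    label[node] = node
    children.setdefault(head, []).append(node)

print(" ".join(preorder_encoding("likes", children, label)))
# likes subj:John -1 obj:Mary -1
```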

9 Syntactic matching
Given two sentences S1 and S2 (and, consequently, their trees), with sizes n1 = |S1| and n2 = |S2|, let m be the size of the rooted maximal embedded subtree they share. We define the similarity score as a ratio of m to the tree sizes, as sketched below.
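A minimal sketch of the scoring and thresholding steps. The slide leaves the exact ratio unspecified, so the 2m / (n1 + n2) normalization below and the threshold value are assumptions, not the authors' formula.

```python
def similarity_score(m, n1, n2):
    """Ratio of the maximal embedded subtree size m to the sizes of the two
    trees; the symmetric 2m / (n1 + n2) normalization is an assumption."""
    return 2.0 * m / (n1 + n2)

def entails(m, n1, n2, threshold=0.6):  # threshold value is hypothetical
    """Entailment decision: compare the similarity score to a threshold,
    matching the 'setting thresholds' step of the pipeline."""
    return similarity_score(m, n1, n2) >= threshold

print(similarity_score(7, 9, 12))  # 0.666..., for a shared subtree of 7 nodes
print(entails(7, 9, 12))           # True
```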

10 Runs
- Run 1: syntactic matching (syntactic functions incorporated into the node labels) & lemma overlap
- Run 2: lemma overlap (baseline; see the sketch below)
- Run 3: syntactic matching (without syntactic functions) & lemma overlap
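The slides do not define the lemma-overlap measure; one plausible definition, assumed in the sketch below, is the fraction of distinct hypothesis lemmas that also occur in the text.

```python
def lemma_overlap(text_lemmas, hyp_lemmas):
    """Fraction of distinct hypothesis lemmas that also appear in the text
    (an assumed definition of the baseline, not taken from the slides)."""
    text, hyp = set(text_lemmas), set(hyp_lemmas)
    return len(text & hyp) / len(hyp) if hyp else 0.0

t = ["the", "currency", "use", "in", "china", "be", "the", "renminbi", "yuan"]
h = ["the", "renminbi", "yuan", "be", "the", "currency", "of", "china"]
print(round(lemma_overlap(t, h), 2))  # 0.86: 6 of 7 distinct hypothesis lemmas
```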

11 Official results (accuracy)
Run 1, overall: 59%
- QA: 60.50%
- SUM: 69.50%
- IR: 62.00%
- IE: 44.00%

12 Precision vs. Recall

13 Precision vs. Recall (2)

14 Conclusions: Does it work?
- Syntactic matching improves precision!
- But... in some cases it is too flexible, which leads to false positives
- We used ordered trees, so pairs such as the following do not get high matching scores:
(h) The currency used in China is the Renminbi Yuan.
(t) The Renminbi Yuan is the currency used in China.

15 Possible extensions
- Use synonyms/antonyms from WordNet (see the sketch below)
- Handle the situation where there are several maximal subtrees
- Use weighting for the tree nodes
- Use deep semantic analysis
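As a sketch of the first extension: node-label matching could be relaxed so that two lemmas also match when they share a WordNet synset. This uses NLTK's WordNet interface and is one possible realization, not the authors' implementation.

```python
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def labels_match(lemma1, lemma2):
    """Relaxed node-label comparison: exact match, or the two lemmas
    share at least one WordNet synset (i.e., they can be synonyms)."""
    if lemma1 == lemma2:
        return True
    return bool(set(wn.synsets(lemma1)) & set(wn.synsets(lemma2)))

print(labels_match("currency", "currency"))  # True
print(labels_match("buy", "purchase"))       # True: both belong to buy.v.01
print(labels_match("buy", "china"))          # False
```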

16 H: The author expressed his gratitude to the audience.
T: Thank you!
True / False? True!

