Published by Megan Black. Modified over 9 years ago.
1
Textual Entailment Recognition for Web-Based Question Answering
Fabio Massimo Zanzotto, University of Rome “Tor Vergata”, Roma, Italy
2
© F.M.Zanzotto, University of Rome “Tor Vergata”
Operational Scenarios
What’s the weather in Macao?
When is my paper scheduled at the World Intelligence Congress?
3
Operational Scenarios: the Web
Answering a question using existing texts
Q: Who did Roma play with?
Snippet: Roma won against Milan (2-1)
4
Operational Scenarios: seeking information in text streams (SMS stream)
I: “violare zona rossa a Roma durante la manifestazione” (violate the off-limits zone in Rome during the demonstration)
SMS: “è prp uno skifo l’hanno kiusa… allò dmn mattina confermato x le otto a piazza del popolo” (it is really bad, they have closed it… so tomorrow morning it’s confirmed, 8 o’clock in Piazza del Popolo)
…
5
Reframing the two problems… as Recognizing Textual Entailment
Q: Who did Roma play against?
S: Roma won against Milan (2-1)
The question becomes the template “Roma played against X”; filling X from the snippet gives the hypothesis.
Text (T): Roma won against Milan (2-1)
Hypothesis (H): Roma played against Milan
T entails H
6
Reframing the two problems… as Recognizing Textual Entailment
Text (T): “è prp uno skifo l’hanno kiusa… allò dmn mattina confermato x le otto a piazza del popolo” (it is really bad, they have closed it… so tomorrow morning it’s confirmed, 8 o’clock in Piazza del Popolo)
Hypothesis (H): “violare zona rossa a Roma durante la manifestazione” (violate the off-limits zone in Rome during the demonstration)
T entails H
7
Outline
– Recognizing Textual Entailment (RTE): Problem definition
– Systems and Approaches for RTE
– Supervised Machine Learning Methods for RTE
– Semi-supervised Knowledge Induction for RTE
8
Classical Entailment Definition
Chierchia & McConnell-Ginet (2001): a text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true.
This is strict entailment: it doesn't account for some uncertainty allowed in applications.
(Dagan, Roth, Zanzotto, ACL Tutorial 2007)
9
Language Variability
– Dow ends up
– Dow climbs 255
– The Dow Jones Industrial Average closed up 255
– Stock market hits a record high
– Dow gains 255 points
(Dagan, Roth, Zanzotto, ACL Tutorial 2007)
10
Natural Language and Meaning
[Figure: the mapping between Meaning and Language, showing ambiguity and variability]
(Dagan, Roth, Zanzotto, ACL Tutorial 2007)
11
Applied Textual Entailment
A directional relation between two text fragments, Text (T) and Hypothesis (H):
T entails H (T ⇒ H) if humans reading T will infer that H is most likely true.
For textual entailment to hold we require: T + previous knowledge K ⇒ H, and not K ⇒ H (K alone must not entail H).
12
Applied Textual Entailment
Operational (applied) definition:
– Human gold standard, as in NLP applications
– Assuming common background knowledge
For textual entailment to hold we require: T + previous knowledge K ⇒ H, and not K ⇒ H.
13
Applied Textual Entailment
Model variability as relations between text expressions:
– Equivalence: text1 ⇔ text2 (paraphrasing)
– Entailment: text1 ⇒ text2 (the general case)
Examples:
– T: Roma won against Milan (2-1) ⇒ H: Roma played against Milan
– T: Roma won against Milan ⇒ H: Roma defeated Milan
14
Operational Definition: Current Challenge
The task has been operationally defined in the Recognizing Textual Entailment (RTE) challenges (Dagan et al. 2005), organized:
– under the PASCAL EU Network (RTE-1, 2, 3)
– by NIST (RTE-4, 5, 6, 7)
– at the SemEval conference (RTE-8)
15
Operational Definition
The task has been defined on the basis of other NLP tasks:
– Question Answering
– Information Extraction
– “Semantic” Information Retrieval
– Comparable documents / multi-document summarization
– Machine Translation evaluation
– Reading comprehension
– Paraphrase acquisition
Most data were created from the output of actual applications.
(Dagan, Roth, Zanzotto, ACL Tutorial 2007)
16
Some RTE Challenge Examples (Text | Hypothesis | Task | Entailment)
1. Regan attended a ceremony in Washington to commemorate the landings in Normandy. | Washington is located in Normandy. | IE | False
2. Google files for its long awaited IPO. | Google goes public. | IR | True
3. …: a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others. | Cardinal Juan Jesus Posadas Ocampo died in 1993. | QA | True
4. The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%. | The SPD is defeated by the opposition parties. | IE | True
(Dagan, Roth, Zanzotto, ACL Tutorial 2007)
17
Outline
– Recognizing Textual Entailment (RTE): Problem definition
– Systems and Approaches for RTE
– Supervised Machine Learning Methods for RTE
– Semi-supervised Knowledge Induction for RTE
18
Systems for RTE
Problem: we want to build a system that recognizes whether a text T entails a hypothesis H.
Questions:
– How many possibilities do we have?
– What kind of knowledge do we need?
– Is there a baseline system?
19
Baseline RTE system: the Lexical Overlap System
Count how many words/tokens T and H have in common or are “related”; if this number is large (above a threshold), answer ENTAILMENT, otherwise answer NOT-ENTAILMENT.
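The baseline can be sketched in a few lines. This is a minimal illustration, assuming exact token matching (no notion of “related” words), a hypothetical stop-word list, a score normalized by the number of hypothesis tokens, and an arbitrary threshold of 0.7; none of these choices come from the slides.

```python
import re

# Assumption: a small, hypothetical stop-word list; the slide does not specify one.
STOP_WORDS = {"a", "an", "the", "at", "of", "in", "on", "to", "all", "has", "is"}


def content_tokens(sentence: str) -> set:
    """Lower-cased alphanumeric tokens of a sentence, minus stop words."""
    return {tok for tok in re.findall(r"[a-z0-9]+", sentence.lower())
            if tok not in STOP_WORDS}


def overlap_score(text: str, hypothesis: str) -> float:
    """Fraction of the hypothesis tokens that also appear in the text."""
    h_tokens = content_tokens(hypothesis)
    if not h_tokens:
        return 1.0  # an empty hypothesis is trivially covered
    return len(h_tokens & content_tokens(text)) / len(h_tokens)


def baseline_rte(text: str, hypothesis: str, threshold: float = 0.7) -> str:
    """Lexical overlap baseline: ENTAILMENT above the threshold, else NOT-ENTAILMENT."""
    if overlap_score(text, hypothesis) > threshold:
        return "ENTAILMENT"
    return "NOT-ENTAILMENT"


text = "The Cassini spacecraft arrived at Titan in July, 2006."
hyp = "The Cassini spacecraft has reached Titan."
print(overlap_score(text, hyp), baseline_rte(text, hyp))  # 0.75 ENTAILMENT
```

Here “reached” and “arrived” do not match, so only 3 of the 4 hypothesis tokens are covered; adding “related” words would require a lexical resource.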
20
Baseline RTE system
Hyp: The Cassini spacecraft has reached Titan.
Text: The Cassini spacecraft arrived at Titan in July, 2006.
(Dagan, Roth, Zanzotto, ACL Tutorial 2007)
21
Baseline RTE system: some examples
T1: “At the end of the year, all solid companies pay dividends.”
H1: “At the end of the year, all solid insurance companies pay dividends.”
T1 ⇒ H1
T1: “At the end of the year, all solid companies pay dividends.”
H2: “At the end of the year, all solid companies pay cash dividends.”
T1 ⇏ H2
(Zanzotto, Moschitti, 2006)
The problem is not so simple, but this is a good baseline!
22
Systems for RTE
Problem: we want to build a system that recognizes whether a text T entails a hypothesis H.
Questions:
– How many possibilities do we have?
– What kind of knowledge do we need?
– Is there a baseline system?
23
What kind of knowledge do we need?
We need lexical knowledge:
– the equivalence: win against ⇔ defeat
– the implication: win ⇒ play
Roma won against Milan ⇒ Roma defeated Milan
Roma won against Milan ⇒ Roma played against Milan
24
What kind of knowledge do we need?
We need first-order rules (rules with variables):
– the equivalence: “X conducted Y interviews with Z” ⇔ “X interviewed Y Z”
– the implication: “X” ⇒ “more than Y” if X > Y
T2: “Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries”
H2: “Kessler's team interviewed more than 60,000 adults in 14 countries”
T2 ⇒ H2
25
Residual problems
– How do we encode this knowledge? It depends on the level of language interpretation.
– How do we use this knowledge? Rule-based systems + threshold; machine-learnt systems.
– How do we learn this knowledge? Supervised learning; unsupervised/semi-supervised learning.
26
Textual Entailment and Language Interpretations
[Figure: levels of interpretation between Raw Text and Meaning Representation (local lexical, syntactic parse, semantic representation, logical forms); the textual entailment inference can operate on any of these representations]
27
Symbolic Language Interpretation Models: Constituency-based Syntactic Interpretation
(S (NP (NNS Farmers)) (VP (VB feed) (NP (NNS cows)) (NP (NN animal) (NNS extracts))))
28
Symbolic Language Interpretation Models: Dependency-based Syntactic Interpretation
[Figure: dependency parse]
(Dagan, Roth, Zanzotto, ACL Tutorial 2007)
29
Symbolic Language Interpretation Models: Semantic Interpretation, Semantic Role Labelling (or Semantic Parse)
T: The government purchase of the Roanoke building, a former prison, took place in 1902.
[Figure: semantic roles of T: the predicate “take place” with arguments “The govt. purchase… prison” and “in 1902”; the predicate “purchase” with ARG_1 “The Roanoke building”]
(Dagan, Roth, Zanzotto, ACL Tutorial 2007)
30
Symbolic Language Interpretation Models: Logical Forms [Bos & Markert]
The semantic representation language is a first-order fragment of the language used in Discourse Representation Theory (DRS), conveying argument structure with a neo-Davidsonian analysis and including the recursive DRS structure to cover negation, disjunction, and implication.
(Dagan, Roth, Zanzotto, ACL Tutorial 2007)
31
Textual Entailment and Language Interpretations: Rules at different levels
[Figure: a rule R at the local lexical, syntactic parse, semantic representation, and logical form levels, annotated “One rule R”, “Possibly still one rule R”, and “Many rules corresponding to R”]
32
Rules (with Variables) at different levels
– Logical forms: $\forall x\, \forall y.\ \mathit{win}(x,y) \Rightarrow \mathit{play}(x,y)$
– Semantic representation: win(Arg0:x, Arg1:y) ⇒ play(Arg0:x, Arg1:y)
– Syntactic parse / local lexical: X wins against Y ⇒ X plays against Y; X won against Y ⇒ X played against Y; X defeated Y ⇒ X played against Y; Y has been defeated by X ⇒ X played against Y
33
Strategies for building an RTE system
– Rewriting systems (RS)
– Distance/similarity systems (DSS)
– Hybrid systems = RS + DSS
34
Strategies for building an RTE system: Rewriting Systems
[Figure: T is transformed step by step into H by applying rewrite rules r1, r2, …, rn, at some level between raw text and meaning representation]
35
Strategies for building an RTE system: Distance/Similarity Systems
[Figure: T and H are compared at some level between raw text and meaning representation; if sim(T, H) > t the answer is YES, if sim(T, H) < t it is NO]
36
Strategies for building an RTE system: Hybrid Systems
[Figure: T and H are first rewritten into intermediate representations j and k; if sim(j, k) > t the answer is YES, otherwise NO]
37
Strategies for building an RTE system: Residual Problems
– How to estimate the threshold t? → Supervised machine learning approaches
– How to accumulate a large knowledge base of rules? → Semi-supervised machine learning approaches, or knowledge induction methods
38
Outline
– Recognizing Textual Entailment (RTE): Problem definition
– Systems and Approaches for RTE
– Supervised Machine Learning Methods for RTE
– Semi-supervised Knowledge Induction for RTE
39
Quick background on Supervised Machine Learning
[Figure: a Learner builds a Learnt Model from the training set {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}; the Classifier uses the model to assign a label y_i to an instance represented in the feature space as x_i]
40
Quick background on Supervised Machine Learning
Some machine learning methods only need a similarity (a dot product, closely related to a distance) between pairs of instances in the feature space.
For these machines we can use the kernel trick:
– define the kernel function K(x1, x2) directly
– instead of defining the features explicitly
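A textbook instance of the kernel trick, not taken from the slides: the polynomial kernel K(x, z) = (x · z)² on 2-D inputs equals a dot product in an explicit 3-D feature space, yet it can be computed without ever building that space. The feature map `phi` below is the standard one for this kernel.

```python
import math


def phi(x):
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed in the input space without calling phi."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, z))    # 121.0
print(dot(phi(x), phi(z)))  # approximately 121.0 (floating point)
```

The FOR kernel discussed later plays the same role: it counts shared rules without enumerating the (huge) rule feature space.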
41
RTE and Classification
If Recognizing Textual Entailment (RTE) is a classification task, we can learn a classifier from annotated examples.
Problem: defining the feature space.
T2: “Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries”
H2: “Kessler's team interviewed more than 60,000 adults in 14 countries”
T2 ⇒ H2
42
RTE and Classification: Hybrid Systems
[Figure: the hybrid system in which a classifier, instead of a threshold, decides YES/NO from the representations of T and H]
We can learn a classifier from annotated examples.
Problem: defining the feature space.
43
RTE and Classification
Defining the feature space for RTE classifiers:
– Classes of models and feature spaces for sentence pairs
– A particular model: first-order rewrite rule feature spaces for sentence pairs
45
Defining the feature space
How do we define the feature space?
Possible features:
– “Distance features”: features of “some” distance between T and H
– “Entailment trigger features”
– “Pair features”: the content of the T-H pair is represented
Possible representations of the sentences:
– Bag-of-words (possibly with n-grams)
– Syntactic representation
– Semantic representation
T1: “At the end of the year, all solid companies pay dividends.”
H1: “At the end of the year, all solid insurance companies pay dividends.”
T1 ⇒ H1
46
Similarity Features
Possible features:
– Number of words in common
– Longest common subsequence
– Longest common syntactic subtree
– …
T: “At the end of the year, all solid companies pay dividends.”
H: “At the end of the year, all solid insurance companies pay dividends.”
T ⇒ H
47
Limits of Similarity Features
T1: “At the end of the year, all solid companies pay dividends.”
H1: “At the end of the year, all solid insurance companies pay dividends.”
T1 ⇒ H1
T1: “At the end of the year, all solid companies pay dividends.”
H2: “At the end of the year, all solid companies pay cash dividends.”
T1 ⇏ H2
For both pairs: % of H covered words = 6/7; % of H covered syntactic relations = 6/7.
The two pairs get the same similarity scores but opposite entailment values.
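The word-coverage figure can be reproduced with a short sketch. It assumes that a hypothetical set of function words (“at”, “the”, “of”, “all”) is ignored; with that choice each hypothesis keeps 7 content words, 6 of which occur in T1, for both pairs.

```python
import re

# Assumption: these function words are ignored; the slide does not list them.
FUNCTION_WORDS = {"at", "the", "of", "all"}


def content_words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in FUNCTION_WORDS]


def h_coverage(text, hypothesis):
    """Return (covered, total): how many H content words also occur in T."""
    t_words = set(content_words(text))
    h_words = content_words(hypothesis)
    return sum(1 for w in h_words if w in t_words), len(h_words)


t1 = "At the end of the year, all solid companies pay dividends."
h1 = "At the end of the year, all solid insurance companies pay dividends."
h2 = "At the end of the year, all solid companies pay cash dividends."
print(h_coverage(t1, h1), h_coverage(t1, h2))  # (6, 7) (6, 7)
```

Identical feature values with different gold labels are exactly why similarity features alone cannot separate these pairs.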
48
Entailment Triggers
Possible features from (de Marneffe et al., 2006):
– Polarity features: presence/absence of negative polarity contexts (not, no or few, without), e.g. “Oil price surged” vs “Oil prices didn’t grow”
– Antonymy features: presence/absence of antonymous words in T and H, e.g. “Oil price is surging” vs “Oil prices are falling down”
– Adjunct features: dropping/adding of a syntactic adjunct when moving from T to H, e.g. “all solid companies pay dividends” vs “all solid companies pay cash dividends”
– …
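A polarity feature of this kind could be sketched as below. The cue list is the one on the slide; treating any token ending in “n't” as negative is an added assumption, and a real system (such as the one cited) would look at syntactic scope rather than bare tokens.

```python
import re

# Negative polarity cues listed on the slide, plus the "n't" contraction (an assumption).
NEGATIVE_CUES = {"not", "no", "few", "without"}


def has_negative_polarity(sentence):
    normalized = sentence.lower().replace("\u2019", "'")  # curly apostrophe -> straight
    tokens = re.findall(r"[a-z']+", normalized)
    return any(tok in NEGATIVE_CUES or tok.endswith("n't") for tok in tokens)


def polarity_features(text, hypothesis):
    neg_t = has_negative_polarity(text)
    neg_h = has_negative_polarity(hypothesis)
    return {"neg_T": neg_t, "neg_H": neg_h, "polarity_mismatch": neg_t != neg_h}


print(polarity_features("Oil price surged", "Oil prices didn’t grow"))
# {'neg_T': False, 'neg_H': True, 'polarity_mismatch': True}
```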
49
Pair Features
Possible features:
– Bag-of-words spaces of T and H
– Syntactic spaces of T and H
T: “At the end of the year, all solid companies pay dividends.”
H: “At the end of the year, all solid insurance companies pay dividends.”
T ⇒ H
Features from T: end_T, year_T, solid_T, companies_T, pay_T, dividends_T, …
Features from H: end_H, year_H, solid_H, companies_H, pay_H, dividends_H, insurance_H, …
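The bag-of-words pair space can be sketched by tagging each word with the sentence it comes from, so that the same word in T and in H becomes two distinct features. Dropping function words is an assumption made only to match the feature list above.

```python
import re

# Assumption: function words are dropped, as in the feature list on the slide.
FUNCTION_WORDS = {"at", "the", "of", "all"}


def words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in FUNCTION_WORDS]


def pair_features(text, hypothesis):
    """Bag-of-words pair features: each word is tagged with the side it comes from."""
    return {w + "_T" for w in words(text)} | {w + "_H" for w in words(hypothesis)}


feats = pair_features("At the end of the year, all solid companies pay dividends.",
                      "At the end of the year, all solid insurance companies pay dividends.")
print(sorted(f for f in feats if f.startswith("insurance")))  # ['insurance_H']
```

A linear classifier over such features can only learn weights for single words on one side, which is the weakness the next slide points out.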
50
Pair Features: what can we learn?
With bag-of-words spaces of T and H we can learn rules such as:
– T implies H when T contains “end”…
– T does not imply H when H contains “end”…
Features from T: end_T, year_T, solid_T, companies_T, pay_T, dividends_T, …
Features from H: end_H, year_H, solid_H, companies_H, pay_H, dividends_H, insurance_H, …
These rules seem to be totally irrelevant!
51
RTE and Classification
Defining the feature space for RTE classifiers:
– Classes of models and feature spaces for sentence pairs
– A particular model: first-order rewrite rule feature spaces for sentence pairs
52
Motivation
For example, in textual entailment…
Training examples:
– P1: T1 “Farmers feed cows animal extracts”, H1 “Cows eat animal extracts”, T1 ⇒ H1
– P2: T2 “They feed dolphins fish”, H2 “Fish eat dolphins”, T2 ⇏ H2
– P3: T3 “Mothers feed babies milk”, H3 “Babies eat milk”, T3 ⇒ H3
Relevant features for classification are first-order rules, such as “feed X Y” ⇒ “X eat Y”.
53
In this part of the talk…
– First-order rule (FOR) feature spaces: a challenge
– Tripartite Directed Acyclic Graphs (tDAGs) as a solution: for modelling FOR feature spaces, and for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces
– An efficient algorithm for computing kernels in FOR spaces
– Experimental and comparative assessment of the computational efficiency of the proposed algorithm
54
In this part of the tutorial…
– First-order rule (FOR) feature spaces: a challenge
– Tripartite Directed Acyclic Graphs (tDAGs) as a solution: for modelling FOR feature spaces, and for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces
– An efficient algorithm for computing kernels in FOR spaces
– Experimental and comparative assessment of the computational efficiency of the proposed algorithm
55
First-order rule (FOR) feature spaces: challenges
We want to exploit first-order rule (FOR) feature spaces by writing the implicit kernel function $K(P_1,P_2)=|S(P_1)\cap S(P_2)|$, which counts how many first-order rules are activated by both $P_1$ and $P_2$.
Without loss of generality, we present the problem in syntactic first-order rule feature spaces.
56
First-order rule (FOR) feature spaces: challenges
[Figure: the syntactic first-order rule (VP (VB feed) (NP X) (NP Y)) ⇒ (S (NP X) (VP (VB eat) (NP Y))) next to the parse trees of “Farmers feed cows animal extracts” and “Cows eat animal extracts”, whose matching nodes are co-indexed with placeholders 1, 2, 3]
57
First-order rule (FOR) feature spaces: challenges
$P_a=(T_1,H_1)$, with T1 “Farmers feed cows animal extracts” and H1 “Cows eat animal extracts”, T1 ⇒ H1
[Figure: adding placeholders to the parse trees of T1 and H1, then propagating them; the resulting set $S(P_a)$ contains feed ⇒ eat rule fragments whose shared arguments carry placeholders 1 and 3]
58
First-order rule (FOR) feature spaces: challenges
$P_b=(T_3,H_3)$, with T3 “Mothers feed babies milk” and H3 “Babies eat milk”, T3 ⇒ H3
[Figure: the same construction for $P_b$ gives $S(P_b)$, whose feed ⇒ eat rule fragments carry placeholders 1 and 2]
59
First-order rule (FOR) feature spaces: challenges
[Figure: the fragments of $S(P_a)$ (placeholders 1, 3) and of $S(P_b)$ (placeholders 1, 2) are equal up to renaming of the placeholders, and both correspond to the rule (VP (VB feed) (NP X) (NP Y)) ⇒ (S (NP X) (VP (VB eat) (NP Y)))]
$K(P_a,P_b)=|S(P_a)\cap S(P_b)|$
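A toy sketch of what the kernel counts, assuming rules are written as pairs of bracketed strings with placeholders `#<n>` (a hypothetical encoding, not the tDAG representation used in the slides). Two rules count as the same feature if they differ only in the names of their placeholders, so each rule is canonicalized by renaming placeholders in order of first appearance before intersecting the sets.

```python
import re


def canonical(rule):
    """Rename placeholders (#<n>) in order of first appearance, left side first."""
    lhs, rhs = rule
    mapping = {}

    def rename(match):
        old = match.group(0)
        if old not in mapping:
            mapping[old] = "#" + str(len(mapping) + 1)
        return mapping[old]

    return re.sub(r"#\d+", rename, lhs), re.sub(r"#\d+", rename, rhs)


def for_kernel(rules_a, rules_b):
    """|S(Pa) ∩ S(Pb)|, comparing rules up to a renaming of placeholders."""
    return len({canonical(r) for r in rules_a} & {canonical(r) for r in rules_b})


# Hypothetical rule strings, loosely modelled on the feed/eat example.
S_a = {("(VP (VB feed) (NP #1) (NP #3))", "(S (NP #1) (VP (VB eat) (NP #3)))"),
       ("(VB feed)", "(VB eat)")}
S_b = {("(VP (VB feed) (NP #1) (NP #2))", "(S (NP #1) (VP (VB eat) (NP #2)))"),
       ("(VB feed)", "(VB eat)"),
       ("(NP #2)", "(NP #2)")}
print(for_kernel(S_a, S_b))  # 2
```

This shortcut works only because the toy rules are explicitly enumerated and linearized; the point of the tDAG algorithm is to obtain the same count without enumerating $S(P)$.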
60
In this part of the tutorial…
– First-order rule (FOR) feature spaces: a challenge
– Tripartite Directed Acyclic Graphs (tDAGs) as a solution: for modelling FOR feature spaces, and for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces
– An efficient algorithm for computing kernels in FOR spaces
– Experimental and comparative assessment of the computational efficiency of the proposed algorithm
61
A step back…
FOR feature spaces can be modelled with particular graphs, which we call tripartite directed acyclic graphs (tDAGs).
Observations:
– tDAGs are not trees
– tDAGs can be used to model both rules and sentence pairs
– unifying rules in sentences is a graph matching problem
– graph matching algorithms are, in general, exponential
62
Tripartite Directed Acyclic Graphs (tDAGs): as for feature structures…
[Figure: the feed ⇒ eat rule and the pair “Farmers feed cows animal extracts” / “Cows eat animal extracts” drawn as graphs in which the placeholder nodes are shared between the two trees]
64
Tripartite Directed Acyclic Graphs (tDAGs)
The formal definition of the graphs and of the kernel follows. It will not be covered in this tutorial.
65
Tripartite Directed Acyclic Graphs (tDAGs)
A tripartite directed acyclic graph is a graph $G=(N,E)$ where:
– the set of nodes $N$ is partitioned into three sets $N_t$, $N_g$, and $A$;
– the set of edges $E$ is partitioned into four sets $E_t$, $E_g$, $E_{A(t)}$, and $E_{A(g)}$;
where $t=(N_t,E_t)$ and $g=(N_g,E_g)$ are two trees, $E_{A(t)}=\{(x,y)\mid x\in N_t \text{ and } y\in A\}$, and $E_{A(g)}=\{(x,y)\mid x\in N_g \text{ and } y\in A\}$.
[Figure: the feed ⇒ eat rule as a tDAG]
66
Tripartite Directed Acyclic Graphs (tDAGs): alternative definition
A tDAG is a pair of extended trees $G=(\tau,\gamma)$ where $\tau=(N_t\cup A,\ E_t\cup E_{A(t)})$ and $\gamma=(N_g\cup A,\ E_g\cup E_{A(g)})$.
[Figure: the feed ⇒ eat rule drawn as two extended trees sharing the placeholders X and Y]
67
In this part of the tutorial…
– First-order rule (FOR) feature spaces: a challenge
– Tripartite Directed Acyclic Graphs (tDAGs) as a solution: for modelling FOR feature spaces, and for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces
– An efficient algorithm for computing kernels in FOR spaces
– Experimental and comparative assessment of the computational efficiency of the proposed algorithm
68
Again challenges
Computing the implicit kernel function $K(P_1,P_2)=|S(P_1)\cap S(P_2)|$ involves general graph matching, which is an exponential problem.
Yet tDAGs are particular graphs, and we can define an efficient algorithm.
Program:
– Analyzing the isomorphism among tDAGs
– Deriving an algorithm for $K(P_1,P_2)$
69
Isomorphism between tDAGs
Isomorphism between graphs: $G_1=(N_1,E_1)$ and $G_2=(N_2,E_2)$ are isomorphic if:
– $|N_1|=|N_2|$ and $|E_1|=|E_2|$
– among all the bijective functions relating $N_1$ and $N_2$, there exists $f: N_1\to N_2$ such that:
  – for each $n_1\in N_1$, $\mathrm{Label}(n_1)=\mathrm{Label}(f(n_1))$
  – for each $(n_a,n_b)\in E_1$, $(f(n_a),f(n_b))\in E_2$
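The definition translates directly into a brute-force check that tries every bijection $f: N_1 \to N_2$. The graph encoding (a label dictionary plus an edge set) is an assumption for illustration; the $n!$ candidate bijections make the exponential cost mentioned earlier concrete.

```python
from itertools import permutations


def isomorphic(g1, g2):
    """Brute-force check of the definition. A graph is (labels, edges) with
    labels: dict node -> label and edges: set of (source, target) pairs."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    if len(labels1) != len(labels2) or len(edges1) != len(edges2):
        return False
    nodes1 = list(labels1)
    for image in permutations(labels2):  # every bijection N1 -> N2
        f = dict(zip(nodes1, image))
        if all(labels1[n] == labels2[f[n]] for n in nodes1) and \
                all((f[a], f[b]) in edges2 for a, b in edges1):
            return True
    return False


g1 = ({1: "VP", 2: "VB", 3: "NP"}, {(1, 2), (1, 3)})
g2 = ({"a": "VP", "b": "NP", "c": "VB"}, {("a", "b"), ("a", "c")})
print(isomorphic(g1, g2))  # True
```

Because $|E_1|=|E_2|$ and $f$ is a bijection, mapping every edge of $E_1$ into $E_2$ also covers all of $E_2$.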
70
Isomorphism between tDAGs
Isomorphism adapted to tDAGs: $G_1=(\tau_1,\gamma_1)$ and $G_2=(\tau_2,\gamma_2)$ are isomorphic if these two properties hold:
– Partial isomorphism: $\tau_1$ and $\tau_2$ are isomorphic, and $\gamma_1$ and $\gamma_2$ are isomorphic. This property generates two functions $f_\tau$ and $f_\gamma$.
– Constraint compatibility: $f_\tau$ and $f_\gamma$ are compatible on the sets of nodes $A_1$ and $A_2$ if, for each $n\in A_1$, $f_\tau(n)=f_\gamma(n)$.
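Constraint compatibility can be checked by comparing the constraints that the two partial isomorphisms induce on the placeholder nodes, i.e. the sets of pairs $(n, f(n))$ for $n \in A_1$. The node mappings below are hypothetical dictionaries echoing the feed/eat example (placeholder 1 ↦ 1, 3 ↦ 2).

```python
def constraint(f, anchors):
    """The constraint induced by a node mapping f on the placeholder nodes."""
    return frozenset((n, f[n]) for n in anchors)


def compatible(f_tau, f_gamma, anchors):
    """True iff f_tau(n) == f_gamma(n) for every placeholder n in anchors."""
    return constraint(f_tau, anchors) == constraint(f_gamma, anchors)


# Hypothetical mappings echoing the feed/eat example: placeholder 1 -> 1, 3 -> 2.
A_a = {1, 3}
f_tau = {"VP": "VP'", "VB": "VB'", 1: 1, 3: 2}
f_gamma = {"S": "S'", "VP": "VP'", 1: 1, 3: 2}
print(constraint(f_tau, A_a) == {(1, 1), (3, 2)}, compatible(f_tau, f_gamma, A_a))  # True True
```

Comparing the constraint sets is equivalent to the pointwise condition, since each placeholder contributes exactly one pair.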
71
Isomorphism between tDAGs
[Figure: $P_a=(\tau_a,\gamma_a)$ and $P_b=(\tau_b,\gamma_b)$ for the feed ⇒ eat fragments, with placeholders 1, 3 in $P_a$ and 1, 2 in $P_b$]
Partial isomorphism gives the constraints $C_\tau=\{(1,1),(3,2)\}$ and $C_\gamma=\{(1,1),(3,2)\}$.
Constraint compatibility: $C_\tau=C_\gamma$.
72
Ideas for building the kernel
We define $K(P_1,P_2)=|S(P_1)\cap S(P_2)|$ using the isomorphism between tDAGs.
The idea: reverse the order of isomorphism detection.
– First, constraint compatibility: build a set $C$ of all the relevant alternative constraints, then find the subsets of $S(P_1)\cap S(P_2)$ meeting each constraint $c\in C$
– Second, partial isomorphism detection
73
Ideas for building the kernel
[Figure: two toy pairs $P_a=(\tau_a,\gamma_a)$ and $P_b=(\tau_b,\gamma_b)$ with node labels A, B, C and I, M, N and numbered placeholders]
Alternative constraints: $C=\{c_1,c_2\}$ with $c_1=\{(1,1),(2,2)\}$ and $c_2=\{(1,1),(2,3)\}$.
$K(P_a,P_b)=|S(P_a)\cap S(P_b)|$
74
Ideas for building the kernel
[Figure: the common fragments $(S(P_a)\cap S(P_b))_{c_1}$ of the toy pairs that satisfy $c_1=\{(1,1),(2,2)\}$]
$K(P_a,P_b)=|S(P_a)\cap S(P_b)|=|(S(P_a)\cap S(P_b))_{c_1}\cup(S(P_a)\cap S(P_b))_{c_2}|$
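The union over constraints cannot simply be summed, because a common rule may satisfy more than one constraint. The inclusion–exclusion principle (cited later in the deck) corrects for these overlaps. The sketch checks the principle on explicit toy sets of hypothetical rule identifiers; in the actual kernel the set sizes are obtained without enumerating the rules.

```python
from itertools import combinations


def union_size(sets):
    """|S_1 ∪ ... ∪ S_m| via inclusion–exclusion over all non-empty groups of sets."""
    total = 0
    for k in range(1, len(sets) + 1):
        for group in combinations(sets, k):
            common = set(group[0]).intersection(*group[1:])
            total += (-1) ** (k + 1) * len(common)
    return total


# Hypothetical common-rule sets for two constraints c1 and c2 that share rule r3.
by_c1 = {"r1", "r2", "r3"}
by_c2 = {"r3", "r4"}
print(union_size([by_c1, by_c2]))  # 4
```

The number of groups grows as $2^{|C|}-1$, so the approach pays off when the set of alternative constraints $C$ is small.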
75
Ideas for building the kernel
[Figure: the common fragments $(S(P_a)\cap S(P_b))_{c_2}$ of the toy pairs that satisfy $c_2=\{(1,1),(2,3)\}$]
$K(P_a,P_b)=|S(P_a)\cap S(P_b)|=|(S(P_a)\cap S(P_b))_{c_1}\cup(S(P_a)\cap S(P_b))_{c_2}|$
76
Ideas for building the kernel
[Figure: each common pair fragment satisfying $c_1$ combines a common fragment of $\tau_a,\tau_b$ with a common fragment of $\gamma_a,\gamma_b$]
$(S(P_a)\cap S(P_b))_{c_1}=(S(\tau_a)\cap S(\tau_b))_{c_1}\times(S(\gamma_a)\cap S(\gamma_b))_{c_1}$
$K(P_a,P_b)=\left|\bigcup_{c\in C}(S(\tau_a)\cap S(\tau_b))_c\times(S(\gamma_a)\cap S(\gamma_b))_c\right|$
77
Kernel on FOR feature spaces
$K(P_1,P_2)=\left|\bigcup_{c\in C}(S(\tau_1)\cap S(\tau_2))_c\times(S(\gamma_1)\cap S(\gamma_2))_c\right|$
This general equation can be computed using:
1) $K_S$, the kernel function for trees introduced in (Duffy & Collins, 2001) and refined in (Moschitti & Zanzotto, 2007)
2) the inclusion–exclusion principle
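For reference, a compact version of the classic subset-tree kernel recursion, which counts the tree fragments shared by two parse trees. Trees are encoded as nested tuples `(label, child, …)` with words as plain strings; this encoding and the decay parameter `lam` are assumptions of the sketch. It is the plain tree kernel only, without the placeholder constraints or the refinements of (Moschitti & Zanzotto, 2007).

```python
def production(node):
    """Label of a node plus the labels of its children (leaves are plain strings)."""
    return node[0], tuple(c if isinstance(c, str) else c[0] for c in node[1:])


def is_preterminal(node):
    return all(isinstance(c, str) for c in node[1:])


def internal_nodes(tree):
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)


def delta(n1, n2, lam):
    """Weighted number of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    if is_preterminal(n1):
        return lam
    result = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # words are already matched by the production test
            result *= 1.0 + delta(c1, c2, lam)
    return result


def tree_kernel(t1, t2, lam=1.0):
    """K_S(t1, t2): sum of delta over all pairs of internal nodes."""
    return sum(delta(a, b, lam) for a in internal_nodes(t1) for b in internal_nodes(t2))


t_dog = ("NP", ("D", "a"), ("N", "dog"))
t_cat = ("NP", ("D", "a"), ("N", "cat"))
print(tree_kernel(t_dog, t_dog), tree_kernel(t_dog, t_cat))  # 6.0 3.0
```

With `lam=1.0`, the 6 fragments of `(NP (D a) (N dog))` are: `(D a)`, `(N dog)`, `(NP D N)`, `(NP (D a) N)`, `(NP D (N dog))`, and the whole tree.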
78
Computational Efficiency Analysis
Comparison kernel: (Zanzotto & Moschitti, Coling-ACL 2006), (Moschitti & Zanzotto, ICML 2007)
Test-bed corpus: the Recognizing Textual Entailment challenge data
79
Computational Efficiency Analysis
[Figure: execution time in seconds (s) over the whole RTE2 data set for different numbers of allowed placeholders]
80
Accuracy Comparison
Training: RTE 1, 2, 3
Testing: RTE 4
81
Our strategy so far…
Yet the lexicalized part of the first-order rule feature space is sparse.
We have therefore started to look at how to include distributional semantics in our kernels (Mehdad, Moschitti, Zanzotto, NAACL 2010).
[Figure: the rule feed X Y ⇒ X eat Y]
82
Outline
– Recognizing Textual Entailment (RTE): Problem definition
– Systems and Approaches for RTE
– Supervised Machine Learning Methods for RTE
– Semi-supervised Knowledge Induction for RTE
83
Semi-supervised Knowledge Induction
– Acquisition of explicit knowledge: learning lexical knowledge or rules
– Acquisition of implicit knowledge: acquiring corpora for supervised machine learning models
85
Acquisition of Explicit Knowledge: the questions we need to answer
– What? What do we want to learn? Which resources do we need?
– Using what? Which principles do we have?
– How? How do we organize the “knowledge acquisition” algorithm?
86
Acquisition of Explicit Knowledge: what?
Types of knowledge:
Equivalence
– Co-hyponymy between words: cat ⇔ dog
– Synonymy between words: buy ⇔ acquire
– Sentence prototypes (paraphrasing): X bought Y ⇔ X acquired Z% of Y’s shares
Oriented semantic relations
– Words: cat ⇒ animal, buy ⇒ own, wheel partof car
– Sentence prototypes: X acquired Z% of Y’s shares ⇒ X owns Y
87
Acquisition of Explicit Knowledge: using what?
Underlying hypotheses:
– Harris’ Distributional Hypothesis (DH) (Harris, 1964): “Words that tend to occur in the same contexts tend to have similar meanings.” Formally, $\mathit{sim}(w_1,w_2) \approx \mathit{sim}(C(w_1),C(w_2))$.
– Robison’s Point-wise Assertion Patterns (PAP) (Robison, 1970): “It is possible to extract relevant semantic relations with some pattern.” $w_1$ is in a relation $r$ with $w_2$ if the context matches $\mathit{pattern}(w_1,w_2)$.
88
Distributional Hypothesis (DH)
[Figure: words or forms ($w_1$ = constitute, $w_2$ = compose) mapped to their contexts $C(w_1)$, $C(w_2)$ in the context (feature) space, with $\mathit{sim}_w(w_1,w_2) \approx \mathit{sim}_{ctx}(C(w_1),C(w_2))$]
Corpus (source of contexts):
… sun is constituted of hydrogen …
… The Sun is composed of hydrogen …
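A minimal sketch of the distributional hypothesis on the two corpus lines above, assuming context vectors of word counts in a ±2-token window and cosine as the similarity function (both common choices, not prescribed by the slide). Lemmatization is omitted, so the surface forms “constituted” and “composed” stand in for “constitute” and “compose”.

```python
import math
import re
from collections import Counter


def context_vector(target, corpus, window=2):
    """C(w): counts of the words within `window` tokens of each occurrence of target."""
    vec = Counter()
    for sentence in corpus:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                vec.update(tokens[j] for j in range(lo, hi) if j != i)
    return vec


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


corpus = ["... sun is constituted of hydrogen ...", "...The Sun is composed of hydrogen ..."]
c1 = context_vector("constituted", corpus)
c2 = context_vector("composed", corpus)
print(cosine(c1, c2))  # 1.0: both words occur with the same contexts
```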
89
Point-wise Assertion Patterns (PAP)
$w_1$ is in a relation $r$ with $w_2$ if the contexts match $\mathit{patterns}_r(w_1,w_2)$:
– relation: $w_1$ part_of $w_2$
– patterns: “$w_1$ is constituted of $w_2$”, “$w_1$ is composed of $w_2$”
Corpus (source of contexts):
… sun is constituted of hydrogen …
… The Sun is composed of hydrogen …
⇒ part_of(sun, hydrogen)
A statistical indicator $S_{corpus}(w_1,w_2)$ selects correct vs incorrect relations among words.
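A sketch of pattern-based extraction on the same corpus lines. The two regular expressions mirror the slide’s patterns; using a raw frequency threshold as the statistical indicator $S_{corpus}$ is a stand-in assumption (real systems use measures such as PMI). The extracted pairs follow the slot order of the patterns, $(w_1, w_2)$.

```python
import re
from collections import Counter

# Hypothetical surface patterns for the relation, taken from the slide's examples.
PATTERNS = [re.compile(r"\b([a-z]+) is constituted of ([a-z]+)"),
            re.compile(r"\b([a-z]+) is composed of ([a-z]+)")]


def extract_pairs(corpus):
    """Count (w1, w2) pairs that fill the pattern slots anywhere in the corpus."""
    counts = Counter()
    for sentence in corpus:
        for pattern in PATTERNS:
            counts.update(pattern.findall(sentence.lower()))
    return counts


def select(counts, min_count=2):
    """A stand-in statistical indicator: keep pairs seen at least min_count times."""
    return {pair for pair, n in counts.items() if n >= min_count}


corpus = ["... sun is constituted of hydrogen ...", "...The Sun is composed of hydrogen ..."]
print(select(extract_pairs(corpus)))  # {('sun', 'hydrogen')}
```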
90
DH and PAP cooperate
[Figure: words or forms ($w_1$ = constitute, $w_2$ = compose) and their contexts $C(w_1)$, $C(w_2)$ in the context (feature) space; the Distributional Hypothesis and Point-wise Assertion Patterns both draw on the same corpus]
Corpus (source of contexts):
… sun is constituted of hydrogen …
… The Sun is composed of hydrogen …
91
Knowledge Acquisition: where do methods differ?
On the “word” side:
– Target equivalence classes: concepts or relations
– Target forms: words or expressions
On the “context” side:
– Feature space
– Similarity function
[Figure: words $w_1$ = cat and $w_2$ = dog mapped to their contexts $C(w_1)$, $C(w_2)$ in the context (feature) space]
92
KA4TE: a first classification of some methods
Dimensions: types of knowledge (equivalence, oriented relations) and underlying hypothesis (Distributional Hypothesis, Point-wise Assertion Patterns).
Methods placed in this classification:
– ISA patterns (Hearst, 1992)
– Verb Entailment (Zanzotto et al., 2006)
– Concept Learning (Lin & Pantel, 2001a)
– Inference Rules, DIRT (Lin & Pantel, 2001b)
– Relation Pattern Learning, ESPRESSO (Pantel & Pennacchiotti, 2006)
– Noun Entailment (Geffet & Dagan, 2005)
93
Noun Entailment Relation
Type of knowledge: oriented relations
Underlying hypothesis: distributional hypothesis
Main idea: the distributional inclusion hypothesis (Geffet & Dagan, 2006): $w_1 \Rightarrow w_2$ if all the prominent features of $w_1$ occur with $w_2$ in a sufficiently large corpus, i.e. $I(C(w_1)) \subseteq I(C(w_2))$
[Figure: context vectors $C(w_1)$ and $C(w_2)$ in the context (feature) space, with the prominent features of $w_1$ contained among those of $w_2$]
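The inclusion test can be sketched as follows, assuming that the “prominent features” $I(C(w))$ are simply the top-$k$ context features by weight and that the weights are already given. The feature weights are hypothetical numbers, and $k=3$ is arbitrary. The test is asymmetric, as an oriented relation should be.

```python
def prominent_features(weights, k):
    """I(C(w)): the k highest-weighted context features (ties broken by name)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return {feature for feature, _ in ranked[:k]}


def entails(w1_weights, w2_weights, k=3):
    """w1 => w2 if every prominent feature of w1 also occurs with w2."""
    return prominent_features(w1_weights, k) <= set(w2_weights)


# Hypothetical feature weights for two nouns.
dog = {"bark": 5.0, "pet": 4.0, "feed": 3.0, "leash": 1.0}
animal = {"wild": 4.0, "pet": 3.0, "zoo": 2.5, "feed": 2.0, "bark": 1.0}
print(entails(dog, animal), entails(animal, dog))  # True False
```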
94
Verb Entailment Relations
Type of knowledge: oriented relations
Underlying hypothesis: point-wise assertion patterns
Main idea: win ⇒ play? “player wins”!
– Relation: $v_1 \Rightarrow v_2$
– Patterns: “agentive_nominalization($v_2$) $v_1$”
– Statistical indicator $S(v_1,v_2)$: point-wise mutual information
(Zanzotto, Pennacchiotti, Pazienza, 2006)
Zanzotto, F. M.; Pennacchiotti, M. & Pazienza, M. T. Discovering asymmetric entailment relations between verbs using selectional preferences, Coling-ACL, 2006
95
Verb Entailment Relations: understanding the idea. Selectional restriction: fly(x) → has_wings(x); in general, v(x) → c(x) (if x is the subject of v, then x has the property c). Agentive nominalization: “an agentive noun is the doer or the performer of an action v′”. “X is a player” may be read as play(x). Hence c(x) is clearly v′(x) if the property c is derived from v′ by an agentive nominalization (Zanzotto, Pennacchiotti, Pazienza, 2006). Zanzotto, F. M.; Pennacchiotti, M. & Pazienza, M. T. Discovering asymmetric entailment relations between verbs using selectional preferences, Coling-ACL, 2006
96
Verb Entailment Relations: understanding the idea. Given the expression “player wins”: seen as a selectional restriction, win(x) → play(x); seen as a selectional preference, P(play(x)|win(x)) > P(play(x)). Zanzotto, F. M.; Pennacchiotti, M. & Pazienza, M. T. Discovering asymmetric entailment relations between verbs using selectional preferences, Coling-ACL, 2006
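Using P(a | b) = P(a, b) / P(b), and assuming P(win(x)) > 0 and P(play(x)) > 0, the selectional-preference condition is equivalent to a positive point-wise mutual information between win(x) and play(x). This is what links it to a PMI-based indicator S(v1, v2):

```latex
P(\mathit{play}(x)\mid \mathit{win}(x)) > P(\mathit{play}(x))
\iff \frac{P(\mathit{play}(x), \mathit{win}(x))}{P(\mathit{win}(x))} > P(\mathit{play}(x))
\iff \log \frac{P(\mathit{play}(x), \mathit{win}(x))}{P(\mathit{play}(x))\,P(\mathit{win}(x))} > 0
```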
97
Knowledge Acquisition for TE: how? The algorithmic nature of a DH+PAP method: Direct (starting point: the target words); Indirect (starting point: the context feature space); Iterative (interplay between the context feature space and the target words).
98
Direct Algorithm. (Figure: w1 = cat and w2 = dog are mapped to their contexts C(w1) and C(w2); sim(w1, w2) is estimated as sim(C(w1), C(w2)), or as sim(I(C(w1)), I(C(w2))) on the selected features I(C(·)).) 1. Select target words wi from the corpus or from a dictionary. 2. Retrieve the contexts of each wi and represent them in the feature space as C(wi). 3. For each pair (wi, wj): (3.1) compute the similarity sim(C(wi), C(wj)) in the context space; (3.2) if sim(wi, wj) = sim(C(wi), C(wj)) > t, then wi and wj belong to the same equivalence class W.
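A minimal sketch of the direct algorithm follows. It assumes cosine as sim(·,·), a threshold here called t, and hypothetical toy context vectors. Union-find turns the pairwise decisions of step 3 into classes, so each class is the transitive closure of the above-threshold pairs; that closure is a choice of this sketch, not something the slide specifies.

```python
from itertools import combinations
from math import sqrt


def cosine(c1, c2):
    """Cosine similarity between sparse vectors stored as dicts."""
    dot = sum(v * c2.get(f, 0) for f, v in c1.items())
    n1 = sqrt(sum(v * v for v in c1.values()))
    n2 = sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def direct_classes(contexts, t):
    """Compare every pair of target words and merge the pairs whose context similarity exceeds t.

    `contexts` maps each target word w_i to its context vector C(w_i).
    """
    parent = {w: w for w in contexts}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for wi, wj in combinations(sorted(contexts), 2):
        if cosine(contexts[wi], contexts[wj]) > t:
            parent[find(wi)] = find(wj)

    classes = {}
    for w in contexts:
        classes.setdefault(find(w), set()).add(w)
    return sorted(classes.values(), key=sorted)


contexts = {  # hypothetical context vectors
    "cat": {"pet": 3, "fur": 2, "meow": 1},
    "dog": {"pet": 3, "fur": 2, "bark": 1},
    "car": {"drive": 3, "wheel": 2},
}
classes = direct_classes(contexts, t=0.5)  # [{'car'}, {'cat', 'dog'}]
```

Comparing all pairs is quadratic in the number of target words. This is one reason the indirect and iterative variants start from contexts or from a seed class instead.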
99
Indirect Algorithm. 1. Given an equivalence class W, select relevant contexts and represent them in the feature space. 2. Retrieve the target words (w1, …, wn) that appear in these contexts; these are likely to be words in the equivalence class W. 3. Optionally, for each wi, retrieve C(wi) from the corpus. 4. Compute the centroid I(C(W)). 5. For each wi, if sim(I(C(W)), C(wi)) < t, eliminate wi from W. (Figure: the same words-to-contexts diagram, with w1 = cat and w2 = dog.)
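A minimal sketch of the indirect algorithm follows. It assumes that contexts are pattern strings such as “X is a pet”, that sim is cosine, and that the corpus statistics are a hypothetical toy table. The optional step 3 is folded in by looking up each candidate's full context vector.

```python
from collections import Counter
from math import sqrt


def cosine(c1, c2):
    dot = sum(v * c2.get(f, 0) for f, v in c1.items())
    n1 = sqrt(sum(v * v for v in c1.values()))
    n2 = sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def centroid(vectors):
    """I(C(W)): the average of the members' context vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the counts of a mapping
    return {f: c / len(vectors) for f, c in total.items()}


def indirect_class(seed_contexts, word_contexts, t):
    # Steps 1-2: words that appear in at least one of the selected contexts
    candidates = [w for w, ctx in word_contexts.items() if ctx.keys() & seed_contexts]
    if not candidates:
        return set()
    # Steps 3-4: use each candidate's contexts C(w_i) and compute the centroid
    center = centroid([word_contexts[w] for w in candidates])
    # Step 5: eliminate w_i when sim(I(C(W)), C(w_i)) < t
    return {w for w in candidates if cosine(center, word_contexts[w]) >= t}


word_contexts = {  # hypothetical corpus statistics: word -> {context pattern: count}
    "cat": {"X is a pet": 2, "X has fur": 3},
    "dog": {"X is a pet": 3, "X has fur": 2},
    "rock": {"X is a pet": 1, "X is heavy": 1},
    "car": {"X has wheels": 4},
}
W = indirect_class({"X is a pet"}, word_contexts, t=0.8)  # {'cat', 'dog'}
```

“rock” is retrieved by the context “X is a pet” in step 2. It is then eliminated in step 5 because its contexts are far from the centroid, which shows why the filtering step matters.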
100
Iterative Algorithm. 1. For each word wi in the equivalence class W, retrieve its contexts C(wi) and represent them in the feature space. 2. Extract the words wj that have contexts similar to C(wi). 3. Extract the contexts C(wj) of these new words. 4. For each new word wj, if sim(C(W), C(wj)) > t, put wj in W. (Figure: the same words-to-contexts diagram, with w1 = cat and w2 = dog.)
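A minimal sketch of the iterative algorithm follows. It assumes cosine similarity, a centroid as the class representation C(W), a threshold here called t, and hypothetical toy vectors. In this toy run, wolf joins only in the second round, after dog has moved the centroid towards “bark”.

```python
from collections import Counter
from math import sqrt


def cosine(c1, c2):
    dot = sum(v * c2.get(f, 0) for f, v in c1.items())
    n1 = sqrt(sum(v * v for v in c1.values()))
    n2 = sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return {f: c / len(vectors) for f, c in total.items()}


def iterative_class(seeds, word_contexts, t, max_iter=10):
    members = set(seeds)
    for _ in range(max_iter):
        # Step 1: contexts of the current class, summarised here as a centroid C(W)
        class_ctx = centroid([word_contexts[w] for w in members])
        # Step 2: new words sharing at least one context with the class
        candidates = {w for w, ctx in word_contexts.items()
                      if w not in members and ctx.keys() & class_ctx.keys()}
        # Steps 3-4: keep the candidates whose contexts C(w_j) are similar enough to C(W)
        added = {w for w in candidates if cosine(class_ctx, word_contexts[w]) > t}
        if not added:
            break  # fixed point reached
        members |= added
    return members


word_contexts = {  # hypothetical context vectors
    "cat": {"pet": 2, "fur": 2},
    "dog": {"pet": 1, "fur": 1, "bark": 1},
    "wolf": {"fur": 2, "bark": 2},
    "car": {"wheel": 3},
}
W = iterative_class({"cat"}, word_contexts, t=0.6)  # {'cat', 'dog', 'wolf'}
```

The same mechanism that lets the class grow can also make it drift. This is why iterative methods such as Espresso rank and select patterns and instances at each round.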
101
Knowledge Acquisition using DH and PAH. Direct algorithms: concepts from text via clustering (Lin&Pantel, 2001); inference rules, aka DIRT (Lin&Pantel, 2001); … Indirect algorithms: Hearst’s ISA patterns (Hearst, 1992); question answering patterns (Ravichandran&Hovy, 2002); … Iterative algorithms: entailment rules from the Web, aka TEASE (Szpektor et al., 2004); Espresso (Pantel&Pennacchiotti, 2006); …
102
TEASE. Type: iterative algorithm. On the “word” side: target equivalence classes are fine-grained relations, e.g. prevent(X,Y); target forms are verbs with arguments. On the “context” side: feature space X_{filler}:mi?, Y_{filler}:mi?. (Figure: a dependency template in which call has X as subj, Y as obj, and indictable and finally as mod.) (Szpektor et al., 2004) Idan Szpektor, Hristo Tanev, Ido Dagan and Bonaventura Coppola. 2004. Scaling Web-based Acquisition of Entailment Relations. In Proceedings of EMNLP 2004.
103
TEASE. (Figure: the Web and a lexicon feed two iterated steps, Anchor Set Extraction (ASE) and Template Extraction (TE).) Input template: X subj-accuse-obj Y. Sample corpus for the input template: “Paula Jones accused Clinton…”, “BBC accused Blair…”, “Sanhedrin accused St.Paul…”, … Anchor sets: {Paula Jones: subj; Clinton: obj}, {Sanhedrin: subj; St.Paul: obj}, … Sample corpus for the anchor sets: “Paula Jones called Clinton indictable…”, “St.Paul defended before the Sanhedrin…” Templates: X call Y indictable; Y defend before X; … ASE and TE iterate (Szpektor et al., 2004). Idan Szpektor, Hristo Tanev, Ido Dagan and Bonaventura Coppola. 2004. Scaling Web-based Acquisition of Entailment Relations. In Proceedings of EMNLP 2004.
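A toy sketch of the ASE/TE loop on surface strings follows. Real TEASE works on dependency templates retrieved from the Web. This sketch simplifies in three ways: sentences are pre-tokenized, multi-word names are joined with “_”, and a template is just the token span between the two anchors. All function names are hypothetical.

```python
def match(template, tokens):
    """Yield (X, Y) anchor pairs wherever the token template occurs contiguously in `tokens`."""
    n = len(template)
    for start in range(len(tokens) - n + 1):
        binding = {}
        for slot, tok in zip(template, tokens[start:start + n]):
            if slot in ("X", "Y"):
                binding[slot] = tok
            elif slot != tok:
                break
        else:  # no mismatch: the whole template matched
            yield binding["X"], binding["Y"]


def extract_templates(anchors, corpus):
    """TE step: the span between the two anchors, with the anchors replaced by X and Y."""
    templates = set()
    for tokens in corpus:
        for x, y in anchors:
            if x != y and x in tokens and y in tokens:
                i, j = tokens.index(x), tokens.index(y)
                span = tokens[min(i, j):max(i, j) + 1]
                templates.add(tuple("X" if t == x else "Y" if t == y else t for t in span))
    return templates


def tease(seed_template, corpus, rounds=1):
    templates = {tuple(seed_template)}
    for _ in range(rounds):
        # ASE step: anchor sets found by the current templates
        anchors = {a for tpl in templates for tokens in corpus for a in match(tpl, tokens)}
        templates |= extract_templates(anchors, corpus)
    return templates


corpus = [s.split() for s in [  # hypothetical, pre-tokenized snippets
    "Paula_Jones accused Clinton of harassment",
    "Sanhedrin accused St.Paul",
    "Paula_Jones called Clinton indictable",
    "St.Paul defended before the Sanhedrin",
]]
learned = tease(["X", "accused", "Y"], corpus)
```

Note that the surface span yields “X called Y” rather than “X call Y indictable”. Recovering the full template is exactly what TEASE's dependency-based representation is for.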
104
TEASE. Innovations with respect to research before 2004: the first direct algorithm for extracting rules; feature selection is performed to assess the most informative features; extracted forms are clustered to obtain the most general sentence prototype of a given set of equivalent forms (Szpektor et al., 2004). (Figure: two parsed forms, S1 with nodes call, indictable, subj, obj, mod, X, Y, harassment, for, each tagged {1}, and S2 with nodes call, indictable, subj, obj, mod, X, Y, finally, mod, each tagged {2}, are merged into one structure in which the shared nodes call, indictable, subj, obj, mod, X, Y are tagged {1,2}, while harassment and for keep {1} and finally and mod keep {2}.) Idan Szpektor, Hristo Tanev, Ido Dagan and Bonaventura Coppola. 2004. Scaling Web-based Acquisition of Entailment Relations. In Proceedings of EMNLP 2004.
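A sketch of the merging step follows. It assumes each form is a set of hypothetical (head, relation, dependent) dependency edges. The edge labels, such as the “for” edge, are our approximation of the figure, not TEASE's actual representation.

```python
def merge_forms(forms):
    """Merge the edge sets of equivalent forms, tagging each edge with the ids of the forms containing it."""
    merged = {}
    for form_id, edges in forms.items():
        for edge in edges:
            merged.setdefault(edge, set()).add(form_id)
    return merged


def prototype(merged, form_ids):
    """Edges shared by all forms: the most general common structure."""
    return {edge for edge, ids in merged.items() if ids == set(form_ids)}


shared = {("call", "subj", "X"), ("call", "obj", "Y"), ("call", "mod", "indictable")}
forms = {
    1: shared | {("indictable", "for", "harassment")},  # S1 (hypothetical edges)
    2: shared | {("call", "mod", "finally")},           # S2 (hypothetical edges)
}
merged = merge_forms(forms)
general = prototype(merged, forms)  # the three shared edges, tagged {1, 2}
```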
105
Espresso. Type: iterative algorithm. On the “word” side: target equivalence classes are relations, e.g. compose(X,Y); target forms are expressions, i.e. sequences of tokens such as “Y is composed by X” and “Y is made of X” (Pantel&Pennacchiotti, 2006). Patrick Pantel, Marco Pennacchiotti. Espresso: A Bootstrapping Algorithm for Automatically Harvesting Semantic Relations. In Proceedings of COLING/ACL-06, 2006
106
Espresso. (Figure: one bootstrapping iteration.) Seed instances: (leader, panel), (city, region), (oxygen, water). Extracted patterns: “Y is composed by X”, “X, Y”, “Y is part of X”, ranked as 1.0 “Y is composed by X”, 0.8 “Y is part of X”, 0.2 “X, Y”. Extracted instances: (tree, land), (oxygen, hydrogen), (atom, molecule), (leader, panel), (range of information, FBI report), (artifact, exhibit), …, ranked as 1.0 (tree, land), 0.9 (atom, molecule), 0.7 (leader, panel), 0.6 (range of information, FBI report), 0.6 (artifact, exhibit), 0.2 (oxygen, hydrogen) (Pantel&Pennacchiotti, 2006). Patrick Pantel, Marco Pennacchiotti. Espresso: A Bootstrapping Algorithm for Automatically Harvesting Semantic Relations. In Proceedings of COLING/ACL-06, 2006
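A sketch of how pattern and instance reliabilities can feed each other follows. It assumes an Espresso-style recursion:

- r_π(p) = Σ_i (pmi(i, p) / max_pmi) · r_ι(i) / |I|
- r_ι(i) = Σ_p (pmi(i, p) / max_pmi) · r_π(p) / |P|

This is our reading of Pantel&Pennacchiotti (2006); check the paper before relying on it. The PMI values are invented so that the resulting ranking mirrors the figure: the specific pattern beats “X, Y”, and (atom, molecule) beats (oxygen, hydrogen).

```python
def pattern_reliability(pmi, instance_rel, patterns):
    """r_pi(p): PMI with each trusted instance (normalised by max_pmi), weighted by its reliability, averaged over I."""
    max_pmi = max(pmi.values())
    return {
        p: sum(pmi.get((i, p), 0.0) / max_pmi * r for i, r in instance_rel.items()) / len(instance_rel)
        for p in patterns
    }


def instance_reliability(pmi, pattern_rel, instances):
    """r_iota(i): the same computation, with the roles of patterns and instances swapped."""
    max_pmi = max(pmi.values())
    return {
        i: sum(pmi.get((i, p), 0.0) / max_pmi * r for p, r in pattern_rel.items()) / len(pattern_rel)
        for i in instances
    }


COMPOSED, COMMA = "Y is composed by X", "X, Y"
pmi = {  # hypothetical PMI values between (X, Y) instances and patterns
    (("leader", "panel"), COMPOSED): 4.0,
    (("oxygen", "water"), COMPOSED): 4.0,
    (("leader", "panel"), COMMA): 1.0,
    (("oxygen", "water"), COMMA): 1.0,
    (("atom", "molecule"), COMPOSED): 4.0,
    (("atom", "molecule"), COMMA): 1.0,
    (("oxygen", "hydrogen"), COMMA): 2.0,
}
seeds = {("leader", "panel"): 1.0, ("oxygen", "water"): 1.0}  # seed instances are fully trusted
r_pattern = pattern_reliability(pmi, seeds, [COMPOSED, COMMA])  # {COMPOSED: 1.0, COMMA: 0.25}
r_instance = instance_reliability(pmi, r_pattern, [("atom", "molecule"), ("oxygen", "hydrogen")])
# {("atom", "molecule"): 0.53125, ("oxygen", "hydrogen"): 0.0625}
```

A generic pattern like “X, Y” co-occurs weakly with the seeds, so it receives a low reliability. Instances supported only by that pattern inherit the low score.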
107
Espresso. Innovations with respect to research before 2006: a measure to distinguish specific from general patterns (a ranking of the equivalent forms); both pattern and instance selection are performed; general and specific patterns are used differently in the iterative algorithm (Pantel&Pennacchiotti, 2006). (Figure: pattern ranking 1.0 “Y is composed by X”, 0.8 “Y is part of X”, 0.2 “X, Y”.) Patrick Pantel, Marco Pennacchiotti. Espresso: A Bootstrapping Algorithm for Automatically Harvesting Semantic Relations. In Proceedings of COLING/ACL-06, 2006
108
Structure & Lexico-Syntactic Patterns. Observation: Distributional Models (DH) target hyperonymy (IS_A) and cotopy (similarity); as for structural properties, transitivity is implicitly exploited. Lexico-Syntactic Pattern Models (LSP) target all possible semantic relations; as for structural properties, transitivity is NOT exploited. Fallucchi, F. & Zanzotto, F. M. Inductive Probabilistic Taxonomy Learning using Singular Value Decomposition, NATURAL LANGUAGE ENGINEERING, 2011
109
Structure & Lexico-Syntactic Patterns. Exploiting transitivity within lexico-syntactic pattern models: the target relations are all possible semantic relations, and transitivity is effectively exploited. We exploit structural properties of the target relations to determine the probability of a relation; in particular, we focus on transitivity to reinforce or lower that probability. Fallucchi, F. & Zanzotto, F. M. Inductive Probabilistic Taxonomy Learning using Singular Value Decomposition, NATURAL LANGUAGE ENGINEERING, 2011
110
Structure & Lexico-Syntactic Patterns. (Figure: isa relations among cat, mammal and animal; direct probabilities from corpus observation (E) with lexico-syntactic patterns: 0.7, 0.8 and 0.2; induced probability: 0.648.) Fallucchi, F. & Zanzotto, F. M. Inductive Probabilistic Taxonomy Learning using Singular Value Decomposition, NATURAL LANGUAGE ENGINEERING, 2011
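The numbers in the figure are consistent with a noisy-or combination under one reading of the edges. Assume 0.7 = P(cat isa mammal), 0.8 = P(mammal isa animal), and 0.2 = the direct estimate of P(cat isa animal). Then:

1 - (1 - 0.2)(1 - 0.7 × 0.8) = 1 - 0.8 × 0.44 = 0.648

The sketch below only reproduces that arithmetic. It is not the SVD-based induction of Fallucchi & Zanzotto (2011), and the edge assignment is our reading of the figure.

```python
def induced_probability(direct, path):
    """Noisy-or of the direct evidence and of one transitive path.

    The path contributes the product of its edge probabilities (an assumption of this sketch).
    """
    through_path = 1.0
    for p in path:
        through_path *= p
    return 1.0 - (1.0 - direct) * (1.0 - through_path)


p_cat_animal = induced_probability(0.2, [0.7, 0.8])  # ≈ 0.648
```

Under this scheme transitivity can only reinforce the direct estimate. Lowering a probability, as the slides also mention, would need an additional mechanism that is not shown here.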
111
Structure & Lexico-Syntactic Patterns. (Figure: nodes lettuce (i), food (j), vegetable (k1) and animal (k2).) Fallucchi, F. & Zanzotto, F. M. Inductive Probabilistic Taxonomy Learning using Singular Value Decomposition, NATURAL LANGUAGE ENGINEERING, 2011
112
Semi-supervised Knowledge Induction: Acquisition of Explicit Knowledge (learning lexical knowledge or rules); Acquisition of Implicit Knowledge (acquiring corpora for supervised machine learning models).
113
Acquisition of Implicit Knowledge. The questions we need to answer: What? (What do we want to learn? Which resources do we need?) Using what? (Which principles do we have?)
114
Acquisition of Implicit Knowledge: what? Types of knowledge: equivalence, i.e. near synonymy between sentences (“Acme Inc. bought Goofy ltd.” / “Acme Inc. acquired 11% of Goofy ltd.’s shares”); oriented semantic relations, i.e. entailment between sentences (“Acme Inc. acquired 11% of Goofy ltd.’s shares” / “Acme Inc. owns Goofy ltd.”). Note: tricky not-entailment pairs are ALSO relevant.
115
Acquisition of Implicit Knowledge: using what? Underlying hypotheses: structural and content similarity (“Sentences are similar if they share enough content”), and revised point-wise assertion patterns (“Some patterns of sentences reveal relations among sentences”). sim(s1, s2) is estimated according to the relations between s1 and s2.
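One simple way to instantiate “sentences are similar if they share enough content” is token overlap. The sketch below assumes Jaccard similarity over lower-cased word tokens and reuses the Acme example. The choice of measure is ours; the slide does not prescribe one.

```python
import re


def tokens(sentence):
    """Lower-cased word tokens; punctuation is dropped."""
    return set(re.findall(r"\w+", sentence.lower()))


def content_similarity(s1, s2):
    """Jaccard overlap of the token sets: |s1 ∩ s2| / |s1 ∪ s2|."""
    a, b = tokens(s1), tokens(s2)
    return len(a & b) / len(a | b) if a | b else 0.0


sim = content_similarity("Acme Inc. bought Goofy ltd.",
                         "Acme Inc. acquired 11% of Goofy ltd.'s shares")  # 4 shared / 10 total = 0.4
```

A symmetric overlap score like this fits equivalence (near synonymy). It says nothing about the direction of entailment, which is why the oriented relations rely on patterns instead.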
116
A first classification of some methods, by type of knowledge and underlying hypothesis. Equivalence, from structural and content similarity: Paraphrase Corpus (Dolan&Quirk, 2004). Oriented relations (entails), from revised point-wise assertion patterns: relations among sentences (Burger&Ferro, 2005). Oriented relations (not entails), from revised point-wise assertion patterns: relations among sentences (Hickl et al., 2006). Oriented relations (entails and not entails), from revised point-wise assertion patterns: Wikipedia revisions (Zanzotto&Pennacchiotti, 2010).
117
Entailment relations among sentences. Type of knowledge: oriented relations (entailment). Underlying hypothesis: revised point-wise assertion patterns. Main idea: in headline news items, the first sentence or paragraph generally entails the title (Burger&Ferro, 2005). Relation: s2 ⇒ s1. Pattern: “News Item Title(s1) First_Sentence(s2)”. This pattern works on the structure of the text.
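A minimal sketch of the “News Item Title(s1) First_Sentence(s2)” pattern follows. It assumes news items are available as (title, body) strings and uses a naive regex sentence splitter; abbreviations such as “Inc.” would break that splitter. The body reuses the Chrysler example, and the trailing “(rest of the article)” is a placeholder.

```python
import re


def first_sentence(text):
    """Naive splitter: everything up to the first '.', '!' or '?' that is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def headline_pairs(news_items):
    """Candidate (T, H) pairs: T = first sentence of the body, H = the title."""
    return [(first_sentence(body), title) for title, body in news_items]


news = [(
    "Chrysler Group to Be Sold for $7.4 Billion",
    "DaimlerChrysler confirmed today that it would sell a controlling interest in its "
    "struggling Chrysler Group to Cerberus Capital Management of New York, a private "
    "equity firm that specializes in restructuring troubled companies. "
    "(rest of the article)",
)]
pairs = headline_pairs(news)
```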
118
Entailment relations among sentences: examples from the Web. Title: “New York Plan for DNA Data in Most Crimes”. Body: “Eliot Spitzer is proposing a major expansion of New York’s database of DNA samples to include people convicted of most crimes, while making it easier for prisoners to use DNA to try to establish their innocence. …” Title: “Chrysler Group to Be Sold for $7.4 Billion”. Body: “DaimlerChrysler confirmed today that it would sell a controlling interest in its struggling Chrysler Group to Cerberus Capital Management of New York, a private equity firm that specializes in restructuring troubled companies. …”
119
Tricky not-entailment relations among sentences. Type of knowledge: oriented relations (tricky not-entailment). Underlying hypothesis: revised point-wise assertion patterns. Main idea: in a text, sentences sharing a named entity generally do not entail each other; sentences connected by “on the contrary”, “but”, … do not entail each other (Hickl et al., 2006). Relation: s1 ⇏ s2. Patterns: s1 and s2 are in the same text and share at least one named entity; “s1. On the contrary, s2”.
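A minimal sketch of the two patterns follows, with two simplifications. First, capitalised words stand in for named entities. Second, the list of contrastive openers extends the slide's “on the contrary” and “but” with “in contrast” and “however”, which are our additions. The test texts are the two examples from Hickl et al. (2006) quoted in this section.

```python
import re
from itertools import combinations

# Contrastive openers: "on the contrary" and "but" come from the slide; the others are our additions.
CONTRAST = re.compile(r"^(?:on the contrary|in contrast|but|however),?\s", re.IGNORECASE)


def split_sentences(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def named_entities(sentence):
    """Crude proxy for named entities: capitalised words (sentence-initial words included)."""
    return set(re.findall(r"\b[A-Z][a-z]+\b", sentence))


def tricky_negatives(text):
    """Candidate not-entailment pairs (s1, s2) from a single text."""
    sentences = split_sentences(text)
    pairs = set()
    # Pattern "s1. On the contrary, s2": adjacent sentences, the second opened by a contrast marker
    for s1, s2 in zip(sentences, sentences[1:]):
        if CONTRAST.match(s2):
            pairs.add((s1, s2))
    # Pattern "same text, at least one shared named entity"
    for s1, s2 in combinations(sentences, 2):
        if named_entities(s1) & named_entities(s2):
            pairs.add((s1, s2))
    return pairs


text1 = ("One player losing a close friend is Japanese pitcher Hideki Irabu, who was befriended by "
         "Wells during spring training last year. Irabu said he would take Wells out to dinner "
         "when the Yankees visit Toronto.")
text2 = ("According to the professor, present methods of cleaning up oil slicks are extremely costly "
         "and are never completely efficient. In contrast, he stressed, Clean Mag has a 100 percent "
         "pollution retrieval rate, is low cost and can be recycled.")
negatives = tricky_negatives(text1) | tricky_negatives(text2)
```

Sentence-initial words such as “The” would create spurious shared “entities” in this sketch. A real system would use a named-entity recognizer instead.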
120
Tricky not-entailment relations among sentences: examples from (Hickl et al., 2006). T: “One player losing a close friend is Japanese pitcher Hideki Irabu, who was befriended by Wells during spring training last year.” H: “Irabu said he would take Wells out to dinner when the Yankees visit Toronto.” T: “According to the professor, present methods of cleaning up oil slicks are extremely costly and are never completely efficient.” H: “In contrast, he stressed, Clean Mag has a 100 percent pollution retrieval rate, is low cost and can be recycled.”
121
Wikipedia for Extracting Examples. Wikipedia is an open encyclopedia, where anyone can act as an author, inserting new entries or modifying existing ones. Idea: extract pairs of sentences from the Wikipedia revision system. Hypothesis: given an original entry S1 (a piece of text in Wikipedia before it is modified by an author) and its revision S2 (the modified text), the pairs (S1, S2) extracted from the Wikipedia revision database are good candidates for both positive and negative entailment pairs (T, H) (Zanzotto&Pennacchiotti, 2010).
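A minimal sketch of the extraction step follows. It assumes the two versions of an entry are available as plain text; fetching them from the revision database is not shown. It also uses a naive sentence splitter and Python's `difflib` to align sentences, and only sentences replaced one-to-one become candidate (S1, S2) pairs. The example texts are invented.

```python
import difflib
import re


def split_sentences(text):
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def revision_pairs(original, revision):
    """Candidate (S1, S2) pairs: sentences rewritten one-to-one between two versions of an entry."""
    old, new = split_sentences(original), split_sentences(revision)
    matcher = difflib.SequenceMatcher(a=old, b=new)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and i2 - i1 == j2 - j1:
            pairs.extend(zip(old[i1:i2], new[j1:j2]))
    return pairs


pairs = revision_pairs("Roma played against Milan. The match ended 2-1.",
                       "Roma played against Milan. Roma won the match 2-1.")
```

Labeling each extracted pair as entailment or not-entailment is a separate step. The extraction only supplies the candidates.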
122
Wikipedia for Extracting Examples. Type of knowledge: oriented relations (tricky not-entailment). Underlying hypothesis: revised point-wise assertion patterns. Main idea: (Zanzotto&Pennacchiotti, 2010)
123
Wikipedia for Extracting Examples. Here is an example (Zanzotto&Pennacchiotti, 2010).
124
Nice properties of Wikipedia revisions. Wikipedia revisions are ideal for co-training: given an entry–revision pair (S1, S2), we can define two independent views. The content-pair view: features modeling the actual textual content of (S1, S2). The comment view: features of the comment inserted by the author of the revision S2 (usually, the reason for and an explanation of the changes made) (Zanzotto&Pennacchiotti, 2010).
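A minimal co-training sketch over the two views follows. The per-view classifier is a hypothetical bag-of-words voter, not the learner used by Zanzotto&Pennacchiotti (2010), and the labels, tokens and examples are invented for illustration. In each round, each view labels the unlabeled pair it is most confident about, and the newly labeled pair joins the shared training set.

```python
from collections import Counter, defaultdict


class BagOfWordsVoter:
    """Hypothetical per-view classifier: each token votes for the labels it was seen with."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # token -> Counter(label -> count)

    def fit(self, examples):
        self.counts.clear()
        for tokens, label in examples:
            for tok in tokens:
                self.counts[tok][label] += 1

    def predict(self, tokens):
        """Return (label, confidence); confidence is the winning share of the votes."""
        votes = Counter()
        for tok in tokens:
            votes.update(self.counts.get(tok, {}))
        if not votes:
            return None, 0.0
        label, top = votes.most_common(1)[0]
        return label, top / sum(votes.values())


def co_train(labeled, unlabeled, rounds=2):
    """labeled: [((content_tokens, comment_tokens), label)]; unlabeled: [(content_tokens, comment_tokens)]."""
    labeled, pool = list(labeled), list(unlabeled)
    views = (BagOfWordsVoter(), BagOfWordsVoter())  # 0: content-pair view, 1: comment view
    for _ in range(rounds):
        for v, clf in enumerate(views):
            clf.fit([(x[v], y) for x, y in labeled])
        for v, clf in enumerate(views):
            if not pool:
                break
            # the view picks the unlabeled pair it is most confident about
            (label, _confidence), best = max(
                ((clf.predict(x[v]), x) for x in pool), key=lambda s: s[0][1]
            )
            if label is None:
                continue
            labeled.append((best, label))
            pool.remove(best)
    return labeled


labeled = [
    ((["capital", "city"], ["typo", "fix"]), "entails"),
    ((["won", "lost"], ["vandalism", "revert"]), "not_entails"),
]
unlabeled = [
    (["capital", "town"], ["spelling"]),  # the content view is confident here
    (["score", "goal"], ["typo"]),        # only the comment view knows "typo"
    (["won", "draw"], ["revert"]),
]
result = co_train(labeled, unlabeled)
```

The second unlabeled pair shows the point of having two views. Its content tokens are unknown to the content view, but the comment view can still label it, and that label then enlarges the training data for both views in the next round.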
125
What we have seen: Recognizing Textual Entailment (RTE): problem definition; Systems and approaches for RTE; Supervised machine learning methods for RTE; Semi-supervised knowledge induction for RTE.
126
RTE Resources. Current RTE challenge (http://www.cs.york.ac.uk/semeval-2013/task7/); Textual Entailment Resource Pool (http://aclweb.org/aclwiki/index.php?title=Textual_Entailment_Resource_Pool); book on Recognizing Textual Entailment: I. Dagan, D. Roth, M. Sammons, F.M. Zanzotto, Recognizing Textual Entailment: Models and Applications, Morgan&Claypool Publishers (forthcoming).
127
Learning RTE Systems on Rule Spaces. Initial idea: Zanzotto, F. M. & Moschitti, A. Automatic learning of textual entailments with cross-pair similarities, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, 2006. First refinement of the algorithm: Moschitti, A. & Zanzotto, F. M. Fast and Effective Kernels for Relational Learning from Texts, Proceedings of 24th Annual International Conference on Machine Learning, 2007. Analysis of different feature spaces: Pennacchiotti, M. & Zanzotto, F. M. Learning Shallow Semantic Rules for Textual Entailment, Proceedings of International Conference RANLP - 2007, 2007. A comprehensive description: Zanzotto, F. M.; Pennacchiotti, M. & Moschitti, A. A Machine Learning Approach to Textual Entailment Recognition, NATURAL LANGUAGE ENGINEERING, 2009.
128
Learning RTE Systems on Rule Spaces. Adding distributional semantics: Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010. A valid kernel with an efficient algorithm: Zanzotto, F. M. & Dell'Arciprete, L. Efficient kernels for sentence pair classification, Conference on Empirical Methods in Natural Language Processing, 2009; Zanzotto, F. M.; Dell'Arciprete, L. & Moschitti, A. Efficient Graph Kernels for Textual Entailment Recognition, FUNDAMENTA INFORMATICAE. Applications: Zanzotto, F. M.; Pennacchiotti, M. & Tsioutsiouliklis, K. Linguistic Redundancy in Twitter, Proceedings of 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011. Extracting RTE corpora: Zanzotto, F. M. & Pennacchiotti, M. Expanding textual entailment corpora from Wikipedia using co-training, Proceedings of the COLING-Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2010. Learning verb relations: Zanzotto, F. M.; Pennacchiotti, M. & Pazienza, M. T. Discovering asymmetric entailment relations between verbs using selectional preferences, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, 2006.
129
References [1] Rod Adams. Textual entailment through extended lexical overlap. In Proceedings of the Second PASCAL Challenges Workshop on Recognizing Textual Entailment, 2006. [2] E. Akhmatova. Textual entailment resolution via atomic propositions. In Proceedings of RTE 2005, 2005. [3] R. Bar-Haim, J. Berant, I. Dagan, I. Greental, S. Mirkin, E. Shnarch, and I. Szpektor. Efficient semantic deduction and approximate matching over compact parse forests. In Text Analysis Conference (TAC), 2009. [4] Roy Bar-Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, and Idan Szpektor. The second PASCAL recognising textual entailment challenge. In Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment, Venice, Italy, 2006. [5] Roy Bar-Haim, Ido Dagan, Iddo Greental, and Eyal Shnarch. Semantic inference at the lexical-syntactic level. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI), Vancouver, Canada, July 2007. [6] Roy Bar-Haim, Ido Dagan, Iddo Greental, Idan Szpektor, and Moshe Friedman. Semantic inference at the lexical-syntactic level. In Proceedings of AAAI, pages 131-136, 2007. [7] Roy Bar-Haim, Ido Dagan, Iddo Greental, Idan Szpektor, and Moshe Friedman. Semantic inference at the lexical-syntactic level for textual entailment recognition. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pages 131-136, Prague, June 2007. Association for Computational Linguistics. [8] Roy Bar-Haim, Idan Szpektor, and Oren Glickman. Definition and analysis of intermediate entailment levels. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 55-60. Association for Computational Linguistics, Ann Arbor, Michigan, June 2005. [9] Regina Barzilay and Kathleen McKeown. Extracting paraphrases from a parallel corpus. In Proceedings of the 39th ACL Meeting, Toulouse, France, 2001. [10] Samuel Bayer, John Burger, Lisa Ferro, John Henderson, and Alexander Yeh. MITRE's submissions to the EU PASCAL RTE challenge. In Proceedings of RTE 2005, 2005. [11] Roni Ben Aharon, Idan Szpektor, and Ido Dagan. Generating entailment rules from FrameNet. In Proceedings of the ACL 2010 Conference Short Papers, pages 241-246, Uppsala, Sweden, July 2010. Association for Computational Linguistics. [12] Luisa Bentivogli, Elena Cabrio, Ido Dagan, Danilo Giampiccolo, Medea Lo Leggio, and Bernardo Magnini. Building textual entailment specialized data sets: a methodology for isolating linguistic phenomena relevant to inference. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, LREC. European Language Resources Association, 2010. [13] Luisa Bentivogli, Elena Cabrio, Ido Dagan, Danilo Giampiccolo, Medea Lo Leggio, and Bernardo Magnini. Building textual entailment specialized data sets: a methodology for isolating linguistic phenomena relevant to inference. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10), Valletta, Malta, May 2010. European Language Resources Association (ELRA). [14] Luisa Bentivogli, Peter Clark, Ido Dagan, Hoa T. Dang, and Danilo Giampiccolo. The sixth PASCAL recognizing textual entailment challenge. In The Text Analysis Conference (TAC 2010), 2010. [15] Luisa Bentivogli, Peter Clark, Ido Dagan, Hoa T. Dang, and Danilo Giampiccolo. The seventh PASCAL recognizing textual entailment challenge. In The Text Analysis Conference (TAC 2011), to appear, 2011.
130
References [16] Luisa Bentivogli, Ido Dagan, Hoa T. Dang, Danilo Giampiccolo, and Bernardo Magnini. The fifth PASCAL recognizing textual entailment challenge. In The Text Analysis Conference (TAC 2009), 2009. [17] Jonathan Berant, Ido Dagan, and Jacob Goldberger. Global learning of focused entailment graphs. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1220-1229, Uppsala, Sweden, July 2010. Association for Computational Linguistics. [18] Richard Bergmair. A proposal on evaluation measures for RTE. In Proceedings of the 2009 Workshop on Applied Textual Inference, pages 10-17, Suntec, Singapore, August 2009. Association for Computational Linguistics. [19] C. M. Bishop. Neural networks for pattern recognition. Oxford University Press, Oxford, UK, 1996. [20] Johan Bos and Katja Markert. When logical inference helps determining textual entailment (and when it doesn't). In Proceedings of the Second PASCAL Challenges Workshop on Recognizing Textual Entailment, 2006. [21] R. Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. An inference model for semantic entailment in natural language. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 1678-1679, 2005. [22] C. Brockett. Aligning the RTE 2006 corpus. Technical Report MSR-TR-2007-77, Microsoft Research, 2007. [23] John Burger and Lisa Ferro. Generating an entailment corpus from news headlines. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 49-54. Association for Computational Linguistics, Ann Arbor, Michigan, June 2005. [24] Chris Callison-Burch, Philipp Koehn, and Miles Osborne. Improved statistical machine translation using paraphrases. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 17-24, New York City, USA, June 2006. Association for Computational Linguistics. [25] Jean Carletta. Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2):249-254, 1996. [26] Bob Carpenter. The Logic of Typed Feature Structures. Cambridge University Press, Cambridge, England, 1992. [27] Xavier Carreras and Lluís Màrquez. Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), pages 152-164, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics. [28] Asli Celikyilmaz, Marcus Thint, and Zhiheng Huang. A graph-based semi-supervised learning for question-answering. In Proc. of the Annual Meeting of the ACL, pages 719-727, Suntec, Singapore, August 2009. Association for Computational Linguistics. [29] M. Chang, D. Goldwasser, D. Roth, and V. Srikumar. Discriminative learning over constrained latent representations. In Proc. of the Annual Meeting of the North American Association of Computational Linguistics (NAACL), June 2010. [30] M. Chang, L. Ratinov, and D. Roth. Constraints as prior knowledge. In ICML Workshop on Prior Knowledge for Text and Language Processing, pages 32-39, July 2008.
131
References [31] M. Chang, V. Srikumar, D. Goldwasser, and D. Roth. Structured output learning with indirect supervision. In Proc. of the International Conference on Machine Learning (ICML), 2010. [32] Ming-Wei Chang, Dan Goldwasser, Dan Roth, and Vivek Srikumar. Discriminative learning over constrained latent representations. In Proceedings of HLT: NAACL, pages 429-437, 2010. [33] E. Charniak. A maximum-entropy-inspired parser. In Proceedings of NAACL 2000, pages 132-139, Seattle, Washington, 2000. [34] Gennaro Chierchia and Sally McConnell-Ginet. Meaning and Grammar: An Introduction to Semantics. MIT Press, Cambridge, MA, 2001. [35] Timothy Chklovski and Patrick Pantel. VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-04), pages 33-40, 2004. [36] Timothy Chklovski and Patrick Pantel. VerbOcean: Mining the web for fine-grained semantic verb relations. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 2004. [37] Noam Chomsky. Aspects of the Theory of Syntax. MIT Press, Cambridge, Massachusetts, 1965. [38] Kenneth Ward Church and Patrick Hanks. Word association norms, mutual information and lexicography. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics (ACL), Vancouver, Canada, 1989. [39] Philipp Cimiano, Andreas Hotho, and Steffen Staab. Learning concept hierarchies from text corpora using formal concept analysis. Journal of Artificial Intelligence Research, 24:305-339, 2005. [40] P. Clark, W. R. Murray, J. Thompson, P. Harrison, J. Hobbs, and C. Fellbaum. On the role of lexical and world knowledge in RTE-3. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pages 54-59, 2007. [41] Peter Clark and Phil Harrison. An Inference-Based Approach to Recognizing Entailment. In Text Analysis Conference (TAC), pages 63-72, 2009. [42] Peter Clark and Phil Harrison. An inference-based approach to recognizing entailment. In Notebook Papers and Results, Text Analysis Conference (TAC), pages 63-72, 2009. [43] Peter Clark, Phil Harrison, John Thompson, William Murray, Jerry Hobbs, and Christiane Fellbaum. On the role of lexical and world knowledge in RTE3. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pages 54-59, Prague, June 2007. Association for Computational Linguistics. [44] Michael Collins and Nigel Duffy. New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In Proceedings of ACL02, 2002. [45] Robin Cooper, Dick Crouch, Jan Van Eijck, Chris Fox, Johan Van Genabith, Jan Jaspars, Hans Kamp, David Milward, Manfred Pinkal, Massimo Poesio, and Steve Pulman. Using the framework. Technical report, 1996.
132
References [46] Courtney Corley and Rada Mihalcea. Measuring the semantic similarity of texts. In Proc. of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 13-18. Association for Computational Linguistics, Ann Arbor, Michigan, June 2005. [47] C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20:1-25, 1995. [48] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, March 2000. [49] C. Cumby and D. Roth. On kernel methods for relational learning. In Proc. of the International Conference on Machine Learning (ICML), pages 107-114, 2003. [50] I. Dagan and O. Glickman. Probabilistic textual entailment: Generic applied modeling of language variability. In Learning Methods for Text Understanding and Mining, Grenoble, France, 2004. [52] Ido Dagan, Bill Dolan, Bernardo Magnini, and Dan Roth. Recognizing textual entailment: Rational, evaluation and approaches. Natural Language Engineering, 15(Special Issue 04):i-xvii, 2009. [53] Ido Dagan, Oren Glickman, and Bernardo Magnini. The PASCAL recognising textual entailment challenge. In Quiñonero-Candela et al., editors, LNAI 3944: MLCW 2005, pages 177-190. Springer-Verlag, Milan, Italy, 2006. [54] Marie-Catherine de Marneffe, Trond Grenager, Bill MacCartney, Daniel Cer, Daniel Ramage, Chloe Kiddon, and Christopher D. Manning. Aligning semantic graphs for textual inference and machine reading. In AAAI Spring Symposium at Stanford 2007, 2007. [55] Marie-Catherine de Marneffe, Bill MacCartney, Trond Grenager, Daniel Cer, Anna Rafferty, and Christopher D. Manning. Learning to distinguish valid textual entailments. In Bernardo Magnini and Ido Dagan, editors, Proceedings of the Second PASCAL Recognizing Textual Entailment Challenge, Venice, Italy, 2006. Springer-Verlag. [56] Marie-Catherine de Marneffe and Christopher Manning. The Stanford typed dependencies representation. In COLING Workshop on Cross-framework and Cross-domain Parser Evaluation, 2008. [57] Marie-Catherine de Marneffe, Anna N. Rafferty, and Christopher D. Manning. Finding contradictions in text. In Proceedings of ACL-08: HLT, pages 1039-1047, Columbus, Ohio, June 2008. Association for Computational Linguistics. [58] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:391-407, 1990. [59] Q. Do and D. Roth. Constraints based taxonomic relation classifier. In EMNLP, Massachusetts, USA, October 2010. [60] Quang Do, Dan Roth, Mark Sammons, Yuancheng Tu, and V.G. Vinod Vydiswaran. Robust, Light-weight Approaches to compute Lexical Similarity. Computer Science Research and Technical Reports, University of Illinois, 2010.
133
References [61] Bill Dolan, Chris Quirk, and Chris Brockett. Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Proceedings of Coling 2004, pages 350-356. COLING, Geneva, Switzerland, Aug 23-Aug 27 2004. [62] Francesca Fallucchi and Fabio Massimo Zanzotto. Inductive probabilistic taxonomy learning using singular value decomposition. Natural Language Engineering, 17(01):71-94, 2011. [63] C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998. [64] Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998. [65] Charles John Fillmore, Christopher R Johnson, and M R L Petruck. Background to FrameNet. International Journal of Lexicography, 16(3):235-250, 2003. [66] Abraham Fowler, Bob Hauser, Daniel Hodges, Ian Niles, Adrian Novischi, and Jans Stephan. Applying COGEX to recognize textual entailment. In Proceedings of RTE 2005, 2005. [67] Konstantina Garoufi. Towards a better understanding of applied textual entailment: Annotation and evaluation of the RTE-2 dataset. Master's thesis, Saarland University, 2008. [68] Thomas Gärtner. A survey of kernels for structured data. SIGKDD Explorations, 2003. [69] Maayan Geffet and Ido Dagan. The distributional inclusion hypotheses and lexical entailment. In ACL '05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 107-114, Morristown, NJ, USA, 2005. Association for Computational Linguistics. [70] Danilo Giampiccolo, Hoa T. Dang, Bernardo Magnini, Ido Dagan, and Bill Dolan. The fourth PASCAL recognizing textual entailment challenge. In The Text Analysis Conference (TAC 2008), 2008. [71] Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. The third PASCAL recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pages 1-9. Association for Computational Linguistics, Prague, June 2007. [72] Oren Glickman and Ido Dagan. Probabilistic textual entailment: Generic applied modeling of language variability. In Proceedings of the Workshop on Learning Methods for Text Understanding and Mining, Grenoble, France, 2004. [73] Oren Glickman, Ido Dagan, and Moshe Koppel. A lexical alignment model for probabilistic textual entailment. In Joaquin Quiñonero Candela, Ido Dagan, Bernardo Magnini, and Florence d'Alché-Buc, editors, MLCW, volume 3944 of Lecture Notes in Computer Science, pages 287-298. Springer, 2005. [74] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157-1182, March 2003. [75] Sanda Harabagiu and Andrew Hickl. Methods for using textual entailment in open-domain question answering. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 905-912, 2006.
134
© F.M.Zanzotto University of Rome “Tor Vergata” References [76] Sanda Harabagiu and Andrew Hickl. Methods for using textual entailment in opendomainquestion answering. In Proceedings of the 21st International Conference onComputational Linguistics and 44th Annual Meeting of the Association for Com-putational Linguistics, pages 905-912, Sydney, Australia, July 2006. Associationfor Computational Linguistics. [77] Sanda Harabagiu and Andrew Hickl. Methods for Using Textual Entailment inOpen-Domain Question Answering. In Proceedings of the 21st International Con- ference on Computational Linguistics and 44th Annual Meeting of the Associationfor Computational Linguistics, pages 905-912, Sydney, Australia, July 2006. Associationfor Computational Linguistics. [78] Sanda Harabagiu, Andrew Hickl, and Finley Lacatusu. Satisfying informationneeds with multi-document summaries. Information Processing & Management,43(6):1619 - 1642, 2007. Text Summarization. [79] Zellig Harris. Distributional structure. In Jerrold J. Katz and Jerry A. Fodor,editors, The Philosophy of Linguistics. Oxford University Press, New York, 1964. [80] P. Harrison and M. Maxwell. A new implementation of gpsg. In Proceedings of the6th Canadian Conference on AI (CSCSI'86), pages 78-83, 1986. [81] Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. InProceedings of the 15th International Conference on Computational Linguistics(CoLing-92). Nantes, France, 1992. [82] Andrew Hickl. Using discourse commitments to recognize textual entailment. In166 Mesh Refinement for Time-Domain Numerical ElectromagneticsProceedings of the 22nd COLING Conference, 2008. [83] Andrew Hickl and Jeremy Bensley. A Discourse Commitment-Based Frameworkfor Recognizing Textual Entailment. In Proceedings of the ACL-PASCAL Work-shop on Textual Entailment and Paraphrasing, pages 171-176, 2007. [84] Andrew Hickl, John Williams, Jeremy Bensley, Kirk Roberts, Bryan Rink, andYing Shi. 
Recognizing textual entailment with LCCs GROUNDHOG system. InBernardo Magnini and Ido Dagan, editors, Proceedings of the Second PASCALRecognizing Textual Entailment Challenge, Venice, Italy, 2006. Springer-Verlag. [85] Andrew Hickl, John Williams, Jeremy Bensley, Kirk Roberts, Bryan Rink, andYing Shi. Recognizing textual entailment with LCCs GROUNDHOG system. InBernardo Magnini and Ido Dagan, editors, Proceedings of the Second PASCALRecognizing Textual Entailment Challenge. Springer-Verlag, Venice, Italy, 2006. [86] J. R. Hobbs, M. Stickel, P. Martin, and D. Edwards. Interpretation as abduction.In Proceedings of the 26th Annual Meeting of the Association for ComputationalLinguistics (ACL), pages 95-103, 1988. [87] A. Iftene and M.-A. Moruz. Uaic participation at rte5. In Notebook papers andResults, Text Analysis Conference (TAC), pages 367-376, 2009. [88] Christian Jacquemin. Spotting and Discovering Terms through Natural Lan-guage Processing. Massachusetts Institue of Technology, Cambrige, Massachussetts,USA, 2001. [89] Jay J. Jiang and David W. Conrath. Semantic similarity based on corpus statisticsand lexical taxonomy. In Proc. of the 10th ROCLING, pages 132-139. Tapei,Taiwan, 1997. [90] Valentin Jijkoun and Maarten de Rijke. Recognizing textual entailment using lexicalsimilarity. In Proceedings of the 1st Pascal Challenge Workshop, Southampton,UK, 2005.
135
© F.M.Zanzotto University of Rome “Tor Vergata” References [91] Johannes Kobler, Uwe Schoning, and Jacobo Toran. The graph isomorphism prob-lem: its structural complexity. Birkhauser Verlag, Basel, Switzerland, Switzerland,1993. [92] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico,Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens,Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. Moses: opensource toolkit for statistical machine translation. In ACL '07: Proceedings of the45th Annual Meeting of the ACL on Interactive Poster and Demonstration Ses-sions, pages 177-180, Morristown, NJ, USA, 2007. Association for ComputationalLinguistics. [93] Milen Koulyekov and Bernardo Magnini. Recognizing textual entailment with treeBibliography 167edit distance algorithms. In Proceedings of RTE 2005, 2005. [94] D. Lin. Automatic retrieval and clustering of similar words. In Proceedings ofCOLING/ACL-98, pages 768-774, 1998. [95] D. Lin and P. Pantel. Induction of semantic classes from natural language text.In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and DataMining, pages 317-322, 2001. [96] Dekang Lin. Dependency-based evaluation of minipar. In Proceedings of the Work-shop on Evaluation of Parsing Systems at LREC 1998, Granada, Spain, 1998. [97] Dekang Lin and Patrick Pantel. DIRT-discovery of inference rules from text. InProceedings of the ACM Conference on Knowledge Discovery and Data Mining(KDD-01), San Francisco, CA, 2001. [98] Dekang Lin and Patrick Pantel. DIRT: discovery of inference rules from text. InKnowledge Discovery and Data Mining, pages 323-328, 2001. [99] B. MacCartney, T. Grenager, and M. de Marneffe. Learning to recognize featuresof valid textual entailments. In Proceedings of RTE-NAACL 2006, 2006. [100] Bill MacCartney, Michel Galley, and Christopher D. Manning. A phrase-basedalignment model for natural language inference. 
In Proceedings of the Conferenceon Empirical Methods in Natural Language Processing (EMNLP-2008), 2008. [101] BillMacCartney and Christopher D. Manning. An extended model of natural logic.In The Eighth International Conference on Computational Semantics (IWCS-8),Tilburg, Netherlands, 2009. [102] J. B. MacQueen. Some methods for classification and analysis of multivariateobservations. In L. M. Le Cam and J. Neyman, editors, Proc. of the fifth BerkeleySymposium on Mathematical Statistics and Probability, volume 1, pages 281-297.University of California Press, 1967. [103] M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. Building a large annotatedcorpus of english: The penn treebank. Computational Linguistics, 19:313- 330, 1993. [104] Y. Mehdad, M. Negri, and M. Federico. Towards cross-lingual textual entailment.In Human Language Technologies: The 2010 Annual Conference of the NorthAmerican Chapter of the Association for Computational Linguistics, pages 321-324. Association for Computational Linguistics, 2010. [105] Y. Mehdad, F. M. Zanzotto, and A. Moschitti. Semker: Syntactic/semantic kernelsfor recognizing textual entailment. In Notebook papers and Results, Text AnalysisConference (TAC), pages 259-265, 2009.
136
© F.M.Zanzotto University of Rome “Tor Vergata” References [106] Yashar Mehdad, Matteo Negri, Elena Cabrio, Milen Kouylekov, and BernardoMagnini. EDITS: An Open Source Framework for Recognizing Textual Entail168Mesh Refinement for Time-Domain Numerical Electromagneticsment. In Text Analysis Conference (TAC), pages 169-178, 2009. [107] Yashar Mehdad, Matteo Negri, Elena Cabrio, Milen Kouylekov, and BernardoMagnini. Edits: An open source framework for recognizing textual entailment. InNotebook papers and Results, Text Analysis Conference (TAC), pages 169-178,2009. [108] G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K.J. Miller. Wordnet: An onlinelexical database. International Journal of Lexicography, 3(4):235-312, 1990. [109] George A. Miller. WordNet: A lexical database for English. Communications ofthe ACM, 38(11):39-41, November 1995. [110] Shachar Mirkin, Ido Dagan, and Sebastian Pado. Assessing the role of discoursereferences in entailment inference. In Proceedings of the 48th Annual Meeting ofthe Association for Computational Linguistics, pages 1209-1219, Uppsala, Sweden,July 2010. Association for Computational Linguistics. [111] Shachar Mirkin, Lucia Specia, Nicola Cancedda, Ido Dagan, Marc Dymetman,and Idan Szpektor. Source-language entailment modeling for translating unknownterms. In Proceedings of the Joint Conference of the 47th Annual Meeting of theACL and the 4th International Joint Conference on Natural Language Processingof the AFNLP, pages 791-799, Suntec, Singapore, August 2009. Association forComputational Linguistics. [112] Shachar Mirkin, Lucia Specia, Nicola Cancedda, Ido Dagan, Marc Dymetman,and Idan Szpektor. Source-language entailment modeling for translating unknownterms. In Proceedings of ACL/AFNLP, pages 791-799, Suntec, Singapore, August2009. Association for Computational Linguistics. [113] D. Moldovan, C. Clark, S. Harabagiu, and S. Maiorano. Cogex: A logic prover forquestion answering. In Proceedings of HLT-NAACL 2003, 2003. 
[114] A. Moschitti and F. Zanzotto. Fast and effective kernels for relational learningfrom texts. In Zoubin Ghahramani, editor, Proc. of the International Conferenceon Machine Learning (ICML), pages 649-656. Omnipress, 2007. [115] Alessandro Moschitti. Making tree kernels practical for natural language learning.In Proceedings of EACL'06. Trento, Italy, 2006. [116] Alessandro Moschitti and Fabio Massimo Zanzotto. Fast and effective kernels forrelational learning from texts. In Proceedings of the International Conference ofMachine Learning (ICML). Corvallis, Oregon, 2007. [117] Eamonn Newman, Nicola Stokes, John John Dunnion, and Joe Carthy. Ucd iirgapproach to the textual entailment challenge. In Proceedings of the 1st PascalChallenge Workshop, Southampton, UK, 2005. [118] Rodney d. Nielsen, Wayne Ward, and James h. Martin. Recognizing entailmentBibliography 169in intelligent tutoring systems*. Nat. Lang. Eng., 15:479-501, October 2009. [119] Sebastian Pado, Marie-Catherine de Marneffe, Bill MacCartney, Anna N. Rafferty,Eric Yeh, and Christopher D. Manning. Deciding entailment and contradictionwith stochastic and edit distance-based alignment. In Text Analysis Conference(TAC), 2008. [120] Sebastian Pado, Michel Galley, Dan Jurafsky, and Chris Manning. Robust machinetranslation evaluation with entailment features. In Proceedings of the JointConference of the 47th Annual Meeting of the ACL and the 4th International JointConference on Natural Language Processing of the AFNLP: Volume 1 - Volume1, ACL '09, pages 297-305, Stroudsburg, PA, USA, 2009. Association for ComputationalLinguistics.
137
© F.M.Zanzotto University of Rome “Tor Vergata” References [121] Sebastian Pado, Michel Galley, Dan Jurafsky, and Christopher D. Manning. Robustmachine translation evaluation with entailment features. In Proceedings ofACL/AFNLP, pages 297-305, Suntec, Singapore, August 2009. Association forComputational Linguistics. [122] Patrick Pantel and Marco Pennacchiotti. Espresso: Leveraging generic patternsfor automatically harvesting semantic relations. In Proceedings of the 21st In- ternational Conference on Computational Linguistics and 44th Annual Meetingof the Association for Computational Linguistics, pages 113-120. Association forComputational Linguistics, Sydney, Australia, July 2006. [123] M. Pazienza, M. Pennacchiotti, and F. Zanzotto. Terminology extraction: An analysisof linguistic and statistical approaches. In S. Sirmakessis, editor, KnowledgeMining Series: Studies in Fuzziness and Soft Computing. Springer Verlag, 2005. [124] Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi. Wordnet::similarity -measuring the relatedness of concepts. In Proc. of 5th NAACL. Boston, MA, 2004. [125] Anselmo Pe~nas, Alvaro Rodrigo, Valentìn Sama, and Felisa Verdejo. Overviewof the answer validation exercise 2006. In Carol Peters, Paul Clough, Fredric C.Gey, Jussi Karlgren, Bernardo Magnini, Douglas W. Oard, Maarten de Rijke,and Maximilian Stempfhuber, editors, CLEF, volume 4730 of Lecture Notes inComputer Science, pages 257-264. Springer, 2006. [126] Anselmo Pe~nas, Alvaro Rodrigo, and Felisa Verdejo. Overview of the answer validationexercise 2007. In Carol Peters, Valentin Jijkoun, Thomas Mandl, HenningMuller, Douglas W. Oard, Anselmo Pe~nas, Vivien Petras, and Diana Santos, editors,CLEF, volume 5152 of Lecture Notes in Computer Science, pages 237-248.Springer, 2007. [127] V. Punyakanok, D. Roth, and W. Yih. Natural language inference via dependencytree mapping: An application to question answering. 
In submission, 2004.170 Mesh Refinement for Time-Domain Numerical Electromagnetics [128] J. Quinlan. C4:5:programs for Machine Learning. Morgan Kaufmann, San Mateo,1993. [129] R. Raina, A. Ng, and C. Manning. Robust textual inference via learning andabductive reasoning. In Proceedings of AAAI 2005, 2005. [130] L. Ratinov, D. Roth, D. Downey, and M. Anderson. Local and global algorithmsfor disambiguation to wikipedia. In Proc. of the Annual Meeting of the Associationof Computational Linguistics (ACL), 2011. [131] Deepak Ravichandran and Eduard Hovy. Learning surface text patterns for aquestion answering system. In Proceedings of the 40th ACL Meeting. Philadelphia,Pennsilvania, 2002. [132] Philip Resnik. Selection and Information: A Class-Based Approach to Lexical Re-lationships. PhD thesis, Department of Computer and Information Science, Universityof Pennsylvania, 1993. [133] Harold R. Robison. Computer-detectable semantic structures. Information Storageand Retrieval, 6(3):273-288, 1970. [134] Alvaro Rodrigo, Anselmo Pe~nas, and Felisa Verdejo. Evaluating question answeringvalidation as a classification problem. Language Resources and Evaluation,pages 1-9, March 2011. [135] Alvaro Rodrigo, Anselmo Pe~nas, and Felisa Verdejo. Overview of the answer validationexercise 2008. In Carol Peters, Thomas Deselaers, Nicola Ferro, Julio Gonzalo,Gareth J. F. Jones, Mikko Kurimo, Thomas Mandl, Anselmo Pe~nas, andVivien Petras, editors, CLEF, volume 5706 of Lecture Notes in Computer Science,pages 296-313. Springer, 2008.
138
© F.M.Zanzotto University of Rome “Tor Vergata” References [136] Lorenza Romano, Milen Kouylekov, Idan Szpektor, Ido Dagan, and AlbertoLavelli. Investigating a generic paraphrase-based approach for relation extraction.In EACL, 2006. [137] F. Rosenblatt. The perceptron: A probabilistic model for information storage andorganization in the brain. Psych. Rev., 65:386-407, 1958. (Reprinted in Neurocom-puting (MIT Press, 1988).). [138] D. Roth and W. Yih. Global inference for entity and relation identification via alinear programming formulation. In Lise Getoor and Ben Taskar, editors, Intro- duction to Statistical Relational Learning. MIT Press, 2007. [139] Dan Roth, Mark Sammons, and V.G.Vinod Vydiswaran. A Framework for EntailedRelation Recognition. In Proc. of the Annual Meeting of the Association ofComputational Linguistics (ACL), Singapore, August 2009. Association for ComputationalLinguistics. [140] Mark Sammons, V.G.Vinod Vydiswaran, and Dan Roth. \Ask not what TextualBibliography 171Entailment can do for you...". In ACL, Uppsala, Sweden, July 2010. Associationfor Computational Linguistics. [141] Mark Sammons, V.G.Vinod Vydiswaran, T. Vieira, N. Johri, M.-W. Chang,D. Goldwasser, V. Srikumar, G. Kundu, Y. Tu, K. Small, J. Rule, Q. Do, andD. Roth. Relation Alignment for Textual Entailment Recognition. In Text Analy-sis Conference (TAC), 2009. [142] Erik F. Tjong Kim Sang and Fien De Meulder. Introduction to the conll-2003shared task: Language-independent named entity recognition. In Proceedings ofCoNLL-2003, pages 142-147, 2003. [143] scar Ferrndez, Christian Spurk, Milen Kouylekov, Iustin Dornescu, Sergio Ferrndez,Matteo Negri, Rubn Izquierdo, David Toms, Constantin Orasan, GuenterNeumann, Bernardo Magnini, and Jose Luis Vicedo. The qall-me framework: Aspecifiable-domain multilingual question answering architecture. Web Semantics:Science, Services and Agents on the World Wide Web, 9(2):137 - 145, 2011. [144] Rion Snow, Daniel Jurafsky, and A. Y. Ng. 
Semantic taxonomy induction fromheterogenous evidence. In In ACL, pages 801-808, 2006. [145] Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng. Cheap andfast|but is it good?: evaluating non-expert annotations for natural languagetasks. In Proceedings of the Conference on Empirical Methods in Natural LanguageProcessing, EMNLP '08, pages 254-263, Stroudsburg, PA, USA, 2008. Associationfor Computational Linguistics. [146] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: A Core ofSemantic Knowledge. In 16th international World Wide Web conference (WWW2007), New York, NY, USA, 2007. ACM Press. [147] Jana Z. Sukkarieh and Svetlana Stoyanchev. Automating model building in crater.In Proceedings of the 2009 Workshop on Applied Textual Inference, TextInfer'09, pages 61-69, Stroudsburg, PA, USA, 2009. Association for ComputationalLinguistics. [148] Idan Szpektor and Ido Dagan. Learning entailment rules for unary templates. InProceedings of the 22nd International Conference on Computational Linguistics(Coling 2008), pages 849-856, Manchester, UK, August 2008. Coling 2008 OrganizingCommittee. [149] Idan Szpektor, Ido Dagan, Roy Bar-Haim, and Jacob Goldberger. Contextualpreferences. In Proceedings of ACL-08: HLT, pages 683-691, Columbus, Ohio,June 2008. Association for Computational Linguistics. [150] Idan Szpektor, Hristo Tanev, Ido Dagan, and Bonaventura Coppola. Scaling webbasedacquisition of entailment relations. In Proceedings of the 2004 Conference172 Mesh Refinement for Time-Domain Numerical Electromagneticson Empirical Methods in Natural Language Processing. Barcellona, Spain, 2004.
139
© F.M.Zanzotto University of Rome “Tor Vergata” References [151] L. Tesniere. Elements de syntaxe structural. Klincksiek, Paris, France, 1959. [152] Lucy Vanderwende and William B. Dolan. What syntax can contribute in the entailmenttask. In Joaquin Qui~nonero Candela, Ido Dagan, Bernardo Magnini, andFlorence d'Alche Buc, editors, Machine Learning Challenges Workshop, volume3944 of Lecture Notes in Computer Science, pages 205-216. Springer, 2006. [153] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, NewYork, 1995. [154] Ellen M. Voorhees and Donna Harman. Overview of the seventh text retrieval conferencetrec-7. In Proceedings of the Seventh Text REtrieval Conference (TREC-7,pages 1-24, 1998. [155] Annie Zaenen, Lauri Karttunen, and Richard Crouch. Local textual inference: Canit be defined or circumscribed? In Proc. of the ACL Workshop on Empirical Mod-eling of Semantic Equivalence and Entailment, pages 31-36, Ann Arbor, Michigan,June 2005. Association for Computational Linguistics. [156] A. Zanzotto, F.M. adn Moschitti, M. Pennacchiotti, and M.T. Pazienza. Learningtextual entailment from examples. In Bernardo Magnini and Ido Dagan, editors,Proceedings of the Second PASCAL Recognizing Textual Entailment Challenge.Springer-Verlag, Venice, Italy, 2006. [157] Fabio Massimo Zanzotto and Lorenzo Dell'Arciprete. Efficient kernels for sentencepair classification. In Conference on Empirical Methods on Natural Language Pro-cessing, pages 91-100, 6-7 August 2009. [158] Fabio Massimo Zanzotto, Lorenzo Dell'arciprete, and Alessandro Moschitti. Effi-cient graph kernels for textual entailment recognition. FUNDAMENTA INFOR-MATICAE, 107 (2-3):199-222, 2011. [159] Fabio Massimo Zanzotto and Alessandro Moschitti. Automatic learning of textualentailments with cross-pair similarities. In Proceedings of the 21st Coling and 44thACL, pages 401-408. Sydney, Australia, July 2006. [160] Fabio Massimo Zanzotto and Marco Pennacchiotti. 
Expanding textual entailmentcorpora fromwikipedia using co-training. In Proceedings of the 2nd Workshop onThe People's Web Meets NLP: Collaboratively Constructed Semantic Resources,pages 28-36, Beijing, China, August 2010. Coling 2010 Organizing Committee. [161] Fabio Massimo Zanzotto, Marco Pennacchiotti, and Alessandro Moschitti. Amachine learning approach to textual entailment recognition. NATURAL LAN- GUAGE ENGINEERING, 15-04:551-582, 2009. [162] Fabio Massimo Zanzotto, Marco Pennacchiotti, and Maria Teresa Pazienza. Discoveringasymmetric entailment relations between verbs using selectional preferBibliography173ences. In Proceedings of the 21st International Conference on Computational Lin-guistics and 44th Annual Meeting of the Association for Computational Linguis-tics, pages 849-856. Association for Computational Linguistics, Sydney, Australia,July 2006. [163] F.M. Zanzotto and A. Moschitti. Automatic learning of textual entailments withcross-pair similarities. In ACL-44: Proceedings of the 21st International Conferenceon Computational Linguistics and the 44th annual meeting of the Association forComputational Linguistics, pages 401-408, 2006. [164] Zhi Zhong and Hwee Tou Ng. It makes sense: a wide-coverage word sense disambiguationsystem for free text. In Proceedings of the ACL 2010 System Demon-strations, ACLDemos '10, pages 78-83, Stroudsburg, PA, USA, 2010. Associationfor Computational