Efficient kernels for sentence pair classification
Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete
University of Rome “Tor Vergata”, Roma, Italy

Motivation
Classifying sentence pairs is an important activity in many NLP tasks, e.g.:
–Textual Entailment Recognition
–Machine Translation
–Question-Answering
Classifiers need suitable feature spaces

Motivation
For example, in textual entailment…
Training examples and classification:
–$P_1 = (T_1, H_1)$: $T_1$ “Farmers feed cows animal extracts”, $H_1$ “Cows eat animal extracts”
–$P_2 = (T_2, H_2)$: $T_2$ “They feed dolphins fish”, $H_2$ “Fish eat dolphins”
–$P_3 = (T_3, H_3)$: $T_3$ “Mothers feed babies milk”, $H_3$ “Babies eat milk”
Relevant features: first-order rules relating “feed” and “eat” through the shared variables X and Y

In this talk…
First-order rule (FOR) feature spaces: a challenge
Tripartite Directed Acyclic Graphs (tDAG) as a solution:
–for modelling FOR feature spaces
–for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces
An efficient algorithm for computing kernels in FOR spaces
Experimental and comparative assessment of the computational efficiency of the proposed algorithm

First-order rule (FOR) feature spaces: challenges
We want to exploit first-order rule (FOR) feature spaces by writing the implicit kernel function
$K(P_1,P_2)=|S(P_1)\cap S(P_2)|$
which counts the first-order rules activated by both $P_1$ and $P_2$
Without loss of generality, we present the problem in syntactic first-order rule feature spaces

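
To make the definition concrete, here is a minimal sketch that evaluates the kernel by brute force on explicitly enumerated feature sets. The function name `for_kernel` and the rule strings are hypothetical and written by hand for illustration; they are not the fragment sets from the slides. Enumerating $S(P)$ explicitly is exactly what the talk tries to avoid, so this only illustrates what the kernel counts.

```python
from typing import Hashable, Iterable


def for_kernel(features_1: Iterable[Hashable], features_2: Iterable[Hashable]) -> int:
    """Count the features shared by two explicitly enumerated feature sets."""
    return len(set(features_1) & set(features_2))


# Hypothetical, hand-written rule features (assumption: not taken from the slides).
s_pa = {
    "feed X Y -> X eat Y",
    "feed X Y -> eat Y",
    "feed -> eat",
    "Farmers feed X -> X eat",
}
s_pb = {
    "feed X Y -> X eat Y",
    "feed X Y -> eat Y",
    "feed -> eat",
    "Mothers feed X -> X eat",
}

# Number of rule features activated by both pairs.
shared = for_kernel(s_pa, s_pb)
```

The lexicalized rules differ between the two sets, while the rules over the variables X and Y are shared; only the shared ones contribute to the kernel value.
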
Observations
… using the Kernel Trick:
–define the kernel (similarity) function $K(P_1,P_2)$
–instead of defining the features
[Figure: two pairs, $(T_1, H_1)$ and $(T_1, H_2)$, compared through $K((T_1,H_1),(T_1,H_2))$]

First-order rule (FOR) feature spaces: challenges
$T_1$: “Farmers feed cows animal extracts”
$H_1$: “Cows eat animal extracts”
$P_a = (T_1, H_1)$
[Figure: syntactic trees of $T_1$ and $H_1$ and the fragment set $S(P_a)$, obtained by adding placeholders to the trees and propagating them]

First-order rule (FOR) feature spaces: challenges
$T_3$: “Mothers feed babies milk”
$H_3$: “Babies eat milk”
$P_b = (T_3, H_3)$
[Figure: syntactic trees of $T_3$ and $H_3$ and the fragment set $S(P_b)$]

First-order rule (FOR) feature spaces: challenges
$K(P_a,P_b)=|S(P_a)\cap S(P_b)|$
[Figure: the fragment sets $S(P_a)$ and $S(P_b)$ and their common fragments, including the first-order rule relating “feed” and “eat” through the placeholders X and Y]

A step back…
FOR feature spaces can be modelled with particular graphs
We call these graphs tripartite directed acyclic graphs (tDAGs)
Observations:
–tDAGs are not trees
–tDAGs can be used to model both rules and sentence pairs
–unifying rules in sentences is a graph matching problem

Tripartite Directed Acyclic Graphs (tDAG)
As for Feature Structures…
[Figure: the first-order rule with placeholders X and Y and the sentence pair “Farmers feed cows animal extracts” / “Cows eat animal extracts”, both drawn as tDAGs]

Tripartite Directed Acyclic Graphs (tDAGs)
A tripartite directed acyclic graph is a graph $G = (N, E)$ where:
–the set of nodes $N$ is partitioned into three sets $N_t$, $N_g$, and $A$
–the set of edges $E$ is partitioned into four sets $E_t$, $E_g$, $E_{A(t)}$, and $E_{A(g)}$
–$t = (N_t, E_t)$ and $g = (N_g, E_g)$ are two trees
–$E_{A(t)} = \{(x, y) \mid x \in N_t \text{ and } y \in A\}$
–$E_{A(g)} = \{(x, y) \mid x \in N_g \text{ and } y \in A\}$
[Figure: example tDAG for the rule relating “feed” and “eat”]

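
The definition can be sketched as a small data structure with a well-formedness check. This is an illustrative sketch, not the authors' implementation: the class name `TDag`, its field names, the helper `is_tree`, and the node identifiers in the example are all assumptions. The check covers the node partition, the two trees, and the direction of the anchor edges.

```python
from dataclasses import dataclass
from typing import Set, Tuple

Edge = Tuple[str, str]


def is_tree(nodes: Set[str], edges: Set[Edge]) -> bool:
    """True if (nodes, edges) is a rooted tree with edges going parent -> child."""
    if not nodes:
        return False
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for parent, child in edges:
        if parent not in nodes or child not in nodes:
            return False
        children[parent].append(child)
        indegree[child] += 1
    roots = [n for n in nodes if indegree[n] == 0]
    if len(roots) != 1 or any(d > 1 for d in indegree.values()):
        return False
    # Every node must be reachable from the single root.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children[node])
    return seen == set(nodes)


@dataclass
class TDag:
    n_t: Set[str]      # nodes of the first tree t
    n_g: Set[str]      # nodes of the second tree g
    anchors: Set[str]  # the shared node set A
    e_t: Set[Edge]     # edges of t
    e_g: Set[Edge]     # edges of g
    e_at: Set[Edge]    # edges from nodes of t to anchors
    e_ag: Set[Edge]    # edges from nodes of g to anchors

    def is_well_formed(self) -> bool:
        disjoint = (
            self.n_t.isdisjoint(self.n_g)
            and self.n_t.isdisjoint(self.anchors)
            and self.n_g.isdisjoint(self.anchors)
        )
        anchor_edges_ok = all(
            x in self.n_t and y in self.anchors for x, y in self.e_at
        ) and all(x in self.n_g and y in self.anchors for x, y in self.e_ag)
        return (
            disjoint
            and anchor_edges_ok
            and is_tree(self.n_t, self.e_t)
            and is_tree(self.n_g, self.e_g)
        )


# Hypothetical encoding of the "feed"/"eat" rule: node ids are made up.
rule = TDag(
    n_t={"t_vp", "t_vb", "t_np1", "t_np2"},
    n_g={"g_s", "g_np1", "g_vp", "g_vb", "g_np2"},
    anchors={"X", "Y"},
    e_t={("t_vp", "t_vb"), ("t_vp", "t_np1"), ("t_vp", "t_np2")},
    e_g={("g_s", "g_np1"), ("g_s", "g_vp"), ("g_vp", "g_vb"), ("g_vp", "g_np2")},
    e_at={("t_np1", "X"), ("t_np2", "Y")},
    e_ag={("g_np1", "X"), ("g_np2", "Y")},
)
```

Node labels (S, NP, VP, and so on) are left out to keep the sketch short; a fuller version would carry a mapping from node identifier to label.
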
Tripartite Directed Acyclic Graphs (tDAGs)
Alternative definition
A tDAG is a pair of extended trees $G = (\tau, \gamma)$ where:
–$\tau = (N_t \cup A_t, E_t \cup E_{A(t)})$
–$\gamma = (N_g \cup A_g, E_g \cup E_{A(g)})$
[Figure: the “feed”/“eat” tDAG drawn as a single graph and as two extended trees sharing the placeholders X and Y]

Again challenges
Computing the implicit kernel function $K(P_1,P_2)=|S(P_1)\cap S(P_2)|$ involves general graph matching, which takes exponential time in the worst case
Yet… tDAGs are particular graphs, and we can define an efficient algorithm
We will analyze the isomorphism among tDAGs and derive an algorithm for computing the kernel

Isomorphism between tDAGs
Isomorphism between graphs
$G_1=(N_1,E_1)$ and $G_2=(N_2,E_2)$ are isomorphic if:
–$|N_1|=|N_2|$ and $|E_1|=|E_2|$
–among all the bijective functions relating $N_1$ and $N_2$, there exists $f : N_1 \to N_2$ such that:
  for each $n_1$ in $N_1$, $Label(n_1)=Label(f(n_1))$
  for each $(n_a,n_b)$ in $E_1$, $(f(n_a),f(n_b))$ is in $E_2$

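
The two conditions can be checked literally by trying every bijection. The sketch below does that for small labelled directed graphs; it is a brute-force illustration with factorial cost, which is the kind of general graph matching the talk wants to avoid. The function name `find_isomorphism` and the example graphs are assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, Hashable, Iterable, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def find_isomorphism(
    labels_1: Dict[Hashable, str],
    edges_1: Iterable[Edge],
    labels_2: Dict[Hashable, str],
    edges_2: Iterable[Edge],
) -> Optional[Dict[Hashable, Hashable]]:
    """Return a label- and edge-preserving bijection N1 -> N2, or None."""
    edge_set_1, edge_set_2 = set(edges_1), set(edges_2)
    nodes_1, nodes_2 = list(labels_1), list(labels_2)
    # First condition: same number of nodes and of edges.
    if len(nodes_1) != len(nodes_2) or len(edge_set_1) != len(edge_set_2):
        return None
    # Second condition: search all bijections for one that preserves
    # node labels and maps every edge of G1 onto an edge of G2.
    for image in permutations(nodes_2):
        f = dict(zip(nodes_1, image))
        labels_ok = all(labels_1[n] == labels_2[f[n]] for n in nodes_1)
        edges_ok = all((f[a], f[b]) in edge_set_2 for a, b in edge_set_1)
        if labels_ok and edges_ok:
            return f
    return None


# Two small graphs with made-up node identifiers.
g1_labels = {1: "VP", 2: "VB", 3: "NP"}
g1_edges = [(1, 2), (1, 3)]
g2_labels = {"a": "NP", "b": "VP", "c": "VB"}
g2_edges = [("b", "c"), ("b", "a")]

f = find_isomorphism(g1_labels, g1_edges, g2_labels, g2_edges)
```

Because the edge counts are equal and `f` is a bijection, mapping every edge of the first graph into the second is enough to make the edge sets correspond exactly.
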
Isomorphism between tDAGs
Isomorphism adapted to tDAGs
$G_1 = (\tau_1, \gamma_1)$ and $G_2 = (\tau_2, \gamma_2)$ are isomorphic if these two properties hold:
–Partial isomorphism: $\tau_1$ and $\tau_2$ are isomorphic, and $\gamma_1$ and $\gamma_2$ are isomorphic. This property generates two functions $f_\tau$ and $f_\gamma$
–Constraint compatibility: $f_\tau$ and $f_\gamma$ are compatible on the sets of nodes $A_1$ and $A_2$ if, for each $n \in A_1$, $f_\tau(n) = f_\gamma(n)$

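
The compatibility condition is easy to state in code once the two partial isomorphisms are available as mappings. In this sketch `f_tau` and `f_gamma` stand for the two functions produced by the partial isomorphism step (one per extended tree), represented as plain dictionaries; the function names and the node identifiers are hypothetical. It assumes every anchor in $A_1$ appears in both mappings.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

NodeMap = Dict[Hashable, Hashable]
Constraint = FrozenSet[Tuple[Hashable, Hashable]]


def anchor_constraint(f: NodeMap, anchors_1: Iterable[Hashable]) -> Constraint:
    """The pairs (anchor in A1, its image in A2) induced by one mapping."""
    return frozenset((n, f[n]) for n in anchors_1)


def are_compatible(f_tau: NodeMap, f_gamma: NodeMap, anchors_1: Iterable[Hashable]) -> bool:
    """True if both mappings send every anchor of A1 to the same anchor of A2."""
    anchors = list(anchors_1)
    return anchor_constraint(f_tau, anchors) == anchor_constraint(f_gamma, anchors)


# Made-up mappings: tree nodes map to tree nodes, anchors X, Y map to anchors 1, 2.
f_tau = {"t_vp": "u_vp", "X": "1", "Y": "2"}
f_gamma_same = {"g_s": "h_s", "X": "1", "Y": "2"}
f_gamma_swapped = {"g_s": "h_s", "X": "2", "Y": "1"}

ok = are_compatible(f_tau, f_gamma_same, {"X", "Y"})
clash = are_compatible(f_tau, f_gamma_swapped, {"X", "Y"})
```

Comparing the two induced constraints as sets of pairs is equivalent to checking the anchors one by one, and it gives a value that can be reused as a key when grouping fragments by constraint.
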
Isomorphism between tDAGs
$P_a = (\tau_a, \gamma_a)$, $P_b = (\tau_b, \gamma_b)$
[Figure: partial isomorphism between the extended trees yields the constraint sets $C_\tau$ and $C_\gamma$; constraint compatibility requires $C_\tau = C_\gamma$]

Ideas for building the kernel
We define $K(P_1,P_2)=|S(P_1)\cap S(P_2)|$ using the isomorphism between tDAGs
The idea: reverse the order of isomorphism detection
First, constraint compatibility
–Building a set $C$ of all the relevant alternative constraints
–Finding subsets of $S(P_1)\cap S(P_2)$ meeting a constraint $c \in C$
Second, partial isomorphism detection
[Diagram labels: alternative constraints, constraint compatibility, partial isomorphism, subsets of $S(P_1)\cap S(P_2)$]

Ideas for building the kernel
$K(P_a,P_b)=|S(P_a)\cap S(P_b)|$
$P_a = (\tau_a, \gamma_a)$, $P_b = (\tau_b, \gamma_b)$, $C=\{c_1,c_2\}$
[Figure: two example tDAGs with node labels A, B, C and I, M, N, and the two alternative constraints $c_1$ and $c_2$ between their placeholders]

Ideas for building the kernel
$C=\{c_1,c_2\}$
$K(P_a,P_b)=|S(P_a)\cap S(P_b)|=|(S(P_a)\cap S(P_b))_{c_1}\cup(S(P_a)\cap S(P_b))_{c_2}|$
[Figure: the example tDAGs $P_a$ and $P_b$ and the fragments in $(S(P_a)\cap S(P_b))_{c_1}$, the subset meeting constraint $c_1$]

Ideas for building the kernel
$C=\{c_1,c_2\}$
$K(P_a,P_b)=|S(P_a)\cap S(P_b)|=|(S(P_a)\cap S(P_b))_{c_1}\cup(S(P_a)\cap S(P_b))_{c_2}|$
[Figure: the example tDAGs $P_a$ and $P_b$ and the fragments in $(S(P_a)\cap S(P_b))_{c_2}$, the subset meeting constraint $c_2$]

Ideas for building the kernel
$(S(P_a)\cap S(P_b))_{c_1}=(S(\tau_a)\cap S(\tau_b))_{c_1}\times(S(\gamma_a)\cap S(\gamma_b))_{c_1}$
$K(P_a,P_b)=\left|\bigcup_{c\in C}(S(P_a)\cap S(P_b))_c\right|=\left|\bigcup_{c\in C}(S(\tau_a)\cap S(\tau_b))_c\times(S(\gamma_a)\cap S(\gamma_b))_c\right|$
[Figure: the fragments of the example meeting $c_1$, split into their $\tau$ and $\gamma$ parts]

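
A short worked step may help here. It assumes that the operator lost between the two factors is a Cartesian product, so that a fragment of the pair is one fragment of the first extended tree combined with one fragment of the second under the same constraint. The shorthand $X_c$, $T_c$, and $G_c$ is introduced only for this illustration.

```latex
\begin{align*}
T_c &= (S(\tau_a)\cap S(\tau_b))_c, \qquad
G_c = (S(\gamma_a)\cap S(\gamma_b))_c, \qquad
X_c = T_c \times G_c \\
|X_c| &= |T_c| \cdot |G_c|
\end{align*}
```

Under that assumption, the contribution of a single constraint is the product of two quantities that each involve only trees, which is what makes tree kernels usable for this problem. The subsets for different constraints may overlap, so their sizes cannot simply be added.
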
Kernel on FOR feature spaces
The general equation
$K(P_1,P_2)=\left|\bigcup_{c\in C}(S(\tau_1)\cap S(\tau_2))_c\times(S(\gamma_1)\cap S(\gamma_2))_c\right|$
can be computed using:
1) $K_S$ (kernel function for trees) introduced in (Duffy&Collins, 2001) and refined in (Moschitti&Zanzotto, 2007)
2) The inclusion-exclusion principle

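
The two ingredients can be sketched separately. The first function below is the plain subtree-counting recursion commonly attributed to Collins and Duffy, with no decay factor and no constraint handling, so it is not the constrained $K_S$ variant used in the talk; it counts matching fragment occurrences between two trees. The second function expands the size of a union with the inclusion-exclusion principle on explicit sets. The tree encoding (nested tuples with words as plain strings) and all function names are assumptions made for illustration.

```python
from itertools import combinations
from typing import Iterable, List


def _production(node: tuple) -> tuple:
    """Label of a node plus the labels of its children (words flagged as 'w')."""
    kids = tuple(("w", c) if isinstance(c, str) else ("n", c[0]) for c in node[1:])
    return (node[0], kids)


def _internal_nodes(tree) -> List[tuple]:
    """All non-leaf nodes of a tree given as nested tuples (label, child, ...)."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(_internal_nodes(child))
    return found


def _delta(n1: tuple, n2: tuple) -> int:
    """Number of common fragments rooted at n1 and n2."""
    if _production(n1) != _production(n2):
        return 0
    result = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # equal productions: c2 is a node as well
            result *= 1 + _delta(c1, c2)
    return result


def tree_kernel(t1: tuple, t2: tuple) -> int:
    """Sum the common-fragment counts over all pairs of internal nodes."""
    return sum(_delta(a, b) for a in _internal_nodes(t1) for b in _internal_nodes(t2))


def union_size(sets: Iterable[Iterable]) -> int:
    """Size of a union computed with the inclusion-exclusion principle."""
    frozen = [frozenset(s) for s in sets]
    total = 0
    for k in range(1, len(frozen) + 1):
        sign = 1 if k % 2 else -1
        for group in combinations(frozen, k):
            total += sign * len(frozenset.intersection(*group))
    return total


# Toy trees: the structure is illustrative, not a parser's output.
h1 = ("S", ("NP", "cows"), ("VP", ("VB", "eat")))
h3 = ("S", ("NP", "babies"), ("VP", ("VB", "eat")))
common = tree_kernel(h1, h3)
```

In the kernel of the talk, the sets handed to inclusion-exclusion are never enumerated; only their sizes are needed, and those would come from constrained tree-kernel computations. The explicit-set version here only shows the alternating sum itself.
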
Computational Efficiency Analysis
Comparison kernel: (Zanzotto&Moschitti, Coling-ACL 2006), (Moschitti&Zanzotto, ICML 2007)
Test-bed corpus:
–Recognizing Textual Entailment challenge data

Computational Efficiency Analysis
[Figure: execution time in seconds (s) on the whole RTE2 data set for different numbers of allowed placeholders]

Accuracy Comparison
Training: RTE 1, 2, 3
Testing: RTE 4

Conclusions
We reduced kernels in first-order rule feature spaces to graph-matching problems
We defined a new class of graphs, tDAGs
We presented an efficient algorithm for computing kernels in FOR feature spaces