
1 Efficient kernels for sentence pair classification. Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete, University of Rome “Tor Vergata”, Roma, Italy.

2 Motivation
Classifying sentence pairs is an important activity in many NLP tasks, e.g.:
– Textual Entailment Recognition
– Machine Translation
– Question Answering
Classifiers need suitable feature spaces.

3 Motivation
For example, in textual entailment:
Training examples:
– P1: T1 → H1, with T1 = “Farmers feed cows animal extracts” and H1 = “Cows eat animal extracts”
– P3: T3 → H3, with T3 = “Mothers feed babies milk” and H3 = “Babies eat milk”
Classification:
– P2: T2 → H2, with T2 = “They feed dolphins fish” and H2 = “Fish eat dolphins”
Relevant feature: the first-order rule relating “feed” and “eat” over the variables X and Y (“feed X Y → X eat Y”).

4 In this talk…
– First-order rule (FOR) feature spaces: a challenge
– Tripartite Directed Acyclic Graphs (tDAGs) as a solution: for modelling FOR feature spaces and for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces
– An efficient algorithm for computing kernels in FOR spaces
– Experimental and comparative assessment of the computational efficiency of the proposed algorithm

5 First-order rule (FOR) feature spaces: challenges
We want to exploit first-order rule (FOR) feature spaces by writing the implicit kernel function K(P1, P2) = |S(P1) ∩ S(P2)|, which computes how many common first-order rules are activated by P1 and P2. Without loss of generality, we present the problem in syntactic first-order rule feature spaces.
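To make the object being computed concrete, here is a minimal brute-force sketch (in Python, not from the slides) that treats S(P1) and S(P2) as explicitly enumerated sets of first-order rules encoded as canonical strings; the rule encoding is purely illustrative, and the point of the following slides is precisely that this explicit enumeration is infeasible.

```python
# Brute-force sketch of K(P1, P2) = |S(P1) ∩ S(P2)|, assuming the activated
# first-order rules have already been enumerated as canonical strings.
# The string encoding of rules below is hypothetical.

def naive_for_kernel(rules_p1: set, rules_p2: set) -> int:
    """Count the first-order rules activated by both sentence pairs."""
    return len(rules_p1 & rules_p2)

# Hypothetical canonical encodings of two small rule sets:
s_p1 = {"VP(VB(feed) NP-1 NP-2) -> S(NP-1 VP(VB(eat) NP-2))",
        "VP(VB(feed) NP-1 NP-2) -> S(NP-1 VP)"}
s_p2 = {"VP(VB(feed) NP-1 NP-2) -> S(NP-1 VP(VB(eat) NP-2))"}
print(naive_for_kernel(s_p1, s_p2))  # prints 1
```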

6 Observations
… using the Kernel Trick:
– define the distance K(P1, P2) between instances
– instead of explicitly defining the features
(Figure: two sentence pairs, T1 → H1 and T1 → H2, and the kernel K(T1 → H1, T1 → H2) computed between them.)

7 First-order rule (FOR) feature spaces: challenges
(Figure: the syntactic parse trees of T1 = “Farmers feed cows animal extracts” and H1 = “Cows eat animal extracts” for the pair Pa = T1 → H1. Matching constituents are marked with co-indexed placeholders 1, 2, 3 (“adding placeholders”), and the placeholders are then propagated up the trees (“propagating placeholders”). S(Pa) is the resulting set of first-order rule fragments: pairs of tree fragments from T1 and H1 that share placeholders.)

8 First-order rule (FOR) feature spaces: challenges
(Figure: the same construction for the pair Pb = T3 → H3, with T3 = “Mothers feed babies milk” and H3 = “Babies eat milk”. Placeholders 1 and 2 are added and propagated, and S(Pb) is the resulting set of first-order rule fragments.)

9 First-order rule (FOR) feature spaces: challenges
(Figure: computing K(Pa, Pb) = |S(Pa) ∩ S(Pb)|. Once placeholders are renamed consistently, fragments of S(Pa), which use placeholders 1 and 3, and fragments of S(Pb), which use placeholders 1 and 2, represent the same first-order rule, here the rule rewriting “feed X Y” into “X eat Y”.)

10 A step back…
FOR feature spaces can be modelled with particular graphs. We call these graphs tripartite directed acyclic graphs (tDAGs).
Observations:
– tDAGs are not trees
– tDAGs can be used to model both rules and sentence pairs
– unifying rules with sentences is a graph-matching problem

11 As for Feature Structures… Tripartite Directed Acyclic Graphs (tDAGs)
(Figure: the first-order rule “feed X Y → X eat Y” represented as a tDAG over the variables X and Y, and the sentence pair Pa = T1 → H1, “Farmers feed cows animal extracts” / “Cows eat animal extracts”, represented as a tDAG over the placeholders 1, 2, 3.)

12 As for Feature Structures… Tripartite Directed Acyclic Graphs (tDAGs)
(Same figure as slide 11.)

13 Tripartite Directed Acyclic Graphs (tDAGs)
A tripartite directed acyclic graph is a graph G = (N, E) where:
– the set of nodes N is partitioned into three sets N_t, N_g, and A
– the set of edges is partitioned into four sets E_t, E_g, E_A(t), and E_A(g), where:
t = (N_t, E_t) and g = (N_g, E_g) are two trees
E_A(t) = {(x, y) | x ∈ N_t and y ∈ A}
E_A(g) = {(x, y) | x ∈ N_g and y ∈ A}
(Figure: the tDAG for the “feed”/“eat” rule.)
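As a reading aid, a possible in-memory representation of this definition is sketched below in Python; it is not the authors' implementation. Two ordinary labelled trees play the roles of t and g, and the edge sets E_A(t) and E_A(g) are stored as links from tree nodes to placeholder names in A. Class and field names are my own.

```python
# A minimal sketch of a tDAG data structure, following the definition above.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TreeNode:
    label: str                                    # e.g. "VP", "NP", "feed"
    children: list = field(default_factory=list)  # child TreeNode objects
    placeholder: Optional[str] = None             # link into A, e.g. "1", or None

@dataclass
class TDAG:
    t: TreeNode   # root of t = (N_t, E_t)
    g: TreeNode   # root of g = (N_g, E_g)

def placeholders(root: TreeNode) -> set:
    """Collect the placeholder nodes of A reachable from one tree."""
    found, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node.placeholder is not None:
            found.add(node.placeholder)
        stack.extend(node.children)
    return found
```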

14 Tripartite Directed Acyclic Graphs (tDAGs)
Alternative definition: a tDAG is a pair of extended trees G = (τ, γ) where:
τ = (N_t ∪ A, E_t ∪ E_A(t)) and γ = (N_g ∪ A, E_g ∪ E_A(g))
(Figure: the “feed”/“eat” rule tDAG split into the two extended trees τ and γ, which share the placeholder nodes X and Y.)

15 Again challenges
Computing the implicit kernel function K(P1, P2) = |S(P1) ∩ S(P2)| involves general graph matching, which is an exponential problem. Yet tDAGs are particular graphs, and we can define an efficient algorithm: we will analyze isomorphism between tDAGs and derive an algorithm for computing the kernel.

16 Isomorphism between tDAGs
Isomorphism between graphs: G1 = (N1, E1) and G2 = (N2, E2) are isomorphic if:
– |N1| = |N2| and |E1| = |E2|
– among all the bijective functions relating N1 and N2, there exists f : N1 → N2 such that:
for each n1 in N1, Label(n1) = Label(f(n1))
for each (na, nb) in E1, (f(na), f(nb)) is in E2
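A direct, deliberately exponential transcription of this definition is sketched below (a Python illustration, not from the slides): it tries every bijection between the two node sets, which is exactly the cost the rest of the talk avoids for tDAGs.

```python
# Brute-force graph isomorphism check following the definition above.
# labels*: dict mapping node id -> label; edges*: set of directed (node, node) pairs.
from itertools import permutations

def are_isomorphic(labels1: dict, edges1: set, labels2: dict, edges2: set) -> bool:
    if len(labels1) != len(labels2) or len(edges1) != len(edges2):
        return False
    nodes1, nodes2 = list(labels1), list(labels2)
    for image in permutations(nodes2):
        f = dict(zip(nodes1, image))          # candidate bijection f: N1 -> N2
        if all(labels1[n] == labels2[f[n]] for n in nodes1) and \
           all((f[a], f[b]) in edges2 for (a, b) in edges1):
            return True
    return False
```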

17 Isomorphism between tDAGs
Isomorphism adapted to tDAGs: G1 = (τ1, γ1) and G2 = (τ2, γ2) are isomorphic if these two properties hold:
– Partial isomorphism: τ1 and τ2 are isomorphic, and γ1 and γ2 are isomorphic. This property generates two functions f_τ and f_γ.
– Constraint compatibility: f_τ and f_γ are compatible on the sets of nodes A1 and A2 if, for each n ∈ A1, f_τ(n) = f_γ(n).
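The two properties translate into a short two-step check, sketched below under the assumption of a helper tree_mapping(t1, t2) that returns the placeholder correspondence induced by an isomorphism of two extended trees, or None if they are not isomorphic; that helper and the names are hypothetical, not from the slides.

```python
# Sketch of the tDAG isomorphism test: partial isomorphism of the two extended
# trees, then constraint compatibility of the induced placeholder mappings.

def tdags_isomorphic(G1, G2, tree_mapping) -> bool:
    f_tau = tree_mapping(G1.t, G2.t)      # partial isomorphism on the tau side
    f_gamma = tree_mapping(G1.g, G2.g)    # partial isomorphism on the gamma side
    if f_tau is None or f_gamma is None:
        return False
    # Constraint compatibility: the two mappings must agree on every
    # placeholder n of A1, i.e. f_tau(n) == f_gamma(n).
    return all(f_tau.get(n) == f_gamma.get(n) for n in set(f_tau) | set(f_gamma))
```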

18 Isomorphism between tDAGs
(Figure: the extended-tree fragments of Pa = (τa, γa), with placeholders 1 and 3, and of Pb = (τb, γb), with placeholders 1 and 2. Partial isomorphism holds between the τ fragments and between the γ fragments, and the induced constraints are compatible: Cτ = Cγ = C = {(1, 1), (3, 2)}.)

19 Ideas for building the kernel
We define K(P1, P2) = |S(P1) ∩ S(P2)| using the isomorphism between tDAGs. The idea: reverse the order of isomorphism detection.
First, constraint compatibility:
– building a set C of all the relevant alternative constraints
– finding the subsets of S(P1) ∩ S(P2) meeting a constraint c ∈ C
Second, partial isomorphism detection.
(Figure: schema of the procedure, from the subsets of S(P1) ∩ S(P2) and the alternative constraints to partial isomorphism and constraint compatibility.)
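For concreteness, in the sketches used here a constraint is simply a set of placeholder correspondences between P1 and P2, matching the C = {c1, c2} example on the next slides; how the "relevant" alternative constraints are enumerated is left abstract, since the slide only states that such a set C is built.

```python
# Constraints as sets of placeholder correspondences between P1 and P2.
# The two example constraints mirror the C = {c1, c2} of the next slides.
c1 = frozenset({("1", "1"), ("2", "2")})
c2 = frozenset({("1", "1"), ("2", "3")})
C = {c1, c2}
```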

20 Ideas for building the kernel
(Figure: two abstract pairs Pa = (τa, γa) and Pb = (τb, γb) for computing K(Pa, Pb) = |S(Pa) ∩ S(Pb)|. The set of relevant alternative constraints is C = {c1, c2} = { {(1, 1), (2, 2)}, {(1, 1), (2, 3)} }.)

21 Ideas for building the kernel
(Figure: the subset (S(Pa) ∩ S(Pb))_c1 of common fragments compatible with the constraint c1 = {(1, 1), (2, 2)}. With C = {c1, c2}, K(Pa, Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))_c1 ∪ (S(Pa) ∩ S(Pb))_c2|.)

22 Ideas for building the kernel
(Figure: the same construction for the constraint c2 = {(1, 1), (2, 3)}: the subset (S(Pa) ∩ S(Pb))_c2 of common fragments compatible with c2.)

23 Ideas for building the kernel
(S(Pa) ∩ S(Pb))_c1 = (S(τa) ∩ S(τb))_c1 × (S(γa) ∩ S(γb))_c1
K(Pa, Pb) = |∪_{c ∈ C} (S(Pa) ∩ S(Pb))_c| = |∪_{c ∈ C} (S(τa) ∩ S(τb))_c × (S(γa) ∩ S(γb))_c|

24 Kernel on FOR feature spaces
The general equation
K(P1, P2) = |∪_{c ∈ C} (S(τ1) ∩ S(τ2))_c × (S(γ1) ∩ S(γ2))_c|
can be computed using:
1) K_S, the kernel function for trees introduced in (Collins & Duffy, 2001) and refined in (Moschitti & Zanzotto, 2007)
2) the inclusion-exclusion principle
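A hedged sketch of how these two ingredients could be combined is given below. It assumes (i) a function k_s(tree1, tree2, c) that counts the common extended subtrees compatible with a given placeholder constraint, standing in for the K_S tree kernel, and (ii) that intersecting the per-constraint subsets corresponds to intersecting the constraints themselves; both the helper and that reading are assumptions of this sketch, not code from the paper.

```python
# Sketch: |union over c in C of X_c| via inclusion-exclusion, where each X_c
# factorizes as a product of two tree-kernel counts (tau side and gamma side).
from itertools import combinations

def for_kernel(tau1, gamma1, tau2, gamma2, constraints, k_s):
    total = 0
    for r in range(1, len(constraints) + 1):
        sign = (-1) ** (r + 1)
        for subset in combinations(constraints, r):
            # Combined constraint: correspondences shared by all constraints in
            # the subset (assumption: this represents the intersection of the
            # corresponding subsets of common fragments).
            c = frozenset.intersection(*subset)
            total += sign * k_s(tau1, tau2, c) * k_s(gamma1, gamma2, c)
    return total
```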

25 Computational Efficiency Analysis
Comparison kernel: (Zanzotto & Moschitti, Coling-ACL 2006), (Moschitti & Zanzotto, ICML 2007)
Test-bed corpus: Recognizing Textual Entailment challenge data

26 Computational Efficiency Analysis
(Plot: execution time in seconds (s) over the whole RTE2 data set, for different numbers of allowed placeholders.)

27 Accuracy Comparison
Training: RTE 1, 2, 3. Testing: RTE 4.

28 Conclusions
– We reduced the computation of kernels in first-order rule feature spaces to a graph-matching problem
– We defined a new class of graphs, tDAGs
– We presented an efficient algorithm for computing kernels in FOR feature spaces

29 (Figure: the placeholder-annotated parse trees of T1 = “Farmers feed cows animal extracts” and H1 = “Cows eat animal extracts” and the derived first-order rule fragments, as on slide 7.)

30 (Figure: the placeholder-annotated parse trees for the pair T3 → H3, “Mothers feed babies milk” / “Babies eat milk”, and the corresponding first-order rule fragments, as on slide 8.)

