Efficient kernels for sentence pair classification
Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete
University of Rome “Tor Vergata”, Roma, Italy


Motivation
Classifying sentence pairs is an important activity in many NLP tasks, e.g.:
– Textual Entailment Recognition
– Machine Translation
– Question Answering
Classifiers need suitable feature spaces.

Motivation
For example, in textual entailment…
Training examples:
– $P_1$: $T_1 \rightarrow H_1$, with $T_1$ = “Farmers feed cows animal extracts” and $H_1$ = “Cows eat animal extracts”
– $P_2$: $T_2 \nrightarrow H_2$, with $T_2$ = “They feed dolphins fishs” and $H_2$ = “Fishs eat dolphins”
Classification:
– $P_3$: $T_3 \rightarrow H_3$, with $T_3$ = “Mothers feed babies milk” and $H_3$ = “Babies eat milk”
Relevant features: first-order rules such as $\text{feed}\ X\ Y \rightarrow X\ \text{eat}\ Y$

In this talk…
First-order rule (FOR) feature spaces: a challenge
Tripartite Directed Acyclic Graphs (tDAGs) as a solution:
– for modelling FOR feature spaces
– for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces
An efficient algorithm for computing kernels in FOR spaces
Experimental and comparative assessment of the computational efficiency of the proposed algorithm

First-order rule (FOR) feature spaces: challenges
We want to exploit first-order rule (FOR) feature spaces by writing the implicit kernel function
$K(P_1,P_2) = |S(P_1) \cap S(P_2)|$
that counts how many common first-order rules are activated by $P_1$ and $P_2$.
Without loss of generality, we present the problem in syntactic first-order rule feature spaces.
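As a minimal sketch of what this kernel counts, the snippet below treats each $S(P)$ as an explicit set of rule strings and returns the size of the intersection. The function name `for_kernel` and the hand-written rule strings are hypothetical; in the talk $S(P)$ is a set of tree-pair fragments and is never enumerated explicitly.

```python
def for_kernel(rules_1, rules_2):
    """Count the first-order rules activated by both pairs: |S(P1) ∩ S(P2)|."""
    return len(set(rules_1) & set(rules_2))


# Hypothetical, hand-written rule sets (assumption: rules are plain strings).
S_Pa = {"feed X Y -> X eat Y", "feed X -> X eat", "feed X extracts -> X eat extracts"}
S_Pb = {"feed X Y -> X eat Y", "feed X -> X eat", "feed X milk -> X eat milk"}

print(for_kernel(S_Pa, S_Pb))
```

Enumerating the sets like this is only feasible for toy inputs, which is why the kernel is computed implicitly.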

Observations
… using the kernel trick:
– define the similarity function $K(P_1,P_2)$
– instead of defining the features
[Figure: the pairs $T_1 \rightarrow H_1$ and $T_1 \rightarrow H_2$ compared through $K(T_1 \rightarrow H_1, T_1 \rightarrow H_2)$]

First-order rule (FOR) feature spaces: challenges
$P_a$: $T_1 \rightarrow H_1$, with $T_1$ = “Farmers feed cows animal extracts” and $H_1$ = “Cows eat animal extracts”
[Figure: parse trees of $T_1$ and $H_1$, shown in two steps (adding placeholders, propagating placeholders), and the resulting set $S(P_a)$ of tree-pair fragments linking “feed” to “eat”]

First-order rule (FOR) feature spaces: challenges
$P_b$: $T_3 \rightarrow H_3$, with $T_3$ = “Mothers feed babies milk” and $H_3$ = “Babies eat milk”
[Figure: parse trees of $T_3$ and $H_3$ with placeholders, and the resulting set $S(P_b)$ of tree-pair fragments linking “feed” to “eat”]

First-order rule (FOR) feature spaces: challenges
$K(P_a,P_b) = |S(P_a) \cap S(P_b)|$
[Figure: the fragment sets $S(P_a)$ and $S(P_b)$ side by side, with matching fragments marked as equal; one shared fragment is the rule $\text{feed}\ X\ Y \rightarrow X\ \text{eat}\ Y$, drawn as the tree pair (VP (VB feed) (NP X) (NP Y)) and (S (NP X) (VP (VB eat) (NP Y)))]

A step back…
FOR feature spaces can be modelled with particular graphs.
We call these graphs tripartite directed acyclic graphs (tDAGs).
Observations:
– tDAGs are not trees
– tDAGs can be used to model both rules and sentence pairs
– unifying rules in sentences is a graph matching problem

Tripartite Directed Acyclic Graphs (tDAGs)
As for Feature Structures…
[Figure: the rule $\text{feed}\ X\ Y \rightarrow X\ \text{eat}\ Y$ and the sentence pair “Farmers feed cows animal extracts” / “Cows eat animal extracts”, each drawn as a tDAG]


Tripartite Directed Acyclic Graphs (tDAGs)
A tripartite directed acyclic graph is a graph $G = (N,E)$ where:
– the set of nodes $N$ is partitioned into three sets $N_t$, $N_g$, and $A$
– the set of edges $E$ is partitioned into four sets $E_t$, $E_g$, $E_{A(t)}$, and $E_{A(g)}$
where $t = (N_t,E_t)$ and $g = (N_g,E_g)$ are two trees,
$E_{A(t)} = \{(x,y) \mid x \in N_t \text{ and } y \in A\}$ and
$E_{A(g)} = \{(x,y) \mid x \in N_g \text{ and } y \in A\}$.
[Figure: example tDAG built from a “feed” tree and an “eat” tree]
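The definition can be sketched as a small data structure. The class `TDAG`, its field names, and the node identifiers below are assumptions made for illustration; the check only verifies the partition conditions on nodes and edges and does not verify that $t$ and $g$ are actually trees.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TDAG:
    """G = (N, E) with N = N_t + N_g + A and E = E_t + E_g + E_A(t) + E_A(g)."""
    nodes_t: frozenset   # N_t: nodes of the first tree t
    nodes_g: frozenset   # N_g: nodes of the second tree g
    anchors: frozenset   # A: shared anchor (placeholder) nodes
    edges_t: frozenset   # E_t: edges inside t
    edges_g: frozenset   # E_g: edges inside g
    edges_at: frozenset  # E_A(t): edges from N_t to A
    edges_ag: frozenset  # E_A(g): edges from N_g to A

    def is_well_formed(self) -> bool:
        parts = [self.nodes_t, self.nodes_g, self.anchors]
        # The three node sets must be pairwise disjoint.
        disjoint = all(parts[i].isdisjoint(parts[j])
                       for i in range(3) for j in range(i + 1, 3))

        def within(edges, sources, targets):
            return all(x in sources and y in targets for x, y in edges)

        return (disjoint
                and within(self.edges_t, self.nodes_t, self.nodes_t)
                and within(self.edges_g, self.nodes_g, self.nodes_g)
                and within(self.edges_at, self.nodes_t, self.anchors)
                and within(self.edges_ag, self.nodes_g, self.anchors))


# Hypothetical encoding of the rule "feed X Y -> X eat Y".
rule = TDAG(
    nodes_t=frozenset({"t:VP", "t:VB", "t:feed", "t:NP1", "t:NP2"}),
    nodes_g=frozenset({"g:S", "g:NP", "g:VP", "g:VB", "g:eat", "g:NP2"}),
    anchors=frozenset({"X", "Y"}),
    edges_t=frozenset({("t:VP", "t:VB"), ("t:VB", "t:feed"),
                       ("t:VP", "t:NP1"), ("t:VP", "t:NP2")}),
    edges_g=frozenset({("g:S", "g:NP"), ("g:S", "g:VP"), ("g:VP", "g:VB"),
                       ("g:VB", "g:eat"), ("g:VP", "g:NP2")}),
    edges_at=frozenset({("t:NP1", "X"), ("t:NP2", "Y")}),
    edges_ag=frozenset({("g:NP", "X"), ("g:NP2", "Y")}),
)
# An anchor edge that starts in the wrong tree violates the definition.
broken = replace(rule, edges_at=frozenset({("g:NP", "X")}))

print(rule.is_well_formed(), broken.is_well_formed())
```

Keeping the anchors in their own set is what makes the graph tripartite: the two trees never share nodes and are linked only through $A$.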

Tripartite Directed Acyclic Graphs (tDAGs)
Alternative definition
A tDAG is a pair of extended trees $G = (\tau,\gamma)$ where $\tau = (N_t \cup A_t, E_t \cup E_{A(t)})$ and $\gamma = (N_g \cup A_g, E_g \cup E_{A(g)})$.
[Figure: the example tDAG drawn as a single graph and as a pair of extended trees with anchors X and Y]

Again challenges
Computing the implicit kernel function $K(P_1,P_2) = |S(P_1) \cap S(P_2)|$ involves general graph matching. This is an exponential problem.
Yet… tDAGs are particular graphs and we can define an efficient algorithm.
We will analyze the isomorphism among tDAGs and we will derive an algorithm for computing $K(P_1,P_2)$.

Isomorphism between tDAGs
Isomorphism between graphs
$G_1 = (N_1,E_1)$ and $G_2 = (N_2,E_2)$ are isomorphic if:
– $|N_1| = |N_2|$ and $|E_1| = |E_2|$
– among all the bijective functions relating $N_1$ and $N_2$, there exists $f : N_1 \rightarrow N_2$ such that:
  for each $n_1$ in $N_1$, $Label(n_1) = Label(f(n_1))$
  for each $(n_a,n_b)$ in $E_1$, $(f(n_a),f(n_b))$ is in $E_2$
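The definition translates directly into a brute-force check that tries every bijection. This is a sketch for illustration only: the function name `are_isomorphic` and the dictionary-based graph encoding are assumptions, and the factorial number of candidate bijections shows why general graph matching is expensive.

```python
from itertools import permutations


def are_isomorphic(labels_1, edges_1, labels_2, edges_2):
    """labels_i maps node -> label; edges_i is a set of (source, target) pairs."""
    if len(labels_1) != len(labels_2) or len(edges_1) != len(edges_2):
        return False
    nodes_1 = list(labels_1)
    # Try every bijection f: N1 -> N2 (factorial in the number of nodes).
    for image in permutations(labels_2):
        f = dict(zip(nodes_1, image))
        labels_match = all(labels_1[n] == labels_2[f[n]] for n in nodes_1)
        edges_match = all((f[a], f[b]) in edges_2 for a, b in edges_1)
        if labels_match and edges_match:
            return True
    return False


g1_labels = {1: "VP", 2: "VB", 3: "NP"}
g1_edges = {(1, 2), (1, 3)}
g2_labels = {"a": "VP", "b": "NP", "c": "VB"}
g2_edges = {("a", "b"), ("a", "c")}
g3_edges = {("a", "b"), ("b", "c")}  # same labels, different shape

print(are_isomorphic(g1_labels, g1_edges, g2_labels, g2_edges))
print(are_isomorphic(g1_labels, g1_edges, g2_labels, g3_edges))
```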

Isomorphism between tDAGs
Isomorphism adapted to tDAGs
$G_1 = (\tau_1,\gamma_1)$ and $G_2 = (\tau_2,\gamma_2)$ are isomorphic if these two properties hold:
– Partial isomorphism: $\tau_1$ and $\tau_2$ are isomorphic, and $\gamma_1$ and $\gamma_2$ are isomorphic. This property generates two functions $f_\tau$ and $f_\gamma$.
– Constraint compatibility: $f_\tau$ and $f_\gamma$ are compatible on the sets of nodes $A_1$ and $A_2$ if, for each $n \in A_1$, $f_\tau(n) = f_\gamma(n)$.
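The compatibility condition can be illustrated with two node mappings represented as dictionaries. This is a minimal sketch: the function name `compatible`, the node names, and the assumption that every anchor of $A_1$ appears in both mappings are made up for the example.

```python
def compatible(f_tau, f_gamma, anchors_1):
    """True if the two partial isomorphisms send every anchor to the same node."""
    return all(f_tau[n] == f_gamma[n] for n in anchors_1)


anchors_1 = {"X", "Y"}
# Hypothetical mappings produced by the two partial isomorphisms.
f_tau = {"VP": "VP'", "X": "1", "Y": "2"}
f_gamma_ok = {"S": "S'", "X": "1", "Y": "2"}
f_gamma_swapped = {"S": "S'", "X": "2", "Y": "1"}

print(compatible(f_tau, f_gamma_ok, anchors_1))
print(compatible(f_tau, f_gamma_swapped, anchors_1))
```

In the second call each tree matches on its own, but the anchors are paired differently on the two sides, so the tDAGs as a whole are not isomorphic.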

Isomorphism between tDAGs
$P_a = (\tau_a,\gamma_a)$, $P_b = (\tau_b,\gamma_b)$
[Figure: the tDAGs $P_a$ and $P_b$ with the constraint sets $C_\tau$, $C_\gamma$ and $C$ obtained from partial isomorphism and constraint compatibility (node pairs not shown)]

Ideas for building the kernel
We define $K(P_1,P_2) = |S(P_1) \cap S(P_2)|$ using the isomorphism between tDAGs.
The idea: reverse the order of isomorphism detection.
First, constraint compatibility:
– building a set $C$ of all the relevant alternative constraints
– finding the subsets of $S(P_1) \cap S(P_2)$ meeting a constraint $c \in C$
Second, partial isomorphism detection.

Ideas for building the kernel
$K(P_a,P_b) = |S(P_a) \cap S(P_b)|$
[Figure: example tDAGs $P_a = (\tau_a,\gamma_a)$ and $P_b = (\tau_b,\gamma_b)$ over the node labels A, B, C and I, M, N, with the set of alternative constraints $C = \{c_1, c_2\}$ (node pairs not shown)]

Ideas for building the kernel
$K(P_a,P_b) = |S(P_a) \cap S(P_b)| = |(S(P_a) \cap S(P_b))_{c_1} \cup (S(P_a) \cap S(P_b))_{c_2}|$
[Figure: for $C = \{c_1, c_2\}$, the subset $(S(P_a) \cap S(P_b))_{c_1}$ of shared fragments that meet constraint $c_1$]

Ideas for building the kernel
$K(P_a,P_b) = |S(P_a) \cap S(P_b)| = |(S(P_a) \cap S(P_b))_{c_1} \cup (S(P_a) \cap S(P_b))_{c_2}|$
[Figure: for $C = \{c_1, c_2\}$, the subset $(S(P_a) \cap S(P_b))_{c_2}$ of shared fragments that meet constraint $c_2$]

Ideas for building the kernel
$(S(P_a) \cap S(P_b))_{c_1} = (S(\tau_a) \cap S(\tau_b))_{c_1} \times (S(\gamma_a) \cap S(\gamma_b))_{c_1}$
$K(P_a,P_b) = |\bigcup_{c \in C} (S(P_a) \cap S(P_b))_c| = |\bigcup_{c \in C} (S(\tau_a) \cap S(\tau_b))_c \times (S(\gamma_a) \cap S(\gamma_b))_c|$
[Figure: the subset for $c_1$ factored into the fragments shared by $\tau_a$ and $\tau_b$ and the fragments shared by $\gamma_a$ and $\gamma_b$]

Kernel on FOR feature spaces
$K(P_1,P_2) = |\bigcup_{c \in C} (S(\tau_1) \cap S(\tau_2))_c \times (S(\gamma_1) \cap S(\gamma_2))_c|$
The general equation can be computed using:
1) $K_S$ (kernel function for trees) introduced in (Duffy&Collins, 2001) and refined in (Moschitti&Zanzotto, 2007)
2) the inclusion-exclusion principle
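The role of the inclusion-exclusion principle can be sketched on explicit toy sets. The snippet relies on the identity $\bigcap_{c \in J} (A_c \times B_c) = (\bigcap_{c \in J} A_c) \times (\bigcap_{c \in J} B_c)$, so each term needs only two set sizes. The function names, the dictionary encoding, and the fragment labels are assumptions; in the talk the sizes come from the tree kernel $K_S$ rather than from enumerated sets.

```python
from itertools import combinations


def kernel_by_inclusion_exclusion(tau_sets, gamma_sets):
    """Size of the union over c of tau_sets[c] x gamma_sets[c], without listing pairs."""
    constraints = list(tau_sets)
    total = 0
    for size in range(1, len(constraints) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(constraints, size):
            common_tau = set.intersection(*(set(tau_sets[c]) for c in subset))
            common_gamma = set.intersection(*(set(gamma_sets[c]) for c in subset))
            # |intersection of products| = |common_tau| * |common_gamma|
            total += sign * len(common_tau) * len(common_gamma)
    return total


def kernel_by_enumeration(tau_sets, gamma_sets):
    """Reference implementation that materialises every (tau, gamma) pair."""
    pairs = set()
    for c in tau_sets:
        pairs |= {(t, g) for t in tau_sets[c] for g in gamma_sets[c]}
    return len(pairs)


# Hypothetical fragment sets per constraint.
tau_sets = {"c1": {"t1", "t2", "t3"}, "c2": {"t1", "t2"}}
gamma_sets = {"c1": {"g1", "g2"}, "c2": {"g1", "g3"}}

print(kernel_by_inclusion_exclusion(tau_sets, gamma_sets))
print(kernel_by_enumeration(tau_sets, gamma_sets))
```

For this toy input both functions give 3·2 + 2·2 − 2·1 = 8. The sum ranges over all non-empty subsets of $C$, so this sketch is only practical when the number of alternative constraints is small.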

Computational Efficiency Analysis
Comparison kernel: (Zanzotto&Moschitti, Coling-ACL 2006), (Moschitti&Zanzotto, ICML 2007)
Test-bed corpus:
– Recognizing Textual Entailment challenge data

Computational Efficiency Analysis
[Figure: execution time in seconds (s) over the whole RTE2 data set with respect to different numbers of allowed placeholders]

Accuracy Comparison
Training: RTE 1, 2, 3
Testing: RTE 4

Conclusions
We reduced kernels in first-order feature spaces to graph-matching problems.
We defined a new class of graphs, tDAGs.
We presented an efficient algorithm for computing kernels in FOR feature spaces.
