Distributed Tree Kernels and Distributional Semantics: Between Syntactic Structures and Compositional Distributional Semantics Fabio Massimo Zanzotto ART Group Dipartimento di Ingegneria dell’Impresa University of Rome ”Tor Vergata”

© F.M.Zanzotto University of Rome “Tor Vergata” Prequel

© F.M.Zanzotto University of Rome “Tor Vergata” Textual Entailment Recognition. T₂: “Kessler’s team conducted 60,643 face-to-face interviews with adults in 14 countries”; H₂: “Kessler’s team interviewed more than 60,000 adults in 14 countries”; T₂ → H₂. Recognizing Textual Entailment (RTE) is a classification task: given a pair (T, H), decide whether T implies H or T does not imply H. In (Dagan et al., 2005), RTE was proposed as a common semantic task for question answering, information retrieval, machine translation, and summarization.

© F.M.Zanzotto University of Rome “Tor Vergata” Learning RTE Classifiers. Training examples: T₁: “Farmers feed cows animal extracts”, H₁: “Cows eat animal extracts”, P₁: T₁ → H₁; T₂: “They feed dolphins fish”, H₂: “Fish eat dolphins”, P₂: T₂ ↛ H₂; T₃: “Mothers feed babies milk”, H₃: “Babies eat milk”, P₃: T₃ → H₃. Relevant features for classification: rules with variables (first-order rules), e.g., «X feed Y Z» → «Y eat Z».

© F.M.Zanzotto University of Rome “Tor Vergata” Feature Spaces of Syntactic Rules with Variables. Rules with variables (first-order rules), e.g., feed → eat with aligned variables X and Y, are represented as pairs of syntactic tree fragments with variable nodes. Zanzotto & Moschitti, Automatic learning of textual entailments with cross-pair similarities, Coling-ACL, 2006. RTE 2 Results:
First Author (Group) | Accuracy | Average Precision
Hickl (LCC) | 75.4% | 80.8%
Tatu (LCC) | 73.8% | 71.3%
Zanzotto (Milan & Rome) | 63.9% | 64.4%
Adams (Dallas) | 62.6% | 62.8%
Bos (Rome & Leeds) | 61.6% | 66.9%

© F.M.Zanzotto University of Rome “Tor Vergata” Adding Semantics: Shallow Semantics. Pennacchiotti & Zanzotto, Learning Shallow Semantic Rules for Textual Entailment, Proceedings of RANLP, 2007. Learning example: T: “For my younger readers, Chapman killed John Lennon more than twenty years ago.” H: “John Lennon died more than twenty years ago.” T → H. A generalized rule with typed variables: «X killed Y» → «Y died», with the two verbs linked by a causal relation (killed causes died).

© F.M.Zanzotto University of Rome “Tor Vergata” Adding Semantics: Distributional Semantics. Mehdad, Moschitti, Zanzotto, Syntactic/Semantic Structures for Textual Entailment Recognition, Proceedings of NAACL, 2010. With distributional semantics, a learnt rule such as «X killed» → «X died» can generalize to «X murdered» → «X died», since killed and murdered are distributionally similar. Promising!!!

© F.M.Zanzotto University of Rome “Tor Vergata” Compositional Distributional Semantics. [Figure: a “distributional” semantic space containing vectors for hands, car, and moving; composing “distributional” meaning yields vectors for moving hands and moving car.]

© F.M.Zanzotto University of Rome “Tor Vergata” Compositional Distributional Semantics. Mitchell & Lapata (2008) propose a general model for bigrams that assigns a distributional meaning z to a sequence of two words “x y”: z = f(x, y, R, K), where R is the relation between x and y and K is an external knowledge source (e.g., x = moving, y = hands, z = moving hands).
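As an illustration, here is a minimal sketch in Python of two common instances of f from the compositional-distributional literature, the additive and the element-wise multiplicative models (the toy vectors are made up; R and K are fixed implicitly by the choice of function):

```python
import numpy as np

# Toy distributional vectors for "moving" and "hands"; in practice these
# would come from a corpus-derived semantic space.
x = np.array([0.2, 0.9, 0.1])   # "moving"
y = np.array([0.7, 0.3, 0.5])   # "hands"

def additive(x, y):
    """Additive composition: z = x + y."""
    return x + y

def multiplicative(x, y):
    """Element-wise multiplicative composition: z = x * y."""
    return x * y

z = additive(x, y)              # a vector for "moving hands"
```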

© F.M.Zanzotto University of Rome “Tor Vergata” CDS: Additive Model. The general additive model: z = A_R x + B_R y. The matrices A_R and B_R can be estimated with positive examples taken from dictionaries (e.g., the gloss of contact /ˈkɒntækt/ [kon-takt]: “2. close interaction”) and multivariate regression models. Zanzotto, Korkontzelos, Fallucchi, Manandhar, Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd COLING, 2010.
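Since the additive model is linear in the stacked vector [x; y], the estimation step can be sketched with ordinary least squares (a toy illustration with random data standing in for real dictionary-derived training pairs; the paper's actual estimator differs in the details):

```python
import numpy as np

n, d = 100, 50
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))   # vectors of the first words
Y = rng.standard_normal((n, d))   # vectors of the second words
Z = rng.standard_normal((n, d))   # target phrase vectors (e.g., from glosses)

# Solve Z ~ [X Y] W for W = [A^T; B^T] in the least-squares sense.
XY = np.hstack([X, Y])                       # n x 2d design matrix
W, *_ = np.linalg.lstsq(XY, Z, rcond=None)   # 2d x d
A, B = W[:d].T, W[d:].T                      # recover the two d x d matrices

z_hat = A @ X[0] + B @ Y[0]                  # predicted vector for pair 0
```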

© F.M.Zanzotto University of Rome “Tor Vergata” Recursive Linear CDS. Let’s scale up to sentences by recursively applying the model, e.g., bottom-up over the parse of “cows eat animal extracts” (N+N, then V+NP). Let’s apply it to RTE: extremely poor results.
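A hedged sketch of this recursive application, assuming binarized parse trees and a hypothetical word-vector lookup (A and B as in the additive model above):

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

rng = np.random.default_rng(0)
word_vectors = {w: rng.standard_normal(3)          # toy 3-dimensional space
                for w in ["cows", "eat", "animal", "extracts"]}

def recursive_cds(node, A, B):
    """Apply z = A x + B y bottom-up: leaves get distributional vectors,
    each internal node composes the vectors of its two children."""
    if not node.children:
        return word_vectors[node.label]
    left, right = node.children
    return A @ recursive_cds(left, A, B) + B @ recursive_cds(right, A, B)

A, B = np.eye(3), np.eye(3)   # placeholder matrices
tree = Node("S", [Node("cows"),
                  Node("VP", [Node("eat"),
                              Node("NP", [Node("animal"), Node("extracts")])])])
sentence_vector = recursive_cds(tree, A, B)
```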

© F.M.Zanzotto University of Rome “Tor Vergata” Recursive Linear CDS: a closer look. Comparing «chickens eat beef extracts» and «cows eat animal extracts» means recursively applying f to each sentence and evaluating the similarity of the two resulting vectors.

© F.M.Zanzotto University of Rome “Tor Vergata” Recursive Linear CDS: a closer look. Each composed vector conflates structure and meaning: when the similarity between two vectors is < 1, is the difference due to structure or to meaning?

© F.M.Zanzotto University of Rome “Tor Vergata” The prequel: two threads, structure and meaning. Recognizing Textual Entailment: feature spaces of rules with variables, adding shallow semantics, adding distributional semantics. Distributional Semantics: binary CDS, recursive CDS.

© F.M.Zanzotto University of Rome “Tor Vergata” Distributed Tree Kernels (the structure side).

© F.M.Zanzotto University of Rome “Tor Vergata” Tree Kernels. A tree T, e.g., the parse of “Farmers feed cows animal extracts”, is implicitly mapped to a huge vector whose dimensions τᵢ correspond to its tree fragments, e.g., (S NP VP), (NP (NNS Farmers)), (VP (VB feed) NP NP), (NP (NN animal) (NNS extracts)), …; the tree kernel computes the dot product in this fragment space without ever building the vectors.
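For concreteness, a compact sketch of the classic recursive computation of this kernel (Collins & Duffy, 2001), reusing the Node class from the sketch above; the decay value λ = 0.4 is just illustrative:

```python
def production(n):
    """A node's production: its label plus the sequence of child labels."""
    return (n.label, tuple(c.label for c in n.children))

def nodes(t):
    out = [t]
    for c in t.children:
        out.extend(nodes(c))
    return out

def delta(n1, n2, lam=0.4):
    """Weighted count of common tree fragments rooted at n1 and n2."""
    if not n1.children or not n2.children:
        return 0.0                              # terminals root no fragment
    if production(n1) != production(n2):
        return 0.0
    out = lam
    for c1, c2 in zip(n1.children, n2.children):
        out *= 1.0 + delta(c1, c2, lam)
    return out

def tree_kernel(t1, t2, lam=0.4):
    """K(T1, T2) = sum of delta over all node pairs: the (decayed) number
    of common fragments, O(|T1| * |T2|) time and space."""
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))
```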

© F.M.Zanzotto University of Rome “Tor Vergata” Tree Kernels in Smaller Vectors. Goal: encode the same fragment space of a tree T in a small vector, meeting the CDS desiderata: vectors are smaller, and vectors are obtained with a compositional function.

© F.M.Zanzotto University of Rome “Tor Vergata” Names for the «Distributed» World: Distributed Trees (DT), Distributed Tree Fragments (DTF), Distributed Tree Kernels (DTK). As we are encoding trees in small vectors, the tradition is that of distributed structures (Plate, 1994).

© F.M.Zanzotto University of Rome “Tor Vergata” Outline: DTK, expected properties and challenges; Model: Distributed Tree Fragments and Distributed Trees; Experimental evaluation; Remarks; Back to Compositional Distributional Semantics; Future Work.

© F.M.Zanzotto University of Rome “Tor Vergata” DTK: Expected Properties and Challenges. Compositionally building Distributed Tree Fragments: Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d (Property 1: nearly unit vectors; Property 2: nearly orthogonal vectors); Distributed Trees can be efficiently computed; DTKs should approximate Tree Kernels.

© F.M.Zanzotto University of Rome “Tor Vergata” Compositionally Building Distributed Tree Fragments. Basic elements: N, a set of nearly orthogonal random vectors for node labels, and ⊗, a basic vector composition function with some ideal properties. A distributed tree fragment is the application of the composition function ⊗ to the node vectors, in the order given by a depth-first visit of the tree.
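A sketch of these basic elements in Python (the left fold over the depth-first node sequence is my simplification; the paper fixes the exact order of application of ⊗):

```python
import numpy as np

d = 8192                        # vector dimension used in the experiments
rng = np.random.default_rng(42)
_label_vectors = {}

def vec(label):
    """The set N: one random unit vector per node label. High-dimensional
    random vectors are nearly orthogonal with high probability."""
    if label not in _label_vectors:
        v = rng.standard_normal(d)
        _label_vectors[label] = v / np.linalg.norm(v)
    return _label_vectors[label]

def depth_first(n):
    yield n
    for c in n.children:
        yield from depth_first(c)

def dtf(fragment, compose):
    """Distributed tree fragment: fold the composition function over the
    node label vectors in depth-first order."""
    it = depth_first(fragment)
    out = vec(next(it).label)
    for n in it:
        out = compose(out, vec(n.label))
    return out
```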

© F.M.Zanzotto University of Rome “Tor Vergata” Building Distributed Tree Fragments. Properties of the ideal function ⊗: 1. non-commutativity with a very high degree k; 2. non-associativity; 3. bilinearity. Under these properties, Property 1 (nearly unit vectors) and Property 2 (nearly orthogonal vectors) hold: we demonstrated that DTFs are a nearly orthonormal base (see Lemma 1 and Lemma 2 in the paper). Zanzotto & Dell’Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012.

© F.M.Zanzotto University of Rome “Tor Vergata” DTK: Expected Properties and Challenges. Compositionally building Distributed Tree Fragments: Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d (Property 1: nearly unit vectors; Property 2: nearly orthogonal vectors); Distributed Trees can be efficiently computed; DTKs should approximate Tree Kernels.

© F.M.Zanzotto University of Rome “Tor Vergata” Building Distributed Trees. Given a tree T, the distributed representation of its subtrees is the vector DT(T) = Σ_{τ ∈ S(T)} ⌊τ⌋, where S(T) is the set of the subtrees of T and ⌊τ⌋ is the distributed tree fragment of τ. E.g., for the parse of “Farmers feed cows animal extracts”, S(T) contains fragments such as (S NP VP) and (NP (NNS Farmers)).

© F.M.Zanzotto University of Rome “Tor Vergata” Building Distributed Trees: a more efficient approach. DT(T) = Σ_{n ∈ N(T)} s(n), where N(T) is the set of nodes of T and s(n) is defined recursively over the production n → c₁…c_k, with a base case for terminal nodes n. Computing a Distributed Tree is linear with respect to the size of N(T).
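The recursion behind this claim can be reconstructed as follows (a hedged reading of the slide: terminals root no fragment, and by the bilinearity of ⊗ the term vec(c) + s(c) expands into all ways of growing a fragment at child c; the decay weight λ is omitted):

```python
def s(node, compose, memo):
    """Sum of the distributed vectors of all fragments rooted at `node`."""
    if id(node) in memo:
        return memo[id(node)]
    if not node.children:                # terminal: base case, zero vector
        out = np.zeros(d)
    else:
        out = vec(node.label)
        for c in node.children:
            out = compose(out, vec(c.label) + s(c, compose, memo))
    memo[id(node)] = out
    return out

def distributed_tree(tree, compose):
    """DT(T) = sum of s(n) over all nodes of T. With memoization each node
    is visited once, so the computation is linear in |N(T)|."""
    memo = {}
    return sum(s(n, compose, memo) for n in depth_first(tree))
```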

© F.M.Zanzotto University of Rome “Tor Vergata” Building Distributed Trees: a more efficient approach. Assuming the ideal basic composition function ⊗, it is possible to show that the recursive computation exactly computes DT(T) = Σ_{τ ∈ S(T)} ⌊τ⌋ (see Theorem 1 in the paper). Zanzotto & Dell’Arciprete, Distributed Tree Kernels, Proceedings of ICML, 2012.

© F.M.Zanzotto University of Rome “Tor Vergata” DTK: Expected Properties and Challenges. Compositionally building Distributed Tree Fragments: Distributed Tree Fragments are a nearly orthonormal base that embeds R^m in R^d (Property 1: nearly unit vectors; Property 2: nearly orthogonal vectors); Distributed Trees can be efficiently computed; DTKs should approximate Tree Kernels.

© F.M.Zanzotto University of Rome “Tor Vergata” Experimental Evaluation. Concrete composition functions: how well can concrete composition functions approximate the ideal function ⊗? Direct analysis: how well do DTKs approximate the original tree kernels (TKs)? Task-based analysis: how well do DTKs perform on actual NLP tasks, with respect to TKs? Vector dimension = 8192.

© F.M.Zanzotto University of Rome “Tor Vergata” Towards Reality: Approximating ⊗. ⊗ is an ideal function! Proposed approximations: shuffled normalized element-wise product and shuffled circular convolution. It is possible to show that the properties of ⊗ statistically hold for the two approximations.
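Plausible implementations of the two approximations (the fixed random permutations supply the "shuffling" that breaks commutativity and associativity; the √d factor is a normalization so that the product of two unit vectors stays near unit norm on average):

```python
perm_rng = np.random.default_rng(7)
p1, p2 = perm_rng.permutation(d), perm_rng.permutation(d)

def shuffled_product(a, b):
    """Shuffled normalized element-wise product."""
    return np.sqrt(d) * (a[p1] * b[p2])

def shuffled_circular_convolution(a, b):
    """Shuffled circular convolution, computed via the FFT
    (cf. Plate's holographic reduced representations)."""
    return np.fft.irfft(np.fft.rfft(a[p1]) * np.fft.rfft(b[p2]), n=d)
```

Either function can be plugged into the dtf and distributed_tree sketches above; the dot product of two distributed trees then approximates the corresponding tree kernel.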

© F.M.Zanzotto University of Rome “Tor Vergata” Empirical Evaluation of the Properties. [Table: non-commutativity, distributivity over the sum, norm preservation, and orthogonality preservation checked empirically for the two approximations; the properties largely hold (OK), with one case left open (?).]

© F.M.Zanzotto University of Rome “Tor Vergata” Direct Analysis. Spearman’s correlation between DTK and TK values; test trees taken from the QC corpus and the RTE corpus.

© F.M.Zanzotto University of Rome “Tor Vergata” Task-based Analysis: Question Classification and Recognizing Textual Entailment.

© F.M.Zanzotto University of Rome “Tor Vergata” Remarks. Distributed Tree Fragments (DTF) are a nearly orthonormal base that embeds R^m in R^d; Distributed Trees (DT) can be efficiently computed; Distributed Tree Kernels (DTK) approximate Tree Kernels.

© F.M.Zanzotto University of Rome “Tor Vergata” Side Effect. Tree kernels (TK) (Collins & Duffy, 2001) have quadratic time and space complexity. Current techniques control this complexity by: exploiting some specific characteristics of trees (Moschitti, 2006); selecting subtrees headed by specific node labels (Rieck et al., 2010); exploiting dynamic programming on the whole training and application sets of instances (Shin et al., 2011). Our proposal: encoding trees in small vectors, in line with distributed structures (Plate, 1994).

© F.M.Zanzotto University of Rome “Tor Vergata” Structured Feature Spaces: Dimensionality Reduction. Traditional dimensionality reduction techniques, such as Singular Value Decomposition, Random Indexing, and feature selection, are not applicable: the fragment space is never represented explicitly.

© F.M.Zanzotto University of Rome “Tor Vergata” Computational Complexity of DTK. Notation: n = size of the tree; k = number of selected tree fragments; q = reducing factor; O(·) = worst-case complexity; A(·) = average-case complexity. [Comparison table omitted.]

© F.M.Zanzotto University of Rome “Tor Vergata” Time Complexity Analysis. DTK time complexity is independent of the tree size!

© F.M.Zanzotto University of Rome “Tor Vergata” Outline: DTK, expected properties and challenges; Model: Distributed Tree Fragments and Distributed Trees; Experimental evaluation; Remarks; Back to Compositional Distributional Semantics; Future Work.

© F.M.Zanzotto University of Rome “Tor Vergata” Towards Distributional Distributed Trees. Distributed Tree Fragments: non-terminal nodes n and terminal nodes w both get random vectors. Distributional Distributed Tree Fragments: non-terminal nodes n get random vectors; terminal nodes w get distributional vectors. Caveat (Property 2): random vectors are nearly orthogonal; distributional vectors are not. Zanzotto & Dell‘Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL workshop DiSCo, 2011.
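The change amounts to swapping the vector lookup for terminal nodes; a minimal sketch, with distributional_space standing in for a hypothetical word-to-vector table (e.g., LSA vectors):

```python
distributional_space = {}   # hypothetical: word -> corpus-derived vector

def vec_dd(label, is_terminal):
    """Random vectors for non-terminal labels, normalized distributional
    vectors for terminal words. Per the caveat above, distributional
    vectors are not nearly orthogonal, so Property 2 only holds
    approximately."""
    if is_terminal and label in distributional_space:
        v = distributional_space[label]
        return v / np.linalg.norm(v)
    return vec(label)
```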

© F.M.Zanzotto University of Rome “Tor Vergata” Experimental Set-up. Task-based comparison: corpora RTE1, RTE2, RTE3, RTE5; measure: accuracy. Distributed/distributional vector size: 250. Distributional vectors: corpus UKWaC (Ferraresi et al., 2008); LSA applied with k = 250. Zanzotto & Dell‘Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL workshop DiSCo, 2011.

© F.M.Zanzotto University of Rome “Tor Vergata” Accuracy Results. [Results table omitted.] Zanzotto & Dell‘Arciprete, Distributed Representations and Distributional Semantics, Proceedings of the ACL workshop DiSCo, 2011.

© F.M.Zanzotto University of Rome “Tor Vergata” The plot so far… Recognizing Textual Entailment: feature spaces of rules with variables, adding shallow semantics, adding distributional semantics. Distributional Semantics: binary CDS, recursive CDS. Tree Kernels → Distributed Tree Kernels (DTK), bridging structure and meaning.

© F.M.Zanzotto University of Rome “Tor Vergata” Future Work. Distributed Tree Kernels: applying the method to other tree and graph kernels; optimizing the code with GPU programming (CUDA); using Distributed Trees in different applications, e.g., indexing structured information for syntax-aware Information Retrieval or for XML Information Retrieval. Compositional Distributional Semantics: using the insight gained with DTKs to better understand how to produce syntax-aware CDS models (see the preliminary investigation in Zanzotto & Dell’Arciprete, DiSCo 2011).

© F.M.Zanzotto University of Rome “Tor Vergata” Credits: Lorenzo Dell’Arciprete, Marco Pennacchiotti, Alessandro Moschitti, Yashar Mehdad, Ioannis Korkontzelos. Code:
SEMEVAL TASK 5: EVALUATING PHRASAL SEMANTICS

© F.M.Zanzotto University of Rome “Tor Vergata” Distributed Tree Kernels, Compositional Distributional Semantics, Brain & Computer.

© F.M.Zanzotto University of Rome “Tor Vergata” If you want to read more…
Distributed Tree Kernels: Zanzotto, F. M. & Dell’Arciprete, L. Distributed Tree Kernels, Proceedings of the International Conference on Machine Learning, 2012.
Tree Kernels and Distributional Semantics: Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010.
Compositional Distributional Semantics: Zanzotto, F. M.; Korkontzelos, I.; Fallucchi, F. & Manandhar, S. Estimating Linear Models for Compositional Distributional Semantics, Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 2010.
Distributed and Distributional Tree Kernels: Zanzotto, F. M. & Dell’arciprete, L. Distributed Representations and Distributional Semantics, Proceedings of the ACL-HLT 2011 workshop on Distributional Semantics and Compositionality (DiSCo), 2011.
SEMEVAL TASK 5: EVALUATING PHRASAL SEMANTICS

© F.M.Zanzotto University of Rome “Tor Vergata” My first life: Learning Textual Entailment Recognition Systems.
Initial idea: Zanzotto, F. M. & Moschitti, A. Automatic learning of textual entailments with cross-pair similarities, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, 2006.
First refinement of the algorithm: Moschitti, A. & Zanzotto, F. M. Fast and Effective Kernels for Relational Learning from Texts, Proceedings of the 24th Annual International Conference on Machine Learning, 2007.
Adding shallow semantics: Pennacchiotti, M. & Zanzotto, F. M. Learning Shallow Semantic Rules for Textual Entailment, Proceedings of the International Conference RANLP, 2007.
A comprehensive description: Zanzotto, F. M.; Pennacchiotti, M. & Moschitti, A. A Machine Learning Approach to Textual Entailment Recognition, Natural Language Engineering, 2009.

© F.M.Zanzotto University of Rome “Tor Vergata” My first life: Learning Textual Entailment Recognition Systems.
Adding distributional semantics: Mehdad, Y.; Moschitti, A. & Zanzotto, F. M. Syntactic/Semantic Structures for Textual Entailment Recognition, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010.
A valid kernel with an efficient algorithm: Zanzotto, F. M. & Dell’Arciprete, L. Efficient kernels for sentence pair classification, Conference on Empirical Methods in Natural Language Processing, 2009; Zanzotto, F. M.; Dell’arciprete, L. & Moschitti, A. Efficient Graph Kernels for Textual Entailment Recognition, Fundamenta Informaticae.
Applications: Zanzotto, F. M.; Pennacchiotti, M. & Tsioutsiouliklis, K. Linguistic Redundancy in Twitter, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.
Extracting RTE corpora: Zanzotto, F. M. & Pennacchiotti, M. Expanding textual entailment corpora from Wikipedia using co-training, Proceedings of the COLING Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources, 2010.
Learning verb relations: Zanzotto, F. M.; Pennacchiotti, M. & Pazienza, M. T. Discovering asymmetric entailment relations between verbs using selectional preferences, ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics.

© F.M.Zanzotto University of Rome “Tor Vergata” My second life: Parallels between Brains and Computers.
Zanzotto, F. M. & Croce, D. Comparing EEG/ERP-like and fMRI-like Techniques for Reading Machine Thoughts, BI 2010: Proceedings of the Brain Informatics Conference, Toronto, 2010.
Zanzotto, F. M.; Croce, D. & Prezioso, S. Reading what Machines “Think”: a Challenge for Nanotechnology, Joint Conferences on Advanced Materials, 2009.
Zanzotto, F. M. & Croce, D. Reading what machines “think”, BI 2009: Proceedings of the Brain Informatics Conference, Beijing, China, October 2009.
Prezioso, S.; Croce, D. & Zanzotto, F. M. Reading what machines “think”: a challenge for nanotechnology, Journal of Computational and Theoretical Nanoscience, 2011.
Zanzotto, F. M.; Dell’arciprete, L. & Korkontzelos, Y. Rappresentazione distribuita e semantica distribuzionale dalla prospettiva dell’Intelligenza Artificiale [Distributed representation and distributional semantics from the perspective of Artificial Intelligence], Teorie & Modelli, 2010.

© F.M.Zanzotto University of Rome “Tor Vergata” Quick background on Supervised Machine Learning. A Learner takes a Training Set {(x₁, y₁), (x₂, y₂), …, (x_n, y_n)} of instances represented in a feature space and produces a Learnt Model; a Classifier then applies the learnt model to a new instance x_i in the same feature space to predict its label y_i.

© F.M.Zanzotto University of Rome “Tor Vergata” Quick background on Supervised Machine Learning. Some machine learning methods exploit the distance between instances x_i and x_j in the feature space. For these so-called kernel machines, we can use the kernel trick: «define the distance K(x₁, x₂) instead of directly representing instances in the feature space».
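A minimal illustration of the kernel trick with scikit-learn's precomputed-kernel interface (the data and the polynomial kernel are toy choices):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0], [1.0], [2.0], [3.0]])   # toy training instances
y = np.array([0, 0, 1, 1])

def K(a, b):
    """Any valid kernel works here without ever building the feature space."""
    return (1.0 + a @ b) ** 2                # e.g., a polynomial kernel

gram = np.array([[K(a, b) for b in X] for a in X])
clf = SVC(kernel="precomputed").fit(gram, y)

x_new = np.array([[1.5]])
gram_new = np.array([[K(a, b) for b in X] for a in x_new])
print(clf.predict(gram_new))                 # prediction near the boundary
```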

© F.M.Zanzotto University of Rome “Tor Vergata” Thank you for your attention.