1 Dependency Tree Kernels for Relation Extraction from Natural Language Text
Frank Reichartz, Hannes Korte & Gerhard Paass
Fraunhofer IAIS (Intelligente Analyse- und Informationssysteme), Sankt Augustin, Germany

2 Introduction
- Relation Extraction is the task of identifying semantic relations between entities within the same sentence
- Most approaches consider binary target relations
Examples:
"Recently Obama became the president of the USA." -> Role(Obama, USA)
"Ballmer is still the CEO of Microsoft." -> Role(Ballmer, Microsoft)

3 Parse Trees
- Syntactic parsers transform natural language sentences into tree structures
  - Extensive information on syntactic structure
  - Variety of tree types in NLP, e.g. dependency trees, phrase grammar trees, etc.
- Focus here: dependency parse trees
  - A dependency tree is a structured representation of the grammatical dependencies between the words of a sentence as a vertex-labeled directed tree
  - Bijection between words and vertices (implies an ordering of children)
  - Well suited for languages with free word order, e.g. German
  - Can be generated with high quality by freely available parsers
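As an illustrative sketch (not the authors' code), such a vertex-labeled directed tree with ordered children can be written down directly; the sentence and tags below follow the "Ballmer is still the CEO of Microsoft." example from the slides, while the class name and attachment decisions are our own assumptions.

```python
# Minimal sketch of a dependency tree as a vertex-labeled directed tree.
# Each vertex corresponds to exactly one word (bijection), and the list
# of children is ordered, reflecting the word order of the sentence.

class DepNode:
    def __init__(self, word, pos, children=None):
        self.word = word                # surface form (vertex label)
        self.pos = pos                  # part-of-speech tag (vertex label)
        self.children = children or []  # ordered list of dependents

    def words(self):
        """Collect all words in the subtree (pre-order traversal)."""
        result = [self.word]
        for child in self.children:
            result.extend(child.words())
        return result

# "Ballmer is still the CEO of Microsoft." with 'is' as the root;
# the attachment choices are illustrative, not parser output.
tree = DepNode("is", "VBZ", [
    DepNode("Ballmer", "NNP"),
    DepNode("still", "RB"),
    DepNode("CEO", "NN", [
        DepNode("the", "DT"),
        DepNode("of", "IN", [DepNode("Microsoft", "NNP")]),
    ]),
])
```

The bijection between words and vertices means `tree.words()` recovers every word of the sentence exactly once.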

4 Dependency Trees (example parse figure omitted in the transcript)

5 Kernels
- A kernel is a generalized dot product for linear classifiers that implicitly maps the observations into a higher-dimensional feature space
  - A dot product in feature space can be expressed by a kernel working on the input space (kernel trick)
  - The input space can consist of structured data like parse trees; in this setting the embedding into a high-dimensional space may only be implicit
- Kernels over parse trees as structured information have been shown to be useful for relation extraction:
  - Culotta et al. 2004, Bunescu & Mooney 2005 (dependency parse trees)
  - Zhang et al. (phrase grammar parse trees)
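A minimal numeric sketch of the kernel trick mentioned above: the quadratic kernel K(x, y) = (x . y)^2 on two-dimensional inputs equals an ordinary dot product after the explicit feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2), so the feature space never needs to be materialized. The kernel choice is our illustration, not one used in the paper.

```python
import math

# Kernel trick sketch: evaluating the kernel in input space gives the
# same value as a dot product in the explicitly mapped feature space.

def quadratic_kernel(x, y):
    """K(x, y) = (x . y)^2 computed directly in input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """Explicit feature map for the quadratic kernel (2D inputs)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, y = (1.0, 2.0), (3.0, 4.0)
assert abs(quadratic_kernel(x, y) - dot(phi(x), phi(y))) < 1e-9
```

For structured inputs such as parse trees, phi is never written down at all; only the kernel itself is computed, which is what the tree kernels on the following slides do.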

6 Shortest Path Kernel (Bunescu & Mooney 2005)
- Efficient computation
- Node similarity function
- "Same length" restriction: only paths of equal length contribute, which is very restrictive
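A simplified sketch of the shortest path kernel idea: two dependency paths contribute only if they have the same length, and the kernel is then the product of per-position node similarities. The feature set (word, POS) is illustrative; the original uses a richer feature inventory.

```python
# Simplified sketch of the shortest path kernel (Bunescu & Mooney 2005).
# Nodes are feature dictionaries; node similarity counts shared features.

def node_similarity(u, v):
    """Number of feature values (e.g. word, POS tag) shared by two nodes."""
    return sum(1 for key in u if key in v and u[key] == v[key])

def shortest_path_kernel(path1, path2):
    if len(path1) != len(path2):   # the "same length" restriction
        return 0
    product = 1
    for u, v in zip(path1, path2):
        product *= node_similarity(u, v)  # one zero kills the product
    return product

p1 = [{"word": "Obama", "pos": "NNP"}, {"word": "president", "pos": "NN"},
      {"word": "USA", "pos": "NNP"}]
p2 = [{"word": "Ballmer", "pos": "NNP"}, {"word": "CEO", "pos": "NN"},
      {"word": "Microsoft", "pos": "NNP"}]
```

The two example paths match position by position on POS tags only, so the kernel value is 1; a path of a different length would contribute 0, which is the restriction the Path-DTK later relaxes.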

7 Dependency Tree Kernel (Culotta et al. 2004)
- Node similarity and node matching function
- A single "no match" yields a zero value

8 Dependency Tree Kernel (Culotta et al. 2004), continued (worked example; slide text identical to slide 7)
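A heavily simplified sketch of the DTK's two functions and its "one no-match = zero value" behavior. The feature choice (word, POS), the hard match on POS, and the pairwise alignment of children are our simplifications; the original kernel sums over child subsequences rather than aligning children position by position.

```python
# Sketch of the dependency tree kernel (Culotta et al. 2004), reduced to
# the matching/similarity structure described on the slides.

def mk(word, pos, children=None):
    """Build a tree node with illustrative features."""
    return {"feats": {"word": word, "pos": pos}, "children": children or []}

def match(u, v):
    """Hard matching function; here simply: same POS tag."""
    return u["pos"] == v["pos"]

def similarity(u, v):
    """Soft similarity: number of shared feature values."""
    return sum(1 for key in u if key in v and u[key] == v[key])

def dtk(a, b):
    if not match(a["feats"], b["feats"]):
        return 0  # a single failed match zeroes the whole contribution
    score = similarity(a["feats"], b["feats"])
    for ca, cb in zip(a["children"], b["children"]):
        score += dtk(ca, cb)
    return score

t1 = mk("became", "VBD", [mk("Obama", "NNP"), mk("president", "NN")])
t2 = mk("is", "VBZ", [mk("Ballmer", "NNP"), mk("CEO", "NN")])
```

Comparing `t1` with itself yields a positive score, while `t1` against `t2` is 0 because the roots' POS tags (VBD vs. VBZ) fail the match, even though the children align well: exactly the brittleness that motivates the extensions on the next slides.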

9 Motivation of Our Approaches
- The DTK only considers the root nodes (of the subtrees) and their matching children, possibly discarding similar structures at lower levels of the trees
- First approach: apply the DTK to every combination of nodes in the subtrees implied by the relation nodes

10 All Pairs-DTK (figure: the DTK applied to every possible pair of nodes)

11 Motivation of Second Approach
- Shortest path hypothesis (Bunescu & Mooney): the most valuable information lies on the path between the two entities
  - Information close to this path is also valuable
- Second approach: consider the information below and on the path in parallel
  - Apply the DTK to the path node sequences instead of only to both root nodes
  - Relax the same-length restriction of the SPK by using all possible subsequences of nodes on the path

12 Path-DTK
- Considers all subtrees on the path, computing a similarity for each pair
- Efficient computation by intelligent caching, exploiting dependency tree statistics
- Compares subsequences of the same length, up to length q
- Penalty for gaps
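A sketch of how the same-length restriction can be relaxed: compare all node subsequences of the two paths up to length q, and damp gapped subsequences by a penalty 0 < lam <= 1 raised to the spanned length. The similarity function and the exact penalty form are our simplifications of the paper's definition, and the sketch compares path labels rather than the subtrees hanging off the path.

```python
from itertools import combinations

# Path-DTK-style comparison of all equal-length subsequences (up to
# length q) of two paths, with a gap penalty lam per spanned position.

def node_sim(u, v):
    """Illustrative node similarity: exact label match."""
    return 1.0 if u == v else 0.0

def path_dtk(path1, path2, q=3, lam=0.5):
    total = 0.0
    for k in range(1, q + 1):
        for idx1 in combinations(range(len(path1)), k):
            for idx2 in combinations(range(len(path2)), k):
                sim = 1.0
                for i, j in zip(idx1, idx2):
                    sim *= node_sim(path1[i], path2[j])
                # penalize by the total span covered by both subsequences
                span = (idx1[-1] - idx1[0] + 1) + (idx2[-1] - idx2[0] + 1)
                total += (lam ** span) * sim
    return total
```

Unlike the SPK, paths of different lengths can still score: `path_dtk(["a"], ["x", "a"])` is positive because the length-1 subsequences match, even though the full paths do not align.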

13 Efficient Computation of the DTK
- Culotta et al. applied the ideas of the string (subsequence) kernel of Lodhi et al. for the computation of the child subsequences
- We adapted a later optimization (Cristianini and Shawe-Taylor) to this problem
- But: a closer look into the dataset reveals that dependency tree nodes have very few children

14 Efficient Computation of the DTK
- The out-degrees of the nodes have the following distribution (chart omitted in transcript)
- The upper bounds hold for large values of m and n
- We propose a different approach based on efficient caching

15 Efficient Computation of the DTK
- Child sequence length combinations (m > n)
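The caching idea can be illustrated on a stand-in problem: the same child label sequences recur across many node pairs, so memoizing the recursive comparison avoids recomputation. The recursion below counts common (non-empty) subsequence pairs of two label sequences and is our own toy example, not the paper's cached computation.

```python
from functools import lru_cache

# Memoization sketch: cache the recursive subsequence comparison so that
# repeated (i, j) subproblems are computed only once.

def common_subsequences(s, t):
    """Count pairs of equal non-empty subsequences of s and t."""
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def rec(i, j):
        if i == len(s) or j == len(t):
            return 0
        count = rec(i + 1, j)          # subsequences skipping s[i]
        for jj in range(j, len(t)):    # or matching s[i] with some t[jj]
            if s[i] == t[jj]:
                count += 1 + rec(i + 1, jj + 1)
        return count

    return rec(0, 0)
```

For short child sequences, as the out-degree statistics on slide 14 suggest, such a cache is small, which is why exploiting dependency tree statistics pays off.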

16 Empirical Evaluation
- Experiments conducted on the ACE-2003 dataset
  - Public benchmark dataset for relation extraction
  - Newspaper texts with annotated entities and relations, ~10k trees
  - 5 top-level relations: Role, Social, Near, At and Part
- Experimental setups:
  - 5-times repeated 5-fold cross-validation
  - Pre-specified training/test split
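The 5-times repeated 5-fold cross-validation protocol can be sketched as follows; the dataset size and seed are illustrative placeholders, not taken from the paper.

```python
import random

# Sketch of 5-times repeated 5-fold cross-validation: each repetition
# reshuffles the instances and splits them into 5 disjoint folds, so
# every instance is tested exactly once per repetition.

def repeated_kfold(n_items, k=5, repeats=5, seed=0):
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]   # k disjoint folds
        for test_fold in folds:
            held_out = set(test_fold)
            train = [i for i in indices if i not in held_out]
            splits.append((train, test_fold))
    return splits

splits = repeated_kfold(100)   # 5 repeats x 5 folds = 25 train/test pairs
```

Reporting the mean over all 25 runs reduces the variance introduced by any single random split, which matters when per-relation training sets are small.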

17 Empirical Results
- Pre-specified training/test split
- Cross-validation
(result tables omitted in transcript)

18 Results on Cross-Validation
- Path-DTK performs better for relations with more training data
(result chart omitted in transcript)

19 Conclusion / Future Work
- Performance depends on the type of relation and the training set size
- For specific relations, performance is sufficient for real applications
- Combination with phrase grammar parse tree kernels is reasonable (Reichartz et al. ACL '09)
- Open-source version of our SVM kernel toolkit for structured data
- More sophisticated node-similarity functions
- Combination of ML approaches (e.g. pattern recognition)
- Application to other languages (e.g. German)

20 The End
Thank You! Questions?