
Exploiting Constituent Dependencies for Tree Kernel-based Semantic Relation Extraction
Longhua Qian
School of Computer Science and Technology, Soochow University, Suzhou, China
19 Aug. 2008, COLING 2008, Manchester, UK

Good morning, everyone! It's my great pleasure to share my research experience with everyone here. My name is .. and I am from Soochow University in China. My topic is …

Outline
1. Introduction
2. Related Work
3. Dynamic Syntactic Parse Tree
4. Entity-related Semantic Tree
5. Experimental Results
6. Conclusion and Future Work

This is the outline of my presentation.

1. Introduction
Information extraction is an important research topic in NLP. It attempts to find relevant information in the large volumes of text documents available in digital archives and on the WWW.
Information extraction as defined by NIST ACE comprises three subtasks:
- Entity Detection and Tracking (EDT)
- Relation Detection and Characterization (RDC)
- Event Detection and Characterization (EDC)

First, let's have a look at the introduction. According to the NIST ACE definition, information extraction subsumes the three subtasks above. Our focus is on RDC, that is, relation extraction in general.

RDC Function
RDC detects and classifies semantic relationships (usually of predefined types) between pairs of entities. Relation extraction is very useful for a wide range of advanced NLP applications, such as question answering and text summarization.
E.g., the sentence "Microsoft Corp. is based in Redmond, WA" conveys the relation "GPE-AFF.Based" between "Microsoft Corp." (ORG) and "Redmond" (GPE).
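To make the task concrete, here is a minimal sketch of how such a relation instance could be represented in Python; the class and field names are illustrative, not part of any ACE tooling.

```python
from dataclasses import dataclass

@dataclass
class EntityMention:
    text: str
    etype: str      # major entity type, e.g. "ORG", "GPE"
    subtype: str    # entity subtype, e.g. "Population-Center"
    mtype: str      # mention type, e.g. "NAME", "PRONOUN"

@dataclass
class RelationInstance:
    sentence: str
    e1: EntityMention
    e2: EntityMention
    label: str      # relation type, or "NONE" for negative instances

example = RelationInstance(
    sentence="Microsoft Corp. is based in Redmond, WA",
    e1=EntityMention("Microsoft Corp.", "ORG", "Commercial", "NAME"),
    e2=EntityMention("Redmond", "GPE", "Population-Center", "NAME"),
    label="GPE-AFF.Based",
)
```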

2. Related Work
Typically, there exist two approaches to relation extraction:
- Feature-based methods have dominated the research in relation extraction over the past years. However, relevant research shows that it is difficult to extract new effective features and further improve the performance.
- Kernel-based methods compute the similarity of two objects (e.g. parse trees) directly. The key problem is how to represent and capture structured information in complex structures, such as the syntactic information in the parse tree for relation extraction.

Kernel-based related work
Kernel-based methods for relation extraction include the following work:
- Zelenko et al. (2003), Culotta and Sorensen (2004), and Bunescu and Mooney (2005) described several kernels between shallow parse trees or dependency trees to extract semantic relations.
- Zhang et al. (2006) and Zhou et al. (2007) proposed composite kernels consisting of a linear kernel and a convolution parse tree kernel, with the latter effectively capturing the structured syntactic information inherent in parse trees.

Structured syntactic information
A tree span for a relation instance is the part of a parse tree used to represent the structured syntactic information, including the two involved entities. Two tree spans are currently in use:
- SPT (Shortest Path-enclosed Tree): the sub-tree enclosed by the shortest path linking the two entities in the parse tree (Zhang et al., 2006).
- CS-SPT (Context-Sensitive Shortest Path-enclosed Tree): dynamically determined by further extending SPT with the necessary predicate-linked path information outside it (Zhou et al., 2007).
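Both spans are carved out of the subtree enclosing the two entities. As a rough sketch (assuming nltk is available and entity mentions are given as inclusive leaf-index spans), the lowest-common-ancestor subtree — the Minimum Complete Tree of the next slides — can be located as below; a full SPT extractor would additionally prune the constituents lying outside the shortest path.

```python
from nltk import Tree

def enclosing_subtree(tree, e1_span, e2_span):
    """Subtree rooted at the lowest common ancestor of two entity
    mentions, each given as an inclusive (first_leaf, last_leaf) span."""
    start = min(e1_span[0], e2_span[0])
    end = max(e1_span[1], e2_span[1])
    pos = tree.treeposition_spanning_leaves(start, end + 1)
    node = tree[pos]
    return node if isinstance(node, Tree) else tree  # guard: single leaf

sent = Tree.fromstring(
    "(S (NP (NNP Microsoft) (NNP Corp.)) (VP (VBZ is)"
    " (VP (VBN based) (PP (IN in) (NP (NNP Redmond))))))")
# "Microsoft Corp." spans leaves 0-1, "Redmond" is leaf 5
print(enclosing_subtree(sent, (0, 1), (5, 5)))  # the whole S here
```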

Current problems
However, there still exist several problems relating to the tree span for relation extraction:
- Noisy information: both SPT and CS-SPT may still contain noisy information. In other words, more noise could be pruned away from these tree spans.
- Useful information: CS-SPT captures only the part of the context-sensitive information relating to the predicate-linked path. That is to say, more information outside SPT/CS-SPT could be recovered so as to discern the entities' relationships.

Our solution
- Dynamic Syntactic Parse Tree (DSPT): starting from the MCT (Minimum Complete Tree), we exploit constituent dependencies to dynamically prune noisy information out of the syntactic parse tree and include the necessary contextual information.
- Unified Parse and Semantic Tree (UPST): instead of constructing composite kernels, various kinds of entity-related semantic information are unified with the DSPT into a single parse and semantic tree.

Our solution to these problems is to construct the DSPT and the UPST.

3. Dynamic Syntactic Parse Tree
Motivation of the DSPT: dependency plays a key role in relation extraction, e.g. the dependency tree (Culotta and Sorensen, 2004) or the shortest dependency path (Bunescu and Mooney, 2005).
Constituent dependencies: in a parse tree, each CFG rule has the form

    P → Ln … L1 H R1 … Rm

where the parent node P depends on the head child H; this is what we call constituent dependency.
Our hypothesis stipulates that the contribution of the parse tree to establishing a relationship is almost exclusively concentrated in the path connecting the two entities, as well as in the head children of the constituent nodes along this path.

Now, let's turn to the third section, the DSPT.
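Finding the head child H of each rule is typically done with head-percolation rules. The fragment below is a toy, Collins-style head finder; the HEAD_RULES table is a tiny illustrative excerpt, not the rule set actually used in the paper.

```python
HEAD_RULES = {
    # parent: (search direction, candidate head labels in priority order)
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "VP": ("left",  ["VBD", "VBZ", "VBP", "VBN", "VB", "VP"]),
    "PP": ("left",  ["IN", "TO"]),
    "S":  ("left",  ["VP", "S"]),
}

def head_child_index(parent_label, child_labels):
    """Index of the head child H in the rule P -> Ln..L1 H R1..Rm."""
    direction, candidates = HEAD_RULES.get(parent_label, ("left", []))
    order = (range(len(child_labels)) if direction == "left"
             else range(len(child_labels) - 1, -1, -1))
    for cand in candidates:
        for i in order:
            if child_labels[i] == cand:
                return i
    # fallback: leftmost or rightmost child
    return 0 if direction == "left" else len(child_labels) - 1

print(head_child_index("VP", ["VBN", "PP"]))  # -> 0: "based" heads the VP
```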

Generation of the DSPT
Starting from the Minimum Complete Tree, along the path connecting the two entities, the head child of every node is found according to the various constituent dependencies. The path nodes and their head children are then kept, while all other nodes are removed from the parse tree. Eventually we arrive at a tree span called the Dynamic Syntactic Parse Tree (DSPT).
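A rough sketch of this pruning step over an nltk.Tree follows. It keeps every child that dominates an entity mention plus the head child of each such path node; head_of stands in for a head-rule function like the fragment shown earlier, and keeping the whole head subtree (rather than only its head spine) is a simplification of the paper's procedure.

```python
from nltk import Tree

def prune_to_path(node, offset, entity_idxs, head_of):
    """Keep children on the entity path, plus the head child of each path
    node. `offset` is the absolute index of this node's first leaf."""
    if not isinstance(node, Tree):
        return node
    labels = [c.label() if isinstance(c, Tree) else c for c in node]
    head = head_of(node.label(), labels)
    kept, pos = [], offset
    for i, child in enumerate(node):
        width = len(child.leaves()) if isinstance(child, Tree) else 1
        on_path = any(pos <= e < pos + width for e in entity_idxs)
        if on_path:
            kept.append(prune_to_path(child, pos, entity_idxs, head_of))
        elif i == head:
            kept.append(child)  # simplification: keep the head subtree whole
        pos += width
    return Tree(node.label(), kept)

mct = Tree.fromstring("(S (NP (PRP they)) (VP (VBP 're) (ADVP (RB here))))")
# entities: "they" (leaf 0) and "here" (leaf 2); trivial head_of for demo
print(prune_to_path(mct, 0, {0, 2}, lambda parent, kids: 0))
```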

Constituent dependencies (1)
Constituent dependencies can be classified into five categories according to the constituent types of the CFG rules:
- Modification within base-NPs: base-NPs do not directly dominate an NP themselves. Hence, all the constituents before the headword may be removed from the parse tree, while the headword and the constituents right after it remain unchanged (see the sketch below).
- Modification to NPs: contrary to the first category, these NPs are recursive, meaning that they contain another NP as a child. They usually appear as:
    NP → NP SBAR [relative clause]
    NP → NP VP [reduced relative]
    NP → NP PP [PP attachment]
  In this case, the right-hand side (e.g. "NP VP") can be reduced to the left-hand side, which is exactly a single NP.
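An illustrative sketch of the base-NP rule, assuming the head index is already known (e.g. from a head finder like the earlier fragment) and that any child containing an entity mention is marked as protected:

```python
from nltk import Tree

def prune_base_np(np, head_index, protected=()):
    """Drop pre-head modifiers in a base NP; keep the headword, everything
    after it, and any protected (entity-bearing) children."""
    kept = [c for i, c in enumerate(np)
            if i >= head_index or i in protected]
    return Tree(np.label(), kept)

np = Tree.fromstring(
    "(NP (DT the) (JJ former) (NN chairman) (PP (IN of) (NP (NNP IBM))))")
print(prune_base_np(np, head_index=2))
# -> (NP (NN chairman) (PP (IN of) (NP (NNP IBM))))
```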

Constituent dependencies (2)
- Arguments/adjuncts to verbs: this category includes the CFG rules whose left-hand side contains S, SBAR or VP. Both arguments and adjuncts depend on the verb and can be removed if they are not included in the path connecting the two entities.
- Coordination conjunctions: in coordination constructions, several peer conjuncts may be reduced into a single constituent, since all the conjuncts play an equal role in relation extraction.
- Modification to other constituents: all remaining CFG rules fall into this category, such as modification to PP, ADVP and PRN. These cases occur much less frequently than the others.

Some examples of DSPTs
These are some examples of DSPTs. Typically, (a) shows how the constituents before the second entity can be removed; (c) shows how the modification to an NP ("nominated for …") can be removed; and (e) shows how all the conjuncts other than the one containing the entity may be reduced into a single NP.

4. Entity-related Semantic Tree
For the example sentence "they 're here", excerpted from the ACE RDC 2004 corpus, there exists a relationship "Physical.Located" between the entities "they" [PER] and "here" [GPE.Population-Center]. The features are encoded as "TP", "ST", "MT" and "PVB", which denote the type, subtype and mention-type of the two entities, and the base form of the predicate verb if one exists (the verb nearest to the second entity along the path connecting the two entities), respectively.

The following section is about the EST. The illustration shows three different EST setups incorporating entity types/subtypes, mention types and the predicate verb.

Three EST setups
- (a) Bag of Features (BOF): all feature nodes uniformly hang under the root node, so the tree kernel simply counts the number of common features between two relation instances.
- (b) Feature-Paired Tree (FPT): the features of the two entities are grouped into different types according to their feature names, e.g. "TP1" and "TP2" are grouped under "TP". This setup aims to capture the additional similarity of single features combined across the first and second entities.
- (c) Entity-Paired Tree (EPT): all the features relating to one entity are grouped under the nodes "E1" or "E2", so this tree kernel can further explore the equivalence, between two relation instances, of combined entity features relating to a single entity.

A sketch of the three shapes follows.
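The three shapes can be built mechanically from a flat feature dictionary. The node names below follow the slide ("TP", "ST", "MT", "PVB"); the concrete feature values and the exact tree layout are my reading of the figure, not taken from the paper.

```python
from nltk import Tree

feats = {"TP1": "PER", "ST1": "Ind", "MT1": "PRO",
         "TP2": "GPE", "ST2": "Pop", "MT2": "PRO", "PVB": "be"}

def bof(feats):
    # (a) every feature node hangs directly under the root
    return Tree("EST", [Tree(k, [v]) for k, v in feats.items()])

def fpt(feats):
    # (b) pair features with the same name: TP1/TP2 group under TP, etc.
    groups = {}
    for k, v in feats.items():
        groups.setdefault(k.rstrip("12"), []).append(Tree(k, [v]))
    return Tree("EST", [Tree(name, kids) for name, kids in groups.items()])

def ept(feats):
    # (c) group all features of one entity under E1 / E2
    e1 = [Tree(k, [v]) for k, v in feats.items() if k.endswith("1")]
    e2 = [Tree(k, [v]) for k, v in feats.items() if k.endswith("2")]
    shared = [Tree(k, [v]) for k, v in feats.items() if k[-1] not in "12"]
    return Tree("EST", [Tree("E1", e1), Tree("E2", e2)] + shared)

print(fpt(feats))
```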

Construction of the UPST
Motivation: to investigate the contribution of the EST to relation extraction, we incorporate the EST into the DSPT to produce a Unified Parse and Semantic Tree (UPST).
How: detailed evaluation (Qian et al., 2007) indicates that the kernel achieves the best performance when the feature nodes are attached under the top node. Therefore, we attach each of the three kinds of entity-related semantic trees (BOF, FPT and EPT) under the top node of the DSPT, right after its original children.

Next, we look at the construction of the UPST.
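In tree terms the assembly step is a one-liner; a minimal sketch, reusing the nltk.Tree representation from the earlier fragments:

```python
from nltk import Tree

def unify(dspt, est):
    """Attach the EST's feature subtrees under the DSPT's top node,
    right after its original children."""
    return Tree(dspt.label(), list(dspt) + list(est))

dspt = Tree.fromstring("(S (NP (PRP they)) (VP (VBP 're) (ADVP (RB here))))")
est = Tree.fromstring("(EST (TP (TP1 PER) (TP2 GPE)))")
print(unify(dspt, est))
```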

5. Experimental results
Corpus statistics: the ACE RDC 2004 data contains 451 documents and 5702 relation instances. It defines 7 entity major types, 7 major relation types and 23 relation subtypes. Evaluation is done on 347 (nwire/bnews) documents and 4307 relation instances using 5-fold cross-validation.
Corpus processing: the corpus is parsed using Charniak's parser (Charniak, 2001). Relation instances are generated by iterating over all pairs of entity mentions occurring in the same sentence (a sketch follows).
Entity major types: PER, ORG, GPE, LOC, FAC, VEH, WEA. Relation major types: PHY, PER-SOC, EMP-ORG, ART, OTHER-AFF, GPE-AFF, DISC.
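A schematic version of the instance-generation step, where mention pairs not annotated in the corpus become negative ("NONE") instances; real ACE mentions carry more structure than the bare strings used here.

```python
from itertools import combinations

def candidate_instances(sentences, gold):
    """sentences: iterable of (sentence_text, [mention, ...]) pairs;
    gold: dict mapping frozenset({m1, m2}) to a relation label."""
    for text, mentions in sentences:
        for m1, m2 in combinations(mentions, 2):
            label = gold.get(frozenset((m1, m2)), "NONE")
            yield text, m1, m2, label

sents = [("Microsoft Corp. is based in Redmond",
          ["Microsoft Corp.", "Redmond"])]
gold = {frozenset(("Microsoft Corp.", "Redmond")): "GPE-AFF.Based"}
print(list(candidate_instances(sents, gold)))
```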

Classifier
Tools:
- SVMLight (Joachims, 1998)
- Tree Kernel Toolkit (Moschitti, 2004)
The training parameters C (SVM) and λ (tree kernel) are set to 2.4 and 0.4 respectively.
One-vs-others strategy: builds K basic binary classifiers so as to separate one class from all the others (a sketch follows).
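A schematic one-vs-others wrapper: K binary classifiers, one per relation type, each trained to separate its class from all the others; at test time the class with the highest decision value wins. The `train_binary`/`decision` callables stand in for the SVMLight + tree-kernel toolkit invocations (with C=2.4 and λ=0.4 as above); they are placeholders, not the toolkits' actual APIs.

```python
def train_one_vs_others(instances, labels, train_binary):
    """Build K binary classifiers, one per class, each separating its
    class (+1) from all the others (-1)."""
    models = {}
    for cls in sorted(set(labels)):
        binary = [1 if y == cls else -1 for y in labels]
        models[cls] = train_binary(instances, binary)
    return models

def classify(x, models, decision):
    """Pick the class whose binary classifier is most confident."""
    return max(models, key=lambda cls: decision(models[cls], x))
```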

Contributions of various dependencies
Two modes:
- [M1] Respective: every constituent dependency is individually applied to the MCT.
- [M2] Accumulative: every constituent dependency is incrementally applied to the previously derived tree span, which begins with the MCT and eventually gives rise to the Dynamic Syntactic Parse Tree (DSPT).

Dependency types                P            R            F
MCT (baseline)                  75.1         53.8         62.7
Modification within base-NPs    76.5 (76.5)  59.8 (59.8)  67.1 (67.1)
Modification to NPs             77.0 (76.2)  63.2 (56.9)  69.4 (65.1)
Arguments/adjuncts to verbs     77.1 (76.1)  63.9 (57.5)  69.9 (65.5)
Coordination conjunctions       77.3 (77.3)  65.2 (55.1)  70.8 (63.8)
Other modifications             77.4 (75.0)  65.4 (53.7)  70.9 (62.6)

(Accumulative scores first; respective scores in parentheses.) This table indicates the contribution of the various dependencies on the major relation types in the ACE RDC 2004 corpus. A sketch of the two modes follows.
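The two evaluation modes, stated as code: "respective" applies each pruning rule to a fresh MCT, while "accumulative" chains them, ending in the DSPT. Each rule is a function tree -> tree, and `evaluate` stands in for the 5-fold cross-validation run.

```python
def respective(mct, rules, evaluate):
    """M1: apply each pruning rule to the MCT in isolation."""
    return {rule.__name__: evaluate(rule(mct)) for rule in rules}

def accumulative(mct, rules, evaluate):
    """M2: chain the rules; the final tree is the DSPT."""
    scores, tree = {}, mct
    for rule in rules:
        tree = rule(tree)
        scores[rule.__name__] = evaluate(tree)
    return scores, tree  # `tree` is now the DSPT
```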

Contributions of various dependencies (cont'd)
The table shows that the final DSPT achieves the best performance of 77.4%/65.4%/70.9 in precision/recall/F-measure respectively after applying all the dependencies, an increase in F-measure of 8.2 units over the MCT baseline. This indicates that reshaping the tree by exploiting constituent dependencies can significantly improve extraction accuracy, largely due to the increase in recall. Modification within base-NPs contributes most to the performance improvement, yielding an F-measure increase of 4.4 units. This reflects the local character of semantic relations, which can be effectively captured by the NPs around the two involved entities in the DSPT.
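The P/R/F figures are related by the usual F1 formula F = 2PR/(P+R); a quick arithmetic check against the final DSPT row:

```python
def f1(p, r):
    return 2 * p * r / (p + r)

print(round(f1(77.4, 65.4), 1))  # -> 70.9, matching the table
```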

Comparison of different UPST setups

Tree setups   P     R     F
DSPT          77.4  65.4  70.9
UPST (BOF)    80.4  69.7  74.7
UPST (FPT)    80.1  70.7  75.1
UPST (EPT)    79.9  70.2  74.8

Compared with the DSPT, the Unified Parse and Semantic Trees (UPSTs) significantly improve the F-measure by ~4 units on average, due to increases in both precision and recall. This means they effectively capture both the structured syntactic information and the entity-related semantic features. Among the three setups, UPST (FPT) performs slightly better than the other two, which suggests that the additional bi-gram entity features captured by the FPT are more useful than the tri-gram entity features captured by the EPT.

Improvements of different tree setups over SPT

Tree setups           P     R     F
CS-SPT over SPT       1.5   1.1   1.3
DSPT over SPT         0.1   5.6   3.8
UPST (FPT) over SPT   2.8   10.9  8.0

This table compares the performance improvements of the different tree setups over the original SPT. It shows that the Dynamic Syntactic Parse Tree (DSPT) outperforms both the SPT and CS-SPT setups, and that the Unified Parse and Semantic Tree with the Feature-Paired Tree performs best among all tree setups. This implies that entity-related semantic information is very useful and contributes much when incorporated into the parse tree for relation extraction.

Comparison with best-reported systems

Systems (composite kernels)      P     R     F
Ours: composite kernel           83.0  72.0  77.1
Zhou et al. (2007)               82.2  70.2  75.8
Zhang et al. (2006)              76.1  68.4  72.1
Zhao and Grishman (2005)         69.2  70.5  70.4

Systems (single kernels)         P     R     F
Ours: CTK with UPST              80.1  70.7  75.1
Zhou et al.: CS-CTK with CS-SPT  81.1  66.7  73.2
Zhang et al.: CTK with SPT       74.1  62.4  67.7

Finally, we compare our system with the best-reported relation extraction systems on the ACE RDC 2004 corpus. Our composite kernel achieves the best performance so far. Our UPST performs best among tree setups using a single kernel, and even better than two of the previous composite kernels.

6. Conclusion
The last section is the conclusion. From the experimental results, we can draw the following conclusions:
- The Dynamic Syntactic Parse Tree (DSPT), generated by exploiting constituent dependencies, can significantly improve performance over the currently used tree spans for relation extraction.
- In addition to individual entity features, combined entity features (especially bi-grams) contribute much when they are integrated with the DSPT into a Unified Parse and Semantic Tree.

Future Work
We will focus on improving the performance on complex structured parse trees, where the path connecting the two entities involved in a relationship is too long for current kernel methods to be effective. Our preliminary experiments applying some discourse theory exhibit certain positive results.

References
Bunescu R. C. and Mooney R. J. 2005. A Shortest Path Dependency Kernel for Relation Extraction. EMNLP-2005.
Charniak E. 2001. Immediate-head Parsing for Language Models. ACL-2001.
Collins M. and Duffy N. 2001. Convolution Kernels for Natural Language. NIPS-2001.
Collins M. and Duffy N. 2002. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. ACL-2002.
Culotta A. and Sorensen J. 2004. Dependency Tree Kernels for Relation Extraction. ACL-2004.
Joachims T. 1998. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML-1998.
Moschitti A. 2004. A Study on Convolution Kernels for Shallow Semantic Parsing. ACL-2004.
Qian Longhua, Guodong Zhou, Qiaoming Zhu and Peide Qian. 2007. Relation Extraction using Convolution Tree Kernel Expanded with Entity Features. PACLIC-21.
Zelenko D., Aone C. and Richardella A. 2003. Kernel Methods for Relation Extraction. Journal of Machine Learning Research, 3:1083-1106.
Zhang M., Zhang J., Su J. and Zhou G.D. 2006. A Composite Kernel to Extract Relations between Entities with both Flat and Structured Features. COLING-ACL-2006.
Zhao S.B. and Grishman R. 2005. Extracting Relations with Integrated Information Using Kernel Methods. ACL-2005.
Zhou G.D., Su J., Zhang J. and Zhang M. 2005. Exploring Various Knowledge in Relation Extraction. ACL-2005.
Zhou Guodong, Min Zhang, Donghong Ji and Qiaoming Zhu. 2007. Tree Kernel-based Relation Extraction with Context-Sensitive Structured Parse Tree Information. EMNLP/CoNLL-2007.

End Thank You!