A Structured Model for Joint Learning of Argument Roles and Predicate Senses. Yotaro Watanabe, Masayuki Asahara, Yuji Matsumoto. ACL 2010, Uppsala, Sweden, July 2010.


A Structured Model for Joint Learning of Argument Roles and Predicate Senses. Yotaro Watanabe, Masayuki Asahara, Yuji Matsumoto. ACL 2010, Uppsala, Sweden, July 12, 2010. Tohoku University / Nara Institute of Science and Technology.

Page  2 Predicate-Argument Structure Analysis (Semantic Role Labeling)  Task of analyzing predicates and its arguments –A predicate represents a state or an event, and its arguments have relations to the predicate –Each of arguments has a particular semantic role (Agent, Theme, etc)  In recent years, predicate sense disambiguation has been included in predicate-argument structure analysis [Surdeanu+ 08, Hajič+ 09] –‘sell.01’ means that ‘sold’ is an instance of the first sense of ‘sell’  Important for many NLP applications –MT, QA, RTE, etc. Theme Location Temporal luxuryautomakerlastTheyearsold1,214carsintheU.S. maker.01 sell.01 Product Agent

Page  3 drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion Two Types of Dependencies of Elements in Predicate- Argument Structures (1)Inter-dependencies between a predicate and its arguments –A1: car => we can infer that the correct sense is drive.01 (2)Non-local dependencies among arguments Two or more arguments do not have the same role Basically, obligatory roles of the predicate should appear in sentences drive.01 A0 A1 SBJ NMOD OBJ Pauldrovehiscar In order to realize robust predicate-argument structure analysis, it is necessary to deal with these types of dependencies

Page  4 Previous Work (1)Non-local dependencies among arguments: Re-ranking [Johansson and Nugues 2008, etc.] Generate N-best assignments of argument roles, then obtain global features for each assignment, finally select the argmax using the re-ranker Can not explicitly capture inter-dependencies between a predicate and its arguments (2)Inter-dependencies between a predicate and its arguments: Markov Logic Networks [Meza-Ruiz and Riedel 2009, etc.] Jointly learn and classify pred. senses and arg. roles simultaneously MLN can not deal with particular types of global features Currently, no existing (discriminative) approach sufficiently handles both types of dependencies

Page  5 Previous Work (1)Non-local dependencies among arguments: Re-ranking [Johansson and Nugues 2008, etc.] Generate N-best assignments of argument roles, then obtain global features for each assignment, finally select the argmax using the re-ranker Can not explicitly capture inter-dependencies between a predicate and its arguments (2)Inter-dependencies between a predicate and its arguments: Markov Logic Networks [Meza-Ruiz and Riedel 2009, etc.] Jointly learn and classify pred. senses and arg. roles simultaneously MLN can not deal with particular types of global features Currently, no existing (discriminative) approach sufficiently handles both types of dependencies We propose a structured model that can capture both types of dependencies simultaneously

Page  6 The proposed model SBJ NMOD OBJ Pauldrovehiscar drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion

Page  7 The proposed model A0 drive.01 drive.02 … A1 A0 … Paulcar drove NONE SBJ NMOD OBJ Pauldrovehiscar drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion Expand the possible labels of predicate senses and argument roles

Page  8 The proposed model A0 drive.01 drive.02 … A1 A0 … Paulcar drove NONE SBJ NMOD OBJ Pauldrovehiscar drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion Expand the possible labels of predicate senses and argument roles We use four types of factors which score labels of elements in predicate- argument structures

Page  9 The proposed model A0 drive.01 drive.02 … A1 A0 … Paulcar drove NONE SBJ NMOD OBJ Pauldrovehiscar drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion Expand the possible labels of predicate senses and argument roles These factors are defined by (linear model) These factors are defined by (linear model) We use four types of factors which score labels of elements in predicate- argument structures

Page  10 A0 drive.01 drive.02 … A1 A0 … Paulcar drove NONE The proposed model SBJ NMOD OBJ Pauldrovehiscar FPFP drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion use a factor which scores sense labels of the predicate

Page  11 A0 drive.01 drive.02 … A1 A0 … Paulcar drove NONE The proposed model SBJ NMOD OBJ Pauldrovehiscar FAFA FPFP drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion use a factor which scores role labels of each argument

Page  12 A0 … A1 A0 … Paulcar drove NONE The proposed model SBJ NMOD OBJ Pauldrovehiscar F PA FAFA FPFP drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01 drive.02 add a factor which scores label pairs of a predicate sense and a semantic role of an argument

Page  13 The proposed model drive.02 … A1 A0 … Paulcar drove NONE A0,drive01,A1 … A0,drive01,A1 … A0 drive.01 A1 SBJ NMOD OBJ Pauldrovehiscar FPFP F PA FAFA FGFG drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion add a factor which captures plausibility of the whole predicate-argument structure (use global features) add a factor which captures plausibility of the whole predicate-argument structure (use global features)

Page  14 The proposed model drive.02 … A1 A0 … Paulcar drove NONE A0,drive01,A1 … A0,drive01,A1 … A0 drive.01 A1 SBJ NMOD OBJ Pauldrovehiscar FPFP F PA FAFA FGFG drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion add a factor which captures plausibility of the whole predicate-argument structure (use global features) add a factor which captures plausibility of the whole predicate-argument structure (use global features) The predicate ‘drive’ has all obligatory roles A0 and A1 => F G assigns the higher score to the weight corresponds to this feature The predicate ‘drive’ has all obligatory roles A0 and A1 => F G assigns the higher score to the weight corresponds to this feature

Page  15 The proposed model drive.02 … A1 A0 … Paulcar drove NONE A0 drive.01 A1 NONE SBJ NMOD OBJ Pauldrovehiscar A0,drive01,A1 … A0,drive01,A1 … FPFP F PA FAFA FGFG drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion The proposed model combines these types of factors

Page  16 The proposed model drive.02 … A1 A0 … Paulcar drove NONE A0 drive.01 A1 NONE drive.01 A0 A SBJ NMOD OBJ Pauldrovehiscar A0,drive01,A1 … A0,drive01,A1 … FPFP F PA FAFA FGFG drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion drive.01: drive a vehicle A0: driver A1: vehicle drive.02: cause to move A0: driver A1: things in motion The proposed model combines these types of factors The highest scoring assignment is returned by the proposed model

Page  17 Dealing with global (non-local) features  Introduce the fundamental idea of [Kazama and Torisawa 2007] –Features are divided into local features and global features –Inference: N-best based approach (1) Generate N-best assignments using only local features (2) Obtain global features in the N-best assignments (3) Select the argmax –Learning: train parameters with two margin constraints All: train parameters so as to ensure a sufficient margin using all features (both local features and global features) Local only: when the constraint All is satisfied, train parameters so as to ensure a sufficient margin using only local features K&T proposed a Margin-Perceptron Learning Algorithm

Page  18 Inference and Learning Algorithm of the Proposed Model Inference: generate N-best assignments for each predicate sense Learning: the online Passive-Aggressive Algorithm [Crammer 2006] The parameters are trained by solving the optimization problem used in PA with the two margin constraints: All (local + global) and Local only Inference: generate N-best assignments for each predicate sense Learning: the online Passive-Aggressive Algorithm [Crammer 2006] The parameters are trained by solving the optimization problem used in PA with the two margin constraints: All (local + global) and Local only (1) All (local + global) margin (2) Local only margin positive other positive other

Page  19 Results on the CoNLL-2009 ST Dataset (average) feature selection Overall (Sem. F1) WSD (Acc.) SRL (Lab. F1) F P +F A no F P +F A +F PA no F P +F A +F G no ALLno Björkelundyes80.80 Zhaoyes80.47 Meza-Ruizno77.46 sense FPFP F PA FGFG FAFA … role 1 role 2 role N  The best performance is obtained by using the all factors  Our model achieved the competitive results with the top system in the CoNLL-2009 Shared Task without any feature selection procedure

Page  20 Results on the CoNLL-2009 ST Dataset (average) feature selection Overall (Sem. F1) WSD (Acc.) SRL (Lab. F1) F P +F A no F P +F A +F PA no F P +F A +F G no ALLno Björkelundyes80.80 Zhaoyes80.47 Meza-Ruizno77.46 sense FPFP F PA FGFG FAFA … role 1 role 2 role N  By adding two types of factors F PA and F G, we obtained performance improvements in both tasks (predicate sense disambiguation and argument role labeling) => Succeeded in joint learning

Page  21 Summary  We proposed a structured model that can capture two types of dependencies (1)Non-local dependencies among arguments (2)Inter-dependencies between a predicate and its arguments  The proposed model achieved the competitive results with the state-of- the-art SRL systems without any feature selection procedure  By adding two types of factors, we obtained performance improvements on both predicate sense disambiguation and argument role labeling => succeeded in joint learning  Future Work –exploiting unlabeled data (unsupervised or semi-supervised predicate-argument structure analysis)