Semantic Role Labelling Using Chunk Sequences
Ulrike Baldewein, Katrin Erk, Sebastian Padó (Saarland University, Saarbrücken)
Detlef Prescher (University of Amsterdam, Amsterdam)

1. Representation for Classification
Usual choice: classify constituents
- Intuition: one argument → one constituent
- Not available in this task (only shallow syntax is given)
Classify words? Too fine-grained.
Classify chunks? Data analysis: not always the right level
- 34% of arguments consist of more than one chunk
- 13% of arguments do not respect chunk boundaries

Chunk sequences as classification instances
Sequences of chunks and chunk parts
- Adaptive level of structure
- "Potential constituents"
Example: [NP Britain's] [NP manufacturing industry] [VP is transforming] [NP itself] [VP to boost] [NP exports]
- ARG0 = NP_NP ("Britain's manufacturing industry")
- V = VP[VBG] ("transforming")
- ARG1 = NP ("itself")
- ARG2 = VP_NP ("to boost exports")
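For illustration, a minimal sketch of how candidate chunk sequences and their type strings (NP_NP, VP_NP, ...) could be enumerated from a chunked sentence. This is not the authors' code: the chunk representation and the max_len cut-off are assumptions, and sequences built from chunk parts (such as VP[VBG]) are not handled here.

```python
from typing import Iterator, List, Tuple

# A chunk is a (label, tokens) pair, e.g. ("NP", ["Britain", "'s"]).
Chunk = Tuple[str, List[str]]

def sequence_type(chunks: List[Chunk]) -> str:
    """Type string of a chunk sequence, e.g. 'NP_NP' or 'VP_NP'."""
    return "_".join(label for label, _ in chunks)

def candidate_sequences(chunks: List[Chunk], max_len: int = 4
                        ) -> Iterator[Tuple[int, int, str, List[Chunk]]]:
    """Enumerate contiguous chunk sequences as candidate argument realisations."""
    for start in range(len(chunks)):
        for end in range(start + 1, min(start + max_len, len(chunks)) + 1):
            seq = chunks[start:end]
            yield start, end, sequence_type(seq), seq

# The example sentence from the slide, already chunked:
sentence = [
    ("NP", ["Britain", "'s"]),
    ("NP", ["manufacturing", "industry"]),
    ("VP", ["is", "transforming"]),
    ("NP", ["itself"]),
    ("VP", ["to", "boost"]),
    ("NP", ["exports"]),
]

for start, end, seq_type, _ in candidate_sequences(sentence):
    print(start, end, seq_type)   # e.g. 0 2 NP_NP
```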

Frequency-based chunk sequence filtering
Filter 1: use only sequence types that realise arguments in the training set
- 1089 types, Zipf-distributed: NP (23,000), S (5,000), ..., NP_PP_NP_PP_NP_VP_PP_NP_NP (1)
Filter 2: use only frequent sequence types (f(s) > 10)
Examine the material between sequence and target: "divider sequences"
- Also Zipf-distributed: empty divider (14,000), NP (10,000), ...
- Similar to the "Path" feature
Filter 3: use only sequences with a frequent divider (f(d) > 10)
Filter 4: use only sequences co-occurring frequently with some divider (f(s,d) > 5)
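A hedged sketch of how these four filters might be applied, using the thresholds named on the slide (frequency above 10 for sequence types and dividers, above 5 for sequence-divider pairs). Applying the filters at the level of individual candidate tokens, and the function and variable names, are assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

def build_counts(arg_tokens: Iterable[Tuple[str, str]]):
    """arg_tokens: (sequence_type, divider_type) pairs of argument
    realisations observed in the training set."""
    arg_tokens = list(arg_tokens)
    seq_count = Counter(s for s, _ in arg_tokens)
    div_count = Counter(d for _, d in arg_tokens)
    pair_count = Counter(arg_tokens)
    return seq_count, div_count, pair_count

def keep(seq_type, divider, seq_count, div_count, pair_count,
         min_seq=10, min_div=10, min_pair=5):
    """Return True if a candidate (sequence, divider) token survives all four filters."""
    if seq_type not in seq_count:          # Filter 1: type realises arguments in training
        return False
    if seq_count[seq_type] <= min_seq:     # Filter 2: f(s) > 10
        return False
    if div_count[divider] <= min_div:      # Filter 3: f(d) > 10
        return False
    # Filter 4: the sequence type co-occurs frequently with some divider, f(s, d) > 5
    return any(c > min_pair for (s, _), c in pair_count.items() if s == seq_type)
```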

Results of filtering
Leaves 87 sequence types (down from 1089)
- 43,777 sequence tokens in the development set (about 1 sequence per word)
- 8,698 of these are proper arguments (about 20%)
Bad news: the representation loses 16% of the proper arguments

2. Features
- "Shallow features": simple properties
- "Higher-level features": syntactic properties (mostly heuristic)
- "Divider features": shallow and higher-level properties of the dividers
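The slide only names the three feature groups; below is a purely illustrative sketch of what shallow and divider features for one candidate sequence could look like. The concrete feature names are inventions for the example, not the paper's actual feature set.

```python
def shallow_features(seq_chunks, target_verb):
    """Illustrative surface ('shallow') properties of one chunk sequence.
    seq_chunks: list of (chunk_label, tokens) pairs."""
    tokens = [tok for _, chunk_toks in seq_chunks for tok in chunk_toks]
    return {
        "seq_type": "_".join(label for label, _ in seq_chunks),
        "length_in_chunks": len(seq_chunks),
        "first_word": tokens[0].lower(),
        "last_word": tokens[-1].lower(),
        "target": target_verb.lower(),
    }

def divider_features(divider_chunks):
    """Illustrative properties of the divider material between sequence and target."""
    return {
        "divider_type": "_".join(label for label, _ in divider_chunks) or "EMPTY",
        "divider_length": len(divider_chunks),
    }
```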

EM-based clustering
Measure the fit between objects y1 (predicate:argument slot) and y2 (sequence)
- Example: how well does NP fit give:A1?
y1 and y2 are conditionally independent, each generated by a cluster c:
- p(y1, y2) = Σ_c p(c) p(y1|c) p(y2|c)
- EM derives the clusters from the training data
- Intention: generalisation within clusters
Features: e.g. "most likely argument slot for this sequence for this predicate"
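A minimal sketch of EM for this latent-class model, assuming the data comes as (predicate:argument, sequence-type) pairs; the cluster count, iteration count and random initialisation are assumptions.

```python
import numpy as np

def em_cluster(pairs, n_clusters=16, n_iter=50, seed=0, eps=1e-10):
    """EM for the latent-class model p(y1, y2) = sum_c p(c) p(y1|c) p(y2|c).
    pairs: observed (y1, y2) tokens, e.g. ('give:A1', 'NP')."""
    rng = np.random.default_rng(seed)
    ys1 = sorted({a for a, _ in pairs})
    ys2 = sorted({b for _, b in pairs})
    idx1 = {y: i for i, y in enumerate(ys1)}
    idx2 = {y: i for i, y in enumerate(ys2)}
    counts = np.zeros((len(ys1), len(ys2)))          # co-occurrence counts n(y1, y2)
    for a, b in pairs:
        counts[idx1[a], idx2[b]] += 1
    # Random initialisation of p(c), p(y1|c), p(y2|c)
    p_c = rng.dirichlet(np.ones(n_clusters))
    p_y1 = rng.dirichlet(np.ones(len(ys1)), size=n_clusters)
    p_y2 = rng.dirichlet(np.ones(len(ys2)), size=n_clusters)
    for _ in range(n_iter):
        # E-step: responsibilities p(c | y1, y2) for every pair type
        joint = p_c[:, None, None] * p_y1[:, :, None] * p_y2[:, None, :]
        post = joint / (joint.sum(axis=0, keepdims=True) + eps)
        # M-step: re-estimate parameters from expected counts
        exp_counts = post * counts[None, :, :]
        cluster_mass = exp_counts.sum(axis=(1, 2))
        p_c = cluster_mass / counts.sum()
        p_y1 = exp_counts.sum(axis=2) / (cluster_mass[:, None] + eps)
        p_y2 = exp_counts.sum(axis=1) / (cluster_mass[:, None] + eps)
    return p_c, p_y1, p_y2, idx1, idx2
```

From the fitted model, a feature such as "most likely argument slot for this sequence" can be read off as the argmax over y1 of Σ_c p(c) p(y1|c) p(y2|c) for the observed sequence type y2.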

3. Procedure
1. Filter sequences using the training set
2. Compute features for sequence tokens and their dividers (training + development + test sets)
3. Estimate a maximum entropy model on the training set
4. Classify sequences from the development / test set
5. Recover semantic parses

Two-step classification procedure
Classifier 1: argument recognition
- Binary decision about argumenthood
- All argument classes conflated into ARG
Classifier 2: argument labelling
- Considers only sequences assigned ARG by step 1
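A sketch of the two-step setup. scikit-learn's LogisticRegression is used here as a stand-in for the maximum entropy model (multinomial logistic regression is equivalent to maxent); the dictionary-based feature interface and the class name are assumptions, not the authors' implementation.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

class TwoStepSRL:
    """Step 1: binary argument recognition; Step 2: argument labelling."""

    def __init__(self):
        self.vec = DictVectorizer()
        self.recogniser = LogisticRegression(max_iter=1000)
        self.labeller = LogisticRegression(max_iter=1000)

    def fit(self, feature_dicts, labels):
        X = self.vec.fit_transform(feature_dicts)
        is_arg = [lab != "NOLABEL" for lab in labels]
        self.recogniser.fit(X, is_arg)                 # all classes conflated into ARG
        # Train the labeller only on sequences that really are arguments
        arg_rows = [i for i, a in enumerate(is_arg) if a]
        self.labeller.fit(X[arg_rows], [labels[i] for i in arg_rows])
        return self

    def predict(self, feature_dicts):
        X = self.vec.transform(feature_dicts)
        out = []
        for i, arg in enumerate(self.recogniser.predict(X)):
            out.append(self.labeller.predict(X[i])[0] if arg else "NOLABEL")
        return out
```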

4. Classification result: sequence chart
Example sentence: "The man with the beard sleeps"
The chart contains overlapping candidate sequences over the sentence, each with a label distribution, e.g.:
- A0 (70%), A1 (20%)
- NOLABEL (70%), AM-MOD (25%)
- A0 (60%), NOLABEL (40%)
- A0 (65%), A1 (25%)
Need to find the optimal "semantic parse" of argument labels

Semantic parse recovery
Find the most probable semantic parse p = (l1, l2, ...)
Step 1: beam search
- Simple probability model with an independence assumption: P_bs(l1, l2, ...) = Π_i P_c(l_i)
Step 2: reestimation
- Global considerations, e.g. parses that assign the same core label twice ([A0 A0])
- Use counts from the training set: P(l1, l2, ...) = P_bs(l1, l2, ...) · P_tr(l1, l2, ...)
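A sketch of the beam-search step over per-sequence label distributions (such as those in the chart above). The hard consistency check, with non-overlapping spans and no duplicated core label, stands in for the slide's reestimation with training-set counts P_tr, which is not implemented here; all names are assumptions.

```python
from itertools import combinations

CORE_LABELS = {"A0", "A1", "A2", "A3", "A4", "A5"}

def consistent(spans_and_labels):
    """Hard global constraints: chosen spans must not overlap and no core
    argument label may be assigned twice (an assumption replacing P_tr)."""
    chosen = [(span, lab) for span, lab in spans_and_labels if lab != "NOLABEL"]
    core = [lab for _, lab in chosen if lab in CORE_LABELS]
    if len(core) != len(set(core)):
        return False
    for (a_start, a_end), (b_start, b_end) in combinations([sp for sp, _ in chosen], 2):
        if not (a_end <= b_start or b_end <= a_start):   # spans overlap
            return False
    return True

def beam_search_parse(candidates, beam_size=10):
    """candidates: list of ((start, end), label_distribution) pairs, where the
    distribution maps labels (including 'NOLABEL') to P_c(l).  Scores a parse
    under the independence assumption P_bs = product of the per-label P_c."""
    beam = [([], 1.0)]
    for span, dist in candidates:
        new_beam = []
        for labels, score in beam:
            for label, p in dist.items():
                extended = labels + [label]
                spans_and_labels = [(candidates[i][0], lab)
                                    for i, lab in enumerate(extended)]
                if consistent(spans_and_labels):
                    new_beam.append((extended, score * p))
        beam = sorted(new_beam, key=lambda item: -item[1])[:beam_size]
    return beam[0] if beam else ([], 0.0)
```

With candidates such as ((0, 2), {"A0": 0.7, "A1": 0.2, "NOLABEL": 0.1}), the function returns the label assignment with the highest product of P_c values among the consistent parses.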

5. Results (Development Set)
[Table: precision, recall and F-score for "Upper Bound", "Step 1 (ARG only)" and "Final".]
Upper bound: determined by the chunk sequences lost in filtering
But filtering is necessary. With only sequence frequency filtering (filters 1 and 2):
- Good news: only 9% of arguments are lost (vs. 16% with full filtering)
- Bad news: 127,000 sequences (vs. 44,000 with full filtering)
- Argument recognition becomes much more difficult: F-score with the same features only 0.38

Results
The two steps have different profiles:
- Argument identification: shallow and divider features important
- Argument labelling: shallow and higher-level features important
Clustering features unsuccessful: they increase precision at the cost of recall
- Feature "most probable label for sequence"
- Successful in the SENSEVAL-3 model
The largest problem is recall.
[Table repeated: precision, recall and F-score for "Upper Bound", "Step 1 (ARG only)" and "Final".]

What I talked about... and more
Chunk sequences for SRL
- Adaptive representation with "higher-level" features
- Recall problem (filtering loses proper arguments)
- EM-based features promising, but currently not helpful
Since submission
- Maxent vs. memory-based learner: virtually the same result
Left to do
- Detailed error analysis
- More intelligent filtering
- Better features