
Levenshtein-distance-based post-processing: shared task spotlight
Antal van den Bosch
ILK / CL and AI, Tilburg University
Ninth Conference on Computational Natural Language Learning (CoNLL-2005)
Ann Arbor, MI, June 29-30, 2005

Global models ≈ Post-processing

Many systems: local model vs. global model.
Goal of the global model:
–correcting mislabelings caused by "blind" decisions of a more local model
Methods:
–Probabilistic language models over the argument-labeling language
–Distance-based error correction over the argument-labeling language

Hand-crafted post-processing

"Eraser" script by Erik Tjong Kim Sang (2004):
–For each verb, if any label A0 - A5 occurs twice in the sentence, delete the occurrence furthest from the verb.
Can only improve precision, not recall. Will typically lower recall, because occasionally it deletes the correct one of a double occurrence.
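The eraser rule can be sketched in a few lines. This is a minimal reconstruction, not Tjong Kim Sang's actual script; the function name and the list-of-labels representation are assumptions, and the verb is identified by its index in the label sequence.

```python
def eraser(labels, verb_index):
    """Sketch of the 'eraser' rule: for each core label A0-A5 that
    occurs more than once, delete the occurrence furthest from the
    verb until only one remains."""
    core = ("A0", "A1", "A2", "A3", "A4", "A5")
    out = list(labels)
    for label in core:
        positions = [i for i, l in enumerate(out) if l == label]
        while len(positions) > 1:
            # Erase the occurrence furthest from the verb.
            furthest = max(positions, key=lambda i: abs(i - verb_index))
            out[furthest] = None
            positions.remove(furthest)
    return [l for l in out if l is not None]
```

On a pattern with a doubled A0, e.g. `["A0", "V", "A1", "A0"]` with the verb at index 1, the rule keeps the A0 next to the verb and erases the final one.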

Data-driven post-processing

Idea: argument-labeling correction as spelling correction.
–Classical solution: Levenshtein or string-edit distance (Levenshtein, 1965). Sum over:
–Deletion: distance++
–Insertion: distance++
–Substitution: distance++
–The closest argument pattern found contains the corrections that need to be applied.
Predicted: emphasize A0 V A1 A0
Nearest in training data and PropBank, at distance 1: emphasize A0 V A1
Correction: delete the final A0 in the predicted string
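The distance itself is the standard dynamic-programming recurrence, computed here over label sequences rather than characters. A minimal sketch with unit costs for deletion, insertion, and substitution, as on the slide:

```python
def levenshtein(s, t):
    """String-edit distance between two token sequences:
    deletion, insertion, and substitution each cost 1."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]
```

For the slide's example, `levenshtein("emphasize A0 V A1 A0".split(), "emphasize A0 V A1".split())` is 1: a single deletion of the final A0.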

Data-driven post-processing (2)

Levenshtein-based correction:
–Implements deletions and replacements
–Does not perform insertions (does not know where to "insert")
–Should be able to improve precision, like the "eraser"
–Might improve recall, due to correct replacements
–Can be applied to all systems!
–>500 deletions, >250 replacements for some systems on WSJ dev & test
–<100 deletions, <75 replacements for systems that already have a global model
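The asymmetry above (deletions and replacements yes, insertions no) can be made concrete by backtracing the edit-distance table and emitting only the allowed operations. A sketch under assumed names (`edit_table`, `apply_corrections` are mine; the DP table is repeated so the snippet stands alone):

```python
def edit_table(src, tgt):
    """Full unit-cost Levenshtein DP table between two token sequences."""
    m, n = len(src), len(tgt)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d

def apply_corrections(src, tgt):
    """Rewrite the predicted pattern `src` toward the nearest attested
    pattern `tgt`: apply deletions and substitutions from the edit
    script, but skip insertions, since the post-processor does not
    know where inserted material should go."""
    d = edit_table(src, tgt)
    i, j = len(src), len(tgt)
    out = []
    while i > 0 or j > 0:
        cost = 0 if (i > 0 and j > 0 and src[i - 1] == tgt[j - 1]) else 1
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost:
            out.append(tgt[j - 1])  # match or substitution: take target label
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            i -= 1                  # deletion: drop the predicted label
        else:
            j -= 1                  # insertion: skipped by design
    out.reverse()
    return out
```

With the nearest pattern chosen by `min(patterns, key=lambda p: edit_table(predicted, p)[-1][-1])`, the earlier example `emphasize A0 V A1 A0` against `emphasize A0 V A1` yields a single deletion of the final A0, while an insertion-only mismatch leaves the prediction unchanged.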

Levenshtein post-processing: Example 1

System marquez, Brown: 40 deletions, 30 replacements. E.g.:
–bend A1 V A1 → bend A0 V A1
–love A1 A0 V A1 → love A0 V A1
–unite A1 V A2 → unite A1 V

Levenshtein post-processing: Example 2

System punyakanok, Brown: 20 deletions, 17 replacements. E.g.:
–search V A0
–search V A1
because in the data:
–search A0 V
–search A0 V A1

Levenshtein post-processing: Example 3

System pradhan, WSJ dev: 111 deletions, 72 replacements. E.g.:
Before: [ [ That dividend ]A1 is almost double the 35% currently taken out of Farmers by B.A.T ]A1, the spokesman added.
After: [ That dividend is almost double the 35% currently taken out of Farmers by B.A.T ]A1, the spokesman added.

Levenshtein post-processing: effect

(Per-system results table; the score cells were not preserved in this transcript.)
Columns: system | post | glob | WSJ test | WSJ test LPP | Brown test | Brown test LPP
Systems (X as marked in the post/glob columns): punyakanok X, haghighi X, marquez, pradhan, surdeanu X, che, moschitti, yi, ozgencil, johansson, cohn X, park, sutton
Blue: increase; red: decrease; bold number and cell color: effect > 0.2