
1 Resources:
 Problems in Evaluating Grammatical Error Detection Systems, Chodorow et al.
 Helping Our Own: The HOO 2011 Pilot Shared Task, Dale and Kilgarriff
 The CoNLL-2013 Shared Task on Grammatical Error Correction, Ng et al.
 Better Evaluation for Grammatical Error Correction, Dahlmeier and Ng

2 [Figure: in standard NLP evaluation, the annotator tag is compared against the system output; in error detection evaluation, the comparison additionally involves the learner sentence.] Resource: Problems in Evaluating Grammatical Error Detection Systems, Chodorow et al.

3  Comma restoration task  Commas are removed from well-edited text (the gold standard)  The system tries to restore the commas by predicting their locations  Comparison: ▪ Binary distinction (presence or absence of a comma)

4

5  Comma error detection task  The system seeks to find and correct errors in the writer's usage of commas  Intricacies: ▪ Positive class: an error by the writer that involves a comma (not the mere presence of a comma)  A mismatch between the writer's sentence and the annotator's judgement ▪ Negative class: the writer and the annotator agree ▪ The system's judgement has not been considered yet ▪ Writer-Annotator-System (WAS)

6 Contingency scheme for WAS: considering the system's prediction and the writer's form together

7 Contingency scheme for WAS: considering the system's prediction and the gold standard together

8

9 Simplified contingency scheme

10

11

12

13 Expected proportion of TP matches
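The formula itself did not survive the transcript. As a hedged reconstruction from the chance-agreement setup in Chodorow et al.: with prevalence p (the proportion of writer errors) and bias b (the proportion of cases the system flags), independent, i.e. chance-level, prediction gives

\[
\frac{E[\mathrm{TP}]}{n} = p\,b, \qquad p_e = p\,b + (1-p)(1-b)
\]

where p_e is the expected chance agreement later used in kappa.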

14 System 1: predictions are correct at chance level

15 System 2: prevalence and bias remain the same

16 System 3: increase bias and prevalence + predictions are correct at chance level

17

18  Dealing with sensitivity to bias  Vary the threshold and generate a precision-recall curve (a sketch follows)
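An illustrative sketch (not from the slides) of how the threshold sweep produces the curve, using scikit-learn on hypothetical labels and scores:

import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical gold labels (1 = writer error) and system confidence scores.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.6, 0.3, 0.5])

# Each distinct score acts as a threshold; each threshold yields one (P, R) point.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r in zip(precision, recall):
    print(f"precision={p:.2f}  recall={r:.2f}")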

19  Dealing with sensitivity to bias  Area under the Receiver Operating Characteristic (ROC) curve, i.e. AUC ▪ The diagonal of the ROC plot is the curve for random prediction ▪ The effect of random prediction is not nullified: the area under random prediction is still 0.5

20 Class skewness is already taken care of [Figure: ROC curve; x-axis: False Positive Rate, y-axis: True Positive Rate]
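A companion sketch for the ROC view: AUC stays near 0.5 for random scores regardless of class skew, which is the sense in which skewness is taken care of (hypothetical data again):

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Heavily skewed labels: ~5% positives, as in error detection.
y_true = rng.random(10000) < 0.05
random_scores = rng.random(10000)                      # a random predictor
informed_scores = y_true + rng.normal(0, 0.5, 10000)   # a weakly informed one

print(roc_auc_score(y_true, random_scores))    # ~0.5 whatever the skew
print(roc_auc_score(y_true, informed_scores))  # well above 0.5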

21  The positive class consists of an error in the writer's text  There is no 1:1:1 correspondence between the writer's sentence, the annotator's correction and the type of error ▪ Writer: Book of my class inspired me ▪ A book in my class inspired me (article error) ▪ Books for my class inspired me (number error) ▪ The books of my class were inspiring to me (article + number error)

22  Assuming no ambiguity in error type  What would be the size of the unit over which an error is defined? ▪ Writer: The book in my class inspire me ▪ a) The book in my class inspires me ▪ b) The books in my class inspire me  Unit size: morpheme level? word level? phrase level? string level?  Token-based approach vs. string-based approach

23 EDM can handle both

24  EDMs are good for comparison, not for providing feedback to the writer  If book and inspire are not linked, feedback like "violation of subject-verb agreement" cannot be provided

25

26

27 Inject 100 more TNs: Accuracy rises from 0.54 to 0.77, while Kappa rises only from 0.00 to 0.21
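The contingency counts behind these numbers are not shown on the slide; the counts below (TP=12, FP=18, FN=28, TN=42) are reverse-engineered so that the sketch reproduces the slide's figures. The point: added TNs inflate accuracy sharply, while kappa, which discounts chance agreement, stays low.

def accuracy_and_kappa(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    po = (tp + tn) / n                       # observed agreement (accuracy)
    sys_pos = (tp + fp) / n                  # bias: system's positive rate
    wri_pos = (tp + fn) / n                  # prevalence: writer's error rate
    pe = sys_pos * wri_pos + (1 - sys_pos) * (1 - wri_pos)  # chance agreement
    return po, (po - pe) / (1 - pe)

print(accuracy_and_kappa(12, 18, 28, 42))        # (0.54, 0.00)
print(accuracy_and_kappa(12, 18, 28, 42 + 100))  # (0.77, ~0.21)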

28

29

30  Extraction of the system edit from the writer's text (source) and the system output (hypothesis)  Done with the GNU wdiff utility ▪ Source: Our baseline system feeds word into PB-SMT pipeline ▪ Hypothesis: Our baseline system feeds a word into PB-SMT pipeline  The hypothesis matches the first gold-standard edit, but the extracted edit is flagged as invalid
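The slides name GNU wdiff for this step; purely as an illustrative stand-in, Python's difflib performs the same token-level extraction:

import difflib

source = "Our baseline system feeds word into PB-SMT pipeline".split()
hypothesis = "Our baseline system feeds a word into PB-SMT pipeline".split()

# Extract token-level edits as (source span, replacement) pairs.
matcher = difflib.SequenceMatcher(a=source, b=hypothesis)
for op, i1, i2, j1, j2 in matcher.get_opcodes():
    if op != "equal":
        print(op, source[i1:i2], "->", hypothesis[j1:j2])
# insert [] -> ['a']   (one of several equivalent ways to express this edit)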

31  Key idea  There may be multiple ways to arrive at the same correction  Extract the set of edits that matches the gold standard maximally

32

33  Notation  An edit is a triple (a, b, C) ▪ Start and end token offsets a and b with respect to the source sentence ▪ A correction C ▪ For a gold-standard edit, C is a set of corrections ▪ For a system edit, C is a single correction
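A minimal sketch of this representation in Python (the names are illustrative, not from the papers):

from dataclasses import dataclass

@dataclass(frozen=True)
class Edit:
    a: int                  # start token offset into the source sentence
    b: int                  # end token offset into the source sentence
    corrections: frozenset  # gold edit: a set of strings; a system edit holds one

# Gold-standard edit replacing source tokens 4..5 ("word") by "a word":
gold = Edit(4, 5, frozenset({"a word"}))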

34

35  Edit metric: Levenshtein distance  The minimum number of insertions, deletions and substitutions needed to transform one string into another  How to compute the Levenshtein distance? ▪ Use a 2-D matrix (the Levenshtein matrix) to store the edit costs of substring pairs ▪ Compute the individual cell entries (edit costs) with dynamic programming ▪ The corner cell stores the optimal edit cost

36  Slides from Jurafsky's course page

37  Spell correction ▪ The user typed "graffe". Which is closest? graf, graft, grail, giraffe  Computational biology ▪ Align two sequences of nucleotides:
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
▪ Resulting alignment:
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
 Also used for machine translation, information extraction and speech recognition

38  The minimum edit distance between two strings  Is the minimum number of editing operations  Insertion  Deletion  Substitution  Needed to transform one into the other

39  Two strings and their alignment: [figure: the alignment of "intention" and "execution"]

40  If each operation has a cost of 1 ▪ The distance between these is 5 (1 deletion, 1 insertion and 3 substitutions)  If substitutions cost 2 (Levenshtein) ▪ The distance between them is 8

41  Searching for a path (sequence of edits) from the start string to the final string:  Initial state: the word we’re transforming  Operators: insert, delete, substitute  Goal state: the word we’re trying to get to  Path cost: what we want to minimize: the number of edits

42  But the space of all edit sequences is huge!  We can’t afford to navigate naïvely  Lots of distinct paths wind up at the same state. ▪ We don’t have to keep track of all of them ▪ Just the shortest path to each of those revisited states.

43  For two strings  X of length n  Y of length m  We define D(i,j)  the edit distance between X[1..i] and Y[1..j] ▪ i.e., the first i characters of X and the first j characters of Y  The edit distance between X and Y is thus D(n,m)

44  Dynamic programming:  Solving problems by combining solutions to subproblems.  A tabular computation of D(n,m)  Bottom-up  We compute D(i,j) for small i,j  And compute larger D(i,j) based on previously computed smaller values  i.e., compute D(i,j) for all i (0 < i < n) and j (0 < j < m)

45  Initialization:
     D(i,0) = i
     D(0,j) = j
  Recurrence relation:
     For each i = 1…N
       For each j = 1…M
         D(i,j) = min( D(i-1,j) + 1,                                        (deletion)
                       D(i,j-1) + 1,                                        (insertion)
                       D(i-1,j-1) + 2 if X(i) ≠ Y(j), + 0 if X(i) = Y(j) )  (substitution)
  Termination:
     D(N,M) is the distance
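A direct transcription of this recurrence into Python (substitution cost 2, as on the slide); it reproduces the distance of 8 for intention → execution:

def min_edit_distance(x, y):
    n, m = len(x), len(y)
    # D[i][j] = edit distance between x[:i] and y[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                      # initialization: deletions only
    for j in range(m + 1):
        D[0][j] = j                      # initialization: insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 2   # Levenshtein: subst. costs 2
            D[i][j] = min(D[i - 1][j] + 1,           # deletion
                          D[i][j - 1] + 1,           # insertion
                          D[i - 1][j - 1] + sub)     # substitution / match
    return D[n][m]

print(min_edit_distance("intention", "execution"))  # 8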

46 The edit distance table (initialized)
 N  9
 O  8
 I  7
 T  6
 N  5
 E  4
 T  3
 N  2
 I  1
 #  0  1  2  3  4  5  6  7  8  9
    #  E  X  E  C  U  T  I  O  N

47 The Edit Distance Table
 N  9
 O  8
 I  7
 T  6
 N  5
 E  4
 T  3
 N  2
 I  1  2
 #  0  1  2  3  4  5  6  7  8  9
    #  E  X  E  C  U  T  I  O  N

 D(1,1) = min( D(0,1) + 1, D(1,0) + 1, D(0,0) + 2 if X(1) ≠ Y(1), + 0 if X(1) = Y(1) ) = min(2, 2, 2) = 2

48 The Edit Distance Table
 N  9  8  9 10 11 12 11 10  9  8
 O  8  7  8  9 10 11 10  9  8  9
 I  7  6  7  8  9 10  9  8  9 10
 T  6  5  6  7  8  9  8  9 10 11
 N  5  4  5  6  7  8  9 10 11 10
 E  4  3  4  5  6  7  8  9 10  9
 T  3  4  5  6  7  8  7  8  9  8
 N  2  3  4  5  6  7  8  7  8  7
 I  1  2  3  4  5  6  7  6  7  8
 #  0  1  2  3  4  5  6  7  8  9
    #  E  X  E  C  U  T  I  O  N

49  Edit distance isn't sufficient  We often need to align each character of the two strings to each other  We do this by keeping a "backtrace"  Every time we enter a cell, remember where we came from  When we reach the end, trace back the path from the upper-right corner to read off the alignment

50 The Edit Distance Table
 N  9  8  9 10 11 12 11 10  9  8
 O  8  7  8  9 10 11 10  9  8  9
 I  7  6  7  8  9 10  9  8  9 10
 T  6  5  6  7  8  9  8  9 10 11
 N  5  4  5  6  7  8  9 10 11 10
 E  4  3  4  5  6  7  8  9 10  9
 T  3  4  5  6  7  8  7  8  9  8
 N  2  3  4  5  6  7  8  7  8  7
 I  1  2  3  4  5  6  7  6  7  8
 #  0  1  2  3  4  5  6  7  8  9
    #  E  X  E  C  U  T  I  O  N

51  Base conditions: D(i,0) = i, D(0,j) = j  Termination: D(N,M) is the distance
  Recurrence relation: For each i = 1…N, for each j = 1…M:
     D(i,j) = min( D(i-1,j) + 1,                                        (deletion)
                   D(i,j-1) + 1,                                        (insertion)
                   D(i-1,j-1) + 2 if X(i) ≠ Y(j), + 0 if X(i) = Y(j) )  (substitution)
     ptr(i,j) = DOWN (deletion) | LEFT (insertion) | DIAG (substitution)
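Extending the earlier sketch with these pointers makes the backtrace concrete; this is a sketch of the idea, not code from the course:

def align(x, y):
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0], ptr[i][0] = i, "DOWN"
    for j in range(1, m + 1):
        D[0][j], ptr[0][j] = j, "LEFT"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 2
            # Remember which neighbour the minimum came from.
            D[i][j], ptr[i][j] = min(
                (D[i - 1][j] + 1, "DOWN"),         # deletion
                (D[i][j - 1] + 1, "LEFT"),         # insertion
                (D[i - 1][j - 1] + sub, "DIAG"))   # substitution / match
    # Trace back from the final cell to read off the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "DIAG":
            pairs.append((x[i - 1], y[j - 1])); i, j = i - 1, j - 1
        elif move == "DOWN":
            pairs.append((x[i - 1], "*")); i -= 1
        else:
            pairs.append(("*", y[j - 1])); j -= 1
    return list(reversed(pairs))

print(align("intention", "execution"))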

52

53

54 Source: Our baseline system feeds word into PB-SMT pipeline
Hypothesis: Our baseline system feeds a word into PB-SMT pipeline

55
               #  our  baseline  system  feeds  a  word  into  PB-SMT  pipeline  .
               0   1      2        3       4    5    6     7      8       9      10
 0  #
 1  Our
 2  baseline
 3  system
 4  feeds
 5  word
 6  into
 7  PB-SMT
 8  pipeline
 9  .

56
               #  our  baseline  system  feeds  a  word  into  PB-SMT  pipeline  .
               0   1      2        3       4    5    6     7      8       9      10
 0  #          0   1      2        3       4    5    6     7      8       9      10
 1  Our        1   0      1        2       3    4    5     6      7       8       9
 2  baseline   2   1      0        1       2    3    4     5      6       7       8
 3  system     3   2      1        0       1    2    3     4      5       6       7
 4  feeds      4   3      2        1       0    1    2     3      4       5       6
 5  word       5   4      3        2       1    1    1     2      3       4       5
 6  into       6   5      4        3       2    2    2     1      2       3       4
 7  PB-SMT     7   6      5        4       3    3    3     2      1       2       3
 8  pipeline   8   7      6        5       4    4    4     3      2       1       2
 9  .          9   8      7        6       5    5    5     4      3       2       1

57 The same matrix as on slide 56, shown again for the backtrace step

58

59 Lattice built from the backtrace; vertices are (source offset, hypothesis offset) pairs, edges are labelled token(cost):
 (0,0) → (1,1): Our (1)
 (1,1) → (2,2): baseline (1)
 (2,2) → (3,3): system (1)
 (3,3) → (4,4): feeds (1)
 (4,4) → (4,5): a (1) [insertion]
 (4,5) → (5,6): word (1)
 (5,6) → (6,7): into (1)
 (6,7) → (7,8): PB-SMT (1)
 (7,8) → (8,9): pipeline (1)
 (8,9) → (9,10): . (1)

60

61

62 The lattice from slide 59, with transitive edges added (source phrase / corrected phrase (cost)):
 feeds / feeds a (2)
 word / a word (2)
 system feeds / system feeds a (3)
 word into / a word into (3)
 feeds word / feeds a word (3)

63 The transitive edge that matches the gold-standard edit is re-weighted with a large negative cost: word / a word (-45)

64  Perform a single-source shortest-path computation, with negative weights, from the start vertex to the end vertex  Bellman-Ford algorithm
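A compact sketch of the idea; the vertices and the -45 edge follow the lattice fragment above, but the code itself is illustrative, not the authors' implementation:

def bellman_ford(vertices, edges, source):
    # Single-source shortest paths; tolerates negative edge weights.
    dist = {v: float("inf") for v in vertices}
    pred = {v: None for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):     # relax every edge |V|-1 times
        for u, v, w, label in edges:
            if dist[u] + w < dist[v]:
                dist[v], pred[v] = dist[u] + w, (u, label)
    return dist, pred

# Fragment of the lattice around the insertion; the gold-matching transitive
# edge "word / a word" carries the -45 weight, so the shortest path must use it.
V = [(3, 3), (4, 4), (4, 5), (5, 6), (6, 7)]
E = [((3, 3), (4, 4), 1, "feeds"),
     ((4, 4), (4, 5), 1, "a"),
     ((4, 5), (5, 6), 1, "word"),
     ((4, 4), (5, 6), -45, "word/a word"),  # gold-matching edge, down-weighted
     ((5, 6), (6, 7), 1, "into")]
dist, pred = bellman_ford(V, E, (3, 3))
print(dist[(6, 7)], pred[(5, 6)])  # -43 ((4, 4), 'word/a word')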

65  Theorem  The set of edits corresponding to the shortest path has the maximum overlap with the gold-standard annotation

66

