Resources:
▪ Problems in Evaluating Grammatical Error Detection Systems, Chodorow et al.
▪ Helping Our Own: The HOO 2011 Pilot Shared Task, Dale and Kilgarriff
▪ The CoNLL-2013 Shared Task on Grammatical Error Correction, Ng et al.
▪ Better Evaluation for Grammatical Error Correction, Dahlmeier and Ng
Standard NLP evaluation compares the annotator tag with the system output. Error detection evaluation must also take the learner sentence into account, in addition to the annotator tag and the system output.
Resource: Problems in Evaluating Grammatical Error Detection Systems, Chodorow et al.
Comma restoration task
Commas are removed from well-edited text (the gold standard), and the system tries to restore them by predicting their locations.
Comparison:
▪ Binary distinction (presence or absence of a comma)
Comma error detection task
The system seeks to find and correct errors in the writer's usage of commas.
Intricacies:
▪ Positive class: an error of the writer involving a comma (not the presence of a comma), i.e., a mismatch between the writer's sentence and the annotator's judgement
▪ Negative class: the writer and the annotator agree
▪ The system's judgement has not been considered yet
▪ Writer-Annotator-System (WAS)
Contingency scheme for WAS: considering the system prediction and the writer's form together.
Contingency scheme for WAS: considering the system prediction and the gold standard together.
Simplified contingency scheme
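As a concrete illustration, here is a minimal Python sketch (the labels are hypothetical) of the simplified scheme: the writer/annotator comparison is collapsed into a single gold label, which is then crossed with the system's prediction to give the four cells.

```python
from collections import Counter

def contingency(gold, system):
    """Count TP, FP, FN, TN over paired binary labels
    (1 = an error is present, 0 = no error)."""
    cells = Counter()
    for g, s in zip(gold, system):
        if g == 1 and s == 1:
            cells["TP"] += 1
        elif g == 0 and s == 1:
            cells["FP"] += 1
        elif g == 1 and s == 0:
            cells["FN"] += 1
        else:
            cells["TN"] += 1
    return cells

gold   = [1, 0, 0, 1, 0, 0, 0, 1]   # hypothetical annotator judgements
system = [1, 0, 1, 0, 0, 0, 0, 1]   # hypothetical system predictions
print(contingency(gold, system))     # TP=2, FP=1, FN=1, TN=4
```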
Expected proportion of TP match
System 1: predictions are correct at chance level.
System 2: prevalence and bias remain the same.
System 3: increased bias and prevalence; predictions are correct at chance level.
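The following simulation (hypothetical, not from the papers) makes the point concrete: when predictions are generated independently of the gold labels, the expected proportion of true-positive matches is just prevalence × bias, so systems like these can look quite different on raw match counts without any of them being better than chance.

```python
import random

def chance_tp_proportion(prevalence, bias, n=100_000, seed=0):
    """Proportion of TP matches when the system predicts the positive
    class at rate `bias`, independently of gold labels drawn at rate
    `prevalence`."""
    rng = random.Random(seed)
    gold   = [rng.random() < prevalence for _ in range(n)]
    system = [rng.random() < bias for _ in range(n)]   # independent of gold
    tp = sum(1 for g, s in zip(gold, system) if g and s)
    return tp / n

print(chance_tp_proportion(0.10, 0.10))  # close to 0.01 = 0.10 * 0.10
print(chance_tp_proportion(0.30, 0.50))  # close to 0.15 = 0.30 * 0.50
```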
Dealing with sensitivity to bias: vary the threshold and generate a precision-recall curve.
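A minimal scikit-learn sketch of this, assuming the system produces a confidence score per instance (labels and scores below are hypothetical):

```python
from sklearn.metrics import precision_recall_curve

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                      # hypothetical gold labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9]    # hypothetical system confidences

# Each candidate threshold yields one (precision, recall) operating point.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(thresholds)
print(precision)   # one extra trailing 1.0
print(recall)      # one extra trailing 0.0
```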
Dealing with sensitivity to bias: area under the Receiver Operating Characteristic (ROC) curve, i.e., AUC. The diagonal is the curve for random prediction; since the area under the random-prediction curve is 0.5, the effect of random prediction is not nullified.
Class skew is already taken care of, because the ROC axes are per-class rates: the false positive rate and the true positive rate.
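A companion sketch for the ROC view (same hypothetical scores); because both axes are per-class rates, the curve is not distorted by class skew:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points of the ROC curve
print(roc_auc_score(y_true, y_score))               # 0.875 here; 0.5 = random prediction
```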
The positive class consists of an error in the writer's text. There is no 1:1:1 correspondence between the writer's sentence, the annotator's correction and the type of error.
Writer: Book of my class inspired me
▪ A book in my class inspired me (article error)
▪ Books for my class inspired me (number error)
▪ The books of my class were inspiring to me (article + number error)
Assuming no ambiguity in error type, what would be the size of the unit over which an error is defined?
Writer: The book in my class inspire me
a) The book in my class inspires me
b) The books in my class inspire me
Unit size: morpheme level? Word level? Phrase level? String level?
Token-based approach vs. string-based approach.
EDM can handle both approaches.
EDMs are good for comparison, not for providing feedback to the writer. If book and inspire are not linked, feedback such as a subject-verb agreement violation cannot be provided.
Before: Accuracy = 0.54, Kappa = 0.00. After injecting 100 more TNs: Accuracy = 0.77, Kappa = 0.21.
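A small sketch with hypothetical confusion-matrix counts (not the counts behind the figures above) reproduces the pattern: injecting extra true negatives raises accuracy sharply, and also moves kappa away from its chance-level value of zero.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Accuracy and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # chance agreement estimated from the row/column marginals
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    expected = p_pos + p_neg
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

print(accuracy_and_kappa(25, 25, 25, 25))        # (0.50, 0.00): chance-level system
print(accuracy_and_kappa(25, 25, 25, 25 + 100))  # (0.75, 0.33): after injecting 100 TNs
```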
Extraction of system edits from the writer's text (source) and the system output (hypothesis) is done with the GNU wdiff utility.
Source: Our baseline system feeds word into PB-SMT pipeline
Hypothesis: Our baseline system feeds a word into PB-SMT pipeline
The hypothesis matches the first gold-standard edit but is flagged as invalid.
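The paper uses GNU wdiff for this step; as a rough stand-in, Python's difflib recovers the same word-level edit for the example above (a sketch, not the scorer's actual implementation):

```python
import difflib

source = "Our baseline system feeds word into PB-SMT pipeline".split()
hypo   = "Our baseline system feeds a word into PB-SMT pipeline".split()

matcher = difflib.SequenceMatcher(a=source, b=hypo)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        # each non-equal opcode is one candidate system edit
        print(tag, source[i1:i2], "->", hypo[j1:j2])
# insert [] -> ['a']
```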
Key idea: there may be multiple ways to arrive at the same correction, so extract the set of edits that maximally matches the gold standard.
Notation: an edit is a triple (a, b, C):
▪ Start and end token offsets a and b with respect to the source sentence.
▪ A correction C.
▪ For a gold-standard edit, C is a set of corrections.
▪ For a system edit, C is a single correction.
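A minimal Python sketch of this representation (the class and field names are mine, not the scorer's); the instance shown is the gold edit from the running example, word -> a word over source tokens 4-5:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GoldEdit:
    a: int                  # start token offset in the source sentence
    b: int                  # end token offset in the source sentence
    corrections: frozenset  # set of acceptable correction strings

@dataclass(frozen=True)
class SystemEdit:
    a: int
    b: int
    correction: str         # a single correction string

gold = GoldEdit(4, 5, frozenset({"a word"}))
sys_edit = SystemEdit(4, 5, "a word")
print(sys_edit.correction in gold.corrections)  # True: the system edit is valid
```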
Edit metric: Levenshtein distance, the minimum number of insertions, deletions and substitutions needed to transform one string into another.
How to compute the Levenshtein distance?
▪ Use a 2-D matrix (the Levenshtein matrix) to store the edit costs of substring pairs.
▪ Compute the individual cell entries (edit costs) with dynamic programming.
▪ The final corner cell stores the optimal edit cost.
The following slides are adapted from Dan Jurafsky's course page.
Spell correction: the user typed "graffe". Which is closest?
▪ graf
▪ graft
▪ grail
▪ giraffe
Computational biology: align two sequences of nucleotides
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
Resulting alignment:
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Also used for machine translation, information extraction, speech recognition.
The minimum edit distance between two strings is the minimum number of editing operations (insertion, deletion, substitution) needed to transform one into the other.
Two strings and their alignment:
I N T E * N T I O N
* E X E C U T I O N
d s s   i s
If each operation has a cost of 1, the distance between these is 5. If substitutions cost 2 (Levenshtein), the distance between them is 8.
Searching for a path (sequence of edits) from the start string to the final string:
▪ Initial state: the word we're transforming
▪ Operators: insert, delete, substitute
▪ Goal state: the word we're trying to get to
▪ Path cost: what we want to minimize (the number of edits)
But the space of all edit sequences is huge! We can't afford to navigate naïvely. Lots of distinct paths wind up at the same state:
▪ We don't have to keep track of all of them.
▪ Just the shortest path to each of those revisited states.
For two strings X of length n and Y of length m, we define D(i,j) as the edit distance between X[1..i] and Y[1..j], i.e., between the first i characters of X and the first j characters of Y. The edit distance between X and Y is thus D(n,m).
Dynamic programming: solving problems by combining solutions to subproblems.
A tabular, bottom-up computation of D(n,m):
▪ We compute D(i,j) for small i, j.
▪ We then compute larger D(i,j) based on previously computed smaller values.
▪ i.e., compute D(i,j) for all i (0 ≤ i ≤ n) and j (0 ≤ j ≤ m).
Initialization:
D(i,0) = i
D(0,j) = j
Recurrence relation: for each i = 1…N (N = |X|) and each j = 1…M (M = |Y|),
D(i,j) = min of:
▪ D(i-1,j) + 1 (deletion)
▪ D(i,j-1) + 1 (insertion)
▪ D(i-1,j-1) + 2 if X(i) ≠ Y(j), + 0 if X(i) = Y(j) (substitution)
Termination: D(N,M) is the distance.
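A direct implementation of this recurrence (insertions and deletions cost 1, substitutions cost 2), filling the table bottom-up:

```python
def levenshtein(x, y):
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):              # initialization: D(i,0) = i
        D[i][0] = i
    for j in range(m + 1):              # initialization: D(0,j) = j
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 2
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + sub)  # substitution (free if equal)
    return D[n][m]                      # termination: D(N,M) is the distance

print(levenshtein("intention", "execution"))  # 8
```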
The edit distance table, initialized:
N  9
O  8
I  7
T  6
N  5
E  4
T  3
N  2
I  1
#  0  1  2  3  4  5  6  7  8  9
   #  E  X  E  C  U  T  I  O  N
The edit distance table after computing the first cell, D(1,1) = 2:
N  9
O  8
I  7
T  6
N  5
E  4
T  3
N  2
I  1  2
#  0  1  2  3  4  5  6  7  8  9
   #  E  X  E  C  U  T  I  O  N

D(1,1) = min( D(0,1) + 1, D(1,0) + 1, D(0,0) + 2 ) = 2, using the substitution cost of 2 because X(1) = I ≠ Y(1) = E.
The edit distance table, completed:
N  9  8  9 10 11 12 11 10  9  8
O  8  7  8  9 10 11 10  9  8  9
I  7  6  7  8  9 10  9  8  9 10
T  6  5  6  7  8  9  8  9 10 11
N  5  4  5  6  7  8  9 10 11 10
E  4  3  4  5  6  7  8  9 10  9
T  3  4  5  6  7  8  7  8  9  8
N  2  3  4  5  6  7  8  7  8  7
I  1  2  3  4  5  6  7  6  7  8
#  0  1  2  3  4  5  6  7  8  9
   #  E  X  E  C  U  T  I  O  N
The edit distance isn't sufficient: we often need to align each character of the two strings to each other. We do this by keeping a "backtrace": every time we enter a cell, remember where we came from. When we reach the end, trace back the path from the upper-right corner to read off the alignment.
The edit distance table again, this time traversed with the backtrace to read off the alignment.
Base conditions: D(i,0) = i, D(0,j) = j. Termination: D(N,M) is the distance.
Recurrence relation with backpointers: for each i = 1…N and each j = 1…M,
D(i,j) = min of:
▪ D(i-1,j) + 1 (deletion)
▪ D(i,j-1) + 1 (insertion)
▪ D(i-1,j-1) + 2 if X(i) ≠ Y(j), + 0 if X(i) = Y(j) (substitution)
ptr(i,j) = LEFT (insertion), DOWN (deletion), or DIAG (substitution), according to which case gives the minimum.
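The same recurrence with backpointers, sketched in Python: tracing the pointers from D(N,M) back to D(0,0) reads off the alignment (an asterisk marks an inserted or deleted position).

```python
def align(x, y):
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0], ptr[i][0] = i, "DOWN"
    for j in range(1, m + 1):
        D[0][j], ptr[0][j] = j, "LEFT"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 2
            choices = [(D[i - 1][j] + 1, "DOWN"),        # deletion
                       (D[i][j - 1] + 1, "LEFT"),        # insertion
                       (D[i - 1][j - 1] + sub, "DIAG")]  # substitution / match
            D[i][j], ptr[i][j] = min(choices)
    # trace back from the final cell to recover the aligned character pairs
    i, j, pairs = n, m, []
    while i > 0 or j > 0:
        if ptr[i][j] == "DIAG":
            pairs.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif ptr[i][j] == "DOWN":
            pairs.append((x[i - 1], "*"))   # deletion
            i -= 1
        else:
            pairs.append(("*", y[j - 1]))   # insertion
            j -= 1
    return D[n][m], list(reversed(pairs))

print(align("intention", "execution"))  # distance 8 and one optimal alignment
```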
Source: Our baseline system feeds word into PB-SMT pipeline
Hypothesis: Our baseline system feeds a word into PB-SMT pipeline
The Levenshtein matrix is set up with the source tokens as rows (0-9: #, Our, baseline, system, feeds, word, into, PB-SMT, pipeline, .) and the hypothesis tokens as columns (0-10: #, Our, baseline, system, feeds, a, word, into, PB-SMT, pipeline, .).
The filled Levenshtein matrix (rows = source tokens; columns = #, Our, baseline, system, feeds, a, word, into, PB-SMT, pipeline, .):
#        : 0  1  2  3  4  5  6  7  8  9 10
Our      : 1  0  1  2  3  4  5  6  7  8  9
baseline : 2  1  0  1  2  3  4  5  6  7  8
system   : 3  2  1  0  1  2  3  4  5  6  7
feeds    : 4  3  2  1  0  1  2  3  4  5  6
word     : 5  4  3  2  1  1  1  2  3  4  5
into     : 6  5  4  3  2  2  2  1  2  3  4
PB-SMT   : 7  6  5  4  3  3  3  2  1  2  3
pipeline : 8  7  6  5  4  4  4  3  2  1  2
.        : 9  8  7  6  5  5  5  4  3  2  1
The same Levenshtein matrix, revisited to trace the minimum-cost paths from which the edit lattice is built.
The edit lattice built from the Levenshtein matrix, with matrix cells as vertices and weighted edges for the corresponding edits:
(0,0) --Our(1)--> (1,1) --baseline(1)--> (2,2) --system(1)--> (3,3) --feeds(1)--> (4,4) --insert "a"(1)--> (4,5) --word(1)--> (5,6) --into(1)--> (6,7) --PB-SMT(1)--> (7,8) --pipeline(1)--> (8,9) --.(1)--> (9,10)
Transitive edges are then added to the lattice above, each labelled source phrase / hypothesis phrase and weighted by the sum of the edges it spans:
▪ feeds / feeds a (2)
▪ word / a word (2)
▪ system feeds / system feeds a (3)
▪ word into / a word into (3)
▪ feeds word / feeds a word (3)
The edge word / a word matches a gold-standard edit, so its weight is changed to a large negative value (-45); the shortest path through the lattice is thereby drawn through that edge.
Perform a single-source shortest-path computation with negative edge weights from the start vertex to the end vertex, using the Bellman-Ford algorithm.
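A toy sketch of this step (the graph fragment and helper below are illustrative, not the scorer's real data structures): vertices are cells of the Levenshtein matrix, the transitive edge that matches the gold edit carries the large negative weight, and Bellman-Ford's shortest path is forced through it.

```python
def bellman_ford(edges, source, target, num_passes):
    """Single-source shortest path with negative edge weights,
    relaxing every edge num_passes (= |V| - 1) times."""
    dist = {source: 0}
    pred = {}
    for _ in range(num_passes):
        for u, v, w, label in edges:
            if u in dist and dist[u] + w < dist.get(v, float("inf")):
                dist[v] = dist[u] + w
                pred[v] = (u, label)
    # read the chosen edits off the shortest path
    path, node = [], target
    while node != source:
        node, label = pred[node]
        path.append(label)
    return dist[target], list(reversed(path))

edges = [
    ((4, 4), (4, 5), 1, "insert 'a'"),
    ((4, 5), (5, 6), 1, "word -> word"),
    ((4, 4), (5, 6), -45, "word -> a word"),  # transitive edge matching the gold edit
]
print(bellman_ford(edges, (4, 4), (5, 6), num_passes=2))
# (-45, ['word -> a word'])
```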
Theorem: the set of edits corresponding to the shortest path has the maximum overlap with the gold-standard annotation.