1
Using Web Queries for Learner Error Detection
Michael Gamon, Microsoft Research
Claudia Leacock, Butler-Hill Group
2
Data-Driven Approaches to Error Correction
Classification
– Represent the context of a word as a feature vector
– P(word | context)
– Useful for: detection + candidate generation + candidate ranking
Language modeling
– Assign a probability to a sequence of words
– Useful for: candidate ranking
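To make the language-modeling route concrete, here is a minimal sketch (not from the talk) of candidate ranking with a toy bigram model; the function name, backoff constant, and count tables are illustrative assumptions.

    import math

    def bigram_logprob(tokens, bigram_counts, unigram_counts, alpha=0.4):
        """Toy bigram log-probability; unseen bigrams get a small backoff mass."""
        score = 0.0
        for w1, w2 in zip(tokens, tokens[1:]):
            c_bi = bigram_counts.get((w1, w2), 0)
            c_uni = unigram_counts.get(w1, 0)
            if c_bi > 0 and c_uni > 0:
                score += math.log(c_bi / c_uni)
            else:
                score += math.log(alpha / (c_uni + 1.0))
        return score

    # Candidate ranking: prefer the preposition whose sentence the model scores higher.
    # bigram_counts / unigram_counts would come from a large corpus, e.g. Gigaword.
    candidates = ["we rely on this", "we rely in this", "we rely at this"]
    # best = max(candidates,
    #            key=lambda s: bigram_logprob(s.split(), bigram_counts, unigram_counts))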
3
Language model data
More data -> more reliable scores
For example: Gigaword corpus (3B tokens)
How about the web?
– Largest text collection in existence
– Diverse but noisy
– Training a web-based language model: difficult
– BUT: search engines return estimated page counts for a search
4
Googleology is Bad Science (Kilgarriff 2007)
No part-of-speech or lemma information
Search syntax is limited
Number of automatic queries may be limited
Counts are web page count estimates, NOT phrase counts
Also:
– Page count estimates are black magic
– Estimates fluctuate over time
5
Let’s see, then:
Can web data be used to distinguish errors from correct phrases?
What resource is best?
– Bing API
– Google API
– Google 5-gram data: preserves case + punctuation + sentence boundaries; count cutoffs: unigrams = 200, higher n-grams = 40
What query size is best?
How does all this compare to using a standard language model?
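As a rough illustration of the Google 5-gram lookup, here is a sketch under assumptions: Web 1T-style files with one "n-gram<TAB>count" entry per line, where the released data already applies the cutoffs above; all names are hypothetical, not the authors' code.

    def load_ngram_counts(path):
        """Read tab-separated "n-gram<TAB>count" lines into a dictionary."""
        counts = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                ngram, count = line.rstrip("\n").rsplit("\t", 1)
                counts[ngram] = int(count)
        return counts

    def query_count(counts, phrase):
        """Count for a phrase; 0 if it never occurred or fell below the release cutoffs."""
        return counts.get(phrase, 0)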
6
Error types
Preposition errors:
– 16% of errors in CUP data
– Prominent for all L1 backgrounds
Determiner errors:
– 13% of errors in CUP data
– Depends on L1
7
Evaluation
Data: Cambridge Learner Corpus (CLC)
– Random sample of ~9k sentences with preposition/article errors
– Cleanup: correct spelling errors, change British spelling to US spelling; keep all other errors in the sentence
– Eliminate: sentences with nested errors, sentences with multiple prep/art errors
Task: distinguish the learner error from the annotated correction
8
Evaluation Metrics
correct (the query results favor the correction of the learner error over the error itself):
  count(q_correction) > count(q_error)
incorrect (the query results favor the learner error over its correction):
  count(q_error) >= count(q_correction), where count(q_error) ≠ 0 OR count(q_correction) ≠ 0
noresult:
  count(q_correction) = count(q_error) = 0
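A small sketch of these three outcomes as a function of the two counts (the function and variable names are mine, not the talk's):

    def evaluate_pair(count_error, count_correction):
        """Classify one error/correction pair by its web counts."""
        if count_error == 0 and count_correction == 0:
            return "noresult"      # neither query retrieved anything
        if count_correction > count_error:
            return "correct"       # counts favor the annotated correction
        return "incorrect"         # counts favor (or tie with) the learner error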
9
Query types
Fixed Window: n tokens to the right, m tokens to the left: 1_1, 2_1, 1_2, 2_2, 3_2, 2_3
  1_1: rely 0/on this
  2_1: we rely 0/on this
  …
  (example sentence: we rely 0/on this kind of information)
FixedLength: number of tokens for original and correction query is identical
  For substitution: same as Fixed Window
  For deletion/insertion: need one extra word from the left/right to keep length equal
  LeftTrigram: we rely 0 this vs. rely on this
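For the fixed-window queries, a simplified sketch of how the error and correction query strings might be built (tokenization and the helper name are assumptions, not the authors' code):

    def fixed_window_query(tokens, index, word, n_left, n_right):
        """Build one query: n_left tokens of left context, the candidate word
        (empty string for the null/0 choice), and n_right tokens of right context."""
        left = tokens[max(0, index - n_left):index]
        right = tokens[index + 1:index + 1 + n_right]
        middle = [word] if word else []
        return " ".join(left + middle + right)

    tokens = "we rely on this kind of information".split()
    # 2_1 window around the preposition slot (index 2):
    #   learner error (missing "on"):  fixed_window_query(tokens, 2, "", 2, 1)   -> "we rely this"
    #   annotated correction:          fixed_window_query(tokens, 2, "on", 2, 1) -> "we rely on this"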
10
SmartQueries
Expand at sentence edges: "Nobody in/at the" vs. "Nobody in/at the party"
Find edges of noun phrases, verb phrases, prepositional phrases
Include punctuation, as it indicates a clause boundary: "have a/0 lunch," will not match "have a lunch date"
Don't go beyond punctuation: "buy clothes 0/in, but" vs. "to buy clothes 0/in,"
(example sentence: rely 0/on this kind of information)
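A very reduced sketch of the punctuation part of the SmartQuery idea; the NP/VP/PP edge detection would need a chunker and is omitted, and everything here is a simplified assumption rather than the authors' implementation.

    PUNCT = {",", ".", ";", ":", "!", "?"}

    def trim_at_punctuation(left_tokens, right_tokens):
        """Left context: keep only tokens after the last punctuation mark.
        Right context: keep tokens up to and including the first punctuation mark,
        so a query ending in "lunch ," cannot match "have a lunch date"."""
        for i in range(len(left_tokens) - 1, -1, -1):
            if left_tokens[i] in PUNCT:
                left_tokens = left_tokens[i + 1:]
                break
        for i, tok in enumerate(right_tokens):
            if tok in PUNCT:
                right_tokens = right_tokens[:i + 1]
                break
        return left_tokens, right_tokens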
11
Results: Prepositions

DELETIONS (query type: SmartQuery)
                            B-API    G-API    G-5gr
non-zero-result accuracy    0.8637   0.9548   0.9742
retrieval ratio             0.8787   0.8562   0.5206
raw accuracy                0.7589   0.8176   0.5071

INSERTIONS (query type: Left4g)
                            B-API    G-API    G-5gr
non-zero-result accuracy    0.7459   0.8454   0.8853
retrieval ratio             0.9624   0.9520   0.7817
raw accuracy                0.7178   0.8048   0.6920

SUBSTITUTIONS (query type: SmartQuery)
                            B-API    G-API    G-5gr
non-zero-result accuracy    0.7396   0.8183   0.8633
retrieval ratio             0.7987   0.7878   0.4108
raw accuracy                0.5906   0.6446   0.5071
12
Results: Articles

DELETIONS (query type: 2_2)
                            B-API    G-API    G-5gr
non-zero-result accuracy    0.7678   0.9056   0.9386
retrieval ratio             0.8353   0.8108   0.4644
raw accuracy                0.6414   0.7342   0.4359

INSERTIONS (query type: Left4g)
                            B-API    G-API    G-5gr
non-zero-result accuracy    0.8292   0.9083   0.9460
retrieval ratio             0.9505   0.9428   0.7072
raw accuracy                0.7880   0.8562   0.6690

SUBSTITUTIONS (query type: 2_2)
                            B-API    G-API    G-5gr
non-zero-result accuracy    0.6970   0.7842   0.8486
retrieval ratio             0.8285   0.8145   0.4421
raw accuracy                0.5774   0.6388   0.3752
13
Summary so far
Best non-zero-result accuracy ranges from 85% - 98% on Google 5-grams
Retrieval ratio on Google 5-grams ranges from 35% - 71%
Google 5-grams yield the best accuracy but the worst retrieval ratio
Google API drops a little in accuracy but has a much better retrieval ratio
No single query strategy works best; a different query strategy is needed per error type
14
Error Analysis
1. In over half of the errors, the disambiguating information is not within the query: A/The discount will be 5%.
2. ~10%: Both original and correction seem good: It's very important to/for us.
3. ~10%: Other error in the query: It's a/0 great new. ~ great news.
4. ~10%: Unexpected n-gram frequencies: "guilty for you" = 171 versus "guilty about you" = 137
5. ~10%: Annotation introduces an error: Don't work on/*at Sunday. (Includes British English.)
6. For prepositions, ~7%: meaning-changing edits – the annotator used the context of the essay: I will buy it from/for you.
15
Using a Language Model as Backoff

Prepositions
operation      non-zero-result accuracy (Google 5-gram)   LM accuracy   LM as backoff
deletion       0.974                                       0.596         0.728
insertion      0.885                                       0.874         0.879
substitution   0.863                                       0.754         0.767

Articles
operation      non-zero-result accuracy (Google 5-gram)   LM accuracy   LM as backoff
deletion       0.939                                       0.329         0.549
insertion      0.946                                       0.944         0.949
substitution   0.849                                       0.781         0.806
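The backoff combination can be sketched roughly as follows (function and variable names are mine; the language model would be an ordinary n-gram model trained on a corpus such as Gigaword):

    def judge_with_backoff(query_error, query_correction, ngram_counts,
                           sent_error, sent_correction, lm_score):
        """Use Google 5-gram counts when available; otherwise let a language
        model score the full error and correction sentences."""
        c_err = ngram_counts.get(query_error, 0)
        c_corr = ngram_counts.get(query_correction, 0)
        if c_err or c_corr:                          # web counts exist: use them
            return "correct" if c_corr > c_err else "incorrect"
        # no-result case: fall back to the language model comparison
        return "correct" if lm_score(sent_correction) > lm_score(sent_error) else "incorrect"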
16
Conclusions
Can web data be used to distinguish errors from correct phrases?
– Yes: precision is high, but recall is low
What resource is best?
– Google 5-gram achieves the highest precision
What query size is best?
– No "one-size-fits-all" answer
How does the Google n-gram approach compare to using a standard language model?
– Precision of the Google 5-gram outperforms the language model, BUT the language model can be used as a backoff
17
THANK YOU