Using Contextual Speller Techniques and Language Modeling for ESL Error Correction Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev, William.

Presentation transcript:

Using Contextual Speller Techniques and Language Modeling for ESL Error Correction
Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev, William B. Dolan, Dmitriy Belenko, Lucy Vanderwende
Microsoft Research & University of Illinois
IJCNLP 2008
Reporter: Chia-Ying Lee
Advisor: Hsin-Hsi Chen

Introduction
About 750M people (74%) use English as a second language (Crystal 1997).
Non-native writers encounter particular problems, for example with prepositions.
Challenge: writing errors often have a semantic dimension (e.g., "at school" refers to a place, "in school" refers to a time).

Target Error Types
1. Preposition presence and choice: In the other hand, ... (On the other hand, ...)
2. Definite and indefinite determiner presence and choice: I am teacher ... (I am a teacher)
3. Gerund/infinitive confusion: I am interesting in this book. (interested in)
4. Auxiliary verb presence and choice: My teacher does is a good teacher (my teacher is ...)
5. Over-regularized verb inflection: I writed a letter (wrote)
6. Adjective/noun confusion: This is a China book (Chinese book)
7. Word order (adjective sequences and nominal compounds): I am a student of university (university student)
8. Noun pluralization: They have many knowledges (much knowledge)

Problem Definition
Present a modular system for detecting and correcting errors made by non-native writers.
Focus on preposition- and determiner-related errors.

Related Work
Turner and Charniak (2007) use a language model based on a statistical parser for determiner and preposition selection.
De Felice and Pulman (2007) use a set of sophisticated syntactic and semantic analysis features to predict 5 common English prepositions.
Han et al. (2004, 2006) use a maximum entropy classifier to propose article corrections.
Izumi et al. (2003) and Chodorow et al. (2007) present techniques for automatic preposition choice modeling.

System Description
0. Preprocessing: the input is tokenized and POS-tagged.
1. Suggestion Provider (SP): detects errors and proposes corrections.
2. Language Model (LM): discards suggestions whose score is lower than that of the original.
3. Example Provider (EP): queries the web for exemplary sentences.

Suggestion Provider (1/3)
Classifiers:
Presence/absence (pa) classifier, e.g., p(article + teacher) = 0.54
Choice (ch) classifier, e.g., p(the) = 0.04, p(a/an) = 0.96
Potential insertion sites are determined heuristically from the sequence of POS tags.
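The way the two classifiers interact can be sketched as a two-stage decision. This is a minimal illustration, not the authors' implementation: the function name `suggest_determiner` and the 0.5 presence threshold are assumptions, and the probabilities are simply the ones quoted on the slide.

```python
from typing import Dict, Optional


def suggest_determiner(p_presence: float,
                       choice_probs: Dict[str, float],
                       threshold: float = 0.5) -> Optional[str]:
    """Two-stage decision: presence/absence (pa) first, then choice (ch).

    `threshold` is an assumed cutoff; the slides do not state one.
    Returns None when the pa stage predicts that no determiner is needed.
    """
    if p_presence < threshold:
        return None
    # The ch stage picks the most probable determiner class.
    return max(choice_probs, key=choice_probs.get)


# Probabilities from the slide for the site before "teacher".
suggestion = suggest_determiner(0.54, {"the": 0.04, "a/an": 0.96})
print(suggestion)
```

With the slide's numbers, the pa stage passes the assumed threshold and the ch stage selects the "a/an" class.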

Suggestion Provider (2/3)
Features (within a window of ±6 tokens): relative position, token string, POS tags.
Example: 0/I/PRP 1/am/VBP 2/teacher/NN 3/from/IN 4/Korea/NNP 5/./.
Classifiers: decision trees (WinMine toolkit, Chickering 2002), which performed better than a linear SVM.
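A rough sketch of how such window features might be collected around a potential insertion site is given below. The feature naming scheme (`tok-1`, `pos+1`, and so on) and the function `window_features` are hypothetical; only the window size and the feature types come from the slide.

```python
from typing import Dict, List, Tuple


def window_features(tagged: List[Tuple[str, str]],
                    site: int,
                    width: int = 6) -> Dict[str, str]:
    """Collect token and POS features around an insertion site.

    `site` is the index of the token immediately to the right of the
    potential insertion point. Offsets are relative to that point:
    -1 is the token just before it, +1 the token just after it.
    Positions that fall outside the sentence are skipped.
    """
    features: Dict[str, str] = {}
    for k in range(1, width + 1):
        left, right = site - k, site + k - 1
        if left >= 0:
            features[f"tok-{k}"], features[f"pos-{k}"] = tagged[left]
        if right < len(tagged):
            features[f"tok+{k}"], features[f"pos+{k}"] = tagged[right]
    return features


# The slide's example sentence, with the site before "teacher".
sentence = [("I", "PRP"), ("am", "VBP"), ("teacher", "NN"),
            ("from", "IN"), ("Korea", "NNP"), (".", ".")]
for name, value in sorted(window_features(sentence, site=2).items()):
    print(name, value)
```

Encoding the relative position in the feature name lets a decision tree split on, for example, "the token right after the site is a singular noun".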

Suggestion Provider (3/3)
Data set: the English Encarta encyclopedia (560k sentences) and a random set of 1M sentences from a Reuters news data set.
Prepositions, taken from the NICT Japanese Learners of English corpus: about, as, at, by, for, from, in, like, of, on, since, to, with, than, "other"

Language Model
5-gram model trained on the English Gigaword corpus (LDC2005T12).
120K-word vocabulary; 54 million bigrams, 338 million trigrams, 801 million 4-grams and 12 billion 5-grams.
Uses interpolated Kneser-Ney smoothing (Kneser and Ney 1995) without count cutoff.
Score:
I am teacher from Korea. score = 0.19
I am a teacher from Korea. score =
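The filtering role of the language model can be sketched as follows. This assumes a scoring function is available; `filter_by_lm` and the lookup table `toy_scores` are hypothetical stand-ins for the 5-gram model, and every score except the 0.19 quoted on the slide is made up for illustration.

```python
from typing import Callable, List


def filter_by_lm(original: str,
                 suggestions: List[str],
                 score: Callable[[str], float]) -> List[str]:
    """Drop suggestions whose LM score is lower than the original's."""
    baseline = score(original)
    return [s for s in suggestions if score(s) >= baseline]


# Hypothetical scores standing in for a real language model.
# Only the 0.19 for the original sentence comes from the slide.
toy_scores = {
    "I am teacher from Korea.": 0.19,
    "I am a teacher from Korea.": 0.50,        # assumed value
    "I am the teacher from the Korea.": 0.05,  # assumed value
}

kept = filter_by_lm("I am teacher from Korea.",
                    ["I am a teacher from Korea.",
                     "I am the teacher from the Korea."],
                    toy_scores.__getitem__)
print(kept)
```

Passing the scorer as a callable keeps the filter independent of how the language model itself is implemented.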

Example Provider (1/2)
Web search: string query in a small window.
Ranking rules: in the same sentence, sentence length, context overlap.

Example Provider (2/2)
Original: I want to travel Disneyland in March.
Suggestion: I want to travel to Disneyland in March.
Top 3 examples:
1. Timothy's wish was to travel to Disneyland in California.
2. Should you travel to Disneyland in California or to Disney World in Florida?
3. The tourists who travel to Disneyland in California can either choose to stay in Disney resorts or in the hotel for Disneyland vacations.
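One possible way to turn the three ranking rules into a sort order is sketched below. The slides do not say how the criteria are weighted or combined, so the lexicographic key used here (query contained in the sentence, then context overlap, then shorter sentences first) is purely an assumption, as are the names `rank_examples` and `candidates`.

```python
from typing import List, Tuple


def rank_examples(query: str, context: str, sentences: List[str]) -> List[str]:
    """Rank candidate example sentences for a suggested correction.

    Assumed ordering: sentences containing the whole query string first,
    then larger word overlap with the user's context, then fewer words.
    """
    context_words = set(context.lower().split())

    def sort_key(sentence: str) -> Tuple[bool, int, int]:
        lowered = sentence.lower()
        contains_query = query.lower() in lowered
        overlap = len(context_words & set(lowered.split()))
        return (not contains_query, -overlap, len(sentence.split()))

    return sorted(sentences, key=sort_key)


candidates = [
    "The tourists who travel to Disneyland in California can either choose "
    "to stay in Disney resorts or in the hotel for Disneyland vacations.",
    "Timothy's wish was to travel to Disneyland in California.",
    "Should you travel to Disneyland in California or to Disney World in Florida?",
]
ranked = rank_examples("travel to Disneyland",
                       "I want to travel to Disneyland in March.",
                       candidates)
for sentence in ranked:
    print(sentence)
```

On these three sentences the overlap with the context is identical, so the length criterion decides the order.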

Evaluation (1/5)
Suggestion provider: determiner choice, preposition choice
Language model
Human evaluation
70% of the data for training; 30% for testing
Combined accuracy:

Evaluation (2/5)
Suggestion provider: determiner choice
Baseline: 69.9% (always choosing the most frequent class label, "none")
State of the art: Turner and Charniak (Penn Treebank): 86.74%
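A majority-class baseline of this kind can be computed as in the sketch below. The label list is toy data invented for illustration and does not reproduce the 69.9% figure; `majority_baseline` is a hypothetical helper name.

```python
from collections import Counter
from typing import List, Tuple


def majority_baseline(labels: List[str]) -> Tuple[str, float]:
    """Return the most frequent label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Toy gold labels for ten potential determiner sites (made-up data).
gold = ["none"] * 7 + ["the"] * 2 + ["a/an"]
label, accuracy = majority_baseline(gold)
print(label, accuracy)
```

Because most potential sites take no determiner, always predicting "none" already gives a high accuracy, which is why the classifiers are compared against this baseline.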

Evaluation (3/5)
Suggestion provider: preposition choice
Baseline: 28.94% (using no preposition)

Evaluation (4/5)
Language model
The language model reduced the number of preposition corrections by 66.8% and of determiner corrections by 50.7%, which increased precision dramatically.
Accuracy of preposition suggestions:
LM score + classifier probability: 62.32%
LM score alone: 58.36%

Evaluation (5/5)
Human evaluation
CLEC: Chinese Learners of English Corpus (Gui and Yang 2003)

Conclusion and Future Work
The system successfully combines contextual-speller-based methods with language model scoring and provides web-based examples.
It works with reasonable accuracy even on extremely noisy text.
Future work: use web counts to build a learned ranker that combines information from the language model and the classifiers.

Thank you!