English project More detail and the data collection system


English project: More detail and the data collection system
2018-08-24
David Ling

Contents
  Project background
  Evaluation
  Training
  Data collection system

Background
  Project in charge: Holly Chung, Amy Kwok, Anora Wong (ENG)
  Target: English error correction for HK students
    Highlight good practices (not well defined yet)
    More than traditional grammar checkers: Chinglish, collocation, meaning, and style
  Examples:
    Math lessons use English. → Math lessons are conducted in English.
    He can say Chinese. → He can speak Chinese.

Background
  Old methods:
    Rule based (e.g. Microsoft Word, LanguageTool)
    Statistical methods
  New method:
    Deep learning, translation, data driven
    Chollampatt, 2018 (National University of Singapore)
    Fairseq (Facebook) + language model (KenLM)
  An error rule extracted from LanguageTool on subject-verb agreement:
    Sentence start + determiner + plural noun + is (e.g. The dogs is …, The teachers is …)
    Pattern matched → trigger correction
    About 1.7k handcrafted error patterns
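The rule-based pattern above (sentence start + determiner + plural noun + "is") can be sketched in a few lines of Python. This is only an illustration of the idea, not LanguageTool's actual rule format or implementation. The determiner list, the `looks_plural` heuristic, and the function names are assumptions made for the example.

```python
# Hypothetical determiner list; a real rule set would be much larger.
DETERMINERS = {"the", "these", "those", "some", "many"}


def looks_plural(noun: str) -> bool:
    # Crude assumption: a word ending in a single "s" is a plural noun.
    # This misfires on words such as "bus" or "news"; a real checker
    # would use a part-of-speech tagger instead.
    return noun.endswith("s") and not noun.endswith("ss")


def check_subject_verb_agreement(sentence: str):
    """Return a corrected sentence if the pattern matches, else None."""
    words = sentence.split()
    if len(words) < 3:
        return None
    det, noun, verb = words[:3]
    if (det.lower() in DETERMINERS
            and looks_plural(noun.lower())
            and verb.lower() == "is"):
        # Pattern matched: trigger the correction "is" -> "are".
        return " ".join([det, noun, "are"] + words[3:])
    return None


if __name__ == "__main__":
    print(check_subject_verb_agreement("The dogs is barking."))
    print(check_subject_verb_agreement("The dog is barking."))
```

Each handcrafted pattern covers one narrow error shape, which is why a rule-based checker needs on the order of a thousand or more such rules.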

Which set is by deep learning?
  Input sentences:
    1. He go to schol tomorrow.
    2. "I go to school by bus.", said David yesterday.
    3. … she did not want another mother would also feeling it.
    4. It can make the audiences having the same feeling on it.
  Set A (Grammarly):
    1. He goes to school tomorrow.
    2. "I go to school by bus.", said, David, yesterday.
    3. … she did not want another mother would also feel it.
    4. [No change]
  Set B (deep learning):
    1. He will go to school tomorrow.
    2. "I go to school by bus," said David yesterday.
    3. … she did not want another mother to feel it.
    4. It can make the audience feel the same way.
  Deep learning:
    Correction based on the context
    Recalls more errors
    Not just correcting errors, but also improving style

Evaluation – Four main steps
  INPUT: He go to schol tomorrow.
  1. Tokenize + byte pair encoding
       He go to scho@@ l tomorrow .
  2. Fairseq (beam search, 12 sentences)
       He will go to school tomorrow . ||| F0= -0.053
       He goes to school tomorrow . ||| F0= -0.375
       He is going to school tomorrow . ||| F0= -0.397
       …
  3. Language model (KenLM, 150GB)
       He will go to school tomorrow . ||| LM0= -24.9241
       He goes to school tomorrow . ||| LM0= -25.8588
       He is going to school tomorrow . ||| LM0= -25.1118
       …
  4. Reweighted with number of edit operations and sentence length
  OUTPUT: He will go to school tomorrow.
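Step 4 can be sketched as a weighted sum over the four features named on the slide: the Fairseq score (F0), the language model score (LM0), the number of edit operations, and the sentence length. The slides do not give the weights or the exact formula, so the `WEIGHTS` values, the linear combination, and the function names below are assumptions for illustration only. The scores are copied from the slide.

```python
# Hypothetical feature weights; the real values are not given in the slides.
WEIGHTS = {"f0": 1.0, "lm0": 0.1, "edits": -0.1, "length": 0.05}


def edit_distance(a, b):
    """Token-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def rerank(source, candidates):
    """Sort (sentence, f0, lm0) candidates by the combined score, best first."""
    src_tokens = source.split()
    scored = []
    for sentence, f0, lm0 in candidates:
        tokens = sentence.split()
        score = (WEIGHTS["f0"] * f0
                 + WEIGHTS["lm0"] * lm0
                 + WEIGHTS["edits"] * edit_distance(src_tokens, tokens)
                 + WEIGHTS["length"] * len(tokens))
        scored.append((sentence, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Candidates and scores taken from the slide.
CANDIDATES = [
    ("He will go to school tomorrow .", -0.053, -24.9241),
    ("He goes to school tomorrow .", -0.375, -25.8588),
    ("He is going to school tomorrow .", -0.397, -25.1118),
]

if __name__ == "__main__":
    for sentence, score in rerank("He go to schol tomorrow .", CANDIDATES):
        print(f"{score:.3f}  {sentence}")
```

In practice the weights would be tuned on a development set rather than chosen by hand.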

Training – data sets
  LANG-8 (Japan social website), 2012: ~2000k sentences
  NUCLE (National University of Singapore), 2014: ~60k sentences (1500 essays)
  Coverage of topics and error types is far from sufficient, e.g. eSports

Training - with additional training sentences
  Five additional training sentences:
    1. Math lessons used English . → Math lessons were conducted in English .
    2. Physics lessons used English . → Physics lessons were conducted in English .
    3. Biology classes use English . → Biology classes are conducted in English .
    4. History lessos used English . → History lessons was conducted in English .
    5. Philosophy lessons often used English . → Philosophy lessons are conducted in English often
  Before / After (suggested corrections shown on the slide): are conducted in, be used in
  math: recalled successfully
  Chinese and Science are not in training data
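Additional sentence pairs like these are typically fed to a translation-style model as two line-aligned text files, one with the erroneous sentences and one with the corrections. The sketch below writes the five pairs in that form. The `train.src` / `train.tgt` file names and the `write_parallel` helper are assumptions for illustration, and the pairs are copied verbatim from the slide.

```python
import os
import tempfile

# The five pairs from the slide, copied verbatim (source, target).
PAIRS = [
    ("Math lessons used English .", "Math lessons were conducted in English ."),
    ("Physics lessons used English .", "Physics lessons were conducted in English ."),
    ("Biology classes use English .", "Biology classes are conducted in English ."),
    ("History lessos used English .", "History lessons was conducted in English ."),
    ("Philosophy lessons often used English .",
     "Philosophy lessons are conducted in English often"),
]


def write_parallel(pairs, out_dir, prefix="train"):
    """Write line-aligned source and target files; return their paths."""
    # Hypothetical naming: <prefix>.src holds errors, <prefix>.tgt corrections.
    src_path = os.path.join(out_dir, prefix + ".src")
    tgt_path = os.path.join(out_dir, prefix + ".tgt")
    with open(src_path, "w", encoding="utf-8") as src, \
            open(tgt_path, "w", encoding="utf-8") as tgt:
        for source, target in pairs:
            # Line i of the source file must correspond to line i of the target.
            src.write(source + "\n")
            tgt.write(target + "\n")
    return src_path, tgt_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_parallel(PAIRS, tmp))
```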

Grammar correction data set
  Building a data set for Hong Kong students
    Improvement on the checker
      Different sentence style
      Different error types
    Literature value
      Statistical analysis on HK students’ English

Data collection system
  System = database + interface (PHP+JS)
  System: http://10.244.0.191/annotation/login.php
    Contains about 40 computer-corrected essays
    Asked the English teachers to try it
  SQLite
    Four tables in the database
    Easily compatible with Python and PHP
    Stored as a single file

Data collection system
  Table -- ESSAYS
  Table -- ANNOTATIONS
  Stored in JSON format
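A minimal sketch of how such a database could be accessed from Python's built-in `sqlite3` module is shown below. Only the table names ESSAYS and ANNOTATIONS come from the slides. The column names, the shape of the annotation record, and the helper functions are assumptions for illustration, and the example assumes each annotation is stored as a JSON string in a text column.

```python
import json
import sqlite3


def create_db(path=":memory:"):
    """Open (or create) the single-file SQLite database and its tables."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS ESSAYS (
            essay_id INTEGER PRIMARY KEY,   -- hypothetical column
            body     TEXT NOT NULL          -- hypothetical column
        );
        CREATE TABLE IF NOT EXISTS ANNOTATIONS (
            annotation_id INTEGER PRIMARY KEY,
            essay_id      INTEGER NOT NULL REFERENCES ESSAYS(essay_id),
            data          TEXT NOT NULL     -- annotation as a JSON string
        );
    """)
    return conn


def add_essay(conn, body):
    cur = conn.execute("INSERT INTO ESSAYS (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


def add_annotation(conn, essay_id, annotation):
    # Serialise the annotation dict to JSON before storing it.
    conn.execute("INSERT INTO ANNOTATIONS (essay_id, data) VALUES (?, ?)",
                 (essay_id, json.dumps(annotation)))
    conn.commit()


def get_annotations(conn, essay_id):
    rows = conn.execute(
        "SELECT data FROM ANNOTATIONS WHERE essay_id = ? ORDER BY annotation_id",
        (essay_id,))
    return [json.loads(row[0]) for row in rows]


if __name__ == "__main__":
    conn = create_db()
    essay_id = add_essay(conn, "Math lessons use English.")
    # Hypothetical annotation shape: character span plus suggested correction.
    add_annotation(conn, essay_id,
                   {"start": 13, "end": 16, "correction": "are conducted in"})
    print(get_annotations(conn, essay_id))
```

Because the PHP interface and any Python scripts can open the same file, no separate database server is needed.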

END Thank you