LanguageTool 3 07-11-2017 David Ling.

Contents
LanguageTool
  Overview of rules
  Web demonstration
  Performance on students’ scripts
Rule xml syntax and customization
  Token, exception, POSTag, skip
  Example: Third person singular
  Making a custom example: Math lessons use English
Java rules: neural network
  Resolving a custom confusion pair: causal, casual

LanguageTool
Open source grammar checking program written in Java
Rule-based, highly customizable
Input features for the rules: POSTag, word pattern, chunking tag
To use your own LanguageTool, you can:
  Double click ‘languagetool.jar’
  Run it via the Windows command line prompt (cmd)
  Run a local http server and connect via a browser
Online demo available: https://languagetool.org/
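The local http server option can also be queried from a script instead of a browser. The following is a minimal sketch, not part of the slides: it assumes a LanguageTool server is already running locally, that it listens on port 8081 (an assumed port), and that it exposes the `/v2/check` endpoint of the LanguageTool HTTP API. The function names `build_check_request` and `check` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request


def build_check_request(text, language="en-US", base_url="http://localhost:8081"):
    """Build a POST request for the /v2/check endpoint.

    base_url is an assumption: adjust it to wherever your server listens.
    """
    form = urllib.parse.urlencode({"text": text, "language": language})
    return urllib.request.Request(
        base_url + "/v2/check", data=form.encode("utf-8"), method="POST"
    )


def check(text, **kwargs):
    """Send the text to the server and return its list of rule matches.

    This only works while a LanguageTool server is actually running.
    """
    with urllib.request.urlopen(build_check_request(text, **kwargs)) as response:
        return json.load(response)["matches"]
```

With a server running, `check("I goes to school by bus.")` should return one dictionary per rule match, each carrying fields such as `message`, `offset` and `length`.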

LanguageTool rules
LanguageTool contains a POS dictionary. POS tags shown on the slide: modal verb, noun, verb (base form), verb (3rd person singular), adjective.
Two main kinds of rules:
  Xml rules
  Java rules
Xml rules are customizable. Two corresponding files:
  disambiguation.xml: for reducing multiple POSTags of a token, 346 rules
  grammar.xml: for grammar rules, ~1700 rules

Grammar rules in grammar.xml

 #   Category of xml rules                                              Number of rules
 1   Possible typo                                                      506
 2   Grammar                                                            405
 3   Collocations                                                       9
 4   Miscellaneous                                                      21
 5   Punctuation Errors                                                 48
 6   Commonly Confused Words                                            241
 7   Nonstandard Phrases                                                (missing)
 8   Redundant Phrases                                                  159
 9   Style                                                              17
 10  Semantic                                                           13
 11  Plain English (default: off)                                       92
 12  Wikipedia (default: off)                                           (missing)
 13  Typography                                                         (missing)
 14  Misused terms in EU publications, Gardner (default: off)           149
     Total                                                              1704

Rule examples (name and outcome): Grammar
all/most/some (of) + noun: “All of students like mathematics.” Correction: “All students” or “All of the students”
both... as well as (and): “He is both very rich as well as handsome.” Correction: “and”
Use of past form with 'going to ...': “I'm going to wrote him.” Correction: “write”
inspired with (by): “The artist was inspired with the beauty of the mountains.” Correction: “inspired by”
beware PREPOSITION: “Beware about malware.” Correction: “Beware of”
objective case after with(out)/at/to/...: “Give it to I.” Correction: “to me”, “to her”, “to him”, “to us” or “to them”

Rule examples (name and outcome): Redundant phrases
absolutely essential/necessary (essential/necessary): “This is absolutely essential.” Correction: “essential”
established fact (fact): “This is an established fact.” Correction: “a fact”
there are also other (also): “However, there are also other marbles in the jar.” Correction: “there are other” or “there are also”
Rule examples (name and outcome): Punctuation
extraneous apostrophes before ‘are’: “The car's are cheap.” Correction: “cars”
Comma after a month: “The store closed its doors for good in October, 1958.” Correction: “October 1958”
Missing comma between day of month and year: “My birthday is October 18 1983.” Correction: “October 18,”

Students’ scripts 1

Students’ scripts 2

Students’ scripts 3
Fails to check:
  Misuse of prepositions: for (1st line)
  Missing prepositions: to (4th line)
  Incorrect word: force (4th line)
Able to check:
  Misuse of ‘much’ and ‘many’ (7th line)

Examples by teachers
Syntax/discourse
Semantics (use of a wrong word)
Example for the neural network in a later part
Unable to check: Since… therefore, although… but

LanguageTool
Able to check:
  Spelling
  1st/2nd/3rd person singular
  Adverb + noun (e.g. simply question)
  Some common phrases: concerned about, regarding to
Example limitations of the current rules:
  Unable to tackle long and complex phrases (e.g. why these video can became)
  False alarms (e.g. unseen named entities)
  Limited in resolving confusable words (e.g. casual, causal)
  Prepositions (e.g. for his talk)
  Other grammar rules not implemented (e.g. Although… but)
  Uncountable nouns

LanguageTool
To improve:
  Add to and modify LanguageTool’s current grammar rules
  Combine with deep learning as a complement (hybrid approach)

Rules in grammar.xml
LanguageTool contains a POS dictionary.
Steps:
  Split a sentence into a sequence of tokens
  Check if it matches the token pattern of an xml rule
  Return a message if the token pattern matches
Example: Third person singular with “I”
  Input: I goes to school by bus.
  Xml rule: Agreement error - Third person verb with I
  Token 1: I
  Token 2: VBZ (Verb, 3rd ps. sing. present: eats, jumps, believes, is, has)
  Return: The pronoun ‘I’ must be used with a non-third-person form of a verb: go
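The three steps can be illustrated with a small Python sketch. It is not LanguageTool's implementation: `POS_DICT` and `BASE_FORM` are tiny hypothetical stand-ins for the POS dictionary, and `check_i_with_vbz` is a hypothetical name for a hard-coded version of the two-token pattern.

```python
import re

# Hypothetical miniature POS dictionary (LanguageTool ships a full one).
POS_DICT = {
    "i": {"PRP"},
    "goes": {"VBZ"},
    "eats": {"VBZ"},
    "go": {"VB", "VBP"},
    "to": {"TO", "IN"},
    "school": {"NN"},
    "by": {"IN"},
    "bus": {"NN"},
}

# Hypothetical lookup used only to build the suggestion in the message.
BASE_FORM = {"goes": "go", "eats": "eat"}


def tokenize(sentence):
    """Step 1: split a sentence into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def check_i_with_vbz(sentence):
    """Steps 2 and 3: match 'I' followed by a VBZ token, return messages."""
    tokens = tokenize(sentence)
    messages = []
    for first, second in zip(tokens, tokens[1:]):
        if first == "I" and "VBZ" in POS_DICT.get(second.lower(), set()):
            suggestion = BASE_FORM.get(second.lower(), second)
            messages.append(
                "The pronoun 'I' must be used with a non-third-person "
                "form of a verb: " + suggestion
            )
    return messages
```

Calling `check_i_with_vbz("I goes to school by bus.")` returns one message suggesting "go", while the corrected sentence "I go to school by bus." returns an empty list.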

The rule pattern in xml
However, in real situations, many exceptions have to be added.
Examples:
  Extra adverb token: I recently goes to… (fails to include)
  ‘I’ as a number: Phase I corresponds to… (fails to exclude)
  ‘I’ as a letter: I is the ninth letter of the alphabet. (fails to exclude)
These can be handled using <exception> and the “skip” attribute on <token>

The actual rule pattern in grammar.xml
postag, exception, skip, and scope are common conditions used in grammar.xml
skip=“1”: allows one optional arbitrary word to follow the token. Includes: I recently goes to…
<exception> with scope=“previous”: filters out cases with the word “phase” before “I”. Excludes: Phase I corresponds to…
<exception> at the second token. Excludes: I is the letter…
Current limitations: cases such as ‘Paper I’, ‘article I’, ‘I also recently goes to …’, etc. are still handled incorrectly.
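The effect of these conditions can be mimicked in a few lines of Python. This is only a sketch of the matching behaviour described on the slide, not the rule from grammar.xml: the word lists are hypothetical, and treating “is” as the exception on the second token is an assumption made to reproduce the “I is the letter…” case.

```python
import re

VBZ_WORDS = {"goes", "is", "has", "corresponds"}  # hypothetical VBZ lookup
PREVIOUS_EXCEPTIONS = {"phase"}   # mimics <exception scope="previous">
SECOND_TOKEN_EXCEPTIONS = {"is"}  # assumed exception on the second token
SKIP = 1                          # mimics skip="1" on the first token


def tokenize(sentence):
    return re.findall(r"\w+|[^\w\s]", sentence)


def flags_i_with_vbz(sentence):
    """Return True if the sketched 'I' + VBZ pattern matches the sentence."""
    tokens = tokenize(sentence)
    for i, token in enumerate(tokens):
        if token != "I":
            continue
        # Exception on the previous token: "Phase I corresponds to ..."
        if i > 0 and tokens[i - 1].lower() in PREVIOUS_EXCEPTIONS:
            continue
        # The verb may follow directly or after up to SKIP arbitrary tokens.
        for candidate in tokens[i + 1 : i + 2 + SKIP]:
            word = candidate.lower()
            if word in VBZ_WORDS and word not in SECOND_TOKEN_EXCEPTIONS:
                return True
    return False
```

The sketch flags “I goes to school.” and “I recently goes to school.”, ignores “Phase I corresponds to the pilot.” and “I is the ninth letter of the alphabet.”, and still misses “I also recently goes to school.” because two words separate “I” from the verb.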

Another example: Third person singular with “you”
Rule pattern:
  Token 1: you
  Token 2: VBZ
  Includes: You goes to school. You is a boy.
Anti-rule pattern:
  Excludes: What I have told you is true.
Exception: the token before “you” is tagged ‘IN’ (preposition/subordinating conjunction: except, inside, across, on, through, beyond, with, without, …)
  Excludes: One of you goes to school. The man nearest you is awake.
Exception with negate (a double negation): requires the token before the verb to be RB/PRP/DT (adverb or negation, personal pronoun, determiner)
  Excludes: Do I have to tell you he isn't here?

Making a custom rule
Problem: Math lessons used English.
Generalize: (Noun/adjective) + lesson + use + (Noun/adjective)
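The generalized pattern can be tried out in Python before writing it as an xml rule. This is a rough sketch under stated assumptions: `NOUN_OR_ADJ` is a hypothetical word list standing in for a POS tag check, the inflected forms of “lesson” and “use” are listed by hand, and `find_lesson_use` is a hypothetical helper that only detects the pattern without proposing a correction.

```python
import re

NOUN_OR_ADJ = {"math", "english", "science", "chinese", "history"}  # hypothetical
LESSON_FORMS = {"lesson", "lessons"}
USE_FORMS = {"use", "uses", "used"}


def find_lesson_use(sentence):
    """Return every 4-word span matching (noun/adj) lesson use (noun/adj)."""
    tokens = re.findall(r"\w+", sentence.lower())
    hits = []
    for i in range(len(tokens) - 3):
        first, lesson, use, last = tokens[i : i + 4]
        if (
            first in NOUN_OR_ADJ
            and lesson in LESSON_FORMS
            and use in USE_FORMS
            and last in NOUN_OR_ADJ
        ):
            hits.append(" ".join(tokens[i : i + 4]))
    return hits
```

Listing the inflected forms means that one pattern covers “Math lessons used English” as well as variants such as “Science lesson uses Chinese”.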

Making a custom rule Outcome

Neural network rule
One of the few Java rules
Will be a new feature in the coming release of LanguageTool
Resolves confusable words using a neural network, e.g. causal, casual
Context: “well as causal/casual wears .”
Diagram: each of the four context words (well, as, wears, .) is a 64x1 vector; the vectors are concatenated into a 256x1 input x; the output is a 2x1 vector y = softmax(Wx + b), with one entry for causal and one for casual
W: weight matrix, updated during training
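The forward pass in the diagram can be written out in plain Python. This is a sketch of the arithmetic only: the embeddings and weights are random placeholders rather than trained values, and the names `forward`, `embedding`, `W` and `b` are chosen here for illustration.

```python
import math
import random

EMBEDDING_DIM = 64                     # each context word is a 64x1 vector
CONTEXT = ("well", "as", "wears", ".")  # two words on each side of the target
CLASSES = ("causal", "casual")


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(context_vectors, W, b):
    """Compute y = softmax(Wx + b), where x concatenates the context vectors."""
    x = [value for vector in context_vectors for value in vector]  # 256 values
    scores = [
        sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)
    ]
    return softmax(scores)


rng = random.Random(0)
# Placeholder embeddings and weights; a trained model would supply these.
embedding = {
    word: [rng.uniform(-1, 1) for _ in range(EMBEDDING_DIM)] for word in CONTEXT
}
W = [
    [rng.uniform(-0.1, 0.1) for _ in range(EMBEDDING_DIM * len(CONTEXT))]
    for _ in CLASSES
]
b = [0.0 for _ in CLASSES]

y = forward([embedding[word] for word in CONTEXT], W, b)
```

Here `y` holds two probabilities, one per entry of `CLASSES`; with untrained weights the values carry no meaning beyond showing the shapes involved.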

Neural network rule – training and validation
Resolving [causal, casual]
Corpus: Wikipedia articles, ~3GB
Number of training samples: [979, 2765]
Validation samples: [106, 310]
Results:
  correct: [48, 243]
  incorrect: [14, 11]
  accuracy: [77%, 96%]
  unclassified: [44, 56] (min abs score > 0.5)
Training samples:
  is a causal association because
  or the causal plane or
  . The causal plane is
  , the causal plane is
  friendly , casual script after
  well as casual wears .
  popular among casual players .
  and a causal agent of
  conclusions about causal links ,
  may miss causal relationships .
  and no causal connection has
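The reported accuracies can be reproduced from the counts on the slide if accuracy is taken over the classified samples only, leaving out the unclassified ones. The short check below uses only the slide's numbers; the `coverage` figure (share of validation samples that received a classification) is derived here and does not appear on the slide.

```python
CLASSES = ("causal", "casual")
validation = (106, 310)
correct = (48, 243)
incorrect = (14, 11)
unclassified = (44, 56)

# The three outcome counts add up to the validation set size for each word.
for v, c, i, u in zip(validation, correct, incorrect, unclassified):
    assert c + i + u == v

# Accuracy over classified samples only, as a rounded percentage.
accuracies = [round(100 * c / (c + i)) for c, i in zip(correct, incorrect)]

# Share of validation samples that were classified at all (derived figure).
coverage = [round(100 * (c + i) / v) for v, c, i in zip(validation, correct, incorrect)]

for name, acc, cov in zip(CLASSES, accuracies, coverage):
    print(f"{name}: accuracy {acc}%, classified {cov}% of validation samples")
```

So the 77% figure for causal rests on 62 classified samples out of 106, and the 96% figure for casual on 254 out of 310.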

Neural network rule – working in LanguageTool

END
Useful links for LanguageTool
Online demo: https://languagetool.org/
Xml syntax overview: http://wiki.languagetool.org/development-overview
Online xml rule editor: https://community.languagetool.org/ruleEditor2/
Neural network rule: https://github.com/gulp21/languagetool-neural-network
Tagset: https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/tagset.txt
Thank you