LanguageTool - Part A 23-10-2017 David Ling.



LanguageTool
- LanguageTool: open-source Java program
- language_check: Python wrapper for LanguageTool; it supports only up to v3.5 (LanguageTool is currently at v3.9)
- To use LanguageTool, either double-click 'languagetool.jar' or run it as a localhost HTTP server from the command line

Main papers
- Daniel Naber, A Rule-Based Style and Grammar Checker, Diploma Thesis, University of Bielefeld, 2003
- Marcin Miłkowski, Developing an open-source, rule-based proofreading tool, Software – Practice and Experience 2010, 40 (7), pp. 543-566. DOI: 10.1002/spe.971
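Because the Python wrapper lags behind the Java release, one option is to talk to the local HTTP server directly. The following is a minimal sketch, not the wrapper's own API: the port 8081, the /v2/check path, and the JSON field names are assumptions about how the server was started and what it returns, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a LanguageTool HTTP server is already running on this machine
# and exposes a JSON check endpoint at this (assumed) port and path.
SERVER_URL = "http://localhost:8081/v2/check"


def build_request(text, language="en-US", url=SERVER_URL):
    """Build a POST request carrying the text to be checked."""
    form = urllib.parse.urlencode({"language": language, "text": text})
    return urllib.request.Request(url, data=form.encode("utf-8"))


def summarize(response):
    """Reduce a parsed JSON reply to (offset, length, rule id, suggestions)."""
    rows = []
    for match in response.get("matches", []):
        suggestions = [r["value"] for r in match.get("replacements", [])]
        rows.append((match["offset"], match["length"],
                     match["rule"]["id"], suggestions))
    return rows


def check(text, language="en-US"):
    """Send the text to the local server and summarize the matches."""
    with urllib.request.urlopen(build_request(text, language)) as reply:
        return summarize(json.load(reply))
```

With a server running, `check("It can by found.")` would return one row per rule match; without one, `build_request` and `summarize` can still be exercised on their own.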

Rules in LanguageTool
XML rules
- grammar.xml (collaborative)

Java rules
- Rules that cannot be handled by XML rules (e.g. a missing closing parenthesis, a space after a comma)
- Spell checking
- n-gram frequency for potential homophones (like there/their)
- There are only a few Java rules (according to Marcin Miłkowski's 2010 paper)

XML rules use the following input features:
- word token
- part of speech of the token: postag (from the dictionary)
- chunk tag of the token (from OpenNLP)

XML rules

No.  Category of XML rules                                     Number of rules
1    Possible typo                                             506
2    Grammar                                                   405
3    Collocations                                              9
4    Miscellaneous                                             21
5    Punctuation Errors                                        48
6    Commonly Confused Words                                   241
7    Nonstandard Phrases
8    Redundant Phrases                                         159
9    Style                                                     17
10   Semantic                                                  13
11   Plain English (default: off)                              92
12   Wikipedia (default: off)
13   Typography
14   Misused terms in EU publications, Gardner (default: off)  149
     Total                                                     1704

XML rules – possible typo
Notes:
- MD: modal verbs
- JJ.?: adjective
- VBN: verb, past participle
- DT: determiner: a, an, all, ...

Rule name = "'as follow' (as follows)"
  Token 1: as
  Token 2: follow
  Token 3: [\.:,—\-–]
  Suggests "as follows"

Rule name = "'by' + passive participle (be)"
  Token 1: postag="MD"
  Token 2: by
  Token 3: postag="JJ.?|VBN", except postag="DT"
  Suggests "be"
  Example: This can by consistent with... -> This can be consistent with...
  Example: It can by found. -> It can be found.
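The 'by' + passive participle rule can be sketched in Python as a three-token window over POS-tagged input. This is an illustration only, not LanguageTool's matcher: the function name is hypothetical, and the tags are supplied by hand, whereas LanguageTool takes them from its dictionary.

```python
import re

# Tags allowed on the token after "by": adjective (JJ, JJR, JJS) or VBN.
NEXT_TAGS = re.compile(r"JJ.?|VBN")


def by_should_be_be(tagged):
    """Return indices of 'by' tokens that the rule would flag.

    tagged is a list of (word, set_of_postags) pairs; a token may carry
    several candidate tags, which is why the DT exception matters.
    """
    hits = []
    for i in range(1, len(tagged) - 1):
        prev_tags = tagged[i - 1][1]
        word = tagged[i][0]
        next_tags = tagged[i + 1][1]
        follows_modal = "MD" in prev_tags
        next_ok = (any(NEXT_TAGS.fullmatch(t) for t in next_tags)
                   and "DT" not in next_tags)
        if follows_modal and word.lower() == "by" and next_ok:
            hits.append(i)
    return hits


sentence = [("It", {"PRP"}), ("can", {"MD"}), ("by", {"IN"}),
            ("found", {"VBN", "VBD"}), (".", {"."})]
print(by_should_be_be(sentence))  # index of the 'by' to replace with 'be'
```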

XML rules – possible typo
Notes:
- VB[DNPZ]?: verb
- inflected: use, uses, used, ...

Rule name = "miss use (misuse)"
  Token 1: miss
  Token 2: understand|spell|use|place|lead|…|dial, inflected, postag="VB[DNPZ]?"
  Suggests "mis" + token
  Example: These words are miss used. -> These words are misused.

Other randomly selected rules:
- land lover (landlubber)
  <correction="landlubber"> The sailors considered John to be a serious land lover.
- I/you/... thing (think)
  <correction="think|thinks"> I thing that's a good idea.
- to get ride (rid) of
  <correction="rid"> Let's get ride of that broken chair.
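The "miss use (misuse)" rule can be approximated with a plain regular expression. In this sketch the inflected forms are listed by hand for a few of the verbs named on the slide; that hand-written list is an assumption standing in for LanguageTool's dictionary lookup. The -ing forms are left out because VB[DNPZ]? does not cover VBG.

```python
import re

# Hand-listed inflections (VB, VBD, VBN, VBP, VBZ) for a few of the verbs;
# LanguageTool derives these from its dictionary instead.
FORMS = [
    "understand", "understands", "understood",
    "spell", "spells", "spelled", "spelt",
    "use", "uses", "used",
    "place", "places", "placed",
    "lead", "leads", "led",
]

MISS_VERB = re.compile(r"\bmiss\s+(" + "|".join(FORMS) + r")\b")


def fix_miss_verb(sentence):
    """Join 'miss' + verb into 'mis' + verb, e.g. 'miss used' -> 'misused'."""
    return MISS_VERB.sub(lambda m: "mis" + m.group(1), sentence)


print(fix_miss_verb("These words are miss used."))
```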

XML rules - Grammar
Notes:
- WP: wh-pronoun: that, whatever, what, ...
- WRB: wh-adverb: however, how, ...
- VB.*: verb
- MD: modal verbs
- inflected: be, is, am, are

Rule name = "will follows be ('he is would')"
  Token 1: postag="W(RB|P)"
  Token 2: be, inflected
  Token 3: will|must, inflected
  Message: redundant
  Example: How is would this approach be useful? -> How is this ... or How would this ...

Rule name = "missing verb after 'if there'"
  Token 1: if, with <exception scope="previous">as</exception>
  Token 2: there
  Token 3: any token, with <exception postag="VB.*|MD" /> and <exception>[´`'’]</exception>
  Message: missing verb
  Example: If there one who has ... -> If there is one who has ...
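The "missing verb after 'if there'" rule shows how exceptions narrow a pattern. Below is a minimal Python sketch under the same assumptions as before: tags are supplied by hand, the function name is hypothetical, and the third token is read as "any token that is neither a verb, a modal, nor an apostrophe".

```python
import re

VERB_OR_MODAL = re.compile(r"VB.*|MD")
APOSTROPHES = set("´`'’")


def missing_verb_after_if_there(tagged):
    """Return indices of 'there' tokens after which a verb seems missing.

    tagged is a list of (word, set_of_postags) pairs.
    """
    hits = []
    for i in range(len(tagged) - 2):
        if tagged[i][0].lower() != "if" or tagged[i + 1][0].lower() != "there":
            continue
        if i > 0 and tagged[i - 1][0].lower() == "as":
            continue  # exception with scope="previous": "as if there ..."
        next_word, next_tags = tagged[i + 2]
        if next_word in APOSTROPHES:
            continue  # exception: "if there's ..."
        if any(VERB_OR_MODAL.fullmatch(t) for t in next_tags):
            continue  # exception: a verb or modal already follows
        hits.append(i + 1)
    return hits


sentence = [("If", {"IN"}), ("there", {"EX"}), ("one", {"CD", "PRP"}),
            ("who", {"WP"}), ("has", {"VBZ"})]
print(missing_verb_after_if_there(sentence))
```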

Randomly selected XML rules in Grammar
- some faculty... (some faculty members...)
  <correction="faculty members"> Three faculty support the change.
- all/most/some (of) + noun
  <correction="All students|All of the students"> All of students like mathematics.
- both... as well as (and)
  <correction="and"> He is both very rich as well as handsome.
- Use of past form with 'going to ...'
  <correction="write"> I'm going to wrote him.
- Who + verb (who know's/knows)
  <correction="Who cares"> Who care's?
- inspired with (by)
  <correction="inspired by"> The artist was inspired with the beauty of the mountains.
- beware PREPOSITION
  <correction="Beware of"> Beware about malware.
- objective case after with(out)/at/to/...
  <correction="to me|to her|to him|to us|to them"> Give it to I.

XML rules – commonly confused words
Rule name = "and than (then)"
  Token 1: and|since
  Token 2: than
  Suggests: then

Rule name = "rather/other/different then (than)"
  Token 1: rather|other
  Token 2: then
  Suggests: than

Other rule names:
- turned of (off)
- 'economical (economic) growth' etc.
- in the passed (in the past)
- too go (to go)
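Rules like these use only word tokens, so they map directly onto regular expressions. The sketch below mirrors the two patterns shown; the rule table and function name are hypothetical and are not LanguageTool's representation.

```python
import re

# (pattern, replacement) pairs mirroring the two token patterns above.
CONFUSION_RULES = [
    (re.compile(r"\b(and|since) than\b", re.IGNORECASE), r"\1 then"),
    (re.compile(r"\b(rather|other) then\b", re.IGNORECASE), r"\1 than"),
]


def apply_confusion_rules(sentence):
    """Apply each rule in turn and return the corrected sentence."""
    for pattern, replacement in CONFUSION_RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence


print(apply_confusion_rules("He paid and than left."))
print(apply_confusion_rules("Other then that, it works."))
```

The `\b` word boundary keeps the second rule from firing inside a longer word such as "another".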

XML rules – redundant phrases & punctuation
Redundant phrases
- absolutely essential/necessary (essential/necessary)
  <correction="essential"> This is absolutely essential.
- established fact (fact)
  <correction="a fact"> This is an established fact.
- there are also other (also)
  <correction="there are other|there are also"> However, there are also other marbles in the jar.

Punctuation
- extraneous apostrophes before 'are'
  <correction="cars"> The car's are cheap.
- Comma after a month
  <correction="October 1958"> The store closed its doors for good in October, 1958.
- Missing comma between day of month and year
  <correction="October 18,"> My birthday is October 18 1983.
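The two date-punctuation rules can be illustrated with regular expressions. This is a rough sketch: the patterns and names are assumptions for illustration and are much cruder than the real rules.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# "October, 1958" -> "October 1958"
COMMA_AFTER_MONTH = re.compile(r"\b(" + MONTHS + r"), (\d{4})\b")
# "October 18 1983" -> "October 18, 1983"
MISSING_COMMA = re.compile(r"\b(" + MONTHS + r") (\d{1,2}) (\d{4})\b")


def fix_dates(sentence):
    """Drop a comma between month and year; add one between day and year."""
    sentence = COMMA_AFTER_MONTH.sub(r"\1 \2", sentence)
    return MISSING_COMMA.sub(r"\1 \2, \3", sentence)


print(fix_dates("The store closed its doors for good in October, 1958."))
print(fix_dates("My birthday is October 18 1983."))
```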

N-gram data rule
- Resolves confusable word pairs, such as their and there
- Given a confusion list (currently ~600 pairs), e.g. (their, there; adapting, adopting)
- Input sentence: This is there last chance to escape.
- The system compares the 3-gram frequencies of 'there' with those of 'their':
  This is there, is there last, there last chance
  This is their, is their last, their last chance
- It recommends 'their' if the probability ratio is greater than a threshold

Remarks:
- The n-gram data comes from the Google Books Ngram Viewer
- Someone is developing a word2vec approach to calculate the probability instead of using the 3-grams (context: {this, is, last, chance}, guessing {there, their})
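The comparison above can be sketched in a few lines of Python. Everything numeric here is made up for illustration: the counts in TOY_COUNTS, the add-one smoothing, and the threshold of 10 are assumptions, and the scoring is a simplification rather than LanguageTool's actual probability model.

```python
# Toy 3-gram counts, invented for this example (not real Google Books data).
TOY_COUNTS = {
    ("This", "is", "there"): 300,
    ("is", "there", "last"): 5,
    ("there", "last", "chance"): 1,
    ("This", "is", "their"): 50,
    ("is", "their", "last"): 400,
    ("their", "last", "chance"): 900,
}


def trigrams_around(tokens, i):
    """Return every 3-gram of tokens that contains position i."""
    first = max(0, i - 2)
    last = min(i, len(tokens) - 3)
    return [tuple(tokens[s:s + 3]) for s in range(first, last + 1)]


def score(tokens, i, counts):
    """Multiply add-one-smoothed counts of the 3-grams around position i."""
    result = 1
    for gram in trigrams_around(tokens, i):
        result *= counts.get(gram, 0) + 1
    return result


def suggest(tokens, i, alternative, counts=TOY_COUNTS, threshold=10):
    """Return the alternative word if its score beats the original by the threshold."""
    swapped = tokens[:i] + [alternative] + tokens[i + 1:]
    ratio = score(swapped, i, counts) / score(tokens, i, counts)
    return alternative if ratio > threshold else None


sentence = "This is there last chance to escape".split()
print(trigrams_around(sentence, 2))
print(suggest(sentence, 2, "their"))
```

With unseen 3-grams on both sides the ratio is 1, so no suggestion is made; a real system would need a much larger table and a tuned threshold per word pair.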

Next time
- other XML rules
- spell check
- chunking by OpenNLP

References:
- http://wiki.languagetool.org
- https://community.languagetool.org/rule/list