1
LanguageTool 3 David Ling
2
Contents
- LanguageTool
  - Overview of rules
  - Web demonstration
  - Performance on students’ scripts
- Rule XML syntax and customization
  - Token, exception, POSTag, skip
  - Example: Third person singular
  - Making a custom example: Math lessons use English
  - Java rules: neural network
  - Resolving a custom confusion pair: causal, casual
3
LanguageTool
- Open-source grammar-checking program written in Java
- Rule-based, highly customizable
- Input features for the rules: POS tags, word patterns, chunking tags
To use your own LanguageTool, you can:
- Double-click ‘languagetool.jar’
- Run it from the Windows command prompt (cmd)
- Run a local HTTP server and connect via a browser
Online demo available:
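For the local HTTP server option, a minimal Python sketch of a client is given here. It assumes a LanguageTool server is already running on localhost port 8081 and exposes the `/v2/check` endpoint with `language` and `text` parameters; the host, port, and helper names are illustrative assumptions, not taken from the slides.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a local LanguageTool HTTP server listens here.
BASE_URL = "http://localhost:8081/v2/check"


def build_check_request(text, language="en-US"):
    """Return (url, body) for a POST request to the check endpoint."""
    body = urlencode({"language": language, "text": text}).encode("utf-8")
    return BASE_URL, body


def summarize_matches(response_json):
    """Extract (rule id, message, suggestions) from a JSON reply."""
    data = json.loads(response_json)
    return [
        (
            match["rule"]["id"],
            match["message"],
            [r["value"] for r in match.get("replacements", [])],
        )
        for match in data.get("matches", [])
    ]


def check(text, language="en-US"):
    """Send the text to the server (requires a running server)."""
    url, body = build_check_request(text, language)
    with urlopen(Request(url, data=body), timeout=10) as reply:
        return summarize_matches(reply.read().decode("utf-8"))
```

Calling `check("I goes to school by bus.")` would then return one tuple per rule match reported by the server.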
4
LanguageTool rules
- LanguageTool contains a POS dictionary (tags shown: modal verb, noun, verb (base form), verb (3rd person singular), adjective)
- Two main kinds of rules:
  - XML rules
  - Java rules
- XML rules are customizable. Two corresponding files:
  - disambiguation.xml, for reducing multiple POS tags of a token: 346 rules
  - grammar.xml, for grammar rules: ~1700 rules
5
Grammar rules in grammar.xml
No.  Category of XML rules                                          Number of rules
1    Possible typo                                                  506
2    Grammar                                                        405
3    Collocations                                                   9
4    Miscellaneous                                                  21
5    Punctuation Errors                                             48
6    Commonly Confused Words                                        241
7    Nonstandard Phrases
8    Redundant Phrases                                              159
9    Style                                                          17
10   Semantic                                                       13
11   Plain English (default: off)                                   92
12   Wikipedia (default: off)
13   Typography
14   Misused terms in EU publications, Gardner (default: off)       149
     Total                                                          1704
6
Rule examples (name and outcome)
Grammar
- all/most/some (of) + noun
  Example: All of students like mathematics.
  Correction: All students | All of the students
- both... as well as (and)
  Example: He is both very rich as well as handsome.
  Correction: and
- Use of past form with 'going to ...'
  Example: I'm going to wrote him.
  Correction: write
- inspired with (by)
  Example: The artist was inspired with the beauty of the mountains.
  Correction: inspired by
- beware PREPOSITION
  Example: Beware about malware.
  Correction: Beware of
- objective case after with(out)/at/to/...
  Example: Give it to I.
  Correction: to me | to her | to him | to us | to them
7
Rule examples (name and outcome)
Redundant phrases
- absolutely essential/necessary (essential/necessary)
  Example: This is absolutely essential.
  Correction: essential
- established fact (fact)
  Example: This is an established fact.
  Correction: a fact
- there are also other (also)
  Example: However, there are also other marbles in the jar.
  Correction: there are other | there are also
Punctuation
- extraneous apostrophes before ‘are’
  Example: The car's are cheap.
  Correction: cars
- Comma after a month
  Example: The store closed its doors for good in October, 1958.
  Correction: October 1958
- Missing comma between day of month and year
  Example: My birthday is October
  Correction: October 18,
8
Students’ scripts 1
9
Students’ scripts 2
10
Students’ scripts 3
Fails to check:
- Misuse of prepositions: for (1st line)
- Missing prepositions: to (4th line)
- Incorrect word: force (4th line)
Able to check:
- Misuse of ‘much’ and ‘many’ (7th line)
11
Examples by teachers
Categories: syntax/discourse; semantics (use of the wrong word)
- An example for the neural network follows in a later part
- Unable to check: Since… therefore, although… but
12
LanguageTool
Able to check:
- Spelling
- 1st/2nd/3rd person singular
- Adverb + noun (e.g. simply question)
- Some common phrases: concerned about, regarding to
Example limitations of the current rules:
- Unable to tackle long and complex phrases (e.g. why these video can became)
- False alarms (e.g. unseen named entities)
- Limited in resolving confusing words (e.g. casual, causal)
- Prepositions (e.g. for his talk)
- Other grammar rules not yet implemented (e.g. although… but)
- Uncountable nouns
13
LanguageTool
To improve:
- Add to and modify the current grammar rules in LanguageTool
- Build a hybrid with deep learning to complement the rules
14
Rules in grammar.xml
Steps:
1. Split a sentence into a sequence of tokens
2. Check whether it matches the token pattern of an XML rule
3. Return a message if the token pattern matches
Example: Third person singular with “I”
- Input: I goes to school by bus.
- XML rule: Agreement error - Third person verb with I
  - Token 1: I
  - Token 2: VBZ (verb, 3rd person singular present: eats, jumps, believes, is, has)
- Return: The pronoun ‘I’ must be used with a non-third-person form of a verb: go
Note: LanguageTool contains a POS dictionary
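The three steps can be sketched in Python as follows. This is a simplified illustration, not LanguageTool's implementation: the POS dictionary is a toy one with assumed entries, and the tokenizer and function names are hypothetical.

```python
# Toy POS dictionary; the entries are illustrative assumptions.
POS = {
    "i": {"PRP"},
    "goes": {"VBZ"},
    "go": {"VB", "VBP"},
    "to": {"TO", "IN"},
    "school": {"NN"},
    "by": {"IN"},
    "bus": {"NN"},
}


def tokenize(sentence):
    """Step 1: split a sentence into word tokens (very simplified)."""
    return sentence.rstrip(".!?").split()


def check_third_person_with_i(sentence):
    """Steps 2 and 3: match 'I' followed by a VBZ token, return messages."""
    tokens = tokenize(sentence)
    messages = []
    for first, second in zip(tokens, tokens[1:]):
        if first == "I" and "VBZ" in POS.get(second.lower(), set()):
            messages.append(
                "The pronoun 'I' must be used with a non-third-person "
                f"form of a verb (found '{second}')"
            )
    return messages


print(check_third_person_with_i("I goes to school by bus."))
print(check_third_person_with_i("I go to school by bus."))
```

The first call returns one message and the second returns an empty list, since "go" carries no VBZ tag in the toy dictionary.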
15
The rule pattern in XML
However, in real situations, many exceptions have to be added.
Examples:
- Extra adverb token: I recently goes to… (fails to include)
- ‘I’ as a number: Phase I corresponds to… (fails to exclude)
- ‘I’ as a letter: I is the ninth letter of the alphabet. (fails to exclude)
These can be handled using <exception> and the “skip” attribute of <token>
16
The actual rule pattern in grammar.xml
- postag, exception, skip, and scope are common conditions used in grammar.xml
- skip=“1”: allows one optional arbitrary word to follow the token. Includes: I recently goes to…
- <exception> with scope=“previous”: filters out cases with the word “phase” before “I”. Excludes: Phase I corresponds to…
- <exception> at the second token. Excludes: I is the letter…
- Current limitations: fails to exclude ‘Paper I’ and ‘article I’, and fails to include ‘I also recently goes to …’, etc.
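The effect of skip and the two exceptions can be imitated in a few lines of Python. This is a sketch under assumptions: the exception word lists and the toy POS dictionary are hypothetical, and the real rule in grammar.xml may list different words.

```python
# Toy POS dictionary; the entries are illustrative assumptions.
POS = {
    "goes": {"VBZ"},
    "corresponds": {"VBZ"},
    "is": {"VBZ"},
    "recently": {"RB"},
    "also": {"RB"},
}

PREV_EXCEPTIONS = {"phase"}  # like <exception scope="previous"> (assumed list)
VERB_EXCEPTIONS = {"is"}     # like <exception> on the second token (assumed)
SKIP = 1                     # like skip="1": at most one token in between


def flags_sentence(sentence):
    """Return True if the 'I' + VBZ pattern matches after exceptions."""
    tokens = sentence.rstrip(".").split()
    for i, token in enumerate(tokens):
        if token != "I":
            continue
        if i > 0 and tokens[i - 1].lower() in PREV_EXCEPTIONS:
            continue  # e.g. "Phase I": 'I' is a numeral here
        # The verb may directly follow 'I' or come after up to SKIP tokens.
        for j in range(i + 1, min(i + 2 + SKIP, len(tokens))):
            word = tokens[j].lower()
            if "VBZ" in POS.get(word, set()) and word not in VERB_EXCEPTIONS:
                return True
    return False


for s in [
    "I recently goes to school.",
    "Phase I corresponds to the first stage.",
    "I is the ninth letter.",
    "I also recently goes to school.",
    "Paper I corresponds to the first study.",
]:
    print(flags_sentence(s), s)
```

The last two sentences reproduce the limitations listed on the slide: two adverbs exceed the skip of one, so the error is missed, and "Paper" is not in the exception list, so a false alarm is raised.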
17
Another example: Third person singular with “you”
Rule pattern:
- Token 1: you
- Token 2: VBZ
- Includes: You goes to school. You is a boy.
Anti-rule pattern:
- Excludes: What I have told you is true.
Exceptions:
- Except when the token before “you” is tagged ‘IN’ (preposition/subordinating conjunction: except, inside, across, on, through, beyond, with, without,…)
  Excludes: One of you goes to school. The man nearest you is awake.
- Exception with negate (a double negation): requires the token before the verb to be RB/PRP/DT (adverb, negation, personal pronoun, determiner)
  Excludes: Do I have to tell you he isn't here?
18
Making a custom rule
Problem: Math lessons used English.
Generalize: (noun/adjective) + lesson + use + (noun/adjective)
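The generalized pattern can be prototyped in Python before writing it as an XML rule. The sketch below only locates the four-token pattern; the toy POS dictionary, the inflection lists, and the function names are assumptions for illustration.

```python
import re

# Toy POS dictionary; the entries are illustrative assumptions.
POS = {
    "math": {"NN"},
    "history": {"NN"},
    "english": {"NN", "JJ"},
    "chinese": {"NN", "JJ"},
    "students": {"NNS"},
}

LESSON = re.compile(r"^lessons?$", re.IGNORECASE)
USE = re.compile(r"^(use|uses|used)$", re.IGNORECASE)


def is_noun_or_adjective(word):
    """True if any tag of the word starts with NN (noun) or JJ (adjective)."""
    tags = POS.get(word.lower(), set())
    return any(tag.startswith(("NN", "JJ")) for tag in tags)


def find_lesson_use(sentence):
    """Return every '(noun/adj) lesson use (noun/adj)' span in the sentence."""
    tokens = sentence.rstrip(".").split()
    hits = []
    for i in range(len(tokens) - 3):
        a, b, c, d = tokens[i:i + 4]
        if (is_noun_or_adjective(a) and LESSON.match(b)
                and USE.match(c) and is_noun_or_adjective(d)):
            hits.append(" ".join((a, b, c, d)))
    return hits


print(find_lesson_use("Math lessons used English."))
print(find_lesson_use("History lessons use Chinese."))
print(find_lesson_use("Students used English."))
```

The first two sentences each yield one hit and the third yields none, because it has no "lesson" token.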
19
Making a custom rule Outcome
20
Neural network rule
- One of the few Java rules
- Will be a new feature in the coming release of LanguageTool
- Resolves confusing words using a neural network
- E.g. causal, casual; context: “well as causal/casual wears .”
Network structure:
- Input: the context words “well”, “as”, “wears”, “.”, each represented as a 64x1 vector
- The four vectors are concatenated into x (256x1)
- y = softmax(Wx + b), where y is 2x1 (one entry each for causal and casual)
- W: weight matrix, updated during training
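The forward pass on the slide can be written out in plain Python to make the shapes concrete. This is a sketch with random, untrained placeholder values for the embeddings and W; it is not the LanguageTool Java rule, and all names are hypothetical.

```python
import math
import random

EMBED_DIM = 64                           # per-word vector size from the slide
CONTEXT = ["well", "as", "wears", "."]   # words around the candidate slot
CANDIDATES = ["causal", "casual"]

rng = random.Random(0)  # fixed seed; values are placeholders, not trained


def random_vector(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


# Hypothetical embedding table with untrained values.
embeddings = {word: random_vector(EMBED_DIM) for word in CONTEXT}


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(context, W, b):
    """Concatenate the context embeddings into x and return softmax(Wx + b)."""
    x = [value for word in context for value in embeddings[word]]
    scores = [
        sum(w * v for w, v in zip(row, x)) + bias
        for row, bias in zip(W, b)
    ]
    return x, softmax(scores)


W = [random_vector(EMBED_DIM * len(CONTEXT)) for _ in CANDIDATES]  # 2 x 256
b = [0.0, 0.0]
x, y = forward(CONTEXT, W, b)
print(len(x), dict(zip(CANDIDATES, y)))
```

With trained weights, the larger entry of y would indicate which of the two words fits the context better.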
21
Neural network rule – training and validation
Resolving [causal, casual]
- Corpus: Wikipedia articles, ~3 GB
- Training set sizes: [979, 2765]
- Validation set sizes: [106, 310]
Results:
- Correct: [48, 243]
- Incorrect: [14, 11]
- Accuracy: [77%, 96%]
- Unclassified: [44, 56] (min abs score > 0.5)
Training samples:
- is a causal association because
- or the causal plane or
- . The causal plane is
- , the causal plane is
- friendly , casual script after
- well as casual wears .
- popular among casual players .
- and a causal agent of
- conclusions about causal links ,
- may miss causal relationships .
- and no causal connection has
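The reported accuracies are consistent with being computed over the classified samples only, with unclassified samples left out of the denominator. A short check of that arithmetic, using the numbers from the slide (the function name is illustrative):

```python
LABELS = ["causal", "casual"]
validation_total = [106, 310]
correct = [48, 243]
incorrect = [14, 11]
unclassified = [44, 56]


def accuracy_on_classified(n_correct, n_incorrect):
    """Accuracy over samples that received a classification."""
    return n_correct / (n_correct + n_incorrect)


for label, total, c, i, u in zip(
    LABELS, validation_total, correct, incorrect, unclassified
):
    # Correct, incorrect and unclassified add up to the validation set size.
    assert c + i + u == total
    acc = accuracy_on_classified(c, i)
    print(f"{label}: {acc:.0%} of {c + i} classified, {u} unclassified")

# causal: 77% of 62 classified, 44 unclassified
# casual: 96% of 254 classified, 56 unclassified
```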
22
Neural network rule – working in languagetool
23
END
Useful links of LanguageTool
- Online demo:
- XML syntax overview:
- Online XML rule editor:
- Neural network rule:
- Tagset: modules/en/src/main/resources/org/languagetool/resource/en/tagset.txt
Thank you