Final Project: English Preposition Usage Checker
J.-S. Roger Jang (張智星), MIR Lab, CSIE Dept., National Taiwan University


English Preposition Usage Checker
Goal: to suggest the correct usage of English prepositions based on the Google Web 1T dataset.
20 prepositions under consideration: of, to, in, for, with, on, at, by, from, up, about, than, after, before, down, between, under, since, without, near.
General procedure:
1. Generate a candidate set of extended queries from the given query.
2. Compute the frequency of each element in the candidate set.
3. List the top 10 candidates (with nonzero frequencies) in descending order of frequency.
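A minimal sketch of the final ranking step (step 3), assuming the n-gram frequencies have already been loaded into a Python dict (`freq`, a hypothetical name) that maps each candidate string to its count, with missing keys counting as zero. The tie-break (alphabetical order among equal frequencies) follows the example output slide; `rank_candidates` is a hypothetical helper, not the course's reference code.

```python
def rank_candidates(candidates, freq, k=10):
    """Return up to k (ngram, count) pairs with nonzero count.

    Sorted by descending count; ties are broken alphabetically.
    `freq` is assumed to be a dict mapping n-gram strings to counts.
    """
    # Deduplicate candidates and look up each count (0 if unseen).
    scored = [(c, freq.get(c, 0)) for c in set(candidates)]
    # Keep only candidates that actually occur in the corpus.
    nonzero = [(c, n) for c, n in scored if n > 0]
    # Frequency first (descending), then alphabetical order (ascending).
    nonzero.sort(key=lambda item: (-item[1], item[0]))
    return nonzero[:k]


if __name__ == "__main__":
    freq = {"pleased me": 40402, "pleased with me": 10015, "pleased for me": 1067}
    cands = ["pleased at me", "pleased me", "pleased with me", "pleased for me"]
    print(rank_candidates(cands, freq))
```

Sorting on the key `(-count, text)` handles both ordering rules in a single pass.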

Corpus
Google Web 1T dataset
Freely available at LDC
Introduction: "Natural Language Corpus Data" by Peter Norvig
Applications: Linggle by Jason Chang
Our version: a slim version of about 300 MB, already available under /tmp2/dsa2016_project/ on linux1 to linux13, containing:
2-grams (bigrams), 3-grams (trigrams), 4-grams (fourgrams), 5-grams (fivegrams)
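A minimal loader sketch, assuming each line of an n-gram file holds the n-gram and its count separated by a tab, as in the original Web 1T distribution. The layout of the slim 300 MB version is not stated on the slide, so check it before relying on this; `load_ngram_counts` is a hypothetical name.

```python
def load_ngram_counts(stream):
    """Build a dict mapping each n-gram string to its count.

    Assumes every line looks like '<w1 w2 ... wn>\t<count>'
    (the original Web 1T layout); the course's slim files may differ.
    """
    counts = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the LAST tab so the n-gram itself may contain spaces.
        ngram, _, count = line.rpartition("\t")
        if not ngram:
            continue  # no tab found: treat as malformed and skip
        counts[ngram] = counts.get(ngram, 0) + int(count)
    return counts
```

For example, `with open(path, encoding="utf-8") as f: counts = load_ngram_counts(f)`, where `path` points to one of the n-gram files. Loading everything into a dict is simple but memory-hungry; since efficiency is part of the grade, a sorted file with binary search or a more compact index may be worth considering.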

Candidate Set
Two rules for generating the candidate set:
Rule 1: If the query contains no prepositions,
  define EDIT_a based on insertion only, and
  candidate set = EDIT_a(EDIT_a(query)) + EDIT_a(query) + query, where + denotes set union.
Rule 2: Otherwise,
  find the preposition sequences in the query,
  define EDIT_b based on insertion, deletion, and substitution,
  find the EDIT_b set of each preposition sequence, and
  candidate set = Cartesian product of all EDIT_b sets, together with the original non-preposition words.
Remember to add the original input query to the candidate set of extended queries.
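A minimal sketch of the two rules, under these assumptions: a query is a whitespace-separated string; a "preposition sequence" is a maximal run of consecutive prepositions; each EDIT_b set also contains the unedited run (so one run can change while another stays fixed); and non-preposition words are never changed under Rule 2. The helper names `edit_a`, `edit_b`, `split_runs`, and `candidate_set` are hypothetical.

```python
from itertools import product

PREPOSITIONS = ("of", "to", "in", "for", "with", "on", "at", "by", "from", "up",
                "about", "than", "after", "before", "down", "between", "under",
                "since", "without", "near")
PREP_SET = set(PREPOSITIONS)


def edit_a(words):
    """EDIT_a: insert one preposition at any position (insertion only)."""
    out = set()
    for i in range(len(words) + 1):
        for p in PREPOSITIONS:
            out.add(words[:i] + (p,) + words[i:])
    return out


def edit_b(seq):
    """EDIT_b: one insertion, deletion, or substitution in a preposition run.

    The unedited run is included too (an assumption, see the lead-in).
    """
    out = {seq}
    for i in range(len(seq) + 1):          # insertions
        for p in PREPOSITIONS:
            out.add(seq[:i] + (p,) + seq[i:])
    for i in range(len(seq)):
        out.add(seq[:i] + seq[i + 1:])     # deletion (may leave an empty run)
        for p in PREPOSITIONS:             # substitutions
            out.add(seq[:i] + (p,) + seq[i + 1:])
    return out


def split_runs(words):
    """Split words into alternating (is_preposition, run) segments."""
    segments = []
    for w in words:
        is_prep = w in PREP_SET
        if segments and segments[-1][0] == is_prep:
            segments[-1] = (is_prep, segments[-1][1] + (w,))
        else:
            segments.append((is_prep, (w,)))
    return segments


def candidate_set(query):
    words = tuple(query.split())
    if not any(w in PREP_SET for w in words):
        # Rule 1: EDIT_a(EDIT_a(query)) + EDIT_a(query) + query
        one = edit_a(words)
        two = set()
        for c in one:
            two |= edit_a(c)
        cands = two | one | {words}
    else:
        # Rule 2: Cartesian product of EDIT_b sets; other words kept as-is.
        choices = [edit_b(run) if is_prep else {run}
                   for is_prep, run in split_runs(words)]
        cands = {tuple(w for part in combo for w in part)
                 for combo in product(*choices)}
        cands.add(words)  # always keep the original query
    return {" ".join(c) for c in cands}
```

For instance, `candidate_set("pleased at me")` contains "pleased me" (deletion), "pleased with me" (substitution), and "pleased at for me" (insertion), but never "to pleased at me", because Rule 2 leaves non-preposition positions alone.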

Examples
Queries such as: listen music; log in to; check with

Example Input and Output Files
Example input:
have difficulty finding
angry at me
pleased at me
worry for cancer

Example output:
query: have difficulty finding
output: 3
have difficulty finding	30918
have difficulty in finding	4636
to have difficulty finding	1174
query: angry at me
output: 2
angry with me	60929
angry at me	24354
query: pleased at me
output: 3
pleased me	40402
pleased with me	10015
pleased for me	1067
query: worry for cancer
output: 1
worry about cancer	2120

At most 10 entries per query, ordered by frequency first, then alphabetically.
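A minimal sketch of rendering one query's block in the layout of the example output, assuming the entries are already ranked and that a tab separates each n-gram from its count (the separator is not legible on the slide, so this is an assumption). `format_result` is a hypothetical name.

```python
def format_result(query, ranked):
    """Render one query block of the output file.

    `ranked` is a list of (ngram, count) pairs, already sorted and
    truncated to at most 10 entries. The tab separator is an assumption.
    """
    lines = [f"query: {query}", f"output: {len(ranked)}"]
    lines += [f"{ngram}\t{count}" for ngram, count in ranked]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_result("angry at me",
                        [("angry with me", 60929), ("angry at me", 24354)]))
```

The "output:" line reports the number of listed entries, which is why "worry for cancer" shows 1: its own frequency is zero, so only "worry about cancer" is listed.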

Scoring
Percentage of scoring:
75%: correctness of your program
25%: efficiency of your program
Formula: 0.75*C + 25*(1 - rankRatio)*C/100
C: score based on the correctness of the answers
rankRatio: zero-based speed rank divided by (number of students - 1)
The formula will not be changed unless something unexpected happens that undermines the fairness of the original formula.
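The formula can be checked with a small worked sketch. Here `final_score` is a hypothetical helper, speed rank 0 means the fastest program, and the guard for a class of one student is an assumption (the slide does not cover that case, where the denominator would be zero).

```python
def final_score(correctness, speed_rank, num_students):
    """0.75*C + 25*(1 - rankRatio)*C/100 with
    rankRatio = speed_rank / (num_students - 1), speed_rank zero-based."""
    if num_students < 2:
        rank_ratio = 0.0  # assumption: a lone submission counts as fastest
    else:
        rank_ratio = speed_rank / (num_students - 1)
    return 0.75 * correctness + 25 * (1 - rank_ratio) * correctness / 100


if __name__ == "__main__":
    print(final_score(80, 0, 51))   # fastest of 51 students
    print(final_score(80, 50, 51))  # slowest of 51 students
```

With C = 80 in a class of 51, the fastest program keeps the full 80, the slowest gets 0.75 * 80 = 60, and a median speed rank of 25 (rankRatio = 0.5) gets 70. Speed therefore only scales the remaining 25% of the correctness score.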

Future Work
Other tasks that rely on the Google Web 1T dataset:
Text normalization
Smoothing
Grammar/usage check for a given sentence
Verb and noun collocations
Scoring of English articles
Real-time computer-assisted composition
Demos: Linggle, Writeahead