S1: Chapter 1 Mathematical Models Dr J Frost Last modified: 6th September 2015

Mathematical Models A mathematical model is a simplification of a real-world situation. It tries to make predictions about some system; we can then test how good those predictions are using a statistical test, before refining the model to make better predictions. Interestingly, I taught ‘Machine Learning’ and ‘Computational Linguistics’ classes to graduate and undergraduate students while at Oxford, and that field does this very thing! In Computational Linguistics, a Part-Of-Speech tagger is a system that predicts the most likely ‘type’ of each word. For example (using the Stanford tagger): My dog also likes eating sausage. → My/PRP$ dog/NN also/RB likes/VBZ eating/VBG sausage/NN ./. Here PRP$ is a possessive pronoun, NN a noun, RB an adverb, VBZ a verb (3rd person singular present) and VBG a verb (gerund). As you might imagine, such tagging is extremely useful in grammar checking, predictive text, dialogue systems, question answering systems, etc., and a fuller syntactic analysis can lead to building a sense of the ‘meaning’ of the sentence (i.e. semantics).
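To make this concrete, here is a minimal sketch of tagging that sentence in Python. It uses NLTK’s off-the-shelf tagger rather than the Stanford tagger shown on the slide; the choice of library and the model downloads are assumptions added here for illustration, not part of the original slides.

```python
# Minimal POS-tagging sketch using NLTK (an off-the-shelf tagger,
# standing in for the Stanford tagger mentioned above).
import nltk

# One-off downloads of the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "My dog also likes eating sausage."
tokens = nltk.word_tokenize(sentence)   # ['My', 'dog', 'also', ...]
tagged = nltk.pos_tag(tokens)           # list of (word, tag) pairs

print(tagged)
# Expected output along the lines of:
# [('My', 'PRP$'), ('dog', 'NN'), ('also', 'RB'), ('likes', 'VBZ'),
#  ('eating', 'VBG'), ('sausage', 'NN'), ('.', '.')]
```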

Example We could get around 90% accuracy just by tagging each word with its most common tag in English usage. But the biggest difficulty is dealing with heteronyms: words with the same spelling but different word types, as above. The potential steps in building such a system are: 1. Collecting data We can train a system by collecting a whole bunch of sentences which are already tagged. Thankfully someone has already done this! This is known as ‘supervised learning’, because in the training data we’ve fully indicated the correct tagging; amazingly, it’s also possible to build systems from just raw sentences (known as ‘unsupervised learning’). People have literally hand-crafted these syntax trees for a huge body of text. The tree shows the full grammatical structure of the sentence – we’re just interested in the tags at the bottom of the tree.
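The ‘most common tag’ baseline mentioned above is easy to sketch. The tiny tagged corpus below is invented purely for illustration; a real system would train on a large hand-annotated treebank like the one pictured on the slide.

```python
from collections import Counter, defaultdict

# Tiny illustrative tagged corpus: each sentence is a list of (word, tag) pairs.
tagged_sentences = [
    [("My", "PRP$"), ("dog", "NN"), ("likes", "VBZ"), ("sausage", "NN"), (".", ".")],
    [("The", "DT"), ("dog", "NN"), ("can", "MD"), ("bark", "VB"), (".", ".")],
    [("I", "PRP"), ("can", "MD"), ("see", "VB"), ("the", "DT"), ("can", "NN"), (".", ".")],
]

# Count how often each word receives each tag in the training data.
tag_counts = defaultdict(Counter)
for sentence in tagged_sentences:
    for word, tag in sentence:
        tag_counts[word.lower()][tag] += 1

def most_common_tag(word, default="NN"):
    """Baseline tagger: give each word its most frequent training tag."""
    counts = tag_counts.get(word.lower())
    return counts.most_common(1)[0][0] if counts else default

print(most_common_tag("can"))   # 'MD' here: 'can' is usually a modal verb
print(most_common_tag("dog"))   # 'NN'
```

The baseline fails exactly where the slide says it does: a heteronym like ‘can’ always gets its single most common tag, regardless of context.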

Example 2. Building a model We need some model that inputs a sentence and spits out a tagged sentence. We typically use something called ‘n-grams’, where we observe counts of two words together (bigrams) or three words together (trigrams). [The slide showed an n-gram frequency chart: the probability of the bigram ‘happy cat’ appearing, given any randomly chosen word pair in any published piece of literature – note the recent ‘Cat Renaissance’.]
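The counting behind bigrams can be sketched in a few lines. The toy corpus below, and the probabilities that come out, are assumptions for illustration only.

```python
from collections import Counter

# Toy corpus; a real model is estimated from a large body of text.
corpus = "the happy cat sat on the mat the cat sat".split()

# Count individual words and adjacent word pairs (bigrams).
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev_word, word):
    """Maximum-likelihood estimate of P(word | prev_word)."""
    if unigram_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

print(bigram_prob("the", "cat"))    # 1/3: 'the' occurs 3 times, 'the cat' once
print(bigram_prob("happy", "cat"))  # 1.0: 'happy' is always followed by 'cat' here
```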

Example 2. Building a model (continued) To keep the system simple, we make the simplifying assumption that each word depends only on the previous word (e.g. a noun is most likely to follow an article such as ‘the’, but we don’t care about earlier words). Given this, we can use a Naïve Bayes Classifier to put all the probabilities together, so that we have a single probability for a complete tagging of a complete sentence. We can then use something called the Viterbi Algorithm to construct the most likely sequence of tags given all our probabilities. The probabilities from tag to tag form something called a Markov Model.
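A rough, self-contained sketch of the Viterbi Algorithm over such a Markov Model is shown below. Every probability in it is invented for illustration; a real tagger would estimate the tag-to-tag (transition) and tag-to-word (emission) probabilities from the training corpus.

```python
# Viterbi sketch: find the most likely tag sequence for a sentence under a
# simple Hidden Markov Model. All probabilities here are illustrative.

tags = ["DT", "NN", "VB"]

# P(tag_i | tag_{i-1}); "<s>" marks the start of the sentence.
trans = {
    "<s>": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
    "DT":  {"DT": 0.05, "NN": 0.85, "VB": 0.10},
    "NN":  {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB":  {"DT": 0.5, "NN": 0.3, "VB": 0.2},
}

# P(word | tag), again invented for the example.
emit = {
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"dog": 0.4, "cat": 0.4, "barks": 0.2},
    "VB": {"barks": 0.7, "dog": 0.3},
}

def viterbi(words):
    """Return the most probable tag sequence for `words`."""
    # best[i][t] = (probability, previous tag) of the best path ending in tag t at word i.
    best = [{t: (trans["<s>"][t] * emit[t].get(words[0], 0), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Choose the previous tag that maximises the path probability.
            prev, score = max(
                ((p, best[i - 1][p][0] * trans[p].get(t, 0)) for p in tags),
                key=lambda x: x[1],
            )
            column[t] = (score * emit[t].get(words[i], 0), prev)
        best.append(column)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

print(viterbi(["the", "dog", "barks"]))   # expected: ['DT', 'NN', 'VB']
```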

Example 3. Testing To see how good our system is, we then try out the tagger on some fresh sentences (i.e. sentences we didn’t train the system with!) and compare the predicted tags with the actual tags. Correct tagging: The/DT soldier/NN decided/VBD to/TO desert/VB his/PRP$ … Predicted tagging by our system: The/DT soldier/NN decided/VBD to/TO desert/NN his/PRP$ … (oops – ‘desert’ should be a verb here). 4. Revise model If our system is poor, it might be because some of our simplifying assumptions (e.g. that a part-of-speech tag like NN depends only on the tag of the previous word) are poor. We might then decide to change our model, either by tweaking certain parameters/probabilities, or by changing the model altogether, e.g. using trigrams such as VBD-TO-VB rather than just bigrams.
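The comparison in step 3 usually boils down to per-token accuracy: the fraction of words whose predicted tag matches the correct (‘gold’) tag on the held-out sentences. A small sketch, using the ‘desert’ example above (the exact tag lists are an illustrative assumption):

```python
def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the correct (gold) tag."""
    assert len(predicted) == len(gold)
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# Tags for "The soldier decided to desert his ..." from the slide:
gold_tags      = ["DT", "NN", "VBD", "TO", "VB", "PRP$"]   # 'desert' is a verb here
predicted_tags = ["DT", "NN", "VBD", "TO", "NN", "PRP$"]   # our system tagged 'desert' as a noun

print(accuracy(predicted_tags, gold_tags))   # 5/6 ≈ 0.83
```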

Stuff that could appear in exams There have been three questions on this chapter in exams since 2000: Jan 2006 Q5, Jan 2007 Q6a, and most recently June 2015 Q4a (pens at the ready).

Stuff that could appear in exams Jan 2007 Q6b: Model used to make predictions. Experimental data collected. Model is refined.