S1: Chapter 1 Mathematical Models Dr J Frost Last modified: 6th September 2015
Mathematical Models A mathematical model is a simplification of a real-world situation. It tries to make predictions about some system; we can then test how good the predictions are using a statistical test, before refining the model to make better predictions. Interestingly, I taught ‘Machine Learning’ and ‘Computational Linguistics’ classes to graduate/undergraduate students while at Oxford, and both involve doing this very thing!
Example (using the Stanford tagger): My dog also likes eating sausage.
Tagged: My/PRP$ dog/NN also/RB likes/VBZ eating/VBG sausage/NN ./.
Tag key: PRP$ = possessive pronoun; NN = noun; RB = adverb; VBZ = verb, 3rd person singular present; VBG = verb, gerund.
In Computational Linguistics, a Part-Of-Speech tagger is a system that predicts the most likely ‘type’ of each word. As you might imagine, such tagging is extremely useful in grammar checking, predictive text, dialogue systems, question answering systems, etc., and a fuller syntactic analysis can lead to building a sense of the ‘meaning’ of the sentence (i.e. semantics).
Example We could get around 90% accuracy just by tagging each word with its most common tag in English usage. But the biggest difficulty is dealing with heteronyms: words with the same spelling but different word types (e.g. ‘desert’ as a noun or as a verb). The potential steps in building such a system are:
1. Collecting data We can train a system by collecting a whole bunch of sentences which are already tagged. Thankfully someone has already done this! People have literally hand-crafted syntax trees for a huge body of text. Each tree shows the full grammatical structure of a sentence; we’re just interested in the tags at the bottom of the tree. This is known as ‘supervised learning’, because in the training data we’ve fully indicated the correct tagging, but amazingly it’s possible to build systems with just raw sentences (known as ‘unsupervised learning’).
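The ‘most common tag’ baseline mentioned above can be sketched in a few lines of Python. This is only an illustration: the tiny training set and the function names (`train_most_common_tag`, `tag_sentence`) are made up for this example and are not taken from any real corpus or library.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training data: each sentence is a list of (word, tag) pairs.
TRAINING = [
    [("the", "DT"), ("dog", "NN"), ("likes", "VBZ"), ("sausage", "NN")],
    [("the", "DT"), ("desert", "NN"), ("is", "VBZ"), ("hot", "JJ")],
    [("soldiers", "NNS"), ("desert", "VBP"), ("the", "DT"), ("army", "NN")],
    [("the", "DT"), ("desert", "NN"), ("is", "VBZ"), ("dry", "JJ")],
]


def train_most_common_tag(tagged_sentences):
    """Map each word to the tag it was seen with most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_sentence(model, words, default_tag="NN"):
    """Tag each word with its most common tag; unseen words get default_tag (an assumption)."""
    return [(word, model.get(word.lower(), default_tag)) for word in words]


model = train_most_common_tag(TRAINING)
tagged = tag_sentence(model, ["soldiers", "desert", "the", "army"])
print(tagged)
```

In this toy data ‘desert’ is seen twice as a noun and once as a verb, so the baseline always tags it NN, even in ‘soldiers desert the army’. That is exactly the weakness the slide describes: the baseline ignores context.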
Example 2. Building a model We need some model that inputs a sentence and outputs a tagged sentence. We typically use something called ‘n-grams’, where we observe counts of two words together (bigrams) or three words together (trigrams). [Figure: the probability of the bigram ‘happy cat’ appearing, given any randomly chosen word pair in any published piece of literature; annotated ‘Cat Renaissance’.]
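To make ‘observing counts of bigrams’ concrete, here is a minimal sketch that counts word pairs in a made-up sentence and turns the counts into probabilities. The text and the helper names are assumptions for illustration only; a real system would use a far larger body of text.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count each pair of adjacent words."""
    return Counter(zip(tokens, tokens[1:]))


def bigram_probability(tokens, first, second):
    """Probability that a randomly chosen adjacent word pair is (first, second)."""
    pairs = bigram_counts(tokens)
    total = sum(pairs.values())
    return pairs[(first, second)] / total if total else 0.0


def conditional_probability(tokens, first, second):
    """Probability that the next word is `second`, given the current word is `first`."""
    pairs = bigram_counts(tokens)
    first_total = sum(count for (a, _), count in pairs.items() if a == first)
    return pairs[(first, second)] / first_total if first_total else 0.0


# Made-up text, purely for illustration.
tokens = "the happy cat saw the happy dog and the happy cat".split()

print(bigram_probability(tokens, "happy", "cat"))       # share of all word pairs
print(conditional_probability(tokens, "happy", "cat"))  # share of pairs starting with 'happy'
```

The first function matches the figure’s description (the chance of ‘happy cat’ among all word pairs); the second is the conditional form that is more useful when predicting what comes next.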
Example 2. Building a model (continued) To keep the system simple, we make a simplifying assumption that each word depends only on the previous word (e.g. a noun is most likely to follow an article such as ‘the’, but we don’t care about any earlier words). Given this, we can use a Naïve Bayes Classifier to put all the probabilities together so that we have a single probability for a complete tagging of a complete sentence. The probabilities from tag to tag form something called a Markov Model. We can then use something called the Viterbi Algorithm to construct the most likely sequence of tags given all our probabilities.
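A minimal sketch of the Viterbi Algorithm over tag-to-tag (Markov) probabilities is given below. Every number in the tables is invented for illustration, the tag set is cut down to three tags (DT, NN, VB), and the function assumes a non-empty sentence; none of this comes from a real trained tagger.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for a non-empty list of words."""
    # best[i][tag] = (probability of the best path ending in `tag` at word i, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            # Pick the previous tag that gives the most probable path into tag t.
            prev, score = max(
                ((p, best[-1][p][0] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score * emit_p[t].get(word, 0.0), prev)
        best.append(column)

    # Start from the most probable final tag and follow the back-pointers.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# All probabilities below are made up for illustration.
TAGS = ["DT", "NN", "VB"]
START_P = {"DT": 0.5, "NN": 0.4, "VB": 0.1}
TRANS_P = {  # TRANS_P[previous_tag][next_tag]
    "DT": {"DT": 0.0, "NN": 0.9, "VB": 0.1},
    "NN": {"DT": 0.1, "NN": 0.2, "VB": 0.7},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
EMIT_P = {  # EMIT_P[tag][word]
    "DT": {"the": 1.0},
    "NN": {"soldiers": 0.3, "desert": 0.4, "army": 0.3},
    "VB": {"desert": 0.2, "march": 0.8},
}

sentence = ["soldiers", "desert", "the", "army"]
best_tags = viterbi(sentence, TAGS, START_P, TRANS_P, EMIT_P)
print(list(zip(sentence, best_tags)))
```

With these made-up numbers ‘desert’ is more likely to be a noun on its own, but the tag-to-tag probabilities (a verb often follows a noun, and an article often follows a verb) tip the whole sequence towards tagging it as a verb here.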
Example 3. Testing To see how good our system is, we then try out the tagger on some fresh sentences (i.e. sentences we didn’t train the system with!) and compare the predicted tags with the actual tags. The/DT soldier/NN decided/VBD to/TO desert/NN his/PRP$ … Correct tagging: Predicted tagging by our system:
4. Revise model If our system is poor, it might be because some of our simplifying assumptions (e.g. that a part-of-speech tag like NN only depends on the tag of the previous word) are poor. We might then decide to change our model, either by tweaking certain parameters/probabilities or by changing the model altogether, e.g. using trigrams such as VBD-TO-VB rather than just bigrams.
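The testing step amounts to counting how many predicted tags match the actual tags. A minimal sketch follows. The predicted sequence uses the tags shown in the slide’s example (taken here, as an assumption, to be the system’s output), and the ‘actual’ sequence, with ‘desert’ tagged VB, is likewise an assumption made for illustration.

```python
def tagging_accuracy(predicted, actual):
    """Fraction of positions where the predicted tag equals the actual tag."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def tagging_errors(words, predicted, actual):
    """List (word, predicted tag, actual tag) for every wrongly tagged word."""
    return [(w, p, a) for w, p, a in zip(words, predicted, actual) if p != a]


words = ["The", "soldier", "decided", "to", "desert", "his"]
predicted = ["DT", "NN", "VBD", "TO", "NN", "PRP$"]  # tags as shown in the slide's example
actual = ["DT", "NN", "VBD", "TO", "VB", "PRP$"]     # assumed correct tagging (illustrative)

accuracy = tagging_accuracy(predicted, actual)
errors = tagging_errors(words, predicted, actual)
print(accuracy, errors)
```

Listing the individual errors, not just the overall accuracy, is what tells us which simplifying assumption to revisit when revising the model.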
Stuff that could appear in exams There have been three exam questions on this chapter since 2000, the most recent in Jan 2006 Q5/Jan 2007 Q6a June 2015 Q4a (pens at the ready)
Stuff that could appear in exams Jan 2007 Q6b Model used to make predictions. Experimental data collected. Model is refined.