1
A* Search
Uses the evaluation function f(n) = g(n) + h(n), where n is a node.
– g is a cost function: the total cost incurred so far from the initial state to node n. For the 8-puzzle, each move has equal cost.
– h is a heuristic that estimates the remaining cost from n to the goal.
2
Example heuristic for the 8-puzzle: Hamming distance (the number of tiles out of place).
3
A* Pseudocode

create the open list of nodes, initially containing only our starting node
create the closed list of nodes, initially empty
while (we have not reached our goal) {
    consider the best node in the open list (the node with the lowest f value)
    if (this node is the goal) {
        then we're done
    } else {
        move the current node to the closed list and consider all of its successors
        for (each successor) {
            if (this successor is in the closed list and our current g value is lower) {
                update the successor with the new, lower, g value
                change the successor's parent to our current node
            } else if (this successor is in the open list and our current g value is lower) {
                update the successor with the new, lower, g value
                change the successor's parent to our current node
            } else (this successor is not in either the open or closed list) {
                add the successor to the open list and set its g value
            }
        }
    }
}
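A minimal runnable sketch of this loop in Python, using the 8-puzzle with the Hamming-distance heuristic as the example problem. The 9-tuple puzzle encoding and the lazy re-push of improved successors (rather than updating open-list entries in place) are implementation choices of this sketch, not part of the slides.

    # A* sketch for the 8-puzzle: state is a 9-tuple, 0 = blank.
    import heapq

    GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

    def hamming(state):
        """h(n): number of tiles out of place (ignoring the blank)."""
        return sum(1 for i, tile in enumerate(state) if tile != 0 and tile != GOAL[i])

    def successors(state):
        """Yield states reachable by sliding one tile into the blank."""
        blank = state.index(0)
        row, col = divmod(blank, 3)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = row + dr, col + dc
            if 0 <= r < 3 and 0 <= c < 3:
                swap = r * 3 + c
                nxt = list(state)
                nxt[blank], nxt[swap] = nxt[swap], nxt[blank]
                yield tuple(nxt)

    def a_star(start):
        """Return the list of states from start to GOAL, or None."""
        open_list = [(hamming(start), 0, start)]   # entries are (f, g, state)
        g_values = {start: 0}
        parents = {start: None}
        closed = set()
        while open_list:
            f, g, node = heapq.heappop(open_list)  # best node = lowest f value
            if node == GOAL:
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            if node in closed:
                continue
            closed.add(node)
            for succ in successors(node):
                new_g = g + 1                      # every 8-puzzle move costs 1
                if new_g < g_values.get(succ, float("inf")):
                    g_values[succ] = new_g
                    parents[succ] = node
                    heapq.heappush(open_list, (new_g + hamming(succ), new_g, succ))
        return None

    solution = a_star((1, 2, 3, 4, 5, 6, 0, 7, 8))
    print(len(solution) - 1)   # number of moves in the solution, here 2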
4
Go over HW 2
5
Go over HW 3
Nim demo: http://www.math.uri.edu/~bkaskosz/flashmo/marien.html
6
Presentations on Wednesday: Nathan Nifong, Matthaus Litteken
7
Example of statistical language models: n-grams
An n-gram model estimates the probability distribution over a word w given the n-1 words that have come before it in the sequence:
P(w | w_1, w_2, …, w_{n-1})
Purpose: to guess the next word from the previous words, and to disambiguate:
– "Students have access to a list of course requirements"
– "Would you like a drink of [garbled]?"
– "He loves going to the [bark]."
8
N-grams: applications throughout natural language processing:
– text classification
– speech recognition
– machine translation
– intelligent spell-checking
– handwriting recognition
– playing "Jeopardy!"
9
What is P(bark | he loves going to the)?
What is P(park | he loves going to the)?
We can estimate these from a large corpus:
– P(w | w_1, …, w_{n-1}) = frequency of "w_1 … w_{n-1} w" divided by frequency of "w_1 … w_{n-1}"
– Example: use Google
10
Problem! The web doesn't give us enough examples to get good statistics.
One solution:
– Approximate P(w | w_1, …, w_{n-1}) by using a small n (e.g., n = 2: bigrams).
Bigram example: P(bark | the) vs. P(park | the) (calculate using Google)
Trigram example: P(bark | to the) vs. P(park | to the)
11
Typically, bigrams are used:
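A standard way to state the bigram approximation (the chain rule followed by a first-order Markov assumption), consistent with the worked example on the next slide; this is the textbook form, since the slide's own formula is not reproduced here:

    P(w_1 w_2 \dots w_m)
      = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})   % chain rule
      \approx \prod_{i=1}^{m} P(w_i \mid w_{i-1})          % bigram (Markov) approximation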
12
Now we can calculate the probability of an utterance:
P(he loves going to the bark) ≈
P(he | <s>) P(loves | he) P(going | loves) P(to | going) P(the | to) P(bark | the) P(</s> | bark)
where <s> = sentence start marker and </s> = sentence end marker.
13
Text classification using N-grams
14
Mini-example (adapted from Jurafsky & Martin, 2000)
Corpus 1 (Class 1):
I am he as you are he
I am the Walrus
I am the egg man
Corpus 2 (Class 2):
Class 1 bigram probabilities (examples):
P(I | <s>) = 1   P(am | I) = 1   P(man | egg) = 1
P(are | you) = 1   P(egg | the) = .5   P(the | am) = .67
Class 2 bigram probabilities (examples):
New sentence 1: "They are the egg man"
New sentence 2: "Goo goo g'joob"
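A small counting sketch that reproduces the Class 1 probabilities above; the corpus lines and the <s>/</s> boundary markers are as in the example, while the code itself is only an illustration:

    # Estimate bigram probabilities P(w2 | w1) from a tiny corpus by counting.
    from collections import Counter, defaultdict

    corpus = ["I am he as you are he",
              "I am the Walrus",
              "I am the egg man"]

    bigram_counts = defaultdict(Counter)
    for line in corpus:
        words = ["<s>"] + line.split() + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            bigram_counts[w1][w2] += 1

    def bigram_prob(w2, w1):
        """P(w2 | w1) = count(w1 w2) / count(w1 followed by anything)."""
        total = sum(bigram_counts[w1].values())
        return bigram_counts[w1][w2] / total if total else 0.0

    print(round(bigram_prob("the", "am"), 2))   # 0.67
    print(round(bigram_prob("egg", "the"), 2))  # 0.5
    print(round(bigram_prob("I", "<s>"), 2))    # 1.0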
17
N-gram approximation to Shakespeare (Jurafsky and Martin, 2000)
Trained unigram, bigram, trigram, and quadrigram models on the complete corpus of Shakespeare's works (including punctuation).
These models are then used to generate random sentences by choosing each next unigram/bigram/trigram/quadrigram probabilistically (a bigram-based sketch of this sampling follows).
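A minimal sketch of such generation for the bigram case, trained on the tiny corpus from the earlier mini-example; sampling each next word in proportion to its bigram count is an assumption of this sketch, since the slides do not spell out the exact procedure:

    # Generate a random "sentence" from a bigram model by sampling the next word
    # in proportion to its observed bigram count, starting from <s>.
    import random
    from collections import Counter, defaultdict

    def train_bigrams(lines):
        counts = defaultdict(Counter)
        for line in lines:
            words = ["<s>"] + line.split() + ["</s>"]
            for w1, w2 in zip(words, words[1:]):
                counts[w1][w2] += 1
        return counts

    def generate(counts, max_words=20):
        word, output = "<s>", []
        while len(output) < max_words:
            choices = counts[word]
            word = random.choices(list(choices), weights=list(choices.values()))[0]
            if word == "</s>":
                break
            output.append(word)
        return " ".join(output)

    corpus = ["I am he as you are he", "I am the Walrus", "I am the egg man"]
    print(generate(train_bigrams(corpus)))   # e.g. "I am the egg man"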
18
Unigram model
1. To him swallowed confess hear both. Which. Of save on trail for are ay device and rote life have.
2. Every enter now severally so, let
3. Hill he late speaks; or! a more to leg less first you enter
4. Are where exeunt and sighs have rise excellency took of... Sleep knave we. near; vile like.
19
Bigram model
1. What means, sir. I confess she? then all sorts, he is trim, captain.
2. Why dost stand forth thy canopy, forsooth; he is this palpable hit the King Henry. Live king. Follow.
3. What we, hath got so she that I rest and sent to scold and nature bankrupt, nor the first gentleman?
4. Thou whoreson chops. Consumption catch your dearest friend, well, and I know where many mouths upon my undoing all but be, how soon, then; we’ll execute upon my love’s bonds and we do you will?
20
Trigram model
1. Sweet prince, Falstaff shall die. Harry of Monmouth’s grave.
2. This shall forbid it should be branded, if renown made it empty.
3. Indeed the duke; and had a very good friend.
4. Fly, and will rid me these news of price. Therefore the sadness of parting, as they say, ‘tis done.
21
Quadrigram model
1. King Henry. What! I will go seek the traitor Gloucester. Exeunt some of the watch. A great banquet serv’d in;
2. Will you not tell me who I am?
3. Indeed the short and long. Marry, ‘tis a noble Lepidus.
4. Enter Leonato’s brother Antonio, and the rest, but seek the weary beds of people sick.
22
From Cavnar and Trenkle, "N-Gram-Based Text Categorization" (1994)
An early paper, but it clearly lays out the main ideas of n-gram text classification.
Categorization of USENET newsgroups:
– by language
– by topic
23
Categorization requirements
24
N-grams (in this paper)
An N-character slice of a string (rather than an N-word slice).
Examples: see the sketch below.
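The slide's own examples are not reproduced in this transcript; a small sketch of character n-gram extraction, assuming (as Cavnar and Trenkle do) that words are padded with blanks, shown here as underscores:

    # Extract character n-grams from a word, padding with leading/trailing blanks
    # ('_') so that word boundaries contribute n-grams too.
    def char_ngrams(word, n):
        padded = "_" + word + "_" * (n - 1)
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    print(char_ngrams("TEXT", 2))  # ['_T', 'TE', 'EX', 'XT', 'T_']
    print(char_ngrams("TEXT", 3))  # ['_TE', 'TEX', 'EXT', 'XT_', 'T__']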
25
Advantages of using character n-grams versus word n-grams:
– Less sensitive to errors (e.g., in OCR documents)
– Helps deal with the limited-statistics problem (some words might not appear in the document)
26
Frequency distribution of n-grams
Zipf's law: frequency(n-gram) ∝ 1 / rank(n-gram), i.e., an n-gram's frequency is roughly inversely proportional to its rank.
Also true for words.
28
Generate profile of document (see the sketch below)
Can also do this for an entire category by putting all n-grams from all of the category's documents into a single "bag of n-grams".
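The profile details are not reproduced in this transcript; in Cavnar and Trenkle's scheme a profile is the document's n-grams ranked by frequency, keeping roughly the top 300. A sketch under that assumption, reusing the char_ngrams helper from above:

    # Build an n-gram frequency profile: character n-grams (n = 1..5 here)
    # ranked by frequency, keeping only the top_k most frequent ones.
    from collections import Counter

    def char_ngrams(word, n):
        padded = "_" + word + "_" * (n - 1)
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    def profile(text, top_k=300):
        counts = Counter()
        for word in text.split():
            for n in range(1, 6):
                counts.update(char_ngrams(word, n))
        return [gram for gram, _ in counts.most_common(top_k)]

    print(profile("the cat sat on the mat", top_k=5))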
29
Observations
32
Measure profile distance
Given the profile for an entire category (e.g., "cryptography"), we can calculate the distance from a new document to that category by comparing their profiles.
For each n-gram in the document profile, calculate how "out of place" it is: how far its rank in the document profile is from its rank in the category profile. Summing these out-of-place values gives the distance (a sketch follows).
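A sketch of that distance, assuming (as in the paper) that an n-gram missing from the category profile receives a maximum out-of-place penalty:

    # Out-of-place distance between a document profile and a category profile.
    # Each profile is a list of n-grams ordered from most to least frequent.
    def out_of_place_distance(doc_profile, cat_profile):
        cat_rank = {gram: rank for rank, gram in enumerate(cat_profile)}
        max_penalty = len(cat_profile)   # penalty for n-grams absent from the category
        distance = 0
        for doc_rank, gram in enumerate(doc_profile):
            if gram in cat_rank:
                distance += abs(doc_rank - cat_rank[gram])
            else:
                distance += max_penalty
        return distance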
34
Classifying a document
To classify document D, calculate its distance from each category and choose the category with the minimum distance (which must be below some threshold distance).
If no category is below the threshold distance, then the class of D is "not known" (see the sketch below).
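A sketch of that decision rule; doc_profile and category_profiles are assumed to be profiles built as sketched earlier, distance is the out-of-place measure, and the threshold value is whatever cutoff the application chooses:

    # Pick the category whose profile is nearest the document's profile;
    # answer "not known" if even the nearest category is not below the threshold.
    def classify(doc_profile, category_profiles, distance, threshold):
        best_category, best_distance = "not known", threshold
        for category, cat_profile in category_profiles.items():
            d = distance(doc_profile, cat_profile)
            if d < best_distance:
                best_category, best_distance = category, d
        return best_category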
35
Cavnar and Trenkle's results
Used newsgroup FAQs as the "category" documents from which to "learn" the n-gram models.
Results were reported as a confusion matrix.
36
Smoothing
Needed to overcome the problem of sparse data: even in a large corpus, valid bigrams can get zero probability.
Laplace smoothing (or "add-one" smoothing): add 1 to all the bigram counts (see the formula below).
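In formula form, the standard add-one estimate (with V the vocabulary size); this is the textbook form implied by the rule above rather than quoted from the slides:

    P_{\text{Laplace}}(w_i \mid w_{i-1})
      = \frac{\mathrm{count}(w_{i-1}\, w_i) + 1}{\mathrm{count}(w_{i-1}) + V}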