LING 438/538 Computational Linguistics Sandiway Fong Lecture 16: 10/19
Administrivia
review homework #3
new homework #4
–out today
–usual rules apply
–due next Thursday
Last Time
Spelling errors and correction
Error Correction
–correct Bayesian Probability
–Minimum Edit Distance Computation
Dynamic Programming
Minimum Edit Distance
example
–assuming insert = 1, delete = 1, substitution = 2 (or 0 for substituting the same character)
recursive formula
–incrementally computed from the minimum edit distances of shorter strings
–intent/execut is one edit operation away from intent/execu, inten/execut and inten/execu
–with L, D and B the costs in those three neighbouring cells: min(L+1, D+0, B+1), where D+0 applies because both strings end in the same character (t)
cost: 8
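The recursive formula on this slide can be written out as a recurrence. The following is a sketch under the slide's costs (insert 1, delete 1, substitution 2 or 0); the symbols D, x, y and c are notation introduced here for illustration, with D(i,j) the distance between the first i characters of the source x and the first j characters of the target y. It assumes the amsmath package for `cases` and `\text`.

```latex
\[
D(i,0) = i, \qquad D(0,j) = j,
\]
\[
D(i,j) = \min
\begin{cases}
D(i-1,j) + 1 & \text{(delete } x_i\text{)}\\
D(i,j-1) + 1 & \text{(insert } y_j\text{)}\\
D(i-1,j-1) + c(x_i,y_j) & \text{(substitute or copy)}
\end{cases}
\qquad
c(x_i,y_j) =
\begin{cases}
0 & \text{if } x_i = y_j\\
2 & \text{otherwise}
\end{cases}
\]
```

For intent/execut the last characters match, so the diagonal term contributes D+0, as in min(L+1, D+0, B+1).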
Minimum Edit Distance Computation
one formula
Microsoft Excel implementation
$ in a cell reference means don’t change when copied from cell to cell
–e.g. in C$1, the row 1 stays the same (row protected)
–in $A3, the column A stays the same, not the row 3 (column protected)
formula: min(C2+1,B3+1,B2+if(C$1=$A3,0,2))
–copied one column to the right (inc col): min(D2+1,C3+1,C2+if(D$1=$A3,0,2))
–copied one row down (inc row): min(C3+1,B4+1,B3+if(C$1=$A4,0,2))
Minimum Edit Distance Computation
demo
example pairs, min edit distance (assuming substitution cost 2)
–intention, intent:
–intention, intentional:
–intention, ten:
–intention, ton:
–intention, teen:
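The same row-by-row computation can be sketched in Prolog, the language used elsewhere in the course. This is an illustration rather than the lecture's implementation (which is the spreadsheet): the predicate names med/3, next_row/4 and fill/6 are made up here, and the code assumes SWI-Prolog for foldl/4, numlist/3 and last/2.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% med(+Source, +Target, -Distance)
% Source and Target are atoms. Costs follow the slides:
% insert = 1, delete = 1, substitution = 2 (0 for the same character).
% (Hypothetical predicate names; assumes SWI-Prolog.)
med(Source, Target, Distance) :-
    atom_chars(Source, Ss),
    atom_chars(Target, Ts),
    length(Ts, N),
    numlist(0, N, Row0),              % row for the empty source prefix
    foldl(next_row(Ts), Ss, Row0, LastRow),
    last(LastRow, Distance).

% next_row(+Ts, +S, +PrevRow, -Row)
% Compute the table row for one more source character S.
next_row(Ts, S, [Diag|PrevRest], [First|Row]) :-
    First is Diag + 1,                % delete S against the empty target prefix
    fill(Ts, S, Diag, PrevRest, First, Row).

% fill(+Ts, +S, +Diag, +Aboves, +Left, -Cells)
% Each cell is the minimum over its three neighbours.
fill([], _, _, [], _, []).
fill([T|Ts], S, Diag, [Above|Aboves], Left, [Cell|Cells]) :-
    ( S == T -> Sub = 0 ; Sub = 2 ),
    Cell is min(Left + 1, min(Above + 1, Diag + Sub)),
    fill(Ts, S, Above, Aboves, Cell, Cells).
```

A query such as `?- med(intent, execut, D).` gives D = 8, and the demo pairs can be run the same way, for example `?- med(intention, intent, D).`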
Homework 3 Review
Question 1 438/538 (4pts)
Give the minimum size regular expression for the FSA below (2pts)
[FSA diagram: states s, x, y; arcs labelled a, a, b, ε]
Minimum size regular expression for the FSA:
–a+b* (one or more a’s followed by zero or more b’s)
not minimum size in terms of number of symbols:
–aa*b*
–(aa*)|(aa*b*)
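Question 1 438/538 (4pts)
Give an equivalent FSA without the ε-transition (2pts)
–answers in the form of a diagram, a formal definition or a Prolog definition are all ok
[original FSA diagram: states s, x, y; arcs labelled a, a, b, ε]
Equivalent ε-free FSA
[diagram: states s, a, b; arcs labelled a and b]
How to arrive at this answer?
–by inspection, or
–by consideration of a+b*, where b* = ε | b+
[diagrams: states s, a with arcs labelled a; states s, b with arcs labelled b]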
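Question 1 438/538 (4pts)
Give an equivalent FSA without the ε-transition (2pts)
–answers in the form of a diagram, a formal definition or a Prolog definition are all ok
Set-of-States Construction method:
[original FSA diagram: states s, x, y; arcs labelled a, a, b, ε]
[set-of-states diagram: states {s}, {x,y}, {y}; arcs labelled a and b]
[resulting ε-free FSA diagram: states s, a, b; arcs labelled a and b]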
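Since a Prolog definition is an accepted answer format, the ε-free machine can be sketched with the "one predicate per state" encoding. The predicate names q_s, q_xy and q_y are chosen here to mirror the sets {s}, {x,y} and {y}; the sketch assumes y is the only final state of the original FSA, which is consistent with a+b*.

```prolog
% {s}: start state, not final; only an a leaves it.
q_s([a|L]) :- q_xy(L).

% {x,y}: final (it contains y); loops on a, moves to {y} on b.
q_xy([]).
q_xy([a|L]) :- q_xy(L).
q_xy([b|L]) :- q_y(L).

% {y}: final; loops on b.
q_y([]).
q_y([b|L]) :- q_y(L).
```

For example, `?- q_s([a,a,b]).` succeeds, while `?- q_s([b]).` and `?- q_s([a,b,a]).` fail.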
Question 2 438/538 (8pts)
convert the NDFSA into a deterministic FSA (3pts)
–figure 2.27 in the textbook
set-of-states construction:
–{1} --a--> {2}
–{2} --b--> {3,4}
–{3,4} --a--> {2,3}
–{2,3} --b--> {3,4}
–{2,3} --a--> {2}
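One step of the set-of-states construction can be sketched in Prolog. The arcs below are read off the Prolog encoding of the NDFSA given in this homework review; the predicate names nd_arc/3 and move/3 are hypothetical, introduced only for this illustration.

```prolog
% nd_arc(From, Symbol, To): transitions of the NDFSA
% (taken from the one-predicate-per-state code for states one..four).
nd_arc(1, a, 2).
nd_arc(2, b, 3).
nd_arc(2, b, 4).
nd_arc(3, a, 2).
nd_arc(4, a, 3).

% move(+States, +Symbol, -Next)
% Next is the sorted, duplicate-free set of states reachable from any
% state in States by one arc labelled Symbol.
move(States, Symbol, Next) :-
    findall(To,
            ( member(From, States), nd_arc(From, Symbol, To) ),
            Tos),
    sort(Tos, Next).
```

Starting from [1] and applying move/3 repeatedly to each new set yields the states of the deterministic machine, e.g. `?- move([2], b, S).` gives S = [3,4]. An empty result means there is no transition on that symbol.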
Question 2 438/538 (8pts)
implement both the NDFSA and the equivalent FSA in Prolog using the “one predicate per state” encoding
Prolog code (NDFSA):
    one([a|L]) :- two(L).
    two([b|L]) :- three(L).
    two([b|L]) :- four(L).
    three([]).
    three([a|L]) :- two(L).
    four([a|L]) :- three(L).
strings abab and abaaba: how many steps (transitions + final stop)?
Question 2 438/538 (8pts)
implement both the NDFSA and the equivalent FSA in Prolog using the “one predicate per state” encoding
Prolog code (deterministic FSA, states {1}, {2}, {3,4}, {2,3}):
    s1([a|L]) :- s2(L).
    s2([b|L]) :- s34(L).
    s34([]).
    s34([a|L]) :- s23(L).
    s23([]).
    s23([b|L]) :- s34(L).
    s23([a|L]) :- s2(L).
strings abab and abaaba: how many steps (transitions + final stop)?
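One way to explore the step-count question for the deterministic machine is to thread a counter through the state predicates. This is a sketch only: count_steps/2 and the three-argument versions of the state predicates are hypothetical additions, and it assumes a "step" means one transition or the final stop. It does not cover the NDFSA, where transitions tried and then abandoned on backtracking would also have to be counted.

```prolog
% count_steps(+Input, -Steps)
% Run the deterministic machine on Input, counting one step per
% transition plus one for the final stop. Fails if Input is rejected.
count_steps(Input, Steps) :- s1(Input, 0, Steps).

s1([a|L], N0, N)  :- N1 is N0 + 1, s2(L, N1, N).

s2([b|L], N0, N)  :- N1 is N0 + 1, s34(L, N1, N).

s34([], N0, N)    :- N is N0 + 1.               % final stop
s34([a|L], N0, N) :- N1 is N0 + 1, s23(L, N1, N).

s23([], N0, N)    :- N is N0 + 1.               % final stop
s23([b|L], N0, N) :- N1 is N0 + 1, s34(L, N1, N).
s23([a|L], N0, N) :- N1 is N0 + 1, s2(L, N1, N).
```

The two strings from the slide can then be tried with `?- count_steps([a,b,a,b], N).` and `?- count_steps([a,b,a,a,b,a], N).`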
Question 3 438/538 (8pts)
(5pts) Give an FSA in Prolog that accepts a binary string (made up of 0’s and 1’s) if and only if it begins with a 1 and contains exactly one 0
examples:
–10
FSA: [diagram]
Question 3 438/538 (8pts)
(5pts) Give an FSA in Prolog that accepts a binary string (made up of 0’s and 1’s) if and only if it begins with a 1 and contains exactly one 0
(3pts) Give the regular expression equivalent of the FSA
Regular Expression:
–11*01*
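A machine for 11*01* can be sketched in the "one predicate per state" style used in Question 2. The state names q0, q1 and q2 are hypothetical, and the sketch assumes the binary string is given as a list of the integers 0 and 1; it is one possible answer, not necessarily the one shown in the lecture.

```prolog
% q0: start state; the string must begin with a 1.
q0([1|L]) :- q1(L).

% q1: a leading 1 has been read and no 0 yet; not final.
q1([1|L]) :- q1(L).
q1([0|L]) :- q2(L).

% q2: exactly one 0 has been read; final; only 1s may follow.
q2([]).
q2([1|L]) :- q2(L).
```

For example, `?- q0([1,0]).` and `?- q0([1,1,0,1]).` succeed, while `?- q0([0,1]).`, `?- q0([1,1]).` and `?- q0([1,0,0]).` fail.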
Homework #4
Question 1 438/538 (8pts)
Implement the e-insertion rule as an FST in Prolog
(Context-Sensitive) Spelling Rule:
–(3.5) ε → e / { x, s, z } ^ __ s #
Goals:
–pass through non-matching cases unchanged
–implement the rule exactly
–no deletion of the boundaries ^ and #
Question 2 438/538 (6pts)
What does the Porter Stemmer output for the following words:
–(2pts) availability
–(2pts) shipping
–(2pts) unbelievable
Show the steps (stages) in your answer
Question 2 438/538 (6pts)
–the Porter Stemmer handles -ement for cases like replacement → replac(e)
–it doesn’t handle statement → stat(e), i.e. it outputs statement
–Why? Explain (2pts)
–Modify the Porter rule responsible to allow for statement → stat(e). Submit your rule (2pts)
–Give 2 examples where the modified rule would be too liberal, i.e. it overstems (2pts)
Summary
Q1: 8pts
Q2: 6+6=12pts
Total: 20pts