LING 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/29
Administrivia reminder –homework 2 due tonight
Last Time
  regular grammars
  – aka Chomsky hierarchy type-3 grammars
  – are formal grammars with severe restrictions on what can appear on the RHS
  – are limited in generative capacity or power
  – in Prolog DCG notation:
      x --> y, [t].    x --> [t].    (left recursive variant)
    or
      x --> [t], y.    x --> [t].    (right recursive variant)
  – can't have both left and right recursive rules in the same grammar
Last Time
  regular grammars
    examples: regular languages
    – “one or more a's followed by one or more b's”
    – sheeptalk {ba!, baa!, baaa!, ...}, i.e. b, then one or more a's, then !
    – can be encoded by a regular grammar
  beyond regular grammars
    examples
    – a^n b^n = {ab, aabb, aaabbb, ...}
    – ww^R, where w ∈ {a,b}+, i.e. w is any non-empty sequence of a's and b's
  informal idea about the crucial difference
    – “needing to keep track of history”
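As a rough illustration of the claim that sheeptalk can be encoded by a regular grammar, here is one possible right recursive DCG. The nonterminal names sheep, more_a and bang are made up for this sketch and do not come from the slides. Every rule has the shape x --> [t], y. or x --> [t].

```prolog
% Sketch only: nonterminal names (sheep, more_a, bang) are hypothetical.
% Language: b, then one or more a's, then !

sheep  --> [b], more_a.      % every string starts with b
more_a --> [a], more_a.      % consume an a and expect more a's
more_a --> [a], bang.        % consume the last a, then expect !
bang   --> ['!'].            % ! must be quoted to be read as an atom

% Example queries (standard phrase/2):
% ?- phrase(sheep, [b,a,a,'!']).    succeeds
% ?- phrase(sheep, [b,'!']).        fails: at least one a is required
```

Because every rule consumes a terminal before calling a nonterminal, these queries terminate on any finite input list.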
Today's Topic
  Finite State Automata
  – plus more on what it means to be a regular language
  Merge Point
  – Textbook, Chapter 2: Regular Expressions and Automata
Today's Topic
  Finite State Automata
  – plus more on what it means to be a regular language
  – formally equivalent, in terms of generative capacity or power
  [diagram labels: Regular Grammars, FSA, Regular Expressions, Regular Languages; annotation: "+ left & right recursive rules"]
Some Regular Expression Notation...
  some notation first (more on regexps next time)
  Regular Expressions (regexp)
  – shorthand for describing sets of strings
  Operators:
  – string+   set of one or more occurrences of string
      a+ = {a, aa, aaa, aaaa, aaaaa, …}
      (abc)+ = {abc, abcabc, abcabcabc, …}
    Note: parentheses used to delimit the scope of the operator
  – string*   set of zero or more occurrences of string
      a* = {ε, a, aa, aaa, aaaa, …}
      (abc)* = {ε, abc, abcabc, …}
    Note: ε is the zero-length string
Some Regular Expression Notation...
  some notation first
  Relation between * and +
  – a a* = a+
  – “a concatenated with a*”
  – a {ε, a, aa, aaa, aaaa, …} = {a, aa, aaa, aaaa, aaaaa, …}
  Operators:
  – string^n   exactly n occurrences of string
      a^4 b^3 = {aaaabbb}
  Language = a set of strings
Regular Expressions
  regular expressions
  – formally equivalent to regular grammars and finite state automata
  – How to show this? Proof by construction…
  beyond regular expressions
  – examples
      {a^n b^n | n > 0} is not regular
      {ww^R | w ∈ {a,b}+} is not regular, where w^R is w reversed, e.g. (abc)^R = cba
  – How to show this? Proof by Pumping Lemma
  [diagram labels: Regular Grammars, FSA, Regular Expressions]
Regular Expressions
  Example:
  – Language: L = {a+b+}
    “one or more a's followed by one or more b's”
  – regular language: described by a regular expression
  Note:
  – infinite set of strings belonging to language L
    » e.g. abbb, aaaab, aabb, *abab, *ε
  Notation:
  – ε is the empty string (or string with zero length)
  – * means string is not in the language
  regular grammar
      s --> [a], b.
      b --> [a], b.
      b --> [b], c.
      b --> [b].
      c --> [b], c.
      c --> [b].
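The regular grammar on this slide can be tried out directly. The sketch below assumes a Prolog system with DCG support (SWI-Prolog was the one in mind) and uses the standard phrase/2 to ask whether a list of characters is in L. The helper in_l/1 is a hypothetical convenience added here, not part of the lecture.

```prolog
% The grammar from the slide, unchanged.
s --> [a], b.
b --> [a], b.
b --> [b], c.
b --> [b].
c --> [b], c.
c --> [b].

% in_l(+Atom): hypothetical helper, not from the slides.
% Splits an atom such as aabb into [a,a,b,b] and runs the grammar on it.
in_l(Atom) :-
    atom_chars(Atom, Chars),
    phrase(s, Chars).

% Example queries, matching the slide's examples:
% ?- phrase(s, [a,b,b,b]).    succeeds (abbb is in L)
% ?- in_l(aaaab).             succeeds
% ?- in_l(abab).              fails (*abab)
% ?- phrase(s, []).           fails (*ε)
```

Note that this is the right recursive variant, which is the safe choice in Prolog: each rule consumes a terminal before calling another nonterminal, so the queries always terminate on finite input.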
Finite State Automata (FSA)
  [FSA diagram: states s, x, y; arcs labeled a, a, b, b]
  L = {a+b+}
  L = {aa*bb*}
  deterministic FSA (DFSA)
  – no ambiguity about where to go at any given state
  non-deterministic FSA (NDFSA)
  – no restriction on ambiguity (surprisingly, no increase in power)
Finite State Automata (FSA)
  more formally
  – (Q, s, f, Σ, δ)
    1. set of states (Q): {s, x, y}   (must be a finite set)
    2. start state (s): s
    3. end state(s) (f): y
    4. alphabet (Σ): {a, b}
    5. transition function δ
       signature: character × state → state
         δ(a,s) = x
         δ(a,x) = x
         δ(b,x) = y
         δ(b,y) = y
  [FSA diagram: states s, x, y; arcs labeled a, a, b, b]
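The five components above map quite naturally onto Prolog facts. The following is a minimal sketch under assumed names: start/1, final/1, delta/3, accepts/1 and run/2 are choices made for this illustration, not an encoding given in the lecture. The argument order of delta/3 follows the slide's signature (character, state, next state).

```prolog
% Assumed encoding, for illustration only.
start(s).                 % start state
final(y).                 % end state

% delta(Character, State, NextState), mirroring δ on the slide.
delta(a, s, x).
delta(a, x, x).
delta(b, x, y).
delta(b, y, y).

% accepts(+Chars): run the machine from the start state and succeed
% only if all of Chars is consumed and the machine stops in a final state.
accepts(Chars) :-
    start(State),
    run(Chars, State).

run([], State) :-
    final(State).
run([C|Cs], State) :-
    delta(C, State, Next),
    run(Cs, Next).

% Example queries:
% ?- accepts([a,a,b]).      succeeds
% ?- accepts([a,b,a,b]).    fails: no transition on a from state y
% ?- accepts([]).           fails: s is not a final state
```

Since δ is a function here (at most one next state per character and state), the machine is deterministic and run/2 never needs to backtrack over alternative arcs.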
Finite State Automata (FSA)
  practical applications
  can be encoded and run efficiently on a computer
  widely used
  – encode regular expressions
  – compress large dictionaries
  – morphological analyzers
      different word forms, e.g. want, wanted, unwanted (suffixation/prefixation)
      see chapter 3 of textbook
  – speech recognizers
      Markov models = FSA + probabilities
  – and many more …
Finite State Automata (FSA)
  – T9 text entry (tegic.com)
      built in to your cellphone
      predictive text entry for mobile messaging/data entry
      reduces the number of keystrokes for inputting words on a telephone keypad (8 keys)
      e.g. how: 3 vs. 6 keystrokes
           michael: 7 vs. 15 keystrokes
RegExp → FSA
  From Regular Expression to FSA
  Operators
  – a     single symbol a
  – a^n   n occurrences of a
  [FSA diagrams for a and for a^n (shown for a^3): chains of arcs labeled a]
RegExp → FSA
  Operators
  – a*   zero or more occurrences of a
  – a+   one or more occurrences of a
  [FSA diagrams for a* and a+, with arcs labeled a]
  a+ = aa*
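To make the operator-to-machine correspondence concrete, here is a small sketch of three machines, for a^3, a* and a+, written as Prolog facts. The machine names (a3, star, plus), the state names (q0 to q3) and the predicates start/2, final/2, arc/4, accepts/2 and run/3 are all assumptions of this sketch. The slide's diagrams may be drawn differently.

```prolog
% Illustrative encoding; all names are hypothetical.
% start(Machine, State)
start(a3,   q0).
start(star, q0).
start(plus, q0).

% final(Machine, State)
final(a3,   q3).
final(star, q0).          % start state is also final: zero a's accepted
final(plus, q1).

% arc(Machine, From, Character, To)
arc(a3,   q0, a, q1).     % a^3: a chain of three arcs
arc(a3,   q1, a, q2).
arc(a3,   q2, a, q3).
arc(star, q0, a, q0).     % a*: a single self-loop
arc(plus, q0, a, q1).     % a+ = aa*: one arc, then a self-loop
arc(plus, q1, a, q1).

% accepts(+Machine, +Chars): Chars is accepted by the named machine.
accepts(Machine, Chars) :-
    start(Machine, State),
    run(Machine, State, Chars).

run(Machine, State, []) :-
    final(Machine, State).
run(Machine, State, [C|Cs]) :-
    arc(Machine, State, C, Next),
    run(Machine, Next, Cs).

% Example queries:
% ?- accepts(a3, [a,a,a]).    succeeds
% ?- accepts(star, []).       succeeds
% ?- accepts(plus, []).       fails
```

The only difference between the star and plus machines is whether the start state is final, which is the machine-level counterpart of a+ = aa*.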
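Regular Grammar → FSA
  examples
  – s --> [a], t.    [diagram: arc labeled a from state s to state t]
  – x --> [a], x.    [diagram: arc labeled a from state x back to x]
  – x --> [a].       [diagram: arc labeled a from state x to final state y]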
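The three examples suggest a mechanical translation: each nonterminal becomes a state, and each rule becomes one arc. A hedged sketch of that idea follows, with the grammar rules stored as plain facts rather than DCG clauses. The predicates rule/3, rule/2, final_state/1 and arc/3 are names invented for this illustration, and the sketch assumes a single shared final state called y, as on the slide.

```prolog
% rule(X, T, Y) stands for   X --> [T], Y.
% rule(X, T)    stands for   X --> [T].
rule(s, a, t).
rule(x, a, x).
rule(x, a).

% Assumption: one final state, named y as on the slide.
final_state(y).

% arc(From, Symbol, To): one FSA arc per grammar rule.
arc(From, Symbol, To) :-
    rule(From, Symbol, To).        % x --> [t], y.  gives  x -t-> y
arc(From, Symbol, Final) :-
    rule(From, Symbol),            % x --> [t].     gives  x -t-> final
    final_state(Final).

% Example query:
% ?- findall(F-S-T, arc(F, S, T), Arcs).
% Arcs = [s-a-t, x-a-x, x-a-y].
```

Note that the resulting machine has two arcs labeled a leaving state x, so it is non-deterministic in general.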
Next Time Prolog and FSA