LING 388 Language and Computers Lecture 4 9/11/03 Sandiway FONG
Administrivia
- Homework 1: account for submissions
- Need help with Homework 1? Ask Charles after class today; I’ll also be in Douglass 309 this afternoon.
- Homework hint: number of inference steps? Count the number of calls in the debugger.
Administrivia
Question from Tuesday’s computer lab class: how do you make SWI-Prolog print the entire list instead of abbreviating it?
Answer:
    ?- set_prolog_flag(toplevel_print_options,[max_depth(0)]).
(max_depth(0) removes the depth limit on printed terms.)
Strings and Languages
String: a sequence of characters, possibly empty
- Constructor operation: concatenation
- Examples: a, aaabc, $\varepsilon$ (the empty string)
Alphabet: the set of possible characters
- Examples: $\{a, b\}$, $\{0,1,2,3,4,5,6,7,8,9\}$
Language: a set of strings defined over an alphabet
- Examples: $\{\varepsilon, a, aa, aaa, aaaa\}$, $\{\}$
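The definitions above can be made concrete with a small Python sketch (Python rather than the lecture's Prolog). It assumes strings are Python `str` values, an alphabet is a set of single characters, and a finite language is a set of strings; the helper names `concat`, `over_alphabet` and `is_language_over` are hypothetical, chosen for illustration.

```python
# Sketch: strings as Python str, an alphabet as a set of characters,
# and a finite language as a set of strings.
EMPTY = ""  # the empty string (epsilon)


def concat(u: str, v: str) -> str:
    """Concatenation, the constructor operation on strings."""
    return u + v


def over_alphabet(s: str, alphabet: set) -> bool:
    """True if every character of s belongs to the alphabet."""
    return all(ch in alphabet for ch in s)


def is_language_over(language: set, alphabet: set) -> bool:
    """True if every string in the language is over the alphabet."""
    return all(over_alphabet(s, alphabet) for s in language)


sigma = {"a", "b"}
lang = {EMPTY, "a", "aa", "aaa", "aaaa"}

print(concat("aa", "abc"))            # aaabc
print(concat(EMPTY, "abc") == "abc")  # True: concatenating epsilon changes nothing
print(is_language_over(lang, sigma))  # True
print(over_alphabet("abc", sigma))    # False: c is not in {a, b}
```

Note that the empty string counts as a string over any alphabet, since it contains no characters at all.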
Regular Expressions
Regular expressions: shorthand for describing sets of strings
Examples:
- string$^+$: the set of one or more occurrences of string
  $a^+ = \{a, aa, aaa, aaaa, aaaaa, \ldots\}$
  $(abc)^+ = \{abc, abcabc, abcabcabc, \ldots\}$
- string$^*$: the set of zero or more occurrences of string
  $a^* = \{\varepsilon, a, aa, aaa, aaaa, \ldots\}$
  $(abc)^* = \{\varepsilon, abc, abcabc, \ldots\}$
Note: $a\,a^* = a^+$, since $a\,\{\varepsilon, a, aa, aaa, aaaa, \ldots\} = \{a, aa, aaa, aaaa, aaaaa, \ldots\}$
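The `+` and `*` operators can be tried out directly with Python's standard `re` module, which uses the same postfix notation. This sketch treats "string belongs to the set" as a whole-string match (`re.fullmatch`); the helper name `matches` is hypothetical. The last line checks the identity $a\,a^* = a^+$ only on strings of a's up to length 6, which illustrates it rather than proving it.

```python
import re


def matches(pattern: str, s: str) -> bool:
    """True if the whole string s belongs to the set described by pattern."""
    return re.fullmatch(pattern, s) is not None


# a+ : one or more a's; the empty string is excluded
print([s for s in ["", "a", "aaa", "ab"] if matches(r"a+", s)])             # ['a', 'aaa']
# (abc)* : zero or more copies of abc; the empty string is included
print([s for s in ["", "abc", "abcabc", "abca"] if matches(r"(abc)*", s)])  # ['', 'abc', 'abcabc']
# a a* and a+ agree on every string of a's of length 0..6
print(all(matches(r"aa*", "a" * n) == matches(r"a+", "a" * n) for n in range(7)))  # True
```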
Regular Expressions (contd.)
Examples (contd.):
- string$^n$: exactly $n$ occurrences of string
  $a^4 b^3 = \{aaaabbb\}$
- (string$_1$ | string$_2$): either string$_1$ or string$_2$
  $(a|b) = \{a, b\}$
  $(a|b|c) = \{a, b, c\}$
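In Python's `re` syntax (used here as an illustration; the slides use mathematical notation), "exactly $n$ occurrences" is written with braces, `a{4}`, and alternation uses `|` inside parentheses as on the slide.

```python
import re

# a^4 b^3 : exactly four a's followed by exactly three b's
print(re.fullmatch(r"a{4}b{3}", "aaaabbb") is not None)  # True
print(re.fullmatch(r"a{4}b{3}", "aaabbb") is not None)   # False: only three a's

# (a|b|c) : exactly one of a, b or c
alts = [s for s in ["a", "b", "c", "ab"] if re.fullmatch(r"(a|b|c)", s)]
print(alts)  # ['a', 'b', 'c']
```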
Regular Expressions (contd.)
- Regular expressions can’t describe all possible languages.
- They are formally equivalent to regular grammars and finite state automata.
  How to show this? Proof by construction… as we’ll see.
- Examples of languages that are not regular:
  $\{a^n b^n \mid n \geq 0\}$
  $\{w w^R \mid w \in \{a,b\}^+\}$, where $R$ = reverse, e.g. $(abc)^R = cba$
  How to show this? Proof by the Pumping Lemma.
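For intuition only (this is not a proof; that is what the Pumping Lemma is for), here are direct Python membership tests for the two non-regular languages. Both work by comparing one half of the string against the other, and the halves can be arbitrarily long, which is the kind of unbounded memory a fixed, finite set of states does not have. The function names are hypothetical.

```python
def is_anbn(s: str) -> bool:
    """Membership in {a^n b^n | n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def reverse(w: str) -> str:
    """w^R, e.g. reverse("abc") == "cba"."""
    return w[::-1]


def is_wwr(s: str) -> bool:
    """Membership in {w w^R | w in {a,b}+}: w must be nonempty."""
    if len(s) < 2 or len(s) % 2 != 0 or not set(s) <= {"a", "b"}:
        return False
    w = s[: len(s) // 2]
    return s == w + reverse(w)


print(reverse("abc"))                                          # cba
print([is_anbn(s) for s in ["", "ab", "aabb", "aab", "abab"]])  # [True, True, True, False, False]
print([is_wwr(s) for s in ["aa", "abba", "abab", ""]])          # [True, True, False, False]
```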
Regular Expressions (contd.)
- Popularized by string-matching utilities, e.g. grep in Unix, and Perl
- Typically augmented with meta-characters denoting classes of characters,
  e.g. `[[:digit:]]` = `[0-9]`, `.` = any single character
- Special operators:
  negation, e.g. `[^0-9]` (any single character except a digit)
  optionality, e.g. `1?` (zero or one occurrence of 1)
  for dealing with lines, e.g. `^` (beginning of line), `$` (end of line)
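The same meta-characters can be tried in Python's `re` module, with one caveat: Python's `re` does not support POSIX classes such as `[[:digit:]]`, so this sketch writes the equivalent `[0-9]`. The sample lines are made up for illustration.

```python
import re

lines = ["2003", "LING 388", "x1", "color"]

# ^[0-9]+$ : the whole line consists of digits ([0-9] in place of [[:digit:]])
print([line for line in lines if re.search(r"^[0-9]+$", line)])   # ['2003']
# [^0-9] : negation, any single character that is not a digit
print([line for line in lines if re.search(r"^[^0-9]+$", line)])  # ['color']
# . : any single character
print(re.fullmatch(r"x.", "x1") is not None)                       # True
# 1? : optionality, zero or one occurrence of 1
print([s for s in ["x", "x1", "x11"] if re.fullmatch(r"x1?", s)])  # ['x', 'x1']
```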
Finite State Automata (FSA)
Example:
- Language: $L = \{a^+ b^+\}$, “one or more a’s followed by one or more b’s”
- $L$ is a regular language: it is described by the regular expression $a^+ b^+$
- Note: $L$ is an infinite set of strings, e.g. abbb, aaaab, aabb are in $L$, but *abab and *$\varepsilon$ are not (* = not in $L$)
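Before building the automaton, membership in $L$ can be checked against the regular expression itself. This Python sketch uses `re.fullmatch` on the slide's examples; the function name `in_L` is hypothetical.

```python
import re


def in_L(s: str) -> bool:
    """Membership in L = {a+b+}: one or more a's followed by one or more b's."""
    return re.fullmatch(r"a+b+", s) is not None


for s in ["abbb", "aaaab", "aabb", "abab", ""]:
    print(repr(s), in_L(s))
# 'abbb' True, 'aaaab' True, 'aabb' True, 'abab' False, '' False
```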
Finite State Automata (FSA)
[Figure: state diagram of the FSA for $L = \{a^+ b^+\}$, with states s, x, y and arcs labelled a, a, b, b]
Finite State Automata (FSA)
The FSA shown on the previous slide is:
- an acceptor: no output
  cf. a transducer, which maps input to output (input/output pairs)
- deterministic: no ambiguity, i.e. at any given state and input character, the next state is uniquely determined
  cf. a non-deterministic FSA (NDFSA)
Note: NDFSA are not more powerful than FSA.
Proof: by construction
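The usual construction behind "NDFSA are not more powerful than FSA" is the subset construction: each state of the new deterministic FSA is a set of NDFSA states. The lecture only says "by construction", so naming this technique is our assumption. The Python sketch below uses a hypothetical NDFSA for the same language $a^+b^+$ (state q0 on a may stay in q0 or move to q1) and has no ε-moves, to keep it short. Running it yields three deterministic states, {q0}, {q0,q1} and {q2}, which play the roles of s, x and y.

```python
from collections import deque

# Hypothetical NDFSA for a+b+ (not from the lecture):
# on 'a', state q0 may either stay in q0 or move to q1.
NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q1", "b"): {"q2"},
    ("q2", "b"): {"q2"},
}
NFA_START, NFA_FINALS, ALPHABET = "q0", {"q2"}, "ab"


def subset_construction(delta, start, finals, alphabet):
    """Build an equivalent deterministic FSA whose states are sets of NDFSA states."""
    dfa_start = frozenset({start})
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for ch in alphabet:
            # every NDFSA state reachable on ch from some state in `current`
            target = frozenset(t for q in current for t in delta.get((q, ch), set()))
            if not target:
                continue  # no move defined: such input is rejected
            dfa_delta[(current, ch)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # a set of states is final if it contains at least one NDFSA final state
    dfa_finals = {state for state in seen if state & finals}
    return dfa_start, dfa_delta, dfa_finals, seen


def run_dfa(dfa_start, dfa_delta, dfa_finals, string):
    """Run the deterministic FSA; a missing transition means reject."""
    state = dfa_start
    for ch in string:
        if (state, ch) not in dfa_delta:
            return False
        state = dfa_delta[(state, ch)]
    return state in dfa_finals


start, delta, finals, states = subset_construction(NFA_DELTA, NFA_START, NFA_FINALS, ALPHABET)
print(len(states))  # 3
print([run_dfa(start, delta, finals, w) for w in ["ab", "aabbb", "aba", ""]])  # [True, True, False, False]
```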
Finite State Automata (FSA)
More formally:
- Set of states: {s, x, y}
- Start state: s; end state: y
- Alphabet: {a, b}
- Transition function $\delta$, with signature $\delta$: character $\times$ state $\to$ state
  $\delta(a,s) = x$, $\delta(a,x) = x$, $\delta(b,x) = y$, $\delta(b,y) = y$
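The formal definition translates almost line for line into a table-driven recognizer. In this Python sketch the dictionary `DELTA` is keyed by (character, state), following the slide's signature. Since $\delta$ is only defined for the four listed pairs, a missing entry is treated as rejection; that is our reading of the partial function, not something the slide states.

```python
# delta from the formal definition, keyed (character, state) -> state
DELTA = {
    ("a", "s"): "x",
    ("a", "x"): "x",
    ("b", "x"): "y",
    ("b", "y"): "y",
}
START, FINALS = "s", {"y"}


def accepts(string: str) -> bool:
    """Run the deterministic FSA from the start state over the input."""
    state = START
    for ch in string:
        if (ch, state) not in DELTA:  # delta undefined here: reject
            return False
        state = DELTA[(ch, state)]
    return state in FINALS  # accept only if we end in the end state


print([accepts(w) for w in ["aab", "abbb", "aba", "b", ""]])  # [True, True, False, False, False]
```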
Finite State Automata (FSA)
A possible Prolog encoding strategy:
- Define one predicate for each state, taking one argument (the input string)
- Each predicate consumes an input character and calls the next state with the remaining input string
- Initially, call the start state s:
    fsa(L) :- s(L).
Finite State Automata (FSA)
State s (start state):
    s([a|L]) :- x(L).
  i.e. match an input string beginning with a, and call state x with the remainder of the input
State x:
    x([a|L]) :- x(L).
    x([b|L]) :- y(L).
State y (end state):
    y([]).
    y([b|L]) :- y(L).
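For comparison, the same "one predicate per state" strategy can be mirrored in Python with one function per state, each clause becoming a branch. This is a sketch: because the automaton is deterministic, no backtracking is needed, so each function simply returns the result of the one branch that applies. Python does not eliminate tail calls, so very long inputs would hit the recursion limit; the Prolog version does not have that problem.

```python
def fsa(L: list) -> bool:
    return s(L)  # fsa(L) :- s(L).


def s(L: list) -> bool:
    return bool(L) and L[0] == "a" and x(L[1:])  # s([a|L]) :- x(L).


def x(L: list) -> bool:
    if not L:
        return False      # no clause x([])
    if L[0] == "a":
        return x(L[1:])   # x([a|L]) :- x(L).
    if L[0] == "b":
        return y(L[1:])   # x([b|L]) :- y(L).
    return False


def y(L: list) -> bool:
    if not L:
        return True       # y([]).
    return L[0] == "b" and y(L[1:])  # y([b|L]) :- y(L).


print(fsa(["a", "a", "b"]), fsa(["a", "b", "a"]))  # True False
```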
Finite State Automata (FSA)
Example:
1. ?- fsa([a,a,b]).
2. s([a,a,b]).
3. x([a,b]).
4. x([b]).
5. y([]).
succeeds
Finite State Automata (FSA)
Example:
1. ?- fsa([a,b,a]).
2. s([a,b,a]).
3. x([b,a]).
4. y([a]).
fails: no clause for y matches [a]
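The two traces above can be reproduced mechanically. This hypothetical Python helper `trace` records each state call in Prolog list notation and stops as soon as no clause applies, so it lists the same goals as the slides for both examples. It is a sketch of the slides' traces, not of SWI-Prolog's debugger output.

```python
# Next-state table for the one-predicate-per-state encoding
NEXT = {("s", "a"): "x", ("x", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"}
FINAL = "y"


def plist(chars):
    """Render a list of characters as Prolog prints it, e.g. [a,a,b]."""
    return "[" + ",".join(chars) + "]"


def trace(chars):
    """Return the sequence of calls and whether the query succeeds."""
    calls = [f"fsa({plist(chars)})"]
    state, rest = "s", list(chars)
    calls.append(f"{state}({plist(rest)})")
    while rest and (state, rest[0]) in NEXT:
        state, rest = NEXT[(state, rest[0])], rest[1:]
        calls.append(f"{state}({plist(rest)})")
    return calls, (not rest and state == FINAL)


for query in (["a", "a", "b"], ["a", "b", "a"]):
    calls, ok = trace(query)
    print(calls, "succeeds" if ok else "fails")
```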
Finite State Automata (FSA)
Another possible Prolog encoding strategy:
    fsa(S,L) :- L = [C|M], transition(S,C,T), fsa(T,M).
    fsa(y,[]).            % End state
    transition(s,a,x).    % Encodes the transition function
    transition(x,a,x).
    transition(x,b,y).
    transition(y,b,y).
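Here the automaton is data (the transition/3 facts) and a single generic fsa/2 drives it. A Python sketch of the same idea stores the facts as a set of (state, character, next) triples and, like Prolog's backtracking, tries every matching transition until one leads to acceptance. Because of that search, this encoding would also accept the right strings if the transition relation were non-deterministic; with the lecture's deterministic facts, at most one transition ever matches.

```python
# transition/3 facts as (state, character, next_state) triples
TRANSITIONS = {("s", "a", "x"), ("x", "a", "x"), ("x", "b", "y"), ("y", "b", "y")}
END_STATES = {"y"}  # fsa(y, []).


def fsa(state: str, chars: list) -> bool:
    if not chars:
        return state in END_STATES
    c, rest = chars[0], chars[1:]
    # try each matching transition in turn, as Prolog would on backtracking
    return any(fsa(nxt, rest)
               for (src, ch, nxt) in TRANSITIONS
               if src == state and ch == c)


print(fsa("s", ["a", "a", "b"]), fsa("s", ["a", "b", "a"]))  # True False
```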
Finite State Automata (FSA)
Example:
1. ?- fsa(s,[a,a,b]).
2. transition(s,a,T). T=x
3. fsa(x,[a,b]).
4. transition(x,a,T'). T'=x
5. fsa(x,[b]).
6. transition(x,b,T''). T''=y
7. fsa(y,[]).
succeeds
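Relating this to the homework hint about counting calls, this hypothetical Python helper `goals` lists the fsa/2 and transition/3 goals of the trace above and counts them. It counts only the kinds of goals shown on the slide; SWI-Prolog's debugger may show other calls as well (for example the unification `L = [C|M]`), so its count is not guaranteed to match.

```python
# transition/3 as a lookup table (deterministic: at most one next state)
TRANSITION = {("s", "a"): "x", ("x", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"}
END_STATE = "y"


def plist(chars):
    """Render a list of characters as Prolog prints it, e.g. [a,a,b]."""
    return "[" + ",".join(chars) + "]"


def goals(state, chars):
    """Goals called as in the slide's trace, plus whether the query succeeds."""
    out = [f"fsa({state},{plist(chars)})"]
    while chars:
        out.append(f"transition({state},{chars[0]},T)")
        if (state, chars[0]) not in TRANSITION:
            return out, False  # the transition/3 call fails
        state, chars = TRANSITION[(state, chars[0])], chars[1:]
        out.append(f"fsa({state},{plist(chars)})")
    return out, state == END_STATE


steps, ok = goals("s", ["a", "a", "b"])
for i, g in enumerate(steps, 1):
    print(i, g)
print(len(steps), ok)  # 7 True
```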