Recognising Languages We will tackle the problem of defining languages by considering how we could recognise them. Problem: Is there a method of recognising infinite languages? i.e. given a language description and a string, is there an algorithm which will answer yes or no correctly? We will define an abstract machine which takes a candidate string and produces the answer yes or no. The abstract machine will be the specification of the language.
Finite State Automata A finite state automaton is an abstract model of a simple machine (or computer). The machine can be in a finite number of states. It receives symbols as input, and the result of receiving a particular input in a particular state moves the machine to a specified new state. Certain states are finishing states, and if the machine is in one of those states when the input ends, it has ended successfully (or has accepted the input). Example: automaton A [state diagram not reproduced; its arcs are labelled a and b].
Formal definition of FSAs We present here the special case of a Deterministic FSA (DFSA). As proven in CS2013, DFSAs can recognise the same set of languages as Nondeterministic FSAs (NDFSAs).
DFSA: Formal Definition A DFSA is a 5-tuple (Q, I, F, T, E) where:
Q = states: Q is a finite set;
I = initial state: I is an element of Q;
F = final states: F is a subset of Q;
T = an alphabet;
E = edges: E is a partial function from Q × T to Q.
An FSA can be represented by a labelled, directed graph: a set of nodes (some final; one initial), plus directed arcs (arrows) between nodes, where each arc has a label from the alphabet.
Example: formal definition of A1
Q = {1, 2, 3, 4}
I = 1
F = {4}
T = {a, b}
E = { (1,a,2), (1,b,4), (2,a,3), (2,b,4), (3,a,3), (3,b,3), (4,a,2), (4,b,4) }, where each triple (q,t,p) means E(q,t) = p.
[State diagram of A1 not reproduced.]
What does it mean to accept a string/language? If (x,a,y) is an edge, x is its start state and y is its end state. A path is a sequence of edges such that the end state of one is the start state of the next. Example: path p1 = (2,b,4), (4,a,2), (2,a,3). A path is successful if the start state of the first edge is an initial state, and the end state of the last is a final state. Example: path p2 = (1,b,4), (4,a,2), (2,b,4), (4,b,4). The label of a path is the sequence of edge labels. Example: label(p1) = baa.
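The definitions of path, successful path and label can be checked mechanically. The following is a minimal sketch in Python, assuming the edge set given for A1 in the formal definition; the helper names (is_path, is_successful, label) are illustrative and not part of the course material.

```python
# Edges of the example automaton A1, as (start, label, end) triples.
EDGES = {
    (1, "a", 2), (1, "b", 4), (2, "a", 3), (2, "b", 4),
    (3, "a", 3), (3, "b", 3), (4, "a", 2), (4, "b", 4),
}
INITIAL = 1
FINAL = {4}

def is_path(edge_seq):
    """True if every edge exists and each end state is the next start state."""
    if not edge_seq or any(e not in EDGES for e in edge_seq):
        return False
    return all(prev[2] == nxt[0] for prev, nxt in zip(edge_seq, edge_seq[1:]))

def is_successful(edge_seq):
    """A path is successful if it starts in the initial state and ends in a final state."""
    return (is_path(edge_seq)
            and edge_seq[0][0] == INITIAL
            and edge_seq[-1][2] in FINAL)

def label(edge_seq):
    """The label of a path is the sequence of its edge labels."""
    return "".join(symbol for _, symbol, _ in edge_seq)

p1 = [(2, "b", 4), (4, "a", 2), (2, "a", 3)]
p2 = [(1, "b", 4), (4, "a", 2), (2, "b", 4), (4, "b", 4)]
```

Here p1 is a path but not a successful one (it starts in state 2 and ends in state 3), whereas p2 is successful.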
What does it mean to accept a string/language? A string is accepted by an FSA if it is the label of a successful path. Let A be an FSA. The language accepted by A is the set of strings accepted by A, denoted L(A). Example: babb = label(p2) is accepted by A1.
The language accepted by A1 is the set of strings of a's and b's which end in b, and in which no two a's are adjacent.
Some simple examples (assuming determinism)
1. Draw an FSA to accept the set of bitstrings starting with 0.
2. Draw an FSA to accept the set of bitstrings ending with 0.
3. Draw an FSA to accept the set of bitstrings containing a sequence 00.
4. Draw an FSA to accept the set of bitstrings containing both 1 and 0.
5. Can you draw an FSA to accept the set of bitstrings that contain an equal number of 0 and 1?
L1. Bitstrings starting with 0 [State diagram with states q1, q2, … not reproduced.] Can you make a smaller FSA that accepts the same language?
L2. Bitstrings ending with 0 [State diagram with states q1, q2, … not reproduced.] At home: Can you find a smaller FSA that accepts the same language?
L3. Bitstrings containing 00 [State diagram not reproduced; its states include q1 and q3, and its arcs are labelled 0 and 1.]
L4. Bitstrings containing both 0 and 1
L5. Bitstrings with equal numbers of 0 and 1 This cannot be done: FSAs are not powerful enough. Later we shall meet automata that can do it. Later: Some problems cannot be solved by any automaton.
Recognition Algorithm Problem: Given a DFSA, A = (Q,I,F,T,E), and a string w, determine whether w ∈ L(A). Note: denote the current state by q, and the current input symbol by t. Since A is deterministic, E(q,t) is always either a single state or undefined. If it is undefined, denote it by ⊥ (where ⊥ ∉ Q).
Algorithm:
Add symbol # to the end of w (assuming # ∉ T).
q := initial state
t := first symbol of w#
while (t ≠ # and q ≠ ⊥)
begin
    q := E(q,t)
    t := next symbol of w#
end
return ((t == #) & (q ∈ F))
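The recognition algorithm translates almost line for line into Python. This is a minimal sketch, assuming the automaton A1 from the formal definition, with E stored as a dictionary from (state, symbol) pairs to states. Python's None stands in for the undefined value, and the for loop replaces the explicit end marker #; both are choices made for this sketch, not part of the algorithm as stated.

```python
# A1: the partial function E as a dict from (state, symbol) to next state.
E = {
    (1, "a"): 2, (1, "b"): 4,
    (2, "a"): 3, (2, "b"): 4,
    (3, "a"): 3, (3, "b"): 3,
    (4, "a"): 2, (4, "b"): 4,
}
INITIAL = 1
FINAL = {4}

def accepts(w, edges=E, initial=INITIAL, final=FINAL):
    """Return True if the DFSA accepts the string w."""
    q = initial
    for t in w:
        q = edges.get((q, t))  # None means E(q, t) is undefined
        if q is None:
            return False       # no edge to follow, so w is rejected
    return q in final          # input exhausted: accept iff q is final
```

For example, accepts("babb") holds, while accepts("baa") does not, because the run ends in the non-final state 3. A symbol outside the alphabet, as in accepts("bc"), makes E undefined and the string is rejected.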
Minimum Size of FSAs Let A = (Q,I,F,T,E). Definition: For any two strings x, y in T*, x and y are distinguishable w.r.t. A if there is a string z ∈ T* s.t. exactly one of xz and yz is in L(A). We say that z distinguishes x and y w.r.t. A. This means that with x and y as input, A must end in different states: A has to distinguish x and y in order to give the right results for xz and yz. This is used to prove the next result. Theorem (proof omitted): Let L ⊆ T*. If there is a set of n elements of T* s.t. any two of its elements are distinguishable w.r.t. L (i.e. for some z ∈ T*, exactly one of xz and yz is in L), then any FSA that recognises L must have at least n states.
Applying the theorem (1) L3 (above) = the set of bitstrings containing 00. Distinguishable: all pairs from {11, 10, 00}.
{11,10}: 100 is in L3, 110 is out (z = 0);
{11,00}: 001 is in L3, 111 is out (z = 1);
{10,00}: 001 is in L3, 101 is out (z = 1).
{11,10,00} has 3 elements; hence the DFSA for L3 requires at least 3 states.
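The pairwise check can be automated by searching for a distinguishing suffix. The sketch below assumes a brute-force search over short suffixes; the function names and the length bound max_len are illustrative choices.

```python
from itertools import combinations, product

def in_L3(s):
    """Membership test for L3: bitstrings containing 00."""
    return "00" in s

def distinguishing_suffix(x, y, in_lang, alphabet="01", max_len=3):
    """Return the first z (shortest first) such that exactly one of
    x+z and y+z is in the language, or None if none is found."""
    for n in range(max_len + 1):
        for z in map("".join, product(alphabet, repeat=n)):
            if in_lang(x + z) != in_lang(y + z):
                return z
    return None

# One witness per pair from {11, 10, 00}.
witnesses = {
    (x, y): distinguishing_suffix(x, y, in_L3)
    for x, y in combinations(["11", "10", "00"], 2)
}
```

The search may return a different witness from the one on the slide. For the pairs involving 00, the empty suffix already works, since 00 is in L3 while 11 and 10 are not.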
Applying the theorem (2) L5 again: equal numbers of 0 and 1.
n=2: {01, 001}, so at least 2 states are needed
n=3: {01, 001, 0001}, so at least 3 states are needed
n=4: {01, 001, 0001, 00001}, so at least 4 states are needed
…
For any finite n, there is a set of n strings that are pairwise distinguishable. Hence an FSA for L5 would need more than finitely many states (which is not permitted)!
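The reason every such set is pairwise distinguishable can be made explicit. For x = 0^i 1 and y = 0^j 1 with i < j, the suffix z = 1^(i-1) gives xz equal numbers of 0 and 1, but leaves yz with more 0s than 1s. The following sketch checks this for small n; the choice of suffix is one possible witness, assumed here for illustration.

```python
from itertools import combinations

def in_L5(s):
    """Membership test for L5: equal numbers of 0 and 1."""
    return s.count("0") == s.count("1")

def family(n):
    """The n strings 01, 001, ..., 0^n 1."""
    return ["0" * i + "1" for i in range(1, n + 1)]

def pairwise_distinguishable(n):
    """Check that every pair in family(n) is distinguished by z = 1^(i-1),
    where i is the number of 0s in the shorter string of the pair."""
    for x, y in combinations(family(n), 2):
        z = "1" * (x.count("0") - 1)
        if in_L5(x + z) == in_L5(y + z):
            return False
    return True
```

Since pairwise_distinguishable(n) holds for each n tried, no fixed number of states can suffice, which matches the argument above.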
A taste of the theory of Formal Languages This theorem tells you something about the kind of automaton (in terms of its number of states) that’s required given a particular kind of problem (i.e., a particular kind of language). It also tells you that certain languages cannot be accepted by any FSA.
*Deterministic* FSAs Note: (i) there are no ε-labelled edges; (ii) for any pair of state and symbol (q,t), there is at most one edge (q,t,p); and (iii) there is only one initial state. All three conditions must hold. DFSA and NDFSA stand for deterministic and non-deterministic FSA respectively. At home: revise the old definition of FSA to become the definition of a DFSA. Note: acceptance (of a string or a language) has been defined in a declarative (i.e., non-procedural) way.
Automata with Output (sketch) We now have ways to formally define languages, and ways to automatically test whether a given string is a member of a language. We also have a simple model of a computer. Can we extend this model, so that instead of simply replying "yes" or "no" when presented with input, our abstract machine writes some output from an alphabet, determined by the input string?
Moore Machines A Moore Machine is a 6-tuple (Q, I, T, E, Γ, O), where Q, I, T and E are as for DFSAs, Γ is an alphabet (the output alphabet), and O is a set of pairs (a function O: Q → Γ) defining the output corresponding to each state of the machine. Graph: if (q, x) ∈ O, then draw state q as q/x. Example: print out a 1 every time an aab substring is input. [State diagram not reproduced: four states, 0 to 3, with arcs labelled a and b.] Input aaababaaab gives output 11.
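A Moore machine can be simulated by emitting the output of each state as it is entered. The sketch below is a reconstruction of the aab example: the transition table and the choice of empty output for states 0 to 2 are assumptions, since only the state names and the expected output survive from the slide.

```python
# Hypothetical reconstruction of the "aab" detector (the original diagram
# is not available). State k roughly means "k symbols of aab matched".
E = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
# Output per state: only state 3 (just read "aab") prints a 1.
O = {0: "", 1: "", 2: "", 3: "1"}

def run_moore(w, edges=E, out=O, initial=0):
    """Return the concatenated outputs of all states visited on input w."""
    q = initial
    output = out[q]          # a Moore machine also outputs for its initial state
    for t in w:
        q = edges[(q, t)]
        output += out[q]
    return output
```

With this table, run_moore("aaababaaab") returns "11", in agreement with the example.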
Mealy Machines A Mealy Machine is a 6-tuple (Q, I, T, E, Γ, O), where Q, I, T and E are as for DFSAs, Γ is an alphabet (the output alphabet), and O is a set of triples (a function O: Q × T → Γ) defining the output corresponding to each (state, symbol) pair of the machine. Graph: if (q,t,x) ∈ O, then draw it as an arc from q labelled t/x. Example: read in a binary number (reversed), and output the (reversed) number one larger. [State diagram not reproduced; its arc labels include 0/1, 0/0 and 1/1.] Input … gives … as output.
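A Mealy machine emits one output symbol per transition. The sketch below is one possible machine for the increment example, not necessarily the one drawn on the slide: it assumes two states, named "carry" and "done" here, and reads the number least significant bit first.

```python
# Hypothetical two-state incrementer. In "carry" we still have to add 1;
# in "done" the remaining bits are copied unchanged.
E = {
    ("carry", "0"): "done", ("carry", "1"): "carry",
    ("done", "0"): "done",  ("done", "1"): "done",
}
O = {
    ("carry", "0"): "1", ("carry", "1"): "0",
    ("done", "0"): "0",  ("done", "1"): "1",
}

def run_mealy(w, edges=E, out=O, initial="carry"):
    """Return the output string produced while reading w."""
    q = initial
    output = []
    for t in w:
        output.append(out[(q, t)])  # output depends on state and symbol
        q = edges[(q, t)]
    return "".join(output)

def increment(bits):
    """Add 1 to a binary numeral written most significant bit first."""
    return run_mealy(bits[::-1])[::-1]
```

For example, increment("1011") gives "1100". The output has the same length as the input, so a final carry is lost: increment("111") gives "000".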
Moore-Mealy Equivalence Let M be a Moore machine or a Mealy machine, with output alphabet Γ. Define M^o(w) to be the output of M on w. Let M1 = (Q1, I1, T1, E1, Γ1, O1) be a Moore machine, and M2 = (Q2, I2, T2, E2, Γ2, O2) be a Mealy machine. Let M1^o(ε) = b, i.e. the output of M1 on the empty string, which is the output of its initial state. M1 and M2 are equivalent if T1 = T2 and, for all strings w ∈ T1*, M1^o(w) = b M2^o(w). Theorem (Moore-Mealy Equivalence): If M1 is a Moore machine, then there exists a Mealy machine, M2, equivalent to M1. If M2 is a Mealy machine, then there exists a Moore machine, M1, equivalent to M2.
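One direction of the theorem has a short constructive idea: label each Mealy transition with the Moore output of the state it enters. The sketch below illustrates this standard construction on a hypothetical Moore machine (an aab detector that outputs 0 in every state except the detecting one); the machine and all names are assumptions for illustration, and the check is a finite test, not a proof.

```python
from itertools import product

# Hypothetical Moore machine over {a, b}: outputs "1" in state 3
# (just read aab) and "0" in every other state.
E = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
MOORE_OUT = {0: "0", 1: "0", 2: "0", 3: "1"}
INITIAL = 0

def moore_to_mealy(edges, moore_out):
    """Mealy output on (q, t) is the Moore output of the state reached."""
    return {(q, t): moore_out[p] for (q, t), p in edges.items()}

MEALY_OUT = moore_to_mealy(E, MOORE_OUT)

def run_moore(w):
    q, out = INITIAL, MOORE_OUT[INITIAL]
    for t in w:
        q = E[(q, t)]
        out += MOORE_OUT[q]
    return out

def run_mealy(w):
    q, out = INITIAL, ""
    for t in w:
        out += MEALY_OUT[(q, t)]
        q = E[(q, t)]
    return out

def equivalent_up_to(max_len):
    """Check that Moore output = b + Mealy output for all w up to max_len,
    where b is the Moore output of the initial state."""
    b = MOORE_OUT[INITIAL]
    return all(
        run_moore(w) == b + run_mealy(w)
        for n in range(max_len + 1)
        for w in map("".join, product("ab", repeat=n))
    )
```

The leading b accounts for the fact that a Moore machine produces output for its initial state before reading any input, while a Mealy machine produces nothing until its first transition.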
Varieties of FSAs FSAs with output have applications, e.g. in the design of machines (e.g. a vending machine: reads pounds, pennies, etc.; outputs a chocolate bar), in coding and decoding of messages, etc. Many other variations on the theme of FSAs exist: Markov models (FSAs whose edges have probabilities); Hidden Markov models (with hidden states). These have many applications, e.g. in Natural Language Processing.
Markov Models Markov Models model the probability of one state following another. (Assumption: only the previous state matters.) Example: a simple weather model:
Applications of Markov models In Natural Language Processing, Markov Models are often used. For instance, they can model how a word is pronounced as phones (“speech sounds”): a state transition model, where each state is a phone. The probability of a path through the model is the product of the probabilities of the transitions.
Markov models for word pronunciation
Pronunciation Example Suppose this string of phones has been recognized: [aa n iy dh ax] (assume we know this for sure). Three different utterances (i.e., strings of words) may explain what was recognized:
1. I/[aa] need/[n iy] the/[dh ax]
2. I/[aa] the/[n iy] the/[dh ax]
3. On/[aa n] he/[iy] the/[dh ax]
Probabilities (based on the Markov Model) Probabilities of paths:
– I/[aa] need/[n iy] the/[dh ax]: .2 * .12 * .92 * .77 = .017
– I/[aa] the/[n iy] the/[dh ax]: .2 * .08 * .12 * .92 * .77 = .001
– On/[aa n] he/[iy] the/[dh ax]: 1 * .1 * .92 * .77 = .071
Ranked Probabilities Ranked list of paths:
On he the (.071)
I need the (.017)
I the the (.001)
etc.
The system would only keep the top N.
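The path probabilities and their ranking can be reproduced in a few lines. This sketch takes the transition probabilities exactly as listed for the three candidate utterances; the function names and the dictionary layout are illustrative assumptions.

```python
from math import prod

# Transition probabilities along each candidate path, as listed above.
paths = {
    "I need the": [0.2, 0.12, 0.92, 0.77],
    "I the the": [0.2, 0.08, 0.12, 0.92, 0.77],
    "On he the": [1.0, 0.1, 0.92, 0.77],
}

def path_probability(transitions):
    """Probability of a path = product of its transition probabilities."""
    return prod(transitions)

def top_n(candidates, n):
    """Return the n most probable (utterance, probability) pairs."""
    scored = {words: path_probability(ps) for words, ps in candidates.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Rounded to three decimal places, the products are .017, .001 and .071, and top_n(paths, 2) keeps "On he the" and "I need the", in that order.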
These probabilities can help a speech recognition system. Other information is needed as well:
– Probability of each phone
– Probability of each word
– Probability that one word follows another (can “on he the” be part of an English sentence?)
More about this in level 4 (NLP course).
Footnote (not for exam) Hidden Markov models: States S1,..,Sn cannot be observed. Observations O1,..,Om can be made. Given an observation O, each State has a probability. But the probability of a State depends on the previous state as well. For those of you who know them: this makes Hidden Markov Models a special case of a Bayesian network.
Footnote (not for exam): Hidden Markov Model Definition Hidden Markov Model = (A, B, π)
A = {a_ij}: state transition probabilities, a_ij = P(q_{t+1} = S_j | q_t = S_i)
π = {π_i}: initial state distribution, π_i = P(q_1 = S_i)
B = {b_i(v)}: observation probability distribution, b_i(v) = P(O_t = v | q_t = S_i)
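To see how the three components (A, B, π) are used together, here is a small sketch. Every number in it is made up purely for illustration, as are the state names, the observation symbols x and y, and the function names. It computes the joint probability of a state sequence and an observation sequence, and then the total probability of an observation sequence by the standard forward recursion.

```python
# Hypothetical two-state HMM; all probabilities below are invented.
states = ["S1", "S2"]
pi = {"S1": 0.6, "S2": 0.4}                      # pi_i = P(q_1 = S_i)
A = {"S1": {"S1": 0.7, "S2": 0.3},               # a_ij = P(q_{t+1}=S_j | q_t=S_i)
     "S2": {"S1": 0.4, "S2": 0.6}}
B = {"S1": {"x": 0.9, "y": 0.1},                 # b_i(v) = P(O_t = v | q_t = S_i)
     "S2": {"x": 0.2, "y": 0.8}}

def joint_probability(state_seq, obs_seq):
    """P(states, observations): initial term times one (a * b) factor per step."""
    p = pi[state_seq[0]] * B[state_seq[0]][obs_seq[0]]
    for prev, cur, obs in zip(state_seq, state_seq[1:], obs_seq[1:]):
        p *= A[prev][cur] * B[cur][obs]
    return p

def observation_probability(obs_seq):
    """Forward recursion: P(observations), summed over all hidden state paths."""
    alpha = {s: pi[s] * B[s][obs_seq[0]] for s in states}
    for obs in obs_seq[1:]:
        alpha = {s: sum(alpha[r] * A[r][s] for r in states) * B[s][obs]
                 for s in states}
    return sum(alpha.values())
```

The forward recursion gives the same value as summing joint_probability over every possible state sequence, but without enumerating them.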
Summing up
Finite automata are flexible tools
– Important in practical applications
DFSAs are relatively simple automata
– Variants of DFSAs can produce output (and/or use probabilities)
Later in this course: more complex automata called Turing Machines (TM)
– Some variants of TMs produce output