12. Automata and Regular Expressions


1 12. Automata and Regular Expressions
Discrete Maths 242/ , Semester 2
Recognizing input using:
  automata: a graph-based technique
  regular expressions: an algebraic technique equivalent to automata

2 Overview
  1. Introduction to Automata
  2. Representing Automata
  3. The ‘aeiou’ Automaton
  4. Generating Output
  5. Deterministic and Nondeterministic Automata
  6. Regular Expressions
  7. UNIX Regular Expressions
  8. From REs to Automata
  9. More Information

3 1. Introduction to Automata
A finite state automaton represents a problem as a series of states and transitions between the states:
  the automaton starts in an initial state
  input causes a transition from the current state to another
  a state may be accepting
  the automaton can terminate successfully when it enters an accepting state (if it wants to)

4 1.1. The ‘even-odd’ Automaton
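[Figure: the start arrow points to state evenA; ‘a’ arrows run between evenA and oddA in both directions; each state has a ‘b’ arrow looping back to itself.]
The states are the ovals. The transitions are the arrows, labelled with the input that ‘triggers’ them. The ‘oddA’ state is accepting.
continued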

5 Execution Sequence
Input read so far    Move to State
(none)               evenA    initial state
b                    evenA
b a                  oddA     the automaton could choose to terminate here
b a b                oddA
b a b a              evenA
b a b a a            oddA     stops since no more input

6 1.2. The Light Switch Automaton
[Figure: states ‘off’ and ‘on’ with a start arrow, linked by two transitions labelled ‘press’.]

7 1.3. Simplified TCP Automaton
[Figure: simplified TCP automaton with a start arrow; the state labels did not survive in this transcript.]

8 1.4 Game Playing Automaton
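[Figure: game playing automaton with a start arrow; the state labels did not survive in this transcript.]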

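9 1.5. Wall Bouncing Robot
[Figure: wall bouncing robot automaton with a start arrow; the state labels did not survive in this transcript.]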

10 1.6. Why are Automata Useful?
Automata are a very good way of modeling finite-state systems which change state due to input.
Examples:
  text editors, compilers, UNIX tools like grep
  communications protocols (e.g. TCP)
  game states
  digital hardware components (e.g. adders, RAM)
  robots
These are very different applications.

11 2. Representing Automata
Automata have a mathematical basis which allows them to be analysed, e.g.:
  prove that they accept correct input
  prove that they do not accept incorrect input
Automata can be manipulated to simplify them, and they can be automatically converted into code.

12 2.1. A Mathematical Coding
We can represent an automaton in terms of sets and mathematical functions. The ‘even-odd’ automaton is:
  startSet = { evenA }
  acceptSet = { oddA }
  nextState(evenA, b) => evenA
  nextState(evenA, a) => oddA
  nextState(oddA, b) => oddA
  nextState(oddA, a) => evenA
continued

13 Analysis of the mathematical form can show that the ‘even-odd’ automaton only accepts strings which contain an odd number of ‘a’s,
e.g. babaa, abb, abaab, aabba, aaaaba, …

14 2.2. Automaton in Code
It is easy to (automatically) translate an automaton into code, but an automaton graph does not contain all the details needed for a program.
The main extra coding issues:
  what to do when we enter an accepting state?
  what to do when the input cannot be processed? (e.g. abzz is entered)

15 Encoding the ‘even-odd’ Automaton
    #include <stdio.h>
    #include <stdlib.h>

    enum state {evenA, oddA};             // possible states

    enum state nextState(enum state s, int ch);
    int acceptable(enum state s);

    int main(void)
    {
        enum state currState = evenA;     // start state
        int isAccepting = 0;              // false
        int ch;

        while ((ch = getchar()) != EOF) {
            currState = nextState(currState, ch);
            isAccepting = acceptable(currState);
        }
        // the accepting state is only used at the end of input
        if (isAccepting)
            printf("accepted\n");
        else
            printf("not accepted\n");
        return 0;
    }
continued

16
    enum state nextState(enum state s, int ch)
    {
        if ((s == evenA) && (ch == 'b')) return evenA;
        if ((s == evenA) && (ch == 'a')) return oddA;
        if ((s == oddA) && (ch == 'b')) return oddA;
        if ((s == oddA) && (ch == 'a')) return evenA;
        // simple handling of incorrect input
        printf("Illegal Input");
        exit(1);
    }
continued

17
    int acceptable(enum state s)
    {
        if (s == oddA)
            return 1;    // oddA is an accepting state
        return 0;
    }
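The if-chain in nextState() can also be written as a lookup table, which stays closer to the mathematical coding of section 2.1. The following is a minimal sketch, not part of the original slides: the names eo_state, eo_next and accepts_odd_a are hypothetical, and it reads a C string rather than standard input so that it can be called directly. Input other than ‘a’ or ‘b’ is simply rejected instead of calling exit().

```c
#include <stddef.h>

/* States of the 'even-odd' automaton (hypothetical names). */
enum eo_state { EVEN_A, ODD_A };

/* Transition table: rows are states, column 0 is input 'a', column 1 is 'b'. */
static const enum eo_state eo_next[2][2] = {
    /* EVEN_A */ { ODD_A,  EVEN_A },
    /* ODD_A  */ { EVEN_A, ODD_A  },
};

/* Returns 1 if s uses only 'a'/'b' and contains an odd number of 'a's,
   otherwise 0. */
int accepts_odd_a(const char *s)
{
    enum eo_state st = EVEN_A;                /* startSet = { evenA } */
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] != 'a' && s[i] != 'b')
            return 0;                         /* illegal input, e.g. 'z' */
        st = eo_next[st][s[i] == 'b'];        /* pick column 0 or 1 */
    }
    return st == ODD_A;                       /* acceptSet = { oddA } */
}
```

For example, accepts_odd_a("babaa") returns 1, while accepts_odd_a("abzz") returns 0 because of the illegal ‘z’.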

18 3. The ‘aeiou’ Automaton
What English words contain the five vowels (a, e, i, o, u) in order?
Some words that match:
  abstemious
  facetious
  sacrilegious

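19 3.1. Automaton Graph
L = all letters
[Figure: states 0 to 5 in a row, starting at 0. Arrows labelled a, e, i, o, u lead from each state to the next; states 0 to 4 loop back to themselves on L - a, L - e, L - i, L - o and L - u respectively.]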

20 3.2. Execution Sequence (1)
Input read so far      Move to State
(none)                 0
f                      0
f a                    1
f a c                  1
continued

21
Input read so far      Move to State
f a c e                2
f a c e t              2
f a c e t i            3
f a c e t i o          4
f a c e t i o u        5    the automaton can terminate here; no need to process more input

22 Execution Sequence (2)
Input read so far      Move to State
(none)                 0
a                      1
a n                    1
a n d                  1
continued

23
Input read so far      Move to State
a n d r                1
a n d r e              2
a n d r e w            2, and end of input means failure

24 3.3. Translation to Code
    #include <stdio.h>
    #include <stdlib.h>

    enum state {S0, S1, S2, S3, S4, S5};  // poss. states 0..5

    enum state nextState(enum state s, int ch);
    int acceptable(enum state s);

    int main(void)
    {
        enum state currState = S0;        // start state
        int isAccepting = 0;              // false
        int ch;

        // stop processing when the accepting state is entered
        while (((ch = getchar()) != EOF) && !isAccepting) {
            currState = nextState(currState, ch);
            isAccepting = acceptable(currState);
        }
        if (isAccepting)
            printf("accepted\n");
        else
            printf("not accepted\n");
        return 0;
    }
continued

25
    enum state nextState(enum state s, int ch)
    {
        if (s == 0) {
            if (ch == 'a') return 1;
            else return 0;    // input is L-a
        }
        if (s == 1) {
            if (ch == 'e') return 2;
            else return 1;    // input is L-e
        }
        if (s == 2) {
            if (ch == 'i') return 3;
            else return 2;    // input is L-i
        }
        :
continued

26
        :
        if (s == 3) {
            if (ch == 'o') return 4;
            else return 3;    // input is L-o
        }
        if (s == 4) {
            if (ch == 'u') return 5;
            else return 4;    // input is L-u
        }
        // simple handling of incorrect input
        printf("Illegal Input");
        exit(1);
    } // end of nextState()

27
    int acceptable(enum state s)
    {
        if (s == 5)
            return 1;    // 5 is an accepting state
        return 0;
    }

28 4. Generating Output
One possible extension to the basic automaton idea is to allow output: when a transition is ‘triggered’, there can be optional output as well.
Automata which generate output are sometimes called Finite State Machines (FSMs).

29 4.1. ‘even-odd’ with Output
[Figure: the ‘even-odd’ automaton, with the ‘a’ arrow from evenA to oddA relabelled a/1.]
When the ‘a’ transition is triggered out of the evenA state, a ‘1’ is output.

30 4.2. Mathematical Coding
Add an ‘output’ mathematical function to the automaton representation:
  output(evenA, a) => 1

31 4.3. Extending the C Coding The while loop for ‘even-odd’ will become:
    :
    while ((ch = getchar()) != EOF) {
        output(currState, ch);    // emit any output before changing state
        currState = nextState(currState, ch);
        isAccepting = acceptable(currState);
    }
    :
continued

32 The output() C function:
    void output(enum state s, int ch)
    {
        if ((s == evenA) && (ch == 'a'))
            putchar('1');
    }

33 5. Deterministic and Nondeterministic Automata
[Figure: state ‘s’ with two outgoing transitions, labelled ‘a’ and ‘w’.]
We have been writing deterministic automata so far: for an input read by a state, there is at most one transition that can be fired.
In the figure, state ‘s’ can process input ‘a’ and ‘w’, and fails for anything else.

34 Nondeterministic Automata
[Figure: state S with outgoing transitions to states V, T and U, labelled a, x and x.]
A nondeterministic (ND) automaton can have 2 or more transitions with the same label leaving a state.
Problem: if state S sees input ‘x’, then which transition should it use?

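35 5.1. The ‘man’ Automaton
Accept all strings that contain “man”.
This is hard to write as a deterministic automaton. The following has bugs:
[Figure, marked WRONG: states 0 to 3, starting at 0, joined by transitions m, a, n; further transitions are labelled L - m, L - a and L - n.]
continued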

36 The input string command will get stuck at state 0:
[Figure: trace of the states visited for the input c o m m a n d, with the note ‘the problem starts here’.]

37 5.2. A ND Automaton Solution
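[Figure: states 0 to 3, starting at 0, joined by transitions m, a, n; state 0 also has a transition back to itself.]
It is nondeterministic because an ‘m’ input in state 0 can be dealt with by two transitions: a transition back to state 0, or a transition to state 1.
continued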

38 Processing command input:
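[Figure: tree of the possible state sequences for the input c o m m a n d. One branch reaches state 3, the accepting state; another branch fails and rejects the input.]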

39 5.3. Executing ND Automata
It is difficult to code ND automata in conventional languages, such as C. Two different coding approaches:
  1. When an input arrives, execute all transitions in parallel. See which succeeds.
  2. When an input arrives, try one transition. If it leads to failure, then backtrack and try another transition.
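Approach 1 can be sketched in C by tracking the set of states the automaton could currently be in, instead of a single state. The sketch below assumes the ‘man’ automaton of section 5.2 (state 0 loops on every input and may also move to state 1 on ‘m’; states 1 and 2 advance on ‘a’ and ‘n’; state 3 is accepting). The function name contains_man_nd and the bitmask encoding are illustrative choices, not part of the slides.

```c
/* Simulates the nondeterministic 'man' automaton by keeping the set of
   states it could be in, stored as a bitmask (bit i stands for state i). */
int contains_man_nd(const char *s)
{
    unsigned active = 1u << 0;                /* start in state 0 only */
    for (; *s != '\0'; s++) {
        unsigned next = 0;
        if (active & (1u << 0)) {
            next |= 1u << 0;                  /* state 0 loops on any input */
            if (*s == 'm')
                next |= 1u << 1;              /* ... and may also move to 1 */
        }
        if ((active & (1u << 1)) && *s == 'a')
            next |= 1u << 2;
        if ((active & (1u << 2)) && *s == 'n')
            next |= 1u << 3;
        if (next & (1u << 3))
            return 1;                         /* accepting state reached */
        active = next;                        /* branches with no move die */
    }
    return 0;                                 /* input ended without accepting */
}
```

For the input command, the active set goes {0}, {0}, {0,1}, {0,1}, {0,2} and then reaches state 3 on ‘n’, so the function returns 1. Branches that cannot make a move simply drop out of the set, which replaces the explicit backtracking of approach 2.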

40 5.4. Why use ND Automata?
With nondeterminism, some problems are easier to solve/model.
Nondeterminism is common in some application areas, such as AI, graph search, and compilers.
continued

41 It is possible to translate a ND automaton into a (larger, more complex) deterministic one.
In mathematical terms, ND automata and deterministic automata are equivalent: they can be used to model all the same problems.
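As an illustration of this equivalence, here is a hand-derived deterministic automaton for the ‘man’ problem of section 5.1. It is a sketch rather than the slides' own solution: its four states are assumed to stand for the ND state sets {0}, {0,1}, {0,2} and {0,3}, and the function name contains_man_dfa is hypothetical.

```c
/* Deterministic recogniser for strings that contain "man".
   state 0: nothing useful seen, 1: seen "m", 2: seen "ma", 3: seen "man". */
int contains_man_dfa(const char *s)
{
    int state = 0;
    for (; *s != '\0' && state != 3; s++) {
        char c = *s;
        switch (state) {
        case 0:
            state = (c == 'm') ? 1 : 0;
            break;
        case 1:
            /* a repeated 'm' keeps us in state 1 rather than restarting */
            state = (c == 'a') ? 2 : (c == 'm') ? 1 : 0;
            break;
        case 2:
            /* an 'm' here may begin a fresh "man", so fall back to 1 */
            state = (c == 'n') ? 3 : (c == 'm') ? 1 : 0;
            break;
        }
    }
    return state == 3;                        /* 3 is the accepting state */
}
```

Note that a mismatch in state 1 or 2 falls back to state 1 rather than state 0 when the input is itself an ‘m’, which is what lets command be accepted.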

42 6. Regular Expressions (REs)
REs are an algebraic way of specifying how to recognise input.
  ‘algebraic’ means that the recognition pattern is defined using RE operands and operators
REs are equivalent to automata.
  REs and automata can be used on all the same problems

43 6.1. REs in grep
grep searches input lines, a line at a time. If the line contains a string that matches grep's RE (pattern), then the line is output.
  input lines (e.g. from a file) -> grep "RE" -> output matching lines (e.g. to a file)
Example input lines:
  hello andy
  my name is andy
  my bye byhe
continued

44 Examples
Input lines:
  hello andy
  my name is andy
  my bye byhe
grep "and" outputs:
  hello andy
  my name is andy
grep -E "an|my" outputs ("|" means "or"):
  hello andy
  my name is andy
  my bye byhe
continued

45
Input lines:
  hello andy
  my name is andy
  my bye byhe
grep "hel*" outputs ("*" means "0 or more"):
  hello andy
  my bye byhe
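The same line-by-line matching can be sketched in C with the POSIX <regex.h> functions regcomp(), regexec() and regfree(). This only approximates what grep does and is not its implementation; the helper name line_matches is hypothetical, and REG_EXTENDED is assumed so that ‘|’ behaves as it does in grep -E.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if line contains a match for the POSIX extended RE pattern,
   0 if it does not, and -1 if the pattern fails to compile. */
int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    /* REG_NOSUB: we only need a yes/no answer, not match positions */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int found = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

With the three input lines above, line_matches("an|my", "my bye byhe") returns 1 and line_matches("and", "my bye byhe") returns 0. A grep-like loop would call it once per input line and print the lines for which it returns 1.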

46 6.2. Why use REs?
They are very useful for expressing patterns that recognise textual input. For example, REs are used in:
  editors
  compilers
  web-based search engines
  communication protocols

47 6.3. The RE Language
The RE language is an algebraic way of specifying how to recognise input.
  ‘algebraic’ means that the recognition pattern is defined using RE operands and operators

48 RE Operands
There are 4 basic kinds of operands:
  characters (e.g. ‘a’, ‘1’, ‘(’)
  the symbol ε (epsilon; means the empty string ‘’)
  the symbol {} (means the empty set)
  variables, which can be assigned a RE: variable = RE

49 RE Operators
There are three basic operators:
  union ‘|’
  concatenation
  closure ‘*’

50 Concatenation
S T
  this RE will use the S RE followed by the T RE to match against strings
What string is matched by the RE "abc"?
  it is equivalent to: 'a' followed by 'b' followed by 'c'

51 6.4. REs for C Identifiers
We define two RE variables, letter and digit:
  letter = A | B | C | D ... Z | a | b | c | d .... z
  digit = 0 | 1 | 2 | ... 9
ident is defined using letter and digit:
  ident = letter ( letter | digit )*
continued

52 Strings matched by ident include:
  ab345    w    h5g
Strings not matched:
  2    $abc    ****
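The ident definition translates almost directly into a hand-written matcher. The following is a minimal sketch, assuming the default "C" locale so that isalpha() and isalnum() cover exactly the letters and digits listed in the letter and digit variables; the function name is_ident is hypothetical.

```c
#include <ctype.h>

/* Returns 1 if the whole of s matches ident = letter ( letter | digit )*,
   otherwise 0. An empty string does not match. */
int is_ident(const char *s)
{
    /* the first character must be a letter */
    if (!isalpha((unsigned char)s[0]))
        return 0;
    /* every remaining character must be a letter or a digit: the closure */
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s))
            return 0;
    return 1;
}
```

This follows the definition on the slide, which leaves out the underscore that real C identifiers also allow.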

53 7. UNIX Regular Expressions
Different UNIX tools use slightly different extensions of the basic RE language:
  vi, awk, sed, grep, egrep, etc.
Extra operands include:
  character classes
  line start ‘^’ and end ‘$’ symbols
  the wild card symbol ‘.’
  additional operators, R? and R+

54 7.1. Character Classes
The character class [a1a2...an] stands for a1 | a2 | ... | an
a1-an stands for the set of characters between a1 and an
  e.g. [A-Z]  [a-z0-9]

55 7.2. Line Start and End
The ‘^’ matches the beginning of the line, ‘$’ matches the end, e.g.
  grep '^andr' /usr/share/dict/words
  grep '^[washingto]*$' /usr/share/dict/words

56 Example as a Diagram
grep "^andr" /usr/share/dict/words
Input lines (/usr/share/dict/words):
  A
  A's
  AOL
  AOL's
  :
Output lines:
  androgen
  androgen's
  androgynous
  android
  android's
  androids

57 7.3. Wild Card Symbol
The ‘.’ stands for any character except the newline, e.g.
  grep '^a..b.$' chapter1.txt
  grep 't.*t.*t' manual

58
grep "^a..b.$" /usr/share/dict/words
Input lines (/usr/share/dict/words):
  A
  A's
  AOL
  AOL's
  :
Output lines:
  adobe
  alibi
  ameba

59 7.4. R? and R+
R? stands for ε | R (0 or 1 R)
R+ stands for R | RR | RRR | ..., which can also be written as R R* (one or more occurrences of R)
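The UNIX extensions in this section can be tried out from C through the POSIX <regex.h> interface. The sketch below is an assumption-laden illustration, not how grep itself is written: the helper name count_matching is hypothetical, REG_EXTENDED is assumed so that ‘?’ and ‘+’ are treated as operators, and the words color, colour and colouur used afterwards are made-up test strings.

```c
#include <regex.h>
#include <stddef.h>

/* Counts how many of the n strings in words contain a match for the POSIX
   extended RE pattern; returns -1 if the pattern does not compile. */
int count_matching(const char *pattern, const char *const words[], size_t n)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if (regexec(&re, words[i], 0, NULL, 0) == 0)
            count++;                          /* this word matched */
    regfree(&re);
    return count;
}
```

Given the word list adobe, alibi, ameba, android, abc, color, colour, colouur, the pattern "^a..b.$" counts 3 words, "^colou?r$" counts 2 (color and colour), and "^colou+r$" counts 2 (colour and colouur).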

60 8. From REs to Automata
The translation uses a special kind of ND automaton which uses ε-transitions. Automata of this type are sometimes called ε-NFAs.
The translation steps are:
  RE -> ε-NFA
  ε-NFA -> ND automaton
  ND automaton -> deterministic automaton
  deterministic automaton -> code

61 ε-NFAs
An ε-NFA allows a transition to use an ε label.
A transition using an ε label can be triggered without having to match any input.

62 ε-NFA Example
a*b | b*a is accepted by the following ε-NFA:
[Figure: ε-NFA with states 1 to 6, starting at state 1; transitions are labelled a, b and ε, and the point where nondeterminism occurs is marked.]
Example input: "bbba"
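An automaton with ε-transitions for a*b | b*a can be simulated by tracking a set of states and following ε-transitions without consuming input. The layout used below is an assumption, since the diagram did not survive in this transcript: state 1 has ε-transitions to states 2 and 4; state 2 loops on ‘a’ and moves to 3 on ‘b’; state 4 loops on ‘b’ and moves to 5 on ‘a’; states 3 and 5 have ε-transitions to the accepting state 6. The names BIT, e_closure and matches_enfa are hypothetical.

```c
/* Bit i of a state set stands for state i (states are numbered 1..6). */
#define BIT(i) (1u << (i))

/* Adds every state reachable through ε-transitions alone. */
static unsigned e_closure(unsigned set)
{
    unsigned prev;
    do {
        prev = set;
        if (set & BIT(1)) set |= BIT(2) | BIT(4);   /* 1 -ε-> 2, 1 -ε-> 4 */
        if (set & BIT(3)) set |= BIT(6);            /* 3 -ε-> 6 */
        if (set & BIT(5)) set |= BIT(6);            /* 5 -ε-> 6 */
    } while (set != prev);                          /* repeat until stable */
    return set;
}

/* Returns 1 if the whole of s matches a*b | b*a, otherwise 0. */
int matches_enfa(const char *s)
{
    unsigned active = e_closure(BIT(1));            /* start state is 1 */
    for (; *s != '\0'; s++) {
        unsigned next = 0;
        if (*s == 'a') {
            if (active & BIT(2)) next |= BIT(2);    /* a* loop */
            if (active & BIT(4)) next |= BIT(5);    /* final a of b*a */
        } else if (*s == 'b') {
            if (active & BIT(2)) next |= BIT(3);    /* final b of a*b */
            if (active & BIT(4)) next |= BIT(4);    /* b* loop */
        }
        active = e_closure(next);
    }
    return (active & BIT(6)) != 0;                  /* 6 is accepting */
}
```

For the example input "bbba" the active set starts as {1,2,4}, becomes {3,4,6} after the first ‘b’, {4} after the next two, and {5,6} after the ‘a’, so the input is accepted.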

63 9. More Information
  Johnsonbaugh, R. Discrete Mathematics, Prentice Hall, chapter 10.
  Rosen, Kenneth H. Discrete Mathematics and its Applications, McGraw Hill, 2007, 7th edition, chapter 13, sections 13.2 – 13.3.

