Language Recognizer Connecting Type 3 languages and Finite State Automata Copyright © 2008-2014 – Curt Hill.


1 Language Recognizer Connecting Type 3 languages and Finite State Automata Copyright © 2008-2014 – Curt Hill

2 Introduction
Kleene showed that a Finite State Automaton can recognize a particular class of languages
–This is Kleene’s Theorem
Each language in this class may be built up using only the following:
–The empty set ∅
–The empty string
–All single characters from the alphabet
–Union
–Concatenation
–Kleene closure
Three operations, three starting points

3 Regular Sets
A regular set is any set that can be constructed using the three starting points and three operations just given
Thus every regular set is the language generated by a regular grammar (type 3) and recognized by an FSA
Another way to specify these regular sets is by using regular expressions
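The three operations can be tried out directly on small sets of strings. The following is a minimal Python sketch, not part of the original slides; the helper names `concat` and `closure` are made up for illustration, and because the Kleene closure of a non-empty set is infinite, `closure` takes an assumed length bound `max_len`.

```python
def concat(A, B):
    """Concatenation of two sets: every string of A followed by every string of B."""
    return {a + b for a in A for b in B}

def closure(A, max_len):
    """Kleene closure of A, truncated to strings of at most max_len characters."""
    result = {""}        # zero copies gives the empty string
    frontier = {""}
    while frontier:
        # Append one more copy of A, keeping only new strings within the bound.
        frontier = {s for s in concat(frontier, A) if len(s) <= max_len} - result
        result |= frontier
    return result

union = {"b"} | {"c"}                  # union is ordinary set union
print(concat({"a"}, union))            # the strings ab and ac
print(sorted(closure({"ab"}, 4)))      # "", "ab", "abab"
```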

4 Regular Expressions
There are two common understandings of regular expressions
–These two are fundamentally related but have different purposes
A means of specifying a set of strings
–This will be the principal meaning for this class
A means of specifying a string to be searched for within a document
–Much more common

5 Set of Strings
The text uses the following notation:
Concatenation
–Merely the writing of two items next to each other
Union
–Symbol: ∪ signifying that either of two sets may be used
Kleene Closure
–Symbol: * signifying that zero or more copies may be concatenated together
Parentheses for grouping

6 Examples
An alphabet contains a, b, c
The string aac is the concatenation of three letters
The expression a(b ∪ c) represents two strings, ab and ac
The expression a(b)* represents every string starting with an a and followed by zero or more bs
a(a ∪ b ∪ c)*c represents all the strings that start with a and end with c
(a ∪ b ∪ c)* is the set of all strings over the alphabet
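These examples can be checked by brute force. The sketch below is an illustration only: it writes the union as `|` (the search-string spelling), uses Python's `re.fullmatch` so that the whole string must belong to the set, and lists the members up to an assumed length limit. The function name `language` is hypothetical.

```python
import re
from itertools import product

ALPHABET = "abc"

def language(pattern, max_len):
    """List every string over ALPHABET, up to max_len long, that the pattern matches in full."""
    compiled = re.compile(pattern)
    found = []
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                found.append(s)
    return found

print(language("a(b|c)", 3))       # exactly two strings
print(language("a(b)*", 3))        # an a followed by zero or more bs
print(language("a(a|b|c)*c", 3))   # starts with a, ends with c
print(len(language("(a|b|c)*", 2)))  # every string of length 0, 1 or 2
```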

7 Search Strings
Fundamentally the same but modified to the task at hand
–Mathematics is not concerned with beginning and end of lines, special characters or characters not on a keyboard
The ∪ is replaced by the |
Concatenation and Kleene Closure are similar
Many special characters

8 Specials
The special characters include
–[ ]\^|*$.?+(){}
Any other character just matches itself
Since many of these characters are valuable in strings, the escape is used to match them
Most of these are for the special requirements of finding an element of this set in a much larger piece of text or a document

9 Escape
The backslash character is the escape
Thus to look for an asterisk (a special) in a string it must be escaped: \*
–This allows a search to find the asterisk
The C family uses some of the same escape sequences:
–\n newline or linefeed
–\t tab
–\r carriage return
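A short sketch of escaping, using Python's `re` module as one concrete regular expression engine (the slides do not name a particular tool, so the choice of Python is an assumption). It shows the escaped asterisk, what happens to an unescaped one, and the `\t` escape sequence.

```python
import re

text = "2 * 3 = 6"

# Escaped: \* matches a literal asterisk.
star = re.search(r"\*", text)
print(star.start())   # position of the asterisk in text

# Unescaped, a lone * has nothing to repeat, so Python rejects the pattern.
try:
    re.compile("*")
    bare_star_ok = True
except re.error:
    bare_star_ok = False

# \t in a pattern matches a tab character.
tab = re.search(r"\t", "name\tvalue")

# re.escape adds the backslashes for every special character in a string.
escaped = re.escape("a*b")
print(escaped)
```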

10 Positioning
There are two specials that force a position
–^ matches the beginning of the line
–$ matches the end of the line
Both of these match a position rather than a character
Without these a pattern could match anywhere within a string
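The effect of the two anchors can be seen in a small sketch, again assuming Python's `re` module; the sample line and the helper name `found` are made up for illustration.

```python
import re

line = "the cat sat"

def found(pattern, text):
    """True when the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

print(found(r"cat", line))     # unanchored: may match anywhere in the line
print(found(r"^cat", line))    # must start at the beginning of the line
print(found(r"^the", line))
print(found(r"sat$", line))    # must finish at the end of the line
print(found(r"^cat$", "cat"))  # both anchors: the whole line must be cat
```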

11 Repetition
There are three repetition characters which are more general
Closure is the *
–It represents zero or more repetitions of the previous item
–Kleene star
The + represents one or more repetitions of the previous item
The ? represents zero or one occurrences of the previous item

12 Examples
~* matches any number (including zero) of successive tildes
\-* matches zero or more dashes
.+ matches one or more of any character
hats? matches either hat or hats
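The four patterns above can be exercised as follows. This is a sketch assuming Python's `re` module; `fullmatch` is used so that each test string must be matched in its entirety, and the helper name `matches` is hypothetical.

```python
import re

def matches(pattern, s):
    """True when the whole string s is matched by the pattern."""
    return re.fullmatch(pattern, s) is not None

print(matches(r"~*", ""))        # zero tildes is allowed
print(matches(r"~*", "~~~"))
print(matches(r"\-*", "---"))
print(matches(r".+", ""))        # + needs at least one character
print(matches(r".+", "x"))
print(matches(r"hats?", "hat"))
print(matches(r"hats?", "hats"))
print(matches(r"hats?", "hatss"))  # ? allows at most one s
```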

13 Grouping
So far the repetitions could only be applied to a single character
What is next needed is some type of grouping
This is provided by parentheses
Enclosing a pattern in parentheses makes it a group
This group can then be followed by a repetition character

14 Examples
(\*\-)* will match
–*-
–*-*-
–*-*-*-
–etc, as well as the empty string (zero repetitions)
The * is greedy – it will try to match as many of these as is possible
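Greedy matching of a group can be observed in a short sketch, assuming Python's `re` module; the sample strings are made up. `re.match` starts at the beginning of the string and reports how much of it the pattern consumed.

```python
import re

pattern = re.compile(r"(\*\-)*")

# Greedy: the group is repeated as many times as the text allows.
greedy = pattern.match("*-*-*-x")
print(greedy.group(0))

# Zero repetitions also count, so the pattern matches the empty
# string at the start of text that contains no *- pair at all.
empty = pattern.match("xyz")
print(repr(empty.group(0)))
```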

15 More interesting patterns
A number is pretty easy to understand from our perspective but not so easy to describe
–Except in regular expressions
An integer is a string of digits
–Possibly preceded by a plus or minus
So how is this done?
With sets and repetition

16 A set
A pair of brackets may be filled with characters
This will match any one of them
Thus the digits could be done with: [0123456789]
An integer could then be: [-+]?[0123456789]+
Any single vowel is: [aeiouAEIOU]
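A sketch of the integer and vowel sets in use, assuming Python's `re` module; the helper name `is_integer` and the sample strings are made up for illustration. The pattern is written without a space between the optional sign and the digits, since a space inside a pattern would itself have to be matched.

```python
import re

INTEGER = re.compile(r"[-+]?[0123456789]+")
VOWEL = re.compile(r"[aeiouAEIOU]")

def is_integer(s):
    """True when the whole string is an optionally signed run of digits."""
    return INTEGER.fullmatch(s) is not None

for candidate in ["42", "-7", "+100", "", "+", "4.2", "--3"]:
    print(repr(candidate), is_integer(candidate))

# Each match of a set is a single character.
print(VOWEL.findall("Automata"))
```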

17 Alternation
A set provides intuitive alternation
The match process may choose any character within the set to use
The alternation is only applied to a number of single characters
There is also an alternation character
–The vertical bar |
This allows either simple or complicated patterns to alternate

18 Alternation
Thus: A|E|I|O|U is equivalent to [AEIOU]
However, more interesting alternations are possible and useful
–(abc)|(123) will match either of the two strings
–([-+]?\d+)|(\w+) will match any string of characters that looks like a number or word, where \d matches a digit and \w matches a letter, digit or underscore
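Alternation between whole patterns can be sketched as follows, assuming Python's `re` module. The number-or-word pattern is written here with the repetition applied to the digits, `([-+]?\d+)|(\w+)`, and the function name `classify` is hypothetical; it reports which side of the bar matched by looking at which group captured text.

```python
import re
import string

NUMBER_OR_WORD = re.compile(r"([-+]?\d+)|(\w+)")

def classify(token):
    """Return 'number', 'word', or None, according to which alternative matches the whole token."""
    m = NUMBER_OR_WORD.fullmatch(token)
    if m is None:
        return None
    return "number" if m.group(1) is not None else "word"

# A|E|I|O|U and [AEIOU] accept exactly the same single letters.
same = all(
    (re.fullmatch(r"A|E|I|O|U", ch) is None) == (re.fullmatch(r"[AEIOU]", ch) is None)
    for ch in string.ascii_uppercase
)
print(same)

for token in ["-42", "hello", "abc123", "+-3"]:
    print(token, classify(token))
```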

19 Audience Participation
Suppose the following expression: ^ab(cde)*f$
Which of the following lines match this?
–abf
–abcdecdef
–abcdeaf
–abcdecdecdecdef
–acdef
–abcdefa
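After working the question by hand, the answers can be checked mechanically. This is a sketch assuming Python's `re` module; it builds a table of line and result without stating the answers in advance.

```python
import re

pattern = re.compile(r"^ab(cde)*f$")

lines = [
    "abf",
    "abcdecdef",
    "abcdeaf",
    "abcdecdecdecdef",
    "acdef",
    "abcdefa",
]

# Map each candidate line to whether the anchored pattern matches it.
results = {line: pattern.search(line) is not None for line in lines}

for line, ok in results.items():
    print(f"{line:16} {'match' if ok else 'no match'}")
```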

20 Limitations
What kind of sets are not regular?
Consider the following language: 0^n 1^n
–The number of zeros and the number of ones are the same
We know that 0^m 1^n is regular, why is 0^n 1^n not?

21 We Really Do Know
[Figure: state diagram of an FSA with two states, s0 and s1, and transitions labeled 0, 1, 1]
This accepts 0^m 1^n and is clearly an FSA
Why is 0^n 1^n harder?
Counter-intuitive since 0^n 1^n is a subset of 0^m 1^n
Shouldn’t it be harder to generate a full set than a subset?
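A two-state machine of this kind can be simulated in a few lines of Python. The diagram did not survive in this transcript, so the transition table below is an assumption: s0 is the start state and loops on 0, a 1 moves from s0 to s1, and s1 loops on 1. Treating both states as accepting (so that m and n may be zero) is also an assumption.

```python
# Assumed transitions; any (state, symbol) pair not listed is a dead end.
TRANSITIONS = {
    ("s0", "0"): "s0",
    ("s0", "1"): "s1",
    ("s1", "1"): "s1",
}
ACCEPTING = {"s0", "s1"}   # assumption: both states accept

def accepts(s):
    """Run the FSA on s; the next state depends only on the current state and the input symbol."""
    state = "s0"
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING

for s in ["", "0011", "00111", "111", "010", "10"]:
    print(repr(s), accepts(s))
```

Note that the machine accepts 00111 just as readily as 0011: nothing in its two states records how many zeros were read.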

22 Memory
An FSA determines its next state based only on the input and the current state
Since it has no memory beyond its current state, it cannot remember how many zeros it has processed in order to process that many ones
Next we consider machines stronger than these
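For contrast, here is a sketch of a recognizer for 0^n 1^n that does use memory. It is an illustration only, not one of the stronger machine models the slides go on to discuss: the two integer counters can grow without bound, which is exactly what a fixed, finite set of states cannot provide. The function name `is_0n1n` is hypothetical.

```python
def is_0n1n(s):
    """True when s is some number of 0s followed by the same number of 1s."""
    i = 0
    zeros = 0
    while i < len(s) and s[i] == "0":   # count the leading zeros
        zeros += 1
        i += 1
    ones = 0
    while i < len(s) and s[i] == "1":   # count the ones that follow
        ones += 1
        i += 1
    # Everything must be consumed, and the two counts must agree.
    return i == len(s) and zeros == ones

for s in ["", "01", "0011", "001", "0101", "10"]:
    print(repr(s), is_0n1n(s))
```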

23 Exercises
13.4
–3, 5, 15

