LECTURE NOTES On FINITE AUTOMATA.

Slides:



Advertisements
Similar presentations
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Advertisements

COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
Pushdown Automata Chapter 12. Recognizing Context-Free Languages Two notions of recognition: (1) Say yes or no, just like with FSMs (2) Say yes or no,
Introduction to Computability Theory
Finite Automata Great Theoretical Ideas In Computer Science Anupam Gupta Danny Sleator CS Fall 2010 Lecture 20Oct 28, 2010Carnegie Mellon University.
1 Introduction to Computability Theory Lecture2: Non Deterministic Finite Automata Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
Lecture 3 Goals: Formal definition of NFA, acceptance of a string by an NFA, computation tree associated with a string. Algorithm to convert an NFA to.
Lecture 3 Goals: Formal definition of NFA, acceptance of a string by an NFA, computation tree associated with a string. Algorithm to convert an NFA to.
CS5371 Theory of Computation Lecture 4: Automata Theory II (DFA = NFA, Regular Language)
FSA Lecture 1 Finite State Machines. Creating a Automaton  Given a language L over an alphabet , design a deterministic finite automaton (DFA) M such.
CS5371 Theory of Computation Lecture 8: Automata Theory VI (PDA, PDA = CFG)
Finite State Machines Data Structures and Algorithms for Information Processing 1.
Great Theoretical Ideas in Computer Science.
Regular Expressions (RE) Empty set Φ A RE denotes the empty set Empty string λ A RE denotes the set {λ} Symbol a A RE denotes the set {a} Alternation M.
::ICS 804:: Theory of Computation - Ibrahim Otieno SCI/ICT Building Rm. G15.
CS490 Presentation: Automata & Language Theory Thong Lam Ran Shi.
1 Unit 1: Automata Theory and Formal Languages Readings 1, 2.2, 2.3.
Theory of Languages and Automata
Theory of Computation, Feodor F. Dragan, Kent State University 1 Regular expressions: definition An algebraic equivalent to finite automata. We can build.
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Grammars CPSC 5135.
Languages Given an alphabet , we can make a word or string by concatenating the letters of . Concatenation of “x” and “y” is “xy” Typical example: 
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
Regular Expressions CIS 361. Need finite descriptions of infinite sets of strings. Discover and specify “regularity”. The set of languages over a finite.
CHAPTER 1 Regular Languages
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen Department of Computer Science University of Texas-Pan American.
Finite Automata Great Theoretical Ideas In Computer Science Victor Adamchik Danny Sleator CS Spring 2010 Lecture 20Mar 30, 2010Carnegie Mellon.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
Finite Automata A simple model of computation. 2 Finite Automata2 Outline Deterministic finite automata (DFA) –How a DFA works.
Pushdown Automata Chapter 12. Recognizing Context-Free Languages Two notions of recognition: (1) Say yes or no, just like with FSMs (2) Say yes or no,
Set, Alphabets, Strings, and Languages. The regular languages. Clouser properties of regular sets. Finite State Automata. Types of Finite State Automata.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Chapter 1 INTRODUCTION TO THE THEORY OF COMPUTATION.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Language Recognition MSU CSE 260.
Pushdown Automata.
CS510 Compiler Lecture 2.
Lecture 1 Theory of Automata
CIS Automata and Formal Languages – Pei Wang
Natural Language Processing - Formal Language -
CSE 105 theory of computation
Pushdown Automata.
Pushdown Automata.
CSE 105 theory of computation
Recognizer for a Language
Jaya Krishna, M.Tech, Assistant Professor
Chapter 2 FINITE AUTOMATA.
Hierarchy of languages
Lecture3 DFA vs. NFA, properties of RL
CS 154, Lecture 3: DFANFA, Regular Expressions.
Non-Deterministic Finite Automata
Intro to Data Structures
Finite Automata Reading: Chapter 2.
CSE 2001: Introduction to Theory of Computation Fall 2009
Chapter Five: Nondeterministic Finite Automata
FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY
CS21 Decidability and Tractability
Chapter 1 Introduction to the Theory of Computation
Chapter 1 Regular Language
CSCI 2670 Introduction to Theory of Computing
Languages Given an alphabet , we can make a word or string by concatenating the letters of . Concatenation of “x” and “y” is “xy” Typical example: ={0,1},
CHAPTER 1 Regular Languages
CSE 105 theory of computation
Presentation transcript:

LECTURE NOTES On FINITE AUTOMATA

Mathematical Preliminaries Symbol A character, glyph, mark. An abstract entity that has no meaning by itself, often called uninterpreted. Letters from various alphabets, digits and special characters are the most commonly used symbols. Alphabet A finite set of symbols. An alphabet is often denoted by sigma , yet can be given any name. B = {0, 1} Says B is an alphabet of two symbols, 0 and 1. C = {a, b, c} Says C is an alphabet of three symbols, a, b and c. Sometimes space and comma are in an alphabet while other times they are meta symbols used for descriptions.

String / Word A finite sequence of symbols from an alphabet. 01110 and 111 are strings from the alphabet B above. aaabccc and b are strings from the alphabet C above. A null string is a string with no symbols, usually denoted by epsilon . The null string has length zero. The null string is usually denoted epsilon . Vertical bars around a string indicate the length of a string expressed as a natural number. For example |00100| = 5, |aab| = 3, | epsilon | = 0 Power of an alphabet n Represented as 0 = { }, 2 = { 00, 01, 10, 11 } for  = { 0, 1 } Concatenation of Strings denoted as x . y

Formal Language, also called a Language A set of strings from an alphabet. The set may be empty, finite or infinite. There are many ways to define a language and there are many classifications for languages. Because a language is a set of strings, the words language and set are often used interchangeably in talking about formal languages. L(M) is the notation for a language defined by a machine M. The machine M accepts a certain set of strings, thus a language. L(G) is the notation for a language defined by a grammar G. The grammar G recognizes a certain set of strings, thus a language. M(L) is the notation for a machine that accepts a language.

Formal Language, also called a Language(Contd.,) The language L is a certain set of strings. G(L) is the notation for a grammar that recognizes a language. The union of two languages is a language. L = L1  L2 The intersection of two languages is a language. L = L1  L2 The complement of a language is a language. L = * - L1 The difference of two languages is a language. L = L1 - L2 Thus a set of strings all of which are chosen from some *, where  is a particular alphabet is called a language If  is an alphabet and L  *, then L is a language over . Example: for  = { a, b } { a, aba, aa } is a finite language ------ is an infinite language

Formal Language, also called a Language(Contd.,) Typical examples: L = { x | x is a bit string with two zeros } L = { anbn | n  N } L = {1n | n is prime}

For example, let A = {0,00} then A•A = { 00, 000, 0000 } with |A•A|=3, Formal Language, also called a Language(Contd.,) Do not confuse the concatenation of languages with the Cartesian product of sets. For example, let A = {0,00} then A•A = { 00, 000, 0000 } with |A•A|=3, AA = { (0,0), (0,00), (00,0), (00,00) } with |AA|=4

M Let L be a language  S a machine M recognizes L if Recognizing Languages Let L be a language  S a machine M recognizes L if “accept” if and only if xL xS M “reject” if and only if xL

Language Definition Problem How to precisely define language Layered structure of language definition Start with a set of letters in language Lexical structure - identifies “words” in language (each word is a sequence of letters) Syntactic structure - identifies “sentences” in language (each sentence is a sequence of words) Semantics - meaning of program (specifies what result should be for each input) lexical and syntactic structures

Specifying Formal Languages Huge Triumph of Computer Science Beautiful Theoretical Results Practical Techniques and Applications Two Dual Notions Generative approach (grammar or regular expression) Recognition approach (automaton) Lots of theorems about converting one approach automatically to another

Specifying Lexical Structure Using Regular Expressions Have some alphabet  = set of letters Regular expressions are built from:  - empty string Any letter from alphabet  r1r2 – regular expression r1 followed by r2 (sequence) r1| r2 – either regular expression r1 or r2 (choice) r* - iterated sequence and choice  | r | rr | … Parentheses to indicate grouping/precedence

Concept of Regular Expression Generating a String Rewrite regular expression until have only a sequence of letters (string) left Example (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 General Rules 1) r1| r2  r1 2) r1| r2  r2 3) r* rr* 4) r*  

Nondeterminism in Generation Rewriting is similar to equational reasoning But different rule applications may yield different final results Example 1 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 Example 2 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 0(0|1)*.(0|1)* 0.(0|1)* 0.(0|1)(0|1)* 0.(0|1) 0.1

Concept of Language Generated by Regular Expressions Set of all strings generated by a regular expression is language of regular expression In general, language may be (countably) infinite String in language is often called a token

Examples of Languages and Regular Expressions  = { 0, 1, . } (0 | 1)*.(0|1)* - Binary floating point numbers (00)* - even-length all-zero strings 1*(01*01*)* - strings with even number of zeros  = { a,b,c, 0, 1, 2 } (a|b|c)(a|b|c|0|1|2)* - alphanumeric identifiers (0|1|2)* - trinary numbers

Language Definition Problem How to precisely define language Layered structure of language definition Start with a set of letters in language Lexical structure - identifies “words” in language (each word is a sequence of letters) Syntactic structure - identifies “sentences” in language (each sentence is a sequence of words) Semantics - meaning of program (specifies what result should be for each input) lexical and syntactic structures

Specifying Formal Languages Huge Triumph of Computer Science Beautiful Theoretical Results Practical Techniques and Applications Two Dual Notions Generative approach (grammar or regular expression) Recognition approach (automaton) Lots of theorems about converting one approach automatically to another

Specifying Lexical Structure Using Regular Expressions Have some alphabet  = set of letters Regular expressions are built from:  - empty string Any letter from alphabet  r1r2 – regular expression r1 followed by r2 (sequence) r1| r2 – either regular expression r1 or r2 (choice) r* - iterated sequence and choice  | r | rr | … Parentheses to indicate grouping/precedence

Concept of Regular Expression Generating a String Rewrite regular expression until have only a sequence of letters (string) left Example (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 General Rules 1) r1| r2  r1 2) r1| r2  r2 3) r* rr* 4) r*  

Nondeterminism in Generation Rewriting is similar to equational reasoning But different rule applications may yield different final results Example 1 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 1(0|1)*.(0|1)* 1.(0|1)* 1.(0|1)(0|1)* 1.(0|1) 1.0 Example 2 (0 | 1)*.(0|1)* (0 | 1)(0 | 1)*.(0|1)* 0(0|1)*.(0|1)* 0.(0|1)* 0.(0|1)(0|1)* 0.(0|1) 0.1

Concept of Language Generated by Regular Expressions Set of all strings generated by a regular expression is language of regular expression In general, language may be (countably) infinite String in language is often called a token

Examples of Languages and Regular Expressions  = { 0, 1, . } (0 | 1)*.(0|1)* - Binary floating point numbers (00)* - even-length all-zero strings 1*(01*01*)* - strings with even number of zeros  = { a,b,c, 0, 1, 2 } (a|b|c)(a|b|c|0|1|2)* - alphanumeric identifiers (0|1|2)* - trinary numbers

Alternate Abstraction Finite-State Automata Alphabet  Set of states with initial and accept states Transitions between states, labeled with letters (0 | 1)*.(0|1)* . Start state 1 1 Accept state

Automaton Accepting String Conceptually, run string through automaton Have current state and current letter in string Start with start state and first letter in string At each step, match current letter against a transition whose label is same as letter Continue until reach end of string or match fails If end in accept state, automaton accepts string Language of automaton is set of strings it accepts

. Example 11.0 Current state Start state 1 1 Accept state Accept state 11.0 Current letter

. Example 11.0 Current state Start state 1 1 Accept state Accept state 11.0 Current letter

. Example 11.0 Current state Start state 1 1 Accept state Accept state 11.0 Current letter

. Example 11.0 Current state Start state 1 1 Accept state Accept state 11.0 Current letter

. Example 11.0 Current state Start state 1 1 Accept state Accept state 11.0 Current letter

. Example 11.0 String is accepted! Current state Start state 1 1 Accept state 11.0 String is accepted! Current letter

Finite Automaton The most simple machine that is not just a finite list of words. “Read once”, “no write” procedure. Typical is its limited memory. Think cell-phone, elevator door, etc.

A Simple Automaton (0) transition rules states starting state q1 q2 q3 1 0,1 starting state accepting state

A Simple Automaton (1) start accept on input “0110”, the machine goes: q1 q2 q3 1 0,1 start accept on input “0110”, the machine goes: q1  q1  q2  q2  q3 = “reject”

A Simple Automaton (2) on input “101”, the machine goes: q1 q2 q3 1 0,1 on input “101”, the machine goes: q1  q2  q3  q2 = “accept”

A Simple Automaton (3) 010: reject 11: accept 010100100100100: accept q1 q2 q3 1 0,1 010: reject 11: accept 010100100100100: accept 010000010010: reject : reject

Finite Automaton (def.) A deterministic finite automaton (DFA) M is defined by a 5-tuple M=(Q,,,q0,F) Q: finite set of states : finite alphabet : transition function :QQ q0Q: start state FQ: set of accepting states

M = (Q,,,q,F) q1 q2 q3 1 0,1 states Q = {q1,q2,q3} 0,1 states Q = {q1,q2,q3} alphabet  = {0,1} start state q1 accept states F={q2} transition function :

Generative Versus Recognition Regular expressions give you a way to generate all strings in language Automata give you a way to recognize if a specific string is in language Philosophically very different Theoretically equivalent (for regular expressions and automata) Standard approach Use regular expressions when define language Translated automatically into automata for implementation

From Regular Expressions to Automata Construction by structural induction Given an arbitrary regular expression r, Assume we can convert r to an automaton with One start state One accept state Show how to convert all constructors to deliver an automaton with

Basic Constructs -Thompson’s construction Start state Accept state   a a

Sequence    r1r2 r1 r2 Old start state Start state Old accept state

Choice  r1  r1|r2  r2  Old start state Start state Old accept state Accept state  r1  r1|r2  r2 

Kleene Star    r r*  Old start state Start state Old accept state

NFA vs. DFA DFA No  transitions At most one transition from each state for each letter NFA – neither restriction a a NOT OK OK b a

Conversions Our regular expression to automata conversion produces an NFA Would like to have a DFA to make recognition algorithm simpler Can convert from NFA to DFA (but DFA may be exponentially larger than NFA)

NFA to DFA Construction DFA has a state for each subset of states in NFA DFA start state corresponds to set of states reachable by following  transitions from NFA start state DFA state is an accept state if an NFA accept state is in its set of NFA states To compute the transition for a given DFA state D and letter a Set S to empty set Find the set N of D’s NFA states For all NFA states n in N Compute set of states N’ that the NFA may be in after matching a Set S to S union N’ If S is nonempty, there is a transition for a from D to the DFA state that has the set S of NFA states Otherwise, there is no transition for a from D

NFA to DFA Example for (a|b)*.(a|b)*  a   3 5  .  11 13      1 2 7 8 9 10 15 16   4 6  12 14    b b . a a 5,7,2,3,4,8 . 13,15,10,11,12,16 a a a a 1,2,3,4,8 9,10,11,12,16 b b . b b 6,7,2,3,4,8 14,15,10,11,12,16 b b

Regular Languages The language recognized by a finite automaton M is denoted by L(M). A regular language is a language for which there exists a recognizing finite automaton.

Two DFA Questions Given the description of a finite automaton M = (Q,,,q,F), what is the language L(M) that it recognizes? In general, what kind of languages can be recognized by finite automata? (What are the regular languages?)

Union of Two Languages Theorem 1.12: If A1 and A2 are regular languages, then so is A1  A2. (The regular languages are ‘closed’ under the union operation.) Proof idea: A1 and A2 are regular, hence there are two DFA M1 and M2, with A1=L(M1) and A2=L(M2). Out of these two DFA, we will make a third automaton M3 such that L(M3) = A1  A2.

Proof Union-Theorem (1) M1=(Q1,,1,q1,F1) and M2=(Q2,,2,q2,F2) Define M3 = (Q3,,3,q3,F3) by: Q3 = Q1Q2 = {(r1,r2) | r1Q1 and r2Q2} 3((r1,r2),a) = (1(r1,a), 2(r2,a)) q3 = (q1,q2) F3 = {(r1,r2) | r1F1 or r2F2}

Proof Union-Theorem (2) The automaton M3 = (Q3,,3,q3,F3) runs M1 and M2 in ‘parallel’ on a string w. In the end, the final state (r1,r2) ‘knows’ if wL1 (via r1F1?) and if wL2 (via r2F2?) The accepting states F3 of M3 are such that wL(M3) if and only if wL1 or wL2, for: F3 = {(r1,r2) | r1F1 or r2F2}.

Concatenation of L1 and L2 Definition: L1• L2 = { xy | xL1 and yL2 } Example: {a,b} • {0,11} = {a0,a11,b0,b11} Theorem 1.13: If L1 and L2 are regular languages, then so is L1•L2. (The regular languages are ‘closed’ under concatenation.)

Proving Concatenation Thm. Consider the concatenation: {1,01,11,001,011,…} • {0,000,00000,…} (That is: the bit strings that end with a “1”, followed by an odd number of 0’s.) Problem is: given a string w, how does the automaton know where the L1 part stops and the L2 substring starts? We need an M with ‘lucky guesses’.

Nondeterminism Nondeterministic machines are capable of being lucky, no matter how small the probability. A nondeterministic finite automaton has transition rules/possibilities like q2 1 q1 q2  q1 q3 1

A Nondeterministic Automaton 0,1 0,1 1 0,  1 q1 q2 q3 q4 This automaton accepts “0110”, because there is a possible path that leads to an accepting state, namely: q1  q1  q2  q3  q4  q4

A Nondeterministic Automaton 0,1 0,1 1 0,  1 q1 q2 q3 q4 The string 1 gets rejected: on “1” the automaton can only reach: {q1,q2,q3}.

Nondeterminism ~ Parallelism For any (sub)string w, the nondeterministic automaton can be in a set of possible states. If the final set contains an accepting state, then the automaton accepts the string. “The automaton processes the input in a parallel fashion. Its computational path is no longer a line, but a tree.”

Nondeterministic FA (def.) A nondeterministic finite automaton (NFA) M is defined by a 5-tuple M=(Q,,,q0,F), with Q: finite set of states : finite alphabet : transition function :QP (Q) q0Q: start state FQ: set of accepting states

Nondeterministic :QP (Q) The function :QP (Q) is the crucial difference. It means: “When reading symbol “a” while in state q, one can go to one of the states in (q,a)Q.” The  in  = {} takes care of the empty string transitions.

Recognizing Languages (def) A nondeterministic FA M = (Q,,,q,F) accepts a string w = w1…wn if and only if we can rewrite w as y1…ym with yi and there is a sequence r0…rm of states in Q such that: 1) r0=q0 2) ri+1  (ri,yi+1) for all i=0,…,m–1 3) rm  F

Building a Lexical Analyzer • Token = Pattern • Pattern = Regular Expression • Regular Expression = NFA • NFA = DFA • DFA = Lexical Analyzer

Lexical Structure in Languages Each language typically has several categories of words. In a typical programming language: Keywords (if, while) Arithmetic Operations (+, -, *, /) Integer numbers (1, 2, 45, 67) Floating point numbers (1.0, .2, 3.337) Identifiers (abc, i, j, ab345) Typically have a lexical category for each keyword and/or each category Each lexical category defined by regexp

Lexical Categories Example If Keyword = if While Keyword = while Operator = +|-|*|/ Integer = [0-9] [0-9]* Float = [0-9]*. [0-9]* Identifier = [a-z]([a-z]|[0-9])* Note that [0-9] = (0|1|2|3|4|5|6|7|8|9) [a-z] = (a|b|c|…|y|z) Will use lexical categories in next level

Grammar and Automata Correspondence Regular Grammar Context-Free Grammar Context-Sensitive Grammar Automaton Finite-State Automaton Push-Down Automaton Turing Machine

Context-Free Grammars Grammar Characterization (T,NT,S,P) Finite set of Terminals T Finite set of Nonterminals NT Start Nonterminal S (goal symbol, start symbol) Finite set of Productions P: NT  (T | NT)* RHS of production can have any sequence of terminals or nonterminals

Push-Down Automata DFA Plus a Stack (S,A,V, F,s0,sF) Finite set of states S Finite Input Alphabet A, Stack Alphabet V Transition relation F : S (A U{})V  S  V* Start state s0 Final states sF Each configuration consists of a state, a stack, and remaining input string

CFG Versus PDA CFGs and PDAs are of equivalent power Grammar Implementation Mechanism: Translate CFG to PDA, then use PDA to parse input string Foundation for bottom-up parser generators

Context-Sensitive Grammars and Turing Machines Context-Sensitive Grammars Allow Productions to Use Context P: (T.NT)+  (T.NT)* Turing Machines Have Finite State Control Two-Way Tape Instead of A Stack

Exercises (1) Let A = {x,y,z} and B = {x,y}, answer: Is A a subset of B? Is B a subset of A? What is AB? What is AB? What is AB? What is P (Q)?

Exercises (2) Give NFAs with the specified number of states that recognize the following languages over the alphabet ={0,1}: { w | w ends with 00}, three states {0}; two states { w | w contains even number of 0s, or exactly two 1s}, six states {0n | nN }, one state

Exercises (3) Proof the following result: “If L1 and L2 are regular languages, then is a regular language too.” Describe the language that is recognized by this nondeterministic automaton: 1 0,1 1 0,  1 q1 q2 q3 q4 

Additional Practice Problems Write the formal descriptions of the sets containing … the numbers 1,10 and 100 … all integers greater than 5 … all natural numbers less than 5 … the string “aba” … the empty string … nothing Give a formal description of this NFA: Give DFA state diagrams for the following languages over ={0,1}: { w | w begins with 1 and ends with 0} { w | w does not contain substring 110} {} all strings except the empty string 1 0,  1 q1 q2 q3 