Lexical Analysis: Finite Automata


Finite Automata (FA)
- An FA is also called a Finite State Machine (FSM).
  - It is an abstract model of a computing entity.
  - It decides whether to accept or reject a string.
  - Every regular expression can be represented as an FA, and vice versa.
- Two types of FA:
  - Non-deterministic (NFA): may have more than one alternative action for the same input symbol.
  - Deterministic (DFA): has at most one action for a given input symbol.
- Example: how do we write a program to recognize the Java keyword “int”?
[Diagram: q0 -i-> q1 -n-> q2 -t-> q3]
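One way to answer the question on this slide is to encode the diagram as a lookup table and walk it one character at a time. The following is a minimal sketch, not the lecture's own code: the names `INT_TRANSITIONS`, `INT_FINAL`, and `recognizes_int` are made up for illustration, and q3 is assumed to be the only accepting state.

```python
# Assumed edges of the "int" automaton: q0 -i-> q1 -n-> q2 -t-> q3.
INT_TRANSITIONS = {
    ("q0", "i"): "q1",
    ("q1", "n"): "q2",
    ("q2", "t"): "q3",
}
INT_START = "q0"
INT_FINAL = {"q3"}  # assumption: q3 is the only accepting state


def recognizes_int(text):
    """Return True only if text is exactly the keyword "int"."""
    state = INT_START
    for ch in text:
        state = INT_TRANSITIONS.get((state, ch))
        if state is None:  # no edge for this symbol: reject
            return False
    return state in INT_FINAL
```

With this table, `recognizes_int("int")` is true, while a prefix such as "in" or a longer string such as "into" is rejected.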

RE and Finite State Automaton (FA)
- A regular expression is a declarative way to describe tokens.
  - It describes what a token is, but not how to recognize it.
- An FA describes how the token is recognized.
  - An FA is easy to simulate with a computer program.
- Every regular expression has an equivalent FA, and every FA has an equivalent regular expression.
- A scanner generator (such as lex) bridges the gap between regular expressions and FAs.
[Diagram: a scanner generator turns a regular expression into a finite automaton inside the scanner program, which reads a string stream and produces tokens]

Inside scanner generator
Main components of scanner generation (e.g., Lex):
- Convert a regular expression to a non-deterministic finite automaton (NFA).
- Convert the NFA to a deterministic finite automaton (DFA).
- Improve the DFA to minimize the number of states.
- Generate a program in C or some other language to “simulate” the DFA.
[Diagram: RE -(Thompson construction)-> NFA -(subset construction)-> DFA -(minimization)-> minimized DFA -(DFA simulation)-> program]

Non-deterministic Finite Automata (NFA)
An NFA (Non-deterministic Finite Automaton) is a 5-tuple (S, Σ, δ, s0, F):
- S: a set of states;
- Σ: the symbols of the input alphabet;
- δ: the transition function, move(state, symbol) → a set of states;
- s0: s0 ∈ S, the start state;
- F: F ⊆ S, a set of final or accepting states.
Non-deterministic: a state and symbol pair can be mapped to a set of states.
Finite: the number of states is finite.
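The 5-tuple maps naturally onto a small record type. The sketch below is one possible encoding, not a standard API: the class name `NFA`, its field names, and the sample instance `ABB_NFA` (the (a|b)*abb automaton used elsewhere in these slides) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class NFA:
    states: frozenset    # S
    alphabet: frozenset  # Sigma
    delta: dict          # (state, symbol) -> frozenset of next states
    start: str           # s0, must be a member of states
    finals: frozenset    # F, must be a subset of states

    def move(self, state, symbol):
        """Return the set of states reachable from state on symbol."""
        return self.delta.get((state, symbol), frozenset())


# Illustrative instance: the NFA for (a|b)*abb.
ABB_NFA = NFA(
    states=frozenset({"q0", "q1", "q2", "q3"}),
    alphabet=frozenset({"a", "b"}),
    delta={
        ("q0", "a"): frozenset({"q0", "q1"}),
        ("q0", "b"): frozenset({"q0"}),
        ("q1", "b"): frozenset({"q2"}),
        ("q2", "b"): frozenset({"q3"}),
    },
    start="q0",
    finals=frozenset({"q3"}),
)
```

Returning an empty set for a missing (state, symbol) pair is a design choice here: it lets an undefined move be treated the same way as any other move result.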

Transition Diagram
An FA can be represented using a transition diagram. Corresponding to the FA definition, a transition diagram has:
- States, represented by circles;
- An alphabet (Σ), represented by labels on edges;
- Transitions, represented by labeled directed edges between states. The label is the input symbol;
- One start state, marked by an incoming arrow;
- One or more final states, represented by double circles.
Example transition diagram to recognize (a|b)*abb:
[Diagram: q0 loops on a and b; q0 -a-> q1 -b-> q2 -b-> q3, with q3 final]

Simple examples of FA
[Diagrams: transition diagrams for a, a*, a+, and (a|b)*]

Procedure for defining a DFA/NFA
- Define the input alphabet and the initial state.
- Draw the transition diagram.
- Check:
  - Do all states have outgoing arcs labeled with all the input symbols (DFA)?
  - Are any final states missing?
  - Are there any duplicate states?
  - Can all strings in the language be accepted?
  - Are any strings not in the language accepted?
- Name all the states.
- Define (S, Σ, δ, q0, F).
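The first item of the checklist (every state has an outgoing arc for every input symbol) is easy to check mechanically. A minimal sketch, assuming the DFA is stored as a dictionary from (state, symbol) to next state; the function name `missing_arcs` and the sample `draft` automaton are hypothetical.

```python
def missing_arcs(states, alphabet, delta):
    """List every (state, symbol) pair that has no outgoing arc in delta."""
    return [
        (state, symbol)
        for state in sorted(states)
        for symbol in sorted(alphabet)
        if (state, symbol) not in delta
    ]


# Hypothetical two-state draft over {0, 1}; q1 has no arc for "0".
draft = {("q0", "0"): "q0", ("q0", "1"): "q1", ("q1", "1"): "q1"}
gaps = missing_arcs({"q0", "q1"}, {"0", "1"}, draft)
```

Here `gaps` holds the single pair `("q1", "0")`. Whether such a gap is an error or an implicit "reject" depends on the convention in use.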

Example of constructing an FA
Construct a DFA that accepts a language L over the alphabet {0, 1} such that L is the set of all strings with any number of “0”s followed by any number of “1”s.
- Regular expression: 0*1*
- Σ = {0, 1}
- Draw the initial state of the transition diagram.
[Diagram: start state only]

Example of constructing an FA
- Draft the transition diagram.
- Is “111” accepted? The leftmost state is missing an arc with input “1”.
[Diagrams: draft transition diagram, before and after adding the missing arc]

Example of constructing an FA
- Is “00” accepted?
- The leftmost two states are also final states:
  - First state from the left: ε is also accepted.
  - Second state from the left: strings with “0”s only are also accepted.
[Diagram: revised transition diagram]

Example of constructing an FA
- The leftmost two states are duplicates: their arcs point to the same states with the same symbols.
- Check that the result is correct:
  - All strings in the language can be accepted:
    - ε, the empty string, is accepted;
    - strings with “0”s only or “1”s only are accepted.
  - No strings outside the language are accepted.
- Name all the states.
[Diagram: final transition diagram with states q0 and q1]
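The finished two-state automaton for 0*1* can be tried out with a few lines of code. This sketch assumes the usual reading of the final diagram: q0 loops on “0” and moves to q1 on “1”, q1 loops on “1”, and both states are final. A “0” seen in q1 has no arc and is treated as an immediate reject. The names `ZERO_ONE_DELTA` and `accepts_zeros_then_ones` are illustrative.

```python
# Assumed transitions of the final DFA for 0*1*.
ZERO_ONE_DELTA = {
    ("q0", "0"): "q0",
    ("q0", "1"): "q1",
    ("q1", "1"): "q1",
}
ZERO_ONE_FINALS = {"q0", "q1"}


def accepts_zeros_then_ones(text):
    """Return True if text is some number of 0s followed by some number of 1s."""
    state = "q0"
    for ch in text:
        state = ZERO_ONE_DELTA.get((state, ch))
        if state is None:  # e.g. a "0" after a "1": no arc, so reject
            return False
    return state in ZERO_ONE_FINALS
```

The checks from the slides carry over directly: the empty string, “00”, “111”, and “0011” are accepted, while “10” is not.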

How does an FA work
NFA definition for (a|b)*abb:
- S = {q0, q1, q2, q3}
- Σ = {a, b}
- Transitions: move(q0, a) = {q0, q1}, move(q0, b) = {q0}, ....
- s0 = q0
- F = {q3}
Transition diagram representation:
[Diagram: q0 loops on a and b; q0 -a-> q1 -b-> q2 -b-> q3, with q3 final]
- Non-determinism: several edges leaving one state are labeled with the same symbol, or there are epsilon edges.
How does the FA work? Input: ababb
- First attempt: move(0, a) = 1, move(1, b) = 2, move(2, a) = ? (undefined). REJECT!
- Second attempt: move(0, a) = 0, move(0, b) = 0, move(0, a) = 1, move(1, b) = 2, move(2, b) = 3. ACCEPT!
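Instead of trying one alternative at a time, a program can follow all alternatives at once by keeping the set of states the NFA could currently be in. The sketch below uses that set-based simulation, which is a common technique rather than something taken from the slides; the names `SIM_DELTA` and `nfa_accepts` are illustrative, and the transitions not spelled out on the slide are taken from the (a|b)*abb transition table.

```python
# NFA for (a|b)*abb; missing entries mean "no move".
SIM_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
    ("q2", "b"): {"q3"},
}
SIM_START = "q0"
SIM_FINALS = {"q3"}


def nfa_accepts(text):
    """Track every state the NFA could be in after each input symbol."""
    current = {SIM_START}
    for ch in text:
        current = {
            nxt
            for state in current
            for nxt in SIM_DELTA.get((state, ch), set())
        }
    # Accept if at least one alternative ended in a final state.
    return bool(current & SIM_FINALS)
```

For the input ababb the tracked sets are {q0}, {q0, q1}, {q0, q2}, {q0, q1}, {q0, q2}, {q0, q3}; the last one contains q3, so the string is accepted.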

FA for (a|b)*abb
What does it mean that a string is accepted by an FA?
- An FA accepts an input string x iff there is a path from the start state to a final state such that the edge labels along this path spell out x.
- A path for “aabb”: q0 -a-> q0 -a-> q1 -b-> q2 -b-> q3
- Is “aab” acceptable?
  - q0 -a-> q0 -a-> q1 -b-> q2
  - q0 -a-> q0 -a-> q0 -b-> q0
  - A final state must be reached. In general, there could be several paths.
- Is “aabbb” acceptable?
  - q0 -a-> q0 -a-> q1 -b-> q2 -b-> q3
  - The labels on the path must spell out the entire string.
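The path-based definition of acceptance can be made concrete by listing every path whose labels spell out the whole input. A small sketch under the same NFA for (a|b)*abb; `PATH_DELTA`, `all_paths`, and `accepted_by_some_path` are hypothetical names, and enumerating all paths is chosen for clarity, not efficiency.

```python
PATH_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
    ("q2", "b"): {"q3"},
}
PATH_FINALS = {"q3"}


def all_paths(text):
    """Return every state sequence from q0 whose edge labels spell out text."""
    paths = [["q0"]]
    for ch in text:
        paths = [
            path + [nxt]
            for path in paths
            for nxt in sorted(PATH_DELTA.get((path[-1], ch), set()))
        ]
    return paths


def accepted_by_some_path(text):
    """Accept iff at least one complete path ends in a final state."""
    return any(path[-1] in PATH_FINALS for path in all_paths(text))
```

For “aab” this yields the two paths ending in q0 and q2, neither of which is final. For “aabbb” the path through q3 dies on the last “b”, so only the path that stays in q0 survives.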

Transition table (a|b)*abb
- A transition table is a good way to implement an FSA:
  - one row for each state, S;
  - one column for each symbol, A;
  - the entry in cell (S, A) gives the state or set of states that can be reached from state S on input A.
- A Nondeterministic Finite Automaton (NFA) has at least one cell with more than one state.
- A Deterministic Finite Automaton (DFA) has a single state in every cell.
Transition table for (a|b)*abb (> marks the start state, * marks the final state):
STATES   a          b
>q0      {q0, q1}   q0
q1                  q2
q2                  q3
*q3
[Diagram: q0 loops on a and b; q0 -a-> q1 -b-> q2 -b-> q3, with q3 final]
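A transition table translates directly into a nested dictionary, and the NFA-versus-DFA distinction becomes a check on cell sizes. This is a sketch with made-up names (`TABLE`, `nondeterministic_cells`); empty cells are stored as empty sets.

```python
# Transition table for the NFA recognizing (a|b)*abb, one row per state.
# Each cell holds the set of states reachable on that symbol.
TABLE = {
    "q0": {"a": {"q0", "q1"}, "b": {"q0"}},
    "q1": {"a": set(), "b": {"q2"}},
    "q2": {"a": set(), "b": {"q3"}},
    "q3": {"a": set(), "b": set()},
}


def nondeterministic_cells(table):
    """Return the (state, symbol) cells that hold more than one state."""
    return sorted(
        (state, symbol)
        for state, row in table.items()
        for symbol, targets in row.items()
        if len(targets) > 1
    )
```

For this table the only such cell is (q0, a), which is what makes the automaton an NFA.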

DFA (Deterministic Finite Automaton)
- A DFA is a special case of an NFA in which the transition function maps each (state, symbol) pair to one state.
  - In a transition diagram, for each state S and symbol a, there is at most one edge labeled a leaving S.
  - In a transition table, each entry is a single state.
  - There are no ε-transitions.
- Example: DFA for (a|b)*abb
[Transition table with rows q0, q1, q2, q3 and columns a, b; entries not recovered]
Recall the NFA:
[Diagram: NFA for (a|b)*abb]
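The entries of the DFA table are not preserved in this transcript. As an assumption, the sketch below uses the four-state DFA that subset construction yields from the (a|b)*abb NFA, with its states named q0 to q3; the slide's own table may label states differently. `DFA_TABLE` and `dfa_accepts` are illustrative names.

```python
# Assumed DFA for (a|b)*abb (derived by subset construction, not copied
# from the slide): every cell holds exactly one next state.
DFA_TABLE = {
    "q0": {"a": "q1", "b": "q0"},
    "q1": {"a": "q1", "b": "q2"},
    "q2": {"a": "q1", "b": "q3"},
    "q3": {"a": "q1", "b": "q0"},
}
DFA_START = "q0"
DFA_FINALS = {"q3"}


def dfa_accepts(text):
    """Table-driven simulation: one lookup per input symbol, no alternatives."""
    state = DFA_START
    for ch in text:
        if ch not in DFA_TABLE[state]:  # symbol outside the alphabet {a, b}
            return False
        state = DFA_TABLE[state][ch]
    return state in DFA_FINALS
```

Because each cell is a single state, the loop never has to choose between alternatives or backtrack.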

DFA to program
- An NFA is more concise, but not as easy to implement.
- In a DFA, the transition table has no alternative options, so a DFA is easily simulated by an algorithm.
- Every NFA can be converted to an equivalent DFA.
  - What does equivalent mean?
- There are general algorithms that take a DFA and produce a “minimal” DFA.
  - Minimal in what sense?
- There are programs that take a regular expression and produce a program based on a minimal DFA to recognize strings defined by the RE.
- You can find out more in 451 (automata theory) and/or 431 (compiler design).
[Diagram: RE -(Thompson construction)-> NFA -(subset construction)-> DFA -(minimization)-> minimized DFA -(DFA simulation)-> program]
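The NFA-to-DFA step can be sketched with subset construction, where each DFA state is a set of NFA states. The version below is a simplified illustration that assumes an NFA without ε-transitions (so no ε-closure step is needed); the names `SUBSET_NFA_DELTA`, `subset_construction`, and `run_dfa` are hypothetical.

```python
from collections import deque

# NFA for (a|b)*abb, without epsilon edges.
SUBSET_NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
    ("q2", "b"): {"q3"},
}


def subset_construction(delta, start, finals, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in sorted(alphabet):
            # Union of all NFA moves from the states in the current set.
            target = frozenset(
                nxt
                for state in current
                for nxt in delta.get((state, symbol), set())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state is final if it contains at least one NFA final state.
    dfa_finals = {subset for subset in seen if subset & finals}
    return start_set, dfa_delta, dfa_finals, seen


def run_dfa(start_set, dfa_delta, dfa_finals, text):
    """Simulate the constructed DFA on text over the same alphabet."""
    state = start_set
    for ch in text:
        state = dfa_delta[(state, ch)]
    return state in dfa_finals


DFA_START_SET, DFA_DELTA, DFA_FINAL_SETS, DFA_STATES = subset_construction(
    SUBSET_NFA_DELTA, "q0", {"q3"}, {"a", "b"}
)
```

For this NFA the construction reaches four subsets: {q0}, {q0, q1}, {q0, q2}, and {q0, q3}. "Equivalent" here means that the resulting DFA accepts exactly the same strings as the NFA.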