
Presentation transcript:

1 The scanning process
Goal: automate the process
Idea:
– Start with an RE
– Build a DFA
How?
– Build a non-deterministic finite automaton (Thompson's construction)
– Convert it to a deterministic one (subset construction)
– Minimize the DFA (Hopcroft's algorithm)
– Implement it
Existing scanner generator: flex

2 The scanning process: step 1
Let's build a mini-scanner that recognizes exactly those strings of a's and b's that end in ab.
Step 1: Come up with a regular expression
(a|b)*ab
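As a quick sanity check of the expression itself (not of the scanner the following steps build), here is a minimal sketch using Python's standard `re` module. The helper name `ends_in_ab` and the sample strings are assumptions made for illustration.

```python
import re

# Step 1 pattern from the slide: any mix of a's and b's, ending in "ab".
ENDS_IN_AB = re.compile(r"(a|b)*ab")

def ends_in_ab(text: str) -> bool:
    """Return True when the whole string matches (a|b)*ab."""
    return ENDS_IN_AB.fullmatch(text) is not None

# Hypothetical sample inputs; the last one contains a character outside {a, b}.
samples = ["ab", "aab", "babab", "", "a", "aba", "abc"]
results = {s: ends_in_ab(s) for s in samples}
```

`fullmatch` is used so that the whole string has to match; a plain `search` would also accept strings that merely contain ab somewhere.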

3 The scanning process: step 2
Step 2: Use Thompson's construction to create an NFA for that expression
We want to be able to automate the process. Thompson's construction gives a systematic way to create an NFA from an RE. It builds the NFA in a bottom-up manner. At any time during construction
– there is only one final state
– no transitions leave the final state
– components are linked together using ε-productions.

4 The scanning process: step 2
Step 2: Use Thompson's construction to create an NFA for that expression
[Figure: NFA fragments for a, for b, for a|b, and for (a|b)*, each built from the previous ones and linked with ε-productions]

5 The scanning process: step 2
Step 2: Use Thompson's construction to create an NFA for that expression
[Figure: NFA fragment for ab and the complete NFA for (a|b)*ab, linked with ε-productions]
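The bottom-up construction in step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the slides' own code: the `Builder` class, its method names, the use of `None` as the ε label, and the `accepts` helper are all hypothetical. The expression is assembled by hand from combinators rather than parsed, and the state numbers it produces differ from those in the figures.

```python
from itertools import count

EPS = None  # label standing in for an ε-production (an assumption of this sketch)

class Builder:
    """Builds NFA fragments; each fragment is a (start, final) pair of state ids."""

    def __init__(self):
        self._ids = count(1)
        self.edges = []  # list of (source, label, target)

    def _new(self):
        return next(self._ids)

    def symbol(self, ch):
        s, f = self._new(), self._new()
        self.edges.append((s, ch, f))
        return s, f

    def concat(self, left, right):
        # Link the final state of `left` to the start of `right` with ε.
        self.edges.append((left[1], EPS, right[0]))
        return left[0], right[1]

    def union(self, left, right):
        s, f = self._new(), self._new()
        for frag in (left, right):
            self.edges.append((s, EPS, frag[0]))
            self.edges.append((frag[1], EPS, f))
        return s, f

    def star(self, inner):
        s, f = self._new(), self._new()
        self.edges += [(s, EPS, inner[0]), (s, EPS, f),
                       (inner[1], EPS, inner[0]), (inner[1], EPS, f)]
        return s, f

def accepts(edges, start, final, text):
    """Simulate the NFA by tracking the set of states it could be in."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, label, dst in edges:
                if src == q and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch})
    return final in current

# (a|b)*ab assembled bottom-up.
builder = Builder()
a_or_b_star = builder.star(builder.union(builder.symbol("a"), builder.symbol("b")))
start, final = builder.concat(builder.concat(a_or_b_star, builder.symbol("a")),
                              builder.symbol("b"))
```

Every combinator returns a fragment with exactly one final state and adds no edge leaving it, which is what allows fragments to be composed freely.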

6 The scanning process: step 3
Step 3: Use subset construction to convert the NFA to a DFA
Observation:
– Two states qi and qk linked by an ε-production in the NFA should be the same state in the DFA, because the machine goes from qi to qk without consuming input.
The ε-closure() function takes a state q and returns all the states that can be reached from q on ε-productions only.
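A minimal Python sketch of ε-closure follows. The edge table is an assumption: it is a reconstruction of the ε-productions of the 12-state NFA used in this walkthrough, inferred from the closures the slides list. The function takes a set of states rather than a single state, which is the form the subset construction needs.

```python
# Assumed ε-productions of the 12-state NFA for (a|b)*ab (reconstructed, not
# copied from the figure): source state -> states reachable by one ε-production.
EPSILON_EDGES = {
    1: {2, 8},
    2: {3, 4},
    5: {7},
    6: {7},
    7: {2, 8},
    8: {9},
    10: {11},
}

def epsilon_closure(states, eps_edges=EPSILON_EDGES):
    """Return every state reachable from `states` using ε-productions only."""
    closure = set(states)
    worklist = list(states)
    while worklist:
        q = worklist.pop()
        for nxt in eps_edges.get(q, ()):
            if nxt not in closure:
                closure.add(nxt)
                worklist.append(nxt)
    return closure

closure_of_start = epsilon_closure({1})
```

A state always belongs to its own closure, since reaching it requires zero ε-productions.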

7 The scanning process: step 3
Step 3: Use subset construction to convert the NFA to a DFA
Observation:
– If, on some input a, the NFA can go to any one of k states, then those k states should be represented by a single state in the DFA.
The δ() function takes as input a state q and a character x and returns all states that we can go to from q when reading a single x.

8 The scanning process: step 3
Step 3: Use subset construction to convert the NFA to a DFA
– The start state Q0 of the DFA is the ε-closure of the start state q0 of the NFA.
– Compute ε-closure(δ(Q0, x)) for each valid input character x. This will generate new states.
– Systematically compute ε-closure(δ(Qi, x)) until no new states can be created.
– The final states of the DFA are those that contain final states of the NFA.

9 The scanning process: step 3
Step 3: Use subset construction to convert the NFA to a DFA
[Figure: NFA for (a|b)*ab with states 1 to 12; a-transitions 3→5 and 9→10, b-transitions 4→6 and 11→12, all other edges are ε-productions]
ε-closure(1) = {1, 2, 3, 4, 8, 9}

10 The scanning process: step 3
[Figure: the same 12-state NFA for (a|b)*ab]
Each set below is the ε-closure of the states reached on the given character.
Q0 = {1,2,3,4,8,9}
δ(Q0, a) = {5,7,8,9,2,3,4,10,11} = Q1
δ(Q0, b) = {6,7,8,9,2,3,4} = Q2
δ(Q1, a) = Q1
δ(Q1, b) = {6,7,8,9,2,3,4,12} = Q3
δ(Q2, a) = Q1
δ(Q2, b) = Q2
δ(Q3, a) = Q1
δ(Q3, b) = Q2
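The computation above can be reproduced with a short Python sketch of the subset construction. The NFA tables are an assumption: they are reconstructed from the sets listed on the slides rather than copied from the figure, and names such as `move` and `subset_construction` are hypothetical.

```python
from collections import deque

# Assumed reconstruction of the 12-state NFA for (a|b)*ab.
EPS_EDGES = {1: {2, 8}, 2: {3, 4}, 5: {7}, 6: {7}, 7: {2, 8}, 8: {9}, 10: {11}}
SYM_EDGES = {(3, "a"): {5}, (4, "b"): {6}, (9, "a"): {10}, (11, "b"): {12}}
NFA_START, NFA_FINAL = 1, 12

def epsilon_closure(states):
    """All states reachable from `states` on ε-productions only."""
    closure, worklist = set(states), list(states)
    while worklist:
        q = worklist.pop()
        for nxt in EPS_EDGES.get(q, ()):
            if nxt not in closure:
                closure.add(nxt)
                worklist.append(nxt)
    return closure

def move(states, ch):
    """All states reachable from `states` by reading the single character ch."""
    out = set()
    for q in states:
        out |= SYM_EDGES.get((q, ch), set())
    return out

def subset_construction(alphabet="ab"):
    start = frozenset(epsilon_closure({NFA_START}))
    names = {start: "Q0"}      # NFA state set -> DFA state name
    table = {}                 # (DFA state, character) -> DFA state
    queue = deque([start])
    while queue:               # stops when no new state sets appear
        current = queue.popleft()
        for ch in alphabet:
            target = frozenset(epsilon_closure(move(current, ch)))
            if target not in names:
                names[target] = f"Q{len(names)}"
                queue.append(target)
            table[names[current], ch] = names[target]
    finals = {name for subset, name in names.items() if NFA_FINAL in subset}
    return names, table, finals

names, table, finals = subset_construction()
```

With breadth-first discovery the states are named in the same order as on the slide, so `table` holds the eight transitions listed above and `finals` contains only Q3. For an NFA in which some character leads nowhere, this sketch would create a DFA state for the empty set, acting as a dead state.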

11 The scanning process: step 3
[Figure: the NFA for (a|b)*ab and the DFA produced by the subset construction, with four states and transitions on a and b]

12 The scanning process: step 4
Step 4: Use Hopcroft's algorithm to minimize the DFA
[Figure: the four-state DFA before minimization]
δ(Q0, a) = Q1
δ(Q0, b) = Q2
δ(Q2, a) = Q1
δ(Q2, b) = Q2
States Q0 and Q2 behave the same way, so they can be merged. Note that even though Q3 also behaves the same way, it cannot be merged with Q0 or Q2 because Q3 is a final state while Q0 and Q2 are not.
[Figure: the minimized DFA with states 0, 1 and 3]
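The merging argument can be checked with a small partition-refinement sketch in Python. Note the substitution: this is the simpler Moore-style refinement loop, not Hopcroft's worklist algorithm, which yields the same partition with better worst-case running time. The DFA table is the one from step 3; the function name `minimize` is an assumption.

```python
# DFA produced by the subset construction in step 3.
STATES = ["Q0", "Q1", "Q2", "Q3"]
FINALS = {"Q3"}
DELTA = {
    ("Q0", "a"): "Q1", ("Q0", "b"): "Q2",
    ("Q1", "a"): "Q1", ("Q1", "b"): "Q3",
    ("Q2", "a"): "Q1", ("Q2", "b"): "Q2",
    ("Q3", "a"): "Q1", ("Q3", "b"): "Q2",
}

def minimize(states, alphabet, delta, finals):
    """Split states into blocks of equivalent states (Moore-style refinement)."""
    # Start by separating final from non-final states.
    partition = [b for b in (set(finals), set(states) - set(finals)) if b]
    while True:
        def block_of(q):
            return next(i for i, b in enumerate(partition) if q in b)

        refined = []
        for block in partition:
            groups = {}
            for q in sorted(block):
                # States stay together only if every character sends them
                # into the same block of the current partition.
                signature = tuple(block_of(delta[q, ch]) for ch in alphabet)
                groups.setdefault(signature, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(partition):
            return refined      # no block was split: the partition is stable
        partition = refined

blocks = minimize(STATES, "ab", DELTA, FINALS)
merged = sorted(sorted(block) for block in blocks)
```

The initial split into final and non-final states is what keeps Q3 apart from Q0 and Q2, even though all three have identical outgoing transitions.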

13 In practice
flex is a scanner generator that takes an RE specification and follows the described process to generate a DFA.
The user additionally specifies
– actions to be performed whenever a valid string has been recognized, e.g. insert an identifier in the symbol table
– error messages to be generated when the input string is invalid.
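To make the roles of the DFA, the action and the error message concrete, here is a hand-written table-driven recognizer in Python for the minimized DFA of step 4. It is only a sketch of the idea and does not reflect the code flex actually generates; the names `scan` and `on_match` and the error wording are assumptions.

```python
# Minimized DFA for (a|b)*ab: state 0 is the merged Q0/Q2, state 3 is final.
TRANSITIONS = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START, ACCEPTING = 0, {3}

def scan(text, on_match):
    """Run the DFA over `text`; call the user action if the string is valid."""
    state = START
    for position, ch in enumerate(text):
        nxt = TRANSITIONS.get((state, ch))
        if nxt is None:
            # No transition on this character: report it as invalid.
            raise ValueError(f"invalid character {ch!r} at position {position}")
        state = nxt
    if state not in ACCEPTING:
        raise ValueError(f"input {text!r} does not end in 'ab'")
    return on_match(text)

# Hypothetical action: record recognized strings in a list.
recognized = []
scan("abab", recognized.append)
```

A real scanner would also find the longest matching prefix and continue with the next token instead of requiring the whole input to be one string.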

14 In practice
Errors that are typically detected during scanning include
– Unterminated strings
– Unterminated comments
– Invalid characters