Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions.

Slides:



Advertisements
Similar presentations
Lexical Analysis IV : NFA to DFA DFA Minimization
Advertisements

CSE 311 Foundations of Computing I
4b Lexical analysis Finite Automata
CPSC Compiler Tutorial 4 Midterm Review. Deterministic Finite Automata (DFA) Q: finite set of states Σ: finite set of “letters” (input alphabet)
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2005.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Regular Expressions Finite State Automaton. Programming Languages2 Regular expressions  Terminology on Formal languages: –alphabet : a finite set of.
1 CIS 461 Compiler Design and Construction Fall 2012 slides derived from Tevfik Bultan et al. Lecture-Module 5 More Lexical Analysis.
Lexical Analysis: DFA Minimization & Wrap Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp.
Compiler Construction
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture12: Decidable Languages Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture4: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
Lecture 3UofH - COSC Dr. Verma 1 COSC 3340: Introduction to Theory of Computation University of Houston Dr. Verma Lecture 3.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
1 The scanning process Main goal: recognize words/tokens Snapshot: At any point in time, the scanner has read some input and is on the way to identifying.
From Cooper & Torczon1 Automating Scanner Construction RE  NFA ( Thompson’s construction )  Build an NFA for each term Combine them with  -moves NFA.
1 The scanning process Goal: automate the process Idea: –Start with an RE –Build a DFA How? –We can build a non-deterministic finite automaton (Thompson's.
CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Regular.
2. Lexical Analysis Prof. O. Nierstrasz
Lecture 3 Goals: Formal definition of NFA, acceptance of a string by an NFA, computation tree associated with a string. Algorithm to convert an NFA to.
CS5371 Theory of Computation Lecture 4: Automata Theory II (DFA = NFA, Regular Language)
Lexical Analysis — Part II From Regular Expression to Scanner Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.
Scanner Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the.
CSE 311: Foundations of Computing Fall 2014 Lecture 23: State Minimization, NFAs.
Regular Expressions (RE) Empty set Φ A RE denotes the empty set Empty string λ A RE denotes the set {λ} Symbol a A RE denotes the set {a} Alternation M.
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
1 Compiler Construction Finite-state automata. 2 Today’s Goals More on lexical analysis: cycle of construction RE → NFA NFA → DFA DFA → Minimal DFA DFA.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
어휘분석 (Lexical Analysis). Overview Main task: to read input characters and group them into “ tokens. ” Secondary tasks: –Skip comments and whitespace;
Lecture 05: Theory of Automata:08 Kleene’s Theorem and NFA.
Lexical Analysis Constructing a Scanner from Regular Expressions.
Overview of Previous Lesson(s) Over View  An NFA accepts a string if the symbols of the string specify a path from the start to an accepting state.
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
Lexical Analysis III : NFA to DFA DFA Minimization Lecture 5 CS 4318/5331 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper.
Lexical Analysis: DFA Minimization Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
Lexical Analysis: DFA Minimization & Wrap Up. Automating Scanner Construction PREVIOUSLY RE  NFA ( Thompson’s construction ) Build an NFA for each term.
Lexical Analysis – Part II EECS 483 – Lecture 3 University of Michigan Wednesday, September 13, 2006.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2007.
using Deterministic Finite Automata & Nondeterministic Finite Automata
CSE 311 Foundations of Computing I Lecture 24 FSM Limits, Pattern Matching Autumn 2011 CSE 3111.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
Regular Languages Chapter 1 Giorgi Japaridze Theory of Computability.
Complexity and Computability Theory I Lecture #5 Rina Zviel-Girshin Leah Epstein Winter
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2006.
Theory of Computation Automata Theory Dr. Ayman Srour.
Deterministic Finite Automata Nondeterministic Finite Automata.
CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Lexical analysis Finite Automata
Two issues in lexical analysis
Recognizer for a Language
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
COSC 3340: Introduction to Theory of Computation
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Lecture 4: Lexical Analysis II: From REs to DFAs
4b Lexical analysis Finite Automata
Automating Scanner Construction
Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
4b Lexical analysis Finite Automata
Chapter 1 Regular Language
Lecture 5 Scanning.
Announcements - P1 part 1 due Today - P1 part 2 due on Friday Feb 1st
Presentation transcript:

Lexical Analysis — Part II: Constructing a Scanner from Regular Expressions

Quick Review Previous class:  The scanner is the first stage in the front end  Specifications can be expressed using regular expressions  Build tables and code from a DFA Scanner Generator specifications source codeparts of speech & words code and tables

Goal We will show how to construct a finite state automaton to recognize any RE This Lecture  Convert RE to an nondeterministic finite automaton (NFA)  Requires  -transitions to combine regular subexpressions  Convert an NFA to a deterministic finite automaton (DFA)  Use Subset construction Next Lecture  Minimize the number of states  Hopcroft state minimization algorithm  Generate the scanner code  Additional code can be inserted

More Regular Expressions All strings of 1s and 0s ending in a 1 ( 0 | 1 ) * 1 All strings over lowercase letters where the vowels (a,e,i,o, & u) occur exactly once, in ascending order Cons  (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z) Cons * a Cons * e Cons * i Cons * o Cons * u Cons * All strings of 1s and 0s that do not contain three 0s in a row:

More Regular Expressions All strings of 1s and 0s ending in a 1 ( 0 | 1 ) * 1 All strings over lowercase letters where the vowels (a,e,i,o, & u) occur exactly once, in ascending order Cons  (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|y|z) Cons * a Cons * e Cons * i Cons * o Cons * u Cons * All strings of 1s and 0s that do not contain three 0s in a row: ( 1 * (  |01 | 001 ) 1 * ) * (  | 0 | 00 )

Non-deterministic Finite Automata Each RE corresponds to a deterministic finite automaton ( DFA ) May be hard to directly construct the right DFA What about an RE such as ( a | b ) * abb? This is a little different S 1 has two transitions on a This is a non-deterministic finite automaton ( NFA ) a, b S0S0 S3S3 S1S1 S2S2 abb

Non-deterministic Finite Automata Each RE corresponds to a deterministic finite automaton ( DFA ) May be hard to directly construct the right DFA What about an RE such as ( a | b ) * abb? This is a little different S 1 has two transitions on a S 0 has a transition on  This is a non-deterministic finite automaton ( NFA ) a, b S0S0 S1S1 S4S4 S2S2 S3S3  abb

Nondeterministic Finite Automata An NFA accepts a string x iff  a path though the transition graph from s 0 to a final state such that the edge labels spell x Transitions on  consume no input To “run” the NFA, start in s 0 and guess the right transition at each step  Always guess correctly  If some sequence of correct guesses accepts x then accept Why study NFA s? They are the key to automating the RE  DFA construction We can paste together NFA s with  -transitions NFA becomes an NFA 

Relationship between NFA s and DFA s DFA is a special case of an NFA DFA has no  transitions DFA ’s transition function is single-valued Same rules will work DFA can be simulated with an NFA  Obviously NFA can be simulated with a DFA ( less obvious ) Simulate sets of possible states Possible exponential blowup in the state space Still, one state per character in the input stream

Automating Scanner Construction To convert a specification into code: 1Write down the RE for the input language 2Build a big NFA 3Build the DFA that simulates the NFA 4Systematically shrink the DFA 5Turn it into code Scanner generators Lex and Flex work along these lines Algorithms are well-known and well-understood Key issue is interface to parser ( define all parts of speech ) You could build one in a weekend!

Automating Scanner Construction RE  NFA ( Thompson’s construction ) Build an NFA for each term Combine them with  -transitions NFA  DFA ( subset construction ) Build the simulation DFA  Minimal DFA Hopcroft’s algorithm DFA  RE (Not part of the scanner construction) All pairs, all paths problem Take the union of all paths from s 0 to an accepting state minimal DFA RENFADFA The Cycle of Constructions

RE  NFA using Thompson’s Construction Key idea NFA pattern for each symbol & each operator Join them with  transitions in precedence order S0S0 S1S1 a NFA for a S0S0 S1S1 a S3S3 S4S4 b NFA for ab  NFA for a | b S0S0 S1S1 S2S2 a S3S3 S4S4 b S5S5    S0S0 S1S1  S3S3 S4S4  NFA for a * a   Ken Thompson, C ACM, 1968

Example of Thompson’s Construction Let’s try a ( b | c ) * 1. a, b, & c 2. b | c 3. ( b | c ) * S0S0 S1S1 a S0S0 S1S1 b S0S0 S1S1 c S2S2 S3S3 b S4S4 S5S5 c S1S1 S6S6 S0S0 S7S7      S1S1 S2S2 b S3S3 S4S4 c S0S0 S5S5    

Example of Thompson’s Construction (cont'd) 4. a ( b | c ) * Of course, a human would design something simpler... S0S0 S1S1 a b | c But, we can automate production of the more complex one... S0S0 S1S1 a  S4S4 S5S5 b S6S6 S7S7 c S3S3 S8S8 S2S2 S9S9     

NFA  DFA with Subset Construction Need to build a simulation of the NFA Two key functions Delta(q i, a) is set of states reachable from each state in q i by a  Returns a set of states, for each n  q i of δ i (n,   -closure(s i ) is set of states reachable from s i by  transitions The algorithm: Start state derived from n 0 of the NFA Take its  -closure q 0 =  -closure(n 0 ) Compute Delta(q,  ) for each   , and take its  -closure Iterate until no more states are added Sounds more complex than it is…

NFA  DFA with Subset Construction The algorithm: q 0  -closure (n 0 ) Q  q 0 } WorkList  q 0 while ( WorkList ≠ ф ) remove q from WorkList for each    t   - closure ( Delta (q,  )) T[q,  ]  t if ( t  Q ) then add t to Q and WorkList Let’s think about why this works The algorithm halts: 1. Q contains no duplicates (test before adding) 2. 2 Q is finite 3. while loop adds to Q, but does not remove from Q (monotone)  the loop halts Q contains all the reachable NFA states It tries each character in each q. It builds every possible NFA configuration.  Q and T form the DFA

NFA  DFA with Subset Construction Example of a fixed-point computation Monotone construction of some finite set Halts when it stops adding to the set Proofs of halting & correctness are similar These computations arise in many contexts Other fixed-point computations Canonical construction of sets of LR(1) items  Quite similar to the subset construction Classic data-flow analysis (& Gaussian Elimination)  Solving sets of simultaneous set equations We will see many more fixed-point computations

q0q0 q1q1 a  q4q4 q5q5 b q6q6 q7q7 c q3q3 q8q8 q2q2 q9q9     Final states NFA  DFA with Subset Construction Applying the subset construction: a ( b | c )* : 

NFA  DFA with Subset Construction The DFA for a ( b | c ) * Ends up smaller than the NFA All transitions are deterministic Use same code skeleton as before s3s3 s2s2 s0s0 s1s1 c b a b c c b

Where are we? Why are we doing this? RE  NFA ( Thompson’s construction )  Build an NFA for each term Combine them with  -moves NFA  DFA ( subset construction )  Build the simulation DFA  Minimal DFA Hopcroft’s algorithm DFA  RE All pairs, all paths problem Union together paths from s 0 to a final state Enough theory for today minimal DFA RENFADFA The Cycle of Constructions