Chapter 2 Scanning – Part 2. Prof. Abdelaziz Khamis, June 13, 2016.

Presentation transcript:


Chapter 2 – Part 2: Topics
- Implementation of a TINY Scanner
- Nondeterministic Finite Automaton (NFA)
- From Regular Expression to DFA
  - From Regular Expression to NFA
  - From NFA to DFA
  - Minimizing DFA

Implementation of a TINY Scanner
[Table: TINY tokens]

Implementation of a TINY Scanner (Continued)
[Figure: DFA of the TINY scanner]

Implementation of a TINY Scanner (Continued)
What happened to reserved words?
- Recognize them as identifiers first.
- Then look them up in a table of reserved words.
  - Linear search (used by TINY) is slow.
  - Binary search over an ordered list is better.
  - A hash table is better still.
  - A hash table in which every bucket holds a single entry (a perfect hash function) is best.
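The three lookup strategies can be compared in a short Python sketch. This is an illustration only, not the TINY code itself: the reserved-word list, the function names, and the convention of returning the upper-cased word as the token name are all assumptions made for the example.

```python
from bisect import bisect_left

# Assumed reserved-word list, kept sorted so binary search works.
RESERVED = ["else", "end", "if", "read", "repeat", "then", "until", "write"]

def lookup_linear(lexeme):
    """Linear search: compare against every entry in turn."""
    for word in RESERVED:
        if word == lexeme:
            return word.upper()
    return "ID"

def lookup_binary(lexeme):
    """Binary search over the ordered list."""
    i = bisect_left(RESERVED, lexeme)
    if i < len(RESERVED) and RESERVED[i] == lexeme:
        return lexeme.upper()
    return "ID"

# Hash table: a dict gives expected constant-time lookup.
RESERVED_TABLE = {word: word.upper() for word in RESERVED}

def lookup_hash(lexeme):
    return RESERVED_TABLE.get(lexeme, "ID")
```

All three functions return the same answer for any lexeme; they differ only in how many comparisons they need as the table grows.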

Implementation of a TINY Scanner (Continued)
- The code that implements the TINY DFA is contained in the scan.h and scan.c files (see Appendix B, lines: ).
- The implementation uses doubly nested case analysis.
- The principal function of the TINY scanner is getToken (lines ), which returns the next token in the source file, recognized according to the TINY DFA.
- The tokens are defined as an enumerated type in globals.h (lines ).
- The states of the scanner are also defined as an enumerated type, but within the scanner itself (lines ).

Implementation of a TINY Scanner (Continued)
- The only attribute of each token that the TINY scanner computes is the lexeme, or string value of the token recognized, which is placed in tokenString.
- The scanner uses three global variables, declared in globals.h and allocated and initialized in main.c: the file variables source and listing, and the integer variable lineno.
- The procedure reservedLookup (lines ) looks up reserved words (lines ) after an identifier is recognized by the principal loop of the getToken procedure.

Implementation of a TINY Scanner (Continued)
- A flag variable save indicates whether a character is to be added to tokenString; this is necessary because white space, comments, and non-consumed lookaheads should not be included.
- Character input to the scanner is provided by the getNextChar function (lines ), which fetches characters from lineBuf, a 256-character buffer internal to the scanner.
- The recognition of numbers and identifiers in TINY requires that the transitions to the final state from the states INNUM and INID be non-consuming. This is implemented by the ungetNextChar procedure (lines ).
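The pieces described above (doubly nested case analysis, the save flag, non-consuming transitions via an "unget" operation, and the reserved-word lookup) can be sketched together in Python. This is not the scan.c code: it reads from a string instead of lineBuf, and the state names other than INNUM and INID, the token names, the reserved-word set, and the { } comment syntax are assumptions made for the illustration.

```python
RESERVED = {"if", "then", "else", "end", "repeat", "until", "read", "write"}  # assumed
EOF = ""  # sentinel returned at end of input (assumption of this sketch)

class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def get_next_char(self):
        ch = self.text[self.pos] if self.pos < len(self.text) else EOF
        self.pos += 1
        return ch

    def unget_next_char(self):
        self.pos -= 1  # back up one character: the lookahead is not consumed

    def get_token(self):
        token_string, state, token = [], "START", None
        while state != "DONE":
            ch = self.get_next_char()
            save = True  # should ch become part of the lexeme?
            if state == "START":            # outer case: current state
                if ch.isdigit():            # inner case: current character
                    state = "INNUM"
                elif ch.isalpha():
                    state = "INID"
                elif ch == ":":
                    state = "INASSIGN"
                elif ch.isspace():
                    save = False
                elif ch == "{":
                    save, state = False, "INCOMMENT"
                else:
                    state = "DONE"
                    if ch == EOF:
                        save, token = False, "ENDFILE"
                    elif ch in "+-*/=<();":
                        token = ch
                    else:
                        token = "ERROR"
            elif state == "INCOMMENT":
                save = False
                if ch == "}":
                    state = "START"
                elif ch == EOF:
                    state, token = "DONE", "ENDFILE"
            elif state == "INASSIGN":
                state = "DONE"
                if ch == "=":
                    token = "ASSIGN"
                else:
                    self.unget_next_char()
                    save, token = False, "ERROR"
            elif state == "INNUM":
                if not ch.isdigit():
                    self.unget_next_char()  # non-consuming transition
                    save, state, token = False, "DONE", "NUM"
            else:  # INID
                if not ch.isalpha():
                    self.unget_next_char()  # non-consuming transition
                    save, state, token = False, "DONE", "ID"
            if save:
                token_string.append(ch)
        lexeme = "".join(token_string)
        if token == "ID" and lexeme in RESERVED:
            token = lexeme.upper()  # reserved word recognized after the fact
        return token, lexeme

def scan_all(text):
    """Collect (token, lexeme) pairs up to and including ENDFILE."""
    scanner, tokens = Scanner(text), []
    while True:
        tokens.append(scanner.get_token())
        if tokens[-1][0] == "ENDFILE":
            return tokens
```

Calling scan_all("read x; x := x + 42 {c}") yields the reserved word, the identifiers, the symbols, and the number, with white space and the comment left out of every lexeme.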

Nondeterministic Finite Automaton (NFA)
In a typical programming language there are many tokens, and each token will be recognized by its own DFA.
- If each of these tokens begins with a different character, then it is easy to tie them together by uniting all of their start states into a single start state.
- If some tokens begin with the same character, such as +, +=, and ++, then we cannot simply unite the start states in the same way: the start state would have several transitions on +, so the result is not a DFA.

Nondeterministic Finite Automaton (NFA)
- Instead, we must arrange it so that there is a unique transition to be made in each state. [Figure: diagram with a unique transition in each state]
- In principle, we should be able to combine all the tokens into one giant DFA.
- But how do we turn token descriptions, given as regular expressions, into such a DFA?
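One way to picture the "unique transition in each state" arrangement for +, +=, and ++ is a transition table in which the shared leading + is consumed once and the next character decides the token. The state names, token names, and the function match_plus_token below are hypothetical, chosen only for this sketch.

```python
# (state, character) -> next state; each pair has at most one target,
# so the table describes a DFA.
TRANSITIONS = {
    ("start", "+"): "plus",
    ("plus", "+"): "plusplus",
    ("plus", "="): "pluseq",
}

# Accepting states and the (hypothetical) token each one reports.
ACCEPTING = {"plus": "PLUS", "plusplus": "INCREMENT", "pluseq": "PLUS_ASSIGN"}

def match_plus_token(text, pos=0):
    """Follow transitions as far as possible; return (token, next position)."""
    state = "start"
    while pos < len(text) and (state, text[pos]) in TRANSITIONS:
        state = TRANSITIONS[(state, text[pos])]
        pos += 1
    return ACCEPTING.get(state), pos
```

For "+=1" the function returns PLUS_ASSIGN and position 2; for "+a" it stops after the first character and returns PLUS, leaving the a unconsumed.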

Nondeterministic Finite Automaton (NFA)
The simplest algorithm for translating a regular expression into a DFA proceeds via an intermediate construction: an NFA is derived from the regular expression, and the NFA is then used to construct an equivalent DFA.
Deterministic vs. non-deterministic:
- Deterministic: exactly one new state for a given (state, character) pair.
- Non-deterministic: zero or more new states for each (state, character) pair.
- An NFA may also have ε-transitions (moves made without consuming a character).
Example 2.10, page 58.
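The difference shows up directly in the data structure: an NFA maps each (state, character) pair to a set of states, possibly empty, and ε-moves are extra entries that consume no input. The small NFA below is a made-up example that accepts exactly the strings a and ab (it is not the book's Example 2.10); the state numbers and function names are assumptions of this sketch.

```python
EPS = ""  # label used here for ε-transitions

# (state, label) -> set of next states; missing pairs mean "no next state".
NFA = {
    (0, EPS): {1, 3},   # from the start state, move without reading input
    (1, "a"): {2},      # branch that accepts "a"
    (3, "a"): {4},      # branch that accepts "ab"
    (4, "b"): {5},
}
START, ACCEPT = 0, {2, 5}

def eps_closure(states):
    """All states reachable from `states` using ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def accepts(text):
    """Simulate the NFA by tracking every state it could be in."""
    current = eps_closure({START})
    for ch in text:
        moved = set()
        for s in current:
            moved |= NFA.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & ACCEPT)
```

After reading a, the simulation is in states 2 and 4 at once, which is exactly the situation a DFA cannot express with a single current state.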

From Regular Expression to NFA
The construction of an NFA follows the structure of a regular expression. The ε-transitions are used to "glue together" the machines of each piece of a regular expression to form a machine that corresponds to the whole expression.
- Basic regular expressions (page 64)
- Concatenation (page 65)
- Choice among alternatives (page 65)
- Repetition (page 66)
Examples: 2.12 and 2.13, page 67.
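The four cases can be sketched as four small Python functions, each returning an NFA fragment with one start state and one accepting state, glued together with ε-edges. This is a minimal sketch in the style of Thompson's construction; the Frag class, the function names, and the exact placement of ε-edges are choices made for this illustration, and the book's diagrams may differ in detail.

```python
from itertools import count

EPS = None          # label used here for ε-edges
_ids = count()      # fresh state numbers

class Frag:
    """An NFA fragment: one start state, one accepting state, a list of edges."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def basic(ch):
    s, a = next(_ids), next(_ids)
    return Frag(s, a, [(s, ch, a)])

def concat(r, s):
    # ε-edge from the accepting state of r to the start state of s
    return Frag(r.start, s.accept, r.edges + s.edges + [(r.accept, EPS, s.start)])

def choice(r, s):
    start, accept = next(_ids), next(_ids)
    glue = [(start, EPS, r.start), (start, EPS, s.start),
            (r.accept, EPS, accept), (s.accept, EPS, accept)]
    return Frag(start, accept, r.edges + s.edges + glue)

def star(r):
    start, accept = next(_ids), next(_ids)
    glue = [(start, EPS, r.start), (r.accept, EPS, accept),
            (start, EPS, accept),        # zero repetitions
            (r.accept, EPS, r.start)]    # repeat again
    return Frag(start, accept, r.edges + glue)

def closure(states, edges):
    result, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in result:
                result.add(dst)
                stack.append(dst)
    return result

def accepts(nfa, text):
    current = closure({nfa.start}, nfa.edges)
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(moved, nfa.edges)
    return nfa.accept in current

# ab|a, and l(l|d)* with l and d standing in for "letter" and "digit"
ab_or_a = choice(concat(basic("a"), basic("b")), basic("a"))
ident = concat(basic("l"), star(choice(basic("l"), basic("d"))))
```

The accepts helper is only there to check the result; it simulates the NFA directly by tracking sets of states.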

From NFA to DFA
To convert an NFA (M) to its equivalent DFA (M'), we need to:
- Eliminate ε-transitions
  - This involves the construction of ε-closures.
  - The ε-closure of a single state and of a set of states (page 69)
- Eliminate multiple transitions on a single input character
A conversion algorithm (subset construction), page 70:
- Compute the ε-closure of the start state of M. This becomes the start state of M'.
- For this set, and for each subsequent set S, compute transitions on input characters a as follows:
  - Compute the set S'_a = { t | for some s in S there is a transition from s to t on a }.
  - Then compute S''_a, the ε-closure of S'_a. This defines a new state in M', together with a new transition from S to S''_a on a.
- Continue with this process until no new states or transitions are created.
- Mark as accepting those states that contain an accepting state of M.
Examples: 2.15, 2.16, and 2.17 (pages 70-71)
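The steps above can be sketched in Python with a worklist of state sets. The NFA used here is a made-up one accepting a and ab, not one of the book's examples; the dictionary layout, the empty-string label for ε, and the function names are assumptions of this sketch. Empty target sets are simply skipped, so missing DFA entries play the role of error transitions.

```python
EPS = ""  # label used here for ε-transitions

def eps_closure(nfa, states):
    """ε-closure of a set of NFA states."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def subset_construction(nfa, start, accepting, alphabet):
    """Return (start set, DFA transitions, accepting sets) for the NFA."""
    start_set = frozenset(eps_closure(nfa, {start}))
    dfa, seen, worklist = {}, {start_set}, [start_set]
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            moved = set()                      # S'_a
            for s in S:
                moved |= nfa.get((s, a), set())
            if not moved:
                continue                       # no transition on a
            target = frozenset(eps_closure(nfa, moved))   # S''_a
            dfa[(S, a)] = target
            if target not in seen:             # a new DFA state
                seen.add(target)
                worklist.append(target)
    dfa_accepting = {S for S in seen if S & accepting}
    return start_set, dfa, dfa_accepting

# Hypothetical NFA: 0 -ε-> 1, 0 -ε-> 3, 1 -a-> 2, 3 -a-> 4, 4 -b-> 5
NFA = {(0, EPS): {1, 3}, (1, "a"): {2}, (3, "a"): {4}, (4, "b"): {5}}
start_set, dfa, dfa_accepting = subset_construction(NFA, 0, {2, 5}, "ab")
```

For this NFA the construction produces three DFA states, {0, 1, 3}, {2, 4}, and {5}, of which the last two are accepting.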

Minimizing DFA
An algorithm to minimize the number of states in a DFA:
- Create two sets of states (the initial candidate states of the minimized DFA), one consisting of all the accepting states and the other consisting of all the non-accepting states.
- Consider the transitions on each character a of the alphabet:
  - If all accepting states have transitions on a to accepting states, then this defines an a-transition from the new accepting state to itself.
  - If all accepting states have transitions on a to non-accepting states, then this defines an a-transition from the new accepting state to the new non-accepting state.
  - If there are two accepting states s and t whose transitions on a land in different sets, then the set of all accepting states must be split according to where their a-transitions land.
- If any further sets are split, we must return and repeat the process from the beginning.
- Continue this process of refining the partitions of states until no further splitting of sets occurs.
Examples: 2.18 and 2.19 (page 74)
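The refinement loop can be sketched in Python as repeated splitting of a partition. This is one possible rendering, assuming the DFA is given as a dictionary from (state, character) to state, with missing entries treated as transitions to an implicit error state; the function name and both sample DFAs are made up for the illustration.

```python
def minimize(states, alphabet, delta, accepting):
    """Return the final partition of `states` as a list of sets."""
    accepting = set(accepting)
    # Start with accepting vs. non-accepting states (dropping an empty set).
    partition = [b for b in (accepting, set(states) - accepting) if b]
    while True:
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                # Where does each a-transition land? None = error state.
                signature = tuple(block_of.get(delta.get((s, a)))
                                  for a in alphabet)
                groups.setdefault(signature, set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):   # nothing was split: done
            return refined
        partition = refined                  # repeat from the beginning

# Hypothetical DFA 1: a letter followed by letters or digits (l = letter, d = digit)
delta1 = {(1, "l"): 2,
          (2, "l"): 3, (2, "d"): 4,
          (3, "l"): 3, (3, "d"): 4,
          (4, "l"): 3, (4, "d"): 4}
result1 = minimize({1, 2, 3, 4}, "ld", delta1, {2, 3, 4})

# Hypothetical DFA 2: an optional a followed by any number of b's
delta2 = {(1, "a"): 2, (1, "b"): 3, (2, "b"): 3, (3, "b"): 3}
result2 = minimize({1, 2, 3}, "ab", delta2, {1, 2, 3})
```

In the first DFA the three accepting states behave identically and collapse into one state. In the second, all states are accepting, but state 1 has an a-transition while states 2 and 3 do not, so the accepting set is split into {1} and {2, 3}.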