College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation.

Slides:



Advertisements
Similar presentations
Exercises for Chapter 2. Summary for the Quiz Some numbers –199  less than 100  55 Conclusion –There are students who did read textbook at least for.
Advertisements

4b Lexical analysis Finite Automata
Nondeterministic Finite Automata CS 130: Theory of Computation HMU textbook, Chapter 2 (Sec 2.3 & 2.5)
Finite Automata CPSC 388 Ellen Walker Hiram College.
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
Regular Expressions Finite State Automaton. Programming Languages2 Regular expressions  Terminology on Formal languages: –alphabet : a finite set of.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
Lecture 3UofH - COSC Dr. Verma 1 COSC 3340: Introduction to Theory of Computation University of Houston Dr. Verma Lecture 3.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation.
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation.
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation.
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation.
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation.
1 Chapter 3 Scanning – Theory and Practice. 2 Overview Formal notations for specifying the precise structure of tokens are necessary  Quoted string in.
Lexical Analysis The Scanner Scanner 1. Introduction A scanner, sometimes called a lexical analyzer A scanner : – gets a stream of characters (source.
Scanner 1. Introduction A scanner, sometimes called a lexical analyzer A scanner : – gets a stream of characters (source program) – divides it into tokens.
Scanner Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the.
CPSC 388 – Compiler Design and Construction
Nondeterministic Finite Automata CS 130: Theory of Computation HMU textbook, Chapter 2 (Sec 2.3 & 2.5)
1 Chapter 3 Scanning – Theory and Practice. 2 Overview Formal notations for specifying the precise structure of tokens are necessary –Quoted string in.
1 Chapter 1 Automata: the Methods & the Madness Angkor Wat, Cambodia.
Lecture Two: Formal Languages Formal Languages, Lecture 2, slide 1 Amjad Ali.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
By: Er. Sukhwinder kaur.  What is Automata Theory? What is Automata Theory?  Alphabet and Strings Alphabet and Strings  Empty String Empty String 
REGULAR EXPRESSIONS. Lexical Analysis Lexical analysers can be constructed by programs such as LEX These programs employ as input a description of the.
어휘분석 (Lexical Analysis). Overview Main task: to read input characters and group them into “ tokens. ” Secondary tasks: –Skip comments and whitespace;
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
1 November 1, November 1, 2015November 1, 2015November 1, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
What is a language? An alphabet is a well defined set of characters. The character ∑ is typically used to represent an alphabet. A string : a finite.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
Regular Expressions and Languages A regular expression is a notation to represent languages, i.e. a set of strings, where the set is either finite or contains.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Scanner Introduction to Compilers 1 Scanner.
CMSC 330: Organization of Programming Languages Theory of Regular Expressions Finite Automata.
Finite Automata Chapter 1. Automatic Door Example Top View.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
using Deterministic Finite Automata & Nondeterministic Finite Automata
1 Topic 2: Lexing and Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
CS 154 Formal Languages and Computability February 11 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
1 Chapter 3 Regular Languages.  2 3.1: Regular Expressions (1)   Regular Expression (RE):   E is a regular expression over  if E is one of:
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
Set, Alphabets, Strings, and Languages. The regular languages. Clouser properties of regular sets. Finite State Automata. Types of Finite State Automata.
Deterministic Finite Automata Nondeterministic Finite Automata.
ICS611 Lex Set 3. Lex and Yacc Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the.
CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Compilers Lexical Analysis 1. while (y < z) { int x = a + b; y += x; } 2.
General Discussion of “Properties” The Pumping Lemma Membership, Emptiness, Etc.
Department of Software & Media Technology
Topic 3: Automata Theory 1. OutlineOutline Finite state machine, Regular expressions, DFA, NDFA, and their equivalence, Grammars and Chomsky hierarchy.
CS314 – Section 5 Recitation 2
Theory of Computation Lecture #
Scanner Scanner Introduction to Compilers.
REGULAR LANGUAGES AND REGULAR GRAMMARS
Review: Compiler Phases:
COSC 3340: Introduction to Theory of Computation
Scanner Scanner Introduction to Compilers.
Scanner Scanner Introduction to Compilers.
Compiler Construction
Chapter 1 Regular Language
Scanner Scanner Introduction to Compilers.
Scanner Scanner Introduction to Compilers.
Scanner Scanner Introduction to Compilers.
Presentation transcript:

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation Techniques Dr. Ying JIN Associate Professor Oct. 2007

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -2- Outline 2.1 Overview General Function of a Scanner Some Issues about Scanning 2.2 Finite Automata Definition and Implementation of DFA Non-Determinate Finite Automata Transforming NFA into DFA Minimizing DFA 2.3 Regular Expressions Definition of Regular Expressions Regular Definition From Regular Expression to DFA 2.4 Design and Implementation of a Scanner Developing a Scanner from DFA A Scanner Generator – Lex √ √

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques Regular Expressions Definition of Regular Expressions Regular Definition From Regular Expression to DFA

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -4- Definition of Regular Expressions (RE) - Some Concepts - Formal Definition of RE - Example - Properties of RE - Extensions to RE - Limitations of RE - Using RE to define Lexical Structure

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -5- Some Concepts alphabet( 字母表 ) : a non-empty finite set of symbols , which is denoted as  , one of its elements is called symbol. string( 符号串 ) : finite sequence of symbols, we use or  to represent empty string( 空串 ); 空串集 { } is different from empty set  。 length of a string( 符号串长度 ) : the number of symbols ina string, we use |  | to represent the length of the string  ; concatenate operator for strings( 符号串连接操作 ) : if  and  are strings , we use  as the concatenation of two strings, especially we have  =  =  ;

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -6- Some Operators on Set of Strings product of set of strings ( 符号串集的乘积 ) : if A and B are two sets of strings, AB is called the product of two sets of strings, AB={  |  A ,   B} especially  A=A  =A , where  represents empty set 。 power of set of strings( 符号串集合的方幂 ) : if A is a set of strings, A i is called ith power of A, where i is a non-negative integer( 非负整数 ) 。 A 0 ={ } A 1 = A, A 2 = A A A K = AA......A (k) positive closure( 符号串集合的正闭包 ) : A + =A 1  A 2  A star closure( 符号串集合的星闭包 ) : A * =A 0  A 1  A 2  A

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -7- Formal Definition For a given alphabet , a regular expression for  defines a set of strings of , If we use R  to represent a regular expression for , and L(R  ) to represent the set of strings that R  defines.

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -8- Formal Definition ■  is a regular expression , L(  )={ } ■ is a regular expression , L( )={ } ■ for any c , c is a regular expression, L(c)={c} ■ if A and B are regular expressions, following operators can be used – ( A ) , L( (A) )= L(A) –choice among alternatives A | B , L( A | B )=L(A)  L(B) –concatenation A B , L( A B )= L(A)L(B) –repetation A* , L( A*)= L(A)*

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -9- Example  ={ a,b }. RE 1.ab* 2. a(a|b)* Set of strings 1.{a,ab, abb, abbb, …… } 2. {a, aa, ab, aaa, aab, aba, abb, …… }

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -10- Comparing with DFA Equivalent in describing the set of strings; Can be conversed into each other; DFA is convenient for implementation; RE is convenient for defining and understanding; Both of them can be used to define the lexical structure of programming languages;

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -11- Properties A | B = B | A | 的可交换性 A | (B | C) =(A | B ) C| 的可结合性 A (B C) =(A B )C 连接的可结合性 A (B | C) =A B | A C 连接的可分配性 (A | B ) C =A C | B C 连接的可分配性 A** =A* 幂的等价性 A= A=A 是连接的恒等元素

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -12- Extensions Some extensions can be made to facilitate definition – A + –any symbol: “.” –range: [0-9] [a-z] [A-Z] –not in the range: ~(a|b|c) –optional: r?=( |r)

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -13- Limitations RE can not define such structure like –Pairing 配对, () –Nesting 嵌套, RE can not describe those structures that include finite number of repetitions for example : w c w, w is a string containing a and b; (a|b)* c (a|b)* can not be used, because it cannot guarantee that the strings on both sides of c are the same all the time;

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -14- Regular Definition

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -15- Definition It is inconvenient to define set of long strings with RE, so another formal notation is introduced, which is called “Formal Definition” ; The main idea is that naming some sub-expressions in RE; Example : ( 1|2|…|9)(0|1|2|…9) * NZ_digit= 1|2|…|9 digit = NZ_digit | 0 NZ_digit digit *

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -16- Defining Lexical Structure of ToyL letter = a|…|z|A|…|Z digit = 0|…|9 NZ-digit = 1|…|9 Reserved words: Reserved = var| if | then | else| while| read| write| intIdentifiers: letter digit* Constant: integer: int = NZ-digit digit* | 0 Other symbols: syms = +|-| *|/ | > | < | = | : | ; | ( | ) | { | } Lexical structure: lex = Reserved | identifier |int | syms

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -17- From RE to NFA

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -18- Rules ■  is a regular expression , L(  )={ } ■ is a regular expression , L( )={ } ■ for any c , c is a regular expression, L(c)={c} S0S0 S S0S0 S c

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -19- Rules ■ ( A ), L( (A) )= L(A), no change; ■ A B , L( A B )= L(A)L(B) NFA(A) NFA(B) 

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -20- Rules ■ ( A ), L( (A) )= L(A), no change; ■ A | B , L( A | B )=L(A)  L(B) NFA(A) NFA(B)    

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -21- Rules ■ A* , L( A*)= L(A)* NFA(A)   

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -22- Attention The rules introduced above are effective for those NFAs that have one start state and one terminal state; Any NFA can be extended to meet this requirement; …… NFA ……    

College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -23- Example (a | b)* a b b (a | b) a b  a b b b a  