Overview of Previous Lesson(s)

Overview
Symbol tables are data structures used by compilers to hold information about source-program constructs.
Information is put into the symbol table when the declaration of an identifier is analyzed.
Entries in the symbol table contain information about an identifier, such as its lexeme, its type, its position in storage, and any other relevant information.
This information is collected incrementally.
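As a rough illustration of such an entry (the field names here are assumptions, not the course's actual code), a symbol-table record might be sketched in C as follows:

```c
/* Illustrative sketch only: one symbol-table entry, with fields for the
 * kinds of information mentioned above.  Field names are assumptions. */
struct symtab_entry {
    char *lexeme;        /* the identifier's spelling                */
    int   type;          /* its type, filled in as analysis proceeds */
    int   storage_pos;   /* its position in storage                  */
    /* ... any other relevant information ...                        */
};
```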

Overview..
A token is a pair consisting of a token name and an optional attribute value.
A pattern is a description of the form that the lexemes of a token may take.
In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword.
A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.
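A minimal sketch of the token-as-pair idea in C (names are illustrative only, not from the slides):

```c
/* Illustrative only: a token is a pair (token name, optional attribute). */
struct token {
    int name;        /* e.g. ID, NUMBER, RELOP, IF, ...               */
    int attribute;   /* e.g. a symbol-table index or an operator code */
};
```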

Overview…
Some definitions:
Def: An alphabet is a finite set of symbols.
Ex: {0,1}, ASCII, Unicode, EBCDIC. (The empty set ∅ is also an alphabet, though an uninteresting one.)
Def: A string over an alphabet is a finite sequence of symbols from that alphabet. Strings are often called words or sentences.
Ex: Strings over {0,1}: ε, 0, 1, 10, 0011, ...

Overview…
Def: A language over an alphabet is a countable set of strings over the alphabet.
Ex: All grammatical English sentences with five, eight, or twelve words form a language over ASCII.
Def: The concatenation of strings s and t is the string formed by appending the string t to s. It is written st.
Def: The length of a string is the number of symbols (counting duplicates) in the string.
Ex: The length of vciit, written |vciit|, is 5.

Overview…
Def: A prefix of a string s is any string obtained by removing zero or more symbols from the end of s.
Ex: ban, banana, and ε are prefixes of banana.
Def: A suffix of a string s is any string obtained by removing zero or more symbols from the beginning of s.
Ex: nana, banana, and ε are suffixes of banana.
Def: A substring of s is obtained by deleting any prefix and any suffix from s.
Ex: banana, nan, and ε are substrings of banana.

Overview...
Operations on Languages (with L the set of letters and D the set of digits):
L ∪ D is the set of letters and digits, i.e. the set of strings of length one, each of which is either one letter or one digit.
LD is the set of 520 strings of length two, each consisting of one letter followed by one digit.
L⁴ is the set of all 4-letter strings.
L* is the set of all strings of letters, including ε, the empty string.
L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.
D⁺ is the set of all strings of one or more digits.
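For reference, the operations used above have the standard definitions (not shown on the slide):
Union:            L ∪ M = { s : s is in L or s is in M }
Concatenation:    LM = { st : s is in L and t is in M }
Kleene closure:   L* = the union of Lⁱ for all i ≥ 0, where L⁰ = {ε}
Positive closure: L⁺ = the union of Lⁱ for all i ≥ 1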

Overview…
A regular expression is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings.
The idea is that the regular expressions over an alphabet are built from the symbols of the alphabet using union, concatenation, and the * operator, but it takes more words to say it precisely.
Each regular expression r denotes a language L(r), which is defined recursively from the languages denoted by r's subexpressions.
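The recursive definition of L(r) (a standard statement, added here for completeness; it is not on the slide) is:
L(ε) = {ε} and L(a) = {a} for each symbol a in Σ
L(r|s) = L(r) ∪ L(s)
L(rs) = L(r)L(s)
L(r*) = (L(r))*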

Overview…
Ex: Let Σ = {a, b}.
1. The regular expression a|b denotes the language {a, b}.
2. (a|b)(a|b) denotes {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Σ.
3. a* denotes the language consisting of all strings of zero or more a's, that is, {ε, a, aa, aaa, ...}.

Overview…
If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
...
dn → rn
where the d's are distinct names not in Σ, and each ri is a regular expression over Σ ∪ {d1, ..., di-1}.
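For example (the slide that followed is not reproduced in the transcript, but presumably showed something along these lines):
letter → A | B | ... | Z | a | b | ... | z
digit  → 0 | 1 | ... | 9
id     → letter ( letter | digit )*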


Contents
Recognition of Tokens
Transition Diagrams
Recognition of Reserved Words and Identifiers
Recognizing Whitespace
Recognizing Numbers
Finite Automata
NFA
Transition Tables

Recognition of Tokens
Now we see how to build a piece of code that examines the input string and finds a prefix that is a lexeme matching one of the patterns.
Our current goal is to perform the lexical analysis needed for the following grammar.
Recall that the terminals are the tokens and the nonterminals generate strings of terminals.
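The grammar itself is not reproduced in this transcript; the grammar used at this point in the Dragon book, assumed to be the one intended here, is the small branching-statement grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | number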

Recognition of Tokens..
A regular definition for the terminals is given below.
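The definition shown on the slide is not in the transcript; the Dragon-book version it presumably follows is:
digit  → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
letter → [A-Za-z]
id     → letter ( letter | digit )*
if     → if
then   → then
else   → else
relop  → < | > | <= | >= | = | <>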

Recognition of Tokens…
We also want the lexer to remove whitespace, so we define a new token
ws → ( blank | tab | newline )+
where blank, tab, and newline are symbols used to represent the corresponding ASCII characters.
If the lexer recognizes the token ws, it does not return it to the parser but instead goes on to recognize the next token, which is then returned.

Recognition of Tokens..
Our goal for the lexical analyzer is summarized in the table below.
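The table on the slide is not reproduced in the transcript; the corresponding Dragon-book table of tokens and attribute values (assumed here) is:

Lexemes     | Token name | Attribute value
------------+------------+------------------------
any ws      | -          | -
if          | if         | -
then        | then       | -
else        | else       | -
any id      | id         | pointer to table entry
any number  | number     | pointer to table entry
<           | relop      | LT
<=          | relop      | LE
=           | relop      | EQ
<>          | relop      | NE
>           | relop      | GT
>=          | relop      | GE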

Transition Diagram
As an intermediate step in the construction of a lexical analyzer, we first convert patterns into stylized flowcharts, called "transition diagrams".
Here we look at the conversion of regular-expression patterns into transition diagrams.
Transition diagrams have a collection of nodes or circles, called states.
Each state represents a condition that could occur during the process of scanning the input looking for a lexeme that matches one of several patterns.

Transition Diagram..
Edges are directed from one state of the transition diagram to another.
Each edge is labeled by a symbol or set of symbols.
Some important conventions:
Double circles represent accepting or final states, at which point a lexeme has been found. There is often an action to be done (e.g., returning the token), which is written to the right of the double circle.
If we have moved one (or more) characters too far in finding the token, one (or more) stars are drawn.
An imaginary start state exists and has an arrow coming from it to indicate where to begin the process.

Transition Diagram…
The slide shows a transition diagram that recognizes the lexemes matching the token relop.
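The diagram itself is not in the transcript. As a rough sketch of the behaviour it encodes (the states follow the usual Dragon-book layout; the helper names are assumptions, not the course's code), a hand-written recognizer in C might look like this:

```c
/* Sketch of the relop transition diagram (states 0-8 in the usual layout). */
enum relop_attr { LT, LE, EQ, NE, GT, GE, NONE };

/* Reads characters from src starting at *pos; on success consumes the
 * lexeme and returns its attribute, otherwise returns NONE. */
enum relop_attr scan_relop(const char *src, int *pos)
{
    int state = 0;
    for (;;) {
        char c = src[*pos];
        switch (state) {
        case 0:                                   /* start state   */
            if (c == '<')      { state = 1; (*pos)++; }
            else if (c == '=') { (*pos)++; return EQ; }
            else if (c == '>') { state = 6; (*pos)++; }
            else return NONE;
            break;
        case 1:                                   /* have seen '<' */
            if (c == '=') { (*pos)++; return LE; }
            if (c == '>') { (*pos)++; return NE; }
            return LT;   /* starred state: retract by not consuming c */
        case 6:                                   /* have seen '>' */
            if (c == '=') { (*pos)++; return GE; }
            return GT;   /* starred state: retract                    */
        }
    }
}
```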

Recognition of Reserved Words and Identifiers
Recognizing keywords and identifiers presents a problem.
The transition diagram on the slide corresponds to the regular definition given previously.

Recognition of Reserved Words and Identifiers
Two questions arise:
How do we distinguish between identifiers and keywords such as then, which also match the pattern in the transition diagram?
What do gettoken() and installID() do?
We will use the approach of installing the keywords into the identifier table prior to any invocation of the lexer.
The table entry will indicate that the entry is a keyword.

Recognition of Reserved Words and Identifiers..
installID() checks whether the lexeme is already in the table. If it is not present, the lexeme is installed as an id token. In either case a pointer to the entry is returned.
gettoken() examines the lexeme and returns the token name, either id or a name corresponding to a reserved keyword.
So far we have transition diagrams for identifiers (this diagram also handles keywords) and the relational operators.
What remains are whitespace and numbers.
A sketch of the two routines is given below.
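A minimal sketch, assuming a toy fixed-size linear table; the routine names follow the slide, everything else is illustrative:

```c
/* Sketch only: keywords are preinstalled with their own token names,
 * so gettoken() can tell them apart from ordinary identifiers. */
#include <string.h>

enum token_name { ID, IF, THEN, ELSE };

struct entry { char lexeme[64]; enum token_name token; };

static struct entry symtab[1024];
static int nsyms;

/* Called once before scanning begins. */
void init_keywords(void)
{
    static const struct entry kw[] = { {"if", IF}, {"then", THEN}, {"else", ELSE} };
    for (unsigned i = 0; i < sizeof kw / sizeof kw[0]; i++)
        symtab[nsyms++] = kw[i];
}

/* installID(): return the entry for lexeme, adding it as an ID if new. */
int installID(const char *lexeme)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0)
            return i;
    strncpy(symtab[nsyms].lexeme, lexeme, sizeof symtab[nsyms].lexeme - 1);
    symtab[nsyms].token = ID;
    return nsyms++;
}

/* gettoken(): ID for ordinary identifiers, the keyword's name otherwise. */
enum token_name gettoken(int index)
{
    return symtab[index].token;
}
```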

Recognizing Whitespace
The delim in the diagram represents any of the whitespace characters, say space, tab, and newline.
The final star is there because we need to find a non-whitespace character in order to know when the whitespace ends; that character begins the next token.
There is no action performed at the accepting state.
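In code, the diagram boils down to a loop like the following sketch (an assumed helper, not from the slides):

```c
/* Consume delimiters; the "retract" on the starred state is simply
 * not consuming the first non-delimiter character.  No token is returned. */
void skip_ws(const char *src, int *pos)
{
    while (src[*pos] == ' ' || src[*pos] == '\t' || src[*pos] == '\n')
        (*pos)++;
}
```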

Recognizing Numbers
The slide shows the transition diagram for the token number.
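Following the pattern digits (. digits)? (E (+|-)? digits)? from the regular definition above, a hand-coded version of that diagram might look like this sketch (names are assumptions):

```c
/* Returns the length of the number lexeme starting at src[pos],
 * or 0 if no number starts there.  The starred accepting states of the
 * diagram correspond to simply not consuming the lookahead character. */
#include <ctype.h>

int scan_number(const char *src, int pos)
{
    int start = pos;
    if (!isdigit((unsigned char)src[pos])) return 0;
    while (isdigit((unsigned char)src[pos])) pos++;            /* digits   */
    if (src[pos] == '.' && isdigit((unsigned char)src[pos + 1])) {
        pos++;                                                 /* fraction */
        while (isdigit((unsigned char)src[pos])) pos++;
    }
    if (src[pos] == 'E') {                                     /* exponent */
        int p = pos + 1;
        if (src[p] == '+' || src[p] == '-') p++;
        if (isdigit((unsigned char)src[p])) {
            pos = p;
            while (isdigit((unsigned char)src[pos])) pos++;
        }
    }
    return pos - start;
}
```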

Finite Automata
Finite automata are like the graphs in transition diagrams, but they simply decide whether an input string is in the language (denoted by our regular expression).
Finite automata are recognizers; they simply say "yes" or "no" about each possible input string.
There are two kinds of finite automata:
Nondeterministic finite automata (NFA) have no restrictions on the labels of their edges. A symbol can label several edges out of the same state, and ε, the empty string, is a possible label.

Finite Automata..
Deterministic finite automata (DFA) have, for each state and for each symbol of the input alphabet, exactly one edge labeled with that symbol leaving that state.
So if you know the next symbol and the current state, the next state is determined. That is, the execution is deterministic, hence the name.
Both deterministic and nondeterministic finite automata are capable of recognizing the same languages.
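Determinism makes a DFA trivial to run with a table-driven loop. A small illustrative sketch follows; the transition table is the usual DFA for (a|b)*abb, and none of this code is from the slides:

```c
/* Table-driven DFA simulation for (a|b)*abb: states 0-3, start 0,
 * accepting state 3.  One table lookup per input character. */
#include <stdio.h>

static const int move[4][2] = {   /* [state][0 for 'a', 1 for 'b'] */
    { 1, 0 },   /* state 0 */
    { 1, 2 },   /* state 1 */
    { 1, 3 },   /* state 2 */
    { 1, 0 },   /* state 3 */
};

int dfa_accepts(const char *s)
{
    int state = 0;
    for (; *s; s++) {
        if (*s != 'a' && *s != 'b') return 0;   /* not in the alphabet */
        state = move[state][*s == 'b'];         /* deterministic step  */
    }
    return state == 3;
}

int main(void)
{
    printf("%d %d %d\n", dfa_accepts("abb"),    /* prints 1 */
           dfa_accepts("aababb"),               /* prints 1 */
           dfa_accepts("abab"));                /* prints 0 */
    return 0;
}
```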

Nondeterministic Finite Automata
A nondeterministic finite automaton (NFA) consists of:
1. A finite set of states S.
2. A set of input symbols Σ, the input alphabet. We assume that ε, which stands for the empty string, is never a member of Σ.
3. A transition function that gives, for each state and for each symbol in Σ ∪ {ε}, a set of next states.
4. A state s0 from S that is distinguished as the start state (or initial state).
5. A set of states F, a subset of S, that is distinguished as the accepting states (or final states).

Nondeterministic Finite Automata..
An NFA is basically a flowchart like the transition diagrams we have already seen.
Indeed, an NFA can be represented by a transition graph whose nodes are states and whose edges are labeled with elements of Σ ∪ {ε}.
The differences between a transition graph and the previous transition diagrams are:
Possibly multiple edges with the same label leaving a single state.
An edge may be labeled with ε.

Nondeterministic Finite Automata...
Ex: The transition graph for an NFA recognizing the language of the regular expression (a|b)*abb is shown on the slide.
This language consists of all strings of a's and b's ending in the particular string abb.
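The graph is not reproduced in the transcript; the standard NFA for this expression (assumed here) has states 0, 1, 2, and 3, with 0 the start state and 3 the only accepting state, and the edges: 0 → 0 on a and on b, 0 → 1 on a, 1 → 2 on b, and 2 → 3 on b.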

Transition Tables
A transition table is an equivalent way to represent an NFA: for each state s and input symbol x (including ε), it records the set of successor states that x leads to from s.
The empty set ∅ is used when there is no edge labeled x emanating from s.
Transition table for (a|b)*abb:
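The table itself does not appear in the transcript; for the NFA described above it would be:

State | a      | b
------+--------+------
0     | {0, 1} | {0}
1     | ∅      | {2}
2     | ∅      | {3}
3     | ∅      | ∅

(This NFA has no ε-transitions, so no ε column is needed.)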

Thank You