Lexical Analysis Lecture 2 Mon, Jan 19, 2004

Tokens A token has a type and a value. Types include ID, NUM, ASSGN, LPAREN, etc. Values are used primarily with identifiers and numbers. If we read “count”, the type is ID and the value is “count”. If we read “123.45”, the type is NUM and the value is “123.45”.
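The type/value pairing described above can be sketched as a small data structure. This is a minimal illustration, not code from the lecture; the class and field names are assumptions.

```python
# Sketch of a token as a (type, value) pair, following the slide.
# Type names (ID, NUM, ASSGN, LPAREN, ...) come from the lecture.
from dataclasses import dataclass

@dataclass
class Token:
    type: str   # e.g. "ID", "NUM", "ASSGN", "LPAREN"
    value: str  # the lexeme; meaningful mainly for ID and NUM

t1 = Token("ID", "count")    # reading "count"
t2 = Token("NUM", "123.45")  # reading "123.45"
```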

Analyzing Tokens Each type of token can be described by a regular expression. (Why?) Therefore, the set of all tokens can be described by a regular expression. (Why?) Every language described by a regular expression is accepted by some DFA. Therefore, the tokens can be recognized and accepted by a DFA.
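The claim that a DFA can accept a token's language can be made concrete with a tiny simulator. The sketch below hand-codes a DFA for the NUM pattern digit digit*; the state names are illustrative, not from the slides.

```python
# A DFA for the regular expression digit digit* (integer NUM tokens),
# written as explicit states and transitions.
DIGITS = set("0123456789")

def dfa_accepts(s: str) -> bool:
    state = "start"
    for ch in s:
        if state == "start" and ch in DIGITS:
            state = "num"        # first digit seen
        elif state == "num" and ch in DIGITS:
            state = "num"        # self-loop on further digits
        else:
            return False         # no transition defined: reject
    return state == "num"        # "num" is the accepting state
```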

Regular Expressions The set of all regular expressions may be defined in two parts. The basic part: ε represents the language {ε}; a represents the language {a} for every a ∈ Σ. Call these languages L(ε) and L(a), respectively.

Regular Expressions The recursive part. Let r and s denote regular expressions. r | s represents the language L(r) ∪ L(s). rs represents the language L(r)L(s). r* represents the language L(r)*. In other words, L(r | s) = L(r) ∪ L(s), L(rs) = L(r)L(s), and L(r*) = L(r)*.

Example: Identifiers Identifiers in C++ can be represented by a regular expression. r = A | B | … | Z | a | b | … | z s = 0 | 1 | … | 9 t = r(r | s)*

Regular Expressions A regular definition of a regular expression is a “grammar” of the form d1 → r1, d2 → r2, …, dn → rn, where each ri is a regular expression over Σ ∪ {d1, …, di−1}.

Regular Expressions Note that this definition does not allow recursively defined tokens. In other words, di cannot be defined in terms of di, even indirectly.

Example: Identifiers We may now describe C++ identifiers as follows. letter → A | B | … | Z | a | b | … | z digit → 0 | 1 | … | 9 id → letter(letter | digit)*
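The regular definition above can be transcribed directly into a library regex. This is an illustrative sketch (note that the slide's definition omits the underscore that real C++ identifiers allow); the function name is an assumption.

```python
# The regular definition letter(letter | digit)* as a Python regex.
import re

LETTER = "[A-Za-z]"
DIGIT = "[0-9]"
ID = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")

def is_identifier(s: str) -> bool:
    # fullmatch: the whole string must match the id pattern
    return ID.fullmatch(s) is not None
```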

Lexical Analysis After writing a regular expression for each kind of token, we may combine them into one big regular expression describing all tokens. id → letter(letter | digit)* num → digit(digit)* relop → < | > | == | != | <= | >= token → id | num | relop | …
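The combination step can be sketched with one alternation of named groups, so a single match tells us both the lexeme and which token class it belongs to. The group names and helper are assumptions for illustration; within relop, the two-character operators are listed before the one-character ones so that `<=` is not cut short at `<`.

```python
# One combined pattern: token → num | relop | id.
import re

TOKEN = re.compile(r"""
      (?P<NUM>[0-9]+)
    | (?P<RELOP>==|!=|<=|>=|<|>)
    | (?P<ID>[A-Za-z][A-Za-z0-9]*)
""", re.VERBOSE)

def first_token(s):
    """Match one token at the start of s; return (type, lexeme) or None."""
    m = TOKEN.match(s)
    return (m.lastgroup, m.group()) if m else None
```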

Transition Diagrams A regular expression may be represented by a transition diagram. The transition diagram provides a good guide to writing a lexical analyzer program.

Example: Transition Diagram id: start →(letter)→ accepting state, with a self-loop on letter | digit. num: start →(digit)→ accepting state, with a self-loop on digit. token: a single start state branching on letter into the id diagram or on digit into the num diagram.

Transition Diagrams Unfortunately, it is not that simple. At what point may we stop in an accepting state? Do not read “count” as 5 identifiers: “c”, “o”, “u”, “n”, “t”. When we stop in an accepting state, we must be able to determine the type of token processed. Did we read the ID token “count” or did we read the IF token “if”?

Transition Diagrams Consider transition diagrams to accept the relational operators ==, !=, <, >, <=, >=. == : read =, then =. != : read !, then =. <= : read <, then =. And so on.
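The per-operator diagrams above amount to reading one character and then checking for a following =. A small sketch (the function name and return convention are assumptions):

```python
# Recognize a relational operator starting at position i, if any.
# Two-character forms (==, !=, <=, >=) take priority over bare < and >.
def scan_relop(s: str, i: int):
    """Return (lexeme, next_index), or (None, i) if no relop starts here."""
    if i < len(s) and s[i] in "=!<>":
        if i + 1 < len(s) and s[i + 1] == "=":
            return s[i:i + 2], i + 2   # ==, !=, <=, >=
        if s[i] in "<>":
            return s[i], i + 1         # bare < or >
    return None, i                     # bare = or ! is not a relop
```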

Transition Diagrams Combine them into a single transition diagram. relop: from the start state, branch on the first character (=, !, <, or >); from each branch, a transition on = leads to the state for the corresponding two-character operator.

Transition Diagrams When we reach an accepting state, how can we tell which operator was processed? In general, we design the diagram so that each kind of token has its own accepting state.

Transition Diagrams If we reach one accepting state, how do we decide whether to continue on to another (e.g., having seen <, should we go on to read <=)? We read characters until the current character does not match any pattern. Then we “back up” to the previous accepting state (if there is one) and accept the token.
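This "read until stuck, then back up" rule (maximal munch) can be sketched over the combined id/num diagram from earlier. The code below is illustrative: it handles only those two token types (so NUM here covers only integer lexemes), and the state names are assumptions.

```python
# Maximal munch: run the DFA as far as possible, remembering the last
# accepting position, then back up to it and emit that token.
def longest_token(s: str, i: int):
    """Scan s from position i; return (type, lexeme) or None."""
    state, last_accept = "start", None
    j = i
    while j < len(s):
        ch = s[j]
        if state == "start" and ch.isalpha():
            state = "id"
        elif state == "id" and (ch.isalpha() or ch.isdigit()):
            pass                      # self-loop in the id diagram
        elif state == "start" and ch.isdigit():
            state = "num"
        elif state == "num" and ch.isdigit():
            pass                      # self-loop in the num diagram
        else:
            break                     # stuck: no transition on ch
        j += 1
        last_accept = (state.upper(), j)  # id and num are accepting states
    if last_accept is None:
        return None                   # never reached an accepting state
    kind, end = last_accept
    return kind, s[i:end]             # "back up" to the last accepting state
```

Note how `longest_token("count+1", 0)` stops at the `+` but still accepts the whole identifier, rather than five one-letter identifiers.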

Transition Diagrams relop: a start state with transitions on =, !, <, and >; from each of these, a transition on = reaches the accepting state for the two-character operator, and a transition on any other character reaches an accepting state for the one-character operator (after which we back up one character).