Lexical Analysis Lecture 2 Mon, Jan 17, 2005.



Tokens A token has a type and a value. Types include ID, NUM, ASSIGN, LPAREN, etc. Values are used primarily with identifiers and numbers. If we read “count”, the type is ID and the value is “count”. If we read “123”, the type is NUM and the value is “123”.

Analyzing Tokens Each type of token can be described by a regular expression. (Why?) Therefore, the set of all tokens can be described by a regular expression. (Why?) Every language described by a regular expression is accepted by some DFA. Therefore, the tokens can be recognized and accepted by a DFA.

Regular Expressions The set of all regular expressions may be defined in two parts. The basic part: ε represents the language {ε}, and a represents the language {a} for every a ∈ Σ. Call these languages L(ε) and L(a), respectively.

Regular Expressions The recursive part: let r and s denote regular expressions. Then r | s represents the language L(r) ∪ L(s), rs represents the language L(r)L(s), and r* represents the language L(r)*. In other words, L(r | s) = L(r) ∪ L(s), L(rs) = L(r)L(s), and L(r*) = L(r)*.
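
The three operations above can be made concrete as operations on sets of strings. The sketch below (function names are ours, not from the lecture) computes union, concatenation, and a finite approximation of the Kleene star:

```python
# Illustrative sketch of the language operations behind the regex
# operators, using Python sets of strings.

def union(L1, L2):
    """L(r | s) = L(r) U L(s)."""
    return L1 | L2

def concat(L1, L2):
    """L(rs) = L(r)L(s): every string of L1 followed by every string of L2."""
    return {x + y for x in L1 for y in L2}

def star(L, max_reps=3):
    """Finite approximation of L* = {e} U L U LL U ... (up to max_reps)."""
    result = {""}          # the empty string is always in L*
    current = {""}
    for _ in range(max_reps):
        current = concat(current, L)
        result |= current
    return result

print(union({"a"}, {"b"}))      # {'a', 'b'}
print(concat({"a"}, {"b"}))     # {'ab'}
print(sorted(star({"a"})))      # ['', 'a', 'aa', 'aaa']
```

Only the star is approximated: L* is infinite, so the sketch truncates it at a fixed number of repetitions.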

Example: Identifiers Identifiers in C++ can be represented by a regular expression. r = A | B | … | Z | a | b | … | z s = 0 | 1 | … | 9 t = r(r | s)*
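
The expression t = r(r | s)* translates directly into Python's re syntax, with the character class [A-Za-z] playing the role of r and [0-9] the role of s (this follows the slide's simplified definition, which omits the underscore that real C++ identifiers allow):

```python
import re

# t = r(r | s)*: a letter followed by any number of letters or digits.
identifier = re.compile(r"[A-Za-z][A-Za-z0-9]*")

assert identifier.fullmatch("count")   # a valid identifier
assert identifier.fullmatch("x1")
assert not identifier.fullmatch("1x")  # may not start with a digit
```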

Regular Expressions A regular definition of a regular expression is a “grammar” of the form d1 → r1, d2 → r2, …, dn → rn, where each ri is a regular expression over Σ ∪ {d1, …, di – 1}.

Regular Expressions Note that this definition does not allow recursively defined tokens. In other words, di cannot be defined in terms of di, even indirectly.

Example: Identifiers We may now describe C++ identifiers as follows. letter → A | B | … | Z | a | b | … | z digit → 0 | 1 | … | 9 id → letter(letter | digit)*

Lexical Analysis After writing a regular expression for each kind of token, we may combine them into one big regular expression describing all tokens. id → letter(letter | digit)* num → digit(digit)* relop → < | > | == | != | >= | <= token → id | num | relop | … 
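
One way to realize this combined expression (a sketch, not the lecture's implementation) is a regex with one named group per token kind; the group that matches tells us the token type:

```python
import re

# The combined token expression. Listing the two-character relational
# operators before the one-character ones matters for longest match.
token_re = re.compile(r"""
      (?P<RELOP>==|!=|<=|>=|<|>)     # relop
    | (?P<ID>[A-Za-z][A-Za-z0-9]*)   # id  -> letter(letter | digit)*
    | (?P<NUM>[0-9]+)                # num -> digit(digit)*
""", re.VERBOSE)

def tokens(text):
    """Yield (type, value) pairs for each token found in text."""
    for m in token_re.finditer(text):
        yield m.lastgroup, m.group()

print(list(tokens("count <= 123")))
# [('ID', 'count'), ('RELOP', '<='), ('NUM', '123')]
```

Note how each token comes out as the (type, value) pair described on the Tokens slide.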

Transition Diagrams A regular expression may be represented by a transition diagram. The transition diagram provides a good guide to writing a lexical analyzer program.
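
To illustrate how a diagram guides the code, here is a table-driven sketch of the id diagram (state numbers and names are ours; −1 stands for the dead state):

```python
# Table-driven recognizer for id: letter(letter | digit)*.
# Each row of delta is a state; each column is a character class.

def classify(ch):
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

delta = {
    0: {"letter": 1, "digit": -1, "other": -1},  # start state
    1: {"letter": 1, "digit": 1, "other": -1},   # accepting state
}
accepting = {1}

def accepts_id(s):
    state = 0
    for ch in s:
        state = delta.get(state, {}).get(classify(ch), -1)
        if state == -1:
            return False       # fell into the dead state
    return state in accepting

print(accepts_id("count"))     # True
print(accepts_id("1x"))        # False
```

Each edge of the diagram becomes one entry in the table, so extending the scanner means extending the table, not the control flow.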

Example: Transition Diagram (Diagrams omitted: id follows a letter edge, then loops on letter | digit; num loops on digit; token combines the diagrams.)

Transition Diagrams Unfortunately, it is not that simple. At what point may we stop in an accepting state? Do not read “count” as 5 identifiers: “c”, “o”, “u”, “n”, “t”. When we stop in an accepting state, we must be able to determine the type of token processed. Did we read the ID token “count” or did we read the IF token “if”?

Transition Diagrams Consider transition diagrams to accept the relational operators ==, !=, <, >, <=, and >=. (Diagrams omitted: == takes an = edge twice; != takes ! then =; <= takes < then =; and so on.)

Transition Diagrams Combine them into a single transition diagram. (Diagram omitted: relop, with states 1–4 and edges labeled = | !, =, and < | >.)

Transition Diagrams When we reach an accepting state, how can we tell which operator was processed? In general, we design the diagram so that each kind of token has its own accepting state.

Transition Diagrams If we reach state 3, how do we decide whether to continue to state 4? We read characters until the current character does not match any pattern, i.e., it would lead to the dead state. At that point, we accept the string, minus the last character. Later, processing resumes with the last character.
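The "read until the dead state, then back up one character" rule can be sketched for the relop diagram as follows (a minimal illustration; the function name and set representation are ours):

```python
# Longest-match scanning of a relational operator: keep reading while a
# longer operator is still possible, remember the last accepting point,
# and resume processing at the first unread character.

RELOPS = {"<", ">", "==", "!=", "<=", ">="}

def scan_relop(text, pos):
    """Return (lexeme, next_pos) for the longest relop at text[pos:], or None."""
    best = None
    i = pos
    # Continue while text[pos:i+1] is still a prefix of some operator.
    while i < len(text) and any(op.startswith(text[pos:i + 1]) for op in RELOPS):
        i += 1
        if text[pos:i] in RELOPS:
            best = (text[pos:i], i)   # last accepting state seen
    return best

print(scan_relop("<=x", 0))   # ('<=', 2): reads '<', then '=', stops at 'x'
print(scan_relop("<x", 0))    # ('<', 1): '=' never arrives, back up to 'x'
```

In the second call the scanner reads the 'x', sees it leads to the dead state, and accepts the string minus the last character, exactly as described above.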

Transition Diagrams (Diagram omitted: the complete relop diagram, with initial edges labeled =, !, <, and >, and “other” edges leading to accepting states that back up one character.)