Presentation transcript:

1 Lexical Analysis

2 Structure of a Compiler [Diagram: Source Language → Front End (Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Int. Code Generator) → Intermediate Code → Back End (Code Optimizer → Target Code Generator) → Target Language]

3 Today! [The same compiler-structure diagram; today’s topic is the Lexical Analyzer stage of the front end.]

4 The Role of Lexical Analyzer The lexical analyzer is the first phase of a compiler. The main task of the lexical analyzer (scanner) is to read a stream of characters as input and produce a sequence of tokens that the parser (syntax analyzer) uses for syntax analysis.

5 The Role of Lexical Analyzer (cont’d) [Diagram: the parser requests “get next token”; the lexical analyzer reads characters from the input, pushes a character back when needed, and returns the next token; both the lexical analyzer and the parser consult the symbol table.]

6 The Role of Lexical Analyzer (cont’d) When a lexical analyzer is inserted between the parser and the input stream, it interacts with the two in the manner shown in the figure above. It reads characters from the input, groups them into lexemes, and passes the tokens formed by the lexemes, together with their attribute values, to the parser. In some situations, the lexical analyzer has to read some characters ahead before it can decide on the token to be returned to the parser.

7 For example, a lexical analyzer for Pascal must read ahead after it sees the character >. If the next character is =, then the character sequence >= is the lexeme forming the token for the “greater than or equal to” operator. Otherwise, > is the lexeme forming the “greater than” operator, and the lexical analyzer has read one character too many.

8 The Role of Lexical Analyzer (cont’d) The extra character has to be pushed back onto the input, because it can be the beginning of the next lexeme in the input. The lexical analyzer and parser form a producer-consumer pair: the lexical analyzer produces tokens and the parser consumes them. Produced tokens can be held in a token buffer until they are consumed.
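The one-character read-ahead and pushback described above can be sketched as follows. This is a minimal illustration, not the slides’ own code; the CharStream class and all names in it are hypothetical:

```python
class CharStream:
    """A character source that supports pushing one character back."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def read(self):
        ch = self.text[self.pos] if self.pos < len(self.text) else ""
        self.pos += 1
        return ch

    def push_back(self):
        # Un-read the last character so it can start the next lexeme.
        self.pos -= 1

def scan_greater(stream):
    """Called after '>' has been read; decide between '>' and '>='."""
    if stream.read() == "=":
        return "GE"         # lexeme ">=": greater than or equal to
    stream.push_back()      # read one character too many: push it back
    return "GT"             # lexeme ">": greater than
```

On the input ">=", the scanner consumes both characters and returns GE; on ">a", it returns GT and the "a" remains available as the start of the next lexeme.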

9 The Role of Lexical Analyzer (cont’d) The interaction between the two is constrained only by the size of the buffer: the lexical analyzer cannot proceed when the buffer is full, and the parser cannot proceed when the buffer is empty. Commonly, the buffer holds just one token. In this case, the interaction can be implemented simply by making the lexical analyzer a procedure called by the parser, returning tokens on demand.
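The on-demand arrangement can be sketched like this, with a canned token list standing in for real scanning; every name here is illustrative:

```python
def make_lexer(tokens):
    """Wrap a token source as a procedure the parser can call."""
    it = iter(tokens)                  # stand-in for actual scanning
    def get_next_token():
        return next(it, ("EOF", ""))   # return tokens on demand
    return get_next_token

def parse(get_next_token):
    """Toy 'parser' that simply consumes tokens until EOF."""
    consumed = []
    tok = get_next_token()
    while tok[0] != "EOF":
        consumed.append(tok)
        tok = get_next_token()
    return consumed
```

The parser pulls one token at a time, so no buffer larger than a single token is ever needed.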

10 The Role of Lexical Analyzer (cont’d) Reading and pushing back characters is usually implemented by setting up an input buffer. A block of characters is read into the buffer at a time; a pointer keeps track of the portion of the input that has been analyzed. Pushing back a character is implemented by moving the pointer back.

11 The Role of Lexical Analyzer (cont’d) Sometimes lexical analyzers are divided into two phases, the first called scanning and the second lexical analysis. The scanner is responsible for simple tasks, while the lexical analyzer proper does the more complex operations. For example, a Fortran compiler might use a scanner to eliminate blanks from the input.

12 Issues in Lexical Analysis There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing. 1) Simpler design is perhaps the most important consideration. The separation of lexical analysis from syntax analysis often allows us to simplify one or the other of these phases. For example, a parser embodying the conventions for comments and white space is significantly more complex---

13 Issues in Lexical Analysis (cont’d) ---than one that can assume comments and white space have already been removed by a lexical analyzer. If we are designing a new language, separating the lexical and syntactic conventions can lead to a cleaner overall language design.

14 Issues in Lexical Analysis (cont’d) 2) Compiler efficiency is improved. A separate lexical analyzer allows us to construct a specialized and potentially more efficient processor for the task. A large amount of time is spent reading the source program and partitioning it into tokens. Specialized buffering techniques for reading input characters and processing tokens can significantly speed up the performance of a compiler.

15 Issues in Lexical Analysis (cont’d) 3) Compiler portability is enhanced. Input-alphabet peculiarities and other device-specific anomalies can be restricted to the lexical analyzer. The representation of special or non-standard symbols, such as ↑ in Pascal, can be isolated in the lexical analyzer.

16 Issues in Lexical Analysis (cont’d) Specialized tools have been designed to help automate the construction of lexical analyzers and parsers when they are separated.

17 What exactly is lexing? Consider the code: if (i==j); z=1; else; z=0; endif; This is really nothing more than a string of characters: if_(i==j);\n\tz=1;\nelse;\n\tz=0;\nendif; During the lexical analysis phase we must divide this string into meaningful substrings.

18 Tokens The output of the lexical analysis phase is a stream of tokens. A token is a syntactic category. In English these would be types of words or punctuation, such as “noun”, “verb”, “adjective” or “end-mark”. In a program, these could be an “identifier”, a “floating-point number”, a “math symbol”, a “keyword”, etc.

19 Identifying Tokens A substring that represents an instance of a token is called a lexeme. The class of all possible lexemes in a token is described by the use of a pattern. For example, the pattern describing an identifier (a variable name) is: a string of letters, numbers, or underscores, beginning with a non-number. Patterns are typically described using regular expressions.

20 Implementation A lexical analyzer must be able to do three things:
1. Remove all whitespace and comments.
2. Identify tokens within a string.
3. Return the lexeme of a found token, as well as the line number it was found on.
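As a sketch, all three tasks can be combined in a small regular-expression-driven tokenizer. The token names, patterns, and the "#" comment syntax below are hypothetical stand-ins, not part of the toy language defined in the slides:

```python
import re

# Hypothetical token specification: name, regular-expression pattern.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",      r"==|=|\+|-"),          # '==' must precede '='
    ("ENDLINE", r";"),
    ("SKIP",    r"[ \t]+|#[^\n]*"),     # whitespace and '#' comments
    ("NEWLINE", r"\n"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    line = 1
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "NEWLINE":
            line += 1                    # track line numbers (task 3)
        elif kind != "SKIP":             # drop whitespace/comments (task 1)
            yield (kind, lexeme, line)   # token, lexeme, line (tasks 2-3)
```

Characters matched by no pattern are silently skipped here; a real scanner would report them as errors.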

21 Example if_(i==j);\n\tz=1;\nelse;\n\tz=0;\nendif;

Line  Token          Lexeme
1     BLOCK_COMMAND  if
1     OPEN_PAREN     (
1     ID             i
1     OP_RELATION    ==
1     ID             j
1     CLOSE_PAREN    )
1     ENDLINE        ;
2     ID             z
2     ASSIGN         =
2     NUMBER         1
2     ENDLINE        ;
3     BLOCK_COMMAND  else
Etc.

22 Lookahead Lookahead will typically be important to a lexical analyzer. Tokens are typically read in from left to right, recognized one at a time from the input string. It is not always possible to decide instantly whether a token is finished without looking ahead at the next character. For example: Is “i” a variable, or the first character of “if”? Is “=” an assignment or the beginning of “==”?
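Both questions are usually resolved by the longest-match (“maximal munch”) rule: keep reading while the lexeme can still grow, then classify what was read. A sketch, with an illustrative keyword set taken from the toy example earlier:

```python
# Hypothetical keyword set for the toy language used in the slides.
KEYWORDS = {"if", "else", "endif"}

def scan_word(text, pos):
    """Scan a letter run starting at pos; classify keyword vs identifier.

    Returns (token, lexeme, position after the lexeme).
    """
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1                      # keep reading while it can extend
    lexeme = text[pos:end]
    token = "KEYWORD" if lexeme in KEYWORDS else "ID"
    return token, lexeme, end
```

So "i" in "if (..." becomes part of the keyword "if", while "i" in "i==j" stops at "=" and is classified as an identifier.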

23 Tokens The output of the lexical analysis phase is a stream of tokens. A token is a syntactic category: “a name for a set of input strings with related structure”. Examples: “ID”, “NUM”, “RELATION”, “IF”. In English these would be types of words or punctuation, such as “noun”, “verb”, “adjective” or “end-mark”. In a program, these could be an “identifier”, a “floating-point number”, a “math symbol”, a “keyword”, etc.

24 Tokens (cont’d) As an example, consider the following line of code, which could be part of a C program: a[index] = 4 + 2

25 Tokens (cont’d)
a      ID
[      Left bracket
index  ID
]      Right bracket
=      Assign
4      Num
+      Plus sign
2      Num

26 Tokens (cont’d) Each token consists of one or more characters that are collected into a unit before further processing takes place. A scanner may perform other operations along with the recognition of tokens. For example, it may enter identifiers into the symbol table.
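Entering identifiers into the symbol table can be sketched as an interning operation: look the name up, create an entry only if it is new, and return the entry’s index. The entry layout and function name here are hypothetical:

```python
symbol_table = []   # list of entries; an index serves as the "pointer"
table_index = {}    # name -> index, for fast lookup

def install_id(lexeme):
    """Enter an identifier into the symbol table; return its entry index."""
    if lexeme not in table_index:
        table_index[lexeme] = len(symbol_table)
        symbol_table.append({"name": lexeme})   # minimal entry
    return table_index[lexeme]
```

Repeated occurrences of the same identifier map to the same entry, which is what lets the entry index serve as the token’s attribute later on.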

27 Attributes For Tokens The lexical analyzer collects information about tokens into their associated attributes. As a practical matter, a token usually has only a single attribute: a pointer to the symbol-table entry in which the information about the token is kept. The pointer becomes the attribute of the token.

28 Attributes For Tokens (cont’d) Let num be the token representing an integer. When a sequence of digits appears in the input stream, the lexical analyzer will pass num to the parser. The value of the integer will be passed along as an attribute of the token num. Logically, the lexical analyzer passes both the token and the attribute to the parser.

29 Attributes For Tokens (cont’d) If we write a token and its attribute as a tuple enclosed between angle brackets, the input – 60 is transformed into a sequence of such tuples.

30 Attributes For Tokens (cont’d) The token “+” has no attribute. The second components of the tuples, the attributes, play no role during parsing, but are needed during translation.

31 Attributes For Tokens (cont’d) Consider the tokens and associated attribute values in the C statement: E = M + C * 2 What are the tokens and their attribute values?

32 Attributes For Tokens (cont’d)
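One possible answer, produced by a toy sketch: identifiers get symbol-table indices as attributes, numbers carry their value, and operator tokens carry no attribute. The token names and the index assignment are illustrative, not the slides’ official answer:

```python
def to_pairs(stmt):
    """Render a whitespace-separated statement as <token, attribute> pairs."""
    table = {}      # identifier -> symbol-table index (hypothetical)
    pairs = []
    for lx in stmt.split():
        if lx.isalpha():
            idx = table.setdefault(lx, len(table))
            pairs.append(("id", idx))          # attribute: table index
        elif lx.isdigit():
            pairs.append(("num", int(lx)))     # attribute: the value
        else:
            op = {"=": "assign_op", "+": "add_op", "*": "mult_op"}[lx]
            pairs.append((op, None))           # operators: no attribute

    return pairs

print(to_pairs("E = M + C * 2"))
# [('id', 0), ('assign_op', None), ('id', 1), ('add_op', None),
#  ('id', 2), ('mult_op', None), ('num', 2)]
```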

33 Pattern “The set of strings described by a rule, called a pattern, associated with the token.” The pattern is said to match each string in the set. For example, the pattern describing an identifier (a variable name) is: a string of letters, numbers, or underscores, beginning with a non-number. The pattern for the identifier token, ID, is ID → letter (letter | digit)* Patterns are typically described using regular expressions (REs).
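The pattern ID → letter (letter | digit)* can be written directly as a regular expression. The sketch below also accepts underscores (treating them as letters), matching the informal description above; the names are illustrative:

```python
import re

# letter (letter | digit)*, with underscore counted as a letter.
ID_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(s):
    """True iff the whole string s is a single identifier lexeme."""
    return bool(ID_RE.fullmatch(s))
```

fullmatch is used rather than match so that a string like "2x" or "x+y" is rejected as a whole, even though part of it matches.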

34 Lexeme “A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.” For example, the pattern for the relation_op token contains six lexemes (<, <=, =, <>, >, >=), so the lexical analyzer should return a relation_op token to the parser whenever it sees any one of the six.

35 Example Input: count = 123 Tokens:
identifier : Rule: “letter followed by …” Lexeme: count
assg_op : Rule: = Lexeme: =
integer_const : Rule: “digit followed by …” Lexeme: 123

36 Tokens, Patterns, Lexemes

Token     Sample Lexemes        Informal Description of Pattern
const     const                 const
if        if                    if
relation  <, <=, =, <>, >, >=   < or <= or = or <> or > or >=
id        pi, count, D2         letter followed by letters and digits
num       0, 6.02E23            any numeric constant

37 Tokens, Patterns, Lexemes (cont’d) Example: Const pi = ; In the example above, when the character sequence “pi” appears in the source program, a token representing an identifier (ID) is returned to the parser. The returning of a token is often implemented by passing an integer code corresponding to the token.