Chapter 2 Chang Chi-Chung 2007.3.15. Lexical Analyzer. The tasks of the lexical analyzer: remove white space and comments; encode constants as tokens.


Chapter 2 Chang Chi-Chung

Lexical Analyzer
The tasks of the lexical analyzer:
- Remove white space and comments
- Encode constants as tokens
- Recognize keywords and identifiers
- Store identifier names in a symbol table

Lexical Analyzer
[Figure: the source text if (peek == '\n') line = line + 1 is read by the lexical analyzer, Lexer(), which passes each token together with its attribute to the parser or syntax-directed translator, Parser().]

Remove white space and comments
    for ( ; ; peek = next character ) {
        if ( peek is a blank or a tab ) do nothing;
        else if ( peek is a newline ) line = line + 1;
        else break;
    }
White space and comments can be handled in two ways:
- Eliminate them in the lexical analyzer.
- Modify the grammar to incorporate them into the syntax (not easy).
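The loop above only covers white space. As a rough sketch of how comments could be eliminated in the same place, the following assumes "//" line comments and an in-memory input string; the class name WhitespaceSkipper and its fields are hypothetical and not part of the slides.

```java
// Hypothetical helper, not from the slides: skips white space and "//" comments.
class WhitespaceSkipper {
    int line = 1;               // current line number, as in the slide's loop
    private final String src;   // assumption: input is held in a string
    private int pos = 0;        // index of the next unread character

    WhitespaceSkipper(String src) { this.src = src; }

    /** Returns the next significant character, or -1 at end of input. */
    int skip() {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == ' ' || c == '\t') {
                pos++;                                   // do nothing with blanks and tabs
            } else if (c == '\n') {
                line = line + 1;                         // count the newline
                pos++;
            } else if (c == '/' && pos + 1 < src.length()
                       && src.charAt(pos + 1) == '/') {
                // A comment runs to the end of the line; the newline itself
                // is left for the branch above so that it is still counted.
                while (pos < src.length() && src.charAt(pos) != '\n') pos++;
            } else {
                break;                                   // first significant character
            }
        }
        return pos < src.length() ? src.charAt(pos) : -1;
    }

    public static void main(String[] args) {
        WhitespaceSkipper s = new WhitespaceSkipper("  // a comment\n\t x = 1");
        System.out.println((char) s.skip() + " on line " + s.line);
    }
}
```

Deciding that "/" starts a comment already needs a look at the following character, which is one reason a lexer keeps some read-ahead.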

Reading Ahead
A lexical analyzer may need to read ahead some characters before it can decide on the token to be returned to the parser.
Examples:
- To distinguish between 1 and 10.
- To distinguish between t and true.
- To distinguish between > and >=.
Approaches:
- Maintain an input buffer.
- One-character read-ahead.
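One-character read-ahead can be illustrated with the > versus >= case. The sketch below is only an illustration under assumptions: the class RelopScanner, its string input, and the method scanRelop are hypothetical names, and a real lexer would return token objects rather than strings.

```java
// Hypothetical scanner, not from the slides: tells ">" from ">=" with one
// character of read-ahead kept in the variable peek.
class RelopScanner {
    private final String src;   // assumption: input is held in a string
    private int pos = 0;        // index of the next unread character
    private int peek = ' ';     // the read-ahead character; -1 at end of input

    RelopScanner(String src) { this.src = src; }

    private int next() { return pos < src.length() ? src.charAt(pos++) : -1; }

    /** Returns ">" or ">=", or null if the next non-blank character is not '>'. */
    String scanRelop() {
        while (peek == ' ') peek = next();   // skip blanks
        if (peek != '>') return null;
        peek = next();                       // read ahead one character
        if (peek == '=') {
            peek = ' ';                      // '=' belongs to this token; start fresh next time
            return ">=";
        }
        return ">";                          // peek keeps the character that follows '>'
    }

    public static void main(String[] args) {
        RelopScanner r = new RelopScanner(">= >>");
        for (String t = r.scanRelop(); t != null; t = r.scanRelop()) {
            System.out.println(t);
        }
    }
}
```

The key point is that when the read-ahead character is not part of the current token, it stays in peek and becomes the first character examined by the next call.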

Encode constants as tokens
For a sequence of digits, the lexical analyzer must pass a token to the parser.
- The token consists of the terminal num along with an integer-valued attribute computed from the digits.
Example:
    if ( peek holds a digit ) {
        v = 0;
        do {
            v = v * 10 + integer value of digit peek;
            peek = next input character;
        } while ( peek holds a digit );
        return token <num, v>;
    }

Recognize Keywords and Identifiers
Keyword
- A fixed character string used as punctuation or to identify constructs.
- Examples: for, while, if
Identifier
- Used to name variables, arrays, functions, and the like.
- The parser treats identifiers as terminals.
- Example: in count = count + increment; the parser sees id = id + id ;

Recognize Keywords and Identifiers
The lexical analyzer uses a table to hold character strings.
- A string table can be implemented by a hash table.
- Single representation: each distinct string is stored once.
- Reserved words: they are entered into the table before scanning starts.
    if ( peek holds a letter ) {
        collect letters or digits into a buffer b;
        s = string formed from the characters in b;
        w = token returned by words.get(s);
        if ( w is not null ) return w;
        else {
            enter the key-value pair (s, <id, s>) into words;
            return token <id, s>;
        }
    }
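The lookup step can be sketched as follows. This is a minimal illustration, assuming a java.util.HashMap in place of a hash table of your choice; the class StringTableDemo and the method lookup are hypothetical, while the tag values mirror the Tag class given later in these slides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not from the slides.
class StringTableDemo {
    // Tag values copied from the Tag class in the slides.
    static final int ID = 257, TRUE = 258, FALSE = 259;

    static final class Word {
        final int tag;
        final String lexeme;
        Word(int tag, String lexeme) { this.tag = tag; this.lexeme = lexeme; }
    }

    private final Map<String, Word> words = new HashMap<>();

    StringTableDemo() {
        // Reserved words are entered first, so a later lookup finds them
        // and never creates an identifier with the same spelling.
        reserve(new Word(TRUE, "true"));
        reserve(new Word(FALSE, "false"));
    }

    private void reserve(Word w) { words.put(w.lexeme, w); }

    /** Returns the single Word for s, creating an ID entry the first time s is seen. */
    Word lookup(String s) {
        Word w = words.get(s);
        if (w == null) {
            w = new Word(ID, s);
            words.put(s, w);
        }
        return w;
    }

    public static void main(String[] args) {
        StringTableDemo t = new StringTableDemo();
        System.out.println(t.lookup("true").tag);                   // reserved word
        System.out.println(t.lookup("count").tag);                  // identifier
        System.out.println(t.lookup("count") == t.lookup("count")); // same object both times
    }
}
```

Because every occurrence of a lexeme maps to the same object, later phases can compare identifiers by reference instead of comparing strings.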

Create a Lexical Analyzer
    Token scan() {
        skip white space;                         // (A)
        handle numbers;                           // (B)
        handle reserved words and identifiers;    // (C)
        Token t = new Token( peek );              // (D)
        peek = blank;
        return t;
    }

Complete Lexical Analyzer (1)
    // Each public class goes in its own file in package lexer.
    package lexer;

    public class Token {
        public final int tag;
        public Token(int t) { tag = t; }
    }

    public class Tag {
        public final static int
            NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
    }

    public class Num extends Token {
        public final int value;
        public Num(int v) { super(Tag.NUM); value = v; }
    }

    public class Word extends Token {
        public final String lexeme;
        public Word(int t, String s) { super(t); lexeme = new String(s); }
    }
[Class diagram: Token (+int tag) with subclasses Num (+int value) and Word (+String lexeme).]

Complete Lexical Analyzer (2)
    package lexer;
    import java.io.*;
    import java.util.*;

    public class Lexer {
        public int line = 1;
        private char peek = ' ';
        private Hashtable words = new Hashtable();

        void reserve(Word t) { words.put(t.lexeme, t); }

        public Lexer() {
            reserve( new Word(Tag.TRUE, "true") );
            reserve( new Word(Tag.FALSE, "false") );
        }
        // scan() follows on the next slides

Complete Lexical Analyzer (3)
    public Token scan() throws IOException {
        // (A) skip white space
        for ( ; ; peek = (char) System.in.read() ) {
            if ( peek == ' ' || peek == '\t' ) continue;
            else if ( peek == '\n' ) line = line + 1;
            else break;
        }
        // (B) handle numbers
        if ( Character.isDigit(peek) ) {
            int v = 0;
            do {
                v = v * 10 + Character.digit(peek, 10);
                peek = (char) System.in.read();
            } while ( Character.isDigit(peek) );
            return new Num(v);
        }
        // parts (C) and (D) follow
    }

Complete Lexical Analyzer (4)
    public Token scan() throws IOException {
        // parts (A) and (B) come first
        // (C) handle reserved words and identifiers
        if ( Character.isLetter(peek) ) {
            StringBuffer b = new StringBuffer();
            do {
                b.append(peek);
                peek = (char) System.in.read();
            } while ( Character.isLetterOrDigit(peek) );
            String s = b.toString();
            Word w = (Word) words.get(s);
            if ( w != null ) return w;
            w = new Word(Tag.ID, s);
            words.put(s, w);
            return w;
        }
        // (D) any other character is a token by itself
        Token t = new Token(peek);
        peek = ' ';
        return t;
    }
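To try the four parts of scan() together, here is a compact adaptation. It is a sketch, not the slides' Lexer: it assumes an in-memory input string instead of System.in, stops at end of input, and renders tokens as text such as <num,10> rather than as Token objects. The class MiniLexer and the method tokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical adaptation of the slides' Lexer; tokens are rendered as text.
class MiniLexer {
    int line = 1;
    private final String src;   // assumption: input string instead of System.in
    private int pos = 0;
    private int peek = ' ';     // read-ahead character; -1 at end of input
    private final Map<String, String> words = new HashMap<>();

    MiniLexer(String src) {
        this.src = src;
        words.put("true", "<true>");     // reserved words
        words.put("false", "<false>");
    }

    private int read() { return pos < src.length() ? src.charAt(pos++) : -1; }

    /** Returns the next token as text, or null at end of input. */
    String scan() {
        // (A) skip white space
        for ( ; ; peek = read()) {
            if (peek == ' ' || peek == '\t') continue;
            else if (peek == '\n') line = line + 1;
            else break;
        }
        if (peek == -1) return null;
        // (B) handle numbers
        if (Character.isDigit(peek)) {
            int v = 0;
            do {
                v = v * 10 + Character.digit(peek, 10);
                peek = read();
            } while (peek != -1 && Character.isDigit(peek));
            return "<num," + v + ">";
        }
        // (C) handle reserved words and identifiers
        if (Character.isLetter(peek)) {
            StringBuilder b = new StringBuilder();
            do {
                b.append((char) peek);
                peek = read();
            } while (peek != -1 && Character.isLetterOrDigit(peek));
            return words.computeIfAbsent(b.toString(), k -> "<id," + k + ">");
        }
        // (D) any other character is a token by itself
        String t = "<" + (char) peek + ">";
        peek = ' ';
        return t;
    }

    /** Convenience wrapper: scans the whole input into a list of tokens. */
    static List<String> tokens(String src) {
        MiniLexer lx = new MiniLexer(src);
        List<String> out = new ArrayList<>();
        for (String t = lx.scan(); t != null; t = lx.scan()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("count = count + 10"));
    }
}
```

The explicit end-of-input check matters: casting the -1 returned by a read call to char yields '\uffff', so a version that reads a real stream should test for -1 before the cast.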

Symbol Tables
Symbol tables are data structures
- Used by compilers to hold information about source-program constructs.
Scope of identifier x
- Really refers to the scope of a particular declaration of x.
Scope
- A portion of a program that is the scope of one or more declarations.

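Symbol Tables
    { int x1; int y1;
        { int w2; bool y2; int z2;
            w2; x1; y2; z2;
        }
        w0; x1; y1;
    }
The number after each name identifies the declaration that the name refers to.
[Figure: chained symbol tables. B0 holds w. B1 holds x int and y int. B3 holds w int, y bool, and z int.]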

Symbol Tables
    package symbols;
    import java.util.*;

    public class Env {
        private Hashtable table;
        protected Env prev;

        public Env(Env p) {
            table = new Hashtable();
            prev = p;
        }

        public void put(String s, Symbol sym) {
            table.put(s, sym);
        }

        // Search this table first, then the tables of the enclosing scopes.
        public Symbol get(String s) {
            for (Env e = this; e != null; e = e.prev) {
                Symbol found = (Symbol)(e.table.get(s));
                if (found != null) return found;
            }
            return null;
        }
    }
[Figure: the chained symbol tables B0, B1, and B3.]
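The effect of chaining can be checked with a small experiment. The sketch below re-declares a simplified Env inside a hypothetical ScopeDemo class so that it is self-contained; the Symbol class with a single type field is an assumption, and HashMap stands in for Hashtable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not from the slides.
class ScopeDemo {
    // Assumption: a symbol records only the declared type.
    static final class Symbol {
        final String type;
        Symbol(String type) { this.type = type; }
    }

    // Simplified copy of the slides' Env, using generics.
    static final class Env {
        private final Map<String, Symbol> table = new HashMap<>();
        private final Env prev;   // table of the enclosing scope, or null

        Env(Env p) { prev = p; }

        void put(String s, Symbol sym) { table.put(s, sym); }

        Symbol get(String s) {
            for (Env e = this; e != null; e = e.prev) {
                Symbol found = e.table.get(s);
                if (found != null) return found;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Env outer = new Env(null);           // { int x; int y;
        outer.put("x", new Symbol("int"));
        outer.put("y", new Symbol("int"));

        Env inner = new Env(outer);          //   { int w; bool y; int z;
        inner.put("w", new Symbol("int"));
        inner.put("y", new Symbol("bool"));
        inner.put("z", new Symbol("int"));

        System.out.println(inner.get("y").type);  // the inner declaration hides the outer one
        System.out.println(inner.get("x").type);  // found by following prev
        System.out.println(outer.get("z"));       // not visible in the outer scope
    }
}
```

Leaving a block needs no cleanup of the table: the translator simply goes back to the outer Env, and the inner one becomes unreachable.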

The Use of Symbol Tables
    program → block              { top = null; }
    block   → '{'                { saved = top; top = new Env(top); print("{ "); }
              decls stmts '}'    { top = saved; print("} "); }
    decls   → decls decl | ε
    decl    → type id ;          { s = new Symbol; s.type = type.lexeme; top.put(id.lexeme, s); }
    stmts   → stmts stmt | ε
    stmt    → block
            | factor ;           { print("; "); }
    factor  → id                 { s = top.get(id.lexeme); print(id.lexeme); print(":"); print(s.type); }

Intermediate Code Generation
The two most important intermediate representations:
- Trees
  Parse trees, syntax trees (abstract syntax trees)
  Example: while ( expr ) stmt becomes a node with op: while, E1: expr, E2: stmt
  [Figure: a node op with children E1 and E2.]
- Linear representations
  Three-address code: x = y op z
  x, y, z: names, constants, or compiler-generated temporaries.
  Examples:
      ifFalse x goto L
      ifTrue x goto L
      goto L
      x = y
      x [ y ] = z
      x = y [ z ]

Three-Address Code
if expr then stmt1 is translated into:
        code to compute expr into x
        ifFalse x goto after
        code for stmt1
    after:
x = i - j + k is translated into (t1 and t2 are temporaries):
    t1 = i - j
    t2 = t1 + k
    x = t2
x = 2 * a [ i ] is translated into:
    t1 = a [ i ]
    t2 = 2 * t1
    x = t2
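The way temporaries arise can be sketched with a tiny generator that walks an expression tree. This is an illustration under assumptions: the classes TacDemo, Name, and Bin and the method translateAssign are hypothetical, and only names and binary operators are handled (no array accesses or constants folding).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator, not from the slides: emits three-address code
// for an assignment whose right side is built from names and binary operators.
class TacDemo {
    interface Expr {}

    static final class Name implements Expr {
        final String id;
        Name(String id) { this.id = id; }
    }

    static final class Bin implements Expr {
        final String op;
        final Expr left, right;
        Bin(String op, Expr left, Expr right) {
            this.op = op; this.left = left; this.right = right;
        }
    }

    private final List<String> code = new ArrayList<>();
    private int temps = 0;   // number of temporaries created so far

    /** Emits code for e and returns the name that holds its value. */
    private String gen(Expr e) {
        if (e instanceof Name) return ((Name) e).id;   // a name needs no code
        Bin b = (Bin) e;
        String l = gen(b.left);                        // operands first, left to right
        String r = gen(b.right);
        String t = "t" + (++temps);                    // fresh temporary for the result
        code.add(t + " = " + l + " " + b.op + " " + r);
        return t;
    }

    /** Returns the instructions for: target = e */
    static List<String> translateAssign(String target, Expr e) {
        TacDemo g = new TacDemo();
        String v = g.gen(e);
        g.code.add(target + " = " + v);
        return g.code;
    }

    public static void main(String[] args) {
        // x = i - j + k, parsed as (i - j) + k
        Expr e = new Bin("+", new Bin("-", new Name("i"), new Name("j")), new Name("k"));
        for (String instr : translateAssign("x", e)) System.out.println(instr);
    }
}
```

Each interior node of the tree produces exactly one instruction, so every instruction has at most one operator on its right side.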

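Intermediate Code Generation
For the statement if (peek == '\n') line = line + 1, the parser or syntax-directed translator, Parser(), produces either a syntax tree or three-address code.
[Figure: syntax tree with root if; its children are eq, over peek and (int) '\n', and assign, over line and + with operands line and 1.]
Three-address code:
    1: t1 = (int) '\n'
    2: ifFalse peek == t1 goto 4
    3: line = line + 1
    4: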

Static Checking
Static checks are consistency checks that are done during compilation.
- Syntactic checking
  An identifier is declared at most once in a scope.
  A break statement must have an enclosing loop or switch statement.
- Type checking
L-values and r-values
- r-values are values.
- l-values are locations.
- Examples:
      i = 5
      i = i + 1

Syntax Trees
    Concrete Syntax    Abstract Syntax
    =                  assign
    ||                 cond
    &&                 cond
    == !=              rel
    < <= >= >          rel
    + -                op
    * / %              op
    !                  not
    - (unary)          minus
    [ ]                access

Syntax Trees
[Figure: a syntax tree built from seq, if, while, and null nodes; each expression position is labeled "some tree for an expression".]