Compiler Chapter 4. Lexical Analysis Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.

1. Introduction (1)
Lexical analysis
- Reads the source program and identifies the smallest grammatical units (tokens)
- Lexical analyzer = Scanner = Lexer
[Figure: Source program -> Lexical Analyzer -> Token stream]
(2011-1) Compiler

1. Introduction (2)
Token
- Recognized by a finite automaton (FA)
- Special form: reserved word
- General form: identifier, constant, ...

Special form: defined by the language designer
1. Keywords: begin, end, for, if, ...
2. Operator symbols: +, -, *, /, <, :=, etc.
3. Delimiters: ; , ( ) [ ] etc.
General form: defined by the programmer
1. Identifiers: stk, ptr, sum, ...
2. Constants: 3.0, e-10, 'string', etc.

1. Introduction (3)
Terminology
- Token: the smallest grammatical unit; a terminal symbol of the grammar
- Token number: an integer assigned to each token, so that tokens can be handled more efficiently than strings
- Token value: the string value (for an identifier) or the numerical value (for a constant)

1. Introduction (4)
Example: IF A > 10 THEN ...
[Table: token number and token value for each token of the statement; surviving token values: 0, 'A']
Token structure is represented by a regular expression, e.g. id = l ( l + d )*

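Example: a = b + 3 ;
[Figure: a = b + 3; -> Lexical Analyzer -> (token number, token value) pairs -> Parser]

Token   Token number   Token value
a       4              a
=       23             0
b       4              b
+       11             0
3       5              3
;       20             0

The slide also shows (4, 10) and (4, 20), apparently a and b with the token value given as a symbol table index.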
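The pairing of token numbers and token values can be sketched in a few lines of Python. This is only an illustration: the token numbers (4, 5, 11, 20, 23) are the ones that appear on the slide, while the `TOKEN_NUMBER` table and the `tokenize` function are hypothetical names, not the course's scanner.

```python
# Token numbers as they appear in the slide's example; the table layout
# and the function below are assumptions made for illustration.
TOKEN_NUMBER = {"ident": 4, "const": 5, "+": 11, ";": 20, "=": 23}


def tokenize(source):
    """Return a list of (token number, token value) pairs."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1  # blanks separate tokens but produce none
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            # General form: the token value is the identifier's name.
            tokens.append((TOKEN_NUMBER["ident"], source[start:i]))
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            # General form: the token value is the numerical value.
            tokens.append((TOKEN_NUMBER["const"], int(source[start:i])))
        elif ch in TOKEN_NUMBER:
            # Special form: the token number alone identifies the token.
            tokens.append((TOKEN_NUMBER[ch], 0))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(tokenize("a = b + 3;"))
```

For the input `a = b + 3;` this yields the same six pairs as the slide: (4, 'a') (23, 0) (4, 'b') (11, 0) (5, 3) (20, 0).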

1. Introduction (5)
Symbol table management
- Holds tokens, token values, and the attributes of identifiers
- Used in lexical analysis, syntactic analysis, and semantic analysis

1. Introduction (6)
Lexical analysis steps
- Token recognition: insert the identifier into the symbol table
- Give the symbol table index to the parser
[Figure: symbol table with Symbol and Attribute columns; rows for a and b; surviving cell values: integer var, 1, 2, 10, 20]
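A minimal sketch of these two steps, assuming a symbol table that stores each identifier once and hands back its index. The class name `SymbolTable`, its fields, and the 0-based indices are illustrative choices; the slide's table uses its own indices.

```python
class SymbolTable:
    """Hypothetical symbol table: one entry per distinct identifier."""

    def __init__(self):
        self.entries = []    # each entry: {"symbol": name, "attribute": ...}
        self.index_of = {}   # name -> index into self.entries

    def insert(self, symbol):
        """Insert the symbol if it is new and return its table index."""
        if symbol not in self.index_of:
            self.index_of[symbol] = len(self.entries)
            # Attributes are filled in by later phases (e.g. semantic analysis).
            self.entries.append({"symbol": symbol, "attribute": None})
        return self.index_of[symbol]


IDENT = 4  # token number for identifiers, as in the slide's example

table = SymbolTable()
# The scanner passes (token number, symbol table index) to the parser.
stream = [(IDENT, table.insert(name)) for name in ["a", "b", "a"]]
print(stream)
```

The second occurrence of `a` maps to the same index as the first, so later phases see one entry per identifier.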

1. Introduction (7)
Other tasks
- Track line numbers in the source program
- Process blanks and comments
Relationship with the syntactic analyzer
[Figure: Input program -> Lexical analyzer; the Parser sends "Get token" to the Lexical analyzer, which returns a Token]

2. Token recognition (1)
Scanner design steps
1. Describe the structure of the tokens with regular expressions.
2. Or, directly design a transition diagram for the tokens.
3. Program a scanner according to the diagram.
4. Verify the scanner's behavior using regular language theory.

2. Token recognition (2)
Character classification
- letter (l): a | b | c | ... | z | A | B | C | ... | Z
- digit (d): 0 | 1 | 2 | ... | 9
- special character: + | - | * | / | . | , | ...
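As a rough sketch, the classification can be written as a small lookup function. The class labels `"l"` and `"d"` follow the slide; the `"special"` and `"other"` labels, the function name, and the restriction to the six special characters listed explicitly are assumptions.

```python
import string

LETTERS = set(string.ascii_letters)   # a..z, A..Z
DIGITS = set(string.digits)           # 0..9
SPECIALS = set("+-*/.,")              # only the characters listed on the slide


def classify(ch):
    """Map a single character to its class: 'l', 'd', 'special' or 'other'."""
    if ch in LETTERS:
        return "l"
    if ch in DIGITS:
        return "d"
    if ch in SPECIALS:
        return "special"
    return "other"


print([classify(c) for c in "x7+_"])
```

Working with character classes instead of individual characters keeps the transition diagrams small: one edge labeled l stands for 52 letter transitions.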

2.1 ID recognition (1)
State transition diagram for ID recognition
[Diagram: start -> S; S -(l, _)-> A; A -(l, d, _)-> A; A is the accepting state]

2.1 ID recognition (2)
Conversion to regular expression
- Regular grammar
    S -> lA | _A
    A -> lA | dA | _A | ε
- Regular expression
    S = lA + _A = (l + _)A
    A = lA + dA + _A + ε = (l + d + _)A + ε = (l + d + _)*
    Therefore S = (l + _)(l + d + _)*
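The grammar above maps directly onto a table-driven recognizer. The sketch below assumes ASCII letters and digits for l and d; the names `TRANSITIONS`, `char_class`, and `is_identifier` are illustrative. The equivalent Python regular expression is included as a cross-check of the derived expression (l + _)(l + d + _)*.

```python
import re


def char_class(ch):
    """Classify a character as 'l' (letter), 'd' (digit), '_' or None."""
    if ch.isascii() and ch.isalpha():
        return "l"
    if ch.isascii() and ch.isdigit():
        return "d"
    if ch == "_":
        return "_"
    return None


# One entry per production: S -> lA | _A and A -> lA | dA | _A.
TRANSITIONS = {
    ("S", "l"): "A", ("S", "_"): "A",
    ("A", "l"): "A", ("A", "d"): "A", ("A", "_"): "A",
}
ACCEPTING = {"A"}  # A -> ε makes A an accepting state


def is_identifier(text):
    state = "S"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False  # no transition: reject
    return state in ACCEPTING


# (l + _)(l + d + _)* written in Python's regex syntax.
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

for sample in ["sum", "_ptr1", "9lives", "", "a-b"]:
    print(repr(sample), is_identifier(sample), bool(ID_PATTERN.fullmatch(sample)))
```

Both columns of the printed output agree for every sample, which is what the grammar-to-expression conversion predicts.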

2.2 Integer recognition (1)
Integer format
- Decimal, octal, and hexadecimal numbers
- Repetition of digits

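2.2 Integer recognition (2)
State transition diagram for integer recognition
[Diagram: start -> S; S -n-> A, A -d-> A (decimal); S -0-> B; B -o-> C, C -o-> C (octal); B -(x, X)-> D; D -h-> E, E -h-> E (hexadecimal)]
Here n is presumably a nonzero digit, d a digit, o an octal digit, and h a hexadecimal digit.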
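A possible reading of this diagram as code is shown below. Several points are assumptions rather than facts from the slide: that n, o, and h mean nonzero, octal, and hexadecimal digits; that A, B, C, and E are the accepting states; and that a lone 0 is reported as decimal. The function name `classify_integer` is hypothetical.

```python
def classify_integer(text):
    """Return 'decimal', 'octal', 'hexadecimal', or None if not an integer."""
    state = "S"
    for ch in text:
        if state == "S":
            # 0 starts an octal/hex literal; a nonzero digit starts a decimal.
            state = "B" if ch == "0" else "A" if ch in "123456789" else None
        elif state == "A":
            state = "A" if ch in "0123456789" else None
        elif state == "B":
            state = "C" if ch in "01234567" else "D" if ch in "xX" else None
        elif state == "C":
            state = "C" if ch in "01234567" else None
        elif state in ("D", "E"):
            # D needs at least one hex digit before E can accept.
            state = "E" if ch in "0123456789abcdefABCDEF" else None
        if state is None:
            return None
    # Assumed accepting states; S and D are not accepting.
    kinds = {"A": "decimal", "B": "decimal", "C": "octal", "E": "hexadecimal"}
    return kinds.get(state)


for sample in ["123", "0", "017", "0x1F", "08", "0x"]:
    print(sample, classify_integer(sample))
```

Because the three formats share the prefix 0, the automaton only decides between octal and hexadecimal after reading the second character.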

2.3 Real number recognition (1)
Real number format
- Fixed-point
- Floating-point: with an exponent part

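2.3 Real number recognition (2)
State transition diagram for real number recognition
[Diagram: start -> S; S -d-> A; A -d-> A; A -.-> B; B -d-> C; C -d-> C; C -e-> D; D -d-> E; D -(+)-> F; D -(-)-> G; F -d-> E; G -d-> E; E -d-> E; C and E are the accepting states]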

Regular grammar
    S -> dA
    A -> dA | .B
    B -> dC
    C -> dC | eD | ε
    D -> dE | +F | -G
    E -> dE | ε
    F -> dE
    G -> dE

Regular expression (d⁺ denotes one or more d; quoted '+' and '-' are the sign characters)
    E = dE + ε = d*
    F = dE = dd* = d⁺
    G = dE = dd* = d⁺
    D = dE + '+'F + '-'G = dd* + '+'d⁺ + '-'d⁺ = d⁺ + '+'d⁺ + '-'d⁺ = (ε + '+' + '-')d⁺
    C = dC + eD + ε = dC + e(ε + '+' + '-')d⁺ + ε = d*(e(ε + '+' + '-')d⁺ + ε)
    B = dC = dd*(e(ε + '+' + '-')d⁺ + ε) = d⁺(e(ε + '+' + '-')d⁺ + ε)
    A = dA + .B = d*.B = d*.d⁺(e(ε + '+' + '-')d⁺ + ε)
    S = dA = dd*.d⁺(e(ε + '+' + '-')d⁺ + ε)
      = d⁺.d⁺(e(ε + '+' + '-')d⁺ + ε)
      = d⁺.d⁺ + d⁺.d⁺e(ε + '+' + '-')d⁺
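The final expression for S translates almost symbol for symbol into Python's regex syntax. In this sketch d is taken to be an ASCII digit, and only a lowercase e is accepted because that is what the grammar lists; the names `REAL` and `is_real` are illustrative.

```python
import re

# d+ . d+ ( e (ε | + | -) d+ | ε )
REAL = re.compile(r"[0-9]+\.[0-9]+(?:e[+-]?[0-9]+)?")


def is_real(text):
    """True if the whole string is a real number in the slide's format."""
    return REAL.fullmatch(text) is not None


for sample in ["3.14", "1.0e-10", "2.5e+3", "10.0e5", ".5", "5.", "1e10"]:
    print(sample, is_real(sample))
```

Note that the grammar requires digits on both sides of the decimal point, so inputs such as `.5`, `5.`, and `1e10` are rejected.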

2.4 String recognition (1)
String constant
- Characters enclosed in double quotes (" ")
- Examples: "This is a string", "double quote is \" character."

2.4 String recognition (2)
State transition diagram for string constant recognition
- a = char_set - {", \}
- c = other character
[Diagram: start -> S; S -"-> A; A -a-> A; A -\-> C; C -c-> A; A -"-> B; B is the accepting state]

2.4 String recognition (3)
Regular grammar
    S -> "A
    A -> aA | "B | \C
    B -> ε
    C -> cA
Regular expression
    A = aA + "B + \C
      = aA + " + \cA
      = (a + \c)A + "
      = (a + \c)*"
    S = "A = "(a + \c)*"
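The expression "(a + \c)*" can be tried out with Python's `re` module. This sketch assumes that c stands for any single character other than a newline (Python's `.`), which is one reading of "other character"; the names `STRING` and `is_string_constant` are illustrative.

```python
import re

# " ( a | \c )* "   with a = any character except " and \
STRING = re.compile(r'"(?:[^"\\]|\\.)*"')


def is_string_constant(text):
    """True if the whole text is one string constant."""
    return STRING.fullmatch(text) is not None


samples = [
    '"This is a string"',
    '"double quote is \\" character."',  # contains an escaped quote
    '""',
    '"unterminated',
    '"bad " quote"',
]
for sample in samples:
    print(sample, is_string_constant(sample))
```

The escaped quote is consumed by the `\c` alternative, so it does not end the string, whereas an unescaped quote in the middle does.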

2.5 Comment processing (1)
Comment
- A comment is written between /* and */
State transition diagram for comment recognition
- a = char_set - {*} and b = char_set - {*, /}
[Diagram: start -> S; S -/-> A; A -*-> B; B -a-> B; B -*-> C; C -*-> C; C -b-> B; C -/-> D; D is the accepting state]

2.5 Comment processing (2)
Regular grammar
    S -> /A
    A -> *B
    B -> aB | *C
    C -> *C | bB | /D
    D -> ε
Regular expression (quoted '*' and '/' are literal characters; an unquoted * is closure)
    C = '*'C + bB + '/'D = '*'*(bB + '/')
    B = aB + '*'C
      = aB + '*''*'*(bB + '/')
      = aB + '*''*'*bB + '*''*'*'/'
      = (a + '*''*'*b)B + '*''*'*'/'
      = (a + '*''*'*b)* '*''*'*'/'
    A = '*'B = '*'(a + '*''*'*b)* '*''*'*'/'
    Therefore S = '/'A = '/''*'(a + '*''*'*b)* '*''*'*'/'
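Written in Python's regex syntax, the derived expression becomes the pattern below. It is a sketch for experimenting with the result; `COMMENT`, `is_comment`, and `strip_comments` are illustrative names, and replacing each comment with a single blank is a design choice made here, not something the slide specifies.

```python
import re

# / * ( a | *+ b )* *+ /   with a = anything but *, b = anything but * and /
COMMENT = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")


def is_comment(text):
    """True if the whole text is exactly one comment."""
    return COMMENT.fullmatch(text) is not None


def strip_comments(source):
    """Replace every comment with one blank so neighboring tokens stay apart."""
    return COMMENT.sub(" ", source)


for sample in ["/* hi */", "/***/", "/* a * b */", "/* a */ b */", "/*/"]:
    print(sample, is_comment(sample))
print(repr(strip_comments("a /* x */ b")))
```

The pattern never matches past the first `*/`, because a `*` can only be followed by more `*`, by a b character, or by the closing `/`.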

3. Lexical analyzer implementation (1)
Implementation steps
- Regular expressions
- NFA
- DFA
- State minimization
- Programming
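The NFA-to-DFA step is commonly done with the subset construction. The sketch below applies it to a small hypothetical NFA for (a|b)*ab that is not taken from the slides; the dictionary encoding of the NFA and all function names are assumptions made for the example.

```python
from collections import deque

EPS = None  # label used for ε-moves in the NFA table


def epsilon_closure(nfa, states):
    """All NFA states reachable from `states` using ε-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(nfa, start, accepting, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    start_set = epsilon_closure(nfa, {start})
    dfa, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            moved = set()
            for s in current:
                moved |= set(nfa.get((s, symbol), ()))
            target = epsilon_closure(nfa, moved)
            if not target:
                continue  # no transition on this symbol
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state accepts if it contains any accepting NFA state.
    return dfa, start_set, {s for s in seen if s & accepting}


def dfa_accepts(dfa, start, accepting, text):
    state = start
    for ch in text:
        state = dfa.get((state, ch))
        if state is None:
            return False
    return state in accepting


# Hypothetical NFA for (a|b)*ab: 0 -ε-> 1, 1 -a-> {1, 2}, 1 -b-> 1, 2 -b-> 3.
NFA = {(0, EPS): {1}, (1, "a"): {1, 2}, (1, "b"): {1}, (2, "b"): {3}}
dfa, dfa_start, dfa_accepting = subset_construction(NFA, 0, {3}, "ab")
print(len({s for (s, _) in dfa}), dfa_accepts(dfa, dfa_start, dfa_accepting, "bbab"))
```

The resulting DFA has four states. Two of them, {0, 1} and {1}, have identical transitions and are both non-accepting, which is the kind of redundancy the state minimization step removes.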

3. Lexical analyzer implementation (2)
Implementation
- Determine the token structure from the grammar representation
- Write the token recognition program, using either
  - a general-purpose programming language, or
  - a lexical analyzer generator

3. Lexical analyzer implementation (3)
Lexical analyzer for Mini C (Appendix A)
- Special symbols (30): ! != % %= && ( ) * *= =, = / /= ; >= [ ] { || }
- Word symbols (7): const else if int return void while
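A hand-written scanner for such a token set usually recognizes an identifier first and then checks it against the word symbols, and it tries longer special symbols before shorter ones (longest match). The sketch below illustrates both ideas. The seven word symbols are taken from the slide; the special-symbol list is only a subset chosen for illustration, and the token kinds (`"word"`, `"ident"`, `"number"`, `"symbol"`) and the function name `scan` are assumptions, not the textbook's Mini C scanner.

```python
WORD_SYMBOLS = {"const", "else", "if", "int", "return", "void", "while"}

# Illustrative subset of the special symbols, sorted so that two-character
# symbols are tried before their one-character prefixes (longest match).
SPECIAL_SYMBOLS = sorted(
    ["!", "!=", "%", "%=", "&&", "(", ")", "*", "*=", ",", "=", "/", "/=",
     ";", ">=", "[", "]", "{", "||", "}"],
    key=len, reverse=True)


def scan(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            # Word symbols look like identifiers, so check after scanning.
            tokens.append(("word" if word in WORD_SYMBOLS else "ident", word))
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("number", source[start:i]))
        else:
            symbol = next(
                (s for s in SPECIAL_SYMBOLS if source.startswith(s, i)), None)
            if symbol is None:
                raise ValueError(f"unexpected character {ch!r} at position {i}")
            tokens.append(("symbol", symbol))
            i += len(symbol)
    return tokens


print(scan("if (a != 10) return;"))
```

Checking word symbols after the identifier has been fully read means that `while1` is scanned as one identifier rather than as the word symbol `while` followed by `1`.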

3. Lexical analyzer implementation (4)
- State transition diagram: pp. 143-144
- Lexical analysis program: pp. 145-148