Introduction to Compilers, Chapter 4: Lexical Analysis



4.1 Introduction
Lexical analysis: the process by which the compiler groups certain strings of characters into individual tokens.
Lexical analyzer ≡ scanner ≡ lexer

Token
The smallest syntactically meaningful unit.
  Token: a single syntactic entity (terminal symbol).
  Token number: an integer code for the token, used to make string processing efficient.
  Token value: a numeric value or a string value.
ex) if ( a > 10 ) ...
    Token number : 32  7   4  25   5  8
    Token value  :  0  0  'a'  0  10  0

Token classes
Special form (fixed by the language designer)
  1. Keywords: const, else, if, int, ...
  2. Operator symbols: +, -, *, /, ++, --, etc.
  3. Delimiters: ;, ,, (, ), [, ], etc.
General form (chosen by the programmer)
  4. Identifiers: stk, ptr, sum, ...
  5. Constants: 526, 3.0, 0.1234e-10, 'c', "string", etc.
Token structure: represented by a regular expression.
  ex) $id = (l + \_)(l + d + \_)^*$
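Because keywords have the same shape as identifiers, a scanner usually separates the two classes with a table lookup after the lexeme has been read. The following is a minimal sketch of that lookup, assuming only the four keywords named on the slide; the name `is_keyword` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Keywords named on the slide; a real language defines more. */
static const char *const KEYWORDS[] = { "const", "else", "if", "int" };

/* Returns true if the lexeme is a keyword (special form),
   false if it should be treated as an ordinary identifier. */
bool is_keyword(const char *lexeme)
{
    size_t n = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(lexeme, KEYWORDS[i]) == 0)
            return true;
    }
    return false;
}
```

A linear search is enough for a handful of keywords; a larger set would more likely use a sorted table or a hash.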

Interaction of Lexical Analyzer with Parser
The lexical analyzer is a procedure called by the syntax analyzer.
  L.A. ⇒ finite automata
  S.A. ⇒ pushdown automata
Token type: the form of the token that the scanner passes to the parser, a pair (token number, token value).
ex) if ( x > y ) x = 10 ;
    (32,0) (7,0) (4,x) (25,0) (4,y) (8,0) (4,x) (23,0) (5,10) (20,0)
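The (token number, token value) pair can be sketched as a small C struct. The token numbers below are the ones used in the slide's example; the type and function names (`struct token`, `format_token`, `EXAMPLE`) are hypothetical, and storing every value as a string is a simplifying assumption.

```c
#include <stdio.h>
#include <stddef.h>

/* Token numbers as they appear in the slide's example. */
enum {
    TN_IDENT = 4, TN_NUMBER = 5, TN_LPAREN = 7, TN_RPAREN = 8,
    TN_SEMICOLON = 20, TN_ASSIGN = 23, TN_GREATER = 25, TN_IF = 32
};

/* A (token number, token value) pair. value is NULL when the token
   carries no value, which the slide writes as 0. */
struct token {
    int number;
    const char *value;
};

/* Writes the pair in the slide's "(number,value)" notation into buf.
   Returns the number of characters the full text needs. */
int format_token(const struct token *t, char *buf, size_t size)
{
    return snprintf(buf, size, "(%d,%s)", t->number,
                    t->value != NULL ? t->value : "0");
}

/* Hand-built token stream for:  if ( x > y ) x = 10 ;  */
const struct token EXAMPLE[] = {
    { TN_IF, NULL }, { TN_LPAREN, NULL }, { TN_IDENT, "x" },
    { TN_GREATER, NULL }, { TN_IDENT, "y" }, { TN_RPAREN, NULL },
    { TN_IDENT, "x" }, { TN_ASSIGN, NULL }, { TN_NUMBER, "10" },
    { TN_SEMICOLON, NULL }
};
```

In a real scanner the parser would request one such pair at a time instead of reading a prebuilt array.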

The reasons for separating the analysis phase of compiling into lexical analysis (scanning) and syntax analysis (parsing):
  1. Modular construction: simpler design.
  2. Compiler efficiency is improved.
  3. Compiler portability is enhanced.
Parsing table
  Determines the parser's action (shift, reduce, accept, error).
  The token number is an index into the parsing table.

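Uses of the symbol table
  During lexical and syntax analysis, information about identifiers is collected and stored.
  The stored information is used during semantic analysis and code generation.
  Entry: name + attributes
  ex) Hashed symbol table (see chapter 12)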
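A hashed symbol table of name + attributes entries might look like the sketch below. This is not the design from chapter 12, which is not shown here; the bucket count, the hash function, and the attribute fields `type` and `offset` are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211   /* assumed table size */

/* One symbol: a name plus attributes. The attribute fields are
   hypothetical placeholders for what later phases would record. */
struct symbol {
    char *name;
    int type;              /* hypothetical attribute */
    int offset;            /* hypothetical attribute */
    struct symbol *next;   /* next symbol in the same bucket */
};

static struct symbol *buckets[NBUCKETS];

/* Simple multiplicative string hash (an assumed choice). */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Returns the entry for name, or NULL if it has not been entered. */
struct symbol *sym_lookup(const char *name)
{
    for (struct symbol *p = buckets[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Returns the existing entry for name, or creates a new one.
   Returns NULL only if memory allocation fails. */
struct symbol *sym_insert(const char *name)
{
    struct symbol *p = sym_lookup(name);
    if (p != NULL)
        return p;
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->name = malloc(strlen(name) + 1);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->name, name);
    p->type = 0;
    p->offset = 0;
    unsigned h = hash(name);
    p->next = buckets[h];      /* push onto the front of the bucket chain */
    buckets[h] = p;
    return p;
}
```

The scanner would call `sym_insert` each time it recognizes an identifier, so repeated occurrences of the same name share one entry.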

4.2 Token Recognition
Specification of token structure: regular expression (RE)
Specification of the programming language: context-free grammar (CFG)
Scanner design steps
  1. Describe the structure of the tokens as regular expressions.
  2. Or, directly design a transition diagram for the tokens.
  3. Program a scanner according to the diagram.
  4. Verify the scanner's behavior through regular language theory.
Character classification
  letter : a | b | c | ... | z | A | B | C | ... | Z   (abbreviated l)
  digit : 0 | 1 | 2 | ... | 9   (abbreviated d)
  special character : + | - | * | / | . | , | ...
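The character classification can be sketched as a single function that maps each input character to its class. The enum and function names are hypothetical, all non-letter, non-digit characters are grouped into one class for brevity, and the range tests assume an ASCII-compatible character set.

```c
/* Character classes from the slide: l (letter), d (digit);
   everything else is grouped here as "other". */
enum char_class { CC_LETTER, CC_DIGIT, CC_OTHER };

/* Classifies one character. The range comparisons assume that
   'a'..'z' and 'A'..'Z' are contiguous, as they are in ASCII. */
enum char_class classify(int c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        return CC_LETTER;
    if (c >= '0' && c <= '9')
        return CC_DIGIT;
    return CC_OTHER;
}
```

Note that the underscore is not a letter under this classification, which is why the identifier pattern lists it separately.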

4.2.1 Identifier Recognition
[Transition diagram]
Regular grammar
  S → lA | _A
  A → lA | dA | _A | ε
Regular expression
  $S = lA + \_A = (l + \_)A$
  $A = lA + dA + \_A + \varepsilon = (l + d + \_)A + \varepsilon = (l + d + \_)^*$
  $\therefore S = (l + \_)(l + d + \_)^*$
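The two-state automaton for identifiers translates almost directly into code. Below is a minimal sketch that checks a whole string against the identifier pattern; the function name `is_identifier` is hypothetical, and `isalpha`/`isalnum` are assumed to run in the default "C" locale, where they match exactly the letters and digits listed earlier.

```c
#include <stdbool.h>
#include <ctype.h>

/* Accepts exactly the strings of the form (l + _)(l + d + _)*. */
bool is_identifier(const char *s)
{
    unsigned char c = (unsigned char)*s;

    /* State S: the first character must be a letter or underscore. */
    if (!(isalpha(c) || c == '_'))
        return false;

    /* State A: letters, digits, and underscores may follow. */
    for (s++; *s != '\0'; s++) {
        c = (unsigned char)*s;
        if (!(isalnum(c) || c == '_'))
            return false;
    }
    return true;   /* A is accepting because of A -> epsilon */
}
```

A production scanner would instead stop at the first character that cannot extend the identifier and leave it for the next token.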

4.2.2 Integer Number Recognition
Form: integers are classified as decimal, octal, or hexadecimal.
  Decimal: starts with a nonzero digit.
  Octal: starts with 0.
  Hexadecimal: starts with 0x or 0X.
[Transition diagram]
  n : nonzero digit   o : octal digit   h : hexadecimal digit

Regular grammar
  S → nA | 0B
  A → dA | ε
  B → oC | xD | XD | ε
  C → oC | ε
  D → hE
  E → hE | ε
Regular expression
  $E = hE + \varepsilon = h^*\varepsilon = h^*$
  $D = hE = hh^* = h^+$
  $C = oC + \varepsilon = o^*$
  $B = oC + xD + XD + \varepsilon = o^+ + (x + X)D + \varepsilon = o^+ + (x + X)h^+ + \varepsilon$
  $A = dA + \varepsilon = d^*$
  $S = nA + 0B = nd^* + 0(o^+ + (x + X)h^+ + \varepsilon) = nd^* + 0 + 0o^+ + 0(x + X)h^+$
  $\therefore S = nd^* + 0 + 0o^+ + 0(x + X)h^+$
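The integer grammar can be simulated with one state per nonterminal. The sketch below follows the productions S → nA | 0B, A → dA | ε, B → oC | xD | XD | ε, C → oC | ε, D → hE, E → hE | ε; the function names and the explicit `REJECT` state are additions made for illustration.

```c
#include <stdbool.h>
#include <ctype.h>

/* States named after the nonterminals of the regular grammar. */
enum state { S, A, B, C, D, E, REJECT };

static bool is_octal(int c) { return c >= '0' && c <= '7'; }

/* One transition of the automaton on input character c. */
static enum state step(enum state st, int c)
{
    switch (st) {
    case S: if (c == '0') return B;                 /* S -> 0B */
            return isdigit(c) ? A : REJECT;         /* S -> nA */
    case A: return isdigit(c) ? A : REJECT;         /* A -> dA */
    case B: if (c == 'x' || c == 'X') return D;     /* B -> xD | XD */
            return is_octal(c) ? C : REJECT;        /* B -> oC */
    case C: return is_octal(c) ? C : REJECT;        /* C -> oC */
    case D:                                         /* D -> hE */
    case E: return isxdigit(c) ? E : REJECT;        /* E -> hE */
    default: return REJECT;
    }
}

/* Returns true if the whole string is a decimal, octal, or
   hexadecimal integer constant according to the grammar. */
bool is_integer(const char *s)
{
    enum state st = S;
    for (; *s != '\0' && st != REJECT; s++)
        st = step(st, (unsigned char)*s);
    /* A, B, C, and E have epsilon-productions, so they accept. */
    return st == A || st == B || st == C || st == E;
}
```

For example, `is_integer("0x1F")` passes through S, B, D, E, E and is accepted, while `is_integer("08")` is rejected because 8 is not an octal digit.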

4.2.3 Real Number Recognition
Form: fixed-point number and floating-point number
[Transition diagram]
Regular grammar
  S → dA
  A → dA | .B
  B → dC
  C → dC | eD | ε
  D → dE | +F | -G
  E → dE | ε
  F → dE
  G → dE

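Regular expression
  $E = dE + \varepsilon = d^*$
  $F = dE = dd^* = d^+$
  $G = dE = dd^* = d^+$
  $D = dE + \text{'+'}F + \text{-}G = dd^* + \text{'+'}d^+ + \text{-}d^+ = d^+ + \text{'+'}d^+ + \text{-}d^+ = (\varepsilon + \text{'+'} + \text{-})d^+$
  $C = dC + eD + \varepsilon = dC + e(\varepsilon + \text{'+'} + \text{-})d^+ + \varepsilon = d^*(e(\varepsilon + \text{'+'} + \text{-})d^+ + \varepsilon)$
  $B = dC = dd^*(e(\varepsilon + \text{'+'} + \text{-})d^+ + \varepsilon) = d^+(e(\varepsilon + \text{'+'} + \text{-})d^+ + \varepsilon)$
  $A = dA + .B = d^*.d^+(e(\varepsilon + \text{'+'} + \text{-})d^+ + \varepsilon)$
  $S = dA = dd^*.d^+(e(\varepsilon + \text{'+'} + \text{-})d^+ + \varepsilon) = d^+.d^+(e(\varepsilon + \text{'+'} + \text{-})d^+ + \varepsilon)$
  $\therefore S = d^+.d^+ + d^+.d^+e(\varepsilon + \text{'+'} + \text{-})d^+$
Note: the terminal + is written '+' to distinguish it from the union operator.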
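As with the other token types, the real-number grammar maps onto a state machine with one state per nonterminal (S, A, B, C, D, E, F, G). The sketch below is one possible rendering; the function names and the `REJECT` state are hypothetical. It accepts only a lowercase e in the exponent, because that is all the grammar provides.

```c
#include <stdbool.h>
#include <ctype.h>

/* States named after the nonterminals of the regular grammar. */
enum state { S, A, B, C, D, E, F, G, REJECT };

/* One transition of the automaton on input character c. */
static enum state step(enum state st, int c)
{
    bool d = isdigit(c) != 0;
    switch (st) {
    case S: return d ? A : REJECT;            /* S -> dA */
    case A: if (c == '.') return B;           /* A -> .B */
            return d ? A : REJECT;            /* A -> dA */
    case B: return d ? C : REJECT;            /* B -> dC */
    case C: if (c == 'e') return D;           /* C -> eD */
            return d ? C : REJECT;            /* C -> dC */
    case D: if (c == '+') return F;           /* D -> +F */
            if (c == '-') return G;           /* D -> -G */
            return d ? E : REJECT;            /* D -> dE */
    case E: return d ? E : REJECT;            /* E -> dE */
    case F:                                   /* F -> dE */
    case G: return d ? E : REJECT;            /* G -> dE */
    default: return REJECT;
    }
}

/* Returns true if the whole string is a fixed-point or
   floating-point constant according to the grammar. */
bool is_real(const char *s)
{
    enum state st = S;
    for (; *s != '\0' && st != REJECT; s++)
        st = step(st, (unsigned char)*s);
    /* Only C and E have epsilon-productions, so only they accept. */
    return st == C || st == E;
}
```

Strings such as `3.` or `.5` are rejected, since the grammar requires at least one digit on each side of the decimal point.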

4.2.4 String Constant Recognition
Form: a sequence of characters between a pair of double quotes.
[Transition diagram]
  where a = char_set - {", \} and c = char_set
Regular grammar
  S → "A
  A → aA | "B | \C
  B → ε
  C → cA

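Regular expression
  $A = aA + \texttt{"}B + \backslash C = aA + \texttt{"} + \backslash cA = (a + \backslash c)A + \texttt{"} = (a + \backslash c)^*\texttt{"}$
  $S = \texttt{"}A = \texttt{"}(a + \backslash c)^*\texttt{"}$
  $\therefore S = \texttt{"}(a + \backslash c)^*\texttt{"}$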
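The string-constant grammar S → "A, A → aA | "B | \C, B → ε, C → cA can be simulated in the same way. In this sketch (function name hypothetical), state C is the "just saw a backslash" state, which consumes any one character and returns to A.

```c
#include <stdbool.h>

/* States named after the nonterminals of the regular grammar. */
enum state { S, A, B, C, REJECT };

/* Returns true if the whole string is one string constant:
   an opening quote, a body with backslash escapes, a closing quote. */
bool is_string_constant(const char *s)
{
    enum state st = S;
    for (; *s != '\0' && st != REJECT; s++) {
        char ch = *s;
        switch (st) {
        case S:                               /* S -> "A */
            st = (ch == '"') ? A : REJECT;
            break;
        case A:
            if (ch == '"')
                st = B;                       /* A -> "B : closing quote */
            else if (ch == '\\')
                st = C;                       /* A -> \C : start of escape */
            /* any other character: A -> aA, stay in A */
            break;
        case B:                               /* B -> epsilon: nothing may follow */
            st = REJECT;
            break;
        case C:                               /* C -> cA : any escaped character */
            st = A;
            break;
        default:
            break;
        }
    }
    return st == B;
}
```

Because an escaped quote is consumed in state C, the input `"a\"b"` is accepted as a single constant instead of ending at the inner quote.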

4.2.5 Comment Recognition
[Transition diagram]
  where a = char_set - {*} and b = char_set - {*, /}
Regular grammar
  S → /A
  A → *B
  B → aB | *C
  C → *C | bB | /D
  D → ε

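Regular expression
  $C = \text{'*'}C + bB + /D = \text{'*'}^*(bB + /)$
  $B = aB + \text{'*'}C = aB + \text{'*'}\,\text{'*'}^*(bB + /) = (a + \text{'*'}\,\text{'*'}^*b)B + \text{'*'}\,\text{'*'}^*/ = (a + \text{'*'}\,\text{'*'}^*b)^*\,\text{'*'}\,\text{'*'}^*/$
  $A = \text{'*'}B = \text{'*'}(a + \text{'*'}\,\text{'*'}^*b)^*\,\text{'*'}\,\text{'*'}^*/$
  $\therefore S = /A = /\text{'*'}(a + \text{'*'}\,\text{'*'}^*b)^*\,\text{'*'}\,\text{'*'}^*/$
Note: the terminal * is written '*' to distinguish it from the closure operator.

A program which recognizes a comment statement:

    // ch (an int) holds the first character after the opening "/*".
    do {
        while (ch != '*' && ch != EOF)   // skip ahead to the next '*'
            ch = getchar();
        ch = getchar();                  // read the character after the '*'
    } while (ch != '/' && ch != EOF);    // "*/" closes the comment; EOF ends an unterminated one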
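The comment grammar S → /A, A → *B, B → aB | *C, C → *C | bB | /D, D → ε can also be simulated explicitly, one state per nonterminal, which makes the link to the transition diagram easier to see than the nested loops do. This is a sketch that checks a complete string; the function name `is_comment` is hypothetical.

```c
#include <stdbool.h>

/* States named after the nonterminals of the regular grammar. */
enum state { S, A, B, C, D, REJECT };

/* Returns true if the whole string is exactly one comment. */
bool is_comment(const char *s)
{
    enum state st = S;
    for (; *s != '\0' && st != REJECT; s++) {
        char ch = *s;
        switch (st) {
        case S:                       /* S -> /A */
            st = (ch == '/') ? A : REJECT;
            break;
        case A:                       /* A -> *B */
            st = (ch == '*') ? B : REJECT;
            break;
        case B:                       /* B -> *C, otherwise B -> aB */
            if (ch == '*')
                st = C;
            break;
        case C:
            if (ch == '/')
                st = D;               /* C -> /D : comment closed */
            else if (ch != '*')
                st = B;               /* C -> bB : back to the body */
            /* a further star keeps the automaton in C (C -> *C) */
            break;
        case D:                       /* D -> epsilon: nothing may follow */
            st = REJECT;
            break;
        default:
            break;
        }
    }
    return st == D;
}
```

State C corresponds to the point in the loop version where a star has just been read and the next character decides whether the comment ends.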