Presentation is loading. Please wait.

Presentation is loading. Please wait.

컴파일러 입문 제 4 장 어휘 분석.

Similar presentations


Presentation on theme: "컴파일러 입문 제 4 장 어휘 분석."— Presentation transcript:

1 컴파일러 입문 제 4 장 어휘 분석

2 4.1 서 론 Lexical Analysis the process by which the compiler groups certain strings of characters into individual tokens. Lexical Analyzer  Scanner  Lexer

3 Token ex) if ( a > 10 ) ... Token Number : 32 7 4 25 5 8
문법적으로 의미 있는 최소 단위 Token - a single syntactic entity(terminal symbol). Token Number - string 처리의 효율성 위한 integer number. Token Value - numeric value or string value. ex) if ( a > ) ... Token Number : Token Value : ‘a’

4 Token Structure - represented by regular expression.
Token classes Special form - language designer 1. Keyword --- const, else, if, int, ... 2. Operator symbols --- +, -, *, /, ++, -- etc. 3. Delimiters --- ;, ,, (, ), [, ] etc. General form - programmer 4. identifier --- stk, ptr, sum, ... 5. constant , 3.0, e-10, ‘c’, “string” etc. Token Structure - represented by regular expression. ex) id = (l + _)( l + d + _)*

5 Interaction of Lexical Analyzer with Parser
Lexical Analyzer is the procedure of Syntax Analyzer. L.A.  Finite Automata. S.A.  Pushdown Automata. Token type scanner가 parser에게 넘겨주는 토큰 형태. (token number, token value) ex) if ( x > y ) x = ; (32,0) (7,0) (4,x) (25,0) (4,y) (8,0) (4,x) (23,0) (5,10) (20,0)

6 Parser의 행동(Shift, Reduce, Accept, Error)을 결정.
The reasons for separating the analysis phase of compiling into lexical analysis(scanning) and syntax analysis(parsing). 1. modular construction - simpler design. 2. compiler efficiency is improved. 3. compiler portability is enhanced. Parsing table Parser의 행동(Shift, Reduce, Accept, Error)을 결정. Token number는 Parsing table의 index.

7 Symbol table의 용도 L.A와 S.A시 identifier에 관한 정보를 수집하여 저장.
Semantic analysis와 Code generation시에 사용. name + attributes ex) Hashed symbol table chapter 12 참조

8 4.2 토큰 인식 Specification of token structure - RE
Specification of PL - CFG Scanner design steps 1. describe the structure of tokens in re. 2. or, directly design a transition diagram for the tokens. 3. and program a scanner according to the diagram. 4. moreover, we verify the scanner action through regular language theory. Character classification letter : a | b | c... | z | A | B | C |…| Z l digit : 0 | 1 | 2... | 9 d special character : + | - | * | / | . | , | ...

9 4.2.1 Identifier Recognition
Transition diagram Regular grammar S  lA | _A A  lA | dA | _A | ε Regular expression S = lA + _A = (l + _)A A = lA + dA + _A + ε = (l + d + _)A + ε = (l + d + _)*  S = (l + _)( l + d + _)*

10 4.2.2 Integer number Recognition
Form : 10진수, 8진수, 16진수로 구분되어진다. 10진수 : 0이 아닌 수 시작 8진수 : 0으로 시작, 16진수 : 0x, 0X로 시작 Transition diagram n : non-zero digit o : octal digit h : hexa digit

11 Regular grammar Regular expression C  oC | ε D  hE E  hE | ε
S  nA | 0B A  dA | ε B  oC | xD | XD | ε C  oC | ε D  hE E  hE | ε Regular expression E = hE + ε = h*ε = h* D = hE = hh* = h+ C = oC + ε = o* B = oC + xD + XD + ε = o+ + (x + X)D = o+ + (x + X)h+ + ε A = dA + ε = d* S = nA + 0B = nd* + 0(o+ + (x + X)h+ + ε) = nd* o+ + 0(x + X)h+ ∴ S = nd* o+ + 0(x + X)h+ E = hE + ε = h*ε = h* D = hE = hh* = h+ C = oC + ε = o* B = oC + xD + XD + ε = o+ + (x + X)D = o+ + (x + X)h+ + ε A = dA + ε = d* S = nA + 0B = nd* + 0(o+ + (x + X)h+ + ε) = nd* o+ + 0(x + X)h+ ∴ S = nd* o+ + 0(x + X)h+

12 4.2.3 Real number Recognition
Form : Fixed-point number & Floating-point number Transition diagram Regular grammar S  dA D  dE | +F | -G A  dA | .B E  dE |ε B  dC F  dE C  dC | eD |ε G  dE

13 Regular expression E = dE + ε = d* F = dE = dd* = d+ G = dE = dd* = d+
D = dE + '+'F + -G = dd* + '+'d+ + -d + = d+ + '+'d+ + -d+ = (ε + '+' +-)d + C = dC + eD + ε = dC+e(ε + '+' +-)d+ + e = d*(e(ε + '+' +-) d+ + ε) B = dC=dd*(e(ε + '+' +-)d+ +ε) = d++(e(ε + '+' +-) d+ +ε) A = dA + .B = d*.d+(e(ε + '+' +-)d+ + ε) S = dA = dd*. d+(e(ε + '+' +-) d+ +ε) = d+.d+(e(ε + '+' +-) d+ + ε) = d+.d++ d+.d+e(ε + '+' +-) d+ 참고 Terminal +를 ‘+’로 표기.

14 4.2.4 String Constant Recognition
Form : a sequence of characters between a pair of double quotes. Transition diagram where, a = char_set - {", \} and c = char_set Regular grammar S  "A A  aA | "B | \C B  ε C  cA

15 Regular expression A = aA + " B + \C = aA + " + \cA = (a + \c)A + "
S = " A = "(a + \c)*" ∴ S = "(a + \c)* "

16 4.2.5 Comment Recognition Transition diagram Regular grammar S  /A
where, a = char_set - {*} and b = char_set - {*, /}. Regular grammar S  /A A  *B B  aB | *C C  *C | bB | /D D  ε

17 A program which recognizes a comment statement. do {
Regular expression C = *C + bB + /D = **(bB + /) B = aB + ***(bB + /) = aB + ***bB + ***/ = (a + *** b)B + ***/= (a + ***b)****/ A = *B = *(a + ***b)****/  S = /A = /* (a + ***b)****/ A program which recognizes a comment statement. do { while (ch != '*') ch = getchar(); ch = getchar(); } while (ch != '/');


Download ppt "컴파일러 입문 제 4 장 어휘 분석."

Similar presentations


Ads by Google