Published by Vivien Farmer. Modified over 8 years ago.
1
Compiler
Chapter 4. Lexical Analysis
Dept. of Computer Engineering, Hansung University, Sung-Dong Kim
2
1. Introduction (1)
Lexical analysis
- Reads the source program and identifies its smallest grammatical units (tokens)
- Lexical analyzer = Scanner = Lexer
[Figure: Source program -> Lexical Analyzer -> Token stream]
(2011-1) Compiler
3
1. Introduction (2)
Token
- Recognized by a finite automaton (FA)
- Special form: reserved word
- General form: identifier, constant, …
4
Special form (defined by the language designer)
1. Keywords: begin, end, for, if, ...
2. Operator symbols: +, -, *, /, <, :=, etc.
3. Delimiters: ; , ( ) [ ] etc.
General form (defined by the programmer)
1. Identifiers: stk, ptr, sum, ...
2. Constants: 526, 3.0, 0.1234e-10, 'string', etc.
5
1. Introduction (3)
Terminology
- Token: the smallest grammatical unit; a terminal symbol of the grammar
- Token number: an integer code for each token, so that later phases compare integers instead of strings
- Token value: the string value (for an identifier) or the numeric value (for a constant)
6
1. Introduction (4)
Example
Token structure: represented by a regular expression, e.g. id = l ( l + d )*

Source:        IF   A    >    10   THEN ...
Token number:  29   1    20   2    34
Token value:   0    'A'  0    10   0
7
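Source:        a    =    b    +    3    ;
Token number:  4    23   4    11   5    20
Token value:   a    0    b    0    3    0

[Figure: Lexical Analyzer -> Parser. For the input a = b + 3; the lexical analyzer passes the pairs (4, a) (23, 0) (4, b) (11, 0) (5, 3) (20, 0) to the parser. The figure also shows the pairs (4, 10) (4, 20).]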
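The (token number, token value) pairs in this example can be reproduced with a short sketch. The token numbers are the ones shown on the slide; the `TOKEN_NUMBER` table, the `tokenize` function, and the regular expression are hypothetical names and choices made for illustration, not the course's scanner.

```python
import re

# Token numbers as shown in the slide's example; the table layout is an assumption.
TOKEN_NUMBER = {"id": 4, "number": 5, "+": 11, ";": 20, "=": 23}

# Leading blanks are skipped, then exactly one token class must match.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_][A-Za-z0-9_]*)|(?P<number>[0-9]+)|(?P<op>[+;=]))"
)

def tokenize(source):
    """Return a list of (token number, token value) pairs."""
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        if m.group("id") is not None:
            tokens.append((TOKEN_NUMBER["id"], m.group("id")))
        elif m.group("number") is not None:
            tokens.append((TOKEN_NUMBER["number"], int(m.group("number"))))
        else:
            # Special-form tokens carry no value, so 0 is used as on the slide.
            tokens.append((TOKEN_NUMBER[m.group("op")], 0))
        pos = m.end()
    return tokens

print(tokenize("a = b + 3;"))
```

Only the five token kinds of the example are handled; any other character raises an error.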
8
1. Introduction (5)
Symbol table management
- Stores tokens, token values, and the attributes of identifiers
- Used in lexical analysis, syntactic analysis, and semantic analysis
9
1. Introduction (6)
- Lexical analysis steps
  - Token recognition: insert the token into the symbol table
  - Pass the symbol table index to the parser
[Figure: symbol table with columns Symbol and Attribute; it holds entries for a and b, and the labels integer, var, 1, 2, 10, and 20 appear in the figure]
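The two steps above can be sketched as a table that returns an index on insertion. The class name `SymbolTable`, its fields, and the zero-based indexing are assumptions made for this sketch; the slide does not specify a layout.

```python
class SymbolTable:
    """A minimal symbol table: each identifier is stored once and addressed by index."""

    def __init__(self):
        self._index = {}   # identifier name -> position in self.entries
        self.entries = []  # one record per identifier

    def insert(self, name):
        """Insert `name` if it is new and return its table index."""
        if name not in self._index:
            self._index[name] = len(self.entries)
            # Attributes (type, kind, ...) are filled in by later phases.
            self.entries.append({"symbol": name, "attribute": None})
        return self._index[name]

table = SymbolTable()
print(table.insert("a"), table.insert("b"), table.insert("a"))
```

The scanner would call `insert` when it recognizes an identifier and hand the returned index to the parser as the token value.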
10
1. Introduction (7)
Other tasks
- Tracking line numbers in the source program
- Processing blanks and comments
Relationship with the syntactic analyzer
[Figure: Input program -> Lexical analyzer -> Parser. The parser sends "Get token" to the lexical analyzer, which returns a token.]
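The "get token" relationship can be sketched as a scanner object that the parser calls once per token. The names `Lexer`, `get_token`, and `parse` are hypothetical, and the token shape (lexeme, line number) is chosen only to show line tracking and blank skipping; comment skipping is left out.

```python
class Lexer:
    """Hands out one token per call, skipping blanks and counting lines."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.line = 1

    def get_token(self):
        """Return (lexeme, line number), or None at the end of the input."""
        text = self.text
        # Skip blanks; every newline advances the line counter.
        while self.pos < len(text) and text[self.pos].isspace():
            if text[self.pos] == "\n":
                self.line += 1
            self.pos += 1
        if self.pos >= len(text):
            return None
        start = self.pos
        if text[start].isalnum() or text[start] == "_":
            # A run of letters, digits, and underscores forms one lexeme.
            while self.pos < len(text) and (text[self.pos].isalnum() or text[self.pos] == "_"):
                self.pos += 1
        else:
            # Any other character is treated as a one-character token.
            self.pos += 1
        return text[start:self.pos], self.line

def parse(lexer):
    """Stand-in for a parser: it only pulls tokens until the input ends."""
    tokens = []
    token = lexer.get_token()
    while token is not None:
        tokens.append(token)
        token = lexer.get_token()
    return tokens

print(parse(Lexer("a = b\n+ 3;")))
```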
11
2. Token recognition (1)
Scanner design steps
1. Describe the structure of the tokens as regular expressions.
2. Or design a transition diagram for the tokens directly.
3. Program a scanner according to the diagram.
4. Verify the scanner's behavior using regular language theory.
12
2. Token recognition (2)
Character classification
- letter: a | b | c | ... | z | A | B | C | ... | Z (abbreviated l)
- digit: 0 | 1 | 2 | ... | 9 (abbreviated d)
- special character: + | - | * | / | . | , | ...
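A scanner usually maps each input character to its class before consulting the transition diagram. A minimal sketch, where `char_class` is a hypothetical helper and every character that is neither a letter nor a digit is lumped into one "special" class:

```python
import string

def char_class(ch):
    """Map one character to the slide's class symbols: 'l', 'd', or 'special'."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch in string.ascii_letters:
        return "l"
    if ch in string.digits:
        return "d"
    return "special"

print([char_class(c) for c in "a1+"])
```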
13
2.1 ID recognition (1)
State transition diagram for ID recognition
[Figure: start -> S; S -> A on l or _; A -> A on l, d, or _]
14
2.1 ID recognition (2)
Conversion to regular expression
- Regular grammar
  S → lA | _A
  A → lA | dA | _A | ε
- Regular expression
  S = lA + _A = (l + _)A
  A = lA + dA + _A + ε = (l + d + _)A + ε = (l + d + _)*
  S = (l + _)(l + d + _)*
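The grammar for identifiers translates directly into a table-driven recognizer, and the derived expression (l + _)(l + d + _)* into a pattern for Python's `re` module. This is a sketch: the names `TRANSITIONS`, `ACCEPTING`, `classify`, and `is_identifier` are hypothetical, and A is taken as the accepting state because of the rule A → ε.

```python
import re
import string

def classify(ch):
    """Return the input symbol for one character: 'l', 'd', '_', or None."""
    if ch in string.ascii_letters:
        return "l"
    if ch in string.digits:
        return "d"
    if ch == "_":
        return "_"
    return None

# One entry per grammar rule: S -> lA | _A and A -> lA | dA | _A.
TRANSITIONS = {
    ("S", "l"): "A", ("S", "_"): "A",
    ("A", "l"): "A", ("A", "d"): "A", ("A", "_"): "A",
}
ACCEPTING = {"A"}  # from A -> ε

def is_identifier(text):
    """Run the automaton over `text`; True if it ends in an accepting state."""
    state = "S"
    for ch in text:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:
            return False
    return state in ACCEPTING

# The same language written as a regular expression.
ID_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

for word in ["sum1", "_ptr", "1abc", ""]:
    print(repr(word), is_identifier(word), ID_RE.fullmatch(word) is not None)
```

Both recognizers should agree on every input, which gives a cheap check on the hand derivation.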
15
2.2 Integer recognition (1)
Integer format
- Decimal, octal, and hexadecimal numbers
- Written as a repeated sequence of digits
16
2.2 Integer recognition (2)
State transition diagram for integer recognition
[Figure: transition diagram with states S (start), A, B, C, D, E and edge labels n, d, 0, o, x/X, h]
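A possible reading of this diagram can be sketched as a small state machine. Everything about the edges is an assumption, since only the labels survive in the transcript: n is taken as a nonzero digit, d as any digit, o as an octal digit, h as a hexadecimal digit, with S -n-> A, A -d-> A, S -0-> B, B -o-> C, C -o-> C, B -x/X-> D, D -h-> E, E -h-> E, and A, B, C, E accepting. The function name `classify_integer` is hypothetical.

```python
DIGITS = set("0123456789")
NONZERO = set("123456789")
OCTAL = set("01234567")
HEXADECIMAL = set("0123456789abcdefABCDEF")

# Accepting states and the kind of constant each one stands for (assumed).
KIND = {"A": "decimal", "B": "zero", "C": "octal", "E": "hexadecimal"}

def classify_integer(text):
    """Return the kind of integer constant `text` is, or None if it is not one."""
    state = "S"
    for ch in text:
        if state == "S":
            state = "A" if ch in NONZERO else "B" if ch == "0" else None
        elif state == "A":
            state = "A" if ch in DIGITS else None
        elif state == "B":
            state = "C" if ch in OCTAL else "D" if ch in "xX" else None
        elif state == "C":
            state = "C" if ch in OCTAL else None
        elif state in ("D", "E"):
            state = "E" if ch in HEXADECIMAL else None
        if state is None:
            return None
    return KIND.get(state)

for text in ["526", "017", "0x1F", "0", "08", "0x"]:
    print(text, classify_integer(text))
```

State D is not accepting, so a bare prefix such as 0x is rejected.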
17
2.3 Real number recognition (1)
Real number format
- Fixed-point
- Floating-point: includes an exponent part
18
2.3 Real number recognition (2)
State transition diagram for real number recognition
[Figure: transition diagram with states S (start), A, B, C, D, E, F, G and edge labels d, ., e, +, -]
19
Regular grammar
  S → dA
  A → dA | .B
  B → dC
  C → dC | eD | ε
  D → dE | +F | -G
  E → dE | ε
  F → dE
  G → dE
Regular expression (d^+ means one or more d; '+' and '-' are the literal sign characters)
  E = dE + ε = d*
  F = dE = dd* = d^+
  G = dE = dd* = d^+
  D = dE + '+'F + '-'G = dd* + '+'d^+ + '-'d^+ = d^+ + '+'d^+ + '-'d^+ = (ε + '+' + '-')d^+
  C = dC + eD + ε = dC + e(ε + '+' + '-')d^+ + ε = d*(e(ε + '+' + '-')d^+ + ε)
  B = dC = dd*(e(ε + '+' + '-')d^+ + ε) = d^+(e(ε + '+' + '-')d^+ + ε)
  A = dA + .B = d*.B = d*.d^+(e(ε + '+' + '-')d^+ + ε)
  S = dA = dd*.d^+(e(ε + '+' + '-')d^+ + ε)
    = d^+.d^+(e(ε + '+' + '-')d^+ + ε)
    = d^+.d^+ + d^+.d^+e(ε + '+' + '-')d^+
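The final expression for S can be checked against sample constants by writing it as a pattern for Python's `re` module. The names `REAL_RE` and `is_real` are hypothetical; the pattern follows the derivation literally, so it requires digits on both sides of the point and accepts only a lowercase e.

```python
import re

# d^+ . d^+ ( e (ε + '+' + '-') d^+ + ε )
REAL_RE = re.compile(r"[0-9]+\.[0-9]+(?:e[+-]?[0-9]+)?")

def is_real(text):
    """True if the whole of `text` is a real constant in the slide's format."""
    return REAL_RE.fullmatch(text) is not None

for text in ["3.0", "0.1234e-10", "2.5e+3", "3.", ".5", "1e5"]:
    print(text, is_real(text))
```

Forms such as 3. or 1e5, which some languages allow, are rejected here because the grammar has no rule for them.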
20
2.4 String recognition (1)
String constant
- Characters enclosed in double quotes (" ")
- Examples: "This is a string", "double quote is \" character."
21
2.4 String recognition (2)
State transition diagram for string constant recognition
- a = char_set - {", \}
- c = other character
[Figure: transition diagram with states S (start), A, B, C and edge labels ", a, \, c]
22
2.4 String recognition (3)
Regular grammar
  S → "A
  A → aA | "B | \C
  B → ε
  C → cA
Regular expression
  A = aA + "B + \C = aA + " + \cA = (a + \c)A + " = (a + \c)*"
  S = "A = "(a + \c)*"
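The expression "(a + \c)*" can be tried out as a pattern for Python's `re` module. In this sketch a is any character other than the double quote and the backslash, and c is taken to be any single character other than a newline; that reading of c and the names `STRING_RE` and `is_string_constant` are assumptions.

```python
import re

# " ( a + \c )* "   with a = [^"\\] and \c = a backslash followed by one character
STRING_RE = re.compile(r'"(?:[^"\\]|\\.)*"')

def is_string_constant(text):
    """True if the whole of `text` is one string constant."""
    return STRING_RE.fullmatch(text) is not None

samples = [
    '"This is a string"',
    r'"double quote is \" character."',
    '"unterminated',
]
for text in samples:
    print(text, is_string_constant(text))
```

Because a backslash always consumes the character after it, an escaped quote never closes the string.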
23
2.5 Comment processing (1)
Comment
- A comment is written between /* and */
State transition diagram for comment recognition
- a = char_set - {*} and b = char_set - {*, /}
[Figure: transition diagram with states S (start), A, B, C, D and edge labels /, *, a, b]
24
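2.5 Comment processing (2)
Regular grammar
  S → /A
  A → *B
  B → aB | *C
  C → *C | bB | /D
  D → ε
Regular expression (literal characters are quoted so that the comment character '*' is not confused with the closure operator *)
  C = '*'C + bB + '/'D = ('*')*(bB + '/')
  B = aB + '*'C = aB + '*'('*')*(bB + '/')
    = aB + '*'('*')*bB + '*'('*')*'/'
    = (a + '*'('*')*b)B + '*'('*')*'/'
    = (a + '*'('*')*b)* '*'('*')*'/'
  A = '*'B = '*'(a + '*'('*')*b)* '*'('*')*'/'
  S = '/'A = '/''*'(a + '*'('*')*b)* '*'('*')*'/'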
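The comment expression can be exercised the same way, as a pattern for Python's `re` module. Here a is any character except *, and b is any character except * and /; the names `COMMENT_RE` and `is_comment` are hypothetical.

```python
import re

# / * ( a + *+ b )* *+ /   where *+ stands for one or more literal stars
COMMENT_RE = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")

def is_comment(text):
    """True if the whole of `text` is exactly one /* ... */ comment."""
    return COMMENT_RE.fullmatch(text) is not None

for text in ["/* comment */", "/**/", "/* a * b **/", "/* open", "/* a */ b */"]:
    print(text, is_comment(text))
```

The last sample is rejected because the first */ always ends the comment, which is what the b class (excluding /) enforces.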
25
3. Lexical analyzer implementation (1)
Implementation steps
- Regular expressions
- NFA
- DFA
- State minimization
- Programming
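The NFA-to-DFA step in this list is commonly done with the subset construction. A minimal sketch follows; the function names and the small example NFA for (l + _)(l + d + _)* are made up for illustration and are not taken from the course material.

```python
from collections import deque

def epsilon_closure(states, eps):
    """All NFA states reachable from `states` through ε-moves alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, accepting, delta, eps, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    start_set = epsilon_closure({start}, eps)
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            moved = set()
            for s in current:
                moved |= delta.get((s, symbol), set())
            if not moved:
                continue  # no transition: the DFA rejects here
            target = epsilon_closure(moved, eps)
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {state for state in seen if state & accepting}
    return start_set, dfa_accepting, dfa_delta

def dfa_accepts(dfa, symbols):
    """Run the DFA over a sequence of input symbols."""
    state, accepting, delta = dfa
    for symbol in symbols:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in accepting

# Hypothetical NFA: 0 -l,_-> 1, 1 -ε-> 2, 2 -l,d,_-> 2, accepting state 2.
nfa_delta = {(0, "l"): {1}, (0, "_"): {1},
             (2, "l"): {2}, (2, "d"): {2}, (2, "_"): {2}}
nfa_eps = {1: {2}}
dfa = subset_construction(0, {2}, nfa_delta, nfa_eps, ["l", "d", "_"])
print(dfa_accepts(dfa, "ld_"), dfa_accepts(dfa, "dl"))
```

The resulting DFA has three states, {0}, {1, 2}, and {2}. The last two behave identically, which is the kind of redundancy the state minimization step removes.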
26
3. Lexical analyzer implementation (2)
Implementation
- Determine the token structure from the grammar representation
- Write the token recognition program, using either
  - a programming language, or
  - a lexical analyzer generator
27
3. Lexical analyzer implementation (3)
Lexical Analyzer for mini C (Appendix A)
- Special symbols: 30
  ! != % %= && ( ) * *= + ++ += , - -- -= / /= ; < <= = == > >= [ ] { || }
- Word symbols: 7
  const else if int return void while
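A hand-written scanner for these symbols mainly needs two things: a reserved-word lookup after reading an identifier, and a longest-match rule so that <= is not split into < and =. The sketch below is not the Appendix A program. The `SPECIAL_SYMBOLS` set is written out from the slide's list with `<`, `<=`, `=`, `==`, and `>` included so that the count matches the stated 30; the function name `scan` and the token kinds are assumptions.

```python
SPECIAL_SYMBOLS = {
    "!", "!=", "%", "%=", "&&", "(", ")", "*", "*=", "+", "++", "+=", ",",
    "-", "--", "-=", "/", "/=", ";", "<", "<=", "=", "==", ">", ">=",
    "[", "]", "{", "||", "}",
}
WORD_SYMBOLS = {"const", "else", "if", "int", "return", "void", "while"}

def scan(source):
    """Return a list of (kind, lexeme) pairs for a mini C fragment."""
    tokens = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            j = i
            while j < n and (source[j].isalnum() or source[j] == "_"):
                j += 1
            word = source[i:j]
            # Reserved words look like identifiers, so they are separated by lookup.
            tokens.append(("word" if word in WORD_SYMBOLS else "id", word))
            i = j
        elif ch.isdigit():
            j = i
            while j < n and source[j].isdigit():
                j += 1
            tokens.append(("number", source[i:j]))
            i = j
        else:
            two = source[i:i + 2]
            if len(two) == 2 and two in SPECIAL_SYMBOLS:
                # Longest match: prefer the two-character symbol.
                tokens.append(("symbol", two))
                i += 2
            elif ch in SPECIAL_SYMBOLS:
                tokens.append(("symbol", ch))
                i += 1
            else:
                raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens

print(scan("while (i <= 10) i += 1;"))
```

Comments and non-decimal constants are left out to keep the sketch short.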
28
3. Lexical analyzer implementation (4)
- State transition diagram: pp. 143-144
- Lexical analysis program: pp. 145-148