Published by Marshall Kelley; modified over 8 years ago.
1
LEX & Yacc
Sung-Dong Kim, Dept. of Computer Engineering, Hansung University
2
LEX
Input: tiny.l (regular expressions + actions)
Output: lex.yy.c (or lexyy.c), a C file containing the scanner
The scanning procedure yylex is a table-driven implementation of a DFA, similar to getToken.
tiny.l (RE + action) → Lex → scanner (C code)
(2010-1) Compiler
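A minimal sketch of what "table-driven implementation of a DFA" means, in C. The two-token language (digit+ and letter+, matching the TINY definitions later in the deck), the table layout, and the names trans, accept_tok, and scan_token are illustrative assumptions; the tables Lex actually writes into lex.yy.c are compressed and organized differently.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch. */
enum { TOK_ERROR = 0, TOK_NUM = 1, TOK_ID = 2 };

/* Character classes: the column index into the transition table. */
enum { C_DIGIT, C_LETTER, C_OTHER, NCLASS };

/* States: 0 = start, 1 = in number, 2 = in identifier; -1 = no move. */
static const int trans[3][NCLASS] = {
    /* digit letter other */
    {  1,    2,    -1 },   /* 0: start */
    {  1,   -1,    -1 },   /* 1: number (digit+) */
    { -1,    2,    -1 },   /* 2: identifier (letter+) */
};

/* Token reported when the DFA stops in each state (0 = not accepting). */
static const int accept_tok[3] = { TOK_ERROR, TOK_NUM, TOK_ID };

static int char_class(int c) {
    if (isdigit(c)) return C_DIGIT;
    if (isalpha(c)) return C_LETTER;
    return C_OTHER;
}

/* Scans one token at the start of s and stores its length in *len.
   Transitions are followed until none exists (longest match). */
int scan_token(const char *s, size_t *len) {
    int state = 0;
    size_t i = 0;
    while (s[i] != '\0') {
        int next = trans[state][char_class((unsigned char)s[i])];
        if (next < 0) break;
        state = next;
        i++;
    }
    *len = i;
    return accept_tok[state];
}
```

Because every non-start state here is accepting, the loop can simply stop at the first missing transition; a general table-driven scanner also remembers the last accepting position and backs up to it.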
3
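LEX Convention (1)
Metacharacters
Quotes: the characters inside quotes are taken literally
  For characters that are not metacharacters, quotes are optional: "if" is the same as if
  For metacharacters, quotes are required: "(" matches a left parenthesis
Backslash: escapes a single character: \(\* is the same as "(*"
  Also used for escape sequences such as \n and \t
Example: (aa|bb)(a|b)*c? = ("aa"|"bb")("a"|"b")*"c"?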
4
LEX Convention (2)
[...]: any one of the characters listed
  [abxz]: any one of the characters a, b, x, z
  (aa|bb)(a|b)*c? can be written (aa|bb)[ab]*c?
Hyphen: ranges of characters
  [0-9]: any digit
5
LEX Convention (3)
. (period): represents a set of characters: any character except a newline
^: complementary sets
  [^0-9abc]: any character that is not a digit and is not one of the letters a, b, c
6
LEX Convention (4)
Inside square brackets, most metacharacters lose their special status
  [-+] is the same as ("+"|"-")
  [+-] may be read as a range starting at "+", so write the hyphen first
  [."?]: any of the three characters ., ", ?
  [\^\\]: ^ or \
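The bracket rules above can be checked with a small hypothetical matcher, class_match, which takes the text between [ and ]. It follows the conventions on these slides (leading ^ negates, x-y is a range, backslash escapes one character, other metacharacters are literal). It also treats a hyphen that is first or last in the class as literal; whether a trailing hyphen is literal depends on the Lex version, which is why hyphen-first is the safe form.

```c
#include <stdbool.h>

/* Returns true if character c belongs to the bracket class whose body
   (the text between '[' and ']') is cls, e.g. "^0-9abc".
   Sketch assumptions: a leading '^' negates; "x-y" is a range;
   a '-' that is first or last is literal; "\x" is the literal x. */
bool class_match(const char *cls, int c) {
    bool negate = false, found = false;
    if (*cls == '^') { negate = true; cls++; }
    while (*cls != '\0') {
        int lo = (unsigned char)*cls++;
        if (lo == '\\' && *cls != '\0') lo = (unsigned char)*cls++;  /* escaped literal */
        int hi = lo;
        if (cls[0] == '-' && cls[1] != '\0') {                       /* range lo-hi */
            cls++;
            hi = (unsigned char)*cls++;
            if (hi == '\\' && *cls != '\0') hi = (unsigned char)*cls++;
        }
        if (c >= lo && c <= hi) found = true;
    }
    return found != negate;   /* flip the result for a complemented class */
}
```

With this reading, [-+] contains exactly '-' and '+', so ',' (which lies between them in ASCII) is not matched.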
7
LEX Convention (5)
Curly brackets: names of regular expressions
  As regular definitions:
    nat = [0-9]+
    signedNat = ("+"|"-")? nat
  In Lex:
    nat        [0-9]+
    signedNat  ("+"|"-")?{nat}
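A sketch of how a {name} reference can be expanded before the pattern is compiled. The function expand_names and struct def are hypothetical. The sketch wraps each expansion in parentheses, as the flex manual documents for its own expansion (so {DIGIT}+ means ([0-9])+); it ignores quoting and does not expand names nested inside definitions.

```c
#include <stdio.h>
#include <string.h>

/* A named definition, as in the Lex definitions section. */
struct def { const char *name; const char *re; };

/* Copies pattern into out (capacity cap), replacing each {name} with
   "(" definition ")". Returns 0 on success, -1 on an unknown name or
   if out is too small. Names inside definitions are not expanded. */
int expand_names(const char *pattern, const struct def *defs, size_t ndefs,
                 char *out, size_t cap) {
    size_t n = 0;
    if (cap == 0) return -1;
    while (*pattern != '\0') {
        const char *close = NULL;
        if (*pattern == '{') close = strchr(pattern, '}');
        if (close == NULL) {                     /* ordinary character */
            if (n + 1 >= cap) return -1;
            out[n++] = *pattern++;
            continue;
        }
        size_t len = (size_t)(close - pattern - 1);
        const char *re = NULL;
        for (size_t i = 0; i < ndefs; i++)       /* look up the name */
            if (strlen(defs[i].name) == len &&
                strncmp(defs[i].name, pattern + 1, len) == 0)
                re = defs[i].re;
        if (re == NULL) return -1;
        int w = snprintf(out + n, cap - n, "(%s)", re);
        if (w < 0 || (size_t)w >= cap - n) return -1;
        n += (size_t)w;
        pattern = close + 1;
    }
    out[n] = '\0';
    return 0;
}
```

With nat defined as [0-9]+, the signedNat pattern ("+"|"-")?{nat} expands to ("+"|"-")?([0-9]+).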
8
Format of LEX Input (1)
Input file = regular expressions + C code
Definitions
  C code that must be inserted outside any function, enclosed in %{ ... %}
  Names of regular expressions
Rules
  Regular expressions + C code (actions)
Auxiliary routines (optional)
  C code, including a main program if needed
9
Format of LEX Input (2)
Layout:
    {definitions}
    %%
    {rules}
    %%
    {auxiliary routines}
10
Example 1: scanner that adds line numbers to text
    %{
    /* a Lex program that adds line numbers
       to lines of text, printing the new text
       to the standard output */
    #include <stdio.h>
    int lineno = 1;
    %}
    line .*\n
    %%
    {line} { printf("%5d %s", lineno++, yytext); }
    %%
    int main(void)
    { yylex(); return 0; }
11
Example 2: converting decimal numbers to hexadecimal
    %{
    /* a Lex program that changes all numbers
       from decimal to hexadecimal notation,
       printing a summary statistic to stderr */
    #include <stdio.h>
    #include <stdlib.h>
    int count = 0;
    %}
    digit  [0-9]
    number {digit}+
    %%
    {number} { int n = atoi(yytext);
               printf("%x", n);
               if (n > 9) count++; }
    %%
12
    int main(void)
    { yylex();
      fprintf(stderr, "number of replacements = %d\n", count);
      return 0;
    }
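To see what the decimal-to-hexadecimal scanner computes, here is a hypothetical plain-C function, decimal_to_hex, that does the same job on a string instead of stdin. Like the n > 9 test, it counts only numbers above 9, since 0 through 9 are spelled the same in both bases.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Copies in to out, rewriting every run of decimal digits in
   hexadecimal. Returns the number of runs whose value exceeds 9,
   or -1 if out is too small. Very long digit runs (overflow) are
   not handled in this sketch. */
int decimal_to_hex(const char *in, char *out, size_t cap) {
    size_t n = 0;
    int count = 0;
    if (cap == 0) return -1;
    out[0] = '\0';
    while (*in != '\0') {
        int w;
        if (isdigit((unsigned char)*in)) {
            char *end;
            long v = strtol(in, &end, 10);   /* whole digit run, like {digit}+ */
            w = snprintf(out + n, cap - n, "%lx", (unsigned long)v);
            if (v > 9) count++;
            in = end;
        } else {
            w = snprintf(out + n, cap - n, "%c", *in++);  /* copy unchanged */
        }
        if (w < 0 || (size_t)w >= cap - n) return -1;
        n += (size_t)w;
    }
    return count;
}
```

For the input "x = 255 + 7;" the result is "x = ff + 7;" with a count of 1.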
13
Example 3: keeping only lines that begin or end with 'a'
    %{
    /* Selects only lines that end or
       begin with the letter 'a'.
       Deletes everything else. */
    #include <stdio.h>
    %}
    ends_with_a    .*a\n
    begins_with_a  a.*\n
    %%
    {ends_with_a}    ECHO;
    {begins_with_a}  ECHO;
    .*\n             ;
    %%
    int main(void)
    { yylex(); return 0; }
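A hypothetical string-based equivalent of the line filter, keep_a_lines. Besides the three rules, it models Lex's default action: a last line without a trailing newline matches none of the patterns (each one requires \n), so Lex copies it to the output character by character instead of deleting it.

```c
#include <stdio.h>
#include <string.h>

/* Keeps the lines of in that begin or end with 'a' and drops the
   other newline-terminated lines. A final unterminated line is
   echoed, as Lex's default action would. Returns -1 if out is too small. */
int keep_a_lines(const char *in, char *out, size_t cap) {
    size_t n = 0;
    if (cap == 0) return -1;
    out[0] = '\0';
    while (*in != '\0') {
        const char *nl = strchr(in, '\n');
        if (nl == NULL) {                    /* no rule matches: echo the rest */
            size_t len = strlen(in);
            if (n + len >= cap) return -1;
            memcpy(out + n, in, len + 1);
            return 0;
        }
        size_t len = (size_t)(nl - in);      /* line length without '\n' */
        int keep = len > 0 && (in[0] == 'a' || in[len - 1] == 'a');
        if (keep) {                          /* ECHO, including the newline */
            if (n + len + 1 >= cap) return -1;
            memcpy(out + n, in, len + 1);
            n += len + 1;
            out[n] = '\0';
        }
        in = nl + 1;                         /* .*\n with an empty action */
    }
    return 0;
}
```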
14
Summary (1)
Ambiguity resolution
  Longest match: the rule that matches the longest substring wins
  Equal lengths: the rule listed first wins
  No match: the next character is copied to the output and scanning continues
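The two resolution principles can be illustrated with the TINY situation in which a keyword rule and an identifier rule compete. resolve and the two match helpers are hypothetical stand-ins for the generated automaton; only two rules are modeled, with "if" listed before the identifier rule as in the TINY specification.

```c
#include <ctype.h>
#include <string.h>

/* Length of the prefix of s matching the literal kw, or 0. */
static size_t match_literal(const char *s, const char *kw) {
    size_t n = strlen(kw);
    return strncmp(s, kw, n) == 0 ? n : 0;
}

/* Length of the longest prefix of s matching [a-zA-Z]+, or 0. */
static size_t match_identifier(const char *s) {
    size_t n = 0;
    while (isalpha((unsigned char)s[n])) n++;
    return n;
}

/* Hypothetical rule numbers, in the order the rules are listed. */
enum { RULE_NONE = -1, RULE_IF = 0, RULE_ID = 1 };

/* The longest match wins; on a tie the earlier rule keeps the win,
   because a later rule must be strictly longer to replace it. */
int resolve(const char *s, size_t *len) {
    size_t lens[2] = { match_literal(s, "if"), match_identifier(s) };
    int best = RULE_NONE;
    *len = 0;
    for (int r = 0; r < 2; r++) {
        if (lens[r] > *len) {
            best = r;
            *len = lens[r];
        }
    }
    return best;
}
```

So "if x" yields the keyword (both rules match 2 characters, the keyword is listed first), while "iffy" yields an identifier of length 4.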
15
Summary (2)
Insertion of C code
  %{ ... %}: copied exactly
  Auxiliary routines section: copied exactly at the end of the output
  Code following a regular expression (an action): placed at the appropriate point in yylex
16
Lex Internal Names
  lex.yy.c or lexyy.c: Lex output file name
  yylex: Lex scanning routine
  yytext: string matched by the current action
  yyin: Lex input file (default: stdin)
  yyout: Lex output file (default: stdout)
  input: Lex buffered input routine
  ECHO: Lex default action (print yytext to yyout)
17
LEX for TINY
    %{
    #include "globals.h"
    #include "util.h"
    #include "scan.h"
    /* lexeme of identifier or reserved word */
    char tokenString[MAXTOKENLEN+1];
    %}
    digit       [0-9]
    number      {digit}+
    letter      [a-zA-Z]
    identifier  {letter}+
    newline     \n
    whitespace  [ \t]
    %%
18
    "if"      { return IF; }
    "then"    { return THEN; }
    "else"    { return ELSE; }
    "end"     { return END; }
    "repeat"  { return REPEAT; }
    "until"   { return UNTIL; }
    "read"    { return READ; }
    "write"   { return WRITE; }
    ":="      { return ASSIGN; }
    "="       { return EQ; }
    "<"       { return LT; }
    "+"       { return PLUS; }
    "-"       { return MINUS; }
    "*"       { return TIMES; }
    "/"       { return OVER; }
    "("       { return LPAREN; }
    ")"       { return RPAREN; }
    ";"       { return SEMI; }
19
    {number}      { return NUM; }
    {identifier}  { return ID; }
    {newline}     { lineno++; }
    {whitespace}  { /* skip whitespace */ }
    "{"           { int c;
                    do
                    { c = input();
                      if (c == EOF) break;
                      if (c == '\n') lineno++;
                    } while (c != '}');
                  }
    .             { return ERROR; }
    %%
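The comment action's loop, rewritten as a hypothetical string-based function skip_comment so it can be run on its own. It receives the text just after the opening '{', counts newlines the same way, and also stops at end of input so an unterminated comment cannot run past the end of the text.

```c
#include <stddef.h>

/* Skips a TINY comment body up to and including the closing '}',
   incrementing *lineno for each newline. Returns the number of
   characters consumed; stops early at the end of the string. */
size_t skip_comment(const char *s, int *lineno) {
    size_t i = 0;
    int c;
    do {
        c = (unsigned char)s[i];
        if (c == '\0') break;      /* end of input: unterminated comment */
        i++;
        if (c == '\n') (*lineno)++;
    } while (c != '}');
    return i;
}
```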
20
    TokenType getToken(void)
    { static int firstTime = TRUE;
      TokenType currentToken;
      if (firstTime)
      { firstTime = FALSE;
        lineno++;
        yyin = source;
        yyout = listing;
      }
      currentToken = yylex();
      strncpy(tokenString, yytext, MAXTOKENLEN);
      if (TraceScan) {
        fprintf(listing, "\t%d: ", lineno);
        printToken(currentToken, tokenString);
      }
      return currentToken;
    }
21
YACC
LALR(1) parser generator: "Yet Another Compiler-Compiler"
syntax spec. → Parser Generator (Yacc) → parser
22
YACC Basics (1)
Input/output: filename.y → Yacc → y.tab.c (or ytab.c, filename.tab.c)
Specification file format:
    {definitions}
    %%
    {rules}
    %%
    {auxiliary routines}
23
YACC Basics (2)
Definitions
  Information about tokens, data types, and grammar rules
  C code copied to the output file
Rules
  Grammar rules in a modified BNF format, with C code actions
Auxiliary routines
  Procedure and function declarations, e.g. main() (which calls yyparse()) and yylex()
24
    %{
    #include <stdio.h>
    #include <ctype.h>
    int yylex(void);
    void yyerror(const char *s);
    %}
    %token NUMBER
    %%
    command : exp             { printf("%d\n", $1); }
            ;
    exp     : exp '+' term    { $$ = $1 + $3; }
            | exp '-' term    { $$ = $1 - $3; }
            | term            { $$ = $1; }
            ;
    term    : term '*' factor { $$ = $1 * $3; }
            | factor          { $$ = $1; }
            ;
    factor  : NUMBER          { $$ = $1; }
            | '(' exp ')'     { $$ = $2; }
            ;
    %%
25
    int main(void)
    { return yyparse(); }

    int yylex(void)
    { int c;
      while ((c = getchar()) == ' ');        /* skip blanks */
      if (isdigit(c)) {
        ungetc(c, stdin);
        scanf("%d", &yylval);
        return NUMBER;
      }
      if (c == '\n' || c == EOF) return 0;   /* stop parsing */
      return c;
    }

    void yyerror(const char *s)
    { fprintf(stderr, "%s\n", s);            /* print the error message */
    }
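For comparison, the same grammar evaluated by hand with recursive descent. This is a different technique from Yacc's LALR(1) tables, shown only to make the semantic actions concrete: the left-recursive rules become loops so that - stays left-associative. calc_eval and its helpers are hypothetical names.

```c
#include <ctype.h>

/* Parser state: current position and an ok flag cleared on error. */
struct calc { const char *p; int ok; };

static void skip_blanks(struct calc *st) { while (*st->p == ' ') st->p++; }

static int exp_(struct calc *st);

/* factor : NUMBER | '(' exp ')' */
static int factor(struct calc *st) {
    skip_blanks(st);
    if (isdigit((unsigned char)*st->p)) {
        int v = 0;
        while (isdigit((unsigned char)*st->p)) v = v * 10 + (*st->p++ - '0');
        return v;
    }
    if (*st->p == '(') {
        st->p++;
        int v = exp_(st);
        skip_blanks(st);
        if (*st->p == ')') st->p++; else st->ok = 0;
        return v;                                   /* $$ = $2 */
    }
    st->ok = 0;
    return 0;
}

/* term : term '*' factor | factor, written as a loop */
static int term(struct calc *st) {
    int v = factor(st);
    for (;;) {
        skip_blanks(st);
        if (*st->p != '*') return v;
        st->p++;
        v *= factor(st);
    }
}

/* exp : exp '+' term | exp '-' term | term, written as a loop */
static int exp_(struct calc *st) {
    int v = term(st);
    for (;;) {
        skip_blanks(st);
        if (*st->p == '+') { st->p++; v += term(st); }
        else if (*st->p == '-') { st->p++; v -= term(st); }
        else return v;
    }
}

/* Evaluates one line such as "2+3*4"; *ok is 1 on success. */
int calc_eval(const char *line, int *ok) {
    struct calc st = { line, 1 };
    int v = exp_(&st);
    skip_blanks(&st);
    if (*st.p != '\0' && *st.p != '\n') st.ok = 0;  /* trailing garbage */
    *ok = st.ok;
    return v;
}
```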
26
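YACC Options (1)
-d: header file generation
  yacc -d filename.y also writes y.tab.h (or ytab.h, filename.tab.h), which defines the token codes
  Other files, such as a separate scanner that supplies yylex(), #include "y.tab.h" so they return the codes the parser expects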
27
YACC Options (2)
-v: verbose option
  yacc -v filename.y writes a description of the parsing tables to y.output
28
y.output for the calculator grammar (_ marks the parsing position; . stands for any other token):
    state 0
        $accept : _command $end

        NUMBER  shift 5
        (  shift 6
        .  error

        command  goto 1
        exp  goto 2
        term  goto 3
        factor  goto 4

    state 1
        $accept : command_$end

        $end  accept
        .  error

    state 2
        command : exp_    (1)
        exp : exp_+ term
        exp : exp_- term

        +  shift 7
        -  shift 8
        .  reduce 1

    state 3
        exp : term_    (4)
        term : term_* factor

        *  shift 9
        .  reduce 4

    state 4
        term : factor_    (6)

        .  reduce 6
29
    state 7
        exp : exp +_term

        NUMBER  shift 5
        (  shift 6
        .  error

        term  goto 11
        factor  goto 4

    state 8
        exp : exp -_term

        NUMBER  shift 5
        (  shift 6
        .  error

        term  goto 12
        factor  goto 4

    state 5
        factor : NUMBER_    (7)

        .  reduce 7

    state 6
        factor : (_exp )

        NUMBER  shift 5
        (  shift 6
        .  error

        exp  goto 10
        term  goto 3
        factor  goto 4
30
    state 11
        exp : exp + term_    (2)
        term : term_* factor

        *  shift 9
        .  reduce 2

    state 12
        exp : exp - term_    (3)
        term : term_* factor

        *  shift 9
        .  reduce 3

    state 13
        term : term * factor_    (5)

        .  reduce 5

    state 9
        term : term *_factor

        NUMBER  shift 5
        (  shift 6
        .  error

        factor  goto 13

    state 10
        exp : exp_+ term
        exp : exp_- term
        factor : ( exp_)

        +  shift 7
        -  shift 8
        )  shift 14
        .  error
31
    state 14
        factor : ( exp )_    (8)

        .  reduce 8

    8/127 terminals, 4/600 nonterminals
    9/300 grammar rules, 15/1000 states
    0 shift/reduce, 0 reduce/reduce conflicts reported
    9/601 working sets used
    memory: states, etc. 36/2000, parser 11/4000
    9/601 distinct lookahead sets
    6 extra closures
    18 shift entries, 1 exceptions
    8 goto entries
    4 entries saved by goto default
    Optimizer space used: input 50/2000, output 218/4000
    218 table entries, 202 zero
    maximum spread: 257, maximum offset: 43
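The state listing can be read directly as parse tables. The sketch below transcribes states 0 to 14 into C arrays and runs the standard LR shift/reduce loop, using the listing's rule numbers (1) to (8) and its default "." actions. The array names, the ACC encoding, and the tokenizer are assumptions for illustration; the tables yacc writes into y.tab.c are packed differently.

```c
#include <ctype.h>

enum { T_NUM, T_PLUS, T_MINUS, T_TIMES, T_LP, T_RP, T_END, NTERM };
enum { N_COMMAND, N_EXP, N_TERM, N_FACTOR, NNONTERM };

#define ACC 100      /* accept; 0 < n < ACC means "shift n" */
#define NSTATES 15
#define STACK 64

/* Explicit actions; 0 means "use the state's default (.) action". */
static const int action[NSTATES][NTERM] = {
    /* NUM  +  -  *  (  )  $end */
    {  5, 0, 0, 0, 6, 0, 0 },   /* 0 */
    {  0, 0, 0, 0, 0, 0, ACC }, /* 1 */
    {  0, 7, 8, 0, 0, 0, 0 },   /* 2 */
    {  0, 0, 0, 9, 0, 0, 0 },   /* 3 */
    {  0 }, {  0 },             /* 4, 5 */
    {  5, 0, 0, 0, 6, 0, 0 },   /* 6 */
    {  5, 0, 0, 0, 6, 0, 0 },   /* 7 */
    {  5, 0, 0, 0, 6, 0, 0 },   /* 8 */
    {  5, 0, 0, 0, 6, 0, 0 },   /* 9 */
    {  0, 7, 8, 0, 0, 14, 0 },  /* 10 */
    {  0, 0, 0, 9, 0, 0, 0 },   /* 11 */
    {  0, 0, 0, 9, 0, 0, 0 },   /* 12 */
    {  0 }, {  0 },             /* 13, 14 */
};
/* ". reduce r" per state; 0 stands for ". error". */
static const int default_reduce[NSTATES] = {0,0,1,4,6,7,0,0,0,0,0,2,3,5,8};
/* "A goto n" entries. */
static const int go_to[NSTATES][NNONTERM] = {
    [0] = {1, 2, 3, 4}, [6] = {0, 10, 3, 4}, [7] = {0, 0, 11, 4},
    [8] = {0, 0, 12, 4}, [9] = {0, 0, 0, 13},
};
/* Rules (1)..(8): left-hand side and right-hand-side length. */
static const int rule_lhs[9] = {0, N_COMMAND, N_EXP, N_EXP, N_EXP,
                                N_TERM, N_TERM, N_FACTOR, N_FACTOR};
static const int rule_len[9] = {0, 1, 3, 3, 1, 3, 1, 1, 3};

/* Tokenizer mirroring the slide's yylex: blanks skipped, newline or
   end of string is $end; any other unknown character gives -1. */
static int next_token(const char **p, int *val) {
    while (**p == ' ') (*p)++;
    char c = **p;
    if (c == '\0' || c == '\n') return T_END;
    if (isdigit((unsigned char)c)) {
        *val = 0;
        while (isdigit((unsigned char)**p)) *val = *val * 10 + (*(*p)++ - '0');
        return T_NUM;
    }
    (*p)++;
    switch (c) {
    case '+': return T_PLUS;  case '-': return T_MINUS;
    case '*': return T_TIMES; case '(': return T_LP;
    case ')': return T_RP;
    }
    return -1;
}

/* Parses one line; returns 1 and sets *result on accept, 0 on error. */
int lr_parse(const char *line, int *result) {
    int states[STACK] = {0}, values[STACK] = {0}, top = 0, val = 0;
    int tok = next_token(&line, &val);
    for (;;) {
        if (tok < 0) return 0;                     /* lexical error */
        int s = states[top];
        int act = action[s][tok];
        if (act == ACC) { *result = values[top]; return 1; }
        if (act > 0) {                             /* shift */
            if (top + 1 >= STACK) return 0;
            states[++top] = act;
            values[top] = val;
            tok = next_token(&line, &val);
        } else if (default_reduce[s] != 0) {       /* reduce */
            int r = default_reduce[s], n = rule_len[r];
            const int *v = &values[top - n + 1];   /* v[0] is $1 */
            int lhs = (r == 2) ? v[0] + v[2] : (r == 3) ? v[0] - v[2]
                    : (r == 5) ? v[0] * v[2] : (r == 8) ? v[1] : v[0];
            top -= n;
            states[top + 1] = go_to[states[top]][rule_lhs[r]];
            values[++top] = lhs;
        } else {
            return 0;                              /* ". error" */
        }
    }
}
```

Because every reduction in this grammar is a default reduction, an input such as "2+" is rejected only when the parser reaches a state whose default is error (state 7, on $end).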