Sung-Dong Kim, Dept. of Computer Engineering, Hansung University
LEX & Yacc
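LEX
Input: a Lex specification such as tiny.l (regular expressions + actions)
Output: lex.yy.c (lexyy.c on some systems), a C source file for the scanner
The output defines the procedure yylex: a table-driven implementation of a DFA, similar to getToken
[Figure: RE + action → Lex → scanner (C code)]
(2011-1) Compiler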
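To make "table-driven implementation of a DFA" concrete, here is a small hand-written sketch in C. It is not the code Lex generates (Lex builds and compresses its tables automatically from the regular expressions); the two token kinds, the character classes, and the names `delta`, `accepts`, and `dfa_match` are assumptions made up for this illustration. The loop keeps following transitions while they exist and remembers the last accepting state, which is how the scanner returns the longest match.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds this toy DFA can accept (hypothetical, for illustration). */
enum { TOK_NONE, TOK_NUM, TOK_ID };

/* Character classes: column index into the transition table. */
enum { C_DIGIT, C_LETTER, C_OTHER, NCLASSES };

static int char_class(char c) {
    if (isdigit((unsigned char)c)) return C_DIGIT;
    if (isalpha((unsigned char)c)) return C_LETTER;
    return C_OTHER;
}

/* States: 0 = start, 1 = inside a number, 2 = inside an identifier.
   -1 means "no transition" (the DFA is stuck). */
static const int delta[3][NCLASSES] = {
    /* digit  letter  other */
    {   1,     2,     -1 },   /* 0: start      */
    {   1,    -1,     -1 },   /* 1: number     */
    {  -1,     2,     -1 },   /* 2: identifier */
};

/* Token accepted in each state (TOK_NONE = non-accepting state). */
static const int accepts[3] = { TOK_NONE, TOK_NUM, TOK_ID };

/* Runs the DFA on s and returns the token kind of the longest match;
   *len receives the match length (0 when nothing matches). */
int dfa_match(const char *s, size_t *len) {
    int state = 0, last_tok = TOK_NONE;
    size_t i = 0, last_len = 0;
    while (s[i] != '\0') {
        int next = delta[state][char_class(s[i])];
        if (next < 0) break;                  /* no transition: stop */
        state = next;
        i++;
        if (accepts[state] != TOK_NONE) {     /* remember last accepting point */
            last_tok = accepts[state];
            last_len = i;
        }
    }
    *len = last_len;
    return last_tok;
}
```

For example, on "123abc" the DFA stops after the three digits (there is no transition from the number state on a letter) and reports a number of length 3.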
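LEX Convention (1)
Metacharacters
Quotes: characters inside quotes are taken literally
- For characters that are not metacharacters, quotes are optional: "if" and if match the same string
- For metacharacters, quotes are required: "(" matches a left parenthesis
Backslash: also makes a metacharacter literal, e.g. \(\* is the same as "(*"
- Also used for escape sequences such as \n and \t
Example: (aa|bb)(a|b)*c? = ("aa"|"bb")("a"|"b")*"c"?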
LEX Convention (2)
Square brackets [...]: any one of the characters inside
- [abxz]: any one of the characters a, b, x, z
- (aa|bb)(a|b)*c? can be written as (aa|bb)[ab]*c?
Hyphen: ranges of characters
- [0-9]: any digit
LEX Convention (3)
. : any character except a newline (a single symbol that represents a set of characters)
^ : complementary sets, when written first inside square brackets
- [^0-9abc]: any character that is not a digit and is not one of the letters a, b, c
LEX Convention (4)
Inside square brackets, most metacharacters lose their special status
- [-+] == ("+"|"-")
- [+-]: the hyphen may be read as a range starting from "+", so list the hyphen first
- [."?]: any of the three characters ., ", ?
- [\^\\]: ^ or \ (both are escaped with a backslash)
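The bracket conventions can be illustrated with a simplified matcher for a single bracket expression. `class_match` is a hypothetical helper written for this note, not part of Lex: it handles a leading ^, ranges such as 0-9, a hyphen written first (or last) as a literal, and backslash escapes, and it does not model every detail of a real Lex implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if character c is matched by the bracket expression `cls`,
   given WITHOUT the surrounding [ ], e.g. "abxz", "0-9", "^0-9abc", "-+".
   Simplified sketch: leading ^ complements the set, x-y is a range,
   a hyphen with nothing on one side is literal, \x is the literal x. */
bool class_match(const char *cls, char c) {
    bool negate = false, found = false;
    size_t i = 0;
    if (cls[0] == '^') { negate = true; i = 1; }
    while (cls[i] != '\0') {
        char lo = cls[i];
        if (lo == '\\' && cls[i + 1] != '\0') lo = cls[++i];   /* escaped char */
        i++;
        char hi = lo;
        /* A '-' with a character after it denotes a range lo-hi. */
        if (cls[i] == '-' && cls[i + 1] != '\0') {
            hi = cls[i + 1];
            i += 2;
            if (hi == '\\' && cls[i] != '\0') hi = cls[i++];  /* escaped range end */
        }
        if (c >= lo && c <= hi) found = true;
    }
    return found != negate;   /* complement the result for [^...] */
}
```

With the hyphen first, "-+" matches exactly '-' and '+', while "0-9" is treated as the range of digits.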
LEX Convention (5)
Curly brackets: names of regular expressions
Regular expression notation:
nat = [0-9]+
signedNat = ("+"|"-")? nat
In Lex (the name is defined in the definitions section and used as {name}):
nat        [0-9]+
signedNat  ("+"|"-")?{nat}
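A named definition behaves like a reusable sub-pattern. The sketch below hand-codes nat and signedNat as C functions (the names `match_nat` and `match_signedNat` are illustrative, not part of Lex), each returning the length of the longest match at the start of a string, so that signedNat reuses nat in the same way {nat} is expanded inside signedNat.

```c
#include <ctype.h>
#include <stddef.h>

/* nat = [0-9]+ : length of the longest match at s, or 0 if none. */
static size_t match_nat(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* signedNat = ("+"|"-")?{nat} : optional sign followed by nat. */
size_t match_signedNat(const char *s) {
    size_t sign = (s[0] == '+' || s[0] == '-') ? 1 : 0;
    size_t digits = match_nat(s + sign);
    return digits > 0 ? sign + digits : 0;   /* a sign alone is not a match */
}
```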
Format of LEX Input (1)
Input file = regular expressions + C code
Definitions
- C code that must be inserted outside any function, enclosed in %{ … %}
- Names of regular expressions
Rules
- Regular expressions + C code (actions)
Auxiliary routines (optional)
- C code, plus a main program if needed
Format of LEX Input (2)
Layout:
{definitions}
%%
{rules}
%%
{auxiliary routines}
Example 1: scanner that adds line numbers to text
%{
/* a Lex program that adds line numbers
   to lines of text, printing the new text
   to the standard output */
#include <stdio.h>
int lineno = 1;
%}
line .*\n
%%
{line} { printf("%5d %s", lineno++, yytext); }
%%
int main(void)
{ yylex();
  return 0;
}
int yywrap(void) { return 1; } /* no further input files; avoids linking -ll */
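For comparison, the behavior of Example 1 can be written directly in C. This is a sketch with a hypothetical function `number_lines` that writes into a buffer instead of stdout so the result is easy to check. It assumes Lex's default action for a final line that lacks '\n': such a line does not match .*\n, so it is echoed without a number.

```c
#include <stdio.h>
#include <string.h>

/* Copies `in` to `out`, prefixing every newline-terminated line with its
   number formatted as "%5d ", like the {line} rule. A last line without
   '\n' is copied unchanged (Lex's default ECHO). Stops early if `out`
   (size outsz) is too small. Returns the number of lines numbered. */
int number_lines(const char *in, char *out, size_t outsz) {
    int lineno = 1;
    size_t used = 0;
    out[0] = '\0';
    while (*in != '\0') {
        const char *nl = strchr(in, '\n');
        int n;
        if (nl == NULL) {                              /* unterminated last line */
            n = snprintf(out + used, outsz - used, "%s", in);
            in += strlen(in);
        } else {
            int len = (int)(nl - in + 1);              /* include the '\n' */
            n = snprintf(out + used, outsz - used, "%5d %.*s", lineno++, len, in);
            in = nl + 1;
        }
        if (n < 0 || (size_t)n >= outsz - used) break; /* output truncated */
        used += (size_t)n;
    }
    return lineno - 1;
}
```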
Example 2: converts decimal numbers to hexadecimal and prints the number of replacements
%{
/* a Lex program that changes all numbers
   from decimal to hexadecimal notation,
   printing a summary statistic to stderr */
#include <stdlib.h>
#include <stdio.h>
int count = 0;
%}
digit  [0-9]
number {digit}+
%%
{number} { int n = atoi(yytext);
           printf("%x", n);
           if (n > 9) count++; }
%%
int main(void)
{ yylex();
  fprintf(stderr, "number of replacements = %d\n", count);
  return 0;
}
int yywrap(void) { return 1; } /* no further input files; avoids linking -ll */
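The same transformation as Example 2 can be written by hand in C without Lex. `decimal_to_hex` is a hypothetical name used only here; it writes to a buffer rather than stdout, and it returns the count instead of printing it to stderr. As in the Lex program, only numbers greater than 9 are counted, since 0 to 9 look the same in both notations.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies `in` to `out`, rewriting each maximal run of digits in hexadecimal
   ("%x") and echoing every other character unchanged. Returns how many
   numbers were greater than 9. Stops early if `out` (size outsz) is full. */
int decimal_to_hex(const char *in, char *out, size_t outsz) {
    int count = 0;
    size_t used = 0;
    out[0] = '\0';
    while (*in != '\0' && used + 1 < outsz) {
        size_t len = strspn(in, "0123456789");   /* longest digit run, like {number} */
        if (len > 0) {
            int n = atoi(in);                     /* same conversion as the action */
            int w = snprintf(out + used, outsz - used, "%x", (unsigned)n);
            if (w < 0 || (size_t)w >= outsz - used) break;
            used += (size_t)w;
            if (n > 9) count++;
            in += len;
        } else {
            out[used++] = *in++;                  /* unmatched character: ECHO */
            out[used] = '\0';
        }
    }
    return count;
}
```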
Example 3: prints all input lines that begin or end with the letter 'a'
%{
/* Selects only lines that end or
   begin with the letter 'a'.
   Deletes everything else. */
#include <stdio.h>
%}
ends_with_a   .*a\n
begins_with_a a.*\n
%%
{ends_with_a}   ECHO;
{begins_with_a} ECHO;
.*\n            ;
%%
int main(void)
{ yylex();
  return 0;
}
int yywrap(void) { return 1; } /* no further input files; avoids linking -ll */
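Example 3 depends on Lex's ambiguity resolution: a line such as xya\n matches both {ends_with_a} and .*\n with the same length, and the rule listed first wins, so the line is echoed. The resulting behavior can be sketched in C with a hypothetical `select_a_lines` that writes to a buffer. Text after the last '\n' matches none of the three rules, so, assuming Lex's default action, it is echoed; the sketch copies it as well.

```c
#include <string.h>

/* Keeps newline-terminated lines of `in` that begin or end with 'a',
   drops other complete lines, and copies any trailing text that has no
   '\n' unchanged. Stops early if `out` (size outsz) is too small. */
void select_a_lines(const char *in, char *out, size_t outsz) {
    size_t used = 0;
    out[0] = '\0';
    while (*in != '\0') {
        const char *nl = strchr(in, '\n');
        size_t len = nl ? (size_t)(nl - in) + 1 : strlen(in);
        int keep = 1;                            /* trailing text: default ECHO */
        if (nl != NULL) {
            int begins = in[0] == 'a';           /* a.*\n */
            int ends = len >= 2 && in[len - 2] == 'a';   /* .*a\n */
            keep = begins || ends;               /* otherwise .*\n deletes it */
        }
        if (keep) {
            if (used + len >= outsz) break;      /* not enough room */
            memcpy(out + used, in, len);
            used += len;
            out[used] = '\0';
        }
        in += len;
    }
}
```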
Summary (1)
Ambiguity resolution
- Principle of the longest substring: Lex always matches the longest possible string
- Substrings of equal length: first match, first served (the rule listed first wins)
- No match: copy the next character to the output and continue
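The two ambiguity rules can be sketched in C as a loop over the rules in source order that keeps the longest match and, on a tie, the earlier rule. The three rules and all function names below are hypothetical stand-ins for a keyword rule "if", an identifier rule {letter}+, and a number rule {digit}+; real Lex compiles all patterns into a single DFA rather than trying each rule separately, but the selection it makes is the same.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each rule reports the length of its longest match at s (0 = no match). */
typedef size_t (*matcher)(const char *s);

/* Hypothetical stand-ins for three Lex rules. */
static size_t match_if(const char *s)  { return strncmp(s, "if", 2) == 0 ? 2 : 0; }
static size_t match_id(const char *s)  { size_t n = 0; while (isalpha((unsigned char)s[n])) n++; return n; }
static size_t match_num(const char *s) { size_t n = 0; while (isdigit((unsigned char)s[n])) n++; return n; }

/* Rules in the order they are listed in the rules section. */
static const matcher rules[] = { match_if, match_id, match_num };
#define NRULES (sizeof rules / sizeof rules[0])

/* Returns the index of the rule that fires at s and stores its match length
   in *len. Returns -1 if no rule matches (Lex would then copy one char). */
int pick_rule(const char *s, size_t *len) {
    int best = -1;
    size_t best_len = 0;
    for (size_t r = 0; r < NRULES; r++) {
        size_t n = rules[r](s);
        if (n > best_len) {        /* strictly longer only: ties keep the earlier rule */
            best = (int)r;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

On "if x" the keyword rule and the identifier rule both match 2 characters, so the keyword rule (listed first) wins; on "iffy" the identifier rule wins because its match is longer.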
Summary (2)
Insertion of C code
- Code in %{ … %}: copied exactly, outside any function
- Auxiliary procedure section: copied exactly, at the end of the output file
- Code following a regular expression (action): inserted at the appropriate place in yylex
Lex Internal Names
lex.yy.c or lexyy.c: Lex output file name
yylex: Lex scanning routine
yytext: string matched by the current rule
yyin: Lex input file (default: stdin)
yyout: Lex output file (default: stdout)
input: Lex buffered input routine
ECHO: Lex default action (print yytext to yyout)
LEX for TINY
%{
#include "globals.h"
#include "util.h"
#include "scan.h"
/* lexeme of identifier or reserved word */
char tokenString[MAXTOKENLEN+1];
%}
digit       [0-9]
number      {digit}+
letter      [a-zA-Z]
identifier  {letter}+
newline     \n
whitespace  [ \t]
%%
"if"      { return IF; }
"then"    { return THEN; }
"else"    { return ELSE; }
"end"     { return END; }
"repeat"  { return REPEAT; }
"until"   { return UNTIL; }
"read"    { return READ; }
"write"   { return WRITE; }
":="      { return ASSIGN; }
"="       { return EQ; }
"<"       { return LT; }
"+"       { return PLUS; }
"-"       { return MINUS; }
"*"       { return TIMES; }
"/"       { return OVER; }
"("       { return LPAREN; }
")"       { return RPAREN; }
";"       { return SEMI; }
{number}      { return NUM; }
{identifier}  { return ID; }
{newline}     { lineno++; }
{whitespace}  { /* skip whitespace */ }
"{"           { int c;   /* int, so that EOF can be told apart from a character */
                do
                { c = input();
                  if (c == EOF) break;   /* unterminated comment: stop at end of input */
                  if (c == '\n') lineno++;
                } while (c != '}');
              }
.             { return ERROR; }
%%
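The comment action can be tried in isolation by replacing input() with a pointer into a string. `skip_comment` is a hypothetical helper for this illustration only; like the corrected action, it stops at the end of input, so an unterminated comment cannot make the loop run forever.

```c
#include <stddef.h>

/* Stand-in for the TINY "{" action: p points just past the opening '{'.
   Consumes characters through the closing '}', adding 1 to *lineno for
   each newline. Returns a pointer just past the '}', or to the
   terminating '\0' if the comment is never closed. */
const char *skip_comment(const char *p, int *lineno) {
    char c;
    do {
        c = *p;
        if (c == '\0') return p;   /* end of input: stop instead of looping */
        p++;
        if (c == '\n') (*lineno)++;
    } while (c != '}');
    return p;
}
```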
TokenType getToken(void)
{ static int firstTime = TRUE;
  TokenType currentToken;
  if (firstTime)
  { firstTime = FALSE;
    lineno++;
    yyin = source;
    yyout = listing;
  }
  currentToken = yylex();
  /* tokenString is global, so tokenString[MAXTOKENLEN] stays '\0'
     and the copy is always terminated */
  strncpy(tokenString, yytext, MAXTOKENLEN);
  if (TraceScan) {
    fprintf(listing, "\t%d: ", lineno);
    printToken(currentToken, tokenString);
  }
  return currentToken;
}
YACC
LALR(1) parser generator
"Yet another compiler compiler"
[Figure: syntax specification → YACC → parser]
YACC Basics (1)
Input/output: filename.y → Yacc → y.tab.c (ytab.c on some systems; Bison writes filename.tab.c)
Specification file format:
{definitions}
%%
{rules}
%%
{auxiliary routines}
YACC Basics (2)
Definitions
- Information about tokens, data types, and grammar rules
- C code to be copied directly into the output file
Rules
- Grammar rules in a modified BNF format, with actions in C code
Auxiliary routines
- Procedure and function declarations, e.g. main() (which calls yyparse()) and yylex()
%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);               /* defined in the auxiliary routines */
void yyerror(const char *s);
%}
%token NUMBER
%%
command : exp             { printf("%d\n", $1); }
        ;
exp     : exp '+' term    { $$ = $1 + $3; }
        | exp '-' term    { $$ = $1 - $3; }
        | term            { $$ = $1; }
        ;
term    : term '*' factor { $$ = $1 * $3; }
        | factor          { $$ = $1; }
        ;
factor  : NUMBER          { $$ = $1; }
        | '(' exp ')'     { $$ = $2; }
        ;
%%
int main(void)
{ return yyparse();
}

int yylex(void)
{ int c;
  while ((c = getchar()) == ' ');   /* skip blanks */
  if (isdigit(c)) {
    ungetc(c, stdin);
    scanf("%d", &yylval);
    return NUMBER;
  }
  if (c == '\n' || c == EOF) return 0;   /* end of line or input: stop parsing */
  return c;
}

void yyerror(const char *s)
{ fprintf(stderr, "%s\n", s);   /* print error message */
}
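YACC Options (1)
-d: header file generation
- yacc -d filename.y also writes y.tab.h (ytab.h or filename.tab.h on some systems), which defines the token codes
- Other files, such as a Lex scanner that provides yylex(), #include "y.tab.h" so that they return the same token codes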
YACC Options (2)
-v: verbose option
- yacc -v filename.y writes y.output, a description of the parsing table (states, actions, and conflicts)
state 0
    $accept : _command $end

    NUMBER  shift 5
    (  shift 6
    .  error

    command  goto 1
    exp  goto 2
    term  goto 3
    factor  goto 4

state 1
    $accept : command_$end

    $end  accept
    .  error

state 2
    command : exp_    (1)
    exp : exp_+ term
    exp : exp_- term

    +  shift 7
    -  shift 8
    .  reduce 1

state 3
    exp : term_    (4)
    term : term_* factor

    *  shift 9
    .  reduce 4

state 4
    term : factor_    (6)

    .  reduce 6

state 5
    factor : NUMBER_    (7)

    .  reduce 7

state 6
    factor : (_exp )

    NUMBER  shift 5
    (  shift 6
    .  error

    exp  goto 10
    term  goto 3
    factor  goto 4

state 7
    exp : exp +_term

    NUMBER  shift 5
    (  shift 6
    .  error

    term  goto 11
    factor  goto 4

state 8
    exp : exp -_term

    NUMBER  shift 5
    (  shift 6
    .  error

    term  goto 12
    factor  goto 4

state 9
    term : term *_factor

    NUMBER  shift 5
    (  shift 6
    .  error

    factor  goto 13

state 10
    exp : exp_+ term
    exp : exp_- term
    factor : ( exp_)

    +  shift 7
    -  shift 8
    )  shift 14

state 11
    exp : exp + term_    (2)
    term : term_* factor

    *  shift 9
    .  reduce 2

state 12
    exp : exp - term_    (3)
    term : term_* factor

    *  shift 9
    .  reduce 3

state 13
    term : term * factor_    (5)

    .  reduce 5

state 14
    factor : ( exp )_    (8)

    .  reduce 8

8/127 terminals, 4/600 nonterminals
9/300 grammar rules, 15/1000 states
0 shift/reduce, 0 reduce/reduce conflicts reported
9/601 working sets used
memory: states, etc. 36/2000, parser 11/4000
9/601 distinct lookahead sets
6 extra closures
18 shift entries, 1 exceptions
8 goto entries
4 entries saved by goto default
Optimizer space used: input 50/2000, output 218/4000
218 table entries, 202 zero
maximum spread: 257, maximum offset: 43
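The y.output listing contains everything yyparse needs: for each state, the shifts, the gotos, and a default reduction. The sketch below hand-encodes those entries (states 0 to 14, rules (1) to (8)) and runs the usual LR driver loop with a state stack and a value stack that plays the role of $$ and $1 to $3. It is an illustration, not the code Yacc emits; the token numbering, the table layout, and the names `lr_parse` and `next_token` are assumptions, and the scanner reads from a string instead of stdin.

```c
#include <ctype.h>

/* Token and nonterminal codes (hypothetical numbering for this sketch). */
enum { T_NUM, T_PLUS, T_MINUS, T_TIMES, T_LP, T_RP, T_END, NTOKENS };
enum { N_COMMAND, N_EXP, N_TERM, N_FACTOR, NNONTERMS };
#define NSTATES 15
#define MAXSTACK 100

static int shift_to[NSTATES][NTOKENS];   /* target state, or -1 */
static int goto_to[NSTATES][NNONTERMS];  /* target state, or -1 */
/* ". reduce r" entries; 0 means ". error". */
static const int default_reduce[NSTATES] = {0,0,1,4,6,7,0,0,0,0,0,2,3,5,8};
/* Length and left-hand side of rules (1)..(8); index 0 is unused. */
static const int rule_len[9] = {0, 1, 3, 3, 1, 3, 1, 1, 3};
static const int rule_lhs[9] = {0, N_COMMAND, N_EXP, N_EXP, N_EXP,
                                N_TERM, N_TERM, N_FACTOR, N_FACTOR};

static void init_tables(void) {
    static const int operand_states[] = {0, 6, 7, 8, 9};
    for (int s = 0; s < NSTATES; s++) {
        for (int t = 0; t < NTOKENS; t++) shift_to[s][t] = -1;
        for (int a = 0; a < NNONTERMS; a++) goto_to[s][a] = -1;
    }
    for (int i = 0; i < 5; i++) {        /* NUMBER shift 5, ( shift 6 */
        shift_to[operand_states[i]][T_NUM] = 5;
        shift_to[operand_states[i]][T_LP] = 6;
    }
    shift_to[2][T_PLUS] = 7;  shift_to[2][T_MINUS] = 8;
    shift_to[10][T_PLUS] = 7; shift_to[10][T_MINUS] = 8; shift_to[10][T_RP] = 14;
    shift_to[3][T_TIMES] = 9; shift_to[11][T_TIMES] = 9; shift_to[12][T_TIMES] = 9;
    goto_to[0][N_COMMAND] = 1; goto_to[0][N_EXP] = 2;
    goto_to[0][N_TERM] = 3;    goto_to[0][N_FACTOR] = 4;
    goto_to[6][N_EXP] = 10; goto_to[6][N_TERM] = 3;  goto_to[6][N_FACTOR] = 4;
    goto_to[7][N_TERM] = 11; goto_to[7][N_FACTOR] = 4;
    goto_to[8][N_TERM] = 12; goto_to[8][N_FACTOR] = 4;
    goto_to[9][N_FACTOR] = 13;
}

/* Minimal scanner: skips blanks; '\n' or end of string is $end. */
static int next_token(const char **p, int *val) {
    while (**p == ' ') (*p)++;
    char c = **p;
    if (c == '\0' || c == '\n') return T_END;
    if (isdigit((unsigned char)c)) {
        int n = 0;
        while (isdigit((unsigned char)**p)) n = n * 10 + (*(*p)++ - '0');
        *val = n;
        return T_NUM;
    }
    (*p)++;
    switch (c) {
    case '+': return T_PLUS;  case '-': return T_MINUS; case '*': return T_TIMES;
    case '(': return T_LP;    case ')': return T_RP;    default: return -1;
    }
}

/* Returns 1 and stores the value in *result on success, 0 on an error. */
int lr_parse(const char *input, int *result) {
    int states[MAXSTACK], values[MAXSTACK], top = 0, lval = 0;
    const char *p = input;
    init_tables();
    states[0] = 0;
    int tok = next_token(&p, &lval);
    for (;;) {
        int s = states[top];
        if (tok < 0) return 0;                                /* bad character */
        if (s == 1 && tok == T_END) { *result = values[top]; return 1; } /* accept */
        if (shift_to[s][tok] >= 0) {                          /* shift */
            if (top + 1 >= MAXSTACK) return 0;
            states[++top] = shift_to[s][tok];
            values[top] = lval;
            tok = next_token(&p, &lval);
            continue;
        }
        int r = default_reduce[s];
        if (r == 0) return 0;                                 /* ". error" */
        int *v = &values[top - rule_len[r] + 1];              /* v[0] is $1 */
        int val;
        switch (r) {                                          /* semantic actions */
        case 2: val = v[0] + v[2]; break;
        case 3: val = v[0] - v[2]; break;
        case 5: val = v[0] * v[2]; break;
        case 8: val = v[1]; break;
        default: val = v[0]; break;                           /* $$ = $1 */
        }
        top -= rule_len[r];                                   /* pop the handle */
        states[top + 1] = goto_to[states[top]][rule_lhs[r]];  /* goto */
        if (states[top + 1] < 0) return 0;
        values[++top] = val;
    }
}
```

With input "2 3", the parser performs the default reductions 7, 6, 4, and 1 before it detects the error in state 1. Yacc's default reductions delay error detection in the same way, but they never let an erroneous input be accepted.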