Yacc
[Figure: a BNF grammar (example.y) is fed to Yacc, which produces example.tab.c; the C compiler and linker combine example.tab.c with other modules into the executable.]
Yacc: what is it?
- Yacc is a tool that automatically generates a parser from a grammar written as a yacc specification (.y file).
- The grammars accepted are LALR(1) grammars with disambiguating rules.
- A grammar is a set of production rules, which together define a language.
- Each production rule gives a sequence of symbols that a nonterminal may expand to; the sentences that can be derived from the rules are the legal sentences of the language.
Structure of Yacc
- Lex and Yacc usually work together.
- yylex(): called by the parser to get the next token.
- To run the parser, the function yyparse() is invoked.
How the parser works
- The parser produced by Yacc consists of a finite state machine with a stack.
- A move of the parser is done as follows:
  - It calls yylex to obtain the next token when one is needed.
  - Using the current state and the lookahead token, the parser decides on its next action (shift, reduce, accept or error) and carries it out.
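The state machine with a stack can be illustrated with a hand-written sketch. The code below is not what Yacc generates; it is a minimal hard-coded automaton for the toy grammar E : E '+' NUM | NUM, and the names toy_parse, TOK_NUM and TOK_END are assumptions made for this example. The token array stands in for successive calls to yylex.

```c
#include <assert.h>

/* Hypothetical token codes for this sketch (not produced by yacc). */
enum { TOK_END = 0, TOK_NUM = 258 };

#define STACK_MAX 64

/* Parse an END-terminated token array for the toy grammar
 *     E : E '+' NUM
 *       | NUM ;
 * Returns 1 on accept, 0 on a syntax error. */
int toy_parse(const int *tokens)
{
    int stack[STACK_MAX];            /* stack of automaton states */
    int top = 0;
    int pos = 0;
    int lookahead = tokens[pos++];   /* plays the role of yylex() */

    stack[0] = 0;                    /* initial state */
    for (;;) {
        int state = stack[top];
        switch (state) {
        case 0:                      /* start: expect NUM */
        case 3:                      /* after E '+': expect NUM */
            if (lookahead != TOK_NUM)
                return 0;            /* error */
            assert(top + 1 < STACK_MAX);
            stack[++top] = (state == 0) ? 1 : 4;   /* shift */
            lookahead = tokens[pos++];
            break;
        case 1:                      /* reduce by E : NUM (pop 1 state) */
            top -= 1;
            stack[++top] = 2;        /* goto on E */
            break;
        case 4:                      /* reduce by E : E '+' NUM (pop 3 states) */
            top -= 3;
            stack[++top] = 2;        /* goto on E */
            break;
        case 2:                      /* an E has been recognized */
            if (lookahead == TOK_END)
                return 1;            /* accept */
            if (lookahead != '+')
                return 0;            /* error */
            assert(top + 1 < STACK_MAX);
            stack[++top] = 3;        /* shift '+' */
            lookahead = tokens[pos++];
            break;
        default:
            return 0;
        }
    }
}
```

Calling toy_parse on the tokens for "NUM + NUM + NUM" followed by TOK_END goes through shift, reduce and accept moves in turn; a sequence such as "NUM + +" ends in the error action.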
Skeleton of a yacc specification (.y file)
    {declarations}
    %%
    {rules}
    %%
    {user code}
- Rules: <production> action
- Productions: type 2 (context-free) grammar productions
- Action: C code that specifies what to do when a production is reduced
Skeleton of a yacc specification (.y file)
    %{
    < C global variables, prototypes, comments >
    %}
    [DEFINITION SECTION]
    %%
    [PRODUCTION RULES SECTION]
    %%
    < C auxiliary subroutines >
- %{ ... %}: this part will be embedded into the generated *.c file.
- Definition section: contains token declarations. Tokens are recognized in the lexer.
- Production rules section: defines how to "understand" the input language, and what actions to take for each "sentence".
- C auxiliary subroutines: any user code, for example a main function that calls the parser function yyparse().
Structure of yacc file
- Definition section
  - declarations of tokens
  - type of values used on the parser stack
- Rules section
  - list of grammar rules with semantic routines
- User code
The declaration section
- Terminals and nonterminals
    %token symbol
    %type symbol
- Operator precedence and operator associativity
    %nonassoc symbol
    %left symbol
    %right symbol
- Axiom (start symbol)
    %start symbol
The declaration section: terminals
- They are returned by the yylex() function, which is called by yyparse().
- They become #define constants in the generated file.
- They are numbered starting from 257 (some implementations, such as Bison, start at 258), but a concrete number can be associated with a token:
    %token T_Key 345
- Terminals that consist of a single character can be used directly (they are implicit). The corresponding tokens have values < 257.
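The two kinds of token codes can be seen side by side in a small hand-written lexer. This is only a sketch: the value 258 for NUMBER is illustrative, and set_input and input_ptr are hypothetical helpers introduced so the example does not depend on Lex.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* What a generated header might contain; 258 is an illustrative value. */
#define NUMBER 258

int yylval;                           /* semantic value of the last token */
static const char *input_ptr = NULL;  /* hypothetical input cursor */

/* Hypothetical helper: choose the string that yylex() will scan. */
void set_input(const char *s) { input_ptr = s; }

/* Hand-written stand-in for a Lex-generated yylex(). */
int yylex(void)
{
    assert(input_ptr != NULL);
    while (*input_ptr == ' ' || *input_ptr == '\t')
        input_ptr++;                          /* skip blanks */
    if (*input_ptr == '\0')
        return 0;                             /* 0 signals end of input */
    if (isdigit((unsigned char)*input_ptr)) {
        yylval = 0;
        while (isdigit((unsigned char)*input_ptr))
            yylval = yylval * 10 + (*input_ptr++ - '0');
        return NUMBER;                        /* declared token: code above 256 */
    }
    return (unsigned char)*input_ptr++;       /* single character: its own code */
}
```

After set_input("12 + 7"), successive calls return NUMBER (with yylval set to 12), then the character code of '+', then NUMBER again, then 0. Because character codes stay below 257, they never collide with the codes of declared tokens.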
The declaration section: examples
expressions.y
    %{
    #include <stdio.h>
    %}
    %token NUMBER PLUS MINUS MUL DIV L_PAR R_PAR
    %start expr
    …
The declaration section: examples
patterns.l
    %{
    #include "expressions.tab.h"
    %}
    digit [0-9]
    %%
    [ \t]+      ;
    {digit}+    { yylval = atoi(yytext); return NUMBER; }
    "+"         return PLUS;
    "-"         return MINUS;
    "*"         return MUL;
    "/"         return DIV;
    "("         return L_PAR;
    ")"         return R_PAR;
    .           { printf("token erroneous\n"); }
The declaration section: examples
Yacc:
    . . .
    %token NUMBER
    /* '+', '-', '*', '/', '(' and ')' are used directly as character
       literals in the rules and need no %token declaration */
    . . .
Lex:
    digit [0-9]
    %%
    [ \t]+      ;
    {digit}+    { yylval = atoi(yytext); return NUMBER; }
    "+"         return '+';
    "-"         return '-';
    "*"         return '*';
    "/"         return '/';
    "("         return '(';
    ")"         return ')';
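Flex/Yacc communication
[Figure: build flow. file.l includes the header generated from file.y.]
    lex file.l                          produces lex.yy.c
    yacc -d file.y                      produces file.tab.h and file.tab.c
    cc lex.yy.c -c                      produces lex.yy.o
    cc file.tab.c -c                    produces file.tab.o
    gcc lex.yy.o file.tab.o -o calc     produces the executable calc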
Lex/Yacc: lex file
- expressions.tab.h is generated by Yacc.
- The lex file has no main().
    %{
    #include "expressions.tab.h"
    %}
    digit [0-9]
    %option noyywrap
    %%
    [ \t]+      ;
    {digit}+    { yylval = atoi(yytext);
                  /* printf("lex: %s, %d\n", yytext, yylval); */
                  return NUMBER; }
    "+"         return PLUS;
    "-"         return MINUS;
    .           { printf("token erroneous\n"); }
Flex/Yacc communication
expressions.tab.h
    #ifndef YYSTYPE
    #define YYSTYPE int
    #endif
    #define NUMBER 258
    #define PLUS 259
    #define MINUS 260
    #define MUL 261
    #define DIV 262
    #define L_PAR 263
    #define R_PAR 264
The Production Rules Section
    %%
    production : symbol1 symbol2 … { action }
               | symbol3 symbol4 … { action }
               | …
               ;
    production : symbol1 symbol2 { action }
               ;
Semantic values
    %%
    statement  : expression { printf(" = %g\n", $1); }
               ;
    expression : expression '+' expression { $$ = $1 + $3; }
               | expression '-' expression { $$ = $1 - $3; }
               | NUMBER { $$ = $1; }
               ;
According to these productions, 5 + 4 - 3 + 2 is parsed into:
[Figure: parse tree with root statement, expression nodes for the operators + and -, and number leaves 5, 4, 3, 2.]
Defining Values
    expr   : expr '+' term { $$ = $1 + $3; }
           | term { $$ = $1; }
           ;
    term   : term '*' factor { $$ = $1 * $3; }
           | factor { $$ = $1; }
           ;
    factor : '(' expr ')' { $$ = $2; }
           | ID
           | NUM
           ;
- $1, $2 and $3 are the values of the first, second and third symbols on the right-hand side of the production; $$ is the value of the left-hand side.
- Default: $$ = $1;
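One way to picture $$, $1, $2 and $3 is as slots on a value stack that the parser keeps alongside its state stack. The sketch below is a simplified model, not the code Yacc emits; vstack, push_value, reduce_add, reduce_paren and top_value are hypothetical names. Every symbol on the right-hand side occupies one slot, including operators and parentheses whose values are ignored.

```c
#include <assert.h>

#define VSTACK_MAX 64

static int vstack[VSTACK_MAX];   /* semantic values, one slot per symbol */
static int vtop = -1;            /* index of the top slot, -1 when empty */

/* Push the value of a shifted token or of a reduced nonterminal. */
void push_value(int v)
{
    assert(vtop + 1 < VSTACK_MAX);
    vstack[++vtop] = v;
}

/* Model of:  expr : expr '+' term { $$ = $1 + $3; }
 * The three right-hand-side values are the top three slots. */
void reduce_add(void)
{
    assert(vtop >= 2);
    int d1 = vstack[vtop - 2];   /* $1: value of expr */
    int d3 = vstack[vtop];       /* $3: value of term */
    vtop -= 3;                   /* pop the whole right-hand side */
    push_value(d1 + d3);         /* $$ becomes the new top */
}

/* Model of:  factor : '(' expr ')' { $$ = $2; } */
void reduce_paren(void)
{
    assert(vtop >= 2);
    int d2 = vstack[vtop - 1];   /* $2: value of expr */
    vtop -= 3;
    push_value(d2);
}

/* Value of the symbol currently on top of the stack. */
int top_value(void)
{
    assert(vtop >= 0);
    return vstack[vtop];
}
```

Pushing 5, a placeholder for '+', and 4, then calling reduce_add, leaves a single slot holding 9, which is what the action $$ = $1 + $3 computes for the input 5 + 4.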
The declaration section
Support for arbitrary value types
    %union {
        int intval;
        char *str;
    }
The declaration section
Use of union
- Terminal declaration
    %token <intval> NATURAL
- Nonterminal declaration
    %type <type> NO_TERMINAL
- In productions
    expr : NATURAL '+' NATURAL { $$ = $<intval>1 + $<intval>3; } ;
- In the lex file
    [-+]?{digit}+   { yylval.intval = atoi(yytext); return INTEGER; }
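On the C side, the %union declaration becomes the type of yylval and of the parser's value stack. The sketch below shows roughly what that looks like; the exact generated text differs between implementations, and lex_integer and the code 258 for INTEGER are assumptions standing in for a Lex action and a generated #define.

```c
#include <assert.h>
#include <stdlib.h>

/* Roughly what  %union { int intval; char *str; }  turns into. */
typedef union {
    int   intval;
    char *str;
} YYSTYPE;

YYSTYPE yylval;          /* shared between the lexer and the parser */

#define INTEGER 258      /* illustrative token code */

/* Hypothetical helper doing what the Lex action for [-+]?{digit}+ does:
 * store the value in the matching union member and return the token. */
int lex_integer(const char *yytext)
{
    assert(yytext != NULL);
    yylval.intval = atoi(yytext);
    return INTEGER;
}
```

Only the member named in the token's <...> tag is meaningful for that token, so the lexer must fill the same member that the grammar later reads.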
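Ambiguity
- By default yacc does the following:
  - shift/reduce conflict: chooses shift over reduce
  - reduce/reduce conflict: reduces by the production that appears first in the grammar
- It is better to resolve the conflicts explicitly by declaring precedence and associativity.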
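When precedence is declared, a shift/reduce conflict is settled by comparing the precedence of the rule that could be reduced with that of the lookahead token. The function below is a sketch of that comparison; resolve and the two enums are names made up for this example, and precedence levels are modelled as positive integers where a larger number binds tighter.

```c
#include <assert.h>

enum assoc  { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/* Decide a shift/reduce conflict between a rule and the lookahead token.
 * rule_prec:   precedence of the rule that could be reduced
 * token_prec:  precedence of the lookahead token
 * token_assoc: associativity declared for the lookahead token */
enum action resolve(int rule_prec, int token_prec, enum assoc token_assoc)
{
    assert(rule_prec > 0 && token_prec > 0);
    if (token_prec > rule_prec)
        return ACT_SHIFT;            /* lookahead binds tighter */
    if (token_prec < rule_prec)
        return ACT_REDUCE;           /* rule binds tighter */
    switch (token_assoc) {           /* equal precedence: use associativity */
    case ASSOC_LEFT:  return ACT_REDUCE;
    case ASSOC_RIGHT: return ACT_SHIFT;
    default:          return ACT_ERROR;   /* %nonassoc */
    }
}
```

With '+' at level 1 and '*' at level 2, both left associative, the parser holding expr '+' expr and seeing '*' shifts, while seeing another '+' it reduces, which gives the usual grouping of arithmetic expressions.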
Error recovery
- Yacc detects errors.
- To report errors, a function needs to be implemented:
    int yyerror(char *s) { fprintf(stderr, "%s\n", s); return 0; }
- Panic mode recovery:
    E : IF '(' cond ')'
      | IF '(' error ')' { yyerror("condition missing"); }
      ;
Error recovery
- After detecting an error, the parser stays in error-recovery mode until three consecutive tokens have been read and shifted successfully.
- yyerrok resets the parser to its normal mode.
- yyclearin allows the token that caused the error to be discarded.
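The token-discarding half of panic mode recovery can be sketched in plain C. This is a simplification: a real Yacc parser also pops states until it finds one in which the error token can be shifted. The function skip_to_sync and its parameters are hypothetical, and the token array stands in for calls to yylex.

```c
#include <assert.h>
#include <stddef.h>

/* Discard tokens starting at tokens[*pos] until the synchronizing token
 * `sync` or the end marker 0 is reached. On return, *pos indexes the
 * token that stopped the scan. Returns the number of tokens discarded. */
int skip_to_sync(const int *tokens, size_t *pos, int sync)
{
    assert(tokens != NULL && pos != NULL);
    int discarded = 0;
    while (tokens[*pos] != 0 && tokens[*pos] != sync) {
        (*pos)++;
        discarded++;
    }
    return discarded;
}
```

For a rule such as IF '(' error ')', the synchronizing token is ')': everything between the opening parenthesis and the next ')' is thrown away, and parsing resumes from there.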