Structure of a YACC File
A YACC file has the same three-part structure as a Lex file. The parts are separated by lines containing only %%. The three parts even have the same names:
– definition section
– rules section
– code section (copied directly into the generated program)
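As a rough illustration of the three-part layout, here is a minimal skeleton. The token name INTEGER and the single rule are placeholders chosen for this sketch, not part of any particular grammar.

```yacc
%{
/* Definition section: C declarations copied into the generated parser. */
#include <stdio.h>
%}
%token INTEGER      /* placeholder token for this sketch */

%%
/* Rules section: grammar productions and their actions. */
expr : INTEGER
     ;

%%
/* Code section: copied verbatim to the end of the generated program. */
```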
Definition Section
Declare the tokens used in the grammar, and the types of the values used on the stack, here. Tokens that are single-quoted characters, like '=' or '+', need not be declared. Literal C code can be included in this section in a block delimited by %{ … %}.
Declaring Tokens
The tokens used in the grammar must be declared. Include lines like the ones below in the definition section:
    %token CHARSTRING INT IDENTIFIER
    %token LPAREN RPAREN
The Rules Section
The rules of the grammar are placed here. Here is an example of the basic syntax.
Grammar:
    Expr → INTEGER + INTEGER | INTEGER - INTEGER
YACC grammar definition:
    expr : INTEGER '+' INTEGER   {action}
         | INTEGER '-' INTEGER   {action}
         ;
Literal characters such as + and - are written in single quotes in the YACC rules.
YACC Actions
Similar to Lex, actions can be defined that are performed whenever a production is applied (reduced) in the stream of tokens. An action is usually written directly after the production it belongs to. Since every symbol in the grammar has a corresponding value, actions need a way to access those values; they do so through the YACC stack.
Accessing the Stack
Since YACC generates an LR parser, it pushes the symbols that it reads, along with their values, onto a stack until it is ready to reduce. To access these values in an action, write a dollar sign followed by a number: $1 is the value of the first symbol on the right-hand side of the production, $2 the value of the second, and so on.
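The stack behaviour described above can be pictured with a toy model in C. This is only a sketch of the idea, not the code YACC generates: the names value_stack, shift and reduce_add are hypothetical, and a real parser keeps a state stack alongside the values.

```c
#include <assert.h>

#define STACK_MAX 16

/* Toy value stack: one slot per grammar symbol shifted so far. */
static int value_stack[STACK_MAX];
static int top = 0;                 /* number of values currently stored */

/* Shift: push the value of the symbol that was just read. */
static void shift(int value)
{
    assert(top < STACK_MAX);
    value_stack[top++] = value;
}

/* Reduce by  expr : INTEGER '+' INTEGER  { $$ = $1 + $3; }
 * The right-hand side has three symbols, so $1..$3 are the top three slots. */
static int reduce_add(void)
{
    assert(top >= 3);
    int *rhs = &value_stack[top - 3];   /* rhs[0] is $1, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];       /* $$ = $1 + $3 */
    top -= 3;                           /* pop the right-hand side */
    value_stack[top++] = result;        /* push $$ as the value of expr */
    return result;
}
```

Calling shift(3), shift(0) for the '+' token (its value is unused), and shift(4), followed by reduce_add(), leaves a single value, 7, on the stack.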
Accessing the Stack
    expr : INTEGER '+' INTEGER   { $$ = $1 + $3; }
         | INTEGER '-' INTEGER   { $$ = $1 - $3; }
         ;
$$ refers to the value of the left-hand nonterminal (here, expr).
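To see these rules in context, here is a sketch of a complete grammar file built around them. The line rule, the yyerror body and main are additions made for this illustration, and yylex is assumed to be supplied separately by a lexer that returns INTEGER with its value in yylval and returns '+' and '-' as literal characters.

```yacc
%{
#include <stdio.h>
int yylex(void);                  /* assumed to be supplied by the lexer */
void yyerror(const char *msg);
%}

%token INTEGER

%%
line : expr                  { printf("%d\n", $1); }
     ;

expr : INTEGER '+' INTEGER   { $$ = $1 + $3; }
     | INTEGER '-' INTEGER   { $$ = $1 - $3; }
     ;
%%

/* Called by the parser when the input does not match the grammar. */
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }

int main(void) { return yyparse(); }
```

With no %union declared, every value on the stack is an int, which is why $1 + $3 needs no further type information here.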
Where do Tokens and Their Values Come From?
Typically from the lexer: yyparse() (generated by YACC) calls yylex() (generated by Lex) each time it needs the next token.
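The contract between the two functions can be sketched in plain C. Everything below is a hand-written stand-in rather than generated code: the token code 258, the input buffer, set_input and count_tokens are assumptions made for this example. The point is only that yylex returns a token code, leaves the token's value in yylval, and returns 0 at end of input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { INTEGER = 258 };          /* assumed code; y.tab.h normally defines it */

int yylval;                      /* value of the most recent token */
static const char *input = "";   /* hypothetical input buffer */

/* Hypothetical helper: point the stand-in lexer at a string. */
void set_input(const char *text)
{
    assert(text != NULL);
    input = text;
}

/* Stand-in for the Lex-generated yylex(): token code, or 0 at end of input. */
int yylex(void)
{
    while (*input == ' ')
        input++;
    if (*input == '\0')
        return 0;
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return INTEGER;
    }
    return (unsigned char)*input++;   /* single-character literal token */
}

/* Stand-in for the token-fetching loop inside yyparse(). */
int count_tokens(void)
{
    int n = 0;
    while (yylex() != 0)
        n++;
    return n;
}
```

A real yyparse() does more than count, of course: after each call it decides whether to shift the token or reduce by a rule.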
Revisiting Lex
The Lex file has to be modified in two main places to work with the YACC parser. In the definition section, include this statement:
    #include "y.tab.h"
This header file is created by YACC when the parser is generated (in most implementations, when YACC is run with the -d option). The actions for the rules need to be changed too.
Revisiting Lex Actions
For tokens with a value, assign that value to yylval; YACC reads the value from that variable. End each action with a return statement for the token name (the same name that is declared at the top of the YACC file).
    if            { return IF; }
    [1-9][0-9]*   { yylval = atoi(yytext); return INTEGER; }
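Putting both changes together, a Lex file prepared for a YACC parser might look roughly like the sketch below. It assumes the YACC file declares %token IF INTEGER and was processed with the -d option so that y.tab.h exists. The whitespace rule, the catch-all rule and yywrap are additions made for this example.

```lex
%{
#include <stdlib.h>     /* atoi */
#include "y.tab.h"      /* token codes and yylval, written by yacc -d */
%}

%%
"if"            { return IF; }
[1-9][0-9]*     { yylval = atoi(yytext); return INTEGER; }
[ \t\n]+        { /* skip whitespace */ }
.               { return yytext[0]; /* pass other characters as literal tokens */ }
%%

/* Tell the scanner there is no further input after end of file. */
int yywrap(void) { return 1; }
```

Returning yytext[0] for unmatched characters is what lets the grammar refer to tokens such as '+' without declaring them.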
The %union Declaration
Different tokens have different data types: INTEGER tokens are integers, FLOAT tokens are floats, CHARACTERSTRING tokens are char *, and IDENTIFIER tokens are pointers to the symbol-table entry for that identifier. The %union declaration lets the parser apply the right data type to the right token.
The %union Declaration
YACC definition section:
    %union {
        int intValue;
        float floatValue;
    }
    %token <intValue> INTEGER
    %token <floatValue> FLOAT
Lex rules section:
    …   { yylval.intValue = atoi(yytext); return INTEGER; }
    …   { yylval.floatValue = atof(yytext); return FLOAT; }
The name in angle brackets tells YACC which union member holds the value of each token.
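On the C side, the %union declaration becomes roughly the type shown below, and yylval becomes a variable of that type. This is a hand-written approximation, not the exact text YACC emits: the token codes 258 and 259 and the helper scan_number are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Approximately what YACC derives from the %union declaration. */
typedef union {
    int   intValue;
    float floatValue;
} YYSTYPE;

enum { INTEGER = 258, FLOAT = 259 };   /* assumed token codes */

YYSTYPE yylval;                        /* shared between lexer and parser */

/* Hypothetical helper mirroring the two Lex actions:
 * store the value in the matching union member, then return the token. */
int scan_number(const char *text, int is_float)
{
    assert(text != NULL);
    if (is_float) {
        yylval.floatValue = (float)atof(text);
        return FLOAT;
    }
    yylval.intValue = atoi(text);
    return INTEGER;
}
```

Because the members share storage, only the member that matches the returned token is meaningful; the type tag on %token is what lets YACC pick that member when an action uses $1, $2 and so on.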
References That Might Be Useful
Levine J R, Mason T, Brown D, “Lex & Yacc” (2nd ed., O'Reilly, 1992)
Stephen C. Johnson, “Yacc: Yet Another Compiler-Compiler”, er.htm
Bert Hubert, “Lex and YACC primer/HOWTO”, HOWTO.html#toc6