Syntax Specification with YACC © Allan C. Milne, Abertay University
Agenda. What is Yacc/Bison? Yacc/Lex Example. The Parser. Parsing Approaches.
What Is Yacc? Yet Another Compiler-Compiler: a parser generator for LALR(1) grammars. The generated parser also executes the semantic actions (fragments of C code) attached to the grammar rules. Yacc emits the parser and semantic processor as C source code.
What Is Bison? A Yacc-compatible parser generator from the GNU Project. It adds extensions to Yacc, including C++ support. Distributed as free software by GNU.
Expressing A Grammar In Yacc. Yacc uses a variant of the BNF meta-language; for a full description of the Yacc grammar-file syntax see chapter 3 of the Bison manual. This approach normally requires two specifications:
–the syntactic structure, written in BNF as a Yacc program; and
–the lexical (token) structure, written as a Lex program.
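The Yacc Specification.
%{
#include <stdio.h>
int yylex (void);
void yyerror (char *);
%}
%token tANT tBAT tCAT tDOG
%%
AnimalList  : '(' ')'
            | '(' Animal ')'
            | '(' MoreAnimals ',' Animal ')'
            ;
MoreAnimals : Animal
            | MoreAnimals ',' Animal
            ;
Animal      : tANT | tBAT | tCAT | tDOG ;
%%
/* Error reporting and entry point used by the generated parser. */
void yyerror (char *msg) { fprintf (stderr, "%s\n", msg); }
int main (void) { return yyparse (); }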
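As an illustration of how these rules combine, the input ( ant , bat , cat ) has the rightmost derivation below, with each token written as its %token name. A bottom-up parser such as the one Yacc generates recognises these steps in reverse order.

```latex
\begin{align*}
\mathit{AnimalList}
  &\Rightarrow \texttt{(}\; \mathit{MoreAnimals}\; \texttt{,}\; \mathit{Animal}\; \texttt{)} \\
  &\Rightarrow \texttt{(}\; \mathit{MoreAnimals}\; \texttt{,}\; \mathrm{tCAT}\; \texttt{)} \\
  &\Rightarrow \texttt{(}\; \mathit{MoreAnimals}\; \texttt{,}\; \mathit{Animal}\; \texttt{,}\; \mathrm{tCAT}\; \texttt{)} \\
  &\Rightarrow \texttt{(}\; \mathit{MoreAnimals}\; \texttt{,}\; \mathrm{tBAT}\; \texttt{,}\; \mathrm{tCAT}\; \texttt{)} \\
  &\Rightarrow \texttt{(}\; \mathit{Animal}\; \texttt{,}\; \mathrm{tBAT}\; \texttt{,}\; \mathrm{tCAT}\; \texttt{)} \\
  &\Rightarrow \texttt{(}\; \mathrm{tANT}\; \texttt{,}\; \mathrm{tBAT}\; \texttt{,}\; \mathrm{tCAT}\; \texttt{)}
\end{align*}
```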
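Terminal Tokens Represented By …
A C-style character literal, e.g. '('; its token code is the character's own code.
A C-style string literal:
–in Bison only, not available in Yacc;
–declared as an alias of a named token, e.g. %token LE "<=", so that "<=" can be written in grammar rules while the scanner still returns LE.
An identifier:
–error is reserved for error recovery;
–by convention written in uppercase; here token names are also prefixed with 't' (e.g. tANT) to mark them as terminals;
–must be declared, normally in a %token statement.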
The Lex Specification.
%{
#include "AnimalList.tab.h"
void yyerror (char*);
%}
%%
ant      return tANT;
bat      return tBAT;
cat      return tCAT;
dog      return tDOG;
[,()]    return *yytext;   /* single-character tokens return their own code */
[ \t\n]  ;                 /* skip whitespace */
.        yyerror ("Invalid character found.");
%%
int yywrap (void) { return 1; }
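To show what the Lex rules above ask for, here is a hand-written C scanner that follows the same rules. It is a minimal sketch, not the code Lex generates. The token values, the string-based input and the helpers scan_set_input and scan_errors are hypothetical; in the real build the values come from AnimalList.tab.h and the input is read from yyin. Because every keyword is longer than one character, trying the keywords first gives the same result as Lex's longest-match rule for these patterns.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes standing in for AnimalList.tab.h;
   the real values are chosen by Yacc/Bison. */
enum { tANT = 258, tBAT, tCAT, tDOG };

static const char *scan_pos = "";  /* text still to be scanned (hypothetical input source) */
int scan_errors = 0;               /* invalid characters reported so far */

void yyerror (const char *msg) {
    fprintf(stderr, "%s\n", msg);
    scan_errors++;
}

/* Point the scanner at a new input string (hypothetical helper). */
void scan_set_input (const char *text) {
    scan_pos = text;
    scan_errors = 0;
}

/* Follows the Lex rules: returns the next token code, or 0 at end of input. */
int yylex (void) {
    static const struct { const char *word; int token; } keywords[] = {
        { "ant", tANT }, { "bat", tBAT }, { "cat", tCAT }, { "dog", tDOG }
    };
    while (*scan_pos != '\0') {
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
            size_t len = strlen(keywords[i].word);
            if (strncmp(scan_pos, keywords[i].word, len) == 0) {
                scan_pos += len;
                return keywords[i].token;        /* ant|bat|cat|dog rules */
            }
        }
        char c = *scan_pos++;
        if (c == ',' || c == '(' || c == ')')
            return c;                            /* [,()] return *yytext; */
        if (c == ' ' || c == '\t' || c == '\n')
            continue;                            /* [ \t\n] ; skip */
        yyerror("Invalid character found.");     /* . rule: report and carry on */
    }
    return 0;                                    /* end of input */
}
```

After scan_set_input("(ant, dog)"), successive yylex() calls return '(', tANT, ',', tDOG, ')' and then 0.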
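Note The Differences … Lex is used here as a scanner for the parser rather than as a self-contained tool. The #include of the Bison-generated header exposes the %token symbols, which the header defines as integer constants. There is no main() function; the Yacc parser calls yylex() repeatedly, once for each token it needs. Each action returns an integer representing the token found; character codes are reserved for single-character tokens, which are returned as themselves (hence return *yytext).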
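The division of labour described above can be sketched in C. A header-like enum gives the named tokens codes above the single-character range, a stub yylex hands out a canned token stream, and a driver loop pulls tokens until yylex returns 0, as a Yacc parser does. The token values, the canned stream and the helper names token_name and pull_all_tokens are hypothetical.

```c
#include <stdio.h>

/* Hypothetical stand-in for the Bison-generated header: named tokens get
   integer codes above the single-character range (real values are chosen
   by the tool). */
enum { tANT = 258, tBAT, tCAT, tDOG };

/* Hypothetical canned tokens for "(ant, dog)", standing in for the scanner:
   character tokens are their own character codes and 0 marks end of input. */
static const int stream[] = { '(', tANT, ',', tDOG, ')', 0 };
static int stream_pos = 0;

int yylex (void) { return stream[stream_pos] == 0 ? 0 : stream[stream_pos++]; }

/* Describe a token code: the name of a %token symbol, or the character itself. */
const char *token_name (int code, char *buf, size_t size) {
    switch (code) {
    case tANT: return "tANT";
    case tBAT: return "tBAT";
    case tCAT: return "tCAT";
    case tDOG: return "tDOG";
    default:   /* single-character token: its code is the character */
        snprintf(buf, size, "'%c'", code);
        return buf;
    }
}

/* Like a Yacc parser, request tokens one at a time until yylex() returns 0.
   Prints each code with its name and returns how many tokens were read. */
int pull_all_tokens (void) {
    int count = 0, tok;
    while ((tok = yylex()) != 0) {
        char buf[8];
        printf("%d %s\n", tok, token_name(tok, buf, sizeof buf));
        count++;
    }
    return count;
}
```

On an ASCII system, pull_all_tokens() prints "40 '('" followed by "258 tANT" and so on, showing the two kinds of token code side by side.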
The Parsing Phase. This phase of a translator is primarily concerned with determining the syntactic structure of an input program. It often also controls lexical analysis and semantic processing: the parser requests tokens from the scanner as it needs them and triggers semantic actions as it recognises each construct.
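A small C sketch can make this controlling role concrete. The parser below fetches each token through advance() (standing in for a call to yylex()) and runs a semantic action, counting animals, whenever it recognises an Animal. For brevity it is written top-down by hand, whereas Yacc generates bottom-up parsers. The token codes, the 0-terminated token array and the function names are hypothetical.

```c
#include <stdbool.h>

enum { tANT = 258, tBAT, tCAT, tDOG };   /* hypothetical token codes */

static const int *tokens;   /* 0-terminated token stream (stands in for yylex) */
static int lookahead;       /* the current token */
static int animal_count;    /* value built up by the semantic action */

/* Fetch the next token, as a Yacc parser does by calling yylex(). */
static void advance (void) { lookahead = *tokens ? *tokens++ : 0; }

static bool match (int expected) {
    if (lookahead != expected)
        return false;
    advance();
    return true;
}

/* Animal : tANT | tBAT | tCAT | tDOG ; the action runs when the rule is recognised. */
static bool animal (void) {
    if (lookahead < tANT || lookahead > tDOG)
        return false;
    animal_count++;                      /* semantic action: count the animal */
    advance();
    return true;
}

/* AnimalList with the left-recursive MoreAnimals rule rewritten as a loop:
   '(' [ Animal { ',' Animal } ] ')'. Returns the animal count, or -1 on error. */
int parse_animal_list (const int *stream) {
    tokens = stream;
    animal_count = 0;
    advance();
    if (!match('('))
        return -1;
    if (lookahead != ')') {
        if (!animal())
            return -1;
        while (lookahead == ',') {
            advance();
            if (!animal())
                return -1;
        }
    }
    if (!match(')') || lookahead != 0)   /* require ')' and then end of input */
        return -1;
    return animal_count;
}
```

The loop replaces MoreAnimals because a top-down parser that followed a left-recursive rule literally would call itself forever without consuming input.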
Parsing Approaches. Parsing a program means finding a derivation sequence for it from the grammar. There are two general approaches:
–Top-down: start from the distinguished symbol and find substitutions that produce the target program.
–Bottom-up: start from the input program and try to reduce it back to the distinguished symbol.
Yacc uses the bottom-up approach.
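The bottom-up approach can be illustrated with a hand-coded shift-reduce recogniser for the AnimalList grammar. It shifts tokens onto a stack and, looking one token ahead, replaces a rule's right-hand side on top of the stack with its left-hand side until only AnimalList remains. This is a sketch under stated assumptions: the reduce decisions are written out by hand instead of coming from the LALR(1) tables Yacc would generate, and the token and nonterminal codes and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical codes: tokens as a Yacc header might number them, and
   nonterminals given values that cannot clash with any token. */
enum { tANT = 258, tBAT, tCAT, tDOG };
enum { ANIMAL = 1000, MORE_ANIMALS, ANIMAL_LIST };

#define MAX_STACK 64

/* True when the entire stack is one of the three AnimalList right-hand sides. */
static bool is_complete_list (const int *s, int sp) {
    return (sp == 2 && s[0] == '(' && s[1] == ')')
        || (sp == 3 && s[0] == '(' && s[1] == ANIMAL && s[2] == ')')
        || (sp == 5 && s[0] == '(' && s[1] == MORE_ANIMALS && s[2] == ','
                    && s[3] == ANIMAL && s[4] == ')');
}

/* Shift-reduce recogniser over a 0-terminated token array.
   Returns the number of reductions made, or -1 on a syntax error. */
int shift_reduce (const int *tokens) {
    int stack[MAX_STACK];
    int sp = 0;                 /* number of symbols on the stack */
    int reductions = 0;
    int la = *tokens;           /* one-token lookahead; 0 means end of input */

    for (;;) {
        int top = sp > 0 ? stack[sp - 1] : 0;
        if (top >= tANT && top <= tDOG) {
            stack[sp - 1] = ANIMAL;                      /* Animal : tANT | tBAT | ... */
        } else if (top == ANIMAL && la == ',' && sp >= 3 &&
                   stack[sp - 2] == ',' && stack[sp - 3] == MORE_ANIMALS) {
            sp -= 2;
            stack[sp - 1] = MORE_ANIMALS;                /* MoreAnimals : MoreAnimals ',' Animal */
        } else if (top == ANIMAL && la == ',' && sp >= 2 && stack[sp - 2] == '(') {
            stack[sp - 1] = MORE_ANIMALS;                /* MoreAnimals : Animal */
        } else if (top == ')' && is_complete_list(stack, sp)) {
            sp = 1;
            stack[0] = ANIMAL_LIST;                      /* AnimalList : '(' ... ')' */
        } else if (la != 0 && sp < MAX_STACK) {
            stack[sp++] = la;                            /* shift the lookahead token */
            la = *++tokens;
            continue;
        } else {
            /* Nothing to reduce or shift: accept only if AnimalList alone remains. */
            return (sp == 1 && stack[0] == ANIMAL_LIST) ? reductions : -1;
        }
        reductions++;
    }
}
```

For ( ant , bat , cat ) it performs six reductions, which are the steps of that input's rightmost derivation taken in reverse order.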