Parsers CSE2303 Formal Methods I Lecture 12
Overview
Recursive Descent Parsers
LR Parsers
Bison
Parsers
A parser for a grammar is a program.
  – Its input is a string.
  – It decides whether the input can be generated by the grammar.
Two main types
  – top-down parsers
  – bottom-up parsers
Difficulties with Top-down Parsers
Left recursive grammars
  – i.e. A ⇒ … ⇒ Aw, so the parser can keep expanding A without consuming any input
Error handling
Backtracking
  – Allocating/deallocating resources
  – Undoing actions
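To make the top-down idea concrete, here is a minimal sketch of a recursive descent recogniser for the grammar S → B B, B → a B | b (the grammar used in simple.y later in the lecture). The function names parse_S, parse_B and rd_accepts are invented for this illustration. The grammar is not left recursive and needs no backtracking, which is why one function per nonterminal is enough.

```c
/* Grammar (not left recursive, so plain recursive descent works):
 *     S -> B B
 *     B -> 'a' B | 'b'
 * Each function advances *p past the text it recognises and
 * returns 1 on success, 0 on failure. */

static int parse_B(const char **p)
{
    if (**p == 'a') {          /* B -> 'a' B */
        (*p)++;
        return parse_B(p);
    }
    if (**p == 'b') {          /* B -> 'b' */
        (*p)++;
        return 1;
    }
    return 0;                  /* no production applies */
}

static int parse_S(const char **p)
{
    return parse_B(p) && parse_B(p);   /* S -> B B */
}

/* Returns 1 if the whole string is generated by the grammar. */
int rd_accepts(const char *input)
{
    const char *p = input;
    return parse_S(&p) && *p == '\0';
}

/* A left-recursive rule such as  A -> A 'w' | 'x'  cannot be coded
 * this way: parse_A would call parse_A before consuming any input
 * and would never terminate. */
```

For example, rd_accepts("abb") returns 1, while rd_accepts("ab") returns 0 because the second B is missing.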
LR Parser
Bottom-up parser
Scans the input Left to right
Constructs the Rightmost derivation in reverse
Implemented using
  – a finite automaton and a stack.
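The "finite automaton and a stack" can be sketched by hand for a very small grammar. The code below is an illustration only, not what bison generates: it hard-codes the LR(0) automaton for S → B B, B → a B | b (states 0 to 6, worked out by hand for this example), and the names lr_accepts, goto_B and MAX_STACK are made up here. Shifts push a state and consume a character; reductions pop one state per right-hand-side symbol and then follow the GOTO entry for the left-hand side.

```c
#include <stddef.h>

/* Grammar:  1: S -> B B    2: B -> 'a' B    3: B -> 'b'
 * States 0..6 are those of the LR(0) automaton for this grammar. */

#define MAX_STACK 256

/* GOTO on nonterminal B from the state uncovered by a reduction. */
static int goto_B(int state)
{
    switch (state) {
    case 0:  return 2;
    case 2:  return 5;
    case 3:  return 6;
    default: return -1;              /* cannot happen for this automaton */
    }
}

/* Returns 1 if input is accepted, 0 on a syntax error
 * (or if the input needs more than MAX_STACK stack entries). */
int lr_accepts(const char *input)
{
    int stack[MAX_STACK];            /* stack of automaton states */
    int top = 0;
    size_t pos = 0;
    int next;

    stack[0] = 0;
    for (;;) {
        int state = stack[top];
        char look = input[pos];      /* '\0' marks the end of the input */

        switch (state) {
        case 0: case 2: case 3:      /* shift 'a' or 'b' */
            if (top + 1 >= MAX_STACK) return 0;
            if (look == 'a')      stack[++top] = 3;
            else if (look == 'b') stack[++top] = 4;
            else return 0;           /* syntax error */
            pos++;
            break;
        case 1:                      /* S seen: accept only at end of input */
            return look == '\0';
        case 4:                      /* reduce B -> 'b': pop 1 state */
            top -= 1;
            next = goto_B(stack[top]);
            stack[++top] = next;
            break;
        case 5:                      /* reduce S -> B B: pop 2 states */
            top -= 2;
            stack[++top] = 1;        /* GOTO(0, S) = 1 */
            break;
        case 6:                      /* reduce B -> 'a' B: pop 2 states */
            top -= 2;
            next = goto_B(stack[top]);
            stack[++top] = next;
            break;
        default:
            return 0;
        }
    }
}
```

On the input "bb" the reductions happen in the order B → b, B → b, S → B B, which is the rightmost derivation S ⇒ BB ⇒ Bb ⇒ bb read backwards.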
Pros and Cons
Benefits
  – An LR parser can be constructed for most CFGs of practical interest
  – The parsers are efficient
  – Syntax errors are detected as soon as possible in a left-to-right scan
Disadvantages
  – An LR parser cannot be built for every CFG
  – A parser generator is needed
Bison (yacc)
Parser generator
  – It writes an LR parser
Assumption
  – The grammar is an LALR(1) grammar
Needs
  – A set of production rules
  – An action for each rule
Produces
  – A C program
main() { yyparse(); }
int yyparse() { yychar = yylex(); }
  – Parses the input.
int yylex() { return …; }
  – Reads the input and gets the next token.
  – Returns an int representing the next token.
int yyerror() { return 0; }
  – Called if the input cannot be parsed.
Process
Write a bison program and a flex program
  – E.g. example.y and example.l
Run them through bison and flex
  – bison -d example.y
  – flex example.l
Compile the generated C files with the flag -lfl

example.y → bison → example.tab.c (contains yyparse()) and example.tab.h (contains definitions)
example.l → flex → lex.yy.c (contains yylex())
A Bison Program

… definitions …
%%
… rules …
%%
… subroutines …

Note: bison does not handle carriage returns.
Sections
… definition section
  – Code between %{ … %} is copied verbatim into the generated C file.
  – Definitions are used to define tokens, types, etc.
… rule section
  – Pairs of production rules and actions.
  – The production rules are from a CFG.
… subroutine section
  – Consists of the user's subroutines.
  – Copied after the end of the bison generated code.
  – Must contain yyerror().
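File simple.l:

%%
.       { return yytext[0]; /* any other character is its own token */ }
\n      { return 0;         /* end of line: no more tokens */ }
%%

File simple.c:

int yyparse(void);   /* defined in simple.tab.c */

int main(void)
{
    return yyparse();
}

File simple.y:

%{
#include <stdio.h>
int yylex(void);
int yyerror(const char *s);
%}
%%
S: B B        { printf("S -> BB\n"); }
 ;
B: 'a' B      { printf("B -> aB\n"); }
 | 'b'        { printf("B -> b\n"); }
 ;
%%
int yyerror(const char *s)
{
    printf("%s\n", s);
    return 0;
}

Build, after running bison -d simple.y and flex simple.l:

gcc -o simple simple.c simple.tab.c lex.yy.c -lfl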
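Evaluation of 4+2*3
Grammar
  S → E
  E → T | T + E
  T → F | F * T
  F → INT
Tokens
  INT (yylval = 4)  +  INT (yylval = 2)  *  INT (yylval = 3)
Parse tree, with the value computed at each node
  S
    E = 10
      T = 4
        F = 4
          INT (yylval = 4)
      +
      E = 6
        T = 6
          F = 2
            INT (yylval = 2)
          *
          T = 3
            F = 3
              INT (yylval = 3)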
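The values in this evaluation can be reproduced with a small C sketch that has one function per nonterminal. Note the assumptions: this is a hand-written top-down evaluator, not the bottom-up parser bison would generate, it accepts only well-formed input with non-negative integers, and the names evaluate, eval_E, eval_T, eval_F and skip_blanks are invented for the illustration. Each function returns the value of its node, just as the $$ assignments do in the bison actions.

```c
#include <ctype.h>

/* E -> T | T '+' E      value: $1   or   $1 + $3
 * T -> F | F '*' T      value: $1   or   $1 * $3
 * F -> INT              value: the integer itself
 * Assumes well-formed input; no error reporting. */

static void skip_blanks(const char **p)
{
    while (**p == ' ' || **p == '\t')
        (*p)++;
}

static int eval_F(const char **p)
{
    int value = 0;

    skip_blanks(p);
    while (isdigit((unsigned char)**p)) {    /* read a decimal integer */
        value = value * 10 + (**p - '0');
        (*p)++;
    }
    return value;
}

static int eval_T(const char **p)
{
    int left = eval_F(p);

    skip_blanks(p);
    if (**p == '*') {                        /* T -> F '*' T */
        (*p)++;
        return left * eval_T(p);
    }
    return left;                             /* T -> F */
}

static int eval_E(const char **p)
{
    int left = eval_T(p);

    skip_blanks(p);
    if (**p == '+') {                        /* E -> T '+' E */
        (*p)++;
        return left + eval_E(p);
    }
    return left;                             /* E -> T */
}

/* Returns the value of the expression in text. */
int evaluate(const char *text)
{
    const char *p = text;
    return eval_E(&p);
}
```

For the input "4+2*3" the calls compute F = 4 and T = 4 for the left operand, then T = 2 * 3 = 6 and E = 6 for the right operand, and finally E = 4 + 6 = 10. Because '*' is handled one level below '+', multiplication binds more tightly without any precedence declarations.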
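File plusTimes.l:

%{
#include <stdlib.h>          /* atoi */
#include "plusTimes.tab.h"   /* assumed name: the header written by bison -d plusTimes.y; declares INT and yylval */
%}
%%
-?[0-9]+    { yylval = atoi(yytext); return INT; }
[ \t]       { /* skip blanks and tabs */ }
.           { return yytext[0]; }
\n          { return 0; }
%%

File plusTimes.y:

%token INT
%%
S: E            { printf("%d\n", $1); }
 ;
E: T            { $$ = $1; }
 | T '+' E      { $$ = $1 + $3; }
 ;
T: F            { $$ = $1; }
 | F '*' T      { $$ = $1 * $3; }
 ;
F: INT          { $$ = $1; }
 ;
%%
…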
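The scanner rules in plusTimes.l can be read as a hand-written C function. The sketch below is only an approximation of what flex generates, and every name in it (demo_lex, demo_lval, demo_set_input, DEMO_INT) is hypothetical, chosen so as not to clash with the real yylex, yylval and INT. It reads from a string rather than from standard input, and it treats the end of the string like a newline.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token code for integers. Any value above 255 works,
 * because single characters are returned as their own codes. */
enum { DEMO_INT = 258 };

static const char *demo_input = "";   /* text still to be scanned */
int demo_lval;                        /* plays the role of yylval */

void demo_set_input(const char *text)
{
    demo_input = text;
}

/* Returns the next token: DEMO_INT, a character code, or 0 at the end. */
int demo_lex(void)
{
    /* [ \t] : skip blanks and tabs */
    while (*demo_input == ' ' || *demo_input == '\t')
        demo_input++;

    /* \n (or end of text) : return 0, meaning "no more tokens" */
    if (*demo_input == '\0' || *demo_input == '\n')
        return 0;

    /* -?[0-9]+ : an integer; its value goes into demo_lval */
    if (isdigit((unsigned char)demo_input[0]) ||
        (demo_input[0] == '-' && isdigit((unsigned char)demo_input[1]))) {
        char *end;
        demo_lval = (int)strtol(demo_input, &end, 10);
        demo_input = end;
        return DEMO_INT;
    }

    /* . : any other character is returned as its own token */
    return (unsigned char)*demo_input++;
}
```

After demo_set_input("4+2*3\n"), successive calls to demo_lex() return DEMO_INT (with demo_lval set to 4), '+', DEMO_INT (2), '*', DEMO_INT (3) and then 0. This is the token stream the parser consumes through yylex().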
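File RealPlusTimes.l:

%{
#include <stdlib.h>              /* atof */
#include "RealPlusTimes.tab.h"   /* assumed name: the header written by bison -d RealPlusTimes.y; declares REAL and yylval */
%}
Real    -?([0-9]+|([0-9]*\.[0-9]+))
%%
{Real}      { yylval.dval = atof(yytext); return REAL; }
[ \t]       { /* skip blanks and tabs */ }
.           { return yytext[0]; }
\n          { return 0; }
%%

File RealPlusTimes.y:

%union { double dval; }
%token <dval> REAL
%type <dval> E T F
%%
S: E            { printf("%g\n", $1); }
 ;
E: T            { $$ = $1; }
 | T '+' E      { $$ = $1 + $3; }
 ;
T: F            { $$ = $1; }
 | F '*' T      { $$ = $1 * $3; }
 ;
F: REAL         { $$ = $1; }
 ;
%%
…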
More Information
Check the courseware web site.
Man pages
  – Log in and type: xman bison
Library
  – "lex & yacc", by John Levine et al.
  – "Principles of Compiler Design", by A.V. Aho and J.D. Ullman
  – "The Unix Programming Environment", by Kernighan & Pike (see chapter 8)
Preparation
Read
  – Chapter in the Text Book.