Syntax error handling
–Errors can occur at many levels
    lexical: unknown operator
    syntactic: unbalanced parentheses
    semantic: variable never declared
    runtime: reference a NULL pointer
–Goals of error handling in a parser
    To detect and report the presence of errors
    To recover from an error and detect subsequent errors
    To not slow down the processing of correct programs
Error recovery strategies
Panic mode recovery
–On discovering an error, discard input symbols one at a time until one of a designated set of synchronizing tokens is found.
Phrase-level recovery
–On discovering an error, perform a local fix on the remaining input (for example, replace, insert, or delete a token) to allow the parser to continue.
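The skipping step of panic mode recovery can be sketched in C as below. This is only an illustration under stated assumptions: the token codes, the choice of synchronizing set, and the function names panic_skip and is_sync are hypothetical and do not come from any parser generator.

```c
#include <stddef.h>

/* Hypothetical token codes, chosen only for this sketch. */
enum { TOK_ID = 1, TOK_PLUS, TOK_BAD, TOK_SEMI, TOK_RBRACE, TOK_EOF };

/* Return 1 if tok is one of the designated synchronizing tokens. */
static int is_sync(int tok, const int *sync, size_t nsync)
{
    for (size_t i = 0; i < nsync; i++)
        if (sync[i] == tok)
            return 1;
    return 0;
}

/*
 * Panic mode recovery: starting at position pos, discard input tokens one
 * at a time until a synchronizing token (or the end of input) is reached.
 * Returns the index at which parsing should resume.
 */
size_t panic_skip(const int *input, size_t n, size_t pos,
                  const int *sync, size_t nsync)
{
    while (pos < n && !is_sync(input[pos], sync, nsync))
        pos++;
    return pos;
}
```

A caller would typically pick statement terminators such as a semicolon or a closing brace as synchronizing tokens, report the error once, and resume parsing at the returned index.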
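Error recovery in predictive parsing
–Recovery in a non-recursive predictive parser is easier than in a recursive descent parser, since the parse stack is explicit and can be inspected and modified directly.
–Panic mode recovery
    If a terminal is on top of the stack and does not match the input, pop the terminal.
    If a nonterminal is on top of the stack, skip input symbols until one is found on which the nonterminal can be expanded.
–Phrase-level recovery
    Carefully fill in the blank entries of the parsing table with what to do for each error.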
–Error recovery in LR parsing
    Canonical LR parsers never make extra reductions before detecting an error.
    SLR and LALR parsers may make extra reductions, but will never shift an erroneous input symbol onto the stack.
    Panic mode recovery
    –Scan down the stack until a state representing a major program construct (a state with a goto on some nonterminal A) is found.
     Input symbols are discarded until one is found that is in FOLLOW(A).
     This tries to isolate the phrase containing the error.
    Phrase-level recovery
    –Implement an error recovery routine for each error entry in the table.
–Writing a parser with YACC (Yet Another Compiler Compiler)
    Generates LALR parsers.
    Works with lex: YACC calls yylex() to get the next token.
    –YACC and lex must agree on the values for each token.
    "yacc yaccfile" produces the file y.tab.c, which contains a routine yyparse().
    yyparse() returns 0 if the program is ok, non-zero otherwise.
    YACC file format:
        declarations
        %%
        translation rules
        %%
        supporting C-routines
The declarations part specifies tokens, nonterminal symbols, and other C constructs.
–To specify tokens AAA and BBB:
    %token AAA BBB
–To assign a token number to a token (needed when using lex), put a nonnegative integer immediately after the first appearance of the token:
    %token EOFnumber 0
    %token SEMInumber 101
–Nonterminals do not need to be declared unless you want to associate them with a type (discussed later).
Translation rules specify the grammar productions
    exp : exp PLUSnumber exp
        | exp MINUSnumber exp
        | exp TIMESnumber exp
        | exp DIVIDEnumber exp
        | LPARENnumber exp RPARENnumber
        | ICONSTnumber
        ;
Each alternative can also be written as a separate rule:
    exp : exp PLUSnumber exp ;
    exp : exp MINUSnumber exp ;
Yacc environment
–Yacc processes the specification file and produces a y.tab.c file.
–An integer function yyparse() is produced by Yacc.
    Calls yylex() to get tokens.
    Returns non-zero when an error is found.
    Returns 0 if the program is accepted.
–You need to supply main() and yyerror() functions.
–Example:
    #include <stdio.h>
    extern int yyline;   /* line counter, assumed to be kept by the lexer */

    /* Called by yyparse() when a syntax error is detected. */
    int yyerror(char *str)
    {
        printf("yyerror: %s at line %d\n", str, yyline);
        return 0;
    }

    int main(void)
    {
        if (!yyparse()) printf("accept\n");
        else printf("reject\n");
        return 0;
    }
–YACC builds an LALR parser for the grammar.
    May have shift/reduce and reduce/reduce conflicts if there are problems with the grammar.
    Default conflict resolution:
    –shift/reduce --> shift
    –reduce/reduce --> reduce by the production listed first in the grammar
    –reduce/reduce conflicts should always be avoided
    'yacc -v *.y' will generate a report in the file 'y.output'. See example1.y
    The programmer MUST resolve all conflicts (unless you really know what you are doing).
    –Modify the grammar. See example2.y
    –Use precedence and associativity of operators.
Use precedence and associativity of operators.
–Use the keywords %left, %right, %nonassoc in the declarations section.
    All tokens on the same line have the same precedence level and associativity.
    The lines are listed in order of increasing precedence.
        %left PLUSnumber, MINUSnumber
        %left TIMESnumber, DIVIDEnumber
–See example3.y
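How these declarations settle a shift/reduce conflict can be sketched as a small decision function. This is a simplified model, not Yacc's actual source: the names resolve_conflict, enum assoc, and enum action are made up for illustration, and it assumes both the production (whose precedence is normally that of its last terminal) and the lookahead token have a declared precedence.

```c
/* Associativity declared by %left, %right, or %nonassoc. */
enum assoc { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE };

/* Possible outcomes for a shift/reduce conflict. */
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/*
 * rule_prec:   precedence level of the production that could be reduced
 * token_prec:  precedence level of the lookahead token
 * token_assoc: associativity of the lookahead token's level
 * Larger numbers mean higher precedence (later %left/%right lines).
 */
enum action resolve_conflict(int rule_prec, int token_prec,
                             enum assoc token_assoc)
{
    if (token_prec > rule_prec)
        return ACT_SHIFT;      /* lookahead binds tighter: keep reading */
    if (token_prec < rule_prec)
        return ACT_REDUCE;     /* production binds tighter: reduce first */

    /* Same level: associativity decides. */
    switch (token_assoc) {
    case ASSOC_LEFT:
        return ACT_REDUCE;     /* a - b - c groups as (a - b) - c */
    case ASSOC_RIGHT:
        return ACT_SHIFT;      /* groups to the right instead */
    default:
        return ACT_ERROR;      /* %nonassoc: chaining is a syntax error */
    }
}
```

With PLUSnumber and MINUSnumber at level 1 and TIMESnumber and DIVIDEnumber at level 2, the parser shifts when it has seen exp PLUSnumber exp and the lookahead is TIMESnumber, and reduces when it has seen exp MINUSnumber exp and the lookahead is MINUSnumber.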
Symbol attributes
–Each symbol can be associated with some attributes.
    The data structure of the attributes can be specified in the union in the declarations (see example4.y).
        %union {
            int semantic_value;
        }
        %token <semantic_value> ICONSTnumber 119
        %type <semantic_value> exp
        %type <semantic_value> term
        %type <semantic_value> item
Semantic actions
–Semantic actions associated with productions can be specified.
    item : LPARENnumber exp RPARENnumber {$$ = $2;}
         | ICONSTnumber {$$ = $1;}
         ;
    $$ is the attribute associated with the left-hand side of the production.
    $1 is the attribute associated with the first symbol on the right-hand side, $2 with the second symbol, and so on.
–An action can appear anywhere in the production; it is also counted as a symbol.
–Check out example5.y for examples with multiple types associated with different symbols.
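One way to picture $$ and $k is as positions on a value stack that the parser keeps alongside its state stack. The sketch below is a simplified model of that idea, not the code Yacc generates: struct value_stack, vs_push, and reduce_copy are hypothetical names, attributes are plain ints, and there is no bounds checking.

```c
#define STACK_MAX 64

/* Value stack: one attribute per grammar symbol on the parse stack. */
struct value_stack {
    int vals[STACK_MAX];
    int top;               /* number of values currently stored */
};

/* Push one attribute value (no overflow check in this sketch). */
void vs_push(struct value_stack *s, int v)
{
    s->vals[s->top++] = v;
}

/*
 * Reduce by a production with rhs_len right-hand-side symbols whose
 * action is "$$ = $k". The value for $i sits rhs_len - i slots below
 * the top element of the stack.
 */
int reduce_copy(struct value_stack *s, int rhs_len, int k)
{
    int result = s->vals[s->top - rhs_len + (k - 1)];
    s->top -= rhs_len;     /* pop the right-hand-side values */
    vs_push(s, result);    /* push $$ for the left-hand side */
    return result;
}
```

For item : LPARENnumber exp RPARENnumber {$$ = $2;}, three values are on the stack at reduction time; reduce_copy(&s, 3, 2) replaces them with the single value that belonged to exp.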