Download presentation
Presentation is loading. Please wait.
1
CUP: An LALR Parser Generator for Java
Lecture 7 CUP: An LALR Parser Generator for Java
2
1. Introduction and Example
CUP: a system for generating LALR parsers from simple specifications. like yacc, but written in Java, uses specifications including embedded Java code, and produces parser code in Java.
3
Grammar File Format in summary
preamble declarations to specify how the parser is to be generated, and supply parts of the runtime code. package_decl import_decl user_code parser_code init_code scan_code Terminal and Nonterminal declarations (name and type) precedence and associativity of terminals grammar rules
4
An example CUP Grammar (1. preamble)
/* CUP specification for a simple expression evaluator (no actions) */ import java_cup.runtime.*; // Preliminaries to set up and use the scanner. init with {: scanner.init(); :}; scan with {: return scanner.next_token(); :};
5
2. Declaration of terminals and nonterminals
/* Terminals (tokens returned by the scanner). */ terminal PLUS, MINUS, TIMES, DIVIDE, MOD; terminal UMINUS, LPAREN, RPAREN; SEMI, terminal Integer NUMBER; /* Non terminals */ non terminal expr_list, expr_part; nonterminal Integer expr, term, factor; // no type ==> no value associated with the symbol
6
3. Precedences and association
precedence left PLUS, MINUS; precedence left TIMES, DIVIDE, MOD; precedence left UMINUS; // Rules: Terminals appearing at the same line has the same precedence. A < B iff the line A appears is above the line that B occurs possible associativity: left, right and nonassoc.
7
4. The production rules /* The grammar without actions*/
expr_list ::= expr_list expr_part | expr_part; expr_part ::= expr SEMI; expr ::= expr PLUS expr | expr MINUS expr | expr TIMES expr | expr DIVIDE expr | expr MOD expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ;
8
How to use CUP prepare a grammar file (say, parser.cup) invoke:
java java_cup.Main <parser.cup or java java_cup.Main parser.cup Then two files produced: sym.java // contains constant decls, one for each terminal (and nonterminal); used by scanner to refer to terminals. parser.java // implement the parser; containing two classes.
9
Grammar rules with action codes
/* The grammar */ expr_list ::= expr_list expr_part | expr_part; expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI ;
10
Grammar rules with action codes
expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} | expr:e1 MINUS expr:e2 {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} | expr:e1 TIMES expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} | expr:e1 DIVIDE expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} | expr:e1 MOD expr:e2 {: RESULT = new Integer(e1.intValue() % e2.intValue()); :} | NUMBER:n {: RESULT = n; :} | MINUS expr:e {: RESULT = new Integer(0 - e.intValue()); :} %prec UMINUS | LPAREN expr:e RPAREN {: RESULT = e; :} ;
11
Variables available on the action code
RESULT : bound to the value of lhs node D : name assigned to each symbol on the rhs. Dleft, Dright : int variables for left and right position of the phrase in the input stream. Example: expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer( e1.intValue() + e2.intValue()); :} // here e1left and e1rigth are both usable.
12
Relationship for related CUP classes
MyScanner sym Scanner $CUP$Parser$action Symbol lr_parser myParser myPackage (generated by CUP) java_cup.runtime
13
Typical client code /* create a parsing object */
parser parser_obj = new parser(new MyScanner( new FileReader(“myFileName”)); /* open input files, etc. here */ Symbol parse_tree = null; try { if (do_debug_parse) parse_tree = parser_obj.debug_parse(); else parse_tree = parser_obj.parse(); } catch (Exception e) { /* do cleanup here - - possibly rethrow e */ } finally { /* do close out here */ }
14
The java_cup.runtime.Symbol class
public class Symbol { public Symbol(int id, int l, int r, Object o) public Symbol(int id, Object o) public Symbol(int id, int l, int r) … public int sym; // kind of Symbol public int parse_state; // used while staying in stack boolean used_by_parser = false; public int left, right; // left right position in input stream public Object value; // filed for storing semantic value. public String toString(); }
15
The scanner interface package java_cup.runtime;
public interface Scanner { /** Return the next token, or <code>null</code> on end-of-file. */ public Symbol next_token() throws java.lang.Exception; }
16
java_cup.runtime.lr_parser
public abstract class lr_parser { public lr_parser() { } public lr_parser(Scanner s) { this(); setScanner(s) } private Scanner _scanner; public void setScanner(Scanner s) { _scanner = s; } public Scanner getScanner() { return _scanner; } public void user_init() throws java.lang.Exception {} // { %initWithCode } in subclass
17
public Symbol scan() throws java.lang.Exception {
Symbol sym = getScanner().next_token(); return (sym!=null) ? sym : new Symbol(EOF_sym()); } public void report_fatal_error(String message, Object info) throws java.lang.Exception … public Symbol parse() throws java.lang.Exception
18
actual code of parse() public Symbol parse() throws java.lang.Exception { /* the current action code */ int act; // the Symbol/stack element returned by a reduce Symbol lhs_sym = null; /* information about production being reduced with */ short handle_size, lhs_sym_num; /* set up direct reference to tables to drive the parser */ production_tab = production_table(); action_tab = action_table(); reduce_tab = reduce_table();
19
/* initialize the action encapsulation object */
init_actions(); // { $generatedByCUP } /* do user initialization */ user_init(); // { $initWithCode } /* get the first token */ cur_token = scan(); /* push dummy Symbol with start state to get us underway */ stack.removeAllElements(); stack.push(new Symbol(0, start_state())); tos = 0;
20
/* continue until we are told to stop */
for (_done_parsing = false; !_done_parsing ; ) { /* Check current token for freshness. */ if (cur_token.used_by_parser) throw new Error( … ); /* current state is always on the top of the stack */ /* look up action out of the current state with the current input */ act = get_action(((Symbol)stack.peek()).parse_state, cur_token.sym);
21
/* decode the action -- > 0 encodes shift */
if (act > 0) { /* shift */ cur_token.parse_state = act-1; cur_token.used_by_parser = true; stack.push(cur_token); tos++; /* advance to the next Symbol */ cur_token = scan(); } /* if its less than zero, then it encodes a reduce action */ else if (act < 0) { /* reduce, first arg is action number */
22
else if (act < 0) { /* reduce, first arg is action number */
lhs_sym = do_action((-act)-1, this, stack, tos); /* look up information about the production */ lhs_sym_num = production_tab[(-act)-1][0]; handle_size = production_tab[(-act)-1][1]; /* pop the handle off the stack */ for (int i = 0; i < handle_size; i++) { stack.pop(); tos--; } /* look up the state to go to from the one popped back to */ act = get_reduce(((Symbol)stack.peek()).parse_state, lhs_sym_num); /* shift to that state */ lhs_sym.parse_state = act;lhs_sym.used_by_parser = true; stack.push(lhs_sym); tos++; }
23
/* finally if the entry is zero, we have an error */
else if (act == 0) { /* call user syntax error reporting routine */ syntax_error(cur_token); /* try to error recover, arg is debug or not */ if (!error_recovery(false)) { /* if that fails give up with a fatal syntax error */ unrecovered_syntax_error(cur_token); /* just in case that wasn't fatal enough, end parse */ done_parsing(); } else { // continue to parse lhs_sym = (Symbol)stack.peek(); } } } return lhs_sym; }
24
Where the user codes are put in the generated class
public class parser extends lr_parser{ $parserCode public Symbol scan() { $scanWithCode } problic user_init() { $initWithCode } …} class lr_parser { public Symbol parse() { … user_init() ; // getFirstToken() … scan() … } class CUP$Parser$action { $actionCode … Symbol CUP$parser$do_action { … $actionCodeInRules …} }
25
Usage of various user code components
action code : // action code {: … :} declaration of additional variables used in grammar rules, e.g., symbol table,.. helper methods used in rules parser code // parser with {: … :} for customizing the generated parser e.g., new error_report methods, scan(), … user init code // init with {: … :} initialize your scanner, tables, data structures required for semantic actions scan code // scan with {: :} tell how to get a token from your scanner
26
Running CUP -package name // default is the empty package
-parser name // name.java is the generated file // default is parser. -symbols name // default is sym -interface Outputs the symbol constant code as an interface rather than as a class. -nonterms // include also noterminal constant symbols -expect number // of conflicts to abort parsing -compact_red // enable table compaction for reduction -nowarn // warnings suppressed
27
-nosummary -progress -dump_grammar -dump_states -dump_tables -dump -time -debug -nopositions // => elefr, eright cause error -noscanner // obsolete -version
28
Scanner Interface 1. Your scanner must implement the Scanner interface: package java_cup.runtime; public interface Scanner { public Symbol next_token() throws java.lang.Exception; } 2. lr_parser use scan() to fetch next token : public Symbol scan() throws java.lang.Exception { Symbol sym = getScanner().next_token(); return (sym!=null) ? sym : new Symbol(EOF_sym()); } You can override scan() in your parser code ( $scanWithCode )
29
3. The generated parser also contains a constructor :
public parser(Scanner) which calls setScanner() with it. In most cases, the init with and scan with directives may be omitted. You can simply create the parser with a reference to the desired scanner: parser parser_obj = new parser(new my_scanner()); or set the scanner after the parser is created: parser parser_obj = new parser(); parser_obj.setScanner(new my_scanner());
30
4. Integration with JLex Put the following three lines in the directives region. %implements java_cup.runtime.Scanner %function next_token %type java_cup.runtime.Symbol or simply: %cup Invoking the parser with the JLex scanner is then simply: parser parser_obj = new parser( new Yylex( some_InputStream_or_Reader)); Note CUP handles the JLex/JFlex convention of returning null on EOF without a problem, so an %eofval directive is not required in the JLex specification (added in CUP 0.10k).
31
Syntax Error Handling Programs may contain errors at different levels:
Lexical, such as misspelling an identifier, keyword, or operator. Syntactic, such as an arithmetic expression with unbalanced parentheses. Semantic, such as an operator applied to an incompatible operand. Logical, such as an infinitely recursive call. Goal of the error handler in a parser: Report the presence of errors clearly and accurately. Recover from each error quickly enough to be able to detect subsequent errors. Do not significantly slow down the processing of correct programs.
32
Error recovery strategies
Panic mode recovery. On discovering an error, discard input symbols one at a time until one of a designated set of synchronizing tokens (normally delimiters such as semicolon or keyword end) is found. Simple but may skip a considerable amount of input. Phrase level recovery. Perform a local correction on the remaining input: replace a prefix of the remaining input by some string that allows the parse to continue. For example, replace a comma by a semicolon, delete an extra semicolon, or insert a missing semicolon. First used in the top town parsing.
33
Error Recovery Strategies
Error productions. Use the language grammar augmented by some error production rules to detect common errors. These extra rules generate erroneous constructs. Global correction. Still perform a local correction but with global knowledge leading to smallest replacements.
34
Error recovery in LL(1) parsers
Use panic mode recovery. When an error occurs looking for nonterminal A, scan until an element of FOLLOW(A) is found, then pop A and continue. If we can't match a terminal on the top of stack: 1. pop the terminal 2. print a message saying the terminal was inserted 3. continue the parse
35
Error recovery in shift-reduce parsers
The problem encounter an invalid token bad pieces of tree hanging from stack incorrect entries in symbol table (type error… ) We want to parse the rest of the file Restarting the parser find a restartable state on the stack move to a consistent place in the input print an informative message (line number)
36
Error Recovery in CUP and YACC
Supports a special terminal symbol: error. need not be declared. can matches an erroneous input sequence. error show synchronization point When an error is discovered pops the stack until error transition is legal skips input tokens until it matches a certain number (3) of tokens error productions can have actions
37
Error recovery in yacc stmtList ::= stmt SEMI | stmtList stmt SEMI ;
can be augmented with error stmtList ::= stmt SEMI | stmtList stmt SEMI error SEMI ; This should throw out the erroneous statement synchronize at SEMI. invoke syntax_error()
38
Example production 2: stmt ::= expr SEMI | while_stmt SEMI | if_stmt SEMI | ... | error SEMI ; If error occurs 1. syntax_error() reported, and 2. [recovery:] throw out the erroneous expr (or while_statement or if_statement) statement 3. synchronize at SEMI. An error is considered to be recovered iff a sufficient number of tokens past the error symbol can be successfully parsed. (= error_sync_size() method of the parser and defaults to 3). Other natural places for errors missing parentheses or brackets extra operator or missing operator
39
a1 a2 a3 ; … Error Recovery in CUP S error . SEMI error+ SEMI
S . expr SEMI S . while_stmt SEMI S . if_stmt SEMI S . error SEMI S error SEMI . action(s,a1) is undefined Stack input s a1 a2 a3 ; … X pop() until error transition legal match error
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.