CS 3304 Comparative Languages

Lecture 8a: Using ANTLR 9 February 2012

Introduction
Discuss how to use ANTLR. Some material is taken from:
A book by Terence Parr, “The Definitive ANTLR Reference: Building Domain-Specific Languages”
An article by R. Mark Volkmann, “ANTLR 3”

Why ANTLR?
ANTLR is a parser generator used to implement language interpreters, compilers, and other translators. It is most often used to build translators and interpreters for domain-specific languages (DSLs). DSLs are usually very high-level languages designed for specific tasks and are particularly effective in a specific domain. Compared to a general-purpose language, a DSL provides a more natural, high-fidelity, robust, and maintainable means of encoding a problem.

Definitions
Lexer: converts a stream of characters to a stream of tokens (ANTLR token objects know their start/stop character stream index, line number, index within the line, and more).
Parser: processes a stream of tokens, possibly creating an AST.
Abstract Syntax Tree (AST): an intermediate tree representation of the parsed input that is simpler to process than the stream of tokens and can be efficiently processed multiple times.
Tree Parser: processes an AST.
StringTemplate: a library that supports using templates with placeholders for outputting text (e.g. Java source code).
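To make the lexer definition concrete, here is a small hand-written sketch in plain Java. It is not ANTLR's generated code or its token class: the names Tok and MiniLexer, the token type strings, and the choice to skip '\r' are all assumptions made for illustration. It shows a character stream turned into tokens that remember their start/stop index, line, and position within the line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token record for illustration; not ANTLR's token class.
final class Tok {
    final String type;  // "ID", "INT", "NEWLINE", or "OP"
    final String text;  // the matched characters
    final int start;    // index of the first character in the input
    final int stop;     // index of the last character in the input
    final int line;     // 1-based line number
    final int column;   // 0-based position within the line

    Tok(String type, String text, int start, int stop, int line, int column) {
        this.type = type;
        this.text = text;
        this.start = start;
        this.stop = stop;
        this.line = line;
        this.column = column;
    }

    @Override
    public String toString() {
        return type + "[" + start + ".." + stop + "] line " + line + ":" + column;
    }
}

final class MiniLexer {
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Converts a character stream into a list of tokens.
    static List<Tok> tokenize(String input) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0, line = 1, lineStart = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Assumption: blanks, tabs, and '\r' are skipped and produce no token.
            if (c == ' ' || c == '\t' || c == '\r') { i++; continue; }
            int start = i;
            String type;
            if (isLetter(c)) {
                while (i < input.length() && isLetter(input.charAt(i))) i++;
                type = "ID";
            } else if (isDigit(c)) {
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                type = "INT";
            } else if (c == '\n') {
                i++;
                type = "NEWLINE";
            } else {
                i++;            // any other single character, e.g. '=', '+', '('
                type = "OP";
            }
            tokens.add(new Tok(type, input.substring(start, i),
                               start, i - 1, line, start - lineStart));
            if (c == '\n') { line++; lineStart = i; }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Tok t : tokenize("x = 42\ny\n")) System.out.println(t);
    }
}
```

A parser would then consume this token list instead of raw characters, which is why the parser rules can ignore whitespace entirely.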

Overall ANTLR Flow
A translator maps each input sentence of a language to an output sentence. The overall translation problem consists of smaller problems mapped to well-defined translation phases (lexing, parsing, and tree parsing). The communication between phases uses well-defined data types and structures (characters, tokens, trees, and ancillary structures). Often the translation requires multiple passes, so an intermediate form is needed to pass the input between phases. An abstract syntax tree (AST) is a highly processed, condensed version of the input.

How to Write ANTLR Grammar I
Write the grammar using one or more files. A common approach is to use three grammar files, each focusing on a specific aspect of the processing:
The first is the lexer grammar, which creates tokens from text input.
The second is the parser grammar, which creates an AST from tokens.
The third is the tree parser grammar, which processes an AST.

How To Write ANTLR Grammar II
This results in three relatively simple grammar files as opposed to one complex grammar file.
Optionally write StringTemplate templates for producing output.
Debug the grammar using ANTLRWorks.
Generate classes from the grammar. These validate that text input conforms to the grammar and execute target language “actions” specified in the grammar.
Write an application that uses the generated classes.

ANTLR Grammar: Program.g
grammar Program;

program: statement+ ;

statement: expression NEWLINE
    | ID '=' expression NEWLINE
    | NEWLINE
    ;

expression: multiplicationExpression (('+'|'-') multiplicationExpression)* ;

multiplicationExpression: atom ('*' atom)* ;

atom: INT
    | ID
    | '(' expression ')'
    ;

ID: ('a'..'z'|'A'..'Z')+ ;
INT: '0'..'9'+ ;
NEWLINE: '\r'? '\n' ;
WS: (' '|'\t')+ {skip();} ;
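Two alternatives of the statement rule above can begin with an ID, so the first token alone does not decide between them. The following hand-written Java sketch shows the kind of lookahead decision involved; it is not ANTLR's generated prediction code, and the class name StatementPredictor and the string token types are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written illustration of choosing a 'statement' alternative by lookahead.
final class StatementPredictor {
    // Returns the alternative number of 'statement' to take:
    //   1: expression NEWLINE
    //   2: ID '=' expression NEWLINE
    //   3: NEWLINE
    // tokenTypes holds the upcoming token types, e.g. "ID", "INT", "=", "(".
    static int predict(List<String> tokenTypes) {
        String la1 = tokenTypes.size() > 0 ? tokenTypes.get(0) : "EOF";
        String la2 = tokenTypes.size() > 1 ? tokenTypes.get(1) : "EOF";
        if (la1.equals("NEWLINE")) return 3;
        // An ID followed by '=' can only start an assignment.
        if (la1.equals("ID") && la2.equals("=")) return 2;
        // Otherwise anything that can start an atom starts an expression.
        if (la1.equals("ID") || la1.equals("INT") || la1.equals("(")) return 1;
        throw new IllegalArgumentException("no viable alternative at " + la1);
    }

    public static void main(String[] args) {
        System.out.println(predict(Arrays.asList("ID", "=", "INT", "NEWLINE")));
        System.out.println(predict(Arrays.asList("ID", "+", "INT", "NEWLINE")));
    }
}
```

For this grammar two tokens of lookahead are enough; grammars whose alternatives share longer prefixes need more.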

Using ANTLR
In ANTLRWorks, select the Generate Code item from the Generate menu.
Three files are generated in the gencode subdirectory:
Program.tokens: the list of token-name, token-type assignments.
ProgramLexer.java: the lexer (scanner) generated from Program.g.
ProgramParser.java: the parser generated from Program.g.
Create a tester class (with main), e.g. RunProgram.java.
Compile and run:
    javac RunProgram.java ProgramParser.java ProgramLexer.java
    java RunProgram
Make sure that the ANTLR jar file is in your class path or included in your Java installation.
ProgramEvaluation.g adds evaluation statements (actions written in Java) to Program.g, turning it into an attribute grammar.

Main Class: RunProgram.java
import org.antlr.runtime.*;

public class RunProgram {
    public static void main(String[] args) throws Exception {
        // Chain the phases: characters -> lexer -> token stream -> parser.
        ProgramParser parser = new ProgramParser(
            new CommonTokenStream(
                new ProgramLexer(
                    new ANTLRInputStream(System.in))));
        // Start parsing at the top-level rule.
        parser.program();
    }
}

Evaluate: ProgramEvaluation.g I
grammar ProgramEvaluation;

@header {
    import java.util.HashMap;
}

@members {
    HashMap symbolTable = new HashMap();
}

program: statement+ ;

statement: expression NEWLINE {System.out.println($expression.value);}
    | ID '=' expression NEWLINE
        {symbolTable.put($ID.text, new Integer($expression.value));}
    | NEWLINE
    ;

Evaluate: ProgramEvaluation.g II
expression returns [int value]
    : e=multiplicationExpression {$value = $e.value;}
      ( '+' e=multiplicationExpression {$value += $e.value;}
      | '-' e=multiplicationExpression {$value -= $e.value;}
      )*
    ;

multiplicationExpression returns [int value]
    : e=atom {$value = $e.value;}
      ('*' e=atom {$value *= $e.value;})*
    ;

Evaluate: ProgramEvaluation.g III
atom returns [int value]
    : INT {$value = Integer.parseInt($INT.text);}
    | ID
        {
        Integer v = (Integer)symbolTable.get($ID.text);
        if ( v!=null ) $value = v.intValue();
        else System.err.println("undefined variable "+$ID.text);
        }
    | '(' expression ')' {$value = $expression.value;}
    ;

ID: ('a'..'z'|'A'..'Z')+ ;
INT: '0'..'9'+ ;
NEWLINE: '\r'? '\n' ;
// As in Program.g, WS skips only blanks and tabs; line ends belong to NEWLINE.
WS: (' '|'\t')+ {skip();} ;
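The way values flow through the rules of ProgramEvaluation.g can be mimicked by a hand-written recursive-descent evaluator, with one Java method per rule and the method's return value playing the role of `returns [int value]`. This is only a sketch for intuition, not the code ANTLR generates. The class name MiniEvaluator is hypothetical, it reads characters directly instead of tokens, it handles one statement per call, and (unlike the grammar) an assignment also returns its value and an undefined variable evaluates to 0.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written sketch mirroring the rules of ProgramEvaluation.g.
final class MiniEvaluator {
    private final Map<String, Integer> symbolTable = new HashMap<>();
    private String src;
    private int pos;

    // statement: expression | ID '=' expression   (one line per call)
    int statement(String line) {
        src = line;
        pos = 0;
        skipSpaces();
        int save = pos;
        if (pos < src.length() && isLetter(src.charAt(pos))) {
            String name = id();
            skipSpaces();
            if (peek('=')) {
                pos++;
                int value = finish(expression());
                symbolTable.put(name, value);
                return value;
            }
            pos = save;  // not an assignment: reparse as a plain expression
        }
        return finish(expression());
    }

    // expression: multiplicationExpression (('+'|'-') multiplicationExpression)*
    private int expression() {
        int value = multiplicationExpression();
        while (true) {
            skipSpaces();
            if (peek('+')) { pos++; value += multiplicationExpression(); }
            else if (peek('-')) { pos++; value -= multiplicationExpression(); }
            else return value;
        }
    }

    // multiplicationExpression: atom ('*' atom)*
    private int multiplicationExpression() {
        int value = atom();
        skipSpaces();
        while (peek('*')) { pos++; value *= atom(); skipSpaces(); }
        return value;
    }

    // atom: INT | ID | '(' expression ')'
    private int atom() {
        skipSpaces();
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of input");
        char c = src.charAt(pos);
        if (isDigit(c)) {
            int start = pos;
            while (pos < src.length() && isDigit(src.charAt(pos))) pos++;
            return Integer.parseInt(src.substring(start, pos));
        }
        if (isLetter(c)) {
            String name = id();
            Integer v = symbolTable.get(name);
            if (v != null) return v;
            System.err.println("undefined variable " + name);
            return 0;  // assumption: undefined variables evaluate to 0
        }
        if (c == '(') {
            pos++;
            int value = expression();
            skipSpaces();
            if (!peek(')')) throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return value;
        }
        throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
    }

    private int finish(int value) {
        skipSpaces();
        if (pos != src.length()) throw new IllegalArgumentException("unexpected input at " + pos);
        return value;
    }

    private String id() {
        int start = pos;
        while (pos < src.length() && isLetter(src.charAt(pos))) pos++;
        return src.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < src.length() && (src.charAt(pos) == ' ' || src.charAt(pos) == '\t')) pos++;
    }

    private boolean peek(char ch) { return pos < src.length() && src.charAt(pos) == ch; }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
}
```

Feeding it the statements `x = 4`, `y = 3`, `z = x - y`, and then `z` one at a time yields 1 for the last call, since the symbol table persists across calls just as the grammar's symbolTable member does.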

Tree Grammar
A parse tree represents the sequence of rule invocations used to match an input stream. An abstract syntax tree (AST) is an intermediate representation: a tree of some flavor that records not only the input symbols but also the relationships between those symbols as dictated by the grammatical structure. All nodes in the AST are input symbol nodes.
Example: [figure: AST diagram; only the nodes "*" and "5" are legible]

Building AST with Grammars
Add AST construction rules to the parser grammar that indicate what tree shape you want to build. ANTLR will build a tree node for every token matched on the input stream:
options {
    output=AST;
    ASTLabelType=CommonTree;
}
To specify a tree structure, annotate tokens in the grammar with one of two operators:
! marks tokens that should be excluded from the tree.
^ marks tokens that should be considered operators (subtree roots).
Rewrite rules: -> introduces the tree rewrite syntax.

Rewrite Rules
The rewrite rule for assignment makes a tree with the operator at the root, the identifier as the first child, and the expression as the second child:
statement: expression NEWLINE -> expression
    | ID '=' expression NEWLINE -> ^('=' ID expression)
    | NEWLINE ->
    ;
The symbol -> begins each rewrite rule. Rewrite rules for AST construction are parser-grammar-to-tree-grammar mappings.
When an error occurs within a rule, ANTLR catches the exception, reports the error, attempts to recover (possibly by consuming more tokens), and then returns from the rule.
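To visualize the tree shape that a rewrite such as ^('=' ID expression) describes, the sketch below builds equivalent trees by hand in plain Java and prints them in the parenthesized form "(root child child)". The Ast and RewriteDemo classes and their methods are hypothetical stand-ins written for this illustration; they are not ANTLR's CommonTree.

```java
import java.util.Arrays;
import java.util.List;

// Minimal tree node for illustration; not ANTLR's CommonTree.
final class Ast {
    final String text;
    final List<Ast> children;

    Ast(String text, Ast... children) {
        this.text = text;
        this.children = Arrays.asList(children);
    }

    // Leaves print as their text; inner nodes as "(root child child ...)".
    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (Ast child : children) sb.append(' ').append(child.toStringTree());
        return sb.append(')').toString();
    }
}

final class RewriteDemo {
    // Mirrors: ID '=' expression NEWLINE -> ^('=' ID expression)
    // The NEWLINE token is matched by the parser but left out of the tree.
    static Ast assignment(String id, Ast expression) {
        return new Ast("=", new Ast(id), expression);
    }

    // Mirrors an operator token used as a subtree root.
    static Ast binary(String op, Ast left, Ast right) {
        return new Ast(op, left, right);
    }

    public static void main(String[] args) {
        // Tree for: w = z * (x + y) - 10
        Ast sum = binary("+", new Ast("x"), new Ast("y"));
        Ast product = binary("*", new Ast("z"), sum);
        Ast tree = assignment("w", binary("-", product, new Ast("10")));
        System.out.println(tree.toStringTree());  // (= w (- (* z (+ x y)) 10))
    }
}
```

Note that the parentheses of the source expression do not appear as nodes: grouping is encoded by the nesting of the tree itself.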

Tree Parser: ProgramTree.g I
grammar ProgramTree;

options {
    output=AST;
    ASTLabelType=CommonTree;
}

program: ( statement {System.out.println($statement.tree.toStringTree());} )+ ;

statement: expression NEWLINE -> expression
    | ID '=' expression NEWLINE -> ^('=' ID expression)
    | NEWLINE ->
    ;

Tree Parser: ProgramTree.g II
expression: multiplicationExpression (('+'^|'-'^) multiplicationExpression)* ;

multiplicationExpression: atom ('*'^ atom)* ;

atom: INT
    | ID
    | '('! expression ')'!
    ;

ID: ('a'..'z'|'A'..'Z')+ ;
INT: '0'..'9'+ ;
NEWLINE: '\r'? '\n' ;
WS: (' '|'\t')+ {skip();} ;

Building Tree Grammars
The ANTLR notation for a tree grammar is identical to the notation for a regular grammar except for the introduction of a two-dimensional tree construct. Make a tree grammar by copying the parser grammar and removing the recognition elements to the left of the -> operator, leaving the AST rewrite fragments.
ANTLR grammar file:
tree grammar ProgramWalker;
options {
    tokenVocab=ProgramTree;
    ASTLabelType=CommonTree;
}

Tree Parser: ProgramWalker.g I
tree grammar ProgramWalker;

options {
    tokenVocab=ProgramTree;
    ASTLabelType=CommonTree;
}

@header {
    import java.util.HashMap;
}

@members {
    HashMap symbolTable = new HashMap();
}

program: statement+ ;

Tree Parser: ProgramWalker.g II
statement: expression {System.out.println($expression.value);}
    | ^('=' ID expression)
        {symbolTable.put($ID.text, new Integer($expression.value));}
    ;

expression returns [int value]
    : ^('+' a=expression b=expression) {$value = a+b;}
    | ^('-' a=expression b=expression) {$value = a-b;}
    | ^('*' a=expression b=expression) {$value = a*b;}
    | ID
        {
        Integer v = (Integer)symbolTable.get($ID.text);
        if ( v!=null ) $value = v.intValue();
        else System.err.println("undefined variable "+$ID.text);
        }
    | INT {$value = Integer.parseInt($INT.text);}
    ;
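What the tree grammar's actions compute can be sketched as an ordinary recursive walk over an AST. The Java below is a hand-written approximation for intuition only, not the walker ANTLR generates: the Node and MiniWalker classes are hypothetical, nodes are plain strings rather than typed tokens, and an undefined variable is assumed to evaluate to 0.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal tree node for illustration; not ANTLR's CommonTree.
final class Node {
    final String text;
    final List<Node> children;

    Node(String text, Node... children) {
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// Hand-written sketch mirroring the rules of ProgramWalker.g.
final class MiniWalker {
    private final Map<String, Integer> symbolTable = new HashMap<>();

    // statement: expression | ^('=' ID expression)
    // Returns the printed value for an expression statement, null for an assignment.
    Integer statement(Node t) {
        if (t.text.equals("=") && t.children.size() == 2) {
            symbolTable.put(t.children.get(0).text, expression(t.children.get(1)));
            return null;
        }
        int value = expression(t);
        System.out.println(value);
        return value;
    }

    // expression: ^('+' a b) | ^('-' a b) | ^('*' a b) | ID | INT
    int expression(Node t) {
        if (t.children.size() == 2) {
            int a = expression(t.children.get(0));
            int b = expression(t.children.get(1));
            switch (t.text) {
                case "+": return a + b;
                case "-": return a - b;
                case "*": return a * b;
                default: throw new IllegalArgumentException("unknown operator " + t.text);
            }
        }
        char first = t.text.charAt(0);
        if (first >= '0' && first <= '9') return Integer.parseInt(t.text);  // INT
        Integer v = symbolTable.get(t.text);                                // ID
        if (v != null) return v;
        System.err.println("undefined variable " + t.text);
        return 0;  // assumption: undefined variables evaluate to 0
    }

    public static void main(String[] args) {
        MiniWalker walker = new MiniWalker();
        walker.statement(new Node("=", new Node("x"), new Node("4")));
        walker.statement(new Node("=", new Node("y"), new Node("3")));
        walker.statement(new Node("=", new Node("z"),
                new Node("-", new Node("x"), new Node("y"))));
        walker.statement(new Node("z"));  // prints 1
    }
}
```

Because the walker sees only the tree, it needs no knowledge of parentheses, newlines, or whitespace; those were dealt with by the earlier phases.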

Main Program
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class RunProgramWalker {
    public static void main(String[] args) throws Exception {
        // Phase 1: lex and parse the input, building an AST.
        ProgramTreeParser parser = new ProgramTreeParser(
            new CommonTokenStream(
                new ProgramTreeLexer(
                    new ANTLRInputStream(System.in))));
        ProgramTreeParser.program_return r = parser.program();
        // Phase 2: walk the AST with the tree parser to evaluate it.
        ProgramWalker walker = new ProgramWalker(
            new CommonTreeNodeStream((CommonTree)r.getTree()));
        walker.program();
    }
}

Program Output
Input:
x = 4
y = 3
z = x - y
z
w = z * (x + y) - 10
w
Output:
(= x 4)
(= y 3)
(= z (- x y))
(= w (- (* z (+ x y)) 10))
1
-3

Summary
ANTLR is a free, open source parser generator tool.
ANTLR supports infinite lookahead for selecting the rule alternative that matches the portion of the input stream being evaluated, i.e. ANTLR supports LL(*). Check online documentation at: e
