CS 3304 Comparative Languages


CS 3304 Comparative Languages Lecture 8a: Using ANTLR 9 February 2012

Introduction Discuss how to use ANTLR. Some material is taken from: a book by Terence Parr, “The Definitive ANTLR Reference: Building Domain-Specific Languages” (http://www.pragprog.com/titles/tpantlr/); and an article by R. Mark Volkmann, “ANTLR 3” (http://jnb.ociweb.com/jnb/jnbJun2008.htm).

Why ANTLR? ANTLR is a parser generator used to implement language interpreters, compilers, and other translators. Most often used to build translators and interpreters for domain-specific languages (DSLs). DSLs are usually very high-level languages used for specific tasks and particularly effective in a specific domain. DSLs provide a more natural, high-fidelity, robust, and maintainable means of encoding a problem compared to a general-purpose language.

Definitions Lexer: converts a stream of characters to a stream of tokens (ANTLR token objects know their start/stop character stream index, line number, index within the line, and more). Parser: processes a stream of tokens, possibly creating an AST. Abstract Syntax Tree (AST): an intermediate tree representation of the parsed input that is simpler to process than the stream of tokens and can be efficiently processed multiple times. Tree Parser: processes an AST. StringTemplate: a library that supports using templates with placeholders for outputting text (e.g., Java source code).
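To make the lexer definition concrete, here is a minimal hand-written sketch in plain Java. It is not ANTLR-generated code and does not use the ANTLR runtime; the names SketchLexer, Token, and tokenize are hypothetical, and the token rules are assumed to be the ID, INT, NEWLINE, and WS rules of the Program.g grammar used in this lecture. Each token records its start index, line, and position within the line, as the definition above describes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written illustration of a lexer; this is not ANTLR-generated code. */
final class SketchLexer {
    enum Type { ID, INT, NEWLINE, SYMBOL }

    /** A token knows its type, its text, and where it came from. */
    static final class Token {
        final Type type;
        final String text;
        final int start;   // index into the character stream
        final int line;    // 1-based line number
        final int column;  // 0-based index within the line

        Token(Type type, String text, int start, int line, int column) {
            this.type = type;
            this.text = text;
            this.start = start;
            this.line = line;
            this.column = column;
        }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0, line = 1, lineStart = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (c == ' ' || c == '\t') {            // WS: matched, then skipped
                i++;
                continue;
            }
            Type type;
            if (c == '\n' || (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n')) {
                i += (c == '\r') ? 2 : 1;           // NEWLINE: '\r'? '\n'
                type = Type.NEWLINE;
            } else if (isLetter(c)) {               // ID: one or more letters
                while (i < input.length() && isLetter(input.charAt(i))) i++;
                type = Type.ID;
            } else if (isDigit(c)) {                // INT: one or more digits
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                type = Type.INT;
            } else if ("+-*=()".indexOf(c) >= 0) {  // literal tokens such as '+'
                i++;
                type = Type.SYMBOL;
            } else {
                throw new IllegalArgumentException("line " + line + ": unexpected character '" + c + "'");
            }
            tokens.add(new Token(type, input.substring(start, i), start, line, start - lineStart));
            if (type == Type.NEWLINE) {             // the next token starts a new line
                line++;
                lineStart = i;
            }
        }
        return tokens;
    }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    private static boolean isDigit(char c)  { return c >= '0' && c <= '9'; }
}
```

For example, SketchLexer.tokenize("x = 42\ny") yields five tokens: ID x, SYMBOL =, INT 42, NEWLINE, and ID y, the last one on line 2 at position 0.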

Overall ANTLR Flow A translator maps each input sentence of a language to an output sentence. The overall translation problem consists of smaller problems mapped to well-defined translation phases (lexing, parsing, and tree parsing). The communication between phases uses well-defined data types and structures (characters, tokens, trees, and ancillary structures). Often the translation requires multiple passes, so an intermediate form is needed to pass the input between phases. The Abstract Syntax Tree (AST) is such a form: a highly processed, condensed version of the input.

How to Write ANTLR Grammar I Write the grammar using one or more files. A common approach is to use three grammar files, each focusing on a specific aspect of the processing: The first is the lexer grammar, which creates tokens from text input. The second is the parser grammar, which creates an AST from tokens. The third is the tree parser grammar, which processes an AST.

How to Write ANTLR Grammar II This results in three relatively simple grammar files as opposed to one complex grammar file. Optionally write StringTemplate templates for producing output. Debug the grammar using ANTLRWorks. Generate classes from the grammar. These validate that text input conforms to the grammar and execute target language “actions” specified in the grammar. Write an application that uses the generated classes.

ANTLR Grammar: Program.g

    grammar Program;

    program: statement+ ;

    statement
        : expression NEWLINE
        | ID '=' expression NEWLINE
        | NEWLINE
        ;

    expression
        : multiplicationExpression (('+'|'-') multiplicationExpression)*
        ;

    multiplicationExpression
        : atom ('*' atom)*
        ;

    atom
        : INT
        | ID
        | '(' expression ')'
        ;

    ID: ('a'..'z'|'A'..'Z')+ ;
    INT: '0'..'9'+ ;
    NEWLINE: '\r'? '\n' ;
    WS: (' '|'\t')+ {skip();} ;

Using ANTLR

In ANTLRWorks, select the Generate Code item from the Generate menu. Three files are generated in the gencode subdirectory:

    Program.tokens: the list of token-name, token-type assignments.
    ProgramLexer.java: the lexer (scanner) generated from Program.g.
    ProgramParser.java: the parser generated from Program.g.

Create a tester class (with main), e.g. RunProgram.java. Compile and run:

    javac RunProgram.java ProgramParser.java ProgramLexer.java
    java RunProgram

Make sure that the ANTLR jar file is in your class path or included in your Java installation. ProgramEvaluation.g adds evaluation statements (in Java) to Program.g, turning it into an attribute grammar.

Main Class: RunProgram.java

    import org.antlr.runtime.*;

    public class RunProgram {
        public static void main(String[] args) throws Exception {
            // Chain the phases: characters -> lexer -> token stream -> parser.
            ProgramParser parser = new ProgramParser(
                new CommonTokenStream(
                    new ProgramLexer(
                        new ANTLRInputStream(System.in))));
            // Start parsing at the grammar's start rule.
            parser.program();
        }
    }

Evaluate: ProgramEvaluation.g I

    grammar ProgramEvaluation;

    @header {
    import java.util.HashMap;
    }

    @members {
    HashMap symbolTable = new HashMap();
    }

    program: statement+ ;

    statement
        : expression NEWLINE
          {System.out.println($expression.value);}
        | ID '=' expression NEWLINE
          {symbolTable.put($ID.text, new Integer($expression.value));}
        | NEWLINE
        ;

Evaluate: ProgramEvaluation.g II

    expression returns [int value]
        : e=multiplicationExpression {$value = $e.value;}
          ( '+' e=multiplicationExpression {$value += $e.value;}
          | '-' e=multiplicationExpression {$value -= $e.value;}
          )*
        ;

    multiplicationExpression returns [int value]
        : e=atom {$value = $e.value;}
          ( '*' e=atom {$value *= $e.value;} )*
        ;

Evaluate: ProgramEvaluation.g III

    atom returns [int value]
        : INT {$value = Integer.parseInt($INT.text);}
        | ID
          {
            Integer v = (Integer)symbolTable.get($ID.text);
            if ( v!=null ) $value = v.intValue();
            else System.err.println("undefined variable "+$ID.text);
          }
        | '(' expression ')' {$value = $expression.value;}
        ;

    ID: ('a'..'z'|'A'..'Z')+ ;
    INT: '0'..'9'+ ;
    NEWLINE: '\r'? '\n' ;
    WS: (' '|'\t'|'\n'|'\r')+ {skip();} ;
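The attribute grammar above can be read as a recipe for a recursive-descent evaluator: each rule becomes a method, and each "returns [int value]" becomes that method's return value. The following plain-Java sketch mirrors that structure by hand. It is an illustration only, not the code ANTLR generates; the class name SketchEvaluator, its helper methods, and the simplified built-in tokenizer are all hypothetical. Instead of printing, run returns the list of values that the grammar's println actions would print.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hand-written recursive-descent evaluator: one method per grammar rule. */
final class SketchEvaluator {
    private final Map<String, Integer> symbolTable = new HashMap<>();
    private final List<Integer> printed = new ArrayList<>();
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    /** program: statement+ ; returns the values the grammar actions would print. */
    static List<Integer> run(String input) {
        SketchEvaluator e = new SketchEvaluator();
        e.split(input);
        while (e.pos < e.tokens.size()) e.statement();
        return e.printed;
    }

    private void statement() {
        if (peek(0).equals("\n")) {                      // | NEWLINE
            pos++;
            return;
        }
        if (isId(peek(0)) && peek(1).equals("=")) {      // | ID '=' expression NEWLINE
            String id = peek(0);
            pos += 2;
            symbolTable.put(id, expression());
        } else {                                         // expression NEWLINE
            printed.add(expression());
        }
        expect("\n");
    }

    private int expression() {
        int value = multiplicationExpression();          // {$value = $e.value;}
        while (peek(0).equals("+") || peek(0).equals("-")) {
            boolean plus = peek(0).equals("+");
            pos++;
            int rhs = multiplicationExpression();
            value = plus ? value + rhs : value - rhs;    // {$value += ...} or {$value -= ...}
        }
        return value;
    }

    private int multiplicationExpression() {
        int value = atom();
        while (peek(0).equals("*")) {
            pos++;
            value *= atom();
        }
        return value;
    }

    private int atom() {
        String t = peek(0);
        if (t.equals("(")) {
            pos++;
            int value = expression();
            expect(")");
            return value;
        }
        if (isId(t)) {
            pos++;
            Integer v = symbolTable.get(t);
            if (v == null) System.err.println("undefined variable " + t);
            return v == null ? 0 : v;                    // like the grammar, default to 0
        }
        if (!t.isEmpty() && Character.isDigit(t.charAt(0))) {
            pos++;
            return Integer.parseInt(t);
        }
        throw new IllegalStateException("unexpected token: " + t);
    }

    /** Look k tokens ahead; the empty string stands for end of input. */
    private String peek(int k) { return pos + k < tokens.size() ? tokens.get(pos + k) : ""; }

    private void expect(String t) {
        if (!peek(0).equals(t)) throw new IllegalStateException("unexpected token: " + peek(0));
        pos++;
    }

    private static boolean isId(String t) { return !t.isEmpty() && Character.isLetter(t.charAt(0)); }

    /** Simplified tokenizer: letter runs, digit runs, single symbols; blanks and '\r' are dropped. */
    private void split(String input) {
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r') { i++; continue; }
            int start = i++;
            if (Character.isLetter(c)) {
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
            }
            tokens.add(input.substring(start, i));
        }
    }
}
```

Note that statement needs two tokens of lookahead (an ID followed by '=') to choose between the assignment and expression alternatives; a generated parser makes the same decision from its lookahead analysis.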

Tree Grammar A parse tree represents the sequence of rule invocations used to match an input stream. An Abstract Syntax Tree (AST) is an intermediate representation: a tree that records not only the input symbols but also the relationships between those symbols as dictated by the grammatical structure. All nodes in the AST are input symbol nodes. Example: 3 + 4 * 5 becomes the tree (+ 3 (* 4 5)), with + at the root and the * subtree as its second child.
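As a small illustration of the AST for 3 + 4 * 5, the sketch below defines a minimal tree node in plain Java and prints it in the same prefix notation that the program output later in this lecture uses. The class SketchTree is hypothetical and far simpler than ANTLR's own CommonTree; it only shows how operator nodes become subtree roots.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal AST node for illustration; ANTLR's own CommonTree is richer. */
final class SketchTree {
    final String text;
    final List<SketchTree> children = new ArrayList<>();

    SketchTree(String text, SketchTree... kids) {
        this.text = text;
        children.addAll(Arrays.asList(kids));
    }

    /** Prefix rendering: a leaf prints its text, an inner node prints (root child1 child2 ...). */
    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (SketchTree child : children) sb.append(' ').append(child.toStringTree());
        return sb.append(')').toString();
    }

    /** The AST for 3 + 4 * 5: '*' binds tighter, so it sits below '+'. */
    static SketchTree example() {
        return new SketchTree("+",
            new SketchTree("3"),
            new SketchTree("*", new SketchTree("4"), new SketchTree("5")));
    }
}
```

Calling SketchTree.example().toStringTree() returns the string (+ 3 (* 4 5)).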

Building AST with Grammars Add AST construction rules to the parser grammar that indicate what tree shape you want to build. With the following options set, ANTLR will build a tree node for every token matched on the input stream: options { output=AST; ASTLabelType=CommonTree; } To specify a tree structure, annotate tokens with one of two suffix operators: ! marks tokens that should be excluded from the tree; ^ marks tokens that should be considered operators (subtree roots). Alternatively, use rewrite rules: -> introduces the tree rewrite syntax.

Rewrite Rules

The rewrite rule for an assignment makes a tree with the operator at the root, the identifier as the first child, and the expression as the second child:

    statement
        : expression NEWLINE -> expression
        | ID '=' expression NEWLINE -> ^('=' ID expression)
        | NEWLINE ->
        ;

The symbol -> begins each rewrite rule. Rewrite rules for AST construction are parser-grammar-to-tree-grammar mappings. When an error occurs within a rule, ANTLR catches the exception, reports the error, attempts to recover (possibly by consuming more tokens), and then returns from the rule.

Tree Parser: ProgramTree.g I

    grammar ProgramTree;

    options {
        output=AST;
        ASTLabelType=CommonTree;
    }

    program
        : ( statement {System.out.println($statement.tree.toStringTree());} )+
        ;

    statement
        : expression NEWLINE -> expression
        | ID '=' expression NEWLINE -> ^('=' ID expression)
        | NEWLINE ->
        ;

Tree Parser: ProgramTree.g II

    expression
        : multiplicationExpression (('+'^|'-'^) multiplicationExpression)*
        ;

    multiplicationExpression
        : atom ('*'^ atom)*
        ;

    atom
        : INT
        | ID
        | '('! expression ')'!
        ;

    ID : ('a'..'z'|'A'..'Z')+ ;
    INT : '0'..'9'+ ;
    NEWLINE:'\r'? '\n' ;
    WS : (' '|'\t')+ {skip();} ;
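The effect of the ^ and ! operators in the expression rules can be imitated by hand. The sketch below is a plain-Java recursive-descent parser, written for illustration and not generated by ANTLR: wherever the grammar marks an operator with ^ the code makes it the root of a new subtree, and wherever the grammar marks a parenthesis with ! the code matches the token but adds nothing to the tree. The names SketchAstBuilder, Node, and parseExpression are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written sketch of AST construction for the expression rules. */
final class SketchAstBuilder {
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }

        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node child : children) sb.append(' ').append(child.toStringTree());
            return sb.append(')').toString();
        }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    static Node parseExpression(String input) {
        SketchAstBuilder p = new SketchAstBuilder();
        p.split(input);
        Node root = p.expression();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("unexpected token: " + p.peek());
        return root;
    }

    // expression: multiplicationExpression (('+'^|'-'^) multiplicationExpression)* ;
    private Node expression() {
        Node left = multiplicationExpression();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            Node right = multiplicationExpression();
            left = makeRoot(op, left, right);   // ^ : the operator becomes the new root
        }
        return left;
    }

    // multiplicationExpression: atom ('*'^ atom)* ;
    private Node multiplicationExpression() {
        Node left = atom();
        while (peek().equals("*")) {
            String op = tokens.get(pos++);
            Node right = atom();
            left = makeRoot(op, left, right);
        }
        return left;
    }

    // atom: INT | ID | '('! expression ')'! ;
    private Node atom() {
        String t = peek();
        if (t.equals("(")) {
            pos++;                              // '('! : matched, but left out of the tree
            Node inner = expression();
            if (!peek().equals(")")) throw new IllegalArgumentException("expected )");
            pos++;                              // ')'! : likewise
            return inner;
        }
        if (t.isEmpty() || !Character.isLetterOrDigit(t.charAt(0))) {
            throw new IllegalArgumentException("unexpected token: " + t);
        }
        pos++;
        return new Node(t);                     // INT or ID becomes a leaf
    }

    private static Node makeRoot(String op, Node left, Node right) {
        Node root = new Node(op);
        root.children.add(left);
        root.children.add(right);
        return root;
    }

    /** Current token, or the empty string at end of input. */
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    /** Simplified tokenizer: letter runs, digit runs, single symbols; blanks are dropped. */
    private void split(String input) {
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') { i++; continue; }
            int start = i++;
            if (Character.isLetter(c)) {
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
            }
            tokens.add(input.substring(start, i));
        }
    }
}
```

Because each new operator takes the tree built so far as its first child, the operators come out left-associative: parseExpression("1 - 2 - 3") produces (- (- 1 2) 3).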

Building Tree Grammars The ANTLR notation for a tree grammar is identical to the notation for a regular grammar except for the introduction of a two-dimensional tree construct. Make a tree grammar by copying the parser grammar and removing the recognition grammar elements to the left of each -> operator, leaving the AST rewrite fragments. The ANTLR grammar file begins: tree grammar ProgramWalker; options { tokenVocab=ProgramTree; ASTLabelType=CommonTree; }

Tree Parser: ProgramWalker.g I

    tree grammar ProgramWalker;

    options {
        tokenVocab=ProgramTree;
        ASTLabelType=CommonTree;
    }

    @header {
    import java.util.HashMap;
    }

    @members {
    HashMap symbolTable = new HashMap();
    }

    program: statement+ ;

Tree Parser: ProgramWalker.g II

    statement
        : expression
          {System.out.println($expression.value);}
        | ^('=' ID expression)
          {symbolTable.put($ID.text, new Integer($expression.value));}
        ;

    expression returns [int value]
        : ^('+' a=expression b=expression) {$value = a+b;}
        | ^('-' a=expression b=expression) {$value = a-b;}
        | ^('*' a=expression b=expression) {$value = a*b;}
        | ID
          {
            Integer v = (Integer)symbolTable.get($ID.text);
            if ( v!=null ) $value = v.intValue();
            else System.err.println("undefined variable "+$ID.text);
          }
        | INT {$value = Integer.parseInt($INT.text);}
        ;
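The tree grammar above matches tree shapes rather than token sequences. A hand-written equivalent is a recursive walk over the AST, sketched below in plain Java. This is an illustration under assumptions, not ANTLR output: SketchWalker and its Node class are hypothetical, and statement returns the value that the grammar's action would print (or null for an assignment) instead of printing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hand-written AST walker mirroring the statement and expression tree rules. */
final class SketchWalker {
    static final class Node {
        final String text;
        final List<Node> children;
        Node(String text, Node... kids) {
            this.text = text;
            this.children = new ArrayList<>(Arrays.asList(kids));
        }
    }

    private final Map<String, Integer> symbolTable = new HashMap<>();

    /** statement: expression | ^('=' ID expression) ; null means nothing would be printed. */
    Integer statement(Node n) {
        if (n.text.equals("=")) {
            String id = n.children.get(0).text;
            symbolTable.put(id, expression(n.children.get(1)));
            return null;
        }
        return expression(n);
    }

    /** expression: ^('+' a b) | ^('-' a b) | ^('*' a b) | ID | INT ; */
    int expression(Node n) {
        if (n.children.size() == 2) {
            int a = expression(n.children.get(0));
            int b = expression(n.children.get(1));
            switch (n.text) {
                case "+": return a + b;
                case "-": return a - b;
                case "*": return a * b;
                default: throw new IllegalArgumentException("unknown operator: " + n.text);
            }
        }
        if (Character.isLetter(n.text.charAt(0))) {      // ID: look it up
            Integer v = symbolTable.get(n.text);
            if (v == null) System.err.println("undefined variable " + n.text);
            return v == null ? 0 : v;
        }
        return Integer.parseInt(n.text);                 // INT
    }
}
```

Walking the trees (= x 4), (= y 3), (= z (- x y)) and then the leaf z returns 1 for the last statement, matching the first value in the program output.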

Main Program

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;

    public class RunProgramWalker {
        public static void main(String[] args) throws Exception {
            // Phase 1: lex and parse the input, building an AST.
            ProgramTreeParser parser = new ProgramTreeParser(
                new CommonTokenStream(
                    new ProgramTreeLexer(
                        new ANTLRInputStream(System.in))));
            ProgramTreeParser.program_return r = parser.program();

            // Phase 2: walk the AST with the tree parser to evaluate it.
            ProgramWalker walker = new ProgramWalker(
                new CommonTreeNodeStream((CommonTree)r.getTree()));
            walker.program();
        }
    }

Program Output

Input:

    x = 4
    y = 3
    z = x - y
    z
    w = z * (x + y) - 10
    w

Output:

    (= x 4)
    (= y 3)
    (= z (- x y))
    (= w (- (* z (+ x y)) 10))
    1
    -3

Summary ANTLR is a free, open source parser generator tool. ANTLR supports infinite lookahead for selecting the rule alternative that matches the portion of the input stream being evaluated, i.e. ANTLR supports LL(*). Check the online documentation at: http://www.antlr.org/ and http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home