An Introduction To Antlr


Content
What is Antlr?
Why use Antlr?
How to use Antlr?
Components of an Antlr grammar file
Writing Lexer Class
Writing Parser Class
What does Antlr generate?
Predicates
Automatic Parse Tree generation
Tree Parsing
Conclusion

What is Antlr?
ANTLR (ANother Tool for Language Recognition) is a pred-LL(k) parser and translator generator.
It generates compiler front ends and source-to-source translators from grammatical descriptions, with output in Java, C++, Python, or C#.

Why Antlr?
ANTLR grammars are written in EBNF for LL(k) parsing, which is often handier than writing LR grammars.
The generated code is much more readable than that of other LR/LL parser generators, which makes debugging much easier.
The generated parser is re-entrant.
Re-usability.
ANTLR can output multiple languages.
Lower memory requirement, as it does not simulate a pushdown automaton the way LALR tools (yacc/bison) do.

Why Antlr? (contd.)
ANTLR generates code in object-oriented languages, so you can inherit the basic functionality and add your own.
ANTLR supports exception handling, which makes error recovery easy.
The same meta-language is used to specify the lexer, the parser, and the tree parser.
ANTLR can build an AST from the input token stream.

How to use Antlr?
calc.g is my ANTLR grammar file, containing both the lexer and the parser.
It contains the parser, named CalcParser:
    class CalcParser extends Parser;
It contains the lexer, named CalcLexer:
    class CalcLexer extends Lexer;
Invoke ANTLR on the grammar file to generate the lexer and parser code:
    java -cp $ANTLR_HOME/antlr.jar antlr.Tool calc.g
Compile the generated code:
    gcc -c -g -I. -I$ANTLR_HOME/lib/cpp -Wall CalcLexer.cpp
    gcc -c -g -I. -I$ANTLR_HOME/lib/cpp -Wall CalcParser.cpp

How to use Antlr? (contd.)
Compile the main function, which creates an instance of the lexer and of the parser class.
Example of the body of main():
    CalcLexer lexer(cin);
    CalcParser parser(lexer);
    parser.expr();
Compile it:
    gcc -c -g -I. -I$ANTLR_HOME/lib/cpp -Wall main.cpp
Link the generated object files with the ANTLR static library to create the parser executable:
    gcc main.o CalcLexer.o CalcParser.o $ANTLR_HOME/lib/cpp/src/libantlr.a -lstdc++

Writing Parser Class
All parser rules must be associated with a parser class.
A parser specification in a grammar file often looks like:
    { optional class code preamble }
    class YourParserClass extends Parser;
    options section
    tokens section
    { optional parser class members }
    parser rules

Options section
The section is introduced by the 'options' keyword and contains a series of option/value assignments.
    options {
        importVocab = lexerVocab;
        k = 2;
        buildAST = true;
        defaultErrorHandler = true;
    }

Token section
The tokens section contains all the keywords that the parser will use in parser rules. For example:
    tokens {
        "void";
        "char";
        "short";
        "int";
        .....
    }

Rule Section
The structure of an input stream of atoms is specified by a set of mutually-referential rules.
Each rule has a name, optionally a set of arguments, optionally an init-action, optionally a return value, and one or more alternatives.
Each alternative contains a series of elements that specify what to match and where.

Rule Section (contd.)
The basic form of an ANTLR rule is:
    rulename
        : alternative_1
        | alternative_2
        ...
        | alternative_n
        ;
If parameters are required for the rule, use the following form:
    rulename[formal parameters] : ... ;

Rule Section (contd.)
If you want to return a value from the rule, use the returns keyword:
    rulename[formal parameters] returns [type id] : ... ;
If you want to pass arguments to a rule reference, use the following form:
    rulename : alternative_1[arg1, arg2] ;
If the rule reference returns a value, capture it by assigning it to a variable.

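Rule Section (contd.)
A value returned by a rule reference is captured by assignment; the variable is declared in a block before the colon:
    rulename { type id; } : id=alternative_1[arg1, arg2] ... ;
That block is the init-action, and any rule can have one:
    rulename
        {
            // init-action
        }
        : .... ;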
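As a rough illustration of how parameters, return values, and init-actions fit together, the sketch below hand-writes the kind of member functions such rules correspond to. It does not use the ANTLR runtime. MiniParser, number, and sum are hypothetical names, the grammar fragments in the comments are made up for this example, and the input is a plain vector of integers rather than a token stream.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated parser; the "tokens" are just ints.
class MiniParser {
public:
    explicit MiniParser(std::vector<int> input) : input_(std::move(input)) {}

    // Analogue of:  number returns [int v] : INT ;
    int number() { return input_.at(pos_++); }

    // Analogue of:
    //   sum[int start] returns [int total]
    //       { int v; total = start; }          // init-action
    //       : ( v=number { total += v; } )* ;
    int sum(int start) {
        int v;                 // declared in the init-action
        int total = start;     // return value, seeded from the parameter
        while (pos_ < input_.size()) {
            v = number();      // id=rule_ref captures the returned value
            total += v;        // user action after the rule reference
        }
        return total;
    }

private:
    std::vector<int> input_;
    std::size_t pos_ = 0;
};
```

The point of the sketch is only the shape: rule parameters become function parameters, the returns clause becomes the return type, and the init-action becomes local declarations at the top of the function.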

Rule Section (contd.)
A user action can follow any rule reference. It executes after that rule reference has matched, except in guessing mode (that is, while a syntactic predicate is being evaluated).
    rule
        : rule_ref1
            {
                // user code
            }
          rule_ref2
            {
                // user code
            }
        ;
ANTLR supports extended BNF notation through the following four subrule forms.

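Rule Section (contd.)
    ( P1 | P2 | ... | Pn )     exactly one of the alternatives
    ( P1 | P2 | ... | Pn )?    zero or one
    ( P1 | P2 | ... | Pn )*    zero or more
    ( P1 | P2 | ... | Pn )+    one or more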

Writing Lexer Class
All lexer rules must be associated with a lexer class.
A lexer specification in a grammar file often looks like:
    { optional class code preamble }
    class YourLexerClass extends Lexer;
    options section
    tokens section
    { optional lexer class members }
    lexer rules

What does Antlr Generate?
ANTLR will generate the following files from the calc.g grammar:
    CalcLexer.hpp
    CalcLexer.cpp
    CalcParser.hpp
    CalcParser.cpp
    CalcLexerTokenTypes.hpp
    CalcLexerTokenTypes.txt
For every rule, ANTLR defines a member function inside the parser/lexer class.
For example, the code for the rule expr looks very much like this:

Rule Section (contd.)
    void CalcParser::expr() {
        try { // for error handling
            mexpr();
            { // ( ... )*
            for (;;) {
                if ((LA(1) == PLUS)) {
                    match(PLUS);
                }
                else {
                    goto _loop14;
                }
            }
            _loop14:;
            } // ( ... )*
            match(SEMI);
        }
        catch (ANTLR_USE_NAMESPACE(antlr)RecognitionException& ex) {
            // report the error, consume this token, and advance the token
            // stream pointer to a point from where the parser can resume parsing
        }
    }
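The LA(1)/match pattern in generated code of this kind can be tried out without the ANTLR runtime. The following is a minimal hand-written sketch, not ANTLR output: SketchParser, the token enum, and the rule expr : INT ( PLUS INT )* SEMI are assumptions made for the example, and error handling is reduced to returning false.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

enum TokenType { INT, PLUS, SEMI, EOF_TOKEN };

// Hypothetical stand-in for a generated parser over a pre-lexed token vector.
class SketchParser {
public:
    explicit SketchParser(std::vector<TokenType> tokens)
        : tokens_(std::move(tokens)) {}

    // Type of the i-th lookahead token (1-based); EOF_TOKEN past the end.
    TokenType LA(std::size_t i) const {
        std::size_t idx = pos_ + i - 1;
        return idx < tokens_.size() ? tokens_[idx] : EOF_TOKEN;
    }

    // Consume the next token if it has the expected type, otherwise throw.
    void match(TokenType expected) {
        if (LA(1) != expected) throw std::runtime_error("unexpected token");
        ++pos_;
    }

    // expr : INT ( PLUS INT )* SEMI ;
    // Returns true on success, false if a syntax error was caught.
    bool expr() {
        try {
            match(INT);
            while (LA(1) == PLUS) {   // ( ... )* loops while lookahead predicts the subrule
                match(PLUS);
                match(INT);
            }
            match(SEMI);
            return true;
        } catch (const std::runtime_error&) {
            return false;             // a real parser would report and resynchronise here
        }
    }

private:
    std::vector<TokenType> tokens_;
    std::size_t pos_ = 0;
};
```

The while loop plays the role of the for (;;) / goto pair: one token of lookahead decides whether to enter the subrule again or leave it.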

Predicates
ANTLR provides two types of predicates to resolve ambiguities between alternatives.
Semantic predicate
A semantic predicate specifies a condition that must be met (at run-time) before parsing may proceed.
It is specified as {...}?
Example:
    stat
        : {isTypeName(LT(1))}? ID ID ";"   // declaration "type varName;"
        | ID "=" expr ";"                  // assignment
        ;
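A semantic predicate amounts to a run-time test placed in front of an alternative. The sketch below hand-writes that decision for a simplified version of the stat rule. It is not ANTLR output: StatSketch is a hypothetical class, tokens are plain strings, the set of type names stands in for a symbol table, and expr is reduced to a single identifier.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hand-written recogniser; tokens are plain strings.
class StatSketch {
public:
    StatSketch(std::vector<std::string> tokens, std::set<std::string> typeNames)
        : tokens_(std::move(tokens)), typeNames_(std::move(typeNames)) {}

    // stat : {isTypeName(LT(1))}? ID ID ";"    // declaration
    //      | ID "=" ID ";"                     // assignment (expr reduced to one ID)
    //      ;
    std::string stat() {
        if (isTypeName(LT(1))) {      // the semantic predicate, checked at run time
            matchId(); matchId(); match(";");
            return "declaration";
        }
        matchId(); match("="); matchId(); match(";");
        return "assignment";
    }

private:
    // Text of the i-th lookahead token (1-based); "<EOF>" past the end.
    const std::string& LT(std::size_t i) const {
        static const std::string eof = "<EOF>";
        std::size_t idx = pos_ + i - 1;
        return idx < tokens_.size() ? tokens_[idx] : eof;
    }
    bool isTypeName(const std::string& text) const {
        return typeNames_.count(text) != 0;
    }
    void match(const std::string& expected) {
        if (LT(1) != expected) throw std::runtime_error("expected " + expected);
        ++pos_;
    }
    void matchId() {
        const std::string& t = LT(1);
        if (t == "=" || t == ";" || t == "<EOF>")
            throw std::runtime_error("expected identifier");
        ++pos_;
    }

    std::vector<std::string> tokens_;
    std::set<std::string> typeNames_;
    std::size_t pos_ = 0;
};
```

Both alternatives start with an identifier, so the token type alone cannot choose between them at one token of lookahead; the predicate consults information outside the grammar to make the choice.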

Predicates (contd.)
Syntactic predicate
A syntactic predicate lets you use arbitrary lookahead when a parsing decision cannot be made deterministically with finite lookahead.
It is specified as ( prediction block ) => production.
Example:
    stat
        : ( list "=" )=> list "=" list
        | list
        ;
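One way to picture a syntactic predicate is as a trial parse: remember the input position, try to match the prediction block with actions switched off, then rewind and pick the alternative. The sketch below hand-writes this for the stat rule above. It is an illustration under stated assumptions, not ANTLR output: PredicateSketch, the token enum, and the rule list : ID ( COMMA ID )* are made up for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum Tok { ID, COMMA, ASSIGN, END };

// Hypothetical hand-written parser showing a guess-then-rewind decision.
class PredicateSketch {
public:
    explicit PredicateSketch(std::vector<Tok> toks) : toks_(std::move(toks)) {}

    // stat : ( list ASSIGN )=> list ASSIGN list | list ;
    // Returns the name of the alternative that was taken.
    std::string stat() {
        if (guessListAssign()) {
            list();
            match(ASSIGN);
            list();
            return "assignment";
        }
        list();
        return "list";
    }

    // Number of user actions executed (none run while guessing).
    int actionsRun() const { return actionsRun_; }

private:
    Tok LA1() const { return pos_ < toks_.size() ? toks_[pos_] : END; }

    void match(Tok expected) {
        if (LA1() != expected) throw std::runtime_error("mismatched token");
        ++pos_;
    }

    // list : ID ( COMMA ID )* { user action } ;
    void list() {
        match(ID);
        while (LA1() == COMMA) { match(COMMA); match(ID); }
        if (guessing_ == 0) ++actionsRun_;   // actions are skipped in guessing mode
    }

    // Trial-match the prediction block ( list ASSIGN ), then rewind.
    bool guessListAssign() {
        std::size_t mark = pos_;             // remember where the guess started
        ++guessing_;
        bool ok = true;
        try { list(); match(ASSIGN); }
        catch (const std::runtime_error&) { ok = false; }
        --guessing_;
        pos_ = mark;                         // the guess never consumes input
        return ok;
    }

    std::vector<Tok> toks_;
    std::size_t pos_ = 0;
    int guessing_ = 0;
    int actionsRun_ = 0;
};
```

Because a list can be arbitrarily long, no fixed k is enough to see the "=" that separates the two alternatives, which is why the decision needs a trial parse. The cost is that the list on the left is parsed twice.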

Automatic Parse Tree Generation
ANTLR comes with its own tree data structure.
An ANTLR tree is an n-ary tree: each node contains a list of child nodes.
Each node has a token with a token ID and a value.
How to generate:
In the options section, set
    buildAST = true;
In each rule, mark the parent (root) node with ^, e.g.
    assign: lvalue "="^ expr;
    expr: term ("+"^ term)*;
    term: ID ("*"^ ID)*;

Accessing Parse Tree
The tree is available from the parser object via the member function getAST() after parsing:
    myParser.topRule(myLexer);
    AST *parseTree = myParser.getAST();
The parse tree information can be accessed via the following member functions of the parse tree:
    int getType();              // type of the token
    std::string getText();      // text of the token
    int getNumberOfChildren();
    AST *getFirstChild();
    AST *getNextSibling();
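A tree exposed through getFirstChild() and getNextSibling() is walked by following the first child down and the sibling links across. The sketch below shows that traversal on a stand-in node type. Node and toStringTree are hypothetical names written for this example and are not the ANTLR AST class; only the accessor names are borrowed from the list above.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for an AST node with first-child / next-sibling links.
struct Node {
    Node(int type, std::string text) : type_(type), text_(std::move(text)) {}

    int getType() const { return type_; }
    const std::string& getText() const { return text_; }
    const Node* getFirstChild() const { return firstChild; }
    const Node* getNextSibling() const { return nextSibling; }

    // Count children by walking the sibling chain of the first child.
    int getNumberOfChildren() const {
        int n = 0;
        for (const Node* c = firstChild; c != nullptr; c = c->nextSibling) ++n;
        return n;
    }

    Node* firstChild = nullptr;    // links are public so a test can wire nodes up
    Node* nextSibling = nullptr;

private:
    int type_;
    std::string text_;
};

// Render a tree in prefix form: a leaf prints its text, an inner node
// prints "( root child1 child2 ... )".
std::string toStringTree(const Node* n) {
    if (n == nullptr) return "";
    if (n->getFirstChild() == nullptr) return n->getText();
    std::string out = "( " + n->getText();
    for (const Node* c = n->getFirstChild(); c != nullptr; c = c->getNextSibling())
        out += " " + toStringTree(c);
    return out + " )";
}
```

For a tree with "=" at the root, "a" as its first child, and a "+" subtree over "b" and "c" as its second child, toStringTree yields ( = a ( + b c ) ).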

Customizing AST parser
Put '!' after a rule name to prevent automatic AST generation for that rule.
Then add customized tree generation in an action:
    term: explicit_mult | implicit_mult ;
    explicit_mult: ID MULT^ ID;
    implicit_mult !: left:ID right:ID
        { #implicit_mult = #(#[MULT,"*"], #left, #right); }
        ;

Tree Parser
A browser (walker) for the AST can also be generated by ANTLR.
The parser/browser needs to be derived from TreeParser:
    class myTreeParser extends TreeParser;
Rules are written like parser rules, with #( ... ) denoting a tree node in prefix form (root first, then its children), e.g.
    poly : #(ADD term poly) | term ;
    term : INT | ID | #(EXP ID INT) | #(MULT INT #(EXP ID INT)) ;
An action can be added to each rule.
A rule can create a new, modified AST.
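What a tree parser rule with actions does can be approximated by a recursive function that switches on the root type and recurses into the children. The sketch below evaluates a small arithmetic tree this way. It is hand-written, not ANTLR output: Ast, the node types, eval, and the tree grammar shown in the comment are all assumptions made for the example.

```cpp
#include <stdexcept>
#include <string>

enum NodeType { ADD, MULT, INT };

// Hypothetical stand-in for an AST node with first-child / next-sibling links.
struct Ast {
    NodeType type;
    std::string text;
    const Ast* firstChild;
    const Ast* nextSibling;
};

// Hand-written analogue of a tree parser rule such as:
//   expr returns [int v]
//       : #(ADD  a=expr b=expr)   { v = a + b; }
//       | #(MULT a=expr b=expr)   { v = a * b; }
//       | INT                     { v = value of the token text; }
//       ;
int eval(const Ast* t) {
    if (t == nullptr) throw std::invalid_argument("empty tree");
    switch (t->type) {
    case INT:
        return std::stoi(t->text);
    case ADD:
    case MULT: {
        const Ast* left = t->firstChild;
        const Ast* right = left ? left->nextSibling : nullptr;
        if (left == nullptr || right == nullptr)
            throw std::invalid_argument("operator needs two children");
        int a = eval(left);
        int b = eval(right);
        return t->type == ADD ? a + b : a * b;
    }
    }
    throw std::invalid_argument("unknown node type");
}
```

Each case of the switch corresponds to one alternative of the rule, and the recursion follows the same root-first order in which the #( ... ) patterns are written.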

Conclusion
ANTLR is a newer and more powerful substitute for the old yacc parser generator.
The input language is BNF-based and is better organized than yacc input.
A lot of free parser code for existing languages is already available as ANTLR grammars.
The generated parser is re-entrant and truly object-oriented.
Each rule is available as a separate parser entry point, so the parser is more re-usable.
Already in use at Interra, in e2Vera and Tiger.
We should use ANTLR for new projects.
There will probably be some porting issues, as it depends heavily on exception handling and templates.