6-1 6 Syntactic analysis  Aspects of syntactic analysis  Tokens  Lexer  Parser  Applications of syntactic analysis  Compiler generation tool ANTLR.


6-1 6 Syntactic analysis  Aspects of syntactic analysis  Tokens  Lexer  Parser  Applications of syntactic analysis  Compiler generation tool ANTLR  Case study: Calc  Case study: Fun syntactic analyser Programming Languages 3 © 2012 David A Watt, University of Glasgow

6-2 Aspects of syntactic analysis (1)  Syntactic analysis checks that the source program is well-formed and determines its phrase structure.  Syntactic analysis can be decomposed into: –a lexer (which breaks the source program down into tokens) –a parser (which determines the phrase structure of the source program).

6-3 Aspects of syntactic analysis (2)  Recall: The syntactic analyser inputs a source program and outputs an AST.  Inside the syntactic analyser, the lexer channels a stream of tokens to the parser. [Diagram: source program → lexer → token stream → parser → AST. The lexer reports lexical errors; the parser reports syntactic errors.]

6-4 Tokens  Tokens are textual symbols that influence the source program’s phrase structure, e.g.: –literals –identifiers –operators –keywords –punctuation (parentheses, commas, colons, etc.)  Each token has a tag and a text. E.g.: –the addition operator might have tag PLUS and text ‘+’ –a numeral might have tag NUM, and text such as ‘1’ or ‘37’ –an identifier might have tag ID, and text such as ‘x’ or ‘a1’.
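The tag-and-text idea can be sketched in Java as follows. This is only an illustration: the names Tag, Token and TokenDemo are assumptions and do not come from the course material.

```java
// Hypothetical token representation: every token carries a tag and a text.
enum Tag { PLUS, NUM, ID }

final class Token {
    final Tag tag;      // the kind of token
    final String text;  // the characters the token was built from

    Token(Tag tag, String text) {
        this.tag = tag;
        this.text = text;
    }

    @Override
    public String toString() {
        return tag + " '" + text + "'";
    }
}

class TokenDemo {
    public static void main(String[] args) {
        // The three examples from the slide.
        System.out.println(new Token(Tag.PLUS, "+"));
        System.out.println(new Token(Tag.NUM, "37"));
        System.out.println(new Token(Tag.ID, "a1"));
    }
}
```

The parser normally looks only at the tag; the text matters later, for example when a numeral has to be converted to a value.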

6-5 Separators  Separators are pieces of text that do not influence the phrase structure, e.g.: –spaces –comments.  An end-of-line is: –a separator in most PLs –a token in Python (since it delimits a command).

6-6 Case study: Calc tokens  Complete list of Calc tokens (tag, then text): PUT ‘put’, SET ‘set’, ASSN ‘=’, PLUS ‘+’, MINUS ‘-’, TIMES ‘*’, LPAR ‘(’, RPAR ‘)’, ID ‘…’, NUM ‘…’, EOL ‘\n’, EOF ‘’.

6-7 Example: Calc tokens  Calc source program:
set x = 7
put x*(x+1)
 Token stream: SET ‘set’, ID ‘x’, ASSN ‘=’, NUM ‘7’, EOL ‘\n’, PUT ‘put’, ID ‘x’, TIMES ‘*’, LPAR ‘(’, ID ‘x’, PLUS ‘+’, NUM ‘1’, RPAR ‘)’, EOL ‘\n’, EOF ‘’.

6-8 Lexer (1)  The lexer converts source code to a token stream.  At each step, the lexer inspects the next character of the source code and acts accordingly (see next slide).  When no source code remains, the lexer outputs an EOF token.

6-9 Lexer (2)  E.g., if the next character of the source code is: –a space: discard it. –the start of a comment: scan the rest of the comment, and discard it. –a punctuation mark: output the corresponding token. –a digit: scan the remaining digits, and output the corresponding token (a numeral). –a letter: scan the remaining letters, and output the corresponding token (which could be an identifier or a keyword).
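The character-by-character procedure above can be sketched as a small hand-written lexer for Calc. This is a minimal sketch, not the lexer that ANTLR generates: the class name CalcLexerSketch and the "TAG:text" string representation of tokens are assumptions made to keep the example short. Comment handling is omitted because Calc's comment syntax is not given here.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written lexer sketch for Calc; tokens are rendered as "TAG:text" strings.
class CalcLexerSketch {
    static List<String> lex(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ' ' || c == '\t') {
                i++;                                  // a space: discard it
            } else if (Character.isDigit(c)) {
                int start = i;                        // a digit: scan the remaining digits
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add("NUM:" + src.substring(start, i));
            } else if (Character.isLetter(c)) {
                int start = i;                        // a letter: scan the remaining letters
                while (i < src.length() && Character.isLetter(src.charAt(i))) i++;
                String word = src.substring(start, i);
                if (word.equals("put")) out.add("PUT:put");       // keyword
                else if (word.equals("set")) out.add("SET:set");  // keyword
                else out.add("ID:" + word);                       // identifier
            } else {
                out.add(punctuation(c));              // punctuation, operator or end-of-line
                i++;
            }
        }
        out.add("EOF:");                              // no source code remains
        return out;
    }

    private static String punctuation(char c) {
        switch (c) {
            case '(': return "LPAR:(";
            case ')': return "RPAR:)";
            case '=': return "ASSN:=";
            case '+': return "PLUS:+";
            case '-': return "MINUS:-";
            case '*': return "TIMES:*";
            case '\n': return "EOL:\\n";   // the text is shown escaped for readability
            default: throw new IllegalArgumentException("lexical error at '" + c + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(lex("set x = 7\nput x*(x+1)\n"));
    }
}
```

Keywords and identifiers share one branch: the lexer scans a whole word first and only then decides which of the two it has found.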

6-10 Parser  The parser converts a token stream to an AST.  There are many possible parsing algorithms.  Recursive-descent parsing is particularly simple and attractive.  Given a suitable grammar for the source language, we can quickly and systematically write a recursive-descent parser for that language.

6-11 Recursive-descent parsing (1)  A recursive-descent parser consists of: –a family of parsing methods N(), one for each nonterminal symbol N of the source language’s grammar –an auxiliary method match().  These methods “consume” the token stream from left to right.

6-12 Recursive-descent parsing (2)  Method match(t) checks whether the next token has tag t. –If yes, it consumes that token. –If no, it reports a syntactic error.  For each nonterminal symbol N, method N() checks whether the next few tokens constitute a phrase of class N. –If yes, it consumes those tokens (and returns an AST representing the parsed phrase). –If no, it reports a syntactic error.
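A possible shape for match(t) is sketched below. The class name TokenCursor, the helper next() and the use of plain strings as tags are assumptions for illustration; a real parser would hold Token objects and might recover from errors rather than throw.

```java
import java.util.List;

// Sketch of the auxiliary method match(): a cursor over a list of token tags.
class TokenCursor {
    private final List<String> tags;
    private int pos = 0;            // index of the next unconsumed token

    TokenCursor(List<String> tags) {
        this.tags = tags;
    }

    // Tag of the next token; behaves as EOF once the list is exhausted.
    String next() {
        return pos < tags.size() ? tags.get(pos) : "EOF";
    }

    // If the next token has the expected tag, consume it; otherwise report a syntactic error.
    void match(String expected) {
        if (!next().equals(expected)) {
            throw new IllegalStateException(
                "syntactic error: expected " + expected + " but found " + next());
        }
        pos++;
    }
}
```

For example, on the tags PUT NUM EOL EOF, match("PUT") succeeds and advances the cursor, whereas a following match("EOL") throws because the next tag is NUM.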

6-13 Example: Calc parser (1)  Parsing methods: –prog() parses a program –com() parses a command –expr() parses an expression –prim() parses a primary expression –var() parses a variable.

6-14 Example: Calc parser (2)  Illustration of how the parsing methods work on the token stream SET ‘set’, ID ‘x’, ASSN ‘=’, NUM ‘7’, EOL ‘\n’, …: [Diagram: prog() calls com(). Within com(), the ID token is consumed by var(), the NUM token is consumed by expr() (via prim()), and the whole command up to EOL is consumed by com().]

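6-15 Example: Calc parser (3)  Illustration (continued) on the token stream PUT ‘put’, ID ‘x’, TIMES ‘*’, LPAR ‘(’, ID ‘x’, PLUS ‘+’, NUM ‘1’, RPAR ‘)’, EOL ‘\n’, EOF ‘’: [Diagram: com() calls expr(), which calls prim() for the first ‘x’ (via var()) and prim() again for the parenthesised operand; that prim() calls expr(), which in turn calls prim() for each inner operand.]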

6-16 Example: Calc parser (4)  Recall the EBNF production rule for commands: com = ‘put’ expr eol | ‘set’ var ‘=’ expr eol  Parsing method for commands (outline): void com () { if (…) { match(PUT); expr(); match(EOL); } Here the elided condition is “if the next token is ‘put’”.

6-17 Example: Calc parser (5)  Parsing method for commands (continued): else if (…) { match(SET); var(); match(ASSN); expr(); match(EOL); } else … } Here the elided condition is “if the next token is ‘set’”, and the final else branch reports a syntactic error.

6-18 Example: Calc parser (6)  Recall the EBNF production rule for programs: prog = com* eof  Parsing method for programs (outline): void prog () { while (…) { com(); } match(EOF); } Here the elided condition is “while the next token is ‘set’ or ‘put’”.
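Putting the outlined methods together gives a complete recognizer for Calc. The sketch below is an assumption-laden illustration rather than the course's code: it works on a list of tag strings, the class name CalcRecognizer and the helper accepts() are hypothetical, and the rules for expr and prim are taken from the Calc grammar (expr = prim followed by any number of operator-prim pairs; prim = a numeral, a variable, or a parenthesised expr).

```java
import java.util.Arrays;
import java.util.List;

// Recursive-descent recognizer for Calc over a list of token tags.
class CalcRecognizer {
    private final List<String> tags;
    private int pos = 0;

    CalcRecognizer(List<String> tags) { this.tags = tags; }

    private String next() { return pos < tags.size() ? tags.get(pos) : "EOF"; }

    private void match(String t) {
        if (!next().equals(t))
            throw new IllegalStateException("expected " + t + ", found " + next());
        pos++;
    }

    // prog = com* eof
    void prog() {
        while (next().equals("PUT") || next().equals("SET")) com();
        match("EOF");
    }

    // com = 'put' expr eol | 'set' var '=' expr eol
    void com() {
        if (next().equals("PUT")) {
            match("PUT"); expr(); match("EOL");
        } else if (next().equals("SET")) {
            match("SET"); var(); match("ASSN"); expr(); match("EOL");
        } else {
            throw new IllegalStateException("expected a command, found " + next());
        }
    }

    // expr = prim ( ('+' | '-' | '*') prim )*
    void expr() {
        prim();
        while (next().equals("PLUS") || next().equals("MINUS") || next().equals("TIMES")) {
            match(next());   // consume whichever operator is present
            prim();
        }
    }

    // prim = num | var | '(' expr ')'
    void prim() {
        if (next().equals("NUM")) {
            match("NUM");
        } else if (next().equals("ID")) {
            var();
        } else if (next().equals("LPAR")) {
            match("LPAR"); expr(); match("RPAR");
        } else {
            throw new IllegalStateException("expected a primary expression, found " + next());
        }
    }

    // var = id
    void var() { match("ID"); }

    // Hypothetical convenience wrapper: true if the tags form a well-formed program.
    static boolean accepts(String... tags) {
        try {
            new CalcRecognizer(Arrays.asList(tags)).prog();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Tags for:  set x = 7 <EOL> put x*(x+1) <EOL>
        System.out.println(accepts("SET", "ID", "ASSN", "NUM", "EOL",
            "PUT", "ID", "TIMES", "LPAR", "ID", "PLUS", "NUM", "RPAR", "EOL", "EOF"));
        // Tags for the ill-formed program:  put + <EOL>
        System.out.println(accepts("PUT", "PLUS", "EOL", "EOF"));
    }
}
```

Each method mirrors one production rule, so the grammar can be read off the code almost line by line.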

6-19 General rules for recursive-descent parsing (1)  Consider the EBNF production rule: N=RE  The corresponding parsing method is: void N () { match the pattern RE }

6-20 General rules for recursive-descent parsing (2)  To match the pattern t (where t is a terminal symbol): match(t);  To match the pattern N (where N is a nonterminal symbol): N();  To match the pattern RE1 RE2: match the pattern RE1, then match the pattern RE2.

6-21 General rules for recursive-descent parsing (3)  To match the pattern RE1 | RE2: if (the next token can start RE1) match the pattern RE1; else if (the next token can start RE2) match the pattern RE2; else report a syntactic error.  Note: This works only if no token can start both RE1 and RE2. –In particular, this does not work if a production rule is left-recursive, e.g., N = X | N Y.

6-22 General rules for recursive-descent parsing (4)  To match the pattern RE*: while (the next token can start RE) match the pattern RE.  Note: This works only if no token can both start and follow RE.
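A left-recursive rule such as N = X | N Y can usually be rewritten with repetition as N = X Y*, which the rules above can handle. The sketch below illustrates this on a hypothetical rule list = ID | list COMMA ID; the rule, the class name ListParser and the tag strings are all assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.List;

// Parsing a left-recursive rule after rewriting it with repetition.
class ListParser {
    private final List<String> tags;
    private int pos = 0;

    ListParser(List<String> tags) { this.tags = tags; }

    private String next() { return pos < tags.size() ? tags.get(pos) : "EOF"; }

    private void match(String t) {
        if (!next().equals(t))
            throw new IllegalStateException("expected " + t + ", found " + next());
        pos++;
    }

    // Left-recursive form:  list = ID | list COMMA ID
    // A direct translation would start list() by calling list() again
    // without consuming any token, so it would recurse forever.
    //
    // Equivalent form:      list = ID ( COMMA ID )*
    // Returns the number of identifiers in the list.
    int list() {
        match("ID");
        int count = 1;
        while (next().equals("COMMA")) {   // COMMA is the only token that can start the repeated part
            match("COMMA");
            match("ID");
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(new ListParser(Arrays.asList("ID", "COMMA", "ID", "COMMA", "ID", "EOF")).list());
    }
}
```

The loop is safe here because COMMA can start the repeated part but cannot follow a complete list in this tiny grammar, which is the condition stated for RE*.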

6-23 Applications of syntactic analysis  Syntactic analysis has a variety of applications: –in compilers –in XML applications (parsing XML documents and converting them to tree form) –in web browsers (parsing and rendering HTML documents) –in natural language applications (parsing and translating NL documents).

6-24 Compiler generation tools  A compiler generation tool automates the process of building compiler components.  The input to a compiler generation tool is a specification of what the compiler component is to do. E.g.: –The input to a parser generator is a grammar.  Examples of compiler generation tools: –lex and yacc (see Advanced Programming) –JavaCC –SableCC –ANTLR.

6-25 The compiler generation tool ANTLR  ANTLR (ANother Tool for Language Recognition) is the tool we shall use here. See  ANTLR can automatically generate a lexer and recursive-descent parser, given a grammar as input: –The developer starts by expressing the source language’s grammar in ANTLR notation (which resembles EBNF). –Then the developer enhances the grammar with actions and/or tree-building operations.  ANTLR can also generate contextual analysers (see §7) and code generators (see §8).

6-26 Case study: Calc grammar in ANTLR (1)  Calc grammar expressed in ANTLR notation: grammar Calc; prog :com* EOF ; com :PUT expr EOL |SET var ASSN expr EOL ; var :ID ;

6-27 Case study: Calc grammar in ANTLR (2)  Calc grammar (continued): expr :prim ( PLUS prim | MINUS prim | TIMES prim )* ; prim :NUM |var |LPAR expr RPAR ;

6-28 Case study: Calc grammar in ANTLR (3)  Calc grammar (continued – lexicon): PUT:'put' ; SET:'set' ; ASSN:'=' ; PLUS:'+' ; MINUS:'-' ; TIMES:'*' ; LPAR:'(' ; RPAR:')' ; ID:'a'..'z' ; NUM:'0'..'9'+ ; EOL:'\r'? '\n' ; SPACE:(' ' | '\t')+{skip();} ; This says that a SPACE is a separator. Tokens and separators have upper-case names.

6-29 Case study: Calc driver (1)  Put the above grammar in a file named Calc.g.  Feed this as input to ANTLR: …$ java org.antlr.Tool Calc.g  ANTLR automatically generates the following classes: –Class CalcLexer contains methods that convert an input stream (source code) to a token stream. –Class CalcParser contains parsing methods prog(), com(), …, that consume the token stream.

6-30 Case study: Calc driver (2)  Write a driver program that calls CalcParser’s method prog():
import java.io.FileInputStream;
import java.io.InputStream;
import org.antlr.runtime.ANTLRInputStream;
import org.antlr.runtime.CommonTokenStream;

public class CalcRun {
    public static void main (String[] args) throws Exception {
        // Create an input stream from the named source file.
        InputStream source = new FileInputStream(args[0]);
        // Create a lexer.
        CalcLexer lexer = new CalcLexer(new ANTLRInputStream(source));
        // Run the lexer, creating a token stream.
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // Create a parser.
        CalcParser parser = new CalcParser(tokens);
        // Run the parser.
        parser.prog();
    }
}

6-31 Case study: Calc grammar in ANTLR (6)  When compiled and run, CalcRun performs syntactic analysis on the source program, reporting any syntactic errors.  However, CalcRun does nothing else!

6-32 Enhancing a grammar in ANTLR  Normally we want to make the parser do something useful.  To do this, we enhance the ANTLR grammar with either actions or tree-building operations.  An ANTLR action is a segment of Java code: { code }  An ANTLR tree-building operation has the form: -> ^(t x y z) where t is a token and x, y, z are subtrees. It builds a tree whose root is t and whose children, from left to right, are x, y, z.

6-33 Case study: Calc grammar in ANTLR with actions (1)  Suppose that we want CalcRun to perform actual calculations: –The command “ put expr” should evaluate the expression expr and then print the result. –The command “ set var = expr” should evaluate the expression expr and then store the result in the variable var.

6-34 Case study: Calc grammar in ANTLR with actions (2)  We can augment the Calc grammar with actions to do this: –Create storage for variables ‘a’, …, ‘z’. –Declare that expr will return a value of type int. Add actions to compute its value. And similarly for prim. –Add an action to the put command to print the value returned by expr. –Add an action to the set command to store the value returned by expr at the variable’s address in the store.

6-35 Case study: Calc grammar in ANTLR with actions (3)  Augmented Calc grammar: grammar Calc; @members { private int[] store = new int[26]; } prog : com* EOF ; The array store provides storage for the variables ‘a’, …, ‘z’.

6-36 Case study: Calc grammar in ANTLR with actions (4)  Augmented Calc grammar (continued): com : PUT v=expr EOL { System.out.println(v); } | SET ID ASSN v=expr EOL { int a = $ID.text.charAt(0) - 'a'; store[a] = v; } ;  Notes: –$ID.text is the text of ID (a string of letters). –$ID.text.charAt(0) is its 1st letter. –$ID.text.charAt(0) - 'a' is in the range 0 to 25, so it is a valid index into store.

6-37 Case study: Calc grammar in ANTLR with actions (5)  Augmented Calc grammar (continued): expr returns [int val] : v1=prim { $val = v1; } ( PLUS v2=prim { $val += v2; } | MINUS v2=prim { $val -= v2; } | TIMES v2=prim { $val *= v2; } )* ;

6-38 Case study: Calc grammar in ANTLR with actions (5)  Augmented Calc grammar (continued): prim returns [int val] : NUM { $val = Integer.parseInt($NUM.text); } | ID { int a = $ID.text.charAt(0) - 'a'; $val = store[a]; } | LPAR v=expr RPAR { $val = v; } ;
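To see what the actions amount to, here is a hand-written Java sketch that parses and evaluates Calc in one pass, in the same way as the augmented grammar. It is not the code ANTLR generates: the class name CalcInterpreter is hypothetical, the sketch scans characters directly instead of using a separate lexer, and it collects the values that put would print in a list so that they can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written counterpart of the Calc grammar with actions.
class CalcInterpreter {
    private final String src;
    private int pos = 0;
    private final int[] store = new int[26];                 // storage for variables 'a'..'z'
    private final List<Integer> output = new ArrayList<>();  // values produced by 'put'

    CalcInterpreter(String src) { this.src = src; }

    private void skipSpaces() {
        while (pos < src.length() && (src.charAt(pos) == ' ' || src.charAt(pos) == '\t')) pos++;
    }

    private boolean lookingAt(String s) {
        skipSpaces();
        return src.startsWith(s, pos);
    }

    private void match(String s) {
        if (!lookingAt(s)) throw new IllegalStateException("syntactic error at position " + pos);
        pos += s.length();
    }

    // prog : com* EOF
    List<Integer> prog() {
        while (lookingAt("put") || lookingAt("set")) com();
        skipSpaces();
        if (pos != src.length()) throw new IllegalStateException("syntactic error at position " + pos);
        return output;
    }

    // com : PUT v=expr EOL { print v } | SET ID ASSN v=expr EOL { store[a] = v }
    private void com() {
        if (lookingAt("put")) {
            match("put"); int v = expr(); match("\n");
            output.add(v);
        } else {
            match("set"); int a = var(); match("="); int v = expr(); match("\n");
            store[a] = v;
        }
    }

    // expr returns the value accumulated strictly from left to right (no operator precedence)
    private int expr() {
        int val = prim();
        while (true) {
            if (lookingAt("+")) { match("+"); val += prim(); }
            else if (lookingAt("-")) { match("-"); val -= prim(); }
            else if (lookingAt("*")) { match("*"); val *= prim(); }
            else return val;
        }
    }

    // prim : NUM | ID | LPAR expr RPAR
    private int prim() {
        skipSpaces();
        if (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return Integer.parseInt(src.substring(start, pos));
        } else if (lookingAt("(")) {
            match("("); int v = expr(); match(")");
            return v;
        } else {
            return store[var()];
        }
    }

    // ID : 'a'..'z'  (returns the variable's index in the store)
    private int var() {
        skipSpaces();
        if (pos >= src.length() || src.charAt(pos) < 'a' || src.charAt(pos) > 'z')
            throw new IllegalStateException("syntactic error at position " + pos);
        return src.charAt(pos++) - 'a';
    }
}
```

Because expr treats all three operators alike, an unparenthesised expression such as 1+2*3 is evaluated from left to right, which is why a Calc program needs parentheses to force a different grouping.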

6-39 Case study: Calc grammar in ANTLR with actions (6)  Run ANTLR as before: …$ java org.antlr.Tool Calc.g  ANTLR inserts code into the CalcParser class.  ANTLR inserts the above actions into the com(), expr(), and prim() methods of CalcParser.

6-40 Case study: Calc grammar in ANTLR with actions (7)  When compiled and run, CalcRun again performs syntactic analysis on the source program, but now it also performs the actions: …$ javac CalcLexer.java CalcParser.java \ CalcRun.java …$ java CalcRun test.calc set c = 8 set e = 7 put c*2 put e*8 set m = (c*2) + (e*8) put m

6-41 ANTLR notation  At the top of the expr production rule, “expr returns [int val]” declares that parsing an expr will return an integer result named val.  Within actions in the expr production rule, “$val = …” sets the result.  In any production rule, “v=expr” sets a local variable v to the result of parsing the expr.

6-42 AST building with ANTLR  What if the parser is required to build an AST?  Start with an EBNF grammar of the source language, together with a summary of the ASTs to be generated.  Express the grammar in ANTLR’s notation. Then add tree-building operations to specify the translation from source language to ASTs.  Recall: An ANTLR tree-building operation has the form: -> ^(t x y z). It builds a tree whose root is t and whose children, from left to right, are x, y, z.
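The shape of the tree that ^(t x y z) describes can be sketched with a small node class. This is an illustration only: the class Ast and its method toStringTree() are assumptions written for this example and are not ANTLR's own tree classes.

```java
import java.util.Arrays;
import java.util.List;

// Minimal AST node: a root label plus an ordered list of subtrees.
class Ast {
    final String label;
    final List<Ast> children;

    Ast(String label, Ast... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    // Renders the tree in prefix form, like ^(t x y z) without the caret.
    String toStringTree() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (Ast child : children) sb.append(' ').append(child.toStringTree());
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // A tree with root ASSN and two children: x and the subtree (PLUS y 1).
        Ast tree = new Ast("ASSN", new Ast("x"), new Ast("PLUS", new Ast("y"), new Ast("1")));
        System.out.println(tree.toStringTree());
    }
}
```

The first argument plays the role of t (the root) and the remaining arguments play the roles of x, y, z (the subtrees, in order).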

6-43 Case study: Fun grammar in ANTLR (1)  Fun grammar (outline): grammar Fun; prog :var_decl* proc_decl+ EOF ; var_decl :type ID ASSN expr ; type :BOOL |INT ;

6-44 Case study: Fun grammar in ANTLR (2)  Fun grammar (continued): com :ID ASSN expr |IF expr COLON seq_com DOT |… ; seq_com :com* ;

6-45 Case study: Fun grammar in ANTLR (3)  Fun grammar (continued): expr:sec_expr … sec_expr :pri_expr ( (PLUS | MINUS | TIMES | DIV) pri_expr )* ; pri_expr :NUM |ID |LPAR expr RPAR |… ;

6-46 Case study: Fun grammar in ANTLR with AST building (1)  Augmented Fun grammar (outline): grammar Fun; options { output = AST; …; } tokens { PROG; SEQ; …; }  The options section states that this grammar will generate an AST.  The tokens section lists special tokens to be used in the AST (in addition to lexical tokens).

6-47 Case study: Fun grammar in ANTLR with AST building (2)  Augmented Fun grammar (outline): prog : var_decl* proc_decl+ EOF -> ^(PROG var_decl* proc_decl+) ;  This builds an AST whose root is PROG and whose children are the var-decl subtrees followed by the proc-decl subtrees.

6-48 Case study: Fun grammar in ANTLR with AST building (3)  Augmented Fun grammar (continued): com : ID ASSN expr -> ^(ASSN ID expr) | IF expr COLON seq_com DOT -> ^(IF expr seq_com) | … ; seq_com : com* -> ^(SEQ com*) ;  These build ASTs with root ASSN and children ID, expr; root IF and children expr, seq-com; and root SEQ with one child per com.

6-49 Case study: Fun grammar in ANTLR with AST building (4)  Augmented Fun grammar (continued): expr : sec_expr … sec_expr : prim_expr ( (PLUS^ | MINUS^ | TIMES^ | DIV^) prim_expr )* ; prim_expr : NUM -> NUM | ID -> ID | LPAR expr RPAR -> expr | … ;  The ^ suffix makes the operator token the root of the subtree. For example, a multiplication builds an AST with root TIMES and children expr1, expr2.
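The effect of marking each operator as a root can be imitated by hand. The sketch below parses a list of token texts and builds the tree as a prefix-form string; it is a hypothetical illustration (class SecExprSketch), it treats a parenthesised expr as a sec_expr because the rest of the expr rule is elided in the grammar, and it omits the other elided alternatives.

```java
import java.util.Arrays;
import java.util.List;

// Builds operator-rooted trees for: sec_expr : prim_expr ( (PLUS^|MINUS^|TIMES^|DIV^) prim_expr )*
class SecExprSketch {
    private static final List<String> OPERATORS = Arrays.asList("+", "-", "*", "/");
    private final List<String> toks;
    private int pos = 0;

    SecExprSketch(List<String> toks) { this.toks = toks; }

    // Text of the next token, or the empty string at the end of the input.
    private String next() { return pos < toks.size() ? toks.get(pos) : ""; }

    String secExpr() {
        String tree = primExpr();
        while (OPERATORS.contains(next())) {
            String op = toks.get(pos++);
            String right = primExpr();
            // The operator becomes the root; the tree built so far becomes its left child.
            tree = "(" + op + " " + tree + " " + right + ")";
        }
        return tree;
    }

    // prim_expr : NUM | ID | LPAR expr RPAR   (other alternatives omitted)
    String primExpr() {
        if (next().equals("(")) {
            pos++;
            String inner = secExpr();          // assumption: expr is treated as sec_expr here
            if (!next().equals(")")) throw new IllegalStateException("expected )");
            pos++;
            return inner;                      // the parentheses leave no trace in the tree
        }
        if (next().isEmpty() || OPERATORS.contains(next()) || next().equals(")"))
            throw new IllegalStateException("expected a primary expression");
        return toks.get(pos++);                // a numeral or identifier is a leaf
    }

    public static void main(String[] args) {
        System.out.println(new SecExprSketch(Arrays.asList("a", "*", "b", "+", "c")).secExpr());
    }
}
```

Because every new operator wraps the tree built so far, a chain of operators yields a left-leaning tree: the input a * b + c gives a tree with root + whose left child is the subtree for a * b.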

6-50 Case study: Fun syntactic analyser (1)  Put the above grammar in a file named Fun.g.  Run ANTLR to generate a lexer and a parser: …$ java org.antlr.Tool Fun.g  ANTLR creates the following classes: –Class FunLexer contains methods that convert an input stream (source code) to a token stream. –Class FunParser contains parsing methods prog(), var_decl(), com(), …, that consume the token stream.  The prog() method now returns an AST.

6-51 Case study: Fun syntactic analyser (2)  Program to run the Fun syntactic analyser:
import java.io.FileInputStream;
import java.io.InputStream;
import org.antlr.runtime.ANTLRInputStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.CommonTree;

public class FunRun {
    public static void main (String[] args) throws Exception {
        InputStream source = new FileInputStream(args[0]);
        FunLexer lexer = new FunLexer(new ANTLRInputStream(source));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        FunParser parser = new FunParser(tokens);
        // Run the parser, then get the resulting AST.
        CommonTree ast = (CommonTree) parser.prog().getTree();
    }
}