CS 153: Concepts of Compiler Design October 21 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak www.cs.sjsu.edu/~mak.

Slides:



Advertisements
Similar presentations
Compilers and Language Translation
Advertisements

Chapter 2 Syntax. Syntax The syntax of a programming language specifies the structure of the language The lexical structure specifies how words can be.
C. Varela; Adapted w/permission from S. Haridi and P. Van Roy1 Declarative Computation Model Defining practical programming languages Carlos Varela RPI.
1 Chapter 5 Compilers Source Code (with macro) Macro Processor Expanded Code Compiler or Assembler obj.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Fall 2007CS 2251 Miscellaneous Topics Deque Recursion and Grammars.
1 Terminology l Statement ( 敘述 ) »declaration, assignment containing expression ( 運算式 ) l Grammar ( 文法 ) »a set of rules specify the form of legal statements.
CS 153: Concepts of Compiler Design October 27 Class Meeting Department of Computer Science San Jose State University Fall 2014 Instructor: Ron Mak
CS 153: Concepts of Compiler Design August 25 Class Meeting Department of Computer Science San Jose State University Fall 2014 Instructor: Ron Mak
Introducing Java.
Invitation to Computer Science 5th Edition
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
Compiler1 Chapter V: Compiler Overview: r To study the design and operation of compiler for high-level programming languages. r Contents m Basic compiler.
CS 153: Concepts of Compiler Design August 24 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
Chapter 1: A First Program Using C#. Programming Computer program – A set of instructions that tells a computer what to do – Also called software Software.
Chapter 10: Compilers and Language Translation Invitation to Computer Science, Java Version, Third Edition.
CS 326 Programming Languages, Concepts and Implementation Instructor: Mircea Nicolescu Lecture 2.
Syntax Specification and BNF © Allan C. Milne Abertay University v
CS Describing Syntax CS 3360 Spring 2012 Sec Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
Programming Languages Third Edition Chapter 6 Syntax.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
TextBook Concepts of Programming Languages, Robert W. Sebesta, (10th edition), Addison-Wesley Publishing Company CSCI18 - Concepts of Programming languages.
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 3, 09/11/2003 Prof. Roy Levow.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
CS 152: Programming Language Paradigms April 2 Class Meeting Department of Computer Science San Jose State University Spring 2014 Instructor: Ron Mak
CS453 LectureIntroduction1 CS453 Compiler Construction Instructor:Wim Bohm Computer Science Building 470 TA: tba
CS 153: Concepts of Compiler Design August 26 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak
CS 153: Concepts of Compiler Design September 16 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak
The College of Saint Rose CIS 433 – Programming Languages David Goldschmidt, Ph.D. from Concepts of Programming Languages, 9th edition by Robert W. Sebesta,
CPS 506 Comparative Programming Languages Syntax Specification.
D Goforth COSC Translating High Level Languages.
1 Compiler Design (40-414)  Main Text Book: Compilers: Principles, Techniques & Tools, 2 nd ed., Aho, Lam, Sethi, and Ullman, 2007  Evaluation:  Midterm.
D Goforth COSC Translating High Level Languages Note error in assignment 1: #4 - refer to Example grammar 3.4, p. 126.
CS 153: Concepts of Compiler Design October 10 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak
Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.
Introduction to Compiling
CS 153: Concepts of Compiler Design December 7 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak
Syntax and Semantics Form and Meaning of Programming Languages Copyright © by Curt Hill.
CS 154 Formal Languages and Computability February 11 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
CS 152: Programming Language Paradigms April 16 Class Meeting Department of Computer Science San Jose State University Spring 2014 Instructor: Ron Mak.
CS 154 Formal Languages and Computability March 8 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron Mak.
Programming Languages Concepts Chapter 1: Programming Languages Concepts Lecture # 4.
Software Engineering Algorithms, Compilers, & Lifecycle.
CS 154 Formal Languages and Computability March 22 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron Mak.
CC410: System Programming Dr. Manal Helal – Fall 2014 – Lecture 12–Compilers.
BNF A CFL Metalanguage Some Variations Particular View to SLK Copyright © 2015 – Curt Hill.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Chapter 3 – Describing Syntax
CS 153: Concepts of Compiler Design September 14 Class Meeting
Compiler Design (40-414) Main Text Book:
CS 153: Concepts of Compiler Design August 24 Class Meeting
CS 153: Concepts of Compiler Design August 29 Class Meeting
CS 432: Compiler Construction Lecture 3
CS 153: Concepts of Compiler Design October 17 Class Meeting
CS 153: Concepts of Compiler Design December 5 Class Meeting
CS 153: Concepts of Compiler Design November 30 Class Meeting
CMPE 152: Compiler Design December 5 Class Meeting
CMPE 152: Compiler Design January 25 Class Meeting
CMPE 152: Compiler Design September 11/13 Lab
CMPE 152: Compiler Design August 23 Class Meeting
CMPE 152: Compiler Design August 21/23 Lab
High-Level Programming Language
CMPE 152: Compiler Design March 21 Class Meeting
Chapter 10: Compilers and Language Translation
CMPE 152: Compiler Design December 4 Class Meeting
CMPE 152: Compiler Design March 19 Class Meeting
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 2, 09/04/2003 Prof. Roy Levow.
CMPE 152: Compiler Design August 27 Class Meeting
Presentation transcript:

CS 153: Concepts of Compiler Design October 21 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak 1

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 2 Backus Naur Form (BNF)  A text-based way to describe source language syntax. Named after John Backus and Peter Naur.  Text-based means it can be read by a program.  Example: A compiler-compiler that can automatically generate a parser for a source language after reading (and parsing) the language’s syntax rules written in BNF.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 3 Backus Naur Form (BNF), cont’d  Uses certain meta-symbols. Symbols that are part of BNF itself but are not necessarily part of the syntax of the source language. ::= “is defined as” | “or” Surround names of nonterminal (not literal) items

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 4 BNF Example: U.S. Postal Address ::= | ::= |. ::= Sr. | Jr. | ::= | ::=, ::= ::= A|B|C|D|E|F|G|H|I|J|K|L|M |N|O|P|Q|R|S|T|U|V|W|X|Y|Z ::= … etc. Mary Jane 123 Easy Street San Jose, CA 95192

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 5 BNF: Optional and Repeated Items  To show optional items in BNF, use the vertical bar |.  Example: “An expression is a simple expression optionally followed by a relational operator and another simple expression.” ::= |

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 6 BNF: Optional and Repeated Items  BNF uses recursion for repeated items.  Example: “A digit sequence is a digit followed by zero or more digits.” Right recursive Left recursive ::= |

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 7 BNF Example: Pascal Number ::= | ::= ::=. |. | ::= | ::= E | e ::= + | - ::= | Repetition via recursion. The sign is optional.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 8 BNF Example: Pascal IF Statement ::= IF THEN | IF THEN ELSE  It should be straightforward to write a parsing method from either the syntax diagram or the BNF.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 9  A grammar defines a language.  Grammar = the set of all the BNF rules (or syntax diagrams)  Language = the set of all the legal strings of tokens according to the grammar  Legal string of tokens = a syntactically correct statement Grammars and Languages

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 10  A statement is in the language (it’s syntactically correct) if it can be derived by the grammar.  Each grammar rule “produces” a token string in the language.  The sequence of productions required to arrive at a syntactically correct token string is the derivation of the string. Grammars and Languages

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 11 Grammars and Languages, cont’d  Example: A very simplified expression grammar:  What strings (expressions) can we derive from this grammar? ::= | | ( ) ::= 0|1|2|3|4|5|6|7|8|9 ::= + | *

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 12 Derivations and Productions  Is (1 + 2)*3 valid in our expression language?  Yes! The expression is valid. ::= 1  ( )*3 ::=  ( + 2 )*3 ::= +  ( + 2 )*3 ::= 2  ( 2 )*3 ::=  ( )*3 ::=  ( )*3 ::= ( )  ( )*3 ::= *  *3 ::= 3  3 ::=  ::=  GRAMMAR RULEPRODUCTION ::= | | ( ) ::= 0|1|2|3|4|5|6|7|8|9 ::= + | *

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 13 Extended BNF (EBNF)  Extended BNF (EBNF) adds meta-symbols { } and [ ] { } Surround items to be repeated zero or more times. [ ] Surround optional items.  Originally developed by Niklaus Wirth. Inventor of Pascal. Early user of syntax diagrams.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 14 Extended BNF (EBNF)  Repetition (one or more): BNF: EBNF: ::= | ::= { }

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 15 Extended BNF, cont’d  Optional items. BNF: EBNF: ::= IF THEN | IF THEN ELSE ::= IF THEN [ ELSE ]

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 16 Extended BNF, cont’d  Optional items. BNF: EBNF: ::= | ::= [ ]

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 17 Compiler-Compilers  Professional compiler writers generally do not write scanners and parsers from scratch.  A compiler-compiler is a tool for writing compilers.  It can include: A scanner generator A parser generator Parse tree utilities

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 18 Compiler-Compilers, cont’d  Feed a compiler-compiler a grammar written in a textual form such as BNF or EBNF.  The compiler-compiler outputs code written in a high-level language that implements a scanner, parser, and a parse tree.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 19 Popular Compiler-Compilers  Yacc “Yet another compiler-compiler” Generates a bottom-up parser written in C. GNU version: Bison  Lex Generates a scanner written in C. GNU version: Flex  JavaCC Generates a scanner and a top-down parser written in Java.  JJTree Preprocessor for JavaCC grammars. Generates Java code to build and process parse trees. The code generated by a compiler-compiler can be in a high level language such as C or Java. However, you may find the code to be ugly and hard to read.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 20 JavaCC Compiler-Compiler  Feed JavaCC the grammar for a source language and it will automatically generate a scanner and a parser. Specify the source language tokens with regular expressions  JavaCC generates a scanner for the source language. Specify the source language syntax rules with Extended BNF  JavaCC generates a parser for the source language.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 21 JavaCC Compiler-Compiler, cont’d  The generated scanner and parser are written in Java.  Note: JavaCC calls the scanner the “tokenizer”.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 22 Download and Install  Download and install JavaCC and JJTree  Download the examples from the book Generating Parsers with JavaCC ml ml

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 23 The JavaCC Eclipse Plug-in  How to install the plug-in: 1. Start Eclipse. 2. Help  Install New Software Click the Add button 4. Name: SF Eclipse JavaCC Location: 5. Click the OK button. 6. Select “SF JavaCC Eclipse Plug-in” Or, visit

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 24 JavaCC Regular Expressions  Literals  Character classes  Character ranges  Alternates Token name Token string

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 25 JavaCC Regular Expressions  Negation  Repetition  Quantifiers

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 26 Example: helloworld1.jj options { BUILD_PARSER=false; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) { java.io.StringReader sr = new java.io.StringReader(args[0]); SimpleCharStream scs = new SimpleCharStream(sr); HelloWorldTokenManager mgr = new HelloWorldTokenManager(scs); for (Token t = mgr.getNextToken(); t.kind != EOF; t = mgr.getNextToken()) { debugStream.println("Found token: " + t.image); } } } TOKEN : { | } Literals  Recognize tokens “hello” and “world”. Main Java method of the tokenizer Demo

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 27 Example: helloworld2.jj options { BUILD_PARSER=false; IGNORE_CASE=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) { java.io.StringReader sr = new java.io.StringReader(args[0]); SimpleCharStream scs = new SimpleCharStream(sr); HelloWorldTokenManager mgr = new HelloWorldTokenManager(scs); while (mgr.getNextToken().kind != EOF) {} } } SKIP : { } TOKEN : { { debugStream.println("HELLO token: " + matchedToken.image); } | { debugStream.println("WORLD token: " + matchedToken.image); } }  Recognize tokens “hello” and “world”  Ignore case.  Ignore spaces and commas between tokens.  Use lexical actions. Character class Lexical actions Demo

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 28 options { BUILD_PARSER=false; IGNORE_CASE=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) {... } } SKIP : { } TOKEN : { { debugStream.println("HELLO token: " + matchedToken.image); } | { debugStream.println("WORLD token: " + matchedToken.image); } | { debugStream.println("IDENTIFIER token: " + matchedToken.image); } } Example: helloworld3.jj  Recognize words other than “hello” and “world” as identifiers. Demo Why aren’t “hello” and “world” recognized as identifiers?

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 29 Example: helloworld4.jj options { BUILD_PARSER=false; IGNORE_CASE=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) {... } } SKIP : { } TOKEN : { { debugStream.println("HELLO token: " + matchedToken.image); } | { debugStream.println("WORLD token: " + matchedToken.image); } | ( | | "_")*> { debugStream.println("IDENTIFIER token: " + matchedToken.image); } | }  Use private tokens DIGIT and LETTER. Private tokens Demo

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 30 Example: helloworld5.jj options { BUILD_PARSER=false; IGNORE_CASE=true; DEBUG_TOKEN_MANAGER=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) {... } SKIP : { } TOKEN : { { debugStream.println("HELLO token: " + matchedToken.image); } | { debugStream.println("WORLD token: " + matchedToken.image); } | ( | | "_")*> { debugStream.println("IDENTIFIER token: " + matchedToken.image); } | }  What’s the tokenizer doing?  Turn on debugging output. Demo

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 31 Review: DFA for a Pascal Identifier or Number digit E E [other] 3 - digit letter [other] letter private static final int matrix[][] = { /* letter digit + -. E other */ /* 0 */ { 1, 4, 3, 3, ERR, 1, ERR }, /* 1 */ { 1, 1, -2, -2, -2, 1, -2 }, /* 2 */ { ERR, ERR, ERR, ERR, ERR, ERR, ERR }, /* 3 */ { ERR, 4, ERR, ERR, ERR, ERR, ERR }, /* 4 */ { -5, 4, -5, -5, 6, 9, -5 }, /* 5 */ { ERR, ERR, ERR, ERR, ERR, ERR, ERR }, /* 6 */ { ERR, 7, ERR, ERR, ERR, ERR, ERR }, /* 7 */ { -8, 7, -8, -8, -8, 9, -8 }, /* 8 */ { ERR, ERR, ERR, ERR, ERR, ERR, ERR }, /* 9 */ { ERR, 11, 10, 10, ERR, ERR, ERR }, /* 10 */ { ERR, 11, ERR, ERR, ERR, ERR, ERR }, /* 11 */ { -12, 11, -12, -12, -12, -12, -12 }, /* 12 */ { ERR, ERR, ERR, ERR, ERR, ERR, ERR }, }; Negative numbers in the matrix are accepting states.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 32 The JavaCC Tokenizer  The JavaCC tokenizer (scanner) is a DFA created from the regular expressions.  The generated Java code implements state-transitions with switch statements instead of a matrix.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 33 The JavaCC Tokenizer, cont’d  Example (from HelloWorldTokenManager.java ): static private int jjMoveStringLiteralDfa0_0() { switch(curChar) { case 72: case 104: return jjMoveStringLiteralDfa1_0(0x4L); case 87: case 119: return jjMoveStringLiteralDfa1_0(0x8L); default : return jjMoveNfa_0(0, 0); }

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 34 Assignment #5  Use JavaCC to create a simple Java tokenizer (i.e., scanner). Write the.jj file. Due Friday, October 31.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 35 Compiler Team Project  Write a compiler for a procedure-oriented source language that will generate code for the Java virtual machine (JVM).  The source language should be a procedural, non-object-oriented language or subset thereof, or a language that the team invents. Tip: Start with a simple language! No Scheme, Lisp, or Lisp-like languages.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 36 Compiler Team Project  The object language must be Jasmin, the assembly language for the Java virtual machine. You will be provided an assembler that translates Jasmin assembly language programs into.class files for the JVM.  You must use the JavaCC compiler-compiler.  You can also use any Java code from the Writing Compilers and Interpreters book.  Compile and run source programs written in your language.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 37 Compiler Team Project Deliverables  Java and JavaCC source files of a working compiler. The JavaCC.jj (or.jjt ) grammar files. Do not include the Java files that JavaCC generated.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 38 Compiler Team Project Deliverables, cont’d  Written report (5-10 pp. single spaced) Include: Syntax diagrams for key source language constructs. Include: Code diagrams for key source language constructs. Optional: UML diagrams for key compiler components.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 39 Compiler Team Project Deliverables, cont’d  Instructions on how to build your compiler. If it’s not standard or not obvious.  Instructions on how to run your compiler (scripts OK).  Sample source programs to compile and execute.  Sample output from executing your source programs.

Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 40 Post Mortem Report  Private individual post mortem report (up to 1 page from each student) What did you learn from this course? An assessment of your accomplishments for your project. An assessment of each of your project team members.