Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS 153: Concepts of Compiler Design October 21 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak www.cs.sjsu.edu/~mak.

Similar presentations


Presentation on theme: "CS 153: Concepts of Compiler Design October 21 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak www.cs.sjsu.edu/~mak."— Presentation transcript:

1 CS 153: Concepts of Compiler Design October 21 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak www.cs.sjsu.edu/~mak 1

2 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 2 Backus Naur Form (BNF)  A text-based way to describe source language syntax. Named after John Backus and Peter Naur.  Text-based means it can be read by a program.  Example: A compiler-compiler that can automatically generate a parser for a source language after reading (and parsing) the language’s syntax rules written in BNF.

3 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 3 Backus Naur Form (BNF), cont’d  Uses certain meta-symbols. Symbols that are part of BNF itself but are not necessarily part of the syntax of the source language. ::= “is defined as” | “or” Surround names of nonterminal (not literal) items

4 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 4 BNF Example: U.S. Postal Address ::= | ::= |. ::= Sr. | Jr. | ::= | ::=, ::= ::= A|B|C|D|E|F|G|H|I|J|K|L|M |N|O|P|Q|R|S|T|U|V|W|X|Y|Z ::= … etc. Mary Jane 123 Easy Street San Jose, CA 95192

5 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 5 BNF: Optional and Repeated Items  To show optional items in BNF, use the vertical bar |.  Example: “An expression is a simple expression optionally followed by a relational operator and another simple expression.” ::= |

6 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 6 BNF: Optional and Repeated Items  BNF uses recursion for repeated items.  Example: “A digit sequence is a digit followed by zero or more digits.” Right recursive Left recursive ::= |

7 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 7 BNF Example: Pascal Number ::= | ::= ::=. |. | ::= | ::= E | e ::= + | - ::= | Repetition via recursion. The sign is optional.

8 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 8 BNF Example: Pascal IF Statement ::= IF THEN | IF THEN ELSE  It should be straightforward to write a parsing method from either the syntax diagram or the BNF.

9 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 9  A grammar defines a language.  Grammar = the set of all the BNF rules (or syntax diagrams)  Language = the set of all the legal strings of tokens according to the grammar  Legal string of tokens = a syntactically correct statement Grammars and Languages

10 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 10  A statement is in the language (it’s syntactically correct) if it can be derived by the grammar.  Each grammar rule “produces” a token string in the language.  The sequence of productions required to arrive at a syntactically correct token string is the derivation of the string. Grammars and Languages

11 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 11 Grammars and Languages, cont’d  Example: A very simplified expression grammar:  What strings (expressions) can we derive from this grammar? ::= | | ( ) ::= 0|1|2|3|4|5|6|7|8|9 ::= + | *

12 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 12 Derivations and Productions  Is (1 + 2)*3 valid in our expression language?  Yes! The expression is valid. ::= 1  ( 1 + 2 )*3 ::=  ( + 2 )*3 ::= +  ( + 2 )*3 ::= 2  ( 2 )*3 ::=  ( )*3 ::=  ( )*3 ::= ( )  ( )*3 ::= *  *3 ::= 3  3 ::=  ::=  GRAMMAR RULEPRODUCTION ::= | | ( ) ::= 0|1|2|3|4|5|6|7|8|9 ::= + | *

13 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 13 Extended BNF (EBNF)  Extended BNF (EBNF) adds meta-symbols { } and [ ] { } Surround items to be repeated zero or more times. [ ] Surround optional items.  Originally developed by Niklaus Wirth. Inventor of Pascal. Early user of syntax diagrams.

14 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 14 Extended BNF (EBNF)  Repetition (one or more): BNF: EBNF: ::= | ::= { }

15 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 15 Extended BNF, cont’d  Optional items. BNF: EBNF: ::= IF THEN | IF THEN ELSE ::= IF THEN [ ELSE ]

16 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 16 Extended BNF, cont’d  Optional items. BNF: EBNF: ::= | ::= [ ]

17 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 17 Compiler-Compilers  Professional compiler writers generally do not write scanners and parsers from scratch.  A compiler-compiler is a tool for writing compilers.  It can include: A scanner generator A parser generator Parse tree utilities

18 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 18 Compiler-Compilers, cont’d  Feed a compiler-compiler a grammar written in a textual form such as BNF or EBNF.  The compiler-compiler outputs code written in a high-level language that implements a scanner, parser, and a parse tree.

19 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 19 Popular Compiler-Compilers  Yacc “Yet another compiler-compiler” Generates a bottom-up parser written in C. GNU version: Bison  Lex Generates a scanner written in C. GNU version: Flex  JavaCC Generates a scanner and a top-down parser written in Java.  JJTree Preprocessor for JavaCC grammars. Generates Java code to build and process parse trees. The code generated by a compiler-compiler can be in a high level language such as C or Java. However, you may find the code to be ugly and hard to read.

20 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 20 JavaCC Compiler-Compiler  Feed JavaCC the grammar for a source language and it will automatically generate a scanner and a parser. Specify the source language tokens with regular expressions  JavaCC generates a scanner for the source language. Specify the source language syntax rules with Extended BNF  JavaCC generates a parser for the source language.

21 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 21 JavaCC Compiler-Compiler, cont’d  The generated scanner and parser are written in Java.  Note: JavaCC calls the scanner the “tokenizer”.

22 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 22 Download and Install  Download and install JavaCC and JJTree http://javacc.java.net/  Download the examples from the book Generating Parsers with JavaCC http://generatingparserswithjavacc.com/examples.ht ml http://generatingparserswithjavacc.com/examples.ht ml

23 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 23 The JavaCC Eclipse Plug-in  How to install the plug-in: 1. Start Eclipse. 2. Help  Install New Software... 3. Click the Add button 4. Name: SF Eclipse JavaCC Location: http://eclipse-javacc.sourceforge.net/http://eclipse-javacc.sourceforge.net/ 5. Click the OK button. 6. Select “SF JavaCC Eclipse Plug-in” Or, visit http://sourceforge.net/projects/eclipse-javacc/ http://sourceforge.net/projects/eclipse-javacc/

24 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 24 JavaCC Regular Expressions  Literals  Character classes  Character ranges  Alternates Token name Token string

25 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 25 JavaCC Regular Expressions  Negation  Repetition  Quantifiers

26 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 26 Example: helloworld1.jj options { BUILD_PARSER=false; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) { java.io.StringReader sr = new java.io.StringReader(args[0]); SimpleCharStream scs = new SimpleCharStream(sr); HelloWorldTokenManager mgr = new HelloWorldTokenManager(scs); for (Token t = mgr.getNextToken(); t.kind != EOF; t = mgr.getNextToken()) { debugStream.println("Found token: " + t.image); } } } TOKEN : { | } Literals  Recognize tokens “hello” and “world”. Main Java method of the tokenizer Demo

27 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 27 Example: helloworld2.jj options { BUILD_PARSER=false; IGNORE_CASE=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) { java.io.StringReader sr = new java.io.StringReader(args[0]); SimpleCharStream scs = new SimpleCharStream(sr); HelloWorldTokenManager mgr = new HelloWorldTokenManager(scs); while (mgr.getNextToken().kind != EOF) {} } } SKIP : { } TOKEN : { { debugStream.println("HELLO token: " + matchedToken.image); } | { debugStream.println("WORLD token: " + matchedToken.image); } }  Recognize tokens “hello” and “world”  Ignore case.  Ignore spaces and commas between tokens.  Use lexical actions. Character class Lexical actions Demo

28 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 28 options { BUILD_PARSER=false; IGNORE_CASE=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) {... } } SKIP : { } TOKEN : { { debugStream.println("HELLO token: " + matchedToken.image); } | { debugStream.println("WORLD token: " + matchedToken.image); } | { debugStream.println("IDENTIFIER token: " + matchedToken.image); } } Example: helloworld3.jj  Recognize words other than “hello” and “world” as identifiers. Demo Why aren’t “hello” and “world” recognized as identifiers?

29 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 29 Example: helloworld4.jj options { BUILD_PARSER=false; IGNORE_CASE=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) {... } } SKIP : { } TOKEN : { { debugStream.println("HELLO token: " + matchedToken.image); } | { debugStream.println("WORLD token: " + matchedToken.image); } | ( | | "_")*> { debugStream.println("IDENTIFIER token: " + matchedToken.image); } | }  Use private tokens DIGIT and LETTER. Private tokens Demo

30 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 30 Example: helloworld5.jj options { BUILD_PARSER=false; IGNORE_CASE=true; DEBUG_TOKEN_MANAGER=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) {... } SKIP : { } TOKEN : { { debugStream.println("HELLO token: " + matchedToken.image); } | { debugStream.println("WORLD token: " + matchedToken.image); } | ( | | "_")*> { debugStream.println("IDENTIFIER token: " + matchedToken.image); } | }  What’s the tokenizer doing?  Turn on debugging output. Demo

31 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 31 Review: DFA for a Pascal Identifier or Number 69104711 digit + + - E E. 58 12 [other] 3 - digit 1 2 0 letter [other] letter private static final int matrix[][] = { /* letter digit + -. E other */ /* 0 */ { 1, 4, 3, 3, ERR, 1, ERR }, /* 1 */ { 1, 1, -2, -2, -2, 1, -2 }, /* 2 */ { ERR, ERR, ERR, ERR, ERR, ERR, ERR }, /* 3 */ { ERR, 4, ERR, ERR, ERR, ERR, ERR }, /* 4 */ { -5, 4, -5, -5, 6, 9, -5 }, /* 5 */ { ERR, ERR, ERR, ERR, ERR, ERR, ERR }, /* 6 */ { ERR, 7, ERR, ERR, ERR, ERR, ERR }, /* 7 */ { -8, 7, -8, -8, -8, 9, -8 }, /* 8 */ { ERR, ERR, ERR, ERR, ERR, ERR, ERR }, /* 9 */ { ERR, 11, 10, 10, ERR, ERR, ERR }, /* 10 */ { ERR, 11, ERR, ERR, ERR, ERR, ERR }, /* 11 */ { -12, 11, -12, -12, -12, -12, -12 }, /* 12 */ { ERR, ERR, ERR, ERR, ERR, ERR, ERR }, }; Negative numbers in the matrix are accepting states.

32 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 32 The JavaCC Tokenizer  The JavaCC tokenizer (scanner) is a DFA created from the regular expressions.  The generated Java code implements state-transitions with switch statements instead of a matrix.

33 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 33 The JavaCC Tokenizer, cont’d  Example (from HelloWorldTokenManager.java ): static private int jjMoveStringLiteralDfa0_0() { switch(curChar) { case 72: case 104: return jjMoveStringLiteralDfa1_0(0x4L); case 87: case 119: return jjMoveStringLiteralDfa1_0(0x8L); default : return jjMoveNfa_0(0, 0); }

34 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 34 Assignment #5  Use JavaCC to create a simple Java tokenizer (i.e., scanner). Write the.jj file. Due Friday, October 31.

35 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 35 Compiler Team Project  Write a compiler for a procedure-oriented source language that will generate code for the Java virtual machine (JVM).  The source language should be a procedural, non-object-oriented language or subset thereof, or a language that the team invents. Tip: Start with a simple language! No Scheme, Lisp, or Lisp-like languages.

36 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 36 Compiler Team Project  The object language must be Jasmin, the assembly language for the Java virtual machine. You will be provided an assembler that translates Jasmin assembly language programs into.class files for the JVM.  You must use the JavaCC compiler-compiler.  You can also use any Java code from the Writing Compilers and Interpreters book.  Compile and run source programs written in your language.

37 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 37 Compiler Team Project Deliverables  Java and JavaCC source files of a working compiler. The JavaCC.jj (or.jjt ) grammar files. Do not include the Java files that JavaCC generated.

38 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 38 Compiler Team Project Deliverables, cont’d  Written report (5-10 pp. single spaced) Include: Syntax diagrams for key source language constructs. Include: Code diagrams for key source language constructs. Optional: UML diagrams for key compiler components.

39 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 39 Compiler Team Project Deliverables, cont’d  Instructions on how to build your compiler. If it’s not standard or not obvious.  Instructions on how to run your compiler (scripts OK).  Sample source programs to compile and execute.  Sample output from executing your source programs.

40 Computer Science Dept. Fall 2015: October 21 CS 153: Concepts of Compiler Design © R. Mak 40 Post Mortem Report  Private individual post mortem report (up to 1 page from each student) What did you learn from this course? An assessment of your accomplishments for your project. An assessment of each of your project team members.


Download ppt "CS 153: Concepts of Compiler Design October 21 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak www.cs.sjsu.edu/~mak."

Similar presentations


Ads by Google