Published by Derick Thornton. Modified over 9 years ago.
1
CS 153: Concepts of Compiler Design
October 27 Class Meeting
Department of Computer Science
San Jose State University
Fall 2014
Instructor: Ron Mak
www.cs.sjsu.edu/~mak
2
Computer Science Dept. Fall 2014: October 27 CS 153: Concepts of Compiler Design © R. Mak
Tesla Motors Headquarters Visit
Palo Alto
Friday afternoon, November 14
See Piazza for details!
3
Review: JavaCC Compiler-Compiler
Feed JavaCC the grammar for a source language and it will automatically generate a scanner and a parser.
Specify the source language tokens with regular expressions.
    JavaCC generates a scanner for the source language.
Specify the source language syntax rules with Extended BNF.
    JavaCC generates a parser for the source language.
4
Review: JavaCC Compiler-Compiler, cont’d
The generated scanner and parser are written in Java.
Note: JavaCC calls the scanner the “tokenizer”.
5
Review: JavaCC Regular Expressions
Literals
Character classes
Character ranges
Alternates
Token name
Token string
6
Review: JavaCC Regular Expressions, cont’d
Negation
Repetition
Quantifiers
7
JavaCC Parser Specification
Use JavaCC regular expressions to specify tokens.
Use EBNF to specify JavaCC production rules.
Phone number example from Chapter 3 of the JavaCC book.
Example phone number: 408-123-4567
EBNF:
    <digit> ::= 0|1|2|3|4|5|6|7|8|9
    <three-digits> ::= <digit><digit><digit>
    <four-digits> ::= <digit><digit><digit><digit>
    <phone-number> ::= <three-digits> - <three-digits> - <four-digits>
8
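JavaCC Parser Specification, cont’d
EBNF:
    <digit> ::= 0|1|2|3|4|5|6|7|8|9
    <three-digits> ::= <digit><digit><digit>
    <four-digits> ::= <digit><digit><digit><digit>
    <phone-number> ::= <three-digits> - <three-digits> - <four-digits>
JavaCC (phone.jj):
    TOKEN : {
        <FOUR_DIGITS : (<DIGITS>){4}>
      | <THREE_DIGITS : (<DIGITS>){3}>
      | <#DIGITS : ["0"-"9"]>
    }

    void PhoneNumber() : {}
    {
        <THREE_DIGITS> "-" <THREE_DIGITS> "-" <FOUR_DIGITS>
    }
The TOKEN block holds the token specifications.
PhoneNumber() is the production rule. Java statements can go in its first pair of braces.
In the production rule, <THREE_DIGITS> and <FOUR_DIGITS> are terminals, "-" is a literal, and PhoneNumber is a nonterminal.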
9
JavaCC Production Rule Methods
JavaCC generates a top-down recursive-descent parser.
Each production rule becomes a Java method of the parser class.
You can pass parameters to the methods.
phone_method_param.jj (shown with and without parser debug):
    void PhoneNumber() :
    {
        StringBuffer sb = new StringBuffer();    // Java statement
    }
    {
        AreaCode(sb) "-"
        <THREE_DIGITS> {sb.append(token.image);}    // syntactic action
        "-"
        <FOUR_DIGITS> {sb.append(token.image);}
        {System.out.println("Number: " + sb.toString());}
    }

    void AreaCode(StringBuffer buf) : {}
    {
        <THREE_DIGITS> {buf.append(token.image);}
    }
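To make "each production rule becomes a Java method" concrete, here is a hand-written sketch of the same idea. It is not the code JavaCC generates; the class name PhoneMethodSketch, the helper methods digits() and hyphen(), and the pre-split token array are all assumptions made for illustration.

```java
// Hand-written sketch of "one method per production rule"; not JavaCC output.
final class PhoneMethodSketch {
    private final String[] tokens; // already-scanned tokens, e.g. "408", "-", "123", "-", "4567"
    private int pos = 0;           // index of the next unconsumed token

    PhoneMethodSketch(String[] tokens) {
        this.tokens = tokens;
    }

    // Consume a token of exactly `count` digits and return its text (its "image").
    private String digits(int count) {
        if (pos >= tokens.length || !tokens[pos].matches("[0-9]{" + count + "}")) {
            throw new IllegalStateException("expected " + count + " digits at token " + pos);
        }
        return tokens[pos++];
    }

    // Consume the literal "-".
    private void hyphen() {
        if (pos >= tokens.length || !tokens[pos].equals("-")) {
            throw new IllegalStateException("expected \"-\" at token " + pos);
        }
        pos++;
    }

    // Plays the role of the PhoneNumber production.
    String phoneNumber() {
        StringBuilder sb = new StringBuilder();
        areaCode(sb);          // nonterminal: call another rule's method, passing a parameter
        hyphen();
        sb.append(digits(3));  // terminal followed by an action on its image
        hyphen();
        sb.append(digits(4));
        return sb.toString();
    }

    // Plays the role of the AreaCode production; appends to the caller's buffer.
    void areaCode(StringBuilder buf) {
        buf.append(digits(3));
    }

    // Convenience entry point for this sketch.
    static String parse(String... tokens) {
        return new PhoneMethodSketch(tokens).phoneNumber();
    }
}
```

Calling PhoneMethodSketch.parse("408", "-", "123", "-", "4567") collects the digits of all three parts into one string, in the same way the StringBuffer is threaded through AreaCode(sb) in the grammar.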
10
Grammar Problems
Be very careful when specifying grammars!
JavaCC will not be able to generate a correct parser for a faulty grammar.
Common grammar faults include:
    choice conflict
    left recursion
11
Choice Conflict
Suppose we want to parse both local phone numbers and long-distance phone numbers:
    Local: 123-4567
    Long-distance: 201-456-7890
EBNF (phone_choice.jj):
    <local-number> ::= <three-digits> - <four-digits>
    <long-distance-number> ::= <three-digits> - <three-digits> - <four-digits>
    <phone-number> ::= <local-number> | <long-distance-number>
Choice conflict! While attempting to parse “123-4567”, the parser cannot tell whether the initial “123” begins a local number or a long-distance number, since both start with three digits.
12
Choice Conflict Resolution: Left Factoring
One way to resolve a choice conflict is by left factoring.
Factor out the common head from the productions.
phone_left_factored.jj:
    void PhoneNumber() : {}
    {
        Head() "-" ( LocalNumber() | LongDistanceNumber() )
    }

    void LocalNumber() : {}
    {
        <FOUR_DIGITS>
    }

    void LongDistanceNumber() : {}
    {
        <THREE_DIGITS> "-" <FOUR_DIGITS>
    }

    void Head() : {}
    {
        <THREE_DIGITS>
    }
How does this fix the problem?
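A hand-written sketch may help with the question of why left factoring works: once the common head and the first "-" have been consumed, the single next token is enough to pick a branch. The class LeftFactoredSketch and its helpers are hypothetical, and the tokens are assumed to be already split.

```java
// Hand-written sketch of the left-factored grammar; not JavaCC output.
final class LeftFactoredSketch {
    private static boolean isDigits(String[] t, int i, int count) {
        return i < t.length && t[i].matches("[0-9]{" + count + "}");
    }

    private static void require(boolean ok, String message) {
        if (!ok) {
            throw new IllegalArgumentException(message);
        }
    }

    // Returns "local" or "long-distance" for a well-formed token sequence.
    static String classify(String... t) {
        // Head() "-": the common head is parsed once, before any choice is made.
        require(isDigits(t, 0, 3), "expected three digits");
        require(t.length > 1 && t[1].equals("-"), "expected \"-\"");

        // The choice now depends on one token only: four digits or three digits.
        if (isDigits(t, 2, 4)) {
            // LocalNumber(): just the four-digit token.
            require(t.length == 3, "unexpected trailing tokens");
            return "local";
        }
        if (isDigits(t, 2, 3)) {
            // LongDistanceNumber(): three digits, "-", four digits.
            require(t.length == 5 && t[3].equals("-") && isDigits(t, 4, 4),
                    "expected \"-\" and four digits");
            return "long-distance";
        }
        throw new IllegalArgumentException("expected three or four digits after the head");
    }
}
```

In this sketch the branch is selected by looking only at the token that follows the head and the hyphen, which is the one-token lookahead a top-down parser has by default.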
13
Lookahead
A top-down parser naturally “looks ahead” one token.
This token tells the parser which nonterminal it will parse next.
    “IF”: next parse an IF statement
    “REPEAT”: next parse a REPEAT statement
A choice conflict occurs if a one-token lookahead is not sufficient to determine which nonterminal to parse next.
    Next parse a local number or a long-distance number?
14
Backtracking
The parser cannot backtrack.
Suppose the parser has parsed “123-”.
It decides that’s an area code, so it must be parsing a long-distance number.
Now it sees “4567”. Oops!
It cannot backtrack and reparse “123-” as the prefix to a local number.
15
Choice Conflict Resolution: Lookahead
Another way to resolve a choice conflict is by telling the parser to look ahead more than just one token.
To decide between parsing a local number and a long-distance telephone number:
    One-token lookahead is insufficient: “123”
    Two-token lookahead is insufficient: “123-”
    Three-token lookahead will distinguish a local number from a long-distance number: “123-4567”
phone_lookahead.jj:
    void PhoneNumber() : {}
    {
        ( LOOKAHEAD(3) LocalNumber() | LongDistanceNumber() )
    }
By looking ahead three tokens, the parser can successfully choose between LocalNumber() and LongDistanceNumber().
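The effect of the lookahead limit can be sketched by hand. The code below is only an illustration of the idea, not how JavaCC implements LOOKAHEAD; the class LookaheadSketch, the LOCAL pattern array, and the methods looksLocal() and choose() are assumptions of this sketch.

```java
// Hand-written illustration of a k-token lookahead check; not JavaCC output.
final class LookaheadSketch {
    // Token patterns for a local number: three digits, "-", four digits.
    private static final String[] LOCAL = {"[0-9]{3}", "-", "[0-9]{4}"};

    // True if the first `limit` tokens are consistent with a local number.
    // The tokens are only inspected; nothing is consumed.
    static boolean looksLocal(String[] tokens, int limit) {
        int n = Math.min(limit, LOCAL.length);
        for (int i = 0; i < n; i++) {
            if (i >= tokens.length || !tokens[i].matches(LOCAL[i])) {
                return false;
            }
        }
        return true;
    }

    // Pick the first alternative if the lookahead check passes, else the second.
    static String choose(int limit, String... tokens) {
        return looksLocal(tokens, limit) ? "LocalNumber" : "LongDistanceNumber";
    }
}
```

With a limit of 1 or 2, the long-distance input "201", "-", "456", "-", "7890" still looks like a local number, so the sketch picks the wrong branch. With a limit of 3 the third token, "456", fails the four-digit check and the second branch is chosen.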
16
Lookahead
Global lookahead
    Major performance penalty. Avoid if possible!
Syntactic lookahead
Semantic lookahead
Nested lookahead
    Too convoluted! Minimize the need for these.
    Why would you design a grammar that needed these?
17
Lookahead
Lookahead will slow down parsing.
Try to design grammars that do not require more than one token of lookahead.
For example, Pascal only requires one-token lookahead.
18
Left Recursion
Suppose we want to parse very simple expressions like “1+2”, “1+2+3”, “9+4+7+2”, etc.
EBNF (expression_left_recursion.jj):
    <expression> ::= <expression> + <term> | <term>
    <term> ::= <number>
Left recursion! In the alternative <expression> ::= <expression> + <term>, the nonterminal <expression> refers to itself recursively such that the recursion will never end.
Because the recursive reference is at the left end of the rule, no tokens are consumed.
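The failure mode can be sketched by translating a left-recursive rule naively into a method. This is a hand-written illustration, not JavaCC output; the class LeftRecursionSketch and the MAX_DEPTH guard are assumptions added so that the sketch stops with an exception instead of exhausting the stack.

```java
// Naive hand translation of a left-recursive rule; not JavaCC output.
final class LeftRecursionSketch {
    private static final int MAX_DEPTH = 1000; // arbitrary guard chosen for this sketch

    // The expression rule begins with a reference to itself, so the method
    // calls itself before it has consumed any token: `pos` never changes.
    static int expression(String[] tokens, int pos, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException(
                "still at token " + pos + " after " + depth + " nested calls");
        }
        int afterLeft = expression(tokens, pos, depth + 1);
        // The "+" and the term would be parsed here, but control never gets this far.
        return afterLeft;
    }

    // True if the guard tripped, i.e. the parser recursed without making progress.
    static boolean recursesWithoutProgress(String... tokens) {
        try {
            expression(tokens, 0, 0);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

Whatever the input, every nested call sees the same token position, which is the "no tokens are consumed" problem in miniature.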
19
Left Recursion Resolution: Iteration
Resolve left recursion by replacing it with iteration.
Instead of:
    <expression> ::= <expression> + <term> | <term>
    <term> ::= <number>
Use EBNF:
    <expression> ::= <term> { + <term> }
    <term> ::= <number>
expression_iteration.jj:
    void Expression() : {}
    {
        Term() ("+" Term())*
        { System.out.println("Parsed expression"); }
    }
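In a hand-written recursive-descent parser, the iterative rule Term() ("+" Term())* corresponds to a loop. The sketch below is an illustration under assumptions: the class IterativeExpressionSketch is hypothetical, tokens are pre-split, a term is taken to be a single number token, and summing the terms is an addition of this sketch (the grammar on the slide only prints a message).

```java
// Hand-written sketch of Term() ("+" Term())* as a loop; not JavaCC output.
final class IterativeExpressionSketch {
    private final String[] tokens; // e.g. "1", "+", "2", "+", "3"
    private int pos = 0;           // index of the next unconsumed token

    IterativeExpressionSketch(String[] tokens) {
        this.tokens = tokens;
    }

    // One term, then zero or more ("+" term) pairs.
    int expression() {
        int sum = term();
        while (pos < tokens.length && tokens[pos].equals("+")) {
            pos++;         // consume "+", so every iteration makes progress
            sum += term();
        }
        return sum;
    }

    // A term is assumed to be a single number token.
    int term() {
        if (pos >= tokens.length || !tokens[pos].matches("[0-9]+")) {
            throw new IllegalStateException("expected a number at token " + pos);
        }
        return Integer.parseInt(tokens[pos++]);
    }

    // Parse the whole token sequence and return the sum of its terms.
    static int parse(String... tokens) {
        IterativeExpressionSketch parser = new IterativeExpressionSketch(tokens);
        int value = parser.expression();
        if (parser.pos != tokens.length) {
            throw new IllegalStateException("unexpected token at " + parser.pos);
        }
        return value;
    }
}
```

Each pass through the loop consumes a "+" and a term, so the parser always moves forward and stops at the first token that is not "+".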
20
Right Recursion
Right recursion:
    <expression> ::= <term> + <expression> | <term>
    <term> ::= <number>
Right recursion is not a problem for JavaCC.
Because there are non-recursive references to the left of the recursive reference, tokens are consumed before the parser recurses.
The parser continues to make forward progress.
The recursion ends as soon as the parser sees a token that doesn’t fit the production rule.
21
Right Recursion
expression_right_recursion.jj
However, there may be choice conflicts.
Does a <term> start <term> + <expression> or simply <term>?
How much lookahead do we need?
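One possible way to explore these questions is a hand-written sketch that parses the common leading term first and only then decides, from the single next token, whether to recurse. This is an assumption-laden illustration, not JavaCC output: the class RightRecursionSketch is hypothetical, a term is taken to be one number token, and the method counts terms rather than building anything.

```java
// Hand-written sketch of a right-recursive expression rule; not JavaCC output.
final class RightRecursionSketch {
    // Parses an expression starting at `pos` and returns how many terms it contains.
    static int expression(String[] tokens, int pos) {
        // The leading term is consumed first, so progress is made before any recursion.
        if (pos >= tokens.length || !tokens[pos].matches("[0-9]+")) {
            throw new IllegalStateException("expected a number at token " + pos);
        }
        pos++;
        // A single next token decides: "+" means recurse, anything else ends the rule.
        if (pos < tokens.length && tokens[pos].equals("+")) {
            return 1 + expression(tokens, pos + 1);
        }
        return 1;
    }

    // Convenience entry point for this sketch.
    static int countTerms(String... tokens) {
        return expression(tokens, 0);
    }
}
```

Because the term is consumed before the decision, the sketch needs only one token of lookahead at the choice point. A grammar that lists both alternatives in full, each starting with the same term, would instead have to look past the term to see whether a "+" follows.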
22
JJDoc
JJDoc produces documentation for your grammar.
Right-click in the .jj edit window.
It generates an HTML file from a .jj grammar file.
Read Chapter 5 of the JavaCC book.
Ideal for your project documentation!
Demo