1
CS 432: Compiler Construction Lecture 2
Department of Computer Science Salisbury University Fall 2017 Instructor: Dr. Sophie Wang 7/28/2018
2
Key Steps for Success
Whenever you develop a complex program such as a compiler or an interpreter, key first steps for success are: Design and implement a proper framework. Develop initial components that are well-integrated with the framework and with each other. Test the framework and the component integration by running simple end-to-end tests. Early component integration is critical, even if the initial components are greatly simplified and don’t do very much.
3
Key Steps for Success, cont’d
Test your framework and components and get them working together as early as possible. The framework and the initial components then form the basis upon which you can do further development. You should always be building on code that already works.
4
Goals and Approach A source language-independent framework.
Initial Pascal source language-specific components integrated into the front end of the framework. Initial compiler components integrated into the back end of the framework. Simple end-to-end runs that exercise the components by generating source program listings from the common front end and messages from the compiler back end.
5
Language-Independent Framework Components
The framework consists of three packages: frontend, intermediate, and backend. Framework components are source language independent interfaces and classes that define the framework.
6
Three Java Packages
[Figure: UML package and class diagrams for the three packages.]
7
UML convention An arrow with an open arrowhead represents a reference or dependency by one class to another. A dashed arrow is a transient reference that exists only during a method call. A solid arrow with a hollow diamond at the owner’s end indicates that one class “owns” or “aggregates” another using a reference that lasts the lifetime of an object. The name of the field that holds the reference labels the arrow. A solid arrow with a closed hollow arrowhead points from a subclass to its superclass. The name of an abstract class is in italics. The name of an abstract method is also in italics.
8
Front End Class Relationships
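[Class diagram legend: an arrow label names the field that holds the reference; a hollow diamond means “owns a”; an italic name marks an abstract class; a dashed arrow is a transient relationship. Access markers: + public, - private, # protected, ~ package.]
These four framework classes should be source language-independent.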
9
UML convention Below the class name, a class diagram can optionally include sections for the fields and for the methods. Field names that are arrow labels do not appear again inside the field section. A character before each field or method name indicates access control: + public - private # protected ~ package A colon separates each field name or method name from the field type or the return type, respectively. To save space, class diagrams usually don’t show constructors and field getter and setter methods.
10
Front End Fields and Methods
[Class diagram legend: an italic method name marks an abstract method; a solid arrow with a closed hollow arrowhead means “subclass of”.]
11
The Abstract Parser Class
Listing 2-1
12
The Abstract Parser Class
Fields iCode and symTab refer to the intermediate code and the symbol table. Field scanner refers to the scanner. Methods parse() and getErrorCount() are abstract; language-specific parser subclasses implement them. The “convenience methods” currentToken() and nextToken() simply call the currentToken() and nextToken() methods of Scanner.
13
The Abstract Scanner Class
Listing 2-3
14
The Abstract Scanner Class
Private field currentToken refers to the current token, which protected method currentToken() returns. Method nextToken() calls abstract method extractToken(), which language-specific scanner subclasses implement. Convenience methods currentChar() and nextChar() call the corresponding methods of Source.
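The relationship between nextToken() and extractToken() can be sketched as follows. This is a simplified illustration, not the book's Listing 2-3: the Token here carries only its text, checked exceptions are left out, and OneCharScanner is a hypothetical subclass made up for the example.

```java
// A sketch only: simplified stand-ins for the framework's Scanner and Token.
final class ScannerSketch {

    // Reduced token: carries only its text.
    static class Token {
        private final String text;
        Token(String text) { this.text = text; }
        String getText() { return text; }
    }

    // nextToken() is fixed by the framework; extractToken() is left to subclasses.
    static abstract class Scanner {
        private Token currentToken;

        // Return the most recently extracted token (null before the first nextToken()).
        protected Token currentToken() { return currentToken; }

        // Extract the next token, remember it as the current token, and return it.
        public Token nextToken() {
            currentToken = extractToken();
            return currentToken;
        }

        // Language-specific subclasses decide how a token is built.
        protected abstract Token extractToken();
    }

    // Hypothetical subclass: each token is one character of a fixed string.
    static class OneCharScanner extends Scanner {
        private final String input;
        private int pos = 0;

        OneCharScanner(String input) { this.input = input; }

        @Override
        protected Token extractToken() {
            if (pos >= input.length()) {
                return new Token("<EOF>");   // placeholder end-of-file token
            }
            return new Token(String.valueOf(input.charAt(pos++)));
        }
    }

    public static void main(String[] args) {
        Scanner scanner = new OneCharScanner("ab");
        System.out.println(scanner.nextToken().getText());     // a
        System.out.println(scanner.currentToken().getText());  // a
        System.out.println(scanner.nextToken().getText());     // b
        System.out.println(scanner.nextToken().getText());     // <EOF>
    }
}
```

The superclass never needs to know which subclass it is running; adding a scanner for another language means adding another subclass, with no change to nextToken().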
15
The Token Class
Listing 2-4
16
The Token Class
Field text is the string that comprises the token. Field value is for tokens that have a value, such as a number. Field type is the token type. Fields lineNum and position tell where the token is in the source file. Default method extract() will be overridden by language-specific token subclasses. Convenience methods currentChar(), nextChar(), and peekChar() call the corresponding methods of the Source class.
17
The Source Class
Listing 2-2
18
The Source Class
Field reader is the reader of the source. Field line stores a single line from the source file. Fields lineNum and currentPos keep track of the position of the current character. Method currentChar() returns the current source character without advancing. Method nextChar() advances to the next character and returns it.
19
Current Character vs. Next Character
Suppose the source line contains ABCDE and we’ve already read the first character. Then currentChar() returns A, and successive calls to nextChar() return B, C, D, E, and finally the end-of-line character (eol).
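The ABCDE example can be reproduced with a small sketch of a source reader. It is a simplified illustration, not the book's Listing 2-2: the EOL and EOF constant values, the -2 "nothing read yet" marker, and the wrapping of IOException are assumptions of this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// A sketch only: a reduced source reader with currentChar() and nextChar().
final class SourceSketch {
    static final char EOL = '\n';      // assumed end-of-line marker
    static final char EOF = (char) 0;  // assumed end-of-file marker

    private final BufferedReader reader;
    private String line;           // current source line, null at end of input
    private int lineNum = 0;       // number of lines read so far
    private int currentPos = -2;   // -2 means no line has been read yet

    SourceSketch(BufferedReader reader) { this.reader = reader; }

    int getLineNum() { return lineNum; }

    // Return the character at the current position without advancing.
    char currentChar() {
        if (currentPos == -2) {              // first call: read the first line
            readLine();
            return nextChar();
        }
        if (line == null) {                  // no more input
            return EOF;
        }
        if (currentPos == -1 || currentPos == line.length()) {
            return EOL;                      // at the end of the line
        }
        if (currentPos > line.length()) {    // past the end: move to the next line
            readLine();
            return nextChar();
        }
        return line.charAt(currentPos);
    }

    // Advance one position and return the character found there.
    char nextChar() {
        ++currentPos;
        return currentChar();
    }

    private void readLine() {
        try {
            line = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        currentPos = -1;
        if (line != null) {
            ++lineNum;
        }
    }

    public static void main(String[] args) {
        SourceSketch source =
            new SourceSketch(new BufferedReader(new StringReader("ABCDE")));
        System.out.println(source.currentChar());  // A
        System.out.println(source.nextChar());     // B
        System.out.println(source.currentChar());  // B
    }
}
```

Calling currentChar() repeatedly never moves the position; only nextChar() does.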
20
Message While it is translating a source program, the parser may need to report some status information, such as an error message whenever it finds a syntax error. However, you don’t want the parser to worry about where it should send the message or what the recipient does with it. Similarly, whenever the source component reads a new line, it can send a message containing the text of the line and the line number. The recipient may want to use these messages to produce a source listing, but you don’t want the source component to care about that.
21
Messages from the Front End
The Parser generates messages:
Syntax error messages
Parser summary: number of source lines parsed, number of syntax errors, total parsing time
The Source generates messages:
For each source line: line number, contents of the line
22
Front End Messages, cont’d
We want the message producers (Parser and Source) to be loosely coupled with the message listeners. The producers shouldn’t care who listens to their messages. The producers shouldn’t care what the listeners do with the messages. The listeners should have the flexibility to do whatever they want with the messages. Producers implement the MessageProducer interface. Listeners implement the MessageListener interface.
23
Front End Messages, cont’d
A listener registers its interest in the messages from a producer. Whenever a producer generates a message, it “sends” the message to all of its registered listeners. A message producer can delegate message handling to a MessageHandler. This is the Observer Design Pattern.
24
Design Note Observer Design Pattern allows message producers and message listeners to remain loosely coupled. Loose coupling in this case means that a producer’s responsibilities are limited to generating messages and notifying the listeners. The producer doesn’t need to care who the listeners are or what they do with the messages. Without any code changes, it can add or remove listeners and accommodate any type of new listener that implements the MessageListener interface. Changes to the producers or to the listeners will not affect each other, as would be the case if they were tightly coupled. A message producer class can use the MessageHandler helper class to do the work of maintaining and notifying its listeners.
25
This is an example of delegation, a software engineering technique where one class asks another class to handle some task. Delegation also limits a class’s responsibilities and supports loose coupling, and the delegate (the MessageHandler class in this case) can be used by other classes. This is more flexible than implementing the task in a superclass and forcing the producer classes to extend the superclass. In general, favor composition (with a delegate) over inheritance.
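The Observer pattern with a delegate can be sketched in a few lines. The interface and class names follow the slides, but the method names, the String message type, and the LineProducer class are assumptions made for this illustration, not the book's listings.

```java
import java.util.ArrayList;
import java.util.List;

// A sketch only: producer, listener, and a delegate that maintains the listeners.
final class MessagingSketch {

    // Simplified message: a type tag and a body.
    static class Message {
        private final String type;
        private final Object body;
        Message(String type, Object body) { this.type = type; this.body = body; }
        String getType() { return type; }
        Object getBody() { return body; }
    }

    interface MessageListener {
        void messageReceived(Message message);
    }

    interface MessageProducer {
        void addMessageListener(MessageListener listener);
        void removeMessageListener(MessageListener listener);
        void sendMessage(Message message);
    }

    // The delegate: keeps the listener list and does the notifying.
    static class MessageHandler {
        private final List<MessageListener> listeners = new ArrayList<>();

        void addListener(MessageListener listener) { listeners.add(listener); }
        void removeListener(MessageListener listener) { listeners.remove(listener); }

        void sendMessage(Message message) {
            for (MessageListener listener : listeners) {
                listener.messageReceived(message);
            }
        }
    }

    // Hypothetical producer: composition with a handler instead of a superclass.
    static class LineProducer implements MessageProducer {
        private final MessageHandler handler = new MessageHandler();

        @Override
        public void addMessageListener(MessageListener listener) {
            handler.addListener(listener);
        }

        @Override
        public void removeMessageListener(MessageListener listener) {
            handler.removeListener(listener);
        }

        @Override
        public void sendMessage(Message message) {
            handler.sendMessage(message);
        }

        // Pretend a source line was read and announce it to the listeners.
        void readLine(String text) {
            sendMessage(new Message("SOURCE_LINE", text));
        }
    }

    public static void main(String[] args) {
        LineProducer producer = new LineProducer();
        producer.addMessageListener(
            m -> System.out.println(m.getType() + ": " + m.getBody()));
        producer.readLine("BEGIN");   // prints "SOURCE_LINE: BEGIN"
    }
}
```

Because LineProducer holds a MessageHandler rather than extending one, it remains free to extend some other class, and the same handler class can serve any other producer.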
26
Design Note A dashed arrow with a closed hollow arrowhead points from a class to an interface that the class implements.
27
Message Implementation
Message producers implement the MessageProducer interface. Message listeners implement the MessageListener interface. A message producer can delegate message handling to a MessageHandler. Each Message has a message type and a body. This appears to be a lot of extra work, but it will be easy to use and it will pay back large dividends.
[Class diagram legend: a dashed arrow with a closed hollow arrowhead means “implements”; the multiplicity marker means “zero or more”.]
28
Two Message Types
SOURCE_LINE message: the source line number, text of the source line
PARSER_SUMMARY message: number of source lines read, number of syntax errors, total parsing time
By convention, the message producers and the message listeners agree on the format and content of the messages.
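The "by convention" agreement can be made concrete with a sketch in which the body is an array whose slots the producer and listener both know. The two type names come from the slide; the describe() helper and the summary wording are assumptions made for this illustration.

```java
import java.util.Locale;

// A sketch only: a listener-side decoder for the two message types.
final class MessageFormatSketch {

    enum MessageType { SOURCE_LINE, PARSER_SUMMARY }

    static class Message {
        private final MessageType type;
        private final Object body;
        Message(MessageType type, Object body) { this.type = type; this.body = body; }
        MessageType getType() { return type; }
        Object getBody() { return body; }
    }

    // Decode a message body according to the convention for its type.
    static String describe(Message message) {
        switch (message.getType()) {
            case SOURCE_LINE: {
                // Convention: body[0] is the line number, body[1] is the line text.
                Object[] body = (Object[]) message.getBody();
                int lineNumber = (Integer) body[0];
                String lineText = (String) body[1];
                return String.format("%03d %s", lineNumber, lineText);
            }
            case PARSER_SUMMARY: {
                // Convention: line count, error count, elapsed seconds.
                Number[] body = (Number[]) message.getBody();
                return String.format(Locale.ROOT,
                    "%d source lines, %d syntax errors, %.2f seconds",
                    body[0].intValue(), body[1].intValue(), body[2].floatValue());
            }
            default:
                throw new IllegalArgumentException(
                    "Unknown message type: " + message.getType());
        }
    }

    public static void main(String[] args) {
        Message line = new Message(MessageType.SOURCE_LINE,
                                   new Object[] {1, "BEGIN"});
        Message summary = new Message(MessageType.PARSER_SUMMARY,
                                      new Number[] {12, 0, 0.5f});
        System.out.println(describe(line));     // 001 BEGIN
        System.out.println(describe(summary));  // 12 source lines, 0 syntax errors, 0.50 seconds
    }
}
```

The compiler cannot check this convention: if a producer changes the order of the slots, the listener's casts fail only at run time, which is the price of keeping the two sides loosely coupled.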
29
Intermediate Tier According to our conceptual design, the intermediate code and the symbol table are the interface between the front and back ends. For now, simply define two placeholder interfaces as framework components, ICode and SymTab, in the intermediate package.
31
Interfaces ICode and SymTab
Listing 2-14 Listing 2-15
32
Back End The conceptual design states that the back end will support either a compiler or an interpreter. The Backend class in the backend package is also a message producer: it implements the MessageProducer interface and delegates message handling to the MessageHandler helper class. The abstract process() method requires references to the intermediate code and to the symbol table. A compiler would implement process() to generate object code; an interpreter would implement it to execute the program. See Listing 2-16.
33
Initial Back End Implementations
The back end of the framework supports either a compiler or an interpreter. For now, there are two stubbed-out implementations of the abstract Backend framework class: one for the compiler and one for the interpreter.
34
Initial Back End Subclasses
The CodeGenerator and Executor subclasses will only be (do-nothing) stubs for now. This is the Strategy Design Pattern.
35
The Code Generator Class
All the process() method does for now is send the COMPILER_SUMMARY message, which carries the number of instructions generated (none for now) and the code generation time (nearly no time at all for now).

public void process(ICode iCode, SymTab symTab)
    throws Exception
{
    long startTime = System.currentTimeMillis();
    float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
    int instructionCount = 0;

    // Send the compiler summary message.
    sendMessage(new Message(COMPILER_SUMMARY,
                            new Number[] {instructionCount, elapsedTime}));
}
36
The Executor Class
All the process() method does for now is send the INTERPRETER_SUMMARY message, which carries the number of statements executed (none for now), the number of runtime errors (none for now), and the execution time (nearly no time at all for now).

public void process(ICode iCode, SymTab symTab)
    throws Exception
{
    long startTime = System.currentTimeMillis();
    float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
    int executionCount = 0;
    int runtimeErrors = 0;

    // Send the interpreter summary message.
    sendMessage(new Message(INTERPRETER_SUMMARY,
                            new Number[] {executionCount, runtimeErrors, elapsedTime}));
}
37
Pascal-Specific Front End Classes
PascalParserTD is a subclass of Parser and implements the parse() and getErrorCount() methods for Pascal. TD stands for “top down.” PascalScanner is a subclass of Scanner and implements the extractToken() method for Pascal. This is the Strategy Design Pattern.
38
Pascal Parser The initial implementation of a Pascal parser is extremely simplified. The class name PascalParserTD indicates the source language and the parser type. The TD stands for top down, which is the type of parser you’ll develop in the next several chapters. See Listing 2-17
39
The Pascal Parser Class
The initial version of method parse() does hardly anything, but it forces the scanner into action and serves our purpose of doing end-to-end testing.

public void parse()
    throws Exception
{
    Token token;
    long startTime = System.currentTimeMillis();

    while (!((token = nextToken()) instanceof EofToken)) {}

    // Send the parser summary message.
    float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
    sendMessage(new Message(PARSER_SUMMARY,
                            new Number[] {token.getLineNumber(), getErrorCount(), elapsedTime}));
}

What is this while loop doing? It pulls tokens from the scanner one at a time and discards them until the end-of-file token arrives.
40
Pascal Scanner Our initial implementation of a Pascal scanner is also greatly simplified. The PascalScanner class implements the extractToken() method of its Scanner superclass. See Listing 2-18.
41
The Pascal Scanner Class
The initial version of method extractToken() doesn’t do much either, other than create and return either a default token or the EOF token.

protected Token extractToken()
    throws Exception
{
    Token token;
    char currentChar = currentChar();

    // Construct the next token. The current character determines the
    // token type.
    if (currentChar == EOF) {
        token = new EofToken(source);
    }
    else {
        token = new Token(source);
    }

    return token;
}

Remember that the Scanner method nextToken() calls the abstract method extractToken(). Here, the Scanner subclass PascalScanner implements method extractToken().
42
The Token Class
The Token class’s default extract() method extracts one character from the source. This method will be overridden by the various token subclasses. It serves our purpose of doing end-to-end testing.

protected void extract()
    throws Exception
{
    text = Character.toString(currentChar());
    value = null;
    nextChar();  // consume current character
}

A character (or a token) is “consumed” after it has been read and processed, and the next one is about to be read. If you forget to consume, you will loop forever on the same character or token.
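The effect of consuming can be seen in a small sketch that extracts one single-character token at a time until the end of input. The class, its string-backed cursor, and the EOF marker value are assumptions made for this illustration.

```java
// A sketch only: why extract() must consume the character it has read.
final class ConsumeSketch {
    static final char EOF = (char) 0;  // assumed end-of-input marker

    private final String input;
    private int pos = 0;

    ConsumeSketch(String input) { this.input = input; }

    // Return the current character without advancing.
    char currentChar() {
        return pos < input.length() ? input.charAt(pos) : EOF;
    }

    // Advance one position and return the character found there.
    char nextChar() {
        ++pos;
        return currentChar();
    }

    // Mirrors the default extract(): take one character, then consume it.
    String extract() {
        String text = Character.toString(currentChar());
        nextChar();  // consume; without this call the loop below never ends
        return text;
    }

    // Count the single-character tokens in the input.
    static int countTokens(String input) {
        ConsumeSketch source = new ConsumeSketch(input);
        int count = 0;
        while (source.currentChar() != EOF) {
            source.extract();
            ++count;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTokens("BEGIN"));  // 5
    }
}
```

Removing the nextChar() call from extract() leaves currentChar() returning the same character forever, so the while loop in countTokens() would never reach EOF.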
43
A Front End Factory The framework components in the front end are language-independent. You then integrate language specific components into the framework. The framework can support parsers for different source languages, and even multiple types of parsers for a specific language. Also, for any specific language, the parser and the scanner are closely related. A factory class makes it easier for a compiler or an interpreter to create proper front end components for specific languages. See Listing 2-19
44
Design Note The factory class is more than just a convenience. Because the parser and the scanner are closely related, using the factory class ensures that they’re created in matched pairs. For example, a Pascal parser is always created with a Pascal scanner. Using a factory class also preserves flexibility. The assignment Parser parser = FrontendFactory.createParser( … ); is more flexible than PascalParserTD parser = new PascalParserTD( … ); The latter permanently ties variable parser to a top-down Pascal parser. On the other hand, the call to the factory allows us to change the values of the arguments to create different parsers without changing any other code.
45
A Front End Factory Class
A language-specific parser goes together with a scanner for the same language. But we don’t want the framework classes to be tied to a specific language. Framework classes should be language-independent. We use a factory class to create a matching parser-scanner pair.
46
A Front End Factory Class, cont’d
Good: Parser parser = FrontendFactory.createParser( … );
Arguments to the createParser() method enable it to create and return a parser bound to an appropriate scanner. Variable parser doesn’t have to know what kind of parser subclass the factory created. Once again, the idea is to maintain loose coupling. This is “coding to the interface.”
Bad: PascalParserTD parser = new PascalParserTD( … );
Now variable parser is tied to a specific language.
47
A Front End Factory Class, cont’d
public static Parser createParser(String language, String type, Source source)
    throws Exception
{
    if (language.equalsIgnoreCase("Pascal") &&
        type.equalsIgnoreCase("top-down"))
    {
        Scanner scanner = new PascalScanner(source);
        return new PascalParserTD(scanner);
    }
    else if (!language.equalsIgnoreCase("Pascal")) {
        throw new Exception("Parser factory: Invalid language '" + language + "'");
    }
    else {
        throw new Exception("Parser factory: Invalid type '" + type + "'");
    }
}
48
A Back End Factory Like the one in the front end, a back end factory class creates proper back end components. See Listing 2-22.
49
A Back End Factory Class
public static Backend createBackend(String operation)
    throws Exception
{
    if (operation.equalsIgnoreCase("compile")) {
        return new CodeGenerator();
    }
    else if (operation.equalsIgnoreCase("execute")) {
        return new Executor();
    }
    else {
        throw new Exception("Backend factory: Invalid operation '" + operation + "'");
    }
}
50
Program 2: Program Listings
The framework components and the initial implementation components are all in place and integrated. Some simple end-to-end tests will verify that you’ve designed and developed these components correctly. A compiler test will cause the front end to parse a source Pascal program, generate a listing, and print the message produced by the compiler back end.
51
Command line: java -classpath classes Pascal execute newton.pas
See Listing 2-23.
52
End-to-End: Program Listings
Here’s the heart of the main Pascal class’s constructor:

source = new Source(new BufferedReader(new FileReader(filePath)));
source.addMessageListener(new SourceMessageListener());

parser = FrontendFactory.createParser("Pascal", "top-down", source);
parser.addMessageListener(new ParserMessageListener());

backend = BackendFactory.createBackend(operation);
backend.addMessageListener(new BackendMessageListener());

parser.parse();
iCode = parser.getICode();
symTab = parser.getSymTab();

backend.process(iCode, symTab);
source.close();

The front end parser creates the icode and the symtab of the intermediate tier. The back end processes the icode and the symtab.
53
Listening to Messages Demo
Class Pascal has inner classes that implement the MessageListener interface.

private static final String SOURCE_LINE_FORMAT = "%03d %s";

private class SourceMessageListener implements MessageListener
{
    public void messageReceived(Message message)
    {
        MessageType type = message.getType();
        Object body[] = (Object []) message.getBody();

        switch (type) {
            case SOURCE_LINE: {
                int lineNumber = (Integer) body[0];
                String lineText = (String) body[1];
                System.out.println(String.format(SOURCE_LINE_FORMAT, lineNumber, lineText));
                break;
            }
        }
    }
}
54
Design Note One of the major software engineering challenges is managing change. To manage change, apply the software engineering principle that says to encapsulate the code that will vary in order to isolate it from the code that won’t. The Strategy Design Pattern specifies such use of class families. Implementing this design pattern early to manage change will return great dividends.
55
Flexibility of this framework
Language-independent: change the source language to Java and add the Parser subclass JavaParserTD, which will require a new Scanner subclass JavaScanner.
Technique-independent: change the top-down Pascal parser to a bottom-up parser and add a new Parser subclass PascalParserBU, which would use the same PascalScanner class.
Outcome-independent: can support either a compiler or an interpreter by subclassing the back end class.
56
Design Note Java’s access control modifiers, public, protected, and private, play a major role in maintaining program security, reliability, and modularity. A public field or method is accessible by any class without restrictions. A protected field or method is accessible by any class defined within the same package or by any subclasses defined in other packages. A package field or method (this is the default, since it requires no access control modifier) is accessible only by any class defined within the same package. A private field or method is accessible only within its class.
57
Design Note You will often use protected fields and methods to share them with other objects within the same package, such as the front end package. Protected access is especially useful when you define language-specific subclasses of the framework classes. Each subclass can then access a protected field or method of its superclass. Public methods serve as gateways across packages. For example, the parser’s parse() method is public so it can be called from outside the package. Limiting the number of such public methods helps to preserve modularity. Top-level classes can themselves have public or package access to control who can reference their objects. A class defined within another class can be private. During program design, try to specify as well as possible which access control modifiers are appropriate for each class, field, and method.
58
Design Note A good rule of thumb is to use the most restrictive access control practicable. However, in a program as complex as a compiler or an interpreter, it is normal to have to go back and change existing modifiers as you develop more of the program.
59
Is it Really Worth All this Trouble?
Major software engineering challenges: managing change and managing complexity.
To help manage change, use the open-closed principle: close the code for modification, open the code for extension.
Closed: the language-independent framework classes.
Open: the language-specific subclasses.
Techniques to help manage complexity:
Partitioning
Loose coupling
Incremental development: always build upon working code.
Good object-oriented design: use design patterns.
60
Source Files from the Book
Download the Java source code from each chapter of the book: You will not survive this course if you use a simple text editor like Notepad to view and edit the Java code. The complete Pascal interpreter in Chapter 12 contains 127 classes and interfaces. You can use either Eclipse or NetBeans. Learn how to create projects, edit source files, single-step execution, set breakpoints, examine variables, read stack dumps, etc. Eclipse is preferred because there is a JavaCC plug-in.
61
How to Scan for Tokens
Suppose the source line contains IF (index >= 10) THEN The scanner skips over the leading blanks. The current character is I, so the next token must be a word. The scanner extracts a word token by copying characters up to but not including the first character that is not valid for a word, which in this case is a blank. The blank becomes the current character. The scanner determines that the word is a reserved word.
62
How to Scan for Tokens, cont’d
The scanner skips over any blanks between tokens. The current character is (. The next token must be a special symbol. After extracting the special symbol token, the current character is i. The next token must be a word. After extracting the word token, the current character is a blank.
63
How to Scan for Tokens, cont’d
Skip the blank. The current character is >. Extract the special symbol token. The current character is a blank. Skip the blank. The current character is 1, so the next token must be a number. After extracting the number token, the current character is ).
64
How to Scan for Tokens, cont’d
Extract the special symbol token. The current character is a blank. Skip the blank. The current character is T, so the next token must be a word. Extract the word token. Determine that it’s a reserved word. The current character is \n, so the scanner is done with this line.
65
Basic Scanning Algorithm
Skip any blanks until the current character is nonblank. In Pascal, a comment and the end-of-line character each should be treated as a blank. The current (nonblank) character determines what the next token is and becomes that token’s first character. Extract the rest of the next token by copying successive characters up to but not including the first character that does not belong to that token. Extracting a token consumes all the source characters that constitute the token. After extracting a token, the current character is the first character after the last character of that token.
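The algorithm above can be sketched for the example line IF (index >= 10) THEN. This is a minimal illustration, not the book's scanner: the reserved word list and the set of two-character symbols are small assumed subsets, tokens are returned as plain strings, and comment handling is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// A sketch only: skip blanks, classify by first character, extract, repeat.
final class ScanSketch {
    // Small, illustrative subset of Pascal reserved words.
    private static final Set<String> RESERVED =
        new HashSet<>(Arrays.asList("IF", "THEN", "ELSE", "BEGIN", "END"));

    // Small, illustrative subset of two-character special symbols.
    private static final Set<String> TWO_CHAR_SYMBOLS =
        new HashSet<>(Arrays.asList(">=", "<=", "<>", ":="));

    static List<String> scan(String line) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;

        while (pos < line.length()) {
            char ch = line.charAt(pos);

            // Step 1: skip blanks until the current character is nonblank.
            if (Character.isWhitespace(ch)) {
                ++pos;
                continue;
            }

            // Step 2: the current character determines the token type.
            int start = pos;
            String kind;
            if (Character.isLetter(ch)) {
                // Step 3: copy characters up to the first one that is not part of a word.
                while (pos < line.length()
                        && Character.isLetterOrDigit(line.charAt(pos))) {
                    ++pos;
                }
                String word = line.substring(start, pos).toUpperCase(Locale.ROOT);
                kind = RESERVED.contains(word) ? "RESERVED" : "IDENTIFIER";
            } else if (Character.isDigit(ch)) {
                while (pos < line.length() && Character.isDigit(line.charAt(pos))) {
                    ++pos;
                }
                kind = "NUMBER";
            } else {
                boolean twoChars = pos + 1 < line.length()
                    && TWO_CHAR_SYMBOLS.contains(line.substring(pos, pos + 2));
                pos += twoChars ? 2 : 1;
                kind = "SYMBOL";
            }

            // pos now sits on the first character after the token.
            tokens.add(kind + ":" + line.substring(start, pos));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : scan("IF (index >= 10) THEN")) {
            System.out.println(token);
        }
    }
}
```

For the example line the sketch yields seven tokens: the reserved word IF, the symbol (, the identifier index, the symbol >=, the number 10, the symbol ), and the reserved word THEN.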