CMPE 152: Compiler Design August 30 Class Meeting


CMPE 152: Compiler Design August 30 Class Meeting Department of Computer Engineering San Jose State University Fall 2018 Instructor: Ron Mak www.cs.sjsu.edu/~mak

Chapter 3 Syntax Diagrams

How to Scan for Tokens Suppose the source line contains IF (index >= 10) THEN The scanner skips over the leading blanks. The current character is I, so the next token must be a word. The scanner extracts a word token by copying characters up to but not including the first character that is not valid for a word, which in this case is a blank. The blank becomes the current character. The scanner determines that the word is a reserved word.

How to Scan for Tokens, cont’d The scanner skips over any blanks between tokens. The current character is (. The next token must be a special symbol. After extracting the special symbol token, the current character is i. The next token must be a word. After extracting the word token, the current character is a blank.

How to Scan for Tokens, cont’d Skip the blank. The current character is >. Extract the special symbol token. The current character is a blank. Skip the blank. The current character is 1, so the next token must be a number. After extracting the number token, the current character is ).

How to Scan for Tokens, cont’d Extract the special symbol token. The current character is a blank. Skip the blank. The current character is T, so the next token must be a word. Extract the word token. Determine that it’s a reserved word. The current character is \n, so the scanner is done with this line.

Basic Scanning Algorithm

Skip any blanks until the current character is nonblank.
    In Pascal, a comment and the end-of-line character each should be treated as a blank.
The current (nonblank) character determines what the next token is and becomes that token’s first character.
Extract the rest of the next token by copying successive characters up to but not including the first character that does not belong to that token.
Extracting a token consumes all the source characters that constitute the token.
    After extracting a token, the current character is the first character after the last character of that token.
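The basic scanning algorithm can be sketched in a few lines of C++. This is only a minimal illustration, not the course's scanner: the names SimpleToken and scan_line are hypothetical, comments are not handled, and the treatment of two-character special symbols is a simplifying assumption.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token record for this sketch; the course's Token classes are richer.
struct SimpleToken
{
    std::string kind;  // "WORD", "NUMBER", or "SPECIAL"
    std::string text;
};

// Scan one source line: skip blanks, let the current character choose the
// token kind, then copy characters until one does not belong to the token.
std::vector<SimpleToken> scan_line(const std::string& line)
{
    std::vector<SimpleToken> tokens;
    std::size_t pos = 0;

    while (pos < line.size())
    {
        unsigned char ch = line[pos];

        // Skip blanks (the end-of-line character counts as a blank).
        if (std::isspace(ch)) { ++pos; continue; }

        SimpleToken token;
        if (std::isalpha(ch))
        {
            token.kind = "WORD";
            while (pos < line.size() && std::isalnum((unsigned char) line[pos]))
                token.text += line[pos++];
        }
        else if (std::isdigit(ch))
        {
            token.kind = "NUMBER";
            while (pos < line.size() && std::isdigit((unsigned char) line[pos]))
                token.text += line[pos++];
        }
        else
        {
            token.kind = "SPECIAL";
            token.text += line[pos++];
            // Simplifying assumption: only >=, <= and := are two characters long.
            if (pos < line.size() && line[pos] == '='
                && (ch == '>' || ch == '<' || ch == ':'))
                token.text += line[pos++];
        }
        tokens.push_back(token);
    }
    return tokens;
}
```

Calling scan_line("IF (index >= 10) THEN\n") yields the seven tokens traced in the preceding slides: IF, (, index, >=, 10, ), THEN. After each token is extracted, pos indexes the first character after that token, which is the "current character" in the slides.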

Pascal-Specific Subclasses

Class PascalScanner

wci/frontend/pascal/PascalScanner.cpp

Token *PascalScanner::extract_token() throw (string)
{
    skip_white_space();
    Token *token;
    char current_ch = current_char();
    string string_ch = " ";
    string_ch[0] = current_ch;

    // Construct the next token.  The current character determines the
    // token type.
    if (current_ch == Source::END_OF_FILE)
    {
        token = nullptr;
    }
    else if (isalpha(current_ch))
    {
        token = new PascalWordToken(source);
    }
    else if (isdigit(current_ch))
    {
        token = new PascalNumberToken(source);
    }
    ...
    return token;
}

The first character determines the type of the next token.

Pascal-Specific Token Classes

Each of the classes PascalWordToken, PascalNumberToken, PascalStringToken, PascalSpecialSymbolToken, and PascalErrorToken is a subclass of class PascalToken. PascalToken is a subclass of class Token. Each Pascal token subclass overrides the default extract() method of class Token. The default method could only create single-character tokens. The design is loosely coupled and highly cohesive.

Class PascalWordToken

frontend/pascal/tokens/PascalWordToken.cpp

void PascalWordToken::extract() throw (string)
{
    char current_ch = current_char();

    // Get the word characters (letter or digit). The scanner has
    // already determined that the first character is a letter.
    while (isalnum(current_ch))
    {
        text += current_ch;
        current_ch = next_char();  // consume character
    }

    // Is it a reserved word or an identifier?
    string upper_case = to_upper(text);
    if (PascalToken::RESERVED_WORDS.find(upper_case)
            != PascalToken::RESERVED_WORDS.end())
    {
        // Reserved word.
        type = (TokenType) PascalToken::RESERVED_WORDS[upper_case];
        value = new DataValue(upper_case);
    }
    else
    {
        // Identifier.
        type = (TokenType) PT_IDENTIFIER;
    }
}

Pascal String Tokens

A Pascal string literal constant uses single quotes. Two consecutive single quotes represent a single quote character inside a string: 'Don''t' is the string consisting of the characters Don't. A Pascal character literal constant is simply a string with only a single character, such as 'a'. Strings are handled by the Pascal token subclass PascalStringToken.
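The doubled-quote rule can be sketched as follows. This is not the code of PascalStringToken; the function name extract_pascal_string and its signature are assumptions made for illustration only.

```cpp
#include <string>

// Hypothetical helper: extracts the value of a Pascal string literal that
// starts at source[pos], which must be a single quote. On success it stores
// the string value in `value`, advances `pos` past the closing quote, and
// returns true. It returns false if the literal is missing or unterminated.
bool extract_pascal_string(const std::string& source, std::size_t& pos,
                           std::string& value)
{
    if (pos >= source.size() || source[pos] != '\'') return false;

    std::size_t i = pos + 1;  // first character after the opening quote
    std::string result;

    while (i < source.size())
    {
        if (source[i] == '\'')
        {
            if (i + 1 < source.size() && source[i + 1] == '\'')
            {
                // Two consecutive quotes stand for one quote character.
                result += '\'';
                i += 2;
            }
            else
            {
                // A lone quote closes the literal.
                value = result;
                pos = i + 1;
                return true;
            }
        }
        else
        {
            result += source[i++];
        }
    }
    return false;  // ran off the end without a closing quote
}
```

For the source text 'Don''t' the function produces the value Don't and leaves pos just after the closing quote, consistent with the rule that the current character after extraction is the first character after the token.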

Pascal Number Tokens

Any leading + or - sign before the literal constant is a separate token.
A Pascal integer literal constant is an unsigned integer.
A Pascal real literal constant starts with an unsigned integer (the whole part) followed by either:
    a decimal point followed by another unsigned integer (the fraction part), or
    an E or e, optionally followed by + or -, followed by an unsigned integer (the exponent part), or
    a fraction part followed by an exponent part.

Class PascalNumberToken

For the token string "31415.926e-4", method extractNumber() passes the following parameter values to method computeFloatValue():
    wholeDigits     "31415"
    fractionDigits  "926"
    exponentDigits  "4"
    exponentSign    '-'

Compute variable exponentValue:
    4 as computed by computeIntegerValue()
    -4 after negation since exponentSign is '-'
    -7 after subtracting fractionDigits.length()

Compute 31415926 x 10^-7 = 3.1415926

A bit of a hack!
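The arithmetic above can be reproduced with a short function. This is a sketch of the same idea, not the course's computeFloatValue(): the name compute_float_value_sketch is hypothetical, it assumes the digit strings contain only decimal digits, and it does no range checking.

```cpp
#include <cmath>
#include <string>

// Hypothetical sketch: combine the digit strings of a real literal into a value.
// Assumes every string contains only the characters '0' through '9'.
double compute_float_value_sketch(const std::string& whole_digits,
                                  const std::string& fraction_digits,
                                  const std::string& exponent_digits,
                                  char exponent_sign)
{
    // Exponent as written in the literal (0 if there is no exponent part).
    int exponent_value = exponent_digits.empty() ? 0 : std::stoi(exponent_digits);
    if (exponent_sign == '-') exponent_value = -exponent_value;

    // Treat whole+fraction as one integer, so shift the exponent down
    // by the number of fraction digits.
    exponent_value -= static_cast<int>(fraction_digits.length());

    double value = 0.0;
    for (char digit : whole_digits + fraction_digits)
        value = 10.0 * value + (digit - '0');

    return value * std::pow(10.0, exponent_value);
}
```

With the arguments "31415", "926", "4", and '-', the adjusted exponent is -7 and the digits form 31415926, so the result is approximately 3.1415926, subject to the usual floating-point rounding.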

Syntax Error Handling

Error handling is a three-step process:
    1. Detect the presence of a syntax error.
    2. Flag the error by pointing it out or highlighting it, and display a descriptive error message.
    3. Recover by moving past the error and resuming parsing.

For now, we’ll just move on, starting with the current character, and attempt to extract the next token.

A SYNTAX_ERROR message contains:
    source line number
    beginning source position
    token text
    syntax error message

Class PascalParserTD

frontend/pascal/PascalParserTD.cpp

void PascalParserTD::parse() throw (string)
{
    ...
    // Loop over each token until the end of file.
    while ((token = next_token(token)) != nullptr)
    {
        TokenType token_type = token->get_type();
        last_line_number = token->get_line_number();
        Object value = token->get_value();
        string type_str;
        string value_str;

        switch ((PascalTokenType) token_type)
        {
            case PT_STRING:
            {
                type_str = "STRING";
                value_str = cast(value, string);
                break;
            }

Class PascalParserTD, cont’d

frontend/pascal/PascalParserTD.cpp

            case PT_IDENTIFIER:
            {
                type_str = "IDENTIFIER";
                value_str = "";
                break;
            }

            case PT_INTEGER:
            {
                type_str = "INTEGER";
                value_str = to_string(cast(value, int));
                break;
            }

            case PT_REAL:
            {
                type_str = "REAL";
                value_str = to_string(cast(value, float));
                break;
            }

            case PT_ERROR: break;

Class PascalParserTD, cont’d

frontend/pascal/PascalParserTD.cpp

            default:  // reserved word or special character
            {
                // Reserved word
                if (!value.empty())
                {
                    value_str = cast(value, string);
                    type_str = value_str;
                }

                // Special symbol
                else
                {
                    type_str =
                        PascalToken::SPECIAL_SYMBOL_NAMES[
                                           (PascalTokenType) token_type];
                }

                break;
            }
        }

Class PascalParserTD, cont’d

frontend/pascal/PascalParserTD.cpp

        if (token_type != (TokenType) PT_ERROR)
        {
            // Format and send a message about each token.
            Message message(TOKEN,
                            LINE_NUMBER, to_string(token->get_line_number()),
                            POSITION, to_string(token->get_position()),
                            TOKEN_TYPE, type_str,
                            TOKEN_TEXT, token->get_text(),
                            TOKEN_VALUE, value_str);
            send_message(message);
        }
        else
        {
            // Error token: its value holds the error code.
            PascalErrorCode error_code =
                                    (PascalErrorCode) cast(value, int);
            error_handler.flag(token, error_code, this);
        }
    }
    ...
}

Program: Pascal Tokenizer Verify the correctness of the Pascal token subclasses. Verify the correctness of the Pascal scanner. Demo (Chapter 3)

Assignment #2 Write a scanner for the Java language. Add a new Java front end.

Chapter 4

Quick Review of the Framework

Our next topic: The symbol table

The Symbol Table: Basic Concepts

Purpose
    To store information about certain tokens during the translation process (i.e., parsing and scanning).

What information to store?
    Anything that’s useful!
    For an identifier:
        name
        data type
        how it’s defined (as a variable, type, function name, etc.)

The Symbol Table: Basic Operations Enter new information. Look up existing information. Update existing information.

The Symbol Table: Conceptual Design

Goal: The symbol table should be source language independent.
Each entry in the symbol table has:
    a name
    attributes
At the conceptual level, we don’t worry about implementation.
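At this conceptual level, the three basic operations (enter, look up, update) on entries that have a name and attributes can be sketched as below. The names SketchEntry and SketchSymTab are hypothetical, and storing attributes as strings is a simplifying assumption; the course's actual interfaces appear later in these slides.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified entry: a name plus string-valued attributes.
struct SketchEntry
{
    std::string name;
    std::map<std::string, std::string> attributes;
};

// Hypothetical, simplified symbol table keyed by identifier name.
class SketchSymTab
{
public:
    // Enter: create an entry for the name, or return the existing one.
    SketchEntry& enter(const std::string& name)
    {
        SketchEntry& entry = entries[name];
        entry.name = name;
        return entry;
    }

    // Look up: return a pointer to the entry, or nullptr if it is absent.
    SketchEntry* lookup(const std::string& name)
    {
        auto it = entries.find(name);
        return it != entries.end() ? &it->second : nullptr;
    }

private:
    std::map<std::string, SketchEntry> entries;
};
```

Updating is then a lookup followed by a change to the entry's attributes, for example table.lookup("count")->attributes["type"] = "real"; after checking that the lookup did not return nullptr.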

What Needs a Symbol Table?

A Pascal program
    Identifiers for constant, type, variable, procedure, and function names.
A Pascal procedure or function
    Identifiers for formal parameter (argument) names.
A Pascal record type
    Identifiers for field names.

The Symbol Table Stack Language constructs can be nested. Procedures and functions are nested inside a program. Procedures and functions can be nested inside of each other. Record types are defined within programs, procedures, and functions. Record types can be nested inside of each other. Therefore, symbol tables need to be kept on a symbol table stack.

The Symbol Table Stack, cont’d

Whichever symbol table is on top of the stack is the local symbol table. The first symbol table created (the one at the bottom of the stack) is the global symbol table. It stores the predefined information, such as entries for the names of the standard types integer, real, char, and boolean. During the translation process, symbol tables are pushed onto and popped off the stack as the parser enters and exits nested procedures, functions, record types, etc.

The Symbol Table Stack, cont’d

For now, we’ll have only a single symbol table. Therefore, the local symbol table is the global symbol table. We won’t need multiple symbol tables until we start to parse declarations. Implementing the symbol table stack now will make things easier for us later.

Symbol Table Interfaces

Key operations
    Enter into the local symbol table, the table currently at the top of the stack.
    Look up (search for) an entry only in the local symbol table.
    Look up an entry in all the symbol tables in the stack.
        Search from the top (the local) table down to the bottom (global) table.

Each symbol table has a nesting level.
    0: global
    1: program
    2: top-level procedure
    3: nested procedure, etc.
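The difference between a local lookup and a lookup through the whole stack can be sketched as below. SketchSymTabStack and SketchTable are hypothetical names, and each table is reduced to a map from a name to a description string, which is an assumption made to keep the example short.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical simplification: each table maps a name to a description string.
using SketchTable = std::map<std::string, std::string>;

class SketchSymTabStack
{
public:
    SketchSymTabStack() { tables.emplace_back(); }  // nesting level 0: global

    // Enter and leave a nested scope. The global table is never popped.
    void push() { tables.emplace_back(); }
    void pop()  { if (tables.size() > 1) tables.pop_back(); }

    int current_nesting_level() const
    {
        return static_cast<int>(tables.size()) - 1;
    }

    // Enter into the local table (the one on top of the stack).
    void enter_local(const std::string& name, const std::string& info)
    {
        tables.back()[name] = info;
    }

    // Look up only in the local table.
    const std::string* lookup_local(const std::string& name) const
    {
        auto it = tables.back().find(name);
        return it != tables.back().end() ? &it->second : nullptr;
    }

    // Look up in every table, from the top (local) down to the bottom (global).
    const std::string* lookup(const std::string& name) const
    {
        for (auto table = tables.rbegin(); table != tables.rend(); ++table)
        {
            auto it = table->find(name);
            if (it != table->end()) return &it->second;
        }
        return nullptr;
    }

private:
    std::vector<SketchTable> tables;
};
```

In this sketch, a name entered at level 0 is invisible to lookup_local() once a table has been pushed, but lookup() still finds it. The returned pointers should be used right away, since a later push() or pop() may invalidate them.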

Why All the Interfaces?

We’ve defined the symbol table components entirely with interfaces. Other components that use the symbol table will code to the interfaces, not to specific implementations. Loose coupling provides maximum support for flexibility.

symTabStack = SymTabFactory.createSymTabStack();
SymTabEntry entry = symTabStack.lookup(name);

Why All the Interfaces? cont’d

We’ll be able to implement the symbol table however we like. We can change the implementation in the future without affecting the users, as long as we don’t change the interfaces. The interfaces provide an API for the symbol table. Callers of the symbol table API only need to understand the symbol table at the conceptual level.
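Coding to an interface can be illustrated with a small C++ sketch in which callers see only an abstract class and a factory function. All names here (NameTable, SetNameTable, create_name_table) are hypothetical and are not the course's classes; the point is only that the implementation class could be replaced without changing any caller.

```cpp
#include <memory>
#include <set>
#include <string>

// Hypothetical interface for this sketch (not the course's SymTab class).
class NameTable
{
public:
    virtual ~NameTable() = default;
    virtual void enter(const std::string& name) = 0;
    virtual bool contains(const std::string& name) const = 0;
};

// One possible implementation, hidden behind the interface.
class SetNameTable : public NameTable
{
public:
    void enter(const std::string& name) override { names.insert(name); }

    bool contains(const std::string& name) const override
    {
        return names.count(name) > 0;
    }

private:
    std::set<std::string> names;
};

// Hypothetical factory function: callers never name SetNameTable directly.
std::unique_ptr<NameTable> create_name_table()
{
    return std::unique_ptr<NameTable>(new SetNameTable());
}
```

A caller writes auto table = create_name_table(); and then uses only enter() and contains(). Swapping std::set for a hash table would require changes only inside SetNameTable and the factory.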

Symbol Table Implementation Details

Symbol Table Interfaces, cont’d

Abstract classes as interfaces
Namespace wci::intermediate
    SymTabStack
    SymTab
    SymTabEntry
    SymTabKey

Symbol Table Interfaces

wci/intermediate/SymTabEntry.h

/**
 * Symbol table entry keys.
 */
enum class SymTabKey
{
    // to be "subclassed" by implementation-specific symbol table keys
};

class SymTabEntry
{
public:
    SymTabEntry(const string name, SymTab *symtab);
    virtual ~SymTabEntry();

    virtual string get_name() const = 0;
    virtual SymTab *get_symtab() const = 0;
    virtual void set_attribute(const SymTabKey key, Object *value) = 0;
    virtual Object *get_attribute(const SymTabKey key) = 0;

    // For cross-referencing.
    virtual void append_line_number(const int line_number) = 0;
    virtual vector<int> get_line_numbers() = 0;
};

Symbol Table Components Implementation

Implementation classes are defined in package intermediate.symtabimpl
A SymTabStackImpl object can own zero or more SymTab objects.
A SymTabImpl object can own zero or more SymTabEntry objects.
A SymTabEntryImpl object maintains a reference to the SymTab object that contains it.

A Symbol Table Factory Class

wci/intermediate/SymTabFactory.cpp

SymTabStack *SymTabFactory::create_symtab_stack()
{
    return new SymTabStackImpl();
}

SymTab *SymTabFactory::create_symtab(int nesting_level)
{
    return new SymTabImpl(nesting_level);
}

SymTabEntry *SymTabFactory::create_symtab_entry(string name, SymTab *symtab)
{
    return new SymTabEntryImpl(name, symtab);
}

Symbol Table Implementation

Implement the symbol table as a map.
    Key: the identifier name (a string)
    Value: pointer to the symbol table entry corresponding to the name

wci/intermediate/symtabimpl/SymTabImpl.h

private:
    map<string, SymTabEntryImpl *> contents;

Symbol Table Implementation, cont’d

wci/intermediate/symtabimpl/SymTabImpl.cpp

SymTabEntry *SymTabImpl::enter(string name)
{
    SymTabEntry *entry = SymTabFactory::create_symtab_entry(name, this);
    contents[name] = (SymTabEntryImpl *) entry;
    return entry;
}

SymTabEntry *SymTabImpl::lookup(const string name)
{
    return (contents.find(name) != contents.end())
                ? contents[name]
                : nullptr;
}

Symbol Table Implementation, cont’d

Member function sorted_entries() returns a vector of the symbol table entries in sorted order.

wci/intermediate/symtabimpl/SymTabImpl.cpp

vector<SymTabEntry *> SymTabImpl::sorted_entries()
{
    vector<SymTabEntry *> list;
    map<string, SymTabEntryImpl *>::iterator it;

    // A map iterates over its keys in ascending order.
    for (it = contents.begin(); it != contents.end(); it++)
    {
        list.push_back(it->second);
    }

    return list;  // sorted list of entries
}
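The entries come out sorted because std::map keeps its keys in ascending order, so no explicit sort is needed. The following standalone sketch shows that behavior; the function name sorted_names is hypothetical and the int values are placeholders.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: collect the keys of a map in iteration order.
// std::map iterates in ascending key order, so the result is sorted.
std::vector<std::string> sorted_names(const std::map<std::string, int>& contents)
{
    std::vector<std::string> names;
    for (const auto& pair : contents)
    {
        names.push_back(pair.first);
    }
    return names;
}
```

Inserting "beta", "alpha", and "gamma" in that order and calling sorted_names() returns alpha, beta, gamma. The ordering uses std::string comparison, which is case sensitive, so in ASCII uppercase names sort before lowercase ones.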

Symbol Table Stack Implementation

Implement the stack as a vector of symbol tables:

wci/intermediate/symtabimpl/SymTabStackImpl.h

private:
    vector<SymTab *> stack;

Constructor
    For now, the current nesting level will always be 0.
    Initialize the stack with the global symbol table.
    For now, that’s the only symbol table, so it’s also the local table.

wci/intermediate/symtabimpl/SymTabStackImpl.cpp

SymTabStackImpl::SymTabStackImpl()
{
    stack.push_back(SymTabFactory::create_symtab(0));
}

Who calls this constructor?

Symbol Table Stack Implementation, cont’d

wci/intermediate/symtabimpl/SymTabStackImpl.cpp

SymTabEntry *SymTabStackImpl::enter_local(const string name)
{
    return stack[current_nesting_level]->enter(name);
}

SymTabEntry *SymTabStackImpl::lookup_local(const string name) const
{
    return stack[current_nesting_level]->lookup(name);
}

SymTabEntry *SymTabStackImpl::lookup(const string name) const
{
    return lookup_local(name);
}

For now, since there is only one symbol table on the stack, method lookup() simply calls method lookup_local(). In the future, method lookup() will search the entire stack.

Why do we need both methods?