CMPE 152: Compiler Design February 26 Class Meeting Department of Computer Engineering San Jose State University Spring 2019 Instructor: Ron Mak www.cs.sjsu.edu/~mak
Pascal Type Definitions Chapter 9 Pascal Type Definitions A Pascal simple type can be: scalar (integer, real, boolean, char) enumeration subrange Not reserved words!
Pascal Array Type Definitions index type element type An array type specification has an index type and an element type. The index type must be a simple type (subrange or enumeration). The element type can be any type. Including another array type (multidimensional arrays).
Pascal Array Type Definitions Examples of array definitions. TYPE ar1 = ARRAY [grades] OF integer; ar2 = ARRAY [(alpha, beta, gamma)] OF range2; ar3 = ARRAY [weekday] OF ar2; ar4 = ARRAY [range3] OF (foo, bar, baz); ar5 = ARRAY [range1] OF ARRAY [range2] OF ARRAY[c..e] OF enum2; ar6 = ARRAY [range1, range2, c..e] OF enum2; Type definitions ar5 and ar6 above are equivalent ways to define a multidimensional array. A Pascal string type is an array of characters. The index type must be an integer subrange with a lower limit of 1. str = ARRAY [1..10] OF char;
Pascal Record Type Definitions A record field can be any type. Including another record type (nested records).
Pascal Record Type Definitions Examples of record definitions: TYPE rec1 = RECORD i : integer; r : real; b1, b2 : boolean; c : char END; rec2 = RECORD ten : integer; r : rec1; a1, a2, a3 : ARRAY [range3] OF range2;
Pascal Variable Declarations Variable declarations are syntactically similar to record field declarations: Examples: VAR var1 : integer; var2, var3 : range2; var4 : ar2 var5 : rec1; direction : (north, south, east, west); Types can be named or unnamed.
Declarations and the Symbol Table Identifiers from Pascal declarations that we will enter into a symbol table, names of: constants types enumeration values record fields variables Information from parsing type specifications: simple types array types record types
Scope and the Symbol Table Stack The scope of an identifier is the part of the source program where that identifier can be used. Everywhere in the program where the definition of the identifier is in effect. A program creates a scope whenever it has identifiers that can only be used in a certain part of the program. Example: Local variables of a function. Scope is closely related to nesting levels and the symbol table stack.
Scope and the Symbol Table Stack, cont’d Global scope Nesting level 0: At the bottom of the symbol table stack. All predefined global identifiers such as integer, real, boolean, char. Program scope Nesting level 1: One up from the bottom of the stack. All identifiers declared at the “top level” of a program (not in a procedure or function).
Scope and the Symbol Table Stack, cont’d Record definitions, procedures, and functions each has a scope. Scopes in a Pascal program are nested. An identifier can be redefined within a nested scope. Within the nested scope, the definition in the nested scope overrides the definition in an outer scope. Example: A function can have a local variable x which overrides a program variable x. Each scope must have its own symbol table.
Scope and the Symbol Table Stack, cont’d As the parser parses a program from top to bottom, it enters and exits nested scopes. Whenever the parser enters a scope, it must push that scope’s symbol table onto the symbol table stack. Whenever the parser exits a scope, it must pop that scope’s symbol table off the stack.
Scope and the Symbol Table Stack, cont’d Scope example: Program symbol table Global symbol table Note that the program name Test is defined in the global scope at level 0.
New Methods for Class SymTabStackImpl wci::intermediate::symtabimpl::SymTabStackImpl.cpp SymTab *SymTabStackImpl::push() { SymTab *symtab = SymTabFactory::create_symtab(++current_nesting_level); stack.push_back(symtab); return symtab; } SymTab *SymTabStackImpl::push(SymTab *symtab) ++current_nesting_level; SymTab *SymTabStackImpl::pop() SymTab *symtab = stack[current_nesting_level]; stack.erase(stack.begin() + current_nesting_level); current_nesting_level--; Push a new symbol table onto the stack. Push an existing symbol table onto the stack. Pop a symbol table off the stack. Recall that we implemented SymTabStackImpl as an vector<SymTab *>.
Class SymTabStackImpl wci::intermediate::symtabimpl::SymTabStackImpl.cpp SymTabEntry *SymTabStackImpl::lookup_local(const string name) const { return stack[current_nesting_level]->lookup(name); } SymTabEntry *SymTabStackImpl::lookup(const string name) const SymTabEntry *found_entry = nullptr; // Search the current and enclosing scopes. for (int i = current_nesting_level; (i >= 0) && (found_entry == nullptr); --i) { found_entry = stack[i]->lookup(name); } return found_entry; Method lookup() now searches the current symbol table and the symbol tables lower in the stack. It searches in the current scope and then outward in the enclosing scopes.
Type Specification Interfaces Interface TypeSpec represents a type specification. Method getForm() returns the type’s form. scalar enumeration subrange array record A named type refers to its type identifier: getIdentifier() Each type specification has certain attributes depending on its form.
Type Specification Attributes Each type specification has certain attributes depending on its form. A separate symbol table for the field identifiers Record Element count Element type Index type Array Maximum value Minimum value Base type Subrange List of enumeration constant identifiers Enumeration None Scalar Type specification attributes Type form
TypeFormImpl TypeFormImpl enumerated values represent the type forms. The left column of the table. wci::intermediate::typeimpl::TypeSpecImpl.h enum class TypeFormImpl { SCALAR, ENUMERATION, SUBRANGE, ARRAY, RECORD, }; A separate symbol table for the field identifiers Record Element count Element type Index type Array Maximum value Minimum value Base type Subrange List of enumeration constant identifiers Enumeration None Scalar Type specification attributes Type form
TypeKeyImpl TypeKeyImpl enumerated values are the attribute keys. The right column of the table. A separate symbol table for the field identifiers Record Element count Element type Index type Array Maximum value Minimum value Base type Subrange List of enumeration constant identifiers Enumeration None Scalar Type specification attributes Type form enum class TypeKeyImpl { // Enumeration ENUMERATION_CONSTANTS, // Subrange SUBRANGE_BASE_TYPE, SUBRANGE_MIN_VALUE, SUBRANGE_MAX_VALUE, // Array ARRAY_INDEX_TYPE, ARRAY_ELEMENT_TYPE, ARRAY_ELEMENT_COUNT, // Record RECORD_SYMTAB, }; A TypeFormImpl enumerated value tells us which set of TypeKeyImpl attribute keys to use. wci::intermediate::typeimpl::TypeSpecImpl.h
TypeSpecImpl Implement a type specification as a hash map. Similar to a symbol table entry and a parse tree node. class TypeSpecImpl : public TypeSpec { ... private: TypeForm form; SymTabEntry *type_id; // type identifier map<TypeKey, Object> contents; }; wci::intermediate::typeimpl::TypeSpecImpl.h
TypeSpecImpl String Constructor A constructor to make Pascal string types. TypeSpecImpl::TypeSpecImpl(string value) { initialize(); form = (TypeForm) TF_ARRAY; TypeSpec *index_type = new TypeSpecImpl((TypeForm) TF_SUBRANGE); index_type->set_attribute((TypeKey) SUBRANGE_BASE_TYPE, Predefined::integer_type); index_type->set_attribute((TypeKey) SUBRANGE_MIN_VALUE, 1); int length = value.length(); index_type->set_attribute((TypeKey) SUBRANGE_MAX_VALUE, length); set_attribute((TypeKey) ARRAY_INDEX_TYPE, index_type); set_attribute((TypeKey) ARRAY_ELEMENT_TYPE, Predefined::char_type); set_attribute((TypeKey) ARRAY_ELEMENT_COUNT, length); } wci::intermediate::typeimpl::TypeSpecImpl.cpp A separate symbol table for the field identifiers Record Element count Element type Index type Array Maximum value Minimum value Base type Subrange List of enumeration constant identifiers Enumeration None Scalar Type specification attributes Type form You’ll see soon where Predefined.integerType and Predefined.charType are defined.
Function TypeSpecImpl::base_type() Return the base type of a subrange type. Example: Return integer for a subrange of integer. Just return the type if it’s not a subrange. wci::intermediate::typeimpl::TypeSpecImpl.cpp TypeSpec *TypeSpecImpl::base_type() { if (form == (TypeForm) TF_SUBRANGE) { Object type_value = get_attribute((TypeKey) SUBRANGE_BASE_TYPE); return cast(type_value, TypeSpec*); } else return this; }
Function TypeSpecImpl::is_pascal_string() Is this a string type? wci::intermediate::typeimpl::TypeSpecImpl.cpp bool TypeSpecImpl::is_pascal_string() { if (form == (TypeForm) TF_ARRAY) { Object type_value = get_attribute((TypeKey) ARRAY_ELEMENT_TYPE); TypeSpec *elmt_type = cast(type_value, TypeSpec*); type_value = this->get_attribute((TypeKey) ARRAY_INDEX_TYPE); TypeSpec *index_type = cast(type_value, TypeSpec*); return (elmt_type->base_type() == Predefined::char_type) && (index_type->base_type() == Predefined::integer_type); } else return false; } To be a Pascal string type, the index type must be integer and the element type must be char.
Type Factory A type factory creates type specifications. Two factory methods. Pascal string types. All other types. wci::intermediate::TypeFactory.cpp TypeSpec *TypeFactory::create_type(const TypeForm form) { return new TypeSpecImpl(form); } TypeSpec *TypeFactory::create_string_type(const string value) return new TypeSpecImpl(value);
How are Identifiers Defined? Identifiers are no longer only the names of variables. Class DefinitionImpl enumerates all the ways an identifier can be defined. enum class DefinitionImpl { CONSTANT, ENUMERATION_CONSTANT, TYPE, VARIABLE, FIELD, VALUE_PARM, VAR_PARM, PROGRAM_PARM, PROGRAM, PROCEDURE, FUNCTION, UNDEFINED, }; Think of Definition as the role an identifier plays. wci::intermediate::symtabimpl::SymTabEntryImpl.h
Predefined Scalar Types Initialized by class Predefined in the global scope. Example: integer wci::intermediate::symtabimpl::Predefined.cpp TypeSpec *Predefined::integer_type; SymTabEntry *Predefined::integer_id; ... void Predefined::initialize_types(SymTabStack *symtab_stack) { // Type integer. integer_id = symtab_stack->enter_local("integer"); integer_type = TypeFactory::create_type((TypeForm) TF_SCALAR); integer_type->set_identifier(integer_id); integer_id->set_definition((Definition) DF_TYPE); integer_id->set_typespec(integer_type); } Type specification integer_type and the symbol table entry integer_id refer to each other.
Predefined Scalar Types Predefined type boolean introduces two predefined constants, true and false. wci::intermediate::symtabimpl::Predefined.cpp // Boolean enumeration constant false. false_id = symtab_stack->enter_local("false"); false_id->set_definition((Definition) DF_ENUMERATION_CONSTANT); false_id->set_typespec(boolean_type); false_id->set_attribute((SymTabKey) CONSTANT_VALUE, 0); // Boolean enumeration constant true. true_id = symtab_stack->enter_local("true"); true_id->set_definition((Definition) DF_ENUMERATION_CONSTANT); true_id->set_typespec(boolean_type); true_id->set_attribute((SymTabKey) CONSTANT_VALUE, 1); // Add false and true to the boolean enumeration type. vector<SymTabEntry *> constants; constants.push_back(false_id); constants.push_back(true_id); boolean_type->set_attribute((TypeKey) ENUMERATION_CONSTANTS, constants); Add the constant identifiers to the ENUMERTION_CONSTANTS list attribute.
Parsing Pascal Declarations
Function PascalParserTD::parse() wci::frontend::pascal::PascalParserTD.cpp icode = ICodeFactory::create_icode(); Predefined::initialize(symtab_stack); routine_id = symtab_stack->enter_local("dummyprogramname"); routine_id->set_definition((Definition) DefinitionImpl::PROGRAM); symtab_stack->set_program_id(routine_id); routine_id->set_attribute((SymTabKey) ROUTINE_SYMTAB, symtab_stack->push()); routine_id->set_attribute((SymTabKey) ROUTINE_ICODE, icode); Token *token = next_token(nullptr); BlockParser block_parser(this); ICodeNode *root_node = block_parser.parse_block(token, routine_id); icode->set_root(root_node); symtab_stack->pop(); Predefined types global symbol table. Push the new level 1 symbol table. The symbol table entry of the program id keeps a pointer to the level 1 symbol table. Parse the program’s block. When done, pop off the level 1 symbol table.
Function PascalParserTD::parse() wci::frontend::pascal::PascalParserTD.cpp token = current_token(); if (token->get_type() != (TokenType) PT_DOT) { error_handler.flag(token, MISSING_PERIOD, this); } int last_line_number = token->get_line_number(); // Set the parse tree root node. if (root_node != nullptr) icode->set_root(root_node); // Send the parser summary message. steady_clock::time_point end_time = steady_clock::now(); double elapsed_time = duration_cast<duration<double>>(end_time - start_time).count(); Message message(PARSER_SUMMARY, LINE_COUNT, to_string(last_line_number), ERROR_COUNT, to_string(get_error_count()), ELAPSED_TIME, to_string(elapsed_time)); send_message(message); Parse the final period.
Function BlockParser::parse() DeclarationsParser declarations_parser(this); StatementParser statement_parser(this); declarations_parser.parse_declaration(token); token = synchronize(StatementParser::STMT_START_SET); TokenType token_type = token->get_type(); ICodeNode *root_node = nullptr; if (token_type == (TokenType) PT_BEGIN) { root_node = statement_parser.parse_statement(token); } ... return root_node; Parse declarations. Synchronize to the start of a statement. Look for BEGIN. Parse the compound statement. wci.frontend.pascal.parsers.BlockParser
Function DeclarationsParser::parse() Parse constant definitions. If there is the CONST reserved word. Call the parse() method of ConstantDefinitionsParser Parse type definitions. If there is the TYPE reserved word. Call the parse() method of TypeDefinitionsParser Parse variable declarations. If there is the VAR reserved word. Call the parse() method of VariableDeclarationsParser
ConstantDefinitionsParser epsilon = 1.0e-6; pi = 3.1415926; delta = epsilon; parse() parseConstant() parseIdentifierConstant() getConstantType()
Function ConstantDefinitionsParser::parse() token = synchronize(IDENTIFIER_SET); while (token->get_type() == (TokenType) PT_IDENTIFIER) { string name = to_lower(token->get_text()); SymTabEntry *constant_id = symtab_stack->lookup_local(name); if (constant_id == nullptr) { constant_id = symtab_stack->enter_local(name); constant_id->append_line_number(token->get_line_number()); } else error_handler.flag(token, IDENTIFIER_REDEFINED, this); constant_id = nullptr; token = next_token(token); // consume the identifier token token = synchronize(EQUALS_SET); if (token->get_type() == (TokenType) PT_EQUALS) token = next_token(token); // consume the = Loop over the constant definitions. Enter the constant identifier into the symbol table but don’t set its definition to CONSTANT yet. Synchronize and parse the = sign. continued … wci::frontend::pascal::parsers::ConstantDefinitionsParser.cpp
Method ConstantDefinitionsParser.parse() Token *constant_token = token; Object data_value = parse_constant(token); if (data_value.empty()) data_value = -1; if (constant_id != nullptr) { constant_id->set_definition((Definition) DF_CONSTANT); constant_id->set_attribute((SymTabKey) CONSTANT_VALUE, data_value); TypeSpec *constant_type = constant_token->get_type() == (TokenType) PT_IDENTIFIER ? get_constant_type(constant_token) : get_constant_type(data_value); constant_id->set_typespec(constant_type); } ... token = synchronize(IDENTIFIER_SET); } Parse the constant’s value. Now set the constant identifier to CONSTANT. Set the constant’s value. Set the constant’s type. Guard against the constant definition pi = pi Synchronize at the start of the next constant definition. wci.frontend.pascal.parsers.ConstantDefinitionsParser