Symbol Table
Role of Symbol Table
- Essential data structure for a compiler
- Stores information about the identifiers appearing in a source program
- The lexical analyzer and parser fill up the symbol table
- The code generator and optimizer make use of the symbol table
- Entities stored in a symbol table:
  - Variables, procedures, functions, defined constants, labels, structures, file identifications, compiler-generated temporaries

Information in symbol table
- Name: may be stored directly in the table, or the entry may point to a character string held elsewhere, possibly in an associated string table
- Type: type of the identifier
  - Whether it is a variable / function / procedure name
  - For variables, whether integer / real / array …
- Location: offset in the program where the identifier is defined
- Scope: the region of the program in which the current symbol definition is valid
- Other attributes: array limits, record fields / parameters / return values of functions

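The fields listed above could be grouped into one record per identifier. The following is only a minimal sketch: the class name SymbolEntry and all field names are hypothetical, chosen for illustration, and a real compiler would store richer type information.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SymbolEntry:
    """One symbol table entry (illustrative layout, not a fixed format)."""
    name: str                 # identifier as written in the source program
    kind: str                 # "variable", "function", "procedure", "label", ...
    data_type: Optional[str]  # e.g. "int", "real", "array"; None where not applicable
    location: int             # offset in the program where the name is defined
    scope: int                # nesting level of the defining scope (0 = global)
    # Other attributes: array limits, record fields, parameters, return type, ...
    attributes: Dict[str, Any] = field(default_factory=dict)


# An array variable carries its limits in the open-ended attribute dictionary.
arr = SymbolEntry("a", "variable", "array", location=8, scope=1,
                  attributes={"element_type": "int", "limits": (0, 9)})
label = SymbolEntry("L1", "label", None, location=40, scope=0)
```

Keeping the rarely used attributes in a dictionary keeps the fixed part of every entry small, at the cost of weaker checking on attribute names.
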
Usage of symbol table information
- Semantic analysis: check correct semantic usage of language constructs
  - May require checking the types of identifiers
- Code generation: all program variables and temporaries need to be allocated memory locations
  - The symbol table provides the memory size required for each identifier, based on its type
- Error detection: report variables that are used but left undefined
- Optimization: to reduce the total number of variables used in a program, the temporaries generated by the compiler need to be reused

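As an illustration of the code generation use, the sketch below assigns each declared variable a memory offset from the size implied by its type. The sizes in TYPE_SIZES and the function name assign_offsets are assumptions made for this example; actual sizes depend on the target machine, and alignment is ignored here.

```python
# Assumed sizes in bytes; real values depend on the target machine.
TYPE_SIZES = {"int": 4, "float": 4, "real": 8}


def assign_offsets(declarations):
    """Build a table mapping each name to its type, size and offset.

    declarations is a list of (name, type_name) pairs in source order.
    Returns the table and the total number of bytes allocated.
    """
    table = {}
    next_offset = 0
    for name, type_name in declarations:
        size = TYPE_SIZES[type_name]
        table[name] = {"type": type_name, "size": size, "offset": next_offset}
        next_offset += size  # the next variable starts right after this one
    return table, next_offset


offset_table, total_bytes = assign_offsets([("x", "int"), ("y", "int"), ("z", "real")])
```

With the assumed sizes, x is placed at offset 0, y at 4 and z at 8, for 16 bytes in total.
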
Features of symbol tables
- Insert
- Delete
- Lookup
- Modify

Symbol Table Design
- Format of symbol table entries
  - Linear lists / arrays / trees …
- Access mechanism
  - Linear search / hashing / binary search …
- Location of storage
  - Primary memory (RAM) / secondary memory (for a large symbol table)
- Scope issues
  - Simple symbol table with no nested scopes
  - Scoped symbol table with nested scopes

Simple symbol table: Operations
- Enter a new symbol into the table
- Look up a symbol
- Modify information about a symbol stored earlier

Simple Symbol Table Formats
- Linear table
- Ordered list
- Tree
- Hash table

Linear Table
  int x, y;
  float z;
  …..
  procedure abc
  L1: …

Linear Table

Name   Type        Location
x      int         Offset of x
y      int         Offset of y
z      float       Offset of z
abc    procedure   Offset of abc
L1     label       Offset of L1

- Insert, lookup and modify operations take O(n) time, n being the number of identifiers
- Insertion can be made O(1) by remembering a pointer to the next free position

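A linear table can be sketched as a plain list that is searched from the front. The class and method names below are hypothetical. Appending to the end of a Python list plays the role of the pointer to the next free position, so insertion needs no search as long as duplicates are not checked.

```python
class LinearSymbolTable:
    """Unsorted table: O(1) append, O(n) lookup and modify."""

    def __init__(self):
        self.entries = []  # the end of the list is the next free position

    def insert(self, name, type_name, location):
        # No duplicate check, so no search is needed before appending.
        self.entries.append({"name": name, "type": type_name, "location": location})

    def lookup(self, name):
        # Linear search over all entries.
        for entry in self.entries:
            if entry["name"] == name:
                return entry
        return None

    def modify(self, name, **changes):
        entry = self.lookup(name)
        if entry is None:
            raise KeyError(name)
        entry.update(changes)


linear = LinearSymbolTable()
linear.insert("x", "int", 0)
linear.insert("abc", "procedure", 12)
linear.modify("x", type="float")
```
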
Ordered List
- Variation of the linear table
- The list may be kept sorted, so binary search gives access in O(log n)
- Insertion must be done at the proper place to preserve the sorted order
- Self-organizing list: dynamically rearrange the list based on recency of reference

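The sorted variant can be sketched with Python's standard bisect module, which performs the binary search. The class name OrderedSymbolTable and the use of two parallel lists are assumptions for this example. Finding the position costs O(log n), but shifting later elements during insertion is still O(n).

```python
import bisect


class OrderedSymbolTable:
    """Names kept sorted so that lookup can use binary search."""

    def __init__(self):
        self.names = []  # sorted identifier names
        self.infos = []  # infos[i] belongs to names[i]

    def insert(self, name, info):
        i = bisect.bisect_left(self.names, name)  # proper place for name
        if i < len(self.names) and self.names[i] == name:
            raise KeyError("duplicate symbol: " + name)
        self.names.insert(i, name)
        self.infos.insert(i, info)

    def lookup(self, name):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.infos[i]
        return None


ordered = OrderedSymbolTable()
for ident, kind in [("x", "int"), ("abc", "procedure"), ("L1", "label"),
                    ("z", "float"), ("y", "int")]:
    ordered.insert(ident, kind)
```

Names are compared as ordinary strings here, so uppercase letters sort before lowercase ones.
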
Tree Symbol Table
- Each entry is represented as a node in a tree
- Based on string comparison of names, entries less than a reference node are kept in its left subtree and entries greater than it in its right subtree
- Average lookup time is O(log n)
- AVL trees may be used to keep the tree balanced

Tree symbol table example
[Figure: binary tree whose nodes are labelled with symbol names; the surviving labels are abc, y, L1 and z]

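A tree-organized table can be sketched as an unbalanced binary search tree keyed on the identifier name. The class names and the insertion order used below are assumptions for illustration. An AVL tree would add rotations after insertion, which are omitted here.

```python
class TreeNode:
    def __init__(self, name, info):
        self.name = name
        self.info = info
        self.left = None   # names that compare less than self.name
        self.right = None  # names that compare greater than self.name


class TreeSymbolTable:
    def __init__(self):
        self.root = None

    def insert(self, name, info):
        if self.root is None:
            self.root = TreeNode(name, info)
            return
        node = self.root
        while True:
            if name == node.name:
                node.info = info  # existing symbol: update its information
                return
            side = "left" if name < node.name else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, TreeNode(name, info))
                return
            node = child

    def lookup(self, name):
        node = self.root
        while node is not None and node.name != name:
            node = node.left if name < node.name else node.right
        return node.info if node is not None else None


tree = TreeSymbolTable()
for ident, kind in [("x", "int"), ("y", "int"), ("z", "float"),
                    ("abc", "procedure"), ("L1", "label")]:
    tree.insert(ident, kind)
```

Without balancing, the shape of the tree depends on the order in which names arrive; names inserted in sorted order degrade the tree into a list.
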
Hash table symbol table
- Used where access time is to be minimized
- The most common method of implementing a symbol table in compilers
- A hash function maps identifier names to hash table locations, organized as an array
- To store a symbol in the table, the hash function is applied to its name, which yields a location in the table
- The symbol is stored there along with its associated information
- Accessed using the same hash function [O(1) time on average]
- Problem with hash tables
  - Imperfect hash function: collisions
  - Collision resolution (chaining …)

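Chaining can be sketched as an array of buckets, each holding the list of symbols that hash to that location. The class name, the default table size and the particular hash function below are assumptions for this example, not a prescribed design.

```python
class HashSymbolTable:
    """Array of buckets; colliding names share a bucket (chaining)."""

    def __init__(self, size=11):
        self.buckets = [[] for _ in range(size)]

    def _index(self, name):
        # Simple polynomial hash, reduced modulo the table size at each step.
        h = 0
        for ch in name:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def insert(self, name, info):
        chain = self.buckets[self._index(name)]
        for pair in chain:
            if pair[0] == name:
                pair[1] = info  # name already present: update it
                return
        chain.append([name, info])

    def lookup(self, name):
        for key, info in self.buckets[self._index(name)]:
            if key == name:
                return info
        return None


# A table with a single bucket forces every name to collide.
crowded = HashSymbolTable(size=1)
crowded.insert("x", "int")
crowded.insert("y", "int")
crowded.insert("x", "float")
```

Even with every name in one chain the lookups stay correct; only the access time degrades from constant towards linear.
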
Hash function properties
- Should depend on the name of the symbol
- Easy and quick to compute
- Should be uniform in mapping names to different parts of the table
- Computed value must always be within the table range

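The uniformity requirement can be illustrated by comparing two hypothetical hash functions on typical one-letter identifiers. Both function names and the table size of 13 are assumptions for this example. Each function depends on the name, is cheap to compute and stays within the table range, but only one of them spreads the names.

```python
def poly_hash(name, table_size):
    """Uses every character of the name; the result is in [0, table_size)."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) % table_size
    return h


def length_hash(name, table_size):
    """Poor choice: all names of the same length land in the same slot."""
    return len(name) % table_size


short_names = ["x", "y", "z", "i", "j", "k", "n", "m"]
poly_slots = {poly_hash(n, 13) for n in short_names}
length_slots = {length_hash(n, 13) for n in short_names}
```

Here length_slots contains a single slot, because every name has length 1, while poly_slots contains several distinct slots.
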
Scoped Symbol Table
- Global scope
  - Global variables
- File-wide scope
  - Modules in more than one file (static variables / functions)
- Local scope within a procedure
  - Function local variables
- Local scope within a block

Scoping rules
- Static or lexical scoping
- Dynamic or runtime scoping
- Example of dynamic scoping:
  - Procedure P1 is callable from P2 and P3
  - P1 has a reference to a non-local variable x
  - There exist two different definitions of x, one each in P2 and P3
  - When P1 is called from P2, x refers to the definition in P2; when called from P3, it refers to the definition in P3

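The P1 / P2 / P3 situation can be simulated as follows. Python itself uses lexical scoping, so this sketch models dynamic scoping by hand with an explicit stack of activation frames; call_stack and lookup_dynamic are hypothetical names introduced for the illustration.

```python
call_stack = []  # one dictionary of local definitions per active procedure


def lookup_dynamic(name):
    # Dynamic scoping: search the callers, most recent activation first.
    for frame in reversed(call_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)


def p1():
    call_stack.append({})  # P1 defines no x of its own
    try:
        return lookup_dynamic("x")
    finally:
        call_stack.pop()


def p2():
    call_stack.append({"x": "x defined in P2"})
    try:
        return p1()
    finally:
        call_stack.pop()


def p3():
    call_stack.append({"x": "x defined in P3"})
    try:
        return p1()
    finally:
        call_stack.pop()
```

Calling p2() resolves x to the definition in P2, and calling p3() resolves it to the definition in P3. Under static scoping the binding of x in P1 would instead be fixed by where P1 is written in the program text.
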
Nested Lexical Scoping
Procedure P1
…..
Procedure P2
End Procedure
Procedure P3
….
Procedure P4
x =

Scope resolution rule
- If a name is defined in more than one scope, the innermost definition closest to the reference is used to interpret the reference to that name
- When a scope is exited, all variables declared in that scope are deleted from the symbol table

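Both rules can be sketched with a stack holding one dictionary per open scope. The class and method names are hypothetical. Lookup walks from the innermost scope outwards, and leaving a scope discards its dictionary together with everything declared in it.

```python
class ScopedSymbolTable:
    def __init__(self):
        self.scopes = [{}]  # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()  # deletes every symbol declared in this scope

    def declare(self, name, info):
        self.scopes[-1][name] = info  # always goes into the current scope

    def lookup(self, name):
        # The innermost definition wins.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


scoped = ScopedSymbolTable()
scoped.declare("x", "global x")
scoped.enter_scope()
scoped.declare("x", "inner x")
inner_result = scoped.lookup("x")
scoped.exit_scope()
outer_result = scoped.lookup("x")
```

Inside the nested scope the lookup finds the inner definition; after the scope is exited the same lookup finds the global one again.
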
Scoped symbol table implementation
- One table per scope
- One table for all scopes
- Operations on symbol tables
  - Insert: insert a new symbol in the current scope
  - Delete
  - Modify
  - Create: create a new scope
  - Lookup

One table per scope
- Lists
- Trees
- Hash tables

One symbol table for all scopes
- Lists
- Trees
- Hash tables

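A single table shared by all scopes can be sketched by tagging each definition with the nesting level at which it was declared. This is one possible arrangement, not the only one; the class name and the per-name stack of definitions are assumptions for this example, and a Python dictionary stands in for the hash table.

```python
class SingleTableScopes:
    def __init__(self):
        self.table = {}       # name -> list of (level, info), innermost last
        self.level = 0        # current nesting level, 0 = global
        self.declared = [[]]  # names declared at each open level

    def enter_scope(self):
        self.level += 1
        self.declared.append([])

    def exit_scope(self):
        if self.level == 0:
            raise RuntimeError("cannot exit the global scope")
        # Remove exactly the definitions made in the scope being closed.
        for name in self.declared.pop():
            self.table[name].pop()
            if not self.table[name]:
                del self.table[name]
        self.level -= 1

    def declare(self, name, info):
        self.table.setdefault(name, []).append((self.level, info))
        self.declared[-1].append(name)

    def lookup(self, name):
        definitions = self.table.get(name)
        return definitions[-1] if definitions else None


single = SingleTableScopes()
single.declare("x", "int")
single.enter_scope()
single.declare("x", "float")
single.declare("t", "int")
inner_x = single.lookup("x")
single.exit_scope()
outer_x = single.lookup("x")
```

Lookup needs only one table access, while the cost of scope exit is proportional to the number of names declared in the scope being closed.
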
Reference
- Santanu Chattopadhyay, Compiler Design