Symbol Table 薛智文 cwhsueh@csie.ntu.edu.tw (textbook ch#2.7 and 6.5) 薛智文 cwhsueh@csie.ntu.edu.tw http://www.csie.ntu.edu.tw/~cwhsueh/ 96 Spring.

Slides:



Advertisements
Similar presentations
Pointers.
Advertisements

Chap 8 Symbol Table name attributes : :
SYMBOL TABLES &CODE GENERATION FOR EXECUTABLES. SYMBOL TABLES Compilers that produce an executable (or the representation of an executable in object module.
1 Symbol Tables. 2 Contents Introduction Introduction A Simple Compiler A Simple Compiler Scanning – Theory and Practice Scanning – Theory and Practice.
Symbol Table.
Intermediate Code Generation
Semantic Analysis Chapter 6. Two Flavors  Static (done during compile time) –C –Ada  Dynamic (done during run time) –LISP –Smalltalk  Optimization.
1 Compiler Construction Intermediate Code Generation.
The Assembly Language Level
Nov 12, 2009IAT 8001 Hash Table Bucket Sort. Nov 12, 2009IAT 8002  An array in which items are not stored consecutively - their place of storage is calculated.
The Symbol Table Lecture 13 Wed, Feb 23, The Symbol Table When identifiers are found, they will be entered into a symbol table, which will hold.
ISBN Chapter 10 Implementing Subprograms.
Tutorial 6 & 7 Symbol Table
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Chapter 9: Subprogram Control
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
國立台灣大學 資訊工程學系 薛智文 98 Spring Symbol Table (textbook ch#2.7 and 6.5 )
Chap. 8, Declaration Processing and Symbol Tables J. H. Wang Dec. 13, 2011.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Names and Scope. Scope Suppose that a name is used many times for different entities in text of the program, or in the course of execution. When the name.
Appendix E-A Hashing Modified. Chapter Scope Concept of hashing Hashing functions Collision handling – Open addressing – Buckets – Chaining Deletions.
1 Symbol Tables The symbol table contains information about –variables –functions –class names –type names –temporary variables –etc.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Basic Semantics Associating meaning with language entities.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
CS536 Semantic Analysis Introduction with Emphasis on Name Analysis 1.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Chapter 10 Implementing Subprograms. Copyright © 2012 Addison-Wesley. All rights reserved.1-2 Chapter 10 Topics The General Semantics of Calls and Returns.
Chapter 5 Record Storage and Primary File Organizations
CSC 143T 1 CSC 143 Highlights of Tables and Hashing [Chapter 11 p (Tables)] [Chapter 12 p (Hashing)]
1 Compiler Construction Run-time Environments,. 2 Run-Time Environments (Chapter 7) Continued: Access to No-local Names.
Semantic Analysis. Find 6 problems with this code. These issues go beyond syntax.
Implementing Subprograms
Chapter 10 : Implementing Subprograms
Implementing Subprograms
May 3rd – Hashing & Graphs
Type Checking Generalizes the concept of operands and operators to include subprograms and assignments Type checking is the activity of ensuring that the.
Context-Sensitive Analysis
A Simple Syntax-Directed Translator
Data Structure Interview Question and Answers
Constructing Precedence Table
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CS 326 Programming Languages, Concepts and Implementation
Lecture No.43 Data Structures Dr. Sohail Aslam.
Dynamic Memory Allocation
Data Structures Interview / VIVA Questions and Answers
Cinda Heeren / Geoffrey Tien
Review Graph Directed Graph Undirected Graph Sub-Graph
Hashing Exercises.
Chapter 10: Implementing Subprograms Sangho Ha
Implementing Subprograms
Disk Storage, Basic File Structures, and Hashing
Computer Science 2 Hashing
CMPE 152: Compiler Design September 18 Class Meeting
Chapter 10 Hashing.
CMPE 152: Compiler Design October 4 Class Meeting
Advance Database System
CS202 - Fundamental Structures of Computer Science II
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Introduction to Data Structure
EE 312 Software Design and Implementation I
COMPILERS Semantic Analysis
SYMBOL TABLE Chuen-Liang Chen Department of Computer Science
EE 312 Software Design and Implementation I
Implementing Subprograms
Lecture-Hashing.
Presentation transcript:

Symbol Table 薛智文 cwhsueh@csie.ntu.edu.tw (textbook ch#2.7 and 6.5) 薛智文 cwhsueh@csie.ntu.edu.tw http://www.csie.ntu.edu.tw/~cwhsueh/ 96 Spring

Definition Symbol table: A data structure used by a compiler to keep track of semantics of names. Data type. When is used: scope. The effective context where a name is valid. Where it is stored: storage address. Operations: Find: whether a name has been used. Insert: add a name. Delete: remove a name when its scope is closed. 00:58 /27

Some Possible Implementations Unordered list: for a very small set of variables; coding is easy, but performance is bad for large number of variables. Ordered linear list: use binary search; insertion and deletion are expensive; coding is relatively easy. Binary search tree: O(log n) time per operation (search, insert or delete) for n variables; coding is relatively difficult. Hash table: most commonly used; very efficient provided the memory space is adequately larger than the number of variables; performance maybe bad if unlucky or the table is saturated; coding is not too difficult. 00:58 /27

Hash Table Hash function h(n): returns a value from 0, . . . , m − 1, where n is the input name and m is the hash table size. Uniformly and randomly. Many possible good designs. Add up the integer values of characters in a name and then take the remainder of it divided by m. Add up a linear combination of integer values of characters in a name, and then take the remainder of it divided by m. Resolving collisions: Linear resolution: try (h(n) + 1) mod m, where m is a large prime number, and then (h(n) + 2) mod m, . . . , (h(n) + i) mod m. Chaining: most popular. Keep a chain on the items with the same hash value. Quadratic-rehashing: try (h(n) + 12) mod m, and then try (h(n) + 22) mod m, and then · · · try (h(n) + i2) mod m. 00:58 /27

Performance of Hash Table Performance issues on using different collision resolution schemes. Hash table size must be adequately larger than the maximum number of possible entries. Frequently used variables should be distinct. Keywords or reserved words. Short names, e.g., i, j and k. Frequently used identifiers, e.g., main. Uniformly distributed. 00:58 /27

Contents in a Symbol Table Possible entries in a symbol table: Name: a string. Attribute: Reserved word Variable name Type name Procedure name Constant name · · · Data type. Storage allocation, size, . . . Scope information: where and when it can be used. 00:58 /27

How Names are Stored? Fixed-length: allocate a fixed space for each name allocated. Too little: names must be short. Too much: waste a lot of spaces. Variable-length name: A string of space is used to store all names. For each name, store the length and starting index of each name. NAME ATTRIBUTES STORAGE ADDRESS … s o r t a e d y i 2 Name ATTRIBUTES STROAGE ADDRESS … index length 5 2 7 10 17 3 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 s o r t $ a $ r e a d a r r a y $ i 2 $ 00:58 /27

Handling Block Structures main() /* C code */ { /* open a new scope */ int H,A,L; /* parse point A */ ... { /* open another new scope */ float x,y,H; /* parse point B */ /* x and y can only be used here */ /* H used here is float */ } /* close an old scope */ /* H used here is integer */ { char A,C,M; /* parse point C */ } Nested blocks mean nested scopes. Two major ways for implementation: Approach 1: multiple symbol tables in one stack. Approach 2: one symbol table with chaining. 00:58 /27

Multiple Symbol Tables in One Stack An individual symbol table for each scope. Use a stack to maintain the current scope. Search top of stack first. If not found, search the next one in the stack. Use the first one matched. Note: a popped scope can be destroyed in a one-pass compiler, but it must be saved in a multi-pass compiler. searching direction main() /* C code */ { /* open a new scope */ int H,A,L; /* parse point A */ ... { /* open another new scope */ float x,y,H; /* parse point B */ /* x and y can only be used here */ /* H used here is float */ } /* close an old scope */ /* H used here is integer */ { char A,C,M; /* parse point C */ } S.T. for x, y, H S.T. for A, C, M S.T. for H, A, L S.T. for H, A, L S.T. for H, A, L parse point A parse point B parse point C 00:58 /27

Pros and Cons for Multiple Symbol Tables Advantage: Easy to close a scope. Disadvantage: Difficulties encountered when a new scope is opened . Need to allocate adequate amount of entries for each symbol table if it is a hash table. Waste lots of spaces. A block within a procedure does not usually have many local variables. There may be many global variables, and many local variables when a procedure is entered. 00:58 /27

One Symbol Table with Chaining (1/2) A single global table marked with the scope information. Each scope is given a unique scope number. Incorporate the scope number into the symbol table. Two possible codings (among others): Hash table with chaining. Chaining at the front when names hashed into the same location. parse table hash with chaining main() /* C code */ { /* open a new scope */ int H,A,L; /* parse point A */ ... { /* open another new scope */ float x,y,H; /* parse point B */ /* x and y can only be used here */ /* H used here is float */ } /* close an old scope */ /* H used here is integer */ { char A,C,M; /* parse point C */ } H(2) H(1) H(1) L(1) L(1) x(2) C(3) y(2) M(3) A(1) A(3) A(1) parse point B parse point C 00:58 /27

One Symbol Table with Chaining (2/2) A second coding choice: Binary search tree with chaining. Use a doubly linked list to chain all entries with the same name. main() /* C code */ { /* open a new scope */ int H,A,L; /* parse point A */ ... { /* open another new scope */ float x,y,H; /* parse point B */ /* x and y can only be used here */ /* H used here is float */ } /* close an old scope */ /* H used here is integer */ { char A,C,M; /* parse point C */ } H(2) H(1) A(1) L(1) x(2) parse point B y(2) H(1) A(1) A(3) L(1) C(3) M(3) parse point C 00:58 /27

Pros and Cons for a Unique Symbol Table Advantage: Does not waste spaces. Little overhead in opening a scope. Disadvantage: It is difficult to close a scope. Need to maintain a list of entries in the same scope. Using this list to close a scope and to reactive it for the second pass if needed. 00:58 /27

Records and Fields The “with” construct in PASCAL can be considered an additional scope rule. Field names are visible in the scope that surrounds the record declaration. Field names need only to be unique within the record. Another example is the “using namespace” directive in C++. Example (PASCAL code): A, R: record A: integer X: record A: real; C: boolean; end ... R.A := 3; /* means R.A := 3; */ with R do A := 4; /* means R.A := 4; */ 00:58 /27

Implementation of Field Names Two choices for handling field names: Allocate a symbol table for each record type used. Associate a record number within the field names. Assign record number #0 to names that are not in records. A bit time consuming in searching the symbol table. Similar to the scope numbering technique. main symbol table record A R another symbol table A integer X record another symbol table another symbol table another symbol table A integer A real A real X record C boolean C boolean 00:58 /27

Locating Field Names Example: with R do begin A := 3; with X do A := 3.3 end If each record (each scope) has its own symbol table, then push the symbol table for the record onto the stack. If the record number technique is used, then keep a stack containing the current record number; During searching, succeed only if it matches the name and the current record number. If fail, then use next record number in the stack as the current record number and continue to search. If everything fails, search the normal main symbol table. 00:58 /27

Overloading (1/3) A symbol may, depending on context, have more than one semantics. Examples. operators: I := I + 3; X := Y + 1.2; function call return value and recursive function call: f := f + 1; 00:58 /27

Overloading (2/3) Implementation: Link together all possible definitions of an overloading name. Call this an overloading chain. Whenever a name that can be overloaded is defined: if the name is already in the current scope, then add the new definition in the overloading chain; if it is not already there, then enter the name in the current scope, and link the new entry to any existing definitions; search the chain for an appropriate one, depending on the context. Whenever a scope is closed, delete the overloading definitions defined in this scope from the head of the chain. 00:58 /27

Overloading (3/3) Example: PASCAL function name and return variable. Within the function body, the two definitions are chained. i.e., function call and return variable. When the function body is closed, the return variable definition disappears. [PASCAL] function f: integer; begin if global > 1 then f := f +1; return end 00:58 /27

Forward Reference Definition: Possible implementations: A name that is used before its definition is given. To allow mutually referenced and linked data types, names can sometimes be used before that are declared. Possible implementations: Multi-pass compiler. Back-patching. Avoid resolving a symbol until all possible places where symbols can be declared have been seen. In C, ADA and languages commonly used today, the scope of a declaration extends only from the point of declaration to the end of the containing scope. If names must be defined before their usages, then one-pass compiler with normal symbol table techniques suffices. Some possible usages for forward referencing: GOTO labels. Recursively defined pointer types. Mutually or recursively called procedures. 00:58 /27

GOTO Labels Some language like C uses labels without declarations. Implicit declaration. Example: [C] L0: ... goto L0; goto L1; L1: 00:58 /27

Recursively Defined Pointer Types Determine the element type if possible; Chaining together all references to unknown type names until the end of the type declaration; All type names can then be looked up and resolved. Names that are unable to resolved are undeclared type names. Example: [PASCAL] type link = ^ cell; cell = record info: integer; next: link; end; 00:58 /27

Mutually or Recursively Called Procedures Need to know the specification of a procedure before its definition. Some languages require prototype definitions. Example: procedure A() { ... call B(); } procedure B() call A(); 00:58 /27

Type Equivalent and Others How to determine whether two types are equivalent? Structural equivalence. Express a type definition via a directed graph where nodes are the elements and edges are the containing information. Two types are equivalent if and only if their structures (labeled graphs) are the same. A difficult job for compilers. entry = record [entry] info : real; +-----> [info] <real> coordinates : record +-----> [coordinates] x : integer; +----> [x] <integer> y : integer; +----> [y] <integer> end Name equivalence. Two types are equivalent if and only if their names are the same. An easy job for compilers, but the coding takes more time. Symbol table is needed during compilation, and might also be needed during debugging. 00:58 /27

Usage of Symbol Table with YACC Define symbol table routines: Find_in_symbol_table(name,scope): check whether a name within a particular scope is currently in the symbol table or not. Return “not found” or an entry in the symbol table; Insert_into_symbol_table(name,scope) Return the newly created entry. Delete_from_symbol_table(name,scope) For interpreters: Use the attributes associated with the symbols to hold temporary values. Use a structure with maybe some unions to record all attributes. struct YYTYPE { char type; /* data type of a variable */ int value; int addr; char *namelist; /* list of names */ } 00:58 /27

YACC Coding: Declaration I D  L id {use Find_in_symbol_table to check for a possible duplicated name with $.name; use Insert_into_symbol_table to insert $2.name with the type $1.type; allocate sizeof($1.type) bytes; record the storage address in the symbol table entry; $$.type = $1.type;} L  L id , | T {$$.type = $1.type;} T  int {$$.type = int;} 00:58 /27

YACC Coding: Declaration II D  T L {append each name in $2.namelist into symbol table, i.e., use Find_in_symbol_table to check for possible duplicated names; use Insert_into_symbol_table to insert each name in the list with the type $1.type; allocate sizeof($1.type) bytes; record the storage address in the symbol table entry;} T  int {$$.type = int;} L  L , id {insert the new name yytext into $1.namelist; return $$.namelist as $1.namelist;} | id {the variable name is in yytext; create a list of one name, i.e., yytext, $$.namelist;} 00:58 /27

YACC Coding: Expressions and Assignments Usage of variables: Assign_S  L_var := Expression; {$1.addr is the address of the variable to be stored; $3.value is the value of the expression; generate code for storing $3.value into $1.addr;} L_var  id { use Find_in_symbol_table to check whether yytext is already declared; $$.addr = storage address;} Expression  Expression + Expression {$$.value = $1.value + $3.value;} | Expression − Expression {$$.value = $1.value − $3.value;} · · · | id if yes, error · · · . if not, $$.value = the value of the variable yytext} 00:58 /27