
SYMBOL TABLES & CODE GENERATION FOR EXECUTABLES

SYMBOL TABLES

Compilers that produce an executable (or the representation of an executable in object module format), as opposed to a program in an intermediate language, need to make use of a symbol table. In fact, for optimization purposes, all compilers do.

The symbol table records information about the identifiers in the source program, such as their name, type, number of dimensions, space assignment, etc.

To illustrate the use of symbol tables, let's consider a simple compiler in which symbol_stack consists of integers, and the integer associated with an identifier on the stack is the index of that identifier's entry in the symbol table.

Our symbol_stack entries thus act as pointers to the entries in the symbol table, where the name of each identifier and the offset assigned to it in the data segment are stored. Negative numbers will be employed on symbol_stack as codes to denote the registers AX, BX, etc.

As identifiers are encountered in the source code, their names are packed into an array, which we will call id_stack, defined as:

    char id_stack[1000];

Since strings in C all end in a 00h byte, it is only necessary to specify where on id_stack a name begins in order to retrieve it.

The symbol table entry for a name does not contain the name itself, but instead a pointer to the beginning of the name on id_stack. The reason is that, since the symbol table is an array of symbol table entries, we would otherwise have to provide space in each entry for the largest legal name size.
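The packing scheme just described might be sketched as follows. This is only an illustration under stated assumptions: the counter id_top and the helpers push_name and name_at are hypothetical names that do not appear in the original slides.

```c
#include <assert.h>
#include <string.h>

#define ID_STACK_SIZE 1000

char id_stack[ID_STACK_SIZE];  /* names packed end to end, each ending in 00h */
int  id_top = 0;               /* hypothetical: index of the next free byte */

/* Copy name onto id_stack and return the index where it begins. */
int push_name(const char *name)
{
    size_t len = strlen(name) + 1;                  /* include the 00h byte */
    assert((size_t)id_top + len <= ID_STACK_SIZE);  /* sketch: no overflow recovery */
    int start = id_top;
    memcpy(&id_stack[start], name, len);
    id_top += (int)len;
    return start;
}

/* Retrieve a name given only its starting index. */
const char *name_at(int name_index)
{
    return &id_stack[name_index];
}
```

With this layout, pushing "X1" and then "ZZ" stores them at indexes 0 and 3, so a symbol table entry needs only one integer per name regardless of the name's length.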

When an identifier is encountered in the source code, the compiler has to search the symbol table to find the entry, if any, for it. Various methods have been investigated for making this process more efficient, such as the use of binary trees.

The method of choice, however, has been to derive a number called a hash code from an identifier, and then link all identifiers with the same hash code in a list, which we will refer to as a hash-list.

One method for evaluating a hash code is to add up the ASCII codes of the individual characters of the identifier and then take, as the hash code, the remainder of this sum after division by a prime number, such as 127.

The following is sample code for this purpose:

    int hash(char * name)
    {
        int hash_value = 0;
        int i = 0;
        while (name[i] != '\0') {
            hash_value += name[i];   /* add the character's ASCII code */
            ++i;
        }
        return (hash_value % 127);
    }

In this scheme there are 127 hash-lists.
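As a worked illustration of this hash function, the sketch below repeats it with two small changes that are assumptions of this example rather than part of the slides: the parameter is const, and each character is converted to unsigned char so the sum cannot go negative on platforms where char is signed.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the character codes of name and reduce modulo the prime 127. */
int hash(const char *name)
{
    assert(name != NULL);
    int hash_value = 0;
    for (int i = 0; name[i] != '\0'; ++i)
        hash_value += (unsigned char)name[i];   /* assumption: avoid signed char */
    return hash_value % 127;
}
```

For the name X1, the ASCII codes are 88 and 49, the sum is 137, and 137 % 127 gives a hash code of 10. Because the sum ignores character order, names that are rearrangements of one another, such as ab and ba, always land on the same hash-list (here 97 + 98 = 195, and 195 % 127 = 68).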

A simple symbol table could be defined as follows:

    typedef struct {
        int name_index;
        int offset;
        int hash_link;
    } symbol_table_entry;

    symbol_table_entry symbol_table[1000];

Here name_index is the pointer (an index) into id_stack where the name is stored, offset is the offset in the data segment assigned to the identifier, and hash_link is a pointer (again an index) to the symbol table entry for the next identifier encountered, if any, with the same hash code.

The entries at symbol_table[0] through symbol_table[126] are reserved for the heads of the 127 hash-lists.

For example, if X1 is the first identifier encountered in the source with hash code (say) 30, then an entry for it will be made at symbol_table[30]. If, later on, an identifier ZZ is encountered which also has hash code 30, then an entry will be made for ZZ at the next free index >= 127 in symbol_table (indexes 0 through 126 being reserved for the heads), and the hash_link in the entry for X1 will be changed from null to point instead to the entry for ZZ.

Within the rules section of the Lex definition file, the regular expression and associated code for an identifier may take a form such as the following:

    {letter}({letter}|{digit}|_)*    { yylval = find(yytext); return identifier; }

where the find function returns the index into symbol_table of the entry for the identifier, creating an entry if one doesn't already exist. (The underscore is written without single quotes, since Lex treats single quotes as ordinary characters; only double quotes quote.)

The find function begins as follows:

    int find(char * name)
    {
        int j;
        j = hash(name);

and proceeds according to the flow-diagram on the next slide.
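The flow diagram is not reproduced in this transcript, so the following is only one plausible sketch of how find could proceed, based on the hash-list layout described above. Several details are assumptions of this sketch and not taken from the slides: the marker NONE (-1) for an unused head or a null link, the counters id_top, next_free and next_offset, the helpers init_symbol_table and fill_entry, and the choice of giving every identifier a 2-byte slot in the data segment.

```c
#include <assert.h>
#include <string.h>

#define HASH_SIZE  127
#define TABLE_SIZE 1000
#define NONE       (-1)   /* hypothetical marker: unused head / null link */

typedef struct {
    int name_index;       /* where the name begins on id_stack */
    int offset;           /* offset assigned in the data segment */
    int hash_link;        /* next entry with the same hash code, or NONE */
} symbol_table_entry;

symbol_table_entry symbol_table[TABLE_SIZE];
char id_stack[1000];
int id_top = 0;               /* next free byte on id_stack */
int next_free = HASH_SIZE;    /* next free entry past the 127 heads */
int next_offset = 0;          /* assumption: 2 bytes per identifier */

int hash(const char *name)
{
    int sum = 0;
    for (int i = 0; name[i] != '\0'; ++i)
        sum += (unsigned char)name[i];
    return sum % HASH_SIZE;
}

/* Mark every hash-list head as unused; call once before the first find. */
void init_symbol_table(void)
{
    for (int i = 0; i < HASH_SIZE; ++i) {
        symbol_table[i].name_index = NONE;
        symbol_table[i].hash_link = NONE;
    }
}

/* Store name on id_stack and fill in entry k for it. */
static void fill_entry(int k, const char *name)
{
    size_t len = strlen(name) + 1;
    assert((size_t)id_top + len <= sizeof id_stack);
    memcpy(&id_stack[id_top], name, len);
    symbol_table[k].name_index = id_top;
    symbol_table[k].offset = next_offset;
    symbol_table[k].hash_link = NONE;
    id_top += (int)len;
    next_offset += 2;
}

/* Return the index of the entry for name, creating it if necessary. */
int find(const char *name)
{
    int j = hash(name);
    if (symbol_table[j].name_index == NONE) {   /* hash-list is empty */
        fill_entry(j, name);
        return j;
    }
    for (;;) {                                  /* walk the hash-list */
        if (strcmp(&id_stack[symbol_table[j].name_index], name) == 0)
            return j;                           /* already present */
        if (symbol_table[j].hash_link == NONE)
            break;                              /* j is the last entry */
        j = symbol_table[j].hash_link;
    }
    assert(next_free < TABLE_SIZE);
    int k = next_free++;                        /* first free index is 127 */
    fill_entry(k, name);
    symbol_table[j].hash_link = k;              /* append to the hash-list */
    return k;
}
```

In this sketch the first name seen for a given hash code occupies the head entry itself, and later names with the same code are appended at the end of the list, which matches the X1 and ZZ example.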

Code Generation Using the Symbol Table

Let's consider the code required for addition in the Yacc definition file of our simple compiler. To avoid complications, let's assume that the code for our arithmetic expressions requires the use of register AX only.

So on symbol_stack, non-negative numbers are indexes of entries for identifiers in symbol_table (index 0 is itself a valid hash-list head), and (say) -1 is used as a code for AX:

    expression : expression '+' term { C code as described below }

The C code should check whether $1 and $3 are non-negative or negative, and generate appropriate object code for each of the 4 cases.

Case where $1 and $3 are both non-negative: generate machine code corresponding to:

    mov AX, symbol_table[$1].offset
    add AX, symbol_table[$3].offset

and set $$ = -1.

Case where $1 is negative and $3 is non-negative: generate machine code corresponding to:

    add AX, symbol_table[$3].offset

and set $$ = -1.
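The two cases spelled out above could be sketched in C roughly as follows. This is an illustration under assumptions, not the compiler's actual action code: the function gen_add, the constants REG_AX and UNHANDLED, and the choice to emit assembler-style text (with the offset in brackets as a memory operand) in place of machine code bytes are all hypothetical. The two remaining cases are not covered by this part of the transcript, so the sketch simply reports them as unhandled.

```c
#include <assert.h>
#include <stdio.h>

#define REG_AX    (-1)     /* code for register AX on symbol_stack */
#define UNHANDLED (-100)   /* hypothetical: case not covered by this sketch */

typedef struct {
    int name_index;
    int offset;
    int hash_link;
} symbol_table_entry;

symbol_table_entry symbol_table[1000];

/*
 * left and right play the roles of $1 and $3. Writes the generated
 * instructions into out as text and returns the value to assign to $$.
 */
int gen_add(int left, int right, char *out, size_t size)
{
    assert(out != NULL && size > 0);
    if (left >= 0 && right >= 0) {
        /* both operands are identifiers: load the first, add the second */
        snprintf(out, size, "mov AX, [%d]\nadd AX, [%d]\n",
                 symbol_table[left].offset, symbol_table[right].offset);
        return REG_AX;
    }
    if (left == REG_AX && right >= 0) {
        /* left operand is already in AX: just add the identifier */
        snprintf(out, size, "add AX, [%d]\n", symbol_table[right].offset);
        return REG_AX;
    }
    out[0] = '\0';         /* remaining cases are not described here */
    return UNHANDLED;
}
```

In both handled cases the result is left in AX, which is why the action sets $$ to -1: the next reduction that uses this expression sees a register code on symbol_stack and not a symbol table index.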