Chih-Hung Wang. Chapter 2: Assembler (Full). Reference: Leland L. Beck, System Software: An Introduction to Systems Programming (3rd ed.), Addison-Wesley, 1997.


Role of Assembler
[Diagram labels: Source Program, Assembler, Object Code, Linker, Executable Code, Loader]

Chapter 2 Outline
- Basic Assembler Functions
- Machine-dependent Assembler Features
- Machine-independent Assembler Features
- Assembler Design Options

Introduction to Assemblers
- Fundamental functions
  - Translating mnemonic operation codes to their machine language equivalents
  - Assigning machine addresses to symbolic labels
- Machine dependency
  - Different machine instruction formats and codes

Example Program (Fig. 2.1)
- Purpose
  - Reads records from the input device (code F1)
  - Copies them to the output device (code 05)
  - At the end of the file, writes EOF on the output device, then executes RSUB to return to the operating system
- Program (see Fig. 2.1)

SIC Assembly Program (Fig. 2.1)
[Figure columns: line numbers (for reference), address labels, mnemonic opcodes, operands, comments]

SIC Assembly Program (Fig. 2.1)
[Figure annotations: index addressing; lines that indicate comments]

SIC Assembly Program (Fig. 2.1)

Example Program (Fig. 2.1)
- Data transfer (RD, WD)
  - A buffer is used to store each record
  - Buffering is necessary because the I/O devices run at different rates
  - The end of each record is marked with a null character (hexadecimal 00)
  - The end of the file is indicated by a zero-length record
- Subroutines (JSUB, RSUB)
  - RDREC, WRREC
  - Save the link register first, before any nested jump

Assembler Directives
- Pseudo-instructions
  - Not translated into machine instructions
  - Provide information to the assembler
- Basic assembler directives
  - START: specify the name and starting address of the program
  - END: indicate the end of the source program and (optionally) the first executable instruction in the program
  - BYTE: generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
  - WORD: generate a one-word integer constant
  - RESB: reserve the indicated number of bytes for a data area
  - RESW: reserve the indicated number of words for a data area

Object Program
- Header record
    Col. 1       H
    Col. 2~7     Program name
    Col. 8~13    Starting address (hex)
    Col. 14~19   Length of object program in bytes (hex)
- Text record
    Col. 1       T
    Col. 2~7     Starting address in this record (hex)
    Col. 8~9     Length of object code in this record in bytes (hex)
    Col. 10~69   Object code; (69 - 10 + 1)/6 = 10 instructions
- End record
    Col. 1       E
    Col. 2~7     Address of first executable instruction (hex) (END program_name)
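The record layouts above can be illustrated with a small formatter. This is only a sketch: the function names are made up for illustration, and the program length used in the usage note below is an arbitrary example value, not one taken from the slides.

```python
def header_record(name, start, length):
    # Col. 1 'H', Col. 2-7 name padded to 6 characters,
    # Col. 8-13 starting address, Col. 14-19 program length (both hex).
    return "H" + name.ljust(6)[:6] + f"{start:06X}" + f"{length:06X}"

def text_record(start, object_codes):
    # Col. 1 'T', Col. 2-7 start address, Col. 8-9 byte count,
    # Col. 10-69 object code (at most 60 hex digits = 30 bytes).
    body = "".join(object_codes)
    if len(body) > 60 or len(body) % 2:
        raise ValueError("text record holds at most 30 whole bytes")
    return "T" + f"{start:06X}" + f"{len(body) // 2:02X}" + body

def end_record(first_executable):
    # Col. 1 'E', Col. 2-7 address of the first executable instruction.
    return "E" + f"{first_executable:06X}"
```

For example, `header_record("COPY", 0x1000, 0x107A)` gives `HCOPY  00100000107A`, and `text_record(0x1000, ["141033"])` gives `T00100003141033`.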

Fig. 2.3 (Object Program) : Storage reserved by the loader

Assembler Tasks
- Translating a source program to object code requires the following functions:
  - Convert mnemonic operation codes to their machine language equivalents (e.g., translate STL to 14, line 10)
  - Convert symbolic operands to their equivalent machine addresses (e.g., translate RETADR to 1033, line 10)
  - Build machine instructions in the proper format
  - Convert the data constants specified in the source program into their internal machine representations (e.g., translate EOF to 454F46, line 80)
  - Write the object program and the assembly listing

Example of Instruction Assembly
- Forward reference: STCH BUFFER,X assembles to 549039
  - opcode (8 bits): (54)16
  - x (1 bit): 1
  - address (15 bits): (001)2 (039)16, i.e. BUFFER = 1039
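The bit layout in this example (8-bit opcode, 1-bit index flag, 15-bit address) can be checked with a few lines of Python. The helper name `encode_sic` is an assumption made for this sketch.

```python
def encode_sic(opcode, address, indexed=False):
    """Pack a standard SIC instruction: 8-bit opcode, x bit, 15-bit address."""
    if not 0 <= address < 0x8000:
        raise ValueError("address does not fit in 15 bits")
    word = (opcode << 16) | (int(indexed) << 15) | address
    return f"{word:06X}"

# STCH BUFFER,X with BUFFER at 1039
stch = encode_sic(0x54, 0x1039, indexed=True)
# STL RETADR with RETADR at 1033
stl = encode_sic(0x14, 0x1033)
```

Here `stch` is `549039` and `stl` is `141033`: setting the x bit turns the leading address digit 1 into 9.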

Forward Reference
- A reference to a label (e.g., RETADR) that is defined later in the program
- Solution: two passes
  - First pass: does little more than scan the source program for label definitions and assign addresses (such as those in the Loc column in Fig. 2.2)
  - Second pass: performs most of the actual instruction translation described previously

Difficulties: Forward Reference
- Forward reference: a reference to a label that is defined later in the program

  Loc    Label    Operator   Operand
  1000   FIRST    STL        RETADR
  1003   CLOOP    JSUB       RDREC
  ...    ...      ...        ...
  1012            J          CLOOP
  ...    ...      ...        ...
  1033   RETADR   RESW       1

Two Pass SIC Assembler
- Pass 1 (define symbols)
  - Assign addresses to all statements in the program
  - Save the addresses assigned to all labels for use in Pass 2
  - Process assembler directives, including those that affect address assignment, such as BYTE and RESW
- Pass 2 (assemble instructions and generate object program)
  - Assemble instructions (generate opcodes and look up addresses)
  - Generate data values defined by BYTE, WORD
  - Process the assembler directives not handled during Pass 1
  - Write the object program and the assembly listing
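The division of work between the two passes can be sketched as follows. This is a deliberately reduced illustration, not the textbook algorithm: it handles only a few standard SIC opcodes plus WORD, RESW, and RESB, represents each source line as a (label, opcode, operand) tuple, and the sample program is a made-up fragment.

```python
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "J": 0x3C}  # subset only

def pass1(source):
    """Assign an address to every statement and record labels in SYMTAB."""
    symtab, intermediate, locctr = {}, [], 0
    for label, opcode, operand in source:
        if opcode == "START":
            locctr = int(operand, 16)      # starting address is written in hex
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, opcode, operand))
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        else:
            raise ValueError(f"invalid operation code {opcode}")
    return symtab, intermediate

def pass2(symtab, intermediate):
    """Translate instructions now that every label has an address."""
    listing = []
    for loc, opcode, operand in intermediate:
        if opcode in OPTAB:
            if operand not in symtab:
                raise ValueError(f"undefined symbol {operand}")
            listing.append((loc, f"{OPTAB[opcode]:02X}{symtab[operand]:04X}"))
        elif opcode == "WORD":
            listing.append((loc, f"{int(operand) & 0xFFFFFF:06X}"))
    return listing

# Hypothetical fragment with two forward references (RETADR and RDREC).
SOURCE = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("CLOOP", "JSUB", "RDREC"),
    ("", "J", "CLOOP"),
    ("RETADR", "RESW", "1"),
    ("RDREC", "LDA", "RETADR"),
    ("", "END", "FIRST"),
]
symtab, intermediate = pass1(SOURCE)
listing = pass2(symtab, intermediate)
```

Because Pass 1 has already placed RETADR at 1009 and RDREC at 100C, Pass 2 can assemble the first two instructions even though both operands are defined later in the source.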

Two Pass SIC Assembler
- Each input line is read as LABEL, OPCODE, OPERAND
[Diagram: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; the passes use OPTAB and SYMTAB]

Assembler Data Structures
- Operation Code Table (OPTAB)
- Symbol Table (SYMTAB)
- Location Counter (LOCCTR)
[Diagram: Source -> Pass 1 -> Intermediate file -> Pass 2 -> Object Program; the passes use OPTAB, SYMTAB, and LOCCTR]

Location Counter (LOCCTR)
- A variable used to help in the assignment of addresses, i.e., LOCCTR gives the address of the associated label.
- LOCCTR is initialized to the beginning address specified in the START statement.
- After each source statement is processed during Pass 1, the length of the assembled instruction or of the data area to be generated is added to LOCCTR.

Operation Code Table (OPTAB)
- Contents:
  - Mnemonic operation codes (as the keys)
  - Machine language equivalents
  - Instruction format and length
  - Note: SIC/XE has instructions of different lengths
- During Pass 1:
  - Validate operation codes
  - Find the instruction length to increase LOCCTR
- During Pass 2:
  - Determine the instruction format
  - Translate the operation codes to their machine language equivalents
- Implementation: a static hash table (entries are not normally added to or deleted from it)
  - Hash table organization is particularly appropriate

SYMTAB
- Contents:
  - Label name
  - Label address
  - Flags (to indicate error conditions)
  - Data type or length
- During Pass 1:
  - Store each label name and its assigned address (from LOCCTR) in SYMTAB
- During Pass 2:
  - Symbols used as operands are looked up in SYMTAB
- Implementation:
  - A dynamic hash table for efficient insertion and retrieval
  - Should perform well with non-random keys (LOOP1, LOOP2)

  Symbol   Address
  COPY     1000
  FIRST    1000
  CLOOP    1003
  ENDFIL   1015
  EOF      1024
  THREE    102D
  ZERO     1030
  RETADR   1033
  LENGTH   1036
  BUFFER   1039
  RDREC

Fig. 2.2 (1) Program with Object Code

Fig. 2.2 (2) Program with Object Code

Fig. 2.2 (3) Program with Object Code

Figure 2.1 (Pseudo code Pass 1)

Figure 2.1 (Pseudo code Pass 1), continued

Figure 2.1 (Pseudo code Pass 2)

Figure 2.1 (Pseudo code Pass 2), continued

SIC/XE Assembly Program
[Figure annotations: indirect addressing, immediate addressing, extended format]

SIC/XE Assembly Program

SIC/XE Assembly Program

Benefits of SIC/XE Addressing Modes
- Register-to-register instructions
  - Shorter than register-to-memory instructions
  - No memory reference
- Immediate addressing mode
  - No memory reference: the operand is already present as part of the instruction
- Indirect addressing mode
  - Avoids the need for another instruction
- Relative addressing mode
  - Shorter than the extended instruction format
  - Easy program relocation

Considering Instruction Formats
- The START directive specifies a beginning program address of 0, which indicates a relocatable program.
- Register-to-register instructions: simply convert the mnemonic register names to their numeric equivalents
  - OPTAB: for opcodes
  - SYMTAB: preloaded with register names and their values

Examples
- COMPR A,S assembles to A004
- CLEAR X assembles to B410

Considering Addressing Modes
- PC-relative or base-relative addressing
  - Calculate the displacement
  - The displacement must be small enough to fit in the 12-bit field (-2048 to 2047 for PC-relative mode, 0 to 4095 for base-relative mode)
- Extended instruction format (4 bytes)
  - 20-bit field for direct addressing

How the Assembler Recognizes the Addressing Mode
- Extended format: +op m
- Indirect addressing: op @m
- Immediate addressing: op #c
- Index addressing: op m,X
- Relative addressing: op m
  - 1st choice: PC relative (arbitrarily chosen)
  - 2nd choice: base relative (if the displacement is invalid in PC-relative mode)
  - 3rd choice: error message (if the displacement is invalid in both relative modes)
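The three-step choice for relative addressing can be written down directly. The sketch below assumes the usual 12-bit ranges (signed for PC relative, unsigned for base relative); the function name and its return convention are made up for illustration.

```python
def choose_relative(target, pc, base=None):
    """Return (mode, 12-bit displacement) for a format 3 operand.

    pc is the address of the NEXT instruction; base is the value the
    assembler was told about via BASE, or None if no base is in effect.
    """
    disp = target - pc
    if -2048 <= disp <= 2047:              # 1st choice: PC relative
        return "pc", disp & 0xFFF          # two's complement in 12 bits
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:              # 2nd choice: base relative
            return "base", disp
    # 3rd choice: report an error (the programmer may need format 4)
    raise ValueError("displacement out of range for both relative modes")
```

For a backward jump from PC = 001A to CLOOP at 0006, the call returns `("pc", 0xFEC)`. For STCH BUFFER,X at 104E (PC = 1051, base = 0033, BUFFER = 0036) the PC-relative displacement is out of range, so the result is `("base", 3)`.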

SIC/XE Assembly with Object Code

SIC/XE Assembly with Object Code

SIC/XE Assembly with Object Code

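Immediate Addressing Mode
- LDA #3 assembles to 010003
  - opcode (00)16 with n = 0, i = 1 gives first byte (01)16
  - flags xbpe = (0)16, disp = (003)16
- +LDT #4096 assembles to 75101000
  - opcode (74)16 with n = 0, i = 1 gives first byte (75)16
  - flags xbpe = (1)16 (e = 1, extended format), address = (01000)16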

Extended Format
- CLOOP +JSUB RDREC assembles to 4B101036
  - opcode (48)16 with n = 1, i = 1 gives first byte (4B)16
  - flags xbpe = (1)16 (e = 1), address = (01036)16

PC Relative Addressing Mode

  0000   FIRST    STL    RETADR    17202D
  0003            LDB    #LENGTH   69202D
         :
  0030   RETADR   RESW   1

- opcode (14)16 with n = 1, i = 1 gives first byte (17)16
- flags xbpe = (2)16 (p = 1), disp = (02D)16
- PC is advanced after each instruction is fetched and before it is executed. That is, PC contains the address of the next instruction.
- disp = (0030)16 - (0003)16 = (002D)16

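PC Relative Addressing Mode

  0006   CLOOP    +JSUB   RDREC   4B101036
         :
  0017            J       CLOOP   3F2FEC
  001A   ENDFIL   LDA     EOF

- opcode (3C)16 with n = 1, i = 1 gives first byte (3F)16
- flags xbpe = (2)16 (p = 1), disp = (FEC)16
- disp = (006)16 - (01A)16 = (FEC)16, which is -20 in 12-bit two's complement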

Base Relative Addressing Mode

  0003            LDB    #LENGTH    69202D
                  BASE   LENGTH
         :
  0033   LENGTH   RESW   1
  0036   BUFFER   RESB   4096
         :
  104E            STCH   BUFFER,X   57C003

- opcode (54)16 with n = 1, i = 1 gives first byte (57)16
- flags xbpe = (C)16 (x = 1, b = 1), disp = (003)16
- disp = (0036)16 - (0033)16 = (0003)16
- PC relative is no longer applicable, because the displacement would not fit in 12 bits
- The BASE directive explicitly informs the assembler that the base register will contain the address of LENGTH (use NOBASE to invalidate)
- LDB loads the address of LENGTH into the base register during execution

Immediate + PC Relative Addressing Mode

  0003            LDB     #LENGTH   69202D
                  BASE    LENGTH
  0006   CLOOP    +JSUB   RDREC     4B101036
         :
  0033   LENGTH   RESW    1

- opcode (68)16 with n = 0, i = 1 gives first byte (69)16
- flags xbpe = (2)16 (p = 1), disp = (02D)16
- disp = (0033)16 - (0006)16 = (002D)16

Indirect + PC Relative Addressing Mode

  002A            J      @RETADR   3E2003
  002D   EOF      BYTE   C'EOF'    454F46
  0030   RETADR   RESW   1

- opcode (3C)16 with n = 1, i = 0 gives first byte (3E)16
- flags xbpe = (2)16 (p = 1), disp = (003)16
- disp = (0030)16 - (002D)16 = (0003)16
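The object codes in the addressing-mode examples all follow from the same bit packing: the upper 6 bits of the opcode, then the n and i flags, then x, b, p, e, then a 12-bit displacement (format 3) or 20-bit address (format 4). A minimal sketch, with function names chosen only for illustration:

```python
def encode_format3(opcode, disp, n=1, i=1, x=0, b=0, p=0):
    """Format 3: opcode(6) n i x b p e disp(12), with e = 0."""
    if not 0 <= disp <= 0xFFF:
        raise ValueError("disp must already be reduced to 12 bits")
    first = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | (b << 2) | (p << 1)
    return f"{first:02X}{flags:X}{disp:03X}"

def encode_format4(opcode, address, n=1, i=1, x=0):
    """Format 4: opcode(6) n i x b p e address(20), with b = p = 0, e = 1."""
    if not 0 <= address <= 0xFFFFF:
        raise ValueError("address does not fit in 20 bits")
    first = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | 1
    return f"{first:02X}{flags:X}{address:05X}"

stl = encode_format3(0x14, 0x02D, p=1)              # STL RETADR
j_back = encode_format3(0x3C, 0xFEC, p=1)           # J CLOOP
stch = encode_format3(0x54, 0x003, x=1, b=1)        # STCH BUFFER,X
lda_imm = encode_format3(0x00, 0x003, n=0, i=1)     # LDA #3
j_ind = encode_format3(0x3C, 0x003, n=1, i=0, p=1)  # J @RETADR
jsub = encode_format4(0x48, 0x01036)                # +JSUB RDREC
ldt = encode_format4(0x74, 0x01000, n=0, i=1)       # +LDT #4096
```

Each variable holds the object code shown on the corresponding slide, for instance `stl` is `17202D` and `jsub` is `4B101036`.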

Why Program Relocation
- To increase the productivity of the machine
- We want to load and run several programs at the same time (multiprogramming)
- We must be able to load programs into memory wherever there is room
- The actual starting address of the program is not known until load time

Absolute Program
- A program whose starting address is specified at assembly time
- In the example SIC assembly program:

    101B   LDA   THREE   00102D

  The operand address 102D is calculated from the starting address 1000.
- The address may be invalid if the program is loaded somewhere else.

Relocatable Program

What Needs to Be Relocated
- Needs to be modified:
  - The address portion of those instructions that use absolute (direct) addresses
- Need not be modified:
  - Register-to-register instructions (no memory references)
  - PC-relative or base-relative addressing (the relative displacement remains the same regardless of the starting address)

How to Relocate Addresses
- For the assembler
  - For an address label, its address is assigned relative to the start of the program (that is why START 0 is used)
  - Produce a modification record that stores the starting location and the length of the address field to be modified
- For the loader
  - For each modification record, add the actual beginning address of the program to the address field at load time

Format of Modification Record
- One modification record for each address to be modified
- The length is stored in half-bytes (20 bits = 5 half-bytes)
- The starting location is the location of the byte containing the leftmost bits of the address field to be modified
- If the field contains an odd number of half-bytes, it begins in the middle of the first byte at the starting location
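What the loader does with one modification record can be sketched as follows. The program image is modelled as a Python bytearray indexed by relative address, and the load address 5000 in the usage note is an arbitrary example; the function name is hypothetical.

```python
def apply_modification(memory, location, half_bytes, load_address):
    """Add load_address to an address field of half_bytes hex digits.

    The field ends on a byte boundary; with an odd number of half-bytes
    it starts in the middle of the byte at `location`, so the upper
    half of that byte must be left untouched.
    """
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[location:location + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1          # bits that belong to the field
    value = ((field & mask) + load_address) & mask
    field = (field & ~mask) | value             # keep bits outside the field
    memory[location:location + nbytes] = field.to_bytes(nbytes, "big")

# +JSUB RDREC (4B101036): the address field starts at the second byte
# of the instruction and is 5 half-bytes long.
image = bytearray.fromhex("4B101036")
apply_modification(image, 1, 5, 0x5000)
```

After the call the image holds 4B106036: the 20-bit address 01036 became 06036, while the flag nibble (1) in front of it is preserved.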

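Relocatable Object Program
[Figure annotations: modification records are generated for line 15 (+JSUB RDREC), line 35 (+JSUB WRREC), and line 65 (+JSUB WRREC); each address field is 5 half-bytes long]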

Machine Independent Assembler Features
- These features are not closely related to the machine architecture.
- They are more related to:
  - Programmer convenience
  - Software environment
- Common examples:
  - Literals
  - Symbol-defining statements
  - Expressions
  - Program blocks
  - Control sections
- Assembler directives are widely used to support these features

Literals
- Using a literal is equivalent to:
  - Defining a constant explicitly and assigning an address label to it
  - Using the label as the instruction operand
- Why use literals:
  - To avoid defining the constant somewhere else and making up a label for it
  - Instead, the value of a constant operand is written as part of the instruction
- How to use literals:
  - A literal is identified by the prefix =, followed by a specification of the literal value

Original Program

Using Literal

Object Program Using Literal
[Figure annotation: the same as before]

Original Program

Using Literal

Object Program Using Literal
[Figure annotation: the same as before]

Literal vs. Immediate Addressing
- Same:
  - The operand field contains a constant value
- Difference:
  - Immediate addressing: the assembler puts the constant value in the machine instruction itself
  - Literal: the assembler stores the constant value elsewhere and puts its address in the machine instruction

Literal Pool
- All of the literal operands are gathered together into one or more literal pools.
- A literal pool is placed:
  - At the location where an LTORG directive is encountered
    - This keeps the literal operand close to the instruction that uses it
  - At the end of the object program, generated immediately following the END statement

Duplicate Literals
- Duplicate literals:
  - The same literal used more than once in the program
  - Only one copy of the specified value needs to be stored
  - For example, =X'05' in the example program
- How to recognize duplicate literals:
  - Compare the character strings defining them
    - Easier to implement, but has a potential problem (see next slide)
    - E.g., =X'05'
  - Compare the generated data values
    - Better, but increases the complexity of the assembler
    - E.g., =C'EOF' and =X'454F46'

Problem of Duplicate-Literal Recognition using Character Strings
- Some literals have the same name but different values
- For example, a literal whose value depends on its location in the program
  - The value of the location counter is denoted by *

      BASE   *
      LDB    =*

  - The literal =*, used repeatedly in the program, has the same name but different values
  - Each occurrence of such a literal has to be stored separately in the literal pool

Implementation of Literal
- Data structure: a literal table, LITTAB
  - Literal name
  - Operand value and length
  - Address
- LITTAB is often organized as a hash table, using the literal name or value as the key

Implementation of Literal
- Pass 1
  - As each literal operand is recognized
    - Search LITTAB for the specified literal name or value
    - If the literal is already present, no action is needed
    - Otherwise, the literal is added to LITTAB (store the name, value, and length, but not the address)
  - When LTORG or END is encountered
    - Scan LITTAB
    - For each literal with an empty address field, assign the address and update LOCCTR accordingly

Implementation of Literal
- Pass 2
  - As each literal operand is recognized
    - Search LITTAB for the specified literal name or value
    - If the literal is found, use the associated address as the operand of the instruction
    - Otherwise, report an error (this should not happen)
  - When LTORG or END is encountered
    - Insert the data values of the literals into the object program
    - A modification record is generated if necessary
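The LITTAB handling described on these slides can be sketched with a dictionary keyed by the generated data value, so that =C'EOF' and =X'454F46' share one entry. The class and method names are assumptions, only C and X literals are handled, and location-dependent literals such as =* are left out.

```python
def literal_value(literal):
    """Convert =C'...' or =X'...' into the bytes it generates."""
    kind, text = literal[1], literal[3:-1]
    if kind == "C":
        return text.encode("ascii")
    if kind == "X":
        return bytes.fromhex(text)
    raise ValueError(f"unsupported literal {literal}")

class LitTab:
    def __init__(self):
        self.entries = {}                    # value -> {"name", "address"}

    def reference(self, literal):
        """Pass 1: add the literal unless an equal value is already present."""
        value = literal_value(literal)
        self.entries.setdefault(value, {"name": literal, "address": None})

    def place_pool(self, locctr):
        """At LTORG or END: give every unplaced literal an address."""
        for value, entry in self.entries.items():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += len(value)
        return locctr                        # updated LOCCTR

    def address_of(self, literal):
        """Pass 2: look up the operand address for a literal."""
        return self.entries[literal_value(literal)]["address"]

littab = LitTab()
for lit in ("=C'EOF'", "=X'05'", "=X'454F46'", "=X'05'"):
    littab.reference(lit)
end_of_pool = littab.place_pool(0x002D)      # pool address chosen as an example
```

With the pool placed at 002D, the three-byte EOF literal gets address 002D, the one-byte =X'05' gets 0030, and LOCCTR continues at 0031. Comparing values rather than names is the more thorough of the two duplicate-detection options listed earlier.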

Symbol-Defining Statements
- How to define symbols and their values
  - Address label
    - The label is the symbol name and the assigned address is its value

        FIRST   STL   RETADR

  - Assembler directive EQU

        symbol   EQU   value

    - This statement enters the symbol into SYMTAB and assigns to it the value specified
    - The value can be a constant or an expression
  - Assembler directive ORG

        ORG   value

Use of EQU
- To improve program readability, avoid magic numbers, and make constant values easier to find and change
  - Instead of:

        +LDT   #4096

  - Write:

        MAXLEN   EQU    4096
                 +LDT   #MAXLEN

- To define mnemonic names for registers
  - A EQU 0
  - X EQU 1
  - BASE EQU R1
  - COUNT EQU R2

Use of ORG
- Indirect value assignment: ORG value
- When ORG is encountered, the assembler resets its LOCCTR to the specified value
- ORG affects the values of all labels defined until the next ORG
- If the previous value of LOCCTR is automatically remembered, we can return to the normal use of LOCCTR by simply writing ORG with no value

Example of Using ORG
- Data structure (one symbol table entry)
  - SYMBOL: 6 bytes
  - VALUE: 3 bytes (one word)
  - FLAGS: 2 bytes
- Goal: refer to every field of each entry

Not Using ORG
- We can fetch the VALUE field with LDA VALUE,X
- X = 0, 11, 22, ... for each entry
[Figure annotations: the fields are defined as offsets from STAB, which is less readable and meaningful]

Using ORG
[Figure annotations: set the LOCCTR to STAB; the size of each field is more meaningful; restore the LOCCTR to its previous value (or use ORG alone)]

Forward-Reference Problem
- Forward references are not allowed for EQU and ORG.
- That is, all terms in the value field must have been defined previously in the program.
- The reason is that all symbols must have been defined during Pass 1 in a two-pass assembler.
[Figure: one allowed example and one not-allowed example]

Forward-Reference Problem
[Figure: a not-allowed example]

Expressions
- A single term used as an instruction operand can be replaced by an expression.

    STAB   RESB   1100
    STAB   RESB   11*100
    STAB   RESB   (6+3+2)*MAXENTRIES

- The assembler has to evaluate the expression to produce a single operand address or value.
- Expressions consist of
  - Operators
    - +, -, *, / (division is usually defined to produce an integer result)
  - Individual terms
    - Constants
    - User-defined symbols
    - Special terms, e.g., *, the current value of LOCCTR

Relocation Problem in Expressions
- Values of terms can be
  - Absolute (independent of program location)
    - Constants
  - Relative (to the beginning of the program)
    - Address labels
    - * (value of LOCCTR)
- Expressions can be
  - Absolute
    - Only absolute terms, or
    - Relative terms in pairs, with opposite signs within each pair
  - Relative
    - All the relative terms except one can be paired as described under "absolute". The remaining unpaired relative term must have a positive sign.
- No relative term may enter into a multiplication or division operation
- Expressions that meet the conditions of neither "absolute" nor "relative" should be flagged as errors.

Absolute Expression
- A relative term or expression implicitly represents (S + r)
  - S: the starting address of the program
  - r: the value of the term or expression relative to S
- For example
  - BUFFER: S + r1
  - BUFEND: S + r2
- The expression BUFEND-BUFFER is absolute.
  - MAXLEN = (S + r2) - (S + r1) = r2 - r1 (no S here)
  - MAXLEN is the length of the buffer area
- Illegal expressions: BUFEND+BUFFER, 100-BUFFER, 3*BUFFER
[Figure annotation: values associated with symbols]

Absolute or Relative
- To determine the type of an expression, we must keep track of the types of all symbols defined in the program.
- We need a flag in SYMTAB for this purpose.
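The pairing rule can be checked by summing the signs of the relative terms: a net count of 0 means the expression is absolute, a net count of +1 means it is relative, and anything else is an error. The sketch below covers only sums and differences of terms; the representation of an expression as (sign, term) pairs and the function name are assumptions.

```python
def classify(terms, symtab):
    """terms: list of (sign, term); sign is +1 or -1, term is an int
    constant or a symbol name. symtab maps name -> (value, is_relative)."""
    value, net_relative = 0, 0
    for sign, term in terms:
        if isinstance(term, int):
            value += sign * term             # constants are absolute
        else:
            symbol_value, is_relative = symtab[term]
            value += sign * symbol_value
            if is_relative:
                net_relative += sign
    if net_relative == 0:
        return "absolute", value
    if net_relative == 1:
        return "relative", value
    raise ValueError("expression is neither absolute nor relative")

# BUFFER at 0036 with a 4096-byte buffer puts BUFEND at 1036.
SYMTAB = {"BUFFER": (0x0036, True), "BUFEND": (0x1036, True)}
maxlen = classify([(+1, "BUFEND"), (-1, "BUFFER")], SYMTAB)
```

`maxlen` is `("absolute", 4096)`, while BUFEND+BUFFER and 100-BUFFER both raise an error because their net relative counts are 2 and -1.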

Program Blocks
- Collect pieces of code or data of the same kind that are scattered through the source program into a single block in the generated object program.
  - For example: a code block, an initialized data block, and an uninitialized data block (like the code and data segments on a Pentium PC).
- Advantages:
  - Because pieces of code are now closer to each other, format 4 can be replaced with format 3, saving space and execution time.
  - Code sharing and data protection are easier to achieve.
  - In the source program, the programmer can still put related code and data near each other for better readability.

Advantages of Using Program Blocks
- Program blocks satisfy two conflicting goals:
  - Separate the program into blocks in a particular order
    - The large buffer area is moved to the end of the object program
    - The need for extended format instructions or base-relative mode may be reduced (lines 15, 35, and 65)
    - Placement of the literal pool is easier: simply put it before the large data area, in the CDATA block (line 253)
  - Keep data areas scattered through the source
    - Program readability is better if data areas are placed in the source program close to the statements that reference them

Program Block Example
[Figure annotation: default block]

Program Block Example, continued
[Figure annotation: use the default block]

Program Block Example, continued
[Figure annotation: use the default block]
- At the beginning of the program, statements are assumed to be part of the unnamed (default) block.
- The default (unnamed) block contains the executable instructions.
- The CDATA block contains all data areas that are a few words or less in length.
- The CBLKS block contains all data areas that consist of large blocks of memory.

Job of Assembler
- A program block may contain several separate segments of the source program.
- The assembler (logically) rearranges these segments to gather together the pieces of each block.
- The blocks are then assigned addresses in the object program, appearing in the same order in which they were first begun in the source program.
- The result is the same as if the programmer had physically rearranged the source statements to group together all the source lines belonging to each block.

Assembler Processing (1)
- Pass 1:
  - Maintain a separate location counter for each program block.
  - The location counter for a block is initialized to 0 when the block is first begun.
  - The current value of this location counter is saved when switching to another block, and the saved value is restored when resuming a previous block.
  - Thus, during Pass 1, each label is assigned an address relative to the beginning of the block that contains it.
  - After Pass 1, the latest value of the location counter for each block indicates the length of that block.
  - The assembler can then assign to each block a starting address in the object program.

Assembler Processing (2)
- Pass 2
  - When generating object code, the assembler needs the address of each symbol relative to the start of the object program (not the start of an individual program block).
  - This is easily done by adding the location of the symbol (relative to the start of its block) to the assigned starting address of that block.
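The per-block location counters and the final address computation can be sketched as below. Statements are reduced to (block, label, length) tuples, the default block is named by the empty string, and the segment lengths are invented so that CDATA starts at 0066; none of this is the textbook's own data layout.

```python
def layout_blocks(statements):
    """Return (block start addresses, absolute symbol addresses)."""
    counters = {}   # block -> location counter; dicts keep first-use order
    symbols = {}    # label -> (block, address relative to block start)
    for block, label, length in statements:
        locctr = counters.setdefault(block, 0)   # starts at 0 when first begun
        if label:
            symbols[label] = (block, locctr)
        counters[block] = locctr + length        # saved for the next resume

    # After Pass 1 each counter holds the block length; lay blocks end to end.
    starts, address = {}, 0
    for block, length in counters.items():
        starts[block] = address
        address += length

    absolute = {name: starts[block] + offset
                for name, (block, offset) in symbols.items()}
    return starts, absolute

STATEMENTS = [
    ("", "FIRST", 0x27),        # first segment of the default block
    ("CDATA", "RETADR", 3),
    ("CDATA", "LENGTH", 3),
    ("CBLKS", "BUFFER", 0x1000),
    ("", "RDREC", 0x3F),        # default block resumes after the data
]
starts, absolute = layout_blocks(STATEMENTS)
```

LENGTH has offset 0003 inside CDATA, and CDATA starts at 0066, so its address in the object program is 0069. RDREC follows directly after the first code segment at 0027, even though data statements separate the two segments in the source.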

Figure 2.12 (a)
- There is no block number for MAXLEN, because MAXLEN is an absolute symbol.

Symbol Table After Pass 1

Object Code in Pass 2
- LDA LENGTH
  - SYMTAB shows that LENGTH has a relative address of 0003 within program block 1 (CDATA).
  - The starting address for CDATA is 0066. Thus the desired target address is 0003 + 0066 = 0069.
  - Because this instruction is assembled using program-counter-relative addressing, and PC will be 0009 when the instruction is executed (the starting address for the default block is 0), the displacement is 0069 - 0009 = 0060.

Advantages
- Because the large buffer area is moved to the end of the object program, we no longer need format 4 instructions on lines 15, 35, and 65.
- For the same reason, use of the base register is no longer necessary; the LDB and BASE statements have been deleted.
- Code sharing and data protection can be more easily achieved.

Object Code (Figure 2.13)
- Although the assembler internally rearranges code and data to form blocks, the generated code and data need not be physically rearranged. The assembler can simply write the object code as it is generated during Pass 2 and insert the proper load address in each text record.

Leave the Job to Loader
[Figure annotation: no code needs to be generated for these two blocks; we only need to reserve space for them]

Control Section
- A control section is a part of the program that maintains its identity after assembly.
- Each control section can be loaded and relocated independently of the others (the main advantage).
- Different control sections are often used for subroutines or other logical subdivisions of a program.
- The programmer can assemble, load, and manipulate each of these control sections separately.

Program Linking
- Instructions in one control section may need to refer to instructions or data located in another control section (like external variables in the C language).
- Thus, program (actually, control section) linking is necessary.
- Because control sections are independently loaded and relocated, the assembler cannot know a symbol's address at assembly time. This job must be deferred to the loader.
- References between control sections are called "external references".
- The assembler generates information for each external reference that allows the loader to perform the required linking.

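Control Section Example
[Figure annotation: default control section]

Control Section Example, continued
[Figure annotation: a new control section]

Control Section Example, continued
[Figure annotation: a new control section]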

External References
- Symbols that are defined in one control section cannot be used directly by another control section.
- They must be identified as external references for the loader to handle.
- Two assembler directives are used:
  - EXTDEF (external definition)
    - Identifies the symbols that are defined in this control section and can be used in other control sections
    - Control section names are automatically considered external symbols
  - EXTREF (external reference)
    - Identifies the symbols that are used in this control section but defined in other control sections

Code Involving External Reference (1)
- CLOOP +JSUB RDREC assembles to 4B100000
  - The operand (RDREC) is named in the EXTREF statement, so this is an external reference.
  - Because the assembler has no idea where the control section containing RDREC will be loaded, it cannot assemble the address for this instruction.
  - Therefore, it inserts an address of zero.
  - Because RDREC has no predictable relationship to anything in this control section, relative addressing cannot be used.
  - Instead, an extended format instruction must be used.
  - This is true of any instruction whose operand involves an external reference.

Code Involving External Reference (2)
- STCH BUFFER,X
  - This instruction makes an external reference to BUFFER.
  - The instruction is therefore assembled using extended format with an address of zero.
  - The x bit is set to 1 to indicate indexed addressing.

Code Involving External Reference (3)
- MAXLEN WORD BUFEND-BUFFER
  - The value of the data word to be generated is specified by an expression involving two external references.
  - The assembler therefore stores this value as zero.
  - When the program is loaded, the loader adds the address of BUFEND to this data area and subtracts the address of BUFFER from it, which yields the desired value.
  - Notice the difference between lines 190 and 107. On line 107, EQU can be used because BUFEND and BUFFER are defined in the same control section, so their difference can be calculated immediately by the assembler.
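The loader's side of MAXLEN WORD BUFEND-BUFFER can be sketched as two signed modifications applied to a word that the assembler left as zero. The external symbol table and its load addresses are invented for this illustration, and the function name is hypothetical.

```python
def apply_external(memory, location, half_bytes, sign, symbol, estab):
    """Add (sign = +1) or subtract (sign = -1) an external symbol's
    address to/from a field of half_bytes hex digits at `location`."""
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[location:location + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1
    value = ((field & mask) + sign * estab[symbol]) & mask
    field = (field & ~mask) | value          # keep bits outside the field
    memory[location:location + nbytes] = field.to_bytes(nbytes, "big")

# Hypothetical addresses assigned by the loader to the external symbols.
ESTAB = {"BUFFER": 0x4033, "BUFEND": 0x5033}

maxlen_word = bytearray(3)                   # assembled as 000000
apply_external(maxlen_word, 0, 6, +1, "BUFEND", ESTAB)
apply_external(maxlen_word, 0, 6, -1, "BUFFER", ESTAB)
```

After both modifications the word contains 001000, the buffer length, regardless of where the section that defines BUFFER and BUFEND was loaded.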

Figure 2.16 Program Object Code (1)

Figure 2.16 Program Object Code (2)

Figure 2.16 Program Object Code (3)

External Reference Processing
- The assembler must remember (via entries in SYMTAB) in which control section each symbol is defined.
- Any attempt to refer to a symbol in another control section must be flagged as an error unless the symbol is identified (via EXTREF) as an external reference.
- The assembler must allow the same symbol to be used in different control sections.
  - E.g., the conflicting definitions of MAXLEN on lines 107 and 190 should be allowed.

Two New Record Types (1)
- We need two new record types in the object program and a change to the previously defined Modification record type.
- Define record
  - Gives information about external symbols that are defined in this control section
- Refer record
  - Lists symbols that are used as external references by this control section

Two New Record Types (2)

Revised Modification Record

Object Program (Figure 2.17)

Program Relocation
- The revised Modification record can still be used for program relocation.
[Figure annotation: program name]

More Restrictions on Expressions
- Previously we required that all of the relative terms in an expression be paired to make the expression absolute.
- With control sections, this requirement is not enough.
- We must also require that both terms in each pair be relative within the same control section.
  - BUFEND-BUFFER is allowed, because both symbols are defined in the same control section.
  - RDREC-COPY is not allowed, because its value is unpredictable.
- How to enforce this restriction
  - When an expression involves external references, the assembler cannot determine whether or not the expression is legal.
  - The assembler evaluates all of the terms it can, combines these to form an initial expression value, and generates Modification records.
  - The loader checks the expression for errors and finishes the evaluation.

Assembler Design Options - One and Multi-Pass Assembler  So far, we have presented the design and implementation of a two-pass assembler.  Here, we will present the design and implementation of  One-pass assembler  If avoiding a second pass over the source program is necessary or desirable.  Multi-pass assembler  Allow forward references during symbol definition. 116

One-Pass Assembler  The main problem is forward references.  Eliminating forward references to data items is easy.  Simply ask the programmer to define variables before using them.  However, eliminating forward references to instructions is not easy.  Sometimes a program needs a forward jump.  Requiring a program to use only backward jumps is too restrictive. 117

Program Example 118


All variables are defined before they are used. 120

Two Types of One-pass Assembler  There are two types of one-pass assembler:  Produce object code directly in memory for immediate execution  No loader is needed  Load-and-go for program development and testing  Good for a computing center where most students reassemble their programs each time.  Saves the time of scanning the source code again  Produce the usual kind of object program for later execution 121

Internal Implementation  The assembler generates object code instructions as it scans the source program.  If an instruction operand is a symbol that has not yet been defined, the operand address is omitted when the instruction is assembled.  The symbol used as an operand is entered into the symbol table.  This entry is flagged to indicate that the symbol is not yet defined. 122

Internal Implementation (cont’d)  The address of the operand field of the instruction that refers to the undefined symbol is added to a list of forward references associated with the symbol table entry.  When the definition of the symbol is encountered, the forward reference list for that symbol is scanned, and the proper address is inserted into any instruction previously generated. 123

Processing Example After scanning line

Processing Example (cont’d) After scanning line

Processing Example (cont’d)  Between scanning lines 40 and 160:  On line 45, when the symbol ENDFIL is defined, the assembler places its value in the SYMTAB entry.  The assembler then inserts this value into the instruction operand field (at address 201C).  From this point on, any references to ENDFIL are not forward references and are not entered into a list.  At the end of the processing of the program, any SYMTAB entries that are still marked with * indicate undefined symbols.  These should be flagged by the assembler as errors. 126
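The forward reference list of a load-and-go one-pass assembler can be sketched as follows. This is a simplified model: memory is a plain dictionary of operand fields, the class name OnePassSymtab is hypothetical, and the addresses other than 201C are chosen for illustration.

```python
class OnePassSymtab:
    """Symbol table with per-symbol forward reference lists."""

    def __init__(self, memory):
        self.memory = memory  # operand address -> operand value
        self.value = {}       # defined symbol -> address
        self.pending = {}     # undefined symbol -> operand addresses to patch

    def reference(self, symbol, operand_addr):
        if symbol in self.value:
            self.memory[operand_addr] = self.value[symbol]
        else:
            self.memory[operand_addr] = 0  # operand omitted for now
            self.pending.setdefault(symbol, []).append(operand_addr)

    def define(self, symbol, address):
        self.value[symbol] = address
        # Patch every instruction that referred to the symbol earlier.
        for operand_addr in self.pending.pop(symbol, []):
            self.memory[operand_addr] = address

    def undefined(self):
        """Symbols still marked undefined at the end of assembly (errors)."""
        return sorted(self.pending)

memory = {}
symtab = OnePassSymtab(memory)
symtab.reference("RDREC", 0x2013)   # forward reference, never defined here
symtab.reference("ENDFIL", 0x201C)  # forward reference
symtab.define("ENDFIL", 0x2024)     # hypothetical address for ENDFIL
symtab.reference("ENDFIL", 0x2030)  # no longer a forward reference
```

After these calls the operand field at 201C holds the address of ENDFIL, and undefined() still reports RDREC, which the assembler would flag as an error if no definition followed.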

Multi-Pass Assembler  With a two-pass assembler, the following symbol definitions cannot be allowed:
    ALPHA  EQU   BETA
    BETA   EQU   DELTA
    DELTA  RESW  1
 This is because ALPHA and BETA cannot be defined in pass 1. If we allow multi-pass processing, DELTA is defined in pass 1, BETA in pass 2, and ALPHA in pass 3, so the above definitions can be allowed.  This is the motivation for using a multi-pass assembler. 127
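The pass-by-pass behaviour described above can be reproduced with a naive loop that rescans the statements until nothing new becomes defined. This is a sketch for illustration only: the function name, the statement tuples, and the starting location 0x1000 are assumptions.

```python
def resolve_by_passes(statements, start=0x1000, max_passes=10):
    """Rescan (label, opcode, operand) statements until no symbol changes.

    Returns the symbol table and the pass in which each symbol was defined.
    """
    symtab, defined_in = {}, {}
    for pass_no in range(1, max_passes + 1):
        locctr = start
        progress = False
        for label, opcode, operand in statements:
            if opcode == "RESW":
                value = locctr
                locctr += 3 * operand  # each SIC word is 3 bytes
            else:  # EQU with a single symbol operand
                value = symtab.get(operand)
            if value is not None and label not in symtab:
                symtab[label] = value
                defined_in[label] = pass_no
                progress = True
        if not progress:
            break
    return symtab, defined_in

program = [("ALPHA", "EQU", "BETA"),
           ("BETA", "EQU", "DELTA"),
           ("DELTA", "RESW", 1)]
symtab, defined_in = resolve_by_passes(program)
print(defined_in)  # {'DELTA': 1, 'BETA': 2, 'ALPHA': 3}
```

Rescanning the whole program on every pass is wasteful, which is why a practical multi-pass assembler reprocesses only the definitions that involve forward references.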

Multi-Pass Assembler(cont’d)  It is unnecessary for a multi-pass assembler to make more than two passes over the entire program.  Instead, only the parts of the program involving forward references need to be processed in multiple passes.  The method presented here can be used to process any kind of forward references. 128

Multi-Pass Assembler Implementation  Use a symbol table to store symbols that are not yet fully defined.  For an undefined symbol, in its entry,  We store the names and the number of undefined symbols that contribute to the calculation of its value.  We also keep a list of symbols whose values depend on the defined value of this symbol.  When a symbol becomes defined, we use its value to reevaluate the values of all of the symbols that are kept in this list.  The above step is performed recursively. 129
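A minimal sketch of this scheme is shown below. Each pending entry keeps the set of symbols it still needs (the size of the set is the count) and each symbol keeps a list of dependents. The class name, the use of Python callables for expressions, and the five sample statements (modeled on the textbook's forward reference example, with BUFFER at location 1034) are assumptions made for illustration.

```python
class MultiPassSymtab:
    """Symbol table that resolves forward references in EQU definitions."""

    def __init__(self):
        self.value = {}       # symbol -> known value
        self.pending = {}     # symbol -> (compute, set of unknown symbols)
        self.dependents = {}  # symbol -> symbols waiting for its value

    def define(self, symbol, needs, compute):
        """compute(values) evaluates the expression once all needs are known."""
        unknown = {name for name in needs if name not in self.value}
        if unknown:
            self.pending[symbol] = (compute, unknown)
            for name in unknown:
                self.dependents.setdefault(name, []).append(symbol)
        else:
            self._set(symbol, compute(self.value))

    def _set(self, symbol, value):
        self.value[symbol] = value
        # Recursively reevaluate every symbol that was waiting on this one.
        for waiter in self.dependents.pop(symbol, []):
            compute, unknown = self.pending[waiter]
            unknown.discard(symbol)
            if not unknown:
                del self.pending[waiter]
                self._set(waiter, compute(self.value))

st = MultiPassSymtab()
st.define("HALFSZ", ["MAXLEN"], lambda v: v["MAXLEN"] // 2)
st.define("MAXLEN", ["BUFEND", "BUFFER"], lambda v: v["BUFEND"] - v["BUFFER"])
st.define("PREVBT", ["BUFFER"], lambda v: v["BUFFER"] - 1)
st.define("BUFFER", [], lambda v: 0x1034)           # BUFFER RESB 4096
st.define("BUFEND", [], lambda v: 0x1034 + 0x1000)  # BUFEND EQU *
```

Defining BUFFER resolves PREVBT at once but leaves MAXLEN waiting on BUFEND; defining BUFEND then resolves MAXLEN and, recursively, HALFSZ.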

Forward Reference Example  LOC: 1034 130

Forward Reference Processing  After first line (figure annotations: not defined yet; one symbol is still unknown; defined) 131

After second line (figure annotations: now defined; two symbols are still unknown) 132

After third line 133

After fourth line (figure annotation: values start to become known) 134

After fifth line  All symbols are defined and their values are now known. 135