Assemblers CSCI/CMPE 3334 David Egle
Fundamental functions
  Take a program written in assembly language and translate it to machine language (binary)
  What does this entail?
    Translate mnemonics to machine code
    Convert symbolic operands to addresses
    Convert constants into internal representation
    Process assembler directives
    Build machine language instructions
    Write the object and listing files
Machine dependencies
  The design and many features of an assembler depend on the underlying hardware
    instruction set
    instruction format
    registers
    memory management
  Some features do not depend on hardware
    Programmer conveniences
    Program format
    etc.
SIC assembly language
  General format
    label – must begin in first column
    operation – may be mnemonic or directive
    operand – symbol, value, or hexadecimal address
    comment
  This is typical of many assembly languages. The specific formats may vary with respect to punctuation, etc.
SIC sample program (1/3)
COPY    START   1000        COPY FILE FROM INPUT TO OUTPUT
FIRST   STL     RETADR      SAVE RETURN ADDRESS
CLOOP   JSUB    RDREC       READ INPUT RECORD
        LDA     LENGTH      TEST FOR EOF (LENGTH = 0)
        COMP    ZERO
        JEQ     ENDFIL      EXIT IF EOF FOUND
        JSUB    WRREC       WRITE OUTPUT RECORD
        J       CLOOP       LOOP
ENDFIL  LDA     EOF         INSERT END OF FILE MARKER
        STA     BUFFER
        LDA     THREE       SET LENGTH = 3
        STA     LENGTH
        JSUB    WRREC       WRITE EOF
        LDL     RETADR      GET RETURN ADDRESS
        RSUB                RETURN TO CALLER
EOF     BYTE    C'EOF'
THREE   WORD    3
ZERO    WORD    0
RETADR  RESW    1
LENGTH  RESW    1           LENGTH OF RECORD
BUFFER  RESB    4096        4096-BYTE BUFFER AREA
SIC sample program (2/3)
.
.       SUBROUTINE TO READ RECORD INTO BUFFER
RDREC   LDX     ZERO        CLEAR LOOP COUNTER
        LDA     ZERO        CLEAR A TO ZERO
RLOOP   TD      INPUT       TEST INPUT DEVICE
        JEQ     RLOOP       LOOP UNTIL READY
        RD      INPUT       READ CHARACTER INTO REGISTER A
        COMP    ZERO        TEST FOR END OF RECORD (X'00')
        JEQ     EXIT        EXIT LOOP IF EOR
        STCH    BUFFER,X    STORE CHARACTER IN BUFFER
        TIX     MAXLEN      LOOP UNLESS MAX LENGTH
        JLT     RLOOP         HAS BEEN REACHED
EXIT    STX     LENGTH      SAVE RECORD LENGTH
        RSUB                RETURN TO CALLER
INPUT   BYTE    X'F1'       CODE FOR INPUT DEVICE
MAXLEN  WORD    4096
SIC sample program (3/3)
.
.       SUBROUTINE TO WRITE RECORD FROM BUFFER
WRREC   LDX     ZERO        CLEAR LOOP COUNTER
WLOOP   TD      OUTPUT      TEST OUTPUT DEVICE
        JEQ     WLOOP       LOOP UNTIL READY
        LDCH    BUFFER,X    GET CHARACTER FROM BUFFER
        WD      OUTPUT      WRITE CHARACTER
        TIX     LENGTH      LOOP UNTIL ALL CHARACTERS
        JLT     WLOOP         HAVE BEEN WRITTEN
        RSUB                RETURN TO CALLER
OUTPUT  BYTE    X'05'       CODE FOR OUTPUT DEVICE
        END     FIRST
SIC assembly language (cont’d)
  Some specifics regarding SIC
    Directives (pseudo-ops): START, END, BYTE, WORD, RESB, RESW
    Indexing uses “,X” format, as in STCH BUFFER,X
    Comment lines begin with a period
End result – the object file
  For ease in displaying the file, it is stored in ASCII with hexadecimal values for the code
  NOTE: This is not the usual way of storing the object file
HCOPY__00100000107A
T0010001E1410334820390010362810303010154820613C100300102A0C103900102D
T00101E150C10364820610810334C0000454F46000003000000
T0020391E041030001030E0205D30203FD8205D2810303020575490392C205E38203F
T0020571C1010364C0000F1001000041030E02079302064509039DC20792C1036
T002073073820644C000005
E001000
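The records above can be pulled apart with a few string slices. The following is a minimal Python sketch, assuming the column layout used in the course text for SIC object programs: record type in column 1, six hex digits per address or length field, and two hex digits for a text record's byte count. The function name parse_record and the dictionary keys are illustrative only, and the underscores are treated as the padding shown on the slide.

```python
def parse_record(line):
    """Split one object-file record (H, T, or E) into its fields."""
    line = line.strip()
    kind = line[:1]
    if kind == "H":
        return {
            "type": "header",
            "name": line[1:7].rstrip("_ "),      # program name, padded to 6 columns
            "start": int(line[7:13], 16),        # starting address (hex)
            "length": int(line[13:19], 16),      # program length in bytes (hex)
        }
    if kind == "T":
        nbytes = int(line[7:9], 16)              # byte count of the object code
        code = line[9:]
        if len(code) != 2 * nbytes:
            raise ValueError("text record length does not match its byte count")
        return {"type": "text", "start": int(line[1:7], 16),
                "length": nbytes, "code": code}
    if kind == "E":
        return {"type": "end", "entry": int(line[1:7], 16)}
    raise ValueError(f"unknown record type: {line!r}")


header = parse_record("HCOPY__00100000107A")
first_text = parse_record(
    "T0010001E1410334820390010362810303010154820613C100300102A0C103900102D")
print(header["name"], hex(header["start"]), hex(header["length"]))
print(hex(first_text["start"]), first_text["length"], first_text["code"][:6])
```

Read this way, the first text record holds 30 bytes (ten 3-byte instructions) loaded at address 1000 (hex), and its first instruction is 141033.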
The process
  So, how do we get from the source
    FIRST STL RETADR
  to the object code
    141033
  The STL part is easy – the code is 14 (hex)
  BUT, what about the operand, RETADR? What is its value?
    (RETADR is used here before it is defined; this is called a forward reference)
  Some preliminary work must be done!
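Once the address of RETADR is known, building 141033 is only bit packing. Here is a minimal Python sketch, assuming the standard SIC instruction format (8-bit opcode, 1-bit index flag, 15-bit address) and the addresses 1033 for RETADR and 1039 for BUFFER, both read off the object file shown earlier. The function name encode_sic is illustrative only.

```python
def encode_sic(opcode, address, indexed=False):
    """Pack a SIC instruction into 24 bits and return it as 6 hex digits."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= address <= 0x7FFF:
        raise ValueError("address must fit in 15 bits")
    x_bit = 0x8000 if indexed else 0          # the ",X" flag sits above the address
    return f"{(opcode << 16) | x_bit | address:06X}"


print(encode_sic(0x14, 0x1033))                 # STL  RETADR    -> 141033
print(encode_sic(0x54, 0x1039, indexed=True))   # STCH BUFFER,X  -> 549039
```

The second call shows why indexed instructions in the object file look as if their address were 8000 (hex) too large: the index flag occupies the top bit of the address half.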
Handling forward references
  We need to know the address of RETADR before we can complete the instruction
  Two possibilities
    Store the partial instruction until the address is determined (one pass assembly)
    Go through the code and determine ALL the addresses before starting the actual translation (two pass assembly)
  Both have their advantages and disadvantages
  We consider the two pass version first
Two pass assembler
  Pass one – BUILD SYMBOL TABLE
    assign addresses to all statements
    save addresses assigned to labels in symbol table
    perform some processing of assembler directives
  Pass two – TRANSLATION
    convert mnemonics & symbols to machine code
    convert constants
    process remaining assembler directives
    write machine code to object file; also write listing file
  In both passes, try to detect any errors in syntax
Data structures
  The location counter (called LOCCTR in text)
    Initialized to address specified in START statement (which is a hexadecimal value)
    As each statement is examined, its length is added to the LOCCTR
  Tables
    Table of opcode values (OPTAB)
    Table of symbol values (SYMTAB)
  Files
    Intermediate file
    Object file
    Listing file
Opcode table
  Contains (for SIC)
    mnemonic
    its equivalent machine language value
  May also contain
    instruction format
    required operands
  Usually organized as a closed (static) hash table
    fast retrieval with minimal searching
Symbol Table
  Contains (for SIC)
    symbol name (label)
    associated value (usually the address)
  May also contain
    type
    length
    scope information
  Also organized as a hash table, but it is dynamic
Intermediate file
  Written in pass 1 and used in pass 2
  Contains (at minimum)
    source statement
    address
    mnemonic or its value
    operand(s)
    errors in statement
  May contain pointers to OPTAB or SYMTAB
Object file
  Contains results of assembly
  Format varies according to designer
  Usually read by a linker/loader program
  Several record types (this also can vary)
    header record
    linkage records
    code records
    end record
Listing file
  Contains results of assembly for viewing by the programmer (or others)
  Format varies but usually includes
    address of statement
    machine code equivalent
    source statement (including comments)
    errors associated with each statement
Algorithm for pass 1 (SIC)
read first input line and write to intermediate file
if OPCODE = ‘START’ then {
    save #[OPERAND] as starting address
    initialize LOCCTR to starting address
    read next input line and write to intermediate file
}
else
    set LOCCTR = 0
while OPCODE != ‘END’ do {
    if this is not a comment line then
        handle line 1
    read next input line and write to intermediate file
}
save (LOCCTR – starting address) as program length
Pass 1: handle line 1
if there is a symbol in the LABEL field then {
    search SYMTAB for LABEL
    if found then
        set error flag (duplicate label)
    else
        insert (LABEL, LOCCTR) into SYMTAB
}
search OPTAB for OPCODE
if found then
    add 3 to LOCCTR
else if OPCODE = ‘WORD’ then
    add 3 to LOCCTR
else if OPCODE = ‘RESB’ then
    add #[OPERAND] to LOCCTR
else if OPCODE = ‘RESW’ then
    add 3 * #[OPERAND] to LOCCTR
else if OPCODE = ‘BYTE’ then {
    find length of constant in bytes
    add length to LOCCTR
}
else
    set error flag (invalid operation code)
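The pass 1 steps can be tried out on the main routine of the COPY program. What follows is a minimal Python sketch, not a full assembler: it assumes the source has already been split into (label, opcode, operand) tuples with comment lines removed, it keeps OPTAB and SYMTAB as plain dictionaries, and it skips writing the intermediate file. The names pass1, byte_length, and MAIN are illustrative only; the opcode values are the ones visible in the object file shown earlier.

```python
# Subset of the SIC opcode table: mnemonic -> opcode (pass 1 only checks membership).
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "COMP": 0x28, "JEQ": 0x30,
         "J": 0x3C, "STA": 0x0C, "LDL": 0x08, "RSUB": 0x4C}


def byte_length(operand):
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    kind, body = operand[0], operand[2:-1]
    if kind == "C":
        return len(body)                 # one byte per character
    if kind == "X":
        return (len(body) + 1) // 2      # two hex digits per byte
    raise ValueError(f"bad BYTE constant: {operand}")


def pass1(lines):
    """Assign addresses; return (SYMTAB, program length, error list)."""
    symtab, errors = {}, []
    locctr = start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            locctr = start = int(operand, 16)   # START operand is hexadecimal
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                errors.append(f"duplicate label: {label}")
            else:
                symtab[label] = locctr
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            errors.append(f"invalid operation code: {opcode}")
    return symtab, locctr - start, errors


# Main routine of the COPY program (subroutines omitted), ended early with END.
MAIN = [
    ("COPY", "START", "1000"), ("FIRST", "STL", "RETADR"), ("CLOOP", "JSUB", "RDREC"),
    ("", "LDA", "LENGTH"), ("", "COMP", "ZERO"), ("", "JEQ", "ENDFIL"),
    ("", "JSUB", "WRREC"), ("", "J", "CLOOP"), ("ENDFIL", "LDA", "EOF"),
    ("", "STA", "BUFFER"), ("", "LDA", "THREE"), ("", "STA", "LENGTH"),
    ("", "JSUB", "WRREC"), ("", "LDL", "RETADR"), ("", "RSUB", ""),
    ("EOF", "BYTE", "C'EOF'"), ("THREE", "WORD", "3"), ("ZERO", "WORD", "0"),
    ("RETADR", "RESW", "1"), ("LENGTH", "RESW", "1"), ("BUFFER", "RESB", "4096"),
    ("", "END", "FIRST"),
]

symtab, length, errors = pass1(MAIN)
print({name: f"{addr:04X}" for name, addr in symtab.items()})
print(f"length of this excerpt = {length:04X}, errors = {errors}")
```

The sketch places RETADR at 1033 (hex), which is the operand address in the object code 141033. Because the subroutines are left out, the length it reports covers only this excerpt, not the full program length of 107A given in the header record.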
Algorithm for pass 2 (SIC)
read first input line from intermediate file
if OPCODE = ‘START’ then {
    write listing line
    read next input line
}
write header record to object file
initialize first text record
while OPCODE != ‘END’ do {
    if this is not a comment line then
        handle line 2
    write listing line
    read next input line
}
write last text record to object file
write end record to object file
write last listing line
Pass 2: handle line 2
search OPTAB for OPCODE
if found then {
    if there is a symbol in OPERAND field then {
        search SYMTAB for OPERAND
        if found then
            store symbol value as operand address
        else {
            store 0 as operand address
            set error flag (undefined symbol)
        }
    }
    else
        store 0 as operand address
    assemble the object code instruction
}
else if OPCODE = ‘BYTE’ or ‘WORD’ then
    convert constant to object code
if object code will not fit into current text record then {
    write text record to object file
    initialize new text record
}
add object code to text record
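The translation step can be checked against the second text record of the COPY object file. This is a minimal Python sketch under several assumptions: the intermediate file is modeled as (address, opcode, operand) tuples, SYMTAB is a dictionary filled in by hand with the addresses found in the object file, a text record holds at most 30 bytes of code, and a record is also closed when RESW or RESB reserves storage (a detail taken from the course text, not from the pseudocode above). Listing output and the header and end records are left out. All function names are illustrative only.

```python
OPTAB = {"STA": 0x0C, "JSUB": 0x48, "LDL": 0x08, "RSUB": 0x4C, "STCH": 0x54}


def constant_code(opcode, operand):
    """Object code for a WORD or BYTE constant, as hex digits."""
    if opcode == "WORD":
        return f"{int(operand) & 0xFFFFFF:06X}"        # 3-byte integer
    kind, body = operand[0], operand[2:-1]
    if kind == "C":
        return body.encode("ascii").hex().upper()      # C'EOF' -> 454F46
    return body.upper()                                # X'F1'  -> F1


def assemble_line(opcode, operand, symtab, errors):
    """Return the object code for one statement ('' if it generates none)."""
    if opcode in OPTAB:
        address, indexed = 0, False
        if operand:
            symbol = operand
            if operand.endswith(",X"):
                symbol, indexed = operand[:-2], True
            if symbol in symtab:
                address = symtab[symbol]
            else:
                errors.append(f"undefined symbol: {symbol}")
        word = (OPTAB[opcode] << 16) | (0x8000 if indexed else 0) | address
        return f"{word:06X}"
    if opcode in ("BYTE", "WORD"):
        return constant_code(opcode, operand)
    return ""                                          # RESW, RESB, etc.


def text_record(start, codes):
    return f"T{start:06X}{len(codes) // 2:02X}{codes}"


def pass2(intermediate, symtab):
    """Return (list of text records, list of error messages)."""
    records, errors = [], []
    start, codes = None, ""
    for address, opcode, operand in intermediate:
        code = assemble_line(opcode, operand, symtab, errors)
        # Close the record when it is full (60 hex digits) or storage is reserved.
        if codes and (not code or len(codes) + len(code) > 60):
            records.append(text_record(start, codes))
            start, codes = None, ""
        if code:
            if start is None:
                start = address
            codes += code
    if codes:
        records.append(text_record(start, codes))
    return records, errors


SYMTAB = {"LENGTH": 0x1036, "WRREC": 0x2061, "RETADR": 0x1033}
LINES = [(0x101E, "STA", "LENGTH"), (0x1021, "JSUB", "WRREC"),
         (0x1024, "LDL", "RETADR"), (0x1027, "RSUB", ""),
         (0x102A, "BYTE", "C'EOF'"), (0x102D, "WORD", "3"),
         (0x1030, "WORD", "0"), (0x1033, "RESW", "1")]

records, errors = pass2(LINES, SYMTAB)
print(records, errors)
```

With these inputs the sketch produces a single record that can be compared, digit for digit, with the second T line of the object file shown earlier.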
Example (Exercise 2.1.2)
Assemble the following:
SUM     START   4000
FIRST   LDX     ZERO        INITIALIZE REGISTERS
        LDA     ZERO
LOOP    ADD     TABLE,X     ADD THE ELEMENTS
        TIX     COUNT       ARE WE DONE?
        JLT     LOOP        IF NOT, LOOP
        STA     TOTAL       STORE THE TOTAL
        RSUB                AND RETURN
TABLE   RESW    2000        FOR THE ARRAY
COUNT   RESW    1           NUMBER OF ELEMENTS
ZERO    WORD    0           CONSTANT
TOTAL   RESW    1           PLACE FOR TOTAL
        END     FIRST