COMP 2003: Assembly Language and Digital Logic
Chapter 6: Becoming the Machine
Notes by Neil Dickson

This chapter discusses machine code: what all executables are made of. Note that the examples go into great detail, but don’t be frightened: knowing the gory details is much less important than understanding generally how instructions are encoded. The examples are just to give a flavour of the variations that can occur.
Machine Code
The CPU doesn’t understand text. It needs a concise way of representing an instruction, such that it is easy (fast) for the CPU to determine what to do. This representation is called machine code.
Example of Machine Code
address   encoding           source code
000001F4  F7 E3              mul ebx
000001F6  BB 00000000        mov ebx,0
000001FB                     NextPixel:
000001FB  3B D8              cmp ebx,eax
000001FD  73 0C              jae Done
000001FF  C7 04 99 000080FF  mov dword ptr [ecx+ebx*4],00080FFh
00000206  83 C3 01           add ebx,1
00000209  EB F0              jmp NextPixel
0000020B                     Done:
0000020B  C3                 ret
0000020C
Notice that the line labels take up no space: they are just names for addresses.
Notice that the increase in address from one instruction to the next is the size of that instruction: +2, +5, +2, +2, +7, +3, +2 and +1 bytes for mul, mov, cmp, jae, mov, add, jmp and ret respectively.
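Because each instruction starts right where the previous one ends, the addresses in the listing are just a running total of instruction sizes, and a label is simply the running total at the point where it appears. The Python below is a minimal sketch of that bookkeeping, roughly what an assembler's first pass does. The names `program` and `lay_out` are hypothetical, and the sizes are copied from the listing rather than computed from the encodings.

```python
# Each entry is either a label name (str) or an (instruction text, size in bytes) pair.
program = [
    ("mul ebx", 2),
    ("mov ebx,0", 5),
    "NextPixel",
    ("cmp ebx,eax", 2),
    ("jae Done", 2),
    ("mov dword ptr [ecx+ebx*4],00080FFh", 7),
    ("add ebx,1", 3),
    ("jmp NextPixel", 2),
    "Done",
    ("ret", 1),
]

def lay_out(start, program):
    """Assign an address to every instruction and label, starting at `start`."""
    addr = start
    labels = {}
    rows = []
    for entry in program:
        if isinstance(entry, str):
            labels[entry] = addr      # a label occupies no bytes: it names the current address
        else:
            text, size = entry
            rows.append((addr, text))
            addr += size              # the next instruction starts right after this one
    return rows, labels, addr

rows, labels, end = lay_out(0x1F4, program)
for addr, text in rows:
    print(f"{addr:08X}  {text}")
```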
x86 Instruction Machine Code
Parts of an instruction, in order: prefix(es), REX prefix (64-bit code only), opcode, mod-reg-r/m, SIB, offset, immediate
opcode: the main indication of what the instruction is; looked up in an opcode map
  the only part present in all instructions
  may be multiple bytes, or may use the reg field of mod-reg-r/m to pick the operation if there is only one operand or the other operand is an immediate
mod-reg-r/m byte:
  mod (high 2 bits): 0 = r/m is memory with no offset (except that in 32-bit code r/m=5 means a 32-bit offset with no register); 1 = memory with an 8-bit offset; 2 = memory with a 32-bit offset; 3 = r/m is a register
  reg (middle 3 bits): specifies the register (eax=0 to edi=7) used as the register operand
  r/m (low 3 bits): if mod=3, specifies the other register used as an operand; otherwise specifies an addressing register
scale-index-base (SIB) byte: allows 2 addressing registers; present if and only if mod≠3 and r/m=4 (esp)
  scale (high 2 bits): power of two by which to multiply the index register (0: reg; 1: reg*2; 2: reg*4; 3: reg*8)
  index (middle 3 bits): addressing register to be multiplied by 2^scale
  base (low 3 bits): addressing register not to be multiplied
  index=4 (esp) means there is no index register, so if index=4 and base=4 (esp), only esp is used for addressing
prefixes: the most common prefix is 66h, which changes the operand size from dwords to words
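The field layouts above are plain bit-slicing, so pulling them out of a byte takes only shifts and masks. This Python sketch (the function names are hypothetical) splits a mod-reg-r/m byte and a SIB byte into their fields and decides whether a SIB byte and an offset follow. As a simplification it ignores the mod=0 special cases where r/m=5 or the SIB base is 5, which use a 32-bit offset in place of a base register.

```python
def split_modrm(byte):
    """Split a mod-reg-r/m byte into (mod, reg, rm)."""
    mod = (byte >> 6) & 0b11    # high 2 bits
    reg = (byte >> 3) & 0b111   # middle 3 bits
    rm = byte & 0b111           # low 3 bits
    return mod, reg, rm

def split_sib(byte):
    """Split a scale-index-base byte into (scale, index, base)."""
    scale = (byte >> 6) & 0b11  # index is multiplied by 2**scale
    index = (byte >> 3) & 0b111
    base = byte & 0b111
    return scale, index, base

def has_sib(modrm):
    """A SIB byte follows iff mod != 3 and r/m == 4."""
    mod, _, rm = split_modrm(modrm)
    return mod != 3 and rm == 4

def offset_size(modrm):
    """Bytes of offset implied by mod alone.
    Simplification: ignores the mod=0 special cases for r/m=5 and SIB base=5."""
    mod = split_modrm(modrm)[0]
    return {0: 0, 1: 1, 2: 4, 3: 0}[mod]
```

For example, `split_modrm(0xD8)` gives `(3, 3, 0)`, matching the cmp ebx,eax encoding in the listing.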
Register Numbers
number  32-bit  16-bit  8-bit
0       eax     ax      al
1       ecx     cx      cl
2       edx     dx      dl
3       ebx     bx      bl
4       esp     sp      ah
5       ebp     bp      ch
6       esi     si      dh
7       edi     di      bh
Let’s look back at our example code.
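Turning a register number back into a name is a table lookup, with one table per operand size. In this sketch the names `REG32`, `REG16`, `REG8` and `reg_name` are hypothetical. Note that 8-bit numbers 4 to 7 name the high bytes ah, ch, dh and bh, not parts of esp, ebp, esi and edi.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]  # 4-7 are the high bytes

def reg_name(number, size_bits=32):
    """Look up the name of register `number` (0-7) for an operand size in bits."""
    table = {32: REG32, 16: REG16, 8: REG8}[size_bits]
    return table[number]
```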
Decoding Machine Code: cmp ebx,eax (3B D8)
The opcode map says that 3B is cmp register,dword ptr register/memory, followed by a mod-reg-r/m byte.
D8 = 11011000: mod=11=3 (both operands are registers); reg=011=3=ebx; r/m=000=0=eax
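The decoding of 3B D8 can be reproduced mechanically. This Python sketch handles only opcode 3B with two register operands (mod=3); `decode_cmp_3b` is a hypothetical name, and memory operands are deliberately left out.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_cmp_3b(code):
    """Decode 'cmp r32, r/m32' (opcode 3B) when both operands are registers."""
    if code[0] != 0x3B:
        raise ValueError("not opcode 3B")
    modrm = code[1]
    mod = modrm >> 6
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if mod != 3:
        raise NotImplementedError("memory operand: not handled in this sketch")
    # For 3B, reg is the first operand and r/m is the second.
    return f"cmp {REG32[reg]},{REG32[rm]}"
```

Swapping the fields changes the meaning: in 3B C3, reg=0 and r/m=3, so it decodes as cmp eax,ebx.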
Decoding Machine Code: mov ebx,0 (BB 00000000) and ret (C3)
The opcode map says that BB is mov ebx,constant, followed by a 32-bit constant.
The opcode map says that C3 is ret.
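For BB the register is encoded in the opcode byte itself: opcodes B8 to BF are mov eax,constant through mov edi,constant in register-number order, so BB (B8 + 3) is mov ebx,constant. The 32-bit constant is stored little-endian, lowest byte first. This is a minimal sketch that decodes only the B8 to BF family and C3; `decode_simple` and `masm_hex` are hypothetical helpers, and `masm_hex` just mimics MASM's `h`-suffix notation.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def masm_hex(value):
    """Hex with an 'h' suffix, plus a leading 0 if it would start with a letter."""
    text = f"{value:X}h"
    return "0" + text if text[0] in "ABCDEF" else text

def decode_simple(code):
    """Decode one instruction at the start of `code`; returns (text, length).
    Handles only B8+r (mov r32,constant) and C3 (ret)."""
    op = code[0]
    if 0xB8 <= op <= 0xBF:
        reg = op - 0xB8                              # register number is in the opcode
        imm = int.from_bytes(code[1:5], "little")    # 32-bit constant, little-endian
        return f"mov {REG32[reg]},{masm_hex(imm)}", 5
    if op == 0xC3:
        return "ret", 1
    raise ValueError(f"opcode {op:02X} not handled in this sketch")
```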
Decoding Machine Code: mov dword ptr [ecx+ebx*4],00080FFh (C7 04 99 000080FF)
The opcode map says that C7 is mov dword ptr register/memory,constant, followed by a mod-reg-r/m byte, with a 32-bit constant at the end.
04 = 00000100: mod=0 (memory, no offset); reg=0 (with C7 this is part of the opcode, not a register operand); r/m=4, so a SIB byte follows
99 = 10011001: scale=2 (index*4); index=3=ebx; base=1=ecx
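Putting the mod-reg-r/m byte, the SIB byte and the trailing constant together gives the whole memory operand. One detail the listing hides: it prints the constant as 000080FF, but x86 stores it little-endian, so the bytes actually in memory after C7 04 99 are FF 80 00 00. This Python sketch (hypothetical `decode_c7`) handles only the mod=0 form with a SIB byte, which is the form this instruction uses.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def masm_hex(value):
    """Hex with an 'h' suffix, plus a leading 0 if it would start with a letter."""
    text = f"{value:X}h"
    return "0" + text if text[0] in "ABCDEF" else text

def decode_c7(code):
    """Decode 'mov dword ptr [base+index*scale],constant' (C7, mod=0, SIB form)."""
    if code[0] != 0xC7:
        raise ValueError("not opcode C7")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if reg != 0:
        raise ValueError("C7 expects reg=0 in this sketch")
    if mod != 0 or rm != 4:
        raise NotImplementedError("only the mod=0 form with a SIB byte is handled")
    sib = code[2]
    scale, index, base = sib >> 6, (sib >> 3) & 0b111, sib & 0b111
    if base == 5:
        raise NotImplementedError("base=5 with mod=0 (no base register) not handled")
    address = REG32[base]
    if index != 4:                                   # index=4 means no index register
        address += "+" + REG32[index] + ("" if scale == 0 else f"*{1 << scale}")
    imm = int.from_bytes(code[3:7], "little")        # bytes FF 80 00 00 -> 80FFh
    return f"mov dword ptr [{address}],{masm_hex(imm)}", 7
```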
Decoding Machine Code: mul ebx (F7 E3)
The opcode map says that F7 is ??? dword ptr register/memory, followed by a mod-reg-r/m byte whose reg field specifies the operation (from not, neg, mul, div, ...).
E3 = 11100011: mod=3 (register); reg=4=mul in the opcode map; r/m=3=ebx
The add instruction is decoded similarly: in 83 C3 01, C3 = 11000011 gives reg=0=add and r/m=3=ebx, followed by the 8-bit constant 01.
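Opcodes like F7 and 83 share one byte among eight operations, with the reg field choosing which. This sketch covers only register operands; the operation tables follow the x86 opcode map, with F7's reg=0 (test, which also takes a 32-bit constant) and reg=1 left out to keep it short. `decode_group` is a hypothetical name. For 83, the 8-bit constant is sign-extended to 32 bits, which is why it is read as signed.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# Operation selected by the reg field of mod-reg-r/m.
GROUP_F7 = ["test", None, "not", "neg", "mul", "imul", "div", "idiv"]  # reg=1 not covered
GROUP_83 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]

def decode_group(code):
    """Decode F7 /r (one operand) or 83 /r (operand, 8-bit constant) with mod=3."""
    op, modrm = code[0], code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 3:
        raise NotImplementedError("memory operands not handled in this sketch")
    if op == 0xF7:
        name = GROUP_F7[reg]
        if name is None or name == "test":
            raise NotImplementedError(f"F7 with reg={reg} not handled")
        return f"{name} {REG32[rm]}", 2
    if op == 0x83:
        imm = int.from_bytes(code[2:3], "little", signed=True)  # sign-extended 8-bit constant
        return f"{GROUP_83[reg]} {REG32[rm]},{imm}", 3
    raise ValueError(f"opcode {op:02X} not handled")
```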
What about jumps and calls?
The opcode indicates that it is a jump or call and, for a conditional jump, the condition. The opcode is followed by a signed constant: the number to add to eip if the condition is met. In other words, jumps and calls are relative to the following instruction, because by then eip already contains the address of the following instruction.
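The rule is: target = (address of the following instruction) + (sign-extended relative address). This Python sketch (hypothetical `sign_extend` and `jump_target`) applies it to both jumps in the example, assuming 32-bit addresses that wrap around.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement signed number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def jump_target(instr_address, instr_length, rel, rel_bits):
    """eip already holds the following instruction's address when rel is added."""
    next_eip = instr_address + instr_length
    return (next_eip + sign_extend(rel, rel_bits)) & 0xFFFFFFFF  # 32-bit wraparound
```

With the listing's values, `jump_target(0x1FD, 2, 0x0C, 8)` is 0x20B (Done) and `jump_target(0x209, 2, 0xF0, 8)` is 0x1FB (NextPixel).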
Decoding Machine Code: Jumps
The opcode map says that 73 is jae LineLabel, followed by the 8-bit signed relative address of LineLabel.
For jae Done (73 0C): 000001FF (address of the following instruction) + 0C = 0000020B, the address of Done.
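73 is one of sixteen short conditional-jump opcodes, 70 to 7F, whose low 4 bits select the condition. This sketch (hypothetical `decode_jcc_short`) decodes any of them, writing jae rather than its synonyms jnb and jnc.

```python
# Conditions for opcodes 70..7F, in order.
CONDITIONS = ["o", "no", "b", "ae", "e", "ne", "be", "a",
              "s", "ns", "p", "np", "l", "ge", "le", "g"]

def decode_jcc_short(address, code):
    """Decode a 2-byte conditional jump at `address`; returns (mnemonic, target)."""
    op = code[0]
    if not 0x70 <= op <= 0x7F:
        raise ValueError("not a short conditional jump")
    rel = int.from_bytes(code[1:2], "little", signed=True)  # 8-bit signed offset
    target = (address + 2 + rel) & 0xFFFFFFFF               # relative to the next instruction
    return "j" + CONDITIONS[op - 0x70], target
```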
Decoding Machine Code: Jumps
The opcode map says that EB is jmp LineLabel, followed by the 8-bit signed relative address of LineLabel.
For jmp NextPixel (EB F0), F0 is sign-extended to FFFFFFF0: 0000020B (address of the following instruction) + FFFFFFF0 = 0000020B + (-10h, i.e. -16) = 000001FB, the address of NextPixel.
Note: jumps whose relative address is outside the range -128 to +127 bytes, and all calls, use a 32-bit relative address instead.
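Going the other way, an assembler has to pick the short or long form. This is a minimal sketch assuming only the unconditional jmp (EB with an 8-bit relative address, or E9 with a 32-bit one) and call (E8 with a 32-bit one); the function names are hypothetical. The relative address is measured from the end of the instruction itself, so it depends on which form is chosen.

```python
def encode_jmp(address, target):
    """Machine code for 'jmp target' placed at `address`, short form if it fits."""
    short_rel = target - (address + 2)        # EB rel8 is 2 bytes long
    if -128 <= short_rel <= 127:
        return bytes([0xEB]) + short_rel.to_bytes(1, "little", signed=True)
    near_rel = target - (address + 5)         # E9 rel32 is 5 bytes long
    return bytes([0xE9]) + near_rel.to_bytes(4, "little", signed=True)

def encode_call(address, target):
    """Machine code for 'call target'; calls always use a 32-bit relative address."""
    rel = target - (address + 5)              # E8 rel32 is 5 bytes long
    return bytes([0xE8]) + rel.to_bytes(4, "little", signed=True)
```

`encode_jmp(0x209, 0x1FB)` reproduces the listing's EB F0. Because choosing the long form shifts every later address, assemblers may need more than one pass to settle on jump sizes.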