Introduction to Assembly

Introduction to Assembly
Here we have a brief introduction to Intel Assembly Language This is the assembly language developed for the Intel 8086 processor, which continues to be the basis for all x86/Pentium/iCore processors CISC instruction set Special purpose register set 8 and 16 bit operations initially, expanded to 32 and 64 bit operations for Pentium Memory-register and register-register operations available, several addressing modes including many implied addresses

Instruction Format [name/label] [mnemonic] [operands] [;comments]
Operands are either literals, variables/constants, or registers Number of operands depends on type of instruction, range from 0 to 2 Examples: mov eax, ebx – 2 operands, source and destination mov eax, 5 – one operand is a literal mov y, eax – memory to register movement add eax, 5 – 2 operands for add mul value – 1 operand for mul, other operand is eax nop – no operands for the no-op instruction je location – 1 operand with comparison implied to be a flag

Literals and Variables
Literals require that the type of value be specified by following the value with one of the following: D, d for decimal (the default) H, h for hexadecimal Q, q for octal b for binary Strings are placed in ‘ ’ or “ ” marks Examples: b 0Ah 35 ‘hello’ “goodbye” We will define all assembly code within C/C++ programs so we will declare all variables in C/C++ code int is 32 bit short is 16 bit char is 8 bit We must insure that we place the datum into the right sized register (see next slide) We will only be using decimal literal values or character values in our examples and programs, and decimal values do not need a D/d as it is the default. Characters should be placed inside ‘’ as with C code.

Registers Register sizes: 14 registers, all special purpose
4 data registers EAX – accumulator EAX is an implied register in the Mul and Div instructions EBX – base counter used for addressing, particularly when dealing with arrays and strings EBX can be used as a data register when not used for addressing ECX – counter implicitly used in loop instructions in non-looping instructions, can be used as a data register EDX – data register used for In and Out instructions (input, output), also used to store partial results of Mul and Div operations in other cases, can be used as a data register Register sizes: _X – 16 bits (e.g., AX, BX) _H and _L –8 bits (high and low end of _X registers) E_X – 32 bits (extends the _X registers to 32 bits) We will typically use the extended registers so that we can store our data in int variables. Note that if you do any input or output in C code, it will overwrite the EDX register so be careful when you use the DX/EDX.

Other Registers Other registers can not be used for data but have specific uses Segment registers point to different segments in memory SS – stack CS – code DS – data ES – extra (used as a base pointer for variables) Indexing registers as offsets into local function, stack, or string BP – base pointer used with SS to address subroutine local variables on the stack SP – stack pointer used with SS for top of stack SI and DI – source and destination for string transfers IP – program counter Status flags We won’t be bothering with any of these registers as we won’t be covering any of the more exotic forms of addressing.

Operations: Data Movement
mov and xchg instructions mov allows for register-register, memory-register, register-memory, register-immediate and memory-immediate first item is destination, second is source memory-memory moves must be done with 2 instructions using a register as temporary storage memory references can use direct, direct+offset, or register-indirect modes if datum is 8-bit, register only uses high or low side, 16-bit uses entire register, 32-bit uses extended register (e.g., EAX, EDX) and 64-bit combines two registers xchg instruction allows only register-register, memory-register and register-memory and exchanges two values rather than moves one value as with mov Notice that the mov command performs both loads and stores. The difference is where you put the memory reference (if there is one): mov eax, x – load x into eax mov x, eax – store eax into x mov eax, ebx – register movement, no memory involved mov eax, 0 – immediate load, that is, load a register with an immediate datum mov x, 0 – memory initialization

Operations: Arithmetic/Conditional
inc/dec dest add/sub dest, source dest is register or memory reference, source for add/sub is register, memory reference, or literal, sizes must match mul/div source one datum is source, the other is implied to be eax (or ax or al) destination is implied as eax/edx combined (or ax/dx, al/ah depending on size) source can be a register or memory reference but not a literal (cannot do mul 2) div: quotient in ax, al or eax, remainder in dx, ah or edx mul: result is twice the size, so goes into edx/eax or dx/ax or ah/al shl, shr, sal, sar, shld, shrd , rol, ror, rcl, rcr shift, shift arithmetic, shift double, rotate, rotate w/ carry two operands: item being shifted/rotated, bits shifted/rotated Logic operations: AND, OR, XOR, NOT form is OP dest, source NEG dest convert two’s complement value to its opposite CMP first, second compare first and second and set proper flag(s) (PF, ZF, NF) the result of cmp operations are then used for branch instructions Examples: inc x – increment variable x in memory add eax, 5 – eax  eax + 5 sub x, 1 – x  x – 1 For mul and div, you must first place one datum in the ax/eax register and set edx to 0. mul x – eax = eax * x (with the upper 32 bits of the product being moved into edx) div x – eax = edx || eax / x, edx = edx || eax % x (where || means “concatenate or combine the two registers) We won’t be using the shifts or rotates although they should be self explanatory. CMP is most commonly used immediately before a conditional branch (see the next slide).

Operations: Branches Unconditional branches: Conditional branches:
branch automatically to location jmp location jmp instructions are used to implement goto statements and procedure calls loop location used for downward counting for loops initialize ecx (or cx) to starting value loop location combines dec ecx cmp ecx, 0 jg location Conditional branches: instruction preceded by an instruction which sets at least one status flag, usually a cmp instruction flag tested based on type of branch je/jne location – branch if zero flag set/clear jg/jge/jl/jle location – branch on positive/positive+zero, negative/negative+zero flag set jc/jnc/jz/jnz/jp/jnp location – branch on carry/no carry, zero/not zero, even parity/odd parity The location is a label for a branch. For example, the following would be the same as while (x != 0) { … } top: cmp x, 0 jne xout jmp top xout: … Note that I use xout instead of out because “out” is an assembly language command (for output). Similarly, do not use any C reserved words as labels: if(x < y) x++; else y++; cmp x, y jge xelse inc x jmp xout xelse: inc y xout: …

Addressing Modes Immediate – place datum in instruction as a literal
add eax, 10 use this mode when datum is known at program implementation time Direct – place variable in instruction mov eax, x ; moves x into register ax add y, eax ; sets y = [y] + [eax] use this mode to access a variable in memory Direct + Offset mov eax, x ; eax  x[4 bytes] – this is not the same as x[4] mov eax, x[ebx] ; eax  x[ebx] –ebx stores the byte offset Note: mov eax, x[y] is illegal because it has 2 memory references use this mode when dealing with strings, arrays and structs Register Indirect – use index and/or segment registers mov eax, [si + ds] ; base-indexed mov eax, [si – 4] ; base with displacement mov eax, [si + ds – 6] ; base-indexed with displacement we will not use these modes We typically write an immediate datum using #value as in add ax, #10 to differentiate it from a direct address such as mov eax, (move the contents of memory location into eax). But in IBM PC assembly, immediate data are denoted by the numeric value without a # and direct addresses are denoted by the variable name. Notice for Direct + Offset, if you use +offset, then this is a constant and you cannot change it. We will find it far more flexible to store an offset in a register so that we can adjust the register. For such a mode, we use [register] as in a[ebx] Note: if the datum is an int (4 bytes), then a is an array of int values and we would move the datum into an extended register. However, the offset does not have to be a 4 byte value. The following would work: mov eax, a[bx] // a is an int array, bx is a 16 bit offset mov ax, a[ebx] // a is a short array, ebx is a 32 bit offset For convenience though, use all extended registers (int arrays, 32 bit offsets) We will not use the register indirect modes

Addressing Examples Imagine that we have declared in C:
int a[ ] = {0, 11, 15, 21, 99}; Then, the following accesses give us the values of a as shown: mov eax, a eax  0 mov eax, a eax  11 mov eax, a+8 eax  15 mov eax, a[ebx] eax  99 if ebx = 16 If ebx and ecx both = 0 and size is the number of items in the array, then we can iterate through the array as follows: Hopefully the above example code is self explanatory. We could also use the loop command to make the code a little shorter: mov ecx, n mov ebx, 0 top: mov eax, a[ebx] // do something with the array element add ebx, 4 loop top top: mov eax, a[ebx] … do something with the array value … add ebx, 4 add ecx, 1 cmp ecx, size jl top // use jl since we stop once ecx == size

Writing Assembly in a C Program
For simplicity, we will write our code inside of C (or C++) programs This allows us to declare variables in C/C++ thus avoiding the .data section do I/O in C/C++ thus avoiding difficulties dealing with assembly input and assembly output compile our programs rather than dealing with assembling them using MASM or TASM To include assembly code, in your C/C++ program, add the following compiler directive _ _ asm { } And place all of your assembly code between the { } NOTE: the two underscores before asm do not have a space between them. The space in the text above is there so that you can see there are 2 consecutive _ characters. You can only use the __asm{ } directive if you are running in Windows. I believe this does not work if you are running in Linux.

Data Types One problem that might arise in using C/C++ to run our assembly code is that we might mix up data types if you declare a variable to be of type int, then this is a 4-byte variable moving it into a register means that you must move it into a 4-byte register (such as eax) and not a 2-byte or 1-byte register! if you try to move a variable into the wrong sized register, or a register value into the wrong sized variable, you will get a “operand size conflict” syntax error message when compiling your program to use ax, bx, cx, dx, declare variables to be of type short to use eax, ebx, ecx, edx, declare variables to be of type int also notice that char are 1 byte, so should use either the upper or lower half a register (al, ah, dl, dh)

Introduction to Assembly

Similar presentations

Presentation on theme: "Introduction to Assembly"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Introduction to Assembly

Similar presentations

Presentation on theme: "Introduction to Assembly"— Presentation transcript:

Similar presentations

About project

Feedback