Computer Architecture and System Programming Laboratory

TA Session 7 x87 FPU

The x87 Floating-Point Unit (FPU) provides high-performance floating-point processing capabilities.
It supports floating-point, integer, and packed BCD integer data types, together with the floating-point processing algorithms and exception handling defined in IEEE Standard 754.

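The x87 FPU is a separate execution environment. It consists of 8 data registers and the following special-purpose registers: the status register, control register, tag word register, last instruction pointer register, last data (operand) pointer register, and opcode register.
A value loaded from memory into an x87 FPU data register is automatically converted into double extended-precision floating-point format.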

x87 FPU instructions treat the eight x87 FPU data registers as a register stack.
The register number of the current top-of-stack register is stored in the TOP (stack TOP) field of the x87 FPU status word. TOP is a 3-bit field, so it wraps around between 0 and 7. Load operations decrement TOP by one and load a value into the new top-of-stack register. Store-and-pop operations (such as FSTP) store the value from the current TOP register in memory and then increment TOP by one.

The 16-bit x87 FPU status register indicates the current state of the x87 FPU.
The 16-bit tag word indicates the contents of each of the 8 registers in the x87 FPU data-register stack (one 2-bit tag per register). Each tag in the tag word corresponds to a physical register. The TOP pointer is used to associate tags with registers relative to ST(0).
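The relation between tags, physical registers, and ST(i) can be sketched in Python. This is only an illustration: the helper names `decode_tag_word` and `tag_of_st` are hypothetical, and the sketch assumes the usual layout in which the tag of physical register r occupies bits 2r+1..2r of the tag word, with the encodings 00 = valid, 01 = zero, 10 = special, 11 = empty.

```python
# Hypothetical helper names; the 2-bit encodings are the x87 tag values.
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word):
    """Return the tag name of each physical register R0..R7."""
    return [TAG_NAMES[(tag_word >> (2 * r)) & 0b11] for r in range(8)]


def tag_of_st(tag_word, top, i):
    """Tag of ST(i): ST(i) lives in physical register (TOP + i) mod 8."""
    physical = (top + i) % 8
    return decode_tag_word(tag_word)[physical]


# A tag word of FFFFH marks every register as empty.
print(decode_tag_word(0xFFFF))

# One value loaded with TOP = 7: bits 15:14 are 00 (R7 valid), the rest 11.
tag_word = 0x3FFF
print(tag_of_st(tag_word, top=7, i=0))  # tag of ST(0), stored in R7
print(tag_of_st(tag_word, top=7, i=1))  # ST(1) wraps around to R0
```

The tags themselves never move; only TOP changes, which is why the lookup adds TOP to the stack-relative index.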

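Example: computing (5.6 * 2.4) + (3.8 * 10.3) on the register stack

    var1: dt 5.6            ; 80-bit operand, loaded with fld
    var2: dq 2.4            ; fmul accepts only 32- or 64-bit memory operands
    var3: dt 3.8
    var4: dq 10.3

    fld  tword [var1]       ; st0 = 5.6, TOP=4 (TOP was 5 before the load)
    fmul qword [var2]       ; st0 = st0*2.4 = 13.44, TOP=4
    fld  tword [var3]       ; st0 = 3.8, st1 = 13.44, TOP=3
    fmul qword [var4]       ; st0 = st0*10.3 = 39.14, st1 = 13.44, TOP=3
    fadd st1                ; st0 = st0+st1 = 52.58, st1 = 13.44, TOP=3

gdb command to see the stack data registers: tui reg float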
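To follow how TOP moves during this sequence, here is a toy Python model of the register stack. It is a sketch, not an FPU emulator: the class `X87Stack` and its method names are made up for illustration, it uses ordinary Python floats instead of 80-bit values, and it assumes TOP was 5 before the first load.

```python
class X87Stack:
    """Toy model of the eight data registers and the TOP field."""

    def __init__(self, top=0):
        self.regs = [None] * 8  # physical registers R0..R7
        self.top = top          # 3-bit TOP field

    def st(self, i=0):
        """Read ST(i), i.e. physical register (TOP + i) mod 8."""
        return self.regs[(self.top + i) % 8]

    def set_st(self, i, value):
        self.regs[(self.top + i) % 8] = value

    def fld(self, value):
        """Load: decrement TOP, then write the new top-of-stack register."""
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def fmul(self, value):
        """ST(0) = ST(0) * value; TOP is unchanged."""
        self.set_st(0, self.st(0) * value)

    def fadd_st(self, i):
        """ST(0) = ST(0) + ST(i); TOP is unchanged."""
        self.set_st(0, self.st(0) + self.st(i))

    def fstp(self):
        """Store and pop: return ST(0), then increment TOP."""
        value = self.st(0)
        self.regs[self.top] = None
        self.top = (self.top + 1) % 8
        return value


fpu = X87Stack(top=5)  # assumption: TOP was 5 before the first load
fpu.fld(5.6)           # TOP = 4
fpu.fmul(2.4)
fpu.fld(3.8)           # TOP = 3
fpu.fmul(10.3)
fpu.fadd_st(1)
print("TOP =", fpu.top, "st0 =", fpu.st(0), "st1 =", fpu.st(1))
result = fpu.fstp()    # popping brings TOP back to 4
print("TOP =", fpu.top, "stored =", result)
```

Only `fld` and `fstp` change TOP; the arithmetic steps overwrite ST(0) in place.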

The x87 FPU recognizes and operates on the following seven data types:
single-precision floating point, double-precision floating point, double extended-precision floating point, signed word integer, signed doubleword integer, signed quadword integer, and packed BCD decimal integers.

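IEEE 754 standard: converting an integer in memory to floating point

Example:
    mov  qword [n], 9       ; integer number in memory (RAM)
    fild qword [n]          ; st0 = 9.0; fild accepts 16-, 32-, and 64-bit integers

Floating-point number in the x87 data-register stack: 9 = 1.001b * 2^3, so
    sign bit    = 0
    exponent    = 11b (that is, 3; the register stores it with a bias of 16383)
    significand = 1.001b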
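The conversion of 9 can be checked by hand with a few lines of Python. This sketch only derives the sign, unbiased exponent, and significand bits of a non-zero integer; the function name `decompose` is hypothetical, and the bias constant is the one used by the 15-bit exponent field of the double extended-precision format.

```python
def decompose(n):
    """Split a non-zero integer into (sign bit, unbiased exponent, significand bits)."""
    sign = 1 if n < 0 else 0
    magnitude = abs(n)
    exponent = magnitude.bit_length() - 1  # position of the leading 1 bit
    bits = bin(magnitude)[2:]              # for example '1001' for 9
    significand = bits[0] + "." + (bits[1:] or "0")
    return sign, exponent, significand


sign, exponent, significand = decompose(9)
print(sign, bin(exponent)[2:], significand)  # 0 11 1.001

EXTENDED_BIAS = 16383  # exponent bias of double extended precision
print(exponent + EXTENDED_BIAS)              # value of the stored exponent field
```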

FPU instruction set
x87 FPU instructions are ESC (escape) instructions. They share a common opcode format in which the first byte of the opcode is one of the numbers from D8H through DFH.
Load-constant instructions (such as FLDZ, FLD1, and FLDPI) push commonly used constants onto st0.
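As a quick reference, the following Python sketch lists the x87 load-constant instructions with the value each one pushes, and checks whether a first opcode byte falls in the ESC range. The dictionary and the function name `is_esc_opcode` are illustrative only; the values are computed with Python's `math` module in double precision, whereas the FPU holds them in extended precision.

```python
import math

# Load-constant instructions and the value each pushes onto st0.
X87_CONSTANTS = {
    "fldz": 0.0,                   # +0.0
    "fld1": 1.0,                   # +1.0
    "fldpi": math.pi,              # pi
    "fldl2t": math.log2(10),       # log2(10)
    "fldl2e": math.log2(math.e),   # log2(e)
    "fldlg2": math.log10(2),       # log10(2)
    "fldln2": math.log(2),         # ln(2)
}


def is_esc_opcode(first_byte):
    """True if the byte is in the ESC range D8H..DFH."""
    return 0xD8 <= first_byte <= 0xDF


for name, value in X87_CONSTANTS.items():
    print(f"{name:7s} {value:.15f}")

print(is_esc_opcode(0xD9))  # FLD1 is encoded as D9 E8
print(is_esc_opcode(0x90))  # NOP is not an ESC instruction
```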

Basic Arithmetic Instructions
Operands in memory can be in single-precision floating-point, double-precision floating-point, word-integer, or doubleword-integer format. They are converted to double extended-precision floating-point format automatically.
Reverse instructions swap the roles of the two operands. Example of a reverse instruction: FSUB computes ST(0) - source, whereas FSUBR computes source - ST(0).
The pop versions of the instructions offer the option of popping the x87 FPU register stack following the arithmetic operation. These instructions operate on values in the ST(i) and ST(0) registers, store the result in the ST(i) register, and pop the ST(0) register.
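The difference between the plain, reverse, and pop forms is easiest to see on subtraction. In the Python sketch below the register stack is modelled as a list whose element 0 plays the role of ST(0); the function names mirror the mnemonics but are otherwise just illustrative helpers.

```python
def fsub(stack, src):
    """FSUB with a memory operand: ST(0) = ST(0) - src."""
    stack[0] = stack[0] - src


def fsubr(stack, src):
    """FSUBR (reverse): ST(0) = src - ST(0)."""
    stack[0] = src - stack[0]


def fsubp(stack):
    """FSUBP with no operands: ST(1) = ST(1) - ST(0), then pop ST(0)."""
    stack[1] = stack[1] - stack[0]
    stack.pop(0)


plain = [10.0]
fsub(plain, 4.0)
print(plain)      # [6.0]

reverse = [10.0]
fsubr(reverse, 4.0)
print(reverse)    # [-6.0]

popped = [3.0, 10.0]
fsubp(popped)
print(popped)     # [7.0]
```

After the pop form, the old ST(1) holds the result and has become the new ST(0).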

Control Instructions
FINIT/FNINIT instructions initialize the x87 FPU and its internal registers to default values.

Stack overflow and underflow exceptions
Stack overflow: an instruction attempts to load a non-empty x87 FPU register from memory. A non-empty register is defined as a register containing a zero (tag value of 01), a valid value (tag value of 00), or a special value (tag value of 10).
Stack underflow: an instruction references an empty x87 FPU register as a source operand, including attempting to write the contents of an empty register to memory. An empty register has a tag value of 11.
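Both conditions can be reproduced with a small tagged model in Python. This is a deliberately simplified sketch: `TaggedStack` and `StackFault` are hypothetical names, special values are ignored, and the model raises a Python exception where the real FPU would signal an invalid-operation exception (or, when that exception is masked, produce a default result).

```python
EMPTY, VALID, ZERO = 0b11, 0b00, 0b01  # tag encodings listed above


class StackFault(Exception):
    """Raised by this toy model where the real FPU would flag a stack fault."""


class TaggedStack:
    def __init__(self):
        # State after FINIT: TOP = 0 and every register tagged empty.
        self.top = 0
        self.tags = [EMPTY] * 8
        self.values = [0.0] * 8

    def load(self, value):
        """Push a value; overflow if the target register is not empty."""
        new_top = (self.top - 1) % 8
        if self.tags[new_top] != EMPTY:
            raise StackFault("stack overflow")
        self.top = new_top
        self.values[new_top] = value
        self.tags[new_top] = ZERO if value == 0 else VALID

    def store_and_pop(self):
        """Pop ST(0); underflow if ST(0) is empty."""
        if self.tags[self.top] == EMPTY:
            raise StackFault("stack underflow")
        value = self.values[self.top]
        self.tags[self.top] = EMPTY
        self.top = (self.top + 1) % 8
        return value


stack = TaggedStack()
for i in range(8):
    stack.load(float(i))   # eight loads fill R7 down to R0
try:
    stack.load(8.0)        # the ninth load targets a non-empty register
except StackFault as fault:
    print(fault)           # stack overflow
```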

Magic square
For the 3 x 3 magic square, each row, each column, and both diagonals sum to 3 • (3² + 1) ÷ 2 = 15.
1) '1' goes in the middle of the top row.
2) All numbers are then placed one column to the right and one row up from the previous number.
3) Whenever the next number placement is above the top row, stay in that column and place the number in the bottom row.
4) Whenever the next number placement is outside of the rightmost column, stay in that row and place the number in the leftmost column.
5) When encountering a filled-in square, place the next number directly below the previous number.
6) When the next number position is outside both a row and a column, place the number directly beneath the previous number.
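The placement rules can be tried out in a few lines of Python before reading the assembly version. This is a minimal sketch under two assumptions: the function name `magic_square` is made up here, and rules 3, 4, and 6 are all expressed through modular wrap-around plus the "square already filled" check of rule 5.

```python
def magic_square(n):
    """Build an n x n magic square for odd n using the rules above."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    table = [[0] * n for _ in range(n)]
    row, col = 0, n // 2                 # rule 1: middle of the top row
    for value in range(1, n * n + 1):
        table[row][col] = value
        # Rule 2 with wrap-around (rules 3 and 4): one row up, one column right.
        next_row, next_col = (row - 1) % n, (col + 1) % n
        if table[next_row][next_col] != 0:
            # Rules 5 and 6: the target is taken, go directly below instead.
            next_row, next_col = (row + 1) % n, col
        row, col = next_row, next_col
    return table


for line in magic_square(3):
    print(line)
# [8, 1, 6]
# [3, 5, 7]
# [4, 9, 2]
```

Every row, column, and diagonal of the 3 x 3 result sums to 15, matching the formula above.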

section .data
    fs_usage:         db "Call with single, positive, odd number", 10, 0
    fs_malloc_failed: db "A call to malloc() failed", 10, 0
    fs_long:          db "%*ld", 0
    fs_newline:       db 10, 0

section .bss
    argv:   resq 1
    n:      resq 1
    n2:     resq 1
    a:      resq 1
    b:      resq 1
    table:  resq 1
    width:  resq 1

extern printf, atoi, calloc
global main

section .text
main:
    enter 0, 0
    finit                       ; initialize the x87 FPU and its internal registers to default values;
                                ; the tag word is set to FFFFH, which marks all data registers as empty
    mov qword [argv], rsi
    cmp rdi, 2                  ; argc must be 2
    jne .error
    mov rdi, qword [argv]
    mov rdi, qword [rdi + 8*1]  ; argv[1]
    call atoi
    movsxd rax, eax             ; atoi returns a 32-bit int; sign-extend it to 64 bits
    cmp rax, 2
    jle .error                  ; reject n <= 2
    test rax, 1                 ; sets the flags from (rax AND 1) without modifying rax;
    jz .error                   ; "and rax, 1" would give the same flags but change rax. ZF set means n is even

… … … …
    mov qword [n], rax
    mov rdi, rax
    mov rsi, 8
    call calloc                 ; allocate the array of n row pointers
    cmp rax, 0
    je .malloc_failed
    mov qword [table], rax
    mov rdx, rax                ; rdx = address of the current row-pointer slot
    mov rax, 0                  ; rax = row index
    mov rbx, qword [n]
.allocate_table:
    cmp rax, rbx                ; check whether the end of the table was reached
    je .fill_table              ; if so, finish allocation and start filling the table
    mov rdi, rbx
    mov rsi, 8                  ; gdb shows this line as "mov esi, 8"
    push rax
    push rbx
    push rdx
    sub rsp, 8                  ; three pushes misalign rsp; keep it 16-byte aligned for the call
                                ; (assuming rsp was aligned before the pushes)
    call calloc                 ; allocate a single row of the table
    add rsp, 8
    pop rdx
    mov qword [rdx], rax
    pop rbx
    pop rax
    add rdx, 8
    add rax, 1
    jmp .allocate_table
… … … …

.fill_table:
    mov rbx, 0                      ; a = 0 (row index)
    mov r9, qword [n]               ; n
    mov rcx, r9
    shr rcx, 1                      ; b = n / 2 (column index)
    mov r8, 1                       ; i = next number to place
    mov rax, r9
    cdq
    mul rax
    mov r10, rax                    ; n^2
.fill_table_loop:
    cmp r8, r10
    jg .fill_table_done             ; stop once i > n^2
    mov rdi, qword [table]          ; rdi = pointer to table
    mov rdi, qword [rdi + 8 * rbx]  ; rdi = pointer to row[rbx] of the table
    mov qword [rdi + 8 * rcx], r8   ; table[a][b] = i
    inc r8                          ; r8 = 1,2,3,...
    lea rax, [rbx + r9 - 1]
    cdq                             ; clears rdx for div (eax is non-negative here)
    div r9
    mov rbx, rdx                    ; a = (a + n - 1) mod n: one row up
    lea rax, [rcx + 1]
    cdq
    div r9
    mov rcx, rdx                    ; b = (b + 1) mod n: one column to the right
    mov rdi, qword [table]
    mov rdi, qword [rdi + 8 * rbx]
    cmp qword [rdi + 8 * rcx], 0    ; is the target square still empty?
    je .fill_table_loop
    lea rax, [rbx + 2]
    cdq
    div r9
    mov rbx, rdx                    ; a = (a + 2) mod n
    lea rax, [rcx + r9 - 1]
    cdq
    div r9
    mov rcx, rdx                    ; b = (b + n - 1) mod n: directly below the previous number
    jmp .fill_table_loop
.fill_table_done:                   ; the table is complete

    fild qword [n]      ; FILD (load integer) converts an integer operand in memory into double
                        ; extended-precision floating-point format and pushes it onto the stack: st0 = n
    fld st0             ; FLD with a register operand pushes a copy of it: st0 = n, st1 = n
    fmulp               ; multiply ST(1) by ST(0), store the result in ST(1), and pop: st0 = n^2
    fxtract             ; extract exponent and significand (base 2): significand in ST(0), exponent in ST(1)
    fld1                ; load +1.0 into ST(0)
    fxch                ; with no operand, exchange ST(0) and ST(1): st0 = significand, st1 = 1.0
    fyl2x               ; FYL2X computes y * log2(x): replace ST(1) with ST(1) * log2(ST(0)) and pop;
                        ; st0 = log2(significand)
    faddp               ; add ST(0) to ST(1), store the result in ST(1), and pop;
                        ; st0 = exponent + log2(significand) = log2(n^2)
    fldl2t              ; push log2(10) onto the FPU register stack
    fdivp               ; divide ST(1) by ST(0), store the result in ST(1), and pop;
                        ; st0 = log2(n^2) / log2(10) = log10(n^2), because we want log10(x), not log2(x)

    jmp .continue_voodoo
.voodoo: dq 1.5         ; constant added to st0 before it is rounded and stored at the label width
.continue_voodoo:
    fld qword [.voodoo]
    faddp               ; add ST(0) to ST(1), store the result in ST(1), and pop: st0 = log10(n^2) + 1.5
    fistp qword [width] ; store ST(0) to memory as a 64-bit integer and pop the register stack;
                        ; the conversion rounds to the nearest integer (default rounding mode),
                        ; so width = number of digits of n^2, plus one column of spacing
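The arithmetic of the width computation can be replayed step by step in Python. The sketch below mirrors the FPU sequence under some assumptions: `fxtract` and `field_width` are hypothetical helper names, `math.frexp` is rescaled so that the significand lies in [1, 2) as FXTRACT produces it, and Python's `round` stands in for FISTP in the default round-to-nearest mode.

```python
import math


def fxtract(x):
    """Return (significand, exponent) with 1 <= significand < 2, like FXTRACT."""
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent, 0.5 <= mantissa < 1
    return mantissa * 2.0, exponent - 1


def field_width(n):
    """Printing width for the entries of an n x n magic square."""
    square = float(n) * float(n)                      # fild, fld st0, fmulp
    significand, exponent = fxtract(square)           # fxtract
    log2_square = exponent + math.log2(significand)   # fld1, fxch, fyl2x, faddp
    log10_square = log2_square / math.log2(10)        # fldl2t, fdivp
    return round(log10_square + 1.5)                  # fld 1.5, faddp, fistp


for n in (3, 5, 11):
    print(n, n * n, field_width(n))
# 3 9 2
# 5 25 3
# 11 121 4
```

In each case the result is the number of decimal digits of n² plus one, which leaves one blank column between neighbouring entries.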

;;; PRINT THE MAGIC SQUARE
    mov rbx, 0                      ; row index
.outer_loop:
    cmp rbx, qword [n]
    je .end
    mov rcx, 0                      ; column index
.inner_loop:
    cmp rcx, qword [n]
    je .end_inner_loop
    mov rdi, fs_long                ; format string "%*ld"
    mov rsi, qword [width]          ; field width consumed by the '*'
    mov rdx, qword [table]
    mov rdx, qword [rdx + 8 * rbx]
    mov rdx, qword [rdx + 8 * rcx]  ; value at table[rbx][rcx]
    mov rax, 0                      ; no vector registers are used by this variadic call
    push rbx
    push rcx
    call printf
    pop rcx
    pop rbx
    inc rcx
    jmp .inner_loop
.end_inner_loop:
    mov rdi, fs_newline
    mov rax, 0
    call printf                     ; end the current row
    inc rbx
    jmp .outer_loop
.error:
    mov rdi, fs_usage
    mov rax, 0
    call printf
    jmp .end
.malloc_failed:
    mov rdi, fs_malloc_failed
    mov rax, 0
    call printf
.end:
    leave
    ret
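The effect of the "%*ld" format, where the field width is passed as an argument, can be previewed in Python. The function name `format_table` is hypothetical; the nested format specifier plays the role of the '*' in the C format string, right-aligning each entry in `width` columns.

```python
def format_table(table, width):
    """Render the table with every entry right-aligned in `width` columns."""
    lines = []
    for row in table:
        lines.append("".join(f"{value:{width}d}" for value in row))
    return "\n".join(lines)


print(format_table([[8, 1, 6], [3, 5, 7], [4, 9, 2]], 2))
#  8 1 6
#  3 5 7
#  4 9 2
```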
