With a focus on floating point. For floating point (i.e., real numbers), MASM supports real4: single precision; IEEE standard; analogous to float.


1 with a focus on floating point

2 For floating point (i.e., real numbers), MASM supports:
  • real4: single precision; IEEE standard; analogous to float
  • real8: double precision; IEEE standard; analogous to double
  • real10: double extended precision (the 80-bit x87 format)
      • Not one of the IEEE basic formats
  • NaN = Not a Number (see p. 4-14 of v1)
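As a minimal sketch of how the three types appear in a data segment (the variable names are hypothetical, chosen only for illustration), each directive reserves 4, 8, or 10 bytes respectively:

```asm
.686
.xmm
.model flat, stdcall

.data
; Hypothetical variable names; only the directives come from the slide.
f32   real4  1.5        ; 4 bytes, single precision (like C float)
f64   real8  1.5        ; 8 bytes, double precision (like C double)
f80   real10 1.5        ; 10 bytes, x87 double extended precision
spare real8  ?          ; uninitialized double

end
```

Note that MASM expects a decimal point in a real initializer (1.5, 2.0), not a bare integer.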

3 • SSE2 supports 32- and 64-bit f.p. data
  • x87 supports 32-, 64-, and 80-bit f.p. data

4

5 Note: These are 24-bit binary numbers. Here they are in base 10: 2.00000000000000 and 1.99999988079071.
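The two decimal values are consistent with the following reading, offered as an assumption because the binary figures themselves are not part of this transcript: the smaller number is the largest 24-bit significand (a leading 1 followed by 23 fraction bits, all ones), and the larger is the next representable value.

```latex
\[
  1.\underbrace{11\cdots1}_{23\ \text{ones}}{}_2
    \;=\; 2 - 2^{-23}
    \;=\; 2 - 0.00000011920929\ldots
    \;\approx\; 1.99999988079071,
  \qquad
  10.\underbrace{00\cdots0}_{22\ \text{zeros}}{}_2 \;=\; 2 .
\]
```

The 24 bits match the significand width of single precision (real4): 23 stored fraction bits plus the implicit leading 1.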

6

7 • SSE2 = Streaming SIMD Extensions 2
  • SIMD = Single Instruction, Multiple Data
  • SSE2 was introduced in 2000 on Pentium 4 and Intel Xeon processors.

8 • 1996 Intel MMX
  • 1998 AMD 3DNow!
  • 1999 Intel SSE on P3
  • 2000 Intel SSE2 on P4
  • 2003 Intel SSE3 (since Prescott P4)
  • 2006 Intel Supplemental SSE3 (since Woodcrest Xeons)
  • 2006 Intel SSE4 (4.1 and 4.2)
  • 2007 AMD SSE5 (proposed 2007, implemented 2011)
  • 2008 Intel AVX (proposed 2008, implemented 2011 in Intel Sandy Bridge and AMD Bulldozer)
      • With AVX, the XMM registers go from 128 bits to 256 bits; the wider registers are called YMM.

9 1. You must use MASM v6.15 or newer for SIMD support. (MASM v6.15 is available from the course software web page.)
  2. You must enable MASM support for these instructions with the following:
       .686                    ;instructions for Pentium Pro (or better)
       .xmm                    ;allow simd instructions
       .model flat, stdcall    ;no crazy segments!
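Put together, a complete source file using these directives might look like the sketch below. The variable names and the bare ret at the end are assumptions made for brevity; a course program would normally finish with whatever exit sequence the course prescribes.

```asm
.686                        ; instructions for Pentium Pro (or better)
.xmm                        ; allow SIMD instructions
.model flat, stdcall        ; 32-bit flat memory model

.data
source real8 3.75           ; hypothetical test value
copy   real8 ?

.code
main proc
    movsd xmm0, source      ; xmm0[63-0] <- 3.75
    movsd copy, xmm0        ; copy <- xmm0[63-0]
    ret                     ; placeholder exit (assumption)
main endp
end main
```

Without the .xmm directive the assembler rejects the XMM register names, so the directive has to precede the first SSE2 instruction.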

10 Each one of the 8 128-bit registers (xmm0...xmm7) can hold:
  • 16 packed 1-byte integers
  • 8 packed word (2-byte) integers
  • 4 packed doubleword (4-byte) integers
  • 2 packed quadword (8-byte) integers
  • 1 double quadword (16 bytes)
  • 4 packed single precision (4 bytes each) floating point values
  • 2 packed double precision (8 bytes each) floating point values

11

12

13

14

18 IA32 Registers:
  • 8 32-bit GPRs
      • Integer only
  • 8 80-bit fp regs
      • Floating point only
  • 8 64-bit mmx regs
      • Integer only
      • Re-uses fp regs
  • 8 128-bit xmm regs
      • Integer and fp
      • These will be the focus of our discussion.

19

20 XMM register formats

21 The utilities.asm MASM code (on the course’s software web page) contains a function, dumpXmm64, that you can call to display (dump) the contents of the 8 xmm registers as pairs of 64-bit double precision fp values:
     call dumpXmm64

22

23 1. Data movement 2. Arithmetic 3. Comparison 4. Conversion

24 1. Data movement 2. Arithmetic 3. Comparison 4. Conversion

25 • movhpd: Move High Packed Double-Precision Floating-Point Value
  • movlpd: Move Low Packed Double-Precision Floating-Point Value
  • movsd: Move Scalar Double-Precision Floating-Point Value

26 movhpd - Move High Packed Double-Precision Floating-Point Value
  • for memory to XMM move:
      • DEST[127-64] ← SRC; DEST[63-0] unchanged
      • Ex. movhpd xmm0, m64
  • for XMM to memory move:
      • DEST ← SRC[127-64]
      • Ex. movhpd m64, xmm2

27 movlpd - Move Low Packed Double-Precision Floating-Point Value
  • for memory to XMM move:
      • DEST[127-64] unchanged; DEST[63-0] ← SRC
      • Ex. movlpd xmm1, m64
  • for XMM to memory move:
      • DEST ← SRC[63-0]
      • Ex. movlpd m64, xmm2

28 movsd - Move Scalar Double-Precision Floating-Point Value
  1. when source and destination operands are both XMM registers:
      • DEST[127-64] remains unchanged; DEST[63-0] ← SRC[63-0]
      • Ex. movsd xmm1, xmm3
  2. when source operand is XMM register and destination operand is memory location:
      • DEST ← SRC[63-0]
      • Ex. movsd m64, xmm2
  3. when source operand is memory location and destination operand is XMM register:
      • DEST[127-64] ← 0000000000000000H; DEST[63-0] ← SRC
      • Ex. movsd xmm1, m64
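The three move instructions can be combined to fill and then unpack both halves of a register. The following is a sketch under assumptions: the variable names and values are hypothetical, and the bare ret stands in for the course's usual exit code.

```asm
.686
.xmm
.model flat, stdcall

.data
lowVal  real8 1.0           ; hypothetical values
highVal real8 2.0
outLow  real8 ?
outHigh real8 ?

.code
main proc
    movsd  xmm0, lowVal     ; xmm0[63-0] <- 1.0, xmm0[127-64] <- 0
    movhpd xmm0, highVal    ; xmm0[127-64] <- 2.0, low half unchanged
    movsd  xmm1, xmm0       ; xmm1[63-0] <- 1.0, xmm1[127-64] unchanged
    movlpd outLow, xmm0     ; outLow  <- xmm0[63-0]   = 1.0
    movhpd outHigh, xmm0    ; outHigh <- xmm0[127-64] = 2.0
    ret                     ; placeholder exit (assumption)
main endp
end main
```

The difference between the memory-source forms matters here: movsd clears the high half, while movlpd and movhpd leave the other half alone.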

29 1. Data movement 2. Arithmetic (scalar) 3. Comparison 4. Conversion

30 • addsd - Add Scalar Double-Precision Floating-Point Values
  • subsd - Subtract Scalar Double-Precision Floating-Point Values
  • mulsd - Multiply Scalar Double-Precision Floating-Point Values
  • divsd - Divide Scalar Double-Precision Floating-Point Values
  • Also sqrtsd, but there are no sin or cos SSE2 instructions! We have to use the x87 instructions for those!
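Because the x87 unit cannot read XMM registers directly, a value has to travel through memory to reach fsin. One way this might look is sketched below; the variable names and the input angle are hypothetical.

```asm
.686
.xmm
.model flat, stdcall

.data
angle  real8 0.5            ; radians; hypothetical input
temp   real8 ?              ; scratch slot shared by SSE2 and x87
sine   real8 ?

.code
main proc
    movsd xmm0, angle       ; the value starts out in an XMM register
    movsd temp, xmm0        ; spill it to memory for the x87 unit
    fld   temp              ; st(0) <- angle
    fsin                    ; st(0) <- sin(st(0))
    fstp  temp              ; temp <- st(0), pop the x87 stack
    movsd xmm0, temp        ; bring the result back into XMM
    movsd sine, xmm0
    ret                     ; placeholder exit (assumption)
main endp
end main
```

fcos follows the same pattern with the one instruction swapped.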

31 addsd
  • DEST[63-0] ← DEST[63-0] + SRC[63-0]
  • DEST[127-64] remains unchanged

32 subsd
  • DEST[63-0] ← DEST[63-0] − SRC[63-0]
  • DEST[127-64] remains unchanged

33 mulsd
  • DEST[63-0] ← DEST[63-0] * SRC[63-0]
  • DEST[127-64] remains unchanged

34 divsd
  • DEST[63-0] ← DEST[63-0] / SRC[63-0]
  • DEST[127-64] remains unchanged
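A short chain of scalar operations shows how each instruction accumulates into the low half of the destination. This is a sketch with hypothetical variable names; the values are chosen so every intermediate result is exact.

```asm
.686
.xmm
.model flat, stdcall

.data
x      real8 6.0            ; hypothetical values
y      real8 2.0
one    real8 1.0
four   real8 4.0
answer real8 ?

.code
main proc
    movsd xmm0, x           ; xmm0[63-0] <- 6.0
    addsd xmm0, y           ; 6.0 + 2.0 = 8.0
    subsd xmm0, one         ; 8.0 - 1.0 = 7.0
    mulsd xmm0, y           ; 7.0 * 2.0 = 14.0
    divsd xmm0, four        ; 14.0 / 4.0 = 3.5
    movsd answer, xmm0      ; answer <- 3.5
    ret                     ; placeholder exit (assumption)
main endp
end main
```

Only bits 63-0 of xmm0 take part; whatever sits in bits 127-64 is carried along untouched.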

35 1. Data movement 2. Arithmetic (packed) 3. Comparison 4. Conversion

36 • addpd - Add Packed Double-Precision Floating-Point Values
  • subpd - Subtract Packed Double-Precision Floating-Point Values
  • mulpd - Multiply Packed Double-Precision Floating-Point Values
  • divpd - Divide Packed Double-Precision Floating-Point Values

37 addpd - Add Packed Double-Precision Floating-Point Values
  • DEST[63-0] ← DEST[63-0] + SRC[63-0]
  • DEST[127-64] ← DEST[127-64] + SRC[127-64]

38 subpd - Subtract Packed Double-Precision Floating-Point Values
  • DEST[63-0] ← DEST[63-0] − SRC[63-0]
  • DEST[127-64] ← DEST[127-64] − SRC[127-64]

39 mulpd - Multiply Packed Double-Precision Floating-Point Values
  • DEST[63-0] ← DEST[63-0] * SRC[63-0]
  • DEST[127-64] ← DEST[127-64] * SRC[127-64]

40 divpd - Divide Packed Double-Precision Floating-Point Values
  • DEST[63-0] ← DEST[63-0] / SRC[63-0]
  • DEST[127-64] ← DEST[127-64] / SRC[127-64]
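The packed forms do two of these operations at once. In the sketch below (variable names and values are hypothetical), each register is filled with movlpd and movhpd and the arithmetic is done register to register; a memory source operand for addpd or mulpd would have to be 16-byte aligned, which this version avoids needing.

```asm
.686
.xmm
.model flat, stdcall

.data
a0      real8 1.0           ; hypothetical values
a1      real8 2.0
b0      real8 10.0
b1      real8 20.0
outLow  real8 ?
outHigh real8 ?

.code
main proc
    movlpd xmm0, a0         ; xmm0[63-0]   <- 1.0
    movhpd xmm0, a1         ; xmm0[127-64] <- 2.0
    movlpd xmm1, b0         ; xmm1[63-0]   <- 10.0
    movhpd xmm1, b1         ; xmm1[127-64] <- 20.0
    addpd  xmm0, xmm1       ; low: 1+10 = 11,   high: 2+20 = 22
    mulpd  xmm0, xmm1       ; low: 11*10 = 110, high: 22*20 = 440
    movlpd outLow, xmm0     ; outLow  <- 110.0
    movhpd outHigh, xmm0    ; outHigh <- 440.0
    ret                     ; placeholder exit (assumption)
main endp
end main
```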

41 1. Data movement 2. Arithmetic 3. Comparison 4. Conversion

42 comisd
  • Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS
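comisd reports its result through ZF, PF, and CF, so the unsigned conditional jumps (ja, jb, je) are the ones to use afterwards, and jp detects the unordered case (a NaN operand). A possible three-way compare is sketched here; the labels, variable names, and result codes are all hypothetical.

```asm
.686
.xmm
.model flat, stdcall

.data
valA    real8 1.5           ; hypothetical values
valB    real8 2.5
outcome sdword ?            ; 1: A > B, 0: A = B, -1: A < B, 2: unordered

.code
main proc
    movsd  xmm0, valA
    comisd xmm0, valB       ; compare xmm0[63-0] with valB, set ZF/PF/CF
    jp     isNaN            ; PF = 1: unordered (test this first,
                            ;   because unordered also sets ZF and CF)
    ja     isGreater        ; CF = 0 and ZF = 0
    jb     isLess           ; CF = 1
    mov    outcome, 0       ; otherwise ZF = 1: equal
    jmp    finish
isGreater:
    mov    outcome, 1
    jmp    finish
isLess:
    mov    outcome, -1
    jmp    finish
isNaN:
    mov    outcome, 2
finish:
    ret                     ; placeholder exit (assumption)
main endp
end main
```

With the values shown, the jb branch is taken and outcome ends up as -1.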

43 1. Data movement 2. Arithmetic 3. Comparison 4. Conversion

44 • cvtsd2si: Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
  • cvtsi2sd: Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value

45 cvtsd2si - Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
  • DEST[31-0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63-0])

46 cvtsi2sd - Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
  • DEST[63-0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31-0])
  • DEST[127-64] remains unchanged
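A round trip through both conversions might look like the sketch below. The variable names are hypothetical, and the rounding comment assumes the MXCSR rounding mode is still at its default (round to nearest, ties to even).

```asm
.686
.xmm
.model flat, stdcall

.data
dblIn  real8 2.5            ; hypothetical values
intIn  sdword 7
intOut sdword ?
dblOut real8 ?

.code
main proc
    movsd    xmm0, dblIn
    cvtsd2si eax, xmm0      ; eax <- 2 (2.5 rounds to the even neighbor
                            ;   under the default rounding mode)
    mov      intOut, eax

    mov      eax, intIn
    cvtsi2sd xmm1, eax      ; xmm1[63-0] <- 7.0, xmm1[127-64] unchanged
    movsd    dblOut, xmm1
    ret                     ; placeholder exit (assumption)
main endp
end main
```

The result of cvtsd2si is not a truncation: with the default mode, 2.5 becomes 2 and 3.5 becomes 4.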

47

