Machine-Level Programming IV: Machine Data 9th Lecture Instructor: David Trammell Slides modified from those provided by textbook authors Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective (3rd Edition)
Today Arrays Structures Floating Point One-dimensional Multi-dimensional (nested) Multi-level Structures Allocation Access Alignment Floating Point
Machine Memory is a Byte Array How do we allocate arrays of multi-byte types? Given array of data type T and length L We contiguously allocate a region of L * sizeof(T) bytes char s[12]; x x + 12 int arr[5]; x x + 4 x + 8 x + 12 x + 16 x + 20 double data[3]; x + 24 x x + 8 x + 16 char *cpa[3]; x x + 8 x + 16 x + 24
Array Access in Byte Addressable Memory Basic Principle: T A[L] Array of data type T and length L Identifier A can be used as a pointer of type (T *) to array element 0 Reference Type Value arr[4] int 3 //element 4 arr int * x //pointer to arr arr+1 int * x + 4 //pointer to arr[1] &arr[2] int * x + 8 //pointer to arr[2] arr[5] int (garbage) //(out of bounds) *(arr+1) int 5 //the value at M[x+4] &arr[i] int * ? arr + i int * ? int arr[5]; 1 5 2 3 x x + 4 x + 8 x + 12 x + 16 x + 20
Array Access in Byte Addressable Memory Basic Principle: T A[L] Array of data type T and length L (let type T have a size of K bytes) Identifier A can be used as a pointer of type (T *) to array element 0 Reference Type Value arr[4] int 3 //element 4 arr int * x //pointer to arr arr+1 int * x + 4 //pointer to arr[1] &arr[2] int * x + 8 //pointer to arr[2] arr[5] int (garbage) //(out of bounds) *(arr+1) int 5 //the value at M[x+4] &arr[i] int * x + 4i //the value at M[x+4i] arr + i int * x + K*i // int arr[5]; 1 5 2 3 x x + 4 x + 8 x + 12 x + 16 x + 20
Array Example Array elements are guaranteed to be contiguous in memory #define M 5 int a[M] = { 1, 5, 2, 1, 3 }; int b[M] = { 0, 2, 1, 3, 9 }; int c[M] = { 9, 4, 7, 2, 0 }; int a[] 1 5 2 3 16 20 24 28 32 36 int b[] 2 1 3 9 36 40 44 48 52 56 int c[] 9 4 7 2 56 60 64 68 72 76 Array elements are guaranteed to be contiguous in memory Separately declared arrays a, b and c might NOT be contiguous (although they are in this example)
Array Accessing Example int a[] 1 5 2 3 16 20 24 28 32 36 int get_digit (int *z, int digit) { return z[digit]; } Register %rdi contains starting address of array Register %rsi contains array index Desired digit at %rdi + 4*%rsi Use memory reference (%rdi,%rsi,4) X86-64 # %rdi = z (1st arg) # %rsi = digit (2nd arg) movl (%rdi,%rsi,4),%eax
Array Loop Example void inc_z(int *z) { size_t i; for (i = 0; i < 5; i++) z[i]++; } # %rdi = z (1st arg) movl $0, %eax # i = 0 jmp .L3 # goto .L3 .L4: # .L4 (loop start) addl $1, (%rdi,%rax,4) # z[i]++ addq $1, %rax # i++ .L3: # .L3 (loop test) cmpq $4, %rax # cmp i to 4 jbe .L4 # if i <= 4 goto .L4 rep; ret
Multidimensional (Nested) Arrays Declaration: T A[R][C]; 2D array of data type T R rows, C columns Type T element requires K bytes Array Size is: R * C * K bytes Arrangement in memory Row-Major Ordering (that is, all of row 0, then all of row 1 … etc.) int A[R][C]; Board: show 3D example: a[2][3][2] to illustrate the idea of row major as enumerating indices from right to left … A[0][0] A[0][C-1] A[1][0] A[1][C-1] A[R-1][0] A[R-1][C-1] 4*R*C Bytes
Nested Array Example “Row-Major” ordering in memory int MAT[4][5] = {{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3 }, {1, 5, 2, 1, 7 }, {1, 5, 2, 2, 1 }}; 1 5 2 6 1 5 2 3 1 5 2 7 1 5 2 MAT[4][5] 76 96 116 136 156 MAT[] is an array of 4 elements (contiguously allocated) Each element in MAT[] is an array of 5 ints (allocated contiguously) “Row-Major” ordering in memory
Nested Array Row Access Address to the start of Each Row If A[i] is an array of C elements … Starting address A + i * (C * K) 1 5 2 6 1 5 2 3 1 5 2 7 1 5 2 76 96 116 136 156
Nested Array, Row Access: MAT[4][5] 1 5 2 6 3 7 int *get_row(int index, int MAT[][]){ return MAT[index]; } # %rdi = index # %rsi = MAT //pointer to start start of matrix leaq (%rdi,%rdi,4),%rax #index + index*4 == 5*index leaq (%rsi,%rax,4),%rax # MAT + (20 * index) Row Vector MAT[index] is an array of 5 int’s Starting address: MAT +(20*index) Machine Code 1st leaq is for math, 2nd computes an address. How can we tell? Address is: MAT + 4*(5*index) == MAT + 20*index
Nested Array Element Access Array Elements A[i][j] is element of type T, which uses K bytes &A[i][j] is: A + i * (C * K) + j * K == A + K*( C*i + j )
Nested Array Element Access Array Elements A[i][j] is element of type T, which uses K bytes &A[i][j] is: A + i * (C * K) + j * K == A + K*( C*i + j ) MAT 1 5 2 6 3 7 int get_elem(int index, int digit, int MAT[][]){ return MAT[index][digit]; } # %rdi = index # %rsi = digit # %rdx = MAT //pointer to start start of matrix leaq (%rdi,%rdi,4), %rax # rax = 5*index addl %rax, %rsi # rsi = 5*index + digit movl (%rdx,%rsi,4), %eax # Mem[MAT+4*(5*index+digit)]
Multi-Level Array Example Variable ap denotes array of 3 elements Each element is a pointer 8 bytes Each pointer points to array of int’s int a[5] = { 1, 5, 2, 1, 3 }; int b[5] = { 0, 2, 1, 3, 9 }; int c[5] = { 9, 4, 7, 2, 0 }; int *ap[3] = {b, a, c}; //pointers! 36 160 16 56 168 176 ap a[] b[] c[] 1 5 2 3 20 24 28 32 9 40 44 48 52 4 7 60 64 68 72 76
Element Access in Multi-Level Array int get_ap_digit (size_t index, size_t digit, int ap[][]) { return ap[index][digit]; } # salq $2, %rsi # 4*digit addq (%rdx,%rdi,8),%rsi # rsi = ap[index] + 4*digit movl (%rsi), %eax # return *p ret Computation Element access Mem[Mem[ap+8*index]+4*digit] Must do two memory reads First get pointer to row array Then access element within array
Array Element Accesses Nested array Multi-level array int get_MAT_digit (size_t index, size_t digit, int MAT[][]) { return MAT[index][digit]; } int get_ap_digit (size_t index, size_t digit, int *ap[]) { return ap[index][digit]; } Accesses looks similar in C, but address computations very different: Mem[MAT+20*index+4*digit] Mem[Mem[ap+8*index]+4*digit]
Next Arrays Structures Unions Floating Point One-dimensional Multi-dimensional (nested) Multi-level Structures Allocation Access Alignment Unions Floating Point
Structure Representation next 16 24 32 struct rec { int a[4]; size_t i; struct rec *next; }; a Structure represented as block of memory Big enough to hold all of the fields Fields ordered according to declaration Even if another ordering could yield a more compact representation Compiler determines overall size + positions of fields
Generating Pointer to Structure Member next 16 24 32 r+4*idx struct rec { int a[4]; size_t i; struct rec *next; }; a int *get_ap (struct rec *r, size_t idx) { return &r->a[idx]; } Generating Pointer to Array Element Offset of each structure member determined at compile time Compute as r + 4*idx # r in %rdi # idx in %rsi leaq (%rdi,%rsi,4), %rax ret
Structures & Alignment Unaligned Data Aligned Data Primitive data type requires K bytes Address must be multiple of K struct S1 { char c; int i[2]; double v; } *p; c i[0] i[1] v p p+1 p+5 p+9 p+17 c 3 bytes i[0] i[1] 4 bytes v p+0 p+4 p+8 p+16 p+24 Multiple of 4 Multiple of 8 Multiple of 8 Multiple of 8
Alignment Principles Aligned Data Why Align? If a primitive data type requires K bytes Then address must be multiple of K This is required on some machines X86-64 doesn’t require it, but it is advised Why Align? Memory is accessed in chunks for efficiency Chunks are aligned on 4 or 8 byte boundaries Inefficient to access datum that spans boundaries Compiler puts gaps in structure to ensure alignment
Specific Cases of Alignment (x86-64) 1 byte: char, … no restrictions on address 2 bytes: short, … lowest 1 bit of address must be 02 4 bytes: int, float, … lowest 2 bits of address must be 002 8 bytes: double, long, pointers, … lowest 3 bits of address must be 0002 16 bytes: long double (GCC on Linux) lowest 4 bits of address must be 00002 (or lowest hex digit since a hex digit is 4 bits)
Satisfying Alignment with Structures Within structure: Must satisfy each element’s alignment requirement Overall structure placement Each structure has alignment requirement K K = Largest alignment of any element Initial address & structure length must be multiples of K Example: K = 8, due to double element struct S1 { char c; int i[2]; double v; } *p; c 3 bytes i[0] i[1] 4 bytes v p+0 p+4 p+8 p+16 p+24 Multiple of 4 Multiple of 8 Multiple of 8 Multiple of 8
Meeting Overall Alignment Requirement For largest alignment requirement K Overall structure must be multiple of K struct S2 { double v; int i[2]; char c; } *p; v i[0] i[1] c 7 bytes p+0 p+8 p+16 p+24 Multiple of K=8
Arrays of Structures Overall structure length multiple of K struct S2 { double v; int i[2]; char c; } a[10]; Overall structure length multiple of K Satisfy alignment requirement for every element a[0] a[1] a[2] • • • a+0 a+24 a+48 a+72 v i[0] i[1] c 7 bytes a+24 a+32 a+40 a+48
Saving Space in a Structure Always put largest data types first Effect (K=4) struct S4 { char c; int i; char d; } *p; struct S5 { int i; char c; char d; } *p; c 3 bytes i d 3 bytes i c d 2 bytes
Union Allocation Allocate according to largest element union U1 { char c; int i[2]; double v; } *up; c i[0] i[1] v up+0 up+4 up+8 struct S1 { char c; int i[2]; double v; } *sp; c 3 bytes i[0] i[1] 4 bytes v sp+0 sp+4 sp+8 sp+16 sp+24
Using Union to Determine Endianness typedef union { char c; int i; } char_int_u; c 3 bytes i 0 4 int isLittleEndian() { charint_u test; test.i = 1; return test.c; }
Using Union to Access Bit Patterns union uf_union { unsigned u; float f; }; u f 4 union uf_union uf; unsigned s, exp, frac; uf.f = -1.5; s = uf.u >> 31; exp = (uf.u >> 23) & 0xFF; frac = uf.u & 0x007FFFFF; printf("Float %f\n", uf.f); printf("sign: 0x%X\n", s); printf("exp: 0x%X\n", exp); printf("frac: 0x%X\n", frac); Not the same as casting (unsigned)uf.f !
Byte Ordering Example x86 x86_64 union { unsigned char c[8]; unsigned short s[4]; unsigned int i[2]; unsigned long l[1]; } example; x86 c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] s[0] s[1] s[2] s[3] i[0] i[1] l[0] x86_64 c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] s[0] s[1] s[2] s[3] i[0] i[1] l[0]
Byte Ordering on IA32 Little Endian Output on IA32 (x86_32): f0 f1 f2 c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] s[0] s[1] s[2] s[3] i[0] i[1] l[0] LSB MSB LSB MSB Print Output on IA32 (x86_32): Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7] Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6] Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4] Long 0 == [0xf3f2f1f0]
Byte Ordering on Sun SPARC Big Endian f0 f1 f2 f3 f4 f5 f6 f7 c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] s[0] s[1] s[2] s[3] i[0] i[1] l[0] MSB LSB MSB LSB Print Output on Sun: Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7] Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7] Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7] Long 0 == [0xf0f1f2f3]
Byte Ordering on x86-64 Little Endian Output on x86-64: f0 f1 f2 f3 f4 c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] s[0] s[1] s[2] s[3] i[0] i[1] l[0] LSB MSB Print Output on x86-64: Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7] Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6] Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4] Long 0 == [0xf7f6f5f4f3f2f1f0]
Last … Arrays Structures Floating Point in x86-64 One-dimensional Multi-dimensional (nested) Multi-level Structures Allocation Access Alignment Floating Point in x86-64
XMM Registers: 16 registers (each holding 16 bytes) Sixteen 8-bit integers Eight 16-bit integers Four 32-bit integers Four single-precision floats Two double-precision floats One single-precision float One double-precision float
Scalar or SIMD (single instructiom multiple data) + %xmm0 %xmm1 addss %xmm0,%xmm1 Scalar Operations: Single Precision SIMD Operations: Single Precision Scalar Operations: Double Precision + %xmm0 %xmm1 addps %xmm0,%xmm1 + %xmm0 %xmm1 addsd %xmm0,%xmm1
FP Basics Arguments passed in %xmm0, %xmm1, ... Result returned in %xmm0 All XMM registers caller-saved (no callee saved XMM regs) addss means add scalar single-precision addsd means add scalar double-precision float fadd(float x, float y) { return x + y; } double dadd(double x, double y) { return x + y; } # x in %xmm0, y in %xmm1 addss %xmm1, %xmm0 ret # x in %xmm0, y in %xmm1 addsd %xmm1, %xmm0 ret
FP Memory Referencing Integer (and pointer) arguments passed in regular registers FP values passed in XMM registers Different mov instructions to move between XMM registers, and between memory and XMM registers double dincr(double *p, double v) { double x = *p; *p = x + v; return x; } # p in %rdi, v in %xmm0 movapd %xmm0, %xmm1 # Copy v movsd (%rdi), %xmm0 # x = *p addsd %xmm0, %xmm1 # t = x + v movsd %xmm1, (%rdi) # *p = t ret
Other Aspects of FP Code Lots of instructions Floating-point comparisons Instructions ucomiss and ucomisd Set condition codes CF, ZF, and PF (parity flag) FP constants: stored in memory like a string Set XMM reg to 0.0 with xorpd %xmm0,%xmm0 addsd .LC0(%rip), %xmm0 addsd .LC1(%rip), %xmm0 … .LC0: .long 2920577761 .long 1079533895 .LC1: .long 1202590843 .long 1079597793
Summary Arrays Structures Floating Point Elements packed into contiguous region of memory Use index arithmetic to locate individual elements Structures Elements packed into single region of memory Access using offsets determined by compiler Padding may be required to to ensure alignment Floating Point Data held and operated on in XMM registers Similar to Integer machine code, but more complex instructions
Reading Read: 3.8 and 3.9 (pages 255 ) Skip or Skim 3.10 Skim 3.11 floating point This Thursday: mid-term exam review Mid-Term is Tuesday March 6th