Machine-Level Programming IV: Structured Data

Slides:



Advertisements
Similar presentations
Machine/Assembler Language Putting It All Together Noah Mendelsohn Tufts University Web:
Advertisements

Machine-Level Programming IV: Structured Data Feb 5, 2004 Topics Arrays Structs Unions class08.ppt “The course that gives CMU its Zip!”
University of Washington Data Structures! Arrays  One-dimensional  Multi-dimensional (nested)  Multi-level Structs  Alignment Unions 1.
Structured Data: Sept. 20, 2001 Topics Arrays Structs Unions class08.ppt “The course that gives CMU its Zip!”
Machine-Level Programming IV: Structured Data Apr. 23, 2008 Topics Arrays Structures Unions EECS213.
Machine-Level Programming IV: Structured Data Topics Arrays Structures Unions CS213.
Structured Data I: Homogenous Data Sept. 17, 1998 Topics Arrays –Single –Nested Pointers –Multilevel Arrays Optimized Array Code class08.ppt “The.
Structured Data II Heterogenous Data Sept. 22, 1998 Topics Structure Allocation Alignment Operating on Byte Strings Unions Byte Ordering Alpha Memory Organization.
Data Representation and Alignment Topics Simple static allocation and alignment of basic types and data structures Policies Mechanisms.
1 1 Machine-Level Programming IV: x86-64 Procedures, Data Andrew Case Slides adapted from Jinyang Li, Randy Bryant & Dave O’Hallaron.
Machine-Level Programming 6 Structured Data Topics Structs Unions.
Ithaca College 1 Machine-Level Programming VIII: Advanced Topics: alignment & Unions Comp 21000: Introduction to Computer Systems & Assembly Lang Systems.
Machine-Level Programming IV: Structured Data Topics Arrays Structs Unions CS 105 “Tour of the Black Holes of Computing”
Carnegie Mellon 1 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Machine-Level Programming IV: Data : Introduction.
Instructors: Greg Ganger, Greg Kesden, and Dave O’Hallaron
University of Amsterdam Computer Systems – Data in C Arnoud Visser 1 Computer Systems New to C?
Machine-level Programming IV: Data Structures Topics –Arrays –Structs –Unions.
University of Washington Today Lab 3 out  Buffer overflow! Finish-up data structures 1.
Carnegie Mellon 1 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Machine-Level Programming IV: Data Topics: Arrays.
Fabián E. Bustamante, Spring 2007 Machine-Level Prog. IV - Structured Data Today Arrays Structures Unions Next time Arrays.
Machine-Level Programming IV: Structured Data
Carnegie Mellon 1 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Machine-Level Programming IV: Data MCS284: Computer.
Machine-Level Programming IV: Structured Data Topics Arrays Structs Unions CS 105 “Tour of the Black Holes of Computing”
Carnegie Mellon 1 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Machine-Level Programming IV: Data Slides adapted.
Lecture 9 Aggregate Data Topics Arrays Structs Unions again Lab 02 comments Vim, gedit, February 22, 2016 CSCE 212 Computer Architecture.
Instructor: Fatma CORUT ERGİN
Machine-Level Programming V: Miscellaneous Topics
Machine-Level Programming IV: Data
Machine-Level Programming IV: Data
Arrays CSE 351 Spring 2017 Instructor: Ruth Anderson
C Tutorial (part 5) CS220, Spring 2012
Machine-Level Programming IV: Data
Machine-Level Programming IV: Structured Data Sept. 30, 2008
Referencing Examples Code Does Not Do Any Bounds Checking!
Machine-Level Programming IV: x86-64 Procedures, Data
Machine-Level Programming 5 Structured Data
Machine-Level Programming IV: Data CSCE 312
Machine-level Programming IV: Data Structures
Arrays CSE 351 Winter
Feb 2 Announcements Arrays/Structs in C Lab 2? HW2 released
Machine-Level Programming IV: Data September 8, 2008
Machine-Level Programming 6 Structured Data
Machine-Level Programming IV: Structured Data Sept. 19, 2002
Machine-Level Programming IV: Data
Roadmap C: Java: Assembly language: OS: Machine code: Computer system:
Instructor: Phil Gibbons
Roadmap C: Java: Assembly language: OS: Machine code: Computer system:
Instructors: Majd Sakr and Khaled Harras
Machine-Level Programming IV: Structured Data
Structured Data II Heterogenous Data Feb. 15, 2000
Machine-Level Programming IV: Data
Machine-Level Programming IV: Data /18-213/14-513/15-513: Introduction to Computer Systems 8th Lecture, September 20, 2018.
Instructors: Majd Sakr and Khaled Harras
Machine Level Representation of Programs (III)
Machine-Level Programming IV: Machine Data 9th Lecture
Machine-Level Programming VIII: Data Comp 21000: Introduction to Computer Systems & Assembly Lang Spring 2017 Systems book chapter 3* * Modified slides.
Arrays CSE 351 Winter 2018 Instructor: Mark Wyse Teaching Assistants:
Machine-Level Programming 5 Structured Data
Machine-Level Programming 5 Structured Data
Machine-Level Representation of Programs (x86-64)
Machine-Level Programming IV: Structured Data
Structured Data I: Homogenous Data Feb. 10, 2000
Instructor: Fatma CORUT ERGİN
Machine-Level Programming VIII: Data Comp 21000: Introduction to Computer Systems & Assembly Lang Spring 2017 Systems book chapter 3* * Modified slides.
Data Spring, 2019 Euiseong Seo
Machine-Level Programming VIII: Data Comp 21000: Introduction to Computer Systems & Assembly Lang Spring 2017 Systems book chapter 3* * Modified slides.
Machine-Level Programming IV: Structured Data
Machine-Level Programming IV: Data
Machine-Level Programming IV: Structured Data
Machine-Level Programming IV: Structured Data
Presentation transcript:

Machine-Level Programming IV: Structured Data CS 105 “Tour of the Black Holes of Computing” Machine-Level Programming IV: Structured Data Topics Arrays Structs Unions

Basic Data Types Integral Floating Point Stored & operated on in general registers Signed vs. unsigned depends on instructions used Intel GAS Bytes C byte b 1 [unsigned] char word w 2 [unsigned] short double word l 4 [unsigned] int quad word q 8 [unsigned] long Floating Point Stored & operated on in floating-point registers Single s 4 float Double l 8 double

Array Allocation Basic Principle T A[L]; Array of data type T and length L Contiguously allocated region of L * sizeof(T) bytes in memory char string[12]; x x + 12 int val[5]; x x + 4 x + 8 x + 12 x + 16 x + 20 double a[3]; x + 24 x x + 8 x + 16 char *p[3]; x x + 8 x + 16 x + 24

Array Access Basic Principle Reference Type Value T A[L]; Array of data type T and length L Identifier A can be used as a pointer to array element 0 Reference Type Value val[4] int 3 val int[5] x (acts like int *) val+1 int * x + 4 &val[2] int * x + 8 val[5] int ?? *(val+1) int 5 val + i int * x + 4 i 1 5 2 3 int val[5]; x x + 4 x + 8 x + 12 x + 16 x + 20

Array Example int cmu[5] = { 1, 5, 2, 1, 3 }; int mit[5] = { 0, 2, 1, 3, 9 }; int hmc[5] = { 9, 1, 7, 1, 1 }; int cmu[5]; 1 5 2 3 16 20 24 28 32 36 int mit[5]; 9 40 44 48 52 56 int hmc[5]; 7 60 64 68 72 76 Note: Example arrays were allocated in successive 20-byte blocks Not guaranteed to happen in general Here, [5] could be omitted because initializer implies size

Array Accessing Example int cmu[5]; 1 5 2 3 16 20 24 28 32 36 As argument, size of z doesn’t need to be specified Register %rdi contains starting address of array Register %rsi contains array index Desired digit at %rdi + 4*%rsi Use memory reference (%rdi,%rsi,4) int get_digit (int z[], int digit) { return z[digit]; } x86-64 # %rdi = z # %rsi = digit movl (%rdi,%rsi,4), %eax # z[digit]

Referencing Examples Code Does Not Do Any Bounds Checking! int cmu[5]; 1 5 2 3 16 20 24 28 32 36 int mit[5]; 9 40 44 48 52 56 int hmc[5]; 7 60 64 68 72 76 Code Does Not Do Any Bounds Checking! Reference Address Value Guaranteed? mit[3] 36 + 4* 3 = 48 3 mit[5] 36 + 4* 5 = 56 9 mit[-1] 36 + 4*-1 = 32 3 cmu[15] 16 + 4*15 = 76 ?? Out-of-range behavior implementation-dependent No guaranteed relative allocation of different arrays Yes No No No

Referencing Examples Code Does Not Do Any Bounds Checking! int cmu[5]; 1 5 2 3 16 20 24 28 32 36 int mit[5]; 9 40 44 48 52 56 int hmc[5]; 7 60 64 68 72 76 Code Does Not Do Any Bounds Checking! Reference Address Value Guaranteed? mit[3] 36 + 4* 3 = 48 3 mit[5] 36 + 4* 5 = 56 9 mit[-1] 36 + 4*-1 = 32 3 cmu[15] 16 + 4*15 = 76 ?? Out-of-range behavior implementation-dependent No guaranteed relative allocation of different arrays

Array Loop Example void zincr(int z[5]) { size_t i; for (i = 0; i < ZLEN; i++) z[i]++; } # %rdi = z movl $0, %eax # i = 0 jmp .L3 # goto middle .L4: # loop: addl $1, (%rdi,%rax,4) # z[i]++ addq $1, %rax # i++ .L3: # middle cmpq $4, %rax # i:4 jbe .L4 # if <=, goto loop rep; ret

Multidimensional (Nested) Arrays Declaration T A[R][C]; 2D array of data type T R rows, C columns Type T element requires K bytes Array Size R * C * K bytes Arrangement Row-Major Ordering A[0][0] A[0][C-1] A[R-1][0] • • • A[R-1][C-1] • Board: show 3D example: a[2][3][2] to illustrate the idea of row major as enumerating indices from right to left int A[R][C]; • • • A [0] [C-1] [1] [R-1] •  •  • 4*R*C Bytes

Nested Array Example #define PCOUNT 4 int pgh[PCOUNT][5] = {{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3 }, {1, 5, 2, 1, 7 }, {1, 5, 2, 2, 1 }}; 76 96 116 136 156 1 5 2 6 3 7 int pgh[4][5]; Variable pgh: array of 4 elements, allocated contiguously Each element is an array of 5 int’s, allocated contiguously “Row-Major” ordering of all elements in memory

Nested Array Row Access Row Vectors A[i] is array of C elements Each element of type T requires K bytes Starting address A + i * (C * K) int A[R][C]; • • • A [0] [C-1] A[0] • • • A [i] [0] [C-1] A[i] • • • A [R-1] [0] [C-1] A[R-1] •  •  • •  •  • A A+(i*C*4) A+((R-1)*C*4)

Nested Array Element Access Array Elements A[i][j] is element of type T, which requires K bytes Address A + i * (C * K) + j * K = A + (i * C + j) * K int A[R][C]; A [0] [C-1] • • • A[0] A[i] A [R-1] [0] [C-1] • • • A[R-1] •  •  • • • • • • • A [i] [j] •  •  • A A+i*C*4 A+(R-1)*C*4 A+(i*C+j)*4

Strange Referencing Examples 76 96 116 136 156 1 5 2 6 3 7 int pgh[4][5]; Reference Address Value Guaranteed? pgh[3][3] 76+20*3+4*3 = 148 2 pgh[2][5] 76+20*2+4*5 = 136 1 pgh[2][-1] 76+20*2+4*-1 = 112 3 pgh[4][-1] 76+20*4+4*-1 = 152 1 pgh[0][19] 76+20*0+4*19 = 152 1 pgh[0][-1] 76+20*0+4*-1 = 72 ?? Code does not do any bounds checking Ordering of elements within array guaranteed Yes Yes Yes Yes Yes No

Strange Referencing Examples 76 96 116 136 156 1 5 2 6 3 7 int pgh[4][5]; Reference Address Value Guaranteed? pgh[3][3] 76+20*3+4*3 = 148 2 pgh[2][5] 76+20*2+4*5 = 136 1 pgh[2][-1] 76+20*2+4*-1 = 112 3 pgh[4][-1] 76+20*4+4*-1 = 152 1 pgh[0][19] 76+20*0+4*19 = 152 1 pgh[0][-1] 76+20*0+4*-1 = 72 ?? Code does not do any bounds checking Ordering of elements within array guaranteed

Multi-Level Array Example Variable univ denotes array of 3 elements Each element is a pointer 8 bytes Each pointer points to array of int’s int cmu[] = { 1, 5, 2, 1, 3 }; int mit[] = { 0, 2, 1, 3, 9 }; int hmc[] = { 9, 1, 7, 1, 1 }; #define UCOUNT 3 int *univ[UCOUNT] = {mit, cmu, hmc}; 36 160 16 56 168 176 univ cmu 1 5 2 3 20 24 28 32 mit 9 40 44 48 52 hmc 7 60 64 68 72 76

Element Access in Multi-Level Array int get_univ_digit (size_t index, size_t digit) { return univ[index][digit]; } salq $2, %rsi # 4*digit addq univ(,%rdi,8), %rsi # p = univ[index] + 4*digit movl (%rsi), %eax # return *p ret Computation Element access Mem[Mem[univ+8*index]+4*digit] Must do two memory reads First get pointer to row array Then access element within array

Array Element Accesses Similar C references Nested Array Element at Mem[pgh+20*index+4*dig] Different address computation Multi-Level Array Element at Mem[Mem[univ+4*index]+4*dig] int get_pgh_digit (int index, int dig) { return pgh[index][dig]; } int get_univ_digit (int index, int dig) { return univ[index][dig]; } 36 160 16 56 168 176 univ cmu 1 5 2 3 20 24 28 32 mit 9 40 44 48 52 ucb 4 7 60 64 68 72 76 hmc

Strange Referencing Examples 36 160 16 56 168 176 univ cmu 1 5 2 3 20 24 28 32 mit 9 40 44 48 52 hmc 7 60 64 68 72 76 Reference Address Value Guaranteed? univ[2][3] 56+4*3 = 68 1 univ[1][5] 16+4*5 = 36 0 univ[2][-1] 56+4*-1 = 52 9 univ[3][-1] ?? ?? univ[1][12] 16+4*12 = 64 7 Code does not do any bounds checking Ordering of elements in different arrays not guaranteed Yes No No No No

Strange Referencing Examples 36 160 16 56 168 176 univ cmu 1 5 2 3 20 24 28 32 mit 9 40 44 48 52 hmc 7 60 64 68 72 76 Reference Address Value Guaranteed? univ[2][3] 56+4*3 = 68 1 univ[1][5] 16+4*5 = 36 0 univ[2][-1] 56+4*-1 = 52 9 univ[3][-1] ?? ?? univ[1][12] 16+4*12 = 64 7 Code does not do any bounds checking Ordering of elements in different arrays not guaranteed

N X N Matrix Code Fixed dimensions #define N 16 typedef int fix_matrix[N][N]; /* Get element a[i][j] */ int fix_ele(fix_matrix a, size_t i, size_t j) { return a[i][j]; } Fixed dimensions Know value of N at compile time Variable dimensions, explicit indexing Traditional way to implement dynamic arrays Variable dimensions, implicit indexing Now supported by gcc #define IDX(n, i, j) ((i)*(n)+(j)) /* Get element a[i][j] */ int vec_ele(size_t n, int *a, size_t i, size_t j) { return a[IDX(n,i,j)]; } /* Get element a[i][j] */ int var_ele(size_t n, int a[n][n], size_t i, size_t j) { return a[i][j]; }

16 X 16 Matrix Access Array Elements Address A + i * (C * K) + j * K /* Get element a[i][j] */ int fix_ele(fix_matrix a, size_t i, size_t j) { return a[i][j]; } # a in %rdi, i in %rsi, j in %rdx salq $6, %rsi # 64*i addq %rsi, %rdi # a + 64*i movl (%rdi,%rdx,4), %eax # M[a + 64*i + 4*j] ret

n X n Matrix Access Array Elements Address A + i * (C * K) + j * K C = n, K = 4 Must perform integer multiplication /* Get element a[i][j] */ int var_ele(size_t n, int a[n][n], size_t i, size_t j) { return a[i][j]; } # n in %rdi, a in %rsi, i in %rdx, j in %rcx imulq %rdx, %rdi # n*i leaq (%rsi,%rdi,4), %rax # a + 4*n*i movl (%rax,%rcx,4), %eax # a + 4*n*i + 4*j ret

Structure Representation next 16 24 32 struct rec { int a[4]; size_t i; struct rec *next; }; a Structure represented as block of memory Big enough to hold all of the fields Fields ordered according to declaration Even if another ordering could yield a more compact representation Compiler determines overall size + positions of fields Machine-level program has no understanding of the structures in the source code

Generating Pointer to Structure Member next 16 24 32 r+4*idx struct rec { int a[4]; size_t i; struct rec *next; }; a Generating Pointer to Array Element Offset of each structure member determined at compile time Compute as r + 4*idx int *get_ap (struct rec *r, size_t idx) { return &r->a[idx]; } # r in %rdi, idx in %rsi leaq (%rdi,%rsi,4), %rax ret

Following Linked List C Code i next 16 24 32 a struct rec { int a[3]; int i; struct rec *next; }; Following Linked List C Code Element i r i next 16 24 32 a void set_val (struct rec *r, int val) { while (r != NULL) { int i = r->i; r->a[i] = val; r = r->next; } Register Value %rdi r %rsi val .L11: # loop: movslq 16(%rdi), %rax # i = M[r+16] movl %esi, (%rdi,%rax,4) # M[r+4*i] = val movq 24(%rdi), %rdi # r = M[r+24] testq %rdi, %rdi # Test r jne .L11 # if !=0 goto loop

Alignment Principles Aligned Data Motivation for Aligning Data Primitive data type requires K bytes Address must be multiple of K Required on some machines; advised on x86-64 Motivation for Aligning Data Memory accessed by (aligned) chunks of 4 or 8 bytes (system-dependent) Inefficient to load or store datum that spans quad word boundaries Virtual memory trickier when datum spans 2 pages Compiler Inserts gaps in structure to ensure correct alignment of fields

Structures & Alignment Unaligned Data Aligned Data Primitive data type requires K bytes Address must be multiple of K struct S1 { char c; int i[2]; double v; } *p; c i[0] i[1] v p p+1 p+5 p+9 p+17 c 3 bytes i[0] i[1] 4 bytes v p+0 p+4 p+8 p+16 p+24 Multiple of 4 Multiple of 8 Multiple of 8 Multiple of 8

Specific Cases of Alignment (x86-64) 1 byte: char, … no restrictions on address 2 bytes: short, … lowest 1 bit of address must be 02 4 bytes: int, float, … lowest 2 bits of address must be 002 8 bytes: double, long, char *, … lowest 3 bits of address must be 0002 16 bytes: long double (GCC on Linux) lowest 4 bits of address must be 00002

Satisfying Alignment within Structures Must satisfy each element’s alignment requirement Overall structure placement Each structure has alignment requirement K K = Largest alignment of any element Initial address & structure length must be multiples of K Example: K = 8, due to double element struct S1 { char c; int i[2]; double v; } *p; c 3 bytes i[0] i[1] 4 bytes v p+0 p+4 p+8 p+16 p+24 Multiple of 4 Multiple of 8 Multiple of 8 Multiple of 8

Meeting Overall Alignment Requirement For largest alignment requirement K Overall structure must be multiple of K struct S2 { double v; int i[2]; char c; } *p; v i[0] i[1] c 7 bytes p+0 p+8 p+16 p+24 Multiple of K=8

Arrays of Structures Overall structure length multiple of K struct S2 { double v; int i[2]; char c; } a[10]; Overall structure length multiple of K Satisfy alignment requirement for every element a[0] a[1] a[2] • • • a+0 a+24 a+48 a+72 v i[0] i[1] c 7 bytes a+24 a+32 a+40 a+48

Accessing Array Elements Compute array offset 12*idx sizeof(S3), including alignment spacers Element j is at offset 8 within structure Assembler gives offset a+8 Resolved during linking struct S3 { short i; float v; short j; } a[10]; a[0] • • • a[idx] • • • a+0 a+12 a+12*idx i 2 bytes v j a+12*idx a+12*idx+8 short get_j(int idx) { return a[idx].j; } # %rdi = idx leaq (%rdi,%rdi,2),%rax # 3*idx movzwl a+8(,%rax,4),%eax

Saving Space Put large data types first Effect (K=4) c i d i c d struct S4 { char c; int i; char d; } *p; struct S5 { int i; char c; char d; } *p; c 3 bytes i d 3 bytes i c d 2 bytes

Union Allocation Allocate according to largest element Can only use one field at a time union U1 { char c; int i[2]; double v; } *up; c i[0] i[1] v up+0 up+4 up+8 struct S1 { char c; int i[2]; double v; } *sp; c 3 bytes i[0] i[1] 4 bytes v sp+0 sp+4 sp+8 sp+16 sp+24

Using Union to Access Bit Patterns typedef union { float f; unsigned int u; } bit_float_t; u f 4 float bit2float(unsigned u) { bit_float_t arg; arg.u = u; return arg.f; } unsigned float2bit(float f) { bit_float_t arg; arg.f = f; return arg.u; } Same as (float) u ? Same as (unsigned) f ?

Byte Ordering Revisited Idea Short/long/quad words stored in memory as 2/4/8 consecutive bytes Which byte is most (least) significant? Can cause problems when exchanging binary data between machines Big Endian Most significant byte has lowest address Sparc Little Endian Least significant byte has lowest address Intel x86, ARM Android and IOS Bi Endian Can be configured either way ARM

Byte Ordering Example union { unsigned char c[8]; unsigned short s[4]; unsigned int i[2]; unsigned long l[1]; } dw; 32-bit c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] s[0] s[1] s[2] s[3] i[0] i[1] l[0] 64-bit c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] s[0] s[1] s[2] s[3] i[0] i[1] l[0]

Byte Ordering Example (Cont). int j; for (j = 0; j < 8; j++) dw.c[j] = 0xf0 + j; printf("Characters 0-7 == [0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x]\n", dw.c[0], dw.c[1], dw.c[2], dw.c[3], dw.c[4], dw.c[5], dw.c[6], dw.c[7]); printf("Shorts 0-3 == [0x%x,0x%x,0x%x,0x%x]\n", dw.s[0], dw.s[1], dw.s[2], dw.s[3]); printf("Ints 0-1 == [0x%x,0x%x]\n", dw.i[0], dw.i[1]); printf("Long 0 == [0x%lx]\n", dw.l[0]);

Byte Ordering on Sun Big Endian Output on Sun: f0 f1 f2 f3 f4 f5 f6 f7 c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] s[0] s[1] s[2] s[3] i[0] i[1] l[0] MSB LSB MSB LSB Print Output on Sun: Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7] Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7] Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7] Long 0 == [0xf0f1f2f3]

Byte Ordering on x86-64 Little Endian Output on x86-64: f0 f1 f2 f3 f4 c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7] s[0] s[1] s[2] s[3] i[0] i[1] l[0] LSB MSB Print Output on x86-64: Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7] Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6] Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4] Long 0 == [0xf7f6f5f4f3f2f1f0]

Summary of Compound Types in C Arrays Contiguous allocation of memory Aligned to satisfy every element’s alignment requirement Pointer to first element No bounds checking Structures Allocate bytes in order declared Pad in middle and at end to satisfy alignment Unions Overlay declarations Way to circumvent type system