Machine/Assembler Language Control Flow & Compiling Function Calls Noah Mendelsohn Tufts University Web:

Presentation transcript:

Machine/Assembler Language Control Flow & Compiling Function Calls Noah Mendelsohn Tufts University Web: COMP 40: Machine Structure and Assembly Language Programming – Fall 2015

© 2010 Noah Mendelsohn
Goals for today:
• Review intro to machine & assembler language from last time
• Accessing registers and memory: addressing modes
• Control flow
• Function calls

Remember
We will teach some highlights of machine/assembler code here in class, but you must take significant time to learn and practice on your own!
Suggestions for teaching yourself:
• Read Bryant and O’Hallaron carefully
  - Remember that the 64-bit section was added later
• Look for online reference material
  - Some is linked from HW5
• Write examples, compile with -S, and figure out the resulting .s file!

Review & A Few Updates

Machine code
• Simple instructions: each does a small unit of work
• Stored in memory
• Bit-packed into compact binary form
• Directly executed by transistor/hardware logic
Instruction layout: Operation code (opcode) | Operand 1 | Operand 2 | ???
The opcode is the only required field. On Intel architecture, instruction length varies.

x86-64 / AMD64 / Intel 64 General Purpose Registers
    64-bit (bits 63..0)   32-bit   16-bit   8-bit (high, low)
    %rax                  %eax     %ax      %ah, %al
    %rcx                  %ecx     %cx      %ch, %cl
    %rdx                  %edx     %dx      %dh, %dl
    %rbx                  %ebx     %bx      %bh, %bl
    %rsi                  %esi     %si
    %rdi                  %edi     %di
    %rsp                  %esp     %sp
    %rbp                  %ebp     %bp
Additional 64-bit registers: %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15

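General Purpose Registers
    %rax   bits 63..0
    %eax   bits 31..0 (low half of %rax)
    %ax    bits 15..0 (low half of %eax)
    %ah    bits 15..8 (high byte of %ax)
    %al    bits 7..0 (low byte of %ax)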

General Purpose Registers
    mov $123,%rax    (destination: all 64 bits of %rax)

General Purpose Registers
    mov $123,%ax     (destination: the low 16 bits of %rax)

General Purpose Registers
    mov $123,%eax    (destination: the low 32 bits of %rax)
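The three mov examples differ in what happens to the rest of %rax. A minimal C sketch of that behavior follows. The helper names are hypothetical, and the model assumes the usual x86-64 rule that a 32-bit write zeroes bits 63..32 while 16-bit and 8-bit writes leave the upper bits alone.

```c
#include <stdint.h>

/* Hypothetical helpers: each takes the old 64-bit register value and
 * returns the value after a write of the given width. */

/* mov $v,%rax : all 64 bits are replaced. */
uint64_t write_rax(uint64_t old, uint64_t v) { (void)old; return v; }

/* mov $v,%eax : low 32 bits replaced, bits 63..32 zeroed by the hardware. */
uint64_t write_eax(uint64_t old, uint32_t v) { (void)old; return (uint64_t)v; }

/* mov $v,%ax : low 16 bits replaced, bits 63..16 preserved. */
uint64_t write_ax(uint64_t old, uint16_t v)
{
    return (old & ~(uint64_t)0xFFFF) | v;
}

/* mov $v,%al : low 8 bits replaced, bits 63..8 preserved. */
uint64_t write_al(uint64_t old, uint8_t v)
{
    return (old & ~(uint64_t)0xFF) | v;
}
```

Starting from a register full of one bits, write_eax leaves only the new value behind, while write_ax keeps the upper 48 bits set.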

Classes of AMD 64 registers
• General purpose registers
  - 16 registers, 64 bits each
  - Used to compute integer and pointer values
  - Used for integer call/return values to functions
• XMM registers
  - 16 registers, 128 bits each
  - Used to compute float/double values, and for parallel integer computation
  - Used to pass double/float call/return values
• X87 floating point registers
  - 8 registers, 80 bits each
  - Used to compute, pass/return long double

Machine code for a simple function
    int times16(int i) { return i * 16; }
    89 f8 c1 e0 04 c3
Remember: this is what’s really in memory and what the machine executes!

Machine code for a simple function
    int times16(int i) { return i * 16; }
    89 f8 c1 e0 04 c3
But what does it mean? Does it really implement the times16 function?

Machine code for a simple function
    int times16(int i) { return i * 16; }
    0: 89 f8       mov %edi,%eax
    2: c1 e0 04    shl $0x4,%eax
    5: c3          retq

Machine code for a simple function
    int times16(int i) { return i * 16; }
    0: 89 f8       mov %edi,%eax     // load i into result register %eax
    2: c1 e0 04    shl $0x4,%eax
    5: c3          retq

Machine code for a simple function
    int times16(int i) { return i * 16; }
    0: 89 f8       mov %edi,%eax
    2: c1 e0 04    shl $0x4,%eax     // shifting left by 4 is a quick way to multiply by 16
    5: c3          retq

Machine code for a simple function
    int times16(int i) { return i * 16; }
    0: 89 f8       mov %edi,%eax
    2: c1 e0 04    shl $0x4,%eax
    5: c3          retq              // return to caller, which will look for result in %eax
REMEMBER: you can see the assembler code for any C program by running gcc with the -S flag. Do it!!
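The shl trick can be checked from C. The sketch below uses hypothetical function names, and it assumes unsigned arguments so that the left shift is well defined in C for every input (the slide's version takes int).

```c
/* Multiply by 16 the way the C source is written. */
unsigned int times16_mul(unsigned int i) { return i * 16u; }

/* Multiply by 16 the way the compiled code does it: shift left by 4,
 * because 16 == 2 to the power 4. */
unsigned int times16_shl(unsigned int i) { return i << 4; }
```

Both functions should agree for every unsigned input, including values where the result wraps around.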

INTERPRETER: software or hardware that does what the instructions say
COMPILER: software that converts a program to another language
ASSEMBLER: like a compiler, but the input assembler language is (mostly) 1-to-1 with machine instructions

Moving Data, Calculating Values and Working With Pointers

Addressing modes
    %rax                    // contents of rax is data
    (%rax)                  // data pointed to by rax
    0x10(%rax)              // get *(16 + rax)
    0x4089a0(, %rax, 8)     // global array index of 8-byte things
    (%ebx, %ecx, 2)         // add base and scaled index
    4(%ebx, %ecx, 2)        // add base and scaled index plus offset 4
Example:
    movq %rax, %rbx

Register-to-register copy
    movq %rax, %rbx
Assume %rax holds variable x and %rbx holds y; then this implements:
    long x, y;
    y = x;

Controlling the size
    movq %rax, %rbx
Assume %rax holds variable x and %rbx holds y; then this implements:
    long x, y;
    y = x;
movq is a “quad” length move. The suffix selects the operand size:
    b   byte
    w   2 bytes (short)
    l   long (4 bytes, not 8!)
    q   quad (8 bytes)
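As a rough illustration of the suffix table, the sketch below maps an operand size in bytes to the suffix a compiler would typically put on mov. The helper suffix_for_size is hypothetical and exists only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the AT&T size suffix for an operand of
 * n bytes, or '?' if no integer mov of that size exists. */
char suffix_for_size(size_t n)
{
    switch (n) {
    case 1:  return 'b';   /* byte */
    case 2:  return 'w';   /* word, 2 bytes */
    case 4:  return 'l';   /* long, 4 bytes (not 8!) */
    case 8:  return 'q';   /* quad, 8 bytes */
    default: return '?';
    }
}
```

For instance, suffix_for_size(sizeof(int32_t)) gives 'l', which is why copies of a C int usually show up as movl, while 64-bit pointers are moved with movq.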

Memory-to-register copy
    movq (%rax), %rbx
Assume %rax holds &x and %rbx holds y; then this implements:
    long *xp = &x;
    long y;
    y = *xp;

Memory-to-register copy
    movq (%rax), %rbx
Assume %rax holds &x and %rbx holds y; then this implements:
    long *xp = &x;
    long y;
    y = *xp;
By the way, the source or the target can reference memory, but not both. Copying from memory to memory takes two instructions (into a register and back out).

Address arithmetic
    movq 0x10(%rax), %rbx
Assume %rax holds sp and %rbx holds y; then this implements:
    struct s { long m1, m2, m3, m4; } *sp;
    long y;
    y = sp->m3;
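The 0x10 in that instruction is just the byte offset of m3 within the struct. A minimal sketch, assuming 8-byte long as on Linux/AMD 64 (the function load_m3 is a hypothetical name for illustration):

```c
#include <stddef.h>
#include <string.h>

struct s { long m1, m2, m3, m4; };

/* Do by hand what movq 0x10(%rax), %rbx does: start from the base
 * address, add the constant byte offset of m3, and load one long. */
long load_m3(const struct s *sp)
{
    const char *base = (const char *)sp;
    long y;
    memcpy(&y, base + offsetof(struct s, m3), sizeof y);
    return y;
}
```

With 8-byte longs, offsetof(struct s, m3) is 16, which is the 0x10 displacement in the assembler code.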

Array addressing
    movq 0x4089a0(, %rax, 8), %rbx
    long arr[1000];    /* assume at 0x4089a0 */
    long i;            /* array index in %rax */
    long y;            /* move target in %rbx */
    y = arr[i];        /* all in one assembler mov! */
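All of these addressing modes compute the same kind of sum: displacement + base + index * scale. A small sketch of that arithmetic, with a hypothetical helper name and plain integers standing in for register contents:

```c
#include <stdint.h>

/* Hypothetical model of the address computed by disp(base, index, scale).
 * A missing base or index is passed as 0. */
uint64_t effective_address(uint64_t disp, uint64_t base,
                           uint64_t index, uint64_t scale)
{
    return disp + base + index * scale;
}
```

For the array example, effective_address(0x4089a0, 0, 3, 8) is 0x4089b8, the address of arr[3] when arr starts at 0x4089a0.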

Examples
    movl (%ebx, %ecx, 1), %edx    // edx <- *(ebx + (ecx * 1))
    leal (%ebx, %ecx, 1), %edx    // edx <- (ebx + (ecx * 1))

Examples
    movl (%ebx, %ecx, 1), %edx    // edx <- *(ebx + (ecx * 1))
    leal (%ebx, %ecx, 1), %edx    // edx <- (ebx + (ecx * 1))
movl moves the data at the address; similar to:
    char *ebxp;
    char edx = ebxp[ecx];
(The analogy covers the address calculation only: movl itself loads 4 bytes, not 1.)

Examples
    movl (%ebx, %ecx, 1), %edx    // edx <- *(ebx + (ecx * 1))
    leal (%ebx, %ecx, 1), %edx    // edx <- (ebx + (ecx * 1))
leal moves the address itself; similar to:
    char *ebxp;
    char *edxp = &(ebxp[ecx]);

Examples
    movl (%ebx, %ecx, 1), %edx    // edx <- *(ebx + (ecx * 1))
    leal (%ebx, %ecx, 1), %edx    // edx <- (ebx + (ecx * 1))
    movl (%ebx, %ecx, 4), %edx    // edx <- *(ebx + (ecx * 4))
    leal (%ebx, %ecx, 4), %edx    // edx <- (ebx + (ecx * 4))
Scale factors support indexing larger types; the movl with scale 4 is similar to:
    int *ebxp;
    int edx = ebxp[ecx];

Examples
    movl (%ebx, %ecx, 1), %edx    // edx <- *(ebx + (ecx * 1))
    leal (%ebx, %ecx, 1), %edx    // edx <- (ebx + (ecx * 1))
    movl (%ebx, %ecx, 4), %edx    // edx <- *(ebx + (ecx * 4))
    leal (%ebx, %ecx, 4), %edx    // edx <- (ebx + (ecx * 4))
Scale factors support indexing larger types; the leal with scale 4 is similar to:
    int *ebxp;
    int *edxp = &(ebxp[ecx]);
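The mov/lea distinction is the same as the distinction between reading an array element and taking its address. A short sketch with hypothetical function names:

```c
/* Like movl (base, index, 4), dst: compute the address, then load
 * the int stored there. */
int load_element(const int *base, long index)
{
    return base[index];
}

/* Like lea (base, index, 4), dst: compute the same address but do not
 * touch memory; the result is the address itself. */
const int *address_of_element(const int *base, long index)
{
    return &base[index];
}
```

Compiling these two functions with -S is one way to see whether a given compiler picks a scaled mov and a scaled lea here; the exact instructions chosen may differ.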

Control Flow

Simple jumps
.L4:
    …code here…
    jmp .L4        // jump back to .L4

Conditional jumps
.L4:
    movq  (%rdi,%rdx), %rcx
    leaq  (%rax,%rcx), %rsi
    testq %rcx, %rcx
    cmovg %rsi, %rax
    addq  $8, %rdx
    cmpq  %r8, %rdx
    jne   .L4        // conditional: jump iff %r8 != %rdx
    …code here…
This technique is the key to compiling if statements, for loops, while loops, etc.
Question: How does the jne get the result of the cmp?

Conditional jumps
.L4:
    movq  (%rdi,%rdx), %rcx
    leaq  (%rax,%rcx), %rsi
    testq %rcx, %rcx
    cmovg %rsi, %rax
    addq  $8, %rdx
    cmpq  %r8, %rdx
    jne   .L4        // conditional: jump iff %r8 != %rdx
    …code here…
This technique is the key to compiling if statements, for loops, while loops, etc.
Question: How does the jne get the result of the cmp?
Answer: There is a flags register that tracks positive, negative, equal, etc.

The flags register tracks comparisons and results
Example: the zero flag is set if the last result is zero or the last compare is equal
Conditional jumps and moves test the flags
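To make the compare-then-jump handoff concrete, here is a small C model of a 32-bit cmp and three conditional jumps. It is a sketch, not the hardware definition: the struct and function names are hypothetical, and only the zero, sign and overflow flags are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of three bits of the flags register. */
struct flags { bool zf, sf, of; };

/* Model of "cmpl b, a" in AT&T operand order: compute a - b, throw the
 * difference away, and keep only the flags. */
struct flags cmp32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;                  /* wraps around like the hardware */
    struct flags f;
    f.zf = (diff == 0);                       /* zero flag: operands were equal */
    f.sf = (diff >> 31) != 0;                 /* sign flag: top bit of the result */
    /* Overflow flag: the operands had different signs and the sign of
     * the result differs from the sign of a. */
    f.of = (((ua ^ ub) & (ua ^ diff)) >> 31) != 0;
    return f;
}

bool jne_taken(struct flags f) { return !f.zf; }                     /* a != b */
bool jle_taken(struct flags f) { return f.zf || (f.sf != f.of); }    /* a <= b, signed */
bool jg_taken(struct flags f)  { return !f.zf && (f.sf == f.of); }   /* a >  b, signed */
```

The jump functions never see a or b, only the flags, which is the point of the slide: cmp leaves its result in the flags register and the following jump reads it from there.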

Machine code for a simple function
    void ifelse(int a, int b)
    {
        if (a > b)
            func1(a);
        else
            func2(b);
    }
ifelse:
    subq  $8, %rsp
    cmpl  %esi, %edi        // compare/set flags
    jle   .L2
    call  func1
    jmp   .L1
.L2:
    movl  %esi, %edi
    .p2align 4,,6
    call  func2
.L1:
    addq  $8, %rsp
    .p2align 4,,2
    ret

Machine code for a simple function
    void ifelse(int a, int b)
    {
        if (a > b)
            func1(a);
        else
            func2(b);
    }
ifelse:
    subq  $8, %rsp
    cmpl  %esi, %edi
    jle   .L2               // jump if a <= b (that is, if a - b <= 0)
    call  func1
    jmp   .L1
.L2:
    movl  %esi, %edi
    .p2align 4,,6
    call  func2
.L1:
    addq  $8, %rsp
    .p2align 4,,2
    ret

Machine code for a simple function
    void ifelse(int a, int b)
    {
        if (a > b)
            func1(a);
        else
            func2(b);
    }
ifelse:
    subq  $8, %rsp
    cmpl  %esi, %edi
    jle   .L2
    call  func1
    jmp   .L1               // jump always
.L2:
    movl  %esi, %edi
    .p2align 4,,6
    call  func2
.L1:
    addq  $8, %rsp
    .p2align 4,,2
    ret
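The shape of the compiled code is easier to see when the C is rewritten with goto so that each statement lines up with one or two instructions. In this sketch func1 and func2 are hypothetical stubs that only record which branch ran; the real ones are not shown in the slides.

```c
/* Hypothetical stand-ins for func1 and func2: they record the call so
 * the branch taken can be observed. */
static int last_called;
static int last_arg;
static void func1(int a) { last_called = 1; last_arg = a; }
static void func2(int b) { last_called = 2; last_arg = b; }

/* ifelse() rewritten in the shape of the assembler code. Note that the
 * test is inverted: the code jumps AWAY from func1 when a <= b. */
void ifelse_goto(int a, int b)
{
    if (a <= b) goto L2;    /* cmpl %esi,%edi ; jle .L2 */
    func1(a);               /* call func1 */
    goto L1;                /* jmp .L1 */
L2:
    func2(b);               /* movl %esi,%edi ; call func2 */
L1:
    return;                 /* ret */
}
```

The movl before call func2 has no C counterpart: it only moves b into %edi, the register where a first argument is expected.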

Calling Functions

Why have a standard “linkage” for calling functions?
• Functions are compiled separately and linked together
• We need to standardize enough that function calls will succeed
• Note: optimizing compilers may “cheat” when caller and callee are in the same source file
  - More on this later
See course notes on “Calls and Returns”

An interesting example
    int fact(int n)
    {
        if (n == 0)
            return 1;
        else
            return n * fact(n - 1);
    }
See course notes on “Calls and Returns”

The process memory illusion
• Process thinks it's running in a private space
• Separated into segments, from address 0
• Stack: memory for executing subroutines
• Heap: memory for malloc/new
• Global static variables
• Text segment: where program lives

The process memory illusion
• Process thinks it's running in a private space
• Separated into segments, from address 0
• Stack: memory for executing subroutines
• Heap: memory for malloc/new
• Global static variables
• Text segment: where program lives
Memory layout, from high addresses down to address 0:
    argv, environ
    Stack
    Heap (malloc’d)
    Static uninitialized
    Static initialized      (loaded with your program)
    Text (code)             (loaded with your program)
    0
We’re about to study in depth how function calls use the stack.

Function calls on Linux/AMD 64
• Caller pushes return address on stack
• Where practical, arguments are passed in registers
• Exceptions:
  - Structs, etc.
  - Too many arguments
  - What can’t be passed in registers is at known offsets from the stack pointer!
• Return values
  - In a register, typically %rax for integers and pointers
  - Exception: structures
• Each function gets a stack frame
  - Leaf functions that make no calls may skip setting one up

Arguments/return values in registers
Arguments and return values are passed in registers when types are suitable and when there aren’t too many.
Return values are usually in %rax, %eax, etc.
Callee may change these and some other registers!
XMM and x87 FP registers are used for floating point.
Read the specifications for full details!
    Operand size   Argument number
    (bits)         1       2       3       4       5       6
    64             %rdi    %rsi    %rdx    %rcx    %r8     %r9
    32             %edi    %esi    %edx    %ecx    %r8d    %r9d
    16             %di     %si     %dx     %cx     %r8w    %r9w
    8              %dil    %sil    %dl     %cl     %r8b    %r9b
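The register table can be read as a simple lookup from (argument number, operand size) to a register name. The sketch below encodes exactly that table; arg_register is a hypothetical helper, it covers integer and pointer arguments only, and it returns NULL for anything the table does not list.

```c
#include <stddef.h>

/* Hypothetical lookup: name of the register holding integer argument
 * arg_number (1..6) at the given operand size in bits (64, 32, 16, 8).
 * Returns NULL when the table has no entry, e.g. for a 7th argument. */
const char *arg_register(int arg_number, int bits)
{
    static const char *const names[4][6] = {
        { "%rdi", "%rsi", "%rdx", "%rcx", "%r8",  "%r9"  },  /* 64 bits */
        { "%edi", "%esi", "%edx", "%ecx", "%r8d", "%r9d" },  /* 32 bits */
        { "%di",  "%si",  "%dx",  "%cx",  "%r8w", "%r9w" },  /* 16 bits */
        { "%dil", "%sil", "%dl",  "%cl",  "%r8b", "%r9b" },  /*  8 bits */
    };
    int row;

    switch (bits) {
    case 64: row = 0; break;
    case 32: row = 1; break;
    case 16: row = 2; break;
    case 8:  row = 3; break;
    default: return NULL;
    }
    if (arg_number < 1 || arg_number > 6)
        return NULL;
    return names[row][arg_number - 1];
}
```

This matches the earlier examples: the int argument of times16 arrives in %edi, which is arg_register(1, 32).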

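The stack – general case
(Each row lists stack contents from higher to lower addresses; %rsp points at the last item.)
    Before call:             ????, Arguments  <- %rsp
    After callq:             ????, Arguments, Return address  <- %rsp
    If callee needs frame:   ????, Arguments, Return address, Callee vars, Args to next call?  <- %rsp
The callee creates its frame of size framesize with: sub $0x{framesize},%rsp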

A simple function
    unsigned int times2(unsigned int a) { return a * 2; }
times2:
    leaq (%rdi,%rdi), %rax    // double the argument and put it in the return value register
    ret

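The stack – general case
(Each row lists stack contents from higher to lower addresses; %rsp points at the last item.)
    Before call:             ????, Arguments  <- %rsp
                             %rsp must be a multiple of 16 when a function is called!
    After callq:             ????, Arguments, Return address  <- %rsp
                             But now it’s not a multiple of 16…
    If callee needs frame:   ????, Arguments, Return address, Callee vars, Args to next call?  <- %rsp
                             Here too: %rsp must be a multiple of 16 when the callee calls another function!
The callee creates its frame of size framesize with: sub $0x{framesize},%rsp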

The stack – general case
(Each row lists stack contents from higher to lower addresses; %rsp points at the last item.)
    Before call:             ????, Arguments  <- %rsp
    After callq:             ????, Arguments, Return address  <- %rsp
    If callee needs frame:   ????, Arguments, Return address, 8 wasted bytes  <- %rsp
The callee wastes 8 bytes with: sub $8,%rsp
If we’ll be calling other functions with no arguments, it is very common to see sub $8,%rsp to re-establish alignment.
%rsp must be a multiple of 16 when a function is called!
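The alignment bookkeeping is plain arithmetic on %rsp, which the sketch below walks through. The function names are hypothetical and %rsp is modeled as an ordinary 64-bit integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is this stack pointer value a multiple of 16? */
bool is_aligned16(uint64_t rsp) { return rsp % 16 == 0; }

/* callq pushes an 8-byte return address, so %rsp drops by 8. */
uint64_t after_callq(uint64_t rsp) { return rsp - 8; }

/* sub $n,%rsp reserves n more bytes of stack. */
uint64_t after_sub(uint64_t rsp, uint64_t n) { return rsp - n; }
```

If %rsp is a multiple of 16 at the call, then inside the callee it is 8 off, and after_sub(after_callq(rsp), 8) is a multiple of 16 again. The same holds for any frame size that is 8 more than a multiple of 16.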

A function that calls another
    extern unsigned int times2(unsigned int a);
    unsigned int times8(unsigned int i) { return 4 * times2(i); }
times8:
    subq $8, %rsp       // so that times2 can assume %rsp is a multiple of 16
    call times2
    salq $2, %rax
    addq $8, %rsp
    ret

Factorial Revisited
    int fact(int n)
    {
        if (n == 0)
            return 1;
        else
            return n * fact(n - 1);
    }
fact:
.LFB2:
    pushq %rbx
.LCFI0:
    movq  %rdi, %rbx
    movl  $1, %eax
    testq %rdi, %rdi
    je    .L4
    leaq  -1(%rdi), %rdi
    call  fact
    imulq %rbx, %rax
.L4:
    popq  %rbx
    ret

Function calls on Linux/AMD 64 (cont.)
• Much of what you’ve seen can be skipped for “leaf” functions in the call tree
• Inlining:
  - If a small procedure is in the same source file (or an included header), just merge the code
  - Unless the function is static (i.e. private to the source file), the compiler still needs to create the normal version, in case anyone outside calls it!
• Optimizing compilers “cheat”
  - Don’t build a full stack frame
  - Leave the return address for the called function: A calls B, and the last thing B does is call C. B leaves the return address on the stack and branches to C instead of calling it, so when C does a normal return, it goes straight back to A! This is tail call optimization.
  - Many other wild optimizations, always done so other functions can’t tell anything unusual is happening!
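The A, B, C situation in the tail call bullet looks like this in C. The function names and bodies are hypothetical; whether a compiler really turns the call in b_func into a jump depends on the compiler and the optimization level, so treat this as a shape to look for in -S output rather than a guarantee.

```c
/* C: an ordinary function that returns normally. */
static long c_func(long x) { return x + 1; }

/* B: the LAST thing it does is call C, and it returns C's result
 * unchanged. An optimizer may replace this call with a jump to c_func,
 * so that C's ret goes straight back to B's caller. */
static long b_func(long x) { return c_func(x * 2); }

/* A: calls B and then still has work to do (the addition), so this call
 * is not a tail call and needs a real return address pointing back here. */
long a_func(long x) { return 3 + b_func(x); }
```

The result is the same with or without the optimization, which is the point of the last bullet: other functions can’t tell anything unusual is happening.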

Optimized version
    int fact(int n)
    {
        if (n == 0)
            return 1;
        else
            return n * fact(n - 1);
    }
Lightly optimized = O1 (what we saw before):
fact:
.LFB2:
    pushq %rbx
.LCFI0:
    movq  %rdi, %rbx
    movl  $1, %eax
    testq %rdi, %rdi
    je    .L4
    leaq  -1(%rdi), %rdi
    call  fact
    imulq %rbx, %rax
.L4:
    popq  %rbx
    ret
Heavily optimized = O2:
fact:
.LFB2:
    testq %rdi, %rdi
    movl  $1, %eax
    je    .L6
    .p2align 4,,7
.L5:
    imulq %rdi, %rax
    subq  $1, %rdi
    jne   .L5
.L6:
    rep ; ret
What happened to the recursion?!?!? This version doesn’t need to create a stack frame either!
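One way to read the O2 listing is to translate it back into C: the recursion has become a loop. In the sketch below the names fact_recursive and fact_loop are hypothetical, long is used because the listing works on 64-bit registers, and n is assumed to be non-negative.

```c
/* The function as written in the source: one call per level. */
long fact_recursive(long n)
{
    if (n == 0)
        return 1;
    return n * fact_recursive(n - 1);
}

/* The O2 listing read back as C: no calls, no stack frame. */
long fact_loop(long n)
{
    long result = 1;         /* movl $1, %eax */
    if (n == 0)              /* testq %rdi, %rdi ; je .L6 */
        return result;
    do {
        result *= n;         /* imulq %rdi, %rax */
        n -= 1;              /* subq $1, %rdi */
    } while (n != 0);        /* jne .L5 */
    return result;           /* rep ; ret */
}
```

The two agree for small non-negative n. The loop multiplies in the order n, n-1, ..., 1 while the recursion multiplies on the way back up, which gives the same product because integer multiplication can be reordered.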

Getting the details on function call linkages
• Bryant and O’Hallaron has an excellent introduction
  - Watch for differences between 32 and 64 bit
• The official specification:
  - System V Application Binary Interface: AMD64 Architecture Processor Supplement
  - Find it at:
  - See especially sections 3.1 and

Summary
• C code compiled to assembler
• Data moved to registers for manipulation
• Conditional and jump instructions for control flow
• Stack used for function calls
• Compilers play all sorts of tricks when compiling code