Download presentation
Presentation is loading. Please wait.
Published byAvice Carr Modified over 8 years ago
1
III:1 X86 Assembly – Buffer Overflow
2
III:2 Administrivia HW3 – skip the farmer game portion, moved to HW5 Due today HW4 – Endian swap in ASM and C Quiz2 – please do your correction in RED Do not erase original answers Quiz3 – due next Monday How am I doing? Please know that there is no way we will cover everything you need to know in the book. So read.
3
III:3 IA32 Linux Memory Layout Stack Runtime stack (8MB limit) Heap Dynamically allocated storage When call malloc(), calloc(), new() Data Statically allocated data E.g., arrays & strings declared in code Text Executable machine instructions Read-only Upper 2 hex digits = 8 bits of address FF 00 Stack Text Data Heap 08 8MB not drawn to scale
4
III:4 Memory Allocation Example char big_array[1<<24]; /* 16 MB */ char huge_array[1<<28]; /* 256 MB */ int beyond; char *p1, *p2, *p3, *p4; int useless() { return 0; } int main() { p1 = malloc(1 <<28); /* 256 MB */ p2 = malloc(1 << 8); /* 256 B */ p3 = malloc(1 <<28); /* 256 MB */ p4 = malloc(1 << 8); /* 256 B */ /* Some print statements... */ } FF 00 Stack Text Data Heap 08 not drawn to scale Where does everything go?
5
III:5 IA32 Example Addresses $esp0xffffbcd0 p3 0x65586008 p1 0x55585008 p40x1904a110 p20x1904a008 &p20x18049760 beyond 0x08049744 big_array 0x18049780 huge_array 0x08049760 main()0x080483c6 useless() 0x08049744 final malloc()0x006be166 address range ~2 32 FF 00 Stack Text Data Heap 08 80 not drawn to scale malloc() is dynamically linked address determined at runtime
6
III:6 How about some pointers to deal with the new boss? Sure. Try 0x0000A4F5, 0x00008EEF and 0x0000B32A. Pointers
7
III:7 C operators -> has very high precedence () has very high precedence monadic * just below OperatorsAssociativity () [] ->. left to right ! ~ ++ -- + - * & (type) sizeof right to left * / % left to right + - left to right > left to right >= left to right == != left to right & left to right ^ left to right | left to right && left to right || left to right ?: right to left = += -= *= /= %= &= ^= != >= right to left, left to right
8
III:8 C Pointer Declarations: Test Yourself! int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints
9
III:9 C Pointer Declarations: Test Yourself! int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints p is an array[13] of pointer to int
10
III:10 C Pointer Declarations: Test Yourself! int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints p is an array[13] of pointer to int
11
III:11 C Pointer Declarations: Test Yourself! int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints p is an array[13] of pointer to int p is a pointer to an array[13] of int
12
III:12 C Pointer Declarations: Test Yourself! int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints p is an array[13] of pointer to int p is a pointer to an array[13] of int f is a function returning ptr to an array[13] of pointers to functions returning int
13
III:13 C Pointer Declarations int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints
14
III:14 Avoiding Complex Declarations Use typedef to build up the declaration Instead of int (*(*x[3])())[5] : typedef int fiveints[5]; typedef fiveints* p5i; typedef p5i (*f_of_p5is)(); f_of_p5is x[3]; x is an array of 3 elements, each of which is a pointer to a function returning an array of 5 ints
15
III:15 Internet Worm and IM War November, 1988 Internet Worm attacks thousands of Internet hosts. How did it happen? July, 1999 Microsoft launches MSN Messenger (instant messaging system). Messenger clients can access popular AOL Instant Messaging Service (AIM) servers AIM server AIM client AIM client MSN client MSN server
16
III:16 Internet Worm and IM War (cont.) August 1999 Mysteriously, Messenger clients can no longer access AIM servers. Microsoft and AOL begin the IM war: AOL changes server to disallow Messenger clients Microsoft makes changes to clients to defeat AOL changes. At least 13 such skirmishes. How did it happen? The Internet Worm and AOL/Microsoft War were both based on stack buffer overflow exploits! many Unix functions do not check argument sizes. allows target buffers to overflow.
17
III:17 String Library Code Implementation of Unix function gets() No way to specify limit on number of characters to read Similar problems with other Unix functions strcpy : Copies string of arbitrary length scanf, fscanf, sscanf, when given %s conversion specification /* Get string from stdin */ char *gets(char *dest) { int c = getchar(); char *p = dest; while (c != EOF && c != '\n') { *p++ = c; c = getchar(); } *p = '\0'; return dest; }
18
III:18 Vulnerable Buffer Code int main() { printf("Type a string:"); echo(); return 0; } /* Echo Line */ void echo() { char buf[4]; /* Way too small! */ gets(buf); puts(buf); } unix>./bufdemo Type a string:1234567 1234567 unix>./bufdemo Type a string:12345678 Segmentation Fault unix>./bufdemo Type a string:123456789ABC Segmentation Fault
19
III:19 Buffer Overflow Disassembly 080484f0 : 80484f0:55 push %ebp 80484f1:89 e5 mov %esp,%ebp 80484f3:53 push %ebx 80484f4:8d 5d f8 lea 0xfffffff8(%ebp),%ebx 80484f7:83 ec 14 sub $0x14,%esp 80484fa:89 1c 24 mov %ebx,(%esp) 80484fd:e8 ae ff ff ff call 80484b0 8048502:89 1c 24 mov %ebx,(%esp) 8048505:e8 8a fe ff ff call 8048394 804850a:83 c4 14 add $0x14,%esp 804850d:5b pop %ebx 804850e:c9 leave 804850f:c3 ret 80485f2:e8 f9 fe ff ff call 80484f0 80485f7:8b 5d fc mov 0xfffffffc(%ebp),%ebx 80485fa:c9 leave 80485fb:31 c0 xor %eax,%eax 80485fd:c3 ret
20
III:20 Buffer Overflow Stack echo: pushl %ebp# Save %ebp on stack movl %esp, %ebp pushl %ebx# Save %ebx leal -8(%ebp),%ebx# Compute buf as %ebp-8 subl $20, %esp# Allocate stack space movl %ebx, (%esp)# Push buf on stack call gets# Call gets... /* Echo Line */ void echo() { char buf[4]; /* Way too small! */ gets(buf); puts(buf); } Return Address Saved %ebp %ebp Stack Frame for main Stack Frame for echo [3][2][1][0] buf Before call to gets
21
III:21 Buffer Overflow Stack Example unix> gdb bufdemo (gdb) break echo Breakpoint 1 at 0x8048583 (gdb) run Breakpoint 1, 0x8048583 in echo () (gdb) print /x $ebp $1 = 0xffffc638 (gdb) print /x *(unsigned *)$ebp $2 = 0xffffc658 (gdb) print /x *((unsigned *)$ebp + 1) $3 = 0x80485f7 80485f2:call 80484f0 80485f7:mov 0xfffffffc(%ebp),%ebx # Return Point 0xffffc638 buf 0xffffc658 Return Address Saved %ebp Stack Frame for main Stack Frame for echo [3][2][1][0] Stack Frame for main Stack Frame for echo xx buf ff c658 080485f7 Before call to gets
22
III:22 Buffer Overflow Example #1 Overflow bufs, but no problem 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo xx buf ff c658 080485f7 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo 34333231 buf ff c658 080485f7 00373635 Before call to gets Input 1234567
23
III:23 Buffer Overflow Example #2 Base pointer corrupted 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo xx buf ff c658 080485f7 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo 34333231 buf ff c600 080485f7 38373635 Before call to gets Input 12345678... 804850a:83 c4 14 add $0x14,%esp # deallocate space 804850d:5b pop %ebx # restore %ebx 804850e:c9 leave # movl %ebp, %esp; popl %ebp 804850f:c3 ret # Return
24
III:24 Buffer Overflow Example #3 Return address corrupted 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo xx buf ff c658 080485f7 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo 34333231 buf 43424139 08048500 38373635 Before call to gets Input 12345678 80485f2:call 80484f0 80485f7:mov 0xfffffffc(%ebp),%ebx # Return Point
25
III:25 Malicious Use of Buffer Overflow Input string contains byte representation of executable code Overwrite return address with address of buffer When bar() executes ret, will jump to exploit code int bar() { char buf[64]; gets(buf);... return...; } void foo(){ bar();... } Stack after call to gets() B return address A foo stack frame bar stack frame B exploit code pad data written by gets()
26
III:26 Exploits Based on Buffer Overflows Buffer overflow bugs allow remote machines to execute arbitrary code on victim machines Internet worm Early versions of the finger server (fingerd) used gets() to read the argument sent by the client: finger jka@ee.byu.edu Worm attacked fingerd server by sending phony argument: finger “exploit-code padding new-return- address” Exploit code: executed a root shell on the victim machine with a direct TCP connection to the attacker.
27
III:27 Exploits Based on Buffer Overflows Buffer overflow bugs allow remote machines to execute arbitrary code on victim machines IM War AOL exploited existing buffer overflow bug in AIM clients Exploit code: returned 4-byte signature (the bytes at some location in the AIM client) to server. When Microsoft changed code to match signature, AOL changed signature location.
28
III:28 Example Virus: Code Red Worm History June 18, 2001. Microsoft announces buffer overflow vulnerability in IIS Internet server July 19, 2001. over 250,000 machines infected by new virus in 9 hours White house must change its IP address. Pentagon shut down public WWW servers for day When CS:APP Web Site was set up Received strings of form GET /default.ida?NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN N....NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN%u909 0%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u 6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b0 0%u531b%u53ff%u0078%u0000%u00=a HTTP/1.0" 400 325 "-" "-"
29
III:29 Code Red Exploit Code Starts 100 threads running Spread self Generate random IP addresses & send attack string Between 1st & 19th of month Attack www.whitehouse.gov Send 98,304 packets; sleep for 4-1/2 hours; repeat Denial of service attack Between 21st & 27th of month Deface server’s home page After waiting 2 hours
30
III:30 Avoiding Overflow Vulnerability Use library routines that limit string lengths fgets instead of gets strncpy instead of strcpy Don’t use scanf with %s conversion specification Use fgets to read the string Or use %ns where n is a suitable integer /* Echo Line */ void echo() { char buf[4]; /* Way too small! */ fgets(buf, 4, stdin); puts(buf); }
31
III:31 System-Level Protections Randomized stack offsets At start of program, allocate random amount of space on stack Makes it difficult for hacker to predict beginning of inserted code Non-executable code segments In traditional x86, segments are either “read- only” or “writeable” Can execute anything readable Better: add explicit “execute” permission Stack corruption detection Compiler inserts “canary” just beyond array and code to check it before function return Code aborts with error if canary changed unix> gdb bufdemo (gdb) break echo (gdb) run (gdb) print /x $ebp $1 = 0xffffc638 (gdb) run (gdb) print /x $ebp $2 = 0xffffbb08 (gdb) run (gdb) print /x $ebp $3 = 0xffffc6a8
32
III:32 Web-aside: Assembly & C Requirements to have asm + C integrated together? In assembly, follow register usage, parameter passing, and return value conventions Problems with inline assembly Not portable Knowing what registers are safe to use Advantages of GCC’s extended asm You inform compiler of source values required, results your code generates, and registers that will be overwritten GCC generates code to correctly set up source values, execute desired instructions, and correctly use computed results
33
III:33 Administrivia Quiz 3 – due Feb 7 Quiz 4 – due Feb 14 Quiz 5 – due Feb 21 Midterm – available after class on Feb 28 No class Mar 1, work on the exam Due 5PM Monday Mar 5 Take home, open book, open notes, open lecture slides NO COLLABORATION
34
III:34 BONUS SLIDES
35
III:35 IA32 Floating Point (x87) History 8086: first computer to implement IEEE FP separate 8087 FPU (floating point unit) 486: merged FPU and Integer Unit onto one chip Becoming obsolete with x86-64 Summary Hardware to add, multiply, and divide Floating point data registers Various control & status registers Floating Point Formats single precision (C float ): 32 bits double precision (C double ): 64 bits extended precision (C long double ): 80 bits Instruction decoder and sequencer FPU Integer Unit Memory
36
III:36 FPU Data Register Stack (x87) FPU register format (80 bit extended precision) FPU registers 8 registers %st(0) - %st(7) Logically form stack Top: %st(0) Bottom disappears (drops out) after too many pushs sexpfrac 0 63647879 “Top” %st(0) %st(1) %st(2) %st(3)
37
III:37 InstructionEffectDescription fldzpush 0.0 Load zero flds Addr push Mem[Addr] Load single precision real fmuls Addr %st(0) %st(0)* M[Addr]Multiply faddp%st(1) %st(0)+%st(1);pop Add and pop FPU instructions (x87) Large number of floating point instructions and formats ~50 basic instruction types load, store, add, multiply sin, cos, tan, arctan, and log Often slower than math lib (!) Sample instructions:
38
III:38 FP Code Example (x87) Compute inner product of two vectors Single precision arithmetic Common computation float ipf (float x[], float y[], int n) { int i; float result = 0.0; for (i = 0; i < n; i++) result += x[i]*y[i]; return result; } pushl %ebp # setup movl %esp,%ebp pushl %ebx movl 8(%ebp),%ebx # %ebx=&x movl 12(%ebp),%ecx # %ecx=&y movl 16(%ebp),%edx # %edx=n fldz # push +0.0 xorl %eax,%eax # i=0 cmpl %edx,%eax # if i>=n done jge.L3.L5: flds (%ebx,%eax,4) # push x[i] fmuls (%ecx,%eax,4) # st(0)*=y[i] faddp # st(1)+=st(0); pop incl %eax # i++ cmpl %edx,%eax # if i<n repeat jl.L5.L3: movl -4(%ebp),%ebx # finish movl %ebp, %esp popl %ebp ret # st(0) = result
39
III:39 Inner Product Stack Trace 1. fldz 0.0 %st(0) 2. flds (%ebx,%eax,4) 0.0 %st(1) x[0] %st(0) 3. fmuls (%ecx,%eax,4) 0.0 %st(1) x[0]*y[0] %st(0) 4. faddp 0.0+x[0]*y[0] %st(0) 5. flds (%ebx,%eax,4) x[0]*y[0] %st(1) x[1] %st(0) 6. fmuls (%ecx,%eax,4) x[0]*y[0] %st(1) x[1]*y[1] %st(0) 7. faddp %st(0) x[0]*y[0]+x[1]*y[1] Initialization Iteration 0Iteration 1 eax = i ebx = *x ecx = *y
40
III:40 x87 Floating Point Problems One of “least elegant features” of x86 Remnants of original co-processor design Compiler technology is much better today Stack registers are problematic All must be treated as caller-saved, too much memory traffic Can’t treat as true stack over multiple function calls Separate FP status word is problematic Set by FP comparison instructions, outcome used by branches Must transfer to int word, test bits explicitly “Even experienced programmers find this code arcane and difficult to read.”
41
III:41 SSE Floating Point Architecture Better compiler target than x87 Includes new registers and instructions In Intel CPUs since Pentium III Default for x86-64, but not IA32
42
III:42 Vector Instructions: SSE Family SIMD (single-instruction, multiple data) vector instructions New data types, registers, operations Parallel operation on small (length 2-8) vectors of integers or floats Example: Floating point vector instructions Available with Intel’s SSE (streaming SIMD extensions) family SSE starting with Pentium III: 4-way single precision SSE2 starting with Pentium 4: 2-way double precision All x86-64 have SSE3 (superset of SSE2, SSE) + x “4-way”
43
III:43 Intel Architectures (Focus Floating Point) x86-64 / em64t x86-32 x86-16 MMX SSE SSE2 SSE3 SSE4 8086 286 386 486 Pentium Pentium MMX Pentium III Pentium 4 Pentium 4E Pentium 4F Core 2 Duo time ArchitecturesProcessorsFeatures 4-way single precision FP 2-way double precision FP Our focus: SSE3 used for scalar (non-vector) floating point
44
III:44 SSE3 Registers All caller saved %xmm0 for floating point return value %xmm0 %xmm1 %xmm2 %xmm3 %xmm4 %xmm5 %xmm6 %xmm7 %xmm8 %xmm9 %xmm10 %xmm11 %xmm12 %xmm13 %xmm14 %xmm15 Argument #1 Argument #2 Argument #3 Argument #4 Argument #5 Argument #6 Argument #7 Argument #8 128 bit = 2 doubles = 4 singles
45
III:45 SSE3 Registers Different data types and associated instructions Integer vectors: 16-way byte 8-way 2 bytes 4-way 4 bytes Floating point vectors: 4-way single 2-way double Floating point scalars: single double 128 bit LSB
46
III:46 SSE3 Instructions: Examples Single precision 4-way vector add: addps %xmm0 %xmm1 Single precision scalar add: addss %xmm0 %xmm1 + %xmm0 %xmm1 + %xmm0 %xmm1
47
III:47 SSE3 Instruction Names addpsaddss addpdaddsd packed (vector) single slot (scalar) single precision double precision our focus
48
III:48 SSE3 Basic Instructions Moves Usual operand form: reg → reg, reg → mem, mem → reg Arithmetic SingleDoubleEffect addssaddsd D ← D + S subsssubsd D ← D – S mulssmulsd D ← D x S divssdivsd D ← D / S maxssmaxsd D ← max(D,S) minssminsd D ← min(D,S) sqrtsssqrtsd D ← sqrt(S) SingleDoubleEffect movssmovsd D ← S
49
III:49 x86-64 FP Code Example Compute inner product of two vectors Single precision arithmetic Uses SSE3 instructions float ipf (float x[], float y[], int n) { int i; float result = 0.0; for (i = 0; i < n; i++) result += x[i]*y[i]; return result; } ipf: xorps %xmm1, %xmm1# result = 0.0 xorl %ecx, %ecx# i = 0 jmp.L8# goto middle.L10:# loop: movslq %ecx,%rax# icpy = i incl %ecx# i++ movss (%rsi,%rax,4), %xmm0# t = y[icpy] mulss (%rdi,%rax,4), %xmm0# t *= x[icpy] addss %xmm0, %xmm1# result += t.L8:# middle: cmpl %edx, %ecx# i:n jl.L10# if < goto loop movaps %xmm1, %xmm0# return result ret
50
III:50 SSE3 Conversion Instructions Conversions Same operand forms as moves InstructionDescription cvtss2sd single → double cvtsd2ss double → single cvtsi2ss int → single cvtsi2sd int → double cvtsi2ssq quad int → single cvtsi2sdq quad int → double cvttss2si single → int (truncation) cvttsd2si double → int (truncation) cvttss2siq single → quad int (truncation) cvttss2siq double → quad int (truncation)
51
III:51 x86-64 FP Code Example double funct(double a, float x, double b, int i) { return a*x - b/i; } a %xmm0 double x %xmm1 float b %xmm2 double i %edi int funct: cvtss2sd %xmm1, %xmm1 # %xmm1 = (double) x mulsd %xmm0, %xmm1 # %xmm1 = a*x cvtsi2sd %edi, %xmm0 # %xmm0 = (double) i divsd %xmm0, %xmm2 # %xmm2 = b/i movsd %xmm1, %xmm0 # %xmm0 = a*x subsd %xmm2, %xmm0 # return a*x - b/i ret
52
III:52 Constants double cel2fahr(double temp) { return 1.8 * temp + 32.0; } # Constant declarations.LC2:.long 3435973837 # Low order four bytes of 1.8.long 1073532108 # High order four bytes of 1.8.LC4:.long 0 # Low order four bytes of 32.0.long 1077936128 # High order four bytes of 32.0 # Code cel2fahr: mulsd.LC2(%rip), %xmm0 # Multiply by 1.8 addsd.LC4(%rip), %xmm0 # Add 32.0 ret Here: Constants in decimal format compiler decision hex more readable
53
III:53 Comments SSE3 floating point Uses lower ½ (double) or ¼ (single) of vector Finally departure from awkward x87 Assembly very similar to integer code x87 still supported Even mixing with SSE3 possible Not recommended For highest floating point performance Vectorization a must (but not in this course ) See next slide
54
III:54 Vector Instructions Starting with version 4.1.1, GCC can auto-vectorize to some extent -O3 or –ftree-vectorize No speed-up guaranteed – limited capability Intel’s C++ compiler (icc) currently much better Spice machines have 4.1.2 installed (May 2010) For highest performance vectorize yourself using intrinsics Intrinsics = C interface to vector instructions Future Intel AVX announced: 4-way double, 8-way single
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.