Presentation is loading. Please wait.

Presentation is loading. Please wait.

III:1 X86 Assembly – Buffer Overflow. III:2 Administrivia HW3 – skip the farmer game portion, moved to HW5  Due today HW4 – Endian swap in ASM and C.

Similar presentations


Presentation on theme: "III:1 X86 Assembly – Buffer Overflow. III:2 Administrivia HW3 – skip the farmer game portion, moved to HW5  Due today HW4 – Endian swap in ASM and C."— Presentation transcript:

1 III:1 X86 Assembly – Buffer Overflow

2 III:2 Administrivia HW3 – skip the farmer game portion, moved to HW5  Due today HW4 – Endian swap in ASM and C Quiz2 – please do your correction in RED  Do not erase original answers Quiz3 – due next Monday How am I doing? Please know that there is no way we will cover everything you need to know in the book. So read.

3 III:3 IA32 Linux Memory Layout Stack  Runtime stack (8MB limit) Heap  Dynamically allocated storage  When call malloc(), calloc(), new() Data  Statically allocated data  E.g., arrays & strings declared in code Text  Executable machine instructions  Read-only Upper 2 hex digits = 8 bits of address FF 00 Stack Text Data Heap 08 8MB not drawn to scale

4 III:4 Memory Allocation Example char big_array[1<<24]; /* 16 MB */ char huge_array[1<<28]; /* 256 MB */ int beyond; char *p1, *p2, *p3, *p4; int useless() { return 0; } int main() { p1 = malloc(1 <<28); /* 256 MB */ p2 = malloc(1 << 8); /* 256 B */ p3 = malloc(1 <<28); /* 256 MB */ p4 = malloc(1 << 8); /* 256 B */ /* Some print statements... */ } FF 00 Stack Text Data Heap 08 not drawn to scale Where does everything go?

5 III:5 IA32 Example Addresses $esp0xffffbcd0 p3 0x65586008 p1 0x55585008 p40x1904a110 p20x1904a008 &p20x18049760 beyond 0x08049744 big_array 0x18049780 huge_array 0x08049760 main()0x080483c6 useless() 0x08049744 final malloc()0x006be166 address range ~2 32 FF 00 Stack Text Data Heap 08 80 not drawn to scale malloc() is dynamically linked address determined at runtime

6 III:6 How about some pointers to deal with the new boss? Sure. Try 0x0000A4F5, 0x00008EEF and 0x0000B32A. Pointers

7 III:7 C operators -> has very high precedence () has very high precedence monadic * just below OperatorsAssociativity () [] ->. left to right ! ~ ++ -- + - * & (type) sizeof right to left * / % left to right + - left to right > left to right >= left to right == != left to right & left to right ^ left to right | left to right && left to right || left to right ?: right to left = += -= *= /= %= &= ^= != >= right to left, left to right

8 III:8 C Pointer Declarations: Test Yourself! int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints

9 III:9 C Pointer Declarations: Test Yourself! int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints p is an array[13] of pointer to int

10 III:10 C Pointer Declarations: Test Yourself! int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints p is an array[13] of pointer to int

11 III:11 C Pointer Declarations: Test Yourself! int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints p is an array[13] of pointer to int p is a pointer to an array[13] of int

12 III:12 C Pointer Declarations: Test Yourself! int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints p is an array[13] of pointer to int p is a pointer to an array[13] of int f is a function returning ptr to an array[13] of pointers to functions returning int

13 III:13 C Pointer Declarations int *p p is a pointer to int int *p[13] p is an array[13] of pointer to int int *(p[13]) p is an array[13] of pointer to int int **p p is a pointer to a pointer to an int int (*p)[13] p is a pointer to an array[13] of int int *f() f is a function returning a pointer to int int (*f)() f is a pointer to a function returning int int (*(*f())[13])() f is a function returning ptr to an array[13] of pointers to functions returning int int (*(*x[3])())[5] x is an array[3] of pointers to functions returning pointers to array[5] of ints

14 III:14 Avoiding Complex Declarations Use typedef to build up the declaration Instead of int (*(*x[3])())[5] : typedef int fiveints[5]; typedef fiveints* p5i; typedef p5i (*f_of_p5is)(); f_of_p5is x[3]; x is an array of 3 elements, each of which is a pointer to a function returning an array of 5 ints

15 III:15 Internet Worm and IM War November, 1988  Internet Worm attacks thousands of Internet hosts.  How did it happen? July, 1999  Microsoft launches MSN Messenger (instant messaging system).  Messenger clients can access popular AOL Instant Messaging Service (AIM) servers AIM server AIM client AIM client MSN client MSN server

16 III:16 Internet Worm and IM War (cont.) August 1999  Mysteriously, Messenger clients can no longer access AIM servers.  Microsoft and AOL begin the IM war:  AOL changes server to disallow Messenger clients  Microsoft makes changes to clients to defeat AOL changes.  At least 13 such skirmishes.  How did it happen? The Internet Worm and AOL/Microsoft War were both based on stack buffer overflow exploits!  many Unix functions do not check argument sizes.  allows target buffers to overflow.

17 III:17 String Library Code Implementation of Unix function gets()  No way to specify limit on number of characters to read Similar problems with other Unix functions  strcpy : Copies string of arbitrary length  scanf, fscanf, sscanf, when given %s conversion specification /* Get string from stdin */ char *gets(char *dest) { int c = getchar(); char *p = dest; while (c != EOF && c != '\n') { *p++ = c; c = getchar(); } *p = '\0'; return dest; }

18 III:18 Vulnerable Buffer Code int main() { printf("Type a string:"); echo(); return 0; } /* Echo Line */ void echo() { char buf[4]; /* Way too small! */ gets(buf); puts(buf); } unix>./bufdemo Type a string:1234567 1234567 unix>./bufdemo Type a string:12345678 Segmentation Fault unix>./bufdemo Type a string:123456789ABC Segmentation Fault

19 III:19 Buffer Overflow Disassembly 080484f0 : 80484f0:55 push %ebp 80484f1:89 e5 mov %esp,%ebp 80484f3:53 push %ebx 80484f4:8d 5d f8 lea 0xfffffff8(%ebp),%ebx 80484f7:83 ec 14 sub $0x14,%esp 80484fa:89 1c 24 mov %ebx,(%esp) 80484fd:e8 ae ff ff ff call 80484b0 8048502:89 1c 24 mov %ebx,(%esp) 8048505:e8 8a fe ff ff call 8048394 804850a:83 c4 14 add $0x14,%esp 804850d:5b pop %ebx 804850e:c9 leave 804850f:c3 ret 80485f2:e8 f9 fe ff ff call 80484f0 80485f7:8b 5d fc mov 0xfffffffc(%ebp),%ebx 80485fa:c9 leave 80485fb:31 c0 xor %eax,%eax 80485fd:c3 ret

20 III:20 Buffer Overflow Stack echo: pushl %ebp# Save %ebp on stack movl %esp, %ebp pushl %ebx# Save %ebx leal -8(%ebp),%ebx# Compute buf as %ebp-8 subl $20, %esp# Allocate stack space movl %ebx, (%esp)# Push buf on stack call gets# Call gets... /* Echo Line */ void echo() { char buf[4]; /* Way too small! */ gets(buf); puts(buf); } Return Address Saved %ebp %ebp Stack Frame for main Stack Frame for echo [3][2][1][0] buf Before call to gets

21 III:21 Buffer Overflow Stack Example unix> gdb bufdemo (gdb) break echo Breakpoint 1 at 0x8048583 (gdb) run Breakpoint 1, 0x8048583 in echo () (gdb) print /x $ebp $1 = 0xffffc638 (gdb) print /x *(unsigned *)$ebp $2 = 0xffffc658 (gdb) print /x *((unsigned *)$ebp + 1) $3 = 0x80485f7 80485f2:call 80484f0 80485f7:mov 0xfffffffc(%ebp),%ebx # Return Point 0xffffc638 buf 0xffffc658 Return Address Saved %ebp Stack Frame for main Stack Frame for echo [3][2][1][0] Stack Frame for main Stack Frame for echo xx buf ff c658 080485f7 Before call to gets

22 III:22 Buffer Overflow Example #1 Overflow bufs, but no problem 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo xx buf ff c658 080485f7 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo 34333231 buf ff c658 080485f7 00373635 Before call to gets Input 1234567

23 III:23 Buffer Overflow Example #2 Base pointer corrupted 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo xx buf ff c658 080485f7 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo 34333231 buf ff c600 080485f7 38373635 Before call to gets Input 12345678... 804850a:83 c4 14 add $0x14,%esp # deallocate space 804850d:5b pop %ebx # restore %ebx 804850e:c9 leave # movl %ebp, %esp; popl %ebp 804850f:c3 ret # Return

24 III:24 Buffer Overflow Example #3 Return address corrupted 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo xx buf ff c658 080485f7 0xffffc638 0xffffc658 Stack Frame for main Stack Frame for echo 34333231 buf 43424139 08048500 38373635 Before call to gets Input 12345678 80485f2:call 80484f0 80485f7:mov 0xfffffffc(%ebp),%ebx # Return Point

25 III:25 Malicious Use of Buffer Overflow Input string contains byte representation of executable code Overwrite return address with address of buffer When bar() executes ret, will jump to exploit code int bar() { char buf[64]; gets(buf);... return...; } void foo(){ bar();... } Stack after call to gets() B return address A foo stack frame bar stack frame B exploit code pad data written by gets()

26 III:26 Exploits Based on Buffer Overflows Buffer overflow bugs allow remote machines to execute arbitrary code on victim machines Internet worm  Early versions of the finger server (fingerd) used gets() to read the argument sent by the client:  finger jka@ee.byu.edu  Worm attacked fingerd server by sending phony argument:  finger “exploit-code padding new-return- address”  Exploit code: executed a root shell on the victim machine with a direct TCP connection to the attacker.

27 III:27 Exploits Based on Buffer Overflows Buffer overflow bugs allow remote machines to execute arbitrary code on victim machines IM War  AOL exploited existing buffer overflow bug in AIM clients  Exploit code: returned 4-byte signature (the bytes at some location in the AIM client) to server.  When Microsoft changed code to match signature, AOL changed signature location.

28 III:28 Example Virus: Code Red Worm History  June 18, 2001. Microsoft announces buffer overflow vulnerability in IIS Internet server  July 19, 2001. over 250,000 machines infected by new virus in 9 hours  White house must change its IP address. Pentagon shut down public WWW servers for day When CS:APP Web Site was set up  Received strings of form GET /default.ida?NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN N....NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN%u909 0%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u 6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b0 0%u531b%u53ff%u0078%u0000%u00=a HTTP/1.0" 400 325 "-" "-"

29 III:29 Code Red Exploit Code Starts 100 threads running Spread self  Generate random IP addresses & send attack string  Between 1st & 19th of month Attack www.whitehouse.gov  Send 98,304 packets; sleep for 4-1/2 hours; repeat  Denial of service attack  Between 21st & 27th of month Deface server’s home page  After waiting 2 hours

30 III:30 Avoiding Overflow Vulnerability Use library routines that limit string lengths  fgets instead of gets  strncpy instead of strcpy  Don’t use scanf with %s conversion specification  Use fgets to read the string  Or use %ns where n is a suitable integer /* Echo Line */ void echo() { char buf[4]; /* Way too small! */ fgets(buf, 4, stdin); puts(buf); }

31 III:31 System-Level Protections Randomized stack offsets  At start of program, allocate random amount of space on stack  Makes it difficult for hacker to predict beginning of inserted code Non-executable code segments  In traditional x86, segments are either “read- only” or “writeable”  Can execute anything readable  Better: add explicit “execute” permission Stack corruption detection  Compiler inserts “canary” just beyond array and code to check it before function return  Code aborts with error if canary changed unix> gdb bufdemo (gdb) break echo (gdb) run (gdb) print /x $ebp $1 = 0xffffc638 (gdb) run (gdb) print /x $ebp $2 = 0xffffbb08 (gdb) run (gdb) print /x $ebp $3 = 0xffffc6a8

32 III:32 Web-aside: Assembly & C Requirements to have asm + C integrated together?  In assembly, follow register usage, parameter passing, and return value conventions Problems with inline assembly  Not portable  Knowing what registers are safe to use Advantages of GCC’s extended asm  You inform compiler of source values required, results your code generates, and registers that will be overwritten  GCC generates code to correctly set up source values, execute desired instructions, and correctly use computed results

33 III:33 Administrivia Quiz 3 – due Feb 7 Quiz 4 – due Feb 14 Quiz 5 – due Feb 21 Midterm – available after class on Feb 28  No class Mar 1, work on the exam  Due 5PM Monday Mar 5  Take home, open book, open notes, open lecture slides  NO COLLABORATION

34 III:34 BONUS SLIDES

35 III:35 IA32 Floating Point (x87) History  8086: first computer to implement IEEE FP  separate 8087 FPU (floating point unit)  486: merged FPU and Integer Unit onto one chip  Becoming obsolete with x86-64 Summary  Hardware to add, multiply, and divide  Floating point data registers  Various control & status registers Floating Point Formats  single precision (C float ): 32 bits  double precision (C double ): 64 bits  extended precision (C long double ): 80 bits Instruction decoder and sequencer FPU Integer Unit Memory

36 III:36 FPU Data Register Stack (x87) FPU register format (80 bit extended precision) FPU registers  8 registers %st(0) - %st(7)  Logically form stack  Top: %st(0)  Bottom disappears (drops out) after too many pushs sexpfrac 0 63647879 “Top” %st(0) %st(1) %st(2) %st(3)

37 III:37 InstructionEffectDescription fldzpush 0.0 Load zero flds Addr push Mem[Addr] Load single precision real fmuls Addr %st(0)  %st(0)* M[Addr]Multiply faddp%st(1)  %st(0)+%st(1);pop Add and pop FPU instructions (x87) Large number of floating point instructions and formats  ~50 basic instruction types  load, store, add, multiply  sin, cos, tan, arctan, and log  Often slower than math lib (!) Sample instructions:

38 III:38 FP Code Example (x87) Compute inner product of two vectors  Single precision arithmetic  Common computation float ipf (float x[], float y[], int n) { int i; float result = 0.0; for (i = 0; i < n; i++) result += x[i]*y[i]; return result; } pushl %ebp # setup movl %esp,%ebp pushl %ebx movl 8(%ebp),%ebx # %ebx=&x movl 12(%ebp),%ecx # %ecx=&y movl 16(%ebp),%edx # %edx=n fldz # push +0.0 xorl %eax,%eax # i=0 cmpl %edx,%eax # if i>=n done jge.L3.L5: flds (%ebx,%eax,4) # push x[i] fmuls (%ecx,%eax,4) # st(0)*=y[i] faddp # st(1)+=st(0); pop incl %eax # i++ cmpl %edx,%eax # if i<n repeat jl.L5.L3: movl -4(%ebp),%ebx # finish movl %ebp, %esp popl %ebp ret # st(0) = result

39 III:39 Inner Product Stack Trace 1. fldz 0.0 %st(0) 2. flds (%ebx,%eax,4) 0.0 %st(1) x[0] %st(0) 3. fmuls (%ecx,%eax,4) 0.0 %st(1) x[0]*y[0] %st(0) 4. faddp 0.0+x[0]*y[0] %st(0) 5. flds (%ebx,%eax,4) x[0]*y[0] %st(1) x[1] %st(0) 6. fmuls (%ecx,%eax,4) x[0]*y[0] %st(1) x[1]*y[1] %st(0) 7. faddp %st(0) x[0]*y[0]+x[1]*y[1] Initialization Iteration 0Iteration 1 eax = i ebx = *x ecx = *y

40 III:40 x87 Floating Point Problems One of “least elegant features” of x86  Remnants of original co-processor design Compiler technology is much better today Stack registers are problematic  All must be treated as caller-saved, too much memory traffic  Can’t treat as true stack over multiple function calls Separate FP status word is problematic  Set by FP comparison instructions, outcome used by branches  Must transfer to int word, test bits explicitly “Even experienced programmers find this code arcane and difficult to read.”

41 III:41 SSE Floating Point Architecture Better compiler target than x87 Includes new registers and instructions In Intel CPUs since Pentium III Default for x86-64, but not IA32

42 III:42 Vector Instructions: SSE Family SIMD (single-instruction, multiple data) vector instructions  New data types, registers, operations  Parallel operation on small (length 2-8) vectors of integers or floats  Example: Floating point vector instructions  Available with Intel’s SSE (streaming SIMD extensions) family  SSE starting with Pentium III: 4-way single precision  SSE2 starting with Pentium 4: 2-way double precision  All x86-64 have SSE3 (superset of SSE2, SSE) + x “4-way”

43 III:43 Intel Architectures (Focus Floating Point) x86-64 / em64t x86-32 x86-16 MMX SSE SSE2 SSE3 SSE4 8086 286 386 486 Pentium Pentium MMX Pentium III Pentium 4 Pentium 4E Pentium 4F Core 2 Duo time ArchitecturesProcessorsFeatures 4-way single precision FP 2-way double precision FP Our focus: SSE3 used for scalar (non-vector) floating point

44 III:44 SSE3 Registers All caller saved %xmm0 for floating point return value %xmm0 %xmm1 %xmm2 %xmm3 %xmm4 %xmm5 %xmm6 %xmm7 %xmm8 %xmm9 %xmm10 %xmm11 %xmm12 %xmm13 %xmm14 %xmm15 Argument #1 Argument #2 Argument #3 Argument #4 Argument #5 Argument #6 Argument #7 Argument #8 128 bit = 2 doubles = 4 singles

45 III:45 SSE3 Registers Different data types and associated instructions Integer vectors:  16-way byte  8-way 2 bytes  4-way 4 bytes Floating point vectors:  4-way single  2-way double Floating point scalars:  single  double 128 bit LSB

46 III:46 SSE3 Instructions: Examples Single precision 4-way vector add: addps %xmm0 %xmm1 Single precision scalar add: addss %xmm0 %xmm1 + %xmm0 %xmm1 + %xmm0 %xmm1

47 III:47 SSE3 Instruction Names addpsaddss addpdaddsd packed (vector) single slot (scalar) single precision double precision our focus

48 III:48 SSE3 Basic Instructions Moves  Usual operand form: reg → reg, reg → mem, mem → reg Arithmetic SingleDoubleEffect addssaddsd D ← D + S subsssubsd D ← D – S mulssmulsd D ← D x S divssdivsd D ← D / S maxssmaxsd D ← max(D,S) minssminsd D ← min(D,S) sqrtsssqrtsd D ← sqrt(S) SingleDoubleEffect movssmovsd D ← S

49 III:49 x86-64 FP Code Example Compute inner product of two vectors  Single precision arithmetic  Uses SSE3 instructions float ipf (float x[], float y[], int n) { int i; float result = 0.0; for (i = 0; i < n; i++) result += x[i]*y[i]; return result; } ipf: xorps %xmm1, %xmm1# result = 0.0 xorl %ecx, %ecx# i = 0 jmp.L8# goto middle.L10:# loop: movslq %ecx,%rax# icpy = i incl %ecx# i++ movss (%rsi,%rax,4), %xmm0# t = y[icpy] mulss (%rdi,%rax,4), %xmm0# t *= x[icpy] addss %xmm0, %xmm1# result += t.L8:# middle: cmpl %edx, %ecx# i:n jl.L10# if < goto loop movaps %xmm1, %xmm0# return result ret

50 III:50 SSE3 Conversion Instructions Conversions  Same operand forms as moves InstructionDescription cvtss2sd single → double cvtsd2ss double → single cvtsi2ss int → single cvtsi2sd int → double cvtsi2ssq quad int → single cvtsi2sdq quad int → double cvttss2si single → int (truncation) cvttsd2si double → int (truncation) cvttss2siq single → quad int (truncation) cvttss2siq double → quad int (truncation)

51 III:51 x86-64 FP Code Example double funct(double a, float x, double b, int i) { return a*x - b/i; } a %xmm0 double x %xmm1 float b %xmm2 double i %edi int funct: cvtss2sd %xmm1, %xmm1 # %xmm1 = (double) x mulsd %xmm0, %xmm1 # %xmm1 = a*x cvtsi2sd %edi, %xmm0 # %xmm0 = (double) i divsd %xmm0, %xmm2 # %xmm2 = b/i movsd %xmm1, %xmm0 # %xmm0 = a*x subsd %xmm2, %xmm0 # return a*x - b/i ret

52 III:52 Constants double cel2fahr(double temp) { return 1.8 * temp + 32.0; } # Constant declarations.LC2:.long 3435973837 # Low order four bytes of 1.8.long 1073532108 # High order four bytes of 1.8.LC4:.long 0 # Low order four bytes of 32.0.long 1077936128 # High order four bytes of 32.0 # Code cel2fahr: mulsd.LC2(%rip), %xmm0 # Multiply by 1.8 addsd.LC4(%rip), %xmm0 # Add 32.0 ret Here: Constants in decimal format compiler decision hex more readable

53 III:53 Comments SSE3 floating point  Uses lower ½ (double) or ¼ (single) of vector  Finally departure from awkward x87  Assembly very similar to integer code x87 still supported  Even mixing with SSE3 possible  Not recommended For highest floating point performance  Vectorization a must (but not in this course )  See next slide

54 III:54 Vector Instructions Starting with version 4.1.1, GCC can auto-vectorize to some extent  -O3 or –ftree-vectorize  No speed-up guaranteed – limited capability  Intel’s C++ compiler (icc) currently much better  Spice machines have 4.1.2 installed (May 2010) For highest performance vectorize yourself using intrinsics  Intrinsics = C interface to vector instructions Future  Intel AVX announced: 4-way double, 8-way single


Download ppt "III:1 X86 Assembly – Buffer Overflow. III:2 Administrivia HW3 – skip the farmer game portion, moved to HW5  Due today HW4 – Endian swap in ASM and C."

Similar presentations


Ads by Google