Strings COEN 296A Spring 2006
Strings
Strings are a fundamental concept, but they are not a built-in data type in C/C++.
C-Strings
    C-style string: character array terminated by the first null character.
    Wide string: wide character array terminated by the first null character.
C++ Strings
    Several classes.
    Standard template class: std::basic_string, std::string, std::wstring.
C-style and C++-style strings are not directly interchangeable; conversion must be explicit (for example, through std::string::c_str()).
Strings
C-style strings consist of a contiguous sequence of characters terminated by and including the first null character.
A pointer to a string points to its initial character.
The length of a string is the number of bytes preceding the null character.
The value of a string is the sequence of the values of the contained characters, in order.
[Figure: the string "hello" followed by \0 in memory, with its length marked]
Strings
Common Errors:
    Unbounded string copies
    Null-termination errors
    Truncation
    Write outside array bounds
    Off-by-one errors
    Improper data sanitization
Strings
What's wrong?
    #include <stdio.h>
    int main(void) {
        char Password[80];
        puts("Enter 8 character password:");
        gets(Password);
        return 0;
    }
gets() reads from standard input until a newline character is read or an end-of-file (EOF) condition is encountered.
The programmer does not know the size of the input.
The standard (still vulnerable) workaround allocates a buffer much bigger than the expected input.
UNBOUNDED STRING COPY
Strings
What's wrong?
    #include <string.h>
    int main(int argc, char *argv[]) {
        char name[2048];
        strcpy(name, argv[1]);
        strcat(name, " = ");
        strcat(name, argv[2]);
        return 0;
    }
Strings
Same problem: the command-line arguments can be of arbitrary length. However, here we can get the length of the string beforehand:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    int main(int argc, char *argv[]) {
        if (argc < 2) return 1;   /* argv[1] is required */
        char *buff = (char *)malloc(strlen(argv[1])+1);
        if (buff != NULL) {
            strcpy(buff, argv[1]);
            printf("argv[1] = %s.\n", buff);
            free(buff);
        }
        else {
            /* Couldn't get the memory - recover */
        }
        return 0;
    }
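The same idea can be applied to the concatenation example: compute the space needed first, then allocate it. The following is only a sketch; the helper name join_name_value is an assumption made for illustration and is not part of the original slides.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "name = value" string,
   or NULL if an argument is missing or memory cannot be obtained.
   The caller owns the result and must free() it. */
char *join_name_value(const char *name, const char *value) {
    if (name == NULL || value == NULL) {
        return NULL;
    }
    /* name + " = " (3 bytes) + value + terminating null byte */
    size_t needed = strlen(name) + 3 + strlen(value) + 1;
    char *out = malloc(needed);
    if (out == NULL) {
        return NULL;
    }
    /* snprintf never writes more than `needed` bytes, terminator included */
    snprintf(out, needed, "%s = %s", name, value);
    return out;
}
```

A caller would pass argv[1] and argv[2] after checking argc, and free the result when done.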
Strings
C++ allows the same type of mistake.
    #include <iostream>
    using namespace std;
    int main() {
        char buf[12];
        cin >> buf;
        cout << "echo: " << buf << endl;
        return 0;
    }
Overflows the buffer when the input is longer than 11 characters.
cin extraction ends with a valid white space, a null character, or an EOF.
UNBOUNDED STRING COPY
Strings
Correct Version:
    #include <iostream>
    using namespace std;
    int main() {
        char buf[12];
        cin.width(12);
        cin >> buf;
        cout << "echo: " << buf << endl;
        return 0;
    }
Set the field width (ios_base::width) to a positive value.
cin.width(12) limits the extraction so that at most 11 characters are read, followed by the terminating null character: 12 bytes in total.
The width applies to one extraction at a time and must be set again before the next one.
Strings
What is wrong with this code?
    #include <stdio.h>
    #include <string.h>
    int main(int argc, char *argv[]) {
    1   char a[16];
    2   char b[16];
    3   char c[32];
    4
    5   strcpy(a, "0123456789abcdef");
    6   strcpy(b, "0123456789abcdef");
    7   strcpy(c, a);
    8   strcat(c, b);
    9   printf("a = %s\n", a);
    10  return 0;
    }
The static allocations for the character arrays a, b, and c leave no space for the null-termination character: each 16-character literal needs 17 bytes.
The strcpy (5) writes the null byte of a one position past the end of a.
The strcpy (6) might overwrite this null byte, depending on how the compiler lays out memory.
If the byte is overwritten, then a is no longer terminated after 16 characters and reads as a string of at least 32 characters, so the strcpy (7) and the strcat (8) write beyond the bounds of c.
Behavior will vary with the compiler and with debug versus release builds.
Such errors can lie dormant for a long time until awakened by a simple program change.
NULL TERMINATION ERROR
From ISO/IEC 9899:1999
The strncpy function
    char *strncpy(char * restrict s1, const char * restrict s2, size_t n);
copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1. (See footnote 260.)
Footnote 260: Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null-terminated.
Strings
String Truncation
Functions that restrict the number of bytes are often recommended to mitigate buffer overflow vulnerabilities:
    strncpy() instead of strcpy()
    fgets() instead of gets()
    snprintf() instead of sprintf()
Strings that exceed the specified limits are truncated.
Truncation results in a loss of data and, in some cases, leads to software vulnerabilities.
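Because strncpy() may leave the destination unterminated and truncates silently, a wrapper can force termination and report the truncation. This is a minimal sketch; the name copy_truncating and its return convention are assumptions for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (dstsize bytes), always
   null-terminating. Returns true if src had to be truncated. */
bool copy_truncating(char *dst, size_t dstsize, const char *src) {
    if (dstsize == 0) {
        return true;               /* nothing can be stored at all */
    }
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';       /* strncpy does not guarantee this */
    return strlen(src) >= dstsize; /* true when data was lost */
}
```

The caller decides what a truncated result means: reject the input, or continue with the shortened string after validating it.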
Strings: Off-by-One Errors
    1.  int main(int argc, char* argv[]) {
    2.    char source[10];
    3.    strcpy(source, "0123456789");
    4.    char *dest = (char *)malloc(strlen(source));
    5.    for (int i=1; i <= 11; i++) {
    6.      dest[i] = source[i];
    7.    }
    8.    dest[i] = '\0';
    9.    printf("dest = %s", dest);
    10. }
source is 10 bytes long, but strcpy (3) writes 11 bytes: 10 characters plus the null byte.
The value returned by strlen does not take the null byte into account; hence, dest is too small.
The for loop variable starts at 1, but string indices start at 0.
The for loop stop condition is off: i <= 11 runs past the end of both arrays.
The assignment (8) is an out-of-bounds write.
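For comparison, a version with each of the off-by-one errors above corrected might look as follows. It is a sketch only; the function name copy_string is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicates source, or returns NULL on failure. */
char *copy_string(const char *source) {
    size_t len = strlen(source);          /* does not count the null byte */
    char *dest = malloc(len + 1);         /* +1 for the terminator */
    if (dest == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {    /* indices run 0 .. len-1 */
        dest[i] = source[i];
    }
    dest[len] = '\0';                     /* last valid index is len */
    return dest;
}
```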
Strings
String Errors without Functions
Since C-style strings are character arrays, it is possible to perform insecure string manipulations without explicitly calling any "dangerous" functions, such as strcpy(), strcat(), gets(), streadd(), strecpy(), ...
    #include <stdio.h>
    int main(int argc, char *argv[]) {
        int i = 0;
        char buff[128];
        char *arg1 = argv[1];
        while (arg1[i] != '\0') {
            buff[i] = arg1[i];
            i++;
        }
        buff[i] = '\0';
        printf("buff = %s\n", buff);
        return 0;
    }
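The loop above becomes safe once the destination size is part of the loop condition. The sketch below assumes a hypothetical helper, bounded_loop_copy, that truncates rather than overflows.

```c
#include <stddef.h>

/* Hypothetical helper: copies at most size-1 characters of src into buff
   and always terminates it. Returns the number of characters copied. */
size_t bounded_loop_copy(char *buff, size_t size, const char *src) {
    size_t i = 0;
    if (size == 0) {
        return 0;
    }
    while (i < size - 1 && src[i] != '\0') {
        buff[i] = src[i];
        i++;
    }
    buff[i] = '\0';
    return i;
}
```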
Strings Improper Data Sanitization A much bigger problem, but here is a simple example: An application inputs an address from a user and writes the address to a buffer [Viega 03] sprintf(buffer, "/bin/mail %s < /tmp/ ", addr ); The buffer is then executed using the system() call. The risk is, of course, that the user enters the following string as an address: cat /etc/passwd | mail
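One common defense is to accept only characters from an allow-list before the input gets anywhere near a shell. The sketch below is an assumption-laden illustration: the function name is_plain_address and the permitted character set are hypothetical, and it is not a complete mail address validator.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allow-list check: accepts only letters, digits and a few
   punctuation characters that are plausible in a mail address. Anything
   else (space, ';', '|', '<', quotes, ...) causes rejection. */
bool is_plain_address(const char *addr) {
    if (addr == NULL || addr[0] == '\0' || addr[0] == '-') {
        return false;   /* empty input, or would look like an option */
    }
    for (size_t i = 0; addr[i] != '\0'; i++) {
        unsigned char c = (unsigned char)addr[i];
        if (!isalnum(c) && c != '@' && c != '.' && c != '_' &&
            c != '-' && c != '+') {
            return false;
        }
    }
    return true;
}
```

Rejecting everything outside a known-good set tends to be more robust than trying to enumerate dangerous characters; avoiding system() altogether is safer still.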
Commercial Message Learn how to find cyber-crime Find out what Law & Order does not show. Learn about exploits. TAKE COEN 252
Strings
A buffer overflow occurs when data is written outside of the boundaries of the memory allocated to a particular data structure.
[Figure: a copy operation moves 11 bytes of data from source memory into 8 bytes of allocated memory, next to other memory]
Strings
Buffer overflows occur because we usually do not check bounds.
    Standard library functions do not check bounds.
    Programmers do not check bounds.
Not all buffer overflows are exploitable.
Strings
Process Memory Organization
    Code or Text: instructions and read-only data
    Data: initialized data, uninitialized data, static variables, global variables
    Heap: dynamically allocated variables
    Stack: local variables, return addresses, etc.
Strings: Stack Management
When calling a subroutine / function:
    Stack stores the return address
    Stack stores arguments, return values
    Stack stores variables local to the subroutine
Information pushed on the stack for a subroutine call is called a frame.
The address of the frame is stored in the frame (or base) pointer register: ebp on Intel architectures.
Strings: Stack Management
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    bool IsPasswordOkay(void) {
        char Password[8];
        gets(Password);
        if (!strcmp(Password, "badprog")) return(true);
        else return(false);
    }
    int main(void) {
        bool PwStatus;
        puts("Enter password:");
        PwStatus = IsPasswordOkay();
        if (PwStatus == false) {
            puts("Access denied");
            exit(-1);
        }
        else puts("Access granted");
        return 0;
    }
Strings: Stack Management
Program stack before call to IsPasswordOkay()
Code (main):
    puts("Enter Password:");
    PwStatus=IsPasswordOkay();
    if (PwStatus==true) puts("Hello, Master");
    else puts("Access denied");
Stack:
    Storage for PwStatus (4 bytes)
    Caller EBP – Frame Ptr OS (4 bytes)
    Return Addr of main – OS (4 bytes)
    ...
Strings: Stack Management
Program stack during call to IsPasswordOkay()
Code (main):
    puts("Enter Password:");
    PwStatus=IsPasswordOkay();
    if (PwStatus==true) puts("Hello, Master");
    else puts("Access denied");
Code (callee):
    bool IsPasswordOkay(void) {
        char Password[8];
        gets(Password);
        if (!strcmp(Password,"badprog")) return(true);
        else return(false);
    }
Stack:
    Storage for Password (8 bytes)
    Caller EBP – Frame Ptr main (4 bytes)
    Return Addr Caller – main (4 bytes)
    Storage for PwStatus (4 bytes)
    Caller EBP – Frame Ptr OS (4 bytes)
    Return Addr of main – OS (4 bytes)
    ...
Strings: Stack Management
Program stack after call to IsPasswordOkay()
Code (main):
    puts("Enter Password:");
    PwStatus=IsPasswordOkay();
    if (PwStatus==true) puts("Hello, Master");
    else puts("Access denied");
Stack:
    Storage for Password (8 bytes)
    Caller EBP – Frame Ptr main (4 bytes)
    Return Addr Caller – main (4 bytes)
    Storage for PwStatus (4 bytes)
    Caller EBP – Frame Ptr OS (4 bytes)
    Return Addr of main – OS (4 bytes)
    ...
Strings: Buffer Overflow Example
What happens if we enter more than 7 characters of an input string into the password program shown earlier, where IsPasswordOkay() reads into char Password[8] with gets()?
Strings: Buffer Overflow Example
    bool IsPasswordOkay(void) {
        char Password[8];
        gets(Password);
        if (!strcmp(Password,"badprog")) return(true);
        else return(false);
    }
Stack:
    Storage for Password (8 bytes): "12345678"
    Caller EBP – Frame Ptr main (4 bytes): "9012"
    Return Addr Caller – main (4 bytes): "3456"
    Storage for PwStatus (4 bytes): "7890"
    Caller EBP – Frame Ptr OS (4 bytes): "\0"
    Return Addr of main – OS (4 bytes)
    ...
The return address and other data on the stack are overwritten because the memory space allocated for the password can only hold a maximum of 7 characters plus the null terminator.
Strings: Buffer Overflow Example A specially crafted string “abcdefghijklW ► *!” produced the following result:
Strings: Buffer Overflow Example
The string “abcdefghijklW ► *!” overwrote 9 extra bytes of memory on the stack, changing the caller's return address and thus skipping the execution of line 3.
Stack:
    Storage for Password (8 bytes): “abcdefgh”
    Caller EBP – Frame Ptr main (4 bytes): “ijkl”
    Return Addr Caller – main (4 bytes): “W ► *!” (return to line 4; was line 3)
    Storage for PwStatus (4 bytes): "\0"
    Caller EBP – Frame Ptr OS (4 bytes)
    Return Addr of main – OS (4 bytes)
Line  Statement
    1  puts("Enter Password:");
    2  PwStatus=IsPasswordOkay();
    3  if (PwStatus==true)
    4  puts("Hello, Master");
    5  else puts("Access denied");
Exploitation of Buffer Overflows
A buffer overflow can be exploited by
    changing the return address in order to change the program flow (arc injection)
    changing the return address to point into the buffer, where it contains some malicious code (code injection)
Exploitation of Buffer Overflows The get password program can be exploited to execute arbitrary code by providing the following binary data file as input: " " E0 F9 FF BF " a· +" C0 A3 FF F9 FF BF B0-0B BB 03 FA FF BF B9 FB "1+ú · +¦+· +¦v" 030 F9 FF BF 8B 15 FF F9 FF-BF CD 80 FF F9 FF BF 31 "· +ï§ · +-Ç · +1" F F E 2F C 0A "111/usr/bin/cal “ This exploit is specific to Red Hat Linux 9.0 and GCC
Exploitation of Buffer Overflows The first 16 bytes of binary data fill the allocated storage space for the password. NOTE: Even though the program only allocated 12 bytes for the password, the version of the gcc compiler used allocates stack data in multiples of 16 bytes.
Exploitation of Buffer Overflows The next 12 bytes of binary data fill the extra storage space that was created by the compiler to keep the stack aligned on a 16-byte boundary.
Exploitation of Buffer Overflows The next 4 bytes overwrite the return address. The new return address is 0xBFFFF9E0, stored in little-endian byte order as E0 F9 FF BF.
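The byte order explains why the address appears reversed in the input data. A small sketch, with a hypothetical helper name store_le32, shows how a 32-bit value is laid out least significant byte first:

```c
#include <stdint.h>

/* Stores a 32-bit value least significant byte first, as x86 does. */
void store_le32(uint32_t value, unsigned char out[4]) {
    out[0] = (unsigned char)(value & 0xFF);
    out[1] = (unsigned char)((value >> 8) & 0xFF);
    out[2] = (unsigned char)((value >> 16) & 0xFF);
    out[3] = (unsigned char)((value >> 24) & 0xFF);
}
```

Applied to 0xBFFFF9E0, this yields the byte sequence E0 F9 FF BF.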
Exploitation of Buffer Overflows The remaining bytes are the malicious code. The purpose of the malicious code is to call execve with a user-provided set of parameters. In this program, instead of spawning a shell, we just call the Linux calendar program, /usr/bin/cal.
Exploitation of Buffer Overflows
The malicious code:
    xor %eax,%eax          #set eax to zero
    mov %eax,0xbffff9ff    #set to NULL word
Create a zero value and use it to null-terminate the argument list.
Exploitation of Buffer Overflows
The malicious code:
    xor %eax,%eax          #set eax to zero
    mov %eax,0xbffff9ff    #set to NULL word
    mov $0xb,%al           #set code for execve
Set the value of register al to 0xb. This value indicates a system call to execve.
Exploitation of Buffer Overflows
The malicious code:
    mov $0xb,%al           #set code for execve
    mov $0xbffffa03,%ebx   #ptr to arg 1
    mov $0xbffff9fb,%ecx   #ptr to arg 2
    mov 0xbffff9ff,%edx    #ptr to arg 3
This puts the pointers to the arguments into the ebx, ecx, and edx registers.
Exploitation of Buffer Overflows
The malicious code:
    mov $0xbffffa03,%ebx   #ptr to arg 1
    mov $0xbffff9fb,%ecx   #ptr to arg 2
    mov 0xbffff9ff,%edx    #ptr to arg 3
    int $0x80              #make system call to execve
Now make the system call to execve. The arguments are in the registers.
Exploitation of Buffer Overflows The last part of the data holds the arguments, ending in the string /usr/bin/cal.
Exploitation of Buffer Overflows
    ./BufferOverflow < exploit.bin
now executes /usr/bin/cal.
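Exploitation of Buffer Overflows
    #include <string.h>
    int get_buff(char *user_input) {
        char buff[4];
        /* sizeof(user_input) is the size of the pointer, not of the string,
           and the copy length is not tied to sizeof(buff) */
        memcpy(buff, user_input, sizeof(user_input));
        return 0;
    }
    int main(int argc, char *argv[]) {
        get_buff(argv[1]);
        return 0;
    }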
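A bounded variant of get_buff() would tie the copy length to the size of the destination array rather than to the size of a pointer. This is a sketch under that assumption; the name get_buff_checked and its error convention are hypothetical.

```c
#include <string.h>

/* Hypothetical bounded variant: returns 0 on success, -1 if the input
   (including its terminator) does not fit into the local buffer. */
int get_buff_checked(const char *user_input) {
    char buff[4];
    size_t len;
    if (user_input == NULL) {
        return -1;
    }
    len = strlen(user_input);
    if (len >= sizeof(buff)) {      /* sizeof(buff) is 4: the array size */
        return -1;                  /* reject instead of overflowing */
    }
    memcpy(buff, user_input, len + 1);
    return 0;
}
```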
Exploitation of Buffer Overflows
Stack before and after executing get_buff(argv[1]) with attacker-provided string.
Before (stack frame of main):
    buff[4] (esp)
    ebp (main) (ebp)
    return addr(main)
After:
    buff[4] (esp)
    Frame 1: ebp (frame 2), f(), eip (leave/ret), f() argptr, "f() arg data"
    Frame 2: ebp (frame 3), g(), eip (leave/ret), g() argptr, "g() arg data"
    Orig frame: ebp (orig), return addr(main)
The return address has been replaced with the address of f().
Exploitation of Buffer Overflows
Exploited function get_buff returns:
    mov esp, ebp
    pop ebp
    ret
Frame pointer (now pointing to Frame 2) is moved into the stack pointer.
Control is returned to the address on the stack, which has been overwritten with the address of the arbitrary function f().
Exploitation of Buffer Overflows When f() returns, it pops the stored eip off the stack and transfers control to this address.
[Figure: the same stack layout extended with Frame 3 for h(): ebp (frame 4), h(), eip (leave/ret), h() argptr, "h() arg data". Each frame is unwound by mov esp, ebp; pop ebp; ret, or equivalently leave; ret.]
Exploitation of Buffer Overflows Result: Control is returned to the address of an arbitrary function f(). This function is provided with arguments installed on the stack. Attacker could have added additional function calls or attacker could have returned to the main function for continuation of the program. For example, attacker could first call setuid(), then call system()
Exploitation of Buffer Overflows These are not the only exploit strategies.
Mitigation Strategies
Include strategies designed to
    prevent buffer overflows from occurring
    detect buffer overflows and securely recover without allowing the failure to be exploited
Prevention strategies can
    statically allocate space
    dynamically allocate space
Mitigation Strategies: Statically Allocated Space
Impossible to add data after the buffer is filled.
Because the static approach discards excess data, actual program data can be lost.
Consequently, the resulting string must be fully validated.
Validating Input:
    #include <stdlib.h>
    #include <string.h>
    int myfunc(const char *arg) {
        char buff[100];
        if (strlen(arg) >= sizeof(buff)) {
            abort();
        }
        /* arg is now known to fit in buff, terminator included */
        return 0;
    }
    int main(int argc, char *argv[]) {
        if (argc < 2) return 1;
        myfunc(argv[1]);
        return 0;
    }
Mitigation Strategies: Statically Allocated Space
Never use gets()
Impossible to tell how many characters gets() will read.
Use fgets() instead.
Besides the destination buffer, fgets() has two arguments:
    number of characters to be read (including the terminating zero)
    input stream
However, fgets() retains the newline character.
fgets() may read only part of a line, but we can check for the newline character at the end.
A buffer overflow is still possible if we specify more characters than the buffer can hold.
Mitigation Strategies: Statically Allocated Space
Never use gets()
gets_s() [ISO/IEC TR 24731]
    more secure replacement for gets()
    Reads only from the stream pointed to by stdin.
    Has one rsize_t argument that specifies the maximum size.
    Returns a pointer to the character array if successful; otherwise, returns a NULL pointer.
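The newline check described above can be wrapped in a small helper. The following is a sketch, with the hypothetical name read_line and a return convention chosen for illustration, that distinguishes a complete line from a partial one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if a complete line was read (newline
   removed), 0 if only part of a line fit into buf or the input ended
   without a newline, and -1 on EOF or read error with nothing read. */
int read_line(char *buf, int size, FILE *in) {
    size_t len;
    if (size < 2 || fgets(buf, size, in) == NULL) {
        return -1;
    }
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';   /* fgets keeps the newline; drop it */
        return 1;
    }
    return 0;                  /* no newline seen: partial line */
}
```

Passing sizeof of the actual array as the size argument keeps the limit and the buffer in agreement, which is the case the last point above warns about.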
Mitigation Strategies: Statically Allocated Space
    #include <stdio.h>
    #include <stdlib.h>
    #include <tchar.h>
    #define BUFFSIZE 8
    int _tmain(int argc, _TCHAR* argv[]) {
        char buff[BUFFSIZE];
        gets(buff);                                  /* unbounded: never use */
        printf("gets: %s.\n", buff);
        if (fgets(buff, BUFFSIZE, stdin) == NULL) {
            printf("read error.\n");
            abort();
        }
        printf("fgets: %s.\n", buff);
        if (gets_s(buff, BUFFSIZE) == NULL) {
            printf("invalid input.\n");
            abort();
        }
        printf("gets_s: %s.\n", buff);
        return 0;
    }
Mitigation Strategies: Statically Allocated Space
Work by the international standardization working group for the programming language C (ISO/IEC JTC1 SC22 WG14).
ISO/IEC TR 24731 defines less error-prone versions of C standard functions:
    strcpy_s() instead of strcpy()
    strcat_s() instead of strcat()
    strncpy_s() instead of strncpy()
    strncat_s() instead of strncat()
Mitigation Strategies: Statically Allocated Space
ISO/IEC TR 24731 goals:
    Mitigate buffer overrun attacks
    Default protections associated with program-created files
    Do not produce unterminated strings
    Do not unexpectedly truncate strings
    Preserve the null-terminated string data type
    Support compile-time checking
    Make failures obvious
    Have a uniform pattern for the function parameters and return type
Mitigation Strategies: Statically Allocated Space
ISO/IEC TR 24731
The strcpy_s() function has the following signature:
    errno_t strcpy_s(char * restrict s1, rsize_t s1max, const char * restrict s2);
Similar to strcpy(), but has an extra argument of type rsize_t that specifies the maximum length of the destination buffer.
Mitigation Strategies: Statically Allocated Space
ISO/IEC TR 24731: strcpy_s()
Copies characters from a source string to a destination character array up to and including the terminating null character.
Only succeeds when the source string can be fully copied to the destination without overflowing the destination buffer.
A constraint violation occurs when
    the source or the destination pointer is null, or
    the maximum length of the destination buffer is
        equal to zero,
        greater than RSIZE_MAX, or
        less than or equal to the length of the source string.
When a constraint violation is detected:
    the destination string is set to the null string
    the function returns a non-zero value
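Since strcpy_s() is not available in every C library, the behavior listed above can be illustrated with a stand-in. The sketch below is not the library function: the name sketch_strcpy_s is hypothetical, it returns int instead of errno_t, and it omits the RSIZE_MAX check and the runtime-constraint handler.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the documented behavior only; not the library function.
   Returns 0 on success and a non-zero value on a constraint violation. */
int sketch_strcpy_s(char *s1, size_t s1max, const char *s2) {
    if (s1 == NULL || s1max == 0) {
        return 1;                    /* nothing safe can be written */
    }
    if (s2 == NULL || strlen(s2) >= s1max) {
        s1[0] = '\0';                /* destination becomes the null string */
        return 1;
    }
    memcpy(s1, s2, strlen(s2) + 1);  /* copy includes the terminator */
    return 0;
}
```

Unlike strncpy(), a failed copy here leaves an empty, terminated destination rather than a silently truncated one.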
Mitigation Strategies: Statically Allocated Space
ISO/IEC TR 24731
Already available in Microsoft Visual C.
Functions are still capable of overflowing a buffer if the maximum length of the destination buffer is incorrectly specified.
The ISO/IEC TR 24731 functions
    are not "fool proof"
    are useful in preventive maintenance and legacy system modernization
Mitigation Strategies: Statically Allocated Space
Strsafe.h
Microsoft set of string handling functions.
Guarantees that all strings are null-terminated.
Writes do not occur past the end of the destination buffer.
These guarantees hold only if the programmer passes the actual start address of the buffer and the correct length.
Mitigation Strategies: Statically Allocated Space
strlcpy, strlcat
Miller and de Raadt, Usenix 99; for OpenBSD, later FreeBSD, Solaris, Mac OS X.
    size_t strlcpy(char * destination, const char * source, size_t size);
As long as size is non-zero, the string produced by strlcpy is always NUL-terminated.
The function takes the size of the destination as a parameter, avoiding buffer overflows where the source string is bigger than the destination.
strlcpy() returns the length of the source string, which can be compared to size to check for truncation.
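On systems whose C library lacks strlcpy(), the contract described above is easy to reproduce. The following is a portable sketch under that assumption; the name sketch_strlcpy is hypothetical and the code is not the OpenBSD implementation.

```c
#include <stddef.h>
#include <string.h>

/* Portable sketch with the strlcpy contract: copies at most size-1
   characters, terminates dst when size is non-zero, and returns
   strlen(src) so the caller can detect truncation. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size) {
    size_t srclen = strlen(src);
    if (size != 0) {
        size_t n = (srclen < size - 1) ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Truncation occurred whenever the return value is greater than or equal to the size that was passed in.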
Mitigation Strategies: Dynamically Allocated Space
Dynamically allocated buffers are resized as additional memory is required.
Dynamic approaches do not discard excess data.
Inputs can exhaust memory on a machine and consequently be used in denial-of-service attacks.
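A dynamic string can grow on demand while still enforcing an upper limit, which addresses the memory exhaustion concern. The sketch below is illustrative only: struct dynstr, dynstr_append, and the DYNSTR_MAX cap are hypothetical names and values, not part of any library mentioned here.

```c
#include <stdlib.h>
#include <string.h>

#define DYNSTR_MAX 4096   /* hypothetical cap; pick one per application */

struct dynstr {
    char *data;    /* always null-terminated once non-NULL */
    size_t len;    /* characters stored, terminator excluded */
    size_t cap;    /* bytes allocated */
};

/* Appends s, growing the buffer as needed. Returns 0 on success and -1
   if the cap would be exceeded or memory cannot be obtained; on failure
   the existing contents are left unchanged. */
int dynstr_append(struct dynstr *ds, const char *s) {
    size_t add = strlen(s);
    if (add > DYNSTR_MAX || ds->len > DYNSTR_MAX - add) {
        return -1;                       /* refuse unbounded growth */
    }
    size_t needed = ds->len + add + 1;
    if (needed > ds->cap) {
        size_t newcap = ds->cap ? ds->cap : 16;
        while (newcap < needed) {
            newcap *= 2;
        }
        char *p = realloc(ds->data, newcap);
        if (p == NULL) {
            return -1;
        }
        ds->data = p;
        ds->cap = newcap;
    }
    memcpy(ds->data + ds->len, s, add + 1);
    ds->len += add;
    return 0;
}
```

A struct dynstr is used by initializing it to {NULL, 0, 0}, appending as needed, and calling free() on its data member at the end.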
Mitigation Strategies: Dynamically Allocated Space
SafeStr
Matt Messier and John Viega.
Uses XXL for message handling.
Provides an API with dynamic strings:
    Buffer overflows should not be possible.
    Format string problems should be impossible.
    The API should be capable of tracking whether strings are "trusted", a la Perl's taint mode.
Mitigation Strategies: Dynamically Allocated Space
Vstr
Implements dynamic strings.
Simple access to strings via readv() / writev().
Many additional functions:
    printf-like function
    splitting of strings into parameter/record chunks (a la Perl)
    substituting data in a Vstr string
    moving data from one Vstr string to another (or within a Vstr string)
    comparing strings (without regard for case, or taking into account version information)
    searching for data in strings (with or without regard for case)
    counting spans of data in a string (the equivalent of strspn() in ISO C)
    parsing data from a Vstr string (i.e., numbers or IPv4 addresses)
Detection & Recovery
Compiler-generated runtime checks: Visual C++ provides native runtime checks for common runtime errors:
    stack pointer corruption
    overrun of local arrays
Detection & Recovery
Nonexecutable stack
Prevents executable code from running in the stack segment.
This prevents only one type of buffer overflow exploit: arc injection, heap buffer overflows, etc. still work.
Can have poorer performance.
Can break legacy code.
Detection & Recovery Stackgap Since many stack-based buffer overflow exploits need to know the location of the stack in memory: Stackgap introduces a randomly sized gap before allocating local memory variables on the stack. No performance penalty, though poorer memory utilization. Makes buffer overflow exploits harder, but not impossible.
Detection & Recovery Runtime Bound Checkers Idea: Retool the compiler to do bounds checking as in Java. Problem: Performance can be horrible. Catches almost all, but not all out of bounds data accesses.
Detection & Recovery Canaries Protect the return address with a canary. A buffer overflow will kill the canary. If the canary is dead, stop program execution. Implemented in StackGuard, ProPolice, Visual C++.NET
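The canary idea can be simulated in portable C with a guard word placed directly after a buffer. This is only a simulation under stated assumptions: real implementations have the compiler place a (typically random) canary next to the return address, whereas struct guarded_frame, the fixed CANARY_VALUE, and the function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CANARY_VALUE 0xDEADC0DEu   /* hypothetical fixed value */

/* Simulated frame: a buffer followed directly by a guard word. */
struct guarded_frame {
    char buffer[8];
    uint32_t canary;
};

void frame_init(struct guarded_frame *f) {
    memset(f->buffer, 0, sizeof f->buffer);
    f->canary = CANARY_VALUE;
}

/* Simulates an unchecked copy that starts at the buffer and may run on
   into the canary. The write is clamped to the struct so that the
   simulation itself stays within defined behavior. */
void frame_unchecked_copy(struct guarded_frame *f, const void *data, size_t n) {
    if (n > sizeof *f) {
        n = sizeof *f;
    }
    memcpy(f, data, n);
}

/* A real implementation would stop program execution when this fails. */
bool frame_canary_intact(const struct guarded_frame *f) {
    return f->canary == CANARY_VALUE;
}
```

Copying 8 bytes leaves the canary intact; copying 12 bytes overwrites it, which is the condition checked before a function returns.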