String Representation in C


String Representation in C

There is no special type for (character) strings in C; rather, char arrays are used.

    char Word[7] = "foobar";

    Word[0]  Word[1]  Word[2]  Word[3]  Word[4]  Word[5]  Word[6]
      'f'      'o'      'o'      'b'      'a'      'r'     '\0'

C treats char arrays as a special case in a number of ways.

If storing a character string (to use as a unit), you must ensure that a special character, the string terminator '\0', is stored in the first unused cell. Failure to understand and abide by this is a frequent source of errors.
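The layout above can be checked with a small sketch. The helper names is_terminated and word_demo are hypothetical, introduced only for illustration; the library calls are the standard memchr() and strlen().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a '\0' occurs within the first
   `size` bytes of buf, i.e. buf holds a properly terminated string. */
int is_terminated(const char* buf, size_t size) {
    return memchr(buf, '\0', size) != NULL;
}

/* Hypothetical demo of the slide's array: six letters plus the
   terminator fill all seven cells of Word. */
size_t word_demo(void) {
    char Word[7] = "foobar";
    return strlen(Word);   /* counts the characters before '\0' */
}
```

Here word_demo() should report 6 even though the array occupies 7 bytes; the extra byte is the terminator. An array initialized cell by cell with only the six letters has no terminator, and is_terminated() would report 0 for it.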

Some C String Library Functions

The C Standard Library includes a number of functions, declared in string.h, that support operations on memory and strings, including:

Copying:

    void* memcpy(void* restrict s1, const void* restrict s2, size_t n);

Copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined. Returns the value of s1.

    char* strcpy(char* restrict s1, const char* restrict s2);

Copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined. Returns the value of s1.
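A minimal sketch of both calls with destinations that are known to be large enough. The function name copy_examples and its parameters are hypothetical; the caller is assumed to supply at least 6 bytes and 3 ints.

```c
#include <string.h>

/* Hypothetical demo: dst_str must have room for 6 bytes and
   dst_ints for 3 ints; the caller is responsible for that. */
void copy_examples(char* dst_str, int* dst_ints) {
    const char src[] = "hello";           /* 5 characters + '\0' = 6 bytes */
    const int nums[3] = {1, 2, 3};

    strcpy(dst_str, src);                 /* copies up to and including '\0' */
    memcpy(dst_ints, nums, sizeof nums);  /* copies exactly sizeof nums bytes */
}
```

Note the difference in how each call decides when to stop: strcpy() looks for the terminator in the source, while memcpy() trusts the byte count it is given.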

C String Library Hazards

The memcpy() and strcpy() functions illustrate classic hazards of the C library.

If the target of the parameter s1 to memcpy() is smaller than n bytes, then memcpy() will attempt to write data past the end of the target, likely resulting in a logic error and possibly a runtime error. A similar issue arises with the target of s2: if it is smaller than n bytes, memcpy() will read past its end.

The same issue arises with strcpy(), but strcpy() doesn't even take a parameter specifying the maximum number of bytes to be copied, so there is no way for strcpy() to even attempt to enforce any safety measures.

Worse, if the target of the parameter s2 to strcpy() is not properly 0-terminated, then the strcpy() function will continue copying until a 0-byte is encountered, or until a runtime error occurs. Either way, the effect will not be good.
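Since strcpy() cannot know the destination size, the caller has to check it. One possible sketch is a wrapper that refuses to copy when the string does not fit. The name copy_if_fits is hypothetical, and the wrapper still assumes that src is properly terminated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies src into dst only if the whole string,
   including its '\0', fits in dst_size bytes.
   Assumption: src points to a properly terminated string. */
bool copy_if_fits(char* dst, size_t dst_size, const char* src) {
    size_t needed = strlen(src) + 1;   /* characters plus terminator */
    if (needed > dst_size) {
        return false;                  /* would overflow: copy nothing */
    }
    memcpy(dst, src, needed);
    return true;
}
```

With a 4-byte destination, copy_if_fits() accepts "abc" and rejects "abcd", leaving the destination untouched in the second case.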

Safer Copying

For safer copying:

    char* strncpy(char* restrict s1, const char* restrict s2, size_t n);

Copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

If the array pointed to by s2 is a string that is shorter than n characters, null characters are appended to the copy in the array pointed to by s1, until n characters in all have been written. Returns the value of s1.

(Of course, this raises the hazard of an unreported truncation if s2 contains more than n characters that were to be copied to s1. In addition, if no null character occurs among the first n characters of s2, the destination is not null-terminated.)
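A common way to work around both strncpy() hazards is to copy at most one byte less than the buffer size, write the terminator explicitly, and report truncation to the caller. The sketch below uses a hypothetical helper named bounded_copy and assumes src is properly terminated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (dst_size bytes), always
   leaving dst terminated. Returns true if src had to be truncated.
   Assumption: src points to a properly terminated string. */
bool bounded_copy(char* dst, size_t dst_size, const char* src) {
    if (dst_size == 0) {
        return true;                    /* nothing can be stored at all */
    }
    strncpy(dst, src, dst_size - 1);    /* may stop before writing '\0' */
    dst[dst_size - 1] = '\0';           /* so terminate explicitly */
    return strlen(src) > dst_size - 1;
}
```

Copying "abcdef" into a 4-byte buffer this way should leave "abc" in the buffer and return true, so the truncation is no longer silent.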

Another C String Library Function

Length:

    size_t strlen(const char* s);

Computes the length of the string pointed to by s. Returns the number of characters that precede the terminating null character.

Hazard: if there's no terminating null character then strlen() will read until it encounters a null byte or a runtime error occurs.
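When the size of the underlying array is known, the scan can be bounded so that it never runs past the array. The sketch below uses the standard memchr(); the helper name bounded_length is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the string in buf, or max
   if no '\0' occurs within the first max bytes. Never reads past
   buf[max - 1]. */
size_t bounded_length(const char* buf, size_t max) {
    const char* end = memchr(buf, '\0', max);
    return end ? (size_t)(end - buf) : max;
}
```

A return value equal to max signals that the array is not terminated within its bounds, which is exactly the case where strlen() would misbehave.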

More C String Library Functions

Concatenation:

    char* strcat(char* restrict s1, const char* restrict s2);

Appends a copy of the string pointed to by s2 (including the terminating null character) to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1. If copying takes place between objects that overlap, the behavior is undefined. Returns the value of s1.

    char* strncat(char* restrict s1, const char* restrict s2, size_t n);

Appends not more than n characters (a null character and characters that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1. A terminating null character is always appended to the result.
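Because strncat() always appends a terminator after the n characters, the bound passed to it should be the space remaining in the destination minus one. A sketch, with a hypothetical helper named bounded_append, assuming dst already holds a terminated string:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends as much of src as fits in dst, which is
   dst_size bytes in total. Assumption: dst already holds a properly
   terminated string shorter than dst_size. */
void bounded_append(char* dst, size_t dst_size, const char* src) {
    size_t used = strlen(dst);
    if (used + 1 >= dst_size) {
        return;                              /* no room left for any character */
    }
    strncat(dst, src, dst_size - used - 1);  /* leave one byte for '\0' */
}
```

Appending "barbaz" to "foo" in an 8-byte buffer should give "foobarb": four characters fit, and the eighth byte holds the terminator.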

More C String Library Functions

Comparison:

    int strcmp(const char* s1, const char* s2);

Compares the string pointed to by s1 to the string pointed to by s2. The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

    int strncmp(const char* s1, const char* s2, size_t n);

Compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to by s2. The strncmp function returns an integer greater than, equal to, or less than zero, accordingly as the possibly null-terminated array pointed to by s1 is greater than, equal to, or less than the possibly null-terminated array pointed to by s2.
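Two typical uses, sketched with hypothetical helper names same_string and has_prefix. Note that only the sign of the result is specified, so the code tests against zero rather than against particular values; comparing two char* values with == would compare addresses, not contents.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if a and b hold the same characters. */
bool same_string(const char* a, const char* b) {
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: true if s begins with prefix. strncmp() stops at
   the first difference, at a '\0', or after strlen(prefix) characters. */
bool has_prefix(const char* s, const char* prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

For example, has_prefix("foobar", "foo") holds, while has_prefix("fo", "foo") does not, because the comparison reaches the terminator of "fo" first.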

Bad strcpy()!

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main() {
        char s1[] = "K & R: the C Programming Language";
        char s2[1];

        strcpy(s2, s1);   // s2 is too small!

        printf("s1: %s\n", s1);
        printf("s2: %s\n", s2);
        return 0;
    }

    Ubu > gcc -o str03 -m32 -std=c99 -Wall str03.c
    Ubu > str03
    s1: & R: the C Programming Language
    s2: K & R: the C Programming Language

x86-64 Aside

BTW, here's what happened when the same code was compiled for a 64-bit target:

    Ubu > gcc -o str03_64 -std=c99 -Wall str03.c
    Ubu > str03_64
    s1: & R: the C Prrogramming Language
    s2: K& R: the C Prrogramming Language

There are no profound lessons here, but note that the behavior is interestingly different. When you're debugging, it may be useful to know whether you have a binary for a 32- or 64-bit environment.

ISO Standard Alternative

Some alternatives have been proposed, with alleged safety advantages. One is specified in the optional Annex K of the C11 standard:

    errno_t strcpy_s(char* restrict s1, rsize_t s1max, const char* restrict s2);

Runtime constraints:
    Neither s1 nor s2 shall be a null pointer.
    s1max shall not be greater than RSIZE_MAX.
    s1max shall not equal zero.
    s1max shall be greater than strnlen_s(s2, s1max).
    Copying shall not take place between objects that overlap.

If there is a runtime-constraint violation, then if s1 is not a null pointer and s1max is greater than zero and not greater than RSIZE_MAX, then strcpy_s sets s1[0] to the null character.

This is not supported by glibc, nor is it likely to be in the future.

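OpenBSD Alternative

The OpenBSD project proposed another alternative:

    size_t strlcpy(char* s1, const char* s2, size_t n);

Similar to strncpy(), but n is the full size of the destination buffer, and the copy is truncated, if necessary, to ensure null-termination. Returns the length of s2, so a return value of n or more tells the caller that truncation occurred.

For many years this was not supported by glibc; strlcpy() and strlcat() were added in glibc 2.38.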
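On systems whose C library lacks strlcpy(), the same semantics can be approximated in a few lines. The following is an illustrative sketch under the hypothetical name truncating_copy, not the OpenBSD source, and it assumes s2 is properly terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with strlcpy-like semantics: copies at most
   n - 1 characters of s2 into s1, always terminates s1 when n > 0, and
   returns the length of s2.
   Assumption: s2 points to a properly terminated string. */
size_t truncating_copy(char* s1, const char* s2, size_t n) {
    size_t src_len = strlen(s2);
    if (n > 0) {
        size_t to_copy = (src_len < n - 1) ? src_len : n - 1;
        memcpy(s1, s2, to_copy);
        s1[to_copy] = '\0';
    }
    return src_len;   /* a value >= n means the copy was truncated */
}
```

A caller can detect truncation with a single comparison, for instance by checking whether truncating_copy(buf, src, sizeof buf) >= sizeof buf.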

The Devil's Function

The C Standard Library included the regrettable function:

    char* gets(char* s);

The intent was to provide a method for reading character data from standard input to a char array.

The obvious flaw is the omission of any indication to gets() as to the size of the buffer pointed to by the parameter s. Imagine what might happen if the buffer was far too small. Imagine what might happen if the buffer was on the stack.

The function was deprecated and then removed from the standard in C11, but many C libraries, including glibc on Linux, still provide it for compatibility.
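The usual replacement is fgets(), which takes the buffer size. A minimal sketch follows; the helper name read_line is hypothetical, and stripping the newline with strcspn() is one common convention, not the only one.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most size - 1 characters of one line
   from in into buf and removes the trailing newline, if one was read.
   Returns false on end of file or read error. */
bool read_line(FILE* in, char* buf, int size) {
    if (fgets(buf, size, in) == NULL) {
        return false;
    }
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if present */
    return true;
}
```

A call such as read_line(stdin, buffer, sizeof buffer) can never write past the end of buffer. If a line is longer than the buffer, the remainder stays in the stream and is returned by the next call.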

Some Historical Perspective

There's an interesting recent column, by Poul-Henning Kamp, on the costs and consequences of the decision to use null-terminated arrays to represent strings in C (and other languages influenced by the design of C):

. . . Should the C language represent strings as an address + length tuple or just as the address with a magic character (NUL) marking the end? This is a decision that the dynamic trio of Ken Thompson, Dennis Ritchie, and Brian Kernighan must have made one day in the early 1970s, and they had full freedom to choose either way. I have not found any record of the decision, which I admit is a weak point in its candidacy: I do not have proof that it was a conscious decision.

As far as I can determine from my research, however, the address + length format was preferred by the majority of programming languages at the time, whereas the address + magic_marker format was used mostly in assembly programs. As the C language was a development from assembly to a portable high-level language, I have a hard time believing that Ken, Dennis, and Brian gave it no thought at all. Using an address + length format would cost one more byte of overhead than an address + magic_marker format, and their PDP computer had limited core memory. In other words, this could have been a perfectly typical and rational IT or CS decision, like the many similar decisions we all make every day; but this one had quite atypical economic consequences. . . .

http://queue.acm.org/detail.cfm?id=2010365