Presentation is loading. Please wait.

Presentation is loading. Please wait.

Character and String definitions, algorithms, library functions Characters and Strings.

Similar presentations


Presentation on theme: "Character and String definitions, algorithms, library functions Characters and Strings."— Presentation transcript:

1 Character and String definitions, algorithms, library functions Characters and Strings

2 Character and String Processing A common programming issue involves manipulation of text, usually referred to as string, or text, processing To achieve solutions typically requires capabilities to: perform input and output of characters and strings query what a single character is, or is not determine if a character, a substring, or any of a set of characters is included, or not, in a string determine the attributes of a character (eg. upper versus lower case) or string (eg. length) convert between character string and machine representations of different data types break large strings into smaller substrings recognized by tokens join substrings into larger strings (catenation)

3 Characters and Strings in C The concept of a string refers to a sequence of items. The sequence, or string, may contain zero or more elements, and a delimiter that denotes the end (termination) of the string. A string of characters, in computer science terms, usually refers to a vector, or list, of char values ASCII is commonly used UniCode is another In the C language, the special delimiter character ‘\0’ (called character null) is recognized by the compiler and assigned a specific integer value Strings of bits (or other encoded symbols) provides abstraction possibilities for more general strings.

4 Fundamentals Defining a string container Example: #define STRLEN 256 char strName [ STRLEN ] ; Example: char strName [ ] ; char * strPtr ; Initialization Example: char strName1 [ ] = “My name is Bob!” ; const char * strStatic = “String that cannot be changed!” ; char strName2 [ ] = { ‘H’, ‘e’, ‘l’, ‘l’, ‘o’, ‘\0’ } ; Example: char strName [ 50 ] ; int k ; for( k=0; k<49; k++ ) strName[k] = ‘#’ ; // Fill with # symbols strName[49] = ‘\0’ ; Consider a variation of the second example, using pointers: char strName [ 50 ], * strPtr ; int k ; for( k=0, strPtr = strName ; k<49; k++, strPtr++ ) *strPtr = ‘#’ ; *strPtr = ‘\0’ ; H le ol \0 Sequence of characters (value of the string) Delimiter (character null, terminal) String length

5 Character Handling Library The C language standard supports the notion of char data type, and the delimiter character code ‘\0’. We do not need to know the details of how character data is represented in bit form In programming and algorithm design it is useful to know and use a wide variety of functions that query or manipulate (transform) both individual character data as well as strings of characters We will discuss functions from four libraries #include #include and #include #include We start with the character function library,

6 Character Handling Library Begin with character query functions General prototype form: int isValueRange ( int c ) ; // Returns 1 if a match, or 0 ValueRange refers to a single value or a range of values Note that the input argument c has the date type int Intuition would suggest c should be char type Technical considerations (involving representation of non- ASCII data encodings) recommend for using int, recalling that char is a compatible sub-type of int (and short int). Function PrototypeFunction Description int isblank( int c ); Returns a positive value if c is ‘ ‘ ; otherwise 0 int isdigit( int c );Returns a positive value if c is a base-10 digit in the range ‘0‘ to ‘9’ ; otherwise 0 int isalpha( int c );Returns a positive value if c is an alphabetic character in the range ‘a‘ to ‘z’, or ‘A‘ to ‘Z’ ; otherwise 0 int isalnum( int c );Returns a positive value if c is an alphabetic character, or a base-10 digit ; otherwise 0 int isxdigit( int c );Returns a positive value if c is a base-16 (hexadecimal) digit in the range ‘0‘ to ‘9’, or ‘a’ to ‘f’, or ‘A’ to ‘F’ ; otherwise 0

7 Character Handling Library Additional query functions provide information about the nature of the character data Transformative functions modify the character data Function PrototypeFunction Description int islower( int c );Returns a positive value if c is a lower case alphabetic character in the range ‘a‘ to ‘z’ ; otherwise 0 int isuppper( int c );Returns a positive value if c is an upper case alphabetic character in the range ‘A‘ to ‘Z’ ; otherwise 0 int tolower( int c );Returns the value c if c is a lower case alphabetic character, or the upper case variant of the same alphabetic character (Ex. tolower( ‘A’ ) returns ‘a’) int toupper( int c );Returns the value c if c is an upper case alphabetic character, or the lower case variant of the same alphabetic character (Ex. toupper( ‘e’ ) returns ‘E’)

8 Character Handling Library And still more query functions for non-alphanumeric character data (eg. graphical, control signals, punctuation) Function PrototypeFunction Description int isspace( int c );Returns >0 if c is any valid white space character data (including blank, newline, tab, etc); otherwise 0 int iscntrl( int c );Returns >0 if c is any valid control character data (including ‘\n’, ‘\b’, ‘\r’, ‘\a’ etc); otherwise 0 int ispunct( int c );Returns >0 if c is any valid, printable punctuation character data (including ‘,’, ‘.’, ‘;’, ‘:’ etc.); otherwise 0 int isprint( int c );Returns >0 if c is any valid, printable character data; otherwise 0 int isgraph( int c );Returns >0 if c is any valid character data representing a graphical symbol (such as ‘ ’, ‘#’, ‘$’ etc, and including extensions to ASCII); otherwise 0

9 Example: Counting characters Problem: Determine the frequencies of occurrence for each alphabetic character (ignoring case) in a text file. Solution: #include #include int main ( ) { int N=0, K, C[26] ; double F[26] ; char Ch ; for( K=0; K<26; K++ ) { C[K]=0; F[K]=0.0; } for( Ch=getchar(); Ch != EOF; N++, Ch=getchar() ) { if( isalpha( Ch ) ) { K = toupper( Ch ) – ‘A’ ; C[K]++ ; } for( K=0; K<26; K++) { F[K] = C[K] * 1.0 / N ; printf( “Frequency of letter %c: %lf\n”, (char) (K+’A’), F[K] ) ; } return 0 ; }

10 String Conversion Functions: Purpose of these functions is to convert a string (or portion) to (1) an integer or (2) a floating point type General prototype form: resultType strtoOutputType ( const char * nPtr, char **endPtr [, int base ] ) ; nPtr points at the input string (protected as constant) doublelong int resultType refers to one of double, long int, or unsigned long int dl OutputType refers to one of d, l, or ul base refers to the base of the input string (0, or 2..36) endPtr points at the position within the input string where a valid numeric representation terminates - 1 2 3. 8 9 5 $ b C \0 nPtr endPtr

11 String Conversion Functions Function PrototypeFunction Description double strtod( const char * nPtr, char **endPtr ); If nPtr points at a valid string representation of a signed real number (possibly followed by additional character data), return a double value; Else return 0 if no part of the input string can be converted. Return a pointer (through *endPtr) to the character following the last convertible character – if no part of the input string is convertible then *endPtr is set to nPtr. - 1 2 3. 8 9 5 $ b C \0 nPtr endPtr Example usage: double D ; const char * S = “ -123.895 $bC” ; char * EP ; D = strtod( S, &EP ) ; if( EP != S ) printf( “Value converted is %lf\n”, D ) ; else printf( “No value could be converted\n” ) ; Note that one can also determine the size of the initial substring used to determine the double value returned, namely: int NumChars ; NumChars = ( EP – S ) / sizeof( char ) ; // sizeof(char) usually 1

12 String Conversion Functions Function PrototypeFunction Description long strtol( const char * nPtr, char **endPtr, int base ); If nPtr points at a valid string representation of a signed integer number (possibly followed by additional character data), return a long int value; Else return 0 if no part of the input string can be converted. Return a pointer (through *endPtr) to the character following the last convertible character – if no part of the input string is convertible then *endPtr is set to nPtr. The input string may use any base digits in the range 0 to (base-1). unsigned long strtoul( const char * nPtr, char **endPtr, int base ); Performs analogously to strtol() for string to unsigned long int conversion. - 1 2 3 4. $ b C \0 nPtr endPtr long int LI ; const char * S = “ -1234.$bC” ; char * EP ; LI = strtol( S, &EP, 0 ) ; // 0 base => base = 8, 10, 16 if( EP != S ) printf( “Value converted is %ld\n”, LI ) ; else printf( “No value could be converted\n” ) ;

13 String Conversion Functions The base argument value (for integer conversions only!) defines the base of the input string. For base=0, the input string digits may be in base 8, 10 or 16. The case base=1 is not used. For 2 <= base <= 36 the characters that are interpretable as base digits lie in the range from 0 to (base-1) BaseBase digits (upper or lower case alpha chars) 00, 1, …, F 20, 1 100, 1, 2, …, 9 130, 1, …, 9, A, B, C 240, 1, …, 9, A, B, …, N 360, 1, …, 9, A, B, …, Z long int LI ; const char * S = “ –Ab2$” ; char * EP ; LI = strtol( S, &EP, 13 ) ; // base = 13 if( EP != S ) printf( “Value converted is %ld\n”, LI ) ; else printf( “No value could be converted\n” ) ; // Value outputted is the negative of: // A*13*13 + b*13 + 2 = 1690+143+2 = 1835 (base-10)

14 String Conversion Functions The C standard utilities library also includes two additional conversion functions for long long int, both signed and unsigned. Function PrototypeFunction Description long long strtoll( const char * nPtr, char **endPtr, int base ); Performs analogously to strtol() for string to long long int conversion, with identical treatment of non-convertible strings, treatment of *endPtr and base. unsigned long long strtoull( const char * nPtr, char **endPtr, int base ); Performs analogously to strtoul() for string to unsigned long long int conversion, with identical treatment of non-convertible strings, treatment of *endPtr and base.

15 Useful Functions The C standard input/output library contains useful functions I/O of characters and strings Conversion to and from character and internal data representations

16 Useful Functions Function Prototype and Description int getchar( void ); Fetches and returns a single character from the input stream (stdin); if end of file is signalled then the return value is EOF int putchar( int C ); Outputs a single character to the output stream (stdout). Returns the same character if successful; otherwise returns EOF on failure char * fgets( char * S, int N, FILE * stream); Fetches all characters up to either (a) a new line ‘\n’, or (b) EOF, or (c) N-1 characters have been inputted, and then appends a delimiter ‘\0’ to make a string. The pointer S points to the inputted string. Input is from the input stream (typically stdin, but can be from a text file). Returns a pointer to the input string, or NULL if failure occurs (as with EOF). int puts( const char * S ); Outputs the string of characters S, followed by a newline ‘\n’. Returns a non- zero integer result (typically the number of characters outputted), or EOF on failure. #include int main () { int C ; // can also use char while( (C = getchar() ) != EOF && C != ‘\n’ ) putchar( C ) ; return 0 ; } CAUTION: When stdin is the keyboard, remember that pressing the Enter key to signal input generates a character and this must be accounted for. #include #define MAX 256 int main () { char S [ MAX ], * sPtr ; while( (sPtr = fgets( S, MAX, stdin )) != NULL ) puts( S ) ; return 0 ; }

17 Useful Functions The functions sprintf() and sscanf() are used for processing of character (string) data and machine representations of data (according to different data types). All data processing is done in RAM – no I/O is involved! Function Prototype and Description int sprintf( char * S, const char * format [, …] ); Used in the same way as printf(), except that the string of characters produced is directed to the string argument S, according to the format string (and referenced parameters). int sscanf( char * S, const char * format [, …] ); Used in the same way as scanf(), except that the string S contains the “input” data to be processed according to the format string (and referenced parameters). #include int main () { int A ; float X ; char S[100], M[100] ; char FormatStr[7] = “%d%f%s” ; scanf( FormatStr, &A, &X, S ); printf( FormatStr, A, X, S ) ; fgets( M, 100, stdin ); sscanf( M, FormatStr, &A, &X, S ); sprintf( M, FormatStr, A, X, S ); puts( M ); return 0 ; }

18 String Manipulation Functions Two functions are provided to perform copying of one string into another string. Function Prototype and Description char * strcpy( char * Dest, const char * Src); Copies the source string Src to the destination Dest. If Src is shorter, or equal in length, to Dest, the entire string is copied. If Src is longer than Dest, only those characters that will fit are copied – note that this may leave Dest without a delimiter ‘\0’ (which fails to define a proper string). char * strncpy( char * Dest, const char * Src, size_t N); Copies the first N characters of the source string Src to the destination Dest. If N is less than the length of Dest, the entire Src string is copied – if the length of Src is less than N then the entire Src string is copied and as many ‘\0’ as needed are inserted to fill up to N characters is performed.. If N is greater than the length of Dest, only those characters that will fit are copied – note that this may leave Dest without a delimiter ‘\0’ (which fails to define a proper string). Remember that strncpy() does not append the delimiter automatically!

19 String Manipulation Functions Joining together of two strings is called string catenation (also called concatenation). For instance, one might combine various words and phrases to form sentences and paragraphs. Function Prototype and Description char * strcat( char * S1, const char * S2); Copy string S2 to a position in S1, following the string already in S1. Note that the original ‘\0’ delimiter in S1 is overwritten by the first character in the S2 string, so that only one delimiter occurs at the end of the modified S1 string. If the total number of characters is greater than the capacity of S1 then a logical error will likely ensue. char * strncat( char * S1, const char * S2, size_t N); Copy the first N characters of the string S2 to a position in S1, following the string already in S1. The original ‘\0’ delimiter in S1 is overwritten by the first character in the S2 string, and only one delimiter occurs at the end of the modified S1 string inserted by strncat() automatically. If the total number of characters is greater than the capacity of S1 then a logical error will likely ensue.

20 String Comparison Functions Comparison of two strings is based on the notion of lexical ordering. All characters are encoded (eg. ASCII) and the numeric values of the characters defines the possible orderings. String comparisons are done based on both (a) character by character comparison, and (b) use of relative length of each string. Function Prototype and Description int strcmp( const char * S1, const char * S2); Compares strings S1 and S2. Returns 0 if S1 is fully equivalent to S2, a positive number if S1 is lexically greater than S2, and a negative number is S1 is lexically less than S2. int strncmp( const char * S1, const char * S2, size_t N); Compares up to the first N characters of the strings S1 and S2. Returns 0 if S1 is fully equivalent to S2, a positive number if S1 is lexically greater than S2, and a negative number is S1 is lexically less than S2. Note that if the length of either S1 or S2 is less than N, the comparison is done only for the characters present in each string.

21 Strings - Search Functions C provides functions for searching for various characters and substrings within a string This is a huge advantage in text processing Function Prototype and Description char * strchr( const char * S, int C); Locates the position in S of the first occurrence of C. Returns the pointer value to where C is first located; otherwise returns NULL. size_t strspn( const char * S1, const char * S2 ); String S1 is searched, and returns the length of the initial substring segment in S1 that contains characters only found in S2. size_t strcspn( const char * S1, const char * S2 ); String S1 is searched, and returns the length of the initial substring segment in S1 that contains characters not found in S2.

22 Strings - Search Functions Function Prototype and Description char * strpbrk( const char * S1, const char * S2 ); Locates the first occurrence in S1 of any character found in S2, and returns a pointer to that position in S1. Otherwise a NULL value is returned. char * strrchr( const char * S1, int C ); Locates the last occurrence in S1 of any character found in S2, and returns a pointer to that position in S1. Otherwise a NULL value is returned. char * strstr( const char * S1, const char * S2 ); Locates the first occurrence in S1 of the entire string S2. Otherwise a NULL value is returned.

23 Strings - Search Functions Consider the problem of a string of text S1 that contains various words (substrings) separated by specially designated characters used as delimiters (and contained in a string S2). The objective is to extract the words from the text. This can be accomplished using the function strtok() repeatedly. Each identified substring in S1, delimited by a character in S2, is called a token. Thus, strtok() is called the string tokenizer function. Function Prototype and Description char * strtok( char * S1, const char * S2 ); The first call to strtok() states the argument S1 and provides the string of delimiters S2. Returns a pointer to the next token found in S1. Each subsequent call to strtok() uses NULL as the first argument (instead of the string S1), and the function remembers where it left off from the last time it was called. Each time strtok() is called, it points to the next token found and also replaces the delimiter character by ‘\0’. Thus, S1 is modified! Thus, a sequence of calls to strtok() breaks S1 into token substrings.

24 Strings - Search Functions #include int main () { int N = 0 ; char S[] = “This is a sentence with tokens separated by blanks.” ; char * tokenPtr ; printf( “The following tokens were found in S.\n” ) ; tokenPtr = strtok( S, “ “ ) ; // First time use S; ‘ ‘ is the only delimiter while( tokenPtr != NULL ) { N++ ; printf( “%s\n”, tokenPtr ) ; tokenPtr = strtok( NULL, “ “ ) ; // Use NULL in successive calls } printf( “Number of tokens found = %d\n”, N ) ; return 0 ; }

25 Strings - Search Functions #include int main () { int N = 0 ; char S[] = “This is a sentence with tokens separated by various characters.” ; char * tokenPtr, * DelimList = “.,;:$“ ; printf( “The following tokens were found in S.\n” ) ; tokenPtr = strtok( S, DelimList ) ; // First time use S; various delimiters while( tokenPtr != NULL ) { N++ ; printf( “%s\n”, tokenPtr ) ; tokenPtr = strtok( NULL, DelimList ) ; // Use NULL in successive calls } printf( “Number of tokens found = %d\n”, N ) ; return 0 ; }

26 Memory Functions in C also provides functions for dealing with blocks of data in RAM The blocks may be characters, or other data types, hence the functions typically return a void * pointer value. A void * pointer value can be assigned to any other pointer type, and vice versa. However, void * pointers cannot be dereferenced, thus the size of the block must be specified as an argument. None of the functions discussed perform checks for terminating null characters (delimiters). Function Prototype and Description void * memcpy( void * S1, const void * S2, size_t N ); Copies N characters (bytes) from the object S2 into the object S1. A pointer to the resulting object (S1) is returned, otherwise NULL is returned on failure. Note: The result of this function is undefined if S1 and S2 overlap! void * memmove( void * S1, const void * S2, size_t N ); Copies N characters (bytes) from the object S2 into the object S1. A pointer to the resulting object (S1) is returned, otherwise NULL is returned on failure. Note: This function utilizes a temporary memory space to perform the copying, hence the operation is always defined.

27 Memory Functions in Function Prototype and Description int memcmp( const void * S1, const void * S2, size_t N ); Compares the first N characters (bytes) of S1 and S2. Returns 0 if S1==S2, >0 if S1>S2, and <0 if S1<S2. void * memchr( const void * S1, int C, size_t N ); Locates the first occurrence of the character C in the first N characters (bytes) of S1. If C is found, a pointer to C in S1 is returned. Otherwise, NULL is returned. void * memset( void * S1, int C, size_t N ); Copies the character (byte) C to the first N positions of S1. A pointer to S1 is returned, or NULL on failure. Note: the type of C is modified to unsigned char to enable copying to blocks of arbitrary data type.

28 Other Functions in Function Prototype and Description size_t strlen( const char * S ); Determines and returns the number of characters in S, not including the ‘\0’ delimiter. char * strerror( int errornum ); Outputs to stdout an error message (defined by others as standard messages) referenced by an error number code. For instance, the statement printf( “%s\n”, strerror( 2 ) ) ; might generate the output string: No such file or directory

29 Secure C programming C11 standard with Annex K Addresses issues related to robustness of array based manipulation of character data (and other data containers) Stack overflow detection Array overflow detection Read more: CERT guideline INT05-C www.securecoding.cert.org Additional online Appendices E-H for the textbook www.pearsonhighered.com/deitel/

30 Summary Concepts of character and strings, query functions, transformation functions, search functions, generalization to abstract strings (memory functions).

31 Topic Summary Characters and Strings in the C language Multiple library sources Query functions Transformative functions Conversion functions Memory functions Reading – Chapter 8 Review Pointers as well, especially the const qualifier, and also the use of ** for modifying pointer values on return (through arguments) from functions. Reading – Chapter 9: Formatted Input and Output This chapter is straightforward and is assigned for self-directed independent study and learning – it will be tested! Practice, practice, practice !


Download ppt "Character and String definitions, algorithms, library functions Characters and Strings."

Similar presentations


Ads by Google