String
C and Data Structures
Baojian Hua
What’s a String? A string is a sequence of characters s = c0 c1 … c(n-1). Every character ci (0 ≤ i < n) is taken from some character set (say, ASCII or Unicode). Ex: “hello, world”, “string1\tstring2\n”. Essentially, a string is a linear list, but with different operations.
Isn’t a String a char*? C’s convention for string representation: C has no built-in string type. Every string is a char array (char *) terminated by the char ‘\0’. Operations (see the library <string.h>): char *strcpy (char *s, const char *ct); char *strcat (char *s, const char *ct); … Such operations are array-based and thus efficient.
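A small sketch of the two <string.h> calls above in use. The function name greet is made up for illustration. Neither strcpy nor strcat knows how big buf is, so the caller alone must guarantee room for all 12 characters plus the terminating ‘\0’.

```c
#include <string.h>

/* Hypothetical example: builds "hello, world" in buf.
   buf must hold at least 13 bytes (12 characters + '\0');
   strcpy and strcat never check the size of their destination. */
char *greet (char *buf)
{
    strcpy (buf, "hello");   /* buf = "hello"        */
    strcat (buf, ", ");      /* buf = "hello, "      */
    strcat (buf, "world");   /* buf = "hello, world" */
    return buf;              /* both functions return their first argument */
}
```

Calling greet on a char buf[13] leaves "hello, world" in buf; a buffer of 12 bytes or fewer would be silently overrun.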
Problems with C String? Weaknesses of C’s “char *” string: String literals are unchangeable: writing into one (e.g. through char *p = "hello"; p[0] = 'H';) is undefined behavior. See the demo of C’s “char *”… Why? Some strings cannot even be represented. Ex: “aaa\0bbb\0c”. Why?
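The two “Why?” questions can be explored with a short sketch (all names here are hypothetical). An array initialized from a literal is a writable copy, whereas the literal itself must not be modified; and because every <string.h> function stops at the first ‘\0’, the “bbb” and “c” parts of “aaa\0bbb\0c” are invisible to them.

```c
#include <stddef.h>
#include <string.h>

/* 9 written characters plus the implicit trailing '\0' = 10 bytes,
   but strlen stops at the first '\0' and sees only "aaa". */
static const char embedded[] = "aaa\0bbb\0c";

size_t visible_length (void) { return strlen (embedded); }  /* 3  */
size_t stored_bytes (void)   { return sizeof (embedded); }  /* 10 */

/* Writing into an array copy of a literal is fine. Writing through a
   pointer to the literal itself is undefined behavior. */
char first_after_edit (void)
{
    char copy[] = "hello";   /* writable array initialized from the literal */
    copy[0] = 'H';
    return copy[0];
}
```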
Problems with C String? Some operations are dangerous. Ex: strcpy ("ab", "1234") copies five bytes into a three-byte string literal. This is a notorious source of bugs, and it is the programmer’s duty to prevent them. Some viruses take advantage of this… Robert Morris’s worm in 1988 was the world’s first widespread one. See the demo for this…
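One common defence is to pass the destination size explicitly. The sketch below relies on the standard snprintf, which never writes more than cap bytes and always terminates the result when cap > 0; the helper name safe_copy and its return convention are assumptions for illustration, not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies at most cap-1 characters of src into dst.
   snprintf returns the length the full output would have had, so a
   result >= cap means src was truncated. Returns 1 if src fit, else 0. */
int safe_copy (char *dst, size_t cap, const char *src)
{
    int needed = snprintf (dst, cap, "%s", src);
    return needed >= 0 && (size_t)needed < cap;
}
```

With char small[3], safe_copy (small, sizeof small, "1234") stores "12" and returns 0 instead of overrunning the array as strcpy would.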
“string” ADT: We want an ADT “string” that hides the concrete representation of strings, offers more flexible operations, and cures the security problems. But to stay compatible with C, ‘\0’ is not allowed inside a string.
Abstract Data Types in C: Interface
// in file "str.h"
#ifndef STR_H
#define STR_H

typedef struct strStruct *str;

str newStr (char *s);
int size (str s);
int isEmpty (str s);
char nth (str s, int n);
str concat (str s1, str s2);
str sub (str s, int i, int j);
void append (str s1, str s2);

#endif
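Array-based Implementation
// in file "str.c"
#include <stdlib.h>   // malloc, free
#include <string.h>   // strlen
#include "str.h"

struct strStruct {
  char *s;    // the characters, terminated by '\0'
  int size;   // number of characters, excluding '\0'
};
// What’s the difference
// with extensible array?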
Operations: “new”
str newStr (char *s)
{
  int len = strlen (s);
  str p = (str)malloc (sizeof (*p));
  p->s = (char *)malloc ((len+1) * sizeof(char));
  // copy through t, so that p->s keeps pointing at the first char
  char *t = p->s;
  while ((*t++ = *s++))
    ;
  p->size = len;
  return p;
}
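A self-contained variant of newStr, assuming the same struct as str.c. It replaces the hand-written copy loop with memcpy, checks both malloc results, takes a const char * argument, and adds a hypothetical freeStr that str.h does not declare. Because the characters are copied, later changes to the caller’s buffer do not affect the new str.

```c
#include <stdlib.h>
#include <string.h>

/* Same representation as str.c: characters plus a cached length. */
typedef struct strStruct *str;
struct strStruct {
    char *s;    /* '\0'-terminated copy of the characters */
    int size;   /* number of characters, excluding '\0'   */
};

str newStr (const char *s)
{
    int len = (int)strlen (s);
    str p = malloc (sizeof (*p));
    if (p == NULL)
        return NULL;
    p->s = malloc (len + 1);      /* +1 for the terminating '\0' */
    if (p->s == NULL) {
        free (p);
        return NULL;
    }
    memcpy (p->s, s, len + 1);    /* copies the '\0' as well */
    p->size = len;
    return p;
}

/* Hypothetical companion (not in str.h): releases both allocations. */
void freeStr (str p)
{
    if (p != NULL) {
        free (p->s);
        free (p);
    }
}
```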
Operations: “size”
int size (str s)
{
  return s->size;
}
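str.h also declares isEmpty, which these slides do not implement. A minimal sketch, assuming the struct from str.c: because the length is cached in size, both queries take O(1) time, unlike strlen, which has to scan for the ‘\0’.

```c
typedef struct strStruct *str;
struct strStruct { char *s; int size; };

int size (str s) { return s->size; }

/* One possible definition: reuse the cached length. */
int isEmpty (str s) { return size (s) == 0; }
```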
Operations: “nth”
char nth (str s, int n)
{
  // valid indexes are 0 .. size-1;
  // error() is assumed to report the problem and stop
  if (n < 0 || n >= s->size)
    error ("invalid index");
  return (s->s)[n];
}
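error() is not a standard C function, so trying nth on its own needs some definition of it. In the sketch below, error is a hypothetical stand-in that prints to stderr and exits; the struct is repeated from str.c so the block compiles alone.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct strStruct *str;
struct strStruct { char *s; int size; };

/* Hypothetical stand-in for the slides' error(): report and exit. */
static void error (const char *msg)
{
    fprintf (stderr, "error: %s\n", msg);
    exit (EXIT_FAILURE);
}

char nth (str s, int n)
{
    if (n < 0 || n >= s->size)   /* valid indexes are 0 .. size-1 */
        error ("invalid index");
    return (s->s)[n];
}
```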
Operations: “concat” (figure: strings s1 and s2, each with fields s and size, and the new result string p)
Operations: “concat”
str concat (str s1, str s2)
{
  int n1 = size (s1);
  int n2 = size (s2);
  str p = (str)malloc (sizeof (*p));
  p->s = (char *)malloc ((n1+n2+1) * sizeof(char));
  // str copy, left to you …
  p->size = n1+n2;
  return p;
}
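One possible way to fill in the “str copy” step, as a self-contained sketch assuming the struct from str.c (malloc failure checks are omitted for brevity). Two memcpy calls place the characters of s1 and then s2 into the fresh buffer, and the ‘\0’ is written last. Neither argument is modified, which is what makes concat functional.

```c
#include <stdlib.h>
#include <string.h>

typedef struct strStruct *str;
struct strStruct { char *s; int size; };

int size (str s) { return s->size; }

str concat (str s1, str s2)
{
    int n1 = size (s1);
    int n2 = size (s2);
    str p = malloc (sizeof (*p));
    p->s = malloc (n1 + n2 + 1);
    memcpy (p->s, s1->s, n1);        /* characters of s1 */
    memcpy (p->s + n1, s2->s, n2);   /* characters of s2, right after */
    p->s[n1 + n2] = '\0';
    p->size = n1 + n2;
    return p;
}
```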
Operations: “append” (figure: strings s1 and s2, each with fields s and size)
Operations: “append”
void append (str s1, str s2)
{
  int n1 = size (s1);
  int n2 = size (s2);
  char *t;
  t = (char *)malloc ((n1+n2+1) * sizeof(char));
  // str copy (s1, then s2, into t), left to you …
  free (s1->s);   // s1 takes over the new buffer
  s1->s = t;
  s1->size = n1+n2;
}
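A sketch of append with the copy step filled in, assuming s1->s was obtained from malloc (as newStr does). Unlike concat it is destructive: s1 is updated in place and its old buffer freed, while s2 is only read. Both copies happen before the free, so append (s, s) also works. Growing s1->s with realloc would be an alternative design.

```c
#include <stdlib.h>
#include <string.h>

typedef struct strStruct *str;
struct strStruct { char *s; int size; };

void append (str s1, str s2)
{
    int n1 = s1->size;
    int n2 = s2->size;
    char *t = malloc (n1 + n2 + 1);
    memcpy (t, s1->s, n1);        /* old characters of s1 */
    memcpy (t + n1, s2->s, n2);   /* then the characters of s2 */
    t[n1 + n2] = '\0';
    free (s1->s);                 /* assumes s1->s came from malloc */
    s1->s = t;
    s1->size = n1 + n2;
}
```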
Operations: “sub” (figure: string s with fields s and size; the characters from index from to index to are copied into a new string q)
Operations: “sub”
str sub (str s, int from, int to)
{
  int n = size (s);
  if (from > to || from < 0 || to >= n)
    error ("invalid index");
  str p = (str)malloc (sizeof (*p));
  p->s = (char *)malloc ((to-from+2) * sizeof(char));
  // str copy, left to you …
  p->size = to-from+1;
  return p;
}
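A sketch of sub with the copy filled in, where from and to are both inclusive (hence to-from+1 characters plus one byte for ‘\0’). To keep the block self-contained it returns NULL for an invalid range instead of calling error(); that change, like the rest of the block, is an assumption for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct strStruct *str;
struct strStruct { char *s; int size; };

str sub (str s, int from, int to)
{
    int n = s->size;
    if (from > to || from < 0 || to >= n)
        return NULL;                     /* the slide calls error() here */
    int len = to - from + 1;             /* both ends inclusive */
    str p = malloc (sizeof (*p));
    p->s = malloc (len + 1);
    memcpy (p->s, s->s + from, len);
    p->s[len] = '\0';
    p->size = len;
    return p;
}
```

For "hello, world" (size 12), sub (s, 7, 11) yields "world".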
Summary: The string representation discussed in this class is more functional than procedural. Functional: data never change; instead, we always make new data from old ones. Java and ML also have functional (immutable) strings. In general, functional data structures are easy to implement, maintain, and reason about, and thus have much to recommend them.