Text Manipulation Chapter 7 Attaway MATLAB 5E
Text Terminology
Text in MATLAB can be represented using:
  character vectors (in single quotes)
  string arrays, introduced in R2016b (in double quotes; the double-quote syntax itself became available in R2017a)
Many functions that manipulate text can be called using either a character vector or a string.
Additionally, new functions have been created for the string type.
Prior to R2016b, the word “string” was used for what are now called character vectors.
Characters
Both character vectors and string scalars are composed of groups of characters.
Characters include letters of the alphabet, digits, punctuation marks, white space, and control characters.
Control characters are characters that cannot be printed, but accomplish a task (such as a backspace or tab).
White space characters include the space, tab, newline, and carriage return.
Leading blanks are blank spaces at the beginning; trailing blanks are blank spaces at the end.
Individual characters in single quotation marks are the type char (so, the class is char).
The function newline returns a newline character.
Character Vectors
Character vectors:
  consist of any number of characters
  are contained in single quotation marks
  are displayed using single quotation marks
  are of the type, or class, char
Examples: 'x', ' ', 'abc', 'hi there'
Since these are vectors of characters, many built-in functions and operators that we've seen already work with character vectors as well as numbers, e.g., length to get the length of a character vector, or the transpose operator.
You can also index into a character vector to get individual characters or to get subsets (called a substring for a character vector or a string).
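The length, transpose, and indexing operations mentioned above can be sketched as follows. This is a minimal illustration; the variable names (word, sub, rev, col) are hypothetical and not taken from the slides.

```matlab
% Illustrative only: any character vector would behave the same way.
word = 'hi there';

n     = length(word);      % number of characters, including the blank
first = word(1);           % a single character
sub   = word(4:8);         % a substring taken with a range of indices
rev   = word(end:-1:1);    % the characters in reverse order
col   = word';             % transpose: a column vector of characters
```

Because a character vector is an ordinary row vector, the same colon and end notation used for numeric vectors applies here.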
String Scalars
A string scalar is a single string, and is used to store a group of characters such as a word.
Strings can be created:
  using double quotes, e.g. "Awesome"
  using the string function, e.g. string('Awesome')
Strings are displayed using double quotes.
The class of a string scalar is 'string'.
Dimensions of character vectors
A single character is a 1 x 1 scalar:
    >> letter = 'x';
    >> size(letter)
    ans =
         1     1
A character vector is 1 x n where n is the length:
    >> myword = 'Hello';
    >> size(myword)
    ans =
         1     5
Dimensions of strings
A string is a 1 x 1 scalar:
    >> mystr = "Awesome";
    >> size(mystr)
    ans =
         1     1
A character vector is contained in a string; curly braces can be used to index:
    >> mystr(1)
    "Awesome"
    >> mystr{1}
    'Awesome'
    >> mystr{1}(1)
    'A'
Length of strings
We have seen that the length function can be used to find the number of characters in a character vector.
However, since a string is a 1 x 1 scalar, the length of a string is 1.
Use the strlength function for strings:
    >> length("Ciao")
    ans =
         1
    >> strlength("Ciao")
    ans =
         4
Even an empty string is a 1 x 1 scalar, so the length is 1 (not 0).
Groups of strings
Groups of strings can be stored in:
  string arrays
  character matrices
  cell arrays (in Chapter 8)
String arrays are very easy to use:
    >> greets = ["hi", "ciao"]
    greets =
      1×2 string array
        "hi"    "ciao"
    >> greets(1)
    ans =
        "hi"
Character matrices
Creating a column vector of character vectors actually creates a character matrix (a matrix in which every element is a single character).
There are two ways to do this:
  using [ ] and separating with semicolons
  using char
Since all rows in a matrix must have the same number of characters, shorter character vectors must be padded with blank spaces so that all are the same length; the built-in function char will do that automatically.
So, string arrays are so much easier!!
Creating Character Matrices
Both [ ] and char can be used to create a matrix in which every row has a character vector:
    >> cmat = ['Hello'; 'Hi   '; 'Ciao '];
    >> cmat = char('Hello', 'Hi', 'Ciao');
Both of these will create the same 3 x 5 matrix cmat:
    H e l l o
    H i
    C i a o
Shorter strings are padded with blanks, e.g. 'Hi   ' (with [ ] the padding must be typed by hand; char adds it automatically).
Again, string arrays are so much easier!
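The padding behavior of char can be checked directly. The following is a small sketch; the names second, trimmed, and firstcol are hypothetical and used only for illustration.

```matlab
% char pads the shorter rows with trailing blanks so every row has 5 columns.
cmat = char('Hello', 'Hi', 'Ciao');

[r, c]   = size(cmat);          % number of rows and columns
second   = cmat(2, :);          % second row, still padded with blanks
trimmed  = strtrim(second);     % the same row with the padding removed
firstcol = cmat(:, 1)';         % first character of every row, as a row vector
```

Extracting a row returns the padded version, so a call such as strtrim or deblank is usually needed before comparing it with other text.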
Operations on Character Vectors
Character vectors can be created using single quotes.
The input function creates a character vector (don't forget the second argument, 's'):
    >> cvword = input('Enter a word: ','s')
    Enter a word: science
    cvword =
        'science'
The blanks function creates a character vector with n blank spaces:
    >> bb = blanks(3)
    bb =
        '   '
Operations on Strings
Strings are created using double quotes or by passing a character vector to the string function.
Without any arguments, the string function creates an empty string (which, don't forget, is still a 1 x 1 scalar so the length is 1).
The plus function or operator can join, or concatenate, two strings together:
    >> "abc" + "xyz"
    ans =
        "abcxyz"
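The difference between length and strlength for an empty string, and the behavior of the plus operator, can be sketched as below. The variable names are hypothetical; the empty string is written here as "" rather than through a function call.

```matlab
es = "";                       % an empty string scalar

len1 = length(es);             % 1, because es is still a 1 x 1 array
len0 = strlength(es);          % 0, because it holds no characters
mt   = isempty(es);            % false: the array itself is not empty

joined = "abc" + "xyz";        % plus concatenates two strings
label  = "n = " + 5;           % a number added to a string is converted to text
```

To test whether a string holds no characters, comparing strlength with 0 (or the string with "") is therefore more reliable than isempty.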
Text Functions
Many functions can have either character vectors or strings as input arguments.
Generally, if a character vector is passed to the function, a character vector will be returned, and if a string is passed, a string will be returned.
For example, changing case using upper or lower:
    >> upper('abc')
    ans =
        'ABC'
    >> lower("HELP")
    ans =
        "help"
Removing Characters
deblank removes trailing blanks
strtrim removes leading and trailing blanks
(Note: neither function removes blanks in the middle of strings)
strip removes leading and/or trailing characters; either whitespace characters by default or else a specified character
erase removes all occurrences of a substring within a string or character vector (so it could be used to erase blanks in the middle of strings)
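The four functions above can be compared side by side on the same kind of input. This is a sketch with made-up sample text; the variable names are hypothetical.

```matlab
padded = '   some text   ';        % 3 leading and 3 trailing blanks

a = deblank(padded);               % trailing blanks removed only
b = strtrim(padded);               % leading and trailing blanks removed
c = strip(padded, 'left');         % strip from one side only
d = strip("***note***", '*');      % strip a specified character instead of whitespace
e = erase('some text', ' ');       % erase also removes blanks in the middle
```

Note that a, b, c, and e are character vectors because character vectors were passed in, while d is a string.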
String Concatenation
There are several ways to concatenate, or join, strings (using +) and character vectors (using [ ]):
    >> "abc"+"d"
    ans =
        "abcd"
    >> ['abc' 'd']
    'abcd'
The strcat function can be used; it will remove trailing blanks for character vectors but not for strings:
    >> strcat('Hi ', 'everyone')
    'Hieveryone'
    >> strcat("Hi ", "everyone")
    "Hi everyone"
The sprintf function
sprintf works just like fprintf, but instead of printing, it creates text, so it can be used to customize the format of text.
So, sprintf can be used to create customized text to pass to other functions (e.g., title, input):
    >> maxran = randi([1, 50]);
    >> prompt = sprintf('Enter an integer from 1 to %d: ', maxran);
    >> mynum = input(prompt);
    Enter an integer from 1 to 46: 33
Any time text is required as an input, sprintf can create customized text.
String Comparisons
strcmp compares two strings or character vectors and returns logical 1 if they are identical or 0 if not (including when they are not the same length).
Variations:
  strncmp compares only the first n characters
  strcmpi ignores case (upper or lower)
  strncmpi compares n characters, ignoring case
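The four comparison functions can be contrasted on one pair of words. This is a small sketch; w1, w2, and the result names are hypothetical.

```matlab
w1 = 'Hello';
w2 = 'help';

same     = strcmp(w1, w2);          % false: the texts differ
first3   = strncmp(w1, w2, 3);      % false: 'Hel' and 'hel' differ in case
nocase   = strcmpi(w1, 'HELLO');    % true: case is ignored
first3i  = strncmpi(w1, w2, 3);     % true: first 3 characters match, ignoring case
difflen  = strcmp('cat', 'mouse');  % false, with no error for different lengths
```

Unlike the == operator on character vectors, each of these always returns a single logical value, so the result can be used directly in an if statement.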
The equality operator
To use the equality operator with character vectors, they must be the same length, and each element will be compared. For strings, however, it will simply return 1 or 0:
    >> 'cat' == 'car'
    ans =
         1     1     0
    >> 'cat' == 'mouse'
    Matrix dimensions must agree.
    >> "cat" == "car"
    ans =
         0
    >> "cat" == "mouse"
    ans =
         0
Find and replace functions
Note that the word “string” here can mean string or character vector.
strfind(string, substring): finds all occurrences of the substring within the string; returns a vector of the indices at which each occurrence begins, or an empty vector if the substring is not found
strrep(string, oldsubstring, newsubstring): finds all occurrences of the old substring within the string, and replaces each with the new substring; the old and new substrings can be different lengths
count(string, substring): counts the number of occurrences of a substring within a string
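The three functions above can be tried on one sample character vector. This is a minimal sketch; phrase and the result names are hypothetical.

```matlab
phrase = 'abcdeabcab';

idx     = strfind(phrase, 'ab');        % starting index of every occurrence
none    = strfind(phrase, 'xyz');       % empty result when nothing is found
swapped = strrep(phrase, 'ab', 'XYZ');  % replacement may be a different length
n       = count(phrase, 'ab');          % number of occurrences
```

Checking isempty on the result of strfind is one common way to test whether a substring is present at all.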
The strtok function
The strtok function takes a string and breaks it into two pieces and returns both strings.
It looks for a delimiter (by default a blank space) and returns a token, which is the beginning of the string up to the delimiter, and also the rest of the string, including the delimiter.
A second argument can be passed for the delimiter.
So no characters are lost; all characters from the original string are returned in the two output strings (apart from any leading delimiters, which strtok skips before the token starts).
Since the function returns two strings, the call to strtok should be in an assignment statement with two variables on the left to store the two strings.
Examples of strtok
    >> mystring = 'Isle of Skye';
    >> [first, rest] = strtok(mystring)
    first =
        'Isle'
    rest =
        ' of Skye'
    >> length(rest)
    ans =
         8
    >> [f, r] = strtok(rest, 'y')
    f =
        ' of Sk'
    r =
        'ye'
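Because strtok returns the remainder along with the token, it can be called repeatedly to pull out every word. The loop below is a sketch of that idea, not something shown in the slides; the variable names and the use of a growing string array are assumptions made for illustration.

```matlab
rest  = 'Isle of Skye';
words = strings(1, 0);              % empty string array that grows in the loop

% Keep taking tokens until only blanks (or nothing) remain.
while ~isempty(strtrim(rest))
    [token, rest] = strtok(rest);   % token is the next word; rest keeps the delimiter
    words(end + 1) = token;         % the character vector is stored as a string
end
```

Each pass shortens rest, and leading blanks left over from the previous pass are skipped by strtok before the next token starts.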
Functions of String Arrays
These functions are typically used with string arrays rather than character vectors (some of them, such as strsplit, also accept a character vector but then return a cell array):
  strings: preallocates a string array to empty strings
  strjoin: concatenates the strings in a string array together
  strsplit: splits a string into string scalars
  join: concatenates the strings across the columns of a string array, so that each row becomes one string
Also, the plus function or operator can concatenate the same string to all string scalars in a string array.
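The string array functions listed above can be sketched together as follows. The sample data and variable names are hypothetical.

```matlab
blank  = strings(1, 3);                    % 1 x 3 array of empty strings

parts  = strsplit("one two three");        % split at blanks into a string array
joined = strjoin(parts, "-");              % join the pieces with a chosen delimiter

names  = ["Ada", "Lovelace"; "Alan", "Turing"];
full   = join(names);                      % each row joined with a space: 2 x 1 result

tagged = parts + "!";                      % the same string appended to every element
```

strjoin collapses a whole array into one string, whereas join works row by row and keeps one string per row.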
“is” Functions to determine type
ischar: true if the input argument is a character vector
isstring: true if the input argument is a string
isStringScalar: true if the input argument is a string scalar
“is” and true/false Functions
Functions for strings and character vectors:
  isletter: true if the input argument is a letter of the alphabet
  isspace: true if the input argument is a white space character
  isstrprop: determines whether the characters in a string are in a category specified by the second argument, e.g. 'alphanum'
  contains: determines whether a substring is somewhere within a string
  endsWith/startsWith: determine whether text ends with (or starts with) a specified string
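Several of these functions return one logical value per character, which makes them useful for counting and for logical indexing. The sketch below uses made-up sample text and hypothetical variable names.

```matlab
txt = 'Room 101, B wing';

nletters = sum(isletter(txt));              % count the letters
nspaces  = sum(isspace(txt));               % count the white space characters
digits   = txt(isstrprop(txt, 'digit'));    % keep only the digit characters

hasWing  = contains(txt, 'wing');           % one logical value for the whole text
starts   = startsWith(txt, 'Room');
ends     = endsWith(txt, 'wing');
```

isletter, isspace, and isstrprop work element by element on a character vector, while contains, startsWith, and endsWith answer a single question about the text as a whole.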
String/Number Functions
Converting from strings to numbers and vice versa:
  int2str converts from an integer to a character vector storing the integer
  num2str converts a real number to a character vector containing the number
  string converts number(s) to strings
  str2num (and str2double) converts from a string or character vector containing number(s) to a number array
Note: this is different from converting to/from ASCII equivalents.
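The conversions in both directions, and the contrast with ASCII equivalents, can be sketched as follows. The values and variable names are hypothetical.

```matlab
s1  = int2str(42);            % integer to character vector
s2  = num2str(3.14159);       % real number to character vector (default precision)
s3  = num2str(3.14159, 8);    % same number with up to 8 significant digits
str = string(42);             % number to string

x   = str2double('3.5');      % text to a single number
v   = str2num('[1 2 3]');     % text to a numeric array
bad = str2double('abc');      % NaN when the text is not a number

code = double('A');           % ASCII equivalent of a character, a different idea
```

str2double is often the safer choice for a single value, since it returns NaN for text that is not a number.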
Common Pitfalls
Trying to use == to compare character vectors for equality, instead of the strcmp function (and its variations).
Confusing sprintf and fprintf. The syntax is the same, but sprintf creates text whereas fprintf prints it.
Trying to create a vector of character vectors with varying lengths (use string arrays instead).
Forgetting that when using strtok, the second argument returned (the “rest” of the string) contains the delimiter.
Programming Style Guidelines Make sure the correct string comparison function is used; for example, strcmpi if ignoring case is desired