Insight Through Computing
15. Strings
Operations: Subscripting, Concatenation, Search
Numeric-String Conversions
Built-Ins: int2str, num2str, str2double
Previous Dealings
    N = input('Enter Degree: ')
    title('The Sine Function')
    disp( sprintf('N = %2d',N) )
A String is an Array of Characters
Each character occupies one array cell, e.g., A a 7 * ... x !
This string has length 9.
Why are Strings Important?
1. Numerical data is often encoded as strings
2. Genomic calculation/search
Numerical Data is Often Encoded in Strings
For example, a file containing Ithaca weather data begins with the string
    W07629N4226
Longitude: 76° 29' West
Latitude: 42° 26' North
What We Would Like to Do
    W07629N4226
Get hold of the substring '07629'.
Convert it to floating-point format so that it can be involved in numerical calculations.
Format Issues
9 as an IEEE floating point number and 9 as a character have different representations.
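The two representations can be inspected directly. The sketch below is an illustration rather than part of the original slide; it assumes the built-in num2hex is available to show the 64 bits of a double in hexadecimal, and the variable names are made up for the example.

```matlab
x = 9;                  % the number 9, stored as an IEEE double
c = '9';                % the character 9
hexBits = num2hex(x);   % hexadecimal view of the 64 bits of x
code = double(c);       % character code of '9'
isSame = (x == code);   % the number 9 is not the code of the character '9'
disp(hexBits)
disp(code)
disp(isSame)
```

The number and the character share a printed form, but they are stored as different values, which is why a conversion step is needed before doing arithmetic on digits read from a string.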
Genomic Computations
Looking for patterns in a DNA sequence:
    'ATTCTGACCTCGATC'
    ACCT
Genomic Computations
Quantifying differences:
    ATTCTGACCTCGATC
    ATTGCTGACCTCGAT
Remove?
Working With Strings
Strings Can Be Assigned to Variables
    S = 'N = 2'
    N = 2; S = sprintf('N = %1d',N)    % S = 'N = 2'
sprintf produces a formatted string using fprintf rules.
Strings Have a Length
    s = 'abc';
    n = length(s);   % n = 3
    s = '';          % the empty string
    n = length(s)    % n = 0
    s = ' ';         % single blank
    n = length(s)    % n = 1
Concatenation
This:
    S = 'abc';
    T = 'xy'
    R = [S T]
is the same as this:
    R = 'abcxy'
Repeated Concatenation
This:
    s = '';
    for k=1:5
        s = [s 'z'];
    end
is the same as this:
    s = 'zzzzz'
Replacing and Appending Characters
    s = 'abc';
    s(2) = 'x'    % s = 'axc'
    t = 'abc'
    t(4) = 'd'    % t = 'abcd'
    v = ''
    v(5) = 'x'    % v has length 5: four padding characters, then 'x'
Extracting Substrings
    s = 'abcdef';
    x = s(3)            % x = 'c'
    x = s(2:4)          % x = 'bcd'
    x = s(length(s))    % x = 'f'
Colon Notation
    s(Start:End)
Start is the starting location and End is the ending location.
Replacing Substrings
    s = 'abcde';
    s(2:4) = 'xyz'     % s = 'axyze'
    s = 'abcde'
    s(2:4) = 'wxyz'    % Error: three positions cannot hold four characters
Question Time
    s = 'abcde';
    for k=1:3
        s = [ s(4:5) s(1:3)];
    end
What is the final value of s?
A. abcde   B. bcdea   C. eabcd   D. deabc
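One way to check an answer is to print s after every pass. The sketch below only adds an fprintf call to the loop from the question; each pass moves the last two characters to the front.

```matlab
s = 'abcde';
for k = 1:3
    s = [s(4:5) s(1:3)];                    % last two characters move to the front
    fprintf('After pass %d: %s\n', k, s);   % show the intermediate value
end
```

Running it shows the string being rotated by two positions per pass, which points to answer C.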
Problem: DNA Strand
x is a string made up of the characters 'A', 'C', 'T', and 'G'. Construct a string y obtained from x by replacing each A by T, each T by A, each C by G, and each G by C.
    x: ACGTTGCAGTTCCATATG
    y: TGCAACGTCAAGGTATAC
    function y = Strand(x)
    % x is a string consisting of
    % the characters A, C, T, and G.
    % y is a string obtained by
    % replacing A by T, T by A,
    % C by G and G by C.
Comparing Strings
Built-in function strcmp:
strcmp(s1,s2) is true if the strings s1 and s2 are identical.
How y is Built Up
    x: ACGTTGCAGTTCCATATG
    y: TGCAACGTCAAGGTATAC
    Start:          y: ''
    After 1 pass:   y: T
    After 2 passes: y: TG
    After 3 passes: y: TGC
    y = '';                       % start with the empty string
    for k=1:length(x)
        if strcmp(x(k),'A')
            y = [y 'T'];
        elseif strcmp(x(k),'T')
            y = [y 'A'];
        elseif strcmp(x(k),'C')
            y = [y 'G'];
        else                      % x(k) is 'G'
            y = [y 'C'];
        end
    end
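The loop can be tried on the example strand from the problem statement. The following is a minimal script-form sketch of the body of Strand. The explicit test for 'G' and the error branch are additions for illustration: they reject characters outside A, C, G, T instead of silently mapping them to 'C'.

```matlab
x = 'ACGTTGCAGTTCCATATG';     % example strand from the problem statement
y = '';                       % y is built up one character per pass
for k = 1:length(x)
    if strcmp(x(k), 'A')
        y = [y 'T'];
    elseif strcmp(x(k), 'T')
        y = [y 'A'];
    elseif strcmp(x(k), 'C')
        y = [y 'G'];
    elseif strcmp(x(k), 'G')
        y = [y 'C'];
    else
        error('Unexpected character %s at position %d', x(k), k);
    end
end
disp(y)
```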
A DNA Search Problem
Suppose S and T are strings, e.g.,
    S: 'ACCT'
    T: 'ATGACCTGA'
We'd like to know if S is a substring of T and, if so, where the first occurrence is.
    function k = FindCopy(S,T)
    % S and T are strings.
    % If S is not a substring of T,
    % then k=0.
    % Otherwise, k is the smallest
    % integer so that S is identical
    % to T(k:k+length(S)-1).
A DNA Search Problem
    S: 'ACCT'
    T: 'ATGACCTGA'
    strcmp(S,T(1:4))    False
    strcmp(S,T(2:5))    False
    strcmp(S,T(3:6))    False
    strcmp(S,T(4:7))    True
Pseudocode
    First = 1; Last = length(S);
    while S is not identical to T(First:Last)
        First = First + 1; Last = Last + 1;
    end
Subscript Error
    S: 'ACCT'
    T: 'ATGACTGA'
    strcmp(S,T(6:9))
There's a problem if S is not a substring of T: here T has only 8 characters, so T(6:9) is out of range.
Pseudocode
    First = 1; Last = length(S);
    while Last<=length(T) && ...
          ~strcmp(S,T(First:Last))
        First = First + 1; Last = Last + 1;
    end
Post-Loop Processing
The loop ends when this is false:
    Last<=length(T) && ...
    ~strcmp(S,T(First:Last))
Post-Loop Processing
The loop ends for one of two reasons.
    if Last>length(T)
        % No match found
        k=0;
    else
        % There was a match
        k=First;
    end
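Putting the guarded loop and the post-loop test together gives the body of FindCopy. The sketch below runs it in script form on the two example strings from the slides, one that contains S and one that does not. The cell array texts, the result vector k, and the comparison against the built-in strfind are additions for illustration.

```matlab
S = 'ACCT';
texts = {'ATGACCTGA', 'ATGACTGA'};   % the second string does not contain S
k = zeros(1, length(texts));         % k(j) holds the result for texts{j}
for j = 1:length(texts)
    T = texts{j};
    First = 1;
    Last = length(S);
    % The range check comes first, so T(First:Last) is never out of bounds.
    while Last <= length(T) && ~strcmp(S, T(First:Last))
        First = First + 1;
        Last = Last + 1;
    end
    if Last > length(T)
        k(j) = 0;                    % no match found
    else
        k(j) = First;                % there was a match
    end
    fprintf('%s in %s: k = %d\n', S, T, k(j));
end
builtinHit = strfind(texts{1}, S);   % built-in search, for comparison
```

The order of the two conditions matters: && evaluates its right operand only when the left one is true, so the out-of-range subscript from the Subscript Error slide is never attempted.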
Numeric/String Conversion
String-to-Numeric Conversion
An example...
Convention: W07629N4226
Longitude: 76° 29' West
Latitude: 42° 26' North
String-to-Numeric Conversion
    s = 'W07629N4226'
    s1 = s(2:4);              % degrees: '076'
    x1 = str2double(s1);
    s2 = s(5:6);              % minutes: '29'
    x2 = str2double(s2);
    Longitude = x1 + x2/60
There are 60 minutes in a degree.
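The same steps decode the latitude. The sketch below assumes, based only on the single example string, that the latitude field is a direction letter followed by two digits of degrees and two digits of minutes; the variable names are chosen for the illustration.

```matlab
s = 'W07629N4226';
lonDeg = str2double(s(2:4));      % '076' -> 76
lonMin = str2double(s(5:6));      % '29'  -> 29
latDeg = str2double(s(8:9));      % '42'  -> 42
latMin = str2double(s(10:11));    % '26'  -> 26
Longitude = lonDeg + lonMin/60;   % 60 minutes in a degree
Latitude  = latDeg + latMin/60;
fprintf('Longitude: %.4f degrees %s\n', Longitude, s(1));
fprintf('Latitude:  %.4f degrees %s\n', Latitude, s(7));
```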
Numeric-to-String Conversion
    x = 1234;
    s = int2str(x);            % s = '1234'
    x = pi;
    s = num2str(x,'%5.3f');    % s = '3.142'
Problem
Given a date in the format 'mm/dd', specify the next day in the same format.
y = Tomorrow(x)
    x        y
    02/28    03/01
    07/13    07/14
    12/31    01/01
Get the Day and Month
    month = str2double(x(1:2));
    day = str2double(x(4:5));
Thus, if x = '02/28' then month is assigned the numerical value of 2 and day is assigned the numerical value of 28.
    L = [31 28 31 30 31 30 31 31 30 31 30 31];   % days in each month (non-leap year)
    if day+1<=L(month)
        % Tomorrow is in the same month
        newDay = day+1;
        newMonth = month;
    else
        % Tomorrow is in the next month
        newDay = 1;
        if month<12
            newMonth = month+1;
        else
            newMonth = 1;
        end
    end
The New Day String
Compute newDay (numerical) and convert...
    d = int2str(newDay);
    if length(d)==1
        d = ['0' d];
    end
The New Month String
Compute newMonth (numerical) and convert...
    m = int2str(newMonth);
    if length(m)==1
        m = ['0' m];
    end
The Final Concatenation
    y = [m '/' d];
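The pieces of Tomorrow can be assembled and checked against the three example dates. The sketch below is in script form and rests on an assumption that is not in the source: the month-length vector L is filled in with non-leap-year values, which is at least consistent with the example 02/28 -> 03/01. The cell arrays dates and next are also additions for illustration.

```matlab
L = [31 28 31 30 31 30 31 31 30 31 30 31];   % assumed: non-leap-year month lengths
dates = {'02/28', '07/13', '12/31'};         % example inputs from the slides
next = cell(size(dates));
for j = 1:length(dates)
    x = dates{j};
    month = str2double(x(1:2));
    day = str2double(x(4:5));
    if day+1 <= L(month)
        newDay = day+1;                      % tomorrow is in the same month
        newMonth = month;
    else
        newDay = 1;                          % tomorrow is in the next month
        if month < 12
            newMonth = month+1;
        else
            newMonth = 1;
        end
    end
    d = int2str(newDay);
    if length(d) == 1
        d = ['0' d];                         % pad to two digits
    end
    m = int2str(newMonth);
    if length(m) == 1
        m = ['0' m];
    end
    next{j} = [m '/' d];
    fprintf('%s -> %s\n', x, next{j});
end
```

A shorter alternative for the padding step would be sprintf('%02d/%02d', newMonth, newDay), at the cost of hiding the concatenation the slides are illustrating.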
Some other useful string functions
    str = 'Cs 1112';
    length(str)               % 7
    isletter(str)             % [1 1 0 0 0 0 0]
    isspace(str)              % [0 0 1 0 0 0 0]
    lower(str)                % 'cs 1112'
    upper(str)                % 'CS 1112'
    ischar(str)               % Is str a char array? True (1)
    strcmp(str(1:2),'cs')     % Compare strings str(1:2) & 'cs'. False (0)
    strcmp(str(1:3),'CS')     % False (0)
ASCII characters (American Standard Code for Information Interchange)
    ascii code   Character        ascii code   Character
        :            :                :            :
        65          'A'               48          '0'
        66          'B'               49          '1'
        67          'C'               50          '2'
        :            :                :            :
        90          'Z'               57          '9'
        :            :                :            :
Character vs ASCII code
    str = 'Age 19'          % a 1-d array of characters
    code = double(str)      % convert chars to ascii values
    str1 = char(code)       % convert ascii values to chars
Arithmetic and relational ops on characters
    'c'-'a'              gives 2
    '6'-'5'              gives 1
    letter1='e'; letter2='f';
    letter1-letter2      gives -1
    'c'>'a'              gives true
    letter1==letter2     gives false
    'A' + 2              gives 67
    char('A'+2)          gives 'C'
Example: toUpper
Write a function toUpper(cha) to convert character cha to upper case if cha is a lower case letter. Return the converted letter. If cha is not a lower case letter, simply return the character cha.
Hint: Think about the distance between a letter and the base letter 'a' (or 'A'). E.g.,
    a b c d e f g h ...
    A B C D E F G H ...
    distance = 'g'-'a' = 6 = 'G'-'A'
Of course, do not use the Matlab function upper!
cha is lower case if it is between 'a' and 'z'.
    function up = toUpper(cha)
    % up is the upper case of character cha.
    % If cha is not a lower case letter then up is just cha.
    up = cha;
    if ( cha >= 'a' && cha <= 'z' )
        % Find distance of cha from 'a'
        offset = cha - 'a';
        % Go same distance from 'A'
        up = char('A' + offset);
    end
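The same character arithmetic can be applied to every position of a string. The sketch below inlines the body of toUpper in a loop over the earlier example string 'Cs 1112'; the variable result is an addition for illustration, and the built-in upper is used only to check the outcome.

```matlab
str = 'Cs 1112';
result = str;                            % characters that are not lower case stay as they are
for k = 1:length(str)
    cha = str(k);
    if cha >= 'a' && cha <= 'z'
        offset = cha - 'a';              % distance of cha from 'a'
        result(k) = char('A' + offset);  % same distance from 'A'
    end
end
disp(result)
matchesBuiltin = strcmp(result, upper(str));
```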