Chapter 7: The String class
We’ll go through some of this quickly!
Strings as objects
Strings are objects. Each String is an instance of the class String. They can be constructed thus:
    String s = new String("Hi mom!");
Strings are so common that Java provides a handier way of creating them:
    String s = "Hi mom!";
Strings have methods: loads of them!
Some methods for Strings
    String s1 = "harry";
    String s2 = "harold";
    if ( s1.equals(s2) )
        System.out.println("Strings are the same");
    if ( s1.compareTo(s2) < 0 )
        System.out.println(s1 + "...." + s2);
    else
        System.out.println(s2 + "...." + s1);
Comparing Strings
s1.compareTo("hi mom!")
– returns a value < 0 if s1 is ordered before "hi mom!"
– returns a value > 0 if "hi mom!" is ordered before s1
– returns 0 if they are the same
This ordering is called lexicographic order.
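The three cases above can be tried out with a small sketch like the one below. The class name CompareToDemo and the helper describe are made up for illustration. Note that only the sign of the value returned by compareTo matters here, not its size.

```java
// Sketch: looking at the sign of compareTo for a few pairs of strings.
// CompareToDemo and describe() are hypothetical names, not part of the Java library.
class CompareToDemo {

    // Says how a is ordered relative to b, using only the sign of compareTo.
    static String describe(String a, String b) {
        int result = a.compareTo(b);
        if (result < 0) {
            return a + " is ordered before " + b;
        } else if (result > 0) {
            return b + " is ordered before " + a;
        } else {
            return a + " and " + b + " are the same";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("harold", "harry")); // harold is ordered before harry
        System.out.println(describe("harry", "harold")); // harold is ordered before harry
        System.out.println(describe("harry", "harry"));  // harry and harry are the same
    }
}
```

"harold" comes first because the strings agree on "har" and then 'o' is ordered before 'r'.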
Quick exercise
Write a program which reads 2 strings and writes them out in lexicographic order, smallest first.
Can you figure out which characters are ordered before which? Is this the same as the telephone book?
String concatenation
    String s1 = "Hi" + " mom";
    String s2 = "your lucky number is " + number;
    String s3 = s1.concat(s2);
    String s4 = s1 + s2;   // same result as s1.concat(s2)
A number is first turned into a String, then concatenated.
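Here is a minimal sketch that puts these pieces together in a complete program. The class name ConcatDemo, the helper message and the value 7 are assumptions chosen for illustration.

```java
// Sketch: building a String with + and with concat().
// ConcatDemo and message() are hypothetical names used only for this example.
class ConcatDemo {

    // Builds a greeting; the int is turned into a String before being joined on.
    static String message(int number) {
        String s1 = "Hi" + " mom";
        String s2 = ", your lucky number is " + number;
        return s1.concat(s2);
    }

    public static void main(String[] args) {
        String viaConcat = message(7);
        String viaPlus = "Hi mom" + ", your lucky number is " + 7;
        System.out.println(viaConcat);                 // Hi mom, your lucky number is 7
        System.out.println(viaConcat.equals(viaPlus)); // true
    }
}
```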
Substrings
A String is stored as an array of characters, for example "Hi_Mom!":
    index:     0 1 2 3 4 5 6
    character: H i _ M o m !
    public String substring(int beginIndex)
    public String substring(int beginIndex, int endIndex)
Some things to remember
– The index of the first character is 0, not 1.
– substring(a, b) returns the characters from position a to position b-1, not b!
– substring(a) returns the characters from position a to the end of the String.
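These rules are easy to check on the "Hi_Mom!" string from the previous slide. The sketch below is only an illustration; the class name SubstringDemo is made up.

```java
// Sketch: the index rules for substring, tried on "Hi_Mom!".
// SubstringDemo is a hypothetical class name.
class SubstringDemo {
    public static void main(String[] args) {
        String s = "Hi_Mom!";                  // indices 0 to 6
        System.out.println(s.substring(3));    // from index 3 to the end: Mom!
        System.out.println(s.substring(3, 6)); // indices 3, 4 and 5 (not 6): Mom
        System.out.println(s.substring(0, 2)); // indices 0 and 1: Hi
    }
}
```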
Other String methods
Return the character at a given index:
– public char charAt(int index)
Get the length of a String:
– public int length()
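One common way to use these two methods together is to loop over every index from 0 to length() - 1. The sketch below counts how often a character occurs; the class CharAtDemo and the helper countChar are hypothetical names.

```java
// Sketch: walking through a String with length() and charAt().
// CharAtDemo and countChar() are hypothetical names used only for this example.
class CharAtDemo {

    // Counts how many times c appears in s (upper and lower case are different characters).
    static int countChar(String s, char c) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) { // valid indices are 0 .. length() - 1
            if (s.charAt(i) == c) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "Hi_Mom!";
        System.out.println(s.length());        // 7
        System.out.println(s.charAt(0));       // H
        System.out.println(countChar(s, 'm')); // 1 (the capital M does not count)
    }
}
```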
Processing string contents: StringTokenizer
All the strings we’ve seen so far have been short (a word or two). To process long strings (such as sentences) we need to be able to split strings up into their parts (words, numbers, etc.). The parts of a sentence are called tokens.
    “This is a string with 9 tokens in it.”
How do we recognise tokens? They are separated by delimiters (in the sentence above, blank spaces).
java.util.StringTokenizer
    // this program uses a StringTokenizer object to split a sentence
    // into words and print each word on a different line
    import java.util.StringTokenizer;

    public class TestTokenizer {
        public static void main(String[] args) {
            String test = "This is a test string.";
            StringTokenizer testTokenizer = new StringTokenizer(test);
            // 'testTokenizer' is an object that will give
            // us the consecutive tokens in the String 'test'.
            while (testTokenizer.hasMoreTokens()) {
                System.out.println(testTokenizer.nextToken());
            }
            // the testTokenizer object has methods called
            // hasMoreTokens() and nextToken(), which tell
            // us whether there are more tokens left in the string
            // test, and give us the next token from that string
        }
    }
Using StringTokenizer
To tokenize a String (e.g. split it into words), we create a new StringTokenizer object, giving the StringTokenizer constructor the string we want to split up:
    String test = "This is a test string.";
    StringTokenizer testTokenizer = new StringTokenizer(test);
The StringTokenizer object now has inside it the string to tokenize.
We get the next token by asking that object for nextToken(). The object looks in the string it was given at construction, and returns the next token to us.
We can find out how many tokens are left in our string by asking that object for countTokens(). The object counts the tokens it has not yet returned, so calling countTokens() before the first nextToken() gives the total number of tokens in the string.
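The behaviour of countTokens() is easiest to see by calling it before and after reading a token. The sketch below does that; the class CountTokensDemo and the helper countsBeforeAndAfter are made-up names, and the helper assumes the text contains at least one token (otherwise nextToken() throws a NoSuchElementException).

```java
import java.util.StringTokenizer;

// Sketch: countTokens() reports the tokens that have not been returned yet.
// CountTokensDemo and countsBeforeAndAfter() are hypothetical names.
class CountTokensDemo {

    // Returns the token count before and after reading one token.
    // Assumes text holds at least one token.
    static int[] countsBeforeAndAfter(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text);
        int before = tokenizer.countTokens();
        tokenizer.nextToken();               // consume the first token
        int after = tokenizer.countTokens();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] counts = countsBeforeAndAfter("This is a test string.");
        System.out.println(counts[0]); // 5
        System.out.println(counts[1]); // 4
    }
}
```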
Delimiters for StringTokenizer
By default, a StringTokenizer object splits up a String using blank spaces, tabs ('\t'), new lines ('\n'), returns ('\r') and form feeds ('\f') as delimiters.
We can use different delimiters by giving a String containing the delimiters we want to use as the argument to nextToken():
    nextToken(" ,\t\n\r");
This uses spaces, commas, tabs, newlines and returns as delimiters, for this call and for the calls that follow it.
We can also specify the delimiters we want to use when we construct our StringTokenizer object:
    StringTokenizer st = new StringTokenizer(test, " ,\t\n\r");
If we want the delimiters to be returned as tokens, we specify that in the constructor as well, by passing true as a third argument (normally they’re not returned).
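To see what the delimiter String and the "return delimiters" option do, here is a rough sketch using the three-argument constructor StringTokenizer(String, String, boolean). The class DelimiterDemo, the helper bracketed and the fruit example are assumptions made up for illustration. Each token is wrapped in square brackets so that delimiter tokens such as a single space stay visible.

```java
import java.util.StringTokenizer;

// Sketch: choosing delimiters, and optionally getting them back as tokens.
// DelimiterDemo and bracketed() are hypothetical names used only for this example.
class DelimiterDemo {

    // Splits text on the characters in delims and wraps each token in [ ].
    static String bracketed(String text, String delims, boolean returnDelims) {
        StringTokenizer tokenizer = new StringTokenizer(text, delims, returnDelims);
        String result = "";
        while (tokenizer.hasMoreTokens()) {
            result = result + "[" + tokenizer.nextToken() + "]";
        }
        return result;
    }

    public static void main(String[] args) {
        String test = "apples, pears,plums";
        // spaces and commas are delimiters and are thrown away
        System.out.println(bracketed(test, " ,", false)); // [apples][pears][plums]
        // the same delimiters, but each one is returned as a one-character token
        System.out.println(bracketed(test, " ,", true));  // [apples][,][ ][pears][,][plums]
    }
}
```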
UML for StringTokenizer
UML means Unified Modelling Language; it’s a way of summarising object oriented programs quickly. Read the first bit of Liang, Appendix G (p. 903), which explains UML.
Here’s the UML for StringTokenizer:
    StringTokenizer
    +countTokens(): int
    +hasMoreTokens(): boolean
    +nextToken(): String
    +nextToken(delim: String): String
The + means “publicly accessible method”. “: int” means the method returns an integer.