Session 05 Java Strings and Files
Exercise
Complete the “quick-and-dirty” class CharacterCounter, containing only a main() method, that displays the number of non-space characters on the command line after the command. For example:

$ java CharacterCounter
0
$ java CharacterCounter a
1
$ java CharacterCounter a bc def ghij
10

CharacterCounter template

public class CharacterCounter {
    public static void main( String[] args ) {
        int characterCount = 0;
    } // end main
} // end class CharacterCounter

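One possible way to complete the template is sketched below. It is an illustration, not the intended solution: the class name CharacterCounterSketch and the helper countNonSpace are names introduced here, whereas the exercise itself asks only for a main() method. Because the shell already splits the command line on spaces, an argument contains a space only when it is quoted.

```java
// A sketch of one way to complete the exercise. The class name and the
// helper method are illustrative and not part of the original template.
class CharacterCounterSketch {

    // Counts the characters in all arguments that are not the space character.
    static int countNonSpace( String[] args ) {
        int characterCount = 0;
        for ( String arg : args ) {
            for ( int i = 0; i < arg.length(); i++ ) {
                if ( arg.charAt( i ) != ' ' ) {
                    characterCount++;
                }
            } // end for each character
        } // end for each argument
        return characterCount;
    } // end countNonSpace

    public static void main( String[] args ) {
        System.out.println( countNonSpace( args ) );
    } // end main
} // end class CharacterCounterSketch
```

Running it as `java CharacterCounterSketch a bc def ghij` should print 10, matching the last example in the exercise.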
StringTokenizer
– A useful tool for processing a String object
– Allows you to walk sequentially down a String and extract “words” (tokens) that are delimited by specified characters
– What delimiter normally helps us parse a long string into words?

StringTokenizer
General usage of a StringTokenizer:
– create one using a constructor that takes the string to process as its argument
– send it one of two messages: hasMoreTokens() and nextToken()
– use a stereotypical loop to process the sequence of tokens
A default StringTokenizer uses whitespace (space, tab, newline, carriage return, and form feed) as delimiters.

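The usage pattern above can be sketched as a small helper that collects the tokens into a list. The names DefaultTokenizerSketch and tokenize are hypothetical and chosen only for this illustration. The sample string is meant to show that, with the one-argument constructor, tabs and newlines separate tokens as well as spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative sketch; the class and method names are not from the slides.
class DefaultTokenizerSketch {

    // Splits text with the default delimiters and returns the tokens in order.
    static List<String> tokenize( String text ) {
        List<String> result = new ArrayList<>();
        StringTokenizer words = new StringTokenizer( text );
        while ( words.hasMoreTokens() ) {
            result.add( words.nextToken() );
        } // end while
        return result;
    } // end tokenize

    public static void main( String[] args ) {
        // prints [one, two, three, four]
        System.out.println( tokenize( "one two\tthree\nfour" ) );
    } // end main
} // end class DefaultTokenizerSketch
```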
StringTokenizer Example

import java.util.StringTokenizer;

public class EchoWordsInArgumentV1 {
    public static void main( String[] args ) {
        // args[0] is the whole quoted string from the command line
        StringTokenizer words = new StringTokenizer( args[0] );
        while ( words.hasMoreTokens() ) {
            String word = words.nextToken();
            System.out.println( word );
        } // end while
    } // end main
} // end class EchoWordsInArgumentV1

StringTokenizer Example

$ java EchoWordsInArgumentV1 "StringTokenizer, please process me."
StringTokenizer,
please
process
me.

Notice the quotes ( “” ) on the command line, so that the whole string is read as args[0]. By default, the comma ( “,” ) and period ( “.” ) are part of the words and are not delimiters.

StringTokenizer Example 2
Fortunately, we can construct a StringTokenizer that uses specified characters as delimiters. The designer of StringTokenizer was planning ahead for future usage!

$ java EchoWordsInArgumentV2 "StringTokenizer, please process me."
StringTokenizer
please
process
me

StringTokenizer Example 2

import java.util.StringTokenizer;

public class EchoWordsInArgumentV2 {
    public static void main( String[] args ) {
        // the leading space makes the space character a delimiter too;
        // without it, "please process me" would come back as one token
        String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
        StringTokenizer words = new StringTokenizer( args[0], delimiters );
        while ( words.hasMoreTokens() ) {
            String word = words.nextToken();
            System.out.println( word );
        } // end while
    } // end main
} // end class EchoWordsInArgumentV2

UNIX/Linux pipe “|” character on the command line
Allows the output of one program to be sent as input to another program, such as the UNIX “sort” utility.

$ java EchoWordsInArgumentV2 "StringTokenizer, please process me." | sort
StringTokenizer
me
please
process

Is this sorted? How can we fix this?

StringTokenizer Example 3

import java.util.StringTokenizer;

public class EchoWordsInArgumentV3 {
    public static void main( String[] args ) {
        // the leading space makes the space character a delimiter too
        String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
        StringTokenizer words = new StringTokenizer( args[0], delimiters );
        while ( words.hasMoreTokens() ) {
            String word = words.nextToken();
            word = word.toLowerCase();   // so that sort does not put capitalized words first
            System.out.println( word );
        } // end while
    } // end main
} // end class EchoWordsInArgumentV3

StringTokenizer Example 3

$ java EchoWordsInArgumentV3 "StringTokenizer, please process me." | sort
me
please
process
stringtokenizer

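Lowercasing each word, as V3 does, is one fix. As an alternative sketch, the words could be collected and sorted inside the program with String.CASE_INSENSITIVE_ORDER, which keeps the original capitalization. The names SortedWordsSketch and sortedWords are hypothetical, and the delimiter string assumes that a space is among the delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative sketch; the class and method names are not from the slides.
class SortedWordsSketch {

    // Tokenizes text and returns the words sorted without regard to case.
    static List<String> sortedWords( String text ) {
        // Assumption: a space is included among the delimiters.
        String delimiters = " .?!()[]{}|/&\\,;:-'\"\t\n\r";
        List<String> words = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer( text, delimiters );
        while ( tokens.hasMoreTokens() ) {
            words.add( tokens.nextToken() );
        } // end while
        words.sort( String.CASE_INSENSITIVE_ORDER );
        return words;
    } // end sortedWords

    public static void main( String[] args ) {
        for ( String word : sortedWords( args[0] ) ) {
            System.out.println( word );
        }
    } // end main
} // end class SortedWordsSketch
```

With the argument "StringTokenizer, please process me." this should print me, please, process, StringTokenizer, one per line, with no need for the pipe to sort.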
Java File I/O
– Allows us to write and read “permanent” information to and from disk
– How would file I/O help improve the capabilities of the MemoPadApp?

Java File I/O
Example: Echo.java echoes all the words in one file to an output file, one per line.

$ java Echo hamlet.txt hamlet.out
$ less hamlet.out
1604
the
tragedy
of
hamlet
prince
of
denmark
by
william
shakespeare
...

Study Echo.java’s File I/O
– the reader and writer classes have constructors that allow convenient and flexible processing
– send the input message readLine()
– send the output messages print() and println()
– use a stereotypical loop to process a file of lines
– use the stereotypical StringTokenizer loop as the inner loop

import java.io.*;
import java.util.StringTokenizer;

public class Echo {
    public static void main( String[] args ) throws IOException {
        // the leading space makes the space character a delimiter too
        String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
        BufferedReader inputFile = new BufferedReader( new FileReader( args[0] ) );
        PrintWriter outputFile = new PrintWriter( new FileWriter( args[1] ) );
        String buffer = null;
        while ( true ) {
            buffer = inputFile.readLine();
            if ( buffer == null ) break;   // readLine() returns null at end of file
            buffer = buffer.toLowerCase();
            StringTokenizer tokens = new StringTokenizer( buffer, delimiters );
            while ( tokens.hasMoreTokens() ) {
                String word = tokens.nextToken();
                outputFile.println( word );
            } // end while
        } // end while(true)...
        inputFile.close();
        outputFile.close();   // flushes any buffered output to disk
    } // end main
} // end class Echo

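A variant of the same two loops is sketched below using try-with-resources, which closes both files automatically, even if an exception is thrown partway through. The class name EchoSketch and the method echo are hypothetical, and the delimiter string assumes that a space is among the delimiters.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.StringTokenizer;

// Illustrative sketch; the class and method names are not from the slides.
class EchoSketch {

    // Assumption: a space is included among the delimiters.
    static final String DELIMITERS = " .?!()[]{}|/&\\,;:-'\"\t\n\r";

    // Writes every word of the input file to the output file, one per line.
    static void echo( String inputName, String outputName ) throws IOException {
        try ( BufferedReader inputFile = new BufferedReader( new FileReader( inputName ) );
              PrintWriter outputFile = new PrintWriter( new FileWriter( outputName ) ) ) {
            String buffer;
            // outer loop: one pass per line, until readLine() returns null
            while ( ( buffer = inputFile.readLine() ) != null ) {
                StringTokenizer tokens =
                    new StringTokenizer( buffer.toLowerCase(), DELIMITERS );
                // inner loop: one pass per word on the line
                while ( tokens.hasMoreTokens() ) {
                    outputFile.println( tokens.nextToken() );
                } // end inner while
            } // end outer while
        } // both files are closed here
    } // end echo

    public static void main( String[] args ) throws IOException {
        echo( args[0], args[1] );
    } // end main
} // end class EchoSketch
```

The design trade-off is small here, but closing the PrintWriter matters: output that is still buffered when the program exits may never reach the file.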
wc - UNIX/Linux utility
wc prints the number of lines, words, and characters in a file to standard output. For example:

$ wc hamlet.txt
hamlet.txt

Exercise
Using Echo.java as your starting point, create a WordCount.java program that does the same thing as wc, i.e., prints the number of lines, words, and characters in a file to standard output. For example:

$ java WordCount hamlet.txt

import java.io.*;
import java.util.StringTokenizer;

public class WordCount {
    public static void main( String[] args ) throws IOException {
        // the leading space makes the space character a delimiter too
        String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
        BufferedReader inputFile = new BufferedReader( new FileReader( args[0] ) );
        String buffer = null;
        int chars = 0;
        int words = 0;
        int lines = 0;
        while ( true ) {
            buffer = inputFile.readLine();
            if ( buffer == null ) break;   // readLine() returns null at end of file
            lines++;
            buffer = buffer.toLowerCase();
            StringTokenizer tokens = new StringTokenizer( buffer, delimiters );
            while ( tokens.hasMoreTokens() ) {
                String word = tokens.nextToken();
                words++;
                chars += word.length();   // counts only the characters inside tokens
            } // end while
        } // end while( true )...
        inputFile.close();
        System.out.println( "" + lines + " " + words + " " + chars );
    } // end main
} // end class WordCount

Why the difference in the number of words and number of characters?

$ wc hamlet.txt
hamlet.txt
$ java WordCount hamlet.txt

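One likely source of the difference: wc separates words only at whitespace and counts everything in the file, including punctuation, spaces, and line terminators, whereas WordCount also splits at punctuation such as hyphens and apostrophes and adds up only the lengths of the tokens. The sketch below counts in the wc style so the two approaches can be compared. The names WcLikeSketch and count are hypothetical. It assumes each line ends with a single '\n' and each character occupies one byte, which need not hold for every file.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

// Illustrative sketch; the class and method names are not from the slides.
class WcLikeSketch {

    // Returns { lines, words, chars } counted roughly the way wc counts them.
    static int[] count( BufferedReader input ) throws IOException {
        int lines = 0;
        int words = 0;
        int chars = 0;
        String buffer;
        while ( ( buffer = input.readLine() ) != null ) {
            lines++;
            // every character on the line, plus one for the assumed '\n'
            chars += buffer.length() + 1;
            // default tokenizer: words are separated by whitespace only
            words += new StringTokenizer( buffer ).countTokens();
        } // end while
        return new int[] { lines, words, chars };
    } // end count

    public static void main( String[] args ) throws IOException {
        try ( BufferedReader input = new BufferedReader( new FileReader( args[0] ) ) ) {
            int[] counts = count( input );
            System.out.println( counts[0] + " " + counts[1] + " " + counts[2] );
        }
    } // end main
} // end class WcLikeSketch
```

For the line "well-known fact." this sketch counts 2 words and 17 characters, while the WordCount approach would count 3 words and 13 characters.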