17-Mar-16 Characters and Strings
2 Characters
- In Java, a char is a primitive type that holds a single character
- A character can be:
  - A letter or digit
  - A punctuation mark
  - A space, tab, newline, or other whitespace
  - A control character
- Control characters are holdovers from the days of teletypes: things like backspace, bell, and end of transmission
3 char literals
- A char literal is written between single quotes (also known as apostrophes): 'a' 'A' '5' '?' ' '
- Some characters cannot be typed directly and must be written as an “escape sequence”:
  - Tab is '\t'
  - Newline is '\n'
- Some characters must be escaped to prevent ambiguity:
  - Single quote is '\'' (quote-backslash-quote-quote)
  - Backslash is '\\'
4 Additional character literals
  \n   newline
  \t   tab
  \b   backspace
  \r   return
  \f   form feed
  \\   backslash
  \'   single quote
  \"   double quote
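The escape sequences above can be tried out in a short program. This is only a sketch: the class name EscapeDemo and the helper codeOf are made up for illustration and are not part of the slides.

```java
// Hypothetical demo class; the names here are illustrative only.
class EscapeDemo {
    // A char widens to an int automatically, which gives its numeric code.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char tab = '\t';
        char newline = '\n';
        char singleQuote = '\'';   // quote-backslash-quote-quote
        char backslash = '\\';

        System.out.println("tab code:     " + codeOf(tab));      // 9
        System.out.println("newline code: " + codeOf(newline));  // 10
        System.out.println("single quote: " + singleQuote);
        System.out.println("backslash:    " + backslash);
        // A double quote needs no backslash inside a char literal.
        System.out.println("double quote: " + '"');
    }
}
```

Printing the numeric code is a convenient way to check which character an escape sequence really stands for.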
5 Character encodings
- A character is represented as a pattern of bits
- The number of characters that can be represented depends on the number of bits used
- For a long time, ASCII (American Standard Code for Information Interchange) has been used
  - ASCII is a seven-bit code (allows 128 characters)
  - ASCII is barely enough for English
  - Omits many useful characters: ¢ ½ ç “ ”
6 Unicode
- Unicode is a newer standard for character encoding that is designed to replace ASCII
- “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.”
- Java uses Unicode to represent characters
  - A Java char is two bytes (16 bits), enough for the characters from U+0000 to U+FFFF
  - Java 1.5 expands this by supporting supplementary characters (those above U+FFFF), each stored as a pair of char values (a surrogate pair)
- Except for having these extra characters available, it seldom makes any difference to how you program
7 Unicode character literals
- Characters with codes 0 to 255, which includes all of ASCII, can be written as octal numbers from \0 to \377
- Any Unicode character that fits in a char can be written as a hexadecimal number between \u0000 and \uFFFF
- Since there are tens of thousands of possible Unicode characters (\u0000 to \uFFFF alone spans 65,536 values), the list occupies an entire book
  - This makes it hard to look up characters
- Unicode “letters” in any alphabet can be used in identifiers
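A small sketch of octal and hexadecimal character escapes, and of a character that needs two char values. The class UnicodeDemo and its helper methods are hypothetical names chosen for this example; the G clef (U+1D11E) is simply one convenient character above U+FFFF.

```java
// Hypothetical demo class; the names here are illustrative only.
class UnicodeDemo {
    // Number of 16-bit char values used to store the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points (whole characters) in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        char octalA = '\101';          // octal 101 is decimal 65, the letter A
        char cedilla = '\u00E7';       // c with cedilla, which ASCII omits
        String clef = "\uD834\uDD1E";  // U+1D11E (musical G clef) as a surrogate pair

        System.out.println(octalA == 'A');         // true
        System.out.println((int) cedilla);         // 231
        System.out.println(charCount(clef));       // 2
        System.out.println(codePointCount(clef));  // 1
    }
}
```

The last two lines show why length() counts char values rather than characters once supplementary characters are involved.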
8 Glyphs and fonts
- A glyph is the printed representation of a character
- For example, the letter ‘A’ can be represented by any of the glyphs A A A A (each drawn in a different typeface)
- A font is a collection of glyphs
- Unicode describes characters, not glyphs
9 Strings
- A String is a kind of object, and obeys all the rules for objects
- In addition, there is extra syntax for string literals and string concatenation
- A string is made up of zero or more characters
- The string containing zero characters is called the empty string
10 String literals
- A string literal consists of zero or more characters enclosed in double quotes:
  - ""
  - "Hello"
  - "This is a String literal."
- To put a double quote character inside a string, it must be backslashed:
  - "\"Wait,\" he said, \"Don't go!\""
- Inside a string, a single quote character does not need to be backslashed (but it can be)
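The literals on this slide, as a runnable sketch. LiteralDemo and quotation are hypothetical names used only for this example.

```java
// Hypothetical demo class; the names here are illustrative only.
class LiteralDemo {
    // Double quotes inside the literal are backslashed; the apostrophe is not.
    static String quotation() {
        return "\"Wait,\" he said, \"Don't go!\"";
    }

    public static void main(String[] args) {
        String empty = "";
        System.out.println(empty.length());  // 0
        System.out.println(quotation());     // "Wait," he said, "Don't go!"
        // Backslashing a single quote inside a string is allowed but changes nothing.
        System.out.println("Don\'t".equals("Don't"));  // true
    }
}
```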
11 String concatenation
- Strings can be concatenated (put together) with the + operator: "Hello, " + name + "!"
- Anything “added” to a String is converted to a string and concatenated
- Concatenation is done left to right:
  - "abc" + 3 + 5 gives "abc35"
  - 3 + 5 + "abc" gives "8abc"
  - 3 + (5 + "abc") gives "35abc"
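The left-to-right rule can be checked directly. In this sketch the class ConcatDemo and the method greet are hypothetical names, not part of the slides.

```java
// Hypothetical demo class; the names here are illustrative only.
class ConcatDemo {
    static String greet(String name) {
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        System.out.println(greet("Ann"));     // Hello, Ann!
        // String first: each number is converted and appended in turn.
        System.out.println("abc" + 3 + 5);    // abc35
        // Numbers first: 3 + 5 is ordinary addition, then the string is appended.
        System.out.println(3 + 5 + "abc");    // 8abc
        // Parentheses force 5 + "abc" to happen first.
        System.out.println(3 + (5 + "abc"));  // 35abc
    }
}
```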
12 Newlines
- The character '\n' represents a “newline” (actually, it’s an LF, the linefeed character)
- When “printing” to the screen, you can go to a new line by printing a newline character
- You can also go to a new line by using System.out.println with no argument or with one argument
- When writing to the internet, you should use "\r\n" instead of println, because println is platform-specific and many network protocols expect CR-LF:
  - On UNIX (including macOS), println uses LF for a newline
  - On classic Mac OS (before OS X), println uses CR instead of LF for a newline
  - On Windows, println uses CR-LF for a newline
- When you use the character constants, you are in control of what is actually output
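A minimal sketch of taking control of line endings. NewlineDemo, protocolLine, and visible are hypothetical names; System.lineSeparator() is the standard call (Java 7 and later) that reports what println uses on the current platform.

```java
// Hypothetical demo class; the names here are illustrative only.
class NewlineDemo {
    static final String CRLF = "\r\n";

    // Ends a line with an explicit CR-LF, regardless of platform.
    static String protocolLine(String text) {
        return text + CRLF;
    }

    // Makes CR and LF visible so the line ending can be inspected.
    static String visible(String s) {
        return s.replace("\r", "<CR>").replace("\n", "<LF>");
    }

    public static void main(String[] args) {
        System.out.println(visible(protocolLine("HELLO")));  // HELLO<CR><LF>
        // What println appends on this machine; the result depends on the platform.
        System.out.println(visible(System.lineSeparator()));
    }
}
```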
13 System.out.print and println
- System.out.println can be called with no arguments (parameters), or with one argument
- System.out.print is called with one argument
- The argument may be any of the 8 primitive types
- The argument may be any object
- Java can print any object, but it doesn’t always do a good job
  - Java does a good job printing Strings
  - Java typically does a poor job printing types you define
14 Printing your objects
- In any class, you can define the following instance method: public String toString() {... }
- This method can return any string you choose
- If you have an instance x, you can get its string representation by calling x.toString()
- If you define your toString() method exactly as above, it will be used whenever your object is converted to a String
  - This happens during concatenation: "My object is " + myObject
  - toString() is also used by System.out.print and System.out.println
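A minimal sketch of defining toString(). The Point class and its fields are hypothetical, chosen only to have something to print.

```java
// Hypothetical demo class; Point is an illustrative type, not from the slides.
class ToStringDemo {
    static class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Used by println and by string concatenation.
        @Override
        public String toString() {
            return "(" + x + ", " + y + ")";
        }
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(p);                    // (3, 4)
        System.out.println("My object is " + p);  // My object is (3, 4)
        // Without an overridden toString(), the default is a class name,
        // an '@', and a hash code, which is rarely what you want to see.
        System.out.println(new Object());
    }
}
```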
15 Constructing a String
- You can construct a string by writing it as a literal: "This is special syntax to construct a String."
- Since a string is an object, you could construct it with new: new String("This also constructs a String.")
- But using new for constructing a string is foolish, because you have to write the string as a literal to pass it in to the constructor
  - You’re doing the same work twice!
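A short sketch comparing the two ways of constructing a string. ConstructDemo and its two helper methods are hypothetical names for this example.

```java
// Hypothetical demo class; the names here are illustrative only.
class ConstructDemo {
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "This is special syntax to construct a String.";
        // The literal has to be written anyway, then new copies it.
        String viaNew = new String("This is special syntax to construct a String.");

        System.out.println(sameContent(literal, viaNew));  // true
        System.out.println(sameObject(literal, viaNew));   // false: new makes a separate object
    }
}
```

Both variables hold the same characters; the only thing new adds is an extra object.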
16 String methods
- This is only a sampling of string methods
- All are called as: myString.method(params)
  - length() -- the number of characters in the String
  - charAt(index) -- the character at (integer) position index, where index is between 0 and length-1
  - equals(anotherString) -- equality test (because == doesn’t do quite what you expect)
- Hint: Use "expected".equals(actual) rather than actual.equals("expected") to avoid NullPointerExceptions
- Don’t learn all 48 String methods (the count varies by Java version) unless you use them a lot; instead, learn to use the API!
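The three methods and the null-safe hint, as a sketch. StringMethodsDemo and isExpected are hypothetical names used only here.

```java
// Hypothetical demo class; the names here are illustrative only.
class StringMethodsDemo {
    // Calling equals on the literal is safe even when actual is null.
    static boolean isExpected(String actual) {
        return "expected".equals(actual);
    }

    public static void main(String[] args) {
        String s = "Hello";
        System.out.println(s.length());                // 5
        System.out.println(s.charAt(0));               // H
        System.out.println(s.charAt(s.length() - 1));  // o

        System.out.println(isExpected("expected"));    // true
        System.out.println(isExpected(null));          // false, and no NullPointerException
    }
}
```

Writing actual.equals("expected") instead would throw a NullPointerException when actual is null.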
17 Vocabulary
- escape sequence -- a code sequence for a character, beginning with a backslash
- ASCII -- a 7-bit standard for encoding characters
- Unicode -- a standard for encoding characters that assigns each one a unique number; a Java char is 16 bits
- glyph -- the printed representation of a character
- font -- a collection of glyphs
- empty string -- a string containing no characters
- concatenate -- to join strings together
18 The End