Data Representation Prepared by Dr P Marais (Modified by D Burford)
Bits, Bytes, Words
Bit is a single binary digit
Byte is 8 bits
Word size depends on the machine: it can be 16 bits (2 bytes) or 32 bits (4 bytes).
Basic data types in Java
byte: 8 bits, 1 byte (surprisingly!)
short: 16 bits, 2 bytes
int: 32 bits, 4 bytes
long: 64 bits, 8 bytes
float: 32 bits, 4 bytes
double: 64 bits, 8 bytes
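The sizes listed above can be checked from a program. The following is a minimal sketch, assuming Java 8 or later (for the BYTES constants); the class name PrimitiveSizes and the helper bitsToBytes are illustrative names, not part of the original slides.

```java
// Prints the width of each numeric primitive using the SIZE (bits) and
// BYTES constants defined on the standard wrapper classes.
class PrimitiveSizes {
    // Hypothetical helper: converts a width in bits to a width in bytes.
    static int bitsToBytes(int bits) {
        return bits / 8;
    }

    public static void main(String[] args) {
        System.out.println("byte:   " + Byte.SIZE + " bits, " + Byte.BYTES + " bytes");
        System.out.println("short:  " + Short.SIZE + " bits, " + Short.BYTES + " bytes");
        System.out.println("int:    " + Integer.SIZE + " bits, " + Integer.BYTES + " bytes");
        System.out.println("long:   " + Long.SIZE + " bits, " + Long.BYTES + " bytes");
        System.out.println("float:  " + Float.SIZE + " bits, " + Float.BYTES + " bytes");
        System.out.println("double: " + Double.SIZE + " bits, " + Double.BYTES + " bytes");
    }
}
```

Unlike C, these widths are fixed by the Java language and do not vary between machines.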
Bits, Bytes, Words
Bytes are often written using hexadecimal
One byte (8 bits) can be written as 2 hexadecimal digits (4 bits each).
E.g.
–byte b = 15; // 0F in hex
–byte b = (byte) 255; // FF in hex (a Java byte is signed, so 255 needs a cast and is stored as -1)
–int i = 1000000000; // 3B 9A CA 00 in hex
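To see the byte-to-hex correspondence concretely, the sketch below formats a few values as two hex digits per byte. The class name HexDemo and the methods byteToHex and intToHex are illustrative names chosen here; only String.format from the standard library is assumed.

```java
// Formats bytes and ints as hexadecimal, two hex digits per byte.
class HexDemo {
    // Masks with 0xFF so that a negative (signed) byte prints as 00..FF.
    static String byteToHex(byte b) {
        return String.format("%02X", b & 0xFF);
    }

    // Splits a 32-bit int into its four bytes, most significant byte first.
    static String intToHex(int value) {
        return String.format("%02X %02X %02X %02X",
                (value >>> 24) & 0xFF,
                (value >>> 16) & 0xFF,
                (value >>> 8) & 0xFF,
                value & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(byteToHex((byte) 15));    // 0F
        System.out.println(byteToHex((byte) 255));   // FF (stored as -1)
        System.out.println(intToHex(1_000_000));     // 00 0F 42 40
        System.out.println(intToHex(1_000_000_000)); // 3B 9A CA 00
    }
}
```

Note that one million fits in three bytes (0F 42 40), while 3B 9A CA 00 is one thousand million.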
Byte Ordering
“Endianness”: ordering of bytes in computer memory
Bytes in memory: addresses 0, 1, 2, …, n, n+1, n+2, n+3, …, running from low address to high address
Byte Ordering
Big Endian:
–Bytes ordered from Most Significant Byte (MSB) at the lowest address to Least Significant Byte (LSB) at the highest
Little Endian:
–Bytes ordered from LSB at the lowest address to MSB at the highest
Byte Ordering: Example
E.g. How is F (32-bit number) represented in memory, in the four bytes at addresses n (low address) to n+3 (high address)?
BIG ENDIAN: MSB at address n, LSB at address n+3
LITTLE ENDIAN: LSB at address n, MSB at address n+3
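The two layouts can be reproduced with java.nio.ByteBuffer, which lets a program choose the byte order explicitly. This is a sketch only: the value 0x0A0B0C0D is an illustrative choice made here (it is not the number from the slide), and the names EndianDemo, layout and hex are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Shows how the same 32-bit value is laid out under each byte order.
class EndianDemo {
    // Returns the four bytes of value as stored in memory for the given
    // order; index 0 corresponds to the lowest address (n).
    static byte[] layout(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Renders bytes as space-separated two-digit hex, lowest address first.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D; // illustrative value: MSB is 0A, LSB is 0D
        System.out.println("big endian:    " + hex(layout(value, ByteOrder.BIG_ENDIAN)));    // 0A 0B 0C 0D
        System.out.println("little endian: " + hex(layout(value, ByteOrder.LITTLE_ENDIAN))); // 0D 0C 0B 0A
    }
}
```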
Byte Ordering
Problems arise when multi-byte data (floats, ints etc.) moves between machines
Sun (SPARC) is Big Endian, Intel (x86) is Little Endian
Bit ordering issues as well: endianness can also apply to the most/least significant bit (MSb/LSb)
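The problem with multi-byte data can be sketched as follows: if an int is written in one byte order and read back in the other, its bytes arrive reversed and the value changes. The class name ByteSwapDemo and the method swapEndian are illustrative; Integer.reverseBytes and ByteOrder.nativeOrder are standard library calls.

```java
import java.nio.ByteOrder;

// Demonstrates what happens when a 32-bit value is read with the wrong
// byte order, and how swapping the bytes repairs it.
class ByteSwapDemo {
    // Reverses the four bytes of value (big endian <-> little endian).
    static int swapEndian(int value) {
        return Integer.reverseBytes(value);
    }

    public static void main(String[] args) {
        // The byte order of the hardware this JVM is running on.
        System.out.println("native order: " + ByteOrder.nativeOrder());

        int sent = 1;                          // bytes 00 00 00 01 (MSB first)
        int misread = swapEndian(sent);        // bytes interpreted as 01 00 00 00
        System.out.println("sent:     " + sent);                // 1
        System.out.println("misread:  " + misread);             // 16777216
        System.out.println("repaired: " + swapEndian(misread)); // 1
    }
}
```

Java's own DataInputStream and DataOutputStream always use big endian, so the issue mainly appears when exchanging binary data with non-Java programs.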
Character Representations
Characters are represented using a “character set”
–ASCII (7-bit, usually stored one character per 8-bit byte)
–Unicode (originally 16-bit)
Character Representations: ASCII
ASCII - American Standard Code for Information Interchange
7 bits means 128 characters (0-127); 8-bit extensions such as ISO 8859-1 (Latin-1) give 256 characters (0-255)
ASCII codes cover the roman alphabet, numbers, keyboard symbols and basic control codes
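As a small illustration of the character codes, the sketch below prints the numeric code of a few characters and the bytes produced when a string is encoded as ASCII. The class name AsciiDemo and the methods codeOf and encode are hypothetical names used for this example.

```java
import java.nio.charset.StandardCharsets;

// Shows the numeric codes behind ASCII characters.
class AsciiDemo {
    // A Java char converts directly to its numeric character code.
    static int codeOf(char c) {
        return (int) c;
    }

    // Encodes a string as ASCII, one byte per character.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println("'A' = " + codeOf('A')); // 65
        System.out.println("'a' = " + codeOf('a')); // 97
        System.out.println("'0' = " + codeOf('0')); // 48

        byte[] bytes = encode("Hi");
        for (byte b : bytes) {
            System.out.println(String.format("%02X", b & 0xFF)); // 48, then 69
        }
    }
}
```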
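Character Representations: Unicode
Unicode: originally 16 bits, first published in 1991: subsumes ASCII, extensible, supported by Java
16 bits means 65,536 characters (0-65535); Unicode has since grown beyond 16 bits, and Java stores the additional characters as pairs of 16-bit chars
Handles many languages, not just the roman alphabet, plus symbols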
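How Java handles Unicode can be sketched with a few standard calls. In current Java a char is a 16-bit UTF-16 code unit, so a character numbered above 65535 occupies two chars. The class name UnicodeDemo and the method countCharacters are illustrative, and the sample characters (U+00E9 and U+1F600) are chosen here purely as examples.

```java
// Explores the relationship between Java's 16-bit char and Unicode.
class UnicodeDemo {
    // Counts Unicode characters (code points), not 16-bit chars.
    static int countCharacters(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println("char width: " + Character.SIZE + " bits");     // 16
        System.out.println("max char:   " + (int) Character.MAX_VALUE);    // 65535

        // ASCII codes are unchanged in Unicode: 'A' is still 65.
        System.out.println("'A' = " + (int) 'A');

        // U+00E9 (e with acute accent) fits in a single 16-bit char.
        String accented = "\u00E9";
        System.out.println(accented.length() + " char, code " + accented.codePointAt(0)); // 1 char, code 233

        // U+1F600 lies beyond 16 bits, so Java stores it as two chars.
        String beyond16 = "\uD83D\uDE00";
        System.out.println(beyond16.length() + " chars, "
                + countCharacters(beyond16) + " character, code "
                + beyond16.codePointAt(0)); // 2 chars, 1 character, code 128512
    }
}
```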