Published by Irma Atkins. Modified over 9 years ago.
1
LING 408/508: Programming for Linguists
Lecture 2, August 26th
2
Today's Topics
– continuing on from last time …
– Homework 1
3
Administrivia
No class on:
– Monday September 7th (Labor Day)
– Wednesday November 11th (Veterans Day)
– Week after September 11th (out of town), plus Monday 21st
– Monday October 12th
4
Introduction: data types
What if you want to store numbers larger than 32 bits allow?
– Binary Coded Decimal (BCD)
– 1 byte can code two digits (each digit 0-9 requires 4 bits)
– 1 nibble (4 bits) codes the sign (+/-), e.g. hex C/D
Digit nibbles (bit weights 2^3 2^2 2^1 2^0):
– 0000 = 0
– 0001 = 1
– 1001 = 9
Sign nibbles:
– 1100 = C, credit (+)
– 1101 = D, debit (-)
Examples:
– 2014: 2 bytes (= 4 nibbles)
– +2014: 2.5 bytes (= 5 nibbles)
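The BCD scheme above can be sketched in a few lines of Python. This is only an illustration: the function name `to_packed_bcd` is made up here, and padding with a leading zero nibble to reach a whole number of bytes is an assumption (the slide counts +2014 as 2.5 bytes without saying how the odd nibble is stored).

```python
def to_packed_bcd(n: int) -> bytes:
    """Pack an integer as BCD: one nibble per decimal digit, then a sign nibble."""
    sign = 0xC if n >= 0 else 0xD            # hex C = credit (+), hex D = debit (-)
    nibbles = [int(d) for d in str(abs(n))] + [sign]
    if len(nibbles) % 2:                     # assumption: pad to a whole byte
        nibbles.insert(0, 0)
    # Combine pairs of nibbles into bytes (high nibble first).
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


print(to_packed_bcd(2014).hex())    # 02014c
print(to_packed_bcd(-2014).hex())   # 02014d
```

Each hex digit of the output is one nibble, so the decimal digits can be read straight off the encoding.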
5
Introduction: data types
Typically, 64 bits (8 bytes) are used to represent floating point numbers (double precision)
– c = 2.99792458 x 10^8 (m/s)
– coefficient: 52 bits (implied leading 1, therefore treat as 53)
– exponent: 11 bits (not 2's complement; unsigned with bias 2^(11-1) - 1 = 1023)
– sign: 1 bit (+/-)
C types: float, double (diagram: Wikipedia)
x86 CPUs have a built-in floating point coprocessor (x87) with 80-bit registers, useful e.g. for probabilities
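To see the three fields of a double, one can reinterpret its 8 bytes as an integer and mask out the parts. The sketch below uses Python's standard `struct` module; the helper name `double_fields` is hypothetical.

```python
import struct


def double_fields(x: float):
    """Return (sign, biased exponent, fraction) of a 64-bit double."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits, stored with bias 1023
    fraction = bits & ((1 << 52) - 1)     # 52 explicitly stored bits
    return sign, exponent, fraction


print(double_fields(1.0))     # (0, 1023, 0)
print(double_fields(-2.0))    # (1, 1024, 0)

sign, exponent, fraction = double_fields(2.99792458e8)
print(exponent - 1023)        # 28, since 2^28 <= c < 2^29
```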
6
Introduction: data types Next time, we'll talk about the representation of characters (letters, symbols, etc.)
7
Example 1
Recall the speed of light: c = 2.99792458 x 10^8 (m/s)
1. Can a 4 byte integer be used to represent c exactly?
– 4 bytes = 32 bits
– 32 bits in 2's complement format
– Largest positive number is 2^31 - 1 = 2,147,483,647
– c = 299,792,458, which is smaller, so yes
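The comparison in Example 1 can be checked directly. The sketch below also packs c into exactly 4 bytes with the standard `struct` module (format `"<i"` is a little-endian signed 32-bit integer) and reads it back.

```python
import struct

c = 299_792_458
INT32_MAX = 2**31 - 1

print(INT32_MAX)           # 2147483647
print(c <= INT32_MAX)      # True

packed = struct.pack("<i", c)             # 32-bit two's complement, 4 bytes
print(len(packed))                        # 4
print(struct.unpack("<i", packed)[0])     # 299792458
```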
8
Example 2
Recall the speed of light: c = 2.99792458 x 10^8 (m/s)
2. How much memory would you need to encode c using BCD notation?
– 9 digits
– each digit requires 4 bits (a nibble)
– BCD notation includes a sign nibble
– total is 10 nibbles = 5 bytes
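The nibble count in Example 2 generalizes to any integer. A small sketch, with a hypothetical helper name and assuming sizes are rounded up to whole bytes:

```python
def bcd_size_bytes(n: int) -> int:
    """Bytes needed for n in BCD: one nibble per digit plus one sign nibble."""
    nibbles = len(str(abs(n))) + 1
    return (nibbles + 1) // 2      # two nibbles per byte, rounded up


print(bcd_size_bytes(299_792_458))   # 5
print(bcd_size_bytes(2014))          # 3 (2.5 bytes rounded up)
```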
9
Example 3
Recall the speed of light: c = 2.99792458 x 10^8 (m/s)
3. Can the 64 bit floating point representation (double) encode c without loss of precision?
– Recall significand precision: 53 bits (52 explicitly stored)
– 2^53 - 1 = 9,007,199,254,740,991
– almost 16 digits, and c has only 9, so yes
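Python's built-in `float` is a 64-bit double on ordinary platforms, so the claim in Example 3 can be tested by converting c to a float and back. The second check shows where exactness stops, just past 2^53.

```python
c = 299_792_458

print(2**53 - 1)                  # 9007199254740991
print(int(float(c)) == c)         # True: c survives the round trip

too_big = 2**53 + 1               # needs 54 significant bits
print(float(too_big) == too_big)  # False: rounded to a neighbouring value
```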
10
Example 4
Recall the speed of light: c = 2.99792458 x 10^8 (m/s)
The 32 bit floating point representation (float), sometimes called single precision, is composed of 1 bit sign, 8 bits exponent (unsigned with bias 2^(8-1) - 1 = 127), and 23 bits coefficient (24 bits effective). Can it represent c without loss of precision?
– 2^24 - 1 = 16,777,215
– Nope
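Python has no built-in 32-bit float type, but the standard `struct` module can round a value to single precision by packing it with format `"<f"` and unpacking it again. The helper name `to_float32` is made up for this sketch.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


c = 299_792_458
print(to_float32(float(c)))           # 299792448.0, off by 10
print(to_float32(16_777_215.0))       # 16777215.0, still exact (2^24 - 1)
print(to_float32(16_777_217.0))       # 16777216.0, no longer exact
```

Between 2^28 and 2^29 neighbouring 32-bit floats are 32 apart, which is why c lands on a multiple of 32.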
11
Homework 1
For both questions, show your work, i.e. how you derived your answer
Pi (π) is an irrational number: it can't be represented precisely! (Wikipedia)
12
Homework 1
1. Encode Pi as accurately as possible using both the 64 and 32 bit floating point representations
Instruction: draw the diagram and fill in the 1's and 0's
2. How many decimal places of precision are provided by each of the 64 and 32 bit floating point representations?
13
Homework 1 Hints
How to encode 1: (exp: 01111 = bias 01111 + 0 = 2^0, frac: 1000…) = 1.000… in binary (remember: there is an implicit leading 1)
14
Homework 1 Hints
How to encode 2: (exp: 10000 = bias 01111 + 1 = 2^1, frac: 1000…) = 10.00… in binary
15
Homework 1 Hints
How to encode 3: (exp: 10000 = bias 01111 + 1 = 2^1, frac: 1100…) = 11.000… in binary
16
Homework 1 Hints
How to encode 4: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1000…) = 100.0… in binary
17
Homework 1 Hints
How to encode 5: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1010…) = 101.0… in binary
18
Homework 1 Hints
How to encode 6: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1100…) = 110.0… in binary
19
Homework 1 Hints
How to encode 7: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1110…) = 111.0… in binary
20
Homework 1 Hints
How to encode 8: (exp: 10010 = bias 01111 + 11 = 2^3, frac: 1000…) = 1000.0… in binary
21
Homework 1 Hints
Decimal 3.5 is 1.11 x 2^1 = 11.1 in binary
22
Homework 1 Hints
Decimal 3.25 is 1.101 x 2^1 = 11.01 in binary
23
Homework 1 Hints
Decimal 3.125 is 1.1001 x 2^1 = 11.001 in binary
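The hint patterns can be checked mechanically. The sketch below assumes the five-bit exponents in the hints correspond to the IEEE half-precision layout (1 sign bit, 5 exponent bits with bias 01111, 10 fraction bits), which Python's `struct` module exposes as format `"e"`. The helper name `half_fields` is hypothetical. Note that the stored fraction omits the implicit leading 1, so 3 = 1.100… x 2^1 is stored with fraction bits 1000000000.

```python
import struct


def half_fields(x: float):
    """Return (sign, exponent bits, stored fraction bits) of a 16-bit float."""
    bits = struct.unpack(">H", struct.pack(">e", x))[0]
    sign = bits >> 15
    exponent = (bits >> 10) & 0x1F     # 5 bits, bias 01111 (15)
    fraction = bits & 0x3FF            # 10 bits, leading 1 not stored
    return sign, format(exponent, "05b"), format(fraction, "010b")


for value in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 3.5, 3.25, 3.125):
    print(value, *half_fields(value))
```

The same masking approach works for the 32 and 64 bit formats by changing the pack format and the field widths.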
24
Homework 1
Due Friday night (by midnight in my email inbox)
Required format (for all homeworks unless otherwise specified):
– Plain text or PDF formats only (no .doc, .docx etc.)
– Single file only: cut and paste into one document (no multiple attachments)
– Subject line: 408/508 Homework 1
– First line: your full name
25
Introduction: data types
How about letters, punctuation, etc.?
ASCII: American Standard Code for Information Interchange
– Based on the English alphabet (upper and lower case) + space + digits + punctuation + control characters (Teletype Model 33)
– Question: how many bits do we need?
– 7 bits + 1 bit parity
– Remember everything is in binary …
C type: char
(image: Teletype Model 33 ASR Teleprinter, Wikipedia)
26
Introduction: data types
– Order is important in sorting!
– 0-9: there's a connection with BCD. Notice: codes 30 (hex) through 39 (hex)
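Both observations are easy to confirm in Python, where `ord` returns a character's code. The low nibble of each ASCII digit is exactly its BCD nibble, and sorting strings compares codes, so digits come before upper case, which comes before lower case.

```python
for ch in "0123456789":
    code = ord(ch)
    # High nibble is always 3; low nibble is the digit's BCD value.
    print(ch, hex(code), format(code & 0x0F, "04b"))

print(hex(ord("0")), hex(ord("9")))     # 0x30 0x39
print(sorted(["b", "B", "a", "1"]))     # ['1', 'B', 'a', 'b']
```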
27
Introduction: data types
Parity bit:
– transmission can be noisy
– a parity bit can be added to the ASCII code
– can spot single-bit transmission errors
– even/odd parity: receiver expects each byte to contain an even/odd number of 1 bits
– Example: 0 (zero) is ASCII 30 (hex) = 0110000; with even parity: 01100000, with odd parity: 01100001
– Checking parity: exclusive or (XOR), a basic machine instruction
– A xor B is true if either A or B is true, but not both
– Example (even parity, 01100000), xor bit by bit: 0 xor 1 = 1, 1 xor 1 = 0, 0 xor 0 = 0, 0 xor 0 = 0, 0 xor 0 = 0, 0 xor 0 = 0, 0 xor 0 = 0; a final result of 0 means the number of 1 bits is even
x86 assembly language:
1. PF: even parity flag, set by arithmetic ops
2. TEST: AND (don't store result), sets PF
3. JP: jump if PF set
Example: MOV al, … ; TEST al, al ; JP …
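The XOR check can be written as a fold over the bits. In this sketch the function names are hypothetical, and appending the parity bit after the seven data bits is an assumption about bit order made for illustration.

```python
from functools import reduce
from operator import xor


def parity(bits: str) -> int:
    """XOR all bits together: 0 if the number of 1s is even, 1 if odd."""
    return reduce(xor, (int(b) for b in bits), 0)


def add_parity(data: str, even: bool = True) -> str:
    """Append a parity bit so the total number of 1s is even (or odd)."""
    p = parity(data)
    return data + str(p if even else 1 - p)


code = format(ord("0"), "07b")
print(code)                              # 0110000
print(add_parity(code, even=True))       # 01100000
print(add_parity(code, even=False))      # 01100001

print(parity("01100000"))                # 0: passes the even parity check
print(parity("01100100"))                # 1: one flipped bit is detected
```

Flipping two bits would leave the parity unchanged, which is why a single parity bit only catches single-bit (more generally, odd-count) errors.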
28
Introduction: data types
UTF-8
– standard in the post-ASCII world
– backwards compatible with ASCII
– (previously, different languages had multi-byte character sets that clashed)
– Universal Character Set (UCS) Transformation Format, 8-bit (Wikipedia)
29
Introduction: data types
Example:
– あ Hiragana letter A: UTF-8: E38182
– Byte 1: E = 1110, 3 = 0011
– Byte 2: 8 = 1000, 1 = 0001
– Byte 3: 8 = 1000, 2 = 0010
– い Hiragana letter I: UTF-8: E38184
Shift-JIS (hex):
– あ: 82A0
– い: 82A2
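These byte values can be reproduced with Python's `str.encode`, using the standard codec names `utf-8` and `shift_jis`. The helper `encoded_hex` is a made-up name for this sketch.

```python
def encoded_hex(ch: str, encoding: str) -> str:
    """Encode a character and return its bytes as upper-case hex."""
    return ch.encode(encoding).hex().upper()


for ch in "あい":
    utf8_bits = " ".join(format(b, "08b") for b in ch.encode("utf-8"))
    print(ch, encoded_hex(ch, "utf-8"), utf8_bits, encoded_hex(ch, "shift_jis"))
# あ E38182 11100011 10000001 10000010 82A0
# い E38184 11100011 10000001 10000100 82A2

# ASCII characters keep their one-byte codes in UTF-8.
print(encoded_hex("0", "utf-8"))    # 30
```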
30
Introduction: data types
How can you tell what encoding your file is using?
Detecting UTF-8
– Microsoft: the first three bytes in the file are EF BB BF (not all software understands this; not everybody uses it)
– HTML: a charset declaration in the page (not always present)
– Analyze the file: look for invalid UTF-8 sequences; if any are found, the file is not UTF-8…
Interesting paper:
– http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html
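Two of these checks, the EF BB BF signature and the search for invalid sequences, can be sketched with the standard library alone. The function name and its return strings are invented for illustration, and this is far cruder than the statistical approach in the Mozilla paper: a file that decodes cleanly is only probably UTF-8.

```python
import codecs


def guess_utf8(data: bytes) -> str:
    """Very rough UTF-8 check: signature first, then strict decoding."""
    if data.startswith(codecs.BOM_UTF8):      # the bytes EF BB BF
        return "UTF-8 with signature"
    try:
        data.decode("utf-8")                  # raises on any invalid sequence
    except UnicodeDecodeError:
        return "not UTF-8"
    return "valid UTF-8 (or plain ASCII)"


print(guess_utf8(b"\xef\xbb\xbfhello"))       # UTF-8 with signature
print(guess_utf8("あ".encode("utf-8")))        # valid UTF-8 (or plain ASCII)
print(guess_utf8("あ".encode("shift_jis")))    # not UTF-8
```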
31
Introduction: data types
Filesystem:
– different on different computers: sometimes a problem if you mount filesystems across different systems
Examples:
– FAT32 (File Allocation Table): DOS, Windows, memory cards; limited to 4GB max file size
– ExFAT (Extended FAT): SD cards (> 4GB files)
– NTFS (New Technology File System): Windows
– ext4 (Fourth Extended Filesystem): Linux
– HFS+ (Hierarchical File System Plus): Macs
32
Introduction: data types
Filesystem:
– different on different computers: sometimes a problem if you mount filesystems across different systems
Files:
– Name (path from / root)
– Type (e.g. .docx, .pptx, .pdf, .html, .txt)
– Owner (usually the creator)
– Permissions (for the Owner, Group, or Everyone)
– need to be opened (to read from or write to)
– Mode: read/write/append
– Binary/Text
In all programming languages: open command
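As one concrete instance of the open command, Python's built-in `open` takes the mode as a string: `"w"` write, `"a"` append, `"r"` read, with a `"b"` suffix for binary. The sketch writes to a temporary directory; the file name `demo.txt` is arbitrary, and `newline="\n"` is passed so the bytes on disk are the same on every platform.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8", newline="\n") as f:   # write (text)
    f.write("あ\n")
with open(path, "a", encoding="utf-8", newline="\n") as f:   # append (text)
    f.write("い\n")

with open(path, "r", encoding="utf-8") as f:                 # read as text
    text = f.read()
with open(path, "rb") as f:                                  # read as binary
    raw = f.read()

print(repr(text))    # 'あ\nい\n'
print(raw.hex())     # e381820ae381840a
print(len(text), len(raw))   # 4 characters, 8 bytes
```

Text mode decodes bytes into characters using the given encoding; binary mode returns the stored bytes untouched.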
33
Introduction: data types
Text files:
– text files have lines: how do we mark the end of a line?
– End of line (EOL) control character(s): LF 0x0A (Mac/Linux), CR 0x0D (old Macs), CR+LF 0x0D 0x0A (Windows)
– End of file (EOF) control character: (EOT) 0x04 (aka Control-D)
(image: binaryvision.nl)
Programming languages: NUL (0x00) is used to mark the end of a string
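Reading a file in binary mode makes the end-of-line convention visible. The sketch below is a simple heuristic with an invented function name; it checks for CR+LF first because that sequence contains both of the other markers.

```python
def detect_eol(data: bytes) -> str:
    """Guess the end-of-line convention used in a chunk of file data."""
    if b"\r\n" in data:
        return "CR+LF (Windows)"
    if b"\r" in data:
        return "CR (old Macs)"
    if b"\n" in data:
        return "LF (Mac/Linux)"
    return "no line ending found"


print(detect_eol(b"one\r\ntwo\r\n"))    # CR+LF (Windows)
print(detect_eol(b"one\ntwo\n"))        # LF (Mac/Linux)
print(detect_eol(b"one\rtwo\r"))        # CR (old Macs)

# str.splitlines() accepts all three conventions.
print("one\r\ntwo\nthree\rfour".splitlines())   # ['one', 'two', 'three', 'four']
```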