LING 388: Computers and Language

Slides:



Advertisements
Similar presentations
ICS312 Set 2 Representation of Numbers and Characters.
Advertisements

The Binary Numbering Systems
Digital Fundamentals Floyd Chapter 2 Tenth Edition
1.6 Signed Binary Numbers.
LING 408/508: Programming for Linguists Lecture 2 August 28 th.
© 2009 Pearson Education, Upper Saddle River, NJ All Rights ReservedFloyd, Digital Fundamentals, 10 th ed Digital Fundamentals Tenth Edition Floyd.
Computers Organization & Assembly Language
Lecture 11: Machine Processing Intro to IT COSC1078 Introduction to Information Technology Lecture 11 Machine Processing James Harland
IT253: Computer Organization
Lec 3: Data Representation Computer Organization & Assembly Language Programming.
ECEN2102 Digital Logic Design Lecture 1 Numbers Systems Abdullah Said Alkalbani University of Buraimi.
Binary, Decimal and Hexadecimal Numbers Svetlin Nakov Telerik Corporation
Digital Logic Design Lecture 3 Complements, Number Codes and Registers.
ICS312 Set 1 Representation of Numbers and Characters.
Data Representation Dr. Ahmed El-Bialy Dr. Sahar Fawzy.
LING 408/508: Programming for Linguists Lecture 2 August 26 th.
1 Information Representation in Computer Lecture Nine.
Monday, January 14 Homework #1 is posted on the website Homework #1 is posted on the website Due before class, Jan. 16 Due before class, Jan. 16.
CS 2130 Lecture 23 Data Types.
Chapter 1: Binary Systems
Data Encoding COSC Computers and Data Computers store information as sequences of bits Computers store many types of data: numbers text audio images.
Tokens in C  Keywords  These are reserved words of the C language. For example int, float, if, else, for, while etc.  Identifiers  An Identifier is.
Chapter 1 Representing Data in a Computer. 1.1 Binary and Hexadecimal Numbers.
CS 125 Lecture 3 Martin van Bommel. Overflow In 16-bit two’s complement, what happens if we add =
Number Systems. The position of each digit in a weighted number system is assigned a weight based on the base or radix of the system. The radix of decimal.
1 CE 454 Computer Architecture Lecture 4 Ahmed Ezzat The Digital Logic, Ch-3.1.
Data Encoding COSC 1301.
Compsci 210 Tutorial Two CompSci Semester Two 2016.
Bits, Data Types, and Operations
Chapter 2 Bits, Data Types, and Operations
Programming and Data Structure
Lec 3: Data Representation
Dr. Clincy Professor of CS
Lecture No. 4 Number Systems
Number Representation
Data Representation Binary Numbers Binary Addition
Standard Data Encoding
CSCI 198: Lecture 4: Data Representation
ECE Application Programming
Microprocessor Systems Design I
Chapter 3 Data Storage.
Numbers in a Computer Unsigned integers Signed magnitude
CSCI 161: Lecture 4: Data Representation
Microprocessor Systems Design I
Binary, Decimal and Hexadecimal Numbers
Even/odd parity (1) Computers can sometimes make errors when they transmit data. Even/odd parity: is basic method for detecting if an odd number of bits.
CS1010 Programming Methodology
Data Structures Mohammed Thajeel To the second year students
What to bring: iCard, pens/pencils (They provide the scratch paper)
BEE1244 Digital System and Electronics BEE1244 Digital System and Electronic Chapter 2 Number Systems.
LING 388: Computers and Language
Data Representation Data Types Complements Fixed Point Representation
LING 408/508: Programming for Linguists
LING 408/508: Computational Techniques for Linguists
Representing Information
Digital Logic & Design Lecture 03.
Dr. Clincy Professor of CS
COMS 161 Introduction to Computing
Bits and Bytes Topics Representing information as bits
COMS 161 Introduction to Computing
COMS 161 Introduction to Computing
Storing Negative Integers
Chapter 3 DataStorage Foundations of Computer Science ã Cengage Learning.
Homework Homework Continue Reading K&R Chapter 2 Questions?
LING 388: Computers and Language
Comp Org & Assembly Lang
COMS 161 Introduction to Computing
LING 388: Computers and Language
ECE 120 Midterm 1 HKN Review Session.
Presentation transcript:

LING 388: Computers and Language Lecture 5

Administrivia Homework 3 graded Quick Homework 4 out today I'll be away next two weeks (my apologies) Colton Flowers, a HLT student, will take you through Python exercises for the next four lectures Expect to do hands-on work

Today's Topics Continue with Underneath Python Binary and hexadecimal (base 16) Integer representation Floating point numbers (Homework 4) Character sets

Introduction: data types Integers (32 bits; 4 bytes; 1 word) -2,147,483,648 (231-1 -1) to 2,147,483,647 (232-1 -1) using the 2’s complement representation Why? super-convenient for arithmetic operations “to convert a positive integer X to its negative counterpart, flip all the bits, and add 1” Example: 00001010 = 23 + 21 = 10 (decimal) 11110101 + 1 = 11110110 = -10 (decimal) 11110110 flip + 1 = 00001001 + 1 = 00001010 Addition: -10 + 10 = 11110110 + 00001010 = 0 (ignore overflow)

Introduction: data types Typically 32 bits (4 bytes) are used to store an integer range: -2,147,483,648 (231-1 -1) to 2,147,483,647 (232-1 -1) what if you want to store even larger numbers? Binary Coded Decimal (BCD) code each decimal digit separately, use a string (sequence) of decimal digits … 231 230 229 228 227 226 225 224 … 27 26 25 24 23 22 21 20 byte 3 byte 2 byte 1 byte 0 C: int

Introduction: data types what if you want to store even larger numbers? Binary Coded Decimal (BCD) 1 byte can code two digits (0-9 requires 4 bits) 1 nibble (4 bits) codes the sign (+/-), e.g. hex C/D 23 22 21 20 2 1 4 2 bytes (= 4 nibbles) 23 22 21 20 1 1 + 2 1 4 2.5 bytes (= 5 nibbles) 23 22 21 20 1 9 credit (+) debit (-) 23 22 21 20 1 23 22 21 20 1 C D

Introduction: data types Typically, 64 bits (8 bytes) are used to represent floating point numbers (double precision) c = 2.99792458 x 108 (m/s) (Speed of light) c = 299,792,458 coefficient: 52 bits (implied 1, therefore treat as 53) exponent: 11 bits (not 2’s complement, unsigned with bias) sign: 1 bit (+/-) C: float double x86 CPUs have a built-in floating point coprocessor (x87) 80 bit long registers wikipedia

Example 1 Recall the speed of light (from last side): c = 2.99792458 x 108 (m/s) Can a 4 byte integer be used to represent c exactly? 4 bytes = 32 bits 32 bits in 2’s complement format Largest positive number is 231-1 = 2,147,483,647 c = 299,792,458 (in integer notation)

Example 2 Recall again the speed of light: c = 2.99792458 x 108 (m/s) How much memory would you need to encode c using BCD notation? 9 digits each digit requires 4 bits (a nibble) BCD notation includes a sign nibble total is 5 bytes

Example 3 Recall the speed of light: c = 2.99792458 x 108 (m/s) Can the 64 bit floating point representation (double) encode c without loss of precision? Recall significand precision: 53 bits (52 explicitly stored) 253-1 = 9,007,199,254,740,991 almost 16 digits

Example 4 Recall the speed of light: c = 2.99792458 x 108 (m/s) The 32 bit floating point representation (float) – sometimes called single precision - is composed of 1 bit sign, 8 bits exponent (unsigned with bias 2(8-1)-1), and 23 bits coefficient (24 bits effective). Can it represent c without loss of precision? 224-1 = 16,777,215 Nope

hw4.xls

Quick Homework 4: Representing π Using the supplied spreadsheet, compute the best 32 bit floating point approximation to π that you can come up with. Submit a snapshot of the spreadsheet to me by email anytime next week Reminder: one PDF file Subject: 388 Your Name Homework 4

Introduction: data types char How about letters, punctuation, etc.? ASCII American Standard Code for Information Interchange Based on English alphabet (upper and lower case) + space + digits + punctuation + control (Teletype Model 33) Question: how many bits do we need? 7 bits + 1 bit parity Remember everything is in binary … Teletype Model 33 ASR Teleprinter (Wikipedia)

Introduction: data types order is important in sorting! 0-9: there’s a connection with BCD. Notice: code 30 (hex) through 39 (hex)

Introduction: data types x86 assemby language: PF: even parity flag set by arithmetic ops. TEST: AND (don’t store result), sets PF JP: jump if PF set Example: MOV al,<char> TEST al, al JP <location if even> <go here if odd> Parity bit: transmission can be noisy parity bit can be added to ASCII code can spot single bit transmission errors even/odd parity: receiver understands each byte should be even/odd Example: 0 (zero) is ASCII 30 (hex) = 011000 even parity: 0110000, odd parity: 0110001 Checking parity: Exclusive or (XOR): basic machine instruction A xor B true if either A or B true but not both (even parity 0) 0110000 xor bit by bit 0 xor 1 = 1 xor 1 = 0 xor 0 = 0 xor 0 = 0 xor 0 = 0 xor 0 = 0 xor 0 = 0

Introduction: data types UTF-8 standard in the post-ASCII world backwards compatible with ASCII (previously, different languages had multi-byte character sets that clashed) Universal Character Set (UCS) Transformation Format 8-bits (Wikipedia)

Introduction: data types Example: あ Hiragana letter A: UTF-8: E38182 Byte 1: E = 1110, 3 = 0011 Byte 2: 8 = 1000, 1 = 0001 Byte 3: 8 = 1000, 2 = 0010 い Hiragana letter I: UTF-8: E38184 Shift-JIS (Hex): あ: 82A0 い: 82A2

Introduction: data types How can you tell what encoding your file is using? Detecting UTF-8 Microsoft: 1st three bytes in the file is EF BB BF (not all software understands this; not everybody uses it) HTML: <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" > (not always present) Analyze the file: Find non-valid UTF-8 sequences: if found, not UTF-8… Interesting paper: http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html

Introduction: data types Text files: text files have lines: how do we mark the end of a line? End of line (EOL) control character(s): LF 0x0A (Mac/Linux), CR 0x0D (Old Macs), CR+LF 0x0D0A (Windows) End of file (EOF) control character: (EOT) 0x04 (aka Control-D) programming languages: NUL used to mark the end of a string binaryvision.nl