Introduction to Computing Using Python Chapter 6  Encoding of String Characters  Randomness and Random Sampling.

Slides:



Advertisements
Similar presentations
Chapter 3 Data Representation.
Advertisements

Representing Information as Bit Patterns Lecture 4 CSCI 1405, CSCI 1301 Introduction to Computer Science Fall 2009.
Chapter 3 Data Representation. Chapter goals Describe numbering systems and their use in data representation Compare and contrast various data representation.
Chapter 8_2 Bits and the "Why" of Bytes: Representing Information Digitally.
מבנה מחשב תרגול 2 ייצוג תווים בחומרה. A programmer that doesn’t care about characters encoding in not much better than a medical doctor who doesn’t believe.
Introduction to Computers and Programming. Some definitions Algorithm: Algorithm: A procedure for solving a problem A procedure for solving a problem.
Data Representation (in computer system) Computer Fundamental CIM2460 Bavy LI.
1 A Balanced Introduction to Computer Science, 2/E David Reed, Creighton University ©2008 Pearson Prentice Hall ISBN Chapter 12 Data.
CIS 234: Character Codes Dr. Ralph D. Westfall April, 2011.
CHARACTERS Data Representation. Using binary to represent characters Computers can only process binary numbers (1’s and 0’s) so a system was developed.
Introduction to Computing Using Python More Built-in Container Classes  Container Class dict  Encoding of String Characters  Randomness and Random Sampling.
Dale & Lewis Chapter 3 Data Representation
Introduction to Human Language Technologies Tomaž Erjavec Karl-Franzens-Universität Graz Tomaž Erjavec Lecture: Character sets
Introduction to Computing Using Python More Built-in Container Classes  Container Class dict  Container Class set  Container Class tuple  Encoding.
©Brooks/Cole, 2003 Chapter 2 Data Representation.
Chapter 2 Data Representation. Define data types. Visualize how data are stored inside a computer. Understand the differences between text, numbers, images,
ASCII and Unicode.
Chapter 3 Representing Numbers and Text in Binary Information Technology in Theory By Pelin Aksoy and Laura DeNardis.
Representing text Each of different symbol on the text (alphabet letter) is assigned a unique bit patterns the text is then representing as.
Computer System Basics 1 Number Systems & Text Representation Computer Forensics BACS 371.
The Data Element. 2 Data type: A description of the set of values and the basic set of operations that can be applied to values of the type. Strong typing:
Chapter 1 Data Storage(2) Yonsei University 1 st Semester, 2014 Sanghyun Park.
General Computer Science for Engineers CISC 106 Lecture 02 Dr. John Cavazos Computer and Information Sciences 09/03/2010.
Computer Math CPS120: Data Representation. Representing Data The computer knows the type of data stored in a particular location from the context in which.
Data Representation and Storage Lecture 5. Representations A number value can be represented in many ways: 5 Five V IIIII Cinq Hold up my hand.
Fill in the blanks: (1) _________ has only two possible values 0 and 1. (2) There are __________bits in a byte. (3) 1 kilobyte of memory space can store.
Data & Data Types & Simple Math Operation 1 Data and Data Type Standard I/O Simple Math operation.
Computer System Basics 1 Number Systems & Text Representation Computer Forensics BACS 371.
Globalisation & Computer systems Week 5/6 Character representation ACII and code pages UNICODE.
CISC1100: Binary Numbers Fall 2014, Dr. Zhang 1. Numeral System 2  A way for expressing numbers, using symbols in a consistent manner.  " 11 " can be.
Representing Characters in a computer Pressing a key on the computer a code is generated that the computer can convert into a symbol for displaying or.
© Copyright 2012 by Pearson Education, Inc. All Rights Reserved. 1 Chapter 3 Mathematical Functions, Strings, and Objects.
DATA REPRESENTATION CHAPTER DATA TYPES Different types of data (Fig. 2.1) The computer industry uses the term “MULTIMEDIA” to define information.
The Information School of the University of Washington Oct 13fit digital1 Digital Representation INFO/CSE 100, Fall 2006 Fluency in Information Technology.
Data Representation. What is data? Data is information that has been translated into a form that is more convenient to process As information take different.
The Information School of the University of Washington 15-Oct-2004cse digital1 Digital Representation INFO/CSE 100, Spring 2005 Fluency in Information.
Data Encoding COSC Computers and Data Computers store information as sequences of bits Computers store many types of data: numbers text audio images.
Data Representation. How is data stored on a computer? Registers, main memory, etc. consists of grids of transistors Transistors are in one of two states,
Characters CS240.
Types Chapter 2. C++ An Introduction to Computing, 3rd ed. 2 Objectives Observe types provided by C++ Literals of these types Explain syntax rules for.
Characters and Strings
Representing Characters in a Computer System Representation of Data in Computer Systems.
Information Coding Schemes Group Member : Yvonne Tiffany Jurifah bt Junaidi Clara Jane George.
Strings CSE 1310 – Introduction to Computers and Programming Alexandra Stefan University of Texas at Arlington 1.
1.4 Representation of data in computer systems Character.
Lecture Coding Schemes. Representing Data English language uses 26 symbols to represent an idea Different sets of bit patterns have been designed to represent.
Nat 4/5 Computing Science Data Representation Lesson 3: Storing Text
Binary Representation in Text
Binary Representation in Text
Chapter 8 & 11: Representing Information Digitally
Chapter 3 Data Storage.
Representing Information as bit patterns
Data Encoding Characters.
TOPICS Information Representation Characters and Images
Representing Characters
Ch2: Data Representation
String Encodings and Penny Math
Presenting information as bit patterns
COMS 161 Introduction to Computing
Chapter 2 Data Representation.
Plan Attendance Files Posted on Campus Cruiser Homework Reminder
How Computers Store Data
Chapter 3 DataStorage Foundations of Computer Science ã Cengage Learning.
Data Representation Chapter 2 Computer HW (Von Neumann Model) Program
functions: argument, return value
String Encodings and Penny Math
C Programming Language
Lecture 36 – Unit 6 – Under the Hood Binary Encoding – Part 2
Digital Representation of Data
Presentation transcript:

Introduction to Computing Using Python Chapter 6  Encoding of String Characters  Randomness and Random Sampling

Introduction to Computing Using Python Character encodings A string ( str ) object contains an ordered sequence of characters which can be any of the following: lowercase and uppercase letters in the English alphabet: a b c … z and A B C … Z decimal digits: punctuation:,. : ; ‘ “ ! ? etc. Mathematical operators and common symbols: = + - / * $ # & etc. More later Each character is mapped to a specific bit encoding, and this encoding maps back to the character. For many years, the standard encoding for characters in the English language was the American Standard Code for Information Interchange (ASCII)

Introduction to Computing Using Python ASCII For many years, the standard encoding for characters in the English language was the American Standard Code for Information Interchange (ASCII) The code for a is 97, which is in binary or 0x61 in hexadecimal notation The encoding for each ASCII character fits in 1 byte (8 bits)

Introduction to Computing Using Python Built-in functions ord() and char() >>> ord('a') 97 >>> ord('?') 63 >>> ord('\n') 10 >>> >>> ord('a') 97 >>> ord('?') 63 >>> ord('\n') 10 >>> Function ord() takes a character (i.e., a string of length 1) as input and returns its ASCII code Function chr() takes an ASCII encoding (i.e., a non-negative integer) and returns the corresponding character >>> ord('a') 97 >>> ord('?') 63 >>> ord('\n') 10 >>> chr(10) '\n' >>> chr(63) '?' >>> chr(97) 'a' >>> >>> ord('a') 97 >>> ord('?') 63 >>> ord('\n') 10 >>> chr(10) '\n' >>> chr(63) '?' >>> chr(97) 'a' >>>

Introduction to Computing Using Python Beyond ASCII A string object contains an ordered sequence of characters which can be any of the following: lowercase and uppercase letters in the English alphabet: a b c … z and A B C … Z decimal digits: punctuation:,. : ; ‘ “ ! ? etc. Mathematical operators and common symbols: = + - / * $ # & etc. Characters from languages other than English Technical symbols from math, science, engineering, etc. There are only 128 characters in the ASCII encoding Unicode has been developed to be the universal character encoding scheme

Introduction to Computing Using Python Unicode In Unicode, every character is represented by an integer code point. The code point is not necessarily the actual byte representation of the character; it is just the identifier for the particular character >>> '\u0061' 'a' >>> >>> '\u0061' 'a' >>> The code point for letter a is the integer with hexadecimal value 0x0061 Unicode conveniently uses a code point for ASCII characters that is equal to their ASCII code With Unicode, we can write strings in english >>> '\u0061' 'a' >>> '\u0064\u0061d' 'dad' >>> >>> '\u0061' 'a' >>> '\u0064\u0061d' 'dad' >>> >>> '\u0061' 'a' >>> '\u0064\u0061d' 'dad' >>> '\u0409\u0443\u0431\u043e\u043c\u0438\u0 440' 'Љубомир' >>> >>> '\u0061' 'a' >>> '\u0064\u0061d' 'dad' >>> '\u0409\u0443\u0431\u043e\u043c\u0438\u0 440' 'Љубомир' >>> >>> '\u0061' 'a' >>> '\u0064\u0061d' 'dad' >>> '\u0409\u0443\u0431\u043e\u043c\u0438\u0 440' 'Љубомир' >>> '\u4e16\u754c\u60a8\u597d!' ' 世界您好 !' >>> >>> '\u0061' 'a' >>> '\u0064\u0061d' 'dad' >>> '\u0409\u0443\u0431\u043e\u043c\u0438\u0 440' 'Љубомир' >>> '\u4e16\u754c\u60a8\u597d!' ' 世界您好 !' >>> With Unicode, we can write strings in english cyrillic With Unicode, we can write strings in english cyrillic chinese … escape sequence \u indicates start of Unicode code point

Introduction to Computing Using Python String comparison, revisited Unicode code points, being integers, give a natural ordering to all the characters representable in Unicode >>> s1 = '\u0021' >>> s1 '!' >>> s2 = '\u0409' >>> s2 'Љ' >>> s1 < s2 True >>> >>> s1 = '\u0021' >>> s1 '!' >>> s2 = '\u0409' >>> s2 'Љ' >>> s1 < s2 True >>> Unicode was designed so that, for any pair of characters from the same alphabet, the one that is earlier in the alphabet will have a smaller Unicode code point.

Introduction to Computing Using Python Unicode Transformation Format (UTF) A Unicode string is a sequence of code points that are numbers from 0 to 0x10FFFF. Unlike ASCII codes, Unicode code points are not what is stored in memory; the rule for translating a Unicode character or code point into a sequence of bytes is called an encoding. There are several Unicode encodings: UTF-8, UTF-16, and UTF-32. UTF stands for Unicode Transformation Format. UTF-8 has become the preferred encoding for and web pages The default encoding when you write Python 3 programs is UTF-8. In UTF-8, every ASCII character has an encoding that is exactly the 8-bit ASCII encoding.

Introduction to Computing Using Python Assigning an encoding to “raw bytes” When a file is downloaded from the web, it does not have an encoding the file could be a picture or an executable program, i.e. not a text file the downloaded file content is a sequence of bytes, i.e. of type bytes >>> content b'This is a text document\nposted on the\nWWW.\n' >>> type(content) >>> >>> content b'This is a text document\nposted on the\nWWW.\n' >>> type(content) >>> >>> content b'This is a text document\nposted on the\nWWW.\n' >>> type(content) >>> s = content.decode('utf-8') >>> type(s) >>> s 'This is a text document\nposted on the\nWWW.\n' >>> >>> content b'This is a text document\nposted on the\nWWW.\n' >>> type(content) >>> s = content.decode('utf-8') >>> type(s) >>> s 'This is a text document\nposted on the\nWWW.\n' >>> The bytes method decode() takes an encoding description as input and returns a string that is obtained by applying the encoding to the sequence of bytes the default is UTF-8 >>> content b'This is a text document\nposted on the\nWWW.\n' >>> type(content) >>> s = content.decode('utf-8') >>> type(s) >>> s 'This is a text document\nposted on the\nWWW.\n' >>> s = content.decode() >>> s 'This is a text document\nposted on the\nWWW.\n' >>> >>> content b'This is a text document\nposted on the\nWWW.\n' >>> type(content) >>> s = content.decode('utf-8') >>> type(s) >>> s 'This is a text document\nposted on the\nWWW.\n' >>> s = content.decode() >>> s 'This is a text document\nposted on the\nWWW.\n' >>> The bytes method decode() takes an encoding description as input and returns a string that is obtained by applying the encoding to the sequence of bytes

Introduction to Computing Using Python Randomness Some apps need numbers generated “at random” (i.e., from some probability distribution): scientific computing financial simulations cryptography computer games Truly random numbers are hard to generate Most often, a pseudorandom number generator is used numbers only appear to be random they are really generated using a deterministic process The Python standard library module random provides a pseudo random number generator as well useful sampling functions

>>> import random >>> random.randrange(1, 7) 2 >>> >>> import random >>> random.randrange(1, 7) 2 >>> Introduction to Computing Using Python Standard Library module random Function randrange() returns a “random” integer number from a given range >>> import random >>> random.randrange(1, 7) 2 >>> random.randrange(1, 7) 1 >>> >>> import random >>> random.randrange(1, 7) 2 >>> random.randrange(1, 7) 1 >>> >>> import random >>> random.randrange(1, 7) 2 >>> random.randrange(1, 7) 1 >>> random.randrange(1, 7) 4 >>> >>> import random >>> random.randrange(1, 7) 2 >>> random.randrange(1, 7) 1 >>> random.randrange(1, 7) 4 >>> >>> import random >>> random.randrange(1, 7) 2 >>> random.randrange(1, 7) 1 >>> random.randrange(1, 7) 4 >>> random.randrange(1, 7) 2 >>> >>> import random >>> random.randrange(1, 7) 2 >>> random.randrange(1, 7) 1 >>> random.randrange(1, 7) 4 >>> random.randrange(1, 7) 2 >>> >>> import random >>> random.randrange(1, 7) 2 >>> random.randrange(1, 7) 1 >>> random.randrange(1, 7) 4 >>> random.randrange(1, 7) 2 >>> random.uniform(0, 1) >>> random.uniform(0, 1) >>> random.uniform(0, 1) >>> >>> import random >>> random.randrange(1, 7) 2 >>> random.randrange(1, 7) 1 >>> random.randrange(1, 7) 4 >>> random.randrange(1, 7) 2 >>> random.uniform(0, 1) >>> random.uniform(0, 1) >>> random.uniform(0, 1) >>> range is from 1 up to (but not including) 7 Function uniform() returns a “random” float number from a given range Example usage: simulate the throws of a die

Introduction to Computing Using Python Standard Library module random >>> names = ['Ann', 'Bob', 'Cal', 'Dee', 'Eve', 'Flo', 'Hal', 'Ike'] >>> import random >>> random.shuffle(names) >>> names ['Hal', 'Dee', 'Bob', 'Ike', 'Cal', 'Eve', 'Flo', 'Ann'] >>> >>> names = ['Ann', 'Bob', 'Cal', 'Dee', 'Eve', 'Flo', 'Hal', 'Ike'] >>> import random >>> random.shuffle(names) >>> names ['Hal', 'Dee', 'Bob', 'Ike', 'Cal', 'Eve', 'Flo', 'Ann'] >>> >>> names = ['Ann', 'Bob', 'Cal', 'Dee', 'Eve', 'Flo', 'Hal', 'Ike'] >>> import random >>> random.shuffle(names) >>> names ['Hal', 'Dee', 'Bob', 'Ike', 'Cal', 'Eve', 'Flo', 'Ann'] >>> random.choice(names) 'Bob' >>> random.choice(names) 'Ann' >>> random.choice(names) 'Cal' >>> random.choice(names) 'Cal' >>> >>> names = ['Ann', 'Bob', 'Cal', 'Dee', 'Eve', 'Flo', 'Hal', 'Ike'] >>> import random >>> random.shuffle(names) >>> names ['Hal', 'Dee', 'Bob', 'Ike', 'Cal', 'Eve', 'Flo', 'Ann'] >>> random.choice(names) 'Bob' >>> random.choice(names) 'Ann' >>> random.choice(names) 'Cal' >>> random.choice(names) 'Cal' >>> >>> names = ['Ann', 'Bob', 'Cal', 'Dee', 'Eve', 'Flo', 'Hal', 'Ike'] >>> import random >>> random.shuffle(names) >>> names ['Hal', 'Dee', 'Bob', 'Ike', 'Cal', 'Eve', 'Flo', 'Ann'] >>> random.choice(names) 'Bob' >>> random.choice(names) 'Ann' >>> random.choice(names) 'Cal' >>> random.choice(names) 'Cal' >>> random.sample(names, 3) ['Ike', 'Hal', 'Bob'] >>> random.sample(names, 3) ['Flo', 'Bob', 'Ike'] >>> random.sample(names, 3) ['Ike', 'Ann', 'Hal'] >>> >>> names = ['Ann', 'Bob', 'Cal', 'Dee', 'Eve', 'Flo', 'Hal', 'Ike'] >>> import random >>> random.shuffle(names) >>> names ['Hal', 'Dee', 'Bob', 'Ike', 'Cal', 'Eve', 'Flo', 'Ann'] >>> random.choice(names) 'Bob' >>> random.choice(names) 'Ann' >>> random.choice(names) 'Cal' >>> random.choice(names) 'Cal' >>> random.sample(names, 3) ['Ike', 'Hal', 'Bob'] >>> random.sample(names, 3) ['Flo', 'Bob', 'Ike'] >>> random.sample(names, 3) ['Ike', 'Ann', 'Hal'] >>> Defined in module random are functions shuffle(), choice(), sample(), …