Lecture 3: Data representation

Presentation transcript:

Lecture 3: Data representation (Information Layer)

Brainteaser

Brainteaser Bi-quinary representation?! What in the world is this? How does it work?

Layers of Computing Systems: Communications, Applications, Operating Systems, Programming, Hardware, Information

Information Layer Numbers, text, audio, images, and video are all stored as binary data such as 01001010101010.

Background In the not-so-distant past, computers dealt almost exclusively with numeric and textual data But now computers are truly multimedia devices. They can store, present, and help us modify many different types of data, including: Numbers Text Audio Images and graphics Video

Background Ultimately, all of this data is stored as binary digits. Each document, picture, and sound bite is somehow represented as strings of 1s and 0s.

Analog and Digital Information The natural world, for the most part, is continuous and infinite. A number line is continuous, with values growing infinitely large and small. Computers, on the other hand, are finite. Computer memory and other hardware devices have only so much room to store and manipulate a certain amount of data.

Analog and Digital Information Information can be represented in one of two ways: analog or digital. Analog data is a continuous representation, analogous to the actual information it represents. Digital data is a discrete representation, breaking the information up into separate elements.

Analog and Digital Information

Analog and Digital Information Computers cannot work well with analog information. So instead, we digitize information by breaking it into pieces and representing those pieces separately. Electronic signals are far easier to maintain if they transfer only binary data. An analog signal continually fluctuates in voltage up and down. A digital signal has only a high or low state, corresponding to the two binary digits.

Binary representations One bit can be either 0 or 1; there are no other possibilities. Therefore, one bit can represent only two things. For example, to classify a food as being either sweet or sour, we would need only 1 bit. What if we need to represent the gear of a car (park, drive, reverse, or neutral)? 2 bits give four different states:
00 - park
01 - drive
10 - reverse
11 - neutral

Bit combinations

Bit combinations How many bits are needed to represent 25 unique states? 5 bits: 2^5 = 32 options (4 bits give only 2^4 = 16). How many bits are needed to represent the “regions” of Uzbekistan, a.k.a. “tumans” or “viloyats”? 13 regions need 4 bits: 2^4 = 16 options (3 bits give only 2^3 = 8).
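The bit counts above can be checked with a short sketch. The function name `bits_needed` is made up for illustration and is not part of the lecture.

```python
def bits_needed(states: int) -> int:
    """Smallest number of bits whose 2**bits combinations cover `states` values."""
    if states < 1:
        raise ValueError("need at least one state")
    # (states - 1).bit_length() is an exact integer form of ceil(log2(states)).
    return max(1, (states - 1).bit_length())


for n in (2, 4, 13, 25):
    b = bits_needed(n)
    print(f"{n} states -> {b} bits ({2 ** b} combinations)")
```

Any spare combinations (for example 16 - 13 = 3 for the regions) are simply left unused.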

Representing numeric data

Representing negative numbers Signed-Magnitude Representation Number representation in which the sign represents the ordering of the number (negative and positive) and the value represents the magnitude There are two representations of zero. There is plus zero and minus zero. Two representations of zero within a computer can cause unnecessary complexity, so other representations of negative numbers are used.

Representing negative numbers With a fixed number of digits, we can store numbers as plain integer codes, where half of the codes represent negative numbers. The sign is determined by the magnitude of the code. E.g., if the maximum number of decimal digits we can represent is two:
codes 1 to 49 = positive numbers 1 to 49
codes 50 to 99 = negative numbers (-50) to (-1)

Addition under different schemes Three cases: add a positive number and a negative number; add a negative number and a positive number; add two negative numbers. When the sum comes out as 102, the carry is discarded, so the result is 2.

Subtraction under different schemes A - B = A + (-B). E.g. a negative number minus a positive number.

Ten’s complement Negative(I) = 10^k - I, where k is the number of digits. E.g. two-digit representation: -(3) = 10^2 - 3 = 97. Three-digit representation: -(3) = 10^3 - 3 = 997.
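A minimal sketch of the ten's complement formula, together with fixed-size addition where the carry is discarded. The helper names and the operands 5 and -3 are illustrative assumptions, not taken from the slides.

```python
def tens_complement(i: int, k: int) -> int:
    """Code that stands for -i in a k-digit ten's complement scheme: 10**k - i."""
    return 10 ** k - i


def add_codes(a: int, b: int, k: int) -> int:
    """Add two k-digit codes and discard any carry out of the top digit."""
    return (a + b) % 10 ** k


def decode(code: int, k: int) -> int:
    """Codes in the upper half of the range stand for negative numbers."""
    return code if code < 10 ** k // 2 else code - 10 ** k


print(tens_complement(3, 2))                    # code for -3 with two digits
print(tens_complement(3, 3))                    # code for -3 with three digits
print(add_codes(5, tens_complement(3, 2), 2))   # 5 + (-3): 102 with the carry dropped
```

With two digits, `decode` maps codes 50 to 99 back to -50 to -1, matching the fixed-size scheme described earlier.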

Two’s complement To negate a number, invert the bits and add 1.
+2     = 00000010
Invert   11111101
Add 1    00000001
-2     = 11111110
8-bit two’s complement values:
 127   01111111
 126   01111110
 ...
   2   00000010
   1   00000001
   0   00000000
  -1   11111111
  -2   11111110
 ...
-126   10000010
-127   10000001
-128   10000000
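The invert-and-add-1 rule can be sketched as follows. The function names are illustrative, and 8 bits is assumed as the default width to match the slide.

```python
def twos_complement(value: int, bits: int = 8) -> str:
    """Bit pattern of `value` in a `bits`-wide two's complement representation."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking with 2**bits - 1 wraps negative values into the upper half.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def negate(pattern: str) -> str:
    """Invert every bit, add 1, and drop any carry out of the top bit."""
    mask = (1 << len(pattern)) - 1
    inverted = int(pattern, 2) ^ mask
    return format((inverted + 1) & mask, f"0{len(pattern)}b")


print(twos_complement(2), "->", negate(twos_complement(2)))
for v in (127, 1, 0, -1, -127, -128):
    print(f"{v:5d}  {twos_complement(v)}")
```

Unlike signed-magnitude, this scheme has a single representation of zero: negating 00000000 gives 00000000 again.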

Representing text

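Representing text Character set: a list of characters and the codes used to represent each one. ASCII (American Standard Code for Information Interchange): 128 unique characters; a 7-bit code, normally stored in one 8-bit byte per character. Unicode character set: a superset of ASCII (its first 256 characters correspond to the extended ASCII, i.e. Latin-1, character set); aims to represent every character in every language used in the entire world, plus special scientific symbols; originally designed with 16 bits per character, although current Unicode goes beyond 16 bits and is stored using encodings such as UTF-8 and UTF-16.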

ASCII Codes are usually listed as decimal numbers, but these values get translated to their binary equivalents for storage. ASCII characters have a distinct order based on the codes used to store them. Each character has a relative position (before or after) every other character. Both the uppercase and the lowercase letters are in alphabetical order.

Example of ASCII files Suppose you're editing a text file with a text editor. Because you're using a text editor, you're pretty much editing an ASCII file. You type in "cat", that is, the letters 'c', then 'a', then 't'. Then you save the file and quit. What happens? If you look up an ASCII table, you will discover that the ASCII codes for these letters are 0x63, 0x61, 0x74 (the 0x merely indicates that the values are in hexadecimal instead of decimal/base 10). Each time you type in an ASCII character and save it, an entire byte is written which corresponds to that character. This includes punctuation, spaces, etc. Thus, when you type a 'c', it is saved as 0110 0011 to the file. Here's how it looks:
ASCII    'c'         'a'         't'
Hex      63          61          74
Binary   0110 0011   0110 0001   0111 0100
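The bytes written for "cat" can be inspected with a few lines of Python. This is only a sketch of the lookup; it encodes the string in memory rather than saving a file.

```python
text = "cat"
data = text.encode("ascii")  # one byte per ASCII character

# Build (character, hex, binary) rows like the table in the slide.
rows = [(ch, format(byte, "02X"), format(byte, "08b")) for ch, byte in zip(text, data)]

for ch, hex_code, bits in rows:
    print(f"'{ch}'  0x{hex_code}  {bits[:4]} {bits[4:]}")
```

The file on disk would contain exactly these three bytes, assuming the editor saves plain ASCII with no trailing newline.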

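Unicode character set Goal: represent every character in every language used in the entire world, plus scientific symbols. Originally 16 bits per character; current Unicode goes beyond 16 bits and is stored using encodings such as UTF-8 and UTF-16. Unicode is a superset of ASCII: the first 256 characters in the Unicode character set correspond exactly to the extended ASCII (Latin-1) character set.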

Demo Typing characters by their codes:
ASCII: hold down the Alt key and type the three-digit code on the numeric keypad. Uppercase A = Alt + 065 (65 is the decimal ASCII code of the letter “A”).
ANSI: hold down the Alt key, but use a four-digit code instead. British pound = Alt + 0163 on the numeric keypad (163 is the decimal ANSI code of the symbol “£”).
Unicode (works in MS Word only): type the hexadecimal character code, then press Alt+X. Dollar sign $ = type 0024, then press Alt+X.
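The codes used in the demo can be cross-checked in Python, where `ord` returns a character's Unicode code point. The helper `describe` is a made-up name for this sketch.

```python
def describe(ch: str) -> tuple[int, str, bytes]:
    """Return the decimal code point, its U+XXXX form, and the UTF-8 bytes."""
    return ord(ch), f"U+{ord(ch):04X}", ch.encode("utf-8")


for ch in ("A", "£", "$"):
    decimal, code_point, utf8 = describe(ch)
    print(f"{ch}: decimal {decimal}, {code_point}, UTF-8 bytes {utf8.hex(' ')}")
```

Characters in the ASCII range take one byte in UTF-8, while “£” (code 163, outside ASCII) takes two.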

Data compression Data compression - reducing the amount of space needed to store a piece of data. Compression ratio - The size of the compressed data divided by the size of the uncompressed data Lossy compression - A technique in which there is loss of information Lossless compression - A technique in which there is no loss of information

Text compression Three techniques:
Keyword encoding - replacing a frequently used word (or part of a word) with a single character
Run-length encoding - replacing a long series of a repeated character with a count of the repetition
Huffman encoding - using a variable-length binary string to represent a character so that frequently used characters have short codes

Keyword encoding Original text: “The human body is composed of many independent systems, such as the circulatory system, the respiratory system, and the reproductive system. Not only must each system work independently, they must interact and cooperate as well. Overall health is a function of the well-being of separate systems, as well as how these separate systems work in concert.” TOTAL: 352 characters
Encoded version: “The human body is composed of many independent systems, such ^ ~ circulatory system, ~ respiratory system, + ~ reproductive system. Not only & each system work independently, they & interact + cooperate ^ %. Overall health is a function of ~ %-being of separate systems, ^ % ^ how # separate systems work in concert.” TOTAL: 317 characters
Substitutions implied by the two versions: as = ^, the = ~, and = +, must = &, well = %, these = #
Compression ratio: 317/352 ≈ 0.90
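A rough sketch of keyword encoding and the compression ratio. The substitution table below is an assumption read off the encoded text, and the sketch only works if the symbols never occur in the original text.

```python
import re

# Hypothetical keyword table, inferred from the encoded example.
KEYWORDS = {"as": "^", "the": "~", "and": "+", "must": "&", "well": "%"}


def keyword_encode(text: str, table: dict[str, str] = KEYWORDS) -> str:
    """Replace each whole-word keyword with its one-character symbol."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


def keyword_decode(encoded: str, table: dict[str, str] = KEYWORDS) -> str:
    """Reverse the substitution (assumes symbols are absent from the original)."""
    for word, symbol in table.items():
        encoded = encoded.replace(symbol, word)
    return encoded


def compression_ratio(compressed: str, original: str) -> float:
    """Size of the compressed data divided by the size of the uncompressed data."""
    return len(compressed) / len(original)


sample = "they must interact and cooperate as well"
encoded = keyword_encode(sample)
print(encoded)
print(round(compression_ratio(encoded, sample), 2))
```

Because decoding restores the text exactly, keyword encoding is a lossless technique.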

Run-length encoding A.k.a. recurrence coding Sequence of repeated characters is replaced by flag character, followed by the repeated character, followed by a single digit that indicates how many times the character is repeated E.g. * is a flag AAAAAAA -> *A7

Run-length encoding Encoded: *n5*x9ccc*h6 some other text *k8eee TOTAL: 35 characters
Decoded as: nnnnnxxxxxxxxxccchhhhhh some other text kkkkkkkkeee TOTAL: 51 characters
Runs of three or fewer characters (ccc, eee) are left as they are, because the three-character code would not save any space.
Compression ratio: 35/51 ≈ 0.69
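The scheme described above (flag, character, single-digit count) can be sketched like this. The threshold `min_run=4` and the function names are assumptions, and the input is assumed not to contain the flag character itself.

```python
import re
from itertools import groupby

FLAG = "*"


def rle_encode(text: str, flag: str = FLAG, min_run: int = 4) -> str:
    """Replace runs of min_run..9 repeated characters with flag + char + digit."""
    out = []
    for ch, group in groupby(text):
        n = len(list(group))
        while n > 0:
            chunk = min(n, 9)  # the count must fit in a single digit
            out.append(f"{flag}{ch}{chunk}" if chunk >= min_run else ch * chunk)
            n -= chunk
    return "".join(out)


def rle_decode(encoded: str, flag: str = FLAG) -> str:
    """Expand every flag + char + digit triple back into the repeated run."""
    pattern = re.escape(flag) + r"(.)(\d)"
    return re.sub(pattern, lambda m: m.group(1) * int(m.group(2)), encoded)


original = "nnnnnxxxxxxxxxccchhhhhh some other text kkkkkkkkeee"
encoded = rle_encode(original)
print(encoded, len(encoded), len(original))
```

Runs shorter than four characters are copied unchanged, since encoding them would not make the text shorter.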

Run-length encoding for bitmap In raw format, the first three rows (16 pixels each) would be:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0
Using run-length encoding (run length, then pixel value), the first three rows would be:
16 0
16 0
2 0 12 1 2 0
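A small sketch of the same idea for bitmap rows. It assumes rows of 16 pixels, which is what the run lengths in the example add up to, and encodes each row separately as (run length, value) pairs.

```python
from itertools import groupby


def rle_row(row: list[int]) -> list[tuple[int, int]]:
    """Encode one row of pixels as (run length, pixel value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(row)]


# Assumed 16-pixel rows: two blank rows, then a row with a 12-pixel bar.
rows = [[0] * 16, [0] * 16, [0] * 2 + [1] * 12 + [0] * 2]
encoded = [rle_row(row) for row in rows]

for pairs in encoded:
    print(" ".join(f"{count} {value}" for count, value in pairs))
```

Here 48 raw pixel values shrink to 10 numbers, because bitmaps of simple shapes contain long runs of identical pixels.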

Huffman encoding Use variable-length bit strings to represent each character. Use only a few bits to represent frequently used characters. Use longer bit strings for rarely used characters. E.g. DOORBELL -> 1011110110111101001100100

Decoding Huffman code Since variable-length encoding is used, won’t we get confused when trying to decode a string? We do not know how many bits we should include for each character! Read the book and get the answer ready for the tutorial!
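One way to explore the decoding question is with a small sketch. The code table below is an assumption: it is not shown in the slides, and was chosen because it reproduces the DOORBELL bit string and because no code in it is the beginning of another code.

```python
# Hypothetical code table (assumed for illustration, not taken from the slides).
CODES = {"A": "00", "E": "01", "L": "100", "O": "110",
         "R": "111", "B": "1010", "D": "1011"}


def huffman_encode(text: str, codes: dict[str, str] = CODES) -> str:
    """Concatenate the bit string of every character."""
    return "".join(codes[ch] for ch in text)


def huffman_decode(bits: str, codes: dict[str, str] = CODES) -> str:
    """Read bits left to right and emit a character as soon as a code matches."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # unambiguous because no code is a prefix of another
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


bits = huffman_encode("DOORBELL")
print(bits, len(bits))
print(huffman_decode(bits))
```

With a fixed 8 bits per character the same word would take 64 bits, so the variable-length codes are considerably shorter here.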

Representing audio data

Representing audio data A series of air compressions vibrate a membrane in our ear, which sends signals to our brain. Thus a sound is defined in nature by the wave of air that interacts with our eardrum.

Representing audio data To represent audio information on a computer, we must digitize the sound wave, somehow breaking it into discrete, manageable pieces. To digitize the signal, we periodically measure the voltage of the signal and record the appropriate numeric value, a process called sampling. A sampling rate of around 40,000 times per second is enough to create a reasonable sound reproduction.
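Sampling can be illustrated with a synthetic signal. The sketch below assumes a pure 440 Hz sine wave in place of a real voltage and quantizes each measurement to one of 256 levels; both choices are illustrative, not from the lecture.

```python
import math


def sample_sine(freq_hz: float, rate_hz: int, count: int, levels: int = 256) -> list[int]:
    """Measure a sine wave `count` times at `rate_hz` samples per second.

    Each measurement (between -1 and 1) is mapped to an integer in 0..levels-1.
    """
    samples = []
    for i in range(count):
        t = i / rate_hz  # time of the i-th measurement, in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


# 40 samples cover one millisecond at 40,000 samples per second.
samples = sample_sine(freq_hz=440, rate_hz=40_000, count=40)
print(samples)
```

One second of sound at this rate yields 40,000 numbers, each of which is then stored in binary like any other integer.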

Representing audio data Sampling an audio signal

Representing audio data A CD surface contains microscopic pits that represent binary digits. A low-intensity laser is pointed at the disc. If the surface is smooth, the laser reflects strongly. If the surface is pitted, the laser reflects poorly. A receptor analyzes the reflection and produces the appropriate string of binary data. The signal is reproduced and sent to the speaker.

Representing images and graphics

Representing images and graphics Our retinas have three types of color photoreceptor cone cells that respond to different sets of frequencies (red, green, and blue). On a computer, colour is represented as an RGB value: three numbers that indicate the relative contribution of each of these three primary colours, where 0 means no contribution and 255 means full contribution.
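A short sketch of how an RGB value maps to stored bits, assuming one byte (0 to 255) per colour component. The hexadecimal `#RRGGBB` notation is a common way to write such values and is not mentioned in the slides.

```python
def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Write an RGB colour as #RRGGBB, two hexadecimal digits per component."""
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError("each component must be between 0 and 255")
    return f"#{r:02X}{g:02X}{b:02X}"


def rgb_to_bits(r: int, g: int, b: int) -> str:
    """The same colour as the 24 bits that would be stored for one pixel."""
    return " ".join(format(component, "08b") for component in (r, g, b))


print(rgb_to_hex(255, 0, 0), rgb_to_bits(255, 0, 0))      # full red only
print(rgb_to_hex(255, 255, 0), rgb_to_bits(255, 255, 0))  # red + green
```

Three 8-bit components give 2^24, roughly 16.7 million, possible colours per pixel.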

Digitized images and graphics Pixels - Individual dot used to represent a picture; stands for picture elements Each pixel composed of a single colour Resolution - The number of pixels used to represent a picture Raster-graphics format - Storing image information pixel by pixel E.g. BMP, GIF, JPEG

Vector graphics Vector graphics - Representation of an image in terms of lines and shapes. Formulas are used to calculate the shapes and represent the image. E.g. CorelDRAW, a well-known vector graphics editor.

Representing video

Representing video Video information is one of the most complex types of information to capture and compress to get a result that makes sense to the human eye. Video clips contain the equivalent of many still images, each of which must be compressed.

Representing video Video codec refers to the methods used to shrink the size of a movie to allow it to be played on a computer or over a network. Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video. The goal therefore is not to lose information that affects the viewer’s senses.

Homework Computer Science Illuminated, Chapter 3, end of the book exercises http://paulbourke.net/dataformats/compress/