Chapter 3 Data Representation. 2 Data and Computers Computers are multimedia devices, dealing with many categories of information. Computers store, present,

Slides:



Advertisements
Similar presentations
Chapter 3 Data Representation
Advertisements

Information Representation
Dale & Lewis Chapter 3 Data Representation. Representing color Similarly to how color is perceived in the human eye, color information is encoded in combinations.
Chapter 03 Data Representation
1 Data Compression Engineering Math Physics (EMP) Steve Lyon Electrical Engineering.
Data Representation CS105. Data Representation Types of data: – Numbers – Text – Audio – Images & Graphics – Video.
09/17/06 Hofstra University – Overview of Computer Science, CSC005 1 Chapter 3 Data Representation.
Chapter Chapter Goals Distinguish between analog and digital information. Explain data compression and calculate compression ratios.
1 A Balanced Introduction to Computer Science, 2/E David Reed, Creighton University ©2008 Pearson Prentice Hall ISBN Chapter 12 Data.
Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every.
Chapter 3 Data Representation. 2 Data and Computers Computers are multimedia devices, dealing with many categories of information. Computers store, present,
Dale & Lewis Chapter 3 Data Representation
CS105 INTRODUCTION TO COMPUTER CONCEPTS DATA REPRESENTATION Instructor: Cuong (Charlie) Pham.
Chapter 3 Data Representation.
Media File Formats Jon Ivins, DMU. Text Files n Two types n 1. Plain text (unformatted) u ASCII Character set is most common u 7 bits are used u This.
Fundamentals Rawesak Tanawongsuwan
©Brooks/Cole, 2003 Chapter 2 Data Representation.
Chapter 2 Data Representation. Define data types. Visualize how data are stored inside a computer. Understand the differences between text, numbers, images,
Lecture 10 Data Compression.
CPS120 Introduction to Computer Science Lecture 4
Chapter 3 The Information Layer: Data Representation.
Data Usually computing systems are complex devices, dealing with a vast array of information categories.
CSCI-235 Micro-Computers in Science Hardware Part II.
Chapter 3 Data Representation.
Lecture 3 Data Representation
Computers and Scientific Thinking David Reed, Creighton University Data Representation 1.
TOPIC 4 INTRODUCTION TO MEDIA COMPUTATION: DIGITAL PICTURES Notes adapted from Introduction to Computing and Programming with Java: A Multimedia Approach.
Chapter 3 Data Representation (slides modified by Erin Chambers)
Computer Math CPS120: Data Representation. Representing Data The computer knows the type of data stored in a particular location from the context in which.
Chapter 11 Fluency with Information Technology 4 th edition by Lawrence Snyder (slides by Deborah Woodall : 1.
1 Perception, Illusion and VR HNRS 299, Spring 2008 Lecture 14 Introduction to Computer Graphics.
Data Representation CS280 – 09/13/05. Binary (from a Hacker’s dictionary) A base-2 numbering system with only two digits, 0 and 1, which is perfectly.
Chapter 3 Data Representation. 2 Data and Computers Computers are multimedia devices, dealing with many categories of information. Computers store, present,
Chapter 3 Representation. Key Concepts Digital vs Analog How many bits? Some standard representations Compression Methods 3-2.
3-1 Data and Computers Computers are multimedia devices, dealing with a vast array of information categories. Computers store, present, and help us modify.
Chapter 03 Data Representation. 2 Chapter Goals Distinguish between analog and digital information Explain data compression and calculate compression.
Common file formats  Lesson Objective: Understanding common file formats and their differences.  Learning Outcome:  Describe the type of files which.
Chapter 2 : Business Information Business Data Communications, 6e.
Quiz # 1 Chapters 1,2, & 3.
Lecture 7: Intro to Computer Graphics. Remember…… DIGITAL - Digital means discrete. DIGITAL - Digital means discrete. Digital representation is comprised.
1 MULTIMEDIA TECHNOLOGY SMM 3001 MEDIA - TEXT. 2 What is Text? the basic element of most multimedia the basic element of most multimedia consisting of.
Marwan Al-Namari 1 Digital Representations. Bits and Bytes Devices can only be in one of two states 0 or 1, yes or no, on or off, … Bit: a unit of data.
CSCI-100 Introduction to Computing Hardware Part II.
Chapter 1 Background 1. In this lecture, you will find answers to these questions Computers store and transmit information using digital data. What exactly.
Chapter 03 Data Representation. 2 Chapter Goals Distinguish between analog and digital information Explain data compression and calculate compression.
Chapter 03 Data Representation. 2 Chapter Goals Distinguish between analog and digital information Explain data compression and calculate compression.
COMP135/COMP535 Digital Multimedia, 2nd edition Nigel Chapman & Jenny Chapman Chapter 2 Lecture 2 – Digital Representations.
Chapter 3 Data Representation. 2 Compressing Files.
Lecture 3 Data Representation. The Transition from Text-Based Computing to the Graphical OS In the early 1980’s desktop computers began to be introduced.
Chapter 03 Nell Dale & John Lewis. 3-2 Chapter Goals Distinguish between analog and digital information. Explain data compression and calculate compression.
HONR101 Analytics in a Big Data World Monday, January 18,
TOPIC 4 INTRODUCTION TO MEDIA COMPUTATION: DIGITAL PICTURES Notes adapted from Introduction to Computing and Programming with Java: A Multimedia Approach.
Chapter 3: Data Representation Chapter 3 Data Representation Page 17 Computers use bits to represent all types of data, including text, numerical values,
Information in Computers. Remember Computers Execute algorithms Need to be told what to do And to whom to do it.
Software Design and Development Storing Data Part 2 Text, sound and video Computing Science.
Fundamentals of Data Representation Yusung Kim
Data Representation CT101 – Computing Systems.
Chapter 03 Data Representation.
GCSE COMPUTER SCIENCE Topic 3 - Data 3.2 Data Representation.
Chapter 3 Data Representation Text Characters
Lecture 3: Data representation
Binary Representation in Audio and Images
Computer Science Higher
Data Compression.
CHAPTER 2 - DIGITAL DATA REPRESENTATION AND NUMBERING SYSTEMS
CHAPTER 2 - DIGITAL DATA REPRESENTATION AND NUMBERING SYSTEMS
Web Design and Development
CS105 Introduction to Computer Concepts Data Representation
Chapter 2 Data Representation.
Chapter 8 – Compression Aims: Outline the objectives of compression.
Presentation transcript:

Chapter 3 Data Representation

2 Data and Computers Computers are multimedia devices, dealing with many categories of information. Computers store, present, and help us modify Numbers Text Audio Images and graphics Video

3 Analog and Digital Information Computers are finite. Computer memory and other hardware devices have only so much room to store and manipulate a certain amount of data. The goal of data representation is to represent enough of the world to satisfy our computational needs and our senses of sight and sound.

4 Analog and Digital Information Information can be represented in one of two ways: analog or digital. Analog data A continuous representation, analogous to the actual information it represents. Digital data A discrete representation, breaking the information up into separate elements.

5 Analog and Digital Information A mercury thermometer exemplifies analog data as it continually rises and falls in direct proportion to the temperature. Digital displays only show discrete information.

6 Analog and Digital Information Computers cannot work well with analog information, so we digitize information by breaking it into pieces and representing those pieces separately. Why do we use binary? Modern computers are designed to use and manage binary values because the devices that store and manage the data are far less expensive and far more reliable if they only have to represent one of two possible values.

7 Electronic Signals An analog signal continually fluctuates in voltage up and down. But a digital signal has only a high or low state, corresponding to the two binary digits. All electronic signals (both analog and digital) degrade as they move down a line. That is, the voltage of the signal fluctuates due to environmental effects.

8 Electronic Signals (Cont’d) Periodically, a digital signal is reclocked to regain its original shape. An analog and a digital signal Degradation of analog and digital signals

9 Representing Text To represent a text document in digital form, we need to be able to represent every possible character that may appear. There is a finite number of characters to represent, so the general approach is to list them all and assign each a binary string. A character set is a list of characters and the codes used to represent each one. By agreeing to use a particular character set, computer manufacturers have made the processing of text data easier.

10 The ASCII Character Set ASCII stands for American Standard Code for Information Interchange. The ASCII character set originally used seven bits to represent each character, allowing for 128 unique characters. Wikipedia has an excellent entry on ASCII.ASCII

11 The ASCII Character Set

12 The ASCII Character Set Notice the organisation of the ASCII table. The table divides in half according to the MSB.  Letters are all in the second half so all codes for alphabetic characters start with 1. This second half of the table divides in half again according to the next bit:  UPPERCASE letters start 10.  lowercase letters start 11. The first half of the table also divides in half according to the next bit:  Control characters start 00.  Numerals and punctuation start 01.

13 The ASCII Character Set Note that control characters (the first 32 in the ASCII character set) do not have simple character representations that you could print to the screen. Some, however, perform actions with which you are familiar.

14 The ASCII Character Set Coding letters in ASCII is easy. Let’s look at ‘j’ as an example: Since ‘j’ is a letter, its code starts with a 1. Since it’s lowercase, the next bit is also a 1. Since it’s the tenth letter of the alphabet the rest of the code is The complete ASCII code for ‘j’ is

15 Later ASCII evolved so that eight bits were used. The 7-bit codes were simply prefixed with another bit, giving another natural doubling.  The original 7-bit codes were padded with 0.  128 new characters were added. The codes for this alternate character set start with 1.

16 The Unicode Character Set Even the extended version of the ASCII character set is not enough for international use. The Unicode character set uses 16 bits per character. The Unicode character set can represent 2 16, or over 65 thousand characters. Unicode was designed to be a superset of ASCII. That is, the first 256 characters in the Unicode character set correspond exactly to the extended ASCII character set.

17 The Unicode Character Set Figure 3.6 A few characters in the Unicode character set

18 Data Compression It is important that we find ways to store and transmit data efficiently, which leads computer scientists to find ways to compress it. Data compression is a reduction in the amount of space needed to store a piece of data. Compression ratio is the size of the compressed data divided by the size of the original data.

19 Data Compression A data compression technique can be  lossless, which means the data can be retrieved without any loss of the original information,  lossy, which means some information may be lost in the process of compaction. As examples, consider these 3 techniques:  keyword encoding  run-length encoding  Huffman encoding

20 Keyword Encoding Frequently used words are replaced with a single character. For example… Note, that the characters used to encode cannot be part of the original text.

21 Keyword Encoding Consider the following paragraph, The human body is composed of many independent systems, such as the circulatory system, the respiratory system, and the reproductive system. Not only must all systems work independently, they must interact and cooperate as well. Overall health is a function of the well-being of separate systems, as well as how these separate systems work in concert.

22 Keyword Encoding This version highlights the words that can be replaced. The human body is composed of many independent systems, such as the circulatory system, the respiratory system, and the reproductive system. Not only must each system work independently, they must interact and cooperate as well. Overall health is a function of the well-being of separate systems, as well as how those separate systems work in concert.

23 Keyword Encoding This is the encoded paragraph: The human body is composed of many independent systems, such ^ ~ circulatory system, ~ respiratory system, + ~ reproductive system. Not only & each system work independently, they & interact + cooperate ^ %. Overall health is a function of ~ %- being of separate systems, ^ % ^ how # separate systems work in concert.

24 Keyword Encoding There are a total of 349 characters in the original paragraph including spaces and punctuation. The encoded paragraph contains 314 characters, resulting in a savings of 35 characters. The compression ratio for this example is 314/349 or approximately 0.9.

25 Run-Length Encoding A single character may be repeated over and over again in a long sequence. This type of repetition doesn’t generally take place in English text, but often occurs in large data streams. In run-length encoding, a sequence of repeated characters is replaced by a flag character, followed by the repeated character, followed by a single digit that indicates how many times the character is repeated.

26 Run-Length Encoding Some examples: AAAAAAA would be encoded as *A7 *n5*x9ccc*h6 some other text *k8eee can be decoded into the following original text nnnnnxxxxxxxxxccchhhhhh some other text kkkkkkkkeee

27 Run-Length Encoding The original text contains 51 characters, and the encoded string contains 35 characters, giving us a compression ratio in this example of 35/51 or approximately Since we are using one character for the repetition count, it seems that we can’t encode repetition lengths greater than nine. Instead of interpreting the count character as an ASCII digit, we could interpret it as a binary number.

28 Huffman Encoding Why should the character “X”, which is seldom used in text, take up the same number of bits as the blank, which is used very frequently? Huffman codes use variable-length bit strings to represent each character. A few characters may be represented by five bits, and another few by six bits, and yet another few by seven bits, and so forth.

29 Huffman Encoding If we use only a few bits to represent characters that appear often and reserve longer bit strings for characters that don’t appear often, the overall size of the document being represented will be smaller.

30 Huffman Encoding An example of a Huffman alphabet

31 Huffman Encoding DOORBELL would be encoded in binary as If we used a fixed-size bit string to represent each character (say, 8 bits), then the binary from of the original string would be 64 bits. The Huffman encoding for that string is 25 bits long, giving a compression ratio of 25/64, or approximately An important characteristic of any Huffman encoding is that no bit string used to represent a character is the prefix of any other bit string used to represent a character.

32 Representing Audio Information We perceive sound when a series of air compressions vibrate a membrane in our ear, which sends signals to our brain. A stereo sends an electrical signal to a speaker to produce sound. This signal is an analog representation of the sound wave. The voltage in the signal varies in direct proportion to the sound wave.

33 Representing Audio Information To digitize the signal we periodically measure the voltage of the signal and record the appropriate numeric value. The process is called sampling. In general, a sampling rate of around 40,000 times per second is enough to create a reasonable sound reproduction.

34 Representing Audio Information Figure 3.8 Sampling an audio signal

35 Representing Audio Information A compact disk (CD) stores audio information digitally. On the surface of the CD are microscopic pits that represent binary digits. A low intensity laser is pointed at the disc. The laser light reflects strongly if the surface is smooth and reflects poorly if the surface is pitted.

36 Representing Audio Information Figure 3.9 A CD player reading binary information

37 Audio Formats  WAV, AU, AIFF, VQF, and MP3. MP3 is dominant  MP3 is short for MPEG-2, audio layer 3 file.  MP3 employs both lossy and lossless compression. First it analyses the frequency spread and compares it to mathematical models of human psychoacoustics (the study of the interrelation between the ear and the brain), and it discards information that can’t be heard by humans. Then the bit stream is compressed using a form of Huffman encoding to achieve additional compression.

38 Representing Images and Graphics Colour is our perception of the various frequencies of light that reach the retinas of our eyes. Our retinas have three types of colour photoreceptor cones which respond to different sets of frequencies. These photoreceptor categories correspond to the colours of red, green, and blue.

39 Representing Images and Graphics Colour is often expressed in a computer as an RGB (red-green-blue) value, which is actually three numbers that indicate the relative contribution of each of these three primary colours. For example, an RGB value of (255, 255, 0) maximizes the contribution of red and green, and minimizes the contribution of blue, which results in a bright yellow.

40 Representing Images and Graphics Figure 3.10 Three-dimensional color space

41 Representing Images and Graphics

42 Representing Images and Graphics The amount of data that is used to represent a colour is called the colour depth. HiColor is a term that indicates a 16-bit colour depth. Five bits are used for each number in an RGB value and the extra bit is sometimes used to represent transparency. TrueColor indicates a 24-bit colour depth. Therefore, each number in an RGB value gets eight bits.

43 Representing Images and Graphics HiColor uses 5 bits for each number.  Since 2 5 = 32, there are 32 different levels for each of the 3 primary colours. So there are 32 3 (or 2 15 ) possible colours.  This is a total of 32,768 different colours. TrueColor uses eight bits for each colour component.  2 8 * 2 8 * 2 8 = 2 24 or 16,777,216 colours. Some monitors can use as many as 32 bits for colour depth.  This is potentially 4,294,967,296 colours!

44 Representing Images and Graphics The human eye is able to distinguish about 200 intensity levels in each of the three primaries red, green, and blue. All in all, up to 10 million different colours can be distinguished. So modern monitors are examples of solutions without a problem.  If the human eye can distinguish only 10 million colours, why develop monitors that can display over 4 billion?

45 Indexed Colour A particular application such as a browser may support only a certain number of specific colours, creating a palette from which to choose. For example, Netscape Navigator’s colour palette has only 216 colours.

46 Digitized Images and Graphics Digitizing a picture is the act of representing it as a collection of individual dots called pixels. The number of pixels used to represent a picture is called the resolution. As an example, the resolution of many monitors is 1024 X 768, or 786,432 pixels. If the colour of each pixel is stored as 24 bits (3 bytes) of data, the screen alone requires 2,359,296 bytes (2 megaBytes) of memory.

47 Digitized Images and Graphics Figure 3.12 A digitized picture composed of few individual pixels

48 Digitized Images and Graphics Figure 3.12 A digitized picture composed of many individual pixels

49 Digitized Images and Graphics The storage of image information on a pixel- by-pixel basis is called a raster-graphics format. There are several popular raster file formats including:  BMP (bitmap)  GIF (Graphics Interchange Format)  JPEG (Joint Photographic Experts Group)

50 Vector Graphics Instead of assigning colours to pixels as we do in raster graphics, a vector-graphics format describes an image in terms of lines and geometric shapes. A vector graphic is a series of commands that describe a line’s direction, thickness, and colour. The file size for these formats tend to be small because every pixel does not have to be represented.

51 Vector Graphics Vector graphics can be resized mathematically, and these changes can be calculated dynamically as needed. This makes them particularly useful for defining scalable fonts. However, vector graphics is not a good technique for representing real-world images.

52 Representing Video A video codec (COmpressor/DECompressor) refers to the methods used to shrink the size of a movie to allow it to be played on a computer or over a network. Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video.

53 Representing Video Codecs use two types of compression - spatial and temporal.  Spatial compression A technique based on removing redundant information within a frame. This problem is essentially the same as that faced when compressing still images.  Temporal compression A technique based differences between consecutive frames. If most of an image in two frames hasn’t changed, why should we waste space to duplicate all of the similar information?