Data Representation CS105. Data Representation Types of data: – Numbers – Text – Audio – Images & Graphics – Video.

Slides:



Advertisements
Similar presentations
Vocabulary Electronic pulses Transistors Decimal numbers
Advertisements

Information Representation
Text Compression 1 Assigning 16 bits to each character in a document uses too much file space We need ways to store and transmit text efficiently Text.
CSCI 3 Chapter 1.8 Data Compression. Chapter 1.8 Data Compression  For the purpose of storing or transferring data, it is often helpful to reduce the.
A Data Compression Algorithm: Huffman Compression
Processing Data.
Binary Expression Numbers & Text CS 105 Binary Representation At the fundamental hardware level, a modern computer can only distinguish between two values,
Unit 3—Part A Computer Memory
Chapter 3 Data Representation. 2 Data and Computers Computers are multimedia devices, dealing with many categories of information. Computers store, present,
Number Systems & Logic Gates Day 1
Spring 2015 Mathematics in Management Science Binary Linear Codes Two Examples.
Digital Data Patrice Koehl Computer Science UC Davis.
Chapter 3 Data Representation. 2 Data and Computers Computers are multimedia devices, dealing with many categories of information. Computers store, present,
Dale & Lewis Chapter 3 Data Representation
Bits, Bytes, KiloBytes, MegaBytes, GigaBytes & TeraBytes.
Communications Technology 2104 Mercedes Lahey. Bit 1. bit=From a shortening of the words “binary digit” 2. the basic unit of information for computers.
Memory Terminology & Data Representation CSCI 1060 Fall 2006.
Data and Program Representation
MAC OS – Unit A Page: 10-11, Investigating Data Processing Understanding Memory.
Chapter 2 Computer Hardware
Data Representation A series of eight bits is called a byte. A byte can be used to represent a number or a character. As you’ll see in the following table,
Chapter 3 Data Representation. 2 Data and Computers Computers are multimedia devices, dealing with many categories of information. Computers store, present,
1 Analysis of Algorithms Chapter - 08 Data Compression.
Data Representation and Storage Lecture 5. Representations A number value can be represented in many ways: 5 Five V IIIII Cinq Hold up my hand.
Chapter 3 Representation. Key Concepts Digital vs Analog How many bits? Some standard representations Compression Methods 3-2.
3-1 Data and Computers Computers are multimedia devices, dealing with a vast array of information categories. Computers store, present, and help us modify.
What do computers know?  All they really know is on or off.  Kind of like a light switch  Computers aren’t nearly as smart as you are!
 The amount of data we deal with is getting larger  Not only do larger files require more disk space, they take longer to transmit  Many times files.
Bits & Bytes Created by Chris McAbee For AAMU AGB199 Extra Credit Created from information copied and pasted from
1 i206: Lecture 2: Computer Architecture, Binary Encodings, and Data Representation Marti Hearst Spring 2012.
Chapter 1: Data Storage.
Unit 2—Part A Computer Memory Computer Technology (S1 Obj 2-3)
Chapter 1 Data Storage © 2007 Pearson Addison-Wesley. All rights reserved.
Chapter 1 Data Storage © 2007 Pearson Addison-Wesley. All rights reserved.
Do it now activity Can you work out what the missing symbols are and work out the order they should be in if the table shows smallest to largest KB kilobyte.
Chapter 3 Data Representation. 2 Compressing Files.
Comp 335 File Structures Data Compression. Why Study Data Compression? Conserves storage space Files can be transmitted faster because there are less.
Chapter 1 Data Storage © 2007 Pearson Addison-Wesley. All rights reserved.
COMPUTER SYSTEM A computer system is define as combination of components designed to process data and store files. A computer system consists of four.
CS 101 – Sept. 11 Review linear vs. non-linear representations. Text representation Compression techniques Image representation –grayscale –File size issues.
Binary a. express numbers in binary, binary-coded decimal (BCD), octal and hexadecimal;
CC111 Lec#2 The System Unit The System Unit: Processing and Memory Lecture 2 Binary System.
© OCR 2016 Unit 2.6 Data Representation Lesson 1 ‒ Numbers.
Chapter 1: Data Storage.
Computer Science: An Overview Eleventh Edition
GCSE COMPUTER SCIENCE Topic 3 - Data 3.3 Data Storage and Compression.
Understanding binary Understanding Computers.
Computer basics.
Data Representation.
Lecture 3: Data representation
3.1 Denary, Binary and Hexadecimal Number Systems
Computer Memory Digital Literacy.
Lesson Objectives Aims You should know about: 1.3.1:
Memory Parts of a computer
Unit 2- Lesson 1 & 2- Bytes and File Sizes / Text Compression
What is Binary? Binary is a two-digit (Base-2) numerical system, which computers use to process and store data. The reason computers use the binary system.
Unit 2.6 Data Representation Lesson 1 ‒ Numbers
Unit 2 Computer Memory Computer Technology (S1 Obj 2-3)
Lecture 3 ISE101: Computing Fundamentals
Unit 3—Part A Computer Memory
Data Representation Numbers
Representation of Data in Computer Systems
How do computers work? Storage.
Unit 3—Part A Computer Memory
Data Compression.
Unit 2- Lesson 1 & 2- Bytes and File Sizes / Text Compression
LO1 – Understand Computer Hardware
Presentation transcript:

Data Representation CS105

Data Representation Types of data: – Numbers – Text – Audio – Images & Graphics – Video

Representing Text Document: Paragraphs, sentences, words – All made up of characters English language has 26 letters – 52 if you consider upper and lower case – Punctuation characters – Space Character sets: ASCII

ASCII Character Set 256 characters – 8 bits = 1 byte ASCII: Character a --> Dec: 97 --> Binary:

Recap: Some terminology Up to this point we have been talking about data in either bits or bytes. – 1 byte = 8 bits While this is the correct way to talk about data, sometimes it is a bit inefficient. Therefore, we use prefixes to given an order of magnitude. – Much the same way we do with the metric system. The following is a list of the common terms. – Kilobyte (KB) = 10 3 = 1000 bytes – Megabyte (MB) = 10 6 = 1 million bytes – Gigabyte (GB) = 10 9 = 1 billion bytes – Terabyte (TB) = = 1 trillion bytes – Petabyte (PB) = 1 quadrillion bytes 1 gigabyte of storage 20 years ago!

Unicode Character Set Why Unicode? – 2 16 : characters – ASCII is a subset of Unicode

Data Compression Why compress data? – Storage, transmission within PC/over network What is data compression? – Reducing physical size of information blocks – Compression ratio Tells us how much compression occurs. Number between 0 and 1 – Lossless versus lossy compression Images, sound files, videos Database of names, numbers

Text Compression Examine three types of text compression: – Keyword encoding – Run-length encoding – Huffman encoding

Keyword Encoding Frequently used words replaced by a single character --> Reversible WordSymbol as^ the~ and+ that$ must& well% these# The human body is composed of many independent systems, such as the circulatory system, the respiratory system, and the reproductive system. Not only must all systems work independently, but they must interact and cooperate as well. Overall health is a function of the well being of separate systems, as well as how these separate systems work in concert. The human body is composed of many independent systems, such ^ the circulatory system, ~ respiratory system, + ~ reproductive system. Not only & all systems work independently, but they & interact and cooperate ^ %. Overall health is a function of ~ % being of separate systems, ^% ^ how # separate systems work in concert. Reduced from 352 to 317 Compression ratio: 317/352 = 0.9 Is this efficient? Drawbacks: Symbols used for encoding must not appear in the text ‘The’ & ‘the’ needs to be represented by different symbols Would not gain anything by encoding ‘a’ and ‘I’ Most frequently used words are often short

Run-Length Encoding Also known as recurrence coding Encoding a single character that is repeated over and over again – For example: replacing ‘AAAAAAA’ with a ‘*’ : *A7 Drawbacks? Uses: DNA sequences, simple images Lossy or lossless compression?

Huffman Encoding Variable bit lengths to represent characters: – a --> Binary – 8 bits – Why would character X take up as many bits as a? Represent it using 5 bits instead Saving space: – Frequently appearing characters are represented by shorter bit lengths

Huffman Encoding DOORBELL D= 1011 O= 110 O=110… If we used fixed size bit string: 64 bits With Huffman encoding: 25 bits Compression ratio: 25/64 = 0.39 Huffman CodeCharacter 00A 01E 100L 110O 111R 1010B 1011D