Fundamentals of Data Representation 2016.2.15 Yusung Kim

Fundamentals of Data Representation 2016.2.15 Yusung Kim yskim525@gmail.com

Data, Information and Knowledge
Data: facts or figures, often obtained from experiments or surveys.
Information: data that have been processed to be useful about a particular subject.
Knowledge: an appropriate collection of information, organized so that it is useful. Knowledge is a deterministic process.
[Diagram: situation → data (by measurement) → information (by understanding relations) → knowledge (by understanding patterns)]

Analog and Digital
Analog – a signal that varies continuously, such as sound or voltage.
Digital – a signal represented as a sequence of discrete values, each taken from a finite set of possible values.

Digital Computing
A computer handles information represented by discrete values. It is essential that computer hardware be reliable and error free: if the hardware gives incorrect results, then any program run on that hardware may give incorrect results as well. The key to developing reliable systems is to keep the design as simple as possible.

Decimal Digitalization
Suppose that the computer represented numbers as we are used to, in base ten. Each digit would be represented by a different voltage level. The more voltage levels the hardware requires, the more complex the hardware becomes.
[Figure: ten voltage levels, one for each digit 0 to 9, with a sample sequence of decimal digits]

Information Theory
Claude Shannon, known as the “Father of Information Theory,” showed that information can be represented using only two symbols, 0 and 1. This is referred to as binary representation.

Binary Digitalization
Each digit can be one of only two possible values, similar to a light switch that can be either on or off. Computer hardware is based on the use of simple electronic on/off switches called transistors.
[Figure: two voltage levels, low (0) and high (1), with a sample sequence of binary digits]

The Binary Number System
For representing numbers, any base (radix) can be used. For example, in base 10 there are ten possible digits (0, 1, ..., 9), and each column value is a power of ten:

10,000,000  1,000,000  100,000  10,000  1,000   100    10     1
   10^7       10^6      10^5     10^4    10^3   10^2  10^1  10^0
                                                        9     9

90 + 9 = 99

For representing numbers in base 2, there are two possible digits (0, 1), and each column value is a power of two:

128    64    32    16     8     4     2     1
2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
  0     1     1     0     0     0     1     1

0 + 64 + 32 + 0 + 0 + 0 + 2 + 1 = 99

Although values represented in base 2 are significantly longer than those in base 10, binary representation is used in digital computing because of the simplicity of hardware design.
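The column-weight reading of a binary number can be sketched in a few lines of Python. The function name `binary_to_decimal` is only an illustrative choice, not something from the slides.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the column weights (powers of two) of every position that holds a 1."""
    total = 0
    # The rightmost digit has weight 2^0, the next one 2^1, and so on.
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(binary_to_decimal("01100011"))  # 64 + 32 + 2 + 1 = 99
```

Python's built-in `int("01100011", 2)` performs the same conversion; the loop is written out here only to mirror the column table.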

Bits and Bytes
Each binary digit is referred to as a bit. A group of (usually) eight bits is called a byte.
Converting a base ten number to base two involves the successive division of the number by 2:

49/2 = 24 with remainder 1
24/2 = 12 with remainder 0
12/2 =  6 with remainder 0
 6/2 =  3 with remainder 0
 3/2 =  1 with remainder 1
 1/2 =  0 with remainder 1

The remainders, read from the last division to the first, give the binary digits: 49 in base 10 is 110001 in base 2.
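The successive-division procedure can be sketched as follows, assuming a non-negative integer input. The name `decimal_to_binary` is hypothetical.

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # quotient carries on, remainder is the next bit
        remainders.append(str(remainder))
    # The first remainder is the least significant bit, so reverse the order.
    return "".join(reversed(remainders))


print(decimal_to_binary(49))  # 110001
```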

Negative Numbers
In mathematics, negative numbers in any base are represented by prefixing them with a minus ("−") sign. However, in computers, numbers are represented only as sequences of bits, without extra symbols.

 7 (base 10) = 00000111 (base 2)
-7 (base 10) = ?

Two’s complement representation
The leftmost bit serves as a sign bit: 0 for positive numbers, 1 for negative numbers.
To compute a negative value:
1. Complement the entire positive number: flip each bit, so that every 0 becomes 1 and vice versa.
2. Then add one.

  00000111   (+7)
  11111000   (complemented)
+        1   (add one)
  11111001   (-7)

Check: adding +7 and -7 gives zero, once the carry out of the leftmost column is ignored.

  00000111   (+7)
+ 11111001   (-7)
  00000000   (+0)

Two’s complement representation
For example, to add +8 and -7, we simply add the corresponding binary codes:

  00001000   (+8)
+ 11111001   (-7)
  00000001   (+1)

A carry from the leftmost column has been ignored. The result, 00000001, is the code for +1, the sum of +8 and -7.
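As a rough illustration of two's complement in 8 bits, the sketch below negates a value by flipping its bits and adding one, and adds two codes while discarding the carry out of the leftmost column. The helper names (`negate`, `add`, `show`) and the fixed 8-bit width are assumptions made for this example.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 11111111: keeps only the low 8 bits


def negate(code: int) -> int:
    """Two's complement negation: flip every bit, add one, keep 8 bits."""
    return ((code ^ MASK) + 1) & MASK


def add(a: int, b: int) -> int:
    """Add two 8-bit codes, ignoring any carry out of the leftmost column."""
    return (a + b) & MASK


def show(code: int) -> str:
    """Format a code as an 8-digit binary string."""
    return format(code, "08b")


minus_seven = negate(0b00000111)
print(show(minus_seven))                   # 11111001
print(show(add(0b00001000, minus_seven)))  # 00000001, i.e. +8 + (-7) = +1
print(show(add(0b00000111, minus_seven)))  # 00000000, i.e. +7 + (-7) = 0
```

Because subtraction becomes "negate, then add", the same adder hardware can handle both operations.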

Text
ASCII (American Standard Code for Information Interchange)
– A character-encoding scheme to represent text in computers
– ASCII encodes 128 specified characters into seven-bit integers, including the digits 0 to 9, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols, and a space.
Unicode
– A computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.

Text
ASCII code table (codes shown as base 10 numbers)
– ‘A’ is encoded as 65
– ‘a’ is encoded as 97
– ‘0’ to ‘9’ are encoded as 48 to 57

Character String
String – a series of characters manipulated as a group

“Hello World”

Character:  ‘H’  ‘e’  ‘l’  ‘l’  ‘o’  ‘ ’  ‘W’  ‘o’  ‘r’  ‘l’  ‘d’  ‘\0’
ASCII:       72  101  108  108  111   32   87  111  114  108  100    0
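The character-to-code mapping can be checked with Python's built-in `ord` and `chr`. Python strings are not terminated by ‘\0’; the trailing 0 is appended here only to mimic the terminator shown on the slide, which is an assumption about the C-style layout being illustrated.

```python
text = "Hello World"

# ord() gives the code of each character; for these characters it equals the ASCII code.
codes = [ord(ch) for ch in text]
codes.append(0)  # mimic the '\0' terminator from the slide (not needed in Python)

print(codes)
# [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 0]

# chr() goes the other way; drop the terminator before decoding.
decoded = "".join(chr(code) for code in codes[:-1])
print(decoded)  # Hello World
```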

Sound
An analog audio signal is sampled at a constant rate.
– telephone: 8,000 samples/sec
– CD music: 44,100 samples/sec
Each sample is quantized (rounded) to one of a finite set of values.
– e.g., 2^8 = 256 possible quantized values
– each quantized value is represented by bits, e.g., 8 bits for 256 values
[Figure: audio signal amplitude versus time, showing the analog signal, the quantized value of each sample, the quantization error, and the sampling rate (N samples/sec)]
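A minimal sketch of sampling and quantization, assuming a pure sine tone as the analog signal and amplitudes scaled to the range -1 to 1. The function names, the 440 Hz tone, and the number of samples are illustrative choices, not values from the slides.

```python
import math


def sample_and_quantize(frequency_hz, sample_rate, n_samples, bits=8):
    """Sample a sine wave at a constant rate and round each sample to 2**bits levels."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                                   # time of the n-th sample
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # analog value in [-1, 1]
        scaled = (amplitude + 1) / 2 * (levels - 1)           # map to [0, levels - 1]
        codes.append(round(scaled))                           # rounding is the quantization error
    return codes


def bit_rate(sample_rate, bits_per_sample):
    """Bits per second needed for one channel of uncompressed audio."""
    return sample_rate * bits_per_sample


codes = sample_and_quantize(frequency_hz=440, sample_rate=8000, n_samples=8)
print(codes)               # eight integers, each between 0 and 255
print(bit_rate(8000, 8))   # 64000 bits/sec for telephone-rate sampling at 8 bits
```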

Image
An image is an artifact that depicts or records visual perception, for example a two-dimensional picture.
A bitmap image is a digital image composed of a matrix of dots (or pixels).
– Each pixel has its own color.
– The number of pixels in an image is called its resolution.
A vector image is made up of many individual objects.
– These objects (lines and curves) are defined by mathematical equations rather than pixels, so they always render at the highest quality.
– Resolution independent

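Bitmap Image
– 1 bit per pixel: black and white
– 8 bits per pixel: gray-scale (256 shades)
– 24 bits per pixel: Red-Green-Blue (RGB), with 8 bits each for the red, green, and blue components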
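The color depth determines how much storage an uncompressed bitmap needs. The sketch below assumes a hypothetical 640 x 480 image and ignores any file header or row padding that a real file format might add.

```python
def bitmap_size_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel data size in bytes, rounded up to a whole byte."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


# Hypothetical 640 x 480 image at the three color depths.
for depth in (1, 8, 24):
    print(depth, "bits/pixel:", bitmap_size_bytes(640, 480, depth), "bytes")
# 1 bits/pixel: 38400 bytes
# 8 bits/pixel: 307200 bytes
# 24 bits/pixel: 921600 bytes
```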

Image Compression
Reducing irrelevance and redundancy of the image data in order to store or transmit data in an efficient form.
Lossless compression
– Allows the original data to be perfectly reconstructed from the compressed data
– Applications: text articles, bank records, medical images
– Run-length encoding (RLE) is a very simple form of lossless data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count.
e.g. RRRRRRRRRRGGGGGGGGGGGGBBBBBBBBBBBBBBBB → R10G12B16
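Run-length encoding in the symbol-then-count form of the example can be sketched as below. The function names are hypothetical, and the decoder assumes that the data symbols are never digits, otherwise the encoded string would be ambiguous.

```python
import re
from itertools import groupby


def rle_encode(data: str) -> str:
    """Store each run of identical symbols as the symbol followed by its count."""
    return "".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(data))


def rle_decode(encoded: str) -> str:
    """Rebuild the original data; assumes symbols are single non-digit characters."""
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(symbol * int(count) for symbol, count in pairs)


pixels = "R" * 10 + "G" * 12 + "B" * 16
encoded = rle_encode(pixels)
print(encoded)                        # R10G12B16
print(rle_decode(encoded) == pixels)  # True: nothing was lost
```

Note that data without long runs can grow under this scheme: "RGB" would be encoded as "R1G1B1".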

Image Compression
Lossy compression
– Uses inexact approximations (or partial data discarding) to represent images.
– May reduce file sizes significantly before degradation is noticed by the end-user.
– Even when noticeable by the user, further data reduction may be desirable (e.g. for real-time communication, to reduce transmission times, or to reduce storage needs).
– Applications: multimedia data (audio, video, and images), especially streaming media and internet telephony.

JPEG (Joint Photographic Experts Group)
A commonly used method of lossy compression for digital images. The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality.
Algorithm
1. Color space transformation: from RGB to YCbCr
2. Block splitting: divide the image into separate blocks of 8 x 8 pixels
3. Discrete Cosine Transform (DCT): applied to each block
4. Quantization: the values in each block are rounded to integers; this is the step where information is discarded
5. Lossless compression of the quantized values
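Step 1 of the list above can be illustrated for a single pixel. The slides do not give the conversion formula; the coefficients below are the full-range ones commonly used with JPEG/JFIF files and are included here as an assumption. The function name `rgb_to_ycbcr` is hypothetical.

```python
def clamp(value: int) -> int:
    """Keep a rounded component inside the 8-bit range 0..255."""
    return max(0, min(255, value))


def rgb_to_ycbcr(r: int, g: int, b: int):
    """Convert one 8-bit RGB pixel to YCbCr (assumed JFIF-style full-range coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                # luma (brightness)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b     # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b     # red-difference chroma
    return clamp(round(y)), clamp(round(cb)), clamp(round(cr))


print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128): white has full luma, neutral chroma
print(rgb_to_ycbcr(0, 0, 0))        # (0, 128, 128): black has zero luma, neutral chroma
```

Separating brightness (Y) from color (Cb, Cr) lets later steps treat the two kinds of information differently.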

Video
Video: a sequence of images displayed at a constant rate, e.g. 24 images/sec
Coding: use redundancy within and between images to decrease the number of bits used to encode an image
– Spatial coding example: instead of sending N values of the same color (all purple), send only two values: the color value (purple) and the number of repeated values (N)
– Temporal coding example: instead of sending the complete frame at i+1, send only the differences from frame i
[Figure: frame i and frame i+1]
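The temporal coding idea can be sketched with frames stored as flat lists of pixel values. This is a simplified illustration under that assumption, not how any particular video codec works, and the function names are hypothetical.

```python
def frame_delta(previous, current):
    """List only the pixels that changed, as (index, new_value) pairs."""
    return [
        (index, new)
        for index, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


def apply_delta(previous, delta):
    """Rebuild the next frame from the previous frame plus the differences."""
    frame = list(previous)
    for index, value in delta:
        frame[index] = value
    return frame


frame_i = [5, 5, 5, 5, 7, 7]
frame_next = [5, 5, 9, 5, 7, 7]       # only one pixel changed

delta = frame_delta(frame_i, frame_next)
print(delta)                                       # [(2, 9)]
print(apply_delta(frame_i, delta) == frame_next)   # True
```

When consecutive frames are nearly identical, the list of differences is much shorter than the full frame.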
