Lecture 6 Data representation
Storing information
A bit can hold only two values: 0 and 1.
2 bits can hold four values: 00, 01, 10, 11.
And so on: n bits can hold 2^n values, so 8 bits can hold 256 different values.

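The counting rule above can be checked with a short Python sketch. The helper name `bit_patterns` is an illustrative assumption, not part of the lecture.

```python
from itertools import product


def bit_patterns(n):
    """Return every n-bit pattern as a string, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


# Each extra bit doubles the number of patterns.
for n in (1, 2, 8):
    print(n, "bits ->", len(bit_patterns(n)), "values")

print(bit_patterns(2))
```
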
How to store whole numbers
An 8-bit number can store whole numbers from 0 to 255:
0 = 0000 0000
255 = 1111 1111

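As a minimal sketch of the same idea, Python's built-in `format` and `int` convert between a number and its 8-bit pattern. The helper names `to_bits` and `from_bits` are assumptions made for this example.

```python
def to_bits(n):
    """Return the 8-bit pattern of an unsigned number from 0 to 255."""
    if not 0 <= n <= 255:
        raise ValueError("an unsigned 8-bit number must be in 0..255")
    return format(n, "08b")


def from_bits(bits):
    """Return the number stored in a bit pattern such as '0000 0100'."""
    return int(bits.replace(" ", ""), 2)


print(to_bits(0), to_bits(255), from_bits("0000 0100"))
```
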
How to store negative and positive ints in 8 bits
Use the first bit as a sign: 0 means positive and 1 means negative.
0000 0100 is 4
1000 0100 is -4
But this scheme has two zeros, 0000 0000 (+0) and 1000 0000 (-0), and their bit patterns are not equal!
It stores numbers from -127 to 127, 255 different numbers in total.

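The sign-bit scheme can be sketched as follows. The function names are hypothetical, and the code only illustrates the idea; it is not how Python itself stores integers.

```python
def sign_magnitude_encode(n):
    """Encode n (from -127 to 127) as a sign bit plus a 7-bit magnitude."""
    if not -127 <= n <= 127:
        raise ValueError("sign-magnitude in 8 bits covers -127..127")
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), "07b")


def sign_magnitude_decode(bits):
    """Decode an 8-character bit string written in sign-magnitude form."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(sign_magnitude_encode(4), sign_magnitude_encode(-4))
# Two different patterns decode to the same value, zero.
print(sign_magnitude_decode("00000000"), sign_magnitude_decode("10000000"))
```
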
Two's complement
A negative number is stored as the two's complement of its absolute value: invert all the bits, then add 1.
0000 0100 = 4
1000 0000 = -128
1000 0010 = -126
1111 1111 = -1
It stores 256 numbers, from -128 to 127.

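A minimal sketch of 8-bit two's complement, assuming the usual definition in which a pattern with the top bit set stands for its unsigned value minus 256. The function names are illustrative only.

```python
def twos_complement_encode(n):
    """Encode n (from -128 to 127) as an 8-bit two's complement pattern."""
    if not -128 <= n <= 127:
        raise ValueError("8-bit two's complement covers -128..127")
    # Masking with 0xFF keeps the low 8 bits, which wraps negatives around.
    return format(n & 0xFF, "08b")


def twos_complement_decode(bits):
    """Decode an 8-character bit string written in two's complement."""
    value = int(bits, 2)
    return value - 256 if value >= 128 else value


for pattern in ("00000100", "10000000", "10000010", "11111111"):
    print(pattern, twos_complement_decode(pattern))
```

Unlike the sign-bit scheme, every pattern decodes to a different number, so there is only one zero.
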
How to store real (double or float) numbers
Research it by yourself.

Morse code
Storing text
To store text, early computer engineers created the ASCII code.
ASCII is a 7-bit code, so it defines 128 different characters; because a character is stored in an 8-bit byte, later extended encodings could hold 256.
ASCII contains various symbols, digits, and the English alphabet (uppercase and lowercase letters).

ASCII code
ASCII is the standard basic character coding in computers.
65 - 90 for capital letters
97 - 122 for lowercase letters
48 - 57 for digits
An ASCII character is stored in one byte of memory.
A = 65 = 01000001
z = 122 = 01111010

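The code ranges can be inspected with Python's built-in `ord` and `chr`. The helper `ascii_bits` is an assumed name for this sketch.

```python
def ascii_bits(ch):
    """Return the 8-bit pattern of a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "08b")


# First and last character of each range: capitals, lowercase, digits.
for ch in "AZaz09":
    print(ch, ord(ch), ascii_bits(ch))

print(chr(65))  # chr goes the other way, from code to character
```
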
Problem: How to store non-English characters
Early approach: every alphabet used its own encoding.
Problem: how to store text that contains letters from different alphabets?

Different encodings
Windows-1250 for Central European languages that use the Latin script (Polish, Czech, Slovak, Hungarian, Slovene, Serbian, Croatian, Romanian and Albanian)
Windows-1251 for Cyrillic alphabets
Windows-1252 for Western languages
Windows-1253 for Greek
Windows-1254 for Turkish
Windows-1255 for Hebrew
and so on.

Problem: World alphabets
Many alphabets and scripts are used in the world: Latin (Spanish, German, Finnish), Arabic (Persian), Hebrew, Chinese characters, Korean, Japanese (Hiragana, Katakana), Cyrillic (Kazakh, Tatar, Serbian, Ukrainian), Tamil, Armenian, Mongolian, Greek, Georgian.
How to represent all of them in one document?

Problem: how to write the following text?
شخص جيد אדם טוב 좋은 사람 καλό πρόσωπο មនុស្សម្នាក់ដ៏ល្អ நல்ல நபர்
Using a different encoding for each script does not let you write text that mixes scripts.

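The problem can be reproduced in Python, whose standard codecs include `cp1253` (Windows-1253, Greek) and `cp1255` (Windows-1255, Hebrew). This is a small sketch; `can_encode` is a hypothetical helper written for the example.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


greek = "καλό πρόσωπο"
hebrew = "אדם טוב"
mixed = greek + " " + hebrew

for encoding in ("cp1253", "cp1255", "utf-8"):
    print(encoding,
          can_encode(greek, encoding),
          can_encode(hebrew, encoding),
          can_encode(mixed, encoding))
```

Each single-script code page handles its own script, but neither can store the mixed line; a Unicode encoding such as UTF-8 can.
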
Unicode
All symbols are stored in one table.
The version described here contains 28 ancient and historic scripts (alphabets) and 72 modern scripts.
It contains about 110,000 characters.
It can store text that mixes different scripts.

UTF-8: what is it?
UTF-8 (UCS Transformation Format, 8-bit) is a variable-width encoding that can represent every character in the Unicode character set.
It is compatible with ASCII: a file that contains only symbols present in ASCII is stored in UTF-8 exactly as it would be stored in ASCII.

UTF-8
A = 65 = 01000001
¢ = 11000010 10100010
€ = 11100010 10000010 10101100
欽 = code point U+6B3D = 11100110 10101100 10111101

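The byte patterns above can be reproduced with `str.encode`. This is a sketch with an assumed helper name, `utf8_bits`; it also checks the ASCII compatibility claim on a sample string.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of a character as space-separated bit groups."""
    return " ".join(format(byte, "08b") for byte in ch.encode("utf-8"))


# One, two, three and three bytes respectively.
for ch in "A¢€欽":
    print(ch, "U+%04X" % ord(ch), utf8_bits(ch))

# Text made only of ASCII characters has the same bytes in both encodings.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))
```
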
Use Unicode symbols in Python
Put the following on the first line of the Python code (needed in Python 2; Python 3 source files are UTF-8 by default):
# -*- coding: utf-8 -*-
print(u"қазақша")

Images and colors
An image is a set of pixels.
A pixel is one cell on the screen, and it contains only one color.
An image is stored as a sequence of pixels, each represented by its color.

How to store color
Approach #1: Combine three colors: Cyan, Magenta, Yellow. Used in printers.
Approach #2: Combine three colors: Red, Green, Blue. Used in displays.
In computers it mostly saves every color

CMYK vs RGB
Combining Cyan + Magenta + Yellow gives black, and no color at all gives white.
Combining Red + Green + Blue gives white, and no color at all gives black.
So why is CMY used in printing, and why is RGB used in monitors?

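One simplified way to see the relationship is to treat each CMY component as the complement of the matching RGB component. This is a rough sketch under that assumption (components from 0 to 255, hypothetical function name); real printing pipelines use more elaborate color models.

```python
def rgb_to_cmy(r, g, b):
    """Convert RGB (0..255 each) to CMY using the simple complement model."""
    return (255 - r, 255 - g, 255 - b)


# White on screen needs all light, and no ink on paper.
print(rgb_to_cmy(255, 255, 255))
# Black on screen needs no light, and all three inks on paper.
print(rgb_to_cmy(0, 0, 0))
# Pure red light corresponds to magenta plus yellow ink.
print(rgb_to_cmy(255, 0, 0))
```
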
Mixing colors
Image
An image is a sequence of pixels.
A pixel is one cell on the monitor screen; it displays a color.
A color is a combination of three colors (red, green, blue).
A bitmap is a map of pixels.

Bit depth
The number of colours that can be represented in a bitmapped image is dictated by the bit depth.
Bit depth            Available colours
8 bits per pixel     256 (2^8)
16 bits per pixel    65,536 (2^16)
24 bits per pixel    16,777,216 (2^24)

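The table follows from the rule that n bits give 2^n values. The sketch below also estimates the size of an uncompressed image; both function names are assumptions for illustration, and real file formats add headers and often compression.

```python
def available_colours(bits_per_pixel):
    """Number of distinct colours one pixel can take at this bit depth."""
    return 2 ** bits_per_pixel


def raw_size_bytes(width, height, bits_per_pixel):
    """Size of the raw pixel data in bytes, ignoring headers and compression."""
    return width * height * bits_per_pixel // 8


for depth in (8, 16, 24):
    print(depth, "bits per pixel:", available_colours(depth), "colours")

print(raw_size_bytes(100, 100, 24), "bytes for a 100x100 image at 24 bits")
```
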
PBM
PBM (portable bitmap) is a file format for black-and-white bitmap images.
Each pixel is one bit: 1 means black, and 0 means white.

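A minimal sketch of the plain-text PBM variant: the file starts with the marker `P1`, then the width and height, then one digit per pixel. In the Netpbm definition of this format, 1 is a black pixel and 0 is a white one. The function name `to_pbm` is an assumption for this example.

```python
def to_pbm(rows):
    """Build the text of a plain PBM file from rows of 0/1 pixel values."""
    height = len(rows)
    width = len(rows[0])
    lines = ["P1", f"{width} {height}"]
    for row in rows:
        lines.append(" ".join(str(pixel) for pixel in row))
    return "\n".join(lines) + "\n"


# A 3x3 image with a black diagonal on a white background.
diagonal = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(to_pbm(diagonal))
```

Saving the returned text to a file with the `.pbm` extension should give an image that common viewers can open.
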
PNM
PNM (portable anymap) is a family of simple image formats: PBM for black-and-white, PGM for grayscale, and PPM for color images.

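For color, the PNM family uses the PPM format. A sketch of its plain-text variant follows: the marker `P3`, the width and height, the maximum component value, then red, green and blue numbers for each pixel. The function name `to_ppm` is hypothetical.

```python
def to_ppm(rows, max_value=255):
    """Build the text of a plain PPM file from rows of (r, g, b) tuples."""
    height = len(rows)
    width = len(rows[0])
    lines = ["P3", f"{width} {height}", str(max_value)]
    for row in rows:
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"


# One row with a red pixel followed by a green pixel.
print(to_ppm([[(255, 0, 0), (0, 255, 0)]]))
```
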
Vector graphics
Vector graphics are stored as a list of attributes, which the computer uses to create the graphic.
Rather than storing the data for each pixel, the computer generates an object by looking at its attributes.
The file stores geometrical information about the image.

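To illustrate "generating an object from its attributes", the sketch below stores a circle only as a centre and a radius and turns it into pixels on demand. The function `rasterize_circle` is an assumption for this example and draws with text characters instead of real pixels.

```python
def rasterize_circle(cx, cy, r, width, height):
    """Return rows of '#' and '.' for a filled circle on a width x height grid."""
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            inside = (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2
            row += "#" if inside else "."
        rows.append(row)
    return rows


# The same three attributes, scaled up, give a larger image with no stored pixels.
for scale in (1, 3):
    size = 5 * scale
    picture = rasterize_circle(2 * scale, 2 * scale, 1 * scale, size, size)
    print("\n".join(picture))
    print()
```
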
Raster (Bitmap) vs Vector graphics
Raster vs Vector
Loads faster: Raster
Can be zoomed without loss of quality: Vector
Takes less memory for simple figures: Vector
Used in typography: Vector
Best for real-world images: Raster
Try to understand why.

SVG (Scalable Vector Graphics)
SVG is the most common vector graphics format.
It is used in web pages and in mobile applications.
It is based on XML.

SVG example
<svg xmlns="http://www.w3.org/2000/svg" version="1.1">
  <circle cx="100" cy="50" r="40" stroke="black" stroke-width="2" fill="red" />
</svg>

SVG example
<svg height="210" width="500">
  <polygon points="200,10 250,190 160,210" style="fill:lime;stroke:purple;stroke-width:1" />
</svg>
<svg height="80" width="300">
  <g fill="none">
    <path stroke="red" d="M5 20 l215 0" />
    <path stroke="black" d="M5 40 l215 0" />
    <path stroke="blue" d="M5 60 l215 0" />
  </g>
</svg>
<svg height="150" width="400">
  <defs>
    <linearGradient id="grad1" x1="0%" y1="0%" x2="100%" y2="0%">
      <stop offset="0%" style="stop-color:rgb(255,255,0);stop-opacity:1" />
      <stop offset="100%" style="stop-color:rgb(255,0,0);stop-opacity:1" />
    </linearGradient>
  </defs>
  <ellipse cx="200" cy="70" rx="85" ry="55" fill="url(#grad1)" />
</svg>

SVG animation
<svg height="60" width="200">
  <text x="0" y="15" fill="red" transform="rotate(30 20,40)">I love SVG</text>
  Sorry, your browser does not support inline SVG.
</svg>
<svg width="400" height="400">
  <rect x="20" y="20" width="250" height="250" style="fill:blue">
    <animate attributeType="CSS" attributeName="opacity" from="1" to="0" dur="5s" repeatCount="indefinite" />
  </rect>
</svg>
<svg width="600" height="600">
  <g transform="translate(100,100)">
    <text id="TextElement" x="0" y="0" style="font-family:Verdana;font-size:24; visibility:hidden">
      It's SVG!
      <set attributeName="visibility" attributeType="CSS" to="visible" begin="1s" dur="5s" fill="freeze" />
      <animateMotion path="M 0 0 L 100 100" begin="1s" dur="5s" fill="freeze" />
      <animateTransform attributeName="transform" attributeType="XML" type="rotate" from="-30" to="0" begin="1s" dur="5s" fill="freeze" />
      <animateTransform attributeName="transform" attributeType="XML" type="scale" from="1" to="3" additive="sum" begin="1s" dur="5s" fill="freeze" />
    </text>
  </g>
</svg>

Plain text data/file formats
These are standard data formats that are stored as plain text.
They are used to interchange data on the web, between applications, and so on.
JSON
XML
HTML
CSV

XML: Extensible Markup Language
<group name="D03">
  <student id="332">John Black</student>
  <student id="321">Mike Pawn</student>
  <student id="320">Jeremy King</student>
</group>

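A document like this can be read with Python's standard `xml.etree.ElementTree` module. The sketch below assumes straight quotes around attribute values, which XML requires; `read_group` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<group name="D03">
  <student id="332">John Black</student>
  <student id="321">Mike Pawn</student>
  <student id="320">Jeremy King</student>
</group>"""


def read_group(xml_text):
    """Return the group name and a dict mapping student id to student name."""
    root = ET.fromstring(xml_text)
    students = {s.get("id"): s.text for s in root.findall("student")}
    return root.get("name"), students


name, students = read_group(XML_TEXT)
print(name, students)
```
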
JSON: JavaScript Object Notation
[
  {
    "name": "A04",
    "students": [
      {"id": "332", "name": "John Black"},
      {"id": "322", "name": "Jeremy King"}
    ]
  },
  {
    "name": "B04",
    "students": [
      {"id": "332", "name": "John Black"}
    ]
  }
]

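Valid JSON needs double-quoted keys and strings and no trailing commas. With that assumption, Python's standard `json` module turns the text into lists and dictionaries; `student_names` below is an illustrative helper, and the sample data is shortened.

```python
import json

JSON_TEXT = """
[
  {
    "name": "A04",
    "students": [
      {"id": "332", "name": "John Black"},
      {"id": "322", "name": "Jeremy King"}
    ]
  }
]
"""


def student_names(json_text):
    """Return a dict mapping each group name to its list of student names."""
    groups = json.loads(json_text)
    return {g["name"]: [s["name"] for s in g["students"]] for g in groups}


print(student_names(JSON_TEXT))
```
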
CSV
Tabular data saved in CSV (comma-separated values) format:
name,surname,group
steve,jobs,A03
michael,phelps,B03
It can be opened in Excel and is used for sending tabular information.

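The same table can be read with Python's standard `csv` module. In this sketch the text is held in a string instead of a file, and `read_rows` is an assumed helper name.

```python
import csv
import io

CSV_TEXT = "name,surname,group\nsteve,jobs,A03\nmichael,phelps,B03\n"


def read_rows(csv_text):
    """Return one dict per data row, keyed by the names in the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


for row in read_rows(CSV_TEXT):
    print(row["name"], row["surname"], row["group"])
```
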
Have questions?
What were these texts? They were document formats.
Who uses them? Developers use them to send data between different applications.
Can we use another format, or create a format ourselves? Yes, but these are standard formats, so everyone knows them, and there are tons of libraries that work with them.

HTML
HyperText Markup Language is a markup language that describes how elements are placed on a web page.
<html>
  <body>
    <h1>Header</h1>
  </body>
</html>

HTML
<p>Paragraph</p>
<h1>Header</h1>
<img src="1.jpg"/>
<ul><li>Item</li><li>Item</li><li>Item</li></ul>
<a href="1.html">Link to item</a>

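Programs read HTML by walking through its tags. As a small sketch, Python's standard `html.parser.HTMLParser` can collect the targets of the links in a page; the class `LinkCollector` and the function `collect_links` are names invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Remember the href value of every <a> tag that is fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def collect_links(html_text):
    """Return the list of link targets found in the HTML text."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


page = '<p>Paragraph</p><h1>Header</h1><a href="1.html">Link to item</a>'
print(collect_links(page))
```
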
Browsers
A web browser retrieves data (mostly HTML code) from a server and displays it on the screen.
Nowadays browsers are free, but in the past people had to buy them.

History of browsers
1990 - WorldWideWeb browser (later renamed Nexus)
1993 - Mosaic (some of its developers later created Netscape)
1995 - Internet Explorer, as an answer to Netscape
1996 - Opera
2003 - Apple's Safari
2004 - Firefox 1.0, built on code that originated at Netscape
2008 - Google's Chrome

Usage of browsers