ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3
Introduction
Examples of real-world data entering the computer through an input device:
Real-world data    Input device      Computer data
“Dear Mom: …”      Keyboard          …
[picture]          Digital camera    …
Format must be appropriate
The internal representation must be appropriate for the type of processing to take place (e.g., text, images, sound).
Rules/Conventions
Proprietary formats
  – Unique to a product or company
  – E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
Standards
  – Evolve in two ways:
      Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
      A committee is struck to solve a problem (e.g., the Moving Picture Experts Group, MPEG)
Standards Organizations
ISO – International Organization for Standardization
CSA – Canadian Standards Association
ANSI – American National Standards Institute
IEEE – Institute of Electrical and Electronics Engineers
Etc.
Examples of Standards
Type of data              Standards
Alphanumeric              ASCII, EBCDIC, Unicode
Image                     JPEG, GIF, PCX, TIFF
Motion picture            MPEG-2, QuickTime
Sound                     Sound Blaster, WAV, AU
Outline graphics/fonts    PostScript, TrueType, PDF
Why Standards?
Standards are “arbitrary”
They exist because they are
  – Convenient
  – Efficient
  – Flexible
  – Appropriate
  – Etc.
Alphanumeric Data
Problem: distinguishing between the number 123 (one hundred and twenty-three) and the characters “123” (one, two, three)
Four standards for representing letters (alpha) and numbers:
  – BCD – Binary-Coded Decimal
  – ASCII – American Standard Code for Information Interchange
  – EBCDIC – Extended Binary-Coded Decimal Interchange Code
  – Unicode
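The difference between the number 123 and the characters “123” can be seen in a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example.

```python
# The number 123 is one quantity; the text "123" is three separate character codes.
number = 123
text = "123"

number_bits = format(number, "b")                 # binary form of the value 123
char_codes = [ord(ch) for ch in text]             # ASCII code of each character
char_hex = " ".join(f"{c:02X}" for c in char_codes)  # the same codes in hexadecimal

print(number_bits)  # 1111011
print(char_codes)   # [49, 50, 51]
print(char_hex)     # 31 32 33
```

The value 123 fits in a single byte, while the characters “123” occupy three bytes, one per character.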
Standard Alphanumeric Formats: BCD, ASCII, EBCDIC, Unicode
Binary-Coded Decimal (BCD)
Four bits per digit
Digit    Bit pattern
0        0000
1        0001
2        0010
3        0011
4        0100
5        0101
6        0110
7        0111
8        1000
9        1001
Note: the following bit patterns are not used: 1010, 1011, 1100, 1101, 1110, 1111
Example = ? (in BCD)
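A BCD conversion of this kind can be sketched in Python as follows. The function names `to_bcd` and `from_bcd` and the sample value 1011 are chosen only for illustration; they are not taken from the slides.

```python
def to_bcd(n: int) -> str:
    """Return the BCD bit pattern of a non-negative integer, four bits per decimal digit."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(n))


def from_bcd(bits: str) -> int:
    """Decode space-separated four-bit groups, rejecting the six unused patterns."""
    digits = []
    for group in bits.split():
        value = int(group, 2)
        if value > 9:
            raise ValueError(f"{group} is not a valid BCD digit")
        digits.append(str(value))
    return int("".join(digits))


print(to_bcd(1011))           # 0001 0000 0001 0001
print(from_bcd("1001 0101"))  # 95
```

Note that BCD encodes each decimal digit separately, so the result differs from the ordinary binary form of the same number.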
The Problem
Representing text strings, such as “Hello, world”, in a computer
Codes and Characters
Each character is coded as a byte
The most common coding system is ASCII (pronounced “ass-key”)
ASCII = American National Standard Code for Information Interchange
Defined in ANSI document X
ASCII Features
7-bit code
8th bit is unused (or used for a parity bit)
2^7 = 128 codes
Two general types of codes:
  – 95 are “graphic” codes (displayable on a console)
  – 33 are “control” codes (control features of the console or communications channel)
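The counts above, and the use of the 8th bit for parity, can be checked with a short Python sketch. Even parity is assumed here purely for illustration (the slide does not say which parity convention is meant), and `with_even_parity` is a hypothetical helper name.

```python
# Split the 128 ASCII codes into the two general types.
control = [c for c in range(128) if c < 0x20 or c == 0x7F]   # 00-1F plus DEL (7F)
graphic = [c for c in range(128) if 0x20 <= c <= 0x7E]       # space through tilde


def with_even_parity(code: int) -> int:
    """Set the 8th bit (bit 7) when needed so the byte has an even number of 1 bits.

    Assumption: even parity; odd parity would invert the test.
    """
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit ASCII code")
    ones = bin(code).count("1")
    return code | 0x80 if ones % 2 else code


print(len(graphic), len(control))                  # 95 33
print(format(with_even_parity(ord("a")), "08b"))   # 11100001
print(format(with_even_parity(ord("A")), "08b"))   # 01000001
```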
ASCII Chart
[Figure: the ASCII code chart, shown repeatedly with different parts highlighted: the most significant and least significant bits; an example code (e.g., ‘a’); the 95 graphic codes; the 33 control codes; the alphabetic codes; the numeric codes; punctuation, etc.]
“Hello, world” Example
Character    Hexadecimal    Decimal    Binary
H            48             72         01001000
e            65             101        01100101
l            6C             108        01101100
l            6C             108        01101100
o            6F             111        01101111
,            2C             44         00101100
(space)      20             32         00100000
w            77             119        01110111
o            6F             111        01101111
r            72             114        01110010
l            6C             108        01101100
d            64             100        01100100
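The three views of “Hello, world” (hexadecimal, decimal, binary) can be reproduced with a short Python sketch; the variable names are illustrative only.

```python
text = "Hello, world"
codes = text.encode("ascii")   # one byte per character

hex_view = " ".join(f"{b:02X}" for b in codes)
dec_view = " ".join(str(b) for b in codes)
bin_view = " ".join(f"{b:08b}" for b in codes)

print(hex_view)  # 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64
print(dec_view)  # 72 101 108 108 111 44 32 119 111 114 108 100
print(bin_view)
```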
Common Control Codes
Code    Hexadecimal code    Meaning
CR      0D                  carriage return
LF      0A                  line feed
HT      09                  horizontal tab
DEL     7F                  delete
NUL     00                  null
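As a rough illustration, these control codes correspond to familiar escape characters in Python strings. The dictionary name `CONTROL_CODES` and the sample line are invented for this sketch.

```python
# Hexadecimal values of the common control codes listed above.
CONTROL_CODES = {"CR": 0x0D, "LF": 0x0A, "HT": 0x09, "DEL": 0x7F, "NUL": 0x00}

# Python's string escapes map onto the same codes.
escapes = {"CR": "\r", "LF": "\n", "HT": "\t", "DEL": "\x7f", "NUL": "\0"}
for name, ch in escapes.items():
    print(f"{name}: {ord(ch):02X}")

# A line of text containing a tab and ending in CR LF, shown as hexadecimal codes.
line = "a\tb\r\n"
line_hex = [f"{ord(ch):02X}" for ch in line]
print(line_hex)  # ['61', '09', '62', '0D', '0A']
```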
Terminology
Learn the names of the special symbols
  – [ ] brackets
  – { } braces
  – ( ) parentheses
  – @ ‘at’ sign
  – & ampersand
  – ~ tilde
Escape Sequences
Extend the capability of the ASCII code set
Used for controlling terminals and formatting output
Defined by ANSI in documents X and X
The escape code is ESC = 1B₁₆
An escape sequence begins with two codes: ESC [ = 1B₁₆ 5B₁₆
Examples
Erase display:    ESC [ 2 J
Erase line:       ESC [ K
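The two examples can be built up byte by byte in Python. This is a sketch only: the constant and function names are invented for illustration, and the spaces shown on the slide are for readability, so the actual sequences contain no spaces. Whether a given terminal honours the sequences depends on the terminal.

```python
import sys

ESC = "\x1b"                  # escape code, 1B hex
CSI = ESC + "["               # the two codes that begin a sequence: 1B 5B
ERASE_DISPLAY = CSI + "2J"    # ESC [ 2 J
ERASE_LINE = CSI + "K"        # ESC [ K


def as_hex(sequence: str) -> str:
    """Show the ASCII codes of a sequence in hexadecimal."""
    return " ".join(f"{b:02X}" for b in sequence.encode("ascii"))


def erase_display() -> None:
    """Send the erase-display sequence to the terminal (not called in this sketch)."""
    sys.stdout.write(ERASE_DISPLAY)
    sys.stdout.flush()


print(as_hex(ERASE_DISPLAY))  # 1B 5B 32 4A
print(as_hex(ERASE_LINE))     # 1B 5B 4B
```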
EBCDIC
Extended BCD Interchange Code (pronounced ebb’-se-dick)
8-bit code
Developed by IBM
Rarely used today (IBM mainframes only)
Unicode
16-bit standard (as originally designed; later versions extend beyond 16 bits)
Developed by a consortium
Intended to supersede older 7- and 8-bit codes
Unicode Version
Improves on version 2.0
Includes the Euro sign (20AC₁₆ = €)
From the standard: “…contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.”
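The Euro sign makes a convenient test case. The sketch below, in Python, looks up the character at code 20AC (hex) and shows that it fits in one 16-bit unit but has no 7-bit ASCII code. The use of the UTF-16 encoding here is an illustrative choice, not something stated on the slides.

```python
import unicodedata

euro = chr(0x20AC)                        # character at code point 20AC hex
euro_name = unicodedata.name(euro)        # official character name
euro_16bit = euro.encode("utf-16-be")     # a single 16-bit unit, big-endian

print(euro_name)                          # EURO SIGN
print(euro_16bit.hex().upper())           # 20AC

# The Euro sign has no ASCII code, so a 7-bit encoding rejects it.
try:
    euro.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
print(fits_in_ascii)                      # False
```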
Keyboard Input
Key (“scan”) codes are converted to ASCII
The ASCII code is sent to the host computer
Received by the host as a “stream” of data
Stored in a buffer
Processed
Etc.
pp. 69
Shift Key
Inhibits (clears) bit 5 in the ASCII code, counting the least significant bit as bit 0
Key(s)       ASCII code    Character
a            1100001       a
Shift + a    1000001       A
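The effect of the Shift key on a letter can be sketched in Python as a single bit operation. The function name `shift` is hypothetical, and the sketch applies to letters only; bits are numbered from 0 at the least significant end.

```python
BIT5 = 1 << 5   # 0100000 binary, 20 hex


def shift(code: int) -> int:
    """Clear bit 5 of a lowercase letter's ASCII code, giving the uppercase letter."""
    return code & ~BIT5


lower = ord("a")
upper = shift(lower)

print(format(lower, "07b"), chr(lower))  # 1100001 a
print(format(upper, "07b"), chr(upper))  # 1000001 A
```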
Control Key
Inhibits (clears) bits 5 and 6 in the ASCII code
Key(s)      ASCII code    Character
c           1100011       c
Ctrl + c    0000011       ETX (control code)
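The Control key mapping can be sketched the same way in Python: clearing bits 5 and 6 leaves only the low five bits of the code. The function name `ctrl` is hypothetical.

```python
LOW_FIVE_BITS = 0b0011111   # mask that clears bits 5 and 6 of a 7-bit code


def ctrl(code: int) -> int:
    """Clear bits 5 and 6 of a letter's ASCII code, giving a control code."""
    return code & LOW_FIVE_BITS


etx = ctrl(ord("c"))
print(format(ord("c"), "07b"))  # 1100011
print(format(etx, "07b"))       # 0000011 (ETX)

# Other familiar control codes fall out of the same rule.
print(f"{ctrl(ord('m')):02X}")  # 0D, carriage return
print(f"{ctrl(ord('i')):02X}")  # 09, horizontal tab
```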
Other Input
OCR – optical character recognition
Bar code readers
Voice/audio input
Punched cards
Images / objects
Pointing devices
OCR
[Figure: page of text (“Hello, world”) → optical scan → … → computer file]
Bar Codes
An automatic identification (Auto ID) technology that streamlines identification and data collection
Voice/audio Input
Input device: microphone
Audio input is “digitized” and stored
Processed in two ways
  – As is (no recognition)
  – Recognized and converted to alphanumeric data (ASCII)
[Figure: audio signal → digitize → …]
Punched Cards
Invented by Herman Hollerith, whose Tabulating Machine Company was a forerunner of IBM
Each standard card holds 80 characters
Images
Typically, images are pictures that are optically scanned and saved as a “bit map” or in some other format
Many formats
  – GIF, JPEG, …
[Figure: a typical “Save As” dialog]
Objects
Images made of geometrically definable shapes
Offer efficiency, flexibility, small size, etc.
Pointing Devices
Originally used for specifying coordinates (x, y) for graphical input
Today used as general-purpose devices for “graphical user interfaces” (GUIs)
Thank you