End-to-End Data
Outline
–Presentation Formatting
–Data Compression
Problem
The sender and receiver must see the same data; the agreed-upon encoding that makes this possible is often called the presentation format. The efficiency of the encoding involves error detection/correction and data compression.
Presentation Formatting
The transformation of network data from the representation used by the application into a form suitable for transmission is called presentation formatting. The sending program encodes data into a message, and the receiving program decodes the message back into data. Encoding is sometimes called argument marshalling, and decoding is called unmarshalling.
Presentation Formatting
Data types we consider
–integers
–floats
–strings
–arrays
–structs
[Figure: application data passes through presentation encoding into messages; presentation decoding turns the messages back into application data.]
Types of data we do not consider
–images
–video
–multimedia documents
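Difficulties
Representation of base types
–floating point: IEEE 754 versus non-standard
–integer: big-endian versus little-endian (e.g., 34,677,374, whose bytes from most to least significant are 2, 17, 34, 126)
Compiler layout of structures (field padding and alignment can differ between compilers)
[Figure: from low address to high address, big-endian stores (2)(17)(34)(126) and little-endian stores (126)(34)(17)(2).]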
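The two byte orders for 34,677,374 can be reproduced with a short C sketch. The helper names put_be32, put_le32 and host_is_little_endian are hypothetical, written only for illustration; the first two place the most or least significant byte at the lowest address, and the third inspects where the host puts the low byte of the value 1.

```c
#include <stdint.h>
#include <string.h>

/* Big-endian: most significant byte at the lowest address. */
void put_be32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Little-endian: least significant byte at the lowest address. */
void put_le32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Returns 1 if this host stores integers little-endian, 0 otherwise:
   copy out the byte at the lowest address of the integer 1. */
int host_is_little_endian(void)
{
    uint32_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

For example, put_be32(34677374, b) fills b with 2, 17, 34, 126 and put_le32 fills it with 126, 34, 17, 2. On POSIX systems, htonl and ntohl from <arpa/inet.h> perform the same host-to-big-endian conversion for network byte order.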
Taxonomy
Data types
–base types (e.g., ints, floats): must convert
–flat types (e.g., structures, arrays): must pack
–complex types (e.g., pointers): must linearize
Conversion strategy
–canonical intermediate form
–receiver-makes-right (an N x N solution: every receiver must be able to convert from every sender's format)
[Figure: a marshaller linearizes an application data structure into a message.]
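Linearizing a complex type can be sketched in C for a singly linked list. The struct node type and the function linearize_list are hypothetical examples; the sketch assumes a canonical intermediate form of 4-byte big-endian integers and replaces the pointers with an element count followed by the values, since a pointer is meaningless in the receiver's address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example of a complex type: a singly linked list. */
struct node {
    int32_t value;
    struct node *next;
};

static void put_be32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Write the element count, then each value, all as 4-byte big-endian
   integers. Returns the number of bytes written, or 0 if the buffer
   (cap bytes) is too small; a valid result is always at least 4. */
size_t linearize_list(const struct node *head, uint8_t *buf, size_t cap)
{
    uint32_t count = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        count++;

    size_t need = 4 + 4 * (size_t)count;
    if (need > cap)
        return 0;

    put_be32(buf, count);
    size_t off = 4;
    for (const struct node *n = head; n != NULL; n = n->next) {
        put_be32(buf + off, (uint32_t)n->value);
        off += 4;
    }
    return off;
}
```

The receiver can rebuild an equivalent list, with its own pointers, by reading the count and allocating that many nodes.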
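Taxonomy (cont)
Tagged versus untagged data
[Figure: a tagged integer, carrying type = INT, len = 4, and value fields.]
Stubs
–compiled
–interpreted
[Figure: a stub compiler reads the interface descriptor (specification) for procedure P and generates the client stub and server stub code. On a call to P, the client stub marshals the arguments into a message sent via RPC; the server stub unmarshals the message back into arguments for P.]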
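An interpreted stub can be sketched as one generic routine that walks a run-time description of a type, in contrast to a compiled stub generated for that type alone. Everything here is hypothetical and for illustration: the field_kind enum, the TAG_INT value, struct field_desc, marshal_interp, and the example struct point. The tagged flag shows the difference between tagged output (a type and length before each value) and untagged output (the value only).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field kinds and tag value, for illustration only. */
enum field_kind { F_INT32, F_END };
#define TAG_INT 1

/* One entry per field: what it is and where it sits in the struct. */
struct field_desc {
    enum field_kind kind;
    size_t offset;
};

/* Interpret the descriptor table. If tagged is nonzero, each item is
   preceded by (type, length) bytes; otherwise only the 4-byte
   big-endian value is written. Returns bytes written to out. */
size_t marshal_interp(const void *obj, const struct field_desc *desc,
                      int tagged, uint8_t *out)
{
    size_t n = 0;
    for (; desc->kind != F_END; desc++) {
        if (desc->kind == F_INT32) {
            int32_t v;
            memcpy(&v, (const uint8_t *)obj + desc->offset, sizeof v);
            uint32_t u = (uint32_t)v;
            if (tagged) {
                out[n++] = TAG_INT;          /* type */
                out[n++] = 4;                /* length */
            }
            out[n++] = (uint8_t)(u >> 24);   /* value, big-endian */
            out[n++] = (uint8_t)(u >> 16);
            out[n++] = (uint8_t)(u >> 8);
            out[n++] = (uint8_t)u;
        }
    }
    return n;
}

/* Descriptor for a hypothetical struct point. */
struct point { int32_t x; int32_t y; };
static const struct field_desc point_desc[] = {
    { F_INT32, offsetof(struct point, x) },
    { F_INT32, offsetof(struct point, y) },
    { F_END, 0 },
};
```

Marshalling a struct point takes 8 bytes untagged and 12 bytes tagged; the tags cost space but let the receiver decode without knowing the type in advance.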
eXternal Data Representation (XDR)
Defined by Sun for use with SunRPC
C type system (without function pointers)
Canonical intermediate form
Untagged (except array length)
Compiled stubs
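#include <rpc/rpc.h>

#define MAXNAME 256
#define MAXLIST 100

struct item {
    int  count;               /* number of valid entries in list */
    char name[MAXNAME];       /* NUL-terminated */
    int  list[MAXLIST];
};

/* Encodes or decodes, depending on xdrs->x_op (not for XDR_FREE:
   the buffers are embedded in the struct, not malloc'ed). */
bool_t xdr_item(XDR *xdrs, struct item *ptr)
{
    char  *name = ptr->name;            /* xdr_string takes a char ** */
    char  *list = (char *)ptr->list;    /* xdr_array takes a char ** */
    u_int  len  = (u_int)ptr->count;    /* and a u_int * for the length */

    return xdr_int(xdrs, &ptr->count)
        && xdr_string(xdrs, &name, MAXNAME - 1)   /* room for the NUL */
        && xdr_array(xdrs, &list, &len, MAXLIST,
                     sizeof(int), (xdrproc_t)xdr_int);
}

[Figure: example XDR encoding of a struct item: Count (3), then Name (length 7, followed by the characters J O H N S O N), then List.]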
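To see the bytes that xdr_item puts on the wire, the same layout can be written out by hand. This is a sketch based on the XDR rules that every item occupies a multiple of 4 bytes, integers are big-endian, a string is a length followed by its bytes zero-padded to a 4-byte boundary, and a variable-length array is an element count followed by the elements. The function xdr_item_bytes is hypothetical and is not part of the SunRPC library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static size_t put_u32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
    return 4;
}

/* Hand-written equivalent of the bytes xdr_item produces when encoding.
   The caller must supply a large enough out buffer. */
size_t xdr_item_bytes(int32_t count, const char *name,
                      const int32_t *list, uint8_t *out)
{
    size_t n = 0;
    size_t len = strlen(name);

    n += put_u32(out + n, (uint32_t)count);     /* xdr_int: count */
    n += put_u32(out + n, (uint32_t)len);       /* xdr_string: length */
    memcpy(out + n, name, len);                 /* ...then the bytes */
    n += len;
    while (n % 4 != 0)                          /* ...zero-padded to 4 */
        out[n++] = 0;
    n += put_u32(out + n, (uint32_t)count);     /* xdr_array: element count */
    for (int32_t i = 0; i < count; i++)         /* ...then each xdr_int */
        n += put_u32(out + n, (uint32_t)list[i]);
    return n;
}
```

With count 3, name "JOHNSON" and three list entries, the encoding is 32 bytes: 4 for the count, 4 for the name length, 8 for the seven characters plus one pad byte, 4 for the array length, and 12 for the elements. Note that the count appears twice, once from xdr_int and once as the array length.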
Abstract Syntax Notation One (ASN.1)
An ISO standard
Essentially the C type system
Canonical intermediate form
Tagged
Compiled or interpreted stubs
BER: Basic Encoding Rules (tag, length, value)
[Figure: BER represents each value as a (type, length, value) triple; a compound value is itself a sequence of such triples.]
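A minimal sketch of BER's (type, length, value) encoding for one case, a 32-bit signed INTEGER: the type byte is 0x02 (universal INTEGER), the length uses the one-byte short form (valid for up to 127 content bytes), and the value is big-endian two's complement with redundant leading sign bytes removed. The function name ber_encode_int is hypothetical, and other ASN.1 types and long-form lengths are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes at most 6 bytes to out; returns the number written. */
size_t ber_encode_int(int32_t v, uint8_t *out)
{
    uint32_t u = (uint32_t)v;
    uint8_t content[4];
    for (int i = 0; i < 4; i++)
        content[i] = (uint8_t)(u >> (24 - 8 * i));

    /* Skip a leading byte if it only repeats the sign of the next byte:
       0x00 before a byte whose top bit is 0, or 0xFF before a byte
       whose top bit is 1. */
    int start = 0;
    while (start < 3 &&
           ((content[start] == 0x00 && (content[start + 1] & 0x80) == 0) ||
            (content[start] == 0xFF && (content[start + 1] & 0x80) != 0)))
        start++;

    size_t n = 0;
    out[n++] = 0x02;                   /* type: INTEGER */
    out[n++] = (uint8_t)(4 - start);   /* length */
    for (int i = start; i < 4; i++)
        out[n++] = content[i];         /* value */
    return n;
}
```

For example, 5 encodes as 02 01 05, and 128 needs a leading zero byte to stay positive, giving 02 02 00 80.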
Network Data Representation (NDR)
Defined by DCE
Essentially the C type system
Receiver-makes-right (architecture tag)
Individual data items untagged
Compiled stubs from IDL
4-byte architecture tag
–IntegerRep: 0 = big-endian, 1 = little-endian
–CharRep: 0 = ASCII, 1 = EBCDIC
–FloatRep: 0 = IEEE, 1 = VAX, 2 = Cray, 3 = IBM
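Receiver-makes-right can be sketched for 32-bit integers. The sketch assumes the receiver has already extracted the sender's IntegerRep value from the architecture tag (parsing the tag itself is not shown) and that the host is plainly big- or little-endian; the names ndr_read_u32 and host_integer_rep are hypothetical. The sender never converts; the receiver swaps bytes only when the two representations differ.

```c
#include <stdint.h>
#include <string.h>

/* IntegerRep values from the architecture tag. */
enum { INTREP_BIG = 0, INTREP_LITTLE = 1 };

/* The IntegerRep this host would advertise in its own tag. */
static int host_integer_rep(void)
{
    uint32_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1 ? INTREP_LITTLE : INTREP_BIG;
}

/* p holds 4 bytes in the sender's native order, described by sender_rep.
   Copy them as-is and byte-swap only if the formats differ. */
uint32_t ndr_read_u32(const uint8_t *p, int sender_rep)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    if (sender_rep != host_integer_rep())
        v = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
            ((v << 8) & 0x00FF0000u) | (v << 24);
    return v;
}
```

When sender and receiver share an architecture, no conversion happens at all, which is the main attraction of this strategy over a canonical form.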
Compression Overview
Encoding and compression
–Huffman codes
Lossless
–data received = data sent
–used for executables, text files, numeric data
Lossy
–data received != data sent
–used for images, video, audio
Huffman Codes
Huffman coding [1952] can be used as a reasonable approximation to the theoretical limit.
1. Write down the symbols and their probabilities: A B C D. These are the terminal nodes.
2. Find and mark the two unmarked nodes with the smallest probabilities. Add a new node with arcs to the two marked nodes.
3. Set the probability of the new node to the sum of the probabilities of the marked nodes.
4. Repeat steps 2 and 3 until every node except one, the root, has been marked.
5. The code for a symbol is found by tracing the path from the root to the symbol, with left = 0 and right = 1.
Huffman Codes
       ()
     0/  \1
    (A)  ()
       0/  \1
      (B)  ()
         0/  \1
        (C)  (D)
Codes: A = 0, B = 10, C = 110, D = 111
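The construction steps can be sketched in C for four symbols. The probabilities used in the usage example (A = 0.5, B = 0.25, C = 0.125, D = 0.125) are hypothetical, since the slide does not give them; they are chosen so that the result matches the tree above. The function name huffman_codes is also hypothetical, and the sketch uses a simple linear scan rather than a priority queue.

```c
#define NSYM 4                  /* symbols A, B, C, D */
#define NNODES (2 * NSYM - 1)   /* leaves plus internal nodes */

/* Writes each symbol's code as a '0'/'1' string into code[i] (at most
   NSYM - 1 bits plus NUL). Ties go to the lower index, and the smaller
   of the two merged nodes becomes the left (0) branch; other choices
   give different but equally short codes. */
void huffman_codes(const double prob[NSYM], char code[NSYM][NSYM])
{
    double p[NNODES];
    int parent[NNODES], marked[NNODES], bit[NNODES];

    for (int i = 0; i < NSYM; i++) {       /* step 1: terminal nodes */
        p[i] = prob[i];
        parent[i] = -1;
        marked[i] = 0;
    }
    for (int n = NSYM; n < NNODES; n++) {  /* step 4: until only the root is unmarked */
        int a = -1, b = -1;                /* step 2: two smallest unmarked nodes */
        for (int i = 0; i < n; i++) {
            if (marked[i])
                continue;
            if (a < 0 || p[i] < p[a]) {
                b = a;
                a = i;
            } else if (b < 0 || p[i] < p[b]) {
                b = i;
            }
        }
        marked[a] = marked[b] = 1;
        bit[a] = 0;                        /* left arc */
        bit[b] = 1;                        /* right arc */
        parent[a] = parent[b] = n;
        p[n] = p[a] + p[b];                /* step 3: sum of the marked nodes */
        parent[n] = -1;
        marked[n] = 0;
    }
    for (int i = 0; i < NSYM; i++) {       /* step 5: path from the root */
        char rev[NSYM];
        int d = 0;
        for (int j = i; parent[j] >= 0; j = parent[j])
            rev[d++] = (char)('0' + bit[j]);    /* collected leaf-to-root */
        for (int k = 0; k < d; k++)
            code[i][k] = rev[d - 1 - k];        /* reversed: root-to-leaf */
        code[i][d] = '\0';
    }
}
```

With the assumed probabilities, C and D merge first, then B with that node, then A with the rest, yielding A = 0, B = 10, C = 110, D = 111.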
Lossless Algorithms
Run Length Encoding (RLE)
–Replace consecutive occurrences of a given symbol with a single copy of the symbol, plus a count of how many times that symbol occurs.
–example: AAABBCDDDD is encoded as 3A2B1C4D
–good for scanned text (8-to-1 compression ratio)
–can increase size for data with variation (e.g., some images)
Differential Pulse Code Modulation (DPCM)
–First output a reference symbol, then, for each symbol in the data, output the difference between that symbol and the reference symbol.
–example: AAABBCDDDD is encoded as A0001123333
–change the reference symbol if the delta becomes too large
–works better than RLE for many digital images (1.5-to-1)
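Both lossless schemes can be sketched over character strings. The function names rle_encode and dpcm_encode are hypothetical. The RLE sketch writes each run as a decimal count followed by the symbol. The DPCM sketch keeps a single reference symbol and assumes every delta lies between 0 and 9 so it fits in one digit; a real encoder would pick a new reference when the delta grows too large, as the slide notes.

```c
#include <stdio.h>
#include <string.h>

/* RLE: each run of a symbol becomes <count><symbol>.
   out needs at most 2 * strlen(in) + 1 bytes. */
void rle_encode(const char *in, char *out)
{
    size_t i = 0, o = 0, n = strlen(in);
    while (i < n) {
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i])
            run++;
        o += (size_t)sprintf(out + o, "%zu%c", run, in[i]);
        i += run;
    }
    out[o] = '\0';
}

/* DPCM: emit the first symbol as the reference, then each symbol's
   difference from it as one digit (assumes 0 <= delta <= 9).
   out needs strlen(in) + 2 bytes. */
void dpcm_encode(const char *in, char *out)
{
    size_t n = strlen(in), o = 0;
    if (n == 0) {
        out[0] = '\0';
        return;
    }
    char ref = in[0];
    out[o++] = ref;
    for (size_t i = 0; i < n; i++)
        out[o++] = (char)('0' + (in[i] - ref));
    out[o] = '\0';
}
```

On AAABBCDDDD, rle_encode produces 3A2B1C4D and dpcm_encode produces A0001123333. The DPCM output is no shorter here, but its small, repetitive digits are what a later stage such as RLE or Huffman coding can exploit.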
Dictionary-Based Methods
Build a dictionary of variable-length strings of common terms
Transmit the index into the dictionary for each term
–For example, replace 'compression' with 9293, since 'compression' is the 9293rd word in /usr/share/dict/words.
Lempel-Ziv (LZ)
–the compress command is the best-known example
–commonly achieves a 2-to-1 ratio on text
Variation of LZ used to compress GIF images
–first reduce 24-bit color to 8-bit color (a 3-to-1 reduction by itself)
–treat common sequences of pixels as terms in the dictionary
–not uncommon to achieve 10-to-1 compression (x 3)
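The Unix compress command and GIF both use LZW (Lempel-Ziv-Welch), a variant of LZ that builds its dictionary on the fly, so the dictionary never has to be transmitted. The sketch below, with the hypothetical name lzw_compress, emits integer codes: 0 to 255 stand for single bytes, and each new code stands for an existing code extended by one byte. It searches the dictionary linearly and ignores variable code widths and dictionary resets, which real implementations handle.

```c
#include <stddef.h>

#define DICT_MAX 4096

/* Compresses n bytes from in into codes in out (at most n codes).
   Returns the number of codes written. */
size_t lzw_compress(const unsigned char *in, size_t n, int *out)
{
    int prefix[DICT_MAX];          /* entry k = (prefix[k], last[k]) */
    unsigned char last[DICT_MAX];
    int next_code = 256;           /* 0..255 are the single bytes */
    size_t nout = 0;
    if (n == 0)
        return 0;

    int w = in[0];                 /* code for the current match */
    for (size_t i = 1; i < n; i++) {
        unsigned char c = in[i];
        int found = -1;
        for (int k = 256; k < next_code; k++)
            if (prefix[k] == w && last[k] == c) {
                found = k;
                break;
            }
        if (found >= 0) {
            w = found;                  /* extend the current match */
        } else {
            out[nout++] = w;            /* emit the longest match */
            if (next_code < DICT_MAX) { /* add match + c as a new term */
                prefix[next_code] = w;
                last[next_code] = c;
                next_code++;
            }
            w = c;
        }
    }
    out[nout++] = w;
    return nout;
}
```

On the 24-byte string TOBEORNOTTOBEORTOBEORNOT, the sketch emits 16 codes: the bytes of TOBEORNOT one at a time, then 256 (TO), 258 (BE), 260 (OR), 265 (TOB), 259 (EO), 261 (RN), and 263 (OT), as repeated phrases start hitting the dictionary.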
Image Compression
JPEG (Joint Photographic Experts Group) is an ISO/IEC group of experts that develops and maintains standards for a suite of compression algorithms for computer image files.
JPEG is also a term for any graphic image file produced by using a JPEG standard.
A JPEG file is created by choosing from a range of compression qualities (actually, from one of a suite of compression algorithms).
Lossy still-image compression
MPEG
The Moving Picture Experts Group (MPEG) develops standards for digital video and digital audio compression.
Lossy compression of video
First approximation: JPEG on each frame
Also remove inter-frame redundancy