Chapter 7 – End-to-End Data. Two main topics: presentation formatting and compression. We will go over the main issues in presentation formatting, but not in much detail. Compression will be covered in more detail, especially JPEG and MPEG.
Presentation Formatting/Encoding: The receiver must be able to extract from the signal the same message that the transmitter sent. Encoding is sometimes called argument marshalling. Marshalling is not trivial, because compilers and application programs have a lot of latitude in how they lay out structures (records). Look over the next high-level graphic.
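[Figure: Presentation formatting. On the sending side, application data passes through presentation encoding to produce messages; on the receiving side, presentation decoding turns the messages back into application data.]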
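To make the point about layout latitude concrete, here is a minimal Python sketch using the standard `struct` module (the slides do not use any particular language). It compares the size of the same hypothetical record, one character followed by one 32-bit integer, when laid out with the platform's native alignment and when packed with no padding in network (big-endian) byte order. The native size is platform-dependent; 8 bytes is common on current machines.

```python
import struct

# A hypothetical record: a 1-byte character followed by a 4-byte integer.
NATIVE = "@ci"   # native byte order, sizes, and alignment (compiler-style layout)
NETWORK = "!ci"  # network (big-endian) byte order, standard sizes, no padding

native_size = struct.calcsize(NATIVE)
network_size = struct.calcsize(NETWORK)
print("native layout:", native_size, "bytes")    # platform-dependent; often 8
print("network layout:", network_size, "bytes")  # always 5

# Marshalled in network order, the record has the same bytes on every machine.
wire = struct.pack(NETWORK, b"A", 1)
print(wire.hex())  # 4100000001
```

The extra bytes in the native layout are padding inserted so the integer starts on an aligned address; a receiver that assumed a different layout would misread the fields.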
Taxonomy: Base types (lowest level): integers, floating point, characters, etc. Flat types: structures and arrays. Complex types (highest level): types requiring pointers.
Examples: Case 1: sending an ordered string of integers (say, financial market data) over the Internet. Breaking this up into a string of bytes and reassembling the data at the other end is straightforward, provided both sides agree on the integer size and byte order.
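Case 1 can be sketched in Python as follows. This assumes the integers fit in 32 bits and that both sides agree on big-endian (network) byte order; the 4-byte count prefix is our own design choice so the receiver knows where the list ends.

```python
import struct

def marshal_ints(values):
    """Encode signed 32-bit integers in network (big-endian) byte order,
    preceded by a 4-byte unsigned count."""
    return struct.pack(f"!I{len(values)}i", len(values), *values)

def unmarshal_ints(data):
    """Inverse of marshal_ints: read the count, then that many integers."""
    (count,) = struct.unpack_from("!I", data, 0)
    return list(struct.unpack_from(f"!{count}i", data, 4))

# Hypothetical market ticks, e.g. prices in cents.
prices = [10150, 10175, 10140, -25]
wire = marshal_ints(prices)
print(len(wire))             # 20: a 4-byte count plus four 4-byte integers
print(unmarshal_ints(wire))  # [10150, 10175, 10140, -25]
```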
Examples (continued): Case 2: sending a database of student records over a network. Students take different numbers of courses, so the records would be of different lengths, although the individual fields would probably be fixed length. Packing and unpacking the data therefore involves some difficulties.
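A minimal sketch of Case 2, assuming a hypothetical record layout: a name in a fixed 20-byte field, a 32-bit student ID, a 16-bit course count, and then that many fixed 8-byte course codes. The count prefix is what lets the receiver unpack records of different lengths from one byte stream.

```python
import struct

NAME_LEN = 20    # hypothetical fixed width of the name field, in bytes
COURSE_LEN = 8   # hypothetical fixed width of each course code, in bytes
HEADER = f"!{NAME_LEN}sIH"  # name, 32-bit student ID, 16-bit course count

def pack_student(name, student_id, courses):
    """Fixed-length fields followed by a count-prefixed list of courses.
    struct pads short strings with NUL bytes and truncates long ones."""
    header = struct.pack(HEADER, name.encode("ascii"), student_id, len(courses))
    body = b"".join(struct.pack(f"!{COURSE_LEN}s", c.encode("ascii")) for c in courses)
    return header + body

def unpack_student(data, offset=0):
    """Decode one record starting at offset; return (record, next_offset)."""
    name, student_id, count = struct.unpack_from(HEADER, data, offset)
    offset += struct.calcsize(HEADER)
    courses = []
    for _ in range(count):
        (code,) = struct.unpack_from(f"!{COURSE_LEN}s", data, offset)
        courses.append(code.rstrip(b"\0").decode("ascii"))
        offset += COURSE_LEN
    return (name.rstrip(b"\0").decode("ascii"), student_id, courses), offset

# Two hypothetical students with different numbers of courses.
db = [("Ada", 1001, ["CS101", "MA201"]), ("Alan", 1002, ["CS101", "CS330", "PH110"])]
stream = b"".join(pack_student(*rec) for rec in db)

decoded, offset = [], 0
while offset < len(stream):
    rec, offset = unpack_student(stream, offset)
    decoded.append(rec)
print(decoded == db)  # True
```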
Examples (continued): Case 3: a hierarchical database stored in a format with pointers needs to be transmitted over the Internet. Packing and unpacking is a large problem: pointers are implemented as memory addresses, and these change from one machine (sender) to another (receiver). Marshalling must serialize a complex, pointer-based implementation of a database, which is quite difficult!
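One common way to handle Case 3 is to replace pointers with structure implied by position. The sketch below uses a hypothetical `Node` type (not from the slides): it flattens a tree in pre-order, writing each node's key and number of children, and the receiver rebuilds the tree with freshly allocated nodes, so no memory address ever crosses the network. The flattened list would still need to be marshalled into bytes with fixed-format fields, as in Cases 1 and 2.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A hypothetical in-memory record whose children are held by reference."""
    key: str
    children: list = field(default_factory=list)

def serialize(node, out=None):
    """Pre-order flattening: each node becomes (key, number_of_children).
    The order of the entries replaces the pointers."""
    if out is None:
        out = []
    out.append((node.key, len(node.children)))
    for child in node.children:
        serialize(child, out)
    return out

def deserialize(items):
    """Rebuild the tree on the receiver with newly allocated nodes."""
    it = iter(items)

    def build():
        key, n = next(it)
        return Node(key, [build() for _ in range(n)])

    return build()

root = Node("University", [Node("Engineering", [Node("CS"), Node("EE")]), Node("Arts")])
flat = serialize(root)
print(flat)  # [('University', 2), ('Engineering', 2), ('CS', 0), ('EE', 0), ('Arts', 0)]
rebuilt = deserialize(flat)
print(rebuilt == root, rebuilt is root)  # True False
```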
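Two Conversion Strategies: The sender converts to a common format, the common format is transmitted, and the receiver decodes from the common format. This seems natural, but… Receiver-makes-right: the sender transmits in its own format and lets the receiver figure it all out. Surprisingly, this is often the better approach; for example, when sender and receiver share the same native format, no conversion is needed at all. See the reasons on page 533.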
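A minimal sketch of receiver-makes-right for a list of 32-bit integers, with a hypothetical one-byte tag (`L` or `B`) recording the sender's native byte order. The sender never converts; the receiver reads the values in the sender's order, so bytes are swapped only when the two machines differ. This handles byte order only, not other representation differences such as floating-point formats.

```python
import struct
import sys

def send(values):
    """Sender does no conversion: it tags the message with its native byte
    order and packs the 32-bit integers in that order."""
    tag = b"L" if sys.byteorder == "little" else b"B"
    return tag + struct.pack(f"={len(values)}i", *values)

def receive(message):
    """Receiver interprets the payload in the sender's byte order; the bytes
    are swapped only if that order differs from the receiver's own."""
    order = "<" if message[:1] == b"L" else ">"
    count = (len(message) - 1) // 4
    return list(struct.unpack(f"{order}{count}i", message[1:]))

print(receive(send([1, -2, 3])))                 # [1, -2, 3]
# Simulate a message from a big-endian sender:
print(receive(b"B" + struct.pack(">2i", 7, 8)))  # [7, 8]
```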
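[Figure: Stub compiler. A stub compiler takes the interface descriptor (specification) for procedure P and generates code for the client stub and the server stub. A call to P passes its arguments to the client stub, which sends the marshalled arguments in a message over RPC; the server stub recovers the arguments and calls P.]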
Data Compression: Blue.bmp = 293 KB, while Blue.jpeg = 4 KB. The image contains little information: essentially the length and width of each area and the color of each area.
Data Compression – Why: Bandwidth is a scarce resource, and someone still has to pay for it. It is often important to compress the data at the sender, transmit the compressed form, and then decompress it at the receiver. The .bmp format is fine for application programs like "Paint", but it is much better to transmit using the .jpeg file format.
Two Classes of Compression: Lossless: the data recovered from the compression/decompression process is identical to the original. Lossy: some information may be removed by the compression/decompression process.
Why Not Always Lossless? Lossy algorithms typically achieve compression about an order of magnitude (10x) better than lossless algorithms, and 10x makes a big difference in the amount of data that must be transmitted. Still images, video, and audio are all intended for human eyes or ears, which can tolerate errors and imperfections because the brain compensates.
Lossless Algorithms: Run-length encoding (RLE) replaces consecutive occurrences of a symbol with a single copy of the symbol and the number of times it occurs (example: AAACCCC becomes 3A4C). Differential Pulse Code Modulation (DPCM) records differences from a base symbol. Dictionary-based methods replace each string with its index in a dictionary.
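The run-length and dictionary-based ideas can be sketched in Python. The RLE functions reproduce the slide's AAACCCC → 3A4C format and assume the symbols are not digits. The slides do not name a specific dictionary-based algorithm; this sketch uses LZW as one well-known example, with an initial dictionary of the 256 single-byte characters (an assumption of this sketch).

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of a symbol with the run length followed by the symbol."""
    return "".join(f"{len(list(run))}{sym}" for sym, run in groupby(text))

def rle_decode(code):
    """Inverse of rle_encode; assumes the symbols themselves are not digits."""
    return "".join(sym * int(n) for n, sym in re.findall(r"(\d+)(\D)", code))

def lzw_encode(text):
    """LZW: emit the dictionary index of the longest string already in the
    dictionary, then add that string plus the next character as a new entry."""
    dictionary = {chr(i): i for i in range(256)}
    current, out = "", []
    for ch in text:
        if current + ch in dictionary:
            current += ch
        else:
            out.append(dictionary[current])
            dictionary[current + ch] = len(dictionary)
            current = ch
    if current:
        out.append(dictionary[current])
    return out

def lzw_decode(codes):
    """Rebuild the same dictionary on the fly while decoding."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # A code not yet in the dictionary can only be prev + prev[0].
        entry = dictionary[code] if code in dictionary else prev + prev[0]
        out.append(entry)
        dictionary[len(dictionary)] = prev + entry[0]
        prev = entry
    return "".join(out)

print(rle_encode("AAACCCC"))  # 3A4C
print(rle_decode("3A4C"))     # AAACCCC
codes = lzw_encode("TOBEORNOTTOBEORTOBEORNOT")
print(lzw_decode(codes) == "TOBEORNOTTOBEORTOBEORNOT")  # True
```

RLE only pays off when runs are long: on text like ABC it produces 1A1B1C, which is larger than the input.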
Lossless Example – Differential Encoding: The basic idea is to encode changes; the concept is also used in some lossy algorithms. There is no need to store all the information in each of the following pictures: for the last two, just store the changes, which are much smaller. Frame 1: A B C. Frame 2: A B C D E F. Then just store "D E F" for Frame 2 and "add" it to Frame 1 to restore Frame 2.
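A Python sketch of differential encoding. It uses delta encoding, where each value is stored as its difference from the previous one; this is a close variant of the differences-from-a-base-symbol DPCM described earlier. The frame functions apply the same idea to two equal-sized frames represented, hypothetically, as flat lists of pixel values.

```python
from itertools import accumulate

def delta_encode(samples):
    """Keep the first value, then store each value minus the one before it."""
    return samples[:1] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """A running sum undoes delta_encode exactly, so the scheme is lossless."""
    return list(accumulate(deltas))

def frame_diff(prev_frame, frame):
    """Per-pixel change between two equal-sized frames (mostly zeros when
    little moves, which a later stage such as RLE compresses well)."""
    return [b - a for a, b in zip(prev_frame, frame)]

def apply_diff(prev_frame, diff):
    """Restore the new frame by adding the stored changes to the previous one."""
    return [a + d for a, d in zip(prev_frame, diff)]

samples = [100, 102, 103, 103, 101]
print(delta_encode(samples))                           # [100, 2, 1, 0, -2]
print(delta_decode(delta_encode(samples)) == samples)  # True

frame1 = [10, 10, 10, 10, 10, 10]
frame2 = [10, 10, 10, 10, 50, 50]
diff = frame_diff(frame1, frame2)
print(diff)                                 # [0, 0, 0, 0, 40, 40]
print(apply_diff(frame1, diff) == frame2)   # True
```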
Image Compression (JPEG): JPEG = Joint Photographic Experts Group. JPEG is more than a compression algorithm; it also defines the format for image or video data. JPEG compression takes place in stages. First, an aside: Fourier transforms and filtering.
Fourier Transform: Consider the following graph. It is a weighted sum of 5 sine waves, but the coefficients of the higher-frequency terms are very small, so the entire figure can be approximated well by the low-order terms.
1st Order Approximation: Using only the first term is not a good approximation.
2nd Order Approximation: The first two terms give a better approximation.
4th Order Approximation: Skipping ahead to 4 terms, the approximation is excellent, almost exact.
5th Order Approximation Would Be Exact: Since the original function is a weighted sum of the first 5 sine terms sin(kt), namely f(t) = 10 sin(t) + 5 sin(2t) + 2 sin(3t) + sin(4t) + 0.5 sin(5t), the information that uniquely represents the function is the set of coefficients [10; 5; 2; 1; 0.5]. As we saw, we could drop the 0.5 coefficient and retain most of the shape of the curve, so the information loss would be very slight. This is a simple example of lossy compression.
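The approximation slides can be checked numerically. This sketch evaluates partial sums of f(t) using the coefficients above and reports the largest error over one period for each order; the 1000-point sampling grid is an arbitrary choice of this sketch.

```python
import math

COEFFS = [10, 5, 2, 1, 0.5]  # weights of sin(t), sin(2t), ..., sin(5t)

def partial_sum(t, order):
    """Approximate f(t) using only the first `order` sine terms."""
    return sum(a * math.sin(k * t) for k, a in enumerate(COEFFS[:order], start=1))

def max_error(order, samples=1000):
    """Largest absolute error over one period [0, 2*pi) versus the full 5-term sum."""
    ts = [2 * math.pi * i / samples for i in range(samples)]
    return max(abs(partial_sum(t, 5) - partial_sum(t, order)) for t in ts)

for order in range(1, 6):
    print(order, round(max_error(order), 3))
```

The 4th-order error is at most 0.5, the magnitude of the dropped term, which is why the 4th-order curve looks almost exact.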
Fourier vs. DCT: The Discrete Cosine Transform (DCT) is very similar to the Fourier Transform, but it represents the data using only cosine terms (see pages and note that we are using a 2-dimensional transform).
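For reference, here is a direct (deliberately slow, O(N^4)) Python sketch of the 2-dimensional DCT-II on an 8 × 8 block, the block size JPEG uses, in the orthonormal form with scale factor 1/√2 for the zero-frequency index. Real codecs use fast factorizations instead. A constant block produces a single nonzero coefficient at (0, 0), the analogue of a curve that needs only its first term.

```python
import math

N = 8  # JPEG operates on 8 x 8 blocks

def _c(k):
    """Normalization factor: 1/sqrt(2) for the zero-frequency index, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2(block):
    """Forward 2-D DCT-II of an N x N block; result is indexed [u][v]."""
    return [[(2 / N) * _c(u) * _c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 2-D DCT; recovers the block up to floating-point rounding."""
    return [[(2 / N) * sum(
                _c(u) * _c(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

flat = [[128] * N for _ in range(N)]
print(round(dct2(flat)[0][0], 6))  # 1024.0 (all other coefficients are ~0)

ramp = [[x * 8 + y for y in range(N)] for x in range(N)]
back = idct2(dct2(ramp))
print(max(abs(back[x][y] - ramp[x][y]) for x in range(N) for y in range(N)) < 1e-9)  # True
```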
[Figure: JPEG compression. Source image → DCT → Quantization → Encoding → Compressed image.]
MPEG: A very difficult algorithm! It is like JPEG for a single frame, but it has three basic kinds of frames. Encoding is very difficult and computationally intensive, hence slow, and is often done offline. Decoding is usually the only part done in real time.
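MPEG B frames are predicted from both a preceding and a following reference (I or P) frame, so the encoder must send that following reference frame before the B frames that depend on it. A small sketch of this reordering, assuming a hypothetical frame pattern given as a string of frame types in display order:

```python
def transmission_order(frame_types):
    """Given frame types in display order (e.g. "IBBPBBI"), return the display
    indices in the order the frames are sent and decoded. Each B frame also
    depends on the *next* I or P frame, so that reference frame goes first."""
    order, waiting_b = [], []
    for i, kind in enumerate(frame_types):
        if kind == "B":
            waiting_b.append(i)      # hold until its following reference is sent
        else:                        # "I" or "P": a reference frame
            order.append(i)
            order.extend(waiting_b)  # these B frames can now be decoded
            waiting_b.clear()
    # Simplification: trailing B frames would need a reference from the next
    # group of pictures; here they are just emitted last.
    order.extend(waiting_b)
    return order

print(transmission_order("IBBPBBI"))  # [0, 3, 1, 2, 6, 4, 5]
```

This reordering is one reason MPEG decoding needs buffering: the decoder receives a reference frame before it can display the B frames that come earlier in display order.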
Three Phases: Study the three phases of JPEG. DCT. Quantization: similar to our example of dropping the 5th coefficient and retaining a graph that was very similar to the original. Encoding phase.
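The quantization phase, and the ordering used by the encoding phase, can be sketched as follows. JPEG divides each DCT coefficient by an entry in a quantization table and rounds; the table below is hypothetical (its entries simply grow with frequency), not the example table from the JPEG standard. The `zigzag` function reads the block in JPEG's zigzag order, which groups the many zeros at the end so that the run-length and Huffman coding of the encoding phase can compress them well; those later coding steps are not shown.

```python
def make_quant_table(n=8, base=8, slope=4):
    """Hypothetical quantization table: quanta grow with frequency (u + v), so
    high-frequency coefficients are represented most coarsely."""
    return [[base + slope * (u + v) for v in range(n)] for u in range(n)]

def quantize(coeffs, table):
    """The lossy step: divide each DCT coefficient by its quantum and round."""
    return [[round(c / q) for c, q in zip(row, qrow)] for row, qrow in zip(coeffs, table)]

def dequantize(levels, table):
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return [[lv * q for lv, q in zip(row, qrow)] for row, qrow in zip(levels, table)]

def zigzag(block):
    """Read a square block along anti-diagonals, alternating direction,
    starting at (0, 0): JPEG's zigzag order."""
    n = len(block)
    cells = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [block[r][c] for r, c in cells]

# Hypothetical DCT output: a large DC term and a few small terms.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[7][7] = 1024.0, -31.0, 12.0, 3.0
table = make_quant_table()
levels = quantize(coeffs, table)
print(zigzag(levels)[:6])                # [128, -3, 1, 0, 0, 0]
print(dequantize(levels, table)[0][:2])  # [1024, -36]
```

The small (7, 7) coefficient is rounded to zero and the (0, 1) coefficient comes back as -36 instead of -31: this rounding is exactly where JPEG loses information.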
[Figure: MPEG compression. An input stream of Frames 1 through 7 is encoded into a compressed stream of I, B, and P frames, using forward prediction and bidirectional prediction.]
[Figure: Each 16 × 16 pixel region of a color frame is represented by a 16 × 16 macroblock of the Y component and 8 × 8 macroblocks of the U and V components.]
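The macroblock layout above can be sketched as a function that turns one 16 × 16 region into the six 8 × 8 blocks that are DCT-coded: four luminance (Y) blocks and one each for U and V. Averaging each 2 × 2 group of chroma pixels is an assumption of this sketch; encoders may downsample chroma differently. Six blocks per macroblock corresponds to Block(0) through Block(5) in the MPEG stream format.

```python
def split_macroblock(y, u, v):
    """Turn one 16 x 16 pixel region into six 8 x 8 blocks: four from the
    luminance plane y, then one each from the chroma planes u and v.
    All three inputs are 16 x 16 lists of lists; u and v are subsampled
    here by averaging each 2 x 2 group of pixels (an assumption)."""
    def quadrant(plane, r0, c0):
        return [row[c0:c0 + 8] for row in plane[r0:r0 + 8]]

    def subsample(plane):
        return [[(plane[2 * r][2 * c] + plane[2 * r][2 * c + 1]
                  + plane[2 * r + 1][2 * c] + plane[2 * r + 1][2 * c + 1]) / 4
                 for c in range(8)] for r in range(8)]

    luma = [quadrant(y, 0, 0), quadrant(y, 0, 8), quadrant(y, 8, 0), quadrant(y, 8, 8)]
    return luma + [subsample(u), subsample(v)]

# Hypothetical test region: a luminance ramp and flat chroma.
y = [[r * 16 + c for c in range(16)] for r in range(16)]
u = [[100] * 16 for _ in range(16)]
v = [[50] * 16 for _ in range(16)]
blocks = split_macroblock(y, u, v)
print(len(blocks), [(len(b), len(b[0])) for b in blocks])
# 6 [(8, 8), (8, 8), (8, 8), (8, 8), (8, 8), (8, 8)]
```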
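[Figure: Format of an MPEG compressed stream. A sequence consists of SeqHdr and Group of pictures units, repeated, ending with SeqEndCode. Each group of pictures: GOPHdr followed by Pictures. Each picture: PictureHdr followed by Slices. Each slice: SliceHdr followed by Macroblocks. Each macroblock: MBHdr followed by Block(0) through Block(5).]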