1 COM 360. 2 Chapter 7 End-To-End Data 3 What Do We Do With The Data? From the network’s perspective, applications send messages to one another and these.

1 COM 360

2 Chapter 7 End-To-End Data

3 What Do We Do With The Data? From the network’s perspective, applications send messages to one another, and these messages are strings of bytes. From the application’s perspective, these messages contain different kinds of data: arrays of integers, video frames, lines of text, digital images, etc. These bytes have meaning.

4 Presentation Format For the sender and receiver to see the same data, they must agree on a message format, called a presentation format. Traditional data can be encoded as integers, floating point numbers, character strings, arrays and structures. Formats also exist for multimedia data: video (MPEG format) and still images (JPEG or GIF).

5 Encoding Efficiency There are two opposing forces: –In one direction, you want as much redundancy as possible, so that the receiver is able to extract the right data even if errors are introduced into the message. –In the other direction, you want to remove as much redundancy as possible, so that you may encode the data in as few bits as possible. This is the goal of data compression.

6 Data Manipulation Both presentation format and data compression require the sending and receiving hosts to process every byte of data in the message. For this reason presentation formatting and compression are called data manipulation functions.

7 Presentation Formatting The sending program translates the data it wants to transmit from its internal representation into a message that can be transmitted over the network; that is, the data is encoded in a message. The receiving application translates this arriving message into a representation it can process; that is, the message is decoded. Encoding is called argument marshalling and decoding is called unmarshalling. This terminology is from the RPC world, where the client thinks it is invoking a procedure with a set of arguments, and those arguments are gathered to form a network message.

8 Presentation Encoding [Figure: application data passes through presentation encoding to become a message; the message passes through presentation decoding to become application data again.] Presentation formatting involves encoding and decoding application data.

9 Presentation Formatting Computers represent data in different ways. Even simple integers are represented with different sizes (16 bit, 32 bit, 64 bit, etc.). On some machines, integers are represented in “big-endian” form (the most significant bit is in the byte with the lowest address), while on other machines, integers are represented in “little-endian” form (the most significant bit is in the byte with the highest address). MIPS and PowerPC are traditionally big-endian machines, and the Intel x86 family is a little-endian architecture.

10 Integer Representation Big-endian and little-endian byte order for the integer 34,677,374
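The two byte orders for this integer can be sketched in Python with the built-in int.to_bytes. The variable names are illustrative only.

```python
# The integer from the slide, 34,677,374, is 0x0211227E in hexadecimal.
value = 34_677_374

# Big-endian: most significant byte comes first (lowest address).
big = value.to_bytes(4, byteorder="big")
# Little-endian: least significant byte comes first.
little = value.to_bytes(4, byteorder="little")

print(list(big))     # [2, 17, 34, 126]
print(list(little))  # [126, 34, 17, 2]
```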

11 Presentation Formatting Another reason encoding (or marshalling) is difficult is that application programs are written in different languages or compiled with different compilers. Thus, you cannot simply transmit a structure from one machine to another.

12 Taxonomy Data types –Base types (int, float, char) –Flat types ( structures and arrays) –Complex types ( those using pointers) Conversion Strategy –Canonical intermediate form –Receiver-makes-right Tags –Tagged –Untagged

13 Data Types The first question is what data types the system is going to support. At the lowest level, a marshalling system operates on some set of base types. The encoding process must be able to convert each base type from one representation to another, for example convert an integer from big-endian to little-endian.

14 Data Types At the next level are the flat types: structures and arrays. These complicate encoding since some compilers insert padding between the fields that make up a structure to align the fields on word boundaries. The marshalling system typically packs structures so they contain no padding.
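A small sketch of padding versus packing, using Python's struct module. The record layout (a 1-byte flag followed by a 4-byte integer) is a hypothetical example, and the native size depends on the host platform.

```python
import struct

# "@" lays fields out with the host's native alignment, so padding may be
# inserted between the 1-byte flag and the 4-byte integer (platform dependent).
native_size = struct.calcsize("@bi")
# "!" uses network byte order and standard sizes with no padding.
packed_size = struct.calcsize("!bi")

# Pack the hypothetical record for transmission, then unpack it again.
packed = struct.pack("!bi", 1, 34_677_374)
flag, number = struct.unpack("!bi", packed)

print(native_size, packed_size, len(packed))
```

On many hosts the native size is 8 bytes while the packed form is 5, which is why a marshalling system cannot simply copy a structure out of memory.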

15 Data Types At the highest level, the marshalling system must deal with complex types – those that use pointers. The problem is that the data structure that one program wants to send to another might not be contained in a single structure, but might involve pointers from one structure to another. A tree is an example of this type of structure. The system must serialize (or flatten) complex data structures.
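One way to flatten a tree is a preorder traversal that replaces each pointer with a marker byte. The sketch below assumes a binary tree of 32-bit integers; the Node class and the marker values are hypothetical, not part of any standard format.

```python
import struct

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def flatten(node):
    """Preorder traversal; a 1-byte marker stands in for each pointer."""
    if node is None:
        return b"\x00"                       # marker: no node here
    return (b"\x01" + struct.pack("!i", node.value)
            + flatten(node.left) + flatten(node.right))

def rebuild(data, pos=0):
    """Inverse of flatten; returns (node, position after the node)."""
    if data[pos] == 0:
        return None, pos + 1
    (value,) = struct.unpack_from("!i", data, pos + 1)
    left, pos = rebuild(data, pos + 5)       # skip marker + 4-byte value
    right, pos = rebuild(data, pos)
    return Node(value, left, right), pos

tree = Node(1, Node(2), Node(3, Node(4)))
message = flatten(tree)                      # one contiguous byte string
copy, _ = rebuild(message)
```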

16 Argument Marshalling Converting, packing, and linearizing

17 Argument Marshalling To summarize: Depending on how complicated the type system is, the task of argument marshalling usually involves converting base types, packing the structures and linearizing the complex data structures all to form a contiguous message that can be transmitted over the network.

18 Conversion Strategy What conversion strategy should be used? There are two options: –Canonical intermediate form: settle on an external representation for each type. The sender converts to this before sending. (This is done in the Internet for protocol headers.) –Receiver-makes-right: the sender uses its own format and the receiver is responsible for translating. But every host must be prepared to handle all N architectures (called the N-by-N solution). The more common approach is to use a common external format, but there are other approaches.
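The two strategies can be contrasted in a short sketch. The 1-byte architecture label used for receiver-makes-right is a hypothetical convention chosen for this example.

```python
import struct
import sys

value = 34_677_374

# Canonical intermediate form: every sender converts to one agreed format.
# "!" is network byte order (big-endian) in Python's struct module.
canonical = struct.pack("!I", value)
(decoded,) = struct.unpack("!I", canonical)

# Receiver-makes-right: the sender transmits in its native byte order and
# labels the message with that order (hypothetical 1-byte label).
order = sys.byteorder                        # "little" or "big"
label = b"L" if order == "little" else b"B"
message = label + value.to_bytes(4, order)

# The receiver reads the label and converts from the sender's order.
sender_order = "little" if message[:1] == b"L" else "big"
received = int.from_bytes(message[1:], sender_order)
```

When sender and receiver share an architecture, receiver-makes-right avoids two conversions; the cost is that every receiver must understand every sender's format.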

19 Tags How does the receiver know what kind of data is contained in the message it receives? The two common approaches are to use tagged and untagged data. A tag is any additional information included in a message that helps the receiver to decode it. Examples might include a type tag, a length tag, an architecture tag.

20 Tagged message A 32 bit integer encoded in a tagged message.
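A minimal sketch of a tagged encoding with a type tag and a length tag in front of the value. The tag layout and the type code are assumptions made for illustration and do not follow any particular standard.

```python
import struct

TYPE_INT = 1   # hypothetical type code

def encode_int(value):
    """Encode a 32-bit integer as: type tag, length tag, value."""
    body = struct.pack("!i", value)
    return bytes([TYPE_INT, len(body)]) + body

def decode(message):
    """Use the tags to decide how to interpret the value."""
    type_tag, length = message[0], message[1]
    body = message[2:2 + length]
    if type_tag == TYPE_INT and length == 4:
        return struct.unpack("!i", body)[0]
    raise ValueError("unknown type tag")

message = encode_int(34_677_374)
print(list(message))   # [1, 4, 2, 17, 34, 126]
```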

21 Tags The alternative is NOT to use tags. How then does the receiver know how to decode the data? –It knows because it was programmed to “know”. Calling a remote procedure with arguments (for example, 2 ints and a float) assumes that the message contains 2 integers and a float. This breaks down with variable length arrays, so a length tag is commonly used to indicate how long the array is. An untagged approach means that the presentation formatting is truly end-to-end and an intermediate agent cannot interpret it.

22 Stubs A stub is a piece of code that implements argument marshalling. Stubs are often used to support RPC. On the client side, the stub marshals the procedure arguments into a message that can be transmitted by the RPC protocol. On the server side, the stub converts the message back into a set of variables that can be used as arguments to call the procedure. Each procedure has a “customized” client and server stub. Stubs are usually generated by a stub compiler.
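A hand-written sketch of a client and server stub pair for one procedure that takes 2 ints and a float. The procedure scale_sum and the direct function call that stands in for the RPC protocol are hypothetical; a real stub compiler would generate equivalent code from an interface description.

```python
import struct

ARGS = "!iif"   # untagged: both stubs are programmed to expect 2 ints and a float

def scale_sum(a, b, factor):
    """The remote procedure itself (hypothetical)."""
    return (a + b) * factor

def server_stub(request):
    a, b, factor = struct.unpack(ARGS, request)          # unmarshal arguments
    return struct.pack("!d", scale_sum(a, b, factor))    # marshal the result

def client_stub(a, b, factor, send=server_stub):
    request = struct.pack(ARGS, a, b, factor)            # marshal arguments
    reply = send(request)          # stands in for the RPC protocol
    return struct.unpack("!d", reply)[0]                 # unmarshal the result

result = client_stub(2, 3, 0.5)
print(result)   # 2.5
```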

23 Stub Compiler A stub compiler takes the interface description as input and outputs client and server stubs.

24 Examples Popular network data representations include: –XDR: used with SunRPC –ASN.1: an ISO standard –NDR: used in distributed systems (See text for details)

25 XDR Example Example encoding of a structure in XDR
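A sketch of how a structure might be encoded following XDR's basic rules: integers are 4 bytes in big-endian order, and variable-length strings and arrays carry a 4-byte length, with string data padded to a multiple of 4 bytes. The structure (a count, a name and a list of integers) and its sample values are chosen for illustration, and the helper names are hypothetical.

```python
import struct

def xdr_int(n):
    return struct.pack("!i", n)                  # 4 bytes, big-endian

def xdr_string(s):
    data = s.encode("ascii")
    pad = (4 - len(data) % 4) % 4                # pad to a 4-byte boundary
    return struct.pack("!I", len(data)) + data + b"\x00" * pad

def xdr_int_array(items):
    return struct.pack("!I", len(items)) + b"".join(xdr_int(n) for n in items)

# Sample structure: count, name, list of integers.
message = xdr_int(3) + xdr_string("JOHNSON") + xdr_int_array([497, 8321, 265])
print(len(message))   # 32
```

No type tags appear in the message; the receiver must already know the structure's layout.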

26 Compound types Compound types created by means of nesting in ASN.1/BER

27 ASN.1/BER Representation Representation for a 4 byte integer

28 ASN.1/BER Representation For Length a) 1 byte b) multibyte
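The two length forms can be sketched as follows: lengths below 128 fit in one byte, and larger lengths use a first byte with the high bit set whose low 7 bits give the number of length bytes that follow. The function name is hypothetical, and the integer example assumes a value that needs all 4 bytes.

```python
def ber_length(n):
    """Encode a length in BER: short form below 128, long form otherwise."""
    if n < 128:
        return bytes([n])                        # a) one byte, high bit 0
    size = (n.bit_length() + 7) // 8
    # b) first byte: high bit 1, low 7 bits = count of length bytes that follow
    return bytes([0x80 | size]) + n.to_bytes(size, "big")

INTEGER_TAG = 0x02   # universal tag for INTEGER

# Type, length, value for a 4-byte integer.
value = (34_677_374).to_bytes(4, "big")
element = bytes([INTEGER_TAG]) + ber_length(len(value)) + value

print(ber_length(4).hex(), ber_length(300).hex())   # 04 82012c
```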

29 NDR’s Architecture Tag

30 Presentation Formatting So far, the presentation formatting problem has been seen from the perspective of RPC: how do you encode primitive data types and compound data structures so they can be sent from a client program to a server program? The same problem exists in other settings, such as a web server…

31 Markup Language (XML) How does a Web server describe a webpage so that any number of browsers know what to display on the screen? HTML or HyperText Markup Language is used to indicate which font and type should be used, whether a string should be displayed in bold or italics and where images should be positioned. This is basically text formatting. XML or Extensible Markup Language can be viewed as a data representation standard. Unlike HTML, which only describes how to display data, XML describes the data, whether it is displayed, processed or sent to another application.

32 Markup Language (XML) XML is text based and looks very much like HTML. A structure, such as an employee record, can be saved as an .xml document. XML allows users to specify a nested structure of tag/value pairs. It also allows a user to define a schema, a database term for a specification of how to interpret a collection of data. XML defines a syntax for describing data that applications might share in the Internet. See pp.546-548 or http://www.cs.princeton.edu/schema/employee.xsd
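A sketch of an employee record as nested tag/value pairs, parsed with Python's standard xml.etree.ElementTree module. The element names and values are invented for this example and are not taken from the schema referenced above.

```python
import xml.etree.ElementTree as ET

# Hypothetical employee record; each tag names the value it encloses.
document = """<employee>
  <name>John Doe</name>
  <title>Head Bottle Washer</title>
  <id>123456789</id>
  <hiredate><day>5</day><month>June</month><year>1986</year></hiredate>
</employee>"""

record = ET.fromstring(document)
name = record.findtext("name")
year = int(record.findtext("hiredate/year"))   # nested structure via a path

print(name, year)   # John Doe 1986
```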

33 Data Compression Sometimes programs need to send more data in a timely fashion than the bandwidth of the network supports (for example, a video stream that needs 10 Mbps to transmit on a 1 Mbps network). It’s hard to move data on the Internet at > 1 Mbps. The Internet does not allow applications to use more than their “fair share” of the bandwidth on a congested link. The data must be compressed at the sender and decompressed at the receiver.

34 Data Compression Compression is inseparable from data encoding. You want to encode a piece of data in the fewest bits. This is the principle behind Huffman codes, which use the relative probability that each symbol will occur to assign a number of bits to each symbol in a way that minimizes the total number of bits.
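A compact sketch of Huffman code construction: the two least likely subtrees are merged repeatedly, so frequent symbols end up with short codes. The function name and the sample text are illustrative.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return {symbol: bit string} built from symbol frequencies in text."""
    counts = Counter(text)
    # Heap entries: (weight, tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate case: one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)       # the two least likely subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

text = "AAAAAAAABBBBCCD"                     # A is most frequent, D least
code = huffman_code(text)
encoded = "".join(code[s] for s in text)
print(len(encoded))   # 25 bits, versus 30 with a fixed 2-bit code
```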

35 Data Compression Is it always a good idea to compress data before sending? Can’t the network deliver it in less time? Not necessarily! Compression/decompression algorithms involve time-consuming computations. You need to compare the time to compress/decompress the data to factors such as the host’s processor speed and the network bandwidth. See computation p. 549
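The trade-off can be explored with a simple model. It assumes the data is fully compressed before it is sent, that decompression cost is folded into the compression rate, and that the compression ratio is known in advance; the function names and the sample numbers are hypothetical.

```python
def send_time(size, network_bw):
    """Seconds to send `size` megabits uncompressed at `network_bw` Mbps."""
    return size / network_bw

def send_time_compressed(size, network_bw, compress_bw, ratio):
    """Seconds to compress at `compress_bw` Mbps, then send size/ratio megabits."""
    return size / compress_bw + size / (ratio * network_bw)

size = 80.0   # megabits of data

# Slow network (1 Mbps): compressing first wins.
slow_plain = send_time(size, 1.0)
slow_comp = send_time_compressed(size, 1.0, compress_bw=20.0, ratio=2.0)

# Fast network (100 Mbps): the same compression makes things slower.
fast_plain = send_time(size, 100.0)
fast_comp = send_time_compressed(size, 100.0, compress_bw=20.0, ratio=2.0)

print(slow_plain, slow_comp, fast_plain, fast_comp)
```

Under this model, compression pays off only when the compression rate exceeds ratio / (ratio - 1) times the network bandwidth.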

36 Data Compression Two classes of compression algorithms: Lossless – ensures that the data recovered is the same as the original data. Used to compress executable code, text files, numeric data, etc. Lossy – does not promise this. The algorithm removes data it cannot restore, but which will not be missed by the user. Used to compress still images, audio and video. Lossy algorithms achieve much better compression ratios.

37 Lossless Encoding Algorithms Run Length Encoding (RLE) – simple. It replaces consecutive instances of a symbol with one copy of the symbol and a count of the times it occurred (AAABBCDDDD would be encoded as 3A2B1C4D). Used to compress digital imagery by comparing adjacent pixel values and encoding only the changes. Achieves about 8-to-1 compression ratios. RLE is the key compression algorithm used to transmit faxes.
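A minimal sketch of run length encoding on strings, reproducing the example above. It assumes the symbols themselves are never digits; the function names are illustrative.

```python
from itertools import groupby

def rle_encode(text):
    """Replace each run of a symbol with its count followed by the symbol."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))

def rle_decode(encoded):
    """Expand count/symbol pairs back into the original runs."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch                  # counts may have several digits
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

print(rle_encode("AAABBCDDDD"))   # 3A2B1C4D
```

Note that data without long runs can grow under this scheme: "ABCD" becomes "1A1B1C1D".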

38 Lossless Encoding Algorithms Differential Pulse Code Modulation Dictionary Based Methods

39 Image Compression (JPEG) JPEG, GIF and MPEG are more than compression algorithms; they also define the format for image or video data in the same way as XDR, NDR and ASN.1 define the format for numeric and string data. JPEG – Named after the Joint Photographic Experts Group that designed it. Used for digital imagery (ISO standard). Compression takes place in 3 phases: a discrete cosine transformation (DCT), quantization and encoding.

40 Block Diagram of JPEG Compression

41 Grayscale image Each pixel in the image is represented by an 8-bit value that indicates the brightness of the pixel (where 0 = white and 255 = black). DCT, like a fast Fourier transform (FFT), takes an 8x8 matrix of pixel values as input.
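A naive sketch of the 2-D DCT on one 8x8 block, written for clarity rather than speed. It uses the standard orthonormal DCT-II scaling, which is an assumption about the exact normalization intended here. A block of identical pixels transforms to a single non-zero coefficient in the top-left corner, which is what makes the later quantization and encoding phases effective on smooth image regions.

```python
import math

N = 8

def dct_8x8(block):
    """Naive 2-D DCT of an 8x8 block (a list of 8 rows of 8 pixel values)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * i * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * j * math.pi / (2 * N)))
            out[i][j] = c(i) * c(j) * total / math.sqrt(2 * N)
    return out

flat = [[100] * N for _ in range(N)]     # a uniform gray block
coeffs = dct_8x8(flat)
print(round(coeffs[0][0], 6))            # 800.0; every other coefficient is ~0
```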

42 Color images In the case of color there are many values for each pixel. One representation is RGB, which represents each pixel with 3 color components: red, green, blue. This matches the human visual system, better than other representations such as YUV. JPEG allows some control over how much compression is done versus fidelity to the image and is able to compress 24-bit color images.

43 Video Compression (MPEG) MPEG – Named after the Moving Picture Experts Group that defined it.

44 MPEG Compression Sequence of I, P, B frames generated by MPEG

45 MPEG Frame Types

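46 MPEG Video Stream [Figure: nested format of an MPEG video stream. Stream: SeqHdr, Group of pictures, SeqHdr, Group of pictures, ..., SeqEndCode. Group of pictures: GOPHdr, Picture, ... Picture: PictureHdr, Slice, ... Slice: SliceHdr, Macroblock, ... Macroblock: MBHdr, Block(0), Block(1), Block(2), Block(3), Block(4), Block(5).]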

47 Audio Compression (MP3) MPEG also defines a standard for compressing audio, and can be used to compress the audio portion of a movie. MPEG defines 3 layers of compression, and Layer III, commonly known as MP3, is the most widely used.

48 Summary Unlike earlier protocols, which you can think of as processing messages, these transformations process data: Presentation formatting – formatting the different types of data. Compression – reducing the bandwidth required to transmit different types of data. –Lossless for executables and text files –Lossy for images, video and audio (JPEG, MPEG and MP3)
