ebXML Reliability Messaging: A Proof of Concept Implementation
Fujitsu, Savvion
Author: Jacques Durand
Copyright Fujitsu & Savvion © 2000


Reliability: Non-Failure Case
[Diagram: Buyer Business Process (BP) and Buyer Message Service Handler (MSH) on the Buyer Host; Seller MSH and Seller BP on the Seller Host. The Purchase Order (PO) payload is carried in a message of type Normal with delivery semantics OnceAndOnlyOnce; a message of type Ack is also shown.]

Reliability Use Case A: “Normal message” failure
[Diagram: Buyer BP, Buyer MSH, Seller MSH, Seller BP. Labels: PO, timeout.]

Reliability Use Case B: Accidental message re-send
[Diagram: Buyer BP, Buyer MSH, Seller MSH, Seller BP. Labels: PO, duplicate, "Does not transmit".]

Reliability Use Case C: Acknowledgement failure
[Diagram: Buyer BP, Buyer MSH, Seller MSH, Seller BP. Labels: PO, timeout, resend, "Does not transmit (duplicate)".]

Reliability Messaging: A Service Architecture
[Diagram: a Business Process on top of a Message Service Handler (MSH). The MSH uses transports (HTTP, SMTP) and a Reliability Service (RS), reached through an RS interface and a callback interface.]
For each two-way connection, the Reliability Service holds:
1 sender RS-data set: sent messages, sender counter data.
1 receiver RS-data set: [received messages?], receiver counter data.

Reliability Service: connection-stateful
[Diagram: Business Processes A, B and C, each on its own Message Service Handler (MSH) with an HTTP transport and a Reliability Service (RS).]
One RS-enabled MSH maintains 2 sender RS-data sets and 2 receiver RS-data sets.
Another RS-enabled MSH maintains 1 sender RS-data set and 1 receiver RS-data set.

Reliability Service Functions
Before sending a Reliable Normal Message (sender side):
  generate a new Seq Nbr
  store a copy of the message
  update the msg with a modified header (and return it to the sender MSH)
After receiving a Reliable Normal Message (receiver side):
  tell the receiver MSH to send back an Ack msg
  check the Seq Nbr: Duplicate? Ignore the msg. OK (new)? Pass it to the receiver MSH.
After receiving an Ack Message (sender side):
  remove the acknowledged msg from the sender store
  cancel the timeout countdown
When a Msg TimeOut occurs:
  get the message out of the store
  call back the sender MSH with the message to resend
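The sender-side functions listed above can be sketched in Python as follows. This is only an illustration: the class name, the method names, the dictionary used as a message and the explicit on_timeout call (standing in for a real timer) are assumptions, not part of the slides.

```python
class SenderReliabilityService:
    """Sender-side half of a Reliability Service (hypothetical names)."""

    def __init__(self, resend_callback):
        self.next_seq = 1                 # sender counter data
        self.store = {}                   # seq nbr -> copy of an unacknowledged message
        self.resend_callback = resend_callback  # callback into the sender MSH

    def before_send(self, payload):
        """Generate a new Seq Nbr, store a copy, return the message with its header set."""
        seq = self.next_seq
        self.next_seq += 1
        message = {"type": "Normal", "seq": seq, "payload": payload}
        self.store[seq] = message
        return message

    def on_ack(self, seq):
        """Remove the acknowledged message; a real service would also cancel its timer."""
        self.store.pop(seq, None)

    def on_timeout(self, seq):
        """Hand the stored message back to the sender MSH if it is still unacknowledged."""
        message = self.store.get(seq)
        if message is not None:
            self.resend_callback(message)
```

A timeout that fires after the Ack has arrived finds nothing in the store and therefore triggers no resend.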

Message sequence scenarios
[Diagram: three sender/receiver timelines showing msg and ack exchanges.]
Sender blocking on Acks.
Sender not blocking on Acks: the good and the bad.

Problem: The original Sequence Number Method was designed for purely sequential exchanges of messages (the sender waits for an acknowledgement from the receiver before sending the next message). It can detect message duplicates only in this case. How can it be adapted to enforce “Once And Only Once” when sequencing is no longer guaranteed?
The enhanced solution proposed here, the Non-ordered Sequence Numbers Method, allows quick detection of message duplicates across an arbitrarily long sequence of messages, without maintaining a database of billions of message IDs on the receiver side.

Problem: how to detect duplicates when sequencing is no longer guaranteed?
Solution: the receiver maintains:
(a) a max Seq Nbr received (MSNR), the highest sequence counter value received so far (e.g. MSNR = 4 at time t_i);
(b) a list (union) of past sequence number intervals (SN-intervals). Each interval represents a set of received messages with contiguous sequence numbers, no matter in which order they were received.
Example (MSNR = 4) [Diagram: sender/receiver timeline]:
  List of SN-intervals at time t_i: [1:2] [4]
  When receiving a message with seq nbr = 3, the new list of SN-intervals is: [1:4]
Simple version of the algorithm. When receiving a message M:
  if (counter(M) > MSNR) then
    MSNR = counter(M)  // it is not a duplicate; merge counter(M) with the last SN-interval [or create a new one if needed]
  else if counter(M) belongs to any SN-interval then
    it is a duplicate, ignore it
  else
    new union of SN-intervals = current union + counter(M)
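A minimal Python sketch of this simple version, without wrap-around handling. The class and method names are assumptions, and the linear scan over intervals is chosen for readability rather than speed.

```python
class SimpleReceiver:
    """Duplicate detection with an MSNR and a list of SN-intervals (hypothetical names)."""

    def __init__(self):
        self.msnr = 0        # max Seq Nbr received so far
        self.intervals = []  # sorted, disjoint, non-adjacent (lo, hi) pairs

    def _contains(self, n):
        return any(lo <= n <= hi for lo, hi in self.intervals)

    def _add(self, n):
        """Insert n, absorbing every interval that touches it."""
        lo, hi, kept = n, n, []
        for a, b in self.intervals:
            if b + 1 < lo or a - 1 > hi:   # neither overlapping nor adjacent
                kept.append((a, b))
            else:
                lo, hi = min(lo, a), max(hi, b)
        self.intervals = sorted(kept + [(lo, hi)])

    def receive(self, n):
        """Return True if n is a new message, False if it is a duplicate."""
        if n > self.msnr:
            self.msnr = n
        elif self._contains(n):
            return False
        self._add(n)
        return True


r = SimpleReceiver()
for n in (1, 2, 4):
    r.receive(n)
print(r.intervals)   # [(1, 2), (4, 4)]
r.receive(3)
print(r.intervals)   # [(1, 4)]
```

The two printed lists reproduce the slide's example: [1:2] [4] becomes [1:4] once sequence number 3 arrives.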

Handling of Wrap-around Condition
When wrapping around the Sequence Number Counter, the sender regenerates sequence numbers from 1.
[Diagram: the sequence number range from 1 to 999,999,999, showing the MSNR (current max seq nbr received), the most recent SN-intervals and the most ancient SN-intervals.]
Problem: how to detect a counter wrap-around on the receiver side, since the maxSeqNbr (e.g. 999,999,999) may never be received (or may arrive too late) due to message failure or delay? The same applies to the first SeqNbr.
Also, the receiver could get the message with SeqNbr=1 before receiving SeqNbr=999,999,999. How can it know whether this is a “new” message or a duplicate of a (very) “old” one? SeqNbr=1 could be delayed too, so the receiver gets SeqNbr=2 or 3 instead.
We could rely on a special “wrap-around” notification message to the receiver, but:
This message could be lost or delayed too, thus paralyzing the sender.
Even if the sender blocks on the Ack for such a notification message, this could hurt overall performance, as normal messages would not be sent during this time.

Solution: use a quantitative (yet safe) method. We define:
A maximum contiguous loss (MCL), which represents the maximum number of contiguous messages that can be “missing” (i.e. delayed or lost) at any time on the receiver side. For example, MCL = 1000 assumes that no more than 1000 contiguous messages are permitted to be “missing” at any time on the receiver side, between two received messages. If this limit were about to be exceeded, the receiver would notify an Error to the sender, so that appropriate action can be taken, e.g. stop sending until the communication problem is solved. In other words, given an MSNR, the next MSNR value is not expected to jump beyond MSNR+MCL. Note that the MCL is also the maximum distance (in number of messages) between the upper limit of an SN-interval and the lower limit of the next one.
A finish watermark (FW), which is the highest SeqNbr value (maxSeqNbr) minus the MCL (e.g. SeqNbr 999,999,999 minus 1000). We say we are in a finish condition if the MSNR is past the finish watermark.
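The two definitions can be checked with a few lines of Python, using the example values from the slide. The function names are assumptions, "past the finish watermark" is read here as strictly greater than FW, and wrap-around is ignored in this fragment.

```python
MAX_SEQ_NBR = 999_999_999             # highest SeqNbr value (example from the slide)
MCL = 1000                            # maximum contiguous loss (example from the slide)
FINISH_WATERMARK = MAX_SEQ_NBR - MCL  # FW = maxSeqNbr - MCL = 999,998,999


def in_finish_condition(msnr):
    """True once the MSNR is past the finish watermark."""
    return msnr > FINISH_WATERMARK


def exceeds_mcl(msnr, counter):
    """True if accepting `counter` would make the MSNR jump beyond MSNR + MCL."""
    return counter > msnr + MCL
```

With these values, a receiver whose MSNR is 999,999,000 is in a finish condition, and a jump from MSNR = 10 to counter 1011 would exceed the MCL while a jump to 1010 would not.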

Note 1: With the MCL, we can assume that before the counter wraps around, the receiver has received at least one message beyond the FW mark. Extra care may be taken near the finish watermark to make sure the counter does not wrap around unexpectedly in exceptional situations.
Note 2: The case where MCL = 0 is the traditional Sequence Number Method, which assumes all messages are received in the same order they were sent (no tolerance for “disorder”). In that case there is always only one large SN-interval, no fragmentation, and only the upper bound needs to be checked.
Note 3: When incrementing the MSNR, the MCL assumes that the next “new” message may be in the range [MSNR, MSNR+MCL]. So this range should not be searched for duplicates: it must be subtracted from the list of SN-intervals.
[Diagram: the top of the sequence number range, with the FW mark (999,998,999 for MCL = 1000) below the maximum sequence number 999,999,999.]

General un-ordered sequence number method algorithm with handling of counter wrap-around:
When receiving a reliable normal message M:
  if (counter(M) <= MCL + MSNR - maxSeqNbr) then
    // we were in finish-condition and the counter just wrapped around
    MSNR = counter(M)
    new union of SN-intervals = current union minus the interval [1, MSNR+MCL]
  else if (counter(M) <= MSNR) then
    // this is a message coming late - out of sequence
    if (counter(M) belongs to any SN-interval) then
      [it is a duplicate, ignore it]
    else
      new union of SN-intervals = current union + counter(M)
    endif
  else
    // it is a new message with a higher Seq Nbr than previous ones
    MSNR = counter(M)
    new union of SN-intervals = current union
      minus interval [MSNR, MSNR+MCL]
      minus interval [1, MSNR+MCL-maxSeqNbr]  // in case of wrap-around
      plus MSNR
  endif
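The general algorithm can be sketched in Python as below. It follows the three branches of the slide, with two labeled additions that are assumptions of this sketch: the wrap-around branch also records counter(M) in the SN-intervals (the slide does not say so, but otherwise a duplicate of that message would pass as new), and a jump beyond MSNR+MCL raises an error, following the MCL definition. All class, method and exception names are hypothetical.

```python
class MclExceeded(Exception):
    """Raised when a counter jumps more than MCL past the MSNR (hypothetical name)."""


class WrapAwareReceiver:
    def __init__(self, max_seq_nbr, mcl):
        self.max_seq_nbr = max_seq_nbr
        self.mcl = mcl
        self.msnr = 0
        self.intervals = []  # sorted, disjoint, non-adjacent (lo, hi) pairs

    def _contains(self, n):
        return any(lo <= n <= hi for lo, hi in self.intervals)

    def _add(self, n):
        lo, hi, kept = n, n, []
        for a, b in self.intervals:
            if b + 1 < lo or a - 1 > hi:   # neither overlapping nor adjacent
                kept.append((a, b))
            else:
                lo, hi = min(lo, a), max(hi, b)
        self.intervals = sorted(kept + [(lo, hi)])

    def _subtract(self, lo, hi):
        """Remove [lo, hi] from the union; an empty range (lo > hi) changes nothing."""
        kept = []
        for a, b in self.intervals:
            if lo > hi or b < lo or a > hi:
                kept.append((a, b))
                continue
            if a < lo:
                kept.append((a, lo - 1))
            if b > hi:
                kept.append((hi + 1, b))
        self.intervals = kept

    def receive(self, n):
        """Return True if n is a new message, False if it is a duplicate."""
        if n <= self.mcl + self.msnr - self.max_seq_nbr:
            # Finish condition held and the counter just wrapped around.
            self.msnr = n
            self._subtract(1, self.msnr + self.mcl)
            self._add(n)  # assumption: not stated on the slide
            return True
        if n <= self.msnr:
            # Late, out-of-sequence message.
            if self._contains(n):
                return False
            self._add(n)
            return True
        if n > self.msnr + self.mcl:
            # Assumption: report an MCL violation instead of accepting the jump.
            raise MclExceeded(n)
        # New message with a higher Seq Nbr than previous ones.
        self.msnr = n
        self._subtract(self.msnr, self.msnr + self.mcl)
        self._subtract(1, self.msnr + self.mcl - self.max_seq_nbr)
        self._add(n)
        return True
```

The slide does not say what should happen to a message from the previous cycle that arrives after the wrap-around with a counter above the new MSNR. As written, the algorithm's last branch would treat it as new; in this sketch it is reported as an MCL violation instead.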

The un-ordered sequence number method algorithm takes advantage of the fact that the disorder of sequence numbers on the receiving side is local in time and space:
After a reasonable amount of time, most old messages have arrived, thanks to re-sends. Thus, the older the intervals, the larger they are.
The fragmentation of intervals remains localised near the most recent sequence numbers.
The overall number of intervals (on which the speed of the duplicate check depends) is expected to remain stable and manageable, no matter how big the max seq number is, and in any case the check is much faster than with a comparable database of message IDs.
If some “holes” (missing messages) persist for a long time, the receiver could send an error message suggesting that the sender either re-send the missing messages or reuse the missing numbers for new messages.