End-to-End Arguments in System Design J.H. Saltzer, D.P. Reed, and D.D. Clark MIT-LCS
Motivation “Choosing the proper boundaries between functions is perhaps the primary activity of the computer system designer.” “Design principles that provide guidance in this choice of function placement are among the most important tools of a system designer.” These statements are from 1984. Think about software design, OO, patterns, and frameworks: from a certain perspective they are all about choosing the proper boundaries in software system design.
What is this paper about? “…discusses one class of function placement argument that has been used for many years with neither explicit recognition nor much conviction” Comment: to some extent this is still true today. So what is it, anyway?
Example: A system includes communication. Usually we draw a modular boundary between the communication subsystem and the rest of the system. There is a list of functions, each of which could be implemented in any of several ways: by the communication subsystem? by its client? as a joint venture? by both, redundantly?
Function example: File Transfer How many points of failure? Disk, software, processor/memory, communication, crash… How would a “careful file transfer” application then cope with this list of threats?
Threats to the “careful FT” transaction: (1) Disk faults. (2) Software faults: file system, file transfer, communication software (buffering, copying mistakes…). (3) Processor and memory transient errors. (4) Packet dropping, mutation. (5) Crash in the middle of the transaction. …
How to cope with the threats? Reinforce each step along the way using duplicate copies, time-out and retry, carefully located redundancy for error detection, crash recovery, etc. Reduce the probability of each of the individual threats to an acceptably small value.
Yet, other observations: Countering threat (2), software faults, requires writing correct programs, which is quite difficult. Few nontrivial large programs can claim correctness. Doing everything many times also appears uneconomical (especially in real-time and resource-constrained systems).
Alternate approach: End-to-end check and retry. Use end-to-end checksums. The file transfer application declares the transaction committed if the checksums agree. If failures are fairly rare, this technique will normally work on the first try; occasionally a second or even third try might be required.
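The check-and-retry idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `unreliable_copy` is a hypothetical stand-in for the whole path (disk, software, network) that occasionally corrupts one byte, and `careful_transfer`, `fault_rate`, and `max_attempts` are invented names, not anything from the paper.

```python
import hashlib
import random


def checksum(data: bytes) -> str:
    """End-to-end checksum computed by the application itself."""
    return hashlib.sha256(data).hexdigest()


def unreliable_copy(data: bytes, rng: random.Random, fault_rate: float) -> bytes:
    """Hypothetical path from source to destination.

    With probability fault_rate it flips all bits of one byte, standing in
    for any of the threats (disk, software, memory, network).
    """
    if data and rng.random() < fault_rate:
        corrupted = bytearray(data)
        corrupted[rng.randrange(len(data))] ^= 0xFF
        return bytes(corrupted)
    return data


def careful_transfer(source: bytes, rng: random.Random,
                     fault_rate: float = 0.3, max_attempts: int = 10):
    """Retry the whole transfer until the end-to-end checksums agree."""
    expected = checksum(source)
    for attempt in range(1, max_attempts + 1):
        received = unreliable_copy(source, rng, fault_rate)
        if checksum(received) == expected:
            # Commit only when the endpoints agree.
            return received, attempt
    raise RuntimeError(f"transfer failed after {max_attempts} attempts")
```

Note that the commit decision depends only on the comparison made at the endpoints; nothing inside `unreliable_copy` is trusted.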
How will a reliable communication subsystem help? Does it reduce the frequency of retries by the file transfer application (and thus improve performance)? Yes. Does it affect the correctness of the outcome? No. (The correctness of the outcome is specified and achieved by the end-to-end checksum.)
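A small calculation makes the "performance, not correctness" point concrete. It is a sketch under simplifying assumptions that are not in the paper: each attempt fails independently, the communication subsystem fails with probability `p_comm`, and everything else (disk, software, crashes) fails with probability `p_other`. The function name and the sample numbers are hypothetical.

```python
def expected_attempts(p_comm: float, p_other: float) -> float:
    """Expected number of whole-file attempts until the checksums agree.

    Assumes independent attempts, so the attempt count is geometric with
    success probability (1 - p_comm) * (1 - p_other).
    """
    p_success = (1.0 - p_comm) * (1.0 - p_other)
    if p_success <= 0.0:
        raise ValueError("transfer can never succeed")
    return 1.0 / p_success


# Hypothetical numbers: a flaky network versus a perfect one.
flaky = expected_attempts(p_comm=0.10, p_other=0.01)
perfect = expected_attempts(p_comm=0.00, p_other=0.01)
```

With these sample values the flaky network costs roughly 1.12 attempts per file and the perfect network roughly 1.01. Even with a perfect network the expected count stays above 1, so the end-to-end check is still needed; the network's reliability only changes how often a retry happens.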
End-to-End Argument: The function in question can completely and correctly be implemented only with the knowledge of the application standing at the endpoints of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible. Sometimes an incomplete version of the function provided by the communication system may be useful as a performance enhancement.
A Too-Real Example. Place: MIT local network. What: Over a period of time, many files were repeatedly transferred through a defective gateway and were corrupted. The owners were forced to do the ultimate end-to-end error check: manual comparison with old files. Why: the application programmer assumed the network was providing reliable transmission.
Performance Aspect: Some low-level effort does have a significant effect on application performance, BUT the key idea is that the lower levels need not spend excessive effort to provide “perfect” reliability. The amount of effort to put into reliability measures within the data communication system is seen to be an engineering tradeoff rather than a requirement for correctness. If the communication system is beefed up with internal reliability measures, those measures also have a reali
Example 2: Delivery guarantee. The ack message in the ARPANET was never found to be helpful to applications using it. Why? Because knowing for sure that the message was delivered to the target host is not very important. What the application wants to know is whether or not the target host has acted on the message!
Continued: All manner of disaster might have struck after message delivery but before completion of the action requested by the message. The acknowledgement that is really desired is an end-to-end one, which can only be originated by the target application: “I did it” or “I didn’t”.
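The difference between "delivered" and "acted on" can be sketched as follows. This is a toy model, not the ARPANET mechanism: `TargetApp`, `send_request`, and the `fail_after_delivery` switch are hypothetical names used only to show that the two acknowledgements can disagree.

```python
from dataclasses import dataclass, field


@dataclass
class TargetApp:
    """Hypothetical target application that stores key/value requests."""
    store: dict = field(default_factory=dict)
    fail_after_delivery: bool = False  # simulate a disaster after delivery

    def handle(self, key: str, value: str) -> str:
        # The message has already been delivered when this method runs.
        if self.fail_after_delivery:
            return "I didn't"
        self.store[key] = value
        return "I did it"


def send_request(app: TargetApp, key: str, value: str):
    """Return (network-level delivery ack, end-to-end application ack)."""
    delivered = True  # all a delivery ack could ever report
    reply = app.handle(key, value)
    return delivered, reply
```

In the failing case `send_request` returns `(True, "I didn't")`: the delivery ack is positive, yet only the application's own reply tells the sender that the requested action was not carried out.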
Example 3: Secure Transmission of Data. Use a “secure” subsystem: If the data transmission system performs encryption and decryption, it must be trusted to securely manage the required encryption keys. The data will be in the clear, and thus vulnerable to attack, as it passes into the subsystem and is fanned out to the target application. The authenticity of the message must still be checked by the application.
Alternative: The application itself performs end-to-end encryption, has its own authentication check, and manages the keys itself. The data is never exposed to the outside! So to satisfy the requirements of the application, there is no need for the communication subsystem to provide automatic encryption of all traffic. Automatic encryption of all traffic by the communication subsystem may be called for to ensure something else: that a misbehaving user or application program does not deliberately transmit information that should not be exposed.
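The application-level authentication check can be sketched with Python's standard `hmac` module. This covers only the authenticity part; encryption is left out because the standard library has no cipher, and a real application would add one from a vetted library. The helper names `seal` and `open_sealed` and the tag-then-message layout are assumptions made for this example.

```python
import hashlib
import hmac

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def seal(key: bytes, message: bytes) -> bytes:
    """Sender side: prepend a tag computed with a key the application holds."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return tag + message


def open_sealed(key: bytes, blob: bytes) -> bytes:
    """Receiver side: verify the tag before trusting the message."""
    tag, message = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return message
```

Because the key never leaves the two applications, the check holds no matter what the communication subsystem does or fails to do in between.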
Other examples: duplicate message suppression; guaranteeing FIFO message delivery; transaction management. The authors applied the end-to-end argument to the construction of the SWALLOW distributed data storage system, where it led to a significant reduction in overhead.
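Duplicate suppression at the endpoint can be sketched as below. The application itself may resend a request when it retries, so a network that filters its own duplicates is not enough; the receiving application has to recognise repeats. The `Account` class and the `request_id` scheme are hypothetical, chosen only for illustration.

```python
class Account:
    """Hypothetical endpoint that applies each request at most once."""

    def __init__(self) -> None:
        self.balance = 0
        self._seen = set()  # request ids that have already been applied

    def deposit(self, request_id: str, amount: int) -> bool:
        """Apply the deposit unless this request id was seen before.

        Returns True if the deposit was applied, False for a duplicate.
        """
        if request_id in self._seen:
            return False
        self._seen.add(request_id)
        self.balance += amount
        return True
```

A sender that times out and repeats `deposit("req-1", 100)` leaves the balance at 100 rather than 200, whether the duplicate was created by the network or by the sender's own retry logic.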
Identifying the ends: Using the e2e argument sometimes requires subtlety in analyzing application requirements. Example: if the low levels of a telephone system try to accomplish bit-perfect communication, they will probably introduce uncontrolled delays in packet delivery. Such delays are disruptive to voice apps. It is better to accept the damaged data and let a participant say “excuse me?”.
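But this strong version of the e2e argument is a property of the specific application: two people in real-time conversation. In a speech message system, where the message is stored and played back later, the argument changes its nature: short delays no longer matter and the listener cannot ask the speaker to repeat, so low-level error correction becomes preferable.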
Conclusion: The e2e argument is a guideline that helps in application and protocol design analysis. One must use some care to identify the end points to which the argument should be applied. It is not an absolute rule, but a kind of “Occam’s razor”. In designing a subsystem, don’t be overly eager to “help” users by taking on more functions than necessary. Tradeoffs of function placement should be carefully analyzed in system design.
[Figure: plot with labels “% of app needs”, “effort”, and “Reliability”]