Travis McVey, Diego Velasquez, Mark Whylie, Drem Darios, Elroy Ashtian Jr.
Members of the group and their parts of the paper:
Diego Velasquez: Introduction
Drem Darios: Functionality (2.1 to 2.3)
Elroy Ashtian Jr.: Functionality (2.4 to 2.5)
Travis McVey: Speed
Mark Whylie: Fault Tolerance
Diego Velasquez: Conclusion
Outline:
Abstract: The paper is based on the experience of Butler W. Lampson. It offers general hints for computer system design, illustrated with examples.
Points explained in the paper: Designing a computer system is different from designing an algorithm. There is no single best way to build a system; the paper offers only hints. The hints are illustrated by a number of examples, ranging from hardware and operating systems to programming systems and application programs.
Each hint is summarized by a slogan. The following table organizes the slogans along two axes: why the hint helps and where in the design it applies.
Why? Functionality (Does it work?), Speed (Is it fast enough?), Fault-tolerance (Does it keep working?)
Where? Completeness
  Functionality: Separate normal and worst case
  Speed: Safety first; Shed load; End-to-end
  Fault-tolerance: End-to-end
Where? Interface
  Functionality: Do one thing well (Don't generalize; Don't hide power; Use procedure arguments; Leave it to the client); Keep basic interfaces stable; Keep a place to stand
  Speed: Make it fast; Split resources; Static analysis; Dynamic translation
  Fault-tolerance: End-to-end; Log updates; Make actions atomic
Where? Implementation
  Functionality: Plan to throw one away; Keep secrets; Use a good idea again; Divide and conquer
  Speed: Cache answers; Use hints; Use brute force; Compute in background; Batch processing
  Fault-tolerance: Make actions atomic; Use hints
The vaguest but most important hint is to obtain the right functionality for a system. An interface design must satisfy three requirements:
It should be simple.
It should be complete, meaning both normal and worst cases are considered.
It should admit a sufficiently small and fast implementation.
[Figure: interfaces as layers. Driving analogy: the driver operates the car through the brake pedal (brake controller, brake line, brakes), the accelerator (throttle controller, fuel, engine), and the steering wheel (steering system, steering column, wheel angle). Software analogy: user program, abstract interface, device interface modules, hardware interface, hardware.]
Do one thing and do it well. When an interface undertakes too much, the result is a large, slow, and complicated implementation. For some interfaces it is acceptable to sacrifice performance for functionality.
Get it right!
Do not generalize; generalization is generally wrong.
The Tenex system example: the problem with this design is that it made it possible to gain access by guessing a password of length n in 64n tries on average, rather than 128^n/2.
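The gap between the two figures in the Tenex example is easy to underestimate. The short calculation below is only an illustration: it assumes a 128-character alphabet (implied by the figures quoted above) and a hypothetical password length of 8, and compares confirming one character at a time with guessing whole passwords.

```python
# Expected number of guesses for a password of length n, 128-character alphabet.
ALPHABET = 128

def per_character_tries(n):
    # One character can be confirmed at a time: on average 64 tries per position.
    return (ALPHABET // 2) * n

def brute_force_tries(n):
    # Whole passwords must be guessed: on average half of the 128**n candidates.
    return ALPHABET ** n // 2

n = 8  # hypothetical password length, chosen only for illustration
print(per_character_tries(n))  # 512
print(brute_force_tries(n))    # 36028797018963968
```

For n = 8 the flawed design needs a few hundred tries where the intended design needs about 3.6 x 10^16.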
Make it fast rather than powerful. If an operation is fast, one client can program the function it wants and another client can program some other function. As before, simpler is better: it is better to be simple and fast than powerful and slow.
Don't hide power. The purpose of abstraction is to conceal undesirable properties; desirable ones should not be hidden. If a lower level of abstraction allows something to be done quickly, higher levels should not bury it.
Use procedure arguments to provide flexibility in an interface. A good example is an enumeration procedure that returns the set of items satisfying some property. The best interface lets the client pass its own filter procedure to the enumeration rather than fight with a built-in mechanism.
Leave it to the client. If it is cheap to pass control back and forth across an interface, the interface should solve one problem quickly and leave the rest to the client.
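The enumeration example under "Use procedure arguments" can be sketched as follows. This is a minimal illustration, not code from the paper: the function name enumerate_matching and the file list are hypothetical.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

def enumerate_matching(items: Iterable[T], keep: Callable[[T], bool]) -> Iterator[T]:
    """Yield every item for which the client-supplied filter returns True."""
    for item in items:
        if keep(item):
            yield item

# Hypothetical data: (file name, size in bytes).
files = [("notes.txt", 1200), ("core.dump", 9_000_000), ("paper.tex", 48_000)]

# Two clients, two different filters, one small interface.
large = list(enumerate_matching(files, lambda f: f[1] > 10_000))
tex = list(enumerate_matching(files, lambda f: f[0].endswith(".tex")))
print(large)
print(tex)
```

The interface does not need a pattern language or a set of option flags; each client states its own property as ordinary code.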
Keep basic interfaces stable. Software interfaces; type-checked versus non-type-checked programming languages. Example of a type-checked language: Mesa.
Keep a place to stand. Examples: the compatibility package (Tenex); the world-swap debugger. Useful in bootstrapping, e.g. from BIOS to OS.
Plan to throw one away: testing and prototyping. Keep secrets of the implementation: secrets; assumptions between the parts; the downside of fewer assumptions.
Divide and conquer: solving a complex problem, e.g. the Alto's Scavenger program and the Dover raster printer. Use a good idea again instead of generalizing it: an idea such as replication of data, for small amounts of data and for large amounts of data.
Handle normal and worst cases separately as a rule. Error handling. Interlisp-D and Cedar programming systems: reference-counting garbage collector; Cedar's additional functions.
A rare problem with reference counting is overflow of the count; the solution is an overflow count table. Paging system: the worst case is that all pages are dirty. Bravo editor: piece table, editing, cleanup process.
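One way to picture the overflow count table is sketched below. It assumes a small fixed-width count field per object (here two bits, so values 0 to 3) and a separate table for the rare counts that do not fit. The field width, the class name RefCounts, and the method names are assumptions for illustration; this is not a description of how Interlisp-D or Cedar implement it.

```python
class RefCounts:
    """Small per-object count field, with a side table for the rare overflow."""

    MAX_INLINE = 3  # assumed: a 2-bit count field, so values 0..3 fit inline

    def __init__(self):
        self.inline = {}    # object id -> small count (stands in for header bits)
        self.overflow = {}  # object id -> references beyond MAX_INLINE

    def incref(self, obj):
        count = self.inline.get(obj, 0)
        if count < self.MAX_INLINE:
            self.inline[obj] = count + 1  # normal case: cheap update
        else:
            self.overflow[obj] = self.overflow.get(obj, 0) + 1  # rare case

    def decref(self, obj):
        if obj in self.overflow:
            self.overflow[obj] -= 1
            if self.overflow[obj] == 0:
                del self.overflow[obj]
        elif self.inline.get(obj, 0) > 0:
            self.inline[obj] -= 1
        else:
            raise ValueError("no references to release")
        return self.count(obj)

    def count(self, obj):
        return self.inline.get(obj, 0) + self.overflow.get(obj, 0)

rc = RefCounts()
for _ in range(5):
    rc.incref("a")
print(rc.count("a"), rc.overflow)
```

The normal case touches only the small field; only the few heavily shared objects pay for a table lookup.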
It is usually faster to allocate dedicated resources than to share them, but dedicating resources increases cost. Examples:
A program can read data much faster when the data is read sequentially. When data is accessed in sequential order, the accesses become predictable. However, it is very difficult for a programmer to go over the code and optimize the disk transfers by hand. This leads to dynamic analysis by demand paging, which is at least as good.
Dynamic translation: translate from a convenient representation into one that can be quickly interpreted. Translating a bit at a time is one variation on this idea; another is to translate on demand and cache the result.
Short definition of a cache: storing the results of computations that take a long time. A cache must:
Stay true: invalidate or update an entry when the underlying value changes.
Not thrash.
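The first requirement, keeping the cache true, can be sketched as below. The class CachedCounts, its method names, and the word-count function standing in for an expensive computation are all hypothetical. The sketch does not address thrashing; a bounded cache would also need a sensible eviction policy.

```python
class CachedCounts:
    """Caches an expensive function of the entries in a mutable store."""

    def __init__(self):
        self.store = {}         # key -> text (the truth)
        self.cache = {}         # key -> cached result
        self.computations = 0   # counts how often the slow path runs

    def _expensive(self, text):
        # Stand-in for a computation that takes a long time.
        self.computations += 1
        return len(text.split())

    def word_count(self, key):
        if key not in self.cache:
            self.cache[key] = self._expensive(self.store[key])
        return self.cache[key]

    def write(self, key, text):
        self.store[key] = text
        self.cache.pop(key, None)  # invalidate so the cache never lies

docs = CachedCounts()
docs.write("a", "one two three")
print(docs.word_count("a"), docs.word_count("a"), docs.computations)
docs.write("a", "one two")
print(docs.word_count("a"), docs.computations)
```

Every path that changes the stored value goes through write, which is what makes the invalidation reliable.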
Hardware: Bad Examples: Software: Bravo Editor
A hint, like a cache entry, is the saved result of some computation and is used to make the system faster. How is it different? How is it effective?
In Alto and Pilot Operating Systems Arpanet Operating System Smalltalk Program
Do not forget that brute force is always an option, and an easier one as the cost of hardware comes down. Example in chess: Belle, built on special-purpose hardware, beats programs with more sophisticated algorithms.
When possible, compute in the background. Examples: electronic mail, garbage collectors, banks, paging systems.
Batch processing: doing things incrementally usually costs more. Disks and tapes work better when accessed sequentially. Error recovery is much simpler. Example: Bank of America.
Safety first: when allocating resources, it is more important to prevent disaster than to optimize. The sad truth is that general-purpose systems cannot be optimized. This leads to shedding load.
Shed load: do not let the system become overloaded; it must take control of demand. Examples: Bob Morris's “red button” and the Arpanet.
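A minimal sketch of shedding load is a server that refuses new work once a fixed number of requests are pending, instead of queueing without bound. The class name Server and the capacity of 2 are hypothetical; the bounded queue is the standard library's queue.Queue.

```python
import queue

class Server:
    """Accepts requests up to a fixed capacity and refuses the rest."""

    def __init__(self, capacity):
        self.pending = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def submit(self, request):
        try:
            self.pending.put_nowait(request)  # never blocks
            return True
        except queue.Full:
            self.rejected += 1                # shed the load, tell the caller
            return False

server = Server(capacity=2)
results = [server.submit(i) for i in range(4)]
print(results, server.rejected)
```

Refusing early keeps the work already accepted moving; the caller learns immediately that it should retry later or go elsewhere.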
Making a system reliable is not really hard, if you know how to go about it. The difficulty arises when you attempt to add reliability to an existing design.
End-to-end error recovery is absolutely necessary for a reliable system; any other error detection or recovery is not logically necessary and exists strictly for performance. Example: consider transferring a file from a disk using the NTFS file system on machine A to a disk using the ext3 file system on machine B. What would be the logical way to test that the file actually transferred successfully, with all bits still in the correct order?
Answer: read the file back from machine B's disk, compute a checksum on machine B, compute the same checksum over the original file on machine A's disk, and compare the two. If they are equal, we can assume the transfer was successful. This is an end-to-end check.
If we rely on intermediate checks instead, we find that they are not sufficient on their own. The file moves from A's disk to A's memory, from A's memory over the network to B's memory, and from B's memory to B's disk. If any one step goes unchecked, for example the network transfer is not checked for packet loss, bits may be missing when the file arrives on B's disk. Comparing checksums at the source and the destination catches the error wherever it occurred. The intermediate checks are still worth having, but only for performance: they detect errors earlier, so less work has to be repeated.
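The end-to-end check in the file transfer example can be sketched as follows. Two local files stand in for the disks of machines A and B, SHA-256 stands in for "a checksum", and the function names and the faulty transport are hypothetical. In a real transfer each digest would be computed on its own machine and the two values compared.

```python
import hashlib
import os
import shutil
import tempfile

def file_digest(path, chunk_size=65536):
    """SHA-256 of the file's bytes, read back from disk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def transfer_and_verify(src, dst, copy=shutil.copyfile):
    """Copy src to dst by any means, then compare digests of both ends."""
    copy(src, dst)
    return file_digest(src) == file_digest(dst)

def lossy_copy(src, dst):
    # Hypothetical faulty transport that silently drops the last byte.
    with open(src, "rb") as s, open(dst, "wb") as d:
        d.write(s.read()[:-1])

workdir = tempfile.mkdtemp()
a_path = os.path.join(workdir, "a.bin")
b_path = os.path.join(workdir, "b.bin")
with open(a_path, "wb") as f:
    f.write(b"example payload" * 1000)

ok = transfer_and_verify(a_path, b_path)
bad = transfer_and_verify(a_path, b_path, copy=lossy_copy)
print(ok, bad)  # True False
```

The check does not care which intermediate step lost data; it only compares what left A with what landed on B.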
Another good example of the end-to-end approach is the Pup internet. In this network a data packet travels from a source to a destination and may traverse several networks at different rates, where each individual network may use its own intermediate strategy for catching errors before passing the packet on. Some networks merely store and forward packets. A pitfall here is that so many packets may arrive at a particular node that its forwarding queue fills up, and the node is then forced to drop packets.
In such cases the intermediate steps are unreliable: the sender has no way of knowing whether a packet reached its destination, because the intermediate checks are local to each network the packet traverses. In the face of these uncertainties, the Pup internet provides good service with a simple implementation by attempting only "best efforts" delivery. Clients provide their own error control to deal with problems. The packet transport does attempt to report problems to its clients by providing a modest amount of error control (a 16-bit checksum), notifying senders of discarded packets when possible, and so on. These services are intended to improve performance in the face of unreliable communication and overloading; since they too are best efforts, they don't complicate the implementation much.
However, there are two pitfalls with the end-to-end strategy: 1) It requires a cheap test for success. 2) It can lead to working systems with severe performance defects, which may not be obvious until the system is placed under heavy load.
We log updates to record the truth about the state of an object. A log is a very simple data structure that can be reliably written and read, and cheaply forced out to disk or other stable storage that can survive a crash. A log is append-only, which ensures that it remains valid whenever a crash occurs. To use a log, record every update as a log entry consisting of the name of the update procedure and its arguments. This allows the same update to be executed again later, for example after a crash. Keeping the entries in order allows a sequence of log entries to be re-executed, starting with the object in its original state, to produce the same object that the original execution produced.
The update procedure must be a true function:
Its result does not depend on any state outside its arguments.
It has no side effects, except on the object in whose log it appears.
The arguments must be values, one of:
Immediate values, such as integers and strings. An immediate value can be a large thing, like an array or even a list, but the entire value must be copied into the log entry.
References to immutable objects.
However, most objects are not immutable, since they are updated. Each update to an object produces a new version, and a particular version is immutable. With a log, a simple way to refer to a particular version is to name the object together with the number of updates applied to it. Replaying the log from the original object and stopping after that many updates reproduces the version we want.
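The logging scheme described above can be sketched as follows. The account-balance object, the procedure names, and the class LoggedObject are hypothetical, and an in-memory list of JSON lines stands in for a log file on stable storage. Each update procedure is a true function of the object state and its arguments, and the arguments are immediate values copied into the entry.

```python
import json

# Update procedures: the result depends only on the state and the arguments.
def deposit(balances, account, amount):
    new = dict(balances)
    new[account] = new.get(account, 0) + amount
    return new

def withdraw(balances, account, amount):
    new = dict(balances)
    new[account] = new.get(account, 0) - amount
    return new

PROCEDURES = {"deposit": deposit, "withdraw": withdraw}

class LoggedObject:
    def __init__(self, initial):
        self.initial = initial
        self.state = initial
        self.log = []  # append-only; one JSON line per update

    def update(self, name, *args):
        # Record the procedure name and its arguments before applying them.
        self.log.append(json.dumps({"proc": name, "args": list(args)}))
        self.state = PROCEDURES[name](self.state, *args)

def replay(initial, log, upto=None):
    """Re-execute the first `upto` entries (all of them if None)."""
    state = initial
    for line in log[:upto]:
        entry = json.loads(line)
        state = PROCEDURES[entry["proc"]](state, *entry["args"])
    return state

obj = LoggedObject({})
obj.update("deposit", "alice", 100)
obj.update("withdraw", "alice", 30)
obj.update("deposit", "bob", 5)

print(replay(obj.initial, obj.log))          # same as obj.state
print(replay(obj.initial, obj.log, upto=1))  # version after one update
```

Replaying the whole log recovers the current state after a crash; replaying a prefix recovers an earlier version identified by its update count.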
Make actions atomic or restartable. An atomic action is one that either completes or has no effect. In most storage systems the fetch and store operations on a single word are atomic: an operation either transfers the whole word or does nothing. This eliminates the need to reason about intermediate states when recovering from errors.
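A common way to get the same all-or-nothing behavior for a whole file is to write a temporary file and then rename it over the target. The sketch below is an illustration under assumptions: the function name atomic_write is hypothetical, and the atomicity of the final step relies on the rename behavior that POSIX requires of the operation behind os.replace.

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace the contents of path with data, completely or not at all."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays
    # within one file system.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the bytes out before the rename
        os.replace(tmp, path)     # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)        # a failed attempt leaves no trace
        raise

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "state.txt")
atomic_write(target, b"version 1")
atomic_write(target, b"version 2")
with open(target, "rb") as f:
    print(f.read())
```

A crash before the rename leaves the old contents intact, so recovery never has to deal with a half-written file.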
“Most humbly do I take my leave, my lord”
The slogans in the paper are collected in the summary table above, organized by why each hint helps (functionality, speed, fault-tolerance) and where it applies (completeness, interface, implementation).