Evaluating the Running Time of a Communication Round over the Internet

Evaluating the Running Time of a Communication Round over the Internet
Omar Bakr Idit Keidar MIT MIT/Technion PODC 2002

Communication Round Exchange of information from all hosts to all hosts Part of many distributed algorithms, systems consensus, atomic commit, replication, ...

Common Metric for Evaluating Algorithms
Number of rounds (or steps) they require

Questions What is the best way to implement a communication round over the Internet decentralized vs. centralized How long is a communication round over the Internet?

Prediction is Hard Internet is unpredictable, diverse, …
Different answers for different topologies, different times Different performance metrics local running time one host is engaged in algorithm overall running time from when first host starts to when last host finishes

“Communication Round” Primitive
Initiated by some host Propagates data from every host to every other host connected to it

Example Implementations
All-to-all Leader Secondary Leader

Experiment I 10 hosts: Taiwan, Korea, US academia, ISPs
TCP/IP (connections always up) Algorithms: All-to-all Leader (initiator) Secondary leader (not initiator) periodically initiated at each host - 650 times over 3.5 days

Computing Overall Running Time
Elapsed time from initiation (at initiator) until all hosts terminate Requires estimating clock differences Clocks not synchronized, drift We compute difference over short intervals Compute 3 different ways Achieve accuracy within 20 ms. on 90% of runs

Teaser: Comparing Performance Based on Number of Steps
All-to-all: 2 Leader: 3 Secondary Leader: 4

Predicting Overall Runnig Times From MIT
Ping-measured latencies (IP): Longest link latency 240 milliseconds Longest link to MIT 150 milliseconds = 390 = 450

Measured Running Times Runs Initiated at MIT
All-to-All Leader Overall Local Prediction 390 300 450 Average (runs under 2 sec) 811 295 541 335 % runs over 2 seconds 55% 3% 13% 6% Running times in milliseconds

What’s Going On? Loss rates on two links are very high
42% and 37% Taiwan to two ISPs in the US Loss rates on other links up to 8% Upon loss, TCP’s timeout is big More than round-trip-time All-to-all sends messages on lossy links Often delayed by loss

Distribution of Running Times Up to 1.3 sec. at MIT

Running Times Runs Initiated at Taiwan
% runs over 2 seconds Average (runs under 2 sec) Sec. Leader overall local Leader All-to-all 7% 13% 43% 64% 24% 54% 607 679 844 1120 645 866 Running times in milliseconds

Distribution of Running Times in Taiwan

What’s Going On? Taiwan Good link Lossy link MIT
Hosts with bad links to Taiwan Other Hosts Leader Secondary Leader All-to-all

Experiment II: Removing Taiwan
Overall running times much better For every initiator and algorithm, less than 10% over 2 seconds (as opposed to 55% previously) All-to-all overall still worse than others! either Leader or Secondary Leader best, depending on initiator loss rates of 2% - 8% are not negligible all-to-all sends O(n2) messages; suffers But, all-to-all has best local running times

Probability of Delay due to Loss
If all links would have same latency assume 1% loss on all links; 10 hosts (n=10) Leader sends 3(n-1) = 27 messages probability of at least one loss: » 24% All-2-all sends n(n-1) = 90 messages probability of at least one loss: » 60% In reality, links don’t have same latency only loss on long links matters

Conclusions Message loss causes high variation in TCP link latencies
latency distribution has high variance, heavy tail Latency distribution determines expected time for receiving O(n) concurrent messages Secondary leader helps No triangle inequality, especially for loss Different for overall vs. local running times Number of rounds/steps not sufficient metric One-to-all and all-to-all have different costs

Evaluating the Running Time of a Communication Round over the Internet

Similar presentations

Presentation on theme: "Evaluating the Running Time of a Communication Round over the Internet"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Evaluating the Running Time of a Communication Round over the Internet

Similar presentations

Presentation on theme: "Evaluating the Running Time of a Communication Round over the Internet"— Presentation transcript:

Similar presentations

About project

Feedback