Computer Science
Informed Content Delivery Across Adaptive Overlay Networks

Overlay networks have emerged as a powerful and highly flexible method for delivering content. We study how to optimize the throughput of large, multipoint transfers across richly connected overlay networks, focusing on the question of what to put in each transmitted packet. We first make the case for transmitting encoded content in this scenario, arguing for the digital fountain approach, which enables end-hosts to efficiently reconstruct the original content of size n from a subset of any n symbols drawn from a large universe of encoded symbols. Such an approach affords reliability and a substantial degree of application-level flexibility, as it seamlessly tolerates packet loss, connection migration, and parallel transfers. However, since the sets of symbols acquired by peers are likely to overlap substantially, care must be taken to enable them to collaborate effectively. We provide a collection of useful algorithmic tools for efficient estimation, summarization, and approximate reconciliation of sets of symbols between pairs of collaborating peers, all of which keep messaging complexity and computation to a minimum. Through simulations and experiments on a prototype implementation, we demonstrate the performance benefits of our informed content delivery mechanisms and show how they complement existing overlay network architectures.

John Byers, Jeffrey Considine, Michael Mitzenmacher and Stan Rost
ShapeShifter Project Poster 46
Premises of Our Work
We are working to improve content distribution, and current schemes are limited:
Topologies are limited to trees, with servers at the roots.
Other overlay links might have residual bandwidth.
There is plenty of room for improvement. Why suffer through a slow download from a server? If the last hop is not the bottleneck, we can use the residual bandwidth of other links.
Can we make use of partial content? (Most schemes assume otherwise.)
A Motivating Example: Improving a Tree Topology
[Figure panels: initial multicast tree; parallel downloads; collaborative transfers]
Careful use of additional links can yield significant speedups, but it is not trivial:
Parallel downloads require orchestration across servers.
Collaborative transfers need to avoid spurious transmissions, i.e., sending symbols the receiver already has.
Digital Fountain to the Rescue
Each encoding symbol is the XOR of a subset of input symbols: $y_j = x_{j_1} \oplus x_{j_2} \oplus x_{j_3} \oplus \cdots$
Erasure encodings allow packet loss to be handled without feedback.
Memoryless encodings enable unsynchronized downloads from multiple servers.
A large symbol space provides flexibility.
[Figure: an input file of n symbols x_1, x_2, x_3, …, x_n is encoded into a huge symbol space y_1, y_2, y_3, …, y_n, …, y_2n, …, y_3n, …; after crossing the network, any n received symbols y_i1, y_i2, y_i3, …, y_in recover the file x_1, x_2, x_3, …, x_n.]
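To make the XOR encoding concrete, here is a toy sketch, not the Tornado-style codes a real digital fountain uses. It assumes a hypothetical dense random code in which each input symbol enters each encoding symbol with probability 1/2, and it decodes by Gaussian elimination over GF(2) instead of a fast peeling decoder. Because the code is random, the receiver may need a few more than n symbols before decoding succeeds. The names `encode_symbol` and `decode` are illustrative.

```python
import random
from functools import reduce
from operator import xor

def encode_symbol(source, rng):
    """Produce one encoding symbol y_j as (neighbors, value).

    Hypothetical dense code: each input index is included with probability 1/2,
    and value is the XOR of the chosen source symbols.
    """
    n = len(source)
    neighbors = [i for i in range(n) if rng.random() < 0.5]
    if not neighbors:  # avoid the useless empty combination
        neighbors = [rng.randrange(n)]
    value = reduce(xor, (source[i] for i in neighbors), 0)
    return neighbors, value

def decode(n, received):
    """Recover the n source symbols by Gaussian elimination over GF(2).

    Returns the source symbols, or None if the received symbols
    do not yet determine all n inputs.
    """
    rows = {}  # highest set bit -> (bitmask of input indices, XOR value)
    for neighbors, value in received:
        mask = 0
        for i in neighbors:
            mask |= 1 << i
        # Eliminate with stored rows; each step clears the current top bit.
        while mask:
            top = mask.bit_length() - 1
            if top not in rows:
                rows[top] = (mask, value)
                break
            row_mask, row_value = rows[top]
            mask ^= row_mask
            value ^= row_value
    if len(rows) < n:
        return None
    # Back-substitute from the lowest pivot upward: row `bit` only involves
    # inputs with smaller indices, which are already solved.
    x = [0] * n
    for bit in range(n):
        mask, value = rows[bit]
        rest = mask & ~(1 << bit)
        j = 0
        while rest:
            if rest & 1:
                value ^= x[j]
            rest >>= 1
            j += 1
        x[bit] = value
    return x

# Demo: keep receiving arbitrary symbols until the file is recoverable.
rng = random.Random(42)
source = [rng.randrange(256) for _ in range(16)]
received = []
recovered = None
while recovered is None:
    received.append(encode_symbol(source, rng))
    recovered = decode(len(source), received)
print(f"recovered {recovered == source} after {len(received)} symbols")
```

The demo never asks for particular symbols; it only keeps collecting until enough independent ones have arrived, which is the property that lets receivers tolerate loss and pull from several servers without coordination.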
Collaborative Transfers Are Much Harder
Peers need to summarize their working sets:
Working sets often overlap, and correlation between spatially proximate peers can be very high, so transmitting shared packets is a waste.
Summaries must be concise, since the space of codewords can be large and enumerating all codewords is prohibitively expensive.
Our contributions:
Estimating overlap in 1 packet.
Summarizing a working set in 10 packets.
Recoding over received symbols (erasure coding by non-servers).
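Recoding lets a peer that holds only encoded symbols act as a source itself. A minimal sketch, assuming each symbol carries the set of input indices it covers: the peer XORs several received symbols, and the result is again a valid encoding symbol whose neighbor set is the symmetric difference of the constituents' sets. The names `make_symbol` and `recode` are hypothetical, and the authors' recoded symbols may identify their constituents differently (for example, by the IDs of the encoding symbols combined).

```python
from functools import reduce
from operator import xor

def make_symbol(source, neighbors):
    """Encoding symbol as (set of input indices, XOR of those source symbols)."""
    return frozenset(neighbors), reduce(xor, (source[i] for i in neighbors), 0)

def recode(symbols):
    """Combine received encoding symbols into one recoded symbol by XOR.

    An input index that appears in an even number of the combined symbols
    cancels out, so the neighbor set is the symmetric difference.
    """
    neighbors = frozenset()
    value = 0
    for nbrs, val in symbols:
        neighbors ^= nbrs
        value ^= val
    return neighbors, value

source = [0x11, 0x22, 0x33, 0x44, 0x55]
y1 = make_symbol(source, [0, 1, 2])
y2 = make_symbol(source, [1, 3])
z = recode([y1, y2])
print(sorted(z[0]), hex(z[1]))  # prints [0, 2, 3] 0x66
```

Input x_1 appears in both y1 and y2, so it cancels; the recoded symbol equals x_0 XOR x_2 XOR x_3 and can be fed to the same decoder as any server-generated symbol.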
Peer Assessment
How many symbols does A have that I don't? Assessing how much content I share with a peer lets me evaluate that peer as a source.
Two techniques: random sampling and min-wise summaries. [Figure: min-wise summaries] Both fit in one packet!
Benefits of min-wise summaries: multiple peers can be compared to each other, and the similarity between them can then be disregarded or used to obtain parallel speedups.
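A min-wise summary can be sketched as follows, assuming symbol IDs are integers below a prime P and using random linear hash functions (a*x + b) mod P as stand-ins for min-wise independent permutations. Each peer keeps the minimum hash value of its working set under each of k functions; the fraction of positions where two summaries agree estimates the resemblance r = |A ∩ B| / |A ∪ B|. Combined with the set sizes, r yields an estimate of how many symbols a peer has that I don't, because |A ∩ B| = r(|A| + |B|) / (1 + r). The value of k and all function names are illustrative assumptions.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; symbol IDs are assumed to be smaller than P

def make_hash_params(k, seed=0):
    """k random (a, b) pairs defining hash functions h(x) = (a*x + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]

def minwise_summary(symbols, params):
    """For each hash function, the minimum hash value over the working set."""
    return [min((a * s + b) % P for s in symbols) for a, b in params]

def estimate_resemblance(summary_a, summary_b):
    """Fraction of hash functions whose minima coincide, estimating |A∩B|/|A∪B|."""
    agree = sum(1 for u, v in zip(summary_a, summary_b) if u == v)
    return agree / len(summary_a)

def estimate_missing(size_mine, size_peer, r):
    """Estimated number of the peer's symbols that I lack: |peer| - |mine ∩ peer|."""
    overlap = r * (size_mine + size_peer) / (1 + r)
    return size_peer - overlap

params = make_hash_params(k=64)  # both peers must use the same parameters
mine = set(range(0, 1000))
peer = set(range(500, 1500))
r = estimate_resemblance(minwise_summary(mine, params), minwise_summary(peer, params))
print(r, estimate_missing(len(mine), len(peer), r))  # exact values: 1/3 and 500
```

With k = 64 and 8-byte minima, a summary occupies 512 bytes, which is consistent with the one-packet claim. Because each summary depends only on one peer's set, a node can compare summaries from several candidate peers against each other, not just against its own.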
Set Reconciliation
Reconciling the differences in content between peers facilitates efficient transfer. Exact set reconciliation methods are expensive, so we use faster approximate methods:
Bloom filters: efficient summaries for large universes.
Approximate reconciliation trees: faster for small differences.
A large stretch factor (encoding symbols drawn from a space much larger than n) makes approximation feasible, since a peer needs only enough useful symbols rather than every missing one. In the example below, if only 10 symbols are necessary, only 2 of the 4 differences need to be reconciled.
[Figure: Content of A: y_4, y_5, y_13, y_16, y_21, y_25, y_31, y_35. Content of B: y_1, y_4, y_5, y_16, y_23, y_25, y_33.]
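One way to use a Bloom filter for approximate reconciliation, sketched under assumptions: peer B summarizes its working set in a filter of m bits with k hash positions (derived here from salted SHA-256 digests, a hypothetical choice), sends it to A, and A transmits only the symbols the filter reports as absent. The example uses the symbol sets of A and B from this panel; `BloomFilter`, `symbols_to_send`, and the sizes m = 64, k = 3 are illustrative.

```python
import hashlib

class BloomFilter:
    """Bit-array set summary: no false negatives, tunable false positives."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = 0  # Python int used as an m-bit array

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests (illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

def symbols_to_send(sender_symbols, receiver_filter):
    """Symbols the receiver's filter reports as missing; never one it already has."""
    return sorted(s for s in sender_symbols if s not in receiver_filter)

content_a = {4, 5, 13, 16, 21, 25, 31, 35}
content_b = {1, 4, 5, 16, 23, 25, 33}

filter_b = BloomFilter(num_bits=64, num_hashes=3)
for s in content_b:
    filter_b.add(s)

# A sends B only what B's filter says B lacks (a subset of A \ B = {13, 21, 31, 35}).
print(symbols_to_send(content_a, filter_b))
```

Since a Bloom filter has no false negatives, A never sends a symbol B already holds. A false positive only makes A withhold a symbol B actually lacks, and with a large stretch factor A can send other useful symbols instead.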
Experiments
Connecting to peers with partial content speeds up transfers, with or without a full server.
[Figure panels: 1 full server and 1 partial server; multiple partial servers]
Naïve methods suffer degradation from correlation between peers' working sets. Two smart servers are better than four naïve ones.
Conclusions
Peers ignored in current distribution schemes are an untapped resource.
The overlay manager can use almost any link, since concise summaries address the overlap and coordination problems described earlier.
Overlay changes don't matter: reconfiguration is expected, which gives more freedom to pick good links. This makes the approach a natural fit for ad hoc wireless networks.
Performance is closer to the maximum flow, and often better than any single path.