Balancing Throughput, Robustness, and In-Order Delivery in P2P VoD
Bin Fan+, David Andersen+, Michael Kaminsky*, and Konstantina Papagiannaki*
+ Carnegie Mellon University * Intel Labs Pittsburgh
Good morning, everyone. I am Bin Fan, a PhD student at Carnegie Mellon University. Today I am going to present our work exploring a fundamental tradeoff in peer-to-peer VoD delivery systems. In doing so, we provide a framework that we hope is useful for understanding current schemes and designing future peer-to-peer VoD systems.
Outline Motivation Tradeoff Analysis Experimental Validation
P2P Basics [Figure: a seed holding the complete file (chunks 1 to 5) and two peers, Peer1 and Peer2, each holding some of the chunks]
First, let me briefly give an overview of the basics of P2P transfers; this is how the well-known BitTorrent works. In a P2P swarm, a source of the original content (a seed, in BitTorrent terminology) pushes the file to other downloaders. The served file is divided into chunks, and the source has a copy of every chunk. The source sends chunks to peers according to some strategy that we will discuss. Peers can download from either the source or each other. In VoD or streaming, the downloaded chunks are then played back sequentially.
P2P VoD Design Goals Goals: High Throughput, Robustness, Fully In Order. Large Design Space: which chunks to download; which peers to download from and upload to.
To build an efficient P2P VoD system, we need to achieve three goals. High throughput means users can get chunks fast. Robustness means the system still performs well in the presence of heterogeneous peers and churn. In addition to high throughput and good robustness, P2P VoD systems also want in-order delivery so that chunks can be played back. Our main result is a proof that, among these three goals, you can pick two, but you have to compromise on the third. Keep the goals in mind; however, there is also a large design space. For example, you can design strategies that fetch chunks in different orders and from different sources. You can also constrain peers to serve only certain kinds of neighbors. People have come up with a number of different schemes for VoD. The motivation of this paper is to provide a systematic way to understand and compare these schemes.
1st Attempt: BitTorrent Rarest Random: Fetch the rarest chunk
Peer-to-peer techniques have been hugely successful for file transfers, so why not simply use them to deliver movie content? BitTorrent uses a strategy called rarest first, or rarest random: each peer fetches the rarest chunk of the file, so every peer is likely to hold something it can contribute to the swarm. Each bar represents a peer that is downloading and uploading, so four peers are shown here. Each blue arrow represents a data transfer. As you can see, a transfer is likely to happen between any two peers, since each one has something useful to the others. This is a very effective strategy for a peer-to-peer system to achieve high throughput.
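The rarest-chunk rule described above can be sketched in a few lines of Python. This is a minimal illustration, not BitTorrent's actual implementation: the function name `pick_rarest_chunk` and the input format (one set of chunk indices per neighbor) are assumptions made for the example.

```python
import random
from collections import Counter

def pick_rarest_chunk(my_chunks, neighbor_chunks, rng=random):
    """Return a missing chunk held by the fewest neighbors (ties broken at random).

    my_chunks: set of chunk indices this peer already holds.
    neighbor_chunks: list of sets, one per neighbor (hypothetical input format).
    Returns None when no neighbor holds a chunk this peer still needs.
    """
    counts = Counter()
    for held in neighbor_chunks:
        for chunk in held:
            if chunk not in my_chunks:
                counts[chunk] += 1  # number of neighbors offering this chunk
    if not counts:
        return None
    rarest = min(counts.values())
    candidates = sorted(c for c, n in counts.items() if n == rarest)
    return rng.choice(candidates)

# Chunks 3 and 4 are each offered by two neighbors, chunk 5 by only one,
# so chunk 5 is selected.
neighbors = [{1, 2, 4}, {1, 2, 3, 5}, {1, 2, 3, 4}]
print(pick_rarest_chunk({1, 2}, neighbors))  # prints 5
```

Because the choice depends on what is scarce in the swarm rather than on playback position, the chunks a peer holds tend to differ from its neighbors' chunks, which is why transfers are possible between almost any pair of peers.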
Use BitTorrent for VoD Select the rarest chunk to download.
[Figure: # of total downloaded chunks, # of useful chunks, and # of played chunks over time t, with "Starting playback" and "Downloading complete" marked]
+ Downloading completes soon
- Cannot start playback until the end
[Movie:] If we stop at xx, some chunks from the very beginning of the movie have not been downloaded yet because of the random download order. That prevents us from playing back the movie.
[Picture:] The number of downloaded chunks grows linearly over time, so downloading is efficient, but we cannot start playback until almost all of the chunks have arrived.
2nd Attempt: Naïve Sequential Rarest Random: Fetch the rarest chunk. Naïve Sequential: Fetch the next chunk.
Naïve Sequential Downloading Change BitTorrent to download sequentially: throughput collapsed.
[Figure: # of total downloaded chunks, # of useful chunks, and # of played chunks over time t]
+ Sequential download
- Low throughput
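For contrast with rarest-first selection, a naïve sequential picker might look like the sketch below. The function name `pick_next_chunk` and the input format are assumptions made for illustration. The sketch also assumes a strict reading of "fetch the next chunk": if no neighbor holds the next missing chunk, the peer waits rather than skipping ahead.

```python
def pick_next_chunk(my_chunks, neighbor_chunks, total_chunks):
    """Return the lowest-numbered missing chunk if some neighbor holds it.

    my_chunks: set of chunk indices (1-based) this peer already holds.
    neighbor_chunks: list of sets, one per neighbor (hypothetical input format).
    Returns None when the download is complete or the next chunk is unavailable.
    """
    available = set().union(*neighbor_chunks)
    for chunk in range(1, total_chunks + 1):
        if chunk not in my_chunks:
            # Assumption: strictly in order, so never skip past a missing chunk.
            return chunk if chunk in available else None
    return None

# A peer holding chunks 1-2 can fetch chunk 3 from a neighbor that is further ahead...
print(pick_next_chunk({1, 2}, [{1, 2, 3, 4}], total_chunks=5))  # prints 3
# ...but two peers at the same position have nothing to offer each other.
print(pick_next_chunk({1, 2}, [{1, 2}], total_chunks=5))        # prints None
```

The second call hints at the problem: under in-order fetching, a peer can only be served by peers that are ahead of it, so upload capacity for later chunks is scarce.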
Question Can we achieve both high throughput and sequential peer-to-peer download?
Outline Motivation Tradeoff Analysis (Modeling throughput, Study basic schemes, TRS Tradeoff) Experimental Validation
Per-chunk Capacity Ci (per-chunk capacity of chunk i): the aggregate uplink bandwidth available for chunk i from seeds and peers. [Figure: Seed, Peer1, and Peer2, each shown with the file chunks it holds and its uplink bandwidth]
Per-chunk Capacity & Throughput Ci: per-chunk capacity for chunk i. System throughput = min{Ci} (see proof in paper). [Figure: per-chunk capacities C1 to C5 with the system throughput marked; chunk 5 is the bottleneck chunk]
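As a small worked example of the min{Ci} rule, the sketch below sums a hypothetical bandwidth allocation into per-chunk capacities and reports the bottleneck. The allocation matrix and its numbers are invented for illustration, and how real nodes divide their uplink among chunks is not specified here; only the relation "system throughput = min{Ci}" comes from the slide.

```python
def per_chunk_capacity(allocation):
    """Sum, for each chunk, the uplink bandwidth all nodes devote to it.

    allocation[n][i] is the bandwidth node n spends serving chunk i + 1
    (a hypothetical input; the numbers below are made up).
    """
    num_chunks = len(allocation[0])
    return [sum(node[i] for node in allocation) for i in range(num_chunks)]

def system_throughput(capacities):
    """System throughput is limited by the smallest per-chunk capacity."""
    return min(capacities)

def bottleneck_chunk(capacities):
    """Return the 1-based index of the chunk with the smallest capacity."""
    return capacities.index(min(capacities)) + 1

allocation = [
    [2.0, 2.0, 2.0, 2.0, 2.0],  # seed: serves every chunk
    [5.0, 5.0, 0.0, 0.0, 0.0],  # peer 1: holds only chunks 1-2
    [3.0, 3.0, 3.0, 1.0, 0.0],  # peer 2: holds chunks 1-4
]
capacities = per_chunk_capacity(allocation)
print(capacities)                     # prints [10.0, 10.0, 5.0, 3.0, 2.0]
print(system_throughput(capacities))  # prints 2.0
print(bottleneck_chunk(capacities))   # prints 5
```

In this made-up allocation the last chunk is served only by the seed, so it becomes the bottleneck, matching the situation drawn on the slide.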
Metrics Throughput: minimal per-chunk capacity. Robustness: number of sources from which each chunk can be downloaded. Sequentiality: order of chunk arrival.
Let's analyze the schemes by looking specifically at three metrics: throughput, which, as already shown, is determined by per-chunk capacity; robustness; and sequentiality.
Three Basic Schemes Rarest Random: Fetch the rarest chunk. Naïve Sequential: Fetch the next chunk. Cascading [Annapureddy07, Yang09]: Form a chain to fetch.
Rarest Random Chunks uniformly distributed. ✓ Each chunk gets about the same capacity to replicate: C1=C2=…=Cm. ✓ Many sources for each chunk. ✗ Out of order. [Figure: per-chunk capacities C1 to C5]
Naïve Sequential Many copies of the 1st chunk, few copies of the last chunk. ✗ Low throughput: later chunks get less chance to replicate: C1>C2>…>Cm. ✓ Many sources for each chunk, on average. ✓ Fully in order. [Figure: per-chunk capacities C1 to C5]
Cascading Skewed chunk distribution. ✓ Each chunk gets the same chance to replicate: C1=C2=…=Cm. ✗ Few sources for each chunk. ✓ Fully in order. [Figure: per-chunk capacities C1 to C5]
TRS Tradeoff Impossible to achieve at the same time: maximal throughput, purely sequential retrieval, and perfect robustness (see proof in paper). [Figure: the three goals High Throughput, Robustness, and Fully In Order, with Rarest Random, Naïve Sequential, and Cascading each achieving two of them]
Intuition of TRS Tradeoff Caused by resource contention among chunks. Downloading chunks in order leads to a skewed chunk distribution. A skewed distribution imposes either an imbalanced per-chunk capacity allocation or a limit on the sources allowed to serve each chunk.
TRS Tradeoff in the Real World [Figure: two panels, Homogeneous nodes and Heterogeneous nodes, comparing Rarest Random, Naïve Sequential, and Cascading]
Balance the Tradeoff Maintain high throughput: ensure "less skewed" per-chunk capacity. Slightly reduce sequentiality (intuition: 95% sequential is good enough for playback). Slightly reduce robustness (intuition: 20 sources are nearly as robust as 100 sources).
Three Hybrid Schemes Hybrid Sequential [Huang08]: rarest random + sequential. Segment Random: fetch segments in order; fetch chunks within a segment out of order. Network Coding [Annapureddy07]: each segment encoded.
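The segment random idea can be sketched as a download-order generator: segments are visited in order, while chunks inside a segment are shuffled. The function name `segment_random_order` and the `segment_size` parameter are assumptions made for this example, and the sketch ignores chunk availability and peer selection.

```python
import random

def segment_random_order(total_chunks, segment_size, rng=random):
    """Return a fetch order: segments in sequence, chunks shuffled within each.

    total_chunks: number of chunks in the file (chunks are numbered from 1).
    segment_size: chunks per segment (hypothetical tuning parameter).
    rng: random source exposing shuffle(), e.g. random.Random(seed).
    """
    order = []
    for start in range(1, total_chunks + 1, segment_size):
        end = min(start + segment_size, total_chunks + 1)
        segment = list(range(start, end))
        rng.shuffle(segment)   # out of order inside the segment
        order.extend(segment)  # segments themselves stay in order
    return order

order = segment_random_order(total_chunks=10, segment_size=4, rng=random.Random(0))
print(order)  # first four entries are a permutation of 1-4, the next four of 5-8
```

A larger `segment_size` moves the order closer to rarest random (less sequential, more mixing among peers), while a segment size of 1 degenerates to naïve sequential, so this one parameter is a simple knob along the tradeoff.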
Outline Motivation Tradeoff Analysis Experimental Validation
Evaluation 50 peers on Emulab; 10 Mbps up, 20 Mbps down; one seed; modified BitTornado client.
Homogeneous Peers [Figure: throughput of each peer (Mbps) at 2 peers/min, 6 peers/min, and 10 peers/min]
Heterogeneous Peers [Figure: throughput of each peer (Mbps) with no slow peers and with slow peers]
Conclusion Motivated by how to build efficient P2P VoD. TRS Tradeoff: Throughput vs. Robustness vs. Sequentiality. Framework to understand the tradeoff space. Experimental validation.
Outline, abstract: formalize the tradeoff and understand the design space; enumerate the design dimensions. Message we really want to deliver: illustrate the tradeoff.