Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University.

Slides:



Advertisements
Similar presentations
A Digital Fountain Approach to Reliable Distribution of Bulk Data
Advertisements

Digital Fountains: Applications and Related Issues Michael Mitzenmacher.
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Digital Fountain Codes V. S
José Vieira Information Theory 2010 Information Theory MAP-Tele José Vieira IEETA Departamento de Electrónica, Telecomunicações e Informática Universidade.
Jump to first page A. Patwardhan, CSE Digital Fountains Main Ideas : n Distribution of bulk data n Reliable multicast, broadcast n Ideal digital.
IS333, Ch. 26: TCP Victor Norman Calvin College 1.
D.J.C MacKay IEE Proceedings Communications, Vol. 152, No. 6, December 2005.
Network Coding in Peer-to-Peer Networks Presented by Chu Chun Ngai
Computer Science 1 ShapeShifter: Scalable, Adaptive End-System Multicast John Byers, Jeffrey Considine, Nicholas Eskelinen, Stanislav Rost, Dmitriy Zavin.
© 2007 Cisco Systems, Inc. All rights reserved.Cisco Public 1 Version 4.0 OSI Transport Layer Network Fundamentals – Chapter 4.
Network Coding for Large Scale Content Distribution Christos Gkantsidis Georgia Institute of Technology Pablo Rodriguez Microsoft Research IEEE INFOCOM.
Informed Content Delivery Across Adaptive Overlay Networks J. Byers, J. Considine, M. Mitzenmacher and S. Rost Presented by Ananth Rajagopala-Rao.
EE 4272Spring, 2003 Chapter 10 Packet Switching Packet Switching Principles  Switching Techniques  Packet Size  Comparison of Circuit Switching & Packet.
Threshold Phenomena and Fountain Codes
Erasure Correcting Codes
Fountain Codes Amin Shokrollahi EPFL and Digital Fountain, Inc.
Secure Multimedia Multicast: Interface and Multimedia Transmission GROUP 2: Melissa Barker Norman Lo Michael Mullinix server router client router client.
1 NETWORK CODING Anthony Ephremides University of Maryland - A NEW PARADIGM FOR NETWORKING - February 29, 2008 University of Minnesota.
RAPTOR CODES AMIN SHOKROLLAHI DF Digital Fountain Technical Report.
1 Spring Semester 2007, Dept. of Computer Science, Technion Internet Networking recitation #8 Explicit Congestion Notification (RFC 3168) Limited Transmit.
1 Verification Codes Michael Luby, Digital Fountain, Inc. Michael Mitzenmacher Harvard University and Digital Fountain, Inc.
Digital Fountain with Tornado Codes and LT Codes K. C. Yang.
Informed Content Delivery Across Adaptive Overlay Networks John Byers Dept. of Computer Science, Boston University Joint work with.
Accessing Multiple Mirror Sites in Parallel: Using Tornado Codes to Speed Up Downloads John Byers, Boston University Michael Luby, Digital Fountain, Inc.
Error Checking continued. Network Layers in Action Each layer in the OSI Model will add header information that pertains to that specific protocol. On.
Data Communications and Networking
What Can IP Do? Deliver datagrams to hosts – The IP address in a datagram header identify a host IP treats a computer as an endpoint of communication Best.
New Protocols for Remote File Synchronization Based on Erasure Codes Utku Irmak Svilen Mihaylov Torsten Suel Polytechnic University.
Repairable Fountain Codes Megasthenis Asteris, Alexandros G. Dimakis IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 32, NO. 5, MAY /5/221.
1 Codes, Bloom Filters, and Overlay Networks Michael Mitzenmacher.
Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh.
Computer Science Informed Content Delivery Across Adaptive Overlay Networks Overlay networks have emerged as a powerful and highly flexible method for.
Chapter 2 – X.25, Frame Relay & ATM. Switched Network Stations are not connected together necessarily by a single link Stations are typically far apart.
Wireless TCP Prasun Dewan Department of Computer Science University of North Carolina
TCP Lecture 13 November 13, TCP Background Transmission Control Protocol (TCP) TCP provides much of the functionality that IP lacks: reliable service.
3: Transport Layer 3a-1 8: Principles of Reliable Data Transfer Last Modified: 10/15/2015 7:04:07 PM Slides adapted from: J.F Kurose and K.W. Ross,
TELE202 Lecture 5 Packet switching in WAN 1 Lecturer Dr Z. Huang Overview ¥Last Lectures »C programming »Source: ¥This Lecture »Packet switching in Wide.
Threshold Phenomena and Fountain Codes Amin Shokrollahi EPFL Joint work with M. Luby, R. Karp, O. Etesami.
Computer Networks with Internet Technology William Stallings
Growth Codes: Maximizing Sensor Network Data Persistence abhinav Kamra, Vishal Misra, Jon Feldman, Dan Rubenstein Columbia University, Google Inc. (SIGSOMM’06)
CprE 545 project proposal Long.  Introduction  Random linear code  LT-code  Application  Future work.
Stochastic Networks Conference, June 19-24, Connections between network coding and stochastic network theory Bruce Hajek Abstract: Randomly generated.
Copyright 2008 Kenneth M. Chipps Ph.D. Controlling Flow Last Update
Packet switching network Data is divided into packets. Transfer of information as payload in data packets Packets undergo random delays & possible loss.
Wireless TCP. References r Hari Balakrishnan, Venkat Padmanabhan, Srinivasan Seshan and Randy H. Katz, " A Comparison of Mechanisms for Improving TCP.
Chapter 24 Transport Control Protocol (TCP) Layer 4 protocol Responsible for reliable end-to-end transmission Provides illusion of reliable network to.
Multi-Edge Framework for Unequal Error Protecting LT Codes H. V. Beltr˜ao Neto, W. Henkel, V. C. da Rocha Jr. Jacobs University Bremen, Germany IEEE ITW(Information.
Computer Science Division
Presented by Kelly Whitacre Written by John W. Byers, Jeffrey Considine, Michael Mitzenmacher, Member, IEEE, and Stanislav Rost.
a/b/g Networks Routing Herbert Rubens Slides taken from UIUC Wireless Networking Group.
TCP OVER ADHOC NETWORK. TCP Basics TCP (Transmission Control Protocol) was designed to provide reliable end-to-end delivery of data over unreliable networks.
TCP continued. Discussion – TCP Throughput TCP will most likely generate the saw tooth type of traffic. – A rough estimate is that the congestion window.
Spring 2000CS 4611 Routing Outline Algorithms Scalability.
Raptor Codes Amin Shokrollahi EPFL. BEC(p 1 ) BEC(p 2 ) BEC(p 3 ) BEC(p 4 ) BEC(p 5 ) BEC(p 6 ) Communication on Multiple Unknown Channels.
Indian Institute of Technology Bombay 1 Communication Networks Prof. D. Manjunath
CS/EE 145A Reliable Transmission over Unreliable Channel II Netlab.caltech.edu/course.
Fault Tolerance (2). Topics r Reliable Group Communication.
TCP/IP1 Address Resolution Protocol Internet uses IP address to recognize a computer. But IP address needs to be translated to physical address (NIC).
Transmission Control Protocol (TCP) TCP Flow Control and Congestion Control CS 60008: Internet Architecture and Protocols Department of CSE, IIT Kharagpur.
1 Implementation and performance evaluation of LT and Raptor codes for multimedia applications Pasquale Cataldi, Miquel Pedros Shatarski, Marco Grangetto,
ICDT'06 - Capillary routing with FEC - Emin Gabrielyan 1 Capillary-routing with Forward Error Correction (FEC) ICDT’06 - International Conference.
Coding for Multipath TCP: Opportunities and Challenges Øyvind Ytrehus University of Bergen and Simula Res. Lab. NNUW-2, August 29, 2014.
McGraw-Hill©The McGraw-Hill Companies, Inc., 2000 Muhammad Waseem Iqbal Lecture # 20 Data Communication.
Internet Networking recitation #9
Packet Switching Datagram Approach Virtual Circuit Approach
CRBcast: A Collaborative Rateless Scheme for Reliable and Energy-Efficient Broadcasting in Wireless Sensor/Actuator Networks Nazanin Rahnavard, Badri N.
Internet Networking recitation #10
Hash Functions for Network Applications (II)
Computer Networks Protocols
Presentation transcript:

Digital Fountains, and Their Application to Informed Content Delivery over Adaptive Overlay Networks Michael Mitzenmacher Harvard University

The Talk Survey of the area –My work, and work of others –History, perspective –Less on theoretical details, more on big ideas Start with digital fountains –What they are –How they work –Simple applications Content delivery –Digital fountains, and other tools

Data in the TCP/IP World Data is an ordered sequence of bytes –Generally split into packets Typical download transaction: –“I need the file: packets 1-100,000.” –Sender sends packets in order (windows) –“Packet 75 is missing, please re-send.” Clean semantics –File is stored this way –Reliability is easy –Works for point-to-point downloads

Problem Case: Multicast One sender, many downloaders –Midnight madness problem – new software –Video-on-demand (not real time) Can download to each individual separately –Doesn’t scale Can “broadcast” –All users must start at the same time? –Heterogeneous packet loss –Heterogeneous download rates

Digital Fountain Paradigm Stop thinking of data as an ordered stream of bytes. Data is like water from a fountain –Put out your cup, stop when the cup is full. –You don’t care which drops of water you get. –You don’t care what order the drops get to your cup.

What is a Digital Fountain? For this talk, a digital fountain is an ideal/paradigm for data transmission. –Vs. the standard (TCP) paradigm: data is an ordered finite sequence of bytes. Instead, with a digital fountain, a k symbol file yields an infinite data stream; once you have received any k symbols from this stream, you can quickly reconstruct the original file.

Digital Fountains for Multicast Packets sent from a single source along a tree. Everyone grabs what they can. –Starting time does not matter – start whenever. –Packet loss does not matter – avoids feedback explosion of lost packets. –Heterogeneous download rates do not matter – drop packets at routers as needed for proper rate. When a user has filled their cup, they leave the multicast session.

Digital Fountains for Parallel Downloads Download from multiple sources simultaneously and seamlessly. –All sources fill the cup – since each fountain has an “infinite” collection of packets, no duplicates. –Relative fountain speeds unimportant; just need to get enough. –No coordination among sources necessary. Combine multicast and parallel downloading. –Wireless networks, multiple stations and antennas.

Digital Fountains for Point-to-Point Data Transmission TCP has problems over long-distance connections. –Packets must be acknowledged to increase sending window (packets in flight). –Long round-trip time leads to slow acks, bounding transmission window. –Any loss increases the problem. Using digital fountain + TCP-friendly congestion control can greatly speed up connections. Separates the “what you send” from “how much” you send. –Do not need to buffer for retransmission.

One-to-Many TCP Setting: Web server with popular files, may have many open connections serving same file. –Problem: has to have a separate buffer, state for each connection to handle retransmissions. –Limits number of connections per server. Instead, use a digital fountain to generate packets useful for all connections for that file. Separates the “what you send” from “how much” you send. –Do not need to buffer for retransmission. Keeps TCP semantics, congestion control.

Digital fountains seem great! But do they really exist?

How Do We Build a Digital Fountain? We can construct (approximate) digital fountains using erasure codes. –Including Reed-Solomon, Tornado, LT, fountain codes. Generally, we only come close to the ideal of the paradigm. –Streams not truly infinite; encoding or decoding times; coding overhead.

Digital Fountains through Erasure Codes Message Encoding Received Message Encoding Algorithm Decoding Algorithm Transmission n cn n n

Reed-Solomon Codes In theory, can produce an unlimited number of encoding symbols, only need k to recover. In practice, limited by: –Field size (usually 256 or 65,536) –Quadratic encoding/decoding times These problems ameliorated by striping data. –But raises overhead; now many more than k packets required to recover. Conclusion: may be suitable for some applications, but far from practical or theoretical goals of a digital fountain.

Tornado Codes Irregular low-density parity check codes. Based on graphs: k input symbols lead to n encoding symbols, using XORs. –Sparse set of equations derived from input symbols. –Solve received set of equations using back substitution. Properties: –Graph of size n agreed on by encoder, decoder, and stored. –Need k(1+  ) symbols to decode, for some  > 0. –Encoding/decoding time proportional to n ln (1/  ).

Tornado Codes An Example

Encoding Process

Decoding Process: Substitution Recovery indicates right node has one edge

Regular Graphs Random Permutation of the Edges Degree 3 Degree 6

Decoding Process Analysis Recovered Missing/not yet recovered = = Induced Graph

3-6 Regular Graph Analysis LeftRightLeft

3-6 Regular Graph Equation Want: y < x for all 0 < x <  Works for  < 0.43

Irregular Graphs 3-6 regular graphs can correct up to 0.43 fraction of erasures. Best possible, with n/2 constraints for n symbols, would be gives best performance of all regular graphs. Need irregular graphs, with varying degrees, to reach optimality.

Tornado Codes: Weaknesses Encoding size n must be fixed ahead of time. Memory, encoding and decoding times proportional to n, not k. Overhead factor of (1+  ). –Hard to design around. In practice  = Conclusion: Tornado codes a dramatic step forward, allowing good approximations to digital fountains for many applications. Key problem: fixed encoding size.

Decoding Process: Direct Recovery

Digital Fountains through Erasure Codes : Problem Message Encoding Received Message Encoding Algorithm Decoding Algorithm Transmission n cn n n

Digital Fountains through Erasure Codes : Solution Message Encoding Received Message Encoding Algorithm Decoding Algorithm Transmission n n n

LT Codes Key idea: graph is implicit, rather than explicit. –Each encoding symbol is the XOR of a random subset of neighbors, independent of other symbols. –Each encoding symbol carries a small header, telling what message symbols it is the XOR of. No initial graph; graph derived from received symbols. Properties: –“Infinite” supply of packets possible. –Need k +o(k) symbols to decode. –Decoding time proportional to k ln k. –On average, ln k time to produce an encoding symbol.

LT Codes Conclusion: making the graph implicit gives us an almost ideal digital fountain. One remaining issue: why does average degree need to be around ln k? –Standard coupon collector’s problem: for each message symbol to be hit by some equation, need k ln k variables in the equations. Can remove this problem by pre-coding.

Rateless/Raptor Codes Pre-coding independently described by Shokrollahi, Maymoukov. Rough idea: –Expand original k message symbols to k (1+  ) symbols using (for example) a Tornado code. –Now use an LT code on the expanded message. –Don’t need to recover all of the expanded message symbols, just enough to recover original message.

Raptor/Rateless Codes Properties: –“Infinite” supply of packets possible. –Need k(1+  ) symbols to decode, for some  > 0. –Decoding time proportional to k ln (1/  ). –On average, ln (1/  ) (constant) time to produce an encoding symbol. –Very efficient. Raptor codes give, in practice, a digital fountain.

Impact on Coding These codes are examples of low-density parity-check (LDPC codes). Subsequent work: designed LDPC codes for error-correction using these techniques. Recent developments: LDPC codes approaching Shannon capacity for most basic channels.

Putting Digital Fountains To Use Digital fountains are out there. –Digital Fountain, Inc. sells them. Limitations to their use: –Patent issues. –Perceived complexity. Lack of reference implementation. –What is the killer app?

Patent Issues Several patents / patents pending on irregular LDPC codes, LT codes, Raptor codes by Digital Fountain, Inc. Supposition: this stifles external innovation. –Potential threat of being sued. –Potential lack of commercial outlet for research. Suggestion: unpatented alternatives that lead to good approximations of a digital fountain would be useful. –There is work going on in this area, but more is needed to keep up with recent developments in rateless codes.

Perceived Complexity Digital fountains are now not that hard… …but networking people do not want to deal with developing codes. A research need: –A publicly available, easy to use, reasonably good black box digital fountain implementation that can be plugged in to research prototypes. Issue: patents. –Legal risk suggests such a black box would need to be based on unpatented codes.

What’s the Killer App? Multicast was supposed to be the killer app. –But IP multicast was/is a disaster. –Distribution now handled by contend distributions companies, e.g. Akamai. Possibilities: –Overlay multicast. –Big wireless: e.g. automobiles, satellites. –Others???

Conclusions, Part I Stop thinking of data as an ordered stream of bytes. Think of data as a digital fountain. Digital fountains are implementable in practice with erasure codes.

A Short Breather We’ve covered digital fountains. Next up: –Digital fountains for overlay networks. –And other tricks! Pause for questions, 30 second stretch.

Overlays for Content Delivery A substitute for IP multicast. Build distribution topology out of unicast connections (tunnels). Requires active participation of end-systems. Native IP multicast unnecessary. Saves considerable bandwidth over N * unicast solution. Basic paradigm easy to build and deploy. Bonus: Overlay topology can adapt to network conditions by self-reconfiguration. SOURCE

Limitations of Existing Schemes Tree-like topologies. –Rooted in history (IP Multicast). –Limitations: bandwidth decreases monotonically from the source. losses increase monotonically along a path. Does this matter in practice? –Anecdotal and experimental evidence says yes: Downloads from multiple mirror sites in parallel. Availability of better routes. Peer-to-peer: Morpheus, Kazaa and Grokster.

An Illustrative Example 1. A basic tree topology Harnessing the power of parallel downloads Incorporating collaborative transfers. 3

Our Philosophy Go beyond trees. –Use additional links and bandwidth by: downloading from multiple peers in parallel taking advantage of “perpendicular” bandwidth –Has potential to significantly speed up downloads… But only effective if: –collaboration is carefully orchestrated –methods are amenable to frequent adaptation of the overlay topology

Suitable Applications Prerequisite conditions: –Available bandwidth between peers. –Differences in content received by peers. –Rich overlay topology. Applications –Downloads of large, popular files. –Video-on-demand or nearly real-time streams. –Shared virtual environments.

Use Digital Fountains! Intrinsic resilience to packet loss, reordering. Better support for transient connections via stateless migration, suspension. Peers with full content can always generate useful symbols. Peers with partial content are more likely to have content to share. ButBut using a digital fountain comes at a price: –Content is no longer an ordered stream. –Therefore, collaboration is more difficult.

Informed Content Delivery: Definitions and Problem Statement Peers A and B have working sets of symbols S A, S B drawn from a large universe U and want to collaborate effectively. Key components: 1) 1) Summarize: Furnish a concise and useful sample of a working set to a peer. 2) 2) Approximately Reconcile: Compute as many elements in S A - S B as possible and transmit them. Do so with minimal control messaging overhead.

Summarization Goal: each peer has a 1 packet calling card. –Can be used to estimate |S A – S B |. One possibility: random sampling. –B sends A a random sample of k elements of S B. –Each element is in S A with probability –Negative: must search S A. –Negative: hard to work with multiple summaries. Alternative: min-wise independent sampling.

Min-Wise Summaries Let U be the set of 64 bit numbers, and  be a random permutation on U. Then Calling card for A: keep vector of k values min  j (A), j=1…k. To estimate, count the j for which min  j (A) = min  j (B), divide by k.

Min-Wise Summaries : Example

Recoding: An Intermediate Solution Problem: What to transmit when peers have similar content? Solution: Allow peers to probabilistically “hedge their bets,” minimizing chance of transmission of useless content. Example: S A Suppose the resemblance between S A and S B is 0.9. If A sends a symbol at random the probability of it being useful to B is 0.1. A better strategy is to XOR 10 random symbols together. B can extract one useful symbol with probability: 10 x ( 1 / 10 ) x ( 9 / 10 ) 9 > 1/e  0.37

Approximate Reconciliation Suppose summarization suggests collaboration is worthwhile. Goal: compute as many elements in S A - S B as possible, with low communication. Idea: we do not need all of S A - S B, just as much as possible. –Use Bloom filters.

Lookup Problem Given a set S A = {x 1,x 2,x 3,…x n } on a universe U, want to answer queries of the form: Bloom filter provides an answer in –“Constant” time (time to hash). –Small amount of space. –But with some probability of being wrong.

Bloom Filters Start with an m bit array, filled with 0s. Hash each item x j in S k times. If H i (x j ) = a, set B[a] = B B To check if y is in S, check B at H i (y). All k values must be B B Possible to have a false positive; all k values are 1, but y is not in S.

Errors Assumption: We have good hash functions, look random. Given m bits for filter and n elements, choose number k of hash functions to minimize false positives: –Let As k increases, more chances to find a 0, but more 1’s in the array. Find optimal at k = (ln 2)m/n by calculus.

Example m/n = 8 Opt k = 8 ln 2 =

Bloom Filters for Reconciliation B transmits a Bloom filter of its set to A; A then sends packets from the set difference. –All elements will be in difference: no false negatives. –Not all element in difference found: false pos. Improvements –Compressed Bloom filters –Approximate Reconciliation Trees

Experimental Scenarios Three methods for collaboration Uninformed – Uninformed: A transmits symbols at random to B. Speculative – Speculative: B transmits a minwise summary to A; A then sends recoded symbols to B. Reconciled – Reconciled: B transmits a Bloom filter of its set to A; A then sends packets from the set difference. Overhead: –Decoding overhead: with erasure codes, fixed 2.5%. –Reception overhead: useless duplicate packets. –Recoding overhead: useless recoding packets. symbols received - symbols needed symbols needed

Pairwise Reconciliation Containment of B in A: |S A  S B | |S B | 128MB file 96K input symbols 115K distinct symbols in system initially

Four peers in parallel 128MB file 96K input symbols 105K distinct symbols in system initially Containment of B in A: |S A  S B | |S B |

Four peers, periodic updates 128MB file 96K input symbols 105K distinct symbols in system initially Filters updated at every 10%. Containment of B in A: |S A  S B | |S B |

Subsequent Work Maymounkov: each source sends a stream of consecutive encoded packets. –Possibly simplifies collaboration, with loss of flexibility. Bullet (SOSP ’03): –An implementation with our ideas, plus purposeful distribution of different content. Network coding –Nodes inside the network can compute on the input, rather than just the endpoints. –Potentially more powerful paradigm –Practice?

Conclusions whatEven with ultimate routing topology optimization, the choice of what to send is paramount to content delivery. Digital fountain model ideal for fluid and ephemeral network environments. Collaborations with coded content worthwhile. Richly connected topologies are key to harnessing perpendicular bandwidth. Wanted: more algorithms for intelligent collaboration.

Why regular graphs are bad d Left node has on average neighbors of degree one. Right degree 2d implies Pr[right degree =1] =

Irregular Graphs Random Permutation of the Edges Degree 2 Degree 3 Degree 4 Degree 1 Degree 4 Degree 5 Degree 6 Degree 10

Degree Sequence Functions Left Side – fraction of edges of degree i on the left in the original graph. Right Side – fraction of edges of degree i on the right in the original graph.

Irregular Graph Analysis LeftRightLeft

Irregular Graph Condition Want: y < x for all 0 < x < 

Good Left Degree Sequence: Truncated Heavy Tail D = 9, N = Fraction of nodes of degree i is Average node degree is

Good Right Degree Sequence: Poisson Average node degree is

Good Degree Sequence Functions Want: y < x for all 0 < x <  Works for

Tornado Code Performance Time overhead (Average left degree) Reception Efficiency

Why irregular graphs are good D+1 Left node of max degree has on average one neighbor of degree one. Average right degree 2ln(D) implies Pr[right degree =1]  1/(D+1)

Digital Fountains: A Survey and Look Forward Michael Mitzenmacher

Goals of the Talk Explain the digital fountain paradigm for network communication. Examine related advances in coding. Summarize work on applications. Speculate on what comes next.

How Do We Build a Digital Fountain? We can construct (approximate) digital fountains using erasure codes. –Including Reed-Solomon, Tornado, LT, fountain codes. Generally, we only come close to the ideal of the paradigm. –Streams not truly infinite; encoding or decoding times; coding overhead.

Reed-Solomon Codes In theory, can produce an unlimited number of encoding symbols, only need k to recover. In practice, limited by: –Field size (usually 256 or 65,536) –Quadratic encoding/decoding times These problems ameliorated by striping data. –But raises overhead; now many more than k packets required to recover. Conclusion: may be suitable for some applications, but far from practical or theoretical goals of a digital fountain.

Tornado Codes Irregular low-density parity check codes. Based on graphs: k input symbols lead to n encoding symbols, using XORs. –Sparse set of equations derived from input symbols. –Solve received set of equations using back substitution. Properties: –Graph of size n agreed on by encoder, decoder, and stored. –Need k(1+  ) symbols to decode, for some  > 0. –Encoding/decoding time proportional to n ln (1/  ).

Tornado Codes: Weaknesses Encoding size n must be fixed ahead of time. Memory, encoding and decoding times proportional to n, not k. Overhead factor of (1+  ). –Hard to design around. In practice  = Conclusion: Tornado codes a dramatic step forward, allowing good approximations to digital fountains for many applications. Key problem: fixed encoding size.

LT Codes Key idea: graph is implicit, rather than explicit. –Each encoding symbol is the XOR of a random subset of neighbors, independent of other symbols. –Each encoding symbols carries a small header, telling what message symbols it is the XOR of. No initial graph; graph derived from received symbols. Properties: –“Infinite” supply of packets possible. –Need k +o(k) symbols to decode. –Decoding time proportional to k ln k. –On average, ln k time to produce an encoding symbol.

LT Codes Conclusion: making the graph implicit gives us an almost ideal digital fountain. One remaining issue: why does average degree need to be around ln k? –Standard coupon collector’s problem: for each message symbol to be hit by some equation, need k ln k variables in the equations. Can remove this problem by pre-coding.

Rateless/Raptor Codes Pre-coding independently described by Shokrollahi, Maymoukov. Rough idea: –Expand original k message symbols to k (1+  ) symbols using (for example) a Tornado code. –Now use an LT code on the expanded message. –Don’t need to recover all of the expanded message symbols, just enough to recover original message.

Raptor/Rateless Codes Properties: –“Infinite” supply of packets possible. –Need k(1+  ) symbols to decode, for some  > 0. –Decoding time proportional to k ln (1/  ). –On average, ln (1/  ) (constant) time to produce an encoding symbol. Conclusion: these codes can be made very efficient and deliver on the promise of the digital fountain paradigm.

Applications Long-distance transmission (avoiding TCP) Reliable multicast Parallel downloads One-to-many TCP Content distribution on overlay networks Streaming video

Point-to-Point Data Transmission TCP has problems over long-distance connections. –Packets must be acknowledged to increase sending window (packets in flight). –Long round-trip time leads to slow acks, bounding transmission window. –Any loss increases the problem. Using digital fountain + TCP-friendly congestion control can greatly speed up connections. Separates the “what you send” from “how much” you send. –Do not need to buffer for retransmission.

Reliable Multicast Many potential problems when multicasting to large audience. –Feedback explosion of lost packets. –Start time heterogeneity. –Loss/bandwidth heterogeneity. A digital fountain solves these problems. –Each user gets what they can, and stops when they have enough.

Downloading in Parallel Can collect data from multiple digital fountains for the same source seamlessly. –Since each fountain has an “infinite” collection of packets, no duplicates. –Relative fountain speeds unimportant; just need to get enough. –Combined multicast/multigather possible.

One-to-Many TCP Setting: Web server with popular files, may have many open connections serving same file. –Problem: has to have a separate buffer, state for each connection to handle retransmissions. –Limits number of connections per server. Instead, use a digital fountain to generate packets useful for all connections for that file. Separates the “what you send” from “how much” you send. –Do not need to buffer for retransmission. Keeps TCP semantics, congestion control.

Distribution on Overlay Networks Encoded data make sense for overlay networks. –Changing, heterogeneous network conditions. –Allows multicast. –Allows downloading from multiple sources, as well as peers. Problem: peers may be getting same encoded packets as you, via the multicast. –Not standard digital fountain paradigm. Requires reconciliation techniques to find peers with useful packets.

Video Streaming For “near-real-time” video: –Latency issue. Solution: break into smaller blocks, and encode over these blocks. –Equal-size blocks. –Blocks increases in size geometrically, for only logarithmically many blocks. Engineering to get right latency, ensure blocks arrive on time for display.