Packet audio playout delay adjustment: performance bounds and algorithms
Moon, Kurose, Towsley
Overall Idea
- Because of packet jitter and delay changes, we need a playout buffer
- The bigger, the better
- But a large buffer hinders responsive transmission of audio
- 400 ms delay / 5% loss for voice conversation
- Interactive media/video conferencing needs the smallest buffers possible
A solution, and an approximation
- In the first part of the paper, they give bounds on the size of the playout buffer needed for a given amount of loss
  - Not an online algorithm; computationally expensive (the idea is to focus on percentages)
  - Inelastic medium
- In the second part of the paper, they present an online algorithm that is computationally feasible for adjusting the playout delay of each talkspurt
Related Work
- Playout delay adjustments
  - Per-packet and per-talkspurt (assumptions… speech, or music?)
- Network level observations
  - Three graphs, probe compression
  - The baseline doesn't change much; the real advantage of adjusting the playout delay shows up in delay spikes that span multiple talkspurts
Problem Statement
- For a given amount of loss at the receiver, we get to set the playout delay of each talkspurt any way we want
- Which assignment is the best?
  - For 1 packet lost? 2? 3? 134?
- First, let's fix some notation
Notation
- $t_k^i$: sender timestamp of the $i$th packet of the $k$th talkspurt
- $a_k^i$: receiver timestamp of the $i$th packet of the $k$th talkspurt
- $n_k$: number of packets (received) in the $k$th talkspurt
- $N$: total number of packets in the trace, $N = \sum_k n_k$
- $p_k^i(A)$: playout time under algorithm $A$
- Delay: $p_k^i(A) - t_k^i$; the packet is lost if $p_k^i(A) < a_k^i$
- $r_k^i(A)$: indicator that the packet is played
Notation (cont'd)
- Total number of packets played under $A$ (with $M$ talkspurts):
  $N(A) = \sum_{k=1}^{M} \sum_{i=1}^{n_k} r_k^i(A)$
- Average playout delay:
  $\frac{1}{N(A)} \sum_{k=1}^{M} \sum_{i=1}^{n_k} r_k^i(A)\,\bigl(p_k^i(A) - t_k^i\bigr)$
- Loss rate (percent):
  $l = \frac{N - N(A)}{N} \times 100$
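The three quantities above can be computed directly from a trace. The following is a minimal sketch, assuming one fixed playout delay per talkspurt (so that p = t + delay); the function name `playout_metrics` and the trace layout are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple

# One packet is (sender timestamp t, receiver timestamp a), both in ms.
Talkspurt = List[Tuple[float, float]]


def playout_metrics(trace: List[Talkspurt],
                    delays: List[float]) -> Tuple[int, float, float]:
    """Return (N(A), average playout delay, loss rate in percent).

    Assumption: delays[k] is the single playout delay used for talkspurt k,
    so the playout time of each of its packets is p = t + delays[k].
    """
    total = sum(len(spurt) for spurt in trace)  # N
    played = 0                                  # N(A)
    delay_sum = 0.0
    for spurt, delay in zip(trace, delays):
        for t, a in spurt:
            p = t + delay
            if p >= a:              # r = 1: the packet arrived before playout
                played += 1
                delay_sum += p - t
    avg_delay = delay_sum / played if played else float("nan")
    loss_rate = (total - played) / total * 100 if total else 0.0
    return played, avg_delay, loss_rate


# Made-up two-talkspurt trace, each talkspurt played with a 40 ms delay.
trace = [[(0, 30), (20, 70), (40, 65)], [(200, 240), (220, 255)]]
print(playout_metrics(trace, [40, 40]))
```

Raising the delay of the first talkspurt trades a lower loss rate for a higher average playout delay, which is the trade-off the rest of the paper optimizes.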
Notation (cont'd)
- ${d'}_k^i$: delay between sending and receiving, ${d'}_k^i = a_k^i - t_k^i$
- $d' = \min_{k,i} {d'}_k^i$
- $d_k^i = {d'}_k^i - d'$: normalized delay
- $d_k^{(i)}$: $i$th smallest normalized delay in talkspurt $k$
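As a small illustration of the normalization, the sketch below subtracts the minimum delay seen anywhere in the trace and sorts each talkspurt's delays, so that entry i-1 of a list is the ith smallest normalized delay. The function name and trace layout are assumptions made for this example.

```python
from typing import List, Tuple

Talkspurt = List[Tuple[float, float]]  # (sender timestamp, receiver timestamp)


def normalized_sorted_delays(trace: List[Talkspurt]) -> List[List[float]]:
    """Per talkspurt, return the normalized delays in ascending order."""
    raw = [[a - t for t, a in spurt] for spurt in trace]   # send-to-receive delays
    d_min = min(d for spurt in raw for d in spurt)         # minimum over the whole trace
    return [sorted(d - d_min for d in spurt) for spurt in raw]


trace = [[(0, 30), (20, 70), (40, 65)], [(200, 240), (220, 255)]]
print(normalized_sorted_delays(trace))
```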
Off-line solution w/o collisions
- To play $i$ packets from the $k$th talkspurt, the playout delay must be at least $d_k^{(i)}$ (unknowable online)
- Remember that if algorithm $A$ uses a large playout delay for one talkspurt, it could delay subsequent talkspurts (collisions)
  - Let's ignore them for now
- Time: $O(MN^2)$, Space: $O(MN)$
Off-line solution w/o collisions (cont'd)
- We assume percentages of loss, not actual loss patterns (to reduce the complexity)
- $D(k,i)$ is the minimum average playout delay when $i$ packets are played out from talkspurts $k$ through $M$

$$
D(k,i) =
\begin{cases}
0 & \text{if } i = 0 \\
d_M^{(i)} & \text{if } k = M \text{ and } i \le n_M \\
\infty & \text{if } k = M \text{ and } i > n_M \\
\min_{0 \le j \le \min(i,\,n_k)} \dfrac{(i-j)\,D(k+1,\,i-j) + j\,d_k^{(j)}}{i} & \text{otherwise}
\end{cases}
$$

- Optimality: proof by contradiction
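The recurrence can be sketched as a memoized recursion. This is an illustrative implementation under stated assumptions, not the authors' code: talkspurts are indexed from 0 instead of 1, the j = 0 term is treated as contributing no delay, and the function name `min_avg_delay` is made up for the example.

```python
import math
from functools import lru_cache
from typing import List


def min_avg_delay(sorted_delays: List[List[float]], play: int) -> float:
    """Minimum average playout delay when `play` packets are played out.

    sorted_delays[k] holds the normalized delays of talkspurt k in ascending
    order, so sorted_delays[k][j - 1] is the j-th smallest delay.
    Collisions between talkspurts are ignored.
    """
    last = len(sorted_delays) - 1

    @lru_cache(maxsize=None)
    def D(k: int, i: int) -> float:
        if i == 0:
            return 0.0
        n_k = len(sorted_delays[k])
        if k == last:
            # The last talkspurt must supply all i packets by itself.
            return sorted_delays[k][i - 1] if i <= n_k else math.inf
        best = math.inf
        for j in range(0, min(i, n_k) + 1):
            # Play j packets from talkspurt k (each with delay d_k^(j))
            # and the remaining i - j packets from the later talkspurts.
            d_j = sorted_delays[k][j - 1] if j > 0 else 0.0
            best = min(best, ((i - j) * D(k + 1, i - j) + j * d_j) / i)
        return best

    return D(0, play)


delays = [[0, 5, 25], [10, 15]]   # made-up normalized delays, two talkspurts
for packets in range(1, 6):
    print(packets, min_avg_delay(delays, packets))
```

Sweeping `play` from 1 to N traces out the delay-versus-loss bound that the online algorithms are later compared against.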
Offline algorithm with collisions
- We might have to adjust the playout times of some of the talkspurts due to collisions, so $D$ must now take those into account
- We define a vector $S$ (captures the lengths of the silence periods)
- We can capture the sum of the increases
- Now $D$ includes $C$ as well ($C$ tracks the packets played out at every step of the computation)
- $D$ now differs from the old $D$ only in the extra delays incurred by the collisions
- The new $D$ does not capture the optimal, though (why?)
- Time: $O(M^2 N^2)$, Space: $O(M^2 N^2)$
An online algorithm
- Algorithm 1: Linear
  - Slow to catch up, good at maintaining a steady value
- Algorithm 2: Depends on spike detection
  - Quick at catching up, but sometimes overzealous
- Algorithm 3: Two modes
  - Track spikes when they are detected
  - Otherwise update delay and delay variance (q)
  - Switch modes when the observed delay reaches a multiple of the current delay
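The slide gives no formulas for these algorithms. To illustrate why a linear estimator is slow to catch up but steady, here is a minimal sketch that assumes the common exponentially weighted moving average form: a smoothed delay estimate, a smoothed variation estimate, and a playout delay equal to the estimate plus a safety multiple of the variation. The class name, the default `alpha`, and the safety factor of 4 are assumptions for this example.

```python
class EwmaPlayoutEstimator:
    """Linear (EWMA) tracking of network delay and its variation."""

    def __init__(self, alpha: float = 0.998) -> None:
        self.alpha = alpha        # weight on history; near 1 means slow to react
        self.delay = None         # smoothed delay estimate
        self.variation = 0.0      # smoothed absolute deviation

    def update(self, network_delay: float) -> None:
        """Fold one observed packet delay into the estimates."""
        if self.delay is None:
            self.delay = network_delay
            return
        a = self.alpha
        self.delay = a * self.delay + (1 - a) * network_delay
        self.variation = a * self.variation + (1 - a) * abs(self.delay - network_delay)

    def playout_delay(self, safety: float = 4.0) -> float:
        """Delay to apply to the first packet of the next talkspurt."""
        if self.delay is None:
            raise ValueError("no packets observed yet")
        return self.delay + safety * self.variation


est = EwmaPlayoutEstimator()
for d in [100, 100, 100, 400, 400, 400]:   # a delay spike starts at the 4th packet
    est.update(d)
print(est.playout_delay())  # still close to 100: the estimator lags the spike
```

With `alpha` close to 1 the estimate barely moves during a spike, which is the weakness that spike detection in Algorithms 2 and 3 is meant to address.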
Evaluation / Conclusion
- They instrument the senders and the receivers
- Plot average playout delay vs packet loss rate
- Results seem to show that Algorithm 3 gets very close to the optimal
- However, the three algorithms give very close results much of the time
- Sometimes 1 is much worse, sometimes 2, but 3 seems to always be pretty stable
Queue Monitoring: A Delay Jitter Management Policy
Stone, Jeffay
Display and e2e Jitter
- Recall the steps for transmitting video:
  - Acquire, digitize, compress, transmit, decompress, buffer, display
- Display latency is acquire to display
- e2e latency is acquire to buffer
- What problems can affect this process?
  - Delay jitter (variance in e2e latency)
- Can we ensure constant e2e latency?
  - Even with isochronous service models?
- We're going to adjust the display latency instead
Audio vs video
- Recall the audio application
- Talkspurts vs silence periods
- Analog for video?
- Are gaps OK during the transmission?
- Display perception
- Network congestion
- Video as a datatype
- Can we repeat frames, leave black spaces, etc.?
Late policies
- I-policy:
  - Discard late frames
  - All frames now have the same display latency
  - Static
- E-policy:
  - Play late frames at the earliest convenience
  - Increases latency for subsequent frames
  - Display latency keeps rising, ending up higher than the observed e2e delay
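The two policies can be compared on a toy trace. This is a simplified sketch under stated assumptions: time is measured in frame periods, frame i is acquired at time i, frames arrive in order, and the display refreshes at integer times. The function names are illustrative.

```python
import math
from typing import List, Optional, Tuple


def i_policy(arrivals: List[float], latency: int) -> List[Optional[int]]:
    """Fixed display latency: frame i is shown at time i + latency or discarded.

    Returns the display latency per frame, or None where the frame was late
    (a gap on the display).
    """
    return [latency if arrival <= i + latency else None
            for i, arrival in enumerate(arrivals)]


def e_policy(arrivals: List[float]) -> Tuple[List[int], int]:
    """Show every frame at the earliest free display time after it arrives.

    Returns (display latency per frame, number of gaps). A late frame pushes
    back every frame behind it, so the latency never decreases.
    """
    latencies: List[int] = []
    gaps = 0
    next_slot: Optional[int] = None
    for i, arrival in enumerate(arrivals):
        earliest = math.ceil(arrival)          # first display time after arrival
        if next_slot is None:
            slot = earliest
        else:
            slot = max(next_slot, earliest)
            gaps += slot - next_slot           # display times with nothing to show
        latencies.append(slot - i)
        next_slot = slot + 1
    return latencies, gaps


arrivals = [1.2, 2.1, 5.5, 5.6, 5.7, 6.3]   # made-up trace: a burst delays frames 2 to 4
print(i_policy(arrivals, latency=2))
print(e_policy(arrivals))
```

On this trace the I-policy keeps a latency of 2 but drops the two late frames, while the E-policy shows every frame and stays at the higher latency after the burst has passed.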
Example
Example
I-vs-E
- I-policy's advantage
  - Low jitter and bursts
- E-policy's advantage
  - Good during high latency and low latency, but not good after bursts
- Hybrid approach: Queue Monitoring
Queue Monitoring
- When displaying a frame
  - Thresholding operation
  - If qlen is m, then counters 1 through m-1 are incremented
  - All others are reset
  - When a counter exceeds its threshold, the oldest frame is discarded
- If the queue has contained more than n frames for long enough, then we can reduce the latency (the jitter is stable)
- Large variations occur infrequently and smaller variations occur more frequently (still true today?)
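The counter mechanism can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold values are supplied by the caller, and resetting a counter after it triggers a discard is an assumption of this sketch. The class and method names are made up for the example.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class QueueMonitor:
    """Display queue that drops the oldest frame once the queue stays long."""

    def __init__(self, thresholds: Dict[int, int]) -> None:
        # thresholds[n]: how many consecutive display times the queue may hold
        # more than n frames before one frame is discarded (values are up to
        # the caller; none are taken from the paper).
        self.thresholds = thresholds
        self.counters = {n: 0 for n in thresholds}
        self.queue: Deque[int] = deque()
        self.discarded: List[int] = []

    def enqueue(self, frame: int) -> None:
        self.queue.append(frame)

    def display(self) -> Optional[int]:
        """Run the thresholding step, then return the frame to show (or None)."""
        m = len(self.queue)
        discard = False
        for n in self.counters:
            if n < m:                       # counters 1 .. m-1 are incremented
                self.counters[n] += 1
                if self.counters[n] > self.thresholds[n]:
                    discard = True
                    self.counters[n] = 0    # assumption: restart after firing
            else:
                self.counters[n] = 0        # all other counters are reset
        if discard and self.queue:
            # Dropping the oldest frame lowers the display latency by one frame.
            self.discarded.append(self.queue.popleft())
        return self.queue.popleft() if self.queue else None


qm = QueueMonitor({1: 2})
qm.enqueue(0)
shown = []
for frame in [1, 2, 3]:
    qm.enqueue(frame)          # the queue holds 2 frames at every display time
    shown.append(qm.display())
print(shown, qm.discarded)
```

A design that follows the last bullet would give longer queue lengths smaller thresholds, so that large, rare bursts of latency are shed quickly while small, frequent variations are tolerated.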
Evaluation
- The inherent difficulty
  - Gaps vs display latency
- Lexicographic ordering for two axes
  - Average gap rate
  - Average display latency
- Experimental design
  - "Academic computer science" network
  - Time of day, workload seen
Evaluation Results
- Comparison between I2, I3, and E
  - Queue monitoring is usually the same or better
  - Except for incomparable results
- In comparison to the E-policy, it seems to be workload/network dependent
- Instantaneous gap rate: a delay policy would be better (perhaps)
- More adaptive I-policy
- More tests, of course
- Addressing ad-hoc quality measures