1 Quality of Experience Control Strategies for Scalable Video Processing
Wim Verhaegh, Clemens Wüst, Reinder J. Bril, Christian Hentschel, Liesbeth Steffens
Philips Research Laboratories, the Netherlands
2 Overview
- Introduction
- QoS Control Problem
- Reinforcement Learning
- Offline Approach
- Online Approach
- Handling Load Fluctuations
- Simulation Experiments
- Conclusion
3 Introduction
- Video processing in software
- User expects high-quality output
- Deadlines on the completion times of frames
- Many video algorithms show a highly fluctuating load
- Given a fixed processing-time budget, lower than the worst-case load: how can we make the best of it?
4 Introduction: Our Approach
1. Asynchronous, work-preserving processing
   - Using buffers
2. Scalable Video Algorithm (SVA)
   - Frames can be processed at different quality levels
   - Trade-off between picture quality and processing needs
3. Soft real-time task, hence we allow occasional deadline misses
4. QoS trade-off
   - Deadline misses
   - Picture quality
   - Quality fluctuations
QoS measure reflects user-perceived quality
6 QoS Control Problem: SVA-controller interaction
[Figure: for each new frame, the controller selects a quality level and the SVA processes the frame.]
7 QoS Control Problem: Real-Time Processing Example
[Figure: two timelines with numbered frames; one is annotated "deadline miss", the other "blocked".]
8 QoS Control Problem: Revenue for processed frame
Sum of:
- reward for selected quality level
- penalty for each deadline miss
- penalty for changing the quality level

Reward per current quality level: 4, 6, 8, 10
Deadline miss: 10,000
Penalty for changing the quality level (one axis: current quality level, other axis: previous quality level; "-" marks no change):
      -     50    500  5,000
     10      -     50    500
    100     10      -     50
  1,000    100     10      -

Example: revenue = 8 - 10,000 - 100 = -10,092
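The revenue sum can be sketched in a few lines of Python. This is only an illustration: the function name, the q0..q3 indexing, the zero diagonal, and the orientation of the penalty table (row = current level, column = previous level) are assumptions, chosen so that the slide's example works out.

```python
# Values taken from the revenue slide. Indexing the levels as 0..3 is an assumption.
REWARD = [4, 6, 8, 10]
DEADLINE_MISS_PENALTY = 10_000

# ASSUMPTION: CHANGE_PENALTY[current][previous], with 0 on the diagonal
# (no change, no penalty). The slide text does not state the orientation.
CHANGE_PENALTY = [
    [0, 50, 500, 5000],
    [10, 0, 50, 500],
    [100, 10, 0, 50],
    [1000, 100, 10, 0],
]


def revenue(current, previous, deadline_misses):
    """Revenue of one processed frame: reward minus both penalties."""
    return (REWARD[current]
            - DEADLINE_MISS_PENALTY * deadline_misses
            - CHANGE_PENALTY[current][previous])


# Reproduces the slide's example: 8 - 10,000 - 100
print(revenue(current=2, previous=0, deadline_misses=1))  # -10092
```

Under these assumptions, a frame processed at the same level as its predecessor and without a deadline miss simply earns the reward of that level.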
9 QoS Control Problem
QoS measure
- Average revenue per frame
- Reflects the user-perceived quality, provided that the revenue parameters are well chosen
QoS control problem
- At each decision point, select the quality level
- Goal: maximize the QoS measure
- Difficult online problem: what will the future bring?
11 Reinforcement Learning: Agent-Environment Interaction
[Figure: the agent (controller) sends an action to the environment (SVA); the environment returns a state and a revenue.]
12 Reinforcement Learning
Agent's goal
- Maximize the expected return
- Discounted return at time step $t$, over an infinite time horizon, with discount parameter $\gamma$:
  $R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$
Selecting actions
- Policy $\pi : S \to A$ stores for each state a single action to be chosen
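As a small illustration of the discounted return, the sketch below sums a finite list of revenues with geometric weights. The function name and the truncation to a finite sequence are assumptions made for the example; the slide itself considers an infinite horizon.

```python
def discounted_return(revenues, gamma):
    """Sum of gamma**k * revenues[k] over a finite revenue sequence.

    revenues[0] is the revenue received right after time step t.
    """
    total = 0.0
    weight = 1.0
    for r in revenues:
        total += weight * r
        weight *= gamma
    return total


# Three revenues of 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1, 1, 1], 0.5))  # 1.75
```

A discount parameter below 1 gives later revenues less weight, which keeps the sum finite even when the sequence never ends.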
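13 Reinforcement Learning: Markov Decision Process
- We assume a memoryless state signal: the current state and action predict the next state and revenue
- Hence, the reinforcement learning task is a Markov Decision Process (MDP)
- We assume a finite MDP:
  - Finite state set, finite action set
  - One-step dynamics: transition probabilities $P^a_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}$ and expected revenues $R^a_{ss'} = E\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\}$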
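14 Reinforcement Learning: Value functions
- State value of state $s$ under policy $\pi$: $V^\pi(s) = E_\pi\{R_t \mid s_t = s\}$, with $R_t$ the discounted return
- Action value of action $a$ in state $s$ under policy $\pi$: $Q^\pi(s,a) = E_\pi\{R_t \mid s_t = s, a_t = a\}$
- A policy $\pi$ is better than or equal to a policy $\pi'$ if $V^\pi(s) \ge V^{\pi'}(s)$ for all states $s$
- We are looking for an optimal policy $\pi^*$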
15 Reinforcement Learning: Solution approach
- Compute an optimal policy OFFLINE (= beforehand)
  - Requires transition probabilities
  - Requires expected revenues
  - Algorithms: policy iteration, value iteration, ...
- Compute an optimal policy ONLINE, at the discrete time steps, using the experienced states and revenues
  - Algorithms: SARSA, Q-Learning, ...
17 Offline Approach
[Figure: from the state at a start point, selecting a quality level leads to one of several states at the next start point, each labelled with a transition probability (visible values: 0.05, 0.10, 0.25, 0.30, 0.15, 0.05) and an expected revenue (visible values: -87.5, -65.2, 0.3, 5.6, 9.6, 11.4, 12.1).]
- If the transition probabilities and expected revenues are known, an optimal policy can be computed
18 Offline Approach
- Decision moments = start points
- State
  - progress interval (discrete!)
  - previous quality level
- Action = select quality level
- Transition probabilities and expected revenues: computed using processing-time statistics
- Progress: measure for the amount of budget left until the deadline of the frame to be processed
[Figure: timeline.]
19 Offline Approach
- Given this model, we use value iteration to compute an optimal policy for a particular value of the budget
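Value iteration on a finite MDP can be sketched as follows. This is a generic textbook version, not the authors' implementation: the data layout (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as the expected revenue), the discount value, and the two-state toy model are all assumptions made for illustration.

```python
def action_value(P, R, V, s, a, gamma):
    """Expected revenue of action a in state s plus discounted next-state value."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(P, R, gamma=0.9, tol=1e-9):
    """Return (state values, greedy policy) for a finite MDP."""
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(action_value(P, R, V, s, a, gamma)
                       for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update, reused by later states in this sweep
        if delta < tol:
            break
    policy = [max(range(len(P[s])),
                  key=lambda a, s=s: action_value(P, R, V, s, a, gamma))
              for s in range(n_states)]
    return V, policy


# Hypothetical toy model: state 0 = ample progress, state 1 = little progress;
# action 0 = low quality, action 1 = high quality.
P = [
    [[(1.0, 0)], [(0.8, 0), (0.2, 1)]],
    [[(1.0, 0)], [(1.0, 1)]],
]
R = [
    [4.0, 10.0],
    [4.0, -100.0],
]
V, policy = value_iteration(P, R)
print(policy)  # [1, 0]
```

In the toy model the computed policy picks high quality while progress is ample and falls back to low quality when a deadline miss threatens, which is the kind of behaviour the offline approach aims for.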
21 Online Approach: Q-Learning
- Based on learning Q-values $Q(s,a)$
- State = progress, previous quality level
- Action = select quality level
- At each decision point:
  - Given the state transition $s_t \to s_{t+1}$, action $a_t$, and revenue $r_{t+1}$, the controller first updates (learns) the Q-value $Q(s_t, a_t)$
  - Next, given state $s_{t+1}$, the controller selects the action $a$ for which $Q(s_{t+1}, a)$ is maximal
- Default: one Q-value updated; also do exploring actions
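The default scheme described above (update one Q-value, then pick the maximizing action, with occasional exploration) corresponds to standard tabular Q-learning. The sketch below shows that standard form; the learning rate `alpha`, the discount `gamma`, the epsilon-greedy exploration rule, and the list-of-lists table layout are assumptions, since the slide gives no parameter values.

```python
import random


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard Q-learning update of the single entry Q[s][a]."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def select_action(Q, s, epsilon=0.1, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else take the best action."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])


# Two states, two actions (quality levels); values are illustrative only.
Q = [[0.0, 0.0], [1.0, 2.0]]
q_update(Q, s=0, a=1, r=10.0, s_next=1, alpha=0.5, gamma=0.9)
print(Q[0][1])
print(select_action(Q, 0, epsilon=0.0))  # greedy choice, action 1
```

With `epsilon=0.0` the controller is purely greedy; a positive value provides the exploring actions that the default scheme needs.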
22 Online Approach: State Space
- Progress (continuous)
- Previous quality level
- We learn Q-values only for a small set of states
- To select the quality level, given the state, we interpolate between the learned Q-values
[Figure: state space with axes progress and previous quality level.]
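One way to read "interpolate between the learned Q-values" is linear interpolation along the progress axis, for a fixed previous quality level. The sketch below assumes exactly that; the grid of progress points, the function name, and the clamping outside the grid are illustrative choices, not taken from the slides.

```python
from bisect import bisect_right


def interpolate_q(grid, q_rows, progress):
    """Linearly interpolate Q-values at a continuous progress value.

    grid: sorted progress points at which Q-values are learned.
    q_rows[i][a]: learned Q-value of action a at grid[i]
                  (for one fixed previous quality level).
    """
    if progress <= grid[0]:
        return list(q_rows[0])
    if progress >= grid[-1]:
        return list(q_rows[-1])
    i = bisect_right(grid, progress) - 1
    w = (progress - grid[i]) / (grid[i + 1] - grid[i])
    return [(1 - w) * lo + w * hi for lo, hi in zip(q_rows[i], q_rows[i + 1])]


grid = [0.0, 1.0, 2.0]
q_rows = [[0.0, 10.0], [4.0, 6.0], [8.0, 2.0]]
q = interpolate_q(grid, q_rows, 0.5)
best_action = max(range(len(q)), key=lambda a: q[a])
print(q, best_action)  # [2.0, 8.0] 1
```

The controller would then select the action with the largest interpolated value, so only the grid points need to be stored and learned.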
23 Online Approach: Learning
- Progress delta given for current state
- Calculate delta for all progress points
[Figure: state space with axes progress and previous quality level, showing the step from t to t+1.]
24 Online Approach: Learning (cont'd)
- Estimate effect of other actions
- Hence: all Q(s,a) values updated in each step! (no exploration needed)
[Figure: axes progress and action.]
26 Handling Load Fluctuations
- Both approaches implicitly assume that processing times of successive frames are mutually independent
  - TRUE for stochastic load fluctuations
  - NOT TRUE for structural load fluctuations
- Result: both approaches perform suboptimally
27 Handling Load Fluctuations: Scaled budget enhancement
1. At each decision point, compute the complication factor = processing time of frame / expected processing time for applied quality level
2. Filter out the stochastic load fluctuations
3. Compute the scaled budget = budget / structural load
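The three steps can be sketched as a small estimator. The slide does not say which filter is used, so an exponential moving average is swapped in here purely as an assumption; the class name, the `smoothing` parameter, and the initial structural load of 1.0 are likewise hypothetical.

```python
class StructuralLoadEstimator:
    """Tracks the structural load from per-frame complication factors."""

    def __init__(self, expected_time, smoothing=0.1):
        # expected_time[q]: expected processing time at quality level q
        self.expected_time = expected_time
        self.smoothing = smoothing        # ASSUMPTION: EMA filter coefficient
        self.structural_load = 1.0        # ASSUMPTION: start at nominal load

    def observe(self, proc_time, quality):
        """Step 1 and 2: compute the complication factor and filter it."""
        factor = proc_time / self.expected_time[quality]
        self.structural_load += self.smoothing * (factor - self.structural_load)
        return factor

    def scaled_budget(self, budget):
        """Step 3: scaled budget = budget / structural load."""
        return budget / self.structural_load


est = StructuralLoadEstimator({0: 10.0, 1: 20.0}, smoothing=0.5)
factor = est.observe(proc_time=30.0, quality=1)
print(factor, est.structural_load, est.scaled_budget(25.0))  # 1.5 1.25 20.0
```

A frame that takes longer than expected raises the structural load estimate, which shrinks the scaled budget and should push the controller towards lower quality levels.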
28 Handling Load Fluctuations: Scaled budget enhancement
Adapt offline strategy
- Compute a policy for many different values of the budget b
- During run time, at each decision point:
  - Compute the scaled budget
  - Compute the state of the SVA
  - Apply the policy corresponding to the scaled budget, and use the state to select the quality level
  - Interpolate between policies
Adapt online strategy
- Add scaled budget directly to the state
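The run-time lookup for the adapted offline strategy might look like the sketch below. The slides do not describe how the interpolation between policies works, so this example deliberately uses a simpler stand-in: it picks the precomputed policy whose budget is nearest to the scaled budget. The function name, the dictionary-based policy tables, and the `(progress interval, previous quality level)` state tuples are assumptions.

```python
from bisect import bisect_left


def select_quality(policies, budgets, scaled_budget, state):
    """Pick the quality level from the policy with the nearest budget.

    budgets: sorted list of budget values for which a policy was computed.
    policies[i]: dict mapping state -> quality level for budgets[i].
    NOTE: nearest-budget lookup is a stand-in for the interpolation
    mentioned on the slide, whose details are not given.
    """
    i = bisect_left(budgets, scaled_budget)
    if i == 0:
        j = 0
    elif i == len(budgets):
        j = len(budgets) - 1
    else:
        closer_to_upper = budgets[i] - scaled_budget < scaled_budget - budgets[i - 1]
        j = i if closer_to_upper else i - 1
    return policies[j][state]


# Hypothetical policies for three budgets; state = (progress interval, previous level)
budgets = [20.0, 30.0, 40.0]
policies = [{(0, 0): 0}, {(0, 0): 1}, {(0, 0): 3}]
print(select_quality(policies, budgets, 27.0, (0, 0)))  # 1
```

Scaled budgets outside the precomputed range are clamped to the smallest or largest available policy.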
30 Simulation Experiments
- Scalable MPEG-2 decoder
- TriMedia 1300, 180 MHz platform
- Quality levels (based on IDCT pruning)
- Sequence 'TV' (five episodes of 'Allo 'Allo, 230,936 frames, 2.5 hours)
- Latency: 3 periods (= work ahead of at most 2 periods)
- Control strategies: OFFLINE, OFFLINE*, ONLINE*, Q0, ..., Q3
- For each control strategy, we simulate processing sequence 'TV' for a fixed value of the processing-time budget
- Revenues: based on input of video experts (slide 8)
31 Simulation Experiments: Average Revenue
[Figure: average revenue; numeric labels visible on the slide: 28.3, 29.5, 29.8, 34.0, 35.4.]
32 Simulation Experiments: Deadline misses
[Figure]
33 Simulation Experiments: Quality-level usage
[Figure]
34 Simulation Experiments: Budget usage
[Figure]
35 Simulation Experiments: Cross-trace simulations
[Figure]
37 Conclusion
- Problem
  - Video processing algorithm with highly fluctuating load
  - Fixed processing-time budget, lower than worst-case needs
  - How to optimize the user-perceived quality?
- Approach
  - Asynchronous work-preserving processing
  - Scalable video algorithm
  - QoS trade-off: deadline misses, processing quality, quality fluctuations
- Control strategies
  - Offline, online, scaled budget enhancement
- Simulation experiments
  - OFFLINE* and ONLINE* perform close to optimum
  - OFFLINE* and ONLINE* are independent of the applied statistics