RT-OPEX: Flexible Scheduling for Cloud-RAN Processing

Presentation transcript:

RT-OPEX: Flexible Scheduling for Cloud-RAN Processing. Krishna C. Garikipati, Kassem Fawaz, Kang G. Shin, University of Michigan

What is Cloud-RAN (C-RAN)? Virtualization in the Radio Access Network (RAN). Benefits: lower energy consumption (compute, HVAC); fewer site visits; faster upgrade and replacement cycles; advanced signal processing. [Figure: fronthaul network]

C-RAN in Practice

Deadlines. Periodic (sub)frames arrive every 1 ms. Hard deadline of 3 ms to transport, decode and respond to an LTE uplink frame. Requires real-time scheduling. [Figure: fronthaul network, with an ACK returned for each subframe]

C-RAN Scheduling. Two levels: a scheduler assigns base stations (BS 0, BS 1) to computing nodes, and a per-node scheduler assigns subframes to cores. [Figure: core network, scheduler and base stations; cores 0 to N of a node running BS 0 subframe 0, BS 1 subframe 0, BS 0 subframe 1, BS 1 subframe 1]

State-of-the-Art. Scheduling: CloudIQ (partitioned; assumes fixed processing time), PRAN (global; high runtime overhead). Parallelism: WiBench, BigStation (scheduler-agnostic).

Real-world Traffic. [Figure: measured load for Band 17 and Band 13, each with its max load marked] Two scheduling options: design for WCET (worst-case execution time) → overprovisioned resources; design for the average case → deadline misses.

RT-OPEX. Offers flexible scheduling for C-RAN. Combines offline partitioned scheduling with runtime parallelism (work stealing). Achieves resource pooling at a finer time scale. Avoids over-provisioning of resources.

Outline. E2E model: uplink processing, parallelism, deadline misses. Scheduling: RT-OPEX design, leverage model for processing time. Implementation: evaluation platform, performance gains, overhead.

End to End Model

Uplink Processing Model. Model of LTE processing in software, with N = # antennas, K = modulation order, D = bits per carrier (load), L = decoding iterations. Dominating terms: FFT, equalization, turbo decoding (the slide also labels de-mapping and de-matching). Error term: platform variations (kernel tasks/interrupt handling), comparable to benchmark stress test. Fitted coefficients on GPP (μs): w0 = 31.4, w1 = 169.1, w2 = 49.7, w3 = 93.0; r² = 0.992.
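
The transcript keeps the fitted coefficients but not the formula itself. As a rough sketch, assuming the model is linear of the form T_rxproc = w0 + w1·N + w2·K + w3·D·L (this form is an assumption, as are the function and parameter names), the processing-time estimate could be evaluated like this:

```python
# Hypothetical evaluation of the uplink processing-time model.
# ASSUMPTION: the linear form w0 + w1*N + w2*K + w3*D*L is not stated in the
# transcript; only the coefficient values below come from the slide (GPP, in us).
W_GPP_US = (31.4, 169.1, 49.7, 93.0)

def rx_proc_time_us(n_antennas, mod_order, load, iterations, w=W_GPP_US):
    """Estimated uplink processing time in microseconds (error term omitted)."""
    w0, w1, w2, w3 = w
    return w0 + w1 * n_antennas + w2 * mod_order + w3 * load * iterations
```

The error term for platform variations is left out here; a scheduler using such an estimate would need to add a margin for it.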

Parallelism. Decoder block: independent w.r.t. code blocks. FFT: independent w.r.t. antenna and OFDM symbols.

Parallelism Task Model Divide tasks into parallel and independent subtasks Parallel processing Precedence constraints
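
The task model can be illustrated with a small sketch: independent subtasks inside a stage run in parallel, and a stage only starts once the previous one has finished. The helper name `run_stages` and the use of a thread pool are illustrative assumptions, not the RT-OPEX implementation (which the slides say uses pthreads).

```python
from concurrent.futures import ThreadPoolExecutor

def run_stages(stages, workers=4):
    """Run each stage's independent subtasks in parallel.

    `stages` is a list of lists of zero-argument callables. A stage starts
    only after every subtask of the previous stage has completed, which
    models the precedence constraint (e.g. FFT before decode).
    """
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for subtasks in stages:
            # list() blocks until all subtasks of this stage are done.
            results.append(list(pool.map(lambda task: task(), subtasks)))
    return results
```

For example, a first stage could hold one FFT subtask per antenna and OFDM symbol, and a second stage one decode subtask per code block.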

End-to-End Model. Assuming Tx processing starts 1 ms before the deadline: $T_{rxproc} + T_{fronthaul} + T_{cloud} \le 2\,\text{ms}$. [Figure: timeline showing $T_{rxproc}$ and the RTT/2 transport segments]
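
A minimal sketch of the budget check implied by this constraint follows. The function names are hypothetical, and all times are in microseconds.

```python
# Sketch of the end-to-end constraint T_rxproc + T_fronthaul + T_cloud <= 2 ms.
# Function names are illustrative; only the 2 ms bound comes from the slide.
E2E_BOUND_US = 2000

def processing_budget_us(t_fronthaul_us, t_cloud_us):
    """Time left for uplink processing once transport delays are paid."""
    return E2E_BOUND_US - (t_fronthaul_us + t_cloud_us)

def meets_deadline(t_rxproc_us, t_fronthaul_us, t_cloud_us):
    """True if the subframe satisfies the end-to-end constraint."""
    return t_rxproc_us + t_fronthaul_us + t_cloud_us <= E2E_BOUND_US
```

With transport delays totalling 400 μs, the remaining processing budget is 1600 μs, which is consistent with the 1.6 ms figure quoted on the partitioned-scheduler slide.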

Scheduling

Conventional Approaches. Static: deterministic, offline; offers real-time guarantees; deadline miss when $T_{rxproc} \ge T_{max}$. Global: single queue of subframes; FIFO (or EDF) de-queuing; non-deterministic, flexible; no real-time guarantees.
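
The contrast between the two approaches can be sketched as follows. The round-robin mapping and the class name `GlobalEdfQueue` are illustrative assumptions; the slides do not specify how either scheduler is implemented.

```python
import heapq

def partitioned_core(bs, subframe, num_bs, num_cores):
    """Static approach: a fixed, offline mapping of (BS, subframe) to a core.

    ASSUMPTION: a simple round-robin mapping, chosen only for illustration.
    """
    return (subframe * num_bs + bs) % num_cores

class GlobalEdfQueue:
    """Global approach: one shared queue, dequeued earliest-deadline-first."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps FIFO order among equal deadlines

    def push(self, deadline_us, job):
        heapq.heappush(self._heap, (deadline_us, self._seq, job))
        self._seq += 1

    def pop(self):
        deadline_us, _, job = heapq.heappop(self._heap)
        return deadline_us, job
```

In the static case the core is known before runtime, so timing is predictable; in the global case any free core calls `pop()`, so which core runs a subframe depends on runtime conditions.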

Scheduling Gaps. WCET design + non-optimal design → gaps in execution.

RT-OPEX. Exploit the gaps dynamically at runtime. [Figure label: core is idle]

RT-OPEX Migration. Subtasks are migrated to cores with enough slack time. Local processing does not wait for the migrated task. Ensures no performance degradation. Otherwise perform recovery. [Figure: Core 0 runs local FFT and decode and starts the migration; Cores 1 to 4 are candidate targets]
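
One way to picture the migration decision is a greedy plan that hands subtasks to idle cores whose slack covers the subtask time plus a per-subtask migration overhead. This is a sketch under those assumptions, not the authors' algorithm; `plan_migration` and its parameters are hypothetical.

```python
def plan_migration(num_subtasks, subtask_us, overhead_us, slack_us_by_core):
    """Greedy sketch of a migration plan.

    Returns (plan, local) where plan maps core id -> number of subtasks
    migrated to it and local is the number of subtasks kept on the
    originating core. ASSUMPTION: overhead is paid once per migrated subtask.
    """
    cost_us = subtask_us + overhead_us
    plan = {}
    remaining = num_subtasks
    # Visit cores with the most slack first.
    for core, slack_us in sorted(slack_us_by_core.items(),
                                 key=lambda item: item[1], reverse=True):
        take = min(slack_us // cost_us, remaining)
        if take > 0:
            plan[core] = take
            remaining -= take
    return plan, remaining
```

For instance, `plan_migration(4, 100, 26, {1: 300, 2: 100, 3: 130})` sends two subtasks to core 1 and one to core 3, keeps one local, and skips core 2 because its slack is below the 126 μs cost.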

Implementation & evaluation

RT-OPEX Implementation. Built on OpenAirInterface (LTE Rel 10). Modularize the tasks: abstraction of FFT, demod, decode; utilize the pthread library. Migration: data references from shared memory. Open-source: enables different configurations. https://github.com/gkchai/RT-OPEX

Evaluation Platform. GPP: 32-core Intel Xeon E5, 128 GB RAM, 15 MB L3 cache; Ubuntu 14.0.4 low-latency kernel. LTE data collection: USRP to collect load of 4 cellular towers; 30000 subframes. Replay load from each BS trace: 4 BS, 2 antennas, 10 MHz LTE FDD; 1 UE per BS, 100% PRB utilization; simulated transport delay (RTT/2).

Performance Evaluation. [Figure: performance comparison, with panels for large gaps and narrower gaps]

Migration Overhead. FFT median overhead is 26 μs; decoding overhead is 20 μs. Overhead = cost of transferring OAI variables from shared memory to the core. The overhead is accounted for at migration time.

Partitioned Scheduler. RTT/2 > 400 μs → budget < 1.6 ms → subframes with MCS > 20 miss deadlines. The partitioned scheduler cannot exploit gaps.

Global Scheduler. Fails to deliver performance gains. Cache thrashing causes deadline performance to saturate beyond 8 cores. At MCS 27, processing time increases with more cores.

Conclusion. RT-OPEX: Real-Time Opportunistic Execution. Low overhead: migration on top of partitioned scheduling. Flexible to resources: exploits added resources for migration. Flexible to load: leverages load variations to improve deadline miss rate.

Thank You! Questions?

RT-OPEX Performance. Lower RTT → larger gaps: migrate decode tasks of high MCS → deadline miss goes to zero. Larger RTT → narrower gaps: migrate only FFT subtasks → deadline miss reduced.

Transport Latency. Latency between the cloud and the radio. Fronthaul ($T_{fronthaul}$): fixed latency (~20 μs/km). Cloud network latency ($T_{cloud}$): switch, Ethernet and driver delay; latency per packet averages 0.15 ms. [Setup: 1 Gbps Ethernet to switch, 1/10 Gbps Ethernet to GPP]
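
As a rough illustration of how the two terms combine, the sketch below adds a distance-proportional fronthaul delay to the average cloud network delay. The constant and function names are hypothetical; the two numeric values are the ones quoted on the slide.

```python
# Illustrative transport-latency estimate in microseconds.
FRONTHAUL_US_PER_KM = 20  # slide: fixed latency of roughly 20 us/km
CLOUD_AVG_US = 150        # slide: average per-packet latency of 0.15 ms

def transport_latency_us(distance_km, cloud_us=CLOUD_AVG_US):
    """One-way transport delay: T_fronthaul + T_cloud."""
    return FRONTHAUL_US_PER_KM * distance_km + cloud_us
```

Under these figures a radio 10 km from the cloud would see about 350 μs of one-way transport delay.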

Uplink Processing. Processing time is dynamic and depends on: MCS selection, number of antennas, SNR of channel. [Figure annotations: 2.8x increase w.r.t. MCS; 0.5 ms increase w.r.t. L; 50% increase w.r.t. SNR; 200 μs per antenna]

RT-OPEX Performance. At a miss-rate threshold ≤ 0.01, RT-OPEX supports 4 Mbps of extra load (RTT/2 = 500 μs).

RT-OPEX Challenges What to migrate? How to migrate? When to migrate?

RT-OPEX