Enabling Protocol Coexistence: High-Level Hardware-Software Co-design of Flexible Modern Wireless Transceivers, Benjamin Drozdenko, Graduate Research Assistant

Presentation transcript:

1 Enabling Protocol Coexistence: High-Level Hardware-Software Co-design of Flexible Modern Wireless Transceivers
Benjamin Drozdenko, Graduate Research Assistant & Ph.D. Candidate
Advisors: Prof. Leeser (RCL) & Prof. Chowdhury (GENESYS)
Northeastern University, Boston, MA
Northeastern ECE Ph.D. Student Seminar Series (NEPSSS), March 30, 2016

2 So What? Who Cares?
Wireless Transceivers: Y’all Got ’em!
- Surge in wireless devices: 10B devices today, 50B by 2050
- $14 trillion business over the next 10 years
Challenges: Times are changing
- C1: Adapt to changing protocols to handle contention
- C2: Maintain/increase bit rates
- C3: Decrease energy consumption and error rates
[Figure: LTE and Wi-Fi]

3 Another Challenge: Spectrum Scarcity
C4: Change center frequency to use new bandwidths
- … MHz: 802.11af TV whitespace reuse
- … GHz: Military RADAR reuse
- 2.4, 5.8 GHz: 802.11a/b designated ISM bands

4 Modeling Environment Barriers: Why Making Such Wireless Transceivers Is Hard
Comms protocols evolve; transceiver HW/SW must evolve too!
- B1: HW-SW modeling environment
- B2: HW & SW must be reconfigurable (e.g., SW sets radio parameters such as f_c = 5.8 GHz and f_s = 20 Msps)
- B3: Each processing block (PB) must be the same on HW & SW (y = FFT(x) as HDL "module FFT(x,y)" == SW "function FFT(x,y)")
- B4: Map behaviors to HW or SW
[Figure: ProcBlk1, ProcBlk2, and ProcBlk3 split between SW and the FPGA (HW), linked by an effective bus]

5 Top Down-Bottom Up Approach to Modeling Wireless Transceivers for Protocol Coexistence
System goals and challenges:
- (C1) Adapt to changing protocols, contention
- (C2) Time synchronization
- (C3) Low energy/error rate
- (C4) Spectrum scarcity: changing bandwidths
Enabling Technologies (T1-T3):
- (B1) HW & SW joint modeling environment
- (B2) SW control of radio parameters
- (B2) Partially reconfigurable HW (FPGA)
Fundamental Research (R1-R4):
- (B1) Signal processing for wireless comms understanding
- (B3) Implementation portability: equivalent functionality on HW & SW
- (B1) Time & energy optimization techniques
- (B4) Identify the ideal mapping of wireless behaviors to HW & SW

6 What Does a HW-SW Modeling Environment Need?
T2: HW & SW Joint Modeling Environment: Testbed Requirements
1. Provide a HW-SW prototyping platform
   1. Modeling for wireless processing blocks (PBs)
   2. Hardware (HW) components
   3. Software (SW) tools
2. Model a HW-SW divide point
3. Enact HW-SW interfacing
4. Exhibit reusability & adaptability to modern standards

7 HW-SW Prototyping Platform: Modeling for Wireless Processing Blocks
PB  Transmitter (Tx)       Receiver (Rx)
1   Scrambling             Preamble Detection
2   Convolutional Coding   OFDM Demodulation
3   Block Interleaving     BPSK Demodulation
4   BPSK Modulation        Block De-interleaving
5   OFDM Modulation        Viterbi Decoding
6   Preamble Insertion     Descrambling
- Tx path: data bits in, samples out; Rx path: samples in, data bits out
- One Simulink model for the Tx or Rx path
- Simulink: design Synchronous Dataflow (SDF) models
- Integrated profiling: look at an entire PHY-layer processing chain
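The first four Tx processing blocks can be sketched as plain Python functions chained in dataflow order. This is a minimal illustration assuming 802.11a-style parameters (scrambler polynomial x^7 + x^4 + 1, rate-1/2 K = 7 convolutional code with generators 133 and 171 octal, 48-bit BPSK interleaver); the function names and the scrambler seed are hypothetical, tail bits, the SERVICE field, and puncturing are omitted, and it is not the authors' Simulink model.

```python
from typing import List

# Hypothetical Python stand-ins for Tx PBs 1-4, assuming 802.11a-style parameters.
G0, G1 = 0o133, 0o171   # K = 7 convolutional code generators (octal)
N_CBPS = 48             # coded bits per OFDM symbol for BPSK at rate 1/2


def scramble(bits: List[int], seed: int = 0b1011101) -> List[int]:
    """Additive scrambler, S(x) = x^7 + x^4 + 1; rerunning it with the same seed undoes it."""
    state = [(seed >> i) & 1 for i in range(7)]  # state[0] = x1 ... state[6] = x7
    out = []
    for b in bits:
        feedback = state[6] ^ state[3]  # x7 XOR x4
        out.append(b ^ feedback)
        state = [feedback] + state[:6]  # shift the register by one position
    return out


def conv_encode(bits: List[int]) -> List[int]:
    """Rate-1/2 encoder: each input bit yields one output bit per generator."""
    state, out = 0, []
    for b in bits:
        reg = (b << 6) | state  # newest bit in the MSB, oldest of the 7 in the LSB
        out.append(bin(reg & G0).count("1") & 1)
        out.append(bin(reg & G1).count("1") & 1)
        state = reg >> 1
    return out


def interleave(coded: List[int]) -> List[int]:
    """802.11a first permutation; for BPSK the second permutation is the identity."""
    if len(coded) != N_CBPS:
        raise ValueError(f"expected {N_CBPS} coded bits")
    out = [0] * N_CBPS
    for k, bit in enumerate(coded):
        i = (N_CBPS // 16) * (k % 16) + k // 16
        out[i] = bit
    return out


def bpsk(bits: List[int]) -> List[float]:
    """Map bit 0 to -1.0 and bit 1 to +1.0."""
    return [2.0 * b - 1.0 for b in bits]


def tx_pbs_1_to_4(data_bits: List[int]) -> List[float]:
    """24 data bits -> 48 coded bits -> 48 BPSK symbols."""
    return bpsk(interleave(conv_encode(scramble(data_bits))))


data = [1, 0, 1, 1, 0, 0, 1, 0] * 3  # 24 arbitrary data bits
print(len(tx_pbs_1_to_4(data)))      # 48
```

The 24-in, 48-out sizes line up with the data-bit and coded-bit element counts in the slide 13 table.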

8 HW-SW Prototyping Platform: Hardware Components: Xilinx Zynq
Zynq-Based Heterogeneous Computing System
- Zynq-7000 series System-on-Chip (SoC)
- Processing System: ARM Cortex-A9 CPU
- Programmable Logic: FPGA with DSPs & BRAM
- We prototype on 2 varieties: ZC706 & ZedBoard
[Figure: Zynq SoC with CPU and FPGA]

9 HW-SW Prototyping Platform: Hardware Components
- Host Personal Computer (PC): runs the SW tools; connects to the FPGA over JTAG and to the CPU over Ethernet
- Zynq-Based Heterogeneous Computing System: Zynq SoC (CPU + FPGA) with transmit and receive paths
- Radio Frequency (RF) Front End: ADI FMComms3 (AD9361, 2 Tx / 2 Rx) in the FMC slot

10 HW-SW Prototyping Platform: Software Tools
The host PC runs the SW tools:
- MathWorks Simulink™ model → Embedded Coder™ → C code → ARM executable (to the CPU over Ethernet)
- MathWorks Simulink™ model → HDL Coder™ → HDL code → Xilinx Vivado® → FPGA bitstream (to the FPGA over JTAG)
- Embedded Coder: generates C code for the ARM processor
- HDL Coder: creates Hardware Description Language (HDL) code
- Vivado: synthesizes, implements, and generates the FPGA bitstream

11 Modeling a HW-SW Divide Point
Fn denotes processing block n of the Tx or Rx path (see the PB table in slide 7).
- V1: SW-only model
- V2: Adds Tx F6 & Rx F1 to HW
- V3: Adds Tx F5 & Rx F2 to HW
- V4: Adds Tx F4 & Rx F3 to HW
- V5: Adds Tx F3 & Rx F4 to HW
- V6: Adds Tx F2 & Rx F5 to HW
- V7: HW-only model
[Figure: SW/HW split of the transmit and receive paths on the Zynq-based heterogeneous computing system for each variant V1 to V7]
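The variants follow a simple rule: Vk places the last k-1 Tx blocks and the first k-1 Rx blocks in HW. A hypothetical Python helper that enumerates the partition for each variant, using the PB names from the slide 7 table (the function and list names are illustrative only):

```python
from typing import Dict, List

# PB order from the slide 7 table (index 0 = PB 1).
TX_PBS = ["Scrambling", "Convolutional Coding", "Block Interleaving",
          "BPSK Modulation", "OFDM Modulation", "Preamble Insertion"]
RX_PBS = ["Preamble Detection", "OFDM Demodulation", "BPSK Demodulation",
          "Block De-interleaving", "Viterbi Decoding", "Descrambling"]


def partition(variant: int) -> Dict[str, List[str]]:
    """Return which Tx/Rx PBs run in SW and in HW for model variant V1..V7.

    The Tx path runs SW first, then HW (the HW end feeds the RF front end);
    the Rx path runs HW first, then SW (the HW end is fed by the RF front end).
    """
    if not 1 <= variant <= 7:
        raise ValueError("variant must be in 1..7")
    n_hw = variant - 1          # PBs per path placed in HW
    split = len(TX_PBS) - n_hw  # Tx divide point
    return {
        "tx_sw": TX_PBS[:split], "tx_hw": TX_PBS[split:],
        "rx_hw": RX_PBS[:n_hw], "rx_sw": RX_PBS[n_hw:],
    }


for v in range(1, 8):
    p = partition(v)
    print(f"V{v}: Tx HW = {p['tx_hw']}, Rx HW = {p['rx_hw']}")
```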

12 HW-SW Interfacing: Bus Details
- Advanced eXtensible Interface (AXI): bus to connect the CPU & FPGA
- Direct Memory Access (DMA): moves the data sent between the CPU & FPGA
- First-In First-Out (FIFO): queue to buffer bits in transit
Note: the data to transfer between CPU & FPGA has a different size and class for each model variant!
[Figure: AXI DMA controller and FIFO pack/unpack/concat/slice blocks between the CPU and the AD9361 DAC/ADC (I1,2, Q1,2) of the ADI FMComms3 RF front end (2 Tx / 2 Rx)]
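One way to picture the concat and slice steps: two signed 16-bit values, such as an I and a Q sample, share one 32-bit AXI word. The bit layout (I in the upper half) and the function names are assumptions for illustration, not the documented behavior of the FIFOconcat/FIFOslice blocks.

```python
from typing import Tuple

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def concat_iq(i: int, q: int) -> int:
    """Pack signed 16-bit I and Q into one unsigned 32-bit word (I in bits 31..16)."""
    for v in (i, q):
        if not INT16_MIN <= v <= INT16_MAX:
            raise ValueError(f"{v} does not fit in 16 signed bits")
    return ((i & 0xFFFF) << 16) | (q & 0xFFFF)


def _to_int16(u: int) -> int:
    """Reinterpret a 16-bit two's-complement field as a signed integer."""
    return u - 0x10000 if u & 0x8000 else u


def slice_iq(word: int) -> Tuple[int, int]:
    """Inverse of concat_iq: split a 32-bit word back into (I, Q)."""
    return _to_int16((word >> 16) & 0xFFFF), _to_int16(word & 0xFFFF)


word = concat_iq(-1234, 5678)
print(hex(word), slice_iq(word))
```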

13 HW-SW Interfacing: Data Transfer Types & Sizes
Variant  Data to Send  Data Type           Size of 1  #Elements
V1       Samples       Signed fixed point  16 bits    80
V2       Samples       Signed fixed point  16 bits    64
V3       Symbols       Signed integer      1-8 bits   64
V4       Coded bits    Boolean             1 bit      48
V5       Coded bits    Boolean             1 bit      48
V6       Data bits     Boolean             1 bit      24
V7       Data bits     Boolean             1 bit      24
- Before sending data between the CPU & FPGA, we translate it to a 32-bit unsigned integer format for transfer on the AXI interconnect
- We build a library of bundling blocks to facilitate this transfer
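A minimal sketch of the bundling idea: fixed-width fields are packed into 32-bit unsigned words before the AXI transfer and unpacked afterward. It assumes LSB-first packing with as many fields per word as fit, and 8-bit containers for the 1-8 bit symbols of V3; the function names are hypothetical and the actual bundling blocks may use a different layout.

```python
from typing import List

WORD_BITS = 32


def pack_words(values: List[int], width: int) -> List[int]:
    """Pack width-bit fields (two's complement if negative) LSB-first into 32-bit words."""
    if width <= 0 or WORD_BITS % width:
        raise ValueError("width must divide 32")
    per_word, mask = WORD_BITS // width, (1 << width) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for slot, v in enumerate(values[start:start + per_word]):
            word |= (v & mask) << (slot * width)
        words.append(word)
    return words


def unpack_words(words: List[int], width: int, count: int, signed: bool = False) -> List[int]:
    """Recover count width-bit fields from words produced by pack_words."""
    per_word, mask = WORD_BITS // width, (1 << width) - 1
    out = []
    for n in range(count):
        v = (words[n // per_word] >> ((n % per_word) * width)) & mask
        if signed and v >> (width - 1):
            v -= 1 << width  # sign-extend a negative two's-complement field
        out.append(v)
    return out


# Element widths and counts from the table (8-bit containers assumed for V3).
VARIANTS = {"V1": (16, 80), "V2": (16, 64), "V3": (8, 64),
            "V4": (1, 48), "V5": (1, 48), "V6": (1, 24), "V7": (1, 24)}
for name, (width, count) in VARIANTS.items():
    print(name, len(pack_words([0] * count, width)), "words per frame")
```

Under these assumptions the transfer shrinks from 40 words per frame (V1) to a single word (V6, V7).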

14 Results: CPU Execution Time: Transmitter on Zynq
- Moving one processing block from SW to HW does not necessarily cause a speedup
- The increase in Tx frame time on the ZC706 from V1 to V2 demonstrates this:
  - V1 is SW-only: it keeps all operations in SW and requires no AXI communication
  - V2 adds a small component (Tx F6, preamble insertion) to HW
  - The time saved is less than the time spent on CPU-FPGA data transfer
- Our modeling environment can identify where the HW-SW interface is best placed
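The V1-to-V2 result can be reasoned about with a toy cost model: offloading pays off only when the SW time removed from the CPU exceeds the transfer overhead it adds. The sketch below is a hypothetical model, not the profiling method used in this work; the per-block SW times, setup cost, per-word cost, and words per variant are placeholder inputs.

```python
from typing import Dict, Sequence


def tx_cpu_frame_time(variant: int, sw_block_times: Sequence[float],
                      words: Dict[int, int], t_word: float, t_setup: float) -> float:
    """Toy CPU frame time for Tx variant V1..V7 (all times in the same unit).

    Variant Vk keeps the first 7 - k Tx blocks in SW; V1 needs no AXI transfer.
    """
    if not 1 <= variant <= 7:
        raise ValueError("variant must be in 1..7")
    sw_time = sum(sw_block_times[:7 - variant])
    if variant == 1:
        return sw_time
    return sw_time + t_setup + words[variant] * t_word


def best_variant(sw_block_times: Sequence[float], words: Dict[int, int],
                 t_word: float, t_setup: float) -> int:
    """Variant with the smallest modeled CPU frame time."""
    return min(range(1, 8), key=lambda v: tx_cpu_frame_time(
        v, sw_block_times, words, t_word, t_setup))


# Placeholder inputs: a cheap last block (preamble insertion) and a costly transfer setup.
times = [1.0, 1.0, 1.0, 1.0, 1.0, 0.1]
words = {2: 32, 3: 16, 4: 2, 5: 2, 6: 1, 7: 1}
print(tx_cpu_frame_time(1, times, words, 0.01, 0.5),
      tx_cpu_frame_time(2, times, words, 0.01, 0.5))
```

With these made-up numbers V2 comes out slower than V1, mirroring the shape of the measured result; real decisions should rest on profiled times.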

15 Results: CPU Execution Time: Receiver on ZC706
- Rx maximum CPU frame time decreases as more blocks are moved onto the FPGA
- Preamble detection is revealed to be the biggest bottleneck in the Rx model
  - Moving it to HW in V2 results in the largest drop in frame time
- Frame time also drops when the FFT (V3) and the Viterbi decoder (V6) move to HW
- Moving the descrambler in V7 does not decrease frame time, suggesting it can stay in SW
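Finding the bottleneck amounts to timing each block of the chain on its own and comparing. A minimal Python sketch of that kind of per-block profiling, assuming each block is a pure function of its input so it can be rerun for a best-of-N timing; the function names are hypothetical.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Block = Tuple[str, Callable[[Any], Any]]


def profile_chain(blocks: List[Block], x: Any, repeats: int = 5) -> Tuple[Any, Dict[str, float]]:
    """Run the blocks in order; return the final output and best-of-N seconds per block."""
    timings: Dict[str, float] = {}
    for name, fn in blocks:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            y = fn(x)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
        x = y  # the output of this block feeds the next one
    return x, timings


def bottleneck(timings: Dict[str, float]) -> str:
    """Name of the slowest block."""
    return max(timings, key=timings.get)


chain = [("detect", lambda v: sum(i * i for i in range(100_000)) * 0 + v),
         ("decode", lambda v: v + 1)]
out, t = profile_chain(chain, 1)
print(out, bottleneck(t))
```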

16 Results: FPGA Resource Utilization and Power Usage
[Figures: FPGA resource utilization per PB for the transmitter and the receiver, and power usage]

17 Variants of Processing Blocks: Preamble Detection
[Table: Default, HDL Long, and HDL Training MF variants compared by data path delay (ns), % LUTs, % registers, % DSPs, and total power (W)]
- Block uses a matched filter (MF) to correlate 2 frames with a fixed set of coefficients
- 1st MF: manually assembled from adders & multipliers
  - Not ideal: uses 99% of DSPs
- 2nd MF: correlates with the full long preamble
  - But the long preamble is composed of repetitions of a training sequence
- 3rd MF: correlates with only the training sequence
  - 2.38X reduction in path delay
  - 1.12X reduction in power
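The third MF's gain comes from correlating against one repetition of the training sequence instead of the whole repeated preamble, which in this sketch halves the number of coefficients (and thus multipliers). The Python example uses a hypothetical random 16-sample QPSK training sequence repeated twice, not the actual 802.11a preamble. Both filters peak where the preamble starts, and the short filter also peaks at the second repetition.

```python
import random
from typing import List


def matched_filter(rx: List[complex], coeffs: List[complex]) -> List[float]:
    """|sum_k rx[n + k] * conj(coeffs[k])| for every full-overlap lag n."""
    taps = len(coeffs)
    return [abs(sum(rx[n + k] * coeffs[k].conjugate() for k in range(taps)))
            for n in range(len(rx) - taps + 1)]


rng = random.Random(7)


def qpsk_symbol() -> complex:
    return complex(rng.choice((-1, 1)), rng.choice((-1, 1)))


training = [qpsk_symbol() for _ in range(16)]  # hypothetical training sequence
long_preamble = training * 2                   # preamble = repeated training sequence
payload = [qpsk_symbol() for _ in range(24)]
rx = [0j] * 40 + long_preamble + payload       # preamble starts at sample 40

full = matched_filter(rx, long_preamble)       # 32 coefficients
short = matched_filter(rx, training)           # 16 coefficients
print(max(range(len(full)), key=full.__getitem__))  # lag of the strongest correlation
print(short[40], short[56])                    # one peak per repetition
```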

18 Variants of Processing Blocks: Viterbi Decoder
[Table: Delay-Based vs. BRAM-Based VD variants compared by data path delay (ns), % LUTs, % registers, BRAM tiles (0 and 2), total power (W) (2.36), and VD power (W)]
- Block reverses the effects of the convolutional encoder
- Requires memory to hold intermediate state values
- 1st VD uses delay blocks to hold state memory
  - Exhibits lower path delay
- 2nd VD uses BRAM tiles to hold state memory
  - Uses fewer LUTs and registers
  - Slightly lower power
- Illustrates a tradeoff between time and power
  - Can dynamically tune the design to target either objective
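A hard-decision Viterbi decoder for a K = 7, rate-1/2 code can be sketched in Python as below. It assumes 802.11a-style generators (133 and 171 octal), zero tail bits, and Hamming branch metrics. The per-step survivor table `history` plays roughly the role of the intermediate state memory that the two HW variants keep in delay blocks or BRAM; the HW designs themselves are not modeled here.

```python
from typing import List, Tuple

K = 7                        # constraint length
GENERATORS = (0o133, 0o171)  # assumed 802.11a-style generators
N_STATES = 1 << (K - 1)      # 64 encoder states


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def branch_output(state: int, bit: int) -> Tuple[int, int, int]:
    """Return (output bit 0, output bit 1, next state) for one encoder step."""
    reg = (bit << (K - 1)) | state
    return parity(reg & GENERATORS[0]), parity(reg & GENERATORS[1]), reg >> 1


def conv_encode(bits: List[int]) -> List[int]:
    state, out = 0, []
    for b in bits:
        o0, o1, state = branch_output(state, b)
        out += [o0, o1]
    return out


def viterbi_decode(coded: List[int]) -> List[int]:
    """Hard-decision Viterbi decoding with Hamming branch metrics."""
    inf = float("inf")
    metrics = [0.0] + [inf] * (N_STATES - 1)   # encoder starts in state 0
    history: List[List[Tuple[int, int]]] = []  # survivor memory: (prev state, bit)
    for t in range(0, len(coded), 2):
        r0, r1 = coded[t], coded[t + 1]
        new_metrics = [inf] * N_STATES
        survivors = [(0, 0)] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == inf:
                continue
            for b in (0, 1):
                o0, o1, ns = branch_output(s, b)
                m = metrics[s] + (o0 != r0) + (o1 != r1)
                if m < new_metrics[ns]:  # add-compare-select
                    new_metrics[ns] = m
                    survivors[ns] = (s, b)
        metrics = new_metrics
        history.append(survivors)
    state = min(range(N_STATES), key=metrics.__getitem__)  # best final state
    decoded = []
    for survivors in reversed(history):  # traceback
        state, bit = survivors[state]
        decoded.append(bit)
    return decoded[::-1]


msg = [1, 0, 1, 1, 0, 0, 1, 0] * 3 + [0] * (K - 1)  # 24 data bits + zero tail
rx = conv_encode(msg)
rx[5] ^= 1
rx[30] ^= 1                                          # two channel bit errors
print(viterbi_decode(rx) == msg)
```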

19 Reusability & Adaptability to Modern Wireless Standards
Processing blocks compared across 802.11a, Wi-Fi (802.11g), and Mobile (LTE):
1. Scrambling: (1)
2. Convolutional Coding: (1)
3. PSK Modulation: (B) (DB) (Q)
4. Block Interleaving: (1)
5. OFDM: (1) (DL, …)
6. Preamble Insert/Detect: (1) (2)
(1): Equivalent, reusable; (2): Not yet implemented, but a variant can be reused
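The PSK Modulation row shows one PB reused in several variants. A hypothetical Python sketch that reads (B), (DB), and (Q) as BPSK, differential BPSK, and QPSK, which is our interpretation of the abbreviations. The bit-to-symbol conventions are assumptions (bit 0 maps to -1 for BPSK/QPSK, bit 1 flips the phase for DBPSK) and should be checked against each standard before reuse.

```python
import math
from typing import List

SCALE = 1 / math.sqrt(2)  # unit average energy for QPSK


def bpsk(bits: List[int]) -> List[complex]:
    return [complex(2 * b - 1, 0) for b in bits]


def dbpsk(bits: List[int], reference: complex = 1 + 0j) -> List[complex]:
    """Differential BPSK: bit 1 flips the phase by pi, bit 0 keeps it."""
    out, symbol = [], reference
    for b in bits:
        if b:
            symbol = -symbol
        out.append(symbol)
    return out


def qpsk(bits: List[int]) -> List[complex]:
    """Gray-mapped QPSK: first bit of each pair sets I, second sets Q."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    return [complex(2 * bits[n] - 1, 2 * bits[n + 1] - 1) * SCALE
            for n in range(0, len(bits), 2)]


MODULATORS = {"B": bpsk, "DB": dbpsk, "Q": qpsk}  # one PB, three variants


def psk_modulate(variant: str, bits: List[int]) -> List[complex]:
    return MODULATORS[variant](bits)


print(psk_modulate("DB", [1, 0, 1]))
```

Keeping the variants behind one dispatch table mirrors the idea of a single reusable PB whose variant is selected per protocol.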

20 Variants of Processing Blocks: OFDM IFFT
[Table: data path delay (ns), % LUTs, % registers, % DSPs, and total power (W) for each IFFT size]
- In LTE, OFDM modulation uses different IFFT sizes to spread symbols onto a larger number of subcarriers
- We vary the IFFT size to identify its impact on FPGA metrics
  - Delay, resource use, and power rise for larger IFFT sizes
  - Limiting factor: #LUTs when placing multiple IFFTs on the FPGA
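OFDM modulation with a configurable IFFT size can be sketched with a recursive radix-2 FFT. This sketch assumes power-of-two sizes, fills every subcarrier directly (no pilot or null-carrier mapping), and uses a placeholder cyclic prefix of one quarter of the IFFT size. With a 64-point IFFT that gives 16 + 64 = 80 samples per symbol, consistent with the 80- and 64-sample entries in the slide 13 table.

```python
import cmath
from typing import List


def fft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 (I)FFT without 1/N scaling; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("size must be a power of two")
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def ofdm_modulate(symbols: List[complex], n_fft: int, cp_len: int) -> List[complex]:
    """IFFT one block of subcarrier symbols and prepend a cyclic prefix."""
    if len(symbols) != n_fft or not 0 <= cp_len <= n_fft:
        raise ValueError("need n_fft symbols and 0 <= cp_len <= n_fft")
    time_domain = [v / n_fft for v in fft(symbols, inverse=True)]
    return time_domain[n_fft - cp_len:] + time_domain


def ofdm_demodulate(samples: List[complex], n_fft: int, cp_len: int) -> List[complex]:
    """Drop the cyclic prefix and FFT back to subcarrier symbols."""
    return fft(samples[cp_len:cp_len + n_fft])


for n_fft in (64, 128, 256, 512):  # sweep IFFT sizes
    syms = [complex((-1) ** k, 0) for k in range(n_fft)]
    tx = ofdm_modulate(syms, n_fft, n_fft // 4)
    rx = ofdm_demodulate(tx, n_fft, n_fft // 4)
    err = max(abs(a - b) for a, b in zip(rx, syms))
    print(n_fft, len(tx), err < 1e-9)
```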

21 Conclusions
- Introduces a method for modeling HW-SW co-designs for wireless transceivers
- Enables profiling of all processing blocks
- Identifies bottlenecks such as preamble detection
- Explores various HW-SW divide points
- Identifies which model variants are most desirable
- Details the interfacing needed at the divide point
- Shows when variants use more power due to data transfer
- Shows that added FPGA power is a fraction of CPU power
- Improves preamble detection by using fewer MF coefficients
- Customizes the Viterbi decoder to use different resources

22 Future Work
- Perform live tests with online radio transmissions
- Measure link latency and error rates
- Develop rules to automate HW-SW co-designs
- Make decisions about the HW-SW divide point
- Automate bundling for data transfer between HW & SW
- Switch out the platform to test the newest HW: Altera Arria 10®, Xilinx UltraScale+ MPSoC
- Explore coexistence with modern protocols (… & LTE)
- The OFDM IFFT study is a first look at this

23 Publications & Acknowledgments
Extended Abstracts & Posters:
- BARC 2016, Boston, MA, January 29
- IEEE INFOCOM 2016, San Francisco, CA, April 11-14
Submitted, Pending:
- IEEE Transactions on Emerging Topics in Computing, Special Issue on Next Generation Wireless Computing Systems. Submitted Mar 1
- IEEE Field Programmable Logic & Applications (FPL). Submitted Mar 27
Plans:
- ACM Wireless Network Testbeds, Experimental evaluation, and Characterization (WiNTECH) 2016, October 3
Acknowledgments:
