University of Michigan
SODA: A Low-power Architecture For Software Radio
Authors: Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge

Presentation transcript:

1 SODA: A Low-power Architecture For Software Radio
Authors: Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge (Advanced Computer Architecture Laboratory, University of Michigan at Ann Arbor); Chaitali Chakrabarti (Department of Electrical Engineering, Arizona State University); Krisztián Flautner (ARM, Ltd.)
Presenters: Wei Miao, Jingcheng Wang

2 Overview
• Introduction to SDR
• Behavior model and design tradeoffs
• Architecture analysis
• Performance analysis
• Summary

3 INTRODUCTION AND ANALYSIS
Wei Miao

4 Basic introduction to SODA
• Signal-processing On-Demand Architecture
• Supports software radio
• 4 cores, each containing an asymmetric pipeline
• Meets the requirements of 2 Mbps W-CDMA and 24 Mbps 802.11a

5 Introduction to SDR
• Software Defined Radio (SDR)
• Decodes different signals on a single processor

6 Why SDR? (Picture from Lin, ISCA’06 slides)
• Easy to implement and update
• Multi-mode operation
• Prototyping and bug fixes
• Shorter development time

7 Challenges of SDR
• High throughput is required
• Power is limited

8 Wireless protocol behavior
• Feed-forward, multiple kernels
• Low but heterogeneous requirements for inter-kernel communication
• Real-time deadlines
• Heavy data parallelism
• 8- to 16-bit data width
• Both scalar and vector operations

9 Design Tradeoffs
• Concurrent execution vs. single-context execution
• Static multi-core scheduling vs. multi-threading
• Vector vs. SIMD vs. VLIW

10 SODA ARCHITECTURE AND RESULTS
Jingcheng Wang

11 SODA System Architecture (From Lin, ISCA’06 slides)
• 4 PEs
  - static kernel mapping and scheduling
  - SIMD+Scalar units
• 1 ARM GPP controller
  - scalar algorithms and protocol controls
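To make "static kernel mapping" concrete, here is a minimal sketch of assigning kernels to PEs ahead of time. It is not the authors' scheduler: the greedy longest-first heuristic, the kernel names (`k0` to `k5`) and their relative costs are all hypothetical and chosen only for illustration.

```python
def static_map(kernel_load: dict[str, float], num_pes: int):
    """Greedy longest-first mapping: each kernel goes to the least-loaded PE.

    Returns (kernels per PE, total load per PE). The mapping is computed once,
    before execution, which is what makes it "static".
    """
    pes = [[] for _ in range(num_pes)]
    load = [0.0] * num_pes
    # Place the most expensive kernels first so they anchor separate PEs.
    for name, cost in sorted(kernel_load.items(), key=lambda kv: kv[1], reverse=True):
        target = load.index(min(load))  # least-loaded PE so far
        pes[target].append(name)
        load[target] += cost
    return pes, load

# Hypothetical kernels and relative costs (not taken from the slides).
kernels = {"k0": 9.0, "k1": 7.0, "k2": 6.0, "k3": 5.0, "k4": 3.0, "k5": 2.0}
mapping, loads = static_map(kernels, num_pes=4)

# Utilization of each PE relative to the busiest one.
utilization = [l / max(loads) for l in loads]
for pe, (names, util) in enumerate(zip(mapping, utilization)):
    print(f"PE{pe}: {names} utilization={util:.0%}")
```

Because the assignment is fixed, any imbalance between kernel costs shows up directly as idle time on the lighter PEs.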

12 SODA PE Architecture (From Lin, ISCA’06 slides)

13 SODA PE Scalar Pipeline (From Lin, ISCA’06 slides)

14 SODA PE SIMD Pipeline (From Lin, ISCA’06 slides)

15 SODA PE SIMD Pipeline (From Lin, ISCA’06 slides)

16 SODA PE SIMD Shuffle Network (From Lin, ISCA’06 slides)

17 W-CDMA Mapping On SODA (From Lin, ISCA’06 slides)

19 SIMD Design and Tradeoffs
• 40 GOPS required in total
• In a 4-PE system, each PE must deliver 10 GOPS
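The per-PE figure is the total requirement divided evenly across the PEs. The sketch below reproduces that arithmetic and shows how a GOPS target translates into operations per cycle. The 400 MHz clock is an assumption made for illustration (it is not stated on this slide), and rounding up to a power of two is only a common convention for SIMD widths.

```python
import math

def per_pe_gops(total_gops: float, num_pes: int) -> float:
    """Throughput each PE must sustain if the load is split evenly."""
    return total_gops / num_pes

def ops_per_cycle(gops: float, clock_mhz: float) -> float:
    """Operations per clock cycle needed to reach `gops` at `clock_mhz`."""
    return gops * 1e9 / (clock_mhz * 1e6)

def next_power_of_two(x: float) -> int:
    """Smallest power of two that is >= x."""
    return 1 << math.ceil(math.log2(x))

pe_gops = per_pe_gops(40.0, 4)     # 40 GOPS and 4 PEs are from the slide
ASSUMED_CLOCK_MHZ = 400.0          # assumption, not from the slide
lanes_needed = ops_per_cycle(pe_gops, ASSUMED_CLOCK_MHZ)

print(f"per-PE requirement: {pe_gops} GOPS")
print(f"ops per cycle at {ASSUMED_CLOCK_MHZ} MHz: {lanes_needed}")
print(f"rounded up to a power of two: {next_power_of_two(lanes_needed)}")
```

Changing `ASSUMED_CLOCK_MHZ` shows the tradeoff directly: a slower clock needs a wider datapath to hit the same GOPS target.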

20 Low-power Design
• Clustered register files with 2 read ports and 1 write port
• Fewer memory read/write ports
• Smaller instruction fetch logic

21 Experiment Methodology
• Area and power were estimated from an RTL Verilog model
• Synthesized using Synopsys Physical Compiler and a TSMC 180nm library
• Memories were generated by the Artisan SRAM generator
• 90nm and 65nm results were estimated from the 180nm results using a quadratic scaling factor
• Dynamic power was estimated from behavioral simulation on the authors' system simulator
• Leakage power was estimated at 30% of the total power
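The two estimation rules above can be written as short formulas. This sketch assumes that "quadratic scaling factor" means the square of the feature-size ratio; the slide does not say which quantities it was applied to, so it is shown here for area only. The 180nm area and the dynamic power value are hypothetical inputs, not results from the paper.

```python
def quadratic_scale(value_at_old: float, old_nm: float, new_nm: float) -> float:
    """Scale a quantity by the square of the feature-size ratio (assumed rule)."""
    return value_at_old * (new_nm / old_nm) ** 2

def total_power_from_dynamic(dynamic_mw: float, leakage_fraction: float = 0.30) -> float:
    """If leakage is a fixed fraction of total power, total = dynamic / (1 - fraction)."""
    return dynamic_mw / (1.0 - leakage_fraction)

# Hypothetical inputs for illustration only; not figures from the paper.
area_180nm_mm2 = 20.0
dynamic_mw = 70.0

for node_nm in (90, 65):
    scaled = quadratic_scale(area_180nm_mm2, 180, node_nm)
    print(f"{node_nm}nm area estimate: {scaled:.2f} mm^2")

total_mw = total_power_from_dynamic(dynamic_mw)
leakage_mw = total_mw - dynamic_mw
print(f"total: {total_mw:.1f} mW, of which leakage: {leakage_mw:.1f} mW")
```

Note that a 30% leakage share of the total means leakage is about 43% of the dynamic power, not 30% of it.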

22 Performance results

23 Power and area results
• Typical cellular phone power budget for the physical layer: ~200 mW

24 Discussion Points
1. The authors synthesized the core only in TSMC 180nm and estimated the area and power for 90nm and 65nm. Is it fair to claim that the architecture meets the requirements?
2. The authors claim that the CDMA search algorithm is reduced from 26.5 Gops on a general-purpose processor to 200 Mops on SODA, mainly because of SIMD execution. Is SIMD the only, or the main, speedup factor? Is the paper novel enough?
3. Utilization of the 4 PEs is 60%, 50%, 100% and 94% respectively. Can it do better?
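As a starting point for these questions, the quoted figures can be turned into two summary numbers: the average PE utilization and the ratio between the two operation counts. The sketch uses only the values quoted above; the variable names are illustrative.

```python
from statistics import mean

# PE utilizations as quoted in the discussion points.
pe_utilization = [0.60, 0.50, 1.00, 0.94]
average_utilization = mean(pe_utilization)
idle_fraction = [round(1.0 - u, 2) for u in pe_utilization]

# CDMA searcher operation counts as quoted in the discussion points.
gp_ops = 26.5e9    # 26.5 Gops on a general-purpose processor
soda_ops = 200e6   # 200 Mops on SODA
reduction = gp_ops / soda_ops

print(f"average PE utilization: {average_utilization:.0%}")
print(f"idle fraction per PE: {idle_fraction}")
print(f"operation-count reduction: {reduction:.1f}x")
```

The ratio compares operation counts, so it says nothing by itself about which architectural feature is responsible for the reduction.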

25 Reference
• http://ieeexplore.ieee.org/iel5/10899/34298/ pdf?arnumber=