OpenSoC Fabric: An open source, parameterized, network generation tool

OpenSoC Fabric (CoDEx): An open source, parameterized, network generation tool
Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf
SoC for HPC Programmatic Meeting, August 25-26, 2014, Denver, CO

1. Motivation: Power, parallelism, and data movement drive the need for SoC
2. State of the Art in SoC: What technology is being used to build SoCs? What can we leverage?
3. OpenSoC Overview: An open source network generator using a new HDL, Chisel
4. OpenSoC Code Deep Dive and Demo: OpenSoC module walk-through and network output
5. Conclusion and Future Work: New features for OpenSoC

Power: The New Design Constraint
Trends beginning in 2004 are continuing:
- Power densities have ceased to increase
- No power efficiency increase with smaller transistors
So, why aren’t we at exascale yet?

Power: The New Design Constraint
On-chip parallelism is increasing to maintain performance gains:
- We have come to the end of clock frequency scaling
- Moore’s Law is alive and well
- Core counts are now increasing

Parallelism Increasing: NERSC Trends
             Franklin   Hopper     Edison         Cori (NERSC 8)
Core Count   4          24         48 (logical)   >60
Clock Rate   2.3 GHz    2.1 GHz    2.4 GHz        ~1.5 GHz
Memory       8 GB       32 GB      64 GB          64-128 GB + on-package
Peak Perf    0.352 PF   1.288 PF   2.57 PF        > 3 TF
Revisiting how SoC, and increased parallelism in general, is playing out in a real-world system.

Hierarchical Power Costs
Data movement is the dominant power cost:
- ~2500 pJ: cost to move data off chip to a neighboring node
- 2000 pJ: cost to move data off chip into DRAM
- 250 pJ: cost to move data off chip but stay within the package (SMP)
- 120 pJ: cost to move data 20 mm on chip
- 100 pJ: typical cost of a single floating point operation
- 6 pJ: cost to move data 1 mm on chip
Setting aside cooling and building efficiency, focus on one part of data center power consumption: the processor. Data movement is the dominant power consumer there, which drives the benefit of, and the need for, SoCs.
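
A rough worked example makes the comparison concrete. The sketch below assumes the six figures above pair with their labels in order of distance (6 pJ for 1 mm on chip, up to ~2500 pJ for a neighboring node). The dictionary keys and the function name are illustrative and do not come from the slides.

```python
# Energy costs in picojoules. The value-to-label pairing is an assumption
# (values matched to labels in order of increasing distance).
ENERGY_PJ = {
    "flop": 100,            # one floating point operation
    "on_chip_1mm": 6,       # move data 1 mm on chip
    "on_chip_20mm": 120,    # move data 20 mm on chip
    "off_chip_smp": 250,    # off chip, but within the package
    "off_chip_dram": 2000,  # off chip into DRAM
    "neighbor_node": 2500,  # off chip to a neighboring node (approximate)
}


def flops_per_move(kind: str) -> float:
    """Number of floating point operations that cost as much as one move."""
    return ENERGY_PJ[kind] / ENERGY_PJ["flop"]


if __name__ == "__main__":
    for kind in ("on_chip_1mm", "on_chip_20mm", "off_chip_smp",
                 "off_chip_dram", "neighbor_node"):
        print(f"{kind:>14}: {flops_per_move(kind):6.2f} flops per move")
```

Under these assumptions a single DRAM access costs as much energy as 20 floating point operations, while a 1 mm on-chip move costs a small fraction of one.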

Current Hardware Challenges: Embracing Embedded Designs
- Power is now the limiting factor for leading-edge chips
- Moore’s Law continues, but now relies on more exotic technologies such as 3D transistors, FinFETs, etc.
- Design validation and verification dominate development costs
Solution: Smaller is Better
- Simpler cores with 5- to 9-stage pipelines
- Parallelism is key to energy efficiency: dynamic power scales as CV²F
- Large arrays of small, simple cores are easier to verify and more resilient
- Vibrant commodity market in IP components
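
The CV²F argument can be checked with a few lines of arithmetic. This is a minimal sketch: the capacitance, voltage, and frequency values are hypothetical, and it assumes that halving the clock lets the supply voltage drop to 80% of nominal and that the workload parallelizes perfectly across two cores.

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """Dynamic switching power, P = C * V^2 * F."""
    return c * v * v * f


# Hypothetical numbers chosen only for illustration.
C = 1e-9   # switched capacitance per core, farads
V = 1.0    # supply voltage at full frequency, volts
F = 2e9    # clock frequency, hertz

one_fast_core = dynamic_power(C, V, F)

# Two cores at half the clock deliver the same aggregate cycle rate.
# Assumption: the lower clock permits running at 0.8 * V.
two_slow_cores = 2 * dynamic_power(C, 0.8 * V, F / 2)

ratio = two_slow_cores / one_fast_core

if __name__ == "__main__":
    print(f"one fast core : {one_fast_core:.2f} W")
    print(f"two slow cores: {two_slow_cores:.2f} W")
    print(f"power ratio   : {ratio:.2f}")
```

Because voltage enters squared, the two slower cores draw about 64% of the power of the single fast core in this example. Static (leakage) power is ignored here.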

Building an SoC from IP Logic Blocks
It’s Legos, with some extra integration and verification cost:
- Processor core (ARM, Tensilica, MIPS derivative), with extra “options” like DP FPU and ECC
- OpenSoC Fabric (on-chip network; currently proprietary ARM or Arteris)
- DDR memory controller (Denali/Cadence, SiCreations) + PHY and programmable PLL
- PCIe Gen3 root complex
- Integrated FLASH controller
- 10GigE or IB DDR 4x channel
[Diagram: processor cores attached to the on-chip network, together with memory controllers (DRAM), PCIe, FLASH control, and IB or GigE IO blocks]

SoC: What’s Out There? A Few Examples
- Intel Crystalwell: CPU, Gfx, and 128MB eDRAM
- Xeon Phi: KNC, KNL
- Intel Integrated IO + Data Direct IO: PCI controller integrated on die; 7W power savings by bypassing processor cache and memory
- ARM: Apple A7
- AMD Opteron: CPU, PCIe + SATA controller

SoC: What’s Out There? Some Common Features
- Cost driven: CPUs integrated with IO and Gfx to reduce BOM cost and decrease total PCB area
- Die power density / pin constraint driven
- Homogeneous cores
- Simple networks
  - Most SoCs rely on a ring or crossbar interconnect
  - Increasing core counts will drive the need for more complex topologies
  - KNL, present in the 2016 NERSC machine, will have a mesh

What Interconnect Provides the Best Power / Performance Ratio? What tools exist to answer this question?

SoC Interconnect Examples: Some common topologies

The Importance of Network Topology
Network topology can greatly influence application performance.
Source: “An analysis of on-chip interconnection networks for large-scale chip multiprocessors,” ACM Transactions on Architecture and Code Optimization (TACO), April 2010
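
One simple way to see why topology matters is to compare average hop counts. The sketch below is a rough illustration that counts only hops between distinct nodes on a bidirectional ring and on a square 2-D mesh. It ignores contention, router latency, and wire length, and the function names are hypothetical.

```python
from itertools import product


def ring_hops(n: int) -> float:
    """Average shortest-path hops between distinct nodes on a
    bidirectional ring of n nodes."""
    total = sum(min(d, n - d) for d in range(1, n))
    return total / (n - 1)


def mesh_hops(k: int) -> float:
    """Average Manhattan distance between distinct nodes of a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(ax - bx) + abs(ay - by)
                for (ax, ay) in nodes for (bx, by) in nodes)
    pairs = len(nodes) * (len(nodes) - 1)
    return total / pairs


if __name__ == "__main__":
    print(f"64-node ring: {ring_hops(64):.2f} hops on average")
    print(f"8 x 8 mesh  : {mesh_hops(8):.2f} hops on average")
```

For 64 nodes the ring averages roughly three times as many hops as the mesh, which is one reason higher core counts push designs toward richer topologies.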

The Importance of Networks
Networks consume a large fraction of total chip power.
Source: “A 5-GHz Mesh Interconnect for a Teraflops Processor,” IEEE Micro, 2007

What Tools Exist for SoC Research?
What tools do we have to evaluate large, complex networks of cores?
- Software models: fast to create, but plagued by long runtimes as system size increases
- Hardware emulation: fast, accurate evaluation that scales with system size, but suffers from long development time
Source: “A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs,” FPGA 2008

Software Models
C++-based on-chip network simulators:
- Booksim (ISPASS 2013): cycle-accurate; verified against RTL; a few thousand cycles per second
- Garnet (ISPASS 2009): event-driven; simulation speed limits designs to hundreds of cores
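
To put “a few thousand cycles per second” in perspective, the arithmetic below estimates wall-clock time for a software simulation. The rate of 3,000 simulated cycles per second and the one-million-cycle workload are hypothetical values picked for illustration, not measurements of any simulator.

```python
def wall_clock_seconds(cycles: int, sim_cycles_per_second: float) -> float:
    """Wall-clock time needed to simulate the given number of cycles."""
    return cycles / sim_cycles_per_second


if __name__ == "__main__":
    # One million cycles is 1 ms of real time on a hypothetical 1 GHz chip.
    seconds = wall_clock_seconds(1_000_000, 3000)
    print(f"{seconds:.0f} s of simulation for 1 ms of chip time")
```

At that assumed rate, a millisecond of chip time takes over five minutes to simulate, which is the runtime problem that motivates hardware emulation.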

Hardware Models
HDL network generators and implementations:
- Stanford open-source NoC router: Verilog; precise, but long simulation times
- CONNECT network generator: Bluespec; FPGA optimized (“CONNECT: fast flexible FPGA-tuned networks-on-chip,” CARL 2012)

Chisel: A New Hardware DSL
Using Scala to construct Verilog and C++ descriptions
- Chisel provides both software and hardware models from the same codebase
- Object-oriented hardware development
  - Allows definition of structs and other high-level constructs
- Powerful libraries and components ready to use
- Working processors have been fabricated using Chisel

Recent Chisel Designs
- Chisel-created cores successfully boot Linux
- Raven core, 28nm
[Die photo labels: clock test site, SRAM test site, DC-DC test site, processor site]

Chisel Overview
How does Chisel work?
- Not “Scala to Gates”
- Describe hardware functionality
- Chisel creates a graph representation, which is then flattened
- Each node is translated to Verilog or C++
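
The flow described above (host-language code builds a graph, the graph is flattened, and each node is emitted) can be illustrated with a toy model. The Python sketch below is not Chisel and does not reflect its internals; the `Node` class, `flatten`, and `emit_verilog` are hypothetical names used only to show the idea on a half adder.

```python
import itertools


class Node:
    """One vertex of the hardware graph: an input or a binary operator."""
    _ids = itertools.count()

    def __init__(self, op, args=(), name=None):
        self.op, self.args = op, args
        self.name = name or f"t{next(Node._ids)}"

    # Operator overloading records structure instead of computing values.
    def __and__(self, other):
        return Node("&", (self, other))

    def __xor__(self, other):
        return Node("^", (self, other))


def flatten(root):
    """Return every node reachable from root, operands before users."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for arg in node.args:
            visit(arg)
        order.append(node)

    visit(root)
    return order


def emit_verilog(root):
    """Translate each operator node into one Verilog wire assignment."""
    lines = []
    for node in flatten(root):
        if node.op != "input":
            lhs, rhs = node.args
            lines.append(f"wire {node.name} = {lhs.name} {node.op} {rhs.name};")
    return lines


# Ordinary host-language code describes a half adder.
a, b = Node("input", name="a"), Node("input", name="b")
total, carry = a ^ b, a & b

if __name__ == "__main__":
    print("\n".join(emit_verilog(total) + emit_verilog(carry)))
```

The host language never computes `a ^ b`; it only records that an XOR node exists, which is the sense in which the description is hardware rather than software compiled to gates.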

Chisel Overview
How does Chisel work?
- All bit widths are inferred
- Clock and reset are implied
- Multiple clock domains are possible
- IOs are grouped into convenient bundles

OpenSoC Fabric
An open source, flexible, parameterized NoC generator
- Part of the CoDEx tool suite, written in Chisel
- Dimensions, topology, and VCs are all configurable
- Fast functional C++ model for functional validation
  - SystemC ready
- Verilog-based description for FPGA or ASIC
  - Synthesis path enables accurate power / energy modeling
- AXI-based endpoints
  - Ready for ARM integration

OpenSoC: Current Status
Projected v1.0 release date of October 1st
- Available now:
  - 2-D mesh or Flattened Butterfly network of arbitrary size
  - Wormhole routing
- In development:
  - Virtual channels
  - AXI interface
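
For readers unfamiliar with the terms, the sketch below shows what routing on a 2-D mesh and wormhole-style packetization look like in the abstract. It assumes dimension-order (X then Y) routing, which the slides do not specify, and it is not OpenSoC code; `xy_route` and `to_flits` are hypothetical names.

```python
def xy_route(src, dst):
    """Nodes visited by a dimension-order route (X first, then Y)
    from src to dst on a 2-D mesh. Coordinates are (x, y) tuples."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def to_flits(payload, dst):
    """Split a non-empty packet payload into flits. Only the head flit
    carries the destination; body and tail flits follow the same path."""
    flits = [("head", dst)]
    flits += [("body", word) for word in payload[:-1]]
    flits.append(("tail", payload[-1]))
    return flits


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(to_flits([10, 20, 30], (2, 1)))
```

In wormhole operation each router makes its routing decision when the head flit arrives and holds that path until the tail flit passes, so buffers only need to hold a few flits rather than a whole packet.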

OpenSoC Code Examples: Walkthrough of Switch and Full Network Tester

Future Additions
Towards a full set of features:
- Photonics and circuit-switched networks
- Integrated NIC model
- More diverse topologies and routing functions
- Validation against RTL and other simulators
- Standardized (AXI) interfaces at the endpoints
- More powerful synthetic traffic and trace replay support
- Power modeling in the C++ model
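
As background on what “synthetic traffic” usually means, the sketch below implements three destination patterns that are common in the network-on-chip literature: transpose, bit complement, and uniform random. It is an illustration only and says nothing about which patterns OpenSoC supports; the function names and the row-major node numbering are assumptions.

```python
import random


def transpose(node_id: int, k: int) -> int:
    """Node (x, y) sends to node (y, x) on a k x k mesh, with nodes
    numbered row-major as id = y * k + x."""
    x, y = node_id % k, node_id // k
    return x * k + y


def bit_complement(node_id: int, num_nodes: int) -> int:
    """Flip every address bit; num_nodes must be a power of two."""
    return node_id ^ (num_nodes - 1)


def uniform_random(src: int, num_nodes: int, rng: random.Random) -> int:
    """Pick any destination other than src with equal probability."""
    dst = rng.randrange(num_nodes - 1)
    return dst if dst < src else dst + 1


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for src in range(4):
        print(src, transpose(src, 4), bit_complement(src, 16),
              uniform_random(src, 16, rng))
```

Deterministic patterns such as transpose stress particular links, while uniform random spreads load evenly, so evaluations typically report results for several patterns.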

Acknowledgements
- UCB Chisel
- US Dept of Energy
- Ke Wen (Columbia LRL)
- John Bachan
- Dan Burke (BWRC)

More Information: http://opensocfabric.org