OpenSoC Fabric An open source, parameterized, network generation tool CoDEx An open source, parameterized, network generation tool Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf SoC for HPC Programmatic Meeting August 25-26, 2014. Denver, CO.
1 2 3 4 5 Motivation State of the Art in SoC OpenSoC Overview Power, parallelism, and data movement drive the need for SoC 1 State of the Art in SoC What technology is being used to build SoCs? What can we leverage? 2 OpenSoC Overview An open source network generator using a new HDL - Chisel 3 OpenSoC Code Deep Dive and Demo OpenSoC module walk through and network output 4 Conclusion and Future Work New features for OpenSoC 5
Power: The New Design Constraint Trends beginning in 2004 are continuing… Power densities have ceased to increase No power efficiency increase with smaller transistors So, why aren’t we at exascale yet?
Power: The New Design Constraint On-chip parallelism increasing to maintain performance increases… We have come to the end of clock frequency scaling Moore’s Law is alive and well Now seeing core count increasing
Parallelism increasing NERSC Trends Franklin Hopper Edison Cori (NERSC 8) Core Count 4 24 48 (logical) >60 Clock Rate 2.3GHz 2.1GHz 2.4 GHz ~1.5GHz Memory 8GB 32GB 64GB 64-128GB +On package Peak Perf .352 PF 1.288 PF 2.57 PF > 3 TF Revisit of how SoC, increased parallelism in general is playing out in a real world system
Hierarchical Power Costs Data movement is the dominant power cost 120 pJ 2000 pJ 250 pJ ~2500 pJ 100 pJ 6 pJ Cost to move data off chip to a neighboring node Cost to move data off chip into DRAM Cost to move off-chip, but stay within the package (SMP) Cost to move data 20 mm on chip Typical cost of a single floating point operation Cost to move data 1 mm on-chip Let’s put aside cooling, building efficiency and focus on one part of data center power consumption – the processor. Data movement is the dominant power consumer, driving the benefit / need for SoCs
1 2 3 4 5 Motivation State of the Art in SoC OpenSoC Overview Power, parallelism, and data movement drive the need for SoC 1 State of the Art in SoC What technology is being used to build SoCs? What can we leverage? 2 OpenSoC Overview An open source network generator using a new HDL - Chisel 3 OpenSoC Code Deep Dive and Demo OpenSoC module walk through and network output 4 Conclusion and Future Work New features for OpenSoC 5
Current Hardware Challenges Embracing embedded designs Power is now limiting factor for leading-edge chips Moore’s Law continues… but now reliant to more exotic technologies such as 3D transistors, FinFET etc. Design Validation / Verification dominating development costs Solution: Smaller is Better Simpler, 5 to 9 stage pipeline cores Parallel is key to energy efficiency: CV2F Large arrays of small simple cores are easier to verify, more resilient Vibrant commodity market in IP components
Building an SoC from IP Logic Blocks It’s Legos with a some extra integration and verification cost Processor Core (ARM, Tensilica, MIPS deriv) With extra “options” like DP FPU, ECC OpenSoC Fabric (on-chip network) (currently proprietary ARM or Arteris) Mem Control DDR memory controller (Denali/Cadence, SiCreations) + Phy & Programmable PLL DRAM Mem Control Memory PCIe FLASH Control IB or GigE IB or GigE DRAM PCIe Gen3 Root complex IO Integrated FLASH Controller 10GigE or IB DDR 4x Channel
SoC – What’s out there? A few examples… AMD Opteron Intel Crystalwell CPU, Gfx and 128MB eDRAM Xeon Phi KNC, KNL Intel Integrated IO + Data Direct IO PCI Controller integrated on die 7W power savings by bypassing processor cache and memory ARM Apple A7 AMD Opteron CPU, PCIe + SATA Controller
SoC – What’s out there? Some common features Cost driven CPUs integrated with IO and Gfx to reduce BOM cost, decrease total PCB area Die power density / pin constraint driven Homogeneous cores Simple Networks Most SoCs rely on ring or cross-bar interconnect Increasing core count will drive need for more complex topologies KNL present in 2016 NERSC machine will have mesh
What Interconnect Provides the Best Power / Performance Ratio? What tools exist to answer this question?
SoC - Interconnect Examples Some common topologies
The Importance of Network Topology Network topology can greatly influence application performance An analysis of on-chip interconnection networks for large-scale chip multiprocessors ACM Transactions on computer architecture and code optimization (TACO), April 2010
The Importance of Networks Networks consume a large fraction of total chip power... A 5-GHz Mesh Interconnect for a Teraflops Processor. IEEE Micro. 2007
What tools exist for SoC research What tools do we have to evaluate large, complex networks of cores? Software models Fast to create, but plagued by long runtimes as system size increases Hardware emulation Fast, accurate evaluate that scales with system size but suffers from long development time A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs. FPGA 2008
Software Models C++ based on-chip network simulators Booksim Cycle-accurate Verified against RTL Few thousand cycles per second Garnet Event driven Simulation speed limits designs to 100’s of cores Booksim ISPASS 2013 GARNET ISPASS 2009
Hardware Models Stanford open- source NoC router HDL network generators and implementations Stanford open- source NoC router Verilog Precise but long simulation times Connect network generation Bluespec FPGA Optimized CONNECT: fast flexible FPGA-tuned networks-on-chip. CARL 2012
1 2 3 4 5 Motivation State of the Art in SoC OpenSoC Overview Power, parallelism, and data movement drive the need for SoC 1 State of the Art in SoC What technology is being used to build SoCs? What can we leverage? 2 OpenSoC Overview An open source network generator using a new HDL - Chisel 3 OpenSoC Code Deep Dive and Demo OpenSoC module walk through and network output 4 Conclusion and Future Work New features for OpenSoC 5
Chisel: A New Hardware DSL Using Scala to construct Verilog and C++ descriptions Chisel provides both software and hardware models from the same codebase Object-oriented hardware development Allows definition of structs and other high- level constructs Powerful libraries and components ready to use Working processors fabricated using chisel
Recent Chisel Designs Chisel created cores successfully boot Linux Clock test site SRAM test site DCDC test site Processor Site Raven core – 28nm
Chisel Overview Not “Scala to Gates” Describe hardware functionality How does Chisel work? Not “Scala to Gates” Describe hardware functionality Chisel creates graph representation Flattened Each node translated to Verilog or C++
Chisel Overview All bit widths are inferred Clock and reset implied How does Chisel work? All bit widths are inferred Clock and reset implied Multiple clock domains possible IOs grouped into convenient bundles
OpenSoC Fabric Part of the CoDEx tool suite, written in Chisel An open source, flexible, parameterized, NoC generator Part of the CoDEx tool suite, written in Chisel Dimensions, topology, VCs all configurable Fast functional C++ model for functional validation SystemC ready Verilog based description for FPGA or ASIC Synthesis path enables accurate power / energy modeling AXI Based endpoints Ready for ARM integration
OpenSoC Fabric An open source, flexible, parameterized, NoC generator
OpenSoC: Current Status Projected v1.0 release date of October 1st Available now: 2-D mesh or Flattened Butterfly network of arbitrary size Wormhole routing In Development Virtual Channels AXI Interface
1 2 3 4 5 Motivation State of the Art in SoC OpenSoC Overview Power, parallelism, and data movement drive the need for SoC 1 State of the Art in SoC What technology is being used to build SoCs? What can we leverage? 2 OpenSoC Overview An open source network generator using a new HDL - Chisel 3 OpenSoC Code Deep Dive and Demo OpenSoC module walk through and network output 4 Conclusion and Future Work New features for OpenSoC 5
OpenSoC Code Examples Walkthrough of Switch and Full Network Tester
1 2 3 4 5 Motivation State of the Art in SoC OpenSoC Overview Power, parallelism, and data movement drive the need for SoC 1 State of the Art in SoC What technology is being used to build SoCs? What can we leverage? 2 OpenSoC Overview An open source network generator using a new HDL - Chisel 3 OpenSoC Code Deep Dive and Demo OpenSoC module walk through and network output 4 Conclusion and Future Work New features for OpenSoC 5
Future additions Photonics and circuit switched networks Towards a full set of features Photonics and circuit switched networks Integrated NIC model More diverse topologies and routing functions Validation against RTL and other simulators Standardized (AXI) interfaces at the endpoints More powerful synthetic traffic and trace replay support Power modeling in the C++ model
Acknowledgements UCB Chisel US Dept of Energy Ke Wen Columbia LRL John Bachan Dan Burke BWRC
More Information http://opensocfabric.org