WorldScape Defense Company, L.L.C. Company Proprietary
Slide 1: An Ultra-High Performance Scalable Processing Architecture for HPC and Embedded Applications
Presentation for IPDPS Conference, 28 April 2004
Slide 2: CS301 Up Close
- Multi-Threaded Array Processor
  - 25.6 GFLOPS
  - 3W worst-case, 2W typical
  - 200MHz
  - 64 PEs, 4 Kbytes each
- ClearConnect bus
  - 64-bit full duplex
  - 1.6 Gbyte/s each direction
  - 2x 0.8-Gbyte/s bridge ports
- Scratchpad memory
  - 128 Kbytes of SRAM
- Availability
  - Currently available
[Diagram labels: PE Array, Control, SRAM, Bus]
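The ClearConnect bandwidth figure on this slide can be cross-checked with simple arithmetic. The sketch below assumes the bus transfers one 64-bit word per cycle at the 200MHz clock quoted for the chip; the slide does not state the bus clock, so that pairing is an assumption made only because it reproduces the quoted 1.6 Gbyte/s.

```python
# Figures quoted on the slide.
BUS_WIDTH_BITS = 64
BRIDGE_PORTS = 2
BRIDGE_PORT_BYTES_PER_S = 800_000_000  # 0.8 Gbyte/s per bridge port

# Assumption (not stated on the slide): the bus is clocked at the
# chip's 200MHz and moves one full-width word per cycle.
CLOCK_HZ = 200_000_000


def bytes_per_second(width_bits: int, clock_hz: int) -> int:
    """Peak one-way throughput of a parallel bus, one word per cycle."""
    return (width_bits // 8) * clock_hz


per_direction = bytes_per_second(BUS_WIDTH_BITS, CLOCK_HZ)
bridge_total = BRIDGE_PORTS * BRIDGE_PORT_BYTES_PER_S

print(f"Bus, each direction: {per_direction / 1e9:.1f} Gbyte/s")
print(f"Bridge ports combined: {bridge_total / 1e9:.1f} Gbyte/s")
```

Under that assumption the two bridge ports together match the one-way bus rate, which suggests (but does not confirm) that the bridge ports are sized to keep one direction of the bus fully occupied.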
Slide 3: Multi-Threaded Array Processing Architecture
- Multi-threaded Array Processor
  - Fully programmable in C
  - Hardware multi-threading
  - Extensible instruction set
- Scalable internal parallelism
  - Array of Processing Elements (PEs)
  - Compute, bandwidth scale together
  - From 10s to 1,000s of PEs
  - Built-in PE redundancy
- High performance, low power
  - ~10 GFLOPS/Watt
- Multiple high speed I/O channels
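The "~10 GFLOPS/Watt" figure can be related to the CS301 numbers quoted elsewhere in this deck (25.6 GFLOPS at 3W worst-case, 2W typical). The following is a minimal sketch of that division; it assumes the efficiency claim refers to peak GFLOPS over chip power, which the slide does not spell out.

```python
# CS301 figures quoted in this deck.
PEAK_GFLOPS = 25.6
WORST_CASE_WATTS = 3.0
TYPICAL_WATTS = 2.0


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Power efficiency as peak throughput divided by power draw."""
    return gflops / watts


worst = gflops_per_watt(PEAK_GFLOPS, WORST_CASE_WATTS)
typical = gflops_per_watt(PEAK_GFLOPS, TYPICAL_WATTS)

# Prints: 8.5 to 12.8 GFLOPS/Watt
print(f"{worst:.1f} to {typical:.1f} GFLOPS/Watt")
```

The quoted ~10 GFLOPS/Watt falls between the worst-case and typical results.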
Slide 4: Processing Elements
- PEs are highly optimised execution units
  - ALU, MAC, FPU
  - High-bandwidth, multiport register file
  - High bandwidth per PE DMA (PIO, SIO)
  - Closely coupled SRAM for data
- 64 PEs at 200MHz
  - 25.6 GFLOPS
  - 51.2 Gbyte/s bandwidth to PE memory
  - 12,800 MIPS
- Supports multiple data types
  - 8, 16, 24, 32-bit,... fixed-point arithmetic
  - 32-bit IEEE floating-point arithmetic
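The three peak figures for 64 PEs at 200MHz follow from multiplying the PE count by the clock rate and a per-cycle factor. The per-cycle factors below are assumptions, not stated on the slide; they are chosen because they reproduce the quoted numbers (one instruction, two floating-point operations such as a multiply-accumulate, and one 32-bit memory word per PE per cycle).

```python
# Figures quoted on the slide.
NUM_PES = 64
CLOCK_HZ = 200_000_000

# Assumed per-PE, per-cycle rates (hypothetical; inferred from the totals).
INSTRUCTIONS_PER_CYCLE = 1
FLOPS_PER_CYCLE = 2   # e.g. a multiply-accumulate counted as two operations
BYTES_PER_CYCLE = 4   # one 32-bit word to or from PE memory


def peak_rate(per_cycle: int) -> int:
    """Aggregate peak rate per second across the whole PE array."""
    return NUM_PES * CLOCK_HZ * per_cycle


mips = peak_rate(INSTRUCTIONS_PER_CYCLE) // 1_000_000
gflops = peak_rate(FLOPS_PER_CYCLE) / 1e9
gbytes_per_s = peak_rate(BYTES_PER_CYCLE) / 1e9

print(f"{mips:,} MIPS")
print(f"{gflops:.1f} GFLOPS")
print(f"{gbytes_per_s:.1f} Gbyte/s to PE memory")
```

Because every figure is linear in `NUM_PES`, changing that one constant shows how compute and memory bandwidth scale together as the array grows.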
Slide 5: ClearConnect™ High-Speed Bus
- Lanes from 25 to 100 Gbit/s full duplex
- Packet switched architecture
- Scales to 4 lanes per bus
- Lane widths: 32 to 256-bit
- Distributed arbitration
- Low power
- Highly flexible
Slide 7: Off-the-Shelf Products
- CS PE chip
  - 2W, 25 GFLOPS
  - Hardware Development Support
- Fully functional SDK
  - Application Support
  - Software Libraries
- Dual 64 PCI Development Board
  - 50 GFLOPS performance
  - Acceleration for clusters and HPC applications
  - Development environment for embedded applications
  - Growing catalog of software application libraries
  - Scalable with robust evolution path
Slide 8: Systems Integration Examples
- PC plug-in accelerator
- Coprocessors in a PC server*
- Coprocessors in a blade server*
- COTS hardware
- Silver Fox**
- Algorithm development for embedded applications
*Images courtesy of Angstrom Microsystems
**Image courtesy of Office of Naval Research
Slide 9: WorldScape’s Offering
- Chip Technology
  - 64 PE/256 PE…
  - customizable…
- Support Tools
  - SDK, VSIPL, PCA morphware…
- Board Level Integration
  - custom, I/O, i/f, …
- Application Integration
  - FFT, PC, HSI, SceneServer …