HPC User Forum 2012
Panel on Potential Disruptive Technologies: Emerging Parallel Programming Approaches
Guang R. Gao, Founder, ET International, Inc., Newark, Delaware, USA
ggao@etinternational.com
Who Is ETI?
From the “Cool Vendors” report by Gartner (April 17, 2012):
[ET International, Newark, Delaware (www.etinternational.com). Analysis by Carl Claunch. Why Cool: ET International delivers its dataflow-oriented ETI Swarm environment for garnering high efficiency from highly parallel software, based on the alternative ParalleX execution model. As highly parallel execution becomes essential to addressing the more substantial computing tasks that HPC users face today, progress is increasingly being stymied by the application's inability to keep all the parallel strands working productively. …]
Motivation
Many-core is coming.
Hardware is getting more heterogeneous.
Current paradigms lack the expressive power to harness this concurrency.
Current hybrid programming techniques (OpenMP+MPI+OpenCL) are too complicated to be maintainable.
Caches are disappearing or becoming non-coherent.
Distributed memory is everywhere, and at different levels.
Fine-grained power management: use what you need and turn off (or down) the rest.
Failure is the norm: resilience must be baked into the whole stack (application, compiler, runtime, hardware).
Application computation and data are increasingly irregular: static scheduling can no longer balance the load properly.
ETI Vision: We Need New “Execution Models”!
Leverage ETI’s deep and growing IP position, based on 25+ years of applied R&D expertise and $20M+ invested in R&D software engineering and development (e.g., an extensive system software base for Cyclops, CELL, SCC, Intel Runnemede, Intel x86-based machines, and Adapteva).
Provide high-performance SWARM software solutions to our OEMs, partners, and direct customers.
Advance SWARM solutions to address optimization opportunities driven by heterogeneous multi-/many-core processing, including:
Big Compute (private HPC cloud) systems
Big Data HPC systems
Embedded HPC appliances
Execution Paradigm Comparisons
MPI, OpenMP, OpenCL: Communicating Sequential Processes; Bulk Synchronous; Message Passing
SWARM: Asynchronous Event-Driven Tasks; Dependencies; Resources; Active Messages; Control Migration
[Figure: timelines of active vs. waiting threads over time for each paradigm]
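The “waiting” time in the bulk-synchronous timeline can be made concrete with a toy cost model. The sketch below is an illustration under stated assumptions, not a measurement of MPI, OpenMP, or SWARM: the function names are hypothetical, the per-phase costs are made-up numbers, and the asynchronous case assumes no cross-worker dependencies at all, which is the best case (real dependency graphs fall somewhere between the two extremes).

```python
def bsp_makespan(durations):
    """durations[w][p] is the time worker w needs for phase p.
    Bulk synchronous: every phase ends in a global barrier, so each phase
    costs as much as its slowest worker."""
    num_phases = len(durations[0])
    return sum(max(worker[p] for worker in durations) for p in range(num_phases))


def async_makespan(durations):
    """Asynchronous best case (assumption: no cross-worker dependencies):
    each worker runs its phases back to back, and the run ends when the
    busiest worker finishes."""
    return max(sum(worker) for worker in durations)


def barrier_idle_time(durations):
    """Total worker-time spent waiting at barriers under the BSP schedule."""
    total_work = sum(sum(worker) for worker in durations)
    return len(durations) * bsp_makespan(durations) - total_work


# Three workers, three phases, irregular per-phase costs (illustrative numbers).
durations = [[1, 4, 2], [3, 1, 2], [2, 2, 5]]
print(bsp_makespan(durations), async_makespan(durations), barrier_idle_time(durations))  # 12 9 14
```

The more irregular the per-phase costs, the larger the gap: each barrier charges every worker for the slowest one, which is the idle time an event-driven model tries to fill with other ready work.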
SWARM Execution Overview
[Diagram: a cycle; start at Enabled Tasks and work clockwise.]
Enabled tasks are mapped to resources: available resources (CPU, GPU) are allocated and become resources in use.
When tasks complete, their resources are released and their dependencies are satisfied.
Tasks with unsatisfied dependencies become enabled once all of their dependencies are satisfied, and the cycle repeats.
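The cycle above can be sketched as a dependency-counting scheduler. This is a minimal single-threaded simulation under stated assumptions, not SWARM's API: `Task` and `run_dataflow` are hypothetical names, resources are modeled only as a count of tasks that may run per round, and a real runtime would execute tasks in parallel and asynchronously rather than in lockstep rounds.

```python
from collections import deque


class Task:
    """Hypothetical task node: `fn` may run once every task in `deps` has finished."""

    def __init__(self, name, fn, deps=()):
        self.name = name
        self.fn = fn
        self.deps = list(deps)
        self.remaining = len(self.deps)  # count of unsatisfied dependencies
        self.successors = []             # tasks waiting on this one


def run_dataflow(tasks, num_resources=2):
    """Simulate the cycle: enabled tasks are mapped to free resources and run;
    their completion frees the resources and satisfies dependencies, which
    enables further tasks. Returns the task names run in each round."""
    for t in tasks:
        for d in t.deps:
            d.successors.append(t)
    enabled = deque(t for t in tasks if t.remaining == 0)
    rounds = []
    while enabled:
        # Map up to `num_resources` enabled tasks onto the available resources.
        batch = [enabled.popleft() for _ in range(min(num_resources, len(enabled)))]
        for t in batch:
            t.fn()
        # The batch has finished, so every resource is free for the next round.
        # Each finished task satisfies one dependency of each successor; a
        # successor whose count reaches zero joins the enabled set.
        for t in batch:
            for s in t.successors:
                s.remaining -= 1
                if s.remaining == 0:
                    enabled.append(s)
        rounds.append([t.name for t in batch])
    if sum(len(r) for r in rounds) != len(tasks):
        raise RuntimeError("some tasks never became enabled (cyclic dependencies?)")
    return rounds


# A diamond: b and c depend on a; d depends on both b and c.
a = Task("a", lambda: None)
b = Task("b", lambda: None, [a])
c = Task("c", lambda: None, [a])
d = Task("d", lambda: None, [b, c])
print(run_dataflow([a, b, c, d], num_resources=2))  # [['a'], ['b', 'c'], ['d']]
```

No task ever waits at a global barrier: `d` becomes enabled the moment its last input is satisfied, and with more resources independent tasks such as `b` and `c` simply run side by side.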
Case Studies of Fine-Grain Execution Models
Static Dataflow Model (1970s onward)
EARTH Model (1988 onward)
TNT Model and Cyclops-64 (2003 onward)
Codelet Model under the Intel-led DARPA UHPC program
DARPA/Intel Runnemede Program
[Figure: Runnemede co-design overview, courtesy of the Intel DARPA UHPC team (our collaborators). Labels include: 1000X energy reduction; heterogeneous, tightly coupled, simple architecture; system management and concurrency; assured operation; event-driven codelets; self-aware introspection; code and data motion; <10% overhead; checkpoint with Flash/CPM; security through sandboxing; CPU; resiliency; execution model; HW/SW co-design; interconnect fabric; productivity; application efficiency; data movement; model-based, goal-oriented, self-morphing; heterogeneous and tapered; large local memory; memory; overhauled DRAM µArch; resilient memory. Partners named: ET International, Inc.; University of Illinois.]
Progress and Proof Points to Date
Barnes-Hut: SWARM vs. OpenMP
[Figure: Barnes-Hut scaling; series: Ideal, SWARM, OpenMP]
SWARM/MPI Performance Comparison
SWARM shows a consistent speedup over MPI, ranging from 2X to 14.5X.
Cholesky Decomposition (SWARM vs. MKL/ScaLAPACK)
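One reason a dataflow runtime suits Cholesky is the shape of its task graph. The sketch below builds the dependency graph of a textbook right-looking tiled Cholesky (POTRF, TRSM, SYRK, GEMM tile tasks) and compares the task count with the critical-path length. It is an illustration under stated assumptions, not the SWARM, MKL, or ScaLAPACK implementation: `tiled_cholesky_dag` and `critical_path` are hypothetical names, the numerical tile kernels are omitted, and dependencies are derived only from the last writer of each tile.

```python
def tiled_cholesky_dag(nt):
    """Task graph of a right-looking tiled Cholesky factorization on an
    nt x nt grid of tiles (lower triangle). Returns (task_name, deps) pairs
    listed in a valid sequential order."""
    last_writer = {}  # tile (i, j) -> name of the task that last wrote it
    tasks = []

    def add(name, reads, writes):
        # A task depends on the last writer of every tile it reads or updates.
        # Tracking only last writers suffices here: each tile that is read is
        # already final and is never overwritten afterwards.
        deps = {last_writer[t] for t in reads + [writes] if t in last_writer}
        tasks.append((name, sorted(deps)))
        last_writer[writes] = name

    for k in range(nt):
        add(f"POTRF({k})", [], (k, k))                       # factor diagonal tile
        for i in range(k + 1, nt):
            add(f"TRSM({i},{k})", [(k, k)], (i, k))          # solve panel tile
        for i in range(k + 1, nt):
            add(f"SYRK({i},{k})", [(i, k)], (i, i))          # update diagonal tile
            for j in range(k + 1, i):
                add(f"GEMM({i},{j},{k})", [(i, k), (j, k)], (i, j))  # update off-diagonal tile
    return tasks


def critical_path(tasks):
    """Length, in tasks, of the longest dependency chain."""
    depth = {}
    for name, deps in tasks:
        depth[name] = 1 + max((depth[d] for d in deps), default=0)
    return max(depth.values())


tasks = tiled_cholesky_dag(3)
print(len(tasks), critical_path(tasks))  # 10 7
```

The task count grows roughly as nt³/6 while the critical path grows only as 3·nt − 2, so for larger tile grids most tasks are off the critical path and can run as soon as their inputs are ready instead of waiting for a step-wide synchronization.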
Summary and Acknowledgements
Summary (productivity observations; the baseline is the performance of optimized code):
N-Body: 1 man-day of effort, 3X speedup
G-500: 1 man-month, up to 14X
Cholesky: 2 man-weeks, 1.5X
Acknowledgements: our sponsors, our collaborators and colleagues, my host, and others.
Cholesky Profiles
[Figure: execution profiles for SWARM and OpenMP]