HPC User Forum 2012 Panel on Potential Disruptive Technologies: Emerging Parallel Programming Approaches. Guang R. Gao, Founder, ET International, Inc.


HPC User Forum 2012 Panel on Potential Disruptive Technologies: Emerging Parallel Programming Approaches. Guang R. Gao, Founder, ET International, Inc., Newark, Delaware, USA. ggao@etinternational.com

Who is ETI? From the Gartner "Cool Vendors" report (April 17, 2012), analysis by Carl Claunch: [ET International, Newark, Delaware (www.etinternational.com). Why Cool: ET International delivers its dataflow-oriented ETI Swarm environment for garnering high efficiency from highly parallel software, based on the alternative ParalleX execution model. As highly parallel execution becomes essential to addressing the more substantial computing tasks that HPC users face today, progress is increasingly being stymied by the application's inability to keep all the parallel strands working productively. …]

Motivation
- Many-core is coming, and hardware is getting more heterogeneous.
- Current paradigms don't have the expressive power to harness this concurrency.
- Current hybrid programming techniques (OpenMP+MPI+OpenCL) are not maintainable: too complicated.
- Caches are disappearing or becoming non-coherent.
- Distributed memory is everywhere, and at different levels.
- Fine-grained power management: use what you need and turn off/down the rest.
- Failure is the norm: resilience must be baked into the whole stack (application, compiler, runtime, hardware).
- Increasing computation/data irregularity in applications: static scheduling can no longer properly load balance.

ETI Vision: we need new "execution models"!
- Leverage ETI's deep and growing IP position, based on 25+ years of applied R&D expertise and $20M+ in R&D software engineering and development (e.g., extensive system software for Cyclops, CELL, SCC, Intel Runnemede, Intel x86-based machines, Adapteva, etc.).
- Provide high-performance SWARM software solutions to our OEMs, partners, and direct customers.
- Advance SWARM solutions to address optimization opportunities driven by heterogeneous multi-/many-core processing, including: Big Compute (private HPC cloud) systems, Big Data HPC systems, HPC embedded appliances, etc.

Execution Paradigm Comparisons
- MPI, OpenMP, OpenCL: communicating sequential processes, bulk-synchronous execution, message passing. [Timeline chart: active threads spend much of their time waiting.]
- SWARM: asynchronous event-driven tasks with explicit dependencies and resources, active messages, and control migration. [Timeline chart: threads stay active.]

SWARM Execution Overview (start at "enabled tasks" and work clockwise):
- Tasks with unsatisfied dependencies wait; as dependencies are satisfied, tasks become enabled.
- Enabled tasks are mapped to available resources (CPUs, GPUs); resources are allocated when a task starts and released when it completes, which can in turn satisfy further dependencies.
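The cycle above can be sketched as a minimal dependency-driven scheduler. This is a conceptual illustration only, not the real SWARM C API: the `Scheduler` class and its `add_task`/`run` methods are hypothetical names used to show how satisfying a dependency enables downstream tasks.

```python
# Conceptual sketch of SWARM-style execution (hypothetical API, not SWARM's):
# a task becomes runnable only once all of its dependencies are satisfied,
# and is then mapped onto whatever resource is free.
from collections import defaultdict

class Scheduler:
    def __init__(self):
        self.deps = {}                        # task -> unsatisfied prerequisites
        self.dependents = defaultdict(list)   # task -> tasks waiting on it
        self.funcs = {}
        self.enabled = []                     # tasks ready to run

    def add_task(self, name, func, after=()):
        self.funcs[name] = func
        self.deps[name] = set(after)
        for d in after:
            self.dependents[d].append(name)
        if not after:                         # no prerequisites: enabled now
            self.enabled.append(name)

    def run(self):
        order = []
        while self.enabled:
            task = self.enabled.pop()         # map an enabled task to a resource
            self.funcs[task]()
            order.append(task)
            for t in self.dependents[task]:   # completing a task may enable
                self.deps[t].discard(task)    # tasks that were waiting on it
                if not self.deps[t]:
                    self.enabled.append(t)
        return order

sched = Scheduler()
sched.add_task("load", lambda: None)
sched.add_task("compute", lambda: None, after=("load",))
sched.add_task("store", lambda: None, after=("compute",))
order = sched.run()
print(order)
```

The key contrast with bulk-synchronous models is that nothing here blocks at a global barrier; each task fires as soon as its own inputs are ready.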

Case Studies of Fine-Grain Execution Models
- Static Dataflow Model (1970s- )
- EARTH Model (1988- )
- TNT Model and Cyclops-64 (2003- )
- Codelet Model under the Intel-led DARPA UHPC program

DARPA/Intel Runnemede Program (courtesy of the Intel DARPA UHPC team)
- Goal: 1000X energy reduction, via heterogeneous, tightly-coupled simple architectures, an overhauled DRAM microarchitecture, and resilient memory.
- Execution model: event-driven codelets, self-aware introspection, code and data motion.
- Assured operation: <10% overhead, checkpointing with Flash/CPM, security through sandboxing, CPU resiliency.
- HW/SW co-design across the interconnect fabric and data movement (heterogeneous and tapered, large local memory), for productivity and application efficiency; system management and concurrency that is model-based, goal-oriented, and self-morphing.
- Our collaborators: ET International, Inc. and the University of Illinois.

Progress and Proof Points to Date

Barnes-Hut: SWARM vs. OpenMP. [Chart: scaling of SWARM and OpenMP against ideal speedup on the Barnes-Hut benchmark.]

SWARM/MPI Performance Comparison: consistent speedup from 2X to 14.5X.

Cholesky Decomposition (SWARM vs. MKL/ScaLAPACK).
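Tiled Cholesky is a natural fit for a dependency-driven runtime: each factorization step fans out an irregular set of tile updates that a static schedule balances poorly. A rough sketch of the standard tiled task decomposition (POTRF/TRSM/SYRK/GEMM tile kernels, as used in tiled dense linear algebra generally); `cholesky_tasks` is a hypothetical helper for illustration, not part of SWARM, MKL, or ScaLAPACK.

```python
# Sketch of the task graph behind a tiled Cholesky factorization:
# the kind of irregular, shrinking workload SWARM schedules dynamically.
def cholesky_tasks(nt):
    """Yield (kernel, row_tile, col_tile) for an nt x nt grid of tiles."""
    for k in range(nt):
        yield ("POTRF", k, k)                 # factor the diagonal tile
        for i in range(k + 1, nt):
            yield ("TRSM", i, k)              # triangular-solve the panel tiles
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                # update the trailing submatrix: SYRK on diagonal tiles,
                # GEMM on off-diagonal tiles
                yield (("SYRK" if i == j else "GEMM"), i, j)

tasks = list(cholesky_tasks(3))
```

Because each GEMM at step k depends only on the two TRSM tasks that produced its input tiles, a dataflow runtime can overlap the tail of one step with the head of the next instead of barrier-synchronizing between steps.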

Summary and Acknowledgements
Summary (productivity observations; note that the baseline is the performance of optimized code):
- N-Body: 1 man-day, 3X
- G-500: 1 man-month, up to 14X
- Cholesky: 2 man-weeks, 1.5X
Acknowledgements: our sponsors, our collaborators and colleagues, my host, and others.

Cholesky Profiles. [Charts: execution profiles of the Cholesky factorization under SWARM and under OpenMP.]