Fluid Software: Handling Heterogeneous Many-Core for Programmer Productivity Nate Clark.

Presentation transcript:

Fluid Software: Handling Heterogeneous Many-Core for Programmer Productivity Nate Clark

2 Heterogeneous Many-Core
Need more performance, have many transistors
Power limited → Efficiency in designs
Domain-specific design/many simpler cores

3 The Biggest Problem: Software
Parallel programming is hard
Heterogeneous programming is hard
Forward compatibility
Legacy applications
I am a frustrated programmer

4 What Do We Want: Fluid Software
Program adjusts to whatever system has
  –Many-core/accelerators/whatever
Automatic, works on legacy code
[Diagram: *.c → Compiler → *.exe → Runtime Optimizer]

5 What Does This RTO Need to Do?
Task Decomposition
  –Break application into parallelizable pieces
Task Mapping
  –Place them on a processor/accelerator
Task Management
  –Evaluate solution and dynamically adjust
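The three responsibilities above can be pictured as one pass through a runtime optimizer. The following is a minimal sketch under stated assumptions, not the talk's implementation: the function names `decompose`, `map_tasks`, `manage`, and `run_rto`, the device list, and the round-robin placement are all hypothetical stand-ins.

```python
from typing import Dict, List

# Hypothetical device list, chosen only for illustration.
DEVICES = ["cpu", "gpu"]

def decompose(app: List[str]) -> List[str]:
    """Task decomposition: treat each named stage as one task."""
    return list(app)

def map_tasks(tasks: List[str]) -> Dict[str, str]:
    """Task mapping: round-robin placement, a stand-in for a real predictor."""
    return {t: DEVICES[i % len(DEVICES)] for i, t in enumerate(tasks)}

def manage(mapping: Dict[str, str], slow: List[str]) -> Dict[str, str]:
    """Task management: move each task reported as slow to the next device."""
    adjusted = dict(mapping)
    for t in slow:
        i = DEVICES.index(adjusted[t])
        adjusted[t] = DEVICES[(i + 1) % len(DEVICES)]
    return adjusted

def run_rto(app: List[str], slow: List[str]) -> Dict[str, str]:
    """One pass: decompose, map, then adjust based on feedback."""
    tasks = decompose(app)
    mapping = map_tasks(tasks)
    return manage(mapping, slow)
```

In a real system the `slow` list would come from runtime monitoring rather than being passed in by hand.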

6 Task Decomposition
Didn’t this fail in the 80’s?
  –Hard for programmer to reason about programs
  –Impossible for compiler
Dynamic behavior easily predictable
  –Find probable data/pipeline parallelism
[Diagram: *.exe]

7 MPEG2 Decode (Thies et al.)
[Task graph nodes: Decode Block, Saturate, IDCT, predict and add block, conv420to422, conv422to424, store ppm]
Example dynamically discovered task graph

8 Task Mapping
Place each task on best processor
  –Predict most effective processor
  –Generate code (runtime/quality tradeoff)
  –Forward compatible
[Diagram: task graph (Decode Block, Saturate, IDCT, predict and add block, conv420to422, conv422to424, store ppm) mapped onto FPGA, CPU, GPU]
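One way to read "predict most effective processor" is as a lookup over per-task cost estimates. The sketch below is an illustration only: the function name `map_by_predicted_cost` is hypothetical, and the cost numbers are made up (arbitrary units), not measurements from the talk.

```python
from typing import Dict

def map_by_predicted_cost(predicted_cost: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Place each task on the processor with the lowest predicted cost."""
    return {
        task: min(costs, key=costs.get)
        for task, costs in predicted_cost.items()
    }

# Made-up cost estimates for three of the tasks named on the slide.
predicted_cost = {
    "IDCT": {"CPU": 8.0, "GPU": 2.0, "FPGA": 1.5},
    "Saturate": {"CPU": 1.0, "GPU": 1.2, "FPGA": 2.0},
    "store ppm": {"CPU": 3.0, "GPU": 9.0, "FPGA": 9.0},
}

mapping = map_by_predicted_cost(predicted_cost)
for task, device in mapping.items():
    print(f"{task} -> {device}")
```

Because devices are just dictionary keys, a processor type that did not exist when the program was written can be added without changing the mapper, which is one simple reading of the forward-compatibility goal.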

9 Task Management
Monitor and refine task mapping
  –What to do when new tasks appear
  –Understand what’s going on
  –Scalable control algorithm
  –Architectural support to help monitoring
[Diagram: task graph (Decode Block, Saturate, IDCT, predict and add block, conv420to422, conv422to424, store ppm)]
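Monitoring and refining a mapping can be sketched as a small feedback loop. This is an assumption-laden illustration, not the talk's control algorithm: the `TaskManager` class, its methods, and the exponential-moving-average update with weight `alpha` are all hypothetical choices.

```python
from typing import Dict

class TaskManager:
    """Keeps per-task cost estimates and re-maps when measurements disagree."""

    def __init__(self, predicted_cost: Dict[str, Dict[str, float]], alpha: float = 0.5):
        # Copy the estimates so the caller's dictionaries are not mutated.
        self.cost = {t: dict(c) for t, c in predicted_cost.items()}
        self.alpha = alpha  # weight given to each new measurement
        self.mapping = {t: min(c, key=c.get) for t, c in self.cost.items()}

    def add_task(self, task: str, costs: Dict[str, float]) -> str:
        """Handle a newly appearing task by placing it on its cheapest device."""
        self.cost[task] = dict(costs)
        self.mapping[task] = min(costs, key=costs.get)
        return self.mapping[task]

    def observe(self, task: str, measured: float) -> str:
        """Blend a measured cost into the estimate, then re-pick the device."""
        device = self.mapping[task]
        old = self.cost[task][device]
        self.cost[task][device] = (1 - self.alpha) * old + self.alpha * measured
        self.mapping[task] = min(self.cost[task], key=self.cost[task].get)
        return self.mapping[task]

manager = TaskManager({"IDCT": {"CPU": 8.0, "GPU": 2.0}})
first_choice = manager.mapping["IDCT"]
# The GPU turns out far slower than predicted, so the estimate rises to 11.0.
after_feedback = manager.observe("IDCT", 20.0)
```

Each update touches only one task's estimates, so the cost of a monitoring step does not grow with the number of tasks. A production design would also need to account for migration cost, which this sketch ignores.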

10 Fluid Software System
[Diagram: *.exe is passed to Decompose, producing tasks A, B, C, D; the Task Mapper and Task Manager place them on FPGA, CPU, GPU]

11 Fluid Software Summary
RTO adjusts software for any architecture
  –Task decomposition
  –Task mapping
  –Task management
Provide feedback to help programmers write better code
I’m a happy programmer