Parallel Portability and Heterogeneous Programming. Stefan Möhl, Co-founder, CSO, Mitrionics.

Presentation transcript:


The Mitrion Virtual Processor
Massively fine-grain MIMD parallel
– Instruction-level parallel
– Tens of thousands of PEs
A configurable processor design
– Processing elements suited to the application
– Processing elements adapted to bit-width
– Ad-hoc fused instructions
Allows software programming for FPGAs
– You program the MVP rather than design hardware
– Programmed in Mitrion-C, an inherently parallel programming language

Mitrion-C example:

    int:48 main()
    {
        int:48 prev = 1;
        int:48 fib = 1;
        int:48 fibonnacci = for(i in ) {
            fib = fib + prev;
            prev = fib;
        } >< fib;
        fibonnacci;
    }
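The Mitrion-C snippet above expresses Fibonacci as a loop with two loop-carried values, but the transcript elides the loop range and appears to garble the statement order. As a hedged illustration only, here is the intended recurrence in plain Python; the iteration count n is an assumption standing in for the elided loop range, not part of the original slide.

```python
# Plain-Python sketch of the Fibonacci recurrence the slide's Mitrion-C
# loop is meant to compute. Two loop-carried values (prev, fib) are
# updated each iteration; the final fib corresponds to the value the
# ">< fib" construct carries out of the loop. The iteration count n is
# an assumption -- the transcript elides the actual loop range.
def fibonacci(n):
    prev, fib = 1, 1
    for _ in range(n):
        # Both updates are simultaneous, mirroring the single-assignment
        # (functional) loop semantics the later slides describe.
        prev, fib = fib, fib + prev
    return fib
```

For example, fibonacci(4) yields 8, continuing the sequence 1, 1, 2, 3, 5, 8.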

The Mitrion Platform

An experiment at Mitrionics
Allow Mitrion-C code to run efficiently across a hybrid machine:
– FPGA
– Multi-core/many-core
– Clusters
– And more to follow (GPU, vector and others)
Letting FPGA code out of the FPGA box
– The FPGA acceleration market suffers from users afraid of locking their code into FPGA acceleration technology

Many-core will require code re-writes

Code re-writes have happened many times…
– Scalar processors
– Vector processors
– Thinking Machines and MasPar
– SMPs/ccNUMA
– MPPs
– Clusters
– Grids
– FPGAs and GPGPUs
– Many-core processors

A new take on the problem
Running sequential code in parallel is hard; running parallel code sequentially is easy.
Automatically parallelizing for the different paradigms has not been possible
– Hence code re-writes
Automatically sequentializing for different parallel paradigms is comparatively easy
– Fewer, or perhaps even no, code re-writes
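The asymmetry this slide describes can be made concrete: an inherently parallel program is naturally a dataflow graph, and any dataflow graph can be run sequentially simply by evaluating its nodes in topological order. The sketch below (the graph and all names are invented for illustration, not Mitrionics code) shows the easy direction; the reverse, recovering such a graph from arbitrary sequential code, is the auto-parallelization problem that historically forced re-writes.

```python
# Minimal sketch: sequentializing a dataflow (parallel) program is just a
# topological-order walk of its graph -- dependencies evaluate before the
# nodes that consume them.
from graphlib import TopologicalSorter

def run_sequentially(nodes):
    """nodes maps name -> (fn, tuple of input node names)."""
    deps = {name: set(inputs) for name, (_, inputs) in nodes.items()}
    values = {}
    for name in TopologicalSorter(deps).static_order():
        fn, inputs = nodes[name]
        values[name] = fn(*(values[i] for i in inputs))
    return values

# A tiny invented graph: sq depends on sum, which depends on a and b.
graph = {
    "a": (lambda: 2, ()),
    "b": (lambda: 3, ()),
    "sum": (lambda x, y: x + y, ("a", "b")),
    "sq": (lambda s: s * s, ("sum",)),
}
```

Any topological order gives the same result, which is exactly why a parallel description loses nothing when executed on sequential hardware.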

Some properties of Mitrion-C
Fine-grain (instruction-level) MIMD parallel
– The MVP requires fine-grain MIMD for performance
– A limit case of parallelism
– Currently scales to tens of thousands of PEs in the MVP, with no scalability limit yet visible
Memory and bandwidth are taken seriously
– FPGAs typically sit on data buses made for 1/10th the performance
Though the syntax is imperative in style, the semantics are purely functional
– Very amenable to compiler optimization and theoretical treatment
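The "imperative syntax, purely functional semantics" point can be illustrated outside Mitrion-C: what looks like mutation of a loop variable is semantically a pure fold over loop-carried state, which is what makes such code amenable to compiler analysis. A hedged Python sketch (the example and names are invented):

```python
from functools import reduce

def imperative_sum_of_squares(xs):
    # Imperative surface: acc appears to be mutated each iteration...
    acc = 0
    for x in xs:
        acc = acc + x * x
    return acc

def functional_sum_of_squares(xs):
    # ...but the meaning is a pure fold: each "reassignment" just names
    # a new value, so the whole loop is a side-effect-free expression.
    return reduce(lambda acc, x: acc + x * x, xs, 0)
```

Both forms compute the same function; a compiler that sees the second form is free to reorder, pipeline, or parallelize without alias or side-effect analysis.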

Mitrion-C takes memory, locality and bandwidth seriously
Memory
– No illusion of a single address space (we haven't truly had one since caches were introduced 20 years ago)
– Different memory spaces for different bandwidth and latency requirements
– Careful control of memory usage (the MVP has limited access to fast memory)
Bandwidth
– Separate memory spaces imply separate data buses, which allows programming for bandwidth
– Streaming data types that help conserve bandwidth
– Highlighting of bandwidth consumption in the debugger
Parallelism allows latency to be hidden
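The streaming-data-types point is the familiar pipeline idea: process values as they flow past rather than materializing whole arrays, so buffer space stays constant and each value crosses the bus once. A rough Python analogue using generators (the example itself is invented, not taken from the slides):

```python
def scale(stream, k):
    # Transform values one at a time as they stream through;
    # nothing is buffered.
    for x in stream:
        yield k * x

def running_max(stream):
    # Consume the stream in a single pass with O(1) state.
    best = None
    for x in stream:
        if best is None or x > best:
            best = x
    return best
```

Chaining running_max(scale(range(1_000_000), 3)) touches each element exactly once and never builds the million-element intermediate array, which is the bandwidth-conserving behavior streaming types give the MVP.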

Does it have to be a new language?
Not completely; Mitrion-C is a C-family language
– The syntax is about as similar to C as Java's is
» But with inherently parallel language semantics
– A pragmatic language with five years of field experience
Learning a new language isn't hard
– Most programmers know 10+ different languages
Designing parallel algorithms is hard
– Figuring out how to make a sequential algorithm parallel
A language should support the hard task
– Do for parallelism what object-oriented languages did for state

Multi-core, clusters and others aren't the primary focus for Mitrionics (yet)
But this approach to a portable parallel heterogeneous language shows promise
– It has at least not yet proven hopeless
If anyone is interested in supporting us, please contact me!

Questions? Offers of Support? Mobile: