Contemporary Languages in Parallel Computing
Raymond Hummel


Current Languages

Standard Languages
Distributed Memory Multiprocessors
- MPI
Shared Memory Multiprocessors
- OpenMP
- pthreads
Graphics Processing Units
- CUDA
- OpenCL

MPI
Stands for: Message Passing Interface
Pros
- Extremely Scalable
- Portable
- Can harness a multitude of hardware setups
Cons
- Complicated Software
- Complicated Hardware
- Complicated Setup

MPI
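A minimal sketch of the message-passing style MPI imposes, written against the standard C bindings; the two-process exchange below is illustrative only and is not taken from the original slides.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);               /* start the MPI runtime */

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

        if (rank == 0) {
            int msg = 42;
            /* rank 0 sends one integer to rank 1 (tag 0 is arbitrary) */
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", msg);
        }

        MPI_Finalize();
        return 0;
    }

Programs like this are typically built with mpicc and launched through a process manager such as mpirun, which is part of the setup complexity noted above.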

OpenMP
Stands for: Open Multi-Processing
Pros
- Incremental Parallelization
- Fairly Portable
- Simple Software
Cons
- Limited Use-Case

OpenMP
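A minimal sketch of OpenMP's incremental approach: the existing serial loop below becomes parallel by adding a single pragma (plain C; the array size and values are illustrative).

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];

        /* initialize the inputs serially */
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* one pragma spreads the loop iterations across all available cores */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
        }

        printf("c[N-1] = %f\n", c[N - 1]);
        return 0;
    }

Compiled with -fopenmp in GCC or Clang; without the flag the pragma is simply ignored and the code runs serially, which is what makes the parallelization incremental.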

POSIX Threads
Stands for: Portable Operating System Interface Threads
Pros
- Portable
- Fine-Grained Control
Cons
- All-or-Nothing
- Complicated Software
- Limited Use-Case

POSIX Threads
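A minimal sketch of explicit thread management with pthreads (plain C; the thread count and worker body are illustrative). Every thread has to be created, handed its work, and joined by hand, which is the "all-or-nothing" cost noted above.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4

    /* each thread runs this function with its own argument */
    static void *worker(void *arg) {
        long id = (long)arg;
        printf("thread %ld running\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];

        /* create every thread explicitly, then wait for each one to finish */
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);

        return 0;
    }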

CUDA
Stands for: Compute Unified Device Architecture
Pros
- Manufacturer Support
- Low-Level Hardware Access
Cons
- Limited Use-Case
- Only Compatible with NVIDIA Hardware

CUDA
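A minimal sketch of a CUDA vector addition, assuming managed (unified) memory for brevity; the kernel launch syntax and built-in thread indices are the parts specific to NVIDIA's toolchain, and the sizes here are illustrative.

    #include <cstdio>

    /* kernel: each GPU thread adds one pair of elements */
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        float *a, *b, *c;
        /* managed memory is visible to both CPU and GPU */
        cudaMallocManaged(&a, bytes);
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        /* launch enough 256-thread blocks to cover all n elements */
        vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Built with nvcc, which only targets NVIDIA GPUs, matching the compatibility limitation above.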

OpenCL
Stands for: Open Computing Language
Pros
- Portability
- Heterogeneous Platform
- Works with All Major Manufacturers
Cons
- Complicated Software
- Special Tuning Required
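For contrast with CUDA, a condensed sketch of the same vector addition in OpenCL's C host API (all error checking omitted; the sizes are illustrative). The kernel is compiled at run time for whatever device is found, which is what lets one source file target CPUs, GPUs, and accelerators from any vendor, and which also accounts for the boilerplate noted above.

    #include <CL/cl.h>
    #include <stdio.h>

    /* kernel source is plain text, built at run time for the chosen device */
    static const char *src =
        "__kernel void vec_add(__global const float *a, __global const float *b,\n"
        "                      __global float *c) {\n"
        "    int i = get_global_id(0);\n"
        "    c[i] = a[i] + b[i];\n"
        "}\n";

    int main(void) {
        enum { N = 1024 };
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0f * i; }

        /* pick the first platform and device available (CPU, GPU, or accelerator) */
        cl_platform_id plat;  clGetPlatformIDs(1, &plat, NULL);
        cl_device_id dev;     clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

        /* build the kernel for this device and bind its arguments */
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "vec_add", NULL);

        cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
        cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
        cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);
        clSetKernelArg(k, 0, sizeof da, &da);
        clSetKernelArg(k, 1, sizeof db, &db);
        clSetKernelArg(k, 2, sizeof dc, &dc);

        /* run N work-items, then copy the result back to the host */
        size_t global = N;
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

        printf("c[1] = %f\n", c[1]);
        return 0;
    }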

Future Languages

Developing Languages
- D
- Rust
- Harlan

D
- Performance of Compiled Languages
- Memory Safety
- Expressiveness of Dynamic Languages
- Includes a Concurrency-Aware Type System
- Nearing Maturity

Rust
- Designed for creating large client-server programs on the Internet
- Safety
- Memory Layout
- Concurrency
- Major Changes Still Occurring

Harlan
- Experimental Language Based on Scheme
- Designed to take care of boilerplate for GPU programming
- Could be expanded to include automatic scheduling for both CPU and GPU, depending on available resources

Questions?