X10: Performance and Productivity at Scale

Presentation transcript:

X10: Performance and Productivity at Scale
Vijay Saraswat and Dave Grove, IBM TJ Watson
Nov 17, 2010
Programming Models -- The Road Ahead, 11/16/10
Asynchronous PGAS

What is X10?
Asynchronous PGAS programming model in a Java-like language.
Not Java! No threads or locks.
Explicitly designed for concurrency and parallelism:
  Parallelism: places, at
  Concurrency: async, finish, atomic, when, clocks
  Sequential: closures, structs, constraint-based types, true generic types, local type inference
Target architectures: x86, cluster of x86/Power, big Power SMPs, 100K core Power boxes, BlueGene, GPGPUs …
Two compilation paths:
  Compile to JVM: run on a cluster of VMs
  Compile to C++: runs on BG, clusters, …
Both paths use a high performance run-time. No actors, MPI, OpenMP, CUDA…
http://x10-lang.org
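The roles of async and finish can be loosely illustrated in plain Java. This is an analogy only, not X10 code and not how the X10 runtime works: each CompletableFuture.runAsync call stands in for an async (spawn an activity), the join on allOf stands in for the end of a finish block (wait for every spawned activity), and the AtomicLong update stands in for an atomic block. The class name FinishAsyncSketch and the helper parallelSum are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a Java approximation of X10's finish/async pattern.
class FinishAsyncSketch {

    // Sums 1..n by splitting the range into `tasks` chunks that run concurrently.
    static long parallelSum(int n, int tasks) {
        AtomicLong total = new AtomicLong();   // stands in for an `atomic` update
        List<CompletableFuture<Void>> spawned = new ArrayList<>();
        int chunk = (n + tasks - 1) / tasks;   // ceiling division

        for (int t = 0; t < tasks; t++) {
            final int lo = t * chunk + 1;
            final int hi = Math.min(n, (t + 1) * chunk);
            // Roughly `async { ... }`: start an independent activity.
            spawned.add(CompletableFuture.runAsync(() -> {
                long local = 0;
                for (int i = lo; i <= hi; i++) {
                    local += i;
                }
                total.addAndGet(local);
            }));
        }

        // Roughly the end of `finish { ... }`: wait for all spawned activities.
        CompletableFuture.allOf(spawned.toArray(new CompletableFuture[0])).join();
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000, 4)); // prints 500500
    }
}
```

The sketch does not model places or at: everything here runs in a single JVM, whereas X10 distributes activities across places.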

Principles and Practice of Parallel Programming
Course objective:
  Teaching fundamentals of data-structure design, analysis, implementation for efficient parallel execution
  Programming abstractions for concurrency
  Techniques for reasoning about behavior and performance of parallel programs
Course history:
  Fall 09 (23 students)
  Fall 10 (16 students)
  Assignments, programming project in X10 (cluster of x86)
Course structure:
  Unit 1: Introduction
  Unit 2: Introduction to X10
  Unit 3: Abstract performance model (task graphs)
  Unit 4: Concrete performance model
  Unit 5: Safe parallelization
  Unit 6: Indeterminacy
  Unit 7: Blocking synchronization
Semester-long programming project. Metric: speedup.
Slide annotations: async, finish, clocks, at, commutativity, atomic, when
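The speedup metric and the task-graph performance model can be made concrete with a small calculation. The sketch below assumes the standard work/span view of a task graph, where work is the total cost of all tasks and span is the cost of the longest dependency chain; whether the course uses exactly this formulation is an assumption. The class SpeedupSketch and its method names are hypothetical.

```java
// Hypothetical sketch of the speedup metric and a work/span bound.
class SpeedupSketch {

    // Measured speedup: sequential running time divided by parallel running time.
    static double speedup(double sequentialTime, double parallelTime) {
        return sequentialTime / parallelTime;
    }

    // Upper bound on speedup for a task graph on p processors:
    // no better than p, and no better than work / span.
    static double speedupBound(double work, double span, int p) {
        return Math.min(p, work / span);
    }

    public static void main(String[] args) {
        System.out.println(speedup(120.0, 20.0));            // prints 6.0
        System.out.println(speedupBound(1000.0, 50.0, 8));   // prints 8.0
        System.out.println(speedupBound(1000.0, 50.0, 64));  // prints 20.0
    }
}
```

In the last call the bound is set by the task graph rather than the machine: with a span of 50 out of 1000 units of work, adding processors beyond 20 cannot help under this model.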