Benchmarking and Applications

Purpose of Our Benchmarking Effort
– Reveal compiler (and run-time system) weak points and the lack of adequate automatic optimizations
– Quantify the potential benefit from improvements
  – against ideal behavior
  – against other paradigms (e.g., MPI)

Strategy
– Develop hand optimizations that mimic the automatic optimizations
– Compare these to compiler- and run-time-based optimizations

Possible Optimizations under the UPC Model
– Privatization of local shared accesses (see the sketch after this list)
– Prefetching
– Aggregation
– Overlapping remote accesses with computation
– Low-level optimized implementations of libraries (collectives, I/O, other)
– Other
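
To make the first item concrete, here is a minimal sketch of privatization, assuming an illustrative block-distributed array a[] with block size B and a simple sum kernel (these names are ours for illustration, not from the consortium codes):

    #include <upc.h>

    #define B 256                       /* elements per thread (illustrative) */
    shared [B] double a[B * THREADS];   /* each thread owns one contiguous block */

    /* Naive: every element is read through a pointer-to-shared, so the
       compiler emits a shared access per iteration even though upc_forall
       only visits elements that are local to this thread. */
    double sum_shared(void)
    {
        double s = 0.0;
        int i;
        upc_forall (i = 0; i < B * THREADS; i++; &a[i])
            s += a[i];
        return s;
    }

    /* Hand-privatized: a[MYTHREAD*B .. MYTHREAD*B + B-1] has affinity to
       this thread, so the pointer-to-shared may legally be cast to an
       ordinary C pointer and the loop becomes plain local loads. */
    double sum_private(void)
    {
        double *mine = (double *) &a[MYTHREAD * B];
        double s = 0.0;
        int i;
        for (i = 0; i < B; i++)
            s += mine[i];
        return s;
    }

Measuring sum_shared against sum_private quantifies exactly what an automatic privatization pass should recover.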

Applicable and Easy to Compare
– Privatization
– Aggregation and prefetching (see the sketch after this list)
– Effect of low-level optimized implementations under other paradigms (such as collective MPI calls)
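
The aggregation/prefetching transformation can be sketched the same way. The example below, again with illustrative names (array a[], block size B, a hypothetical right-neighbor access pattern), replaces fine-grained remote reads with a single bulk upc_memget into a private buffer:

    #include <upc.h>

    #define B 256
    shared [B] double a[B * THREADS];

    /* Fine-grained: B separate remote reads, one per element, from the
       right neighbor's block. */
    double neighbor_sum_fine(void)
    {
        int nb = (MYTHREAD + 1) % THREADS;
        double s = 0.0;
        int i;
        for (i = 0; i < B; i++)
            s += a[nb * B + i];
        return s;
    }

    /* Aggregated: one bulk transfer prefetches the whole remote block into
       a private buffer; the computation then runs on local memory. */
    double neighbor_sum_bulk(void)
    {
        int nb = (MYTHREAD + 1) % THREADS;
        double buf[B], s = 0.0;
        int i;
        upc_memget(buf, &a[nb * B], B * sizeof(double));
        for (i = 0; i < B; i++)
            s += buf[i];
        return s;
    }

On a distributed-memory machine the fine-grained loop can pay one network round trip per element; the bulk version amortizes that into a single message, which is the gap an automatic aggregation or prefetch optimization should close.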

Action Plan
– Mature applications and kernels should be hosted on the main web site in a common area
– For a subgroup of the applications and kernels, the following versions of the code should be made available by members of the consortium:
  – C
  – C or FORTRAN + MPI
  – UPC with no hand optimization
  – UPC with privatization optimization
  – UPC with privatization, prefetching, and access aggregation
  – UPC with all possible optimizations

Action Plan (cont.): Producing Benchmarking Results (see the timing-harness sketch after this list)
– C vs. C under UPC vs. UPC with one thread
– Each hand-optimized version against the corresponding automatic optimizations from the compiler and run-time
– Fully hand-optimized against MPI
– Fully automatically optimized against MPI
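
These comparisons only hold up if every version is timed identically. A minimal sketch of a common timing harness follows; wtime, report, and t_thread are hypothetical names introduced here, not part of any planned benchmark code:

    #include <upc.h>
    #include <stdio.h>
    #include <sys/time.h>

    shared double t_thread[THREADS];   /* per-thread elapsed time */

    static double wtime(void)          /* wall-clock seconds */
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + 1e-6 * tv.tv_usec;
    }

    /* Time one kernel across all threads and report the slowest thread, so
       hand-optimized, automatically optimized, and MPI versions are all
       compared on the same footing. */
    void report(const char *label, void (*kernel)(void))
    {
        upc_barrier;                   /* start all threads together */
        double t0 = wtime();
        kernel();
        t_thread[MYTHREAD] = wtime() - t0;
        upc_barrier;                   /* wait until all times are posted */
        if (MYTHREAD == 0) {
            double tmax = 0.0;
            int i;
            for (i = 0; i < THREADS; i++)
                if (t_thread[i] > tmax)
                    tmax = t_thread[i];
            printf("%-30s %10.6f s (max over %d threads)\n",
                   label, tmax, THREADS);
        }
    }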

Action Plan (cont.): Programming Practices
By SC 2002, groups should compile observations on efficient UPC porting and programming methods:
– Start with C
– Optimize the single-threaded code first, or parallelize first and then tune the parallel code
– Avoid MPI bias and capture the UPC benefit
– Other

Action Plan (cont.): Ease of Use
– Expressiveness for different problem domains
  – Data structures
  – Classic problems
  – Methods of parallelization
– Lines of code for different problem domains
– Comparisons against HPF, MPI, OpenMP, Co-Array Fortran, etc.
– Time to solution
  – time to a running parallel code
  – time to a tuned, running parallel code

Action Plan (cont.): Ease-of-Use Strategy
– SC 2002: contest at the booth
– SC 2003: contest at the conference level
– Worked examples

Action Plan (cont.): Applications
– Micro-benchmarks
– NAS
– SPLASH-2