© 2008 Pittsburgh Supercomputing Center Proposed ideas for consideration under AUS

Revisit the Mixed MPI-OpenMP Programming Model?
Is it time to revive the mixed MPI-OpenMP programming model?
– People have been looking at it for a while
– Not very successful so far
– Not enough cores per processor to justify it
– Not enough processors per node to justify it
– Halfhearted attempts?
Have things changed?
– 8-16 cores per node in current T2 systems
– Are we at the tipping point? Or, where is the tipping point?

Hybrid OpenMP-MPI Benchmark
Simple benchmark code
Permits systematic evaluation:
– Vary the compute-to-communication ratio
– Vary communication message sizes
– Vary the MPI-OpenMP balance
Should we expect better performance?
Is this a worthwhile approach for real applications?
Hopefully provides some limits in the idealized case
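
A minimal sketch of what such a benchmark kernel could look like (an illustrative assumption on our part, not the actual PSC benchmark): each MPI rank runs an OpenMP-parallel compute loop followed by a ring exchange of tunable size, so the compute-to-communication ratio, the message size, and the MPI-OpenMP balance (ranks versus threads) can all be varied from the command line and the job launcher.

/* Hypothetical hybrid MPI+OpenMP benchmark kernel: timed compute phase
 * (OpenMP threads within a rank) followed by a timed communication phase
 * (MPI ring exchange from the master thread only). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    if (provided < MPI_THREAD_FUNNELED && rank == 0)
        printf("warning: MPI_THREAD_FUNNELED not available\n");

    /* Tunable knobs (placeholder defaults): work per rank and message size. */
    long    work  = (argc > 1) ? atol(argv[1]) : 10000000L;  /* iterations per rank */
    int     msgsz = (argc > 2) ? atoi(argv[2]) : 1 << 20;    /* doubles per message */
    double *sendbuf = malloc(msgsz * sizeof(double));
    double *recvbuf = malloc(msgsz * sizeof(double));
    for (int i = 0; i < msgsz; i++) sendbuf[i] = i;

    double t0 = MPI_Wtime();

    /* Compute phase: OpenMP threads share the loop within the rank. */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < work; i++)
        sum += 1.0 / (double)(i + 1);

    double t1 = MPI_Wtime();

    /* Communication phase: ring exchange of msgsz doubles, issued from the
     * master thread only, consistent with MPI_THREAD_FUNNELED. */
    int right = (rank + 1) % nranks, left = (rank + nranks - 1) % nranks;
    MPI_Sendrecv(sendbuf, msgsz, MPI_DOUBLE, right, 0,
                 recvbuf, msgsz, MPI_DOUBLE, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    double t2 = MPI_Wtime();
    if (rank == 0)
        printf("threads=%d ranks=%d compute=%.3fs comm=%.3fs (sum=%g)\n",
               omp_get_max_threads(), nranks, t1 - t0, t2 - t1, sum);

    free(sendbuf); free(recvbuf);
    MPI_Finalize();
    return 0;
}

Sweeping the same total core count across different combinations of ranks per node and OMP_NUM_THREADS would then map out where, if anywhere, the hybrid balance pays off in this idealized setting.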

Real Applications with Mixed MPI-OpenMP
– WRF
– ENZO
– POPS
– Other user codes?

FFT Benchmarks on T2 Systems
Not the HPCC one
More realistic dimensions: 256^3 to 4096^3
– 2D processor decomposition already implemented (PK's code)
– How would this compare with a slab decomposition with mixed MPI-OpenMP? Does one exist?
Also can result in tuning suggestions for users
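
As a back-of-the-envelope illustration of why the comparison matters (this arithmetic is added here, not taken from the proposal, and the 16 threads per rank is an assumed node size): a slab decomposition of an N^3 FFT can use at most N MPI ranks, so mixed MPI-OpenMP multiplies the usable core count by the threads per rank, while a 2D (pencil) decomposition can use up to N^2 ranks with MPI alone.

/* Illustrative sketch: usable core counts for an N^3 FFT under a slab
 * decomposition (at most N MPI ranks, optionally times OpenMP threads per
 * rank) versus a 2D pencil decomposition (up to N*N ranks).  The thread
 * count per rank is an assumed parameter. */
#include <stdio.h>

int main(void)
{
    const long sizes[] = {256, 512, 1024, 2048, 4096};
    const int  nsizes  = sizeof(sizes) / sizeof(sizes[0]);
    const long threads_per_rank = 16;   /* assumed cores per node on a T2 system */

    printf("%8s %12s %20s %16s\n",
           "N", "slab (MPI)", "slab (MPI x OpenMP)", "pencil (MPI)");
    for (int i = 0; i < nsizes; i++) {
        long n = sizes[i];
        printf("%8ld %12ld %20ld %16ld\n",
               n, n, n * threads_per_rank, n * n);
    }
    return 0;
}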

Hybrid Architectures (Hardware)
Emerging architecture
There is a lot to be learned
Benchmark existing applications?
– NAMD
– WRF

IO Benchmarking?
Experiment with models actually used by users
Create "Best Practices" for IO?
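
One model that could go into such a benchmark (an assumption here, not a model named in the proposal) is the common single-shared-file pattern with collective MPI-IO writes; the file name and block size below are placeholders.

/* Illustrative sketch of a single-shared-file I/O pattern: every rank writes
 * its own contiguous block of one file with a collective MPI-IO call, which
 * lets the MPI library coalesce requests (collective buffering). */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nvals = 1 << 20;                    /* doubles per rank (placeholder) */
    double *buf = malloc(nvals * sizeof(double));
    for (int i = 0; i < nvals; i++) buf[i] = rank + i * 1e-6;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "io_bench.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at its own offset in the shared file. */
    MPI_Offset offset = (MPI_Offset)rank * nvals * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, nvals, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}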