Overview of Berkeley UPC

Slides:



Advertisements
Similar presentations
MPI 3 RMA Bill Gropp and Rajeev Thakur. 2 Why Change RMA? Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations.
Advertisements

1 Implementing PGAS on InfiniBandPaul H. Hargrove Experiences Implementing Partitioned Global Address Space (PGAS) Languages on InfiniBand Paul H. Hargrove.
C. Bell, D. Bonachea, R. Nishtala, and K. Yelick, 1Berkeley UPC: Optimizing Bandwidth Limited Problems Using One-Sided Communication.
Parallel Processing with OpenMP
Unified Parallel C at LBNL/UCB Implementing a Global Address Space Language on the Cray X1 Christian Bell and Wei Chen.
Introduction to the Partitioned Global Address Space (PGAS) Programming Model David E. Hudak, Ph.D. Program Director for HPC Engineering
Commodity Computing Clusters - next generation supercomputers? Paweł Pisarczyk, ATM S. A.
♦ Commodity processor with commodity inter- processor connection Clusters Pentium, Itanium, Opteron, Alpha GigE, Infiniband, Myrinet, Quadrics, SCI NEC.
Enforcing Sequential Consistency in SPMD Programs with Arrays Wei Chen Arvind Krishnamurthy Katherine Yelick.
PGAS Language Update Kathy Yelick. PGAS Languages: Why use 2 Programming Models when 1 will do? Global address space: thread may directly read/write remote.
Unified Parallel C at LBNL/UCB The Berkeley UPC Compiler: Implementation and Performance Wei Chen, Dan Bonachea, Jason Duell, Parry Husbands, Costin Iancu,
1 Presentation at the 4 th PMEO-PDS Workshop Benchmark Measurements of Current UPC Platforms Zhang Zhang and Steve Seidel Michigan Technological University.
Unified Parallel C at LBNL/UCB UPC at LBNL/U.C. Berkeley Overview Kathy Yelick U.C. Berkeley, EECS LBNL, Future Technologies Group.
1 Berkeley UPC Kathy Yelick Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Rajesh Nishtala, Mike Welcome.
Applications for K42 Initial Brainstorming Paul Hargrove and Kathy Yelick with input from Lenny Oliker, Parry Husbands and Mike Welcome.
Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations Dan Bonachea & Jason Duell U. C. Berkeley / LBNL
Parallel/Concurrent Programming on the SGI Altix Conley Read January 25, 2007 UC Riverside, Department of Computer Science.
1 Titanium and UPCKathy Yelick UPC Benchmarks Kathy Yelick LBNL and UC Berkeley Joint work with The Berkeley UPC Group: Christian Bell, Dan Bonachea, Wei.
Support for Adaptive Computations Applied to Simulation of Fluids in Biological Systems Immersed Boundary Method Simulation in Titanium Siu Man Yau, Katherine.
UPC at CRD/LBNL Kathy Yelick Dan Bonachea, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome, Christian Bell.
Support for Adaptive Computations Applied to Simulation of Fluids in Biological Systems Immersed Boundary Method Simulation in Titanium.
Unified Parallel C at LBNL/UCB Message Strip-Mining Heuristics for High Speed Networks Costin Iancu, Parry Husbans, Wei Chen.
Unified Parallel C at LBNL/UCB Empirical (so far) Understanding of Communication Optimizations for GAS Languages Costin Iancu LBNL.
Scheduling of Tiled Nested Loops onto a Cluster with a Fixed Number of SMP Nodes Maria Athanasaki, Evangelos Koukis, Nectarios Koziris National Technical.
1b.1 Types of Parallel Computers Two principal approaches: Shared memory multiprocessor Distributed memory multicomputer ITCS 4/5145 Parallel Programming,
A Behavioral Memory Model for the UPC Language Kathy Yelick University of California, Berkeley and Lawrence Berkeley National Laboratory.
Reference: / Parallel Programming Paradigm Yeni Herdiyeni Dept of Computer Science, IPB.
Unified Parallel C at LBNL/UCB Overview of Berkeley UPC Kathy Yelick Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands,
Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley.
Center for Programming Models for Scalable Parallel Computing: Project Meeting Report Libraries, Languages, and Execution Models for Terascale Applications.
Unified Parallel C at LBNL/UCB The Berkeley UPC Compiler: Implementation and Performance Wei Chen the LBNL/Berkeley UPC Group.
UPC Applications Parry Husbands. Roadmap Benchmark small applications and kernels —SPMV (for iterative linear/eigen solvers) —Multigrid Develop sense.
Introduction to Charm++ Machine Layer Gengbin Zheng Parallel Programming Lab 4/3/2002.
Report on Communication Architecture for Clusters (CAC) Workshop Dhabaleswar K. (DK) Panda Department of Computer and Info. Science The Ohio State University.
Unified Parallel C at LBNL/UCB An Evaluation of Current High-Performance Networks Christian Bell, Dan Bonachea, Yannick Cote, Jason Duell, Paul Hargrove,
1 SIAC 2000 Program. 2 SIAC 2000 at a Glance AMLunchPMDinner SunCondor MonNOWHPCGlobusClusters TuePVMMPIClustersHPVM WedCondorHPVM.
Case Study in Computational Science & Engineering - Lecture 2 1 Parallel Architecture Models Shared Memory –Dual/Quad Pentium, Cray T90, IBM Power3 Node.
Software Caching for UPC Wei Chen Jason Duell Jimmy Su Spring 2003.
© 2009 IBM Corporation Parallel Programming with X10/APGAS IBM UPC and X10 teams  Through languages –Asynchronous Co-Array Fortran –extension of CAF with.
1 Qualifying ExamWei Chen Unified Parallel C (UPC) and the Berkeley UPC Compiler Wei Chen the Berkeley UPC Group 3/11/07.
HPC Components for CCA Manoj Krishnan and Jarek Nieplocha Computational Sciences and Mathematics Division Pacific Northwest National Laboratory.
NERSC/LBNL UPC Compiler Status Report Costin Iancu and the UCB/LBL UPC group.
Unified Parallel C Kathy Yelick EECS, U.C. Berkeley and NERSC/LBNL NERSC Team: Dan Bonachea, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu,
A Multi-platform Co-Array Fortran Compiler for High-Performance Computing Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey {dotsenko, ccristi,
IEEE Workshop on HSLN 16 Nov 2004 SCI Networking for Shared-Memory Computing in UPC: Blueprints of the GASNet SCI Conduit Hung-Hsun Su, Burton C. Gordon,
Communication Support for Global Address Space Languages Kathy Yelick, Christian Bell, Dan Bonachea, Yannick Cote, Jason Duell, Paul Hargrove, Parry Husbands,
Benchmarking and Applications. Purpose of Our Benchmarking Effort Reveal compiler (and run-time systems) weak points and lack of adequate automatic optimizations.
Christian Bell, Dan Bonachea, Kaushik Datta, Rajesh Nishtala, Paul Hargrove, Parry Husbands, Kathy Yelick The Performance and Productivity.
Unified Parallel C at LBNL/UCB Berkeley UPC Runtime Report Jason Duell LBNL September 9, 2004.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 28, 2005 Session 29.
1 PGAS LanguagesKathy Yelick Partitioned Global Address Space Languages Kathy Yelick Lawrence Berkeley National Laboratory and UC Berkeley Joint work.
Unified Parallel C at LBNL/UCB UPC at LBNL/U.C. Berkeley Overview Kathy Yelick LBNL and U.C. Berkeley.
UPC at NERSC/LBNL Kathy Yelick, Christian Bell, Dan Bonachea,
Dynamo: A Runtime Codesign Environment
An Emerging, Portable Co-Array Fortran Compiler for High-Performance Computing Daniel Chavarría-Miranda, Cristian Coarfa, Yuri.
Unified Parallel C at NERSC
Parallel Programming By J. H. Wang May 2, 2017.
Programming Models for SimMillennium
Grid Computing Colton Lewis.
Fei Cai Shaogang Wu Longbing Zhang and Zhimin Tang
Amir Kamil and Katherine Yelick
CS775: Computer Architecture
What is Parallel and Distributed computing?
UPC and Titanium Kathy Yelick University of California, Berkeley and
Immersed Boundary Method Simulation in Titanium Objectives
MPJ: A Java-based Parallel Computing System
Amir Kamil and Katherine Yelick
Department of Computer Science University of California, Santa Barbara
The George Washington University
Cluster Computers.
Presentation transcript:

Overview of Berkeley UPC Kathy Yelick Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome

Goals of the Berkeley UPC Project Make UPC Ubiquitous on Parallel machines Workstations and PCs for development A portable compiler: for future machines too Components of research agenda: Runtime work for Partitioned Global Address Space (PGAS) languages in general Compiler optimizations for parallel languages Application demonstrations of UPC

Where Does Berkeley UPC Run? Runs on most SMPs, clusters & supercomputers Support Operating Systems: Linux, FreeBSD, Tru64, AIX, IRIX, HPUX, Solaris, MSWindows(cygwin), MacOSX, Unicos, SuperUX Supported CPUs: x86, Itanium, Alpha, Sparc, PowerPC, PA-RISC GASNet communication: Myrinet GM, Quadrics Elan, Mellanox Infiniband VAPI, IBM LAPI, Cray X1, SGI Altix, Cray/SGI SHMEM Specific supercomputer platforms: Cray T3e, Cray X1, IBM SP, NEC SX-6, Cluster X (Big Mac), SGI Altix 3000

Recent Progress on Runtime Runtime portability, interoperability [Jason] New pthread version runs on SGI Altix, SMPs, clusters of SMPs Support for Intrepid, C++, mixed MPI GASNet communication layer [Dan] Previously existing ports: IBM LAPI, Myrinet GM, Quadrics Elan-3 FY04 Ports: Infiniband, UDP, Shmem, GM+threads, Elan-4 Research on support for pinning-based networks such as Infiniband and Myrinet Third party ports: SCI by UFL

Recent Progress on Compiler Enabled optimizations in Open64 base Static analyses for parallel code Understand when code motion is legal without changing views from other processors Extended cycle detection to arrays with three different algorithms Message Coalescing Replacing small messages with larger ones [Wei] Message strip-mining Find optimal message size for pipelining [Costin] Experiments with vectorization on the X1 [Christian and Wei]

Recent Progress on Applications NAS PB-size problems Berkeley NAS MG avoids most global barriers and relies on UPC relaxed memory model [Parry] Berkeley NAS CG has several versions, including simpler, fine-grained communication Berkeley NAS FT [Christian] Sparse triangular solve [Rajesh] Algorithms that are challenging in MPI 2D Delauney Triangulation [SIAM PP ‘04] [Parry] AMR in UPC: Chombo Poisson solver [Mike] Investigation into AMR potential

Progress on the Language Specification of UPC memory model in progress Joint with MTU Behavioral spec [Dagstuhl03] UPC IO nearly finalized Joint with GWU and ANL UPC Collectives V 1.0 finalized Effort led by MTU Optimized version on GASNet underway [Paul] Investigation of automatic tuning [Rajesh] Improvements to UPC Language Spec Led by IDA

External Activities Participation in UPC bi-annual consortium meeting 4 Tutorials: PSC, SIAM PP04, SC02/SC03, IPDPS03 UCB Parallel Computing course Assignment using 4 problems in 2 PGAS languages Slides used at elsewhere (UCSB,…) 10 Presentations at workshops, conferences, and panels, poster sessions 11 Publications 7 in refereed conferences/journals 4 are language or runtime interface specifications

Presentation Details Performance Workshop: Programming Models Panel, K. Yelick, April 2004. PSC Petamethods Workshop: “UPC” B. Carlson, K. Yelick, April 2004. Open64 Workshop: “Berkeley UPC Compiler,” C. Iancu, 2004. SIAM PP04 Tutorial: “PGAS Languages,” T. El-Ghazawi, K. Yelick, B. Carlson, B. Numerich, Feb. 2004. SIAM PP04 talk: “Delaunay Triangulation in UPC,” P. Husbands, Feb. 2004. SIAM PP04 Poster: “Automatic Tuning of Collectives,” R. Nishtala et al, Feb. 2004. NRC/LBNL Site visit: “Programming Language Issues,” K. Yelick, Jan. 2004. SC03 Workshop on Petaflops Programming: “What is wrong with MPI for Petaflops Progrmaming?”, Nov. 2003. SC03 Poster: “GASNet,” D. Bonachea, J. Duell, P. Hargrove, Nov. 2003. Dagstuhl workshop on Memory Consistency Models: A Behavioral Model for the UPC Language, K. Yelick, Oct. 2003.

2003/4 Publications Evaluating Support for Global Address Space Languages on the Cray X1 C. Bell, W. Chen, D. Bonachea, K. Yelick. ICS 2004 (to appear). Message Strip Mining Heuristics for High Speed Networks C. Iancu, P. Husbands, W. Chen. VECPAR 2004 (to appear). Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations D. Bonachea and J. Duell. 2nd Workshop on Hardware/Software Support for High Perf. Scientific and Engineering Computing, SHPSEC-PACT03. (Also to appear in IJHPCN.) Polynomial-time Algorithms for Enforcing Sequential Consistency in SPMD Programs with Arrays W. Chen, A. Krishnamurthy, K. Yelick. 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2003. A Performance Analysis of the Berkeley UPC Compiler W. Chen, D. Bonachea, J. Duell, P. Husbands, C. Iancu, K. Yelick. 17th Annual International Conference on Supercomputing (ICS), 2003. A New DMA Registration Strategy for Pinning-Based High Performance Networks C. Bell and D. Bonachea. Communication Architecture for Clusters (CAC'03), 2003. An Evaluation of Current High-Performance Networks C. Bell, D. Bonachea, Y. Cote, J. Duell, P. Hargrove, P. Husbands, C. Iancu, M. Welcome, K. Yelick. 17th Intern’l Parallel and Distributed Processing Symposium (IPDPS), 2003. Proposal for Extending the UPC Memory Copy Library Functions, v0.7 D. Bonachea , UPC community forum, 2004. A Proposal for a UPC Memory Consistency Model, v1.0 K. Yelick, D. Bonachea and C. Wallace, LBNL Tech Report LBNL-54983. UPC-IO: A Parallel I/O API for UPC, v1.0pre10 T. El-Ghazawi, F. Cantonnet, P. Saha, R. Thakur, R. Ross, D. Bonachea, http://upc.gwu.edu. GASNet Specification, v1.1 D. Bonachea. U.C. Berkeley Tech Report CSD-02-1207.

Schedule Introduction 8:15 Coffee 12:00 - Lunch (45 Minutes) 8:30 Overview of Berkeley UPC Runtime Session 8:40 Runtime (Duell) 9:00 Gasnet (Bonachea) 9:40 Collectives (Hargrove, Nishtala) 10:00 - Break (15 min) Compiler Session 10:15 Berkeley UPC on the X1 (Bell, Chen) 11:00 Message Coalescing (Chen) 11:30 Message Stripmining (Iancu) 12:00 - Lunch (45 Minutes) 12:00 - Lunch (45 Minutes) 12:45 AMR in UPC (Welcome) 1:10 FFT (Bell) 1:20 Sparse Triangular Solve (Nishtala) 1:30 Scaling UPC Applications (Husbands) Summary 1:45 Future Directions (Kathy Yelick) 2:30 Break and move to 50F conference room 2:45 Discussion