MPI and C-Language Seminars 2010

Seminar Plan
- Week 1 – Introduction, Data Types, Control Flow, Pointers
- Week 2 – Arrays, Structures, Enums, I/O, Memory
- Week 3 – Compiler Options and Debugging
- Week 4 – MPI in C and Using the HPSG Cluster
- Week 5 – “How to Build a Performance Model”
- Week 6-9 – Coursework Troubleshooting (Seminar tutors available in their office)

Performance Models
- Aim to predict the runtime of an application.
- Allow you to predict beyond the grid sizes you have run, and onto hardware you do not yet have.
- Gauge the scalability of the algorithm / code.
- Help analyse the parallel efficiency of the code.
- Show where the bottlenecks are, in both hardware and software.

Factors of a Model
- Computation – active processing: the time spent doing actual work.
  - More processors => less work per processor.
  - Overall computation time should fall as the processor count increases.
- Communication – message passing: communication between processors.
  - Overall communication (I/O) time will increase with processor count.
  - More network contention means slower communication.

Getting the Balance (1 / 2)
(Figure: chart with x-axis "Number of Processors")

Getting the Balance (2 / 2)
- Fixed costs: work that every processor must do. More processors will not reduce this time.
- Variable costs: the portion of work that varies with processor count. Generally determined by how the problem size is decomposed.
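
A minimal sketch of how the fixed/variable split feeds a prediction (the function and parameter names are hypothetical; tFixed and tVariable would have to be measured for your own code):

    /* Fixed work is unaffected by the processor count; variable work is
       divided across the processors. */
    double predictedRuntime(double tFixed, double tVariable, int nProcs)
    {
        return tFixed + tVariable / nProcs;
    }

For example, with 10 s of fixed work and 90 s of variable work this predicts 100 s on one processor and 32.5 s on four, but never less than 10 s however many processors are added.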

Timers
- Lots of different timers:
  - CPU time – actual time spent on the CPU.
  - Wall time – total elapsed time since program start.
- Different timers have different overheads.
- Try to avoid timing the timer calls themselves.
- Recommended C timer – needs to be called with two double pointers:
      double cpuStart, wallStart;
      Timers(&wallStart, &cpuStart);
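
The Timers() routine used above is provided with the seminar materials and is not shown on the slide. A minimal stand-in with the same calling convention (wall time through the first pointer, CPU time through the second) might look like this, assuming a POSIX system with gettimeofday() and the standard clock():

    #include <stdio.h>
    #include <time.h>       /* clock(), CLOCKS_PER_SEC */
    #include <sys/time.h>   /* gettimeofday() */

    /* Hypothetical stand-in for the course-provided Timers() helper:
       writes the current wall-clock and CPU time, in seconds, through
       the two double pointers. */
    void Timers(double *wallTime, double *cpuTime)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);                        /* wall clock */
        *wallTime = tv.tv_sec + tv.tv_usec * 1.0e-6;
        *cpuTime  = (double)clock() / CLOCKS_PER_SEC;   /* CPU time used so far */
    }

    int main(void)
    {
        double wallStart, cpuStart, wallEnd, cpuEnd;
        volatile double x = 0.0;

        Timers(&wallStart, &cpuStart);
        for (long i = 0; i < 100000000L; i++)           /* some work to time */
            x += 1.0 / (double)(i + 1);
        Timers(&wallEnd, &cpuEnd);

        printf("wall %.6f s, cpu %.6f s\n",
               wallEnd - wallStart, cpuEnd - cpuStart);
        return 0;
    }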

Where is the Expense?
- Need to establish what the expensive operations are:
  - Functions which are called frequently.
  - Functions which take a long time.
  - Work out a percentage breakdown of the total time.
- Is the expense in communication or computation?
- Is it a fixed cost or a variable cost?
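
If a profiler such as gprof is not being used, one simple way to get that percentage breakdown is to accumulate wall time into a few named buckets around the suspect regions and print them at the end. A sketch (the bucket names are purely illustrative):

    #include <stdio.h>

    /* Illustrative buckets - replace with the regions of your own code. */
    enum { COMM = 0, COMPUTE, IO, NUM_BUCKETS };
    static const char *bucketName[NUM_BUCKETS] = { "communication", "computation", "I/O" };
    static double bucketTime[NUM_BUCKETS];

    /* Around each region of interest:
           Timers(&w0, &c0);   ... region ...   Timers(&w1, &c1);
           bucketTime[COMPUTE] += w1 - w0;                          */

    void printBreakdown(double totalWallTime)
    {
        for (int b = 0; b < NUM_BUCKETS; b++)
            printf("%-14s %8.2f s  (%5.1f%%)\n", bucketName[b],
                   bucketTime[b], 100.0 * bucketTime[b] / totalWallTime);
    }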

Computational Model (1 / 2)
- How will the number of processors affect the amount of work done by each processor?
- Will they all do the same amount of work? (Even decomposition.)
- Are loops dependent on the problem size?
- Need to look at:
  - How long operations take.
  - How many times they are performed.

Computational Model (2 / 2)
- A basic model:
  - Time how long each different operation takes.
  - Calculate how many times each operation is performed.
  - Add them all up.
- Inaccuracy:
  - When using timers, consider their overhead.
  - It is always more accurate to time many repetitions of an operation and divide through.
- Note: communication will show up in the wall time.
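
A sketch of the "time a repetition and divide through" advice, reusing the Timers() call from the earlier slide; expensive_operation() and predictedCalls are hypothetical names standing in for whatever your own model identifies:

    void Timers(double *wallTime, double *cpuTime);   /* course-provided timer */
    void expensive_operation(void);                   /* the operation being modelled */

    /* Amortise the timer overhead by timing REPS calls and dividing. */
    double modelComputationTime(long predictedCalls)
    {
        const int REPS = 10000;
        double wall0, cpu0, wall1, cpu1;

        Timers(&wall0, &cpu0);
        for (int i = 0; i < REPS; i++)
            expensive_operation();
        Timers(&wall1, &cpu1);

        double perCall = (cpu1 - cpu0) / REPS;   /* cost of one call */
        return perCall * predictedCalls;         /* computation term of the model */
    }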

Communication Model (1 / 2)
- Many different types of communication:
  - Sends and receives.
  - Collective operations.
  - Barriers.
- Need to build a model of the network:
  - Can use existing programs (PingPong / SKaMPI) or write your own.
- How much data is being sent?
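
A "write your own" network benchmark can be as small as the sketch below (the real PingPong and SKaMPI codes sweep many message sizes and are more careful); the message size and repetition count here are arbitrary:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal ping-pong between ranks 0 and 1: rank 0 sends a buffer to rank 1,
       rank 1 sends it straight back, repeated REPS times.  Half the round-trip
       time per repetition approximates the one-way cost for this message size. */
    int main(int argc, char **argv)
    {
        const int REPS = 1000;
        const int N = 1 << 20;                 /* 1 MiB payload (arbitrary) */
        int rank;
        char *buf = malloc(N);
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("%d bytes: %.2f us one-way\n",
                   N, 1.0e6 * (t1 - t0) / (2.0 * REPS));

        free(buf);
        MPI_Finalize();
        return 0;
    }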

Communication Model (2 / 2)
- Communication times depend on the message size:
  - There is an initial cost for a send – the handshake.
  - Then a variable cost – the payload.
- Where is the data being sent? Are the source and destination on the same node?
(Figure: chart with x-axis "Message Size (Bytes)")
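
A common way to capture the handshake + payload structure is a latency/bandwidth model with separate parameters for on-node and off-node transfers, fitted from the ping-pong measurements. A sketch (the figures below are placeholders, not measured values):

    /* Handshake + payload model: fixed cost per message plus a per-byte cost.
       Replace the placeholder latency and per-byte figures with values fitted
       from your own benchmark, on and off node. */
    double commTime(long bytes, int sameNode)
    {
        double latency = sameNode ? 1.0e-6 : 5.0e-6;            /* seconds per message */
        double perByte = sameNode ? 1.0 / 5.0e9 : 1.0 / 1.0e9;  /* seconds per byte    */
        return latency + bytes * perByte;
    }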

Bringing it all Together
- What you need:
  - A computation benchmark application.
  - A communication benchmark application.
  - A spreadsheet model.
- Run the benchmarks on the cluster and plug the data into the model.
- Make predictions for different processor configurations.
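
In sketch form, the spreadsheet (or a small program) simply sums the two terms; modelComputationTime() and commTime() are the hypothetical helpers sketched after the earlier slides, and the other parameters come from your own operation and message counts:

    double modelComputationTime(long predictedCalls);
    double commTime(long bytes, int sameNode);

    /* Total predicted runtime for one processor configuration. */
    double predictedTotal(long callsPerProc, long messagesPerProc,
                          long bytesPerMessage, int sameNode)
    {
        double tComp = modelComputationTime(callsPerProc);
        double tComm = messagesPerProc * commTime(bytesPerMessage, sameNode);
        return tComp + tComm;
    }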