Efficiency of Small-Size Task Calculation in Grid Clusters Using Parallel Processing. Olgerts Belmanis, Jānis Kūliņš, RTU ETF, Riga Technical University.



RTU Cluster
■ Initially the RTU cluster started with five AMD Opteron servers.
■ Additionally, eight dual-core AMD Opteron 2210 (Socket M2) servers were installed.
■ The cluster therefore now has 9 working nodes with 21 CPUs.
■ The total amount of memory is 1.8 TB.
■ The RTU cluster has successfully completed many calculation tasks, including jobs for the LHCb virtual organization.

RTU Cluster (figure slides; images only)

Computing Algorithms
■ Serial algorithm
  – One task – one WN (working node);
  – The parts of the task are performed serially;
  – Task execution time depends on WN performance only!
■ Parallel algorithm (see the sketch below)
  – One task – several WNs;
  – Parts of the task are performed:
    ► consecutively on separate WNs;
    ► in parallel on a number of WNs, with the results then summarized.
  – Task execution time depends on:
    ► WN performance;
    ► network performance;
    ► bandwidth of the shared data storage;
    ► type of coding.
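To make the parallel pattern concrete, here is a minimal sketch in C with MPI (not code from the talk; the work() kernel is a hypothetical placeholder): each rank computes its part of one task independently, and the partial results are then summarized on rank 0 with a single collective call.

```c
/* Minimal sketch of the parallel algorithm pattern above:
 * every working node computes its part, then the results are
 * summarized on rank 0. The work() kernel is a placeholder. */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical per-node part of the task: here, a partial sum. */
static double work(int rank, int nparts) {
    double s = 0.0;
    for (long i = rank; i < 1000000; i += nparts)
        s += 1.0 / (i + 1.0);
    return s;
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double part = work(rank, size);   /* parallel part on each WN */
    double total = 0.0;

    /* Results summarizing: combine the partial results on rank 0. */
    MPI_Reduce(&part, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f (computed on %d CPUs)\n", total, size);
    MPI_Finalize();
    return 0;
}
```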

Bottlenecks in a distributed computing system


Interconnections between CPU nodes

************************************************************
task 0 is on wn03.grid.etf.rtu.lv  partner = 2
task 1 is on wn10.grid.etf.rtu.lv  partner = 3
task 2 is on wn10.grid.etf.rtu.lv  partner = 0
task 3 is on wn10.grid.etf.rtu.lv  partner = 1
************************************************************
*** Message size: *** best / avg / worst (MB/sec)
task pair 0 - 2:  /  /
task pair 1 - 3:  /  /
task pair 1 - 3:  /  /
OVERALL AVERAGES:  /  /

The use of multicore servers helps to achieve a higher data transmission rate in MPI applications: tasks 1 and 3 run on the same node (wn10), so that pair communicates through shared memory rather than over the network.
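A minimal ping-pong sketch in the spirit of the test above (not the authors' exact benchmark; the 1 MB message size and repetition count are illustrative assumptions): rank 0 bounces a buffer off rank 1 and derives the MB/s rate from the round-trip time. Run it with at least two MPI ranks.

```c
/* Minimal MPI ping-pong bandwidth sketch (not the authors'
 * benchmark): rank 0 sends a buffer to rank 1 and back, and the
 * transfer rate is derived from the total round-trip time. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    const int N = 1 << 20;            /* 1 MB message (assumed size) */
    const int REPS = 100;
    char *buf = malloc(N);
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 0, N);

    double t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++) {
        if (rank == 0) {
            MPI_Send(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t = MPI_Wtime() - t0;

    if (rank == 0)   /* 2 transfers per repetition; rate in MB/s */
        printf("avg rate: %.1f MB/s\n", 2.0 * REPS * N / t / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}
```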

Local interconnection rate

Table: transmission rate as a function of the number of CPUs. Columns: CPU number; low rate (Mb/s); medium rate (Mb/s); peak rate (Mb/s).

The number of CPUs used by MPI influences the local interconnection rate!

Parallel application execution time

Parallel speed-up determination
■ In the experiment, multiplication of large matrices was performed.
■ The test creates more than about 10 Mb of traffic between WNs and loads the processors.
■ The main task of the experiment is to find the beginning of the horizontal (saturation) part of the speed-up curve.
■ On 1 CPU the experiment takes 420 seconds in the RTU cluster. (A timing sketch follows below.)
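A hedged sketch of how such a measurement can be structured (the slides do not show the authors' test code): rank 0 scatters row blocks of one matrix, broadcasts the other, each rank multiplies its block, and MPI_Wtime() brackets the run so the speed-up can be computed against the 420-second 1-CPU baseline. The matrix size N and the data values are illustrative assumptions.

```c
/* Sketch of a distributed matrix-multiply timing run (not the
 * authors' code): rank 0 scatters row blocks of A, broadcasts B,
 * gathers C, and reports speed-up against the 1-CPU baseline. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024   /* matrix dimension; assumed divisible by the rank count */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;          /* row block per working node */
    double *A = rank == 0 ? malloc((size_t)N * N * sizeof(double)) : NULL;
    double *C = rank == 0 ? malloc((size_t)N * N * sizeof(double)) : NULL;
    double *B = malloc((size_t)N * N * sizeof(double));
    double *a = malloc((size_t)rows * N * sizeof(double));
    double *c = malloc((size_t)rows * N * sizeof(double));

    if (rank == 0)                /* fill A and B with test data */
        for (size_t i = 0; i < (size_t)N * N; i++) { A[i] = 1.0; B[i] = 2.0; }

    double t0 = MPI_Wtime();
    MPI_Scatter(A, rows * N, MPI_DOUBLE, a, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int i = 0; i < rows; i++)        /* local block multiply */
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k++) s += a[i * N + k] * B[k * N + j];
            c[i * N + j] = s;
        }

    MPI_Gather(c, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    double t = MPI_Wtime() - t0;

    if (rank == 0)  /* 420 s = 1-CPU baseline from the slides (their test;
                       used here only to illustrate the calculation) */
        printf("time = %.1f s, speed-up vs 1 CPU = %.2f\n", t, 420.0 / t);
    MPI_Finalize();
    return 0;
}
```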

2 × WN ≠ H/2: doubling the number of working nodes does not halve the execution time. According to Amdahl's law, this speed-up is consistent with roughly 20% serial code in the algorithm! (A worked instance of the law follows below.)
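The estimate follows from Amdahl's law; a worked instance with serial fraction s = 0.2 (the 20% inferred above):

```latex
% Amdahl's law: speed-up on N processors with serial fraction s
S(N) = \frac{1}{s + (1-s)/N}
% With s = 0.2 (20% serial code), doubling the node count gives
S(2) = \frac{1}{0.2 + 0.8/2} = \frac{1}{0.6} \approx 1.67 \neq 2
% and the speed-up saturates at \lim_{N \to \infty} S(N) = 1/s = 5.
```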

Possible solutions:
■ Internal connection improvement:
  – InfiniBand, Myrinet, or similar interconnects between WNs;
  – multicore WN implementation (RTU ETF);
  – abandonment of the NFS network file system.
■ Data transfer process optimization:
  – use of multiple data flows;
  – replacement of standard TCP with Scalable TCP.
■ Parallel algorithm processing optimization (see the sketch below):
  – minimize transactions between WNs;
  – reduce the sequential part of the MPI code;
  – optimize the number of MPI threads.
■ Optimization of requested resource management.
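One generic MPI idiom in the spirit of the transfer-optimization items above (not code from the talk; it assumes exactly two ranks and a stand-in local computation): non-blocking MPI_Isend/MPI_Irecv let a WN overlap data transfer with useful work, which both optimizes the transfer process and shrinks the effectively sequential part of the code.

```c
/* Generic sketch of overlapping communication with computation
 * using non-blocking MPI calls; assumes exactly two ranks. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define N 100000

int main(int argc, char **argv) {
    int rank;
    double out[N], in[N];
    MPI_Request req[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(out, 0, sizeof out);

    int partner = rank == 0 ? 1 : 0;
    /* Start the exchange, then keep computing while data is in flight. */
    MPI_Isend(out, N, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(in, N, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &req[1]);

    double local = 0.0;           /* useful work needing no remote data */
    for (int i = 0; i < N; i++) local += out[i] * out[i];

    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);  /* sync before using in[] */
    if (rank == 0) printf("local = %f, in[0] = %f\n", local, in[0]);
    MPI_Finalize();
    return 0;
}
```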

Thank you for your attention!