CluBORun: tool for utilizing idle resources of computing clusters in volunteer computing. Maxim Manzyuk, Oleg Zaikin, Mikhail Posypkin Volgograd state.

Presentation transcript:

CluBORun: tool for utilizing idle resources of computing clusters in volunteer computing.
Maxim Manzyuk, Oleg Zaikin, Mikhail Posypkin
Volgograd State Technical University, Volgograd, Russia; ISDCT SB RAS, Irkutsk, Russia; IITP RAS, Moscow, Russia

Why volunteer computing on clusters?
Some BOINC volunteer computing projects, based mainly on desktop PCs, exceed 1 PFLOPS in performance. Even so, there are several reasons why computing cluster resources can be useful in volunteer computing:
- A computing cluster is a fairly reliable device, so results obtained on it can serve as a reference when checking other results.
- A computing cluster can significantly increase the performance of a new volunteer project that has few participants.
- Some cluster resources are usually idle.

CluBORun
Cluster for BOINC Run (CluBORun) is a tool for launching BOINC volunteer computing on clusters. Main features:
- It needs only ordinary cluster user rights.
- It uses only idle resources of a computing cluster (just as BOINC Manager does for PCs).
- It can launch BOINC volunteer computing for any volunteer project that has a Linux client application.

CluBORun
Components:
- Folders with BOINC clients (a separate folder for each node).
- Flag files for BOINC clients. A start flag corresponds to a launched BOINC client; a stop flag means that the BOINC client must be stopped.
- A text file, all_tasks.txt, listing the tasks to launch (with the stop flag, start flag, and BOINC client path as parameters).
- An MPI program, start_boinc, for launching a BOINC client on a particular cluster node.
- A script, catch_node.sh, for analyzing the cluster workload and launching start_boinc on free nodes.

start_boinc
start_boinc is an MPI C++ program that can be launched on several nodes of a cluster. On every node, the MPI processes are assigned roles:
- 1 control process per node. It launches BOINC on the node (via a system command), then periodically (once every 30 seconds) checks whether the stop flag has appeared or the time limit has been exceeded. Between checks this process sleeps.
- k-1 sleeping processes.
The MPI processes do not perform any computation themselves. BOINC "thinks" that it is running on an ordinary PC; it connects to a project and does all the computations.
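The control-process role described on this slide can be sketched as follows. This is only an illustration in Python, not the actual start_boinc code (which the slides describe as an MPI C++ program). The function name `run_control_process`, its parameters, the return values, and the idea that the control process itself creates and removes the start flag are all assumptions made for the sketch.

```python
import subprocess
import time
from pathlib import Path


def run_control_process(client_cmd, start_flag, stop_flag,
                        time_limit_s, poll_interval_s=30.0):
    """Launch a client, then sleep and poll until told to stop.

    Returns "stop_flag", "time_limit", or "client_exited".
    All names here are hypothetical; only the 30-second polling of a
    stop flag and a time limit comes from the slides.
    """
    start_flag = Path(start_flag)
    stop_flag = Path(stop_flag)

    # The slides say BOINC is started through a system command.
    client = subprocess.Popen(client_cmd)
    start_flag.touch()  # assumption: the start flag marks a running client

    deadline = time.monotonic() + time_limit_s
    reason = "client_exited"
    try:
        while client.poll() is None:
            if stop_flag.exists():
                reason = "stop_flag"
                break
            if time.monotonic() >= deadline:
                reason = "time_limit"
                break
            time.sleep(poll_interval_s)  # the control process sleeps between checks
    finally:
        if client.poll() is None:
            client.terminate()
            client.wait()
        start_flag.unlink(missing_ok=True)
    return reason
```

In the real tool the remaining k-1 MPI processes on the node would simply sleep, so that the cluster scheduler sees the node as occupied while BOINC does the actual work.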

catch_node.sh
- The maximum number of launched BOINC processes can be set (by default it equals the number of cluster cores).
- catch_node.sh analyzes the cluster workload. If there are free nodes (and the limit on BOINC jobs is not exceeded), it launches BOINC via start_boinc.
- If a new task from another user appears in the queue with the status "waiting for freed resources", catch_node.sh stops BOINC tasks (but only if doing so will help the new task to be launched).
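The decision rule on this slide can be illustrated with a small pure function. This is a Python sketch of the logic only, not the catch_node.sh script itself. It assumes that every BOINC job occupies exactly one node, that the limit counts jobs, and that only the first waiting job of another user is considered; `plan_actions` and its arguments are hypothetical names.

```python
def plan_actions(free_nodes, running_boinc, max_boinc, waiting_requests):
    """Decide how many BOINC jobs to launch and which ones to stop.

    free_nodes        number of currently free nodes
    running_boinc     list of ids of running BOINC jobs (one node each, by assumption)
    max_boinc         limit on simultaneously running BOINC jobs
    waiting_requests  node counts requested by other users' waiting jobs

    Returns (number_to_launch, jobs_to_stop).
    """
    if not waiting_requests:
        # Nobody is waiting: fill free nodes up to the configured limit.
        room = max(0, max_boinc - len(running_boinc))
        return min(free_nodes, room), []

    # Someone is waiting: never add BOINC jobs, and stop some only if
    # that actually frees enough nodes for the waiting job to start.
    needed = waiting_requests[0] - free_nodes
    if 0 < needed <= len(running_boinc):
        return 0, list(running_boinc[:needed])
    return 0, []
```

For example, with 1 free node, three running BOINC jobs, and another user's job waiting for 3 nodes, the sketch stops two BOINC jobs; if that job needed more nodes than BOINC could ever free, nothing is stopped.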

[Figure: MVS-100k cluster workload for 1 month]

Restricted queue info
A cluster management system can be configured not to show the jobs of other users (for example, on MVS-10P with the SLURM system). In this case CluBORun can see its own jobs but not the jobs of other users. With the sinfo command, CluBORun can see how many nodes are:
- free;
- occupied by BOINC;
- occupied by other users.
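One way to obtain the three numbers above is sketched below in Python. It assumes a two-column "state count" text such as `sinfo -h -o "%T %D"` might print, and it assumes that the number of nodes held by CluBORun's own BOINC jobs is known from its own bookkeeping. The function name and the mapping of states to "free" and "busy" are assumptions of the sketch, not the tool's actual parsing.

```python
def summarize_nodes(sinfo_text, own_boinc_nodes):
    """Split node counts into free / BOINC / other users.

    sinfo_text       lines of the form "<state> <node count>" (assumed format)
    own_boinc_nodes  nodes occupied by our own BOINC jobs (assumed to be known)
    """
    idle = 0
    busy = 0
    for line in sinfo_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip anything that is not "<state> <count>"
        state = parts[0].rstrip("*").lower()
        count = int(parts[1])
        if state == "idle":
            idle += count
        elif state in ("allocated", "mixed"):
            busy += count
        # other states (down, drained, ...) are neither free nor usable
    other = max(0, busy - own_boinc_nodes)
    return {"free": idle, "boinc": own_boinc_nodes, "other_users": other}
```

The nodes used by other users are obtained by subtraction, because in this setting their jobs are not visible in the queue listing.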

Restricted queue info
Solution:
- Decrease the maximum number of launched BOINC processes.
- If there are free resources, add a BOINC job (call it jobA) to the queue.
- If the added job gets the status "waiting for resources", then the queue contains at least 1 job of another user with the same status.
- Hence CluBORun switches to restricted mode:
  a) it stops adding new jobs to the queue;
  b) it stops the BOINC jobs in the queue one by one until jobA launches.

Restricted queue info
Example:
- waiting: BOINC jobA
- waiting: jobs of other users
- running: BOINC job1
- running: BOINC job2
- running: BOINC job3

Restricted queue info
Variant 1. Two stopped BOINC jobs helped the other users' jobs and BOINC jobA to run.
- running: BOINC jobA
- running: BOINC job3
Variant 2. Three stopped BOINC jobs did not help BOINC jobA to run.
- waiting: BOINC jobA
- no running BOINC jobs

Restricted queue info
If stopping BOINC jobs can help, it will help. If not, CluBORun waits for BOINC jobA to launch. Once jobA is running, CluBORun switches from restricted mode back to normal mode. This algorithm has been implemented and currently runs on the MVS-10P cluster.
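The core of restricted mode, stopping BOINC jobs one by one until jobA starts, can be sketched as below. This is a Python illustration under stated assumptions: `probe_is_running` and `stop_job` are hypothetical callbacks standing in for whatever queue queries and cancel commands the real tool uses, and the order in which jobs are stopped is simply list order.

```python
def stop_until_probe_runs(running_boinc, probe_is_running, stop_job):
    """Stop BOINC jobs one at a time until jobA runs or none are left.

    running_boinc     ids of currently running BOINC jobs
    probe_is_running  callable, True once jobA has left the waiting state
    stop_job          callable that stops a single BOINC job

    Returns the list of jobs that were stopped.
    """
    stopped = []
    remaining = list(running_boinc)
    while remaining and not probe_is_running():
        job = remaining.pop(0)
        stop_job(job)
        stopped.append(job)
    return stopped
```

With three running jobs, Variant 1 from the example corresponds to the probe reporting success after two stops (job3 keeps running), and Variant 2 to all three jobs being stopped while jobA still waits.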

CluBORun resources in
Range of performance of MVS-100k in is about %.
With the help of CluBORun in several problems were solved faster:
- searching for new pairs of orthogonal diagonal Latin squares of order 10 (17 new pairs were found);
- solving the logical cryptanalysis problem for the weakened Bivium cipher (with 10 known bits out of the 177 bits of the secret key; 3 problems were solved);
- solving A5/1 logical cryptanalysis problems (3 problems were solved).

Comparison with 3G Bridge
3G Bridge (by LPDS at MTA-SZTAKI, Hungary) is an open-source core job-bridging component between different grid infrastructures, for example between a service grid (made of clusters) and a desktop grid (made of PCs).
Similarity: both make it possible to move jobs from a desktop grid to a cluster.
Differences show up in concrete situations:
+ 3G Bridge can bridge jobs not only from a desktop grid to a service grid but also in the opposite direction.
+ CluBORun needs only ordinary cluster user rights and uses only idle resources.

Thank you for your attention!