Workshop: Using the VIC3 Cluster for Statistical Analyses
Support perspective
G.J. Bex

Overview
- Cluster VIC3: hardware & software
- Statistics research scenario
- Worker framework
- MapReduce with Worker
- Q&A

Bird's eye view of VIC3
[Diagram: two login nodes (login1, login2), two service nodes (svcs1, svcs2), compute nodes r1i0n0, r1i0n1, ..., r1i3n15 and r2i0n0, r2i0n1, ..., r2i3n15, and NetApp storage (labels in the figure: ~vsc30034, /bin).]

VIC3 nodes
Compute nodes:
- 112 nodes with 2 quad-core 'Harpertown' CPUs, 8 GB RAM
- 80 nodes with 2 quad-core 'Nehalem' CPUs, 24 GB RAM
- 6 nodes with 2 quad-core 'Nehalem' CPUs, 72 GB RAM and a local hard disk
Storage:
- 20 TB of disk space shared between home directories and scratch space, accessed via NFS
- 4 nodes with disks for a parallel file system (needed for MPI I/O jobs)
Service nodes include 2 login nodes.
In total: 1584 cores, for about 16.6 TFlop (theoretical peak).

What can you run?
- All open-source Linux software
- All Linux software that K.U.Leuven has a license for that covers the cluster, provided you are a K.U.Leuven staff member
- All Linux software you have a license for that covers the cluster
- No Windows software
R, SAS and MATLAB are OK for K.U.Leuven & UHasselt users.

Next: Statistics research scenario

Running example: SAS code
Your SAS program, e.g., 'clmk.sas':
- is usually run interactively
- depends on parameters, e.g., the type of distribution, alpha, beta
- has to be run for several types and for several values of alpha and beta

Running example: batch mode
First step: convert it for batch mode.
- Capture the command line parameters in clmk.sas:

    ...
    %LET type = "%scan(&sysparm, 1, %str(:))";
    %LET alpha = %scan(&sysparm, 2, %str(:));
    %LET beta = %scan(&sysparm, 3, %str(:));
    ...

- Run it from the command line:

    $ sas -batch -noterminal -sysparm discr:1.3:15.0 clmk.sas
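To make the parameter passing explicit, here is how the colon-separated -sysparm string from the command above is taken apart by the %scan calls (a worked illustration, not extra code from the slides):

    # -sysparm discr:1.3:15.0 is split on ':' by %scan, so inside clmk.sas:
    #   &type  = "discr"   from %scan(&sysparm, 1, %str(:))
    #   &alpha = 1.3       from %scan(&sysparm, 2, %str(:))
    #   &beta  = 15.0      from %scan(&sysparm, 3, %str(:))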

I've got a job to do: PBS files
[Diagram: you submit from a login node to the queue system/scheduler (Torque/Moab), which runs the job on the compute nodes.]

clmk.pbs:

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR
    sas -batch -noterminal \
        -sysparm discr:1.3:15.0 clmk.sas

Submit it with:

    $ msub clmk.pbs
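The script above relies on the scheduler's default resource limits. A sketch of the same clmk.pbs with explicit resource requests added as #PBS directives, which Torque/Moab can read from the script; the job name and the specific values are illustrative assumptions, not taken from the slides:

    #!/bin/bash -l
    #PBS -N clmk                  # job name (illustrative)
    #PBS -l nodes=1:ppn=1         # one core suffices for a single SAS run (assumption)
    #PBS -l walltime=02:00:00     # adjust to the expected run time (illustrative value)

    module load SAS/9.2
    cd $PBS_O_WORKDIR
    sas -batch -noterminal \
        -sysparm discr:1.3:15.0 clmk.sas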

No more modifying!
Rather than editing clmk.pbs for every parameter combination and resubmitting with $ msub clmk.pbs, parameterize the script with environment variables:

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR
    sas -batch -noterminal \
        -sysparm $type:$alpha:$beta clmk.sas

and pass the values at submission time:

    $ msub clmk.pbs -v type=discr,alpha=1.3,beta=15.0
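Submitting one such job per parameter combination quickly gets out of hand, which is exactly the problem the next slide raises. A sketch of what that would look like; the parameter values below are purely illustrative:

    #!/bin/bash
    # Sketch: one msub job per parameter combination (values are illustrative).
    for type in discr cont; do
        for alpha in 1.1 1.2 1.3; do
            for beta in 5.0 10.0 15.0; do
                msub clmk.pbs -v type=$type,alpha=$alpha,beta=$beta
            done
        done
    done
    # Already 2 x 3 x 3 = 18 jobs; realistic sweeps easily reach hundreds.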

Going parallel... or nuts?
Parameter sets...
- are independent, so the computations can be done in parallel!
- but all combinations of type, alpha and beta: a large number of jobs
The solution: the Worker framework.

Next: Worker framework

Conceptually
Each row of a parameter table is one computation:

    type   alpha  beta
    discr  ...    ...
    discr  ...    ...
    discr  ...    ...
    discr  ...    ...
    ...    ...    ...
    cont   ...    ...
    ...    ...    ...

combined with the parameterized job script:

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR
    sas -batch -noterminal \
        -sysparm $type:$alpha:$beta clmk.sas

Concrete
Store the parameter table as a CSV file, clmk.csv (columns type, alpha, beta; one row per computation, N rows in total), and the job script above as clmk.pbs. Then submit:

    $ module load worker/1.0
    $ wsub -data clmk.csv -batch clmk.pbs -l nodes=2:ppn=8

The N rows will be computed in parallel by 2 × 8 - 1 = 15 cores.
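As a sketch, clmk.csv could look like the following; the format assumed here is a comma-separated file whose header names match the variables used in clmk.pbs, and the parameter values are purely illustrative:

    $ cat clmk.csv
    type,alpha,beta
    discr,1.1,5.0
    discr,1.3,15.0
    cont,1.1,5.0
    cont,1.3,15.0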

Caveat 1: time is of the essence...
How long does your job need? (= walltime)
- roughly the time to compute N rows divided by the number of requested cores
Walltime guidelines (no hard limits, but following them reduces queue time):
- more than 5 minutes
- less than 2 days
Hence, if the walltime would exceed 2 days, split the data and submit multiple jobs.
Explicitly request sufficient walltime:

    $ wsub -data clmk.csv -batch clmk.pbs \
        -l nodes=2:ppn=8,walltime=36:00:00
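A back-of-the-envelope estimate of the walltime to request; the row count and per-row time below are illustrative assumptions, only the 15 effective cores follow from nodes=2:ppn=8:

    # Illustrative walltime estimate:
    #   N = 450 rows, ~1 hour per row, 15 effective cores (2 x 8 - 1)
    #   walltime ~ 450 * 1 h / 15 = 30 h  ->  request walltime=36:00:00 as a margin
    #   If the estimate exceeds 2 days, split clmk.csv and submit multiple jobs.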

Caveat 2: slave labour
P cores: how to choose P?
- functions: 1 master, P - 1 slaves
- each compute node has 8 cores, so choose P such that P mod 8 = 0
- N >> P: better load balancing and efficiency
- larger P: shorter walltime, but (potentially) a longer time in the queue
The shortest turn-around is hard to predict: turn-around = queue time + walltime.

Caveat 3: independence
SAS locks its log and output files, so make sure each computation writes to its own files:

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR
    log_name="clmk-$type-$alpha-$beta.log"
    print_name="clmk-$type-$alpha-$beta.lst"
    sas -batch -noterminal \
        -log $log_name \
        -print $print_name \
        -sysparm $type:$alpha:$beta clmk.sas

Next: MapReduce with Worker

Conceptually: MapReduce
- map: data.txt is split into data.txt.1, data.txt.2, ..., data.txt.7, and each chunk is processed independently into result.txt.1, result.txt.2, ..., result.txt.7
- reduce: the partial results are combined into result.txt

Concrete: -prolog & -epilog
- prolog.sh performs the map step: it splits data.txt into data.txt.1, data.txt.2, ..., data.txt.7
- batch.sh processes each chunk data.txt.i into its result.txt.i
- epilog.sh performs the reduce step: it combines result.txt.1, result.txt.2, ..., result.txt.7 into result.txt

    $ wsub -prolog prolog.sh -batch batch.sh \
        -epilog epilog.sh -l nodes=3:ppn=8
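The slides do not show the contents of prolog.sh and epilog.sh. A minimal sketch of what they could look like, assuming data.txt is a line-oriented text file and the chunk/result naming shown above; the number of chunks is illustrative:

prolog.sh:

    #!/bin/bash -l
    # prolog.sh (sketch): split data.txt into data.txt.1, data.txt.2, ...
    nchunks=7
    nlines=$(wc -l < data.txt)
    split -l $(( (nlines + nchunks - 1) / nchunks )) data.txt chunk_
    i=1
    for f in chunk_*; do
        mv "$f" "data.txt.$i"
        i=$(( i + 1 ))
    done

epilog.sh:

    #!/bin/bash -l
    # epilog.sh (sketch): concatenate the partial results into a single file.
    # (With more than 9 chunks, make sure the glob expands in the intended order.)
    cat result.txt.* > result.txt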

Next: Q&A

Where to find help? UHasselt staff: