

Upgrade D0 farm

Reasons for upgrade
– RedHat 7 needed for D0 software
– New versions of
  – ups/upd v4_6
  – fbsng v1_3f+p2_1
  – sam
– Use of farm for MC and analysis
– Integration in farm network

MC production on farm
Input: requests
A request is translated into an mc_runjob macro
Stages:
1. mc_runjob on the batch server (hoeve)
2. MC job on a node
3. SAM store on the file server (schuur)
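The three stages run as one fbs job in which each step starts only after the previous one has finished (DEPEND=done(...) in the job description file). A minimal Python sketch of that dependency chain; the stage functions here are placeholders, not the real mcc/rcp/sam commands:

```python
# Hypothetical sketch of the three-stage MC chain (not the fbs API):
# each stage runs only if the previous one succeeded, mirroring
# DEPEND=done(...) semantics in an fbs job description file.

def run_chain(stages):
    """Run named stage functions in order; stop at the first failure."""
    completed = []
    for name, stage in stages:
        if not stage():
            return completed, name  # name of the failed stage
        completed.append(name)
    return completed, None

# Placeholder stages standing in for the real commands.
stages = [
    ("mcc", lambda: True),   # mc_runjob-generated MC job on a node
    ("rcp", lambda: True),   # copy output to the file server (schuur)
    ("sam", lambda: True),   # sam store on the file server
]

done, failed = run_chain(stages)
print(done, failed)  # ['mcc', 'rcp', 'sam'] None
```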

[Slide diagram: MC data flow. An mcc request arrives at the farm server, which submits an fbs job with steps 1 mcc, 2 rcp, 3 sam. The mcc jobs run on the nodes (100 CPUs, 40 GB disk each); fbs(rcp,sam) runs on the file server (1.2 TB), which stores the mcc output into SAM: metadata to the SAM DB, data to the datastore at FNAL and SARA.]

[Slide diagram: the same MC data flow, but the fbs job now has only steps 1 mcc and 2 rcp; the sam store is run from cron on the file server.]

[Slide diagram: task placement. fbsuser runs mc_runjob and fbs submit on hoeve, and cp, mcc and rcp on the node; willem runs sam on schuur, started by cron. Arrows mark data and control flow.]

SECTION mcc
EXEC=/d0gstar/curr/minbias/batch
NUMPROC=1
QUEUE=FastQ
STDOUT=/d0gstar/curr/minbias/stdout
STDERR=/d0gstar/curr/minbias/stdout

SECTION rcp
EXEC=/d0gstar/curr/minbias/batch_rcp
NUMPROC=1
QUEUE=IOQ
DEPEND=done(mcc)
STDOUT=/d0gstar/curr/minbias/stdout_rcp
STDERR=/d0gstar/curr/minbias/stdout_rcp

batch (runs on the node):

#!/bin/sh
. /usr/products/etc/setups.sh
cd /d0gstar/mcc/mcc-dist
. mcc_dist_setup.sh
mkdir -p /data/curr/minbias
cd /data/curr/minbias
cp -r /d0gstar/curr/minbias/* .
touch /d0gstar/curr/minbias/.`uname -n`
sh minbias sh `pwd` > log
touch /d0gstar/curr/minbias/`uname -n`
/d0gstar/bin/check minbias

batch_rcp (runs on schuur):

#!/bin/sh
i=minbias
if [ -f /d0gstar/curr/$i/OK ]; then
  mkdir -p /data/disk2/sam_cache/$i
  cd /data/disk2/sam_cache/$i
  node=`ls /d0gstar/curr/$i/node*`
  node=`basename $node`
  job=`echo $i | awk '{print substr($0,length-8,9)}'`
  rcp -pr $node:/data/dest/d0reco/reco*${job}* .
  rcp -pr $node:/data/dest/reco_analyze/rAtpl*${job}* .
  rcp -pr $node:/data/curr/$i/Metadata/*.params .
  rcp -pr $node:/data/curr/$i/Metadata/*.py .
  rsh -n $node rm -rf /data/curr/$i
  rsh -n $node rm -rf /data/dest/*/*${job}*
  touch /d0gstar/curr/$i/RCP
fi

The following script declares the gen, d0g and sim files and stores the reco and recoanalyze files; it runs on schuur, called by fbs or cron:

#!/bin/sh
locate(){
  file=`grep "import =" import_${1}_${job}.py | awk -F \" '{print $2}'`
  sam locate $file | fgrep -q [
  return $?
}
. /usr/products/etc/setups.sh
setup sam
SAM_STATION=hoeve
export SAM_STATION
tosam=$1
LIST=`cat $tosam`
for job in $LIST
do
  cd /data/disk2/sam_cache/${job}
  list='gen d0g sim'
  for i in $list
  do
    until locate $i || (sam declare import_${i}_${job}.py && locate ${i})
    do sleep 60; done
  done
  list='reco recoanalyze'
  for i in $list
  do
    sam store --descrip=import_${i}_${job}.py --source=`pwd`
    return=$?
    echo Return code sam store $return
  done
done
echo Job finished...
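The until-loop in the script keeps declaring a file's metadata until it can be located, sleeping between attempts. A minimal Python sketch of that retry logic; the locate/declare callables are hypothetical stand-ins for the real `sam locate` and `sam declare` commands:

```python
import time

def ensure_declared(name, locate, declare, retries=3, delay=0):
    """Keep declaring the file's metadata until it can be located."""
    for _ in range(retries):
        if locate(name):
            return True
        declare(name)      # re-declaring is treated as harmless here
        if locate(name):
            return True
        time.sleep(delay)  # the shell version sleeps 60 s between tries
    return False

# Toy example: the file becomes locatable once it has been declared.
declared = set()
ok = ensure_declared(
    "import_gen_minbias.py",
    locate=lambda n: n in declared,
    declare=lambda n: declared.add(n),
)
print(ok)  # True
```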

Filestream
– Fetch input from sam
– Read input file from schuur
– Process data on node
– Copy output to schuur
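The steps above form a consumer loop: the node pulls one input file at a time from the stream until it is exhausted. A generic sketch of that loop; get_next, process and store are hypothetical placeholders, not the real SAM filestream API:

```python
def consume_stream(get_next, process, store):
    """Process files from a stream until it is exhausted."""
    n = 0
    while True:
        path = get_next()     # fetch the next input file from sam
        if path is None:      # stream exhausted
            break
        store(process(path))  # process on the node, ship output to schuur
        n += 1
    return n

# Toy example with an in-memory "stream" of two files.
files = iter(["a.raw", "b.raw"])
outputs = []
count = consume_stream(
    get_next=lambda: next(files, None),
    process=lambda p: p + ".out",
    store=outputs.append,
)
print(count, outputs)  # 2 ['a.raw.out', 'b.raw.out']
```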

[Slide diagram: filestream variant. mc_runjob and fbs submit on hoeve; rcp, d0exe and rcp on the node; sam on schuur, driven by cron; the node attaches to a filestream.]

Analysis on farm
Stages:
– Read files from sam
– Copy files to node(s)
– Perform analysis on node
– Copy files to file server
– Store files in sam

[Slide diagram: analysis flow. Step 1 (sam + rcp) and step 3 (rcp + sam) run as fbs(1) and fbs(3) on the file server (1.2 TB); step 2 (analyze) runs as fbs(2) on the nodes (100 CPUs, 40 GB each). Control via fbs; data between file server and nodes; metadata to the SAM DB; datastore at FNAL and SARA.]

[Slide diagram: on triviaal, fbsuser runs rcp and willem runs sam; the analysis program runs on node-2; input and output flow between the two.]

batch.jdf:

SECTION sam
EXEC=/home/willem/batch_sam
NUMPROC=1
QUEUE=IOQ
STDOUT=/home/willem/stdout
STDERR=/home/willem/stdout

batch_sam:

#!/bin/sh
. /usr/products/etc/setups.sh
setup sam
SAM_STATION=triviaal
export SAM_STATION
sam run project get_file.py --interactive > log
/usr/bin/rsh -n -l fbsuser triviaal rcp -r /stage/triviaal/sam_cache/boo node-2:/data/test >> log

[Slide diagram: analysis flow, second variant. Step 1 (sam) and step 3 (rcp + sam) run as fbs(1) and fbs(3) on the file server; step 2 (rcp + analyze + rcp) runs as fbs(2) on the nodes.]

[Slide diagram: second variant. willem runs sam on triviaal and fbsuser issues fbs submit; rcp, the analysis program and rcp run on node-2; input and output flow between triviaal and node-2.]

SECTION sam
EXEC=/d0gstar/batch_node
NUMPROC=1
QUEUE=FastQ
STDOUT=/d0gstar/stdout
STDERR=/d0gstar/stdout

#!/bin/sh
uname -a
date
rsh -l fbsuser triviaal fbs submit ~willem/batch_node.jdf

batch (runs on triviaal, submitted by fbs):

#!/bin/sh
. /usr/products/etc/setups.sh
setup fbsng
setup sam
SAM_STATION=triviaal
export SAM_STATION
sam run project get_file.py --interactive > log
/usr/bin/rsh -n -l fbsuser triviaal fbs submit /home/willem/batch_node.jdf

Job description file:

SECTION sam
EXEC=/home/willem/batch
NUMPROC=1
QUEUE=IOQ
STDOUT=/home/willem/stdout
STDERR=/home/willem/stdout

SECTION ana
EXEC=/d0gstar/batch_node
NUMPROC=1
QUEUE=FastQ
STDOUT=/d0gstar/stdout
STDERR=/d0gstar/stdout

batch_node:

#!/bin/sh
rcp -pr server:/stage/triviaal/sam_cache/boo /data/test
. /d0/fnal/ups/etc/setups.sh
setup root -q KCC_4_0:exception:opt:thread
setup kailib
root -b -q /d0gstar/test.C

/d0gstar/test.C (ROOT macro):

{
  gSystem->cd("/data/test/boo");
  gSystem->Exec("pwd");
  gSystem->Exec("ls -l");
}

get_file.py:

#
# This file sets up and runs a SAM project.
#
import os, sys, string, time, signal
from re import *
from globals import *
import run_project
from commands import *

#########################################
#
# Set the following variables to appropriate values
#

# Consult database for valid choices
sam_station = "triviaal"
# Consult database for valid choices
project_definition = "op_moriond_p1014"
# A particular snapshot version, last or new
snapshot_version = 'new'
# Consult database for valid choices
appname = "test"
version = "1"
group = "test"
# The maximum number of files to get from sam
max_file_amt = 5
# for additional debug info use "--verbose"
#verbosity = "--verbose"
verbosity = ""
# Give up on all exceptions
give_up = 1

def file_ready(filename):
    # Replace this python subroutine with whatever you want to do
    # to process the file that was retrieved.
    # This function will only be called in the event of
    # a successful delivery.
    print "File ", filename, " has been delivered!"
    # os.system('cp ' + filename + ' /stage/triviaal/sam')
    return

Disk partitioning (hoeve)
[Slide diagram: directory tree. /d0 holds fnal (with d0dist, d0usr, ups, db/etc/prd, fbsng) and mcc (with mcc-dist, mc_runjob, curr).]
Symbolic links:
/fnal -> /d0/fnal
/d0usr -> /fnal/d0usr
/d0dist -> /fnal/d0dist
/usr/products -> /fnal/ups

ana_runjob
– Is analogous to mc_runjob
– Creates and submits analysis jobs
– Input:
  – get_file.py with a SAM project name (the project defines the files to be processed)
  – analysis script
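Job creation in this style can be pictured as simple templating: given a SAM station and an analysis script, emit one batch script per job. Everything below (the template text, function and file names) is illustrative, not the actual ana_runjob code:

```python
# Hypothetical sketch: generate a batch script that runs a SAM project
# and then the user's analysis script, as ana_runjob might.

JOB_TEMPLATE = """#!/bin/sh
setup sam
SAM_STATION={station}
export SAM_STATION
sam run project get_file.py --interactive > log
sh {analysis_script} >> log
"""

def make_job(station, analysis_script):
    """Fill in the batch-script template for one analysis job."""
    return JOB_TEMPLATE.format(station=station, analysis_script=analysis_script)

script = make_job("hoeve", "my_analysis.sh")
print(script.splitlines()[2])  # SAM_STATION=hoeve
```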

Integration with grid (1)
At present separate clusters:
– D0, LHCb, Alice, DAS cluster
hoeve and schuur in farm network

Present network layout
[Slide diagram: hoeve, schuur and the nodes hang off one switch; a router connects them to hefnet and surfnet; ajax provides NFS.]

New network layout
[Slide diagram: a farm router connects the D0 switch (hoeve, schuur), the LHCb switch, alice, booder and hefnet, plus a lambda link; ajax provides NFS.]

New network layout
[Slide diagram: the same layout with the das-2 cluster also attached to the farm router, next to D0, LHCb, alice, booder, hefnet and the lambda link; ajax provides NFS.]

Server tasks
hoeve
– software server
– farm server
schuur
– file server
– sam node
booder
– home directory server
– in backup scheme

Integration with grid (2)
Replace fbs with pbs or condor
– pbs on Alice and LHCb nodes
– condor on das cluster
Use EDG installation tool LCGF
– Install d0 software with rpm
Problem with sam (uses ups/upd)

Integration with grid (3)
– Package mcc in rpm
– Separate programs from working space
– Use cfg commands to steer mc_runjob
– Find better place for card files
– Input structure now created on node

Grid job

submit:

#!/bin/sh
macro=$1
pwd=`pwd`
cd /opt/fnal/d0/mcc/mcc-dist
. mcc_dist_setup.sh
cd $pwd
dir=/opt/fnal/d0/mcc/mc_runjob/py_script
python $dir/Linker.py script=$macro

PBS job:

willem]$ cat test.pbs
# PBS batch job script
#PBS -o /home/willem/out
#PBS -e /home/willem/err
#PBS -l nodes=1
# Changing to directory as requested by user
cd /home/willem
# Executing job as requested by user
./submit minbias.macro

RunJob class for grid

class RunJob_farm(RunJob_batch):
    def __init__(self, name=None):
        RunJob_batch.__init__(self, name)
        self.myType = "runjob_farm"

    def Run(self):
        self.jobname = self.linker.CurrentJob()
        self.jobnaam = string.splitfields(self.jobname, '/')[-1]
        comm = 'chmod +x ' + self.jobname
        commands.getoutput(comm)
        if self.tdconf['RunOption'] == 'RunInBackground':
            RunJob_batch.Run(self)
        else:
            bq = self.tdconf['BatchQueue']
            dirn = os.path.dirname(self.jobname)
            print dirn
            comm = 'cd ' + dirn + '; sh ' + self.jobnaam + ' `pwd` >& stdout'
            print comm
            runcommand(comm)

To be decided
– Location of minimum bias files
– Location of MC output

Job status
Job status is recorded in:
– fbs
– /d0/mcc/curr/
– /data/mcc/curr/
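A tiny sketch of how one might inspect those bookkeeping directories for a given job; the helper function and job name are illustrative, not part of the farm software:

```python
import os

def job_dirs_present(job, roots=("/d0/mcc/curr", "/data/mcc/curr")):
    """Report which bookkeeping directories still hold this job."""
    return {root: os.path.isdir(os.path.join(root, job)) for root in roots}

# On a machine without these directories every entry is False.
status = job_dirs_present("minbias")
print(status)
```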

SAM servers
On the master node:
– station
– fss
On master and worker nodes:
– stager
– bbftp