
1 Upgrade D0 farm

2 Reasons for upgrade
– RedHat 7 needed for D0 software
– New versions of:
  – ups/upd v4_6
  – fbsng v1_3f+p2_1
  – sam
– Use of farm for MC and analysis
– Integration in farm network

3 MC production on farm
Input: requests; a request is translated into an mc_runjob macro.
Stages:
1. mc_runjob on the batch server (hoeve)
2. MC job on a node
3. SAM store on the file server (schuur)
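A minimal sketch of how these three stages chain together for one request; the job name, paths and the mc_runjob invocation are assumed here (modelled on the minbias example and the grid submit script on slide 30), and the real control flow is the fbs job shown on slides 7-9:

#!/bin/sh
# Sketch only; job name and paths are assumed.
JOB=minbias-02073214824

# 1. mc_runjob on the batch server (hoeve) turns the request macro into job scripts
#    (invocation form borrowed from the grid 'submit' script on slide 30; path assumed)
python /d0gstar/mcc/mc_runjob/py_script/Linker.py script=$JOB.macro

# 2.+3. the generated fbs job then runs mcc on a node and rcp + sam store on schuur
#       (jdf file name assumed; see slides 7-9 for the real jdf and scripts)
fbs submit /d0gstar/curr/$JOB/job.jdf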

4 MC production dataflow (diagram): farm server (hoeve), file server (schuur) and ~100 node CPUs, with disks of 40 GB and 1.2 TB; SAM DB and datastore at FNAL and SARA. The mcc request enters on the farm server, fbs(mcc) runs the MC job on a node, and fbs(rcp,sam) moves the mcc output via the file server into the datastore; control, data and metadata flows are shown. fbs job sections: 1 mcc, 2 rcp, 3 sam.

5 MC production dataflow, variant (diagram): same picture as slide 4, but the fbs job has only the sections 1 mcc and 2 rcp; the sam store on the file server runs from cron instead.

6 Processes and accounts (diagram): on hoeve, fbsuser runs mc_runjob and fbs submit; on the node, fbsuser runs cp and mcc; on schuur, fbsuser runs rcp and willem runs sam (started from cron); control and data flows connect the three hosts.

7 fbs job description file (jdf):

SECTION mcc
EXEC=/d0gstar/curr/minbias-02073214824/batch
NUMPROC=1
QUEUE=FastQ
STDOUT=/d0gstar/curr/minbias-02073214824/stdout
STDERR=/d0gstar/curr/minbias-02073214824/stdout

SECTION rcp
EXEC=/d0gstar/curr/minbias-02073214824/batch_rcp
NUMPROC=1
QUEUE=IOQ
DEPEND=done(mcc)
STDOUT=/d0gstar/curr/minbias-02073214824/stdout_rcp
STDERR=/d0gstar/curr/minbias-02073214824/stdout_rcp

8 batch (runs on the node):

#!/bin/sh
. /usr/products/etc/setups.sh
cd /d0gstar/mcc/mcc-dist
. mcc_dist_setup.sh
mkdir -p /data/curr/minbias-02073214824
cd /data/curr/minbias-02073214824
cp -r /d0gstar/curr/minbias-02073214824/* .
touch /d0gstar/curr/minbias-02073214824/.`uname -n`
sh minbias-02073214824.sh `pwd` > log
touch /d0gstar/curr/minbias-02073214824/`uname -n`
/d0gstar/bin/check minbias-02073214824

batch_rcp (runs on schuur):

#!/bin/sh
i=minbias-02073214824
if [ -f /d0gstar/curr/$i/OK ]; then
  mkdir -p /data/disk2/sam_cache/$i
  cd /data/disk2/sam_cache/$i
  node=`ls /d0gstar/curr/$i/node*`
  node=`basename $node`
  job=`echo $i | awk '{print substr($0,length-8,9)}'`
  rcp -pr $node:/data/dest/d0reco/reco*${job}* .
  rcp -pr $node:/data/dest/reco_analyze/rAtpl*${job}* .
  rcp -pr $node:/data/curr/$i/Metadata/*.params .
  rcp -pr $node:/data/curr/$i/Metadata/*.py .
  rsh -n $node rm -rf /data/curr/$i
  rsh -n $node rm -rf /data/dest/*/*${job}*
  touch /d0gstar/curr/$i/RCP
fi

9 sam store script (runs on schuur, called by fbs or cron); declares gen, d0g, sim and stores reco, recoanalyze:

#!/bin/sh
locate(){
  file=`grep "import =" import_${1}_${job}.py | awk -F \" '{print $2}'`
  sam locate $file | fgrep -q [
  return $?
}
. /usr/products/etc/setups.sh
setup sam
SAM_STATION=hoeve
export SAM_STATION
tosam=$1
LIST=`cat $tosam`
for job in $LIST
do
  cd /data/disk2/sam_cache/${job}
  list='gen d0g sim'
  for i in $list
  do
    until locate $i || (sam declare import_${i}_${job}.py && locate ${i})
    do sleep 60; done
  done
  list='reco recoanalyze'
  for i in $list
  do
    sam store --descrip=import_${i}_${job}.py --source=`pwd`
    return=$?
    echo Return code sam store $return
  done
done
echo Job finished...

10 Filestream
– Fetch input from sam
– Read input file from schuur
– Process data on node
– Copy output to schuur
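A minimal sketch of these four steps for one job, assuming the job name, cache paths and the rsh/rcp pattern used in the other scripts (batch_sam on slide 15, the batch scripts on slide 8); the real node script is generated by mc_runjob:

#!/bin/sh
# Sketch only; job name, cache paths and host names are assumed.
JOB=minbias-02073214824

# 1. fetch the input from sam into the cache on schuur
rsh -n -l willem schuur "sam run project get_file.py --interactive > log"

# 2. read the input file from schuur onto the node
rcp -pr schuur:/data/disk2/sam_cache/$JOB /data/curr/$JOB

# 3. process the data on the node
cd /data/curr/$JOB && sh $JOB.sh `pwd` > log

# 4. copy the output back to schuur
rcp -pr /data/dest/d0reco/reco*${JOB}* schuur:/data/disk2/sam_cache/$JOB/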

11 Filestream processes (diagram): mc_runjob and fbs submit on hoeve; on the node, rcp of the input, the d0 executable (d0exe), and rcp of the output; sam on schuur, run from cron, plus an "attach filestream" step; control and data flows between the hosts.

12 Analysis on farm
Stages:
– Read files from sam
– Copy files to node(s)
– Perform analysis on node
– Copy files to file server
– Store files in sam
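In outline, one pass of this chain looks roughly as follows; the commands are taken or adapted from the concrete jdf and scripts on slides 15-19, and the result and metadata file names are hypothetical:

#!/bin/sh
# Outline only; 'results' and the metadata file name are hypothetical.

# read files from sam into the cache on the file server (triviaal)
sam run project get_file.py --interactive > log

# copy the files to a node
rcp -r /stage/triviaal/sam_cache/boo node-2:/data/test

# perform the analysis on the node (here: the root job from slide 19)
rsh -n node-2 "cd /data/test && root -b -q /d0gstar/test.C"

# copy the results back to the file server and store them in sam
rcp -pr node-2:/data/test/results /stage/triviaal/sam_cache/results
sam store --descrip=import_results.py --source=/stage/triviaal/sam_cache/results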

13 Analysis dataflow, variant 1 (diagram): farm server, file server and ~100 node CPUs; SAM DB and datastore at FNAL and SARA; control through fbs, plus data and metadata flows. Steps: 1. sam + rcp, 2. analyze, 3. rcp + sam; steps 1 and 3 run as fbs jobs on the file server, step 2 as an fbs job on a node.

14 Analysis variant 1, processes (diagram): on triviaal, fbsuser runs rcp and willem runs sam; on node-2, fbsuser runs the analysis program; input and output flow between triviaal and node-2.

15 batch.jdf:

SECTION sam
EXEC=/home/willem/batch_sam
NUMPROC=1
QUEUE=IOQ
STDOUT=/home/willem/stdout
STDERR=/home/willem/stdout

batch_sam:

#!/bin/sh
. /usr/products/etc/setups.sh
setup sam
SAM_STATION=triviaal
export SAM_STATION
sam run project get_file.py --interactive > log
/usr/bin/rsh -n -l fbsuser triviaal rcp -r /stage/triviaal/sam_cache/boo node-2:/data/test >> log

16 Analysis dataflow, variant 2 (diagram): same picture as slide 13, but with steps 1. sam, 2. rcp + analyze + rcp, 3. rcp + sam; steps 1 and 3 run as fbs jobs on the file server, step 2 as an fbs job on a node.

17 Analysis variant 2, processes (diagram): on triviaal, willem runs sam and fbsuser issues the fbs submit; on node-2, fbsuser runs rcp, the analysis program, and rcp again; input and output flow between triviaal and node-2.

18
SECTION sam
EXEC=/d0gstar/batch_node
NUMPROC=1
QUEUE=FastQ
STDOUT=/d0gstar/stdout
STDERR=/d0gstar/stdout

#!/bin/sh
uname -a
date
rsh -l fbsuser triviaal fbs submit ~willem/batch_node.jdf

19
#!/bin/sh
. /usr/products/etc/setups.sh
setup fbsng
setup sam
SAM_STATION=triviaal
export SAM_STATION
sam run project get_file.py --interactive > log
/usr/bin/rsh -n -l fbsuser triviaal fbs submit /home/willem/batch_node.jdf

SECTION sam
EXEC=/home/willem/batch
NUMPROC=1
QUEUE=IOQ
STDOUT=/home/willem/stdout
STDERR=/home/willem/stdout

SECTION ana
EXEC=/d0gstar/batch_node
NUMPROC=1
QUEUE=FastQ
STDOUT=/d0gstar/stdout
STDERR=/d0gstar/stdout

#!/bin/sh
rcp -pr server:/stage/triviaal/sam_cache/boo /data/test
. /d0/fnal/ups/etc/setups.sh
setup root -q KCC_4_0:exception:opt:thread
setup kailib
root -b -q /d0gstar/test.C

test.C:
{
  gSystem->cd("/data/test/boo");
  gSystem->Exec("pwd");
  gSystem->Exec("ls -l");
}

20 get_file.py:

#
# This file sets up and runs a SAM project.
#
import os, sys, string, time, signal
from re import *
from globals import *
import run_project
from commands import *
#########################################
#
# Set the following variables to appropriate values
#
# Consult database for valid choices
sam_station = "triviaal"
# Consult Database for valid choices
project_definition = "op_moriond_p1014"
# A particular snapshot version, last or new
snapshot_version = 'new'
# Consult database for valid choices
appname = "test"
version = "1"
group = "test"
# The maximum number of files to get from sam
max_file_amt = 5
# for additional debug info use "--verbose"
#verbosity = "--verbose"
verbosity = ""
# Give up on all exceptions
give_up = 1

def file_ready(filename):
    # Replace this python subroutine with whatever
    # you want to do
    # to process the file that was retrieved.
    # This function will only be called in the event of
    # a successful delivery.
    print "File ",filename," has been delivered!"
    # os.system('cp '+filename+' /stage/triviaal/sam')
    return
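For reference, this template is driven by the sam client rather than run directly; elsewhere in these slides (batch_sam on slide 15 and the script on slide 19) it is invoked as:

sam run project get_file.py --interactive > log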

21 Disk partitioning (hoeve)
Directories: /d0, /fnal, /d0dist, /d0usr, /mcc (mcc-dist, mc_runjob, curr), /ups (db, etc, prd), /fbsng
Symbolic links:
– /fnal -> /d0/fnal
– /d0usr -> /fnal/d0usr
– /d0dist -> /fnal/d0dist
– /usr/products -> /fnal/ups
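The symbolic links above correspond to commands along these lines (a sketch only, assuming they are created by an administrator on hoeve):

# recreate the symbolic links listed on this slide (run on hoeve)
ln -s /d0/fnal     /fnal
ln -s /fnal/d0usr  /d0usr
ln -s /fnal/d0dist /d0dist
ln -s /fnal/ups    /usr/products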

22 ana_runjob
– Analogous to mc_runjob
– Creates and submits analysis jobs
– Input:
  – get_file.py with the SAM project name (the project defines the files to be processed)
  – analysis script

23 Integration with grid (1)
– At present separate clusters: D0, LHCb, Alice, DAS cluster
– hoeve and schuur in farm network

24 Present network layout (diagram): hoeve, schuur and the nodes behind a switch; a router connects to hefnet and surfnet; NFS from ajax.

25 New network layout (diagram): a farm router with a switch serving the D0, LHCb and alice nodes as well as hoeve, schuur and booder; uplinks to hefnet and a lambda connection; NFS from ajax.

26 New network layout (diagram): as on slide 25, with the das-2 cluster added.

27 Server tasks
hoeve
– software server
– farm server
schuur
– file server
– sam node
booder
– home directory server
– in backup scheme

28 Integration with grid (2)
– Replace fbs with pbs or condor
  – pbs on Alice and LHCb nodes
  – condor on das cluster
– Use EDG installation tool LCFG
  – Install D0 software with rpm
– Problem with sam (uses ups/upd)
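For comparison, submitting an equivalent job under the two candidate batch systems would look roughly like this; the script and submit-file names here are hypothetical (slide 30 shows a concrete PBS script):

# PBS: submit a job script to the local PBS server
qsub mcc_job.pbs

# Condor: submit through a submit description file
condor_submit mcc_job.submit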

29 Integration with grid (3)
– Package mcc in rpm
– Separate programs from working space
– Use cfg commands to steer mc_runjob
– Find better place for card files
– Input structure now created on node

30 Grid job

submit:

#!/bin/sh
macro=$1
pwd=`pwd`
cd /opt/fnal/d0/mcc/mcc-dist
. mcc_dist_setup.sh
cd $pwd
dir=/opt/fnal/d0/mcc/mc_runjob/py_script
python $dir/Linker.py script=$macro

PBS job (test.pbs):

[willem@tbn09 willem]$ cat test.pbs
# PBS batch job script
#PBS -o /home/willem/out
#PBS -e /home/willem/err
#PBS -l nodes=1
# Changing to directory as requested by user
cd /home/willem
# Executing job as requested by user
./submit minbias.macro
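Usage of the PBS script above, assuming a standard PBS client on the grid front end (the output paths come from the #PBS directives in test.pbs):

qsub test.pbs   # queues the job; stdout and stderr end up in /home/willem/out and /home/willem/err
qstat           # check the job's state in the PBS queue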

31 RunJob class for grid

class RunJob_farm(RunJob_batch) :
    def __init__(self,name=None) :
        RunJob_batch.__init__(self,name)
        self.myType="runjob_farm"

    def Run(self) :
        self.jobname = self.linker.CurrentJob()
        self.jobnaam = string.splitfields(self.jobname,'/')[-1]
        comm = 'chmod +x ' + self.jobname
        commands.getoutput(comm)
        if self.tdconf['RunOption'] == 'RunInBackground' :
            RunJob_batch.Run(self)
        else :
            bq = self.tdconf['BatchQueue']
            dirn = os.path.dirname(self.jobname)
            print dirn
            comm = 'cd ' + dirn + '; sh ' + self.jobnaam + ' `pwd` >& stdout'
            print comm
            runcommand(comm)

32 To be decided
– Location of minimum bias files
– Location of MC output

33 Job status
Job status is recorded in:
– fbs
– /d0/mcc/curr/
– /data/mcc/curr/
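As a hedged illustration of checking one job's status outside fbs, using the marker files that the batch and batch_rcp scripts touch (slide 8); the job name and the exact markers under these directories are assumed:

# look for the .<node>, <node>, OK and RCP marker files of one job
ls -a /d0/mcc/curr/minbias-02073214824
ls -a /data/mcc/curr/minbias-02073214824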

34 SAM servers
On the master node:
– station
– fss
On master and worker nodes:
– stager
– bbftp

