ILD MCProduction with ILCDirac

1 ILD MCProduction with ILCDirac
Akiya Miyamoto (KEK), Hiroaki Ono (NDU)
AWLC17 Sim/Rec session, June 26, 2017

2 DIRAC and ILCDirac
DIRAC (Distributed Infrastructure with Remote Agent Control): a high-level interface between users and distributed resources
- Job management, file catalog, ..
- Transformation system for productions
- Written in Python 2
- Web interface
ILCDirac: an extension for the ILC VO and CALICE VO
- Developed and operated by the CLICdp group since 2010
- Provides a simple interface for user jobs
- DIRAC file catalog for files and metadata
- Central system for large-scale production
- Now 45 sites (LCG and OSG) in 11 countries, for SiD, ILD, CLICdp
- From Japan, only KEK, which provides storage and CPUs
The Japanese group has been responsible for ILD MC production with ILCDirac since 2015.
2017/06/26 Akiya Miyamoto, Sim/Rec session @ AWLC17

3 ILD production with ILCDirac
Transformation concept
- The job workflow is implemented in the DIRAC server and shared among groups.
- The server automatically creates a job script, submits it to a selected site, retries on failure, maintains job logs, …
ILD production
- The DIRAC server for CLICdp and most of the existing scripts developed by CLICdp are used.
- However, the production workflow of ILD (ILC) is not exactly the same as that of CLICdp, so ILD-specific modifications are mandatory: file name conventions, directory structure, the way samples are generated, the number of event types, etc.
- Additional modification of the server scripts is in progress in order to fully utilize the transformation functionality.

4 ILD production chain
Step 1: SplitStdhep (stdhep → stdhep-split)
- Stdhep files (~500 MB) are common to SiD and ILD and are prepared in advance.
- They are split into small pieces (~50 x 10 MB) before production.
Step 2: MCSimulation by DIRAC (stdhep-split → sim)
- MC simulation of each split file
- < 1000 events/job, due to limits from BG overlay, CPU time and data size
- Diagram labels: sim, bkgs (sizes shown: 20~200 MB, 0.5~1 GB)
Step 3: MCReconstruction by DIRAC (sim + bkgs → rec, dst)
- MC reconstruction with overlay
- rec: 1~2 GB, dst: 1~10 MB
- DST files are too small and too many for tape
Step 4: MergeDST (dst → merged-dst, ~500 MB)
- Merge files interactively, after rec.
- Replicate files to several sites: KEK, DESY, IN2P3, …
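
The splitting in Step 1 can be sketched as a simple chunk plan. This is only an illustration under stated assumptions: the function name plan_splits and its parameters are hypothetical and are not part of ILCDirac or of the production scripts.

```python
def plan_splits(total_events, max_events_per_file):
    """Return (first_event, n_events) pairs that cover total_events.

    Hypothetical helper for illustration only. Every chunk holds
    max_events_per_file events except possibly the last one,
    which holds whatever remains.
    """
    if total_events < 0 or max_events_per_file <= 0:
        raise ValueError("event counts must be positive")
    chunks = []
    first = 0
    while first < total_events:
        n_events = min(max_events_per_file, total_events - first)
        chunks.append((first, n_events))
        first += n_events
    return chunks


# Example: 1250 events split into pieces of at most 500 events.
plan = plan_splits(1250, 500)
# plan == [(0, 500), (500, 500), (1000, 250)]
```

Note that the last piece is usually shorter than the others, so any per-job event count has to allow for it.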

5 Issues: Naming convention
Generator file names are composed of meta information: a one-character key followed by its value, with the key-value pairs connected by “.”
E<energy-machine>.P<process_name>.G<program>.e<pole>.p<polp>.I<processID>.stdhep
ex: E500-TDR_ws.P2f_z_l.Gwhizard-1_95.eL.pR.I stdhep
Sim & Rec files follow similar conventions:
r<rec-config>..E<energy-machine>..I<processID>…<prodID>_<jobID>.slcio
Issues
- <prodID> and <jobID> are attached by the DIRAC server and do not follow the convention. The other parts were generated when the transformation was created.
- 1 <processID> → 2 transformations (sim, rec): many transformations are required. Note: ~2000 processes for 250, 350, 500 GeV; # of detector models.
- → File names and directory names were constructed from meta values defined in the DIRAC catalog → generator meta data are not always consistent.
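
The key-value convention for generator file names can be decoded with a few lines of Python. This is a minimal sketch, not the FilenameEncoder code itself: the function name is hypothetical, and the process ID 999999 in the example is a made-up placeholder, not a real ILD process ID.

```python
def decode_generator_name(filename):
    """Decode 'E<..>.P<..>.G<..>.e<..>.p<..>.I<..>.stdhep' into a dict.

    Hypothetical helper for illustration. Each dot-separated token is
    one key character followed by its value; the last token is the
    file type and is stored under the key 'F'.
    """
    stem, _, file_type = filename.rpartition(".")
    meta = {"F": file_type}
    for token in stem.split("."):
        if not token:  # tolerate empty fields such as ".."
            continue
        meta[token[0]] = token[1:]
    return meta


# The process ID below is a placeholder chosen for this example.
meta = decode_generator_name(
    "E500-TDR_ws.P2f_z_l.Gwhizard-1_95.eL.pR.I999999.stdhep")
energy, _, machine = meta["E"].partition("-")
# meta["P"] == "2f_z_l", energy == "500", machine == "TDR_ws"
```

The sketch assumes that values never contain a ".", which holds for the example on the slide but is not guaranteed by anything shown here.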

6 ILDProductionChainOpt2017.py
MCSimulation_ILD, MCReconstruction_Overlay_ILD
- The output file name is defined from the input file when the DIRAC server submits a job.
FilenameEncoder.py: utility functions
- decode a file name to a Python dictionary object (key and value)
- encode a dictionary object to a file or directory name
Input split stdhep files have to be in one directory.
- Set a directory meta key, ProdID, to select files in the directory regardless of file names.
Status and issues
- This tool has been used for ILD production since April. Working well.
- StdhepSplit_ILD does not work: file sequence numbers of the original stdhep files are not kept in the DST files. Work in progress.
- For the moment, original stdhep files are split locally, then uploaded for SIM & REC.
- Production of calibration samples, like uds and single events: the naming convention does not match the ILCDirac production system. → UserJob tools have been used.

7 Issue: DST merged files and SE
- About 50 DST files (~10 MB each) are merged into 1 DST-merged file (~500 MB), which serves as the main input data for user analysis.
- The merge procedure is based on processID: a DST-merged file contains events of the same processID.
- We wish to keep the event sequence of the original STDHEP files.
- Merging can start only after all jobs have completed, but job failures happen.
- For the moment, local interactive jobs are used. Downloading many files to KEK is time consuming. UserJob can be used.
- We hope to implement the workflow in ILCDirac.
SE for DST output: SE media, tape or disk
- MCReconstruction of ILCDirac wrote REC files (~1 GB) and DST files (~10 MB) to the same SE, DESY-SRM or KEK-SRM.
- Physically these are tape devices, which are not suited to writing many small files (DST).
- The output directory of DST files was changed to /ilc/prod/ilc/md-dbd.log, which is mapped to disk at DESY-SRM. This is a special setting available only at DESY-SRM.
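
The merge bookkeeping described on this slide (group by processID, keep the original file order, about 50 DST files per merged file) can be sketched as follows. The function name and the (process_id, file_number, name) record layout are assumptions made for this example; they are not the ILCDirac interface.

```python
from collections import defaultdict


def plan_merges(dst_files, files_per_merge=50):
    """Group DST files by process ID and batch them in file-number order.

    Hypothetical helper for illustration. dst_files is an iterable of
    (process_id, file_number, name) tuples, where file_number looks like
    '12' or '12_3' (generator file number, optionally with a split index).
    """
    by_process = defaultdict(list)
    for process_id, file_number, name in dst_files:
        # Compare numerically so that '1_10' sorts after '1_2'.
        order = tuple(int(part) for part in file_number.split("_"))
        by_process[process_id].append((order, name))

    plan = {}
    for process_id, entries in by_process.items():
        entries.sort()
        names = [name for _, name in entries]
        plan[process_id] = [names[i:i + files_per_merge]
                            for i in range(0, len(names), files_per_merge)]
    return plan
```

Sorting on the numeric file number, rather than on the file name string, is what keeps the merged events in the sequence of the original stdhep files.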

8 Production activities – DBD prod
Frequent request from the physics group
- More statistics for existing samples, using the same software as the DBD samples.
- DBD/Snowmass samples: ~500 1/fb for 500 GeV, ~250 1/fb for 250 GeV. Recent run scenario: > /fb
Existing stdhep files were not fully sim/rec produced
- Simulation needs to start from the middle of files, which ILCDirac cannot do. → Split stdhep files locally, then upload for production.
- This should be avoided in future production.
Additional stdhep files for existing processes, or files for new processes
- The LC generator group is producing new stdhep files.
- Extension of existing samples: processID? Use the same ID or assign a new ID? The same process_ID is preferred, but care is needed in updating the existing meta data and file logging.
- New processes: respect the naming convention and be friendly to ILCDirac.
- Directory path: /ilc/prod/ilc/mc-dbd/generated/<energy-machine>/<event_class>/
- Avoid too many <event_class> levels and too many files below <event_class>.

9 Statistics of ILD Production in last 2 weeks
Status = Running and Done
Note: For the moment, a very high failure rate (~30%) of simulation jobs is seen, probably because the number of events in the last split stdhep file does not match the input specification of Mokka’s g4macro defined by ILCDirac, which defines only one value for all jobs in one production.
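
A check of the kind suggested by this note could look like the sketch below: given the event count of each split file and the single per-production value, list the files that would not match. The function name and the file names in the example are hypothetical, and the cause of the failures is only a presumption in the note above.

```python
def find_mismatched_splits(events_per_file, events_per_job):
    """Return the split files whose event count differs from the
    single production-wide value (hypothetical helper for illustration)."""
    return sorted(name for name, n_events in events_per_file.items()
                  if n_events != events_per_job)


# Hypothetical file names and counts, for illustration only.
counts = {"split_000.stdhep": 500,
          "split_001.stdhep": 500,
          "split_002.stdhep": 137}
short_files = find_mismatched_splits(counts, 500)
# short_files == ["split_002.stdhep"]
```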

10 Production for optimization
Goal
- Produce O(10) benchmark processes with at least 2 detector models for detector optimization, to serve also as new validated physics samples.
Before launching the production (issues other than production)
- DD4Sim and the new reconstruction tools with new detector models are yet to be validated. Hopefully Whizard2 can be used.
- → Test productions are necessary to estimate the needs for CPU, storage and time.
Additional features needed in ILD ILCDirac production
- Plan to overlay not only aa_lowpt events but also GuineaPig pair events.
- Production of multiple detector models is required, hopefully simultaneously. ILCDirac cannot process multiple detector models in one production. Split stdhep files should probably be shared among models.
- Reduce the workload of creating DSTMerge files and logging, which will be multiplied by the number of detector models.
New directory (no more mc-dbd)
- /ilc/prod/ilc/mc-opt/ild : for sim, rec, dst-merged files
- /ilc/prod/ilc/mc-opt.dsk/ild : dst and log files

11 Logging
ILCDirac itself
- Generator meta information is attached to Sim, Rec, DST and DSTmerged files.
- Job log files are stored in ILCDirac, but they are not in the catalog and are not visible without a production role.
Log files on the catalog
- After completion of a production, log files are tar-gzipped and saved in /ilc/prod/ilc/mc-dbd/log/[sim, rec]
Web page by ELOG
- Many ILD colleagues are not familiar with DIRAC and do analysis only on a local batch server.
- Meta information on ILCDirac is not always convenient for user analysis. → A web database by ELOG has been developed.
- Interactive work needs to be reduced as much as possible.
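
Packing the log files of a finished production into one tar-gzipped archive can be done with the Python standard library, as in the sketch below. The function name and the local paths are assumptions for this example; uploading the archive to the catalog directory is not shown.

```python
import tarfile
from pathlib import Path


def archive_logs(log_dir, archive_path):
    """Write every file under log_dir into a gzip-compressed tar archive.

    Hypothetical helper for illustration. Returns the member names,
    which are paths relative to log_dir.
    """
    log_dir = Path(log_dir)
    members = sorted(p for p in log_dir.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in members:
            tar.add(path, arcname=str(path.relative_to(log_dir)))
    return [str(p.relative_to(log_dir)) for p in members]
```

The archive should be written outside log_dir, so that a rerun does not pack an earlier archive into the new one.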

12 Preparing ELOG for MC-production logging : https://ild. ngt. ndu. ac
Previous production information is also being migrated. For the moment, the server is shared with BELLE2 DIRAC at NDU. A university budget is being requested for a dedicated server.

13 Summary
- A new production script, ILDProductionChain2017.py, has been developed.
- The output file name is determined from the input file name, so files of more than one process_ID can be produced by one transformation.
- The number of production jobs needed to process the ILC standard generator samples is significantly reduced, and so is the babysitting work.
- Physics samples have been produced continuously with the DBD-validated software, according to requests from the physics group. Further production with the DBD software will continue.
- The validation of a version of ILCSoft for detector optimization is in progress, with the hope of starting production soon.
- The production workflow has to be updated and tested for the coming optimization production.

14 BACKUP

15 Filename convention
class FilenameEncoder(object):
    '''
    A utility class to decode an output filename from an input file name
    according to the file name convention used by ILD.

    Once rules are defined, output file, directory and meta values can be
    generated based on the :mod:`dict` object. See __main__ attached below.
    Examples of how to use this class can be found in
    ILCDIRAC/Core/Utilities/tests/Test_FilenameEncoder.py

    The following keys are used for ILDProduction
    ([meta] is the meta key defined for the corresponding directory)::

      %s: ILDConfig for simulation
      %r: ILDConfig for Marlin
      %m: Detector model
      %E: Energy-Machine
      %I: GenProcessID
      %P: ProcessName
      %C: Event Class
      %G: Generator program
      %e: electron polarization or type of photon beam
      %p: positron polarization or type of photon beam
      %d: Data type (gen, sim, rec, dst, dstm, ..)
      %t: Production ID
      %T: Directory name for Production ID
      %n: Generator file number (could be [0-9]+_[0-9]+, when split)
      %j: Job number
      %J: Sub directory (Job number/1000. Namely 000, 001, 002, ...)
      %F: File type
      %B: Base directory
      %D: Upper case Data type. Used for meta value
      %w: Energy. For meta value
      %o: Machine parameter, such as TDR_ws. For meta value
    '''

16 Akiya Miyamoto, Sim/Rec session @ AWLC17
# Fragment of FilenameEncoder; the enclosing method header is not shown on the slide.
self.rules = {}

self.rules["gen"] = {}
self.rules["gen"]["file"] = "E%E.P%P.G%G.e%e.p%p.I%I.n%n.d_%d_%t_%j.%F"
self.rules["gen"]["dir"] = "%B/%d/%E/%C/%T/%J"
self.rules["gen"]["meta"] = {"%B/%d": {"Datatype": "%D"},
                             "%B/%d/%E": {"Energy": "%w", "MachineParams": "%o"},
                             "%B/%d/%E/%C": {"EventClass": "%C"},
                             "%B/%d/%E/%C/%T": {"ProdID": "%t"},
                             "%B/%d/%E/%C/%T/%J": {"kJobNumber": "%J"}}

self.rules["sim"] = {}
self.rules["sim"]["file"] = "s%s.m%m.E%E.I%I.P%P.e%e.p%p.n%n.d_%d_%t_%j.slcio"
self.rules["sim"]["dir"] = "%B/%d/%E/%C/%m/%s/%T/%J"
self.rules["sim"]["meta"] = {"%B/%d": {"Datatype": "%D"},
                             "%B/%d/%E": {"Energy": "%w", "MachineParams": "%o"},
                             "%B/%d/%E/%C": {"EventClass": "%C"},
                             "%B/%d/%E/%C/%m": {"DetectorModel": "%m"},
                             "%B/%d/%E/%C/%m/%s": {"ILDConfig": "%s"},
                             "%B/%d/%E/%C/%m/%s/%T": {"ProdID": "%t"},
                             "%B/%d/%E/%C/%m/%s/%T/%J": {"kJobNumber": "%J"}}

self.rules["rec"] = {}
self.rules["rec"]["file"] = "r%r.s%s.m%m.E%E.I%I.P%P.e%e.p%p.n%n.d_%d_%t_%j.slcio"
self.rules["rec"]["dir"] = "%B/%d/%E/%C/%m/%r/%T/%J"
self.rules["rec"]["meta"] = {"%B/%d": {"Datatype": "%D"},
                             "%B/%d/%E/%C/%m/%r": {"ILDConfig": "%r"},
                             "%B/%d/%E/%C/%m/%r/%T": {"ProdID": "%t"},
                             "%B/%d/%E/%C/%m/%r/%T/%J": {"kJobNumber": "%J"}}

self.rules["dst"] = {}
self.rules["dst"]["file"] = "r%r.s%s.m%m.E%E.I%I.P%P.e%e.p%p.n%n.d_%d_%t_%j.slcio"
self.rules["dst"]["dir"] = self.rules["rec"]["dir"]
self.rules["dst"]["meta"] = self.rules["rec"]["meta"]
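
How such a rule string turns into a file name can be illustrated with a small stand-alone substitution function. This is a sketch of the idea only, not the actual FilenameEncoder method: expand_rule is a hypothetical name, and all values in the example (ILDConfig, detector model, process ID, production ID, job number) are made-up placeholders.

```python
import re


def expand_rule(rule, values):
    """Replace each '%X' placeholder in rule by values['X'].

    Hypothetical helper for illustration. Raises KeyError if the rule
    uses a key that is missing from values.
    """
    def substitute(match):
        return str(values[match.group(1)])
    return re.sub(r"%([A-Za-z])", substitute, rule)


sim_rule = "s%s.m%m.E%E.I%I.P%P.e%e.p%p.n%n.d_%d_%t_%j.slcio"
# All values below are placeholders chosen for this example.
values = {"s": "v01", "m": "ILD_x", "E": "500-TDR_ws", "I": "999999",
          "P": "2f_z_l", "e": "L", "p": "R", "n": "001",
          "d": "sim", "t": "1234", "j": "5"}
name = expand_rule(sim_rule, values)
# name == "sv01.mILD_x.E500-TDR_ws.I999999.P2f_z_l.eL.pR.n001.d_sim_1234_5.slcio"
```

Because the rule is plain data, the same function can expand the "dir" rules and the keys and values of the "meta" dictionaries.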

