
1 Data production using CernVM and LxCloud
Dag Toppe Larsen
Warsaw, 2014-02-11

2 Outline
● CernVM/LxCloud data production
● Automatic data production
● Data production management
● Production database
● Web interface

3 CernVM cluster at LxCloud
● Requested and obtained a new “NA61” project on the final production LxCloud service
● Same quota (200 VCPUs/instances) as before
● Access controlled by the new e-group “na61-cloud”
● Migration completed
● Software currently used:
  – Legacy: 13e
  – Shine: v0r5p0
● Software, databases & calibration data distributed via CvmFS
● Mass production of BeBe160 (11_040) to compare/validate against the latest LxBatch mass production
● If the results are “similar”, CernVM validation is considered successful

4 Test production
● Recently, a new BeBe160 test production was submitted to CernVM running on LxCloud
● Job description file created by the automatic data production manager
  – But manually submitted to the CernVM cluster
● Output written to /castor/cern.ch/na61/prod/Be_Be_158_11/040_13e_v0r5p0_pp_cvm2_phys
● To be compared to /castor/cern.ch/na61/11/prod/13E040
● (Same legacy, Shine, global key and mode)
● For some reason, the legacy software does not enter the event loop (next slide)
● The Shine part of the processing appears to work OK, though

5 CernVM production error
● Should have got:
    Unmarking...
    DSPACK 1.602, 1 Aug 2007 (dswrite, server: dag_28311_lxplus0099)
    Staging dataset: bos:/afs/cern.ch/work/d/dag/test/run-014923x023.bos
    DSPACK 1.602, 1 Aug 2007 (dsopen, server: dag_28311_lxplus0099)
    Input file: /tmp/R.28582.fifo
    Read definitions
    DSPACK 1.602, 1 Aug 2007 (dsread, server: dag_28311_lxplus0099)
    Read one event
    ________________________________________________________________________________
    Run: 14923 Event: 1896087552
    ________________________________________________________________________________
● But got:
    Unmarking...
    DSPACK 1.602, 1 Aug 2007 (dswrite, server: na61_31426_server-31edd847-e7c4-4a9a-a968-d52264f87fed)
    Staging dataset: bos:/home/condor/execute/dir_31365/run-014923x028/run-014923x028.bos
    DSPACK 1.602, 1 Aug 2007 (dsopen, server: na61_31426_server-31edd847-e7c4-4a9a-a968-d52264f87fed)
    Input file: /tmp/R.31686.fifo
    DS_OPEN_TOOL Error: No definition block
    Finishing....
    DSPACK 1.602, 1 Aug 2007 (dskill, server: na61_31426_server-31edd847-e7c4-4a9a-a968-d52264f87fed)
● What does “DS_OPEN_TOOL Error: No definition block” mean?
● Did not get this error when producing data on CernVM in the past
● If the exact same production script is run on LxPlus, using software from CvmFS (also mounted on LxPlus/batch), it works fine
● Some missing file (one that is found on AFS in the LxPlus case)?

6 Automatic data production

7 Production DB
● The production DB has grown a bit beyond what was originally intended
● Difficult to work with the production information without a proper SQL database
● Tedious to access information from Castor and the bookkeeping DB
● Elog data not always consistent (needed to be standardised)
  – Elog data needed as input for data production (magnetic field)
● Created an SQLite DB with three tables: runs, productions and chunkproductions
● To contain information about all runs as well as all produced chunks
  – After importing information from the bookkeeping DB (elog), productions can be initiated without first querying the bookkeeping DB
  – For transferring information back to the bookkeeping DB after production, the proposal is to run SQL queries from the bookkeeping DB to retrieve the relevant information
● The imported elog information can be used to select data for processing/analysis via the web interface (target in/out, trigger, run length, “ok”, etc.)

8 Production DB schema
● runs
  – All information for a given run
  – Primary key: run
  – The fields target, beam and momentum define which reaction a run belongs to
  – Information imported from the elog via the bookkeeping DB
  – Most fields are obtained by “data-mining” the elog
  – The field elog contains the original elog entry
● productions
  – All information for a given production
  – The combination of target, beam, momentum, year, key, legacy, shine, mode, os, source, type should be unique
  – Primary key: production (automatically generated ID)
● chunkproductions
  – All produced chunks
  – One row per produced chunk
  – Primary key: (production, run, chunk)
  – The table has the potential to contain on the order of ~10^6 rows
● (A minimal schema sketch follows below)
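A minimal sketch of the schema described above, using Python's sqlite3 module. The column lists are trimmed to the fields mentioned on these slides, and the types, defaults and constraints are assumptions rather than the actual NA61 schema.

    import sqlite3

    # Assumed, trimmed-down schema for the three tables; not the full NA61 schema.
    SCHEMA = """
    CREATE TABLE IF NOT EXISTS runs (
        run      INTEGER PRIMARY KEY,
        beam     TEXT, target TEXT, momentum INTEGER, year INTEGER,
        ok       INTEGER,            -- set manually: 1 = run is usable
        elog     TEXT                -- original elog entry, kept for later reprocessing
    );
    CREATE TABLE IF NOT EXISTS productions (
        production INTEGER PRIMARY KEY AUTOINCREMENT,   -- auto-generated ID
        target TEXT, beam TEXT, momentum INTEGER, year INTEGER,
        key TEXT, legacy TEXT, shine TEXT, mode TEXT, os TEXT,
        source TEXT, type TEXT,
        path_in TEXT, path_out TEXT, path_layout TEXT, description TEXT,
        UNIQUE (target, beam, momentum, year, key, legacy,
                shine, mode, os, source, type)          -- one row per production
    );
    CREATE TABLE IF NOT EXISTS chunkproductions (
        production INTEGER, run INTEGER, chunk INTEGER,
        rerun  INTEGER DEFAULT 0,    -- times the chunk failed and was reprocessed
        status INTEGER DEFAULT 0,    -- waiting/processing/checking/ok/failed (numeric)
        PRIMARY KEY (production, run, chunk)
    );
    """

    conn = sqlite3.connect("na61prod.sqlite")   # hypothetical file name
    conn.executescript(SCHEMA)
    conn.commit()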

9 runs table
● Contains all information for a given run
● The fields beam, target, momentum & year define which reaction a run belongs to
● Information imported from the eLog via the bookkeeping database
● All eLog information for all runs is imported
● The eLog information is processed and stored in separate fields
● Including the fields defining the reaction
● The original eLog entry is also stored to allow later reprocessing
● The field “ok” is intended to mark runs that are “ok” or not
● To be set manually
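To illustrate the “data-mining” step that fills the separate fields from the stored eLog text (the elogConvert command), a toy sketch follows. The real eLog entry format is not shown on these slides, so the regular expression and the rule it encodes are purely illustrative assumptions.

    import re

    # Toy elogConvert step: the pattern below is an assumption, since the real
    # eLog entry format is not given here.
    REACTION_RE = re.compile(r"(?P<beam>\w+)\+(?P<target>\w+)\s+at\s+(?P<mom>\d+)")

    def convert_run(conn, run):
        (elog,) = conn.execute("SELECT elog FROM runs WHERE run = ?", (run,)).fetchone()
        m = REACTION_RE.search(elog or "")
        if m is None:
            return   # leave the fields empty; to be fixed by hand later
        conn.execute(
            "UPDATE runs SET beam = ?, target = ?, momentum = ? WHERE run = ?",
            (m.group("beam"), m.group("target"), int(m.group("mom")), run))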

10 chunkproductions table
● Stores all chunks produced
● Associated to a production, run and chunk
● Has the potential to contain on the order of 10^6 rows
● By far the largest table in the DB
● Potential performance issue
● Only numerical values are used
● production: e.g. 1
● run: e.g. 123456
● chunk: e.g. 123
● rerun: number of times the chunk has failed and been reprocessed
● status: waiting / processing / checking / ok / failed (numeric values)
● size_*: size of the output files
● error_*: number of errors of a given type found in the log file from the latest processing
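Since the status field is stored as a number, a small helper like the one below could summarise a production without pulling the ~10^6-row table into memory; the grouping stays in SQL. The numeric encoding is an assumption, as the slide lists only the state names.

    # Assumed numeric encoding of chunkproductions.status (only the names are given above).
    STATUS = {"waiting": 0, "processing": 1, "checking": 2, "ok": 3, "failed": 4}

    def production_summary(conn, production):
        """Number of chunks per state for one production."""
        names = {v: k for k, v in STATUS.items()}
        rows = conn.execute(
            "SELECT status, COUNT(*) FROM chunkproductions "
            "WHERE production = ? GROUP BY status", (production,))
        return {names.get(s, s): n for s, n in rows}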

11 productions table
● A unique combination of target, beam, momentum, year, key, legacy, shine, mode, os, source, type is a production
● Primary key: production
  – Auto-generated unique number
● production: e.g. 1
● target: e.g. Be
● beam: e.g. Be
● momentum: e.g. 158
● year: e.g. 11
● key: e.g. 040
● legacy: e.g. 13c
● shine: e.g. v0r5p0
● mode: e.g. pp
● os: e.g. slc5
● source: e.g. phys (sim)
● type: e.g. prod (test)
● path_in: path of the raw input files
● path_out: path for the output files
● path_layout: how the output files are stored under path_out
● description: free text
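A sketch of how a setProduction-style registration could rely on the unique combination described above: the insert simply fails if the same eleven defining parameters have already been registered. The helper name and the error handling are assumptions.

    import sqlite3

    def set_production(conn, **params):
        """Register a new production; duplicates are rejected by the UNIQUE constraint."""
        cols = ("target", "beam", "momentum", "year", "key", "legacy", "shine",
                "mode", "os", "source", "type",
                "path_in", "path_out", "path_layout", "description")
        try:
            cur = conn.execute(
                "INSERT INTO productions ({}) VALUES ({})".format(
                    ", ".join(cols), ", ".join("?" * len(cols))),
                tuple(params.get(c) for c in cols))
            conn.commit()
            return cur.lastrowid      # the auto-generated production ID
        except sqlite3.IntegrityError:
            raise RuntimeError("a production with this parameter combination already exists")

    # Example, with the values quoted on this slide:
    # set_production(conn, target="Be", beam="Be", momentum=158, year=11, key="040",
    #                legacy="13c", shine="v0r5p0", mode="pp", os="slc5",
    #                source="phys", type="prod")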

12 Automated data production system commands
./na61prod
Usage: ./na61prod
one of:
  elogImport    - import all elog information from bookkeeping
  elogConvert   - process elog information and fill database
  setProduction - register new production in database
  produce       - start new production
  check         - check, resubmit and update database for errors
  setRunOk      - mark runs as OK
any of:
  runs     - list and/or range of runs        [all]
  type     - prod or test                     [prod]
  beam     - beam type                        No default value
  target   - target type                      No default value
  momentum - beam momentum                    No default value
  year     - year of data taking              No default value
  key      - global key (no year)             [latest]
  legacy   - version of legacy software       [latest]
  shine    - version of Shine software        [latest]
  mode     - pp or pA                         [pp]
  os       - cvm2 or slc6                     [cvm2]
  source   - phys or sim                      [phys]
  path_in  - path to data (for sim. data)     [root://castorpublic.cern.ch//castor/cern.ch/na61]
  comment  - free-text production comment     []
  ok       - 0 or 1                           [1]
one of:
  setNameValue - set possible value for key-value pair
any of:
  name  - type, legacy, shine, mode, os, source, path_in, path_out or path_layout
  value - value corresponding to name         []
  pref  - preferred value, 0 or 1             [1]
The system will choose [default] values for keys that are not set (see the sketch below).
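Most of the parameters above fall back to a [default] or [latest] value, and the slides mention a name/value pair table (maintained with setNameValue) holding the allowed and preferred values. A sketch of how a missing parameter could be resolved from it follows; the table name namevalues and the "most recently registered preferred value wins" rule are assumptions, since only the name, value and pref fields appear on the slides.

    # Sketch of default resolution from the name/value table; "namevalues" is an
    # assumed table name and the rowid ordering is an assumed rule.
    def resolve_default(conn, name):
        row = conn.execute(
            "SELECT value FROM namevalues WHERE name = ? AND pref = 1 "
            "ORDER BY rowid DESC LIMIT 1", (name,)).fetchone()
        return row[0] if row else None

    # e.g. shine = args.get("shine") or resolve_default(conn, "shine")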

13 Data production command usage
● na61prod command=elogImport runs=7000-18000
  – Obtains the eLog information for all runs in this range
● na61prod command=elogConvert runs=all
  – Processes the imported eLog information and fills the relevant fields in the runs table
● na61prod command=setProduction beam=Be target=Be momentum=158 year=11 comment="New TPC calibration data."
  – Registers a new production in the productions table using default values
● na61prod command=setProduction beam=Be target=Be momentum=158 year=11 key=044 type=test os=slc6 mode=pA path_in=root://castorpublic.cern.ch//castor/na61/path/to/simulated/data source=sim runs=14866-14880,14888,14890,15000-15011 comment="Check of simulated data before mass production."
  – Registers a new production with fewer default values
● na61prod command=produce beam=Be target=Be momentum=158 year=11
  – Initiates automatic submission for the reaction (which must already be registered using the setProduction command)
● na61prod command=check
  – Checks completed jobs for errors, resubmits as needed and updates the database; intended to be run from a cron job
● na61prod command=setNameValue name=shine value=v0r8p0
  – Registers a new version of Shine in the name/value table and makes it the default choice for future productions

14 Automatic data production manager status
● Can generate the files needed for submitting jobs (both LxBatch & CernVM)
● Now uses native SQLite language bindings for better performance
● A name/value pair table has been implemented to store the allowed/default values for production parameters
● Part being worked on:
  – Automatic submitting/checking/resubmitting of jobs
  – Not “difficult”, but rather “tedious” (a rough sketch of the check step follows below)
● Plan to soon use it for productions (both LxBatch and CernVM)
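The submitting/checking/resubmitting part that is still being worked on could look roughly like the cron-driven loop below, matching the check command described above. How errors are counted and how a chunk is actually resubmitted (bsub on LxBatch, condor_submit on the CernVM cluster) is left behind placeholder functions, and the retry limit is an assumption.

    # Rough sketch of the check step: classify finished chunks, then resubmit failures.
    STATUS = {"waiting": 0, "processing": 1, "checking": 2, "ok": 3, "failed": 4}  # assumed encoding

    def count_log_errors(production, run, chunk):
        """Placeholder: would scan the chunk's log file for known error patterns."""
        return 0

    def submit_chunk(production, run, chunk):
        """Placeholder: would wrap bsub (LxBatch) or condor_submit (CernVM cluster)."""

    def check_production(conn, production, max_reruns=3):
        rows = conn.execute(
            "SELECT run, chunk, rerun FROM chunkproductions "
            "WHERE production = ? AND status = ?",
            (production, STATUS["checking"])).fetchall()
        for run, chunk, rerun in rows:
            ok = count_log_errors(production, run, chunk) == 0
            conn.execute(
                "UPDATE chunkproductions SET status = ?, rerun = ? "
                "WHERE production = ? AND run = ? AND chunk = ?",
                (STATUS["ok"] if ok else STATUS["failed"],
                 rerun if ok else rerun + 1, production, run, chunk))
            if not ok and rerun + 1 <= max_reruns:
                submit_chunk(production, run, chunk)
        conn.commit()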

15 Web interface

16 Web interface
● Web interface to the production DB
  – http://cern.ch/na61cld/cgi-bin/prod
● Experimenting with the best interface/usability for different use cases
● Currently can only display information
  – Will add the ability to log in for starting productions, etc.
● Working on a script that will import information about already existing productions into the database
● Can generate a list of chunks from a set of filtering criteria
  – The link below will generate a list of chunks for BeBe13 (2012), marked as “ok”, with target in and GTPC on, for a production with global key “12_021”, legacy “13f”, Shine “v0r6p0”, mode “pp”, from physics data, and with data in mini shoe format from EOS
  – http://cern.ch/na61cld/cgi-bin/chunkli?productions.key=021&productions.legacy=13f&productions.mode=pp&productions.os=slc6&productions.shine=v0r6p0&productions.source=phys&productions.type=prod&runs.beam=Be&runs.momentum=13&runs.ok=1&runs.status_gtpc=1&run
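The chunk-list link above encodes its filtering criteria as table.column=value pairs in the query string. A sketch of how the chunkli CGI could translate such a query string into a single SQL query over the three tables is below; the whitelist of filter columns is taken from the example URL, while the join and the returned columns are assumptions.

    from urllib.parse import parse_qsl

    # Filter columns seen in the example URL above; anything else is ignored,
    # which also keeps user input out of the SQL text itself.
    ALLOWED = {
        "productions.key", "productions.legacy", "productions.shine",
        "productions.mode", "productions.os", "productions.source",
        "productions.type", "runs.beam", "runs.momentum", "runs.ok",
        "runs.status_gtpc",
    }

    def chunk_list_sql(query_string):
        filters = [(k, v) for k, v in parse_qsl(query_string) if k in ALLOWED]
        where = " AND ".join("{} = ?".format(k) for k, _ in filters) or "1"
        sql = ("SELECT chunkproductions.production, chunkproductions.run, chunkproductions.chunk "
               "FROM chunkproductions "
               "JOIN runs ON runs.run = chunkproductions.run "
               "JOIN productions ON productions.production = chunkproductions.production "
               "WHERE " + where)
        return sql, [v for _, v in filters]

    # usage: sql, params = chunk_list_sql("productions.key=021&runs.beam=Be&runs.ok=1")
    #        rows = conn.execute(sql, params).fetchall()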

17 General plan forward
● Complete the BeBe160 test production on CernVM
● Finish the automatic submission/checking/resubmission of jobs in the automatic data production manager
● Add the possibility to submit jobs from the web interface

18 Proposal to migrate software, calibration data & databases to CvmFS
● CvmFS is based on the HTTP protocol
● Distributed globally via a hierarchy of cache servers
● Files are compressed on the server side
● Downloaded on demand, decompressed and semi-permanently cached on the client side
  – A bit slow the first time a piece of software is run (while it is downloaded), but runs at native speed afterwards
● Originally developed to distribute software to CernVM virtual machines
● Has gained popularity on conventional (non-virtualised) computing clusters as well
  – Is now the preferred way of distributing software for e.g. ATLAS & CMS
● Proposal: migrate the NA61/SHINE software, calibration data & databases to CvmFS, also for LxBatch/LxPlus processing
● Tedious to maintain two parallel installations on both CvmFS & AFS
● CvmFS is mounted on all LxPlus/LxBatch machines: /cvmfs/na61.cern.ch/
● External dependencies (e.g. ROOT) are available from /cvmfs/sft.cern.ch/
● Will make it possible to use identical production scripts on CernVM & LxBatch
● Will globally distribute the NA61 software, i.e. NA61 processing at e.g. Belgrade will not require a separate NA61 software installation, but can mount /cvmfs/na61.cern.ch/ directly
● CvmFS can be mounted directly on any computer (both virtualised and non-virtualised) by installing the FUSE drivers, allowing the “official” Shine installation to run directly on laptops anywhere

