1 Data production and virtualisation status Dag Toppe Larsen Wrocław, 2013-10-07

2 Outline
- Data production status
- Virtualisation status
- Production database
- Data production script
- Web interface
- Plan forward
- Proposal for new production directory structure

3 Data production status
- Data production team: Dag, Bartek, Kevin
- So far this year: 17 mass productions, 72 test productions
- Castor sometimes slow and/or unresponsive
  - Typically lasts for a couple of days, then gets better
  - Also a problem for checking produced data, since nsls also hangs
  - Have contacted Castor support, but the problem often goes away before it can be properly diagnosed
  - Consider moving to EOS?
- Irregular batch queue
  - Number of simultaneously running jobs can vary between 10 and 3000
  - Depends on general batch system load and NA61 relative priority
  - Hard to plan/schedule data production
  - Usually not a big problem, but can be if data urgently needs processing
- Tried to use xRootd exclusively, but the nsls equivalent is slow on Castor
  - Have contacted IT
    - Limitation of the current implementation
    - A fix might be available on a time scale of ~6 months
  - Not a problem with EOS xRootd

4 Virtualisation status
- Have requested and obtained a new "NA61" project on the final Lxcloud service
  - Same quota (200 VCPUs/instances) as before
  - Access controlled by new e-group "na61-cloud"
- Migration completed
  - A few minor issues had to be worked out with IT
- Latest software versions (13e legacy, v0r5p0) installed on CVMFS
- Mass production of BeBe160 has started
- Next step
  - Compare output to Lxbatch production
  - If results are comparable, declare "victory"
- Scripts for automated data production in "beta"
- Prototype data production web interface created

5 Production DB
- The production DB has grown a bit beyond what was originally intended
- Complicated to access information from Castor and the bookkeeping DB
  - Elog data not always consistent (needs to be standardised)
  - Elog data needed as input for data production (magnetic field)
- Difficult to work with the production information without a proper SQL database
- Created an sqlite DB with three tables: runs, productions and chunkproductions
  - Contains information about all runs as well as all produced chunks
  - After importing information from the bookkeeping DB (elog), productions can be initiated without first querying the bookkeeping DB
  - For transferring information back to the bookkeeping DB after production, I propose to run SQL queries from the bookkeeping DB to retrieve the relevant information (see the sketch below)
- Elog information imported, can (in principle) be used to select data for processing/analysis (e.g. trigger information?)
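
As an illustration of the proposed transfer back to the bookkeeping DB, a query along these lines could return the produced chunks for one run (a sketch only; the table and field names follow the schema on the next slides, and run 123456 is just the example number from slide 12):

sqlite3 prod.db "select p.target, p.beam, p.momentum, p.year, c.chunk, c.status
                 from chunkproductions c join productions p on c.production = p.production
                 where c.run = 123456"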

6 Production DB schema
- runs
  - All information for a given run
  - Information imported from the elog via the bookkeeping DB
  - Primary key: run
  - Fields target, beam, momentum, year obtained from the elog
- productions
  - All information for a given production
  - The combination of target, beam, momentum, year, key, legacy, shine, mode, os, source, type should be unique
  - Primary key: production (automatically generated ID)
- chunkproductions
  - All produced chunks
  - One row per produced chunk
  - Primary key: production, run, chunk
  - Table has the potential to contain of the order of ~10^6 rows
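
A minimal sketch of what the table creation could look like in sqlite (field lists and types are assumptions pieced together from this and the following slides, not the actual schema; one size_*/error_* column pair per output file type is assumed):

sqlite3 prod.db <<'EOF'
create table runs (
  run integer primary key,                           -- run number
  target text, beam text, momentum text, year text, magnet text,
  elog_beam_type text, elog_target_type text, elog_beam_momentum text);
create table productions (
  production integer primary key autoincrement,      -- auto-generated ID
  target text, beam text, momentum text, year text, "key" text,
  legacy text, shine text, mode text, os text, source text, type text,
  path_in text, path_out text, path_layout text, description text,
  unique (target, beam, momentum, year, "key", legacy, shine, mode, os, source, type));
create table chunkproductions (
  production integer, run integer, chunk integer,
  rerun integer, status integer,
  size_shoe integer, error_shoe integer,             -- assumed example columns
  primary key (production, run, chunk));
EOF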

7 runs table
- Contains all information for a given run
- All elog information for the run is imported
- Elog information is used to fill the fields target, beam, momentum, year and magnet
  - Some normalising required to reduce elog entropy
- Separate elog_* fields contain the elog entries exactly as extracted from the elog
  - Can be used with SQL select queries, but the entropy sometimes makes this challenging
  - Could be interesting to add further fields containing "standardised" elog values, which are easier to select with SQL
- Are there any further elog entries that are missing in the table?
- Have imported all raw files found on Castor
  - About 369 chunks do not have elog entries (mostly test runs):
    sqlite3 prod.db "select count(*) from runs where target=''"
    369

8 runs.beam
- Contains the beam type for a given run
- Derived from elog_beam_type
- Not too much entropy (below), but some standardisation required
- Used to determine the "reaction"

sqlite3 prod.db "select elog_beam_type, count(*) from runs group by elog_beam_type"
|369
Be|1209
Be |15
K-|89
No beam|6
None|314
Pb|31
Pb fragment|223
h|8
h |23
h+|168
h-|221
p|3694
pi-|46
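
The standardisation itself can be done with a few SQL updates when the derived field is filled; a sketch of the idea (the two rules shown are only examples, the actual mapping is up to the production team):

# Collapse raw elog values (including trailing-space variants) into a normalised beam field
sqlite3 prod.db "update runs set beam='Be' where trim(elog_beam_type) = 'Be'"
sqlite3 prod.db "update runs set beam='p'  where trim(elog_beam_type) = 'p'"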

9 runs.target
- Contains the target type of a given run
- Derived from elog_target_type
- Some entropy (below), standardisation required
- Field used to determine the "reaction"

sqlite3 prod.db "select elog_target_type, count(*) from runs group by elog_target_type"
|369
2C target IN|116
2C target OUT|34
Be target IN|1003
Be target IN |11
Be target OUT|203
Be target OUT |4
C target IN|844
C target OUT|68
C_2cm target IN|106
C_2cm target OUT|28
C_2cm_target_IN|12
C_2cm_target_OUT|1
Empty target|9
LH full target|20
LH target EMPTY|80
LH target Empty|2
LH target FULL|173
LH target EMPTY|362
LH target FULL|1638
LH_target_EMPTY|1
LH_target_FULL|2
Long target|82
None|585
Pb|4
Pb brick IN|33
Pb target IN|500
Pb target OUT|121
Target holder|1
targetholder IN|4

10 runs.momentum
- Contains the beam momentum for a given run
- Derived from elog_beam_momentum
- Some entropy (below), but not too much
  - Assume 30 GeV/c != 31 GeV/c
  - Assume 75 GeV/c != 80 GeV/c
- Field used to determine the reaction

sqlite3 prod.db "select elog_beam_momentum, count(*) from runs group by elog_beam_momentum"
|372
0 GeV/c|313
10 GeV/c|15
100 GeV/c|14
120 GeV/c|161
13 GeV/c|1137
13 GeV/c |15
158 GeV/c|2387
20 GeV/c|371
30 GeV/c|193
31 GeV/c|558
31GeV/c|345
350 GeV/c|62
40 GeV/c|83
40GeV/c|114
75 GeV/c|13
80 GeV/c|263
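
Once the derived fields are filled, selecting all runs of a reaction becomes a simple query, e.g. (a sketch, assuming the derived fields are stored as the plain values used in the production parameters):

sqlite3 prod.db "select run from runs where target='Be' and beam='Be' and momentum='158' and year='11' order by run"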

11 productions table
- A unique combination of target, beam, momentum, year, key, legacy, shine, mode, os, source and type defines a production
- Primary key: production
  - Auto-generated unique number
- Fields (with examples):
  - production: e.g. 1
  - target: e.g. Be
  - beam: e.g. Be
  - momentum: e.g. 158
  - year: e.g. 11
  - key: e.g. 040
  - legacy: e.g. 13c
  - shine: e.g. v0r5p0
  - mode: e.g. pp
  - os: e.g. slc5
  - source: e.g. phys (sim)
  - type: e.g. prod (test)
  - path_in: path of the raw files used
  - path_out: path for the output files
  - path_layout: how the output files are stored under path_out
  - description: free text
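
Registering a new production then amounts to a single insert, with the ID assigned automatically; a rough sketch using the example values above (the column list is an assumption, not the actual schema):

sqlite3 prod.db <<'EOF'
insert into productions
  (target, beam, momentum, year, "key", legacy, shine, mode, os, source, type, description)
  values ('Be', 'Be', '158', '11', '040', '13c', 'v0r5p0', 'pp', 'slc5', 'phys', 'prod', 'free text');
EOF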

12 chunkproductions table
- Stores all chunks produced
- Associated to production, run and chunk
- Has the potential to contain of the order of 10^6 rows
  - By far the largest table in the DB
  - Potential performance issue
  - Only uses numerical values
- Fields:
  - production: e.g. 1
  - run: e.g. 123456
  - chunk: e.g. 123
  - rerun: number of times the chunk has failed and been reprocessed
  - status: waiting / processing / checking / ok / failed (numeric values)
  - size_*: size of the output files
  - error_*: number of errors of a given type found in the log file from the latest processing
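
A production summary can then be based on a simple aggregate over this table; a sketch (the numeric status codes and the size_shoe column name are assumptions):

# Chunks per status and total output size for production 1
sqlite3 prod.db "select status, count(*), sum(size_shoe) from chunkproductions where production = 1 group by status"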

13 Magnetic field
- Originally planned to store this information in a separate field in the runs table (extracted from the elog)
  - Needed for KEY5 and residual corrections
- However, Seweryn has now added this information to the same database as the global key (but it is not part of the global key)
- Working on integrating this information into the production scripts
- Will make automatic data production much simpler

14 Database
- Currently using sqlite
  - Pro:
    - DB contained in a single file on the file system
    - No need to set up a database server
    - Everybody can easily access it with custom SQL queries
    - Open format/code, we "really" own the data
  - Con:
    - Not sure if performance will be an issue
    - Backup only via normal file system backup
- Have also tried a central Oracle database (na61_cloud@pdbr1)
  - Pro:
    - Better performance
    - Better backup
    - Better functionality
  - Con:
    - More complicated for everybody to access
    - Did not notice any performance difference at the current DB size
    - May be forced to follow Oracle update cycles, a potential data preservation issue
- The SQL used for queries is compatible with both sqlite and Oracle
  - Exception: creation of tables (see the sketch below)
- Should be possible to move to Oracle if performance becomes an issue
- All in all I feel sqlite is the best choice until it is proven to have performance issues
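
To illustrate the table-creation exception: the auto-generated production ID is declared differently in the two systems (a sketch of the syntax difference only; the Oracle part is an assumption based on the pre-12c sequence approach and is shown commented out):

# sqlite: the ID column increments automatically
sqlite3 prod.db "create table productions (production integer primary key autoincrement, target text /* ... */)"
# Oracle: an explicit sequence provides the ID instead
#   create table productions (production number primary key, target varchar2(16) /* ... */);
#   create sequence productions_seq;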

15 Automated data production script commands
./prodna61-produce.sh usage:

./prodna61-produce.sh <command>, where <command> is one of:
  reactions     - list all reactions in the database
  productions   - list all productions in the database

./prodna61-produce.sh <command> <path>, where <command> is one of:
  regreaction   - register all reactions found at <path> (path_in) in the database

./prodna61-produce.sh <command> <production parameters> [<description>], where <command> is one of:
  regproduction - register a new production in the database
  produce       - start a new production
  check         - check a production for errors and update the database
  summary       - production summary
  reproduce     - reprocess chunks with errors for a production
  okchunks      - list all OK chunks for a production

16 Data production command usage
- prodna61-produce.sh regreaction /afs/cern.ch/11/Be/Be160
  - Registers all runs found at the path in the runs table
  - Obtains run information from the bookkeeping database/elog
  - Only has to be done once per reaction (path)
- prodna61-produce.sh regproduction Be Be 158 11 040 13e v0r5p0 pp phys def "A new prod."
  - Creates a new production in the productions table, and inserts a new row into the chunkproductions table for each of the chunks of the reaction
  - "def" means "use the default value"; it can be used for all parameters except the first
    - Typically takes the latest known value of the parameter
  - Has to be done when a new reaction is to be processed
- prodna61-produce.sh reactions
  - Lists all reactions registered in the database
- prodna61-produce.sh productions
  - Lists all productions registered in the database
- prodna61-produce.sh produce Be Be 158 11 040 13e v0r5p0 pp phys
  - Creates job files and submits jobs for the reaction
- prodna61-produce.sh check Be Be 158 11 040 13e v0r5p0 pp phys
  - Checks which chunks were processed OK, and which need to be reprocessed
- prodna61-produce.sh summary Be Be 158 11 040 13e v0r5p0 pp phys
  - Writes a summary of the outcome of the check command
- prodna61-produce.sh reproduce Be Be 158 11 040 13e v0r5p0 pp phys
  - Resubmits the chunks the check command found to be not OK
- prodna61-produce.sh okchunks Be Be 158 11 040 13e v0r5p0 pp phys
  - Writes a list of chunks that are OK after the check command
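
A full pass through a mass production could then be scripted roughly like this (a sketch built only from the commands above; the loop count, the wait step and the output file name are arbitrary choices for illustration):

#!/bin/bash
# Submit a production and iterate check/reproduce until (hopefully) no failed chunks remain
ARGS="Be Be 158 11 040 13e v0r5p0 pp phys"
./prodna61-produce.sh produce $ARGS          # create job files and submit jobs
for pass in 1 2 3; do
    # ... wait here for the batch jobs of this pass to finish ...
    ./prodna61-produce.sh check $ARGS        # update chunk status in the DB
    ./prodna61-produce.sh summary $ARGS      # print the current outcome
    ./prodna61-produce.sh reproduce $ARGS    # resubmit chunks found not OK
done
./prodna61-produce.sh okchunks $ARGS > okchunks.txt   # final list of good chunks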

17 Data production script status
- Can in most cases produce data
- Further work on the standardisation of elog data is needed
- Some reactions have "specialities" that have to be taken into account
- Lxbatch and CernVM versions have diverged a bit, and need to be (re-)unified
- Could be nice to use key-value pairs of parameters (see the sketch below)
- Need to add the possibility to process a range of runs (test productions)
  - Not expected to be difficult
- Plan to soon use it for mass productions (both Lxbatch and CernVM)
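
The key-value idea could look roughly like this (purely an illustration of the suggestion, not existing functionality; the invocation and the parsing inside the script are both hypothetical):

# Hypothetical invocation with named parameters instead of a fixed argument order:
#   ./prodna61-produce.sh produce target=Be beam=Be momentum=158 year=11 key=040 shine=v0r5p0
# Any parameter not given would fall back to its default value.
for arg in "$@"; do
    case "$arg" in
        *=*) declare "$arg" ;;   # sets e.g. $target, $beam, $momentum, ...
    esac
done
echo "producing ${target}+${beam} at ${momentum} GeV/c (year ${year})"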

18 Web interface (prototype) [screenshot]

19 Web interface (prototype)
- Web interface to the production DB: http://na61cld.web.cern.ch/na61cld/cgi-bin/start?reaction=Be|Be|158|11
- Experimenting with the best interface/usability for different use cases
  - Make it "intuitive" and easy to use
- Have not put effort into making it "look" good
  - But should be easy to do, since the design relies on style sheets (CSS)
- Think "reaction" and "production" are the main entities to build the interface around
- Currently can only display information
  - Will add the ability to log in for starting productions, etc.
  - CERN single sign-on is probably the best option
- Trying to create a script that will import information about already existing productions into the database
- Current implementation is slow since the DB is opened/closed for every query, but with proper language bindings the DB can be kept open for multiple queries

20 General plan forward
- Complete the CernVM BeBe160 test production
- Finish the outstanding issues with the automatic production script
- Finalise the web interface for data production
  - Add functionality, improve performance

21 Proposal for production directory structure (after moving to Shine)
- Preferably, all unique production parameters should be encoded in the path to avoid conflicts
- A deep directory structure is, however, undesirable
- Proposal: divide the directory path into four levels: "type", "reaction", "reconstruction conditions" and "file type":
  /castor/cern.ch/na61/<type>/<target>_<beam>_<momentum>_<year>/<key>_<shine>_<mode>_<os>_<source>/<file type>/run-<run>x<chunk>.<file type>
- Examples:
  /castor/cern.ch/na61/prod/Be_Be_158_11/040_v0r5p0_pp_slc5_phys/shoe.root/run-012345x678.shoe.root
  /castor/cern.ch/na61/test/LHT_p_158_11/020_v0r5p0_pp_cvm2_sim/log.bz2/run-987654x321.log.bz2
- Advantages:
  - Separates the test productions from the "real" productions
  - Easier to get an overview
    - nsls /castor/cern.ch/na61/prod will show all existing reactions
    - nsls /castor/cern.ch/na61/prod/<reaction> will show all productions for a reaction
  - Parameters are organised in order of "importance"
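
For reference, the output path of a chunk could then be assembled directly from the production parameters, e.g. (a sketch; the variable names are assumptions, and the values reproduce the first example above):

#!/bin/bash
# Build the Castor output path from the production parameters
type=prod; target=Be; beam=Be; momentum=158; year=11
key=040; shine=v0r5p0; mode=pp; os=slc5; source=phys
filetype=shoe.root; run=012345; chunk=678
path="/castor/cern.ch/na61/${type}/${target}_${beam}_${momentum}_${year}"
path="${path}/${key}_${shine}_${mode}_${os}_${source}/${filetype}/run-${run}x${chunk}.${filetype}"
echo "$path"
# -> /castor/cern.ch/na61/prod/Be_Be_158_11/040_v0r5p0_pp_slc5_phys/shoe.root/run-012345x678.shoe.root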

