1 Data production and virtualisation status Dag Toppe Larsen Wrocław, 2013-10-07

2 Outline
- Data production status
- Virtualisation status
- Production database
- Data production script
- Web interface
- Plan forward
- Proposal for new production directory structure

3 Data production status
- Data production team: Dag, Bartek, Kevin
- So far this year: 17 mass productions, 72 test productions
- Castor sometimes slow and/or unresponsive
  - Typically lasts for a couple of days, then gets better
  - Also a problem for checking produced data, since nsls also hangs
  - Have contacted Castor support, but the problem often goes away before it can be properly diagnosed
  - Consider moving to EOS?
- Irregular batch queue
  - Number of simultaneously running jobs can vary between 10 and 3000
  - Depends on general batch system load and NA61 relative priority
  - Hard to plan/schedule data production
  - Usually not a big problem, but can be if data urgently needs processing
- Tried to use xRootd exclusively, but the nsls equivalent is slow on Castor
  - Have contacted IT
    - Limitation of the current implementation
    - A fix might be available on a time scale of ~6 months
  - Not a problem with EOS xRootd

4 Virtualisation status
- Have requested and obtained a new "NA61" project on the final Lxcloud service
  - Same quota (200 VCPUs/instances) as before
  - Access controlled by new e-group "na61-cloud"
- Migration completed
  - A few minor issues had to be worked out with IT
- Latest software versions (13e legacy, v0r5p0) installed on CVMFS
- Mass production of BeBe160 has started
- Next step
  - Compare output to Lxbatch production
  - If results are comparable, declare "victory"
- Scripts for automated data production in "beta"
- Prototype data production web interface created

5 Production DB
- The production DB has grown a bit beyond what was originally intended
- Complicated to access information from Castor and the bookkeeping DB
  - Elog data not always consistent (needs to be standardised)
  - Elog data needed as input for data production (magnetic field)
- Difficult to work with the production information without a proper SQL database
- Created an sqlite DB with three tables: runs, productions and chunkproductions
  - Contains information about all runs as well as all produced chunks
  - After importing information from the bookkeeping DB (elog), productions can be initiated without first querying the bookkeeping DB
  - For transferring information back to the bookkeeping DB after production, I propose to run SQL queries from the bookkeeping DB to retrieve the relevant information (see the sketch below)
- Elog information imported, can (in principle) be used to select data for processing/analysis (e.g. trigger information?)
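
As an illustration of the proposed transfer back to the bookkeeping DB, a query along these lines could return the produced chunks for one run (a sketch only; the table and field names follow the schema on the next slides, and run 123456 is just the example number from slide 12):

sqlite3 prod.db "select p.target, p.beam, p.momentum, p.year, c.chunk, c.status
                 from chunkproductions c join productions p on c.production = p.production
                 where c.run = 123456"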

6 Production DB schema
- runs
  - All information for a given run
  - Information imported from the elog via the bookkeeping DB
  - Primary key: run
  - Fields target, beam, momentum, year obtained from the elog
- productions
  - All information for a given production
  - The combination of target, beam, momentum, year, key, legacy, shine, mode, os, source, type should be unique
  - Primary key: production (automatically generated ID)
- chunkproductions
  - All produced chunks
  - One row per produced chunk
  - Primary key: production, run, chunk
  - Table has the potential to contain of the order of ~10^6 rows
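
A minimal sketch of what the table creation could look like in sqlite (field lists and types are assumptions pieced together from this and the following slides, not the actual schema; one size_*/error_* column pair per output file type is assumed):

sqlite3 prod.db <<'EOF'
create table runs (
  run integer primary key,                           -- run number
  target text, beam text, momentum text, year text, magnet text,
  elog_beam_type text, elog_target_type text, elog_beam_momentum text);
create table productions (
  production integer primary key autoincrement,      -- auto-generated ID
  target text, beam text, momentum text, year text, "key" text,
  legacy text, shine text, mode text, os text, source text, type text,
  path_in text, path_out text, path_layout text, description text,
  unique (target, beam, momentum, year, "key", legacy, shine, mode, os, source, type));
create table chunkproductions (
  production integer, run integer, chunk integer,
  rerun integer, status integer,
  size_shoe integer, error_shoe integer,             -- assumed example columns
  primary key (production, run, chunk));
EOF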

7 runs table
- Contains all information for a given run
- All elog information for the run is imported
- Elog information is used to fill the fields target, beam, momentum, year and magnet
  - Some normalising required to reduce elog entropy
- Separate elog_* fields contain the elog entries exactly as extracted from the elog
  - Can be used with SQL select queries, but the entropy sometimes makes this challenging
  - Could be interesting to add further fields containing "standardised" elog values, which are easier to select with SQL
- Are there any further elog entries that are missing in the table?
- Have imported all raw files found on Castor
  - About 369 chunks do not have elog entries (mostly test runs):
    sqlite3 prod.db "select count(*) from runs where target=''"
    369

8 runs.beam
- Contains the beam type for a given run
- Derived from elog_beam_type
- Not too much entropy (below), but some standardisation required
- Used to determine the "reaction"

sqlite3 prod.db "select elog_beam_type, count(*) from runs group by elog_beam_type"
|369
Be|1209
Be |15
K-|89
No beam|6
None|314
Pb|31
Pb fragment|223
h|8
h |23
h+|168
h-|221
p|3694
pi-|46
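
The standardisation itself can be done with a few SQL updates when the derived field is filled; a sketch of the idea (the two rules shown are only examples, the actual mapping is up to the production team):

# Collapse raw elog values (including trailing-space variants) into a normalised beam field
sqlite3 prod.db "update runs set beam='Be' where trim(elog_beam_type) = 'Be'"
sqlite3 prod.db "update runs set beam='p'  where trim(elog_beam_type) = 'p'"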

9 runs.target
- Contains the target type of a given run
- Derived from elog_target_type
- Some entropy (below), standardisation required
- Field used to determine the "reaction"

sqlite3 prod.db "select elog_target_type, count(*) from runs group by elog_target_type"
|369
2C target IN|116
2C target OUT|34
Be target IN|1003
Be target IN |11
Be target OUT|203
Be target OUT |4
C target IN|844
C target OUT|68
C_2cm target IN|106
C_2cm target OUT|28
C_2cm_target_IN|12
C_2cm_target_OUT|1
Empty target|9
LH full target|20
LH target EMPTY|80
LH target Empty|2
LH target FULL|173
LH target EMPTY|362
LH target FULL|1638
LH_target_EMPTY|1
LH_target_FULL|2
Long target|82
None|585
Pb|4
Pb brick IN|33
Pb target IN|500
Pb target OUT|121
Target holder|1
targetholder IN|4

10 runs.momentum
- Contains the beam momentum for a given run
- Derived from elog_beam_momentum
- Some entropy (below), but not too much
  - Assume 30 GeV/c != 31 GeV/c
  - Assume 75 GeV/c != 80 GeV/c
- Field used to determine the reaction

sqlite3 prod.db "select elog_beam_momentum, count(*) from runs group by elog_beam_momentum"
|372
0 GeV/c|313
10 GeV/c|15
100 GeV/c|14
120 GeV/c|161
13 GeV/c|1137
13 GeV/c |15
158 GeV/c|2387
20 GeV/c|371
30 GeV/c|193
31 GeV/c|558
31GeV/c|345
350 GeV/c|62
40 GeV/c|83
40GeV/c|114
75 GeV/c|13
80 GeV/c|263
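
Once the derived fields are filled, selecting all runs of a reaction becomes a simple query, e.g. (a sketch, assuming the derived fields are stored as the plain values used in the production parameters):

sqlite3 prod.db "select run from runs where target='Be' and beam='Be' and momentum='158' and year='11' order by run"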

11 productions table
- A unique combination of target, beam, momentum, year, key, legacy, shine, mode, os, source and type defines a production
- Primary key: production
  - Auto-generated unique number
- Fields (with examples):
  - production: e.g. 1
  - target: e.g. Be
  - beam: e.g. Be
  - momentum: e.g. 158
  - year: e.g. 11
  - key: e.g. 040
  - legacy: e.g. 13c
  - shine: e.g. v0r5p0
  - mode: e.g. pp
  - os: e.g. slc5
  - source: e.g. phys (sim)
  - type: e.g. prod (test)
  - path_in: path of the raw files used
  - path_out: path for the output files
  - path_layout: how the output files are stored under path_out
  - description: free text
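
Registering a new production then amounts to a single insert, with the ID assigned automatically; a rough sketch using the example values above (the column list is an assumption, not the actual schema):

sqlite3 prod.db <<'EOF'
insert into productions
  (target, beam, momentum, year, "key", legacy, shine, mode, os, source, type, description)
  values ('Be', 'Be', '158', '11', '040', '13c', 'v0r5p0', 'pp', 'slc5', 'phys', 'prod', 'free text');
EOF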

12 chunkproductions table
- Stores all chunks produced
- Associated to production, run and chunk
- Has the potential to contain of the order of 10^6 rows
  - By far the largest table in the DB
  - Potential performance issue
  - Only uses numerical values
- Fields:
  - production: e.g. 1
  - run: e.g. 123456
  - chunk: e.g. 123
  - rerun: number of times the chunk has failed and been reprocessed
  - status: waiting / processing / checking / ok / failed (numeric values)
  - size_*: size of the output files
  - error_*: number of errors of a given type found in the log file from the latest processing
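
A production summary can then be based on a simple aggregate over this table; a sketch (the numeric status codes and the size_shoe column name are assumptions):

# Chunks per status and total output size for production 1
sqlite3 prod.db "select status, count(*), sum(size_shoe) from chunkproductions where production = 1 group by status"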

13 Magnetic field
- Originally planned to store this information in a separate field in the runs table (extracted from the elog)
  - Needed for KEY5 and residual corrections
- However, Seweryn has now added this information to the same database as the global key (but it is not part of the global key)
- Working on integrating this information into the production scripts
- Will make automatic data production much simpler

14 Database
- Currently using sqlite
  - Pro:
    - DB contained in a single file on the file system
    - No need to set up a database server
    - Everybody can easily access it with custom SQL queries
    - Open format/code, we "really" own the data
  - Con:
    - Not sure if performance will be an issue
    - Backup only via normal file system backup
- Have also tried a central Oracle database (na61_cloud@pdbr1)
  - Pro:
    - Better performance
    - Better backup
    - Better functionality
  - Con:
    - More complicated for everybody to access
    - Did not notice any performance difference at the current DB size
    - May be forced to follow Oracle update cycles, a potential data preservation issue
- The SQL used for queries is compatible with both sqlite and Oracle
  - Exception: creation of tables (see the sketch below)
- Should be possible to move to Oracle if performance becomes an issue
- All in all I feel sqlite is the best choice until it is proven to have performance issues
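
To illustrate the table-creation exception: the auto-generated production ID is declared differently in the two systems (a sketch of the syntax difference only; the Oracle part is an assumption based on the pre-12c sequence approach and is shown commented out):

# sqlite: the ID column increments automatically
sqlite3 prod.db "create table productions (production integer primary key autoincrement, target text /* ... */)"
# Oracle: an explicit sequence provides the ID instead
#   create table productions (production number primary key, target varchar2(16) /* ... */);
#   create sequence productions_seq;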

15 Automated data production script commands
./prodna61-produce.sh usage:

./prodna61-produce.sh <command>, where <command> is one of:
  reactions     - list all reactions in the database
  productions   - list all productions in the database

./prodna61-produce.sh <command> <path>, where <command> is one of:
  regreaction   - register all reactions found at <path> (path_in) in the database

./prodna61-produce.sh <command> <production parameters> [<description>], where <command> is one of:
  regproduction - register a new production in the database
  produce       - start a new production
  check         - check a production for errors and update the database
  summary       - production summary
  reproduce     - reprocess chunks with errors for a production
  okchunks      - list all OK chunks for a production

16 Data production command usage
- prodna61-produce.sh regreaction /afs/cern.ch/11/Be/Be160
  - Registers all runs found at the path in the runs table
  - Obtains run information from the bookkeeping database/elog
  - Only has to be done once per reaction (path)
- prodna61-produce.sh regproduction Be Be 158 11 040 13e v0r5p0 pp phys def "A new prod."
  - Creates a new production in the productions table, and inserts a new row into the chunkproductions table for each of the chunks of the reaction
  - "def" means "use the default value"; it can be used for all parameters except the first
    - Typically takes the latest known value of the parameter
  - Has to be done when a new reaction is to be processed
- prodna61-produce.sh reactions
  - Lists all reactions registered in the database
- prodna61-produce.sh productions
  - Lists all productions registered in the database
- prodna61-produce.sh produce Be Be 158 11 040 13e v0r5p0 pp phys
  - Creates job files and submits jobs for the reaction
- prodna61-produce.sh check Be Be 158 11 040 13e v0r5p0 pp phys
  - Checks which chunks were processed OK, and which need to be reprocessed
- prodna61-produce.sh summary Be Be 158 11 040 13e v0r5p0 pp phys
  - Writes a summary of the outcome of the check command
- prodna61-produce.sh reproduce Be Be 158 11 040 13e v0r5p0 pp phys
  - Resubmits the chunks the check command found to be not OK
- prodna61-produce.sh okchunks Be Be 158 11 040 13e v0r5p0 pp phys
  - Writes a list of chunks that are OK after the check command
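
A full pass through a mass production could then be scripted roughly like this (a sketch built only from the commands above; the loop count, the wait step and the output file name are arbitrary choices for illustration):

#!/bin/bash
# Submit a production and iterate check/reproduce until (hopefully) no failed chunks remain
ARGS="Be Be 158 11 040 13e v0r5p0 pp phys"
./prodna61-produce.sh produce $ARGS          # create job files and submit jobs
for pass in 1 2 3; do
    # ... wait here for the batch jobs of this pass to finish ...
    ./prodna61-produce.sh check $ARGS        # update chunk status in the DB
    ./prodna61-produce.sh summary $ARGS      # print the current outcome
    ./prodna61-produce.sh reproduce $ARGS    # resubmit chunks found not OK
done
./prodna61-produce.sh okchunks $ARGS > okchunks.txt   # final list of good chunks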

17 Data production script status
- Can in most cases produce data
- Further work on the standardisation of elog data is needed
- Some reactions have "specialities" that have to be taken into account
- Lxbatch and CernVM versions have diverged a bit, and need to be (re-)unified
- Could be nice to use key-value pairs of parameters (see the sketch below)
- Need to add the possibility to process a range of runs (test productions)
  - Not expected to be difficult
- Plan to soon use it for mass productions (both Lxbatch and CernVM)
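
The key-value idea could look roughly like this (purely an illustration of the suggestion, not existing functionality; the invocation and the parsing inside the script are both hypothetical):

# Hypothetical invocation with named parameters instead of a fixed argument order:
#   ./prodna61-produce.sh produce target=Be beam=Be momentum=158 year=11 key=040 shine=v0r5p0
# Any parameter not given would fall back to its default value.
for arg in "$@"; do
    case "$arg" in
        *=*) declare "$arg" ;;   # sets e.g. $target, $beam, $momentum, ...
    esac
done
echo "producing ${target}+${beam} at ${momentum} GeV/c (year ${year})"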

18 Web interface (prototype) [screenshot]

19 Web interface (prototype)
- Web interface to the production DB: http://na61cld.web.cern.ch/na61cld/cgi-bin/start?reaction=Be|Be|158|11
- Experimenting with the best interface/usability for different use cases
  - Make it "intuitive" and easy to use
- Have not put effort into making it "look" good
  - But should be easy to do, since the design relies on style sheets (CSS)
- Think "reaction" and "production" are the main entities to build the interface around
- Currently can only display information
  - Will add the ability to log in for starting productions, etc.
  - CERN single sign-on is probably the best option
- Trying to create a script that will import information about already existing productions into the database
- Current implementation is slow since the DB is opened/closed for every query, but with proper language bindings the DB can be kept open for multiple queries

20 General plan forward
- Complete the CernVM BeBe160 test production
- Finish the outstanding issues with the automatic production script
- Finalise the web interface for data production
  - Add functionality, improve performance

21 Proposal for production directory structure (after moving to Shine)
- Preferably, all unique production parameters should be encoded in the path to avoid conflicts
- A deep directory structure is, however, undesirable
- Proposal: divide the directory path into four levels: "type", "reaction", "reconstruction conditions" and "file type":
  /castor/cern.ch/na61/<type>/<target>_<beam>_<momentum>_<year>/<key>_<shine>_<mode>_<os>_<source>/<file type>/run-<run>x<chunk>.<file type>
- Examples:
  /castor/cern.ch/na61/prod/Be_Be_158_11/040_v0r5p0_pp_slc5_phys/shoe.root/run-012345x678.shoe.root
  /castor/cern.ch/na61/test/LHT_p_158_11/020_v0r5p0_pp_cvm2_sim/log.bz2/run-987654x321.log.bz2
- Advantages:
  - Separates the test productions from the "real" productions
  - Easier to get an overview
    - nsls /castor/cern.ch/na61/prod will show all existing reactions
    - nsls /castor/cern.ch/na61/prod/<reaction> will show all productions for a reaction
  - Parameters are organised in order of "importance"
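
For reference, the output path of a chunk could then be assembled directly from the production parameters, e.g. (a sketch; the variable names are assumptions, and the values reproduce the first example above):

#!/bin/bash
# Build the Castor output path from the production parameters
type=prod; target=Be; beam=Be; momentum=158; year=11
key=040; shine=v0r5p0; mode=pp; os=slc5; source=phys
filetype=shoe.root; run=012345; chunk=678
path="/castor/cern.ch/na61/${type}/${target}_${beam}_${momentum}_${year}"
path="${path}/${key}_${shine}_${mode}_${os}_${source}/${filetype}/run-${run}x${chunk}.${filetype}"
echo "$path"
# -> /castor/cern.ch/na61/prod/Be_Be_158_11/040_v0r5p0_pp_slc5_phys/shoe.root/run-012345x678.shoe.root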

