DUNE Software and Computing News and Announcements
Tom Junk
DUNE Software and Computing General Meeting
February 2, 2016
New Web Sites
- dune-data.fnal.gov
  - Monte Carlo: Challenge 5.0 and future MC; MC samples and tiers
  - Data files from the 35-ton prototype
  - File list, automatically updated from the file transfer script
  - samweb usage tips: tells you how to access files! (sketch below)
- dune-young.hep.net
  - Content copied from lbne-young.hep.net (still not up to date)
- lbne-dqm.fnal.gov
  - Online and Nearline monitoring for the 35-ton
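A hedged sketch of basic samweb usage for finding 35-ton files; the dataset-definition name is hypothetical, and the examples assume samweb is already set up in your DUNE environment:

    samweb -e dune list-definitions | grep 35ton            # browse existing dataset definitions
    samweb -e dune list-files "defname: my_35ton_dataset"   # list files in a (hypothetical) definition
    samweb -e dune locate-file somefile.root                # find where a given file actually lives

See the samweb usage tips on dune-data.fnal.gov for the real definition names and access patterns.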
New Build Node: dunebuild01.fnal.gov
- 16 cores! (AMD Opteron 6320), 32 GB of RAM, 5 GB of swap
- To be used for building code only (we'll watch for misuse)
- mrb i -j16 now gives you a big boost in speed (sketch below)
- dCache disks are not mounted; /dune/data and /dune/data2 however are still mounted
- /build/ has 2.8 TB in it; not yet clear how to use this effectively. Let Tom know if you need something different on it
- 16 cores was chosen based on Lynn Garren's build speed test:
  - With builds using BlueArc (/dune/app), more than 16 cores gives diminishing returns in speed due to disk I/O bottlenecks
  - That, and the fact that machines with more than 16 cores are even less available than the one we got with 16 cores
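A minimal build sketch on the new node, assuming you already have an mrb development area; the setup-script path and the development-area path are assumptions, so substitute your own:

    ssh dunebuild01.fnal.gov
    source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh   # assumed location of the standard DUNE setup
    cd /dune/app/users/$USER/mydev                                       # hypothetical mrb development area
    source localProducts*/setup
    mrbsetenv
    mrb i -j16     # install build using all 16 cores

The -j16 matches the core count of dunebuild01; on a shared gpvm a smaller value is friendlier.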
New Redmine Sites
- dunebsm: Exotic Physics with DUNE
- dunefgt: Fine-Grained Tracker
- dunelbl: Long-Baseline Physics WG
- dunendk: Nucleon Decay
- HighLAND Analysis Tool
- WA105 Dual-Phase protoDUNE
CILogon Certificates
- Replacing OSG Grid certificates: DUNE VO user entries with OSG Grid certificates have now been given entries for CILogon certificates
- Current OSG Grid certificates remain valid until their expiration: no need to hurry and get a replacement CILogon certificate, but the next time it's refreshed there will be a new procedure
- Eileen and Anne have contacted certificate users of the DocDBs and gave instructions for obtaining and using CILogon certificates with the DocDBs
- CILogon will replace KCA certificates too
  - The jobsub client called kx509 to generate short-lived certificates using the user's Kerberos ticket
  - Other uses, like SAM, required the user to execute kx509 or get-cert.sh (which calls kx509) to get a certificate (sketch below)
  - Jobsub use of CILogon is "to be transparent to the users"
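A hedged sketch of the KCA-era certificate/proxy workflow described above; the VOMS group/role string is an assumption, and CILogon is intended to make this step transparent when going through jobsub:

    kinit $USER@FNAL.GOV                                           # Kerberos ticket
    kx509                                                          # short-lived x509 certificate from the ticket
    voms-proxy-init -noregen -rfc -voms dune:/dune/Role=Analysis   # assumed VO string; check with the VO admins
    voms-proxy-info -all                                           # verify the proxy and its VO attributes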
AFS at Fermilab is being shut down Feb. 25, 2016
- Web sites at /afs/fnal.gov/files/expwww are migrated to the NFS storage area /web/sites/
  - Available on FNALU and dunegpvm01 (but not the other dunegpvms)
- Home areas in /afs/fnal.gov/home/room[1,2,3]/username are being replaced with other networked storage
- I was never fond of our AFS home areas anyhow:
  - Very small quotas in the home area: 500 MB (!)
  - The authentication token, which expires after 26 hours, has caused user confusion, and it has its own syntax for managing it
  - Want to know your quota? fs lq (sketch below)
  - Not available on grid workers (and we wouldn't want that for the replacement anyhow)
- Backups in /afs/fnal.gov/files/backup/home
- AFS documentation (becoming irrelevant)
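A quick sketch of the AFS token and quota commands behind those complaints (standard Kerberos/AFS client commands; the aklog step may already happen for you at login):

    kinit $USER@FNAL.GOV    # Kerberos ticket
    aklog                   # turn the ticket into an AFS token (the thing that expires after ~26 hours)
    tokens                  # list your AFS tokens and their expiration times
    fs lq ~                 # "fs listquota": show quota and usage of your AFS home area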
New Home Areas and Web Sites
- We used to have personal "professional" web areas in AFS, in ~/public_html/ (index.html, for example), accessed via a personal web URL
- Directory listings over http are disabled without a special Service Desk request
- Now there are NFS web areas: dunegpvm*:/publicweb/<l>/<username>/ where <l> is the first letter of your user ID (= Kerberos principal) (sketch below)
- Backups in /publicweb/.snapshot in case you accidentally delete something
- Home area snapshots and backups in the post-AFS era are still to be defined and documented
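A minimal sketch of publishing a file to the new area; the user name "tjunk" and the file name are purely illustrative:

    # on a dunegpvm node; the path pattern is /publicweb/<first letter>/<username>/
    cp myplot.png /publicweb/t/tjunk/
    chmod a+r /publicweb/t/tjunk/myplot.png    # make sure the web server can read it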
lbnegpvm*.fnal.gov → dunegpvm*.fnal.gov
- Users were in the lbne group; active or recently active users have been given new accounts in the dune group
- New dunegpvm11 was spun up with the new group and new user list
  - No /lbne/data, /lbne/data2, /lbne/app mounts on the new dune machine; the same areas are mounted under /dune
  - /pnfs/lbne is still mounted (needed, as some files are accessible only that way); same with /scratch/lbne
- Current status: migrated lbnegpvm06 - lbnegpvm10 to dunegpvm machines; gave back dunegpvm11
- lbnegpvm01 through lbnegpvm05 (with dunegpvm convenience names) are being converted as I write this
- Finding missing things (like dCache mounts) and iterating with the Service Desk
BlueArc Dismount on Grid Workers
- Affects us in particular!
- /lbne/data and /lbne/data2 are not mounted on the dunegpvm06-10 machines, but are still mounted on grid worker nodes
- /dune/data and /dune/data2 are not mounted on grid worker nodes (!)
  - These mount points were made after the decision to migrate away from BlueArc on the grid was taken
- Two ways to store your data from grid jobs:
  - ifdh cp it to dCache: /pnfs/dune/persistent/users and /pnfs/dune/scratch/users (sketch below)
    - Ask about tape-backed space! (We prefer SAM so the files won't get lost)
  - ifdh cp the files to BlueArc (many people still do this). This too will be disabled at the end-of-2016 shutdown!
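A hedged sketch of the dCache copy-back pattern inside a grid job; the output file name is illustrative, and ifdh is assumed to be set up as in the usual jobsub environment:

    # inside the grid job, after producing myoutput.root in the local job area
    ifdh cp -D myoutput.root /pnfs/dune/scratch/users/$GRID_USER/      # -D: last argument is a directory
    # for output you want to keep longer than scratch lifetimes allow:
    ifdh cp -D myoutput.root /pnfs/dune/persistent/users/$GRID_USER/

$GRID_USER is assumed to be set by the jobsub wrapper; substitute your user name if it is not.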
Metadata Changes
- Existing data tiers: raw, simulated, detector-simulated, full-reconstructed
- New data tier: sliced
- The slicer/stitcher input source only works on raw data: there is a limited number of data products it has to know how to slice and stitch
- A new problem: the slicer/stitcher reformats events based on a software trigger definition
  - Do we need to store which trigger definition was used in the metadata? Tack it on the end of the detector type string? (sketch below)
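A hedged sketch of what declaring a sliced file to SAM could look like; the file name, the dune.trigger_definition field, and its value are hypothetical (only the data_tier value "sliced" comes from this slide), and a real declaration would also need sizes, checksums, parentage, etc.:

    # inspect the metadata of an existing file for comparison
    samweb -e dune get-metadata some_existing_35ton_file.root     # hypothetical file name

    # declare a new sliced file (incomplete, illustrative metadata)
    cat > metadata.json <<'EOF'
    {
      "file_name": "lbne_r001234_sr01_sliced.root",
      "file_type": "detector",
      "data_tier": "sliced",
      "dune.trigger_definition": "software_trig_v1"
    }
    EOF
    samweb -e dune declare-file metadata.json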
A Good-Run List Proposal
- So far only the 35-ton has data and thus needs a good-run list
- One person's bad data is another person's good data
- Alex Himmel suggested it would make SAM dataset queries simpler if good-run status were part of the metadata
- We can request a new good-run metadata field: an arbitrary string so we can encode various kinds of goodness or badness (sketch below)
- CDF had good-run lists that were distributed as ROOT trees and text files
  - It didn't make sense to limit public datasets to a particular good-run set, because runs would be re-classified and it takes a long time to reprocess everything
- The good-run list needs curation. Who decides? A shift tool? A Data Quality team is needed to make judgments
- For the 35-ton, we probably want analyzers to be tightly coupled to the data taking
  - Label special data runs for special analyses and record the run numbers and ranges that are intended for subsequent analyses
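A hedged sketch of what a good-run-aware SAM query might look like; the field name dune.good_run_status and its value are hypothetical, since nothing has been defined yet:

    samweb -e dune list-files "data_tier raw and dune.good_run_status good"
    samweb -e dune create-definition my_good_35ton_raw "data_tier raw and dune.good_run_status good"

Keeping the status in metadata means a re-classified run changes future query results without reprocessing anything.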
FIFE News
- Summer 2016 FIFE Workshop during the week of June 20
- Fermilab GPGrid new features: partitionable slots, priority queueing instead of quotas (see the CS Liaison slides, CSLiaison_01_13_16.pdf)
- Job efficiency links
Job Resource Limits Enforced on FNAL GPGrid
- Last year the grid was more forgiving about going over the limits on:
  - time (not CPU; wall-clock time is what counts)
  - virtual memory size
  - disk space used
- Now these limits are enforced: see the documentation page for examples of how to ask for resources and for links to more documentation (sketch below)
- What happens if your job goes over a limit? It doesn't get killed, but rather gets held
  - To find out what went wrong: jobsub_q --held --user=<username>
- You can use fifemon.fnal.gov to monitor how many jobs you have in each state
- Policy may be different on non-FNAL OSG sites
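A hedged sketch of requesting resources explicitly at submission time; the option values and the job-script path are illustrative, so check the jobsub documentation for the exact syntax supported by your jobsub_client version:

    jobsub_submit -G dune \
        --memory=2000MB \
        --disk=10GB \
        --expected-lifetime=8h \
        file:///dune/app/users/$USER/myjob.sh     # hypothetical job script

    # a job that exceeds its request goes Held; investigate with:
    jobsub_q -G dune --held --user=$USER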
Very minor...
- Users in the LBNE VO are getting e-mails saying that their AUP (Acceptable Use Policy) signatures are expiring (after 1 year)
- Users can ignore these and use the DUNE VO instead
/dune/app
- Filled up briefly yesterday
Reminder: DAQ Workshop at CERN
- Dates: in February, at CERN
- DAQ hardware, software, and offline computing infrastructure
- Ask Maxine about site access for non-CERN users