Mu2e and Muon g-2: FIFE Plans and Issues
Rob Kutschke and Adam Lyon
FIFE Workshop, June 15, 2013
Mu2e-doc-4269-v1

Mu2e and Muon g-2
I am speaking for Mu2e and presenting some slides on behalf of Adam Lyon for Muon g-2.

Most Important Slide
Thanks to the FIFE team and related service providers for all of their hard work.
– We have used lots of resources.
– We have received lots of help and advice, some of it on a rush basis.
– In most cases we have very good communication.
We very much appreciate all of it. There is a lot more work to do …

Mu2e Timeline
We are in the P5 plan in all scenarios!
– Some stretch-out possible
CD-2/3 Director's Readiness Review: July 8-11
CD-2/3 DOE Review: August
– CD-3 only for selected subsystems
– Other subsystems: CD-3 mini-review over ~1 year
– Continue heavy simulations through this period
Projected construction start this fall
Projected engineering runs ~

Grid Usage - 1
GPGrid and CDFgrid
– Average ~1000 slots active at any time
– Peaks to > 4000 slots
– Likely to continue for at least a year (but see next page …)
Two major classes of jobs
– CPU heavy, little IO
– IO dominated
– Dominated by "wait for lock" to access bluearc (the sketch below illustrates the pattern)
– Stage in O(10 GB) and compute for only 1 or 2 hours
– Often hundreds of processes waiting to stage in files
– Expect that dcache will help a lot – testing now
– Expect the fraction of IO-dominated jobs to increase
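The "wait for lock" pattern is easy to picture. Below is a minimal sketch, assuming a lock file on the shared volume; this is not Mu2e's actual staging tool, just the general pattern of serializing copies so the filer is not overwhelmed – which is also why hundreds of simultaneously started jobs end up queued behind one another:

```python
import fcntl
import shutil
import time

def stage_in_with_lock(lock_path, src, dst):
    """Copy a large input file while holding an exclusive lock.

    Serializing the copies protects the shared filer (bluearc) from
    being hammered by many readers at once, but it also means each
    newly started job blocks until every job ahead of it has staged in.
    """
    with open(lock_path, "w") as lock_file:
        t0 = time.time()
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # block until we own the lock
        print(f"waited {time.time() - t0:.0f}s for the stage-in lock")
        shutil.copy(src, dst)                  # the actual stage-in
    # lock is released when lock_file is closed
```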

Grid Usage - 2
A new need has arisen: known weak points in cosmic ray veto coverage
– Over the river; off the expressway; through the window; off the scoreboard; nothin' but net.
Need to simulate at least 10 x the live time of the experiment with targeted simulations (arithmetic below)
– ~4,000,000 CPU hours over 6+ months
– ( = 1000 cores for ~5.6 months )
Learning to use offsite OSG will be important
– We will initiate this in the next weeks
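For reference, the core count quoted above follows directly from the total:

\[
\frac{4\,000\,000\ \text{CPU-hours}}{1000\ \text{cores}}
  = 4000\ \text{hours per core}
  \approx \frac{4000}{24 \times 30}\ \text{months}
  \approx 5.6\ \text{months}.
\]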

Grid Issues
Intermittent problem: jobs that die for reasons we do not understand – O(1%).
– Hundreds of processes in one cluster will complete successfully and a few will die seconds after starting.
– Current ticket: INC INC
Mu2e processes sit idle when there are thousands of free slots on both CDF Grid and GP Grid.
Wait-for-lock issue from page 5 – hopefully dcache is enough mitigation.
Have not yet tried client-server jobsub.

FermiCloud
One important use case: making pre-mixed background overlays
– IO limited: 1000 x [ O(8 GB) in, O(8 GB) out ]
– Enough additional degrees of freedom to safely oversample many times (conceptual sketch below).
– Needs a machine with about 8 GB peak virtual size
– Because of IO limitations we can only use a few such machines at one time.
– FermiCloud has been a great solution
This will be an intermittent but ongoing need.
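A conceptual sketch of the oversampling idea (the real premixing runs inside art; the names and numbers here are invented for illustration): a finite pool of simulated background events can be reused many times because each reuse draws fresh values for the free parameters, such as an overlay time within the live gate.

```python
import random

def oversample_backgrounds(pool, n_overlays, events_per_overlay, live_gate_ns=1000.0):
    """Reuse a finite pool of simulated background events many times.

    Each reuse draws a fresh random time offset (one of the 'extra
    degrees of freedom'), so the same underlying event never enters
    two overlays in exactly the same way.  Conceptual sketch only.
    """
    for _ in range(n_overlays):
        overlay = []
        for _ in range(events_per_overlay):
            event = random.choice(pool)              # resample with replacement
            t0 = random.uniform(0.0, live_gate_ns)   # randomize the overlay time
            overlay.append((event, t0))
        yield overlay
```

Because each pass re-randomizes those parameters, the samples stay statistically useful well beyond the size of the underlying pool.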

Disk - 1
Bluearc: 82 TB data, 1.4 TB app - full
Recently started to use dcache scratch
– SCD response was fast
– Some glitches but we are working through them
– Some feature requests in the ifdh queue
We are gambling that we can use dcache scratch as a safe place to stash files for a few more weeks, until we have our SAM instance configured.
Migrating our high-IO jobs to stage in large files from dcache, not from bluearc (a sketch follows).
Will also look at xrootd.
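A minimal sketch of what such a stage-in can look like from inside a job script. `ifdh cp` is the standard IFDH copy command; the dCache path and file name below are invented placeholders:

```python
import subprocess

def stage_in(src, dst):
    """Copy one large input file to local worker-node disk via ifdh,
    which picks the appropriate transfer mechanism for the source."""
    subprocess.check_call(["ifdh", "cp", src, dst])

# Hypothetical file in the dcache scratch area (path is a placeholder):
stage_in("/pnfs/mu2e/scratch/users/someone/big_input.art", "./big_input.art")
```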

Disk – 2
We renew our request for individual bluearc quotas.
– For now, there is only a group quota.
– We would like to be able to control individual quotas ourselves, i.e. without a service desk ticket.
Under normal operating conditions we have a few spare TB of bluearc disk space.
– Not all of it should be written to tape – in fact, most of it should not.
– It's easy for someone who is not paying attention to use it all and cause everyone else's work to fail.
– Expect this to get worse as more people join.
– dcache can help, but it is not really the complete answer.

SAM
We have had the initial meetings to set up SAM.
On our plate:
– Decide on file families, metadata, and policy for what gets written to tape (a sketch of what a file declaration might look like follows).
– Ray Culbertson has prepared a draft plan.
– Expect to make decisions O( <= 1 week).
I understand that we can be up and running O(1 week) after that.
A decision was made a few weeks ago that new files will be written to T10000 tapes.
Then we will need to adapt scripts, O(1 week) including debugging in the field.
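Once those decisions are made, registering a file with SAM amounts to attaching a metadata record to it. Below is a sketch using the samweb command-line client; the field values are invented placeholders, and exactly which fields are required is experiment policy rather than anything fixed by SAM:

```python
import json
import subprocess

# Invented example metadata for one art output file; the real field
# list (data tiers, file families, ...) is what the draft plan decides.
metadata = {
    "file_name": "sim_cosmic_veto_0001.art",
    "file_type": "mc",
    "file_format": "art",
    "file_size": 2147483648,
    "data_tier": "sim",
}

with open("md.json", "w") as f:
    json.dump(metadata, f)

# Declare the file to the experiment's SAM instance.
subprocess.check_call(["samweb", "-e", "mu2e", "declare-file", "md.json"])
```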

art
Overall very happy with art.
Many bug-fix and feature requests:
– Things that are both important and easy to do are usually dealt with quickly.
– In recent months NOvA has had priority, so other work has been delayed (understandably so).
– There is a large and growing backlog of requests, including many Mu2e requests.
– The client base is growing.
– The art team needs to grow or it will not be able to keep up.

jobsub
We have our own submit scripts that live on top of jobsub (a sketch of the pattern follows):
– mu2eart
– mu2eg4bl
– mu2emars
So far we only use them for the GP and CDF grids.
Good response on feature requests.
Have not yet exercised client-server. Will do so as soon as we can, post-TDR.
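The wrapper pattern itself is simple: the experiment-specific script assembles its defaults and delegates the rest to jobsub. A hypothetical sketch – the wrapper name and the jobsub flags shown are invented placeholders; only the idea of layering on top of jobsub comes from the slide:

```python
import subprocess
import sys

def mu2e_submit(user_script, njobs, extra_args=()):
    """Hypothetical wrapper in the spirit of mu2eart / mu2eg4bl /
    mu2emars: fill in experiment boilerplate, then hand off to jobsub.
    The flags below are placeholders, not a documented interface."""
    cmd = ["jobsub", "-N", str(njobs), user_script]  # -N: number of jobs (placeholder)
    cmd.extend(extra_args)
    subprocess.check_call(cmd)

if __name__ == "__main__":
    mu2e_submit(sys.argv[1], njobs=100)
```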

ifdh
We use it and it generally works well.
Periodic issues and good response.
We have an issue open now regarding stage-in from mixed sources (both bluearc and dcache).

OASIS/CVMFS
On our list to deploy soon.
We know that we need it for easiest offsite running.

Conditions Databases
Three needs:
– Fall 2014: DBs for travellers for the construction project. DBs will be O(a few GB?)
– ~?: conditions DB for prototyping the conditions system and learning to do alignment (toy sketch below). Guess that the scale is O(100 GB)?
– ??: test beam conditions data. Guess that the scale is < O(1 GB)
GUIs for data entry for the traveller DBs:
– Historically done by postdocs/grad students (aka free labor) on the project, but Mu2e is too far from data to have many of these.
– Is this available within FIFE? Elsewhere in SCD?
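To make the second need concrete: the core of any conditions system is an interval-of-validity lookup – given a run (or time), return the calibration or alignment payload that was valid then. A toy sketch with invented contents; real conditions databases add tags, channels, and versioning on top of this:

```python
import bisect

class ConditionsTable:
    """Toy interval-of-validity (IOV) lookup: each payload is valid from
    its starting run until the next entry's starting run."""

    def __init__(self, entries):
        # entries: list of (first_valid_run, payload), sorted by run
        self._runs = [run for run, _ in entries]
        self._payloads = [payload for _, payload in entries]

    def lookup(self, run):
        i = bisect.bisect_right(self._runs, run) - 1
        if i < 0:
            raise KeyError(f"no conditions for run {run}")
        return self._payloads[i]

# Invented alignment constants, keyed by first valid run:
alignment = ConditionsTable([
    (1000, {"tracker_dx_mm": 0.00}),
    (1500, {"tracker_dx_mm": 0.12}),  # realignment after an access
])
print(alignment.lookup(1723))  # -> {'tracker_dx_mm': 0.12}
```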

Other
ELOG
– We have an instance but it is not heavily used.
– Expect use to grow as the testbeam starts.
– Not clear yet whether the construction projects will use it.

Not Sure If This is the Right Venue …
The Mu2e collaboration asked for a discussion forum technology ~2 years ago.
– I understand that a prototype is being designed in SharePoint 2013, but I have not yet heard when it might be available to test.
Our outside collaborators have urged me to talk about this in the strongest possible terms – they are already telling their colleagues that FNAL CS just does not care.

FIFE & Muon g-2
Adam Lyon for the Muon g-2 collaboration

What we use now
We're a big user, supporter, and cheerleader for the art framework.
– Simulation and offline code is under development in art.
– Our ArtG4 is now supported by the SCD Simulation group.
Our development environment has become "mrb", in use by LArSoft and others.
We've had new users go through the art workbook and they like it.

What we use now
We rely on Redmine and git (writing the conceptual and technical design reports was managed by git – even the spokespersons learned git [though not happily]).
We use CMake for builds (same as art itself).
We deliver applications/libraries with CVMFS (including to our Mac laptops; the Cornell U and U Washington groups use OASIS to get releases on their interactive nodes).

A glaring omission
Several collaborators want to use their Windows laptops (including a spokesperson).
– The lab basically does not support Kerberos on non-lab-owned Windows laptops, and there is no good free Kerberos solution for Windows.
– Installing and using Kerberos via Cygwin is painful.
– Some people don't appreciate being told to buy a Mac (but of course that's the right thing to do).
Our solution:
– The Windows user installs VirtualBox (a free virtualization system - easy).
– Download and install our specially prepared SLF6 VBox image file.
– This gives the user a virtual SLF6 machine that comes with Kerberos, ssh, git, LaTeX, X windows, and CVMFS OASIS automounted – all installed and instantly ready to go. Files can be shared with the Windows side.
– This solution has worked nicely, but we wish we didn't have to do this.

Plans for this summer
Adopt SAM/IFDH/dCache:
– Right now we're still using bluearc via jobsub's nice copy-back mechanism (which uses IFDH).
– Plan to put important simulations on tape and use the scratch cache.
– SAM is already set up and is waiting for us.
Try the new JobSub.
Try the Build Facility.
Going slowly because people are busy with work on the TDR and reviews.

Plans
By fall we'll be a fully on-boarded OSG experiment using SAM, dCache, etc.
Thanks for everyone's hard work and help!

Backup Slides

Computing Projections
Continue the current CPU and IO load for about another year to finalize the design:
– Background rates as we modify details of shielding. Shielding for system A can increase rates in system B!
– Weak points in the cosmic ray veto system.
– Improve robustness of reco algorithms in higher-background scenarios.
– Calibrate out some effects that are modeled in the simulation but not yet in reco.

Mu2e Background Model
1 event = 3.1 x 10^7 protons on target.
~3,000 hits in the tracker during the live gate.
Simulate one proton at a time.
– Done in 6 stages: 4 simulation + premix + reco/analysis. The last 2 are IO heavy; premixing requires 8 GB peak virtual size.
– Some stages resample the output of the previous stage many times.
– Lots of knobs and dials to increase the effective statistics, so that we can reach effective statistics of many millions of events (see the note below).
– Form samples of pre-mixed background to overlay on signal events.
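One way to see how the resampling compounds: if stage $i$ reuses each of its inputs $f_i$ times, then after $n$ resampling stages the effective statistics are

\[
N_{\text{eff}} = N_{\text{generated}} \times \prod_{i=1}^{n} f_i ,
\]

so, with purely illustrative numbers, $10^4$ generated protons resampled by a factor of 10 at each of three stages give an effective $10^7$ events.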