Margaret Votava / Scientific Computing Division FIFE Workshop 20 June 2016 State of the Facilities.

1 Margaret Votava / Scientific Computing Division FIFE Workshop 20 June 2016 State of the Facilities

2 Reminder: How we plan
Process: SPPM, SC-PMT, and
SC-PMT is the annual process in which experiments make computing requests, and these are reviewed by [lab] committee against available resources
– https://fermipoint.fnal.gov/project/sppm/SitePages/Home.aspx
SPPM weekly meetings provide opportunities for updates to be handled. Reviewed by SCD management.
The SPPM process is the fundamental means by which we align Facility operations with the P5 / Lab program
6/20/2016 FIFE Workshop - State of the Facilities 2

3 Communication with the experiments
Information was provided on FY15 usage: processing, disk & tape storage, network, servers, …
Information was requested for FY16, FY17, FY18
New this year was the problem of large memory requirements
– Experiments were asked to provide info on both <2GB and >2GB processing requests
Current “job slot” sizing assumes 1 core, 2 GB
Large memory jobs influence the resource needs
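
To illustrate why large memory jobs influence the resource needs: with a slot sized at 1 core and 2 GB, a job that needs more memory effectively ties up more than one slot. The sketch below assumes a simple round-up rule; the function name `slots_consumed` and the rule itself are hypothetical and are not taken from the batch system's actual accounting.

```python
import math

# From the slide: one "job slot" is sized at 1 core and 2 GB of memory.
SLOT_CORES = 1
SLOT_MEMORY_GB = 2.0


def slots_consumed(cores: int, memory_gb: float) -> int:
    """Standard slots tied up by one job (hypothetical round-up rule)."""
    by_cores = math.ceil(cores / SLOT_CORES)
    by_memory = math.ceil(memory_gb / SLOT_MEMORY_GB)
    # Whichever resource is scarcer for this job decides the slot count.
    return max(by_cores, by_memory)


if __name__ == "__main__":
    # Illustrative requests only; these are not figures from the experiments.
    for cores, memory_gb in [(1, 2.0), (1, 3.5), (1, 8.0), (4, 6.0)]:
        n = slots_consumed(cores, memory_gb)
        print(f"{cores} core(s), {memory_gb} GB -> {n} slot(s)")
```

Under this rule a single-core job asking for 8 GB occupies as many slots as four standard jobs, which is why the memory split was requested separately.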

4 Information provided by the experiments
Information received in the form of spreadsheets and/or presentations from:
Annie, Captain-Minerva, CDF, CDMS / SCDMS, CHIPS, COUPP, DES, DUNE, DZERO, Genie, Muon g-2, Holometer, LArIAT, LSST, MARS (for Mu2e, g-2), MicroBoone, Minerva, Minos, Mu2e, Nova, Numi-X, Patriot, SBND, Seaquest
See https://fermipoint.fnal.gov/project/sppm/Working%20Group%20Documents/Forms/2016pres.aspx
Relevant to KA22
Represent the initiatives at Experiment, Test, or Proposal stages

5 Facility resources – the total picture
Will step through each of these components, noting capacities and effort
Processing, Disk Systems, Tape Storage, Networks

6 Facility resources – Tape storage
Tape Storage
Capacity: 7x 10,000 slot libraries
– With 5.4 TB T10000C ~ 375 PB
– With 8.5 TB T10000D ~ 595 PB
Allocation:
– General purpose: 4 libraries
– CMS: 3 libraries
Current usage
– General purpose ~ 22 PB
– CMS ~ 43 PB
– Legacy CDF, DZero ~ 30 PB (includes migration duplicates)
Storage policy is to migrate data to denser medium when capacity ~ doubles
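
The capacity figures follow from libraries × slots × cartridge size. A minimal sketch of that arithmetic, assuming decimal units (1 PB = 1,000 TB) and that every slot holds a full data cartridge; the helper name `raw_capacity_pb` is illustrative. With 5.4 TB cartridges this gives 378 PB, which the slide rounds to ~ 375 PB.

```python
# Figures quoted on the slide.
LIBRARIES = 7
SLOTS_PER_LIBRARY = 10_000
CARTRIDGE_TB = {"T10000C": 5.4, "T10000D": 8.5}
CURRENT_USAGE_PB = {"General purpose": 22, "CMS": 43, "Legacy CDF, DZero": 30}

TB_PER_PB = 1_000  # assumption: decimal units


def raw_capacity_pb(tb_per_cartridge: float, libraries: int = LIBRARIES) -> float:
    """Capacity if every slot in the given libraries holds a full cartridge."""
    return libraries * SLOTS_PER_LIBRARY * tb_per_cartridge / TB_PER_PB


if __name__ == "__main__":
    for media, tb in CARTRIDGE_TB.items():
        print(f"{media}: {raw_capacity_pb(tb):.0f} PB across all libraries")
    # Per-allocation capacity on the denser medium (4 GP libraries, 3 CMS).
    print(f"General purpose, T10000D: {raw_capacity_pb(8.5, libraries=4):.0f} PB")
    print(f"CMS, T10000D: {raw_capacity_pb(8.5, libraries=3):.0f} PB")
    print(f"Current usage, all categories: {sum(CURRENT_USAGE_PB.values())} PB")
```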

7 SC-PMT : Tape Usage and Requests
Tape media and library cost is ~ $30 / TB ($30K / PB) (T10KD, 8.5TB/tape)
Table below shows *addition* of:
Fiscal year   Addition   Cost
FY16          16 PB      $480K
FY17          17 PB      $510K
FY18          29 PB      $870K
Tape is a major cost! But needs are not easily predicted
Much of this originates from protoDUNEs at CERN at 200 MB/s rate
SC-PMT Process; SC-PMT Slides
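
The cost column is the requested addition multiplied by the ~ $30 / TB figure. The sketch below reproduces that arithmetic and, as a separate illustration, converts the quoted 200 MB/s rate into a daily volume. It assumes decimal units and, for the rate, continuous writing around the clock, which the slide does not state; the function names are illustrative.

```python
COST_PER_TB_USD = 30  # from the slide: ~ $30 / TB for media and library
TB_PER_PB = 1_000     # assumption: decimal units

# Requested additions per fiscal year, as listed in the table.
REQUESTED_ADDITIONS_PB = {"FY16": 16, "FY17": 17, "FY18": 29}


def tape_cost_usd(petabytes: int) -> int:
    """Media and library cost for the given amount of added tape."""
    return petabytes * TB_PER_PB * COST_PER_TB_USD


def tb_per_day(rate_mb_per_s: float) -> float:
    """Daily volume at a sustained rate (assumes 24 hours of continuous writing)."""
    seconds_per_day = 24 * 60 * 60
    return rate_mb_per_s * seconds_per_day / 1_000_000  # MB -> TB


if __name__ == "__main__":
    for year, pb in REQUESTED_ADDITIONS_PB.items():
        print(f"{year}: {pb} PB -> ${tape_cost_usd(pb) // 1_000}K")
    total_pb = sum(REQUESTED_ADDITIONS_PB.values())
    print(f"Total: {total_pb} PB -> ${tape_cost_usd(total_pb) // 1_000}K")
    print(f"200 MB/s sustained: {tb_per_day(200):.2f} TB/day")
```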

8 Facility resources – Disk storage
Disk Systems
Cache disk (dCache):
– General purpose ~ 8.5 PB
– CMS ~ 22 PB
– Legacy CDF ~ 1.5 PB
Project/user disk (NAS):
– General purpose ~ 2 PB
– CMS (EOS) ~ 4 PB
– Legacy CDF, DZero ~ 1 PB
HPC disk (Lustre):
– LQCD ~ 1 PB

9 Disk storage utilization
Cache disk configuration matches usage patterns
– Sized (and monitored) to meet cache retention lifetimes
Cache disk utilization monitored to reflect priorities of experiments

12 Disk and tape storage transfers
Cache disk transfers monitored to reflect priorities of experiments

13 Facility resources - Processing
Processing
“Worker node” core counts:
– General purpose: 16,608 cores (GP Grid)
Also manage:
– 17,872 cores (CMS Tier-1)
– 4,984 cores (CMS LPC)
– 28,240 cores (HPC / LQCD)
– 2,008 cores (DZero legacy)
Currently in the process of re-allocating 4,500 decommissioned cores from LQCD to GP Grid
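
For a sense of scale, the core counts can be totalled and the GP Grid share worked out. This is a sketch of the arithmetic only. It assumes the 4,500 re-allocated cores are added on top of the 16,608 GP Grid figure; the slide does not say whether they are already counted anywhere, so the "after" number is an assumption.

```python
# Core counts quoted on the slide.
CORES = {
    "GP Grid": 16_608,
    "CMS Tier-1": 17_872,
    "CMS LPC": 4_984,
    "HPC / LQCD": 28_240,
    "DZero legacy": 2_008,
}
REALLOCATED_TO_GP = 4_500  # decommissioned LQCD cores moving to GP Grid


def total_cores(cores: dict = CORES) -> int:
    """Sum of all worker-node cores across the listed pools."""
    return sum(cores.values())


def share_percent(pool: str, cores: dict = CORES) -> float:
    """Percentage of all listed cores that belong to one pool."""
    return 100.0 * cores[pool] / total_cores(cores)


# Assumption: the re-allocated cores are in addition to the current GP count.
gp_after_reallocation = CORES["GP Grid"] + REALLOCATED_TO_GP

if __name__ == "__main__":
    print(f"Total cores managed: {total_cores():,}")
    print(f"GP Grid share today: {share_percent('GP Grid'):.1f}%")
    print(f"GP Grid after re-allocation (assumed): {gp_after_reallocation:,}")
```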

14 GP Grid Processing requests: Large memory or multi-core as single slot
Chart annotations: Last year’s SC-PMT, 2016 Capacity, 2015 Capacity

15 CPU Resources: Age of worker nodes
Out of warranty!
SC-PMT Process; SC-PMT Slides

17 FY16: Ongoing work
Experiments meeting their FY16 computing demands
– Projected data cataloguing 2x FY15
– Projected CPU hours 40% more than FY15
Revamped batch system configuration to better support jobs with large memory footprints (> 2GB)
– New configuration results in reduced availability of nodes while slots are dynamically reconfigured: ~10% CPU efficiency loss
– Temporarily suspended management of quotas. Working to reinstate.
– Continually working to reduce the number and impact of high memory jobs
Completed Phase II of monitoring overhaul
Moving to federated identity support for batch processing

18 FY16: How much computing so far
Over 83M wall clock hours

19 FY16: Memory footprint
Large footprint jobs since Jan 2016

20 FY16: 7% CPU efficiency gained

21 Running jobs – onsite and offsite

22 Running jobs – offsite only

23 Reporting Computing at All Experimenters Meeting

24 Summary
We will not have enough CPU in FY17 to satisfy requests on site.
We will follow lab/P5 priorities (see SCPMT recommendations).
– Efficiency of your jobs matters
– Success rate of your jobs matters
– Memory footprints of your jobs matter
– Run offsite regularly; keep the pump primed.
Will be working on pre-emption and priority policies
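
The three job metrics named above can be checked from per-job accounting records. The sketch below is one possible way to compute them. The `JobRecord` fields, the sample jobs, and the definition of CPU efficiency as CPU hours divided by (wall hours × cores) are assumptions for illustration, not the facility's actual accounting schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical per-job accounting record."""
    cpu_hours: float
    wall_hours: float
    cores: int
    peak_memory_gb: float
    succeeded: bool


def cpu_efficiency(jobs: list[JobRecord]) -> float:
    """CPU hours used divided by core-hours reserved (assumed definition)."""
    reserved = sum(j.wall_hours * j.cores for j in jobs)
    return sum(j.cpu_hours for j in jobs) / reserved if reserved else 0.0


def success_rate(jobs: list[JobRecord]) -> float:
    """Fraction of jobs that finished successfully."""
    return sum(j.succeeded for j in jobs) / len(jobs) if jobs else 0.0


def large_memory_fraction(jobs: list[JobRecord], threshold_gb: float = 2.0) -> float:
    """Fraction of jobs whose peak memory exceeds the 2 GB slot size."""
    if not jobs:
        return 0.0
    return sum(j.peak_memory_gb > threshold_gb for j in jobs) / len(jobs)


# Made-up sample jobs, for illustration only.
SAMPLE_JOBS = [
    JobRecord(cpu_hours=8.0, wall_hours=10.0, cores=1, peak_memory_gb=1.5, succeeded=True),
    JobRecord(cpu_hours=3.0, wall_hours=10.0, cores=1, peak_memory_gb=3.0, succeeded=False),
    JobRecord(cpu_hours=18.0, wall_hours=5.0, cores=4, peak_memory_gb=6.0, succeeded=True),
    JobRecord(cpu_hours=9.0, wall_hours=10.0, cores=1, peak_memory_gb=1.8, succeeded=True),
]

if __name__ == "__main__":
    print(f"CPU efficiency: {cpu_efficiency(SAMPLE_JOBS):.0%}")
    print(f"Success rate: {success_rate(SAMPLE_JOBS):.0%}")
    print(f"Jobs over 2 GB: {large_memory_fraction(SAMPLE_JOBS):.0%}")
```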

25 Take Away – Onsite FY17 resources are tight

26 Let’s make the world a better place
Girl Scout Law: I will do my best to … use resources wisely, … make the world a better place, ….

