
Slide 2. Maria Girone, CERN
- CMS in a High-Latency Environment
- CMSSW I/O Optimizations for High Latency
- CPU efficiency in a real-world environment
  - HLT
  - Wigner
  - AAA
- Outlook

Slide 3
- What follows is the work of a number of people:
  - Optimization and development by CMS Offline (especially Brian Bockelman); more details in "Optimizing High-Latency I/O in CMSSW", CHEP2013
  - CMS Commissioning and Operations (especially David Colling, James Letts, Nicolò Magini, Ian Fisk and Daniele Bonacorsi)
  - All the work from the CMS data federation activity (especially Ken Bloom)

Slide 4
- Even the two halves of the Tier-0 are separated by 30 ms of latency
- To read data efficiently over high-latency connections, CMS has invested effort over the last 3 years in code improvements and I/O optimizations, using ROOT best practices plus further improvements
[Figure: site-to-site latencies of 130 ms, 250 ms and 300 ms]

Slide 5
- CMS has built on the work of the ROOT team on TTreeCache
- In a local environment, ROOT does a good job of reading ahead the objects/branches that are frequently accessed, using a cache
- The cache-training technique does not work well in a high-latency environment: during training (the first 20 events), every object is read with a separate network access
- A typical CMS file has more than 1000 branches; if you read half of them at 130 ms latency, the first event takes longer than a minute to read (20 minutes for the whole training)
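The "longer than a minute" and "20 minutes" figures follow from latency alone. The sketch below reproduces them under the stated assumption of one synchronous round trip per branch per training event, ignoring transfer time; the function name is hypothetical.

```python
# Back-of-the-envelope cost of TTreeCache training over a high-latency link,
# using the numbers quoted on the slide. Assumes one synchronous network
# round trip per branch per event and ignores transfer time (latency only).

def training_latency_seconds(n_branches_read: int, rtt_s: float,
                             n_training_events: int) -> tuple[float, float]:
    """Return (seconds for the first event, seconds for the whole training)."""
    per_event = n_branches_read * rtt_s          # one round trip per branch
    return per_event, per_event * n_training_events

# Half of ~1000 branches, 130 ms RTT, 20 training events.
first_event, training = training_latency_seconds(500, 0.130, 20)
print(f"first event: {first_event:.0f} s, training: {training / 60:.1f} min")
```

With 500 branches at 130 ms this gives about 65 s for the first event and about 1300 s (roughly 20 minutes) for the training window.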

Slide 6
- The two highest-impact techniques used by CMS both rely on filling a secondary TTreeCache with a small number of network calls
- To speed up cache training, CMS loads all branches for the first 20 events in a single call
  - The data are loaded into a (<20 MB) memory cache, which is then used for access
  - More data than needed is read for the first 20 events, but with far fewer calls than fetching individual baskets from every branch
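The trade-off on this slide (more bytes, far fewer calls) can be illustrated with a simple cost model where each request pays one round trip and bytes stream at a fixed bandwidth. This is a sketch: the bandwidth and byte counts are hypothetical values chosen for illustration, and only the 130 ms RTT, the 20-event window and the <20 MB cache size come from the slides.

```python
# Latency + bandwidth cost model comparing two ways of filling the cache for
# the first N events. Bandwidth and byte counts are hypothetical, not
# measured CMS values.

def transfer_time(n_requests: int, n_bytes: float, rtt_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """One round trip per request plus the time to stream the bytes."""
    return n_requests * rtt_s + n_bytes / bandwidth_bytes_per_s

RTT = 0.130             # seconds, as on the previous slide
BANDWIDTH = 100e6       # bytes/s, hypothetical
N_EVENTS = 20           # training window
BRANCHES_NEEDED = 500   # branches the job actually reads
BYTES_NEEDED = 10e6     # hypothetical: bytes of the needed branches over 20 events
BYTES_ALL = 18e6        # hypothetical: all branches for 20 events (< 20 MB cache)

# Default training: one request per needed branch per event.
per_branch = transfer_time(N_EVENTS * BRANCHES_NEEDED, BYTES_NEEDED, RTT, BANDWIDTH)
# Bulk approach: a single request fetching every branch for the first 20 events.
bulk = transfer_time(1, BYTES_ALL, RTT, BANDWIDTH)
print(f"per-branch training: {per_branch:.0f} s, single bulk read: {bulk:.2f} s")
```

Even though the bulk read moves almost twice the bytes, the round-trip term dominates at WAN latencies, which is why reading extra data in one call wins.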

Slide 7
- A common analysis application reads only some of the branches to make a selection (trigger branches) and then writes out all the objects for a limited number of events
- The default TTreeCache deals well with the trigger quantities, but issues a single read for each non-trigger branch that is not in the cache
- When CMSSW requests a non-trigger branch, instead of passing the read request to ROOT, it creates a temporary TTreeCache and trains it on the non-trigger branches
- The temporary TTreeCache fetches objects from all of these branches in one network request
[Figure: branches used for the selection versus ~1000 branches that are unused and uncached, but written out if the event is selected]
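To see why the temporary cache matters, here is a toy request counter for the skim pattern described above. It is a minimal sketch under stated assumptions: the primary cache is assumed to refill every fixed number of events with one request, and the temporary cache is modelled as exactly one request per selected event. The function and its parameters are hypothetical and do not correspond to the CMSSW or ROOT APIs.

```python
# Toy model of the skim access pattern: every event reads the trigger
# branches; selected events also read all non-trigger branches.
# Only network requests are counted; names are hypothetical.

def count_requests(n_events: int, select_every: int, n_non_trigger: int,
                   events_per_cache_fill: int, use_temporary_cache: bool) -> int:
    """Count network requests for a sparse-selection skim.

    - Trigger branches are served by the primary TTreeCache, assumed to cost
      one request each time it is refilled (every `events_per_cache_fill` events).
    - For each selected event, non-trigger branches cost one request per
      branch (default behaviour) or one request in total (temporary cache).
    """
    cache_fills = -(-n_events // events_per_cache_fill)   # ceiling division
    n_selected = len(range(0, n_events, select_every))
    per_selected = 1 if use_temporary_cache else n_non_trigger
    return cache_fills + n_selected * per_selected

# Illustrative parameters: 10k events, 1 in 100 selected, 1000 other branches.
default = count_requests(10_000, 100, 1000, 500, use_temporary_cache=False)
temporary = count_requests(10_000, 100, 1000, 500, use_temporary_cache=True)
print(f"default: {default} requests, temporary cache: {temporary} requests")
```

The temporary cache turns the cost of a selected event from "one request per branch" into a single request, so the total scales with the number of selected events rather than with the number of branches.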

Slide 8
- CMS ran 2 tests that read 30% of the branches for every event and then, for every 50th event, selected the event and read all of its branches
  - This is a sparse-selection skimming application and is I/O intensive
- Each test read 1000 events: once from a local server (0.3 ms RTT) and once from CERN servers to Nebraska (137 ms RTT)
- The table shows the results of removing the most significant improvements; results are normalized to the local read with all improvements
- The startup optimization gives a 20% improvement even in the local environment
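A latency-only estimate shows why this test is so sensitive to RTT. This sketch assumes exactly 1000 branches and one round trip per uncached branch read, and ignores bandwidth and CPU; it is an illustration of scale, not a reproduction of the measured table.

```python
# Latency-only estimate for the test workload: 1000 events, every 50th event
# selected, and for selected events the ~70% of branches not already read.
# Assumes 1000 branches and one round trip per uncached branch read.

N_EVENTS = 1000
N_BRANCHES = 1000             # "more than 1000 branches"; 1000 assumed
FRACTION_ALWAYS_READ = 0.30
SELECT_EVERY = 50

n_selected = N_EVENTS // SELECT_EVERY                             # 20 events
extra_branches = round(N_BRANCHES * (1 - FRACTION_ALWAYS_READ))   # 700 branches
naive_round_trips = n_selected * extra_branches   # one per branch per selected event
cached_round_trips = n_selected                   # one per selected event

for label, rtt in [("local (0.3 ms)", 0.0003), ("Nebraska (137 ms)", 0.137)]:
    print(f"{label}: per-branch reads {naive_round_trips * rtt:.1f} s, "
          f"temporary cache {cached_round_trips * rtt:.2f} s")
```

Locally the per-branch penalty is a few seconds, while at 137 ms it grows to tens of minutes, which is consistent with the improvements mattering most over the WAN.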

Slide 9
- We use CPU efficiency to demonstrate how well resources are used
  - It is not a perfect metric, because it is coupled to the speed of the CPU
  - A high value only tells you that there is sufficient bandwidth for the particular resource being measured
  - A low value tells us that the application is waiting for data
- We look at 3 environments:
  - HLT: CERN P5, 60 Gbps link from P5 to the CC (low latency, 0.7 ms)
  - Wigner: Budapest, 200 Gbps link from Wigner to the CC (medium latency, 30 ms)
  - AAA: large variations in latency, up to 300 ms
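The slides do not define CPU efficiency explicitly; the sketch below uses the conventional grid definition (CPU time divided by wall-clock time) and a toy job whose only stall is waiting on synchronous remote reads. The CPU time and read count are hypothetical; only the three round-trip times come from the slide.

```python
# CPU efficiency, conventionally CPU time / wall-clock time. The toy model
# assumes wall time = CPU time + one round trip per synchronous read, with no
# overlap between computation and I/O. CPU time and read count are hypothetical.

def cpu_efficiency(cpu_time_s: float, wall_time_s: float) -> float:
    """Fraction of wall-clock time the job spent on the CPU."""
    if wall_time_s <= 0:
        raise ValueError("wall_time_s must be positive")
    return cpu_time_s / wall_time_s

def modelled_efficiency(cpu_time_s: float, n_sync_reads: int, rtt_s: float) -> float:
    """Efficiency when every synchronous read stalls the job for one RTT."""
    return cpu_efficiency(cpu_time_s, cpu_time_s + n_sync_reads * rtt_s)

CPU_TIME = 3600.0   # hypothetical: one hour of processing
N_READS = 2000      # hypothetical: synchronous network reads issued by the job

for site, rtt in [("HLT (0.7 ms)", 0.0007), ("Wigner (30 ms)", 0.030),
                  ("AAA worst case (300 ms)", 0.300)]:
    print(f"{site}: {modelled_efficiency(CPU_TIME, N_READS, rtt):.1%}")
```

The model makes the slide's point concrete: the same job loses little at 0.7 ms but noticeably more at 300 ms, unless the number of synchronous reads is reduced, which is exactly what the caching techniques target.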

Slide 10
- The High-Level Trigger farm has 6k cores
  - By 2015, 12k-15k cores, representing about 40% of the size of the total Tier-1 resources
  - Same order as the Tier-0 AI
- It is a resource fully available only during shutdown periods, and it is fully configured as an OpenStack cloud
- We will try to use it opportunistically during the year, in inter-fill periods and during machine studies
- CPU efficiency close to 100% for reconstruction applications

Slide 11
- CERN split the Tier-0 between 2 physical centers: the CC in Meyrin and Wigner in Budapest
- Physical disk resources are currently located entirely in Meyrin, so any CPU in Wigner reads with 30 ms latency
- We measure the results with the dashboard logs, comparing similar applications running at CERN and at Wigner
  - Same OS (SLC6), all on virtual machines
- Production jobs (little data read): 3% increase
- Analysis jobs: 6% drop

Slide 12
- US-CMS pioneered this effort with the "Any data, anywhere, anytime" (AAA) project
- The federation should allow data to be served to processing resources across sites
- By summer 2014, CMS will complete the deployment and testing of the data federation in preparation for Run 2
  - All Tier-1s and 90% of Tier-2s serving data
  - Nearly all files from data collected or derived in 2015 should be accessible interactively
- Scale tests are currently ongoing: file opening at 250 Hz! Now moving to file-reading scale tests
- Desired capability: access 20% of data across the wide area; 200k jobs/day, 60k files/day, O(100 TB)/day
10.02.2014
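The daily targets translate into average rates as follows. This is a sketch assuming the load is spread evenly over 24 hours and 1 TB = 1e12 bytes; real peaks would be higher than these averages.

```python
# Converting the desired AAA capability into average rates.
# Assumes an even spread over 24 hours and 1 TB = 1e12 bytes.

SECONDS_PER_DAY = 86_400

jobs_per_day = 200_000
files_per_day = 60_000
bytes_per_day = 100e12                     # O(100 TB)/day

job_rate_hz = jobs_per_day / SECONDS_PER_DAY                   # ~2.3 jobs/s
file_rate_hz = files_per_day / SECONDS_PER_DAY                 # ~0.7 files/s
throughput_gbps = bytes_per_day * 8 / SECONDS_PER_DAY / 1e9    # ~9.3 Gb/s

print(f"{job_rate_hz:.1f} jobs/s, {file_rate_hz:.2f} files/s, "
      f"{throughput_gbps:.1f} Gb/s on average")
```

So O(100 TB)/day corresponds to roughly a fully used 10 Gb/s link sustained around the clock.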

Slide 13
- About 7% of CMS analysis jobs read data (at least one file) over the WAN during 2013
  - This includes jobs deliberately "overflowed" to other sites, as well as "fallback" reads of locally unreadable files
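The "fallback" behaviour can be pictured as: try the local copy first, and if it cannot be read, open the same logical file through the federation. The sketch below is a minimal illustration of that control flow only; the opener callables and the redirector URL are hypothetical placeholders, not the CMSSW configuration mechanism.

```python
# Minimal sketch of "fallback": try the local copy, and on failure read the
# same logical file name (LFN) through the federation. The opener callables
# and the redirector prefix are hypothetical placeholders.

from typing import Callable

def open_with_fallback(lfn: str,
                       open_local: Callable[[str], bytes],
                       open_remote: Callable[[str], bytes],
                       redirector: str = "root://federation.example/") -> tuple[str, bytes]:
    """Return (source, data), where source is "local" or "federation"."""
    try:
        return "local", open_local(lfn)
    except OSError:
        # Locally unreadable (missing or corrupt): read over the WAN instead of
        # failing the job and wasting the CPU time already spent on it.
        return "federation", open_remote(redirector + lfn)

# Usage with stand-in openers: the local read fails, so the federation is used.
def broken_local(path: str) -> bytes:
    raise FileNotFoundError(path)

def fake_remote(url: str) -> bytes:
    return url.encode()

source, _ = open_with_fallback("/store/data/file.root", broken_local, fake_remote)
print(source)  # federation
```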

Slide 14
- Slight efficiency cost for remote access
- However, "fallback" saves jobs that would otherwise have failed and wasted CPU time
- And "overflow" uses CPU cycles that might otherwise have sat idle, allowing tasks to be completed more quickly

Slide 15
- CMSSW makes use of advanced caching techniques (available to the community)
  - The first technique takes advantage of understanding how ROOT works in a high-latency environment
  - The second technique takes advantage of understanding the data access pattern of CMS analysis applications
- These improvements have allowed CMS to achieve good CPU efficiency when reading in environments with O(100 ms) latencies (AAA, within regions, …)
- We are confident we can increase the scale of remote data access over the WAN to the targeted 20%
- CMS would like to use data federations for production use cases too
  - Allowing shared production workflows to use CPU at multiple sites, with data served over the data federation
- CMS would like the data federation to be an enabling technology for physics discovery in 2015


Slide 17
Assuming mostly local access
- CMS refreshes the data samples roughly twice per year. A nominal Tier-2 site has ~1 PB of disk space. Refreshing 1 PB of disk requires nearly all of a 10 Gb/s link for 10 days
- A user or group in CMS frequently requests samples of O(10 TB). If a user is willing to wait 24 hours for the transfer to complete, this also requires ~2.5 Gb/s of networking. A nominal Tier-2 supports 40 users; if each user made only one request per month (about 2 requests per business day for the site), on average 5 Gb/s would be needed
- Assuming some provisioning factor, a nominal CMS Tier-2 site needs 10 Gb/s of networking for 14 kHS06 of processing and 1 PB of disk. If each of the 7 Tier-1s supports 5 nominal Tier-2s, the export rate from each Tier-1 should evolve to 50 Gb/s during Run 2
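The estimates on this slide can be checked step by step. This sketch assumes 1 PB = 1e15 bytes and ~20 business days per month; the 2.5 Gb/s per user request and the 10 Gb/s per nominal Tier-2 are taken directly from the slide rather than derived.

```python
# Reproducing the Tier-2 networking estimate under local access.
# Assumes 1 PB = 1e15 bytes and ~20 business days per month.

SECONDS_PER_DAY = 86_400

# 1. Refreshing 1 PB of disk in 10 days.
refresh_gbps = 1e15 * 8 / (10 * SECONDS_PER_DAY) / 1e9     # ~9.3 Gb/s

# 2. User requests: 40 users x 1 request/month over ~20 business days.
#    Each request occupies ~2.5 Gb/s for 24 h, so ~2 transfers run at once.
requests_per_business_day = 40 * 1 / 20                     # 2 requests/day
user_gbps = requests_per_business_day * 2.5                 # 5 Gb/s

# 3. Tier-1 export: 5 nominal Tier-2s at 10 Gb/s each.
tier1_export_gbps = 5 * 10                                  # 50 Gb/s

print(f"refresh {refresh_gbps:.1f} Gb/s, users {user_gbps:.1f} Gb/s, "
      f"Tier-1 export {tier1_export_gbps} Gb/s")
```

The 1 PB refresh alone needs about 9.3 Gb/s sustained, which is why it consumes nearly all of a 10 Gb/s link.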

Slide 18
Assuming data federation access
- The average read rate of analysis applications is 300 kB/s, averaged over everything users submit to the grid
- Reconstruction applications are similar, including RAW data, code, and conditions
- A nominal Tier-2 in 2015 has 2k cores. If half the site were performing analysis at the average rate of 300 kB/s, ~2.5 Gb/s would be needed to keep 1000 cores busy
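The ~2.5 Gb/s figure follows directly from the average read rate. This sketch assumes 1 kB = 1000 bytes and that every analysis core reads continuously at the average rate.

```python
# Federation bandwidth needed to keep analysis cores busy, using the slide's
# average read rate. Assumes 1 kB = 1000 bytes and continuous reading.

CORES = 2_000
ANALYSIS_FRACTION = 0.5
READ_RATE_BYTES_PER_S = 300e3    # 300 kB/s per analysis job

analysis_cores = int(CORES * ANALYSIS_FRACTION)                  # 1000 cores
needed_gbps = analysis_cores * READ_RATE_BYTES_PER_S * 8 / 1e9   # 2.4 Gb/s
print(f"{analysis_cores} cores need {needed_gbps:.1f} Gb/s")
```

The exact result, 2.4 Gb/s, rounds to the ~2.5 Gb/s quoted on the slide.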
