2
CMS in a High-Latency Environment
Maria Girone, CERN

Outline:
- CMSSW I/O optimizations for high latency
- CPU efficiency in a real-world environment: HLT, Wigner, AAA
- Outlook
3
What follows is the work of a number of people:
- Optimization and development by CMS Offline (especially Brian Bockelman); more details in 'Optimizing High-Latency I/O in CMSSW', CHEP 2013
- CMS Commissioning and Operations (especially David Colling, James Letts, Nicolò Magini, Ian Fisk and Daniele Bonacorsi)
- All the work from the CMS data federation activity (especially Ken Bloom)
4
Even the two halves of the Tier-0 are separated by 30 ms. To read data efficiently over high-latency connections, CMS has invested effort over the last three years in code improvements and I/O optimizations, using ROOT best practices plus additional improvements of its own.
[Figure: map of typical wide-area round-trip latencies: 130 ms, 250 ms, 300 ms.]
5
CMS has built on the work of the ROOT team on TTreeCache. ROOT does a good job of reading ahead the objects/branches that are frequently accessed, using a cache, when running in a local environment. The cache-training technique does not work well in a high-latency environment, however, because during training (the first 20 events) every object is read with a separate network access. A typical CMS file format has more than 1000 branches; reading half of them at 130 ms latency means the first event alone takes more than a minute to read, and the training phase as a whole takes about 20 minutes.
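As a point of reference, here is a minimal sketch of the default ROOT caching behaviour described above. The file URL, tree name and branch pattern are illustrative only, not taken from CMSSW:

```cpp
// Minimal sketch (not CMSSW code): enabling ROOT's default TTreeCache on a
// tree read over the network. During the learning phase, each basket of each
// active branch is still fetched with its own high-latency round trip.
#include "TFile.h"
#include "TTree.h"

void readWithDefaultCache(const char *url) {
  TFile *f = TFile::Open(url);              // e.g. an xrootd URL with ~130 ms RTT
  if (!f || f->IsZombie()) return;

  TTree *events = nullptr;
  f->GetObject("Events", events);           // "Events" is the usual CMS tree name
  if (!events) { f->Close(); return; }

  events->SetCacheSize(20 * 1024 * 1024);   // 20 MB read-ahead cache
  events->SetCacheLearnEntries(20);         // learn branch usage on the first 20 entries

  events->SetBranchStatus("*", 0);          // read only a subset of the ~1000 branches
  events->SetBranchStatus("recoMuons_*", 1);  // hypothetical branch pattern

  for (Long64_t i = 0; i < events->GetEntries(); ++i)
    events->GetEntry(i);                    // after training, baskets arrive in bulk reads

  f->Close();
}
```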
6
The two highest-impact techniques used by CMS both rely on building a secondary TTreeCache that is filled with a small number of calls. To speed up the cache training, CMS loads all branches for the first 20 events in one call, placing them in a small (<20 MB) memory cache and serving the training reads from it. More data than needed is read for the first 20 events, but with far fewer calls than fetching individual baskets from every branch.
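The sketch below illustrates the idea with plain ROOT TTreeCache calls. It is an approximation of the mechanism, not the actual CMSSW implementation; the 20-entry window and 20 MB cache size are the figures quoted above:

```cpp
// Sketch of the startup optimization: cache *all* branches for the first 20
// entries so that the training reads are served from memory after a handful
// of bulk network reads, instead of one read per branch.
#include "TTree.h"

void primeStartupCache(TTree *events) {
  const Long64_t kTrainingEntries = 20;

  events->SetCacheSize(20 * 1024 * 1024);           // small (<20 MB) memory cache
  events->SetCacheEntryRange(0, kTrainingEntries);  // cover only the training entries
  events->AddBranchToCache("*", kTRUE);             // every branch, including sub-branches
  events->StopCacheLearningPhase();                 // skip learning: prefetch immediately

  // The first read triggers the bulk prefetch; branch reads for entries 0..19
  // are then served from memory rather than individual high-latency requests.
  events->GetEntry(0);

  // After the training entries, the entry range would be reset and the normal
  // TTreeCache (trained on the branches actually used) takes over.
}
```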
7
A common analysis application reads only some of the branches (the trigger branches) in order to select events, and then writes out all the objects for the limited number of selected events. The default TTreeCache deals well with the triggering quantities, but issues a single read for each non-trigger branch that is not in the cache. When CMSSW requests a non-trigger branch, instead of passing the read request straight to ROOT it creates a temporary TTreeCache and trains it on the non-trigger branches; this temporary cache then fetches the objects from all of those branches in one network request.
[Figure: schematic of the ~1000 branches, most of them unused and uncached but written out if the event passes the selection, with a small subset used for the selection itself.]
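A sketch of this second technique, again using plain ROOT calls rather than the CMSSW code itself; the branch list and cache size are illustrative:

```cpp
// Sketch: when a selected event needs the non-trigger branches, configure a
// temporary cache trained on exactly those branches, so their baskets are
// fetched in one vectored network request instead of one request per branch.
#include "TBranch.h"
#include "TTree.h"
#include <string>
#include <vector>

void readSelectedEvent(TTree *events, Long64_t entry,
                       const std::vector<std::string> &nonTriggerBranches) {
  events->SetCacheSize(20 * 1024 * 1024);        // temporary cache for this event
  events->SetCacheEntryRange(entry, entry + 1);  // just the selected entry
  for (const auto &name : nonTriggerBranches)
    events->AddBranchToCache(name.c_str(), kTRUE);
  events->StopCacheLearningPhase();

  // The first branch read triggers a single bulk fetch of all registered
  // baskets; the remaining branches are then served from memory.
  for (const auto &name : nonTriggerBranches)
    if (TBranch *b = events->GetBranch(name.c_str()))
      b->GetEntry(entry);

  // In CMSSW the temporary cache is then dropped and the primary cache,
  // trained on the trigger branches, is restored.
}
```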
8
CMS ran two tests: reading 30% of the branches for every event, and selecting every 50th event and reading all of its branches. This is a sparse-selection skimming application and is I/O intensive. The tests read 1000 events from a local server (0.3 ms RTT) and from CERN servers accessed from Nebraska (137 ms RTT). The results of removing the most significant improvements are shown in the table, normalized to the local read with all improvements enabled. The startup optimization gives a 20% improvement even in the local environment.
9
We use CPU efficiency to demonstrate how well resources are used (the standard definition is recalled below). It is not a perfect metric, because it is coupled to the speed of the CPU: a high value only tells us that there is sufficient bandwidth for the particular resource being measured, while a low value tells us that the application is waiting for data. We look at three environments:
- HLT at CERN Point 5, with a 60 Gb/s link from P5 to the CC (low latency, 0.7 ms)
- Wigner in Budapest, with a 200 Gb/s link from Wigner to the CC (medium latency, 30 ms)
- AAA, with large variations of latency, up to 300 ms
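For completeness, the definition assumed here is the usual one (with N_cores = 1 for single-core jobs):

```latex
\text{CPU efficiency} \;=\; \frac{T_{\mathrm{CPU}}}{T_{\mathrm{wall}} \times N_{\mathrm{cores}}}
```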
10
The High Level Trigger (HLT) farm has 6k cores, growing to 12k-15k cores by 2015, about 40% of the total Tier-1 resources and of the same order as the Tier-0 AI. It is a resource fully available only during shutdown periods and is fully configured as an OpenStack cloud; we will try to use it opportunistically during the year, in inter-fill periods and machine studies. CPU efficiency is close to 100% for reconstruction applications.
11
CERN split the Tier-0 between two physical centres, the CC in Meyrin and Wigner in Budapest. Physical disk resources are currently located entirely in Meyrin, so any CPU in Wigner reads with 30 ms latency. We measure the results with the dashboard logs, comparing similar applications running at CERN and at Wigner, with the same OS (SLC6) and all on virtual machines:
- Production jobs (little data read): 3% increase
- Analysis jobs: 6% drop
12
US-CMS pioneered this effort with the "Any data, anywhere, anytime" (AAA) project. The federation should allow data serving to be shared with processing resources across sites. By summer 2014, CMS will complete the deployment and testing of the data federation in preparation for Run 2:
- All Tier-1s and 90% of Tier-2s serving data
- Nearly all files from data collected or derived in 2015 should be accessible interactively
- Scale tests are currently ongoing: file opening at 250 Hz! Now moving to file-reading scale tests
- Desired capability: access 20% of the data across the wide area; 200k jobs/day, 60k files/day, O(100 TB)/day
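In practice, federated access means a job opens a logical file name through a global xrootd redirector, which resolves it to whichever site holds a replica. A minimal, purely illustrative example follows; the redirector host and /store path are placeholders, not a specific CMS endpoint:

```cpp
// Illustrative only: opening a file through an xrootd redirector, as a
// federation client would; ROOT then reads the file over the WAN using the
// caching techniques described earlier.
#include "TFile.h"
#include "TTree.h"
#include <cstdio>

void openViaFederation() {
  TFile *f = TFile::Open("root://xrootd-redirector.example.org//store/data/example.root");
  if (!f || f->IsZombie()) {
    std::printf("remote open failed\n");
    return;
  }
  TTree *events = nullptr;
  f->GetObject("Events", events);
  if (events)
    std::printf("Events entries: %lld\n", events->GetEntries());
  f->Close();
}
```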
13
About 7% of CMS analysis jobs read data (at least one file) over the WAN during 2013. This includes jobs deliberately "overflowed" to other sites, as well as "fallback" reads for files that were unreadable locally.
14
There is a slight efficiency cost for remote access. However, "fallback" saves jobs that would otherwise have failed and wasted CPU time, and "overflow" uses CPU cycles that might otherwise have sat idle, allowing tasks to be completed more quickly.
15
CMSSW makes use of advanced caching techniques, which are available to the community. The first takes advantage of understanding how ROOT works in a high-latency environment; the second takes advantage of understanding the data access pattern of CMS analysis applications. These improvements have allowed CMS to achieve good CPU efficiencies when reading in environments with O(100 ms) latencies (AAA, within regions, ...). We are confident we can increase the scale of remote data access over the WAN to the targeted 20%. CMS would also like to use data federations for production use cases, allowing shared production workflows to use CPU at multiple sites with data served over the federation. CMS would like the data federation to be an enabling technology for physics discovery in 2015.
17
Assuming mostly local access: CMS refreshes the data samples roughly twice per year, and a nominal Tier-2 site has ~1 PB of disk space; refreshing a 1 PB disk requires nearly all of a 10 Gb/s link for 10 days. A user or group in CMS frequently requests samples of O(10 TB); if we say a user is willing to wait 24 hours for the transfer to complete, that also requires ~2.5 Gb/s of networking. A nominal Tier-2 supports 40 users, and if each user made only one such request per month (spread over business days), on average 5 Gb/s would be needed. Assuming some provisioning factor, a nominal Tier-2 site in CMS needs 10 Gb/s of networking for 14 kHS06 of processing and 1 PB of disk. If each of the 7 Tier-1s should support 5 nominal Tier-2s, the export rate from a Tier-1 should evolve to 50 Gb/s during Run 2 (the arithmetic is spelled out below).
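For reference, the arithmetic behind the two headline rates above, taking the ~2.5 Gb/s per user transfer at face value:

```latex
\frac{1\ \mathrm{PB}}{10\ \mathrm{days}}
  = \frac{8\times10^{15}\ \mathrm{bit}}{8.64\times10^{5}\ \mathrm{s}}
  \approx 9.3\ \mathrm{Gb/s} \approx \text{a full 10 Gb/s link}
```

```latex
\frac{40\ \text{requests/month}}{\approx 20\ \text{business days/month}}
  \times 2.5\ \mathrm{Gb/s} \approx 5\ \mathrm{Gb/s}
```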
18
Assuming data federation access: the average read rate of analysis applications is 300 kB/s, averaged over everything users submit to the grid; reconstruction applications are similar once RAW data, code, and conditions are included. A nominal Tier-2 in 2015 has 2k cores. With an average analysis read rate of 300 kB/s, if half the site were performing analysis, ~2.5 Gb/s would be needed to keep 1000 cores busy (see the arithmetic below).
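The ~2.5 Gb/s figure follows directly:

```latex
1000\ \mathrm{cores} \times 300\ \mathrm{kB/s}
  = 3\times10^{8}\ \mathrm{B/s} = 2.4\ \mathrm{Gb/s} \approx 2.5\ \mathrm{Gb/s}
```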