Grid Computing
4th FCPPL Workshop
Gang Chen & Eric Lançon
LHC Grid Computing
- The LHC became operational in March 2010.
- WLCG became the production-level computing system for the experiments.
- Grid computing collaboration within FCPPL is likewise challenged to meet the requirements of the LHC.
Grid organization
- CERN, Lyon, Beijing.
- Active collaboration between Lyon (Tier-1) and Beijing (Tier-2) is mandatory (from Eric's slide for FCPPL 2010).
Activities in 2010
- One person from CC-IN2P3 (Fabio Hernandez) is staying two years at IHEP, starting last summer, to strengthen the close collaboration between the two partners.
- One person from IHEP (Jingyan Shi) visited CC-IN2P3 for three weeks to exchange expertise on grid site operations.
- Active cooperation between China and France through:
  - monthly meetings on organizational and operational computing issues in the French cloud
  - monthly LCG-France technical meetings to share common operational solutions
  - the French Cloud conference in November, attended by three people from IHEP
- Face-to-face meetings (Eric Lançon, Xiaofei Yan, Gongxing Sun):
  - visits to Beijing
  - workshops in France and Japan
Activities in 2010 (continued)
- Fine tuning of the network between IHEP and CC-IN2P3 (Guillaume Cessieux and Fazhi Qi involved).
- Operation of the French cloud of ATLAS:
  - remote operation of sites in China, France, Japan and Romania
  - monitoring of production, analysis and data transfer
  - shifts operated by 5 people (including Wenjing Wu from IHEP)
- Monitoring of ATLAS Distributed Data Management (DDM):
  - PhD thesis by Donal Zang (IHEP) in cooperation with the main DDM architect (French collaborator)
- CMS-related activities ...
Network performance tuning
Problem:
- Performance from CC-IN2P3 to IHEP is acceptable, but from IHEP to CC-IN2P3 it was very bad: 81.94 KB/s with a single stream.
- Large files (>1 GB) could not be transferred from IHEP to CC-IN2P3.
Network performance tuning (continued)
- Contacted RENATER to adjust the network configuration.
- Performance returned to a normal level on 16 September 2010: IHEP to CC-IN2P3 throughput with a single stream can now reach a few MB/s, comparable with the CC-IN2P3 to IHEP direction.
- The performance asymmetry still persists; further work is needed in 2011.
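The single-stream figures on these two slides are the kind one expects from a TCP window limitation on a high-latency path. As a rough illustration only (the round-trip time and window size below are assumptions for the sake of the estimate, not measurements from these slides), the bandwidth-delay product gives the ceiling a single stream can reach and the window needed for a target rate:

```python
# Rough bandwidth-delay-product estimate for a long-distance TCP path.
# The RTT and window values are assumptions for illustration, not measurements.
rtt_s = 0.300                 # assumed Beijing <-> Lyon round-trip time, ~300 ms
window_bytes = 64 * 1024      # assumed default TCP window of 64 KiB

# A single TCP stream cannot exceed window / RTT.
ceiling = window_bytes / rtt_s
print(f"Single-stream ceiling: {ceiling / 1024:.0f} KB/s")         # about 213 KB/s

# Window needed to sustain a target rate, e.g. a few MB/s as on the tuned link.
target_rate = 5 * 1024 * 1024                                      # 5 MiB/s
needed_window = target_rate * rtt_s
print(f"Window needed for 5 MiB/s: {needed_window / (1024 * 1024):.1f} MiB")  # ~1.5 MiB
```

Under these assumptions a single stream tops out at roughly 200 KB/s, the same order of magnitude as the 81.94 KB/s observed before tuning; enlarging the window, or using several parallel streams, is what moves such a link into the MB/s range.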
ATLAS DDM/DQ2 Tracer service
- Part of the ATLAS Distributed Data Management (DDM) service.
- Records relevant information about data access and usage on the grid.
- A key and critical component of the ATLAS computing model:
  - automatic, dynamic cleaning of grid storage based on popularity
  - automatic replication of 'hot' data
- Both experiment and user activity have kept increasing since data taking started.
- A trace is one grid file access (read or write) operation: around 60 traces per second on average, with peaks above 300 per second.
- Plots in the original slide: evolution of the total space (PB) and total number of traces per month (millions).
- http://bourricot.cern.ch/dq2/ , contact: atlas-adc-ddm-lab@cern.ch
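As a minimal sketch of what one such trace might carry (the field names and site string below are illustrative assumptions, not the actual DQ2 schema), a trace describes a single file read or write on the grid:

```python
from dataclasses import dataclass
from time import time

# Hypothetical trace record: the field names are illustrative, not the real DQ2 schema.
@dataclass
class Trace:
    timestamp: float   # when the file operation happened (Unix time)
    site: str          # grid site performing the access, e.g. "BEIJING-LCG2" (assumed name)
    dataset: str       # dataset the file belongs to
    filename: str      # logical file name
    operation: str     # "read" or "write"
    filesize: int      # bytes moved
    duration: float    # seconds the operation took

# Example: one read trace as a site service might emit it.
t = Trace(time(), "BEIJING-LCG2", "data10_7TeV.example.dataset", "events.root",
          "read", 600 * 1024 * 1024, 24.0)
print(t)
```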
New DDM/DQ2 Tracer architecture
- Issues with the old tracer architecture: scalability problems, loss of traces, limited monitoring.
- Important contributions from Donal Zang (IHEP).
- Evaluation of new technologies: messaging system and NoSQL databases.
- Official ATLAS R&D task forces; support requested from CERN-IT.
- Definition and validation of the new tracer and monitoring architecture.
- Diagram in the original slide: the old architecture inserted traces one by one into Oracle over HTTP; the new one routes traces through STOMP messaging to tracer and statistics agents, with bulk insertion, real-time statistics, and monitoring plus an API on top (a simplified sketch of the bulk-insertion idea follows below).
- Contact: atlas-adc-ddm-lab@cern.ch
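A minimal sketch of the bulk-insertion idea, using only the Python standard library: the in-memory queue stands in for the STOMP message broker, and store_bulk for a write to the NoSQL backend; the batch size and flush interval are arbitrary assumptions, not values from the real system.

```python
import queue
import threading
import time

# Traces arrive on an in-memory queue; in the real system they would come from a
# STOMP message broker and be written in bulk to a NoSQL store such as Cassandra.
trace_queue: "queue.Queue[dict]" = queue.Queue()

def store_bulk(traces: list) -> None:
    # Placeholder for one bulk write to the backing database.
    print(f"inserted {len(traces)} traces in a single operation")

def bulk_writer(batch_size: int = 500, flush_interval: float = 2.0) -> None:
    """Accumulate traces and write them in batches instead of one by one."""
    batch, last_flush = [], time.time()
    while True:
        try:
            batch.append(trace_queue.get(timeout=flush_interval))
        except queue.Empty:
            pass
        # Flush when the batch is full or the flush interval has elapsed.
        if batch and (len(batch) >= batch_size or time.time() - last_flush >= flush_interval):
            store_bulk(batch)
            batch, last_flush = [], time.time()

threading.Thread(target=bulk_writer, daemon=True).start()

# Feed a few fake traces and let the writer flush them in bulk.
for i in range(1200):
    trace_queue.put({"op": "read", "file": f"file_{i}.root"})
time.sleep(5)
```

Replacing one insert per trace with one write per batch is what removes the per-operation overhead that limited the old architecture's throughput.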
Good results in production
- All issues solved!
- More than 1k traces per second, and the system can scale linearly.
- No lost traces.
- Almost real-time monitoring of thousands of metrics.
- Monitoring plots (based on statistics metrics stored in Cassandra): total file size ~90 TB/hour, average file size ~0.6 GB, file operations ~60 per second, average transfer rate ~25 MB/s.
- Contact: atlas-adc-ddm-lab@cern.ch
ATLAS data transfer speed: Lyon to Beijing
- Large improvement of the transfer speed in the last quarter of 2010, thanks to a continuous monitoring effort.
ATLAS data transfer between Lyon and Beijing
- More than 130 TB of data transferred from Lyon to Beijing in 2010.
- More than 35 TB of data transferred from Beijing to Lyon in 2010.
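For scale, the yearly volume translates into a modest average rate (simple arithmetic on the 130 TB figure, assuming decimal terabytes), which is why the peak-rate and single-stream tuning discussed on the network slides matters:

```python
# Average rate implied by the yearly transferred volume (decimal terabytes assumed).
volume_bytes = 130e12                    # > 130 TB moved from Lyon to Beijing in 2010
seconds_per_year = 365 * 24 * 3600
avg_rate = volume_bytes / seconds_per_year
print(f"Average sustained rate: {avg_rate / 1e6:.1f} MB/s")   # about 4.1 MB/s
```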
CMS data transfer from/to Beijing
- About 290 TB transferred from elsewhere to Beijing in 2010.
- About 110 TB transferred from Beijing to elsewhere in 2010.
ATLAS jobs at Beijing in 2010
- Beijing ran 10% of the jobs of the FR-cloud Tier-2s.
- Production efficiency: 92.5% (average for FR-cloud Tier-2s: 86%).
- Half of the Beijing resources were used for analysis in the second half of 2010.
Total jobs at Beijing in 2010
About 8.7 million CPU hours provided and 2.4 million jobs completed during the year:

Experiment    CPU hours    Jobs
ATLAS         5,054,138    1,681,391
CMS           3,639,866    752,886
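A derived figure that may be worth noting, computed directly from the table above (nothing beyond simple division of the quoted numbers):

```python
# Average CPU hours per completed job at Beijing in 2010, per experiment.
stats = {
    "ATLAS": (5_054_138, 1_681_391),   # (CPU hours, jobs)
    "CMS":   (3_639_866, 752_886),
}
for experiment, (cpu_hours, jobs) in stats.items():
    print(f"{experiment}: {cpu_hours / jobs:.1f} CPU hours per job")
# ATLAS comes out around 3.0 and CMS around 4.8 CPU hours per job.
```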
Beijing site
Prospects for 2011
- More integrated operation of the French cloud.
- Closer monitoring of data transfers to/from IHEP.
- Foreseen areas of cooperation:
  - improvement of the transfer rate, to ensure that Beijing remains among the top ATLAS Tier-2s
  - caching technology for distributing software and calibration constants
  - virtual machine testing for deployment on the grid
  - remote ATLAS control station tests
THANK YOU