Grid Computing
4th FCPPL Workshop
Gang Chen & Eric Lançon
LHC Grid Computing
- The LHC started operation in March.
- WLCG became a true production-level computing system for the experiments.
- Grid computing collaboration within FCPPL also faces the challenge of meeting the LHC's requirements.
Grid Organization
- [Diagram: CERN, Lyon, Beijing]
- Active collaboration between the Lyon T1 and the Beijing T2 is mandatory (from Eric's slides for FCPPL 2010)
Activities in 2010
- One person from CC-IN2P3 (Fabio Hernandez) is spending two years at IHEP, starting last summer
  - Strengthens the close collaboration between the two partners
- One person from IHEP (Jingyan Shi) visited CC-IN2P3 for three weeks
  - Exchange of expertise on grid site operations
- Active cooperation between China and France through:
  - Monthly meetings on organizational and operational computing issues in the French cloud
  - Monthly LCG-France technical meetings to share common operational solutions
  - French Cloud conference in November, with three participants from IHEP
- Face-to-face meetings (Eric Lançon, Xiaofei Yan, Gongxing Sun)
  - Visits to Beijing
  - Workshops in France and Japan
Activities in 2010
- Fine tuning of the network between IHEP and CC-IN2P3
  - Guillaume Cessieux and Fazhi Qi involved
- Operation of the ATLAS French cloud
  - Remote operation of sites in China, France, Japan, and Romania
  - Monitoring of production, analysis, and data transfers
  - Shifts operated by 5 people (including Wenjing Wu from IHEP)
- Monitoring of ATLAS Distributed Data Management (DDM)
  - PhD thesis by Donal Zang (IHEP), in cooperation with the main DDM architect (French collaborator)
- CMS-related activities …
Network Performance Tuning
- Problem:
  - CC-IN2P3 → IHEP performance was acceptable, but IHEP → CC-IN2P3 was very bad
    - 81.94 KB/s with one stream
  - Large files (>1 GB) could not be transferred from IHEP to CC-IN2P3
Network Performance Tuning
- Contacted RENATER to adjust the network configuration
- Performance returned to the normal level in September
  - IHEP → CC-IN2P3 throughput with a single stream can reach a few MB/s
  - Comparable with CC-IN2P3 → IHEP
- A performance asymmetry still persists…
- Further work is needed in 2011
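The single-stream numbers on the two slides above are consistent with a TCP-window-limited long-distance path. As a rough illustration, the sketch below computes the effective window implied by the observed 81.94 KB/s and the window (bandwidth-delay product) needed for a few MB/s; the ~200 ms round-trip time between Beijing and Lyon is an assumption used only for this estimate.

```python
# Back-of-the-envelope TCP window estimate for the IHEP <-> CC-IN2P3 link.
# The 81.94 KB/s figure comes from the slide; the 200 ms round-trip time is
# an ASSUMPTION for a Beijing-Lyon path, used only for illustration.

def window_bytes(throughput_bytes_per_s: float, rtt_s: float) -> float:
    """Single-stream TCP throughput is bounded by window / RTT, so the
    window implied by (or required for) a given rate is rate * RTT."""
    return throughput_bytes_per_s * rtt_s

RTT = 0.2                        # assumed round-trip time in seconds
observed = 81.94 * 1024          # observed single-stream throughput, bytes/s
target = 5 * 1024 ** 2           # a few MB/s, as reached after tuning

print(f"window implied by 81.94 KB/s : {window_bytes(observed, RTT) / 1024:.1f} KB")
print(f"window needed for ~5 MB/s    : {window_bytes(target, RTT) / 1024:.0f} KB")
# -> roughly a 16 KB effective window versus ~1 MB needed, which is why
#    TCP buffer tuning along the path (and multiple streams) matters here.
```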
ATLAS DDM/DQ2 Tracer Service
- ATLAS Distributed Data Management service
- Records relevant information about data access and usage on the grid
- Key and critical component of the ATLAS computing model
  - Automatic and dynamic cleaning of grid storage based on popularity
  - Automatic replication of 'hot' data
- Both experiment and user activity have kept increasing since data taking started
- [Plots: evolution of the total space (PB); total number of traces* per month (millions)]
  - *Trace = grid file access (read/write) operation
  - ~60 traces/second on average, with peaks above 300/second
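Conceptually, each trace is a small structured record describing one file access on the grid. The sketch below shows what such a record might look like; the field names and values are illustrative assumptions, not the actual DQ2 schema.

```python
# Illustrative shape of one trace record (a single grid file read/write).
# Field names and values are ASSUMPTIONS for this sketch, not the real DQ2 schema.
from dataclasses import dataclass, asdict
import json
import time

@dataclass
class Trace:
    event_type: str    # e.g. "get" or "put"
    dataset: str       # dataset the file belongs to
    filename: str      # logical file name
    filesize: int      # size in bytes
    site: str          # grid site that served the access
    usr: str           # user or production role
    timestamp: float   # epoch seconds of the operation

trace = Trace("get", "data10_7TeV.SomeDataset", "file-0001.root",
              640 * 1024 ** 2, "BEIJING-LCG2", "prod", time.time())
print(json.dumps(asdict(trace)))   # what a tracer client would report
```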
New DDM/DQ2 Tracer Architecture
- Issues with the old tracer architecture: scalability problems, loss of traces, limited monitoring
- Important contributions from Donal Zang (IHEP)
- Evaluation of new technologies
  - Messaging system & NoSQL databases
  - Official ATLAS R&D task forces / support requested from CERN-IT
- Definition and validation of the new tracer and monitoring architecture
- [Diagram: old architecture (HTTP one-by-one insertion into Oracle, monitoring & API) vs. new architecture (STOMP bulk insertion, tracer agents, statistics agents, real-time statistics, monitoring & API)]
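To make the bulk-insertion idea concrete: instead of inserting each trace into Oracle over HTTP one by one, traces are batched and published to a message broker, from which agents write them into the NoSQL store. The loop below is a minimal sketch of that batching pattern only; publish_batch is a hypothetical stand-in for the real STOMP client call, and the thresholds are illustrative, not production values.

```python
# Minimal sketch of the batch-and-publish pattern behind bulk insertion.
# publish_batch() is a HYPOTHETICAL stand-in for the real STOMP client call;
# BATCH_SIZE and FLUSH_INTERVAL are illustrative, not the production settings.
import json
import time

BATCH_SIZE = 500        # flush when this many traces are queued
FLUSH_INTERVAL = 1.0    # ...or at least once per second

def publish_batch(batch):
    """Placeholder: would serialize the batch and send it as one broker message."""
    message = json.dumps(batch)
    print(f"publishing {len(batch)} traces in one {len(message)}-byte message")

class TracerAgent:
    """Collects traces and ships them to the broker in bulk."""
    def __init__(self):
        self.batch = []
        self.last_flush = time.time()

    def add_trace(self, trace: dict):
        self.batch.append(trace)
        if len(self.batch) >= BATCH_SIZE or time.time() - self.last_flush > FLUSH_INTERVAL:
            self.flush()

    def flush(self):
        if self.batch:
            publish_batch(self.batch)   # one broker message carries many traces
            self.batch = []
        self.last_flush = time.time()
```

Carrying many traces per message removes the per-insert HTTP/Oracle overhead of the old architecture; consumers on the broker side can then write the batches into the NoSQL store and update statistics in near real time.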
Good Results in Production
- All issues solved!
- >1k traces/second, and it can scale linearly
- No lost traces
- Almost real-time monitoring of thousands of metrics
- [Monitoring plots (based on statistics metrics in Cassandra): total file size ~90 TB/hour, average file size ~0.6 GB, file operations ~60/second, average transfer rate ~25 MB/second]
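Metrics of the kind shown on those plots can be obtained by folding the trace stream into time buckets. The sketch below is a batch version for illustration, using the field names of the illustrative trace record above (not the real schema); in the production system these statistics live in Cassandra and are updated as traces arrive.

```python
# Illustrative hourly aggregation of trace records into the kinds of metrics
# shown on the monitoring plots (total volume, op rate, average file size).
# Trace fields follow the illustrative record above, not the real DQ2 schema.
from collections import defaultdict

def hourly_metrics(traces):
    buckets = defaultdict(lambda: {"bytes": 0, "ops": 0})
    for t in traces:
        hour = int(t["timestamp"] // 3600)       # epoch-hour bucket
        buckets[hour]["bytes"] += t["filesize"]
        buckets[hour]["ops"] += 1

    report = {}
    for hour, b in sorted(buckets.items()):
        report[hour] = {
            "total_TB": b["bytes"] / 1e12,             # volume moved that hour
            "ops_per_second": b["ops"] / 3600,         # average operation rate
            "avg_file_GB": b["bytes"] / b["ops"] / 1e9,
            "avg_rate_MB_per_second": b["bytes"] / 3600 / 1e6,
        }
    return report
```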
ATLAS Data Transfer Speed: Lyon to Beijing
- Large improvement of the transfer speed in the last trimester of 2010, thanks to a continuous monitoring effort
ATLAS Data Transfer between Lyon and Beijing
- >130 TB of data transferred from Lyon to Beijing in 2010
- >35 TB of data transferred from Beijing to Lyon in 2010
CMS Data Transfer from/to Beijing
- ~290 TB transferred from elsewhere to Beijing in 2010
- ~110 TB transferred from Beijing to elsewhere in 2010
ATLAS Beijing in 2010
- Beijing ran 10% of the jobs of the FR-cloud T2s
- Production efficiency: 92.5% (average for FR-cloud T2s: 86%)
- Half of Beijing's resources were used for analysis in the second part of 2010
Total Beijing in 2010
- About 8.7 million CPU hours provided and 2.4 million jobs completed during the year:

  Experiment | CPU hours | Jobs
  ATLAS      | 5,054,138 | 1,681,391
  CMS        | 3,639,866 |   752,886
Beijing site
Prospects for 2011
- More integrated operation of the French cloud
- Closer monitoring of data transfers to/from IHEP
- Foreseen areas of cooperation:
  - Improvement of the transfer rate, to ensure that Beijing remains in the top list of ATLAS T2s
  - Caching technology for distributing software & calibration constants
  - Virtual machine testing for deployment on the grid
  - Remote ATLAS control station tests
THANK YOU