Site Report: Tokyo
Tomoaki Nakamura
ICEPP, The University of Tokyo
2013/12/13

ICEPP regional analysis center

Resource overview
- Supports only the ATLAS VO in WLCG as a Tier2, and the ATLAS-Japan collaboration as a Tier3.
- The first production system for WLCG was deployed in 2005 after several years of R&D.
- Almost all hardware is procured on a three-year lease, and the system has been upgraded every three years. The current system is the 3rd generation.

Human resources
- Tetsuro Mashimo (associate professor): fabric operation, procurement, Tier3 support
- Nagataka Matsui (technical staff): fabric operation, Tier3 support
- Tomoaki Nakamura (project assistant professor): Tier2 operation, Tier3 analysis environment
- Hiroshi Sakamoto (professor): site representative, coordination, ADCoS (AP leader)
- Ikuo Ueda (assistant professor): ADC coordinator, site contact with ADC
- System engineers from a company (2 FTE): fabric maintenance, system setup

WLCG pledge

Computing resource ( )

Evolution of disk storage capacity for Tier2

- 1st system: 16x 500GB HDD per array, 5 disk arrays per server, XFS on RAID6, 4G-FC via FC switch, 10GbE NIC
- 2nd system: 24x 2TB HDD per array, 1 disk array per server, XFS on RAID6, 8G-FC via FC switch, 10GbE NIC
- 3rd system: 24x 3TB HDD per array, 2 disk arrays per server, XFS on RAID6, 8G-FC without FC switch, 10GbE NIC
- 4th system: ?TB HDD

[Chart: WLCG pledge (■) vs. deployed capacity assigned to ATLAS (●), with the number of disk arrays and file servers, from the pilot system for R&D through the 1st, 2nd, and 3rd systems. Total capacity in DPM: 2.4PB.]

ATLAS disk and LocalGroupDisk in DPM

2014's pledge has been deployed since Feb. 20, 2013:
- DATADISK: 1740TB (including the old GroupDisk, MCDisk, and HotDisk)
- PRODDISK: 60TB
- SCRATCHDISK: 200TB
- Total: 2000TB

LocalGroupDisk is kept below 500TB at present, if there are no further requests from users; but that is unlikely…

[Plot: LocalGroupDisk usage over one year, with annotations for deletions of one-year-old and half-year-old datasets, the 500TB level, a heavy user, and "Filled 1.6PB".]

Fraction of ATLAS jobs at Tokyo Tier2

[Plot: fraction of completed jobs [%] per 6 months, covering the 2nd and 3rd systems and the system migration.]

SL6 migration

Migration of the worker nodes to SL6 was completed within a few weeks in October. It was done as a rolling transition through a dedicated TOKYO-SL6 queue, to minimize downtime and to hedge against the risk of a large-scale misconfiguration.

Performance improvement: the HepSpec06 score increased by about 5%.
- SL5, 32-bit compile mode: 17.06 ± 0.02
- SL6, 32-bit compile mode: 18.03 ± 0.02
(R. M. Llamas, ATLAS S&C workshop, Oct. 2013)
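The quoted gain follows directly from the two scores above; a one-line check in Python, using only the numbers on this slide:

    sl5, sl6 = 17.06, 18.03                              # HepSpec06, 32-bit compile mode
    print("gain: %.1f%%" % (100.0 * (sl6 - sl5) / sl5))  # prints 5.7%, i.e. roughly 5%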

WLCG service instances and middleware

Memory upgrade

Memory has been built up for half of the WNs (Nov. 6, 2013):
- lcg-ce01.icepp.jp: 80 nodes (1280 cores), 32GB/node: 2GB RAM per core (= job slot), 4GB vmem per core
- lcg-ce02.icepp.jp: 80 nodes (1280 cores), 64GB/node: 4GB RAM per core (= job slot), 6GB vmem per core

New Panda queues: TOKYO_HIMEM / ANALY_TOKYO_HIMEM (since Dec. 5, 2013). This is useful for memory-consuming ATLAS production jobs, and sites offering such resources are rare. We might be able to upgrade the remaining half of the WNs as well.
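The per-slot figures above follow from the node and core counts; a quick check in Python with the values from this slide:

    nodes, cores = 80, 1280
    cores_per_node = cores // nodes          # 16 job slots per worker node
    for ram_gb in (32, 64):
        print("%d GB/node -> %.0f GB RAM per job slot"
              % (ram_gb, float(ram_gb) / cores_per_node))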

I/O performance study

The number of CPU cores in the new worker nodes was increased from 8 to 16, so the local I/O performance of the data staging area may become a bottleneck. We checked the performance by comparing a normal worker node with a special worker node that has an SSD as local storage, in the production situation with real ATLAS jobs.

Normal worker node
- HDD: HGST Ultrastar C10K600, 600GB SAS, 10k rpm
- RAID1: DELL PERC H710P
- FS: ext3
- Sequential I/O: ~150MB/s; IOPS: ~650 (fio tool)

Special worker node
- SSD: Intel SSD DC S GB
- RAID0: DELL PERC H710P
- FS: ext3
- Sequential I/O: ~400MB/s; IOPS: ~40000 (fio tool)
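The numbers above were measured with the fio tool under real job load. For illustration only, a minimal self-contained sketch of a sequential-write comparison in Python; the staging paths are hypothetical placeholders, and this script is not the measurement used on the slide:

    import os
    import time

    def sequential_write_mb_s(path, size_mb=2048, block_kb=1024):
        """Write size_mb of data in block_kb chunks, fsync, and return MB/s."""
        block = b"x" * (block_kb * 1024)
        start = time.time()
        with open(path, "wb") as f:
            for _ in range(size_mb * 1024 // block_kb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())     # make sure the data really reaches the disk
        rate = size_mb / (time.time() - start)
        os.remove(path)
        return rate

    if __name__ == "__main__":
        # Compare the HDD staging area of a normal WN with the SSD of the test WN.
        for target in ["/data/staging/hdd_test.dat",    # hypothetical HDD path
                       "/ssd/staging/ssd_test.dat"]:    # hypothetical SSD path
            print("%-35s %6.0f MB/s" % (target, sequential_write_mb_s(target)))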

Results

Direct mount via XRootD

I/O performance (staging to local disk vs. direct I/O from storage): some improvements have been reported, especially for DPM storage (J. Elmsheuser, ATLAS S&C workshop, Oct.).

From the ADC/user's point of view, jobs would be almost freed from limits on the input file size and the number of input files. But this should be checked more precisely…
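As a concrete picture of the two access modes being compared, a minimal PyROOT sketch; the storage host name and file paths are hypothetical placeholders, not the actual Tokyo endpoints:

    import ROOT

    # (a) Staging mode: the input file has already been copied to local disk.
    f_local = ROOT.TFile.Open("/tmp/staged/AOD.pool.root")

    # (b) Direct I/O: open the same file straight from the DPM storage element
    #     through its XRootD door, with no local copy.
    f_direct = ROOT.TFile.Open(
        "root://se.example.icepp.jp//dpm/icepp.jp/home/atlas/datadisk/AOD.pool.root")

    for f in (f_local, f_direct):
        if f and not f.IsZombie():
            print("%s opened, %d bytes" % (f.GetName(), f.GetSize()))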

Summary of the I/O study and other items

Local I/O performance
- The local I/O performance of our worker nodes has been studied by comparing HDD and SSD under a mixture of real ATLAS production and analysis jobs.
- We confirmed that the HDD in the worker nodes at the Tokyo Tier2 center is not a bottleneck for long batch-type jobs, at least with 16 jobs running concurrently.
- The same check should be repeated for the next generation of worker nodes, which will have more than 16 CPU cores, and also for the ext4 or XFS file systems.
- The improvement in I/O performance from directly mounting DPM should be confirmed by ourselves more precisely.
- Local network usage should be checked toward the design of the next system.

DPM
- The database on the DPM head node has become very large (~40GB). We will add more RAM to the head node, bringing it to 128GB in total.
- We are also planning to install Fusion-io (high-performance NAND flash memory attached via the PCI Express bus) in the DPM head node to improve the maintainability of the DB.
- A redundant configuration of the MySQL DB should be studied for the daily backup.

Wide area network
- We will migrate all production instances to LHCONE as soon as possible (to be discussed on Monday).

Tier3
- SL6 migration is ongoing. CVMFS will be available for ATLAS_LOCAL_ROOT_BASE and nightly releases.
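On the daily backup of the DPM head-node database mentioned above, a minimal sketch of a dump script in Python; it assumes the default DPM database names (cns_db, dpm_db), a local MySQL server, and mysqldump being available, while the user, password, and backup directory are placeholders:

    import gzip
    import subprocess
    import time

    DATABASES = ["cns_db", "dpm_db"]     # default DPM database names (assumption)
    BACKUP_DIR = "/backup/dpm"           # hypothetical destination

    def dump(db):
        out = "%s/%s-%s.sql.gz" % (BACKUP_DIR, db, time.strftime("%Y%m%d"))
        # --single-transaction gives a consistent snapshot of InnoDB tables
        # without locking out the running DPM services.
        cmd = ["mysqldump", "--single-transaction",
               "--user=dpmmgr", "--password=XXXX", db]
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        with gzip.open(out, "wb") as f:
            for chunk in iter(lambda: p.stdout.read(1 << 20), b""):
                f.write(chunk)
        if p.wait() != 0:
            raise RuntimeError("mysqldump failed for %s" % db)

    if __name__ == "__main__":
        for db in DATABASES:
            dump(db)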