gLite deployment and operation toward the KEK Super B factory


gLite deployment and operation toward the KEK Super B factory
http://indico.in2p3.fr/conferenceDisplay.py?confId=2804
Updates on Grid activities since the last meeting in November 2008
FJPPL Workshop at CC-IN2P3, Lyon, February 17th 2010
Go Iwai, KEK/CRC (go.iwai@kek.jp)
The agenda of the last meeting can be found at http://kds.kek.jp/conferenceDisplay.py?confId=2387

Outline & Introduction
- Resource & Ops Stats
- JP-KEK-CRC-02: fully upgraded and certified in March 2009
  - Part of the Central Information System (CIS): higher availability and more robustness
  - Xen-based VM technologies are supported across the whole site
  - Storage: HPSS works as the backend disk of the SE; still using DPM with ~10 TB initial capacity, to be upgraded on a VO-demand basis (~200 US$ per 1 TB tape)
  - CPU: 300 kSI2K (10 WNs x 8 CPUs x ~4 kSI2K), about 3 times the previous site performance
  - Integration with unused resources of the Belle Computing System (BCS)
- JP-KEK-CRC-01: the CE/RB based on gLite 3.0/SL 3.x was decommissioned in Oct 2009; its replacement is not yet deployed, hopefully up within FY2009
- Central service: VOMS.kek.jp has been migrated to a newer machine and is up as VOMS.cc.kek.jp
- VO-specific report: Belle/Belle2 (active), ILC

Resource & Ops Stats at KEK

SD & Open Tickets for a Year (Jan 2009 - Dec 2009)

  Ticket-ID   Date         Last Update        Info
  54305       2009-12-23   2009-12-23 04:41   SAM *BDII-check* failed on kek2-bdii.cc.kek.jp@JP...
  53382       2009-11-18   2009-11-20 06:59   SRMv2 problem(s) detected for kek2-se.cc.kek.jp (J...
  51727       2009-09-23   2009-09-28 04:21   SRMv2 problem(s) detected for kek2-ce01.cc.kek.jp ...
  48238       2009-04-28   2009-05-14 06:52   CE problem(s) detected for kek2-ce02.cc.kek.jp (JP...
  45143       2009-01-12   2009-01-15 09:11   sBDII problem(s) detected for rls03.cc.kek.jp (JP-...

- 5 tickets were opened, and all were solved
- 561 hours of downtime across 5 scheduled downtimes (SDs) and 6 unscheduled downtimes
- The 2-month SD for the CIS upgrade works (including the certification process) is excluded

CPU Stats for a Year (Jan 2009 - Dec 2009)
- 33,410 CPU hours consumed (-63%), normalized to 1 kSI2K
- 54,025 jobs submitted (+72%)
- Mostly idle during 2009, since the Belle community was quiet
- ILC is the heaviest consumer for us
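
For readers less familiar with kSI2K accounting, the normalization and the per-job average work out as follows. This is simple arithmetic on the slide's own numbers; the scaling formula is the standard kSI2K convention, stated here only for illustration:

    normalized CPU hours = wall-clock hours x (per-core rating [kSI2K] / 1 kSI2K)
    average per job      = 33,410 kSI2K-hours / 54,025 jobs ~ 0.62 kSI2K-hours/job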

Read/Write from/to Storage Devices in the Last 7 Months
- 2.0 TB of files have been read out
- 4.2 TB of files have been written; also a strong figure
- ILC is the heaviest consumer of the storage services as well
[Plots: read size and written size in GB per month]

Recent Changes in VOMS
- voms.kek.jp was shut down on 2nd Oct 2009, at the end of the validity of its host certificate
- Migrated to voms.cc.kek.jp, which supports the ppj, naokek, cdfj, belle, belle2, and g4med VOs
- The validity of proxy certificates has been extended up to 7 days, i.e. 168 hours
  - MyProxy does not work properly without special settings
- User enrollment via VOMRS, e.g. https://vomrs.cc.kek.jp:8443/vo/belle/vomrs
[Diagram: VOMS on voms.kek.jp and VOMS/VOMRS on voms.cc.kek.jp, placed across the DMZ and the GRID-LAN]
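
To illustrate the extended proxy lifetime, a belle user on a gLite UI would request a 7-day VOMS proxy roughly as follows. A sketch with standard gLite client commands; the VO name and the 168-hour validity come from the slide, everything else is generic:

    # Request a VOMS proxy valid for the full 168 hours (7 days).
    # Assumes the belle VO is defined in the UI's vomses configuration.
    voms-proxy-init --voms belle --valid 168:00

    # Check the remaining lifetime and the attached VO attributes (FQANs).
    voms-proxy-info --all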

Logical Network Connection Focused on Grid
[Diagram: SINET connects both domestic and international university Grid sites to KEK. Inside KEK, services are split across the intranet, the DMZ, and CC-GRID/GRID-LAN: generic Grid-level services (CA, VOMS, VOMRS), VO-central services (LFC, SRB, SRB-DSI, MCAT), and VO-specific services, alongside the KEK-1 and KEK-2 gLite sites, the CD-1 and CD-2 NAREGI sites, the BCS, and the HSM.]

Resource Scale
- Storage service (DPM): ~10 TB, with HPSS as the backend storage
  - Except for the HSM reached through SRB-DSI, whose information is not published via Glue
- Computing service
  - System A: part of the Central Information System (CIS)
  - System B: recycled old blades from the Belle Computing System (BCS)

              # of nodes   # of cores   Memory (GB/core)   OS                              Middleware   SPEC (kSI2K)
   System A   10           80           2                  Redhat EL 4.x (x86_64)          gLite 3.1    ~300
   System B   250          500          0.5                Scientific Linux 4.x (x86_64)                ~800

- System A is about 3 times the site performance before the upgrade in March 2009
[Diagram: VOMS/VOMRS, KEK-2 gLite, LFC, SRB/MCAT, BCS, and the HSM]
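
As context for the "not published via Glue" remark: whatever a gLite site does publish can be inspected by querying its BDII over LDAP. A sketch, assuming the kek2-bdii.cc.kek.jp host seen in the ticket list earlier, and the default BDII port and base DN, which are generic gLite conventions rather than anything confirmed by the slides:

    # List the published storage areas (Glue 1.x GlueSA objects) and their sizes.
    ldapsearch -x -LLL -H ldap://kek2-bdii.cc.kek.jp:2170 \
        -b mds-vo-name=local,o=grid \
        '(objectClass=GlueSA)' GlueSATotalOnlineSize GlueSAAccessControlBaseRule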

Recent Works
- Integration of BCS and CIS (still work in progress!)
  (1) Done: bandwidth upgrade construction (10G x 2 between the MT Ops Room in the South Bldg. of CRC and the 1st Machine Room in the North Bldg. of CRC)
  (2) In progress: SRM-SRB
  (3) Half up: SRB-DSI (GridFTP)
- Existing computing resource: 192 nodes, ~1,536 cores, ~4.6 MSI2K
[Diagram: SRB/MCAT and the HSM in the South Bldg.; CEs, KEK-2 gLite, and a 1G x 12 fan-out in the North Bldg.]
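
Once the SRB-DSI door is fully up, files behind SRB become reachable by plain GridFTP. A minimal sketch, assuming a valid VOMS proxy; disk.kek.jp is the illustrative Transfer-URL host from a later slide, and the path is purely hypothetical:

    # Copy a file from the SRB-DSI GridFTP door to local disk.
    globus-url-copy gsiftp://disk.kek.jp/belle/path/to/file file:///tmp/file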

Summary Table of CPU Resources
Legend (rows are color-coded in the original slide): Up / Down or unseen by LCG

                        # of nodes   # of cores   Memory (GB/core)   OS                              Middleware   SPEC (kSI2K)
   System A (CIS)       10           80           2.0                Redhat EL 4.x (x86_64)          gLite 3.1    ~300
   System B (old BCS)   250          500          0.5                Scientific Linux 4.x (x86_64)                ~800
   System C (BCS)       96           768          0.5 to 1.0 *       Redhat clone 4.x (x86_64)                    ~2,500
   System D                                                          Redhat clone 5.x (x86_64)       gLite 3.2
   Total                ~452         ~2,000                                                                       ~6,000

   * memory upgrade in plan

Resource Scale – LCG Infrastructure
- 4,740 of 20,941 kSI2K
- 2,500 kSI2K, the remaining half of the rack, joins in Feb 2010
- http://gridmap.cern.ch

Ongoing works but up soon (1)
(CPU resource table as on the previous slide, highlighting System B.)
- System B is being retired from LCG and relocated to the next activities:
  - 100 of the 250 nodes for evaluation of ISF, a product of Platform Computing Corporation
  - 150 of the 250 nodes for evaluation of the NAREGI-LSF adaptor

Ongoing works but up soon (2)
- To expose the HSM to the Grid:
  - SRB-DSI bridges the SRB inside the BCS, for GridFTP access only, not for SRM and LFC
  - To enable access by logical file name, e.g. lcg-xxx lfn://logical-file-name (see the command sketch below):
    - SRM-SRB serves the Site URL, e.g. srm://srm.kek.jp/
    - SRB-DSI serves the Transfer URL, e.g. gridftp://disk.kek.jp/
- A bit behind schedule
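
Put together, the intended user-side flow would look roughly like this. A sketch with standard lcg-utils commands; the VO and file paths are illustrative, and srm.kek.jp / disk.kek.jp are the example hosts from the slide:

    # Resolve a logical file name to its replica SURLs via the LFC.
    lcg-lr --vo belle lfn:/grid/belle/path/to/file

    # Ask the SRM endpoint (SRM-SRB) for a GridFTP Transfer URL.
    lcg-gt srm://srm.kek.jp/path/to/file gsiftp

    # Or copy directly by LFN; lcg-cp resolves the SURL and TURL internally.
    lcg-cp --vo belle lfn:/grid/belle/path/to/file file:///tmp/file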

VO-Specific Report: Belle, Belle2 (new), ILC

News
- Belle2.org has recently been registered in the CIC portal for the Belle2 experiment
  - Not yet launched as an official name, though
  - Thanks to Rolf Rumler for his support
- The current Belle computing model and software were established 10 years ago: users, even outside KEK, traditionally log on to the PC farm and submit their jobs there
- A brand-new computing model for Belle2 is being discussed in the collaboration, with a view to using a Grid model (supplemented by Cloud); it is highly possible that the main infrastructure will be LCG
- Time and gLite compatibility are the key subjects
  - Beam commissioning in Q1 of 2013
  - All software must work by Q1 of 2012 or earlier
- Handling a huge amount of data over the distributed computing infrastructures is also key
- Slides and figures courtesy of T. Hara, co-chair of the computing model group (slide originally presented at SAGA-WG/OGF27, Banff, Alberta, Canada, October 12-15, 2009)

Some Metrics to Estimate Belle2 Computing Resources
Sorry! A couple of slides showing the metrics used to estimate the computing resources for the Tier-0 of Belle2 have been removed, because they cannot yet be made public.

Summary
- Ops stats
  - Storage: 2 TB read and 4 TB written in the last 7 months
  - CPU: ILC colleagues were the heaviest consumers in 2009
- Resource upgrade
  - +2.5 MSI2K in Dec 2009
  - +2.5 MSI2K by Mar 2010
- Storage service
  - Integration of SRM with the HSM in the BCS; in operation by the end of FY2009 (Mar 2010)
- Belle2.org has been launched for the Belle2 experiment
  - Resource estimation, the computing model (possibly a tier structure), and the operational model are now being discussed

Thank You
(Photo taken at "Sun-kitchi" during the last meeting, held in Tsukuba on 28th November 2008. Many thanks for this memory.)