COMP_3: Grid Interoperability and Data Management
CC-IN2P3 and KEK Computing Research Center
FJPPL, Annecy, June 15, 2010

Members (2010)
Japan (KEK): M. Nozaki, T. Sasaki, Y. Watase, G. Iwai, Y. Kawai, S. Yashiro, Y. Iida
France (CC-IN2P3): D. Boutigny, G. Rahal, S. Reynaud, F. Hernandez, J.Y. Nief, Y. Cardenas, P. Calvat

Activities
Cooperative development of SAGA and iRODS – please see the following slides.
Workshop at Lyon on February 17 – status reports from each side; discussion of SAGA and iRODS development; 3 Japanese members visited Lyon (FJPPL + other KEK budget); one had to cancel the trip because of a suspected swine flu infection.

Common concerns
Grid interoperability – how do we build a worldwide distributed computing infrastructure in HEP and related fields, when different middleware is deployed and operated in each region?
Data handling in smaller experiments – it should be simple, but efficient enough.

SAGA – Virtualization of Grid/cloud resources and Grid interoperability

International collaboration and computing resources
e-Science infrastructures are developed and operated independently and are not compatible with each other: Japan: NAREGI; United States: Globus and VDT (OSG); Europe: gLite (EGEE), ARC, UNICORE – on top of local computing resources.
How can they share resources? How can they develop software together? Grid interoperability and SAGA will be the key.

SAGA – Simple API for Grid Applications
An API providing a single way to access distributed computing infrastructures such as clouds, Grids, local batch schedulers and standalone local machines. The API definition itself is language independent.
This is a key technology for world-size collaborations such as Belle II or ILC, where different institutes depend on different technologies.
There are implementations in two languages: Java – JSAGA (CC-IN2P3) and Java SAGA; C++ – SAGA-C++ (KEK and others). Python and C language bindings are also available.
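To give a feel for the API style, the sketch below submits a simple job through the standardized SAGA Java binding (org.ogf.saga) that JSAGA implements. This is a minimal illustration, not code from the presentation: the fork://localhost service URL and the executable are placeholders, and the URL schemes actually available depend on the adaptors installed (gLite, NAREGI, batch systems, …).

    import org.ogf.saga.job.Job;
    import org.ogf.saga.job.JobDescription;
    import org.ogf.saga.job.JobFactory;
    import org.ogf.saga.job.JobService;
    import org.ogf.saga.url.URL;
    import org.ogf.saga.url.URLFactory;

    public class SagaJobExample {
        public static void main(String[] args) throws Exception {
            // Describe the job in a middleware-neutral way.
            JobDescription jd = JobFactory.createJobDescription();
            jd.setAttribute(JobDescription.EXECUTABLE, "/bin/hostname");
            jd.setAttribute(JobDescription.OUTPUT, "hostname.out");

            // The URL scheme selects the adaptor; "fork://localhost" is a
            // placeholder that runs the job on the local machine.
            URL serviceUrl = URLFactory.createURL("fork://localhost");
            JobService js = JobFactory.createJobService(serviceUrl);

            // Submit and wait for completion; the same code works unchanged
            // against any back-end for which an adaptor is installed.
            Job job = js.createJob(jd);
            job.run();
            job.waitFor();
            System.out.println("Final state: " + job.getState());
        }
    }

Switching from a local test to a Grid back-end only means changing the service URL (and providing the matching security context), which is the point of the single-API approach described above.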

The aim of the project: exchange knowledge and information; converge the two implementations in the future.

Converging JSAGA and SAGA-C++
[Diagram: the current SAGA landscape – user applications written in C, Python or Jython use the SAGA C binding, PySAGA (the most used SAGA Python binding) or JySAGA, which sit on top of the implementations SAGA-C++, Java SAGA, JSAGA and JavaGAT; a Boost-based implementation is an open question.]

Converging JSAGA and SAGA-C++ (continued)
[Diagram: the same picture, now including a Boost-based implementation and JPySAGA among the Python-binding implementations.]

Converging all SAGA implementations
[Diagram: a common SAGA Python binding (PySAGA?) shared by all implementations – SAGA-C++, Java SAGA, JSAGA, JavaGAT – through the Boost-based implementation, JPySAGA and JySAGA, serving C, Python and Jython user applications.]

JPySAGA
Developed by J. Devemy (CC-IN2P3).
Compatible with the reference implementations of Python (CPython) and SAGA (the Python binding of SAGA-C++).
First release available for download – namespace, file system and replica functional packages only; the execution management functional package will come soon.
Will be used to integrate JSAGA into DIRAC (Distributed Infrastructure with Remote Agent Control).

Summary of KEK activities related to SAGA
This activity is part of the RENKEI project (RENKEI: Resource Linkage for e-Science), funded by MEXT.
Job adaptors for NAREGI, PBSPro and Torque have been implemented.
File adaptors for NAREGI (Gfarm v1 and v2) have also been implemented.
File adaptors for RNS and iRODS are under development.
Service Discovery for NAREGI will be implemented.

RENKEI-KEK: Unified GRID Interface (UGI)
[Diagram: UGI, a Python interface on top of SAGA-C++, with SAGA job and file adaptors for RNS, iRODS, gLite, NAREGI, PBSPro/Torque, LSF, cloud and Globus; RNS and SAGA are OGF standards.]
Goal: hide the differences of the underlying middleware from users – a single command set works for everything.

Summary of CC-IN2P3 activities related to SAGA
Latest developments – JSAGA plug-ins for gLite-LFC, Globus GK with Condor for OSG, and SSH with offline monitoring; JSAGA core engine: many improvements (scalability, features…).
Next developments – JSAGA plug-ins for ARC (NorduGrid), DIET (Decrypton) and Grid Engine (the next batch system at CC-IN2P3); Service Discovery API (SAGA extension); GridRPC SAGA package (needed for DIET).

iRODS – Data handling for small-size projects

What is iRODS?
iRODS is the successor of SRB – data management software with a metadata catalogue and rule-based data management. It is considered a data Grid solution. The project is led by Prof. Reagan Moore of the University of North Carolina.

iRODS service at KEK
iRODS servers × 4: IBM x3650, QX5460 (4 cores), 8 GB memory, 293.6 GB + 600 GB HDD, RHEL 5.2, iRODS 2.1, HPSS-VFS client, GPFS client.
Postgres server (ICAT): IBM x3650, QX5460 (4 cores), 8 GB memory, 293.6 GB HDD, RHEL 5.2, Postgres.
HPSS (tape library): TS3500, HPSS, 3 PB maximum (3000 volumes), 10 TB cache disk, 10 tape drives, 5 movers, 2 VFS servers.

Client tools: i-commands, JUX (GUI application), Davis (Web application).

Client tools – JUX (Java Universal eXplorer)
Works on Linux, Windows and Mac; looks like Windows Explorer.
Pros: the file structure can be checked visually; files can be copied by drag and drop.
Cons: cannot recognize replicated files; cannot handle Japanese characters.

Client tools – Davis (a WebDAV-iRODS/SRB gateway)
Runs Jetty and Apache on the iRODS server; useful for a small laboratory in a university.
Pros: no special software is needed on the client side; uses only the https port.
Cons: cannot upload/download several files at the same time; does not support parallel transfer.
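Because Davis exposes the iRODS namespace over plain HTTPS/WebDAV, any generic HTTP client can browse it without iRODS-specific software, which is the point made above. The hedged sketch below lists a collection with a WebDAV PROPFIND request using the standard java.net.http client; the host name, path and Basic-auth credentials are placeholders (the actual authentication scheme depends on how Davis is deployed), not the real KEK endpoint.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Base64;

    public class DavisListExample {
        public static void main(String[] args) throws Exception {
            // Placeholder endpoint and credentials -- replace with the real Davis URL.
            String url = "https://davis.example.org/home/myZone/myCollection/";
            String auth = Base64.getEncoder()
                    .encodeToString("user:password".getBytes("UTF-8"));

            HttpClient client = HttpClient.newHttpClient();
            // WebDAV PROPFIND with Depth: 1 lists the immediate members of the collection.
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .method("PROPFIND", HttpRequest.BodyPublishers.noBody())
                    .header("Depth", "1")
                    .header("Authorization", "Basic " + auth)
                    .build();

            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());  // multistatus XML describing the entries
        }
    }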

KEK wiki page
A wiki page in Japanese for end users: what iRODS is, how to use it at KEK, how to install it, how to write rules, how to write micro-services, …

MLF (Materials and Life Science Experimental Facility) storage
[Diagram: raw and simulated data, 20–50 TB/year per group.]

Use case scenario: data preservation and distribution for the MLF groups
Raw data is used once; simulated data can be accessed by collaborators.
After processing, data are moved to the KEK storage.
Data are replicated between J-PARC and KEK; after a certain term they are deleted from J-PARC and kept forever at KEK.

From J-PARC to collaborators
[Diagram: data flow from J-PARC (Tokai) – data server, iRODS server and client, storage, iCAT – through the KEK (Tsukuba) iRODS servers, iCAT and HPSS, out to collaborators on the Internet via web, iRODS or (possibly) HPSS clients.]

Rules and micro-services – main rules
1. All created data should be replicated to the KEKCC storage 10 minutes later.
2. All raw data older than 1 week should be removed from the J-PARC storage, after checking that their replicated copies exist in the KEKCC storage.
3. All simulated data should be removed in the same way, but the time period can be changed by each research group.

Rules and micro-services (continued)
Created a new micro-service that detects files matching a specified age; implemented by Adil Hasan at the University of Liverpool.
Other experiments use different rules: send the files for successful runs only; check file sizes and age.
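In production these policies are expressed as iRODS rules invoking micro-services such as the age-matching one above. Purely to illustrate the logic of rules 2 and 3, here is a plain-Java sketch with illustrative directory paths and a local mirror standing in for the KEKCC replica check – it is not how the KEK/J-PARC setup is actually implemented.

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.FileTime;
    import java.time.Instant;
    import java.time.temporal.ChronoUnit;

    public class RetentionSketch {
        public static void main(String[] args) throws IOException {
            Path jparcStorage = Paths.get("/data/jparc/raw");   // illustrative paths
            Path kekccMirror  = Paths.get("/data/kekcc/raw");
            Instant cutoff = Instant.now().minus(7, ChronoUnit.DAYS);

            try (DirectoryStream<Path> stream = Files.newDirectoryStream(jparcStorage)) {
                for (Path file : stream) {
                    FileTime mtime = Files.getLastModifiedTime(file);
                    boolean oldEnough = mtime.toInstant().isBefore(cutoff);
                    // Only delete when the replicated copy is present at the other site.
                    boolean replicated = Files.exists(kekccMirror.resolve(file.getFileName()));
                    if (oldEnough && replicated) {
                        System.out.println("removing " + file);
                        Files.delete(file);
                    }
                }
            }
        }
    }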

Client speed performance – data transfer between KEK and J-PARC
iput: 43 MB/s, iget: 40 MB/s; pftp put: 26 MB/s, pftp get: 33 MB/s; scp: 24 MB/s and 4 MB/s (depending on the path).
[Diagram: J-PARC clients reaching the KEK iRODS server, HPSS and work server via iRODS, pftp and ssh.]

New server setup – parallel iRODS servers
Before March: 1 iRODS instance running on 2 machines (active & standby).
Now: separate iRODS instances on each machine (backing each other up), with an iRODS instance per experiment – so that each experiment group writes to HPSS as its own user, and to avoid the influence of congestion from other experiment groups.
[Diagram: iRODS-A/B and iRODS-C/D instances paired across servers.]

iRODS at CC-IN2P3
In production since early ….
Servers: 3 iCAT servers (metacatalog), Linux SL4 and SL5; 6 data servers (200 TB), Sun Thor x4540, Solaris 10. Metacatalog on a dedicated Oracle 11g cluster.
HPSS interface: rfio server (using the universal MSS driver). Use of FUSE-iRODS for Fedora-Commons and for legacy web applications. TSM: backup of some stored data.
Monitoring and restart of the services fully automated (crontab + Nagios + SMURF). Automatic weekly reindexing of the iCAT databases. Accounting: daily report on our web site.

iRODS usage: prospects
Starting: Neuroscience (~60 TB); IMXGAM (~15 TB, X and gamma ray imagery); dChooz, a neutrino experiment (~15 TB/year).
Coming soon: LSST (astro) – for the IN2P3 electronic test bed (~10 TB) and for the DC3b data challenge (100 TB?). Thinking about replacing the lightweight transfer tool (bbftp).
Communities: high energy physics, astrophysics, biology, biomedical, arts and humanities.

iRODS contributions
Scripts: tests of the i-commands functionalities.
i-command: iscan (release 2.3), an admin command.
Micro-services: access control (flexible firewall); a micro-service to tar/untar files and register them in iRODS; a micro-service to set ACLs on objects/collections.
Universal Mass Storage driver.
Miscellaneous (related to the Resource Monitoring System): choose the best resource based on the load; automatic setting of a server's status (up or down).

JUX: Java Universal eXplorer
Provides a single GUI for accessing data on the Grid. JUX tries to be intuitive and easy to use for non-expert users: context menus, drag-and-drop…, close to widely used explorers (i.e. Windows Explorer).
Written in Java by Pascal Calvat, based on the JSAGA API developed at CC-IN2P3 by Sylvain Reynaud.
JSAGA provides the data management layer: protocols srb, irods, gsiftp, srm, http, file, sftp, zip…; the SRB and iRODS plugins use Jargon; a plugin for a new protocol can be added easily.
JSAGA provides the security mechanisms: Globus proxy, VOMS proxy, login/password, X509.
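Because JSAGA exposes every protocol through the same SAGA namespace API, a copy between, say, a local disk and an iRODS collection reduces to a few calls once the corresponding plug-in and security context are configured. A minimal sketch against the standard Java binding (the URLs are placeholders, and the exact iRODS URL syntax depends on the plug-in configuration):

    import org.ogf.saga.namespace.Flags;
    import org.ogf.saga.namespace.NSEntry;
    import org.ogf.saga.namespace.NSFactory;
    import org.ogf.saga.url.URL;
    import org.ogf.saga.url.URLFactory;

    public class SagaCopyExample {
        public static void main(String[] args) throws Exception {
            // Source and target URLs; the scheme picks the protocol plug-in
            // (file, irods, srb, gsiftp, srm, http, sftp, ...).
            URL source = URLFactory.createURL("file:///tmp/run0042.dat");
            URL target = URLFactory.createURL(
                    "irods://irods.example.org/myZone/home/user/run0042.dat");

            // Open the source entry and copy it, overwriting any existing target.
            NSEntry entry = NSFactory.createNSEntry(source);
            entry.copy(target, Flags.OVERWRITE.getValue());
            entry.close();
        }
    }

This is exactly the layer JUX builds its drag-and-drop copies on: the GUI only has to manipulate URLs, and JSAGA dispatches to the right protocol plug-in and security context.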

JUX: Java Universal eXplorer Download:

iRODS overall assessment
iRODS is becoming more and more popular in the IN2P3 community and beyond. It is very flexible, offers a large number of functionalities, and can be interfaced with many different technologies (no limit): cloud, mass storage, web services, databases, … It is able to answer a vast range of needs for our user community.
Lots of projects = lots of work for us! Goal for this year: ~ x00 TB (guess: > 300 TB). Should reach the PB scale very quickly.


FJKPPL?
CC-IN2P3, the KISTI Supercomputing Center and the KEK Computing Research Center have agreed to build a three-way collaboration. We share common interests in Grid computing and will discuss what we will do together. The same effort is being made in BIO_1 as well.

SUMMARY

Summary
CC-IN2P3 and KEK-CRC are working to solve common problems in Grid computing mostly independently, but interactively and in a complementary way: SAGA as the solution for Grid interoperability; iRODS as the solution for data management in smaller-size projects.
The long-term collaboration has benefits: for KEK, CC-IN2P3 is a very strong partner who provides useful software tools.