Our Work at CERN
Gang CHEN, Yaodong CHENG, Computing Center, IHEP
November 2, 2004

Outline
Conclusion of CHEP04 – computing fabrics
New developments of CASTOR
Storage Resource Manager
Grid File System – GGF GFS-WG, GFAL
ELFms – Quattor, LEMON, LEAF
Others: AFS, wireless network, Oracle, Condor, SLC, Indico, Lyon visit, CERN Open Day (Oct. 16)

CHEP04

Conclusion of CHEP04
CHEP04, from Sept. 26 to Oct. 1
Plenary sessions every morning
Seven parallel sessions every afternoon: Online computing; Event processing; Core software; Distributed computing services; Distributed computing systems and experiences; Computing fabrics; Wide area network
Documents:
Our presentations (one talk per person): two on Sep. 27 and one on Sep. 30

Computing fabrics
Computing nodes, disk servers, tape servers, network bandwidth at different HEP institutes
Fabrics at Tier0, Tier1 and Tier2
Installation, configuration, maintenance and management of large Linux farms
Grid software installation
Monitoring of computing fabrics
OS choice: move to RHES3/Scientific Linux
Storage observations

Storage stack (diagram), from tape stores up to wide-area access:
Tape stores: JASMine, dCache/TSM, HPSS, CASTOR, ENSTORE
Disks and disk organisation: FibreChannel/SATA SAN, EIDE/SATA in a box, SATA array direct connect, iSCSI; HW RAID 5, HW RAID 1, SW RAID 5, SW RAID 0
Local network: 1Gb Ethernet, 10Gb Ethernet, Infiniband
File systems: GPFS, XFS, ext2/3, SAN FS, NFS v2/v3, Lustre, GoogleFS, PVFS, Chimera/PNFS (dCache), CASTOR, SRB, gfarm
Exposed to LAN/WAN through SRM: StoRM, SRB, gfarm, dCache, CASTOR

Storage observations
Castor and dCache are in full growth, with growing numbers of adopters outside the development sites
SRM supports all major managers
SRB at Belle (KEK)
Sites are not always going for the largest disks (the capacity driver); some already choose smaller disks for performance, a key issue for LHC
Cluster file system comparisons: SW-based solutions allow HW reuse

Architecture Choice
64 bits are coming soon and HEP is not really ready for it!
Infiniband for HPC: low latency, high bandwidth (>700MB/s for CASTOR/RFIO)
Balance of CPU to disk resources
Security issues: which servers are exposed to users or the WAN?
High performance data access and computing support: Gfarm file system (Japan)

New CASTOR developments

Castor Current Status
Usage at CERN: 370 disk servers, 50 stagers (disk pool managers), 90 tape drives, more than 3PB in total; development team of 5, operations team of 4
Associated problems: management is more and more difficult; performance; scalability; I/O request scheduling; optimal use of resources

Challenge for CASTOR
LHC is a big challenge
A single stager should scale up to handle peak rates of 500 to 1000 requests per second
Expected system configuration: 4PB of disk cache, 10PB stored on tape per year; tens of millions of disk-resident files; a peak rate of 4GB/s from online; 10000 disks, 150 tape drives; an increasing number of small files
The current CASTOR stager cannot do it

Vision
With clusters of hundreds of disk and tape servers, automated storage management faces more and more the same problems as CPU cluster management: (storage) resource management, (storage) resource sharing, (storage) request access scheduling, configuration, monitoring
The stager is the main gateway to all resources managed by CASTOR
Vision: a Storage Resource Sharing Facility

Ideas behind the new stager
Pluggable framework rather than a total solution
True request scheduling: third-party schedulers, e.g. Maui or LSF
Policy attributes: externalize the policy engines governing resource matchmaking; move toward full-fledged policy languages, e.g. GUILE
Restricted access to storage resources: all requests are scheduled; no random rfiod eating up resources behind the back of the scheduling system
Database-centric architecture: stateless components, with all transactions and locking provided by the DB system; allows components to be stopped and restarted easily; facilitates development and debugging

New Stager Architecture (diagram): applications talk to the Request Handler through the RFIO/stage API; requests are stored in a request repository and file catalogue (Oracle or MySQL); the Master Stager is notified (UDP) and schedules access through an external scheduler (LSF or Maui); migration and recall use the CASTOR tape archive components (VDQM, VMGR, RTCOPY); rfiod acts as the disk mover in front of the disk cache, with control and data flowing over TCP.

Architecture: request handling and scheduling (diagram). A typical file request (e.g. read /castor/cern.ch/user/c/castor/TastyTrees, DN=castor) is authenticated against a fabric authentication service (e.g. a Kerberos V server), handled by a thread pool in the RequestHandler, and stored in the request repository (Oracle, MySQL). The Scheduler applies scheduling policies (e.g. user "castor" has priority), consulting the catalogue (is the file staged?) and the disk server load, and the Job Dispatcher runs the request on a disk server (e.g. pub003d). Request registration must keep up with high request-rate peaks; request scheduling must keep up with average request rates.

Security
Implementing strong authentication (encryption is not planned for the moment)
Developed a plugin system based on the GSSAPI so as to use the GSI and KRB5 mechanisms, with KRB4 supported for backward compatibility
Modifying various CASTOR components to integrate the security layer
Impact on the configuration of machines (need for service keys, etc.)
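As an illustration (not from the slides), here is a minimal client-side GSSAPI sketch in C, showing the kind of mechanism-independent handshake (Kerberos 5 or GSI plugging in underneath) that such a security layer builds on. The target service name and the token transport are assumptions made for illustration; this is not the actual CASTOR security plugin API.

```c
/* Illustrative only: start establishing a client security context with GSSAPI.
 * The service name "castor@stagerhost.example.org" is hypothetical; a real
 * plugin would exchange the tokens over the CASTOR control connection.
 * Build (roughly): cc gss_demo.c -lgssapi_krb5 */
#include <gssapi/gssapi.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    OM_uint32 maj, min;
    gss_name_t target = GSS_C_NO_NAME;
    gss_ctx_id_t ctx = GSS_C_NO_CONTEXT;
    gss_buffer_desc name_buf, out_tok = GSS_C_EMPTY_BUFFER;

    /* Import the (hypothetical) host-based service principal name. */
    name_buf.value = "castor@stagerhost.example.org";
    name_buf.length = strlen(name_buf.value);
    maj = gss_import_name(&min, &name_buf, GSS_C_NT_HOSTBASED_SERVICE, &target);
    if (GSS_ERROR(maj)) { fprintf(stderr, "gss_import_name failed\n"); return 1; }

    /* First pass of the context-establishment loop: produce a token that would
     * be sent to the server; the server's reply would be fed back into
     * gss_init_sec_context until GSS_S_COMPLETE. */
    maj = gss_init_sec_context(&min, GSS_C_NO_CREDENTIAL, &ctx, target,
                               GSS_C_NO_OID,        /* default mechanism, e.g. Kerberos 5 */
                               GSS_C_MUTUAL_FLAG, 0,
                               GSS_C_NO_CHANNEL_BINDINGS,
                               GSS_C_NO_BUFFER, NULL, &out_tok, NULL, NULL);
    if (GSS_ERROR(maj)) { fprintf(stderr, "gss_init_sec_context failed\n"); return 1; }

    printf("produced initial token of %zu bytes\n", out_tok.length);

    gss_release_buffer(&min, &out_tok);
    gss_delete_sec_context(&min, &ctx, GSS_C_NO_BUFFER);
    gss_release_name(&min, &target);
    return 0;
}
```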

Castor GUI Client
A prototype was developed by LIU Aigui on the Kylix 3 platform
If possible, it will be made downloadable from the CASTOR web site
Many problems still exist and the client needs to be optimized
Functionality and performance tests are still necessary

Storage Resource Manager

Introduction of SRM
SRMs are middleware components that manage shared storage resources on the Grid and provide: uniform access to heterogeneous storage; protocol negotiation; dynamic Transfer URL allocation; access to permanent and temporary types of storage; advanced space and file reservation; reliable transfer services
Storage resource managers cover: DRMs (disk resource managers), TRMs (tape resource managers) and HRMs (hierarchical resource managers)

SRM Collaboration
Jefferson Lab: Bryan Hess, Andy Kowalski, Chip Watson
Fermilab: Don Petravick, Timur Perelmutov
LBNL: Arie Shoshani, Alex Sim, Junmin Gu
EU DataGrid WP2: Peter Kunszt, Heinz Stockinger, Kurt Stockinger, Erwin Laure
EU DataGrid WP5: Jean-Philippe Baud, Stefano Occhetti, Jens Jensen, Emil Knezo, Owen Synge

SRM versions
Two SRM interface specifications:
SRM v1.1 provides data access/transfer and implicit space reservation
SRM v2.1 adds explicit space reservation, namespace discovery and manipulation, and access permissions manipulation
The Fermilab SRM implements the SRM v1.1 specification; SRM v2.1 is expected by the end of 2004
Reference:

High-level view of SRM (diagram): user applications and Grid middleware talk to SRM interfaces placed in front of the different storage systems (Enstore, JASMine, dCache, CASTOR).

Role of SRM on the Grid (diagram): 1. data are created at CERN Tier 0; 2. an SRM-PUT archives the files into CASTOR through its SRM; 3. the Replica Manager registers them in the Replica Catalog (via RRS); 4. an SRM-COPY from Tier 0 to Tier 1 triggers 5. an SRM-GET and 6. a GridFTP ERET (pull mode) network transfer into the FNAL Tier 1 dCache, which stages and archives the files to Enstore; 7. an SRM-COPY from Tier 1 to Tier 2 triggers 8. an SRM-PUT and 9. a GridFTP ESTO (push mode) transfer into the Tier 2 CASTOR; 10. users retrieve data for analysis with an SRM client (SRM-GET).

Main advantages of using SRM
Provides smooth synchronization between shared resources
Eliminates unnecessary burden from the client: insulates clients from storage system failures and deals transparently with network failures
Enhances the efficiency of the grid, eliminating unnecessary file transfers by sharing files
Provides a "streaming model" to the client

Grid File System

Introduction
Grids can hold many hundreds of petabytes of data, a very large percentage of which is stored in files
A standard mechanism to describe and organize file-based data is essential for facilitating access to this large amount of data
GGF GFS-WG
GFAL: Grid File Access Library

GGF GFS-WG
Global Grid Forum, Grid File System Working Group
Two goals (two documents):
File System Directory Services: manages the namespace for files, access control, and metadata management
Architecture for Grid File System Services: provides the functionality of a virtual file system in a grid environment, facilitates federation and sharing of virtualized data, and uses the File System Directory Services and standard access protocols
Both documents will be submitted at GGF13 and GGF14 (2005)

GFS view
Transparent access to dispersed file data in a Grid: POSIX I/O APIs; applications can access the Gfarm file system without any modification, as if it were mounted at /gfs; automatic and transparent replica selection for fault tolerance and avoidance of access concentration
(Diagram: a virtual directory tree /gfs spanning sites such as ggf, aist and gtrc, with file system metadata mapping the virtual tree to physical file replicas, e.g. file1 and file2 replicated across sites)
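As an illustration (not from the slides), a minimal POSIX sketch of what "access without any modification" means in practice, assuming the Gfarm namespace is mounted at /gfs; the file path is hypothetical.

```c
/* Plain POSIX I/O: nothing Gfarm-specific in the code. If the Gfarm
 * namespace is mounted at /gfs, this reads a (hypothetical) replica-managed
 * file exactly as it would read a local one. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;
    /* Hypothetical file name under the grid-wide namespace. */
    int fd = open("/gfs/ggf/file1", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);   /* copy contents to stdout */
    close(fd);
    return 0;
}
```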

GFAL
Grid File Access Library
Grid storage interactions today require using several existing software components: the replica catalog services, to locate valid replicas of files; the SRM software, to ensure that files exist on disk or that space is allocated on disk for new files
GFAL hides these interactions and presents a POSIX interface for the I/O operations
The currently supported protocols are: file for local access, dcap (dCache access protocol) and rfio (CASTOR access protocol)

Compile and Link
The function names are obtained by prepending gfal_ to the POSIX names, for example gfal_open, gfal_read, gfal_close... The argument lists and the values returned by the functions are identical
The header file gfal_api.h needs to be included in the application source code
Link with libGFAL.so
The security libraries libcgsi_plugin_gsoap_2.3, libglobus_gssapi_gsi_gcc32dbg and libglobus_gss_assist_gcc32dbg are used internally
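As an illustration (not from the slides), a small sketch of a GFAL client following the naming rule above: the POSIX call with a gfal_ prefix, same arguments and return values. The file path and the exact link line are assumptions; check the GFAL documentation of the release in use.

```c
/* Sketch of a GFAL read, assuming the gfal_ prefix convention described
 * above. Build roughly as: cc gfal_demo.c -lGFAL (the security libraries
 * are pulled in internally). The CASTOR path below is hypothetical;
 * dcap or local "file" paths work the same way. */
#include <fcntl.h>
#include <stdio.h>
#include "gfal_api.h"

int main(void)
{
    char buf[8192];
    int n;
    int fd = gfal_open("/castor/cern.ch/user/c/castor/somefile", O_RDONLY, 0);
    if (fd < 0) { perror("gfal_open"); return 1; }
    while ((n = gfal_read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);   /* copy contents to stdout */
    gfal_close(fd);
    return 0;
}
```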

Basic Design (diagram): physics applications call GFAL, which acts as a VFS-like layer providing POSIX I/O for wide-area access; underneath, it uses a replica catalog client (RC services), an SRM client (SRM services, with the MSS services behind them), and the I/O plugins: local file I/O against the local disk, rfio I/O (open/read) against the RFIO services, dcap I/O against the dCache services, and root I/O.

File system implementation
Two options have been considered to offer a file system view, i.e. a way to run standard applications without modifying the source and without re-linking: the Pluggable File System (PFS), built on top of "Bypass" and developed by the University of Wisconsin, and the Linux Userland File System (LUFS)
File system view: /grid/{vo}/…
CASTORfs, based on LUFS: I developed it; it is available, but its efficiency is still low

Extremely Large Fabric management system

ELFms
ELFms: Extremely Large Fabric management system
Subsystems: QUATTOR, a system installation and configuration tool suite; LEMON, a monitoring framework; LEAF, hardware and state management

Deployment at CERN
ELFms manages and controls most of the nodes in the CERN CC: ~2100 nodes out of ~2400, to be scaled up to >8000 for the LHC
Multiple functionalities and cluster sizes (batch nodes, disk servers, tape servers, DB, web, …)
Heterogeneous hardware (CPU, memory, HD size, …)
Linux (RH) and Solaris (9)

Quattor
Quattor takes care of the configuration, installation and management of fabric nodes
A Configuration Database holds the 'desired state' of all fabric elements: node setup (CPU, HD, memory, software RPMs/PKGs, network, system services, location, audit info, …) and cluster information (name and type, batch system, load-balancing info, …), defined in templates arranged in hierarchies so that common properties are set only once
Autonomous management agents run on each node for base installation, service (re-)configuration, and software installation and management
Quattor was developed in the scope of EU DataGrid; development and maintenance are now coordinated by CERN/IT

Quattor Architecture (diagram)
Configuration management: configuration database; configuration access and caching; graphical and command-line interfaces
Node and cluster management: automated node installation; node configuration management; software distribution and management

LEMON
Monitoring sensors and agent: a large number of metrics (~10 sensors implementing 150 metrics); plug-in architecture, so new sensors and metrics can easily be added; asynchronous push/pull protocol between sensors and agent; available for Linux and Solaris
Repository: data insertion via TCP or UDP; data retrieval via SOAP; backend implementations for text file and Oracle SQL; keeps current and historical samples, with no ageing out of data but archiving on TSM and CASTOR
Correlation engines and 'self-healing' fault recovery: allow plug-in correlations accessing the collected metrics and external information (e.g. the quattor CDB, LSF), and can also launch configured recovery actions; e.g. the average number of users on LXPLUS, the total number of active LCG batch nodes; e.g. cleaning up /tmp if occupancy > x%, restarting daemon D if it is dead, …
LEMON is an EDG development now maintained by CERN/IT
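As an illustration (not from the slides), a rough sketch of the "data insertion via UDP" idea: a sensor-side agent pushing one metric sample to the repository as a datagram. The host, port and the text format of the sample are invented for illustration and are not the actual LEMON wire protocol.

```c
/* Illustrative only: push one metric sample to a monitoring repository
 * over UDP. The host, port and the "node metric-id timestamp value" text
 * format are hypothetical, not the actual LEMON protocol. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char sample[256];
    struct sockaddr_in repo;
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    memset(&repo, 0, sizeof repo);
    repo.sin_family = AF_INET;
    repo.sin_port = htons(12509);                     /* hypothetical repository port */
    inet_pton(AF_INET, "192.0.2.10", &repo.sin_addr); /* hypothetical repository host */

    /* One sample: node name, metric id, timestamp, value. */
    snprintf(sample, sizeof sample, "lxb0001 4101 %ld 0.73", (long)time(NULL));

    if (sendto(s, sample, strlen(sample), 0,
               (struct sockaddr *)&repo, sizeof repo) < 0)
        perror("sendto");

    close(s);
    return 0;
}
```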

LEMON Architecture LEMON stands for “LHC Era Monitoring”

LEAF (LHC Era Automated Fabric)
A collection of workflows for automated node hardware and state management
HMS (Hardware Management System): e.g. installation, moves, vendor calls, retirement; automatically requests installs, retirements etc. from technicians; GUI to locate equipment physically
SMS (State Management System): automated handling of high-level configuration steps, e.g. reconfigure, reboot, reallocate nodes; extensible framework, with plug-ins for site-specific operations possible; issues all necessary (re)configuration commands on top of the quattor CDB and NCM
HMS and SMS interface to Quattor and LEMON for setting and getting node information respectively

LEAF screenshot

Other Activities
AFS: AFS documents download; AFS DB servers configuration
Wireless network deployment
Oracle license for LCG
Condor deployment at some HEP institutes
SLC: Scientific Linux CERN version
Lyon visit (Oct. 27, CHEN Gang)
CERN Open Day (Oct. 16)

Thank you!!