Published by Alexander Quinn. Modified over 6 years ago.
1
Experiences with Data Indexing services supported by the NorduGrid ARC middleware
Oxana Smirnova, Jakob Nielsen (Lund University/CERN) for the NorduGrid collaboration CHEP 2004, Interlaken,
2
NorduGrid and ARC
- NorduGrid is a research collaboration established by universities in Denmark, Estonia, Finland, Norway and Sweden
  - Focuses on providing production-quality Grid middleware for academic researchers
- ARC (Advanced Resource Connector) is the Grid middleware developed by NorduGrid
  - Based on Globus libraries and API
  - Original architectural solutions, services and implementations
  - Supports one of the largest Grid production systems
    - 10 countries, 40+ sites, ~4000 CPUs, ~30 TB storage
3
ARC features
- ARC is based on Globus Toolkit with core services replaced
  - Currently uses Globus Toolkit 2
- Alternative/extended Grid services:
  - Grid Manager that
    - Checks user credentials and authorization
    - Handles jobs locally on clusters (interfaces to LRMS)
    - Does stage-in and stage-out of files
  - Lightweight User Interface with built-in resource broker
  - Information System based on MDS with a NorduGrid schema
  - xRSL job description language (extended Globus RSL)
  - Grid Monitor
- Simple, stable and non-invasive
4
ARC Indexing Services
- NorduGrid has standard Globus indexing services deployed:
  - Globus Replica Catalog (RC)
    - Deployed and used since spring 2002
    - Used as the primary indexing service in ATLAS Data Challenge 1
  - Globus Replica Location Service (RLS)
    - Deployed and used since spring 2004
    - Used in ATLAS Data Challenge 2
- Both are internally supported by the ARC middleware
- But both have been unsatisfactory in many ways
5
ARC support for RC and RLS
- If a user refers to a logical filename formed as an RC (rc://…) or RLS (rls://…) pseudo-URL, the ARC UI will
  - Query the indexing service for the size and location of the corresponding physical file
  - Use this information in the resource brokering
- Furthermore, the ARC Grid Manager on the chosen cluster will
  - Contact the corresponding indexing service
  - Obtain the physical filename (URL)
  - Download the file to the cluster
    - If the file exists in the local cluster cache, a symbolic link will be created instead
- At the end of a job, the ARC Grid Manager can automatically
  - Upload the physical file to a Storage Element
    - The list of Storage Elements is obtained by querying the RC/RLS
  - Register the file in the given indexing service with the logical filename specified
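The stage-in steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ARC Grid Manager code: the in-memory `TOY_INDEX`, the host names, the helper names (`resolve`, `stage_in`, `fetch`) and the choice of the last path component as the logical filename are all hypothetical.

```python
import os
from urllib.parse import urlparse

# Hypothetical stand-in for an RC/RLS server: LFN -> list of (PFN, size in bytes).
TOY_INDEX = {
    "dc2.evgen.0001.root": [
        ("gsiftp://se1.example.org/atlas/dc2.evgen.0001.root", 1200000),
        ("gsiftp://se2.example.org/atlas/dc2.evgen.0001.root", 1200000),
    ],
}

INDEX_SCHEMES = {"rc", "rls"}


def logical_name(url):
    """Take the last path component as the LFN (an assumption of this sketch)."""
    return urlparse(url).path.rsplit("/", 1)[-1]


def resolve(url, index=TOY_INDEX):
    """Return the (PFN, size) replicas registered for the LFN in a pseudo-URL."""
    if urlparse(url).scheme not in INDEX_SCHEMES:
        raise ValueError(f"not an indexing pseudo-URL: {url}")
    return index.get(logical_name(url), [])


def stage_in(url, session_dir, cache_dir, fetch):
    """Make the file available in session_dir.

    If the file is already in the cluster cache, only a symbolic link is
    created; otherwise the first registered replica is downloaded with the
    caller-supplied fetch(pfn, destination) function.
    """
    lfn = logical_name(url)
    cached = os.path.join(cache_dir, lfn)
    target = os.path.join(session_dir, lfn)
    if os.path.exists(cached):
        os.symlink(cached, target)
        return "linked"
    replicas = resolve(url)
    if not replicas:
        raise FileNotFoundError(lfn)
    pfn, _size = replicas[0]
    fetch(pfn, target)
    return "downloaded"
```

Passing the transfer function in as `fetch` keeps the sketch independent of any particular transfer protocol; a real Grid Manager would also choose among replicas rather than take the first one.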
6
Command Line Interface Support
- ARC provides a set of command-line file management tools:
  - ngls lists physical files and their attributes; accepts either physical file names (PFN) or logical file names (LFN)
  - ngcopy and ngrequest copy and replicate files registered in indexing services; accept either PFN or LFN
    - Multithreaded and third-party transfers are supported
  - ngremove deletes physical files and logical records
7
Globus Replica Catalog
- Stores mappings between LFN and PFN
- Supports file collections and file attributes (e.g. file size, checksum)
- By default runs with an LDAP backend
- Problems:
  - Random crashes
    - Automatic restart script necessary
  - LFN has to be identical with PFN
  - Files could only belong to one collection at a time
  - In default installation only clear-text authentication
    - Was patched by NorduGrid to allow GSI authentication instead
  - Clumsy command-line tool
  - No longer supported by Globus
[Figure: Snapshot of the Replica Catalog Browser from ATLAS Data Challenge 1.]
8
Globus Replica Location Service
- Stores mappings between LFNs and PFNs
- Supports arbitrary file attributes and searching on them
- Runs with an SQL server backend
  - MySQL, PostgreSQL and Oracle are supported
- Has GSI authentication
- Reasonably logical and useful API
- Problems:
  - Until recently there were random freeze-outs
    - Automatic restart script necessary
    - Now patched; stable for the last 3+ weeks
  - Does not internally support collections
  - Clumsy command-line tool
  - No fine-grained access control
    - Either no write access or full write access
    - Users can overwrite/delete each other's records
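To make the data model concrete, here is a minimal in-memory sketch of an index that maps LFNs to PFNs and supports attribute search. It is not the Globus RLS API; the class `ToyReplicaIndex` and its methods are hypothetical names chosen for illustration. Like the service described above, it has no notion of ownership, so any caller can alter any record.

```python
class ToyReplicaIndex:
    """In-memory illustration only; not the Globus RLS client API."""

    def __init__(self):
        self._replicas = {}    # LFN -> set of PFNs
        self._attributes = {}  # LFN -> {attribute name: value}

    def add_replica(self, lfn, pfn):
        """Register one more physical copy of a logical file."""
        self._replicas.setdefault(lfn, set()).add(pfn)

    def set_attribute(self, lfn, name, value):
        """Attach an arbitrary attribute to a logical file."""
        self._attributes.setdefault(lfn, {})[name] = value

    def replicas(self, lfn):
        """Return all PFNs registered for an LFN, sorted for stable output."""
        return sorted(self._replicas.get(lfn, ()))

    def find(self, **wanted):
        """Return LFNs whose attributes match every name=value pair given."""
        return sorted(
            lfn
            for lfn in self._replicas
            if all(
                self._attributes.get(lfn, {}).get(name) == value
                for name, value in wanted.items()
            )
        )
```

A call such as `index.find(dataset="dc2")` then returns every logical file carrying that attribute, regardless of where its physical replicas are stored.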
9
Indexing Services
- Requirements for an ideal indexing service supporting multiple users and sub-projects:
  - GSI authentication
  - Mappings between logical and physical file names
  - Support for collections
    - Files should be allowed to be listed in several collections
    - Empty and recursive collections must be supported
  - Support for arbitrary file attributes
  - Fine-grained access control
    - Possibility to give read-write access to collections, different for different users
    - Distinguish between write access (e.g. production managers) and read access (e.g. regular users) to collections
    - The access restrictions to the physical files should be reflected in the indexing service
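The collection and access-control requirements above can be illustrated with a small data model. This is a sketch under assumptions, not a design from the talk: the `Collection` class, its field names and the use of plain strings as user identities are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A named group of LFNs with per-collection read and write lists."""

    name: str
    files: set = field(default_factory=set)       # LFNs; one LFN may sit in many collections
    children: list = field(default_factory=list)  # nested collections; may be empty
    readers: set = field(default_factory=set)     # identities with read access
    writers: set = field(default_factory=set)     # identities with read-write access

    def can_read(self, user):
        """Writers can always read; readers can only read."""
        return user in self.readers or user in self.writers

    def add_file(self, user, lfn):
        """Only writers (e.g. production managers) may modify the collection."""
        if user not in self.writers:
            raise PermissionError(f"{user} may not write to {self.name}")
        self.files.add(lfn)

    def list_files(self, user):
        """List LFNs here and in every sub-collection the user may read."""
        if not self.can_read(user):
            raise PermissionError(f"{user} may not read {self.name}")
        found = set(self.files)
        for child in self.children:
            if child.can_read(user):
                found.update(child.list_files(user))
        return sorted(found)
```

Keeping the permissions on the collection rather than on the whole catalogue is what separates this model from the "either no write access or full write access" behaviour: a regular user can list a collection without being able to overwrite or delete its records.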
10
Summary
- NorduGrid has deployed the standard Globus indexing services: RC and RLS
  - Stress-tested during the ATLAS Data Challenges
- Support for both RC and RLS is implemented in the ARC Grid Manager, User Interface and the command-line tools
- Neither is a satisfactory solution:
  - Unstable (until recently at least)
  - No decent support for collections
  - Clumsy command-line tools
  - No fine-grained access control
  - Access restrictions of the corresponding physical files are not stored/propagated
- An indexing service meeting the requirements of the future LHC experiments is being designed at the moment