1
Gridifying the LHC Data
Peter Kunszt, CERN IT/DB, EU DataGrid Data Management
2
Outline
The Grid as a means of transparent data access
Current mode of operations at CERN
Elements of Grid data access
Current capabilities of the EU DataGrid/LCG-1 Grid infrastructure
Outlook
3
Outline
The Grid as a means of transparent data access
Current mode of operations at CERN
Elements of Grid data access
Current capabilities of the EU DataGrid/LCG-1 Grid infrastructure
Outlook
4
The Grid vision
"Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources" (from "The Anatomy of the Grid: Enabling Scalable Virtual Organizations")
Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, assuming the absence of a central location, central control, omniscience, and existing trust relationships.
5
Grids: Elements of the Problem
Resource sharing: computers, storage, sensors, networks, …
  Sharing is always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving: beyond client-server, to distributed data analysis, computation, collaboration, …
Dynamic, multi-institutional virtual organizations
  Community overlays on classic organization structures; large or small, static or dynamic
6
Grid middleware architecture hourglass
Current Grid architectural functional blocks, top to bottom:
Specific application layer: ALICE, ATLAS, CMS, LHCb
Common application layer: LCG
Advanced Grid Services: EU DataGrid middleware
Basic Grid Services: GLOBUS 2.2
OS, storage and network services
7
Grid Middleware Cloud
Authentication, authorization
Requirement parsing
Resource matching
Resource allocation
Accessibility
Many resources are available, but where should the job be run?
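To make the resource-matching step concrete, here is a minimal sketch of how a broker-like component could filter published resource descriptions against a job's requirements. It is purely illustrative: the attribute names, sites and numbers are invented, not the EDG information-system schema.

```python
# Illustrative sketch of resource matching (invented attributes, not the EDG schema):
# keep only the sites whose published state satisfies the job's requirements.
job = {"cpus": 4, "storage_gb": 50, "vo": "atlas"}

resources = [
    {"site": "cern.ch",   "free_cpus": 120, "free_storage_gb": 800, "vos": {"atlas", "cms"}},
    {"site": "ral.ac.uk", "free_cpus": 2,   "free_storage_gb": 900, "vos": {"atlas"}},
    {"site": "in2p3.fr",  "free_cpus": 60,  "free_storage_gb": 10,  "vos": {"lhcb"}},
]

def matches(job, resource):
    """True if the resource can, in principle, run the job."""
    return (resource["free_cpus"] >= job["cpus"]
            and resource["free_storage_gb"] >= job["storage_gb"]
            and job["vo"] in resource["vos"])

candidates = [r["site"] for r in resources if matches(job, r)]
print(candidates)  # ['cern.ch']; resource allocation then picks one of the candidates
```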
8
Vision of Grid Data Management
Distributed Shared Data Storage
Ubiquitous Data Access
Transparent Data Transfer and Migration
Consistency and Robustness
Optimisation
9
Vision of Grid Data Management
Distributed Shared Data Storage
Different architectures
Heterogeneous data stores
Self-describing data and metadata
10
Vision of Grid Data Management
Ubiquitous Data Access
Global namespace
Transparent security control and enforcement
Access from anywhere at any time; the physical data location is irrelevant
Automatic data replication and validation
11
Vision of Grid Data Management
Transparent Data Transfer and Migration
Protocol negotiation and multiple protocol support
Management of data formats and database versions
12
Vision of Grid Data Management
Consistency and Robustness
Replicated data is reasonably up-to-date
Reliable data transfer
Self-detecting and self-correcting mechanisms upon data corruption
13
Vision of Grid Data Management
Optimisation
Customisation or self-adaptation to specific access patterns
Distributed querying, data analysis and data mining
14
Existing Middleware for Grid Data Management - Overview
Globus: GridFTP, Replica Catalog, Replica Manager
EU DataGrid: GDMP, Spitfire
Condor: NeST
PPDG: Magda, JASMine, GDMP, SAM
GriPhyN/iVDGL: Virtual Data Toolkit
Storage Resource Broker, Storage Resource Manager, ROOT, AliEn, Nimrod-G, Legion
(Not exhaustive)
15
What you would like to see
Reliable, available, powerful, calm, cool, easy to use … and nice to look at.
16
Outline
The Grid as a means of transparent data access
Current mode of operations at CERN
  Non-Grid operations
  Grid operations
Elements of Grid data access
Current capabilities of the EU DataGrid/LCG-1 Grid infrastructure
Outlook
17
Current non-Grid Operations (Oversimplified)
[Diagram: fabric and storage]
18
Current non-Grid Operations
Planning of resources (computing and storage): projects and job schedules, data placement
Monitoring of: running jobs, available resources, alarms
19
[Diagram: the tiered LHC computing model. CERN as Tier 0; Tier 1 centers (FNAL, RAL, IN2P3); Tier 2 labs and universities; departments and desktops; network links of 2.5 Gbps, 622 Mbps and 155 Mbps.]
20
[Diagram: Grid view of the tier model. CERN Tier 0 and CERN Tier 1; national Tier 1 centers (Germany, USA, UK, France, Italy, Japan, …); Tier 2 labs and universities forming a grid for a regional group; Tier 3 physics departments and desktops forming a grid for a physics study group.]
21
Grid Testbed Today
Currently the largest Grid testbed: the EU DataGrid
Not a full-fledged Grid fulfilling the Grid vision
  Pragmatic: what can be done today
  Research aspect: trying out novel approaches
Operation of each Grid site:
  Huge effort at each computing center for installation and operations support
  Local user support is necessary but not sufficient
Operation of the Grid as a logical entity:
  Complex management: coordination effort among Grid centers concerning Grid middleware updates, but also policies and trust relationships
  Grid middleware support through many channels
  Complex interdependencies of Grid middleware
(See talk in the afternoon.)
22
Outline
The Grid as a means of transparent data access
Current mode of operations at CERN
Elements of Grid data access
Current capabilities of the EU DataGrid/LCG-1 Grid infrastructure
Outlook
23
Grid Data Access Elements: Storage and Transfer
Grid Storage Resource Manager: managed storage, GSI-enabled Mass Storage System interface
Grid-accessible relational database
Data transfer mechanism between sites is in place
24
Grid Data Access Elements: I/O
Transfer protocols (gsiftp, https, scp, …)
File system and/or POSIX I/O for direct read/write of files from worker nodes
SQL or equivalent interface to relational data from worker nodes
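To illustrate the difference between direct POSIX access and protocol-based transfer, the hypothetical sketch below inspects the scheme of a TURL and decides whether a worker-node job can open the file in place or must first stage it with a transfer client; the staging step is only indicated, not implemented.

```python
# Hypothetical sketch: decide how a worker-node job reaches a file, based on the
# TURL scheme. A 'file:' URL (or plain path) can be opened with ordinary POSIX I/O;
# other schemes would first need a transfer client (gsiftp, https, scp, ...).
from urllib.parse import urlparse

def access_plan(turl, scratch_dir="/tmp"):
    """Return ('posix', path) for directly readable files, else ('stage', turl, local_copy)."""
    parsed = urlparse(turl)
    if parsed.scheme in ("", "file"):
        return ("posix", parsed.path or turl)
    local_copy = f"{scratch_dir}/{turl.rsplit('/', 1)[-1]}"
    return ("stage", turl, local_copy)  # copy with the matching client, then open locally

print(access_plan("file:///home/pkunszt/presentation.ppt"))
print(access_plan("gsiftp://pcrd24.cern.ch/data/pkunszt/pfn10_1.ppt"))
```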
25
Grid Data Access Elements: Catalogs
Grid Data Location Service: find the location of all identical copies (replicas)
Metadata Catalogs: file/object-specific metadata, logical names, collections
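A minimal sketch of what the data location service provides: given a GUID, return every known replica. The in-memory dictionary and the second replica location are invented stand-ins for the real distributed catalog.

```python
# Minimal sketch of a data location lookup: map a GUID to all known replicas (SFNs).
# The dictionary (and the second, hypothetical replica) stand in for the real catalog.
replica_catalog = {
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6": [
        "sfn://ccgridli02.in2p3.fr/edg/SE/dev/wpsix/higgs/data/123.ppt",
        "sfn://se.example.org/edg/SE/prod/higgs/data/123.ppt",  # hypothetical second copy
    ],
}

def list_replicas(guid):
    """Return every known physical copy (SFN) of the file identified by guid."""
    return replica_catalog.get(guid, [])

print(list_replicas("guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"))
```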
26
Higher Level Data Management Services
Customizable pre- and post-processing services
  Transparent encryption and decryption of data for transfer and storage
  External catalog updates
Optimization services
  Location of the best replica based on access 'cost'
  Active preemptive replication based on usage patterns
  Automated replication service based on subscriptions (see the sketch after this list)
Data consistency services
  Data versioning
  Consistency between replicas
  Reliable data transfer service
  Consistency between catalog data and the actual stored data
Virtual data services
  On-the-fly data generation
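As a toy illustration of the subscription idea, the sketch below queues one transfer per subscribed site whenever a new file is registered. It is not an EDG service API; the dataset name, sites and "transfer queue" are all invented.

```python
# Toy sketch of subscription-based automated replication: sites subscribe to a
# dataset, and each newly registered file is queued for copying to every subscriber.
# Dataset names, sites and the queue are purely illustrative.
subscriptions = {"higgs/data": ["ral.ac.uk", "in2p3.fr"]}
transfer_queue = []

def register_file(dataset, sfn):
    """Register a new file and queue one transfer per subscribed site."""
    for site in subscriptions.get(dataset, []):
        transfer_queue.append((sfn, site))

register_file("higgs/data", "sfn://se.example.org/edg/SE/prod/higgs/data/124.ppt")
print(transfer_queue)
# one pending transfer each for ral.ac.uk and in2p3.fr
```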
27
Outline
The Grid as a means of transparent data access
Current mode of operations at CERN
Elements of Grid data access
Current capabilities of the EU DataGrid/LCG-1 Grid infrastructure
Outlook
28
File Names
GUID – Global Unique IDentifier: guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
LFN – Logical File Name: lfn:presentation
LCN – Logical Collection Name: lcn:storage_workshop_presentations
SFN – Site File Name: sfn://ccgridli02.in2p3.fr/edg/SE/dev/wpsix/higgs/data/123.ppt
TURL – Transport URL: file:///home/pkunszt/presentation.ppt, rfio://srm.cern.ch/castor/user/p/pkunszt/presentation.ppt, gsiftp://pcrd24.cern.ch/data/pkunszt/pfn10_1.ppt
29
Data Services in EDG/LCG1: Storage and I/O
Grid Storage Element for files:
  Understands SFNs and maps them into TURLs (see the sketch after this list)
  GSI-enabled interface: Storage Resource Manager
  Support for different MSS backends
  Support for GridFTP and RFIO (CASTOR)
  Will be deployed this week in the EDG testbed for the first time
GridFTP server for files:
  TURLs only (gsiftp://), also GSI-enabled
  Only FTP-like functionality, no management capabilities
Current remote I/O: NFS, RFIO, GridFTP
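To make the SFN-to-TURL mapping concrete, here is a small hypothetical sketch: given an SFN and the protocols the client can speak, pick a protocol the Storage Element also serves and rewrite the SFN into a transport URL. The protocol list and the naive path rewriting are invented for the example, not the real SE logic.

```python
# Hypothetical sketch of a Storage Element's SFN -> TURL mapping: choose a protocol
# both sides support and rewrite the SFN into a transport URL. The protocol list and
# the path rewriting are illustrative only.
SE_PROTOCOLS = ["gsiftp", "rfio"]  # what this (imaginary) Storage Element can serve

def sfn_to_turl(sfn, client_protocols):
    host_and_path = sfn[len("sfn://"):]
    for proto in SE_PROTOCOLS:
        if proto in client_protocols:
            return f"{proto}://{host_and_path}"
    raise ValueError("no protocol in common between client and Storage Element")

sfn = "sfn://ccgridli02.in2p3.fr/edg/SE/dev/wpsix/higgs/data/123.ppt"
print(sfn_to_turl(sfn, ["rfio", "file"]))
# rfio://ccgridli02.in2p3.fr/edg/SE/dev/wpsix/higgs/data/123.ppt
```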
30
Data Services in EDG/LCG1: Database access
Spitfire
Thin client for GSI-enabled database access
Customizable; API exposed through a Web Service (WSDL)
Not suitable for large result sets
Not used by HEP applications yet
31
Data Services in EDG/LCG1: Replica Location Service (RLS)
Distributed file catalog
Stores GUID → SFN mappings
Stores replication metadata (e.g. file size, creator, MD5 checksum) on SFNs
Local Catalogs hold the actual name mappings
Remote Indices respond with the list of LRCs most probably having an entry for the file
LRCs are configured to send index updates to any number of RLIs
Indices are Bloom filters; the implementation uses Web Service technology
Scales well

Notes: The Replica Location Service has two sub-service components. Local Replica Catalogs (LRCs) reside at each site, close to the actual data store, cataloging only the data that is at that site; usually there will be one LRC per Storage Element. Replica Location Indices (RLIs) are lightweight services that hold Bloom filter index bitmaps of the LRCs. The LRCs can be configured dynamically to send indices to any number of RLIs: each LRC computes a Bloom filter bitmap that is compressed and sent to every subscribed RLI. Bloom filters are compact data structures for the probabilistic representation of a set, supporting membership queries ("Is element X in set Y?"). The compactness is paid for by a small rate of false positives: a query may incorrectly report an element as a member of the set. The false-positive rate can be tuned through the filter parameters; smaller rates require more computation time and memory, and an acceptable rate can be chosen accordingly. RLIs therefore answer lookups by returning the names of the LRCs that might hold a copy of the given file; the LRCs then have to be contacted for a definitive answer. This method is used successfully in peer-to-peer networks up to a certain network size. In the EU DataGrid we do not foresee more than a few dozen sites, at most O(100), for which this algorithm still scales very well.
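Because the RLS indexing scheme rests on Bloom filters, a tiny self-contained sketch may help. It shows the membership-query behaviour described above: an RLI bitmap can say "definitely not here" or only "probably here". The filter size, hash count and GUIDs are invented; this is not the RLS implementation.

```python
# Toy Bloom filter of the kind an LRC could publish to an RLI (illustrative sizes
# and hashing, not the RLS code). Lookups can yield false positives but never
# false negatives: False means "definitely not cataloged here".
import hashlib

class BloomFilter:
    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted hashes of the item.
        for salt in range(self.num_hashes):
            digest = hashlib.sha1(f"{salt}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# The LRC adds the GUIDs it catalogs and ships the (compressed) bitmap to its RLIs.
lrc_index = BloomFilter()
lrc_index.add("guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6")

print("guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6" in lrc_index)  # True: probably here
print("guid:00000000-0000-0000-0000-000000000000" in lrc_index)  # almost certainly False
```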
32
Data Services in EDG/LCG1: Replica Metadata Catalog
Single logical service for replication metadata
Deployment possible as a high-availability service (Web Service technology)
Data can be synchronized across many sites to avoid a single-site entry point (using the underlying database technology)
Holds Logical File Name (LFN) → GUID mappings ("aliases")
Contains LCN → set-of-GUIDs mappings ("collections")
Holds replication metadata on LFNs, LCNs and GUIDs
Might hold a small amount of replica-specific application metadata – O(10) items
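A minimal sketch of the two mappings this catalog holds, aliases (LFN → GUID) and collections (LCN → set of GUIDs), using the example names from the File Names slide; the data lives in memory here and is illustrative only.

```python
# Minimal sketch of the Replica Metadata Catalog's two mappings (illustrative data):
# aliases map human-readable LFNs to GUIDs, collections map an LCN to a set of GUIDs.
aliases = {
    "lfn:presentation": "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
}
collections = {
    "lcn:storage_workshop_presentations": {
        "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    },
}

def resolve_lfn(lfn):
    """Resolve a logical file name to the GUID used by the Replica Location Service."""
    return aliases[lfn]

def expand_collection(lcn):
    """Return the GUIDs of all files in a logical collection."""
    return collections[lcn]

print(resolve_lfn("lfn:presentation"))
print(expand_collection("lcn:storage_workshop_presentations"))
```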
33
Data Services in EDG/LCG1: Higher Level Services
EDG Replica Manager (ERM)
  Coordinates all replication services
Replica Optimization Service (ROS)
  Relies on network monitoring (iperf) between testbed sites
  Relies on the Storage Element access-cost method (estimated time to stage a file)
  Summarizes network costs for generic access requests
  Allows the Replica Manager to choose the 'best' replica (see the sketch after this list)
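The following sketch shows the kind of decision the ROS enables: sum an estimated network cost and a staging cost per replica and pick the cheapest. The cost figures, sites and the simple additive model are invented for illustration.

```python
# Hypothetical 'best replica' selection: add an estimated network transfer time and
# the Storage Element's staging time for each replica, then take the minimum.
# All numbers, paths and the additive cost model are illustrative.
replicas = {
    "sfn://ccgridli02.in2p3.fr/.../123.ppt": {"network_s": 42.0, "stage_s": 5.0},
    "sfn://se.example.org/.../123.ppt":      {"network_s": 3.0,  "stage_s": 120.0},
}

def best_replica(costs):
    """Return the SFN with the lowest total estimated access cost (in seconds)."""
    return min(costs, key=lambda sfn: costs[sfn]["network_s"] + costs[sfn]["stage_s"])

print(best_replica(replicas))  # the IN2P3 copy: 47 s total versus 123 s
```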
34
Replication Services Interactions
[Diagram: interactions between the replication services]
35
Outlook
We have taken only the first step on the long road to fulfilling the Grid vision
Promising initial results, but a lot of work still needs to be done
Industry has only recently joined the Grid community through the GGF; industrial-strength middleware solutions are not available yet
By the end of this year we will have first experience with a Grid infrastructure that was built for production from the start (LCG-1)
36
Open Grid Services Architecture
OGSA is a framework for a Grid architecture based on the Web Service paradigm: every service is a Grid Service. The main difference from Web Services is that Grid Services may be stateful. These Grid Services interoperate through well-understood interfaces. The reference implementation, Globus Toolkit 3, is still in beta; the first release is expected this summer. Depending on the evolution until the end of the year, we will see whether OGSA becomes stable enough to be considered for integration into LCG next year.
37
Thank you for your attention