Published by Irma Marshall. Modified over 6 years ago.
1
GGF OGSA-WG, Data Use Cases
Peter Kunszt
Middleware Activity, Data Management Cluster
EGEE is a project funded by the European Union under contract IST
2
Contents
- Context
- Application Use Case 1: Physics Analysis
- Application Use Case 2: Medical Imaging
- Application Use Case 3: Data distribution
- EGEE data services design
3
Context: EU DataGrid and EGEE
- Build a Grid infrastructure in Europe
- Three application domains
  - High Energy Physics
  - Bio-Medical applications
  - Earth Observation (not fully in EGEE yet)
- Provisioning of Grid Middleware
  - Deployment
  - Operation
  - Service-level support and maintenance
4
1: Physics Analysis Use Case
Analyzing HEP data involves
- A researcher (or group of researchers) with an algorithm
- A set of selection criteria on metadata to identify the data to be analyzed
- Metadata Catalogs
  - Identify a dataset based on a metadata query
- Data is stored in files. The user navigates in a logical namespace, as in a local filesystem
- The algorithm may need to access further files depending on the calculation, so the dataset that the analysis runs on is not always fully determined by the metadata query
- The analysis might need to access data that is initially remote (co-locating data and computation is not always possible as a preparatory step)
- Large number of data files to be managed (10^12)
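The metadata-driven selection step can be illustrated with a minimal sketch. It assumes an in-memory catalog and exact-match criteria; `CatalogEntry`, `select_dataset`, the file names and the metadata keys are all hypothetical and are not part of any EGEE interface.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    lfn: str                                   # logical file name in the namespace
    metadata: dict = field(default_factory=dict)


def select_dataset(catalog, **criteria):
    """Return the logical file names whose metadata match every criterion."""
    return [
        entry.lfn
        for entry in catalog
        if all(entry.metadata.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical catalog content, for illustration only.
catalog = [
    CatalogEntry("/grid/hep/run42/events_001.root", {"run": 42, "trigger": "dimuon"}),
    CatalogEntry("/grid/hep/run42/events_002.root", {"run": 42, "trigger": "jet"}),
    CatalogEntry("/grid/hep/run43/events_001.root", {"run": 43, "trigger": "dimuon"}),
]

dataset = select_dataset(catalog, run=42, trigger="dimuon")
```

The query only yields the initial dataset. As the slide notes, the algorithm may open additional logical files while it runs, so a real job cannot assume this list is complete.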
5
Use Case 1 Functional Requirements
- Logical File Namespace management
  - The users should see their files as if on a filesystem
  - Browsing directories, listing attributes
  - ACL support
- File Replica management
  - Keeping track of the actual locations of a file
  - Keeping the master copy in reliable storage (high QoS)
  - Synchronization of updates
- Data storage and space management
  - Capability to reserve and query space
  - Transparent access to mass storage (staging to/from tape or other medium)
  - POSIX-like file I/O
- Data Collection and Virtual Data support, based on queries on application metadata
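As a rough illustration of replica management, the sketch below maps a logical file name to its physical locations and protects the master copy. The class, its methods and the `srm://` locations are hypothetical. The rule that the first registered copy becomes the master is an assumption made for the example, not something the slides state.

```python
class ReplicaCatalog:
    """Toy replica catalog: logical file name -> physical locations."""

    def __init__(self):
        self._replicas = {}   # lfn -> list of physical locations
        self._master = {}     # lfn -> location of the master copy

    def register(self, lfn, location, master=False):
        locations = self._replicas.setdefault(lfn, [])
        if location not in locations:
            locations.append(location)
        # Assumption: the first registered copy is the master unless overridden.
        if master or lfn not in self._master:
            self._master[lfn] = location

    def locations(self, lfn):
        return list(self._replicas.get(lfn, []))

    def master(self, lfn):
        return self._master[lfn]

    def unregister(self, lfn, location):
        if self._master.get(lfn) == location:
            raise ValueError("refusing to drop the master copy")
        self._replicas[lfn].remove(location)


rc = ReplicaCatalog()
lfn = "/grid/hep/run42/events_001.root"
rc.register(lfn, "srm://tape.example.org/store/a1", master=True)
rc.register(lfn, "srm://disk.example.net/cache/a1")
```

Synchronization of updates, which the requirements also list, is not covered here; it would need versioning or locking on top of this mapping.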
6
2: Medical Imaging Use Case
Diagnosing based on sensitive patient data
- Users: a doctor or group of doctors
- Retrieve an image, run an algorithm, examine the result and write a diagnosis, maybe re-run another algorithm
- Secure Data Retrieval
  - Patient data is sensitive and needs to be kept anonymous at all times
  - Site admins are not trustworthy: strip or encrypt patient data from the image
  - Image in database or secure data store ready for retrieval
  - Replication of data not always allowed
- High security needs
  - Strong authorization
  - Fine-grained access control mechanisms
  - Leaking patient information results in prosecution
7
Use Case 2 Functional Requirements
- Logical File Namespace management
  - ACL support
  - Metadata access control
- Data processing support
  - Ability to run checks at the source before transfer to assure a secure state of the data (strip patient content, encrypt)
- User management
  - Correlating doctor and patient data might also lead to patient identification, so users need to be anonymized as well
- In general, whatever goes over the wire should be useless to a man-in-the-middle
  - No intermediate copies of files to be kept anywhere
- Strong Intellectual Property Rights management for application software (it may not be allowed to run everywhere)
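The pre-transfer check at the source could look roughly like the sketch below. It assumes the image header is available as a dictionary; the field names in `SENSITIVE_FIELDS` and both function names are hypothetical. Encryption is deliberately left out, since the slides do not name a scheme.

```python
# Hypothetical header fields that would identify a patient.
SENSITIVE_FIELDS = {"patient_name", "patient_id", "birth_date"}


def strip_patient_data(header):
    """Return a copy of the image header without identifying fields."""
    return {key: value for key, value in header.items() if key not in SENSITIVE_FIELDS}


def check_safe_for_transfer(header):
    """Refuse the transfer if any identifying field is still present."""
    leaked = SENSITIVE_FIELDS.intersection(header)
    if leaked:
        raise PermissionError(f"sensitive fields present: {sorted(leaked)}")
    return True


raw_header = {"patient_name": "REDACTED-EXAMPLE", "modality": "MR", "slices": 120}
safe_header = strip_patient_data(raw_header)
check_safe_for_transfer(safe_header)   # passes; the raw header would be rejected
```

Running the check as a separate step, rather than trusting the stripping code, mirrors the requirement that the secure state is verified at the source before anything goes over the wire.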
8
3: Data Distribution Use Case
Trigger-based Data Distribution
- Users: a scientist or group of scientists
- Have automatic delivery of data at many sites based on some criteria
- Trigger may be
  - An event in the local Store, Catalog, Monitor, …
  - Cron-like events
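A minimal sketch of trigger-based distribution, assuming events are plain dictionaries carrying an `lfn` key and triggers are predicates over such events. `TriggerRegistry`, its methods, the event fields and the site names are hypothetical; a cron-like trigger would simply be an event generated by a timer.

```python
class TriggerRegistry:
    """Toy registry mapping triggers to the sites that want the data."""

    def __init__(self):
        self._subscriptions = {}   # id -> (predicate, list of sites)
        self._next_id = 0

    def register(self, predicate, sites):
        self._next_id += 1
        self._subscriptions[self._next_id] = (predicate, list(sites))
        return self._next_id

    def cancel(self, subscription_id):
        self._subscriptions.pop(subscription_id, None)

    def fire(self, event):
        """Return the (lfn, site) deliveries requested by matching triggers."""
        deliveries = []
        for predicate, sites in self._subscriptions.values():
            if predicate(event):
                deliveries.extend((event["lfn"], site) for site in sites)
        return deliveries


registry = TriggerRegistry()
sub = registry.register(
    lambda event: event["source"] == "catalog" and event["kind"] == "new_file",
    ["site-a", "site-b"],
)
event = {"source": "catalog", "kind": "new_file", "lfn": "/grid/eo/img_001.dat"}
deliveries = registry.fire(event)
```

The registry only decides what should be delivered where; moving the data is left to a separate transfer service.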
9
Use Case 3 Functional Requirements
- Network congestion control
  - Transfers need to be scheduled between the sites
  - Transfer queues
  - 'Data jobs'
- Service Orchestration between data catalogs and data transfer services
  - Transaction boundaries across service invocations
  - Consistency between the catalogs and the actual data availability
- Service Eventing
  - Event management
  - Registration and cancellation
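One way to picture scheduled transfers that stay consistent with the catalog is the sketch below. It assumes a transfer callable that reports success as a boolean and a catalog held as a dictionary; `TransferQueue` and its parameters are hypothetical. The per-round job limit is only a crude stand-in for real congestion control.

```python
from collections import deque


class TransferQueue:
    """Toy 'data job' queue that registers a replica only after a successful copy."""

    def __init__(self, transfer, catalog):
        self._pending = deque()
        self._transfer = transfer   # callable(lfn, source, dest) -> bool
        self._catalog = catalog     # dict: lfn -> set of sites holding the file

    def submit(self, lfn, source, dest):
        self._pending.append((lfn, source, dest))

    def pending(self):
        return len(self._pending)

    def run(self, max_jobs):
        """Process at most max_jobs queued transfers; return how many failed."""
        failed = []
        for _ in range(min(max_jobs, len(self._pending))):
            job = self._pending.popleft()
            lfn, source, dest = job
            if self._transfer(lfn, source, dest):
                # Catalog is updated only once the data is actually available.
                self._catalog.setdefault(lfn, set()).add(dest)
            else:
                failed.append(job)
        self._pending.extend(failed)   # failed jobs are retried in a later round
        return len(failed)


catalog = {"/grid/f1": {"site-a"}}
queue = TransferQueue(lambda lfn, src, dst: dst != "site-down", catalog)
queue.submit("/grid/f1", "site-a", "site-b")
queue.submit("/grid/f1", "site-a", "site-down")
failures = queue.run(max_jobs=2)
```

Updating the catalog strictly after the copy succeeds keeps it from advertising replicas that do not exist, which is the consistency requirement stated above.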
10
EGEE Data Service Interfaces
- Storage Element
  - SRM interface (see GGF GSM-WG)
    - Manage a Storage Resource
    - Space reservation
    - Put and retrieve files using various protocols
  - POSIX-like File I/O
    - Support for most POSIX-compliant features
    - Abstraction over existing MSS I/O mechanisms
- File Catalog
  - Management of the logical namespace
- Replica Catalog
  - Tracking of file replicas
- Metadata Catalog
  - Application metadata
- Data Catalog
  - Added functionality by orchestration of the 3 catalogs (providing transaction safety)
- File Transfer Service
  - Reliable transfer of files between two sites
  - Pre- and post-processing hooks
- File Placement Service
  - Transfer and register files
  - Orchestrate File Transfer and Data Catalog services
- Data Scheduling Service
  - Event-based data transfer, using File Placement Service
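The Data Catalog's role of orchestrating the three catalogs with transaction safety can be sketched as an all-or-nothing update with an undo list. This is only an illustration under strong assumptions: the three catalogs are in-memory structures in one process, whereas the real services would be remote. The `DataCatalog` class and its `add_file` method are hypothetical names.

```python
class DataCatalog:
    """Toy orchestration of file, replica and metadata catalogs."""

    def __init__(self):
        self.files = set()     # File Catalog: logical namespace
        self.replicas = {}     # Replica Catalog: lfn -> list of locations
        self.metadata = {}     # Metadata Catalog: lfn -> application attributes

    def add_file(self, lfn, location, attributes):
        """Register a file in all three catalogs, or in none of them."""
        if lfn in self.files:
            raise KeyError(f"{lfn} already exists")
        undo = []
        try:
            self.files.add(lfn)
            undo.append(lambda: self.files.discard(lfn))
            self.replicas[lfn] = [location]
            undo.append(lambda: self.replicas.pop(lfn, None))
            # dict() raises TypeError if attributes is not mapping-like.
            self.metadata[lfn] = dict(attributes)
        except Exception:
            # Roll back the steps that already succeeded, newest first.
            for step in reversed(undo):
                step()
            raise


dc = DataCatalog()
dc.add_file("/grid/hep/f1", "srm://se.example.org/f1", {"run": 42})
```

If the metadata step fails, the file and replica entries are removed again, so no catalog is left describing a file the others do not know about.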
11
Further Requirements
- Service distribution
  - Data Catalog Services need to scale not only in content but also across sites
  - Resilience and Fault Tolerance
- Service vs. Resource management
  - The mapping between services (VO-controlled) and resources (site-controlled) has to be simple
  - VO and site policies need to be manageable and applicable, with their interaction and semantics well-defined
- Security model
  - Transport-layer vs. message-layer security issues, concerning standards, interoperability and performance
  - Need delegation for service orchestration
  - Need traceability for accounting
  - Simple user management
- Project Application Requirement Document will be ready soon.