INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Summary of the data access session, EGEE User Forum, March 3rd, 2006. Johan Montagnat, Birger Koblitz


Summary of the data access session
EGEE User Forum, March 3rd, 2006
Johan Montagnat, Birger Koblitz

Data access parallel session
~60 persons attending (peak activity)
Talks were grouped in 3 different panels
  – Metadata and databases access
  – File access
  – Applications
3 associated demonstrations

Agenda: panel on metadata and databases access
GDSE: data source oriented computing element
  – Dr. Giuliano Taffoni, INFN, CNAF
ATLAS metadata interface
  – Thomas Doherty, University of Glasgow
The AMGA metadata service
  – Dr. Birger Koblitz, CERN
Oracle on the grid
  – Bjorn Engsing, Oracle

DSE: Data Source Engine
We define a new Grid component (G-DSE) that enables access to a Data Source Engine and Data Source, fully integrated with the Grid Monitoring and Discovery System and the Resource Broker.
A new Grid Element, the Query Element, can then be built on top of the G-DSE component.
It handles very long SQL queries just as a CE would handle jobs.

GDSE integration
[Architecture diagram. Labels: gatekeeper; JobManager and QueryManager; JobProcess and QueryProcess; scheduler plug-in (PBS/LSF); query plug-in (DB-specific driver); GRAM; GIS; MDS; GRIS; LDAP/LDIF; RDBMS; Grid providers (SNMP)]

Features
Data source indexing, monitoring, management and recovery
GRAM or WS protocol
Transactions/queries specified through RSL/JDL
The grid WMS is used to support the execution
The grid IS is used to monitor the transactions
GSI and VOMS based access control
  – Different roles (administrator, writer, selecter)
  – Access control at table and row level
Connects to different RDBMS
Supports workflows of query jobs with inter-dependencies
Support for replication
Application to AstroDBs
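The workflow feature listed above (query jobs with inter-dependencies) comes down to running each query only after the queries it depends on. A minimal Python sketch of that ordering step follows. The job names and SQL strings are hypothetical, and the code does not use any G-DSE, RSL or JDL interface; it only illustrates how a dependency order can be derived.

```python
from graphlib import TopologicalSorter

# Hypothetical query jobs: name -> (SQL text, names of jobs it depends on).
QUERY_JOBS = {
    "select_sources": ("SELECT id FROM sources WHERE mag < 12", []),
    "match_catalog": ("SELECT * FROM catalog JOIN tmp_sources USING (id)",
                      ["select_sources"]),
    "build_report": ("SELECT COUNT(*) FROM tmp_matches", ["match_catalog"]),
}


def execution_order(jobs):
    """Return job names so that every job comes after its dependencies."""
    graph = {name: set(deps) for name, (_sql, deps) in jobs.items()}
    # static_order() raises graphlib.CycleError if the dependencies loop.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(QUERY_JOBS)
print(order)
```

A real deployment would hand each query to the grid workload manager instead of printing the order; that step is left out because the slides do not describe its interface.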

ATLAS Metadata Interface
It is an application under development that stores and gives access to dataset metadata for the ATLAS experiment
It fulfils the need of many database-backed applications by offering a generic web service and servlet interface, through the use of self-describing databases
It supports geographical distribution through web services and secure access through grid certificates

Adaptation of AMI Architecture for gLite Interfaces
[Architecture diagram. Labels: Web Service client; gLite interface method; gLite interface implementation; controller class; result returned in XML format]

Features
Supports Oracle, MySQL and SQLite DBs
gLite metadata interface
Web Service interface (AXIS container in Tomcat)
Authentication: based on certificate DN
Very fine-grained authorization
  – Roles
  – At project or record level
  – May write ad hoc control classes
Secured and well-defined interface for providing access to metadata

AMGA: ARDA metadata interface
gLite 1.5 metadata catalog
Two modes
  – With the LFC: bind metadata to files
  – Standalone: general relational data
Front ends
  – Web Service
  – Proprietary TCP streaming protocol
Implements the gLite metadata interface
Versatile, provides both performance and security
Security components (optional)
  – SSL connections
  – Authentication based on passwords, X509 certificates or proxies
  – POSIX ACLs and Unix permissions at table and row level
Applications: LHCb, Medical Data Management, gLibrary, UNOSAT...

Performance
Comparison with LFC and FiReMan catalogs

Replication & Federation modes

Oracle
Free Oracle software
  – Express edition, limited to 1 CPU
Support for Linux on many distributions
Provides streams for replication

Discussion
Standards
  – How does gLite commit to standards?
  – A lot of GGF work has been invested in defining standards
  – Standards are difficult to endorse while they are still evolving and the global picture remains unclear
Security
  – Common concern, different granularities
Replication
  – Partially implemented in existing databases, with different semantics
  – Should this be implemented at a higher level?
Distribution
  – Some work on information schemas for locating metadata
  – What about queries on data whose location is not known a priori?
Grids of databases
  – Let the grid pick the “best” database for you
There is room for more research activity!

Agenda: panel on file access
gLite File Transfer Service
  – Paolo Badino, CERN
Encrypted Data Storage in EGEE
  – Akos Frohner, CERN
Storage Resource Manager Interface
  – Maarten Litmaath, CERN

File Transfer Service channels
Logical unit of management
  – Represents a directed network pipe between two sites
Mono-directional
Independently manageable
  – State
  – Number of streams
  – Number of concurrent transfers
Inter-VO scheduling
  – VO share
No routing
Between specific host pairs or groups of hosts
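The channel properties above (direction, state, number of streams, number of concurrent transfers, VO share) can be pictured as a small record plus a rule for splitting transfer slots among VOs. The Python sketch below is an illustration under stated assumptions: the field names, the "Active" state name, the default values, the site names and the proportional rounding rule are all hypothetical and are not taken from the File Transfer Service itself.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """A directed (one-way) transfer pipe between two sites."""
    source: str
    destination: str
    state: str = "Active"           # hypothetical state name
    streams_per_transfer: int = 4   # hypothetical default
    max_concurrent: int = 10        # concurrent transfers allowed
    vo_shares: dict = field(default_factory=dict)  # VO name -> relative share

    def slots_per_vo(self):
        """Split concurrent-transfer slots among VOs in proportion to their shares."""
        if self.state != "Active" or not self.vo_shares:
            return {vo: 0 for vo in self.vo_shares}
        total = sum(self.vo_shares.values())
        slots = {vo: self.max_concurrent * share // total
                 for vo, share in self.vo_shares.items()}
        # Hand out the slots lost to integer rounding, largest share first.
        leftover = self.max_concurrent - sum(slots.values())
        by_share = sorted(self.vo_shares, key=self.vo_shares.get, reverse=True)
        for vo in by_share[:leftover]:
            slots[vo] += 1
        return slots


channel = Channel("CERN", "RAL", max_concurrent=10,
                  vo_shares={"atlas": 50, "cms": 30, "lhcb": 20})
print(channel.slots_per_vo())
```

Because a channel is one-way, traffic in the opposite direction would need a second `Channel` with source and destination swapped.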

Transfer Jobs and Files
Job
  – Represents the transfer request
  – Identified by a GUID
File
  – Pair of source and destination file names
Job states
File states

What the Service Challenges (SC) achieved so far
SC3 rerun (January 2006)
All sites achieved target rate
8/11 sites achieved nominal rate

Encryption/Decryption System
Designed to fulfil biomedical application needs
  – Fine-grained access control
  – Data encryption
  – Anonymity
Based on gLiteIO, FiReMan and an SRM v1.1
Access control through gLiteIO

Encryption
Anonymity: patient data separated from files (stored in AMGA)
ACL access control on files (FiReMan)
File keys distributed among Hydra servers with ACL

And decryption
Key retrieved from the Hydra key server
Data decrypted block by block in memory (OpenSSL ciphers)
Encryption also works for output data
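Block-by-block decryption in memory means the file is never written out in clear text: each block is read, transformed and passed on. The Python sketch below shows only that streaming pattern. It deliberately swaps the OpenSSL ciphers for a toy keystream built from SHA-256, which is not secure and is used here purely so the example runs with the standard library. The key value, the block size and the function names are assumptions, and no Hydra or gLiteIO call is involved.

```python
import hashlib
import io

BLOCK_SIZE = 32  # one SHA-256 digest per block


def _keystream_block(key: bytes, index: int) -> bytes:
    # Toy keystream: SHA-256(key || block counter). NOT a secure cipher.
    return hashlib.sha256(key + index.to_bytes(8, "big")).digest()


def crypt_stream(key: bytes, src, dst) -> None:
    """XOR src with the keystream one block at a time and write to dst.

    XOR is symmetric, so the same function encrypts and decrypts.
    Assumes src.read() returns full blocks until the end of the data,
    as io.BytesIO does.
    """
    index = 0
    while True:
        block = src.read(BLOCK_SIZE)
        if not block:
            break
        pad = _keystream_block(key, index)
        dst.write(bytes(b ^ p for b, p in zip(block, pad)))
        index += 1


key = b"key fetched from a key server"  # placeholder value
plain = b"DICOM pixel data " * 10
encrypted, decrypted = io.BytesIO(), io.BytesIO()
crypt_stream(key, io.BytesIO(plain), encrypted)
crypt_stream(key, io.BytesIO(encrypted.getvalue()), decrypted)
```

Only one block of clear text exists in memory at any time inside `crypt_stream`; a production version would use an authenticated cipher from a vetted library rather than this stand-in.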

What is the SRM?
Client-server interface for Storage Resource Management
  – De facto standard (see further on), GGF working group
  – Secure web service
  – Defines functions that allow storage resources to be managed from both client and server perspectives
      – Different requirements, optimizations, concerns
SRM collaboration institutes develop different implementations
  – CERN + RAL + INFN (CASTOR-2)
  – CERN/LCG (DPM)
  – FNAL + DESY (dCache)
  – JLAB (J-SRM)
  – LBNL (DRM, HRM)
  – EGRID/INFN/GridIt (StoRM)

Is the SRM a standard?
“The nice thing about standards is that there are so many to choose from.” (Andrew S. Tanenbaum)
Version 1.1 in widespread use
  – But implementations have subtle incompatibilities due to ambiguities in the “standard”
  – Various basic functionalities not defined
Version 2.1 implemented to various extents by some projects
  – Try to get a critical subset implemented on WLCG by autumn 2006
      – Use cases defined by LHC experiments, see next pages
  – Still lacks some features
  – Incompatible with version 1
      – Clients and servers need to support both versions during the transition period (which may last a long time)
Version 3 definition many months away
  – Again incompatible

What should the SRM do? (A. Shoshani, PPDG Review, 28 Apr 2003)
Manage space dynamically
  – Any disk caches and Mass Storage Systems
  – Space reservation and negotiation
  – Manage “lifetime” of spaces
Manage files dynamically
  – Pin files in storage till they are released
  – Manage “lifetime” of files, and action when lifetime expires
Manage file sharing
  – Policies on what to evict when space is needed
      – Currently always decided by back-end
Manage multi-file requests
  – A brokering function: queue file requests, pre-stage files
  – Invoke file transfer services
Permit site-SRM over multiple storage systems
Negotiate transfer protocols
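Pinning and lifetimes, as listed above, interact with eviction: a file may only be evicted once it has no live pin. The following Python sketch is a toy model of that rule. The class and method names are hypothetical and do not correspond to any SRM version's operations; time is passed in explicitly so the behaviour is easy to follow.

```python
class PinnedStore:
    """Toy model of file pinning with lifetimes (not the SRM interface)."""

    def __init__(self):
        self._pins = {}  # file name -> expiry time in seconds

    def pin(self, name, now, lifetime):
        """Pin a file until now + lifetime; a later pin can extend the expiry."""
        self._pins[name] = max(self._pins.get(name, 0), now + lifetime)

    def release(self, name):
        """Release a pin before it expires."""
        self._pins.pop(name, None)

    def is_pinned(self, name, now):
        """True while the file has a pin that has not yet expired."""
        return self._pins.get(name, 0) > now

    def evictable(self, files, now):
        """Files that may be evicted when space is needed: those with no live pin."""
        return [f for f in files if not self.is_pinned(f, now)]


store = PinnedStore()
store.pin("a.dat", now=0, lifetime=60)
store.pin("b.dat", now=0, lifetime=10)
candidates = store.evictable(["a.dat", "b.dat", "c.dat"], now=30)
print(candidates)
```

At time 30 the pin on `b.dat` has expired and `c.dat` was never pinned, so both are eviction candidates, while `a.dat` stays protected until time 60 or an explicit release.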

Discussion
Connection between data management and job scheduling
  – The file catalog holds the file location information used for scheduling
  – Jobs are scheduled where the data sits
  – In some cases, data could instead move to where computing resources are available
  – Is this desirable?
Legacy code is common in scientific applications
  – Transparent POSIX access
Data encryption
  – Transparency

Agenda: panel on applications
Space Physics Interactive Data Resource (SPIDR)
  – Dr. Zhizhin, Russian Academy of Sciences
gLibrary: a multimedia contents management system
  – Dr. Tony Calanducci, INFN Catania

Space Physics Interactive Data Resource (SPIDR)
SPIDR is a de facto standard data source on solar-terrestrial physics, functioning within the framework of the ICSU World Data Centers. It is a distributed database and application server network, built to select, visualize and model historical space weather data distributed across the Internet.
SPIDR can work as a fully functional web application (portal) or as a grid of web services, providing functions for other applications to access its data holdings.

SPIDR components
The SPIDR portal combines the central XML metadata repository with a set of distributed data web services and data file collections.
A user can search for data using the metadata inventory, use a persistent data basket to save the selection for the next session, and plot or download the selected data in parallel in different formats, including XML and NetCDF.

Real-time usage statistics for a given time interval
[Charts. Labels: user sessions per day; total ~ registered users; per-database requests for plot (red) and export (blue)]

gLibrary usage scenarios
Example 1:
  – Locate all theoretical (PPTType) PowerPoint (Type) presentations about FiReMan (Keywords) given in 2005 (Date) by Uncle Sam (Speaker);
  – Find all the movies (Type) in which Julia Roberts (Cast) performed together with Hugh Grant (Cast) produced in USA (Country) in 2004 (ReleaseDate); or all the acoustic (Genre) mp3 (Format) audio files (Type) of Alanis Morissette (Singer) that last more than 3 minutes (Runtime).
Example 2:
  – A doctor is looking for brain (Keyword) DICOM (Type) images of male (Gender) patients older than 65 (Age).
Example 3:
  – A job can behave as a storage crawler: it scans pre-existing files in Storage Elements to extract relevant metadata that will be published on gLibrary for further data mining.
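Each scenario above is a conjunction of conditions on named attributes. The Python sketch below expresses Example 2 that way over an in-memory list. The sample entries, the `Name` attribute and the `find` helper are hypothetical, and nothing here uses the gLibrary or AMGA query interface; it only shows the shape of an attribute query.

```python
# Hypothetical metadata entries; attribute names follow Example 2.
entries = [
    {"Name": "scan-001", "Type": "DICOM", "Keywords": ["brain"], "Gender": "male", "Age": 71},
    {"Name": "scan-002", "Type": "DICOM", "Keywords": ["brain"], "Gender": "female", "Age": 80},
    {"Name": "scan-003", "Type": "DICOM", "Keywords": ["knee"], "Gender": "male", "Age": 68},
    {"Name": "scan-004", "Type": "DICOM", "Keywords": ["brain"], "Gender": "male", "Age": 40},
]


def matches(entry, conditions):
    """True if the entry satisfies every condition.

    A condition value is either a literal (tested for equality) or a
    one-argument predicate (called with the attribute value).
    """
    for attribute, wanted in conditions.items():
        if attribute not in entry:
            return False
        value = entry[attribute]
        ok = wanted(value) if callable(wanted) else value == wanted
        if not ok:
            return False
    return True


def find(items, **conditions):
    """Return the entries that satisfy all conditions."""
    return [e for e in items if matches(e, conditions)]


hits = find(entries, Type="DICOM", Gender="male",
            Keywords=lambda kws: "brain" in kws,
            Age=lambda age: age > 65)
print([e["Name"] for e in hits])
```

In a metadata catalog the same conditions would be evaluated server-side, so only the matching entries travel over the network.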

Example of gLibrary collections
/gLTypes collection (entry names with a Path attribute that refers to a collection): /gLPPT (PowerPoint), /EGEEPPT, EGEEDOC, /gLDOC (Documents), /gLVideo (Video), /gLImage (Image), /gLAudio (Audio)
/EGEEPPT collection, entry 00454dca-a269-4b93-8a45-c4012af05600
  – Title: Information Systems
  – Date:
  – Event: 4th EGEE Conference
  – Speaker: Giuseppe La Rocca, Valeria Ardizzone
  – Topic: R-GMA, BDII
  – Author: Valeria Ardizzone, Giuseppe La Rocca
  – Runtime: 00:30:00
  – Type: Theorical
/gLAudio collection, entry 4ffaffc8-26e b460-3d5bf08081a4
  – SongTitle: Dedicato A Te
  – Singer: Le Vibrazioni
  – Format: MP3
  – Album: Dedicato A Te
  – Duration: 00:03:27
  – Genre: Pop
/gLKeys collection (“additional features”), entry 00454dca-a269-4b93-8a45-c4012af05600
  – Passphrase: ardizzo

gLibrary Security
User requirements:
  – A valid proxy with VOMS extensions
  – VOMS Role and Group needed to be recognized by gLibrary as a contents manager
3 kinds of users:
  – gLibraryManager: (s)he can create new content types and allow a generic VO user to become a gLibrarySubmitter
  – gLibrarySubmitters: they can add new entries and define access rights on the entries they create
      – Fine-grained permission settings (reading, writing, listing, decrypting) on each entry: whole VO members, VO groups, list of DNs
  – Generic VO users: browse and make queries (on entries they have access to)
Basic level of cryptography:
  – New files saved on SEs can be encrypted beforehand with a symmetric passphrase that will be saved in /gLKeys. Only selected users (that have a specific DN in the subject of their VOMS proxy) can access the passphrase and decrypt the file.
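The per-entry permissions above can be granted at three levels: a whole VO, a VO group, or a list of DNs. The Python sketch below shows one way such a check could be organized. The class names, the VO and group names and the DNs are hypothetical, and this is not the gLibrary or AMGA ACL implementation.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    dn: str                          # certificate subject
    vo: str                          # virtual organisation
    groups: frozenset = frozenset()  # VO groups from the proxy


@dataclass
class EntryACL:
    """Permission name -> who holds it, at three levels of granularity."""
    vos: dict = field(default_factory=dict)     # permission -> set of VO names
    groups: dict = field(default_factory=dict)  # permission -> set of VO groups
    dns: dict = field(default_factory=dict)     # permission -> set of DNs

    def allows(self, user, permission):
        """True if the user holds the permission via VO, group or DN."""
        return (user.vo in self.vos.get(permission, set())
                or bool(user.groups & self.groups.get(permission, set()))
                or user.dn in self.dns.get(permission, set()))


acl = EntryACL(
    vos={"listing": {"examplevo"}},
    groups={"reading": {"/examplevo/radiology"}},
    dns={"decrypting": {"/C=IT/O=Example/CN=Alice"}},
)
alice = User("/C=IT/O=Example/CN=Alice", "examplevo",
             frozenset({"/examplevo/radiology"}))
bob = User("/C=IT/O=Example/CN=Bob", "examplevo")
print(acl.allows(alice, "decrypting"), acl.allows(bob, "decrypting"))
```

Here every VO member can list the entry, only the radiology group can read it, and only one named DN can decrypt it, which mirrors the passphrase rule stated for /gLKeys.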

Features
Born as a use case to demonstrate AMGA features
Built on top of many gLite services
Considering collaboration and integration with the NA3 Document Digital Library System
Fast → thanks to AMGA
Secure → ACLs, encryption, and splitting
Easy to use → user-friendly Java GUI and portal soon available
Easily extensible to support any document type (medical images and files, invoices, proceedings, scientific publications, newspaper clips, …)

Discussion
SPIDR wants to use grids for
  – Security and access control
  – Asynchronous access to large amounts of data
gLibrary
  – Flexibility of the schema to adapt to many document types
  – Content analysis / indexing of documents
Very different needs for database access => room for many solutions:
  – GDSE: time-consuming jobs on databases
  – AMGA: fast access to small amounts of (returned) metadata
  – SPIDR: asynchronous access to large amounts of metadata