OSG Storage Forum: iRODS, September 21-22, 2010. DICE integrated Rule-Oriented Data System: iRODS. Leesa Brieger.

Presentation transcript:


iRODS
Developed by the Data Intensive Cyber Environments (DICE) group at UCSD and UNC
Based on decade-long experience of the Storage Resource Broker (SRB) development
Community-driven
Open source (BSD license)
Supported by RENCI at UNC

The Issues
Data…
  o collection (physical or virtual)
  o sharing/publishing
  o security/integrity
  o auditing/accounting
  o metadata management
  o curation (of remote and local data)
  o preservation (remotely and locally)
In a nutshell: data management, i.e. the application of data policy across the data life cycle

The Issues - Examples
Genome centers:
  o petabytes of data
  o researchers sharing data
  o derived data products from workflows
  o requirements for traceability and reproducibility (provenance and metadata management)
NOAA's National Climatic Data Center (NCDC)
  o repository management
  o publishing public data
  o delivery of services with the data (to the public and to researchers)
  o support for climate modeling (at ORNL, …)

The Issues - More Examples
Streamline (automate) data movement (OSG)
  o distributed jobs
  o collect data results to a central location for sharing/post-processing
  o archive data output
Institutional repositories
  o collect a variety of data from a multitude of disparate sources
  o manage collections independently:
      - some public collections
      - some collections shared between select user or institutional groups (across administrative boundaries)
      - varying life spans
      - some privacy-protected data (legal issues)
      - different integrity requirements
      - break-the-glass scenarios (emergency management): access permissions change as a function of state information

iRODS as a Data Grid
Sharing data across:
  o geographic and institutional boundaries
  o heterogeneous resources (hardware/software)
Virtual collections of distributed data
Global name spaces
  o data/files
  o users
  o storage
Metadata catalogue (iCAT) manages mappings between logical and physical name spaces
Beyond a single-site repository model
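The mapping between logical and physical name spaces can be pictured with a toy in-memory catalogue. This is only a sketch: the class `ToyCatalog`, its methods, and the zone, resource, and path names are hypothetical and are not part of iRODS, where the iCAT is a relational database.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Replica = Tuple[str, str]  # (storage resource name, physical path)


class ToyCatalog:
    """Hypothetical stand-in for a metadata catalogue: logical name -> replicas."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[Replica]] = defaultdict(list)

    def register(self, logical_path: str, resource: str, physical_path: str) -> None:
        """Record that a copy of the logical object lives on the given resource."""
        self._replicas[logical_path].append((resource, physical_path))

    def resolve(self, logical_path: str) -> List[Replica]:
        """Return every known physical location for a logical name."""
        if logical_path not in self._replicas:
            raise KeyError(f"no such logical object: {logical_path}")
        return list(self._replicas[logical_path])


# One logical name, two physical copies on different (made-up) resources.
catalog = ToyCatalog()
catalog.register("/demoZone/home/alice/run1.dat", "diskResc", "/vault1/alice/run1.dat")
catalog.register("/demoZone/home/alice/run1.dat", "tapeResc", "/archive/0007/run1.dat")
locations = catalog.resolve("/demoZone/home/alice/run1.dat")
print(locations)
```

The user only ever handles the logical path; which resource actually serves the bytes is a catalogue decision, which is what lets a collection span sites.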

iRODS View of Distributed Data
[Diagram: a User Client in front of an iRODS Virtual Collection that spans several "My Data" resources (disk, tape, database, filesystem, etc.) and "Partner's Data" resources (remote disk, tape, filesystem, etc.)]
iRODS installs over heterogeneous data resources; users view and manage distributed data as a single collection.
User sees a single collection

An iRODS Overview
[Diagram]
Multiple iRODS clients: iRODS HDF Viewer (HDF visualization), iRODS Rich Web Client, WebDAV on iPod, Windows Browser, Command Line Client, DropBox Client, FUSE Client
iRODS servers: resource servers, metadata catalogue, …
iCAT Metadata Catalogue (DB), Schedule & Compute Queue, Message Queue
Storage resources: file systems, archives, databases, sensor systems, clusters, …

RENCI VO Data Grid
[Diagram: sites ECU, NCSU, UNC-A, Duke, UNC-CH, and RENCI (Europa Center); iRODS servers; Metadata Catalog (iCAT) DB]
Client asks for data; the request goes to an iRODS server
Server contacts the iCAT-enabled server
Information (location, access rights, etc.) is retrieved from the iCAT
Server containing the data is signaled to send data to the authorized client
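The four-step request flow on this slide can be traced in a small simulation. Everything here is an assumption made for illustration: the dictionaries `ICAT` and `STORAGE`, the function names, and the server, user, and path names are hypothetical, and no real iRODS protocol or API is modelled.

```python
from typing import Dict, Set, Tuple

# Hypothetical catalogue: logical path -> (server holding the data, users allowed to read).
ICAT: Dict[str, Tuple[str, Set[str]]] = {
    "/demoZone/home/alice/run1.dat": ("serverB", {"alice", "bob"}),
}

# Hypothetical storage: server name -> {logical path: stored bytes}.
STORAGE: Dict[str, Dict[str, bytes]] = {
    "serverA": {},
    "serverB": {"/demoZone/home/alice/run1.dat": b"payload"},
}


def icat_lookup(logical_path: str) -> Tuple[str, Set[str]]:
    """Steps 2-3: fetch location and access rights from the catalogue."""
    return ICAT[logical_path]


def get_data(user: str, contacted_server: str, logical_path: str) -> bytes:
    """Step 1: the client may send its request to any server in the grid."""
    if contacted_server not in STORAGE:
        raise ValueError(f"unknown server: {contacted_server}")
    holder, readers = icat_lookup(logical_path)
    if user not in readers:
        raise PermissionError(f"{user} may not read {logical_path}")
    # Step 4: the server that actually holds the data sends it to the client.
    return STORAGE[holder][logical_path]


# alice contacts serverA, although the file lives on serverB.
data = get_data("alice", "serverA", "/demoZone/home/alice/run1.dat")
print(data)
```

Note that the server the client contacts and the server that holds the data need not be the same; the catalogue lookup is what connects them.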

iRODS Command Line Client: icommands
ils, icp, irepl, irsync, iput, iget
imeta: add, modify, read metadata
iquest: query the iCAT database

Data Grid Security
Authenticate each user access
  o PKI, Kerberos, challenge-response, Shibboleth
  o Use internal or external identity management system
Access controls as constraints imposed between name spaces
  o Controls on: Files / Storage resources / Metadata
  o Access controls remain invariant as files move within the data grid
Authorization of operations
  o ACLs (Access Control Lists) on users and groups
  o Conditional rule execution
  o ACLs on services not yet implemented, but coming

iRODS Rules and Microservices
Functional unit: microservices (C programs)
185 microservices provided out-of-the-box
Workflows of microservices: rules
iRODS Rule Base set by iRODS administrator (a file that contains the data grid rules: core.irb)
Remote execution: run where the data reside
Delayed execution (e.g. daily synchronization)
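The idea of a rule as a workflow of microservices can be sketched in a few lines. Real microservices are C programs run by the iRODS rule engine; the Python functions below, the shared context dictionary, and `run_rule` are hypothetical stand-ins chosen only to show the chaining.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Microservice = Callable[[Context], None]


def compute_size(ctx: Context) -> None:
    """Toy microservice: measure the object."""
    ctx["size"] = len(ctx["data"])


def tag_large(ctx: Context) -> None:
    """Toy microservice: flag objects above an arbitrary 10-byte threshold."""
    ctx["large"] = ctx["size"] > 10


def write_summary(ctx: Context) -> None:
    """Toy microservice: produce a human-readable line from earlier results."""
    ctx["summary"] = f"{ctx['name']}: {ctx['size']} bytes, large={ctx['large']}"


def run_rule(workflow: List[Microservice], ctx: Context) -> Context:
    """Run each microservice in order; each one sees what the previous ones wrote."""
    for microservice in workflow:
        microservice(ctx)
    return ctx


# A "rule" here is simply an ordered list of microservices.
summarize_rule: List[Microservice] = [compute_size, tag_large, write_summary]
result = run_rule(summarize_rule, {"name": "foo.txt", "data": b"hello iRODS!"})
print(result["summary"])
```

Because each step is small and independent, the same microservices can be recombined into different rules, which is the modularity the slides emphasize.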

iRODS Rules and Microservices (continued)
User communities customize a data grid by writing their own microservices and rules
Modules of new microservices shared with the larger user community
Modularity increases sense of community architecture
User-defined rules can be composed of provided microservices and run on command line (irule)

[Diagram: iRODS overview, with server components]
Multiple iRODS clients: iRODS HDF Viewer (HDF visualization), iRODS Rich Web Client, WebDAV on iPod, Windows Browser, Command Line Client, DropBox Client, FUSE Client
iRODS servers: resource servers (hosting microservice modules), metadata catalogue, rule engine, etc.
iCAT Metadata Catalogue (DB), Schedule & Compute Queue, Message Queue
Storage resources: file systems, archives, databases, sensor systems, clusters, …

Microservice Examples
msiGetCollectionACL: get AC list for a collection
msiGetDataObjAVUs: retrieve metadata AVU triplets for a data object and return as an XML file
msiGetAuditTrailInfoByKeywords
msiCopyAVUMetadata: copy triplets from one data object to another
msiRecursiveCollCopy
Core microservices + modules of custom microservices

Rules
Syntax: rule_name | condition | workflow-chain | recovery-chain
Contained in core.irb file
Example (getATInfoByObjPath.ir):
Get Audit Info By Object Path||writeLine(stdout,' ') ##writeLine(stdout," ") ##msiIsData(*objPath,*objID,*foo) ##msiGetAuditTrailInfoByObjectID(*objID,*BUF,*Status) ##writeBytesBuf(stdout,*BUF) ##writeLine(stdout," ")|nop
*objPath=/dcape-dev/home/leesa/foo.txt
ruleExecOut
RuleGen parser for easier syntax
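To make the four-field syntax concrete, here is a minimal Python sketch that splits a rule line into its parts. It is not the iRODS parser: `ParsedRule` and `parse_rule` are hypothetical names, the example is a shortened form of the rule on the slide, and the sketch assumes that `|` and `##` never occur inside microservice arguments.

```python
from typing import List, NamedTuple


class ParsedRule(NamedTuple):
    name: str
    condition: str
    workflow: List[str]   # microservice calls, in execution order
    recovery: List[str]   # recovery calls


def split_chain(chain: str) -> List[str]:
    """Split a '##'-separated chain into individual calls, dropping blanks."""
    return [step.strip() for step in chain.split("##") if step.strip()]


def parse_rule(line: str) -> ParsedRule:
    """Split 'rule_name | condition | workflow-chain | recovery-chain'."""
    parts = line.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}")
    name, condition, workflow, recovery = (part.strip() for part in parts)
    return ParsedRule(name, condition, split_chain(workflow), split_chain(recovery))


rule_line = (
    "Get Audit Info By Object Path||"
    "msiIsData(*objPath,*objID,*foo) "
    "##msiGetAuditTrailInfoByObjectID(*objID,*BUF,*Status) "
    "##writeBytesBuf(stdout,*BUF)|nop"
)
parsed = parse_rule(rule_line)
print(parsed.name)
for step in parsed.workflow:
    print("  step:", step)
print("  recovery:", parsed.recovery)
```

The empty second field shows that a rule may carry no condition, in which case it applies unconditionally.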

Policy-driven Data Management
Actions: event-triggered rules
  o acCreateUser: services to run when creating a new user
  o acPreProcForDataObjOpen
  o acPreProcForCollCreate
  o acPostProcForPut
  o acPostProcForCopy
  o acPostProcForCreate
  o acSetPublicUserPolicy: set the list of operations that are allowable for the user "public"
Data administrator determines policy by defining actions and by providing rules and services to users.
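Event-triggered actions behave much like hooks: an operation fires a named action, and whatever policy the administrator attached to that name runs. The sketch below borrows the action name `acPostProcForPut` from the slide, but the registry `ACTIONS`, the `on` decorator, `fire`, `put`, and both handlers are hypothetical Python constructs, not iRODS code.

```python
import hashlib
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Context = Dict[str, Any]
Handler = Callable[[Context], None]

# Hypothetical registry: action name -> handlers to run when that event fires.
ACTIONS: DefaultDict[str, List[Handler]] = defaultdict(list)


def on(action: str) -> Callable[[Handler], Handler]:
    """Decorator that attaches a handler to a named action."""
    def decorator(handler: Handler) -> Handler:
        ACTIONS[action].append(handler)
        return handler
    return decorator


@on("acPostProcForPut")
def record_checksum(ctx: Context) -> None:
    """Example policy: compute a checksum for every newly stored object."""
    ctx["checksum"] = hashlib.sha256(ctx["data"]).hexdigest()


@on("acPostProcForPut")
def log_event(ctx: Context) -> None:
    """Example policy: append an audit entry."""
    ctx.setdefault("audit", []).append(f"put {ctx['path']}")


def fire(action: str, ctx: Context) -> Context:
    """Run every handler registered for the action, in registration order."""
    for handler in ACTIONS[action]:
        handler(ctx)
    return ctx


def put(path: str, data: bytes) -> Context:
    """Toy 'put': storing the bytes is omitted; only the post-processing hook is shown."""
    ctx: Context = {"path": path, "data": data}
    return fire("acPostProcForPut", ctx)


result = put("/demoZone/home/alice/foo.txt", b"hello")
print(result["checksum"], result["audit"])
```

Changing policy then means changing what is registered under an action name, without touching the code of the operation that triggers it.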

iRODS Data Management
Sharing & publishing data
Virtual collections
Security controls for data access
Metadata
Provide services with the data
Remote services that run where the data reside
Policy that follows the data anywhere in the data grid; can even use this to implement your policy on cloud storage

Federated Data Grids
National Archives and Records Administration Transcontinental Persistent Archive Prototype (TPAP)
Federation of Seven Independent Data Grids
[Diagram: sites NARA I, NARA II, UMD, UCSD, Georgia Tech, Rocket Center, and UNC, with iCAT catalogues]
Extensible environment: can federate with additional research & education sites
Each data grid uses different vendor products.

Current iRODS Applications

High Availability iRODS System
KEK: High Energy Accelerator Research Organization, Japan
Redundancy and load balancing with Pgpool and Director

iRODS at Centre de Calcul de l'Institut National de Physique Nucléaire et de Physique des Particules (CC-IN2P3), France
iRODS setup
9 servers:
  o 3 iCAT servers (metacatalog): Linux SL4, Linux SL5
  o 6 data servers (200 TB): Sun Thor x4540, Solaris 10
Metacatalog on a dedicated Oracle 11g cluster
HPSS interface
Use of fuse-iRODS:
  o For Fedora-Commons
  o For legacy web applications
TSM: backup of some stored data
Monitoring and restart of the services fully automated
Automatic weekly re-indexing of the iCAT databases
Accounting: daily report on web site

CC-IN2P3
Communities: High Energy Physics, astrophysics, biology, biomedical, Arts and Humanities
TIDRA: Rhône-Alpes area data grid
  o biology, biomedical applications: animal imagery, human data; automatic bulk metadata registration in iRODS based on DICOM files content
  o Coming soon: synchrotron data (ESRF, Grenoble)
  o 3 million files registered
  o up to 60,000 connections per day on iRODS
  o authentication: using password or grid certificate
Beginning:
  o Neuroscience: ~60 TB
  o IMXGAM: ~15 TB (X and gamma ray imagery)
  o dChooz (neutrino experiment): ~15 TB / year
Coming soon: LSST (astro), for the electronic test-bed: ~10 TB

PetaShare and the Louisiana Optical Network Initiative (LONI)
PetaShare: a distributed data archival, analysis and visualization cyberinfrastructure for data-intensive collaborative research, connected through LONI; NSF-funded
iRODS for distributed data management and storage infrastructure
Novel approach: treat data storage resources and the tasks related to data access as first-class entities, just like computational resources and compute tasks
Key technologies being developed: data-aware storage systems, data-aware schedulers (i.e. Stork), and cross-domain metadata scheme

Australian Research Collaboration Services (ARCS)
Services: data, compute, collaboration (web, video), authorization
Architecture

ARCS
Web Interface (Davis)
  o Basic file operations
  o Metadata editing
  o Permissions
  o Dynamic objects: configurable interface to run rules or for displaying information
WebDAV (Davis)
  o Turns data fabric into a local folder
  o Works on Windows, Mac & Linux
  o Easy to use: drag and drop folders/multiple files
Griffin: the GridFTP to iRODS interface; developed by ARCS

Further and Future Possible Applications
NASA JPL (evaluation), Goddard
OOI (Ocean Observatories Initiative)
NOAA (National Climatic Data Center)
Genomics: Broad Institute, Sanger Institute (UK)
NARA
Bibliothèque Nationale de France
OSG
…

Some Current iRODS Development
Access to external databases
NetCDF integration
iDrop client (drag and drop)
Integration with storage vendors (plug-in data grids)
Metadata capture systems (bioinformatics)
Workflow integration
RENCI now begins significant support to the DICE group in these and similar efforts.