iRODS: integrated Rule Oriented Data System. Ray Idaszak, Director, Collaborative Environments, RENCI, University of North Carolina at Chapel Hill.

Presentation transcript:

iRODS: integrated Rule Oriented Data System
Ray Idaszak, Director, Collaborative Environments, RENCI, University of North Carolina at Chapel Hill

iRODS
Integrated Rule-Oriented Data System
– What It Is: origins, how it works, what’s different about it
– Why It Is: context, role it serves
– Where It’s Going (Today, Future): funding, key efforts

iRODS Talk Outline
Integrated Rule-Oriented Data System
– What is the Integrated Rule-Oriented Data System? Origins, technology, how it works
– Why It Is: context, role it serves
– Where It’s Going (Today, Future): funding, key efforts

What’s Different about iRODS?
iRODS lets you manage your data with your rules and in your way, via Policies, against a backdrop of federatable community data worldwide.

iRODS Background
Integrated Rule-Oriented Data System
– Open-source initiative that represents 12+ years of development and over $10M of NSF grant funding
– Another $8M+ of funding pending (via NSF DataNet)
Collaboration between
– UNC Chapel Hill Data Intensive Cyber Environments group (DICE)
– RENCI: State-funded Cyberinfrastructure Institute at UNC Chapel Hill
– San Diego Supercomputer Center

iRODS Data and Policy Virtualization
The iRODS Data Grid installs in a “layer” over storage systems, so you can view, manage, access, add, and share part or all of your data in a unified Collection.
[Diagram: a user client views and manages data through the Data Grid. The user sees a single “Virtual Collection” whose parts are stored at different sites:]
– /cuahsi/modeling: RENCI
– /cuahsi/catalog: Utah State Univ
– /cuahsi/terrain: SDSC
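The virtual-collection idea on this slide can be illustrated with a minimal Python sketch. It is a toy model, not the iRODS API: the `VirtualCollection` class, its methods, and the file name `dem.tif` are hypothetical names chosen for illustration; only the three `/cuahsi/...` paths and their sites come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualCollection:
    """Toy model (hypothetical): maps logical collection paths to storage sites."""

    mounts: dict = field(default_factory=dict)

    def mount(self, logical_prefix: str, site: str) -> None:
        # Register which site physically holds everything under a logical prefix.
        self.mounts[logical_prefix.rstrip("/")] = site

    def locate(self, logical_path: str) -> str:
        """Return the site holding logical_path; the longest matching prefix wins."""
        matches = [
            prefix for prefix in self.mounts
            if logical_path == prefix or logical_path.startswith(prefix + "/")
        ]
        if not matches:
            raise KeyError(f"no storage site registered for {logical_path}")
        return self.mounts[max(matches, key=len)]

    def listing(self) -> list:
        # The user sees one unified namespace, regardless of where data lives.
        return sorted(self.mounts)


grid = VirtualCollection()
grid.mount("/cuahsi/modeling", "RENCI")
grid.mount("/cuahsi/catalog", "Utah State Univ")
grid.mount("/cuahsi/terrain", "SDSC")

print(grid.listing())
print(grid.locate("/cuahsi/terrain/dem.tif"))  # dem.tif is a made-up file name
```

The user works only with logical paths; the lookup from path to site stays behind the single namespace.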

Using a Data Grid - Details
[Diagram: iRODS Servers, each with a Rule Engine, at RENCI, SDSC, and USU, plus the iCAT Metadata Catalog]
– User asks for data using logical properties (client-server)
– Data request goes to 1st Server
– Server looks up information in Catalog (applies rules)
– Catalog responds: 3rd Server has data
– 1st Server asks 3rd Server, peer-to-peer, to serve up data
– 3rd Server applies rules and serves data
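The request flow on this slide can be sketched as a small simulation. This is a hedged toy model, not iRODS code: the `Catalog` and `Server` classes, the `trace` list, and the file name `sites.csv` are assumptions made for illustration, as is the choice of RENCI as the "1st" server and USU as the "3rd".

```python
class Catalog:
    """Stand-in for the iCAT metadata catalog: logical name -> server name."""

    def __init__(self):
        self._location = {}

    def register(self, logical_name, server_name):
        self._location[logical_name] = server_name

    def lookup(self, logical_name):
        return self._location[logical_name]


class Server:
    """Toy server with its own rule list (hypothetical model, not the iRODS API)."""

    def __init__(self, name, catalog):
        self.name = name
        self.catalog = catalog
        self.rules = []     # callables taking (server_name, logical_name); raise to deny
        self.storage = {}   # logical name -> bytes held at this site
        self.peers = {}     # server name -> Server

    def store(self, logical_name, data):
        self.storage[logical_name] = data
        self.catalog.register(logical_name, self.name)

    def _apply_rules(self, logical_name, trace):
        for rule in self.rules:
            rule(self.name, logical_name)
        trace.append(f"{self.name}: rules applied")

    def serve(self, logical_name, trace):
        # The server that holds the data applies its own rules, then serves it.
        self._apply_rules(logical_name, trace)
        trace.append(f"{self.name}: serving data")
        return self.storage[logical_name]

    def request(self, logical_name, trace):
        # The first server applies rules and asks the catalog where the data lives.
        self._apply_rules(logical_name, trace)
        owner = self.catalog.lookup(logical_name)
        trace.append(f"catalog: data is on {owner}")
        if owner == self.name:
            trace.append(f"{self.name}: serving data")
            return self.storage[logical_name]
        # Peer-to-peer hand-off to the server that actually has the data.
        return self.peers[owner].serve(logical_name, trace)


catalog = Catalog()
renci, sdsc, usu = (Server(name, catalog) for name in ("RENCI", "SDSC", "USU"))
for server in (renci, sdsc, usu):
    server.peers = {p.name: p for p in (renci, sdsc, usu) if p is not server}

usu.store("/cuahsi/catalog/sites.csv", b"id,name\n")  # made-up file

trace = []
data = renci.request("/cuahsi/catalog/sites.csv", trace)
print("\n".join(trace))
```

The client talks only to the first server and uses only the logical name; which site holds the bytes is resolved through the catalog.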

Using a Data Grid – NEAR FUTURE (DB Resource)
[Diagram: iRODS Servers, each with a Rule Engine, at USU, RENCI, and SDSC, plus the iCAT Metadata Catalog; database resources: MySQL, PostgreSQL, Oracle]
– A user who is not running an SQL server locally makes a query
– Query goes to 1st Server
– Server looks up information in Catalog (applies rules)
– Catalog responds that 3rd Server has the SQL database
– 1st Server sends 3rd Server the SQL query
– 3rd Server applies rules and serves the query result

Example Clients & Client Interfaces (i.e. iRODS is client agnostic)
– C library calls: Application level
– .NET: Windows client API
– Unix shell commands: Scripting languages
– Java I/O class library (JARGON): Web services
– SAGA: Grid API
– Web browser (Java-python): Web interface
– Windows browser: Windows interface
– WebDAV: iPhone interface
– Fedora digital library middleware: Digital library middleware
– DSpace digital library: Digital library services
– Parrot: Unification interface
– Kepler workflow: Grid workflow
– FUSE user-level file system: Unix file system
– iDrop: Drag and drop GUI; user actions can be mapped to policies

iRODS Policies
iRODS is described as a “Policy-based” data management system
Policy definition: a proposed or adopted course of action
– Ergo, iRODS associates a “course of action” with all data
Pre- and Post- “Policy Enforcement Points” (PEP)
– Pre: Course of action for data coming into iRODS
– Post: Course of action for data going out of iRODS
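The pre/post enforcement-point idea can be sketched in a few lines of Python. This is an illustrative toy that follows the slide's wording (pre actions on data coming in, post actions on data going out), not how iRODS implements PEPs: `PolicyStore`, `record_checksum`, `audit_access`, and the file name are all hypothetical.

```python
import hashlib


class PolicyStore:
    """Toy data store with pre and post policy hooks (hypothetical, not the iRODS API)."""

    def __init__(self):
        self._objects = {}
        self._pre = []    # courses of action for data coming in
        self._post = []   # courses of action for data going out

    def pre(self, action):
        self._pre.append(action)
        return action

    def post(self, action):
        self._post.append(action)
        return action

    def put(self, name, data):
        for action in self._pre:
            data = action(name, data)
        self._objects[name] = data

    def get(self, name):
        data = self._objects[name]
        for action in self._post:
            data = action(name, data)
        return data


store = PolicyStore()
checksums = {}
audit_log = []


@store.pre
def record_checksum(name, data):
    # Integrity policy: remember a checksum for everything that is ingested.
    checksums[name] = hashlib.sha256(data).hexdigest()
    return data


@store.post
def audit_access(name, data):
    # Audit-trail policy: note every retrieval.
    audit_log.append(name)
    return data


payload = b"id,name\n1,Logan River\n"
store.put("/cuahsi/catalog/sites.csv", payload)
retrieved = store.get("/cuahsi/catalog/sites.csv")
print(checksums, audit_log)
```

Because the hooks are registered on the store rather than called by the client, every put and get passes through the same course of action.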

iRODS Policies
– Retention, disposition, distribution, arrangement
– Authenticity, provenance, description
– Integrity, replication, synchronization
– Deletion, trash cans, versioning
– Archiving, staging, caching
– Authentication, authorization, redaction
– Access, approval, IRB, audit trails, report generation
– Assessment criteria, validation
– Derived data product generation, format parsing
– Federation

iRODS Rule Engine, Workflows
iRODS has its own built-in, imperative, interpreted rule language, run by the Rule Engine
The iRODS Rule Engine executes Microservices
An iRODS “program” is called a Workflow
– A Microservice is one “step” of an iRODS Workflow
– iRODS Workflows are executed on the iRODS Server
– Arbitrary external web services can be one “step” of an iRODS Workflow, encapsulated as a Microservice

iRODS Microservices
Microservices are written in C and can provide, well, really anything that can be done in C, and that’s in part what makes iRODS so extensible. Typically they provide:
– Standard operations, e.g. file or format conversion
– Queries on metadata catalog
– Interaction with web services
– Triggering external HPC workflows
– Remote and delayed execution control
Microservices communicate through
– Arguments, session variables, user space variables, etc.
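A workflow as a chain of microservices that communicate through arguments and session variables can be sketched as follows. Real microservices are written in C and run by the Rule Engine; this Python toy only illustrates the composition idea, and `run_workflow`, `extract_metadata`, `convert_format`, and the file name are hypothetical.

```python
def extract_metadata(session, path):
    """Hypothetical microservice: derive simple metadata from a logical path."""
    stem, _, suffix = path.rpartition(".")
    session["metadata"] = {"path": path, "stem": stem, "suffix": suffix}


def convert_format(session, target_suffix):
    """Hypothetical microservice: plan a format conversion using earlier results."""
    meta = session["metadata"]  # session variable written by the previous step
    session["converted"] = f"{meta['stem']}.{target_suffix}"


def run_workflow(steps, session=None):
    """Run each (microservice, args) step in order, sharing one session dict."""
    session = {} if session is None else session
    for microservice, args in steps:
        microservice(session, *args)
    return session


workflow = [
    (extract_metadata, ("/cuahsi/terrain/dem.tif",)),  # made-up file name
    (convert_format, ("png",)),
]
result = run_workflow(workflow)
print(result["converted"])
```

Each step receives explicit arguments plus the shared session, so a later step can build on what an earlier one recorded without the two being coupled directly.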

Differentiating Workflows
iRODS data grid workflows
– Low-complexity: a small number of operations compared to the number of bytes in the file
– Server-side workflows
– Data sub-setting, filtering, metadata extraction
Grid workflows
– High-complexity: a large number of operations compared to the number of bytes in the file
– Client-side workflows
– Computer simulations, pixel re-projection
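The distinction drawn here is a ratio of operations to bytes. A minimal sketch of that rule of thumb follows; the function name and the threshold of 1.0 operations per byte are assumptions for illustration, since the slide gives no numeric cutoff.

```python
def workflow_placement(operations: int, size_bytes: int, threshold: float = 1.0) -> str:
    """Classify a workflow by its operations-per-byte ratio.

    The threshold is an assumed, illustrative cutoff, not a value from the talk.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    ratio = operations / size_bytes
    # Few operations per byte: cheaper to run where the data lives (server-side).
    # Many operations per byte: worth moving the data to the compute (client-side).
    return "server-side" if ratio < threshold else "client-side"


# Metadata extraction over a 1 GB file: about a million operations.
print(workflow_placement(operations=10**6, size_bytes=10**9))
# A simulation that performs a thousand operations per byte of input.
print(workflow_placement(operations=10**12, size_bytes=10**9))
```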

A few more iRODS notes…
Authentication
– GSI (PKI), Kerberos, Shibboleth, Challenge-response
Authorization
– Roles, user groups, resource groups, policy constraints, ACLs
Transport
– TCP/IP (parallel I/O streams), Reliable Blast UDP
Metadata catalog
– PostgreSQL, MySQL, Oracle
Distributed rule engine
– Scheduler, messaging system, execution engine, rule base

iRODS Talk Outline
Integrated Rule-Oriented Data System
– What is the Integrated Rule-Oriented Data System? Origins, technology, how it works
– Why is there an Integrated Rule-Oriented Data System? Context, role it serves
– Where It’s Going (Today, Future): funding, key efforts

Entire Data Life Cycle: The iRODS Vision
Each data life cycle stage increases the value and usability of the original collection
[Diagram: life cycle stages, each with its state and governing policy]
– Project Collection: Private, Local Policy
– Data Grid: Shared, Distribution Policy
– Digital Library: Published, Description Policy
– Data Processing Pipeline: Analyzed, Service Policy
– Reference Collection: Preserved, Representation Policy
– Federation: Sustained, Re-purposing Policy
[Example scenario]
– Jeff gets data from a sensor
– Jeff shares data with colleagues
– Together with colleagues, Jeff analyzes data and produces results
– Results are peer-reviewed and published
– Jeff et al. hit the jackpot: the collection is now accepted as a reference collection for decades
– The Hydrology Datagrid grows in value to ecology and biology and is federated

iRODS Talk Outline
Integrated Rule-Oriented Data System
– What is the Integrated Rule-Oriented Data System? Origins, technology, how it works
– Why is there an Integrated Rule-Oriented Data System? Context, role it serves
– Where is iRODS going today and in the future? Funding, key efforts

iRODS: Future
Pending 2011 NSF DataNet
– DataNet Federation Consortium (DFC)
– Includes CUAHSI!! (and several others)
RENCI: Creating an “Enterprise” version of iRODS
2011UserMeeting-contribution.pdf

Summary
iRODS fills an important niche
– Differentiation: It’s a Policy-driven distributed data management system formally supporting the entire Data LifeCycle
– E.g. an iRODS DataGrid is a vehicle for fulfilling NSF’s Data Management Plan requirement at the community scale
– Classification: Middleware
– iRODS is not intended to be all-encompassing, but rather to work with other DataNets, Workflow Engines, systems like CUAHSI HIS, etc. in canvassing a National Cyberinfrastructure
– i.e. it falls primarily in the “Data Services/Storage” portion of NSF’s Data Enabled Science description
With iRODS, the community is still responsible for:
– Schema, data formats, defining policies, defining web interfaces, building analysis and knowledge tools, etc.

iRODS Credits
Principal Investigators: Richard Marciano, Reagan Moore (PI), Arcot Rajasekar
Additional Contributors: William Sims Bainbridge, Leesa Brieger, Luis Carriço, Sheau-Yen Chen, Michael Conway, Jason Coposky, Vijay Dantuluri, Antoine de Torcy, Wei Ding, Kevin Gamiel, Lucas Gilbert, Nuno Guimarães, Chien-Yi Hou, Bernard J. (Jim) Jansen, Oleg Kapeljushnik, Mounia Lalmas, Christopher A. Lee, Xia Lin, Gary Marchionini, Cathy Marshall, Jason Reilly, Meredith Ringel Morris, Stefan Rüger, Wayne Schroeder, Michael Stealey, Lisa Stilwell, Jaime Teevan, Paul Tooby, Michael Wan, Bing Zhu

iRODS Credits
Research Supported By
– NSF ITR, Constraint-Based Knowledge Systems for Grids, Digital Libraries, and Persistent Archives (2004–2007)
– NARA supplement to NSF SCI, Cyberinfrastructure; From Vision to Reality: Developing Scalable Data Management Infrastructure in a Data Grid-Enabled Digital
– NARA supplement to NSF SCI, Cyberinfrastructure; From Vision to Reality: Research Prototype Persistent Archive Extension (2006–2007)
– NSF SDCI, SDCI Data Improvement: Data Grids for Community Driven Applications (2007–2010)
– NSF/NARA OCI, NARA Transcontinental Persistent Archive Prototype (2008–2012)

iRODS Credits
For More Information
/renci-teams-with-dice

Thank You.