Published by Marjory Hubbard. Modified over 9 years ago.
1
iRODS: Integrated Rule-Oriented Data System
Ray Idaszak
Director, Collaborative Environments
RENCI, University of North Carolina at Chapel Hill
2
iRODS: Integrated Rule-Oriented Data System
– What It Is: origins, how it works, what’s different about it
– Why It Is: context, role it serves
– Where It’s Going (today, future): funding, key efforts
3
iRODS Talk Outline
Integrated Rule-Oriented Data System
– What is the Integrated Rule-Oriented Data System? Origins, technology, how it works
– Why It Is: context, role it serves
– Where It’s Going (today, future): funding, key efforts
4
What’s Different about iRODS?
iRODS lets you manage your data with your rules and in your way, against a backdrop of community data that can be federated worldwide through policies.
5
iRODS Background
Integrated Rule-Oriented Data System
– Open-source initiative that represents 12+ years of development and over $10M of NSF grant funding
– Another $8M+ of funding pending (via NSF DataNet)
Collaboration between
– UNC Chapel Hill Data Intensive Cyber Environments group (DICE)
– RENCI, the state-funded cyberinfrastructure institute at UNC Chapel Hill
– San Diego Supercomputer Center (SDSC)
6
iRODS Data and Policy Virtualization
The iRODS Data Grid installs in a “layer” over storage systems, so you can view, manage, access, add, and share part or all of your data in a unified Collection.
[Diagram: a user client views and manages data through the data grid and sees a single “virtual collection” containing /cuahsi/catalog, /cuahsi/modeling, and /cuahsi/terrain. Physically, /cuahsi/catalog is hosted at Utah State Univ, /cuahsi/modeling at RENCI, and /cuahsi/terrain at SDSC.]
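The virtual collection on this slide can be pictured as a lookup from logical paths to hosting sites. The following is a minimal Python sketch of that idea, not iRODS code: the dictionary `COLLECTION_SITES`, both function names, and the file name `dem.tif` are hypothetical; only the three collection paths and their sites come from the slide.

```python
# Hypothetical stand-in for the catalog: logical collection -> hosting site.
# The three entries come from the slide; everything else is illustrative.
COLLECTION_SITES = {
    "/cuahsi/catalog": "Utah State Univ",
    "/cuahsi/modeling": "RENCI",
    "/cuahsi/terrain": "SDSC",
}


def list_virtual_collection(root):
    """Return the logical collections under root, hiding where they live."""
    prefix = root.rstrip("/") + "/"
    return sorted(path for path in COLLECTION_SITES if path.startswith(prefix))


def locate(logical_path):
    """Resolve a logical path to the site that physically hosts it."""
    for collection, site in COLLECTION_SITES.items():
        if logical_path == collection or logical_path.startswith(collection + "/"):
            return site
    raise KeyError(f"no site hosts {logical_path}")


if __name__ == "__main__":
    # The user browses one namespace; the site is resolved behind the scenes.
    print(list_virtual_collection("/cuahsi"))
    print(locate("/cuahsi/terrain/dem.tif"))
```

The user only ever names logical paths; which site answers is a detail the grid resolves.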
7
Using a Data Grid - Details
[Diagram: three iRODS servers, each with its own rule engine, at RENCI, SDSC, and USU, plus the iCAT metadata catalog.]
– User asks for data using logical properties (client-server)
– Data request goes to 1st server
– Server looks up information in catalog (applies rules)
– Catalog responds: 3rd server has data
– 1st server asks 3rd server, peer-to-peer, to serve up data
– 3rd server applies rules and serves data
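The request flow on this slide can be traced with a small simulation. This is a rough Python sketch under stated assumptions, not the iRODS protocol: the classes `Catalog` and `Server`, the helper `build_demo_grid`, and the file `/cuahsi/catalog/sites.csv` are all hypothetical, the "rule engine" only records that rules ran, and treating RENCI as the 1st server and USU as the 3rd is an assumption the slide does not state.

```python
class Catalog:
    """Hypothetical stand-in for the iCAT metadata catalog."""

    def __init__(self):
        self._holder = {}

    def register(self, logical_name, server_name):
        self._holder[logical_name] = server_name

    def lookup(self, logical_name):
        return self._holder[logical_name]


class Server:
    """One grid server whose toy 'rule engine' only records that rules ran."""

    def __init__(self, name, catalog, grid):
        self.name = name
        self.catalog = catalog
        self.grid = grid      # shared dict: server name -> Server
        self.storage = {}     # logical name -> bytes held locally
        grid[name] = self

    def apply_rules(self, event, logical_name, trace):
        trace.append(f"{self.name}: rules applied on {event} of {logical_name}")

    def handle_request(self, logical_name, trace):
        # 1st server: apply rules, then ask the catalog who holds the data.
        self.apply_rules("lookup", logical_name, trace)
        holder = self.catalog.lookup(logical_name)
        trace.append(f"catalog: {holder} has {logical_name}")
        # Peer-to-peer hand-off to the server that holds the data.
        return self.grid[holder].serve(logical_name, trace)

    def serve(self, logical_name, trace):
        # Holding server: apply its own rules, then serve the bytes.
        self.apply_rules("serve", logical_name, trace)
        return self.storage[logical_name]


def build_demo_grid():
    """Three servers as in the diagram; the stored file is made up."""
    catalog, grid = Catalog(), {}
    for site in ("RENCI", "SDSC", "USU"):
        Server(site, catalog, grid)
    grid["USU"].storage["/cuahsi/catalog/sites.csv"] = b"site,lat,lon\n"
    catalog.register("/cuahsi/catalog/sites.csv", "USU")
    return grid


if __name__ == "__main__":
    trace = []
    data = build_demo_grid()["RENCI"].handle_request("/cuahsi/catalog/sites.csv", trace)
    print(data)
    print("\n".join(trace))
```

Note that rules run twice, once on the server that receives the request and once on the server that holds the data, which matches the two "applies rules" steps on the slide.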
8
Using a Data Grid – NEAR FUTURE (DB Resource)
[Diagram: three iRODS servers, each with its own rule engine, at USU, RENCI, and SDSC, plus the iCAT metadata catalog; database resources shown are MySQL, PostgreSQL, and Oracle.]
– User not running a SQL server locally makes a query
– Query goes to 1st server
– Server looks up information in catalog (applies rules)
– Catalog responds that 3rd server has the SQL db
– 1st server sends 3rd server the SQL query
– 3rd server applies rules and serves the query result
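A database resource could be pictured as one server forwarding a SQL query to another that fronts the database. The sketch below is only an illustration of that hand-off: in-memory SQLite stands in for the MySQL, PostgreSQL, or Oracle back ends named on the slide, and the classes `DbResourceServer` and `FrontServer`, the `gauges` table, its rows, and the SELECT-only policy are all invented for the example.

```python
import sqlite3


class DbResourceServer:
    """Hypothetical server fronting a SQL database (SQLite stands in here)."""

    def __init__(self, name):
        self.name = name
        self.db = sqlite3.connect(":memory:")
        # Made-up table and rows, purely for illustration.
        self.db.execute("CREATE TABLE gauges (site TEXT, flow_cfs REAL)")
        self.db.executemany(
            "INSERT INTO gauges VALUES (?, ?)",
            [("logan", 120.5), ("bear", 310.0)],
        )

    def apply_rules(self, sql):
        # Illustrative policy: only read-only queries are allowed.
        if not sql.lstrip().upper().startswith("SELECT"):
            raise PermissionError("policy allows SELECT statements only")

    def run_query(self, sql, params=()):
        self.apply_rules(sql)
        return self.db.execute(sql, params).fetchall()


class FrontServer:
    """Receives the user's query and forwards it to the server with the db."""

    def __init__(self, db_catalog):
        self.db_catalog = db_catalog  # logical db name -> DbResourceServer

    def query(self, logical_db, sql, params=()):
        return self.db_catalog[logical_db].run_query(sql, params)


if __name__ == "__main__":
    front = FrontServer({"hydro": DbResourceServer("third-server")})
    print(front.query("hydro", "SELECT site FROM gauges WHERE flow_cfs > ?", (200,)))
```

The user needs no local database server; the policy check runs next to the data, on the server that owns the database.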
9
Example Clients & Client Interfaces (i.e. iRODS is client agnostic)
C library calls - Application level
.NET - Windows client API
Unix shell commands - Scripting languages
Java I/O class library (JARGON) - Web services
SAGA - Grid API
Web browser (Java-python) - Web interface
Windows browser - Windows interface
WebDAV - iPhone interface
Fedora digital library middleware - Digital library middleware
DSpace digital library - Digital library services
Parrot - Unification interface
Kepler workflow - Grid workflow
FUSE user-level file system - Unix file system
iDrop - Drag and drop GUI
User actions can be mapped to policies
10
iRODS Policies
iRODS is described as a “policy-based” data management system
Policy definition: a proposed or adopted course of action; ergo, iRODS associates a “course of action” with all data
Pre- and Post- “Policy Enforcement Points” (PEP)
– Pre: Course of action for data coming into iRODS
– Post: Course of action for data going out of iRODS
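Following the slide's description, enforcement points can be sketched as hooks that run when data comes in and when it goes out. This is an illustrative Python analogy only, not the iRODS rule language or its PEP names: the class `PolicyStore` and the two example policies, `reject_empty` and `redact_ssn`, are hypothetical.

```python
import re


class PolicyStore:
    """Toy data store with pre- and post- policy enforcement points."""

    def __init__(self):
        self._objects = {}
        self.pre_policies = []   # courses of action for data coming in
        self.post_policies = []  # courses of action for data going out

    def put(self, name, data):
        # Pre enforcement point: every policy sees the data before storage.
        for policy in self.pre_policies:
            data = policy(name, data)
        self._objects[name] = data

    def get(self, name):
        data = self._objects[name]
        # Post enforcement point: every policy sees the data before release.
        for policy in self.post_policies:
            data = policy(name, data)
        return data


def reject_empty(name, data):
    """Example ingest policy: refuse empty objects."""
    if not data:
        raise ValueError(f"{name}: empty objects are not accepted")
    return data


def redact_ssn(name, data):
    """Example egress policy: mask anything shaped like NNN-NN-NNNN."""
    return re.sub(r"\d{3}-\d{2}-\d{4}", "[REDACTED]", data)


if __name__ == "__main__":
    store = PolicyStore()
    store.pre_policies.append(reject_empty)
    store.post_policies.append(redact_ssn)
    store.put("notes.txt", "subject id 123-45-6789")
    print(store.get("notes.txt"))
```

Because the policies live in the store rather than in each client, every client is subject to the same course of action.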
11
iRODS Policies
– Retention, disposition, distribution, arrangement
– Authenticity, provenance, description
– Integrity, replication, synchronization
– Deletion, trash cans, versioning
– Archiving, staging, caching
– Authentication, authorization, redaction
– Access, approval, IRB, audit trails, report generation
– Assessment criteria, validation
– Derived data product generation, format parsing
– Federation
12
iRODS Rule Engine, Workflows
iRODS has its own built-in imperative interpreted programming language called the Rule Engine
The iRODS Rule Engine executes Microservices
An iRODS “program” is called a Workflow
– A Microservice is one “step” of an iRODS Workflow
– iRODS Workflows are executed on the iRODS Server
– Arbitrary external web services can be one “step” of an iRODS Workflow, encapsulated as a microservice
13
iRODS Microservices
Microservices are written in C and provide:
Well, really anything that can be done in C, and that’s in part what makes iRODS so extensible, but typically:
– Standard operations; e.g. file or format conversion
– Queries on metadata catalog
– Interaction with web services
– Triggering external HPC workflows
– Remote and delayed execution control
Microservices communicate through
– Arguments, session variables, user space variables, etc.
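The relationship between workflows and microservices described on the last two slides can be mimicked in a few lines. Real microservices are written in C and run by the iRODS rule engine; the Python below is only an analogy, and every name in it (`run_workflow`, `msi_to_upper`, `msi_record_size`, `make_web_service_msi`, `fake_checksum_service`) is hypothetical. Each step receives its arguments plus a shared session dictionary, echoing "arguments, session variables".

```python
def msi_to_upper(args, session):
    """Toy 'format conversion' step: upper-case the data in the session."""
    session["data"] = session["data"].upper()


def msi_record_size(args, session):
    """Toy metadata step: store the data length under a caller-chosen key."""
    session[args["store_as"]] = len(session["data"])


def make_web_service_msi(call_service):
    """Wrap an external service call so it can act as one workflow step."""
    def msi(args, session):
        session[args["store_as"]] = call_service(session["data"])
    return msi


def run_workflow(steps, session):
    """Toy 'rule engine': run each (microservice, args) step in order."""
    for microservice, args in steps:
        microservice(args, session)
    return session


def fake_checksum_service(data):
    """Local stand-in for a remote web service; no network call is made."""
    return sum(data.encode("utf-8")) % 256


if __name__ == "__main__":
    workflow = [
        (msi_to_upper, {}),
        (msi_record_size, {"store_as": "size"}),
        (make_web_service_msi(fake_checksum_service), {"store_as": "checksum"}),
    ]
    print(run_workflow(workflow, {"data": "abc"}))
```

Wrapping the service call in `make_web_service_msi` is what lets an external service appear as just another step of the workflow.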
14
Differentiating Workflows
iRODS data grid workflows
– Low-complexity: a small number of operations compared to the number of bytes in the file
– Server-side workflows
– Data sub-setting, filtering, metadata extraction
Grid workflows
– High-complexity: a large number of operations compared to the number of bytes in the file
– Client-side workflows
– Computer simulations, pixel re-projection
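The distinction on this slide comes down to a ratio of operations to bytes. A minimal sketch of that arithmetic follows; the function names and, in particular, the cut-off of 1.0 operation per byte are assumptions made for illustration, since the slide gives no numeric threshold.

```python
def operations_per_byte(operation_count, file_size_bytes):
    """Complexity measure from the slide: operations relative to file size."""
    if file_size_bytes <= 0:
        raise ValueError("file_size_bytes must be positive")
    return operation_count / file_size_bytes


def suggest_placement(operation_count, file_size_bytes, threshold=1.0):
    """Suggest where to run a workflow; the threshold is an assumed cut-off."""
    ratio = operations_per_byte(operation_count, file_size_bytes)
    if ratio < threshold:
        return "server-side (iRODS data grid workflow)"
    return "client-side (grid workflow)"


if __name__ == "__main__":
    # Metadata extraction: few operations over a large file.
    print(suggest_placement(operation_count=10_000, file_size_bytes=1_000_000))
    # Simulation: many operations over a small input.
    print(suggest_placement(operation_count=5_000_000, file_size_bytes=1_000))
```

The intuition is that when computation is cheap relative to the data volume, it costs less to run it where the data sits than to move the data to the computation.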
15
A few more iRODS notes…
Authentication
– GSI (PKI), Kerberos, Shibboleth, challenge-response
Authorization
– Roles, user groups, resource groups, policy constraints, ACLs
Transport
– TCP/IP (parallel I/O streams), Reliable Blast UDP
Metadata catalog
– PostgreSQL, MySQL, Oracle
Distributed rule engine
– Scheduler, messaging system, execution engine, rule base
16
iRODS Talk Outline
Integrated Rule-Oriented Data System
– What is the Integrated Rule-Oriented Data System? Origins, technology, how it works
– Why is there an Integrated Rule-Oriented Data System? Context, role it serves
– Where It’s Going (today, future): funding, key efforts
17
Entire Data Life Cycle: The iRODS Vision
Life cycle stages (stage - status - policy):
– Project Collection - Private - Local Policy
– Data Grid - Shared - Distribution Policy
– Digital Library - Published - Description Policy
– Data Processing Pipeline - Analyzed - Service Policy
– Reference Collection - Preserved - Representation Policy
– Federation - Sustained - Re-purposing Policy
Each data life cycle stage increases the value and usability of the original collection
Example:
– Jeff gets data from a sensor
– Jeff shares data with colleagues
– Together with colleagues, Jeff analyzes the data and produces results
– Results are peer-reviewed and published
– Jeff et al. hit the jackpot: the collection is now accepted as a reference collection for decades
– The hydrology data grid grows in value to ecology and biology and is federated
18
iRODS Talk Outline
Integrated Rule-Oriented Data System
– What is the Integrated Rule-Oriented Data System? Origins, technology, how it works
– Why is there an Integrated Rule-Oriented Data System? Context, role it serves
– Where is iRODS going today and in the future? Funding, key efforts
19
iRODS: Future
Pending 2011 NSF DataNet
– DataNet Federation Consortium (DFC)
– Includes CUAHSI!! (and several others)
RENCI: Creating an “Enterprise” version of iRODS
– http://iren-web.renci.org/irods-meeting/irods@renci-2011UserMeeting-contribution.pdf
20
Summary
iRODS fills an important niche
– Differentiation: It’s a policy-driven distributed data management system formally supporting the entire data life cycle
  E.g. an iRODS data grid is a vehicle for fulfilling NSF’s Data Management Plan requirement at the community scale
– Classification: Middleware
  iRODS is not intended to be all-encompassing, but rather to work with other DataNets, workflow engines, systems like CUAHSI HIS, etc. in canvassing a National Cyberinfrastructure
  – i.e. it falls primarily in the “Data Services/Storage” portion of NSF’s Data Enabled Science description
With iRODS, the community is still responsible for:
– Schema, data formats, defining policies, defining web interfaces, building analysis and knowledge tools, etc.
21
iRODS Credits
Principal Investigators
Richard Marciano, Reagan Moore (PI), Arcot Rajasekar
Additional Contributors
William Sims Bainbridge, Leesa Brieger, Luis Carriço, Sheau-Yen Chen, Michael Conway, Jason Coposky, Vijay Dantuluri, Antoine de Torcy, Wei Ding, Kevin Gamiel, Lucas Gilbert, Nuno Guimarães, Chien-Yi Hou, Bernard J. (Jim) Jansen, Oleg Kapeljushnik, Mounia Lalmas, Christopher A. Lee, Xia Lin, Gary Marchionini, Cathy Marshall, Jason Reilly, Meredith Ringel Morris, Stefan Rüger, Wayne Schroeder, Michael Stealey, Lisa Stilwell, Jaime Teevan, Paul Tooby, Michael Wan, Bing Zhu
22
iRODS Credits
Research Supported By
– NSF ITR 0427196, Constraint-Based Knowledge Systems for Grids, Digital Libraries, and Persistent Archives (2004–2007)
– NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From Vision to Reality - Developing Scalable Data Management Infrastructure in a Data Grid-Enabled Digital
– NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From Vision to Reality - Research Prototype Persistent Archive Extension (2006–2007)
– NSF SDCI 0721400, SDCI Data Improvement: Data Grids for Community Driven Applications (2007–2010)
– NSF/NARA OCI-0848296, NARA Transcontinental Persistent Archive Prototype (2008–2012)
23
iRODS Credits
For More Information
http://www.irods.org
http://diceresearch.org/
http://dice.unc.edu/
http://www.renci.org/news/releases/renci-teams-with-dice
24
Thank You. http://www.renci.org