iRODS: A Rule Oriented Data Management System SRB Space.

Presentation transcript:

1 iRODS: A Rule Oriented Data Management System SRB Space

2 Beyond the Storage Resource Broker
- SRB is a data management system for large-scale data
  - Logical name space: independence from physical pin-downs
  - Integrated data and metadata management
  - Uniform access interfaces
- Caters to multiple tasks and paradigms
  - Data grid federations for distributed and replicated data handling
    - Cooperating autonomous Virtual Organizations (VO)
  - Persistent archives for long-term preservation
    - Building light, dim and dark archives
  - Digital libraries for semantically searchable data
    - Sharing multiple domains with collection-level functionalities
  - Server-side operations for performing data-intensive operations
    - Data subsetting, data fusion, administrative management
- Used in large-scale systems in production

3 What Next?
- SRB is quite complex, with many functions and operations
  - More than 90 commands with many options
  - Several hundred unique operations
- The intelligence is hard-coded
  - Extensions/modifications require extreme care
  - But the modules are fairly robust and reusable
- SRB is a one-size-fits-all architecture
  - Everyone gets the same code base
- Users want
  - More functionality
  - Increased customizability
  - A footprint as small as necessary
  - Easy for them to modify
  - Independence from developers
  - Functionality to fit policies, and not the other way around!

4 What Do the Users Want?
- Innovative Access Control
  - Sometimes by groups, sometimes by users and sometimes by roles
  - Based on their login type: how they got authenticated
  - Third-party authorization: outside authority agent
  - Dynamically changeable access control
  - Access Control Lists, denial lists, overrides, …
  - Ticket-based short-term and controlled access
- Data Placement Strategies
  - Completely user controlled: user preference policies
  - Completely administration controlled: site policies
  - Group-based policies
  - Overrides, exceptions
  - Based on data characteristic or collection characteristic
  - Policies for staging, caching, archiving, purging, synchronization, …
- Ingestion Policies
  - Check for authenticity, anonymization
  - Pre- and post-process
  - Replication policies, metadata extraction policies, permission policies, …
- And others …

5 Rule Oriented Data Management
- Adaptive Middleware Architecture
  - Customizable and flexible: user configurable
  - Administratively simpler: admin configurable
  - Builds upon the experience of the SRB data grid
- Rule-oriented Programming
  - Well-defined set of functionalities: micro-services
  - Define rules which chain micro-services
    - Workflow of micro-services
  - Define rule application condition
  - Define recoverability for failure management
- Administrators can set site policies
- Users can encode their preferences
- Groups can set their process requirements
- Control actions at collection level, format level, user level, resource level, …

6 Rules and Constraints
- Rule-based
  - Lower-level functions are composed of micro-services
  - Higher-level functions are composed of rules of lower-level micro-services
  - Rules are interpreted using a rule engine
- Customizability
  - Problems with rule composition
  - Integrity checks to make sure rules do not break higher-level functionalities
- Declarative programming
  - Rules define semantics
- Operational programming
  - Rule invocation provides procedural interpretation
- Rules can be used as "checks and balances" to make sure that collections are self-consistent
  - Example: a rule makes two copies of each file
  - Constraint checking: can be used to see if the collection is consistent with this rule
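The constraint check in the example above can be sketched in a few lines of Python. This is only an illustration: the catalog layout (a dict from logical file name to a list of replica locations), the sample paths, and the name `find_under_replicated` are assumptions, not part of iRODS or SRB.

```python
from typing import Dict, List

# Hypothetical catalog: logical file name -> physical replica locations.
Catalog = Dict[str, List[str]]


def find_under_replicated(catalog: Catalog, required_copies: int = 2) -> List[str]:
    """Return logical names that violate the 'N copies of each file' rule."""
    return sorted(
        name
        for name, replicas in catalog.items()
        # The same location listed twice counts as one copy.
        if len(set(replicas)) < required_copies
    )


catalog = {
    "img/a.fits": ["disk1:/a.fits", "tape1:/a.fits"],
    "img/b.fits": ["disk1:/b.fits"],
}
violations = find_under_replicated(catalog)
print(violations)
```

An empty result would mean the collection is consistent with the two-copy rule; a non-empty one lists the files the rule would need to be re-applied to.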

7 Rule-Oriented Data Systems Framework
[Diagram. Components shown: Client Interface; Admin Interface; Metadata Modifier Module; Config Modifier Module; Rule Modifier Module; Consistency Check Modules; Confs; Rule Base; Metadata Base; Rule Engine; Current State; Rule Invoker; Service Manager; Micro Service Modules (resource-based services); Micro Service Modules (metadata-based services); Resources.]

8 Rules Flow
[Flowchart. Nodes: Application; Client Call; Server Call; Find Appropriate Rules; Select First/Next Rule; Condition Check (True/False); Execute Next Micro-Service/Action; Success? (Yes/No); Execute Recovery Micro-Service/Action. Outcomes: "Success: No More MS/A" and "Failure: No More Rules".]
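One plausible reading of the flow on this slide, written as a small Python sketch. Everything here is assumed for illustration (the `Rule` class, `run_action`, the pairing of each micro-service with a recovery step, and the choice to undo the failed step and the earlier ones in reverse order); it is not the iRODS rule engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Step = Callable[[dict], bool]   # a micro-service; returns True on success
Pair = Tuple[Step, Step]        # (micro-service, recovery micro-service)


@dataclass
class Rule:
    condition: Callable[[dict], bool]
    steps: List[Pair]


def run_action(rules: List[Rule], ctx: dict) -> bool:
    """Try each rule in order; return True as soon as one rule completes."""
    for rule in rules:                        # select first/next rule
        if not rule.condition(ctx):           # condition check
            continue
        attempted: List[Pair] = []
        for step, recovery in rule.steps:     # execute next micro-service
            attempted.append((step, recovery))
            if not step(ctx):
                # Assumption: recover the failed step and earlier ones, newest first.
                for _, rec in reversed(attempted):
                    rec(ctx)
                break
        else:
            return True                       # success: no more micro-services
    return False                              # failure: no more rules


log: List[str] = []


def step(name: str, succeeds: bool = True) -> Step:
    """Build a toy micro-service that records its name and succeeds or fails."""
    def run(ctx: dict) -> bool:
        log.append(name)
        return succeeds
    return run


rules = [
    Rule(lambda c: c["dept"] == "sdsc",
         [(step("create"), step("undo-create")),
          (step("replicate", succeeds=False), step("undo-replicate"))]),
    Rule(lambda c: True, [(step("create"), step("undo-create"))]),
]
result = run_action(rules, {"dept": "sdsc"})
print(result, log)
```

In this toy run the first rule fails at `replicate`, its recovery steps run, and the second (unconditional) rule then completes the action.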

9 Sample Rules
ingestObject(*F)
    createFile(*F), registerFile(*F).
ingestObject(*F)
    $userDept == sdsc OR $userDept == sio
    createFile(*F), registerFile(*F), computeChkSum(*F), !, findBackUpRsrc(*F, *R), replicateFile(*F, *R), computeCheckSum(*F, *R), compareCheckSum(*F).
ingestObject(*F)
    $dataType == FITS Image
    createFile(*F), registerFile(*F), extractFITSMetadata(*F).

10 Format of a Rule
Action :- Condition | MS 1, …, MS n | RMS 1, …, RMS n
- Action: the action to be performed
- Condition: checked to see if the rule is applicable
- If applicable, micro-services {1, …, n} are executed
- If any micro-service fails, recovery micro-service(s) are executed to maintain transactional capability
    Micro-service            Recovery micro-service
    createFile(*F)           removeFile(*F)
    ingestMetadata(*F,*M)    rollback
- Caveats:
  - More than one rule can define an action
  - R/MS i can be actions
  - Micro-services can pass parameters
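To make the four-part format concrete, here is a minimal Python sketch that splits a rule written on one line as `Action :- Condition | MS list | RMS list`. The single-line textual syntax, the `ParsedRule` class, the requirement that both lists have the same length, and the sample rule itself are assumptions for illustration; the slides do not specify a concrete grammar, and a condition containing `|` would need a real parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedRule:
    action: str
    condition: str
    micro_services: List[str]
    recovery: List[str]


def split_top_level(text: str, sep: str = ",") -> List[str]:
    """Split on sep, ignoring separators nested inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def parse_rule(line: str) -> ParsedRule:
    """Parse 'Action :- Condition | MS1, ..., MSn | RMS1, ..., RMSn'."""
    action, _, body = line.partition(":-")
    fields = [f.strip() for f in body.split("|")]
    if len(fields) != 3:
        raise ValueError("expected: Action :- Condition | MS list | RMS list")
    condition, ms, rms = fields
    ms_list, rms_list = split_top_level(ms), split_top_level(rms)
    if len(ms_list) != len(rms_list):
        # Assumption: one recovery micro-service per micro-service.
        raise ValueError("each micro-service needs one recovery micro-service")
    return ParsedRule(action.strip(), condition, ms_list, rms_list)


# Hypothetical rule assembled from names that appear on the slides.
rule = parse_rule(
    "ingestObject(*F) :- $dataType == FITS | "
    "createFile(*F), ingestMetadata(*F,*M) | removeFile(*F), rollback"
)
print(rule)
```

Keeping the recovery list parallel to the micro-service list is one simple way to know which recovery step belongs to which micro-service.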

11 AMA & ROP
- A new paradigm in middleware development
  - Higher-level services composed of micro-services
  - Customizable at multiple levels
- Glass box architecture
  - Can explain what happens
  - Semantics can be checked
  - Run-time version control
- Combines multiple paradigms
  - Workflow systems, active databases, rule-based execution, transaction systems, data grids and remote execution of services
- Flexible management
  - Administrative ease
  - Triggers for handling low/high water marks
  - Periodic job execution: backup, archive, usage control, …

12 Components of Rule System
- Actions
  - Name space of actions
  - Client call maps to actions
- Micro Services
  - Well-defined server-side procedures and functions
- Rule
  - Definitions for actions
  - Workflow of what to do
  - Composed of actions and micro-services
  - Invoked to execute an action
- Rule Base
  - Set of rules
  - Each user community can choose its own rule base
- Data Components
  - Blackboard architecture
  - Used by micro-services, actions and rules

13 Data Components of Rule System
- Persistent Data Attributes: #
  - Has an external name space
  - Mapping to internal database attributes
  - Persists across sessions
- Session Data Attributes: $
  - Has an external name space
  - Mapping to internal data structures
  - Used by micro-services/actions inside a session
- Side Effects Set: %
  - Changes effected outside the system
  - File created, file copied, sent, …
  - Well-defined name space of activities
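The three kinds of data can be pictured as three namespaces on a shared blackboard. The Python sketch below is illustrative only: the `Blackboard` class, its method names, and the sample attribute names and values are assumptions, and a plain dict stands in for the database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Blackboard:
    persistent: Dict[str, Any]                              # '#': survives sessions
    session: Dict[str, Any] = field(default_factory=dict)   # '$': one session only
    side_effects: List[str] = field(default_factory=list)   # '%': external activities

    def get(self, name: str) -> Any:
        """Resolve '#name' or '$name' to the matching namespace."""
        kind, key = name[0], name[1:]
        if kind == "#":
            return self.persistent[key]
        if kind == "$":
            return self.session[key]
        raise KeyError(f"unknown prefix in {name!r}")

    def record(self, activity: str) -> None:
        """Note a '%' side effect such as a file being created."""
        self.side_effects.append(activity)


database = {"dataOwner": "alice"}         # hypothetical stand-in for the catalog

first = Blackboard(persistent=database)
first.session["userDept"] = "sdsc"
first.record("fileCreated")

second = Blackboard(persistent=database)  # a new session over the same database
print(second.get("#dataOwner"), second.session)
```

The second session sees the persistent attribute but starts with empty session data, which is the distinction the slide draws between `#` and `$`.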

14 Micro Services
- Compiled functions
- Short and well-defined functionality
- Should have clear semantics
- Works on $, #, %
- Examples:
  - Metadata extraction for DICOM
  - Access control permission changed to user
  - Replicate a file from source to destination

15 Semantics
- Micro Service Semantics
  - Input/output variables (in terms of $)
    - Input: what is needed
    - Output: what gets changed
  - Persistent changes (in terms of #)
    - Updates to databases
  - Activities performed (in terms of %)
    - External activities performed

16 Semantics
- Rule Semantics
  - Based on component micro-services
- Action Semantics
  - Based on corresponding rules
  - Only one rule's semantics applies
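As a rough illustration of how rule semantics could be derived from the semantics of the component micro-services, the Python sketch below describes each micro-service by the `$` variables it needs and changes, the `#` attributes it updates, and the `%` activities it performs, then folds a chain of them into one summary. The `Semantics` class, the combination logic, and every attribute name are assumptions, not something the slides define.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Semantics:
    inputs: FrozenSet[str] = frozenset()      # '$' variables needed
    outputs: FrozenSet[str] = frozenset()     # '$' variables changed
    persistent: FrozenSet[str] = frozenset()  # '#' attributes updated
    activities: FrozenSet[str] = frozenset()  # '%' external activities


def rule_semantics(chain: Iterable[Semantics]) -> Semantics:
    """Combine micro-service semantics, in order, into one summary for a rule."""
    needed, produced = set(), set()
    persistent, activities = set(), set()
    for ms in chain:
        # An input is external to the rule only if no earlier step produced it.
        needed |= ms.inputs - produced
        produced |= ms.outputs
        persistent |= ms.persistent
        activities |= ms.activities
    return Semantics(frozenset(needed), frozenset(produced),
                     frozenset(persistent), frozenset(activities))


# Hypothetical descriptions of two micro-services.
create_file = Semantics(inputs=frozenset({"$objPath"}),
                        outputs=frozenset({"$filePath"}),
                        activities=frozenset({"%fileCreated"}))
register_file = Semantics(inputs=frozenset({"$objPath", "$filePath"}),
                          persistent=frozenset({"#dataPath"}))

summary = rule_semantics([create_file, register_file])
print(summary)
```

Two such summaries can be compared field by field, which is one way to check what a rule changes without running it.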

17 Sample Rules
ingestInCollection(S) :- /* store & backup */
    chkCond1(S), ingest(S), register(S), findBackUpRsrc(S.Coll, R), replicate(S,R).
ingestInCollection(S) :- /* store & check */
    chkCond2(S), computeClntChkSum(S,C1), ingest(S), register(S), computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S).
ingestInCollection(S) :- /* store, chk, backup & chk */
    chkCond3(S), computeClntChkSum(S,C1), ingest(S), register(S), computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S), findBackUpRsrc(S.Coll, R), replicate(S,R), computeSerChkSum(S,C3), checkAndRegisterChkSum(C2,C3,S).
ingestInCollection(S) :- /* store, check, backup & extract metadata */
    chkCond4(S), computeClntChkSum(S,C1), ingest(S), register(S), computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S), findBackUpRsrc(S.Coll, R), [replicate(S,R) || extractRegisterMetadata(S)].
ingestInCollection(S) :- /* just store */
    ingest(S), register(S).

Conditions:
chkCond1(S) :- user(S) ==
chkCond1(S) :- coll(S) like ‘*/scec.sdsc/img/*’.
chkCond2(S) :- user(S) ==
chkCond3(S) :- user(S) ==
chkCond4(S) :- user(S) == datatype(S) == ‘DICOM’.

Notation:
- [OprList] implies delay for later or send to a CronJobManager
- Opr||Opr implies do them in parallel
- Opr, Opr implies do them serially
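The three composition forms in the notation above (serial, parallel, and delayed) can be mimicked with ordinary Python callables. This is a sketch under assumptions: the helper names `serial`, `parallel`, `delay`, and `mark` are invented here, a plain list stands in for the CronJobManager queue, and threads stand in for whatever parallel execution the real system uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Op = Callable[[], None]


def serial(*ops: Op) -> Op:
    """Opr, Opr: run the operations one after another."""
    def run() -> None:
        for op in ops:
            op()
    return run


def parallel(*ops: Op) -> Op:
    """Opr || Opr: run the operations concurrently and wait for all of them."""
    def run() -> None:
        with ThreadPoolExecutor(max_workers=max(1, len(ops))) as pool:
            for future in [pool.submit(op) for op in ops]:
                future.result()   # re-raise any exception from a worker
    return run


deferred: List[Op] = []           # stand-in for a CronJobManager queue


def delay(op: Op) -> Op:
    """[OprList]: queue the operation for later instead of running it now."""
    def run() -> None:
        deferred.append(op)
    return run


done: List[str] = []


def mark(name: str) -> Op:
    """Toy micro-service that only records that it ran."""
    return lambda: done.append(name)


plan = serial(
    mark("ingest"),
    mark("register"),
    delay(parallel(mark("replicate"), mark("extractRegisterMetadata"))),
)
plan()
ran_now = list(done)              # only the serial part has run so far

for op in deferred:               # later, e.g. from a scheduled job
    op()
print(ran_now, sorted(done))
```

The order of the two parallel operations is not fixed, so the final output sorts the names rather than relying on completion order.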

18 Middleware
- Software providing complex distributed applications/services
  - Client-server
  - Peer-to-peer
  - Web servers, content managers, databases, application servers, …
- Client access through common protocols
  - RPC, message-oriented, Object Request Broker, WSDL or service-oriented
- Middleware provides a specific set of services

19 Middleware
- Normal middleware are black boxes
  - Expose a set of interfaces/service definitions
  - No customization
  - System developer has complete control
  - A service will have very few configurability options, even in open-source middleware
- Applications are developed on top of middleware

20 Adaptive Middleware Architecture
- Similar to normal middleware
  - Provides a set of services
  - Has a well-defined access protocol
- AMA is not a black box
  - Admin/user customizable service
  - Tweak services to achieve alternate goals
  - Can explain at a high level what is happening
  - One can compare two AMA services to see how they differ
  - Useful for verification and analysis

21 Adaptive Middleware Architecture
- External View: Logical Name Space
  - Persistent memory: database
  - Transient memory: variables
  - External side effects
    - Interaction with the outside world
    - Ex.: file is created, is sent
  - Services, methods, actions
  - Rules, workflow
- Internal View: Programmatic View
  - Changes in DB tables, internal variables/structure
  - Procedures, methods and functions
  - Drivers, protocols
  - Users, resources, data objects: methods affecting them
- Mapping External to Internal
  - Capturing semantics of services and rules
  - Validation, analysis, introspection