Fedora Service Framework Simple Queue Services For fulfillment of the Mellon Grant June 29, 2009.


Simple Queue Services
Provide a simple, reliable way to connect content-related infrastructure services to:
  – Enable moving notifications and content between services and repositories
  – Perform tasks using decoupled, reusable services
  – Enable easy reuse and repurposing of services as programmable flows
Inspirations
  – Amazon Simple Queue Service (FOSS implementation)
  – Tom Cramer, Stanford Library: Work Do workflow (via Hydra)
  – Richard Rodgers, MIT Libraries: Cloud Task Replica
  – NSDL NCORE

Example FSF-SQS Application
[Diagram. Components shown: Browser, Custom Ingest Client, Portable Ingest Client, Simple Ingest Service, Request Queue, Response Queue, Validation Service (e.g.), and a store that is a File System or DuraSpace or Naked Akubra or Fedora Repository]

Example Chained FSF-SQS Application
[Diagram. Components shown: Portable Ingest Client, Simple Ingest Service, Staging or Institutional Store, two Request Queue / Response Queue pairs, Appraisal Service (e.g.), Validation Service (e.g.), Fedora Repository Service]

Example Replication FSF-SQS Application
[Diagram. Components shown: Existing Client, DSpace, Notification Polling Service, two Request Queue / Response Queue pairs, Transform Service, Fedora Ingest Service, Fedora Repository; Metadata and Bitstreams flow between them]

Original FSF Messaging Concept
[Diagram. Components shown: Fedora Repository Service, GSearch, OAI, Ingest, Simple JMS, Service Integration, More…]
First, we are providing simple messaging (via ActiveMQ in Fedora 3.0)
  – Repository publishes events
  – Services listen and consume events or other messages
Next, lightweight integration with workflow engine(s); orchestration
Did not get implemented
No message ingest method

Collective Experience
Domain Characterization (reference: Mellon ESB Study):
  – Limited governance structures
  – High developer turnover
  – Rapid environment changes
  – Cost-sensitive
Examples:
  – RepoMMan and Remap (BPEL)
  – Hydra (three approaches) (Dlib)
  – eSciDoc plus others (Red Hat jBPM)
  – Northwestern Books
  – Trident Project
Conclusion:
  – Using full-featured workflow systems will be difficult for the majority of our targeted organizations

Amazon's Simple Queue Service
Amazon SQS
Implemented as a service within Amazon's cloud
Less capable but much simpler than direct JMS
Limited to an 8K message body with no attachments
SOAP and Query (aka Web) API
Messages are durable for 4 days
Messages are locked while processing

Amazon's SQS API
CreateQueue: Create queues for use with your AWS account.
ListQueues: List your existing queues.
DeleteQueue: Delete one of your queues.
SendMessage: Add any data entries to a specified queue.
ReceiveMessage: Return one or more messages from a specified queue.
ChangeMessageVisibility: Change the visibility timeout of a previously received message.
DeleteMessage: Remove a previously received message from a specified queue.
SetQueueAttributes: Control queue settings, such as how long messages stay locked after being read so they cannot be read again.
GetQueueAttributes: See information about a queue, such as the number of messages in it.
AddPermission: Add queue sharing for another AWS account for a specified queue.
RemovePermission: Remove an AWS account from queue sharing for a specified queue.
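
The send / receive / lock / delete cycle listed above can be illustrated with a small in-memory sketch. This is not Amazon's implementation and not FSF-SQS code: the class name SimpleQueue, the snake_case method names, and the injectable clock are assumptions made for illustration only. It shows how a received message stays locked (invisible) until its visibility timeout expires or the consumer deletes it.

```python
import time
import uuid


class SimpleQueue:
    """In-memory stand-in for one SQS-style queue (illustrative only)."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self._clock = clock  # injectable so the behaviour can be tested
        # message id -> [body, time at which the message is visible again]
        self._messages = {}

    def send_message(self, body):
        """Add a message and return its id."""
        message_id = str(uuid.uuid4())
        self._messages[message_id] = [body, float("-inf")]
        return message_id

    def receive_message(self):
        """Return (message_id, body) for one visible message, or None."""
        now = self._clock()
        for message_id, entry in self._messages.items():
            if entry[1] <= now:
                # Lock the message while the consumer processes it.
                entry[1] = now + self.visibility_timeout
                return message_id, entry[0]
        return None

    def change_message_visibility(self, message_id, timeout):
        """Reset the lock of a previously received message."""
        self._messages[message_id][1] = self._clock() + timeout

    def delete_message(self, message_id):
        """Remove a message once processing has succeeded."""
        self._messages.pop(message_id, None)
```

A consumer that crashes before calling delete_message simply lets the lock expire, after which another consumer can receive the same message again.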

Rodgers' Cloud Task Replica
OR09 Presentation
Oriented to Cloud characteristics
Uses lightweight interfaces and queuing, highly decoupled
Primarily focuses on replication use cases
At prototype stage

CTR - Roles
decompose work into distinct replaceable agents
archive = content home
replicator = manages copies
auditor = implements and enforces policy
role != institution

CTR - Process Model
a message queue for each role
message post triggers activity asynchronously
bucket brigade: message is a handoff or acknowledgment
storage is abstracted
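
The process model above can be sketched as one queue per role, with each handler reacting to a posted message by handing off to the next role. This is a minimal single-process sketch, not the Cloud Task Replica code: the message types (deposit, replicate, audit, ack), the archive to replicator to auditor ordering, and the function name run_brigade are all assumptions for illustration, and real processing would be asynchronous rather than a loop.

```python
from collections import deque

ROLES = ("archive", "replicator", "auditor")


def run_brigade(package_uri):
    """Pass one package through the roles and return the activity log."""
    queues = {role: deque() for role in ROLES}  # one queue per role
    log = []

    def archive(msg):
        if msg["type"] == "deposit":
            # Handoff: ask the replicator to copy the new package.
            queues["replicator"].append({"type": "replicate", "uri": msg["uri"]})

    def replicator(msg):
        # Storage is abstracted: no copy is made here, only the handoff.
        queues["auditor"].append({"type": "audit", "uri": msg["uri"]})

    def auditor(msg):
        # Acknowledgment back to the content home.
        queues["archive"].append({"type": "ack", "uri": msg["uri"]})

    handlers = {"archive": archive, "replicator": replicator, "auditor": auditor}
    queues["archive"].append({"type": "deposit", "uri": package_uri})

    # Drain the queues until no role has pending messages.
    while any(queues.values()):
        for role in ROLES:
            while queues[role]:
                msg = queues[role].popleft()
                log.append((role, msg["type"]))
                handlers[role](msg)
    return log
```

Because each role only reads its own queue and writes to another role's queue, any one agent can be replaced without touching the others.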

CTR - Workflow: Replication
[Diagram. Components shown: archive, replicator, auditor, S3]

CTR - Message Semantics
web-standard URI addressing
entities: packages, ORE maps
content model agnostic
entity checksums for integrity
standard identifiers for actors
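
As a rough illustration of these semantics, a handoff message might carry only URIs plus a checksum, leaving the receiver to fetch the entity and verify it. The field names, the JSON serialization, and the choice of SHA-256 below are assumptions for this sketch; the slides do not specify a wire format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HandoffMessage:
    actor: str      # URI identifying the sending actor (hypothetical field)
    entity: str     # URI of the package or ORE map (hypothetical field)
    checksum: str   # hex digest of the entity's bytes
    algorithm: str = "sha256"


def make_message(actor, entity, content):
    """Build a message describing content without embedding it."""
    digest = hashlib.sha256(content).hexdigest()
    return HandoffMessage(actor=actor, entity=entity, checksum=digest)


def verify(message, content):
    """Check fetched content against the checksum in the message."""
    return hashlib.new(message.algorithm, content).hexdigest() == message.checksum


def to_json(message):
    """Serialize the message; nothing here depends on a content model."""
    return json.dumps(asdict(message), sort_keys=True)
```

The message stays content model agnostic because it names the entity by URI and says nothing about what the entity contains.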

Stanford's Work Do Workflow
Puts the resource management state inside the Fedora digital object
Each application reads the object and performs its function
Able to support both human workflow and BPE
Uses logical queues to manage workflow (no messaging SW)
Depends on applications doing the right thing
Simplifies governance to resource management semantics and representation

Work Do - Approach
Each object in DOR has:
  – locally defined resource-management metadata
  – a special datastream to describe processing conditions and their state for that object
Placing work-related information in the object means:
  – it can be indexed (using Solr or other search engines)
  – it is co-located alongside other useful processing information
  – it contains collection and selector identity to mark records ready for a particular process

Work Do – Process Model
Simple queries are used to:
  – establish logical queues
  – define, through those queues, the work ready for a particular robot or human interaction at any given time
Queries also provide:
  – ongoing management information about the flow of objects through the system
  – results that can be exposed as facets in an administrative discovery environment
Simple REST-based interactions based on Fedora service calls are used to identify queues and update state.

Work Do – Process Data
A workflow datastream in each object describes processing requirements and status
<process name="google-download" status="exception" message="Item for barcode not found" attempts="3" />
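
The idea of a logical queue built by querying workflow datastreams can be sketched as follows. This is an illustration under stated assumptions, not Stanford's implementation: the enclosing workflow element, the PIDs, the status values other than "exception", and the in-memory dictionary standing in for a Solr or Fedora query are all hypothetical.

```python
import xml.etree.ElementTree as ET

# PID -> workflow datastream content (stand-in for a repository or index).
WORKFLOWS = {
    "demo:1": '<workflow><process name="google-download" status="exception" '
              'message="Item for barcode not found" attempts="3" /></workflow>',
    "demo:2": '<workflow><process name="google-download" status="waiting" '
              'attempts="0" /></workflow>',
    "demo:3": '<workflow><process name="google-download" status="completed" '
              'attempts="1" /></workflow>',
}


def logical_queue(workflows, process_name, status):
    """Return the PIDs whose named process is in the given status."""
    ready = []
    for pid, xml_text in workflows.items():
        root = ET.fromstring(xml_text)
        for process in root.iter("process"):
            if process.get("name") == process_name and process.get("status") == status:
                ready.append(pid)
                break
    return sorted(ready)


def set_status(xml_text, process_name, status):
    """Return the datastream with the named process moved to a new status."""
    root = ET.fromstring(xml_text)
    for process in root.iter("process"):
        if process.get("name") == process_name:
            process.set("status", status)
    return ET.tostring(root, encoding="unicode")
```

No messaging software is involved: the "queue" is only the result of a query, so it is always consistent with the state recorded in the objects.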

FSF-SQS Development Approach
Merge selected aspects of Amazon, Stanford Work Do, and MIT Cloud Task Replica approaches
Enable moving notifications and data between repository services
Mostly integration of existing FOSS, minimal new build
Extends existing ActiveMQ implementation
  – Adds tools for moving data
  – Adds additional language bindings, likely using Stomp
  – Realizes promise of completing asynchronous messaging
  – Can be extended later to include business rules engine, full workflow
  – Can be extended to Cloud implementations (Amazon, Eucalyptus)
  – Note: No FOSS implementation currently available for Amazon SQS

Targeted Use Cases
Bi-directional replication between Fedora repositories
  – initial and ongoing
  – possibly update
Unidirectional replication from DSpace to Fedora
  – initial and ongoing
One-time ingest (ETL) from legacy repositories
Validation services
Selected workflows (TBD)

FSF-SQS Implementation
Would prefer to use a FOSS implementation of the Amazon SQS interface
Fallback is to use other products directly
Under investigation:
  – ActiveMQ integrations including Apache CXF
  – Mule
  – Apache Camel
  – FUSE ESB 4 (Apache ServiceMix, Mellon ESB top recommendation)
Note: Bus In the Cloud
Note: Is Eucalyptus ready to be your private cloud?

Don't Need to Build
Messaging (ActiveMQ)
Language Bindings, Brokers/Gateways (e.g. Stomp)
ESB (e.g. Camel, Mule) or Workflow (e.g. jBPM, Kepler)
Most services
Business integration patterns (but will have to choose)
  – Document (send object, action and content through)
  – Disconnected (temporarily put the content in storage or in Fedora and incrementally perform actions)
  – Notification (events only)

Do Need to Build
Service Wrappers (or request from community)
FSF-SQS based on Amazon SQS in ActiveMQ, possibly with Mule
Message payload formats include resource processing state
DSpace to Fedora extract, transform, transfer and load flow
Replacement for Diringest service (maybe)
  – Chris Wilper wants this work done
  – Needs to handle content without requiring FOXML wrapper, manifest
  – Good to use Fedora Content Models where feasible
  – Be extensible
  – Needs some common components with FC-REPO WebDAV
  – Support Messaging and Web end-point (brokers/gateways)
Portable client (partial SIP builder replacement) (maybe)
  – Works both client-side and server-side (consider Python, Ruby, Flex)
  – Works with or without manifest, synchronous and asynchronous
  – Simple, Simple, Simple on-ramp client for entry-level users

Advantages and Drawbacks
Advantages
  – Messaging is the simplest of the enterprise methods
  – Low risk, since simplifying approaches may be taken at many points
  – Has been requested many times by large repository users
  – Immediately useful
  – Fits overall Mellon goals
Drawbacks
  – Does not include a named workflow product, though the term "workflow" is used by Amazon and others to describe this approach
  – Meat-and-potatoes type implementation does not excite people

Details

Integrate a Simple Queue Service
Demonstrates a lightweight ingest pipeline using off-the-shelf open source technology (ActiveMQ with REST brokers/gateways)
Performs the services selected by the Simple Ingest Service web application
Work consists mostly of integration tasks, with some service wrappers to build
Service code is to be selected only from existing off-the-shelf FOSS
Provides a model for integration with the Fedora Repository
The specific products/languages for services are to be determined when the use cases and partners are well characterized
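
A lightweight pipeline of this kind can be sketched with a request queue, a response queue, and thin service wrappers. The sketch below uses Python's standard in-process queue module as a stand-in for ActiveMQ; the service names, the request layout, and the two toy services (an MD5 checksum check and a PDF header check) are assumptions for illustration, not selected FSF-SQS services.

```python
import hashlib
import queue


def verify_checksum(item):
    """Wrapper: compare the supplied MD5 digest with the content."""
    return hashlib.md5(item["content"]).hexdigest() == item["md5"]


def check_pdf_header(item):
    """Wrapper: a very rough format check based on the PDF magic bytes."""
    return item["content"].startswith(b"%PDF-")


# Hypothetical registry of wrapped services, keyed by the name a client selects.
SERVICES = {
    "verify-checksum": verify_checksum,
    "check-pdf-header": check_pdf_header,
}


def run_worker(request_queue, response_queue):
    """Consume every pending request and post one response per request."""
    while True:
        try:
            request = request_queue.get_nowait()
        except queue.Empty:
            return
        results = {
            name: SERVICES[name](request["item"]) for name in request["services"]
        }
        response_queue.put({"id": request["id"], "results": results})
```

The client chooses which services run by listing their names in the request, so adding a service means adding one wrapper and one registry entry.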

FSF-SQS Integration Patterns
Enterprise Integration Patterns
Document (object, actions/state and content in message)
Disconnected (object and content stored in file systems, Akubra, DuraCloud or Fedora during processing, actions/state in message)
Notification (actions in message, state, object and content elsewhere)
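
The difference between the three patterns is mainly what travels inside the message. A minimal sketch of the three payload shapes follows; the field names, the JSON-friendly dictionaries, and the base64 encoding of inline content are assumptions for illustration, since no payload format is defined in the slides.

```python
import base64


def document_message(pid, action, state, content):
    """Document: object id, action/state and the content itself travel together."""
    return {
        "pattern": "document",
        "object": pid,
        "action": action,
        "state": state,
        "content": base64.b64encode(content).decode("ascii"),
    }


def disconnected_message(object_uri, action, state):
    """Disconnected: content stays in storage; only action/state are carried."""
    return {
        "pattern": "disconnected",
        "object_uri": object_uri,  # where the stored object and content live
        "action": action,
        "state": state,
    }


def notification_message(object_uri, action):
    """Notification: an event only; state, object and content live elsewhere."""
    return {
        "pattern": "notification",
        "object_uri": object_uri,
        "action": action,
    }
```

Given a small message-size limit such as the 8K body mentioned for Amazon SQS, the Document pattern only suits small content, which is one reason a choice between the patterns is needed.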

Potential Demonstration Services
Create derivative forms
Format conversion
Verify checksum
Virus scanning
Validate object
Validate datastream format (and label or check FORMAT_URI and MIME type)
Get non-Fedora PID
Metadata feature services (feature extraction with write into FOXML or datastream)
  – JHOVE
  – iVia (descriptive metadata generation plus other services)
Many other services are possible, but a few key selections should be incorporated, leaving room for later additions

Workflow States
Object State
  – State of a data object at a point in time
  – Can be contained in the object and reflected on
Process State
  – State of an instance of a processing flow
  – Workflow engines designed to handle this
  – Long running vs. short running
Event State
  – General notion of event is a statement which is reflected on
  – PREMIS-like preservation event is more of a process
Person State
  – Characteristics of a person (actor) with respect to objects, processes, or events
  – (e.g. requirements fulfilled by a Ph.D. student to graduate)

Build a Simple Ingest Service
Directory/file ingest (Diringest replacement)
Web application (server-side service)
Generates FOXML for transferred content
Supports content models where practical (also needed for WebDAV interface)
Uses the lightweight ingest pipeline to perform the pre-ingest preparation services

Build a Portable Ingest Client
Ingest a single file or a directory
Choose the content model (if any) from a menu
Choose which pre-ingest services to perform on the content from a menu
Works both as a Web App and as a Desktop App
Communicates by Web (REST) and messaging via broker/gateway
Can later be extended more towards the FedoraShare concept
Consider a scripting framework (Python, Ruby, Flex)

Content Models
Hydra Content Models