SCAPE Project: EU project aimed at building a scalable platform for planning and execution of computation-intensive processes for ingestion or migration of large data sets, in order to help automate digital preservation.

Presentation transcript:

SCAPE Project
- EU project aimed at building a scalable platform for planning and execution of computation-intensive processes for ingestion or migration of large data sets, in order to help automate digital preservation
- Digital preservation: standards + policies + technologies to ensure access to digital objects over time
- “Preservation workflows”, “Digital objects 4 ever”
- 42 months, in the period project partners, 22 WPs, 55 deliverables, 88 milestones, zillion mailing lists

The Problem
- Scale of data sets involved in digital preservation:
  - large number of objects involved in data sets
  - the objects can be large in size or complex in structure
  - the data collections can contain heterogeneous objects (objects of different types)
- Data formats change over time and become obsolete
- Migrating digital objects – must ensure success
- Reproducibility of preservation processes and collection of provenance data over the digital object’s entire lifecycle

The Solution – From Project Proposal
- The preservation processes are realised as data pipelines and described formally as Taverna workflows
- Workflows will invoke various services for planning and execution of institutional preservation and quality assurance strategies
- Workflows will be deployed on a large scale (using clouds) and executed over large, distributed and heterogeneous collections of complex digital objects
- The execution of workflows will be controlled by a policy-based system, which will ensure the workflows are in line with the state of the art in digital object representation, file formats, rendering tools, etc., and detect and report any errors in a preservation process
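The proposal's idea of a pipeline whose results are checked against a policy, with errors detected and reported, can be illustrated with a minimal Python sketch. This is not Taverna or SCAPE code: the names `DigitalObject`, `Report`, `migrate_to_jp2`, `check_policy`, `run_pipeline` and the allowed-format policy are all hypothetical, chosen only to show the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DigitalObject:
    identifier: str
    fmt: str  # file format label, e.g. "tiff"


@dataclass
class Report:
    migrated: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


# Hypothetical policy: formats considered acceptable after migration.
ALLOWED_TARGET_FORMATS = {"jp2", "pdfa"}


def migrate_to_jp2(obj: DigitalObject) -> DigitalObject:
    """Toy migration step: only TIFF input is supported."""
    if obj.fmt != "tiff":
        raise ValueError(f"cannot migrate {obj.fmt} to jp2")
    return DigitalObject(obj.identifier, "jp2")


def check_policy(obj: DigitalObject) -> DigitalObject:
    """Policy step: reject results whose format is not allowed."""
    if obj.fmt not in ALLOWED_TARGET_FORMATS:
        raise ValueError(f"format {obj.fmt} violates policy")
    return obj


def run_pipeline(objects: List[DigitalObject],
                 steps: List[Callable[[DigitalObject], DigitalObject]]) -> Report:
    """Run every step on every object; collect failures instead of stopping."""
    report = Report()
    for obj in objects:
        try:
            current = obj
            for step in steps:
                current = step(current)
            report.migrated.append(current.identifier)
        except ValueError as exc:
            report.errors.append(f"{obj.identifier}: {exc}")
    return report


collection = [DigitalObject("a", "tiff"), DigitalObject("b", "doc")]
report = run_pipeline(collection, [migrate_to_jp2, check_policy])
```

Collecting errors per object, rather than aborting on the first failure, matches the requirement that a run over a heterogeneous collection reports every problem it finds.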

The Solution – In Practice
- Preservation services are written in various languages
  - Use Taverna’s External Tools or Beanshells to invoke them from inside Taverna workflows
- Preservation services need to be running locally, to be able to deploy them to a cluster and to avoid the bottleneck problem related to invoking a Web service
- Convert Taverna’s workflows to workflows executable and parallelizable on Hadoop MapReduce
  - Compile Taverna workflows to the intermediate language Jaql, which can be optimized and executed on MapReduce
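The reason local tools parallelize well in a map/reduce setting can be shown with a small simulation in plain Python. This is only a sketch of the pattern, not Jaql, Hadoop, or the SCAPE compiler: `identify_format` is a hypothetical stand-in for a locally installed preservation tool (here a child Python process that prints a file's extension), and a thread pool plays the role of the cluster.

```python
import subprocess
import sys
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# One-line child program standing in for a local command-line tool (assumption).
_TOOL_SOURCE = (
    "import os, sys; "
    "print(os.path.splitext(sys.argv[1])[1].lstrip('.').lower() or 'unknown')"
)


def identify_format(path: str) -> str:
    """Invoke the local 'tool' as a separate process and read its output."""
    result = subprocess.run(
        [sys.executable, "-c", _TOOL_SOURCE, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def map_phase(path: str) -> Tuple[str, int]:
    """Map: one record in, one (key, value) pair out. No shared state."""
    return identify_format(path), 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce: sum the values for each key."""
    totals: Counter = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_job(paths: List[str], workers: int = 4) -> Dict[str, int]:
    """Run the map phase in parallel, then reduce the collected pairs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pairs = list(pool.map(map_phase, paths))
    return reduce_phase(pairs)


counts = run_job(["a.tiff", "b.TIFF", "c.pdf", "README"])
```

Because each map call only starts a local process and touches one input, calls are independent of each other, which is the property that lets a framework distribute them across nodes without a central Web service becoming the bottleneck.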

Benefits to Us
- Strengthened External Tools plugin and improved support for running external services
- Taverna workflow (potentially containing only local services) -> parallelizable Jaql workflow executable on a MapReduce cloud
  - App4Andy-style applications that process large data, use local scripts and need parallelization/optimization
- Some extensions to myExperiment (“run wf on a cloud”) / BioCatalogue – not sure how reusable

Other Projects Affecting SCAPE
- External Tools plugin for Taverna
- Provenance in Taverna
  - Browsing, exporting
  - We design a Taverna wf, but actually run a Jaql wf – so provenance is not being captured by Taverna?
- Next Generation Workbench – could do with a more advanced UI
- SCUFL2 – for conversion to Jaql workflows
  - Easier for manipulation than current t2flow?
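If the workflow that actually runs is not the one Taverna sees, provenance would have to be recorded by whatever executes the steps. The following Python sketch shows one possible shape for that, under stated assumptions: `run_with_provenance` and the record fields are hypothetical and do not reflect Taverna's or Jaql's provenance model.

```python
import hashlib
from datetime import datetime, timezone
from typing import Callable, Dict, List


def run_with_provenance(step_name: str,
                        func: Callable[[bytes], bytes],
                        payload: bytes,
                        log: List[Dict[str, str]]) -> bytes:
    """Run one step and append a record of what went in and what came out."""
    started = datetime.now(timezone.utc).isoformat()
    output = func(payload)
    log.append({
        "step": step_name,
        "started": started,
        # Digests identify the exact input and output without storing them.
        "input_sha256": hashlib.sha256(payload).hexdigest(),
        "output_sha256": hashlib.sha256(output).hexdigest(),
    })
    return output


provenance_log: List[Dict[str, str]] = []
result = run_with_provenance("uppercase", bytes.upper, b"abc", provenance_log)
```

Recording digests at the executing layer keeps the trace independent of the design tool, so it would survive the conversion from one workflow language to another.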

Summary
- Contributions
  - Taverna Workbench for workflow design
  - myExperiment VRE for sharing workflows
  - BioCatalogue catalogue for curating preservation services
  - Ontology development
- Expectations
  - Scalability in workflow execution
  - Experiences with new domain – digital libraries