SCAPE
Rainer Schmidt
SCAPE Training Event, September 16th – 17th, 2013, The British Library
Building Scalable Environments
Technologies and SCAPE Platform
SCAlable Preservation Environments SCAPE
Motivation
The amount of data held in data centers and memory institutions is increasing.
It cannot be handled using traditional environments such as databases or server facilities.
Institutions require the ability to process large and complex data sets in preservation scenarios.
Examples are data migration, information extraction, and quality assurance.
The goal is to take advantage of data-intensive computing technologies for digital preservation.
What we will show you
Example scenarios from the SCAPE Testbed and how they are formalized using workflow technology.
Introduction and hands-on exercise using the involved preservation tools.
Overview of the SCAPE Platform, its underlying technologies, its preservation services, and how to set it up.
Creating scalable workflows and deploying them on the platform.
Executing SCAPE workflows in a virtual machine environment as well as on a demonstration cluster.
Workflows in this Context
Formalized (and repeatable) processes/experiments consisting of one or more activities interpreted by a workflow engine.
Usually modeled as directed acyclic graphs (DAGs) based on control-flow and/or data-flow logic.
The workflow engine functions as a coordinator/scheduler that triggers the execution of the involved activities.
May be performed by a desktop application, a server-side component, or both.
Example workflow engines are Taverna Workbench, Taverna Server, and Apache Oozie.
Used for experimentation and research, SOA support, and Hadoop integration.
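
To make the idea of a workflow engine as a coordinator over a DAG of activities concrete, here is a minimal sketch in Python. It is not how Taverna or Apache Oozie are implemented. The function `run_workflow`, the activity names, and the returned values are all hypothetical and chosen only for illustration. The only library assumed is `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(activities, dependencies):
    """Run every activity after all of its upstream activities have finished.

    activities:   name -> callable that receives a dict of upstream results
    dependencies: name -> set of upstream activity names (the DAG edges)
    """
    graph = {name: dependencies.get(name, set()) for name in activities}
    results = {}
    order = []
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = activities[name](upstream)
        order.append(name)
    return order, results


# Hypothetical three-step preservation workflow (names and values are illustrative).
activities = {
    "migrate": lambda up: "page-0001.jp2",
    "extract": lambda up: {"file": up["migrate"], "format": "JPEG 2000"},
    "quality_assurance": lambda up: up["extract"]["format"] == "JPEG 2000",
}
dependencies = {
    "migrate": set(),
    "extract": {"migrate"},
    "quality_assurance": {"extract"},
}

order, results = run_workflow(activities, dependencies)
print(order)
```

The data-flow edges are the only thing the coordinator needs to know. Each activity sees just the results of its direct predecessors, which is what lets a real engine hand independent activities to different workers.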
Challenges in SCAPE
Providing means that aid workflow developers in parallelizing different scenarios; how well this works depends a lot on the nature of the data and the workflow.
Handling the interaction between external tools and MapReduce programs.
Handling the interaction of the execution environment with data sources and sinks, in particular with repositories.
Interfacing with preservation planning and watch tools, including semantic search and reporting.
Maintaining a central infrastructure and providing guidance for deploying local instances in different institutional settings.
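
The interaction between external tools and MapReduce programs can be illustrated with a small local simulation. This is a sketch under stated assumptions, not the SCAPE Platform's actual tool wrapper and not the Hadoop API. `TOOL_CMD` is a stand-in for a real command-line preservation tool (here a Python one-liner that prints the byte count of its input), and `map_record`, `reduce_ids`, and `run_local` are hypothetical names. The map step pipes each record through the external process, and the reduce step groups record identifiers by the status the tool reported.

```python
import subprocess
import sys
from collections import defaultdict

# Stand-in for an external command-line tool (assumption): it reads a record
# from stdin and prints the number of bytes it received.
TOOL_CMD = [sys.executable, "-c", "import sys; print(len(sys.stdin.buffer.read()))"]


def map_record(record_id, payload):
    """Map step: pipe one record through the external tool, emit (status, id)."""
    proc = subprocess.run(TOOL_CMD, input=payload, capture_output=True, check=True)
    size = int(proc.stdout.decode().strip())
    yield ("non-empty" if size > 0 else "empty", record_id)


def reduce_ids(status, record_ids):
    """Reduce step: collect all record ids that share the same status."""
    return status, sorted(record_ids)


def run_local(records):
    """Simulate the shuffle phase in memory: group map output by key, then reduce."""
    groups = defaultdict(list)
    for record_id, payload in records:
        for key, value in map_record(record_id, payload):
            groups[key].append(value)
    return dict(reduce_ids(key, values) for key, values in groups.items())


summary = run_local([("a", b"abc"), ("b", b""), ("c", b"xyz12")])
print(summary)
```

Starting one process per record is the simplest way to wrap a tool, but the start-up cost is paid for every input, which is one reason the achievable parallel speed-up depends on the nature of the data and the tool.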