Scaling Emulation and Software Preservation Infrastructure

Presentation transcript:

Scaling Emulation and Software Preservation Infrastructure. Dr. Klaus Rechert, OpenSLX GmbH, @kurau5u, http://emulation.solutions

Are research data sets FAIR in the long run? (Wehrle & Rechert, iDCC 2018)
We already have established practices for storing and citing research data. The question, though, is how long this data actually remains usable. To get a better sense, we looked into a sample of randomly chosen data sets, and what you typically find are mixed bags of file formats. To reuse them you will most likely need the appropriate software.
Random data sets from 92 public repositories (1.95 TB / 3.5 million files):
At least 140 different file formats (a lower bound).
4 to 39 different file formats per data set.
For example, the humanities use an average of 10.2 formats per data set.
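The per-data-set figures on this slide come down to counting distinct formats per data set. A minimal sketch of that arithmetic follows, assuming the file extension stands in for the format (the study itself may well have used a proper format identification tool); the sample listing and all names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from statistics import mean


def formats_per_dataset(files):
    """Map each data set id to the set of file extensions found in it."""
    formats = defaultdict(set)
    for dataset_id, path in files:
        # Assumption: the lower-cased extension is a proxy for the format.
        ext = PurePosixPath(path).suffix.lower() or "(none)"
        formats[dataset_id].add(ext)
    return dict(formats)


# Hypothetical sample listing: (data set id, file path)
sample = [
    ("ds1", "data/results.csv"),
    ("ds1", "data/results.CSV"),
    ("ds1", "model/plan.dwg"),
    ("ds2", "README"),
    ("ds2", "scan.tiff"),
    ("ds2", "notes.docx"),
]

per_dataset = formats_per_dataset(sample)
counts = {ds: len(exts) for ds, exts in per_dataset.items()}
all_formats = set().union(*per_dataset.values())
average = mean(counts.values())

print(counts)            # formats per data set
print(len(all_formats))  # distinct formats overall
print(average)           # average formats per data set
```

Counting by extension can only give a lower bound, because files without an extension or with a misleading one collapse into a single bucket.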

Software Preservation
Software Sustainability Institute:
Checklist for a Software Management Plan, https://zenodo.org/record/2159713
Software Deposit Guidance, https://softwaresaved.github.io/software-deposit-guidance/
FORCE11: Software Citation Principles, https://www.force11.org/software-citation-principles
Software Heritage: Source Code Archive, https://archive.softwareheritage.org/

Software Preservation
Software Preservation Network:
Scaling Software Preservation and Emulation Infrastructure (EaaSI), https://www.softwarepreservationnetwork.org/eaasi/
Fostering a Community of Practice: Preservation and Emulation in Library, Archives and Museums, https://www.softwarepreservationnetwork.org/fcop/
License and rights management
Internet Archive
UNESCO Persist
And many other smaller institutions and informal initiatives

Emulation
Emulation as an access strategy
Idea: implement a hardware equivalent in software.
Computer science: every hardware logic circuit has an equivalent representation in software.
Not a new idea (cf. Rothenberg, 1995)!
However, technically complex: usability, scalability, the preservation paradox.
Scaling: handles neither many users nor many objects.

Dissecting a Digital Artefact If a task seems to be too complex, just divide it into layers.

Huge number of different digital object types. Objects are fixed: archived objects do not change over time. Custom-built software: unique objects → individual preservation plans are essential.

Huge number of different digital objects: they are fixed and usually do not change over time. Rather small number of different hardware environments: these require replacement, continuously.

This is where the two worlds meet, and this is the most interesting field of work. Division of labor: do what you do best. From the bottom, the work is very technical: keep emulators working and make sure they are exchangeable if they break or become obsolete, to protect the work that has been done on the upper layers. The goal: is there anything that can be shared among objects or institutions? Identify the boundaries between generic technical tasks and object-specific tasks, e.g. maintaining a collection of rather generic software environments that can be used with multiple objects (and can simply be adapted for specific tasks).

Emulation as a Service
Store and maintain the contents of all three layers separately.
Combine these three layers into an executable environment “on-demand”.
Supplies the right emulator version and preconfigures common settings.
Enables access to emulators over the web via a browser interface.
Smooths creation of new emulated computing environments.
Smooths configuration of common OS features in legacy systems.
This is the basic idea of Emulation as a Service. To maintain this flexibility, the three layers are combined using only metadata. Simplify the technology as far as possible by relying on well-maintained emulators. Simplify access and emulation-related workflows, e.g. creating a software environment to render an object. Just to avoid confusion: we do not develop emulators; that would be a waste of time. We use what is out there and make it usable and accessible for preservation and similar purposes.
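The idea of keeping the three layers separate and joining them only through metadata can be illustrated with a small sketch. This is not the EaaS data model: the record types, field names, emulator names and image names below are all hypothetical, chosen only to show that an emulator can be swapped without touching the environment or the object.

```python
from dataclasses import dataclass, field


@dataclass
class Emulator:               # bottom layer: replaceable technology
    name: str
    version: str


@dataclass
class SoftwareEnvironment:    # middle layer: installed OS and applications
    machine: str              # hardware platform the installation expects
    disk_image: str
    settings: dict = field(default_factory=dict)


@dataclass
class DigitalObject:          # top layer: the archived object itself
    media_image: str
    environment_id: str       # metadata link to a software environment


def compose(obj, environments, emulators):
    """Build an on-demand session description from the three layers."""
    env = environments[obj.environment_id]
    emulator = emulators[env.machine]
    return {
        "emulator": f"{emulator.name} {emulator.version}",
        "system_disk": env.disk_image,
        "attached_media": obj.media_image,
        "settings": dict(env.settings),
    }


# Hypothetical catalogue entries
emulators = {"x86-pc": Emulator("emu-a", "1.0")}
environments = {
    "dos-cad": SoftwareEnvironment("x86-pc", "dos-cad.img", {"sound": "off"})
}
cdrom = DigitalObject("artwork.iso", "dos-cad")

session = compose(cdrom, environments, emulators)

# The emulator becomes obsolete: replace it, leave the upper layers untouched.
emulators["x86-pc"] = Emulator("emu-b", "3.2")
session_after_swap = compose(cdrom, environments, emulators)

print(session)
print(session_after_swap)
```

Because the object only references an environment id and the environment only names a machine type, replacing the emulator entry changes the composed session while the work invested in the upper two layers stays as it was.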

Emulation as a Service
In development by the bwFLA team at the University of Freiburg since 2011
Since 2016 EMiL, Reading Room Solution (German Nat. Library)
Since OpenSLX GmbH provides commercial support
Since 2017 CiTAR builds RDM workflows to repeat, replicate, reproduce or reuse software-based research on top of EaaS
Since 2018 the EaaSI project

Emulation as a Service
Two sides:
1. Curation: create, manage and test environments.
2. Publication: provide access, in a reading room or in the cloud.

EaaS in Production
http://archive.rhizome.org/theresa-duncan-cdroms
http://archive.rhizome.org/anthology/heritage-gold.html

The technology is there and works, but for wider adoption more work needs to be done.

EaaSI: Scale Up
Streamline infrastructure for ease of deployment and operation.
Enhance usability of interface and system workflows.
Improve performance and security.
First, EaaS itself needs to be improved: multi-user support and UI/UX.

Share Work & Expertise
Decentralized network of emulation nodes.
Able to share software resources and emulation environments in the network.
Thousands of pre-configured software environments and software resources provided by Yale University Library.
If you remember the slide with the layers: the middle layer (software environments) offers a lot of options for sharing work and expertise. We are creating a network of EaaSI node hosts in which every node can contribute and can benefit from the expertise of others. Have you ever installed AutoCAD for DOS? We don't have to repeat this exercise every time.
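How one node might benefit from an environment that another node has already configured can be sketched as follows. This is purely illustrative and not the EaaSI protocol: the Node class, its methods, the node names and the environment id are all hypothetical.

```python
class Node:
    """A hypothetical emulation node with a local catalogue and known peers."""

    def __init__(self, name):
        self.name = name
        self.environments = {}   # env id -> description
        self.peers = []

    def publish(self, env_id, description):
        self.environments[env_id] = description

    def resolve(self, env_id):
        """Return (providing node name, description), or None if unknown."""
        for node in [self, *self.peers]:
            if env_id in node.environments:
                return node.name, node.environments[env_id]
        return None

    def import_environment(self, env_id):
        """Copy an environment found in the network into the local catalogue."""
        found = self.resolve(env_id)
        if found is None:
            return False
        _, description = found
        self.environments[env_id] = description
        return True


node_a = Node("node-a")
node_b = Node("node-b")
node_b.peers.append(node_a)

# node-a did the installation work once and publishes the result.
node_a.publish("autocad-dos", "DOS with AutoCAD installed and configured")

before = node_b.resolve("autocad-dos")
imported = node_b.import_environment("autocad-dos")
after = node_b.resolve("autocad-dos")

print(before, imported, after)
```

The local catalogue is checked first, so once an environment has been imported the node no longer depends on the peer that originally provided it.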

Discovery & Description
Defining a profile for the description of software and computer environments.
Comprehensive, open, machine-readable documentation.
Incorporating services developed by Wikidata for Digital Preservation.
Another problem is the lack of machine-actionable metadata about software and its capabilities. Usually a search request starts with an object, so we need to know which software is able to render it.
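Since a search usually starts from an object, machine-actionable metadata has to support the lookup "format → software that can render it". The sketch below shows one way such a lookup could work, assuming each software record lists the formats it reads; the record layout, software names and environment ids are invented for illustration and do not reflect the actual description profile or Wikidata's data model.

```python
from collections import defaultdict

# Hypothetical software descriptions
software_records = [
    {"name": "Hypothetical CAD 2.0", "reads": ["dwg", "dxf"],
     "environment": "env-dos-cad"},
    {"name": "Hypothetical Writer 5", "reads": ["doc", "rtf"],
     "environment": "env-win98-office"},
    {"name": "Hypothetical Viewer", "reads": ["DXF"],
     "environment": "env-win98-office"},
]


def build_render_index(records):
    """Invert the records: format -> list of (software, environment)."""
    index = defaultdict(list)
    for record in records:
        for fmt in record["reads"]:
            index[fmt.lower()].append((record["name"], record["environment"]))
    return index


def candidates_for(path, index):
    """List software environments that claim to render the given file."""
    # Assumption: the extension identifies the format.
    fmt = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    return index.get(fmt, [])


index = build_render_index(software_records)
dxf_candidates = candidates_for("plan.DXF", index)
unknown_candidates = candidates_for("README", index)

print(dxf_candidates)
print(unknown_candidates)
```

An empty result is itself useful information: it marks objects for which no described software environment exists yet.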

Access
There will be four products:
Emulated CD-ROM environment sharing service
Virtual Reading Rooms Service
Scientific Software Portal
API to automatically render objects in original software via emulation

Our Team

Euan Cochrane (YUL): Principal Investigator
Seth Anderson (YUL): Program Manager
Ethan Gates (YUL): Software Preservation Analyst
Klaus Rechert & Oleg Stobbe (OpenSLX): Technical Architecture and Development
Jessica Meyerson (Educopia/SPN): Communications/Outreach
Kat Thornton (Data Current/WikiDP): Semantic Architect
Justin Aubin, Mac Schmidt, Zoe Sinclair, Idris Sylvester, Eric Timperman, Matt Tu, Kohei Yamaguchi: Software Emulation Configuration Workers

Our Node Hosts: Carnegie Mellon University, Stanford University, Yale University, University of California - San Diego, Notre Dame University, University of Virginia

A Very Special Thanks to our Funders...

Thank you! klaus@openslx.com (@kurau5u) http://www.softwarepreservationnetwork.org/eaasi http://emulation.solutions