Workflows for Digital Preservation and Curation
Workshop, Open Repositories 2012
Stacy Kowalczyk, Beth Plale, Kavitha Chandrasekar, Yiming Sun

Agenda
– Introduction to Digital Curation
– Workflow Systems Overview
– Workflows for Digital Curation
– Break
– Implementing Workflows in Trident
– Modifying a Workflow
– Create a new Workflow
– Creating Components
– Wrap up
7/10/12

Acknowledgements
This workshop was made possible through a generous grant from Microsoft Research and by the Data to Insight Center of Indiana University's Pervasive Technology Institute. We also thank Quan Zhou, Ph.D. student and developer, for his help with developing components, workflows, and documentation.

Introduction to Digital Curation
– Defining curation
– Infrastructure for curation
– Curating the files
– Curating the object

Defining Curation
"Digital curation involves maintaining, preserving and adding value to digital research data throughout its lifecycle. The active management of research data reduces threats to their long-term research value and mitigates the risk of digital obsolescence. Meanwhile, curated data in trusted digital repositories may be shared among the wider … research community. As well as reducing duplication of effort in research data creation, curation enhances the long-term value of existing data by making it available for further high quality research."
(Digital Curation Centre)

Curation Infrastructure
– Repository
– Public access
– Policies
– Processes
– Institutional support

Curating the Files
– Bitstream integrity
  – Fixity
  – Duplicate copies
– File integrity
  – Format verification
  – Format validation
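
In its simplest form, format verification checks that a file's leading bytes match the format its extension claims. The Python sketch below assumes a small, hand-written signature table (`SIGNATURES`) and a hypothetical `verify_format` helper. It only illustrates the idea and is much weaker than a tool such as JHOVE, which also validates a file's internal structure.

```python
from pathlib import Path

# Hypothetical signature table: leading bytes for a few common formats.
# A production workflow would call a format-identification tool (e.g. JHOVE) instead.
SIGNATURES = {
    ".tif": (b"II*\x00", b"MM\x00*"),   # little-endian and big-endian TIFF
    ".tiff": (b"II*\x00", b"MM\x00*"),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".pdf": (b"%PDF-",),
}


def verify_format(path: Path) -> bool:
    """Return True if the file's leading bytes match a signature for its extension."""
    expected = SIGNATURES.get(path.suffix.lower())
    if expected is None:
        raise ValueError(f"no signature known for {path.suffix!r}")
    with path.open("rb") as f:
        head = f.read(8)  # longest signature above is 5 bytes
    return any(head.startswith(sig) for sig in expected)
```

A `.tif` whose bytes actually begin with `%PDF-` would fail this check, which is the kind of mislabelled file an ingest workflow should catch before deposit.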

File Formats
– Durability
  – Transparency
  – Documentation
  – Ubiquity
  – Renderability
  – Longevity

Format Choices
– Master files for preservation
  – Highest quality
  – Highest fidelity
  – Lossless
– Derivative files for active use and delivery
  – Smallest possible for user needs
  – Fast delivery
  – Easy-to-use format

Curating the Object
– Context
  – Relationships between files
  – Technical metadata
  – Intellectual metadata
– Metadata
  – Implicit/explicit context

Curation Activities
– Ongoing verification
  – File integrity
  – Object integrity
– Metadata management
– Management of obsolescence
  – Hardware
  – Software
  – Formats
  – Documentation

Workflow Systems
– Purpose of workflow systems
– Types of workflow systems
– Trident Workflow Workbench

Why Workflow Systems
– Simplify repetitive and mundane activities
– Facilitate and enforce best practices
– Enable efficient scheduling
– Provide machinery for coordinating the execution of services and linking together resources
– Facilitate outreach to researchers for direct deposit and automatic curation

Types of Workflow Systems
– Kepler
– BPEL
– Ptolemy II
– Triana
– Taverna

Trident
– Open source project
– Based on Microsoft Windows Workflow Foundation classes
– Supported by Microsoft Research and academic researchers
– Integrates with myExperiment
– Well accepted in the research community: a search of one scholarly aggregation service found well over 100 peer-reviewed articles and white papers

Trident Components
– Trident Management Studio
– Trident Workflow Composer
– Trident Workflow Application
– Microsoft SQL Server
– Trident Silverlight client for web execution of workflows
– Microsoft Visual Studio (C# development environment)

Trident architecture (diagram; component labels)
– Design: Visual Workflow Composer; Trident Registry; Workflow Packages (domain specific)
– Trident Runtime Services: Windows Workflow Foundation (.NET 4.0); Provenance; Monitoring; Workflow Scheduling Service
– Admin: Admin Console; Workflow Monitor
– Community: Web Portal; Search; Launch; Monitor; Workflow Launcher; Results Repository; Workflow Repository (myExperiment)
– Data Access Layer: Data Object Model (data source abstraction layer); Data Storage Providers: SQL Server, Local XML store, …

Workflows for Curation
– Goals
  – Systematic and repeatable processes
  – Reduce human error
– Data ingest
  – Integrity checks
  – Format normalization/derivative generation
  – Metadata creation
– Curation activities
  – Integrity checks
  – Format migration
  – Media migration
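
The goal of systematic, repeatable processing can be illustrated with a small sequential runner. This is a hypothetical Python sketch, not Trident's API (Trident itself builds on Windows Workflow Foundation): the `Workflow` class, its `add`/`run` methods and the step labels are assumptions. Each step reads and writes a shared context, and the run halts at the first failed step so that, for example, a file that fails a fixity check is never deposited.

```python
from dataclasses import dataclass, field
from typing import Callable

# A step takes the shared context and returns True on success, False on failure.
Step = Callable[[dict], bool]


@dataclass
class Workflow:
    """Minimal sequential workflow: run labelled steps in order, stop at the first failure."""

    name: str
    steps: list[tuple[str, Step]] = field(default_factory=list)

    def add(self, label: str, step: Step) -> "Workflow":
        self.steps.append((label, step))
        return self  # allow chaining: wf.add(...).add(...)

    def run(self, context: dict) -> list[tuple[str, bool]]:
        """Execute the steps and return a log of (label, succeeded) pairs."""
        log = []
        for label, step in self.steps:
            ok = bool(step(context))
            log.append((label, ok))
            if not ok:
                break  # later steps (e.g. deposit) must not run after a failed check
        return log
```

For example, `Workflow("single-part ingest").add("Fixity Check", check_md5).add("Deposit in Repository", deposit)` would call `deposit` only if `check_md5` returned True (both step functions are hypothetical). The returned log doubles as a simple record of what ran.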

Data Ingest Workflows
Scenarios:
– Single-part objects (individual images)
– Multi-part objects (a book)
– Multiple instantiations of a logical object (Word, PDF and PowerPoint versions of a research paper)
– Multiple multi-part objects (a group of letters)
– Research data products (multiple files of various types)
– Scientific workflow process

Single-Part Object Workflow
Magic Lantern Slides
– Individual files
– Spreadsheet
Workflow components shown: Derivative Generation; Format Validation and Verification; Fixity Check; Create Tech Metadata; Create Intellectual Metadata; Create Object Metadata; Persistent Identification; Deposit in Repository; Image Quality Checks

Multi-part Object Workflow
Comic Book
– RIS
– Set of .tif files
Workflow components shown: Create Tech Metadata; Derivative Generation; Format Validation and Verification; Fixity Check; Object Integrity; Create Intellectual Metadata; Create Object Metadata; Persistent Identification; Deposit in Repository; Image Quality Checks
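
Object integrity for a multi-part object such as the comic book can be approximated by checking that the page images form a complete sequence. The sketch assumes a hypothetical naming convention (`page_0001.tif`, `page_0002.tif`, …) and a hypothetical `missing_pages` helper; real collections will need rules of their own.

```python
import re
from pathlib import Path

# Hypothetical naming convention for page images: page_0001.tif, page_0002.tif, ...
PAGE_RE = re.compile(r"page_(\d+)\.tiff?$", re.IGNORECASE)


def missing_pages(directory: Path) -> list[int]:
    """Return page numbers absent from the sequence 1..highest page found."""
    numbers = set()
    for p in directory.iterdir():
        m = PAGE_RE.match(p.name)
        if m:
            numbers.add(int(m.group(1)))
    if not numbers:
        raise ValueError(f"no page images found in {directory}")
    return [n for n in range(1, max(numbers) + 1) if n not in numbers]
```

Non-page files such as the RIS record are ignored. This check cannot detect pages missing from the end of the sequence; that would need an expected page count from the object's metadata.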

Multiple Instantiations of a Logical Object Workflow
Papers
– Each logical object in its own subdirectory
– RIS, Word file and (perhaps) a supplemental file
Workflow components shown: Format Normalization; Format Validation and Verification; Fixity Check; Create Tech Metadata; Create Intellectual Metadata; Create Object Metadata; Persistent Identification; Deposit in Repository; Derivative Generation

Multiple Multi-part Object Workflow
Ball collection
– RIS for the collection and an inventory spreadsheet
– Each logical object in a separate subdirectory
Workflow components shown: Create Tech Metadata; Derivative Generation; Format Validation and Verification; Fixity Check; Object Integrity; Create Intellectual Metadata; Create Object Metadata; Persistent Identification; Deposit in Repository; Image Quality Checks; Collection Integrity; Create Collection Metadata
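
Collection integrity for a scenario like the Ball collection can be sketched as reconciling the inventory spreadsheet against the subdirectories on disk. This assumes the inventory has been exported to CSV with a hypothetical `directory` column naming each logical object's subdirectory; `reconcile_collection` is an illustrative helper, not a Trident component.

```python
import csv
from pathlib import Path


def reconcile_collection(root: Path, inventory_csv: Path, column: str = "directory"):
    """Compare inventory entries with the subdirectories under root.

    Returns (listed_but_missing, present_but_unlisted), both sorted.
    """
    listed = set()
    with inventory_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = (row.get(column) or "").strip()
            if name:  # skip blank inventory rows
                listed.add(name)
    present = {p.name for p in root.iterdir() if p.is_dir()}
    return sorted(listed - present), sorted(present - listed)
```

Both result lists should be empty before the collection-level metadata is created; anything in either list needs a curator's attention.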

Research Data Products
Vortex
– Each subdirectory is an experiment with FGDC metadata
Workflow components shown: Compress Data; Fixity Check; Create Intellectual Metadata; Create Object Metadata; Persistent Identification; Deposit in Repository
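
The Compress Data and Fixity Check steps for a research data product such as Vortex could look roughly like the following: each experiment subdirectory, including its FGDC metadata file, is zipped and the archive's MD5 is recorded. `package_experiment` and `package_all` are hypothetical helpers, and `out_dir` is assumed to lie outside the directory being packaged.

```python
import hashlib
import zipfile
from pathlib import Path


def package_experiment(exp_dir: Path, out_dir: Path) -> tuple[Path, str]:
    """Zip one experiment subdirectory and return (archive path, MD5 of the archive)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{exp_dir.name}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(exp_dir.rglob("*")):
            if p.is_file():
                # Keep the experiment folder name as the top-level entry in the archive.
                zf.write(p, arcname=p.relative_to(exp_dir.parent).as_posix())
    # Fixity value for the package; read_bytes keeps the sketch short but loads the whole file.
    digest = hashlib.md5(archive.read_bytes()).hexdigest()
    return archive, digest


def package_all(root: Path, out_dir: Path) -> dict[str, str]:
    """Package every experiment subdirectory under root; map archive name to MD5."""
    results = {}
    for exp_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        archive, digest = package_experiment(exp_dir, out_dir)
        results[archive.name] = digest
    return results
```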

Workflow Components
Format conversions (for normalization and derivative generation)
– .xlsx to .csv
– .docx to .pdf
– .ppt to .pdf
– .tif to .jpg
– Zipping on demand
– Image (.tif or .jpg) to .pdf

Workflow Components 2
Context creation
– MIX data generator and validator
– METS data generator and validator
Data integrity
– MD5 checksum generator
– MD5 checksum validator
– JHOVE for format verification and validation
– Group validation (for object integrity)
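
To show how an MD5 checksum generator and a METS generator can fit together, the sketch below hashes each file and records the value in a minimal METS document: a `fileSec` with one `fileGrp`, plus a flat `structMap`. It uses standard METS elements and attributes (`file@CHECKSUM`, `file@CHECKSUMTYPE="MD5"`, `FLocat@xlink:href`), but the `USE="master"` and `TYPE="object"` values and all helper names are assumptions. A production METS generator would also add a header and descriptive/administrative metadata, and would validate against the METS schema.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 checksum generator: hash in chunks so large masters need not fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def mets(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(files: list[Path]) -> ET.Element:
    """Build a minimal METS tree: one fileGrp with checksums and a flat structMap."""
    root = ET.Element(mets("mets"))
    file_grp = ET.SubElement(ET.SubElement(root, mets("fileSec")), mets("fileGrp"), USE="master")
    div = ET.SubElement(ET.SubElement(root, mets("structMap")), mets("div"), TYPE="object")
    for i, path in enumerate(files, start=1):
        file_id = f"FILE{i:04d}"
        file_el = ET.SubElement(file_grp, mets("file"), ID=file_id,
                                CHECKSUM=md5_hex(path), CHECKSUMTYPE="MD5")
        ET.SubElement(file_el, mets("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path.name})
        ET.SubElement(div, mets("fptr"), FILEID=file_id)  # link structure to file
    return root
```

To write the result, one could call `ET.ElementTree(build_mets(files)).write("object-mets.xml", encoding="utf-8", xml_declaration=True)`, where the output file name is only an example.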

Post-Deposit Curation Workflow
Scenarios:
– Fixity verification
– Format normalization
– New or additional derivative generation
– Media migration
– Persistent identifier updates
– Metadata updates

Workflows in Trident

Executing Workflows
– Individual object ingest
– Multipart object ingest
– Multiple multipart object ingest
– Multiple instantiations of a single logical object
– Research data ingest
– Scientific workflow
– Fixity check curation workflow

Implementing Workflows in Trident
– Launch the Remote Desktop application
– User: AMAZONA-JJOAL14\oruser
– PWD: TridentOR12!!
– Computer IP addresses are on the slip of paper being passed out now.

Trident Workflow Composer

Participant Exercises

Modifying Workflows
Add components to existing workflows:
– Select the Individual Ingest Workflow
  – Add the DOI component before the METS generator component
  – Make the connections
– Select the Group Ingest Workflow Comic
  – Add the METS generation component after the last component in the main line
  – Make the connections

Simple Curation Workflow Creation
Create a workflow for a simple curation process: validating MD5 checksums.
– Define a directory of image files
– Define a METS file
– Define an output location
– Link the MD5 checksum validation component
– Link the MD5 checksum report component
– Save and execute the workflow
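
Outside Trident, the same validation can be expressed in a few lines of Python as a rough analogue of linking the MD5 validation and report components. The sketch assumes the METS file records checksums as `CHECKSUM`/`CHECKSUMTYPE="MD5"` on each `mets:file`, with `FLocat` `xlink:href` values relative to the image directory; `validate_md5` and `write_report` are hypothetical names.

```python
import csv
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

NS = {"mets": "http://www.loc.gov/METS/"}
HREF = "{http://www.w3.org/1999/xlink}href"


def validate_md5(mets_file: Path, image_dir: Path) -> list[tuple[str, str]]:
    """Compare each MD5 CHECKSUM in the METS file with the file on disk.

    Returns (file name, status) pairs; status is "ok", "mismatch" or "missing".
    """
    report = []
    root = ET.parse(mets_file).getroot()
    for file_el in root.iterfind(".//mets:file[@CHECKSUMTYPE='MD5']", NS):
        name = file_el.find("mets:FLocat", NS).get(HREF)
        path = image_dir / name
        if not path.is_file():
            report.append((name, "missing"))
            continue
        actual = hashlib.md5(path.read_bytes()).hexdigest()
        expected = file_el.get("CHECKSUM", "").lower()
        report.append((name, "ok" if actual == expected else "mismatch"))
    return report


def write_report(report: list[tuple[str, str]], out_path: Path) -> None:
    """Report component: write the results as CSV to the chosen output location."""
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "status"])
        writer.writerows(report)
```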

Creating Components
Exercise:
– Create a new Trident workflow component
– Implement the MARCXML to MODS stylesheet slim2MODS3-4.xsl
– Kavitha Chandrasekar will demonstrate the process

Wrap Up
– Thumb drives
– Trident CodePlex site
– Trident listserv
– Contributing to Trident
– Workshop evaluation form
– Ongoing conversation

Contacts for Further Discussion
– Trident CodePlex site:
– Trident Listserv: trident-wf-
– Stacy Kowalczyk:
– Kavitha Chandrasekar:
– Yiming Sun:
– Quan Zhou: