Initial BizTalk Programming Development Objectives for PeDALS Dennis Bitterlich, Electronic Records Archivist.

What is PeDALS?
- Persistent Digital Archives & Library System
- A grant-funded multi-state project financed by the Library of Congress (National Digital Information Infrastructure & Preservation Program, NDIIPP) and the Institute of Museum and Library Services
- Includes five state partners: Arizona, Florida, New York, South Carolina, and Wisconsin, with Arizona as the lead partner
- Project will run 18 months, until the middle of 2009; if successful, WHS (Wisconsin Historical Society) intends to continue participation beyond this period
- At the end of the project, each partner will have a functioning electronic records repository

Why is PeDALS Needed?
- An increasing number of state government records of long-term value are created in electronic-only format
- Due to the large and increasing volume of electronic records in varied formats, traditional appraisal and acquisition practices are no longer effective; an automated, rules-based system like PeDALS is one possible response to this new reality
- PeDALS is not an electronic records management system, but rather a way to acquire electronic records already scheduled for transfer
- PeDALS is both a learning opportunity and a chance to implement a functioning system

Goals of the Project
- Develop a methodology to support an automated, integrated workflow to process collections of electronic records
- Implement an inexpensive storage system that can preserve the integrity and authenticity of electronic records over time
- Remove barriers to adoption by keeping costs of the system as low as possible
- Work with the Wisconsin Document Depository Program to develop ways to integrate digital-format state agency publications into PeDALS processes; since 2005 the Depository has worked to preserve e-publications acquired from state websites

Microsoft BizTalk Overview
BizTalk is a middleware application that at its core is an XML message queue. It will:
Receive Objects → Convert & Perform Logic on Objects → Send Objects
All three steps are completed by BizTalk using XML.
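The receive, convert, and send flow described on this slide can be illustrated outside BizTalk. The sketch below is not BizTalk code; it is a minimal Python stand-in, and the function names (`receive`, `transform`, `send`, `run`) and the `status` attribute are hypothetical choices made only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import deque


def receive(queue, raw_xml):
    """Receive step: parse an incoming XML message and place it on the queue."""
    queue.append(ET.fromstring(raw_xml))


def transform(message):
    """Convert/logic step: a hypothetical rule that marks the message as processed."""
    message.set("status", "processed")
    return message


def send(message):
    """Send step: serialize the message back to XML text for the destination."""
    return ET.tostring(message, encoding="unicode")


def run(queue):
    """Drain the queue in arrival order, applying transform and then send."""
    delivered = []
    while queue:
        delivered.append(send(transform(queue.popleft())))
    return delivered
```

Every message stays in XML from the moment it is received until it is sent, which is the property the slide attributes to BizTalk.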

BizTalk Pipelines
Pipelines: connections between systems
- Connect BizTalk to databases
- Connect BizTalk to web
- Connect BizTalk to file servers
- Connect BizTalk to programs

BizTalk Business Rules
Business rules
- BizTalk speak for high-level processes that determine what orchestrations will be performed
- Example: if a record series is confidential or restricted, then go to the orchestration that populates restrictions
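The confidential-or-restricted rule on this slide amounts to a routing decision. A minimal sketch in Python follows, assuming a record series is represented as a dictionary with an `access` field; that field, the orchestration names, and the function name are all hypothetical, and real BizTalk rules are authored in BizTalk's own tooling rather than in Python.

```python
# Hypothetical access levels that trigger the restrictions orchestration.
RESTRICTED_LEVELS = {"confidential", "restricted"}


def choose_orchestration(record_series):
    """Return the name of the orchestration a record series should be routed to."""
    access = record_series.get("access", "").strip().lower()
    if access in RESTRICTED_LEVELS:
        # Confidential or restricted series go to the step that fills in restrictions.
        return "populate_restrictions"
    # Everything else follows the ordinary path (an assumed default).
    return "standard_ingest"
```

Keeping the rule separate from the processing logic mirrors the slide's distinction between business rules, which decide where a record goes, and orchestrations, which do the work.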

BizTalk Orchestrations
Orchestrations
- BizTalk speak for the logic to process objects
- Build in logic to calculate the length of restrictions and the database fields to populate
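As an illustration of the kind of logic an orchestration might hold, the sketch below calculates when a restriction ends and which fields would be populated. The restriction period in years, the field names `restricted_until` and `currently_restricted`, and both function names are assumptions for the example; the slides do not specify them.

```python
from datetime import date


def restriction_end(record_date, years):
    """Return the date on which a restriction of the given length expires."""
    try:
        return record_date.replace(year=record_date.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year; use 28 February.
        return record_date.replace(year=record_date.year + years, day=28)


def restriction_fields(record_date, years, today):
    """Build the hypothetical database fields describing the restriction."""
    end = restriction_end(record_date, years)
    return {
        "restricted_until": end.isoformat(),
        "currently_restricted": today < end,
    }
```

For example, `restriction_fields(date(2008, 9, 1), 75, date(2009, 1, 1))` would report the record as restricted until 2083-09-01, given the assumed 75-year period.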

Initial BizTalk Development Goals & Objectives
1) Write ARCAT BizTalk code pipeline
   - Series already cataloged
   - Reduced duplication of work & manual data entry
   - Pipeline will work for CGI/BIN Web Service
   - Copy programming code to create next pipelines
2) Write Web Services BizTalk code pipeline
   - Copied from CGI/BIN ARCAT Service pipeline
   - Generic HTTP pipeline to agencies' web pages
   - Can use for PeDALS “Drop Box”

Initial BizTalk Development Goals & Objectives
3) Write DHS BizTalk code pipeline
   - Code copied from prior pipelines
   - Connect to a database
   - Solve issues related to external networks
4) Write DWD BizTalk code pipeline
   - Connect to a file server
   - Issues related to external networks should be solved, but may be different for file server connection

Initial BizTalk Development Goals & Objectives
5) Write orchestration to call JHOVE, MetaExtractor, or C# code in BizTalk to wrap records with preservation metadata
   - Once we can receive records through pipelines
   - Create logic to perform in BizTalk
   - Wrap records in XML with preservation metadata
   - First, execute a third-party open source program such as JHOVE or MetaExtractor
   - Second, write code to interact with programming languages such as C#
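To make "wrap records in XML with preservation metadata" concrete, here is a minimal Python sketch. It does not call JHOVE or MetaExtractor and it does not follow any particular metadata standard; the element names (`wrappedRecord`, `preservationMetadata`, and so on), the function name, and the choice of SHA-256 are assumptions for illustration only.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_record(filename, data):
    """Wrap raw record bytes in an XML envelope carrying basic preservation metadata."""
    wrapper = ET.Element("wrappedRecord")

    # Metadata block: file name, size, and a fixity checksum (hypothetical element names).
    meta = ET.SubElement(wrapper, "preservationMetadata")
    ET.SubElement(meta, "fileName").text = filename
    ET.SubElement(meta, "sizeBytes").text = str(len(data))
    checksum = ET.SubElement(meta, "checksum", algorithm="SHA-256")
    checksum.text = hashlib.sha256(data).hexdigest()

    # The record itself, base64-encoded so arbitrary bytes are safe inside XML.
    content = ET.SubElement(wrapper, "content", encoding="base64")
    content.text = base64.b64encode(data).decode("ascii")

    return ET.tostring(wrapper, encoding="unicode")
```

In the plan described on the slide, the metadata block would instead be filled with the output of a tool such as JHOVE or MetaExtractor; the checksum here simply stands in for that richer output.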

Measurement of Success
1) Ability to extract MARC records from ARCAT and insert into database
2) Ability to create external web services pipeline to transfer records to WHS
3) Ability to create external file pipeline to DHS Quest Archives Manager to transfer records to WHS
4) Ability to create external file pipeline to DWD to transfer records to WHS
5) Ability to wrap electronic records with preservation metadata inside of BizTalk

Process to Write Code
Iterative process to:
1) Write BizTalk programming code
2) Test BizTalk programming code
3) Revise BizTalk programming code
4) Retest BizTalk programming code

Pre-BizTalk Training Development Plans
Initial thoughts on how I would get objects into BizTalk (before September 2008)
Initially, PeDALS was to use FTP to receive electronic records
- Authentication, integrity, security, and user-friendliness issues
- Now a generic “Drop Box” (probably a Web service)
Initial knowledge of BizTalk
- A middleware application which at its core is an XML message queue
- Uses XML to complete the connections to and from external applications
Needed automated processes to provide BizTalk with XML objects

Pre-BizTalk Training Development Plans
Use of third-party open source code to convert/wrap in XML:
- MARC21 to MARCXML Converter
- MarcEdit
- JHOVE
- MetaExtractor

Pre-BizTalk Training Development Plans
MARC21 to MARCXML Converter: The MARCXML toolkit is a set of Java programs which allow users to convert to and from the MARC file format (including full character set conversion) and other formats available in the MARCXML architecture. The toolkit requires Java and works best with Java 1.4. If using an earlier version of Java, you need to modify the marcxml.bat file to include an XML parser in the classpath. Unzip the marcxml.zip file in a directory and run marcxml.bat for more instructions. Make sure java is in your PATH. In this version the stylesheets and character conversion mappings are downloaded via HTTP from LC's website; therefore, Internet access is required when using these utilities.

Pre-BizTalk Training Development Plans
MarcEdit: a MARC editing tool with a native Z39.50 client and automatic batch conversions to/from:
- Comma/Tab Delimited Files
- Dublin Core
- EAD
- MARC
- OAI
- XML

Pre-BizTalk Training Development Plans
JHOVE: JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
- Format identification is the process of determining the format to which a digital object conforms; in other words, it answers the question: "I have a digital object; what format is it?"
- Format validation is the process of determining the level of compliance of a digital object to the specification for its purported format, e.g.: "I have an object purportedly of format F; is it?"
Format validation conformance is determined at two levels: well-formedness and validity.
- A digital object is well-formed if it meets the purely syntactic requirements for its format.
- An object is valid if it is well-formed and it meets additional semantic-level requirements.
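The difference between well-formedness and validity can be shown with XML as the example format. The sketch below is not how JHOVE is implemented; it only illustrates the two levels. The semantic rule it applies (the root element must be `record` and must contain a non-empty `title`) is a hypothetical requirement invented for the example.

```python
import xml.etree.ElementTree as ET


def check_xml(text):
    """Report whether text is well-formed XML and whether it meets a sample semantic rule."""
    try:
        # Level 1, well-formedness: the text satisfies XML's purely syntactic rules.
        root = ET.fromstring(text)
    except ET.ParseError:
        # An object that is not well-formed cannot be valid either.
        return {"well_formed": False, "valid": False}

    # Level 2, validity: a hypothetical semantic requirement on top of the syntax.
    title = root.find("title")
    has_title = title is not None and bool((title.text or "").strip())
    return {"well_formed": True, "valid": root.tag == "record" and has_title}
```

A document such as `<record></record>` parses cleanly, so it is well-formed, but it fails the title requirement and is therefore not valid under this sample rule.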

Pre-BizTalk Training Development Plans
MetaExtractor: The Metadata Extraction Tool was developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats:
- Images: BMP, GIF, JPEG, and TIFF
- Office documents: MS Word (version 2, 6), Word Perfect, Open Office (version 1), MS Works, MS Excel, MS PowerPoint, and PDF
- Audio and video: WAV and MP3
- Markup languages: HTML and XML
The Metadata Extraction Tool:
- Automatically extracts preservation-related metadata from digital files
- Outputs that metadata in a standard format (XML) for use in preservation activities
- Was designed for preservation processes and activities, but can be used for other tasks, such as the extraction of metadata for resource discovery

Pre-BizTalk Training Development Plans
MarcEdit & ARCAT MARC Catalog Records:
1) Use Z39.50 gateway to retrieve records as .mrc files
2) Use MarcEdit to convert .mrc files to XML
3) BizTalk receives XML files
4) BizTalk performs logic
5) BizTalk inserts/updates SQL Server database

Post-September BizTalk Training Development Plans
Pipelines can connect directly to:
- Web services like ARCAT or OCLC or even HTTP
- File servers like at DWD
- Databases like DHS Quest Archives Manager
Orchestrations can:
- Call other orchestrations
- Call other executable programs
- Call other applications written in various software languages (C# or Java)

Post-BizTalk Training Development Plans
ARCAT MARC Catalog Records:
1) Create pipeline from ARCAT to PeDALS database
2) Create search page to enter variables or a list of series to retrieve from ARCAT
Benefits:
- Automates process
- Decreases manual labor needed compared to using MarcEdit
- Reduces duplication of work

Post-BizTalk Training Development Plans
ARCAT MARC Catalog Records:
3) Create orchestration
   - To automatically map data from MARC to PeDALS database
   - To execute MarcEdit (if necessary)
   - That will insert or update PeDALS database
   - Then export from PeDALS database to ARCAT, file, or OCLC
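The map-then-insert-or-update step can be sketched as follows. This is an illustration under several assumptions: SQLite stands in for the PeDALS database, the `series` table and its two columns are hypothetical, each input is a single MARCXML `record` element, and only the control number (field 001) and title (field 245, subfield a) are mapped.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace used by MARCXML records.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def create_table(conn):
    """Create the hypothetical target table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS series (control_number TEXT PRIMARY KEY, title TEXT)"
    )


def map_record(marcxml):
    """Map one MARCXML record to a row: control number (001) and title (245 $a)."""
    record = ET.fromstring(marcxml)
    control = record.find("marc:controlfield[@tag='001']", NS)
    title = record.find("marc:datafield[@tag='245']/marc:subfield[@code='a']", NS)
    return {
        "control_number": control.text if control is not None else None,
        "title": title.text if title is not None else None,
    }


def upsert(conn, row):
    """Update the row if the control number already exists; otherwise insert it."""
    cur = conn.execute(
        "UPDATE series SET title = ? WHERE control_number = ?",
        (row["title"], row["control_number"]),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO series (control_number, title) VALUES (?, ?)",
            (row["control_number"], row["title"]),
        )
    conn.commit()
```

Trying the update first and inserting only when no row matched means a series that was retrieved twice is refreshed rather than duplicated.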

Possible Involvements (After Initial Development)
State Archivist: Peter Gottlieb
- Ultimate sign-off on development
Collection Development Archivist: Helmut Knies
- Initial sign-off on development
Electronic Records Archivist: Dennis Bitterlich
- Programming, testing, & verification
Public Records Accessioner: Abbie Norderhaug
- Testing & verification
Head of Cataloging & Collections Mgmt Services: Maija Cravens
- Policies & procedures

Possible Involvements (After Initial Development)
Archivist: Jacquelyn Ferry
- Policies & procedures
- Testing & verification
Information Technology Director: Paul Hedges
- Hardware, networks, & security
WI State Government Publications Librarian: Nancy Knies
- State publications to store in LOCKSS
DHS Records Officer: Steve Bose
- Transfer of records
DHS IT: Jovy Swanton
- Hardware, network, programming, & security

Possible Involvements (After Initial Development)
DPI WDDP: Abby Swanton
- State publications to store in LOCKSS
DWD Records Officer: Dawn Bluma
- Transfer of records
DWD IT
- Hardware, network, programming, & security
UW IT
- Hardware, network, programming, & security

Thank You!
Collecting, Preserving and Sharing Stories Since 1846