An Inside Look At Managing MDPI Digitization Mike Casey, Director of Technical Operations, MDPI, Indiana University Andrew Dapuzzo, Director of North American Operations, Memnon Archiving Services
Manage… administer conduct govern guide handle maintain operate oversee regulate run supervise take care of train use advocate captain counsel designate direct disperse engineer execute head influence instruct minister officiate pilot ply preside request steer superintend watch care for carry-on engage in
MDPI Overview Digitally preserve all significant audio and video Complete by IU Bicentennial in 2020 University-wide initiative
MDPI Funding Office of the President Office of the Provost Office of the Vice President for Research Additional funding and in-kind support: UITS, Libraries
MDPI Digitization Strategy Memnon – parallel transfer (industrial-scale) workflows IU – 1:1 workflows for fragile formats and problem items Digitally preserve approximately 280,000 audio and video recordings Digitized 139,794 as of yesterday 6.5 PB in 4 years
MDPI Digitization Strategy Project file formats Audio preservation master – BWF, 24-bit/96 kHz Audio production master – same Video preservation master – FFV1/Matroska Video mezzanine – 50 Mbps MPEG-2
IU Media Digitization Studios Audio preservation 7,000 field cylinders, lacquer discs, mixed speed tapes, wire recordings Sound Directions 1:1 workflow Two critical listening rooms designed by Jeff Hedback Studer, Prism, Benchmark, WaveLab, etc.
IU Media Digitization Studios Video preservation Hi8/8mm, Betamax, ½” EIAJ Problem VHS, U-matic, Betacam SP FFmpeg, home-grown interface Blackmagic card
IU Media Digitization Studios Productivity Systems thinking Theory of Constraints principles Scrum methodology Scripts
Theory of Constraints Why do we need to focus on productivity? Our to-do list is longer than the time available Our budget is limited and static The more we get digitized, the more recordings are preserved for future use
General Systems Theory
General Systems Theory Theory created in the 1930s in which complex systems are viewed holistically Way of thinking Aid evaluation of media preservation system design, effectiveness, sustainability, completeness, and behavior
General Systems Theory What is a system? Commonsense definition of system Set of interacting units or elements Forms an integrated whole Performs a function or achieves a goal
General Systems Theory Systems thinking Holistic rather than reductionistic Insight into the whole through links and interactions Small events may cause large changes
General Systems Theory A few basic principles Each element/part affects the whole Whole is greater than sum of the parts Inputs and outputs – transformations
Media preservation system diagram (text recovered from figure; stage order inferred)
System / Project Planning & Development: funding; personnel / vendor; equipment; software tools; creation / maintenance of software and scripts
Selection for Preservation: assess research value; evaluate condition; consider political, technical, and other issues; establish priorities
Collection Setup: gather and assess documentation; evaluate collection needs / condition; assess cataloging / descriptive metadata issues; develop digitization plan; assess and calibrate equipment
Preliminary Work / Pilot Project: exploratory transfers and metadata collection; reassessment of digitization plan
Digitization: analog playback; A/D conversion; creation of preservation master files; local filenames; technical metadata; structural metadata; checksums; quality control; local storage solution
Post-Transfer Processing: generation of derivatives; marking areas of interest in files; signal processing (if appropriate)
Ingestion into / Copy to Long-Term Storage Solution: preservation packages
Periodic Evaluation: data integrity checking; format obsolescence analysis
Migration: new carrier; new format
Connector labels between stages: workflow management / scheduling; cleaning or physical restoration as needed; migration decision
Theory of Constraints
Theory of Constraints Overview Developed by Dr. Eliyahu Goldratt in the early 1980s Methodology for identifying the most important limiting factor that stands in the way of achieving a goal Improve constraint until it is no longer a limiting factor
Theory of Constraints Overview – define terms Constraint is a limitation or restriction Constraint = bottleneck Bottleneck = stage in a process where progress is impeded, the “weakest link in the chain” Throughput is the quantity of raw material processed within a given time
Theory of Constraints Overview Every process (workflow) has a constraint Throughput is governed by how much can be run through the constraint per period of time Throughput improves when constraint improves Improving a non-bottleneck does not make the workflow more productive
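The relationship between the constraint and throughput can be shown with a small calculation. This is a minimal sketch, not part of the MDPI tooling: the stage names and capacities are hypothetical numbers chosen for illustration, and the workflow is assumed to be strictly serial.

```python
# Hypothetical stage capacities in recordings per day (illustrative only).
capacities = {"inspect": 40, "digitize": 12, "quality_control": 30}


def throughput(stages):
    """A serial workflow can move no faster than its slowest stage."""
    return min(stages.values())


def bottleneck(stages):
    """Return the name of the stage with the lowest capacity."""
    return min(stages, key=stages.get)


# Doubling a non-bottleneck stage leaves throughput unchanged.
faster_inspection = {**capacities, "inspect": 80}

# Improving the constraint raises throughput, until some other
# stage becomes the new constraint.
faster_digitizing = {**capacities, "digitize": 20}

print(bottleneck(capacities), throughput(capacities))
print(throughput(faster_inspection))
print(throughput(faster_digitizing))
```

With these numbers the constraint is the digitize stage, and only the change to that stage moves the result.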
Theory of Constraints Types of constraints Physical – equipment, material, people, space shortages Policy – required or recommended ways of working (ex. No bidirectional transfers) Paradigm – deeply ingrained beliefs or habits Market – production capacity exceeds sales
Theory of Constraints Five focusing steps Identify – find the bottleneck Exploit – quick improvements using existing resources Subordinate – everything else must support the above Elevate – further actions to eliminate bottleneck – may include capital investment Repeat – continuous improvement cycle
Typical audiotape digitization workflow Manager assigns work Engineer Inspects tape Winds tape Checks and repairs splices Repairs pack problems Gathers technical metadata Engineer digitizes
Theory of Constraints Where is the bottleneck in our system? Workflow step with capacity equal to or less than the demand placed on it Where are the non-bottlenecks? Workflow step with capacity greater than demand How can we exploit the bottleneck?
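The two definitions above (capacity at or below demand versus capacity above demand) translate directly into a classification. The sketch below assumes capacity and demand are expressed in the same unit; the step names and figures are hypothetical.

```python
def classify(capacity_by_step, demand):
    """Split workflow steps into bottlenecks and non-bottlenecks.

    A step is a bottleneck when its capacity is equal to or less than
    the demand placed on it; otherwise it is a non-bottleneck.
    """
    bottlenecks = [s for s, c in capacity_by_step.items() if c <= demand]
    non_bottlenecks = [s for s, c in capacity_by_step.items() if c > demand]
    return bottlenecks, non_bottlenecks


# Hypothetical hours of tape each step can handle per week.
capacity = {"assign work": 200, "prepare tape": 90, "digitize": 60}
weekly_demand = 80  # hypothetical hours of tape arriving per week

bottlenecks, non_bottlenecks = classify(capacity, weekly_demand)
print(bottlenecks)
print(non_bottlenecks)
```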
Theory of Constraints Exploit and elevate Offload tasks from the bottleneck and place elsewhere Place QC before the bottleneck Create inventory buffer before bottleneck
New digitization workflow Manager assigns work AV Specialist Inspects tape Winds tape Checks and repairs splices Repairs pack problems Gathers technical metadata Engineer digitizes
New digitization workflow Manager assigns work Engineer digitizes AV Specialist Inspects tape Winds tape Checks and repairs splices Repairs pack problems Gathers technical metadata Engineer digitizes
Theory of Constraints Exploit and elevate Offload tasks from the bottleneck and place elsewhere – AV Specialist Place QC before the bottleneck – AV Specialist Create inventory buffer before bottleneck Keep the bottleneck working as much as possible If bottleneck is idle for an hour, lose the cost of an hour for entire system An hour saved at a non-bottleneck is a mirage
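A rough calculation illustrates why offloading preparation work and buffering inventory matter when the engineer is the constraint. All figures here are assumptions made for the example (an 8-hour day, 20 minutes of preparation and 60 minutes of transfer per tape); they are not MDPI measurements.

```python
WORKDAY_MINUTES = 8 * 60  # assumption: one 8-hour engineer day
PREP_MINUTES = 20         # hypothetical inspection, winding, splice repair
TRANSFER_MINUTES = 60     # hypothetical real-time transfer per tape


def tapes_per_day(engineer_minutes_per_tape, idle_minutes=0):
    """Whole tapes one engineer completes in a day, given time lost to idling."""
    return (WORKDAY_MINUTES - idle_minutes) // engineer_minutes_per_tape


# Original workflow: the engineer prepares and digitizes every tape.
engineer_does_everything = tapes_per_day(PREP_MINUTES + TRANSFER_MINUTES)

# New workflow: an AV specialist prepares tapes, the engineer only digitizes.
prep_offloaded = tapes_per_day(TRANSFER_MINUTES)

# Without a buffer of prepared tapes, the engineer waits an hour for work.
engineer_starved = tapes_per_day(TRANSFER_MINUTES, idle_minutes=60)

print(engineer_does_everything, prep_offloaded, engineer_starved)
```

Under these assumptions the hour of idle time at the bottleneck costs the whole workflow one tape for the day, which is the point of keeping a buffer of prepared items ahead of the engineer.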
Management of Digitization Workflow How can we – Support choice Foster investment and engagement Keep morale high Track which recordings should be digitized next
Scrum
Management of Digitization Workflow Scrum methodology Part of the Agile software development movement Emphasizes collaboration, team self-management, and flexibility to adapt to emerging realities
Management of Digitization Workflow Scrum methodology characteristics Time is divided into short work cadences known as sprints Two-week sprints Evaluate and plan next steps at end of sprint Daily meetings to assess progress (standups)
Management of Digitization Workflow Scrum methodology Enter backlog into Jira Backlog comes from deliveries – one project at a time Divide recordings into groups based on technical characteristics Engineers select groups to commit to for sprint Story points = duration of group (minutes)
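The grouping and story-point rule can be sketched as follows. This is an illustration only: the `Recording` fields, the choice of format and speed as the grouping characteristics, and the output shape are assumptions, and the sketch produces plain dictionaries rather than calling Jira.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    barcode: str            # hypothetical identifier
    media_format: str       # hypothetical grouping characteristic
    speed_ips: float        # hypothetical grouping characteristic
    duration_minutes: int


def build_backlog(recordings):
    """Group recordings by technical characteristics and total their durations.

    Each group's story points equal the summed duration of its recordings,
    in minutes.
    """
    groups = defaultdict(list)
    for rec in recordings:
        groups[(rec.media_format, rec.speed_ips)].append(rec)
    return [
        {
            "group": key,
            "barcodes": [r.barcode for r in members],
            "story_points": sum(r.duration_minutes for r in members),
        }
        for key, members in sorted(groups.items())
    ]


delivery = [
    Recording("A001", "open reel", 7.5, 30),
    Recording("A002", "open reel", 7.5, 45),
    Recording("A003", "open reel", 3.75, 60),
]
backlog = build_backlog(delivery)
for item in backlog:
    print(item)
```

Engineers could then pick whole groups for a sprint and compare the committed story points against the transfer time they actually have.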
Management of Digitization Workflow Scrum methodology Sprint meeting Evaluation of sprint Selection of groups for next sprint
Management of Digitization Workflow Scrum methodology Built-in frequent feedback loops for both technical staff and administration Always know where things stand Supports philosophy of constant improvement Focus on two-week commitments not endless future Engineers have choice
New digitization workflow Manager assigns collection Engineer chooses recordings Engineer digitizes AV Specialist Inspects tape Winds tape Checks and repairs splices Repairs pack problems Gathers technical metadata Engineer digitizes
Software Automation
MDPI Post-Processing System IUMDS Packager application Script that runs on local capture machine Set to run every evening Acts on files created during the day Files placed in watch folder by engineer
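The watch-folder step might look like the sketch below. The actual Packager is not shown in these slides, so this is only an assumed approach: it treats a file's modification time as a stand-in for "created during the day", and it leaves the evening scheduling to the operating system's task scheduler.

```python
from datetime import date, datetime
from pathlib import Path


def files_created_today(watch_folder, today=None):
    """Return files in the watch folder whose modification date is today.

    Modification time is used as an approximation of creation time;
    subfolders are ignored.
    """
    today = today or date.today()
    return sorted(
        path
        for path in Path(watch_folder).iterdir()
        if path.is_file()
        and datetime.fromtimestamp(path.stat().st_mtime).date() == today
    )
```

A nightly job would call `files_created_today` on the capture machine's watch folder and hand the resulting list to the packaging steps.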
MDPI Post-Processing System IUMDS Packager application Collect metadata from the physical object database (POD) in order to embed it into preservation master files using FFmpeg Using FFmpeg, create production masters for audio or mezzanine files for video from preservation masters, embedding metadata
MDPI Post-Processing System IUMDS Packager application Generate XML manifest Digital provenance metadata, file list Copy preservation masters, derivatives, XML manifest and QCTools files to remote drop box To post-processing system
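A manifest of this kind could be generated along the following lines. The element and attribute names (`manifest`, `digitalProvenance`, `operator`, `files`, `file`, `md5`) are invented for illustration and are not the MDPI schema; including an MD5 checksum per file is likewise an assumption.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute a file's MD5 in chunks so large media files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(object_id, operator, files):
    """Return an XML string with provenance details and a file list."""
    root = ET.Element("manifest", {"objectId": object_id})
    provenance = ET.SubElement(root, "digitalProvenance")
    ET.SubElement(provenance, "operator").text = operator
    file_list = ET.SubElement(root, "files")
    for path in files:
        entry = ET.SubElement(file_list, "file", {"name": Path(path).name})
        ET.SubElement(entry, "md5").text = md5_of(path)
    return ET.tostring(root, encoding="unicode")
```

Writing the manifest last, after the media files are copied, would let the receiving system treat its arrival as the signal that a package is complete; whether MDPI does this is not stated in the slides.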
MDPI Post-Processing System IUMDS Packager application Windows app written in C# Acts as an intermediary – retrieves metadata via web-service calls from the POD Uses metadata to drive FFmpeg, FFprobe, and BWF MetaEdit Provides command-line arguments to the above utilities, which do the heavy lifting
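The intermediary pattern (fetch metadata, then build command-line arguments for a utility) can be sketched in a few lines. The real Packager is a C# application whose arguments are not given here, so the functions below are assumptions: they only assemble FFmpeg argument lists matching the project formats (24-bit/96 kHz PCM audio, 50 Mbps MPEG-2 video), and the metadata keys are hypothetical. A production encode would likely need further rate-control and container options.

```python
def metadata_args(metadata):
    """Turn a metadata mapping into repeated FFmpeg -metadata options."""
    args = []
    for key, value in metadata.items():
        args += ["-metadata", f"{key}={value}"]
    return args


def production_master_args(source, target, metadata):
    """Arguments for an audio production master: 24-bit PCM at 96 kHz."""
    return (
        ["ffmpeg", "-i", source, "-c:a", "pcm_s24le", "-ar", "96000"]
        + metadata_args(metadata)
        + [target]
    )


def mezzanine_args(source, target, metadata):
    """Arguments for a video mezzanine: MPEG-2 video at a 50 Mbps target."""
    return (
        ["ffmpeg", "-i", source, "-c:v", "mpeg2video", "-b:v", "50M"]
        + metadata_args(metadata)
        + [target]
    )


# Hypothetical metadata, standing in for values retrieved from the POD.
pod_metadata = {"title": "Example object", "comment": "Example unit"}
print(production_master_args("pres.wav", "prod.wav", pod_metadata))
print(mezzanine_args("pres.mkv", "mezz.mov", pod_metadata))
```

Each list can be passed to `subprocess.run` on a machine with FFmpeg installed; building lists rather than a single string avoids shell-quoting problems with metadata values that contain spaces.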