FITS: The File Information Tool Set


Background
- FITS is part of the second-generation Harvard University Library Digital Repository Service (DRS2), which supports content models and METS/PREMIS object descriptors.
- Developed Fall 2008
- First public release Spring 2009: http://fits.googlecode.com

Why?
- Needed an automatic way to identify and extract metadata for a wide range of file types
- No single file analysis tool satisfied our needs

Design Goals
- Act as a wrapper around other open source tools
- Extensible
- Needs to be a standalone command line tool and also provide an API
- Allow priority setting for tools
- Open source

The Tools
Current tools:
- Jhove 1.5
- Exiftool
- National Library of New Zealand Metadata Extractor (NLNZ)
- DROID
- FFIdent
- File Utility
3 Categories:
- File Identification (all of them)
- Metadata Extraction (Jhove, Exiftool, NLNZ)
- Format Validation (Jhove)

Process

Features
- Conflict management
- Value normalization: “inches” vs “2”
- Tool prioritization
- Format tree for understanding more specific format identities. PDF/A is a more specific version of PDF
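The conflict-handling features above can be sketched roughly as follows. This is an illustration, not the FITS consolidator: the `Report` class, the `NORMALIZED_UNITS` and `FORMAT_PARENTS` tables, the `"agreed"`/`"conflict"` labels, and all function names are assumptions made for the example. It shows values being normalized before comparison, the higher-priority tool winning when tools still disagree, and a one-level format tree used to prefer the more specific format.

```python
from dataclasses import dataclass

# Hypothetical normalization table: one tool may report a resolution unit as
# the word "inches" while another reports the numeric code "2".
NORMALIZED_UNITS = {"2": "inches", "inch": "inches", "inches": "inches"}

# Hypothetical format tree: specific format -> its more general parent.
FORMAT_PARENTS = {"PDF/A": "PDF"}


@dataclass
class Report:
    tool: str   # name of the tool that produced the value
    value: str  # raw value as the tool reported it


def normalize_unit(raw):
    """Map a raw unit value onto its canonical spelling, if one is known."""
    return NORMALIZED_UNITS.get(raw.strip().lower(), raw)


def consolidate(reports, priority, normalize=lambda v: v):
    """Pick one value from several tool reports.

    Values are normalized first. If the tools still disagree, the value from
    the tool listed earliest in `priority` wins and the result is flagged.
    Every reporting tool must appear in `priority`.
    """
    if not reports:
        raise ValueError("no reports to consolidate")
    ranked = sorted(reports, key=lambda r: priority.index(r.tool))
    distinct = {normalize(r.value) for r in ranked}
    status = "agreed" if len(distinct) == 1 else "conflict"
    return normalize(ranked[0].value), status


def most_specific(formats):
    """Return the first reported format that no other reported format specializes.

    Only direct parent links are checked, which is enough for a one-level tree.
    """
    for candidate in formats:
        if not any(FORMAT_PARENTS.get(other) == candidate for other in formats):
            return candidate
    return formats[0]
```

With a priority list of `["Jhove", "Exiftool"]`, a report of `"2"` from one tool and `"inches"` from the other is treated as agreement once normalized, while two different heights are flagged as a conflict and the Jhove value is kept.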

Example Output
<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
      ...
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
    <md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">265c9345ebf93c89d472766fda095de4</md5checksum>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
  <metadata>
    <image>
      <height toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">1024</height>
    </image>
  </metadata>
</fits>
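A consumer of this output might pull out the key fields as in the sketch below. It uses only the Python standard library; `SAMPLE` is the slide's example with the elided `...` line left out, and `summarize` is a name chosen for this example. It assumes un-namespaced elements, as the slide shows them; if the output carries an XML namespace, the element paths would need to be qualified accordingly.

```python
import xml.etree.ElementTree as ET

# The example output from the slide, minus the elided "..." line.
SAMPLE = """<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
    <md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">265c9345ebf93c89d472766fda095de4</md5checksum>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
</fits>"""


def summarize(fits_xml):
    """Extract identification, file info, and validity from a FITS document."""
    root = ET.fromstring(fits_xml)
    identity = root.find("identification/identity")  # first identity block
    return {
        "format": identity.get("format"),
        "mimetype": identity.get("mimetype"),
        "identified_by": [t.get("toolname") for t in identity.findall("tool")],
        "size": int(root.findtext("fileinfo/size")),
        "md5": root.findtext("fileinfo/md5checksum"),
        "well_formed": root.findtext("filestatus/well-formed") == "true",
        "valid": root.findtext("filestatus/valid") == "true",
    }
```

Calling `summarize(SAMPLE)` yields a plain dictionary, which is convenient for loading into a repository's own metadata records.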

Configuration
All settings are in the fits.xml config file:
- Enable/disable tools (available in the API too)
- Prevent tools from processing files with specific file extensions
- Set tool priority
- Add new tools
- Use your own consolidator code
- Report or ignore conflicts
- Options to display original tool output

Sample Configuration File
<fits_configuration>
  <!-- Order of the tools determines preference -->
  <tools>
    <!-- exclude-exts attribute is a comma delimited list of file extensions that the tool should not try to process -->
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.fileutility.FileUtility" exclude-exts="dng,wps"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.droid.Droid" exclude-exts="dng"/>
    <tool class="edu.harvard.hul.ois.fits.tools.nlnz.MetadataExtractor" exclude-exts="dng,zip,odb,ott,odg,otg,odp,otp,ods,ots,odc,otc,odi,oti,odf,otf,odm,oth"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.XmlMetadata"/>
    <tool class="edu.harvard.hul.ois.fits.tools.ffident.FFIdent" exclude-exts="dng,wps,vsd"/>
  </tools>
  <output>
    <dataConsolidator class="edu.harvard.hul.ois.fits.consolidation.OISConsolidator"/>
    <display-tool-output>true</display-tool-output>
    <report-conflicts>true</report-conflicts>
    <validate-tool-output>false</validate-tool-output>
    <internal-output-schema>xml/fits_output.xsd</internal-output-schema>
    <external-output-schema>http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd</external-output-schema>
    <fits-xml-namespace>http://hul.harvard.edu/ois/xml/ns/fits/fits_output</fits-xml-namespace>
  </output>
  <!-- file name of the droid signature file to use in tools/droid/-->
  <droid_sigfile>DROID_SignatureFile_V35.xml</droid_sigfile>
</fits_configuration>
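To make the effect of tool order and `exclude-exts` concrete, the sketch below reads a shortened copy of the configuration and lists, in preference order, the tools that would be allowed to process a given file. The `tools_for` function and the trimmed `CONFIG` string are illustrative assumptions; this is not how FITS itself loads its configuration.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample configuration (four of the eight tools).
CONFIG = """<fits_configuration>
  <tools>
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.fileutility.FileUtility" exclude-exts="dng,wps"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
  </tools>
</fits_configuration>"""


def tools_for(config_xml, filename):
    """Return short tool names, in configured order, not excluded for this file."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    selected = []
    for tool in ET.fromstring(config_xml).findall("tools/tool"):
        raw = tool.get("exclude-exts", "")
        excluded = {e.strip().lower() for e in raw.split(",") if e.strip()}
        if ext not in excluded:
            # Keep only the class name, e.g. "Jhove" from the full package path.
            selected.append(tool.get("class").rsplit(".", 1)[-1])
    return selected
```

For a file named `letter.wps`, only Jhove and FileInfo remain, because FileUtility and Exiftool both list `wps` in their exclusions.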

Some Limitations...
- Speed
- Technical metadata only returned if the tool that reported it is in the first <identity> block
- FITS considers a successful identification to be a combination of the format name and mime type
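The last two limitations interact, which a small sketch may make easier to see. Here tool reports are grouped into identities keyed by the (format name, mime type) pair, and metadata is kept only from tools that back the first identity. The function names, the tuple layout, and the assumption that reports arrive in tool-priority order are all choices made for this example; the slides do not say how FITS orders its identity blocks.

```python
def group_identities(reports):
    """Group (tool, format_name, mimetype) reports by (format_name, mimetype).

    Reports are assumed to arrive in tool-priority order, so the first group
    is the one backed by the highest-priority tool.
    """
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for tool, format_name, mimetype in reports:
        groups.setdefault((format_name, mimetype), []).append(tool)
    return list(groups.items())


def reportable_metadata(identities, metadata):
    """Keep only metadata whose reporting tool backs the first identity.

    `metadata` maps a field name to a (tool, value) pair.
    """
    if not identities:
        return {}
    trusted = set(identities[0][1])
    return {
        name: value
        for name, (tool, value) in metadata.items()
        if tool in trusted
    }
```

In this sketch, a tool that reports the right mime type under a different format name (say "GIF" instead of "Graphics Interchange Format") lands in a second identity, and any technical metadata it alone extracted is dropped.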

Future Plans
More tools:
- Apache Tika (text document formats)
- Jhove 2
- Aduna Aperture (text, documents, email formats)
- Mediainfo (audio and video formats)
Better audio and video format support as we add object support for them to DRS2

Wrap Up
- http://fits.googlecode.com
- http://ots-schemas.googlecode.com: Java library for reading and writing METS (limited support), MODS, PREMIS, MIX, TextMD, DocumentMD, and soon AES audio metadata
- More information on DRS2: http://hul.harvard.edu/ois/systems/drs/enhancements.html