Publishing Cultural Heritage Alastair Dunning Digitisation Programme Manager JISC (Joint Information Systems Committee) a.dunning@jisc.ac.uk, 0203 006 6065 UCL Presentation, 19th June
JISC Digitisation Programme Manager for 8 projects, part of a 16-project programme to digitise UK cultural heritage. For example: British Newspapers 1620-1900; Pre-Raphaelite Art; Images from Scott Polar Research Institute; Nineteenth-Century Pamphlets; 20th-century Government Cabinet Papers. http://www.jisc.ac.uk/digitisation http://digitisation.jiscinvolve.org/ (Blog)
Metadata?
Data Modelling?
Accessible files? Dissemination?
Growth of Digitisation Possibilities of Internet inspired rapid data capture of precious objects all over the world But maybe this started out as a reactive cottage industry? Museums, Libraries and Archives rushing to digitise material and dump it on the web How long does this material last on the Internet? Is it good quality? Can people locate it? Can they use it? Quantity of material and the issue of long-term digitisation affect published material. Added pressure supplied by Google digitisation programme …. Digitisation is difficult
Need for an infrastructure To address the issues raised in previous slide How long does this material last on the Internet? Is it good quality? Can users locate it? Can they use it? Illustrations from the British model; other countries’ models may be different Demonstration that mass digitisation is complex, involving multiple players and technologies Good infrastructure allows publication of cultural heritage to happen quickly; to show value for money; to be usable; to be easily accessible by educational communities and general public
Data capture To convert the physical to digital Flat scanners, robotic scanners, 3D scanners, direct capture via digital camera, remote controlled camera, conversion via medium (e.g. microfilm), reel-to-digital, millions of typists To cope with all kinds of material (newspapers, stained glass, banners, posters, maps, census, reports, grey literature, artefacts, film, audio … ) Need to have a keen idea of priorities for digitisation Ensure competition but not redundancy (Keep machines working; keep staff in place) Requires research on success of methodologies, dialogue with other subject areas (e.g. sciences)
University of Southampton Robotic Scanner – Details at http://www.soton.ac.uk/mediacentre/news/2004/nov/04_181.shtml If you don’t have a range of options for data capture – cultural heritage won’t get digitised
Standards and Formats What file formats to ensure high-quality, long-term use Images - TIFF, but also JPEG2000, PNG Text – XML (and flavours thereof), but also RTF, Word Sound – WAV, AIFF, MP3, Ogg (formats and wrappers) Film – MJPEG, MPEG4, AVI, Quicktime, Flash (ditto) Normally developed internationally, but local variations occur Co-ordination, certification, co-operation, involvement and decisiveness at national and international levels As with all parts of infrastructure, research and innovation If you don’t have this – see current mess over video!
Metadata Requires the sophisticated knowledge of experts who know the digital objects (e.g. newspapers, sound recordings, census reports) As before, international co-ordination, certification, co-operation to develop international schema and vocabularies These are required at subject level, format level, technical levels, preservation levels. For example Dublin Core, MODS – generic resource description VRA4 – digital image description, including technical details METS – wraps together different information on a digital object PREMIS – preservation metadata over long term If you don’t have this – trust and authenticity, interoperability, resource discovery are severely hindered
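To make the idea of generic resource description concrete, here is a minimal sketch of a simple Dublin Core record built in Python. The namespace URI is the standard Dublin Core elements namespace; the `record` wrapper element, the `build_dc_record` helper and the field values are hypothetical, chosen only for illustration and not taken from any JISC project.

```python
import xml.etree.ElementTree as ET

# Standard namespace for the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return a simple Dublin Core description as an XML string.

    `fields` maps Dublin Core element names (e.g. "title") to values.
    The <record> wrapper is an assumption made for this sketch.
    """
    record = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical values describing a digitised pamphlet page.
example = build_dc_record({
    "title": "Example nineteenth-century pamphlet",
    "type": "Text",
    "format": "image/tiff",
    "language": "en",
})
print(example)
```

A real project would normally validate such records against an agreed schema and wrap them, together with technical and preservation metadata, in a container such as METS.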
Data Delivery I.e. the people that build websites Complex engagement between commercial (Google, ProQuest, Thomson Gale, JSTOR) and non-commercial suppliers (universities, museums etc.) Huge range of potential business models Institutional subscription, Personal subscription Pay-per-view, Google Ads Open Access Mixed model But no definitive answers about which are the more successful
Data Delivery – What is required Ability to regularly serve up websites and data Systems to deliver a range of digital content (e.g. newspapers, audio, posters, artefacts) Low overheads and year-on-year costs Good understanding of end-users Working in partnership with other content providers Commitment to innovation and good practice If you don’t have this – wheel will be constantly reinvented, users will be driven away, material will be siloed
Preservation Facilities Digital objects become obsolete with time. Experts are required to ensure this does not happen Expertise in handling digital assets (content and all metadata) in long term, and preferably also the hardware and media that hold such content Must be trusted and reliable Good relationship with data delivery providers Continual research – why, what and how to preserve? Without this, digital data will be lost, endangering the entire investment made in digitisation
Preservation Facilities – Case Study A good example from the late 1990s Orphaned archaeological data rescued from obsolescence CDs, floppy discs, PCs, databases, Word files, CAD files all left But lack of metadata meant not all data could be retrieved http://ahds.ac.uk/creating/case-studies/newham/
Digitisation Infrastructure Network capabilities; Authentication; Tools Development; Usability testing; Copyright clearing houses; Consultants; Trained expert staff; Suitable courses; Data capture; Standards, Formats; Metadata; Data Delivery; Preservation; And of course Money. Skill is in making sure these pieces fit together