Preservation Metadata and PREMIS Vilas Wuwongse Asian Institute of Technology 1.


1 Preservation Metadata and PREMIS Vilas Wuwongse Asian Institute of Technology 1

2 Outline Introduction to Metadata What is preservation metadata? Why is preservation metadata needed? How to create preservation metadata PREMIS Conclusion Acknowledgement 2

3 INTRODUCTION TO METADATA 3

4 Metadata Metadata is often defined as “Structured Data about Data”. It defines information about one or more characteristics of the data: – Data’s name, description, purpose, created date-time, creator, basic information. For example – Library catalogues 4

5 Metadata Categories (1) Descriptive – describes the identification and content of a resource: title, author, abstract and keywords. Structural – describes relationships within and among resource objects: a web page containing HTML files, image files, CSS files, and JavaScript files, linking to other files. Technical (for physical files) – includes technical information that applies to any file type: software/hardware environment, checksums, digital signatures, image width, elapsed time.

6 Metadata Categories (2) Administrative – provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it – Two important subsets: rights management metadata, dealing with intellectual property rights; preservation metadata, containing information needed to archive and preserve a resource.

7 WHAT IS PRESERVATION METADATA? 7

8 Preservation Metadata (1) Information that is essential to ensure long-term accessibility of digital resources. A verifier of the past. A communication to the future. A best guess on the future: no prescriptive list of metadata elements available. Must be able to exist independently from the systems which were used to create them.

9 Preservation Metadata (2) Sometimes considered a subset of: – administrative metadata, assisting in the management of information – technical metadata, assisting access to the digital content and ensuring that the digital resources can be rendered as they originally were. Basic functional objectives [OCLC]: – providing knowledge about the actions needed to maintain digital resources over the long term – ensuring that the digital resources can be rendered as they originally were

10 Information Included in Preservation Metadata Provenance – describes the history of creation, ownership, access, and change. Authenticity – ensures trustworthiness (does the digital resource render as it originally did?). Preservation activities – records the processes supporting preservation, such as migration. Technical environment – gives the name and version of the hardware, platform, OS, and software required to render digital resources. Rights management – records the intellectual property rights and agreements that need to be observed when executing preservation processes, e.g. does the creator allow his/her work to be copied or not?

11 Example: 16 preservation metadata elements (recommended by oclc.org, May 1998): Date; Transcriber; Producer; Capture Device; Capture Details; Change History; Validation Key; Encryption; Watermark; Resolution; Compression; Source; Color; Color Management; Color Bar/Gray-scale Bar; Control Targets

12 WHY IS PRESERVATION METADATA NEEDED? 12

13 Why Preservation Metadata? Preservation metadata helps the implementation of preservation policies 13

14 Preservation Policies (1) define how to manage digital assets in a repository to avert the risk of content loss in terms of, e.g., – data storage requirements – preservation actions – Responsibilities 14

15 Preservation Policies (2) Specify preservation goals to ensure that: – digital content is within the physical control of the repository – digital content can be uniquely and persistently identified and retrieved in the future – all information is available so that digital content can be understood by its designated user community – significant characteristics of the digital assets are preserved even as data carriers or physical representations change 15

16 Preservation Policies (3) Specify preservation goals to ensure that: – physical media are cared for – digital objects remain renderable or executable – digital objects remain whole and unimpaired and that it is clear how all the parts relate to each other – digital objects are what they purport to be All of these preservation functions depend on the availability of preservation metadata 16

17 HOW TO CREATE PRESERVATION METADATA 17

18 To Begin. I want to have my own restaurant. What should I do?

19 To Begin. What you should know

20 To Begin. What you should plan

21 To Begin. How you should run

22 To Begin. I won’t give you a blueprint or concrete model for running a restaurant. But I’ll guide you on WHAT and HOW you have to consider when planning to run a restaurant business.

23 OAIS: Introduction. I want to build an archival information system. What should I do?

24 OAIS: Introduction. Understand the OAIS reference model

25 OAIS Background Reference Model for an Open Archival Information System (OAIS) – Development led by the Consultative Committee for Space Data Systems (CCSDS) – Issued as CCSDS Recommendation (Blue Book) 650.0-B-1 (January 2002) – Also adopted as: ISO 14721:2003 25

26 OAIS Model (1) Conventional categories – Administrative, Descriptive (e.g. MARC, Dublin Core), Structural OAIS model categories – Preservation Description Information Reference Information: to enumerate and describe identifiers Provenance Information: to document the history of the content information (creation, modification, custody) Context Information: to document the relationship of the content to its environment Fixity Information: to document authentication mechanisms 26
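The four Preservation Description Information categories on this slide can be pictured as one small record per digital object. The following is only a sketch: the class name, the field layout, the identifier value, and the choice of SHA-256 for fixity are all assumptions made for illustration, not something the OAIS model prescribes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescriptionInformation:
    """Illustrative container for the four PDI categories (hypothetical layout)."""
    reference: dict    # identifiers for the content
    provenance: list   # history: creation, modification, custody
    context: dict      # relationship of the content to its environment
    fixity: dict       # information for checking the content is unaltered


def describe(content: bytes, identifier: str, creator: str) -> PreservationDescriptionInformation:
    """Build a PDI record for a piece of content held in memory."""
    digest = hashlib.sha256(content).hexdigest()
    return PreservationDescriptionInformation(
        reference={"type": "local", "value": identifier},
        provenance=[{"action": "creation", "agent": creator}],
        context={"partOf": None},
        fixity={"algorithm": "SHA-256", "digest": digest},
    )


pdi = describe(b"hello", "obj-001", "example creator")
print(pdi.fixity["algorithm"], pdi.reference["value"])
```

Keeping fixity alongside reference and provenance means that a later check can recompute the digest and compare it without consulting the system that created the object.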

27 OAIS Model (2) – Content Information Content Data Object Representation Information the information needed for proper rendering, understanding, and interpretation of a digital object's content – Packaging Information – Descriptive Information the information used to aid searching, ordering, and retrieval of the objects 27

28 OAIS Model (3) 28

29 OAIS Functional Entities There are three types of information package: – the Submission Information Package (SIP), which conveys the information provided to the archive by the user and deposit system; – the Archival Information Package (AIP), which is the stored archival version of the information; – the Dissemination Information Package (DIP), which is the version of the information available to users.
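The relationship between the three package types can be sketched as two small functions: one that turns a SIP into an AIP at ingest, and one that derives a DIP for a user. Everything here (the `InformationPackage` class, the `ingest` and `disseminate` functions, and the decision to add SHA-256 digests at ingest) is a hypothetical simplification, not an OAIS-defined interface.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    kind: str                                   # "SIP", "AIP" or "DIP"
    files: dict                                 # file name -> content bytes
    metadata: dict = field(default_factory=dict)


def ingest(sip: InformationPackage) -> InformationPackage:
    """Turn a submitted package into an archival one, adding fixity metadata."""
    if sip.kind != "SIP":
        raise ValueError("only a SIP can be ingested")
    fixity = {name: hashlib.sha256(data).hexdigest() for name, data in sip.files.items()}
    return InformationPackage("AIP", dict(sip.files), {**sip.metadata, "fixity": fixity})


def disseminate(aip: InformationPackage, wanted: list) -> InformationPackage:
    """Derive a package for users that carries only the requested files."""
    if aip.kind != "AIP":
        raise ValueError("only an AIP can be disseminated")
    files = {name: aip.files[name] for name in wanted}
    return InformationPackage("DIP", files, {"title": aip.metadata.get("title")})


sip = InformationPackage("SIP", {"a.txt": b"x"}, {"title": "Example"})
aip = ingest(sip)
dip = disseminate(aip, ["a.txt"])
print(aip.kind, dip.kind)
```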

30 30 http://breastfeedinglib.saiyairak.com

31 PREMIS 31

32 PREMIS Overview 32

33 What? PREservation Metadata: Implementation Strategies (PREMIS). Maintenance Activity sponsored by the Library of Congress (LOC). “PREMIS” is usually used to mean the PREMIS Data Dictionary. Commonly represented in XML format.

34 PREMIS Data Dictionary A set of Semantic Units (which are called Metadata Elements once they are implemented). Metadata for digital objects so that they: – can be read from media – can be rendered – are stored securely – can be tracked through changing formats. Metadata Scope: – Format-specific: e.g. audio, video, image, … – Implementation-specific: how to access it (by app) – Descriptive metadata: data properties, like MARC, DC – Detailed info: for media or hardware – Agents info: e.g. people, org, or software – Rights info: e.g. license, permission

35 Where is PREMIS? PREMIS positions itself as a coordinator among several types of metadata in order to perform the preservation function on all digital resources. Thus, PREMIS is a small core at the heart of preservation metadata.

36 PREMIS Data Model 36

37 PREMIS Data Model: Intellectual Entities, Objects, Rights Statements, Agents, Events

38 Intellectual Entities Examples: Rabbit Run by John Updike (a book) “Maggie at the beach” (a photograph) The Library of Congress Website (a website) The Library of Congress: American Memory Home page (a web page) Set of content that is considered a single intellectual unit for purposes of management and description (e.g., a book, a photograph, a map, a database) May include other Intellectual Entities (e.g. a website that includes a web page) **Has one or more digital representations** Not fully described in PREMIS DD, but can be linked to in metadata describing digital representation

39 Objects Examples: chapter1.pdf (a file); chapter1.pdf + chapter2.pdf + chapter3.pdf (representation of a book with 3 chapters); TIFF file containing header and 2 images (2 bitstreams (images), each with its own set of properties (semantic units): e.g., identifiers, technical metadata, inhibitors, …). Discrete unit of information in digital form. **Objects are what the repository actually preserves** Three types of Object: – FILE: named and ordered sequence of bytes that is known by an operating system – REPRESENTATION: set of files, including structural metadata, that, taken together, constitute a complete rendering of an Intellectual Entity – BITSTREAM: data within a file with properties relevant for preservation purposes (but needs additional structure or reformatting to be a stand-alone file)
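The three Object types and their link to an Intellectual Entity can be sketched as nested records. The class and field names below are illustrative assumptions (PREMIS defines semantic units, not Python classes), and the identifiers and the `scan.tiff` file name are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    identifier: str
    description: str = ""


@dataclass
class File:
    identifier: str
    original_name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Representation:
    identifier: str
    files: List[File] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    title: str
    representations: List[Representation] = field(default_factory=list)


# A book with three chapter files forming one representation.
chapters = [File(f"file-{n}", f"chapter{n}.pdf") for n in (1, 2, 3)]
book = IntellectualEntity("Example book", [Representation("rep-1", chapters)])

# A TIFF file holding two images, each treated as a bitstream.
tiff = File("file-4", "scan.tiff", [Bitstream("bs-1", "image 1"), Bitstream("bs-2", "image 2")])

print([f.original_name for f in book.representations[0].files])
```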

40 Object Examples: Thailand Map. Example types of Object for the preservation of the Thailand Map (the Intellectual Entity), shown as three Objects: a Representation (it can be a web page that contains 3 files: HTML, CSS, JPEG); a File (1 JPEG file); and 1 TIFF file that includes 3 bitstreams of images of map layers (province, mountain, river).

41 Object Example: book in two versions. Intellectual Entity: Da Vinci Code by Dan Brown. Representation 1 (page image version): File 1: page1.tiff; File 2: page2.tiff; File N: pageN.tiff; File N+1: METS.xml. Representation 2 (ebook version): File 1: book.lit

42 Events Examples: Validation Event: use some tools to verify that chapter1.pdf is a valid PDF file. Ingest Event: transform an OAIS SIP into an AIP. Migration Event: create a new version of an Object in an up-to-date format. An action that involves or impacts at least one Object or Agent associated with or known by the preservation repository. Helps document digital provenance. Can track the history of an Object through the chain of Events that occur during the Object's lifecycle. Determining which Events are in scope is up to the repository (e.g., Events which occur before ingest, or after de-accession).
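A validation Event and the chain of Events for one Object might be recorded as follows. This is a sketch under clear assumptions: the record is a plain dictionary keyed by Event semantic unit names, the identifiers are invented, and the "validation" is only a check for the `%PDF-` file header, which is far weaker than what a real validation tool does.

```python
import uuid
from datetime import datetime, timezone


def validate_pdf(object_id: str, data: bytes, agent_id: str) -> dict:
    """Return an Event record; the check is only a header sniff, not full PDF validation."""
    is_pdf = data.startswith(b"%PDF-")
    return {
        "eventIdentifier": str(uuid.uuid4()),
        "eventType": "validation",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventDetail": "checked for the %PDF- file header",
        "eventOutcome": "pass" if is_pdf else "fail",
        "linkingAgentIdentifier": agent_id,
        "linkingObjectIdentifier": object_id,
    }


def history(events: list, object_id: str) -> list:
    """Chain of Events for one Object, oldest first."""
    linked = [e for e in events if e["linkingObjectIdentifier"] == object_id]
    return sorted(linked, key=lambda e: e["eventDateTime"])


good = validate_pdf("file-1", b"%PDF-1.4\n", "agent-1")
bad = validate_pdf("file-2", b"GIF89a", "agent-1")
print(good["eventOutcome"], bad["eventOutcome"])
```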

43 eventType (Event Type: Description)
– capture: the process whereby a repository actively obtains an object
– compression: the process of coding data to save storage space or transmission time
– creation: the act of creating a new object
– deaccession: the process of removing an object from the inventory of a repository
– decompression: the process of reversing the effects of compression
– decryption: the process of converting encrypted data to plaintext
– deletion: the process of removing an object from repository storage
– digital signature validation: the process of determining that a decrypted digital signature matches an expected value
– dissemination: the process of retrieving an object from repository storage and making it available to users
– fixity check: the process of verifying that an object has not been changed in a given period
– ingestion: the process of adding objects to a preservation repository
– message digest calculation: the process by which a message digest (“hash”) is created
– migration: a transformation of an object creating a version in a more contemporary format
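Two of these event types, message digest calculation and fixity check, fit together naturally: the digest is stored once, then recomputed later and compared. A minimal sketch using Python's standard `hashlib` follows; the function names and the shape of the returned record are assumptions for illustration.

```python
import hashlib


def message_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Message digest calculation: create the hash of the object's bytes."""
    return hashlib.new(algorithm, data).hexdigest()


def fixity_check(data: bytes, stored_digest: str, algorithm: str = "sha256") -> dict:
    """Fixity check: recompute the digest and compare it with the stored value."""
    current = message_digest(data, algorithm)
    unchanged = current == stored_digest
    return {
        "eventType": "fixity check",
        "eventOutcome": "pass" if unchanged else "fail",
        "eventOutcomeDetail": f"{algorithm} expected {stored_digest}, found {current}",
    }


stored = message_digest(b"abc")
print(fixity_check(b"abc", stored)["eventOutcome"])
print(fixity_check(b"abd", stored)["eventOutcome"])
```

A failed check says only that the bytes differ from what was recorded; deciding whether the object or the stored digest is at fault is left to the repository.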

44 Agents Examples: Rathachai Chawuthai (a person) Asian Institute of Technology (an organization) Dark Archive in the Sunshine State implementation (a system) JHOVE version 1.0 (a software program) Person, organization, or software program/system associated with an Event or a Right (permission statement) Agents are associated only indirectly to Objects through Events or Rights Not defined in detail in PREMIS DD; not considered core preservation metadata beyond identification

45 Rights Statements Example: Rathachai Chawuthai grants AIT digital repository permission to make three copies of metadata_fundamentals.pdf for preservation purposes. An agreement with a rights holder that grants permission for the repository to undertake an action(s) associated with an Object(s) in the repository. Not a full rights expression language; focuses exclusively on permissions that take the form: – Agent X grants Permission Y to the repository in regard to Object Z.
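The "Agent X grants Permission Y to the repository in regard to Object Z" form lends itself to a simple lookup before a preservation action is carried out. The sketch below is hypothetical: the class, the `max_copies` restriction, the act name "replicate", and the agent identifier are assumptions chosen to mirror the three-copies example on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RightsStatement:
    granting_agent: str                # Agent X
    act: str                           # Permission Y, e.g. "replicate"
    object_id: str                     # Object Z
    max_copies: Optional[int] = None   # optional restriction on the act


def is_permitted(statements, act: str, object_id: str, copies_so_far: int = 0) -> bool:
    """True if some statement grants the act on the object within its restriction."""
    for s in statements:
        if s.act == act and s.object_id == object_id:
            if s.max_copies is None or copies_so_far < s.max_copies:
                return True
    return False


statements = [RightsStatement("agent-1", "replicate", "metadata_fundamentals.pdf", 3)]
print(is_permitted(statements, "replicate", "metadata_fundamentals.pdf", copies_so_far=2))
print(is_permitted(statements, "replicate", "metadata_fundamentals.pdf", copies_so_far=3))
```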

46 Semantic units pertaining to Objects (technical metadata): objectIdentifier, preservationLevel, significantProperties, objectCategory, objectCharacteristics (fixity, size, format, creatingApplication, inhibitors, extension), originalName, storage, environment, signatureInformation, relationship, linkingEventIdentifier, linkingIntellectualEntityIdentifier, linkingRightsStatementIdentifier
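A few of these Object semantic units can be serialized to XML with the standard library. Treat this as a rough sketch only: the namespace URI is assumed to be the one used by PREMIS 2.x, the child element names follow the data dictionary as I understand it, several units (for example compositionLevel) are omitted, and the output has not been validated against the PREMIS schema.

```python
import hashlib
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumed namespace for PREMIS 2.x XML
ET.register_namespace("premis", PREMIS_NS)


def q(name: str) -> str:
    """Qualify an element name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{name}"


def child(parent, name, text=None):
    """Append a namespaced child element, optionally with text."""
    el = ET.SubElement(parent, q(name))
    if text is not None:
        el.text = text
    return el


def premis_object(identifier: str, original_name: str, data: bytes, format_name: str):
    """Build a partial PREMIS-style description of one file object."""
    obj = ET.Element(q("object"))
    ident = child(obj, "objectIdentifier")
    child(ident, "objectIdentifierType", "local")
    child(ident, "objectIdentifierValue", identifier)
    chars = child(obj, "objectCharacteristics")
    fixity = child(chars, "fixity")
    child(fixity, "messageDigestAlgorithm", "SHA-256")
    child(fixity, "messageDigest", hashlib.sha256(data).hexdigest())
    child(chars, "size", str(len(data)))
    designation = child(child(chars, "format"), "formatDesignation")
    child(designation, "formatName", format_name)
    child(obj, "originalName", original_name)
    return obj


obj = premis_object("obj-1", "chapter1.pdf", b"%PDF-1.4", "PDF")
print(ET.tostring(obj, encoding="unicode"))
```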

47 Semantic units pertaining to Events (provenance and preservation activity): eventIdentifier, eventType, eventDateTime, eventDetail, eventOutcome, eventOutcomeDetail, linkingAgentIdentifier, linkingObjectIdentifier

48 Semantic units pertaining to Rights: rightsStatement (rightsStatementIdentifier, rightsBasis, copyrightInformation, licenseInformation, statuteInformation, rightsGranted (act, restriction, termOfGrant, rightsGrantedNote), linkingObjectIdentifier, linkingAgentIdentifier), rightsExtension

49 Semantic units pertaining to Agents: agentIdentifier, agentName, agentType

50 PREMIS: PREMIS with METS

51 METS Background XML based Describes the structure of digital objects and associates various kinds of metadata with their components Uses the XML Schema facility for combining vocabularies from different Namespaces for extensibility Metadata is categorized into separate sections (embedded or linked) Records the names and locations of the files that comprise those objects (embedded or linked) Records a map of hyperlinks between components Associates executable behaviour with the components 51

52 The Structure of a METS file: METS header (metsHdr); dmdSec (descriptive metadata); amdSec (administrative metadata); fileSec (file inventory); structMap (structural map); behaviorSec (behaviour metadata)

53 fileSec > fileGrp > file > FLocat
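The nesting on this slide, fileSec containing fileGrp containing file containing FLocat, can be sketched with the standard library XML module. The `file_sec` function, the `USE="master"` value, and the example file ID and URL are illustrative assumptions, and the output is not validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def file_sec(files):
    """Build a fileSec from a list of (file_id, mimetype, url) tuples."""
    sec = ET.Element(f"{{{METS_NS}}}fileSec")
    grp = ET.SubElement(sec, f"{{{METS_NS}}}fileGrp", {"USE": "master"})
    for file_id, mimetype, url in files:
        f = ET.SubElement(grp, f"{{{METS_NS}}}file", {"ID": file_id, "MIMETYPE": mimetype})
        # FLocat points at the file's location rather than embedding its bytes.
        ET.SubElement(f, f"{{{METS_NS}}}FLocat",
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
    return sec


sec = file_sec([("F1", "image/tiff", "file:///data/page1.tiff")])
print(ET.tostring(sec, encoding="unicode"))
```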

54

55 The inside of a METS file: METS header (metsHdr); dmdSec (descriptive metadata); amdSec (administrative metadata); fileSec (file inventory); structMap (structural map); behaviorSec (behaviour metadata)

56 Cobbett's parliamentary history of England, from the Norman Conquest, in 1066 to the year, 1803 : from which last-mentioned epoch it is continued downwards in the work entitled, "The parliamentary debates" Cobbett's Parliamentary History - volume 2 $aGreat Britain. Parliament. spn

57 METS with PREMIS as OAIS Information Package OAIS repository functions for which METS is often used are submission or exchange (SIP), archiving (AIP), dissemination (DIP) A METS package is a good candidate for realization of an information object in an OAIS repository PREMIS satisfies need for Preservation Description Information: provenance, context, reference and fixity PREMIS is an elaboration and translation of OAIS information model into implementable semantic units 57

58 58

59 Why do we need guidelines for using PREMIS with METS? Contents of each information package may vary depending on its function within a repository. Need to determine how to include representation metadata and associate it with package components. PREMIS data entities (objects, events, rights, agents) do not map perfectly to METS categories for representation metadata (techMD, digiprovMD, rightsMD, sourceMD). There are redundant elements between the two standards. Both have extensibility mechanisms. Flexibility of both standards requires implementation choices. Predictability will enhance the ability for exchange with minimal human intervention.
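One possible implementation choice is to fix a single mapping from PREMIS entity to METS administrative section and apply it consistently. The mapping below (objects in techMD, events and agents in digiprovMD, rights in rightsMD) is an assumption for illustration rather than the only option, the `wrap_in_amdsec` helper is hypothetical, and the sketch does not enforce the section ordering that the METS schema expects inside amdSec.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Assumed convention: which amdSec subsection holds each PREMIS entity.
SECTION_FOR = {
    "object": "techMD",
    "event": "digiprovMD",
    "rights": "rightsMD",
    "agent": "digiprovMD",
}


def wrap_in_amdsec(amd_sec, entity_type: str, premis_element, section_id: str):
    """Place one PREMIS element inside the mapped section via mdWrap/xmlData."""
    section = ET.SubElement(amd_sec, f"{{{METS_NS}}}{SECTION_FOR[entity_type]}",
                            {"ID": section_id})
    wrap = ET.SubElement(section, f"{{{METS_NS}}}mdWrap",
                         {"MDTYPE": f"PREMIS:{entity_type.upper()}"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(premis_element)
    return section


amd = ET.Element(f"{{{METS_NS}}}amdSec")
event = ET.Element("event")  # stand-in for a real PREMIS event element
section = wrap_in_amdsec(amd, "event", event, "DP1")
print(ET.tostring(amd, encoding="unicode"))
```

Writing the mapping down in one place is what gives the predictability the slide asks for: a receiving repository that knows the convention can locate each PREMIS entity without inspecting every section.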

60 Guidelines for Using PREMIS with METS for Exchange 60 http://www.loc.gov/standards/premis/guidelines-premismets.pdf

61 Benefits of using PREMIS in METS Packages together metadata necessary for digital preservation in a predictable format PREMIS provides technical and event metadata METS provides structural metadata Both standards are – Openly available – Flexible – Extensible – Maintained by an open process Provides an exchange standard between repositories 61

62 Conclusions Information preservation supports an organization’s identity preservation An organization must have a preservation policy A preservation policy is realized by means of preservation metadata PREMIS Data Dictionary provides critical piece of reliable digital preservation infrastructure comprising technology, standards, and best practice PREMIS Data Dictionary is a building block with which effective, sustainable digital preservation strategies can be implemented for various domains PREMIS is being widely implemented and experience using it needs to be shared

63 URLs PREMIS Maintenance Activity: http://www.loc.gov/standards/premis/ PREMIS Data Dictionary for Preservation Metadata, version 2.1: http://www.loc.gov/standards/premis/v2/premis-dd-2-1.pdf

64 Acknowledgement Some of the slides are based on slides by – Priscilla Caplan, Florida Center for Library Automation – Rathachai Chawuthai, Asian Institute of Technology – Angela Dappert, the British Library – Rebecca Guenther, Library of Congress – Brian Lavoie, OCLC 64

