Packaged Serendipity: Preserving Context through Metadata Robert Cole Sharon Farnel Chuck Humphrey Digital Preservation Seminar University of Alberta 5 March 2010
What is at stake? Information describing context provides a foundation for understanding and identifying an object or artifact, whether it is analogue or digital. Digital objects, however, are more vulnerable to losing their context unless this information is systematically documented. Standardized metadata are the best practice for documenting context for digital objects.
Two illustrations One involving a digital reproduction of analogue material The other involving born-digital objects
Setting the Scene “If it’s not online, then it doesn’t exist” –digital surrogate as exemplar Digitization as preservation –digital surrogate as object Imperative to meaningfully capture and preserve the intellectual context of the artifact through descriptive metadata
Challenges Identifying the information needed to provide context Capturing that information in a standardized way Determining what is necessary, what is ideal, what is possible
Payoff Meaningful, long-term access to and preservation of our cultural heritage
Born-digital microdata The University of Alberta is the digital custodian for the microdata produced by the Canadian Century Research Infrastructure (CCRI) project. The CCRI was funded by the Canada Foundation for Innovation to create microdata samples of Canadian Census records from the 1911, 1921, 1931, 1941 and 1951 Censuses. These data files, combined with the public use microdata for all of the Censuses since 1971, cover almost the entire 20th Century.
The census and context In seven years, Canada will be 150 years old. In another 150 years (2167), will researchers using microdata from the 20th Century have enough context to understand and analyze the data? Consider one example of a Census concept that is only 14 years old: visible minority status. Will this concept even be relevant in 150 years?
Visible minority status Consider the data behind the statistics in this table about the size of the visible minority population in the 2006 Canadian Census. Visible Minority Groups (15), Generation Status (4), Age Groups (9) and Sex (3) for the Population 15 Years and Over of Canada, Provinces, Territories, Census Metropolitan Areas and Census Agglomerations, 2006 Census - 20% Sample Data
Visible minority status How is visible minority status identified in the Census? Are Aboriginal peoples counted among the visible minority population in Canada? What is the definition of visible minority?
Metadata and context Typical documentation for microdata tends to describe just the data file structure and the classification systems used to categorize responses. With the CCRI microdata, we are using a lifecycle metadata standard to incorporate contextual information across the stages of the Census out of which the microdata were produced. We are looking at context associated with process.
Stages in the lifecycle of a survey
1. Program objective
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Official findings released
8. Popularizing findings
9. Needs & gaps evaluation
Preserving Information
Lifecycle metadata standards The preservation of digital objects requires metadata that represent context by incorporating information from every stage of an object's production and use. The descriptors in the metadata should include information about both the physical and intellectual integrity of the digital material.
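A minimal sketch of the idea, in Python, may help make it concrete. It assumes nothing about any particular metadata standard: the class name `LifecycleRecord`, its methods, the object identifier and the placeholder notes are all hypothetical, chosen only for illustration. The stage names are the nine listed on the lifecycle slide. The sketch attaches context notes to stages and reports which stages still have no context recorded.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The nine stages from the survey lifecycle slide.
LIFECYCLE_STAGES = [
    "Program objective",
    "Survey unit organized",
    "Questionnaire & sample",
    "Data collection",
    "Data production & release",
    "Analysis",
    "Official findings released",
    "Popularizing findings",
    "Needs & gaps evaluation",
]


@dataclass
class LifecycleRecord:
    """Hypothetical container for context notes about one digital object."""

    object_id: str
    # Maps a stage name to the context notes recorded for that stage.
    context: dict[str, list[str]] = field(default_factory=dict)

    def add_context(self, stage: str, note: str) -> None:
        """Attach a context note to a known lifecycle stage."""
        if stage not in LIFECYCLE_STAGES:
            raise ValueError(f"Unknown lifecycle stage: {stage}")
        self.context.setdefault(stage, []).append(note)

    def undocumented_stages(self) -> list[str]:
        """Return the stages, in lifecycle order, that have no notes yet."""
        return [s for s in LIFECYCLE_STAGES if not self.context.get(s)]


# Illustrative usage with placeholder notes (not real CCRI documentation).
record = LifecycleRecord(object_id="example-microdata-file")
record.add_context(
    "Questionnaire & sample",
    "Placeholder: wording of the question behind a variable",
)
record.add_context(
    "Data production & release",
    "Placeholder: classification used to code responses",
)
print(record.undocumented_stages())
```

Listing the undocumented stages is one simple way to see where context is at risk of being lost; a real lifecycle standard would carry far richer, structured descriptors than free-text notes.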
Preserving context Two integrity issues are integral to preserving the context of digital objects. Physical integrity: minimizing the loss of contextual information arising from the digitization of an artifact. Intellectual integrity: ensuring the authenticity and completeness of the evidential information accompanying or belonging to a digital object. Adapted from Paul Conway, Preservation in the Digital World, March 1996