Human Rights Archives and Documentation, CHRDR Conference, 4–6 October 2007
Issues in Human Rights Web Archiving
Robert Wolven, Columbia University Libraries
- Libraries have a mission to build, organize, and preserve coherent collections for research
- There’s a great deal of human rights-related content on the web
- Much of it is not currently collected by libraries
- Something should be done about that
- A great deal of content exists only online
- There’s a high risk that some will disappear
- Libraries and archives are custodians of our cultural heritage
- Libraries and archives should lead in preserving “at risk” content
Web Archiving as Preservation
- Small footprint in organization
- The Hoover effect
- Haphazard library collections
- Ineffective access
A Lot of Content
Much of it is not … collected
Refugees International:
- 40 documents on website
- 0 in Columbia collections
- 10 listed in OCLC
- 1 held by more than 2 libraries
- No library holds more than 3
Web Archiving Issues
- Ways and Means
- Selection policies
- Permissions – and Obligations
- Organization and Integration
- Presentation and Uses
- Sharing the Costs and Benefits
- Organizational Transformation
Center for Research Libraries’ Political Communications Web Archive Project
- Project website:
- Final report:
Web Archiving Tools
- Archive-It (Internet Archive)
- PANDAS (National Library of Australia)
- OCLC Digital Archive
“I only want to download text/html and nothing else. Can I do it?”
You can … add a filter that excludes all filters that end in other than 'html|htm', etc., or, if you want to instead look at document mimetypes, you can Add a ContentTypeRegExpFilter filter as a midfetch filter to the http fetcher.
[From the Heritrix FAQ]
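The FAQ answer describes two different tests: one on the URL’s file extension, one on the MIME type the server reports. The sketch below shows the same two ideas in plain Python. It is an illustration under stated assumptions, not Heritrix configuration: the pattern names and functions are hypothetical and do not correspond to any Heritrix class or setting.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns: a plain-Python illustration of the two filtering
# strategies the FAQ describes, not Heritrix's own configuration syntax.
HTML_SUFFIX = re.compile(r"\.html?$", re.IGNORECASE)
HTML_MIME = re.compile(r"^text/html\b", re.IGNORECASE)


def keep_by_url(url: str) -> bool:
    """Strategy 1: keep a URL only if its path ends in .html or .htm."""
    path = urlsplit(url).path  # drops any ?query or #fragment before testing
    return HTML_SUFFIX.search(path) is not None


def keep_by_content_type(content_type: str) -> bool:
    """Strategy 2: keep a response only if its Content-Type is text/html."""
    return HTML_MIME.match(content_type.strip()) is not None
```

Note that `keep_by_url("https://example.org/reports/")` returns False even though a server may well return an HTML page at that address. This is presumably why the FAQ also offers a check on the MIME type during the fetch. Even this simple policy question depends on technical detail.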
Policy and Technology
- Full website or selected content?
- Preserve relationships, “look and feel”?
- All file types?
- How often?
From the European Union Agency for Fundamental Rights website:
“Corrigendum
Changes have been made to the report on "Muslims in the European Union – Discrimination and Islamophobia" after it was printed. Following pages are replacing the Annex page in the EN and FR version of the report.”
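The corrigendum shows why “How often?” matters: a document already captured can silently change afterwards. One common way to notice this is to compare a digest of each new capture with the previous one. The minimal sketch below assumes a plain SHA-256 over the file’s bytes. The function names are hypothetical and not taken from any of the tools named above.

```python
import hashlib
from typing import Optional


def digest(content: bytes) -> str:
    """SHA-256 hex digest identifying one captured file's exact bytes."""
    return hashlib.sha256(content).hexdigest()


def changed_since_last_capture(previous_digest: Optional[str], content: bytes) -> bool:
    """True if this is the first capture or the bytes differ from the last one."""
    return previous_digest is None or previous_digest != digest(content)
```

A byte-level comparison flags every change, including trivial ones such as an updated date in a page footer. In practice, a collection might need to normalize pages before comparing them, or apply the check only to fixed documents such as PDFs.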
Selection by Type of Agency
- Governmental
- International
- Academic
- Educational
Selection by Focus
- Global, regional, local
- Ethnicity, religion, gender, age
- Legal, medical, economic
- Crisis-driven
Selection by Content
- Fixed documents:
  - Case studies
  - Position papers
  - Topical reports
  - Press releases
  - Bulletins, newsletters
  - Activity reports
Selection by Content
- Non-textual (image, sound, video)
- Ephemeral, dynamic content
- Redundant (?) content:
  - Languages, formats
  - Republished or unique?
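The “Republished or unique?” question has one mechanical part. Exact copies of the same file posted at different addresses can be found by grouping captures on a content digest. The sketch below assumes captures held in memory as a URL-to-bytes mapping, and `group_identical` is a hypothetical name. It catches only byte-identical copies. A PDF and an HTML edition of the same report, or two language versions, will not match, so those remain curatorial judgments.

```python
import hashlib
from collections import defaultdict


def group_identical(captures: dict[str, bytes]) -> list[list[str]]:
    """Return groups of URLs whose captured bytes are exactly identical."""
    by_digest = defaultdict(list)  # SHA-256 hex digest -> URLs with that content
    for url, content in captures.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(url)
    # Only digests shared by two or more URLs indicate republished copies.
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]
```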
Rights and Obligations
- Permissions: ask or assume
- Rights:
  - Dark archive
  - Closed archive
  - Conditional exposure
- Obligations:
  - Parallel (mirror) access
  - Free, reliable access
  - Perpetual access
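The three rights levels amount to different answers to “who may view this capture, and when?” The sketch below encodes one possible reading. It assumes that a dark archive allows no user access, a closed archive admits only authorized users, and conditional exposure opens once a stated condition is met (for example, the live site going offline). Those interpretations, the `OPEN` level, and the function name are illustrative assumptions, not definitions from the talk.

```python
from enum import Enum


class AccessLevel(Enum):
    DARK = "dark"                # preserved only; no user access (assumed reading)
    CLOSED = "closed"            # authorized users only, e.g. on site (assumed)
    CONDITIONAL = "conditional"  # opened once an agreed condition is met (assumed)
    OPEN = "open"                # hypothetical baseline: free public access


def may_view(level: AccessLevel, authorized_user: bool, condition_met: bool) -> bool:
    """Decide whether a user may view a capture held at the given access level."""
    if level is AccessLevel.DARK:
        return False
    if level is AccessLevel.CLOSED:
        return authorized_user
    if level is AccessLevel.CONDITIONAL:
        return condition_met
    return True
```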
Organization and Integration
… or, now that we have it, how do we know what we’ve got?
- How do other agencies know what’s been done?
- How do researchers find it?
“From 1 March the European Monitoring Centre on Racism and Xenophobia (EUMC) became the EU Agency for Fundamental Rights (FRA). The content on the website is being gradually transformed to reflect the scope, activities and products of the new Agency.”
Integrating Access
- Through authority control
- Through controlled vocabulary
- Through series
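The EUMC/FRA notice shows what authority control contributes. A researcher searching under either name should be led to both the earlier and the later body. The sketch below models this with variant forms (“see” references) and links between earlier and later names (“see also” references). The headings use the names quoted on these slides and are illustrative only, not actual authority records. `Heading` and `lookup` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Heading:
    authorized: str
    variants: set[str] = field(default_factory=set)  # "see" references
    related: set[str] = field(default_factory=set)   # "see also": earlier/later names


# Illustrative headings built from the names quoted on the slides.
HEADINGS = [
    Heading("European Monitoring Centre on Racism and Xenophobia",
            variants={"EUMC"},
            related={"European Union Agency for Fundamental Rights"}),
    Heading("European Union Agency for Fundamental Rights",
            variants={"FRA", "EU Agency for Fundamental Rights"},
            related={"European Monitoring Centre on Racism and Xenophobia"}),
]

# Index every form of a name (authorized or variant) to its heading.
INDEX = {form: h for h in HEADINGS for form in {h.authorized, *h.variants}}


def lookup(name: str) -> list[str]:
    """Return the authorized heading for a name, followed by related headings."""
    h = INDEX.get(name)
    if h is None:
        return []
    return [h.authorized, *sorted(h.related)]
```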
Integrating Collections
- With print – in the catalog
- With archives – in finding aids
- With digital collections – in …
Use
- Internal organization and navigation
- Indexing
- Analytical tools
- Citation: pedigree and persistent links
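For citation, a persistent link has to point at a specific capture, not at the live page, which may change or vanish. The Wayback Machine’s convention is to put a 14-digit UTC timestamp before the original URL. The sketch below assumes that URL pattern as its default base. Other archives use their own prefixes, so `base` is left as a parameter. The citation wording in `cite` is a hypothetical format, not an established style.

```python
from datetime import datetime, timezone


def capture_link(original_url: str, captured_at: datetime,
                 base: str = "https://web.archive.org/web") -> str:
    """Build a Wayback-style link pinned to one capture time.

    captured_at should be timezone-aware; it is converted to UTC.
    """
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{base}/{stamp}/{original_url}"


def cite(title: str, original_url: str, captured_at: datetime) -> str:
    """Hypothetical citation recording the live URL and the archived capture."""
    day = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (f'"{title}". {original_url} (captured {day}), '
            f"archived at {capture_link(original_url, captured_at)}")
```

Recording both addresses preserves the pedigree: the reader can see where the document originally lived and which capture was actually consulted.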
Sharing Costs and Benefits
- Centralized collaboration
- Distributed collaboration
- Disclosure (at what level of detail?)
- Exposure (to the web; OAI-PMH)
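Exposing records through OAI-PMH lets other agencies harvest what a library has archived. Harvesters issue plain HTTP requests that combine a verb with its arguments. The sketch below builds `ListRecords` requests using the protocol’s `metadataPrefix`, `from`, `set`, and `resumptionToken` arguments. The endpoint URL and the set name are hypothetical placeholders.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvest by datestamp
        if set_spec:
            params["set"] = set_spec    # selective harvest by set
    return f"{base_url}?{urlencode(params)}"
```

A harvester would first request records with `metadataPrefix` (and optionally `from` or `set`). It would then repeat the request with each `resumptionToken` the repository returns until the list is complete.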
Transformative Action
- Concept of “collecting”
- Modes of selection
- Bridging communities of practice
“Where do you stop?”