Published by Paula Barrett. Modified over 9 years ago.
1 CS 502: Computing Methods for Digital Libraries
  Lecture 28: Current work in preservation
2 Administration
  Review class: Tuesday, 12:20. Room to be announced on web site "Notices". Format: questions (by you) and answers (by me).
  Laptops: Return before examination. Bring receipt to examination.
  Examination:
    Part 1: 5 questions, 1.5-hour time limit
    Part 2: nomad experiment questionnaire, no time limit
3 Education and research
  Digital libraries in a state of flux:
    Much of this class has described material that is still experimental
    Cornell people and our colleagues are actively involved in many aspects
  This class:
    Recent activities in preservation of materials on the web
    Some of my recent work
4 Some light reading
  William Y. Arms, "Preservation of scientific serials: three current examples." Journal of Electronic Publishing, 5(2), December 1999. http://www.press.umich.edu/jep/05-02/arms.html
  William Y. Arms, "Economic models for open-access publishing." iMP, March 2000. http://www.cisp.org/imp/march_2000/03_00arms.htm
5 Preservation of serials
  September 1999 -- Workshop chaired by Deanna Marcum, Don Waters, Cliff Lynch
  Issues in preserving online journals for 100 years
  Invited paper by William Arms: "Preservation of Scientific Serials: Three Current Examples"
    ACM Digital Library
    Internet RFC Series
    D-Lib Magazine
  Motivated by the realization that early preservation work may be tackling the wrong problem
6 Publisher's role in preservation
  Life cycle of electronic publication
    1. Active management by publisher
    2. Long-term preservation by another organization
  Overall observation
    The length of #1 may be very short or hundreds of years
    The most vulnerable time is the transition between #1 and #2
    Preservation discussions have emphasized #2 (e.g., 5 level model)
7 ACM Digital Library
  Organizational: ACM is a stable organization that considers the Digital Library one of its principal assets
  Rights: ACM either owns copyright or has full preservation rights
  Technical: Complex: relational database (schema), SGML (DTD), rendering software, private metadata system. Strong computing department
  Replication: No independent mirrors
8 Internet RFC Series
  Organizational: Complex relationship between Internet Society (ISOC), Internet Engineering Task Force (IETF) and RFC editor. Currently actively managed, but no long-term commitment. Secretariat & RFC editor -- income from meetings & grants
  Rights: ISOC and IETF have very broad rights
  Technical: Simple: text only (a few PostScript)
  Replication: Several independent mirrors
9 D-Lib Magazine
  Organizational: Published by CNRI, reliant on grants.
  Rights: Authors own rights in articles. CNRI owns rights in other materials.
  Technical: Simple: uses basic web technology. Used for experiments in DOIs, XML metadata, etc.
  Replication: Several independent mirrors
10 Approaches to preservation of the web
  Partnership with publishers: Publishers and libraries as partners
  Selective collection of open access web: Librarianship in a new domain
  Bulk collection of open access web: Automatic librarianship
11 Partnerships with publishers
  Library of Congress and UMI: US theses and dissertations
  American Physical Society and Cornell University: Journals in physics
  Elsevier Science: Policy statement on archiving
12 Approaches to preservation of the web
  Partnership with publishers: Publishers and libraries as partners
  Selective collection of open access web: Librarianship in a new domain
  Bulk collection of open access web: Automatic librarianship
  Cornell and Library of Congress
13 Selective preservation
  Selection of web sites
  Example: National Library of Australia
    national importance
    multiple versions (print and online)
    authority and research value
14 Selection of web sites
  Pragmatic considerations
    technical complexity -- not all standards are good
    frequency of making copies
    COST
  Librarianship in a new domain
15 Catalogs and indexes
  Example: CORC
    simple standard using Dublin Core
    tools for creating records
    COST
  Librarianship in a new domain
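To make "simple standard using Dublin Core" concrete, here is a minimal sketch of a catalog record built from a few Dublin Core elements. It is not CORC's record format or tooling: the `record` wrapper element and the `make_dc_record` helper are assumptions made for illustration. Only the Dublin Core element names and namespace are standard, and the sample values come from the first reading on slide 4.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def make_dc_record(fields):
    """Build a simple Dublin Core record from (element, value) pairs.

    The <record> wrapper is a hypothetical container chosen for this
    sketch; real systems define their own enclosing element.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = make_dc_record([
    ("title", "Preservation of scientific serials: three current examples"),
    ("creator", "William Y. Arms"),
    ("date", "1999-12"),
    ("identifier", "http://www.press.umich.edu/jep/05-02/arms.html"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Every element is optional and repeatable in simple Dublin Core, which is why the helper takes a list of pairs rather than a fixed set of arguments.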
16 Bulk collection: automatic librarianship
  Volumes of information are too great for human selection, indexing and management
  Examples:
    Kulturarw3 -- National Library of Sweden
    Internet Archive -- Brewster Kahle
  Automatic methods are used to collect, organize and provide access
17 Automatic librarianship: Collection
  Example: Internet Archive
    Collecting open access web since 1996
    Complete sweep of web approximately once a month
    HTML pages only
    14 terabytes of data (soon all online)
    Access for researchers using Unix tools
    7 people
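A sweep of the web depends on one repeated step: parse a fetched HTML page, extract its links, and queue the ones not yet visited. The sketch below illustrates only that step, using Python's standard library. It is not the Internet Archive's crawler; `LinkExtractor`, `next_urls`, and the example.org URLs are hypothetical names chosen for this example. Fetching, politeness rules, and the check that a response really is HTML are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) links from the <a href> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def next_urls(html, base_url, seen):
    """Return links on the page not seen before, and record them in `seen`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    fresh = []
    for url in parser.links:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


page = (
    '<a href="/a.html">A</a> <a href="b.html#top">B</a> '
    '<a href="mailto:x@example.org">mail</a> <a href="/a.html">A again</a>'
)
seen = {"http://example.org/"}
found = next_urls(page, "http://example.org/dir/index.html", seen)
print(found)
```

The `seen` set is what keeps a sweep from revisiting pages; at the scale described on the slide it would have to live on disk rather than in memory.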
18 Automatic librarianship: Indexing
  Examples:
    ResearchIndex
    Google
19 Legal issues
  Legal position of archives that download open access materials is unclear
  Preservation is in the national interest
  See the discussion in The Digital Dilemma (National Academy of Sciences, 1999)
  Crucial factor is economic impact on copyright owners
  Library of Congress has no special position except via copyright deposit
  U.S. Copyright Office has offered to help clarify the position
20 Current activities
  Selection: guidelines and prototypes
    Library of Congress working group
    Political web sites
  Tools
    Web site mirroring
    Web site profiler (M.Eng. project)
  Copyright
    Ad hoc working group (Deanna Marcum, Bill Arms)
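One small piece of any web site mirroring tool is deciding where each copy is stored, so that repeated snapshots of the same site do not overwrite one another. The sketch below shows one possible layout, assuming a directory per snapshot date and per host. The `mirror_path` function, the `mirror` root directory, and the snapshot date are hypothetical; nothing here describes the tools the working group actually used. Query strings and path sanitizing are ignored.

```python
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlparse


def mirror_path(url, snapshot, root="mirror"):
    """Map a URL to a local path of the form root/YYYY-MM-DD/host/path."""
    parts = urlparse(url)
    path = parts.path
    # A directory-style URL is stored under a conventional file name.
    if path == "" or path.endswith("/"):
        path += "index.html"
    return PurePosixPath(root, snapshot.isoformat(), parts.netloc, path.lstrip("/"))


snapshot = date(2000, 5, 1)  # hypothetical snapshot date
article = mirror_path("http://www.press.umich.edu/jep/05-02/arms.html", snapshot)
front = mirror_path("http://www.press.umich.edu/", snapshot)
print(article)
print(front)
```

Keeping the date at the top of the tree makes each snapshot a self-contained copy, at the cost of storing unchanged pages once per snapshot.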
21 CS 502: Computing Methods for Digital Libraries
  THE END