HARVARD E-JOURNAL ARCHIVING STUDY Dale Flecker June, 2002.


JOURNAL ARCHIVING IN THE PAPER ERA
Large-scale redundancy
Access copy and archival copy usually the same
Not just storage, but preservation
  – includes environmental control, library binding, repair, reformatting...
Deliberate, long-term archiving largely the role of national and research libraries

E-JOURNAL MODEL IS DIFFERENT
“Copies” are remote, held in publisher systems
  – Not replicated across different institutions
Perpetual license provides limited comfort in the absence of independent copies
Long-term preservation involves very different issues than day-to-day access

E-JOURNAL ARCHIVING A GROWING PROBLEM
Libraries bearing double costs
  – the e-journals users prefer
  – the paper for preservation
Publishers cannot convert totally to digital
  – authors and editors distrust e-only journals because of concerns about persistence
  – libraries demand paper for preservation
Libraries preserving paper version, but electronic more complete, increasingly the copy of record

MELLON E-JOURNAL ARCHIVING PROGRAM
13 institutions invited to submit proposals for planning projects
Two approaches
  – Large-scale distributed replication (LOCKSS)
  – Centralized archives serving a wider community

CENTRAL ARCHIVES PLANNING PROJECTS
Publisher-based
  – Harvard (Wiley, Blackwell, University of Chicago Press)
  – Penn (Oxford and Cambridge University Presses)
  – Yale (Elsevier)
Discipline-based
  – Cornell (agriculture)
  – NYPL (performing arts)
Dynamic e-journals
  – MIT

FOUR BASIC ASSUMPTIONS
Archive should be independent of publishers
  – responsibility of institutions for whom archiving is a core mission
Archiving requires active publisher partnership
Address long timeframes (100 years?)
Archive design based on Open Archival Information System (OAIS) model

CENTRAL ARCHIVE MODEL
Archive negotiates relationship with publisher
Publisher deposits content regularly
Content accompanied by metadata to support discovery and preservation
Archived content only accessible under specific conditions
Archive assumes responsibility for long-term preservation

SOME INTERESTING QUESTIONS
What is archived?
In what format?
When is archive accessible?
Who can access archived content?
What does the archive “preserve”?
Who does archiving?
How is the archive paid for?
How is the archive governed?

WHAT CONTENT IS ARCHIVED?
E-journals not simply articles…

SOME COMMON STUFF
Journal description
Editorial board
Instructions to authors
Rights and usage terms
Copyright statement
Ordering information
Reprint information
Indexes
Career information
News
Events lists
Discussion fora
Editorials
Errata
Reviewers
Conference announcements

HARD AREAS
Masthead, “front matter” stored as web pages, not in content management systems
No control over the format of “associated materials” (datasets, images, tables, etc.)
Advertising very complex
  – dynamic, frequently from third party, can involve country-specific complexities
Links frequently separate from articles
  – regularly updated, sometimes dynamic

OUR INCLINATION
Exclude little except advertisements
  – based on discussions with librarians and scholars
  – different from most “local loading”
Articles include supplementary materials
Include an “issue object” in addition to the article components
  – masthead, news, jobs, meetings, etc.

FORMAT FOR ARCHIVED ARTICLES?

PDF?
PDF almost universally available from publishers
  – and the only format available for some journals
There are qualms...
  – proprietary
  – marked-up for display, not meaning
  – supports limited functionality
  – long-term “preservability” unclear
  – unlikely to remain the universal format over time

MARKED-UP TEXT?
SGML/XML increasingly common
  – and likely to become more so
Greater functionality, easier migration as technology changes
Complex
  – DTDs vary widely from publisher to publisher
  – DTDs far from stable
  – archive documentation and rendering would be complex

“INTERCHANGE” ARTICLE DTD
Intended for exchanging content between independent players
Reduces complexity of interaction
  – archive needs to document, migrate, and display only one format
      – archive can choose whether to maintain articles in interchange DTD, or transform at ingest for long-term storage
  – publisher needs to deposit only one format for all archives
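
The complexity argument on this slide can be made concrete with a rough count. The sketch below is an illustration only: the function name and the counting model (one format relationship per publisher-archive pair without a shared DTD, one per party with it) are assumptions added here, not figures from the study.

```python
def format_relationships(publishers: int, archives: int, use_interchange: bool) -> int:
    """Count the format relationships that must be documented and maintained.

    Assumed model (not from the study):
    - without a shared DTD, every publisher/archive pair is its own
      format relationship;
    - with an interchange DTD, each publisher maintains one transformation
      into it and each archive maintains the one interchange format.
    """
    if publishers < 0 or archives < 0:
        raise ValueError("counts must be non-negative")
    if use_interchange:
        return publishers + archives
    return publishers * archives


if __name__ == "__main__":
    # Hypothetical community: 6 publishers depositing with 3 archives.
    print(format_relationships(6, 3, use_interchange=False))
    print(format_relationships(6, 3, use_interchange=True))
```

Under these assumptions the saving grows with the number of participants, which is the point of an interchange format for "independent players".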

“INTERCHANGE” ARTICLE DTD
Mellon, Harvard, National Library of Medicine, 2 consultants (Inera, Mulberry) working on draft standard DTD
Design based on current publisher practice
  – must be easy for publishers to produce
  – homogenizes many elements
  – leaves options in some difficult areas
  – eliminates elements specific to individual publisher delivery systems

INTERCHANGE DTD ISSUES
How low is the common denominator?
What gets lost?
  – inevitably sacrifices some functionality and original appearance
Transformation from publisher’s “native” DTD involves risks
Some technically difficult areas
  – extended character sets, mathematical and chemical formulae, tables, “generated text”

SGML/XML QUALITY CONTROL PROBLEM
SGML/XML is an output rather than the input for many publishers today
  – may not fully reflect the output (PDF, print) that users see day-to-day… how do you know it is good?
If SGML/XML is transformed for deposit, errors can be introduced
Quality control of ingested content is expensive but critical for a sound archive

ARCHIVE MORE THAN ONE FORMAT?
Publisher-based archive must accept PDF in any case (only format available for some titles)
  – so include both SGML and PDF when available?
      – belt and suspenders
Accept publisher’s original SGML also?
  – preserve information lost in conversion to interchange DTD
  – maintenance over time problematic

WHEN IS ARCHIVE ACCESSIBLE?
Most publishers instinctively prefer “dark” archives
  – does not compete with publisher’s service
If “dark”, what “trigger events” make it accessible?
  – after a given period of time (“moving wall”)?
  – when content is not otherwise accessible (“failsafe”)?
  – only when content enters the public domain?
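
The three candidate trigger events can be read as a small access policy. The following is a minimal sketch under stated assumptions: the class names, fields, and the rule that any enabled trigger opens the content are hypothetical choices made for illustration; the slides leave the actual policy as an open question.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ArchivedIssue:
    """Hypothetical record for one archived journal issue."""
    published: date
    available_from_publisher: bool
    in_public_domain: bool


@dataclass
class TriggerPolicy:
    """Which trigger events are enabled for a dark archive (all assumed)."""
    moving_wall_years: Optional[int] = None  # None disables the moving wall
    failsafe: bool = False                   # open when publisher access is gone
    public_domain: bool = False              # open when content is public domain


def is_accessible(issue: ArchivedIssue, policy: TriggerPolicy, today: date) -> bool:
    """Return True if any enabled trigger event has fired for this issue."""
    if policy.moving_wall_years is not None:
        age_years = (today - issue.published).days / 365.25
        if age_years >= policy.moving_wall_years:
            return True
    if policy.failsafe and not issue.available_from_publisher:
        return True
    if policy.public_domain and issue.in_public_domain:
        return True
    return False


if __name__ == "__main__":
    issue = ArchivedIssue(date(1998, 1, 1), available_from_publisher=True,
                          in_public_domain=False)
    wall = TriggerPolicy(moving_wall_years=5)
    print(is_accessible(issue, wall, date(2002, 6, 1)))  # inside the wall
    print(is_accessible(issue, wall, date(2004, 1, 1)))  # past the wall
```

A policy with no triggers enabled keeps the archive permanently dark, which is the case the next slide warns about.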

IS “DARK” DANGEROUS?
If content is dark, how do you know it is still good? (real users are the best auditors)

WHO CAN ACCESS ARCHIVE CONTENT?
Just other subscribing institutions?
  – does the archive need to maintain complex records of license rights?
      – defining licensees a nightmare
      – tracking license changes over time another nightmare
Individual subscribers?
  – an even greater nightmare
Everybody?
  – dramatically easier to administer

WHAT DOES THE ARCHIVE PRESERVE?
Preservation is a format-by-format issue
  – and most e-journals are composed of many formats
How much “look and feel” preserved?
Just preserve the “core intellectual content”?
Does archive ensure content remains “render-able” as technology changes?

HARVARD’S DIGITAL REPOSITORY
Repository specifies preferred (“normative”) formats, which will be kept useable
Just maintain bits for others
  – for e-journals this is likely for many “associated materials” (datasets, models, etc.)
      – generally accepted in ANY format
      – maintaining the viability of such wildly heterogeneous materials unrealistic
  – keep unaltered for future “digital archeology”
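
The two service levels described here (normative formats kept useable, everything else kept as unaltered bits) amount to a lookup at ingest. A minimal sketch, assuming formats are identified by file extension and using a placeholder normative set; the slides do not list which formats Harvard's repository actually treats as normative.

```python
from pathlib import PurePath
from typing import FrozenSet

# Placeholder set for illustration only; not the repository's real list.
ASSUMED_NORMATIVE_EXTENSIONS: FrozenSet[str] = frozenset({".xml"})


def preservation_level(
    filename: str,
    normative: FrozenSet[str] = ASSUMED_NORMATIVE_EXTENSIONS,
) -> str:
    """Classify a deposited file by the level of preservation it would get.

    "keep useable": the format is normative, so the archive commits to
                    migrating it as technology changes.
    "bits only":    the file is stored unaltered for future digital archeology.
    """
    extension = PurePath(filename).suffix.lower()
    return "keep useable" if extension in normative else "bits only"


if __name__ == "__main__":
    for name in ("article.xml", "survey-data.sav", "README"):
        print(name, "->", preservation_level(name))
```

A production repository would more likely identify formats by inspecting file contents rather than trusting extensions; the extension check is used here only to keep the sketch short.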

WHO DOES ARCHIVING?
“Common good” activity
  – model based on a few archives serving many subscribers
Is this an appropriate role for individual universities?
  – research libraries have technical capability, relationships with publishers and subscribers
  – BUT how archiving would be paid for is central…

HOW IS THE ARCHIVE PAID FOR?
First question: who benefits?
  – publishers, libraries, authors, scholarly societies…
  – is there a way to share costs?
Cost categories include
  – preparation of “archivable” objects
  – ingestion and quality control
  – long-term storage
  – preservation

PROPOSED MODEL
Publisher assumes cost of preparing objects in standard format (whenever possible)
Deposited material accompanied by two-part fee from publisher
  – ingest fee to cover up-front costs
      – varies with publisher effort to create easily archived objects???
  – “dowry” to create maintenance endowment
Real funding sources include subscribers, authors, societies
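
One way to read the two-part fee is as an up-front charge plus an endowment sized so that its annual return pays for upkeep. The sketch below assumes a simple perpetuity (dowry = annual maintenance cost / real rate of return); that formula, the function name, and the sample figures are illustrative assumptions, not part of the proposed model.

```python
from typing import Dict


def two_part_fee(
    ingest_cost: float,
    annual_maintenance_cost: float,
    real_return_rate: float,
) -> Dict[str, float]:
    """Estimate the ingest fee and dowry for one deposit.

    Assumes the dowry is an endowment whose yearly real return
    (real_return_rate, e.g. 0.04 for 4%) covers maintenance indefinitely.
    """
    if real_return_rate <= 0:
        raise ValueError("real_return_rate must be positive")
    dowry = annual_maintenance_cost / real_return_rate
    return {
        "ingest_fee": ingest_cost,
        "dowry": dowry,
        "total": ingest_cost + dowry,
    }


if __name__ == "__main__":
    # Hypothetical figures: 1,000 to ingest, 200 per year to maintain, 4% return.
    print(two_part_fee(1000.0, 200.0, 0.04))
```

With these hypothetical inputs the dowry comes to roughly 5,000, several times the ingest fee, which suggests why the slide treats long-term maintenance as the harder part of the funding question.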

HOW IS THE ARCHIVE GOVERNED?
Publishers hand their intellectual property to an independent party: do they have a continuing say?
Are there other stakeholders who should also have a say?

HARVARD’S MODEL ARCHIVE
Accept content for all titles a publisher produces
  – archive as many journal elements as possible
Maintain an archive serving the entire community
Store and maintain more robust formats (e.g., XML) when possible
Collect metadata to support administration and preservation

HARVARD’S MODEL ARCHIVE
Requires only a few archival copies of any given journal
Archive assumes responsibility for preservation migration when canonical versions deposited
Organizational and economic model difficult

NEXT? Over to Kevin….