Distributed Digital Preservation ETD Workshop. Gail McMillan, Virginia Tech; Martin Halbert, University of North Texas. 14th International Symposium on ETDs.

Presentation transcript:

Distributed Digital Preservation ETD Workshop
Gail McMillan, Virginia Tech
Martin Halbert, University of North Texas
14th International Symposium on ETDs
University of Cape Town, South Africa
Friday, September 16, 2011

Instructors
• Gail McMillan
  ▪ Director, Digital Library and Archives, University Libraries, Virginia Tech
• Martin Halbert
  ▪ Dean of Libraries, University of North Texas
  ▪ President, MetaArchive Cooperative
9/16/2011 DDP Workshop for ETDs

Attendees
• Please state your name and institution.
• Does your university currently accept ETDs? Alternatively, are you considering an ETD program?
• What brings you in this morning?
• What sorts of institutional repository solutions are represented?
• What do you hope to get out of this workshop?

Pre-Registered Attendees
• 15 universities
• 2 national libraries
• 5 organizations
• ? others
• 19 Africa: Botswana, Ethiopia, Ghana, South Africa, Uganda
• 1 Europe: Germany
• 1 South America: Peru
• 3 USA: Nebraska, Texas, Virginia

Agenda
1:30 – 1:45  Welcome, Introductions, Overview of Workshop
1:45 – 2:00  ETDs and Preservation Needs
2:00 – 2:30  MetaArchive and Distributed Preservation
2:30 – 2:45  NDLTD/MetaArchive ETD DDPN Archive
2:45 – 3:00  Break
3:00 – 3:30  Collections Management for Preservation
3:30 – 4:00  MetaArchive and its Member Roles and Responsibilities
4:00 – 4:30  ETD Lifecycle Management Project
4:30 – 5:00  Questions and Answers

ETDs and Preservation Needs Prof. Gail McMillan Director, Digital Library and Archives, Virginia Tech Distributed Digital Preservation for ETDs Workshop Cape Town, South Africa Friday, September 16, 2011

What is Digital Preservation?
• Systematic management of digital works over an indefinite period of time
• Processes and activities ensure continued access to works in digital formats
• Ongoing attention and constant resources: effort, time, money
• Technological and organizational change are obstacles to preserving beyond a few years.

Preservation is more than Back-ups
• Back-ups address short-term problems with minimal investment
  ▪ Copies for restoration after data loss event
  ▪ Stored nearby in a single location
• Long-term, error-free storage
  ▪ Ongoing investment
  ▪ Dispersed secure caches = DDPN (Distributed Digital Preservation Network)

NDLTD Preservation Strategy: MetaArchive Cooperative
• MetaArchive is a Private LOCKSS Network (PLN)
• Programmatically and securely
  ▪ Harvests ETDs from partner repositories
  ▪ Distributes ETDs only to partners’ servers
  ▪ Regularly audits and repairs files as needed
• ETD Preservation Network is a Dark Archive.
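The "audits and repairs" step can be pictured with a small sketch. This is a simplified majority-vote illustration, not the actual LOCKSS polling protocol; the cache names, the sample bytes, and the `audit_and_repair` function are all hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one cached copy."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Replace any copy whose digest disagrees with the majority digest.

    `copies` maps a (hypothetical) cache name to the bytes it holds.
    """
    digests = {site: digest(data) for site, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        # No clear majority: a sketch like this cannot decide which copy is right.
        raise ValueError("no majority digest; manual intervention needed")
    good = next(data for site, data in copies.items() if digests[site] == winner)
    return {
        site: (data if digests[site] == winner else good)
        for site, data in copies.items()
    }


# Three caches hold the same ETD; one copy has simulated bit rot.
caches = {
    "cache-a": b"%PDF-1.4 thesis",
    "cache-b": b"%PDF-1.4 thesis",
    "cache-c": b"%PDF-1.4 th\x00sis",
}
repaired = audit_and_repair(caches)
```

After the call, every cache in `repaired` holds the majority version. The point of the sketch is only that several independent copies make a damaged one detectable and replaceable.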

NDLTD/MetaArchive ETD Participants
1. Auburn University
2. Boston College
3. Consorci de Biblioteques Universitàries de Catalunya
4. Florida State University
5. Georgia Tech
6. Indiana State University
7. Pontifícia Universidade Católica do Rio de Janeiro
8. Rice University
9. University of Louisville
10. Virginia Tech

ETD Preservation Survey
• Purpose: Gauge academic community’s interest in an ETD-specific archive
• 6 academic listservs
• 14 multiple-choice, short-answer questions
• Dec. 13 – April 10, 2008
• 96 institutions responded


ETD File Formats
MetaArchive’s file formats
• 85% PDF
• 30% JPG
• 27% WAV
• 24% GIF
• 23% HTML, MOV
• 21% AVI, MP3

ETD Collections are hosted by
• 26% DSpace
• 13% ETD-db
• 3% Fedora
• 1% EPrints
• 29% Locally developed systems
• 29% Others

Structure of ETD Collections
• 25% Subject-like categories
• 21% Everything-in-one
• 21% Year
• 9% Accessibility
• 7% Degree
It’s best to group ETDs into discrete and finite units such as annual cumulations.


MetaArchive Cooperative
• DDPN: 2004 –
• Separate preservation from access
  ▪ LOCKSS without public access
  ▪ Bit-level
• Sustained by fees (membership, consulting), grants
  ▪ Library of Congress (NDIIPP) awards
  ▪ NHPRC, NEH, IMLS
• Nonprofit corporation: charter, membership agreement
• Cooperative, not a vendor
• Training and model for others

NDLTD Preservation Strategy
• NDLTD and MetaArchive Cooperative
  ▪ Help higher education institutions provide long-term access to ETDs
• Institutions can achieve this goal by becoming part of the ETD Preservation Network.
  ▪ Participate in an NDLTD MetaArchive Preservation Network Workshop
  ▪ Join: NDLTD and MetaArchive


Dr. Martin Halbert Dean, UNT Libraries & President, MetaArchive Cooperative Distributed Digital Preservation for ETDs Workshop Cape Town, South Africa Friday, September 16, 2011

• What is the MetaArchive Cooperative? Why did we form it?
• What is distributed digital preservation? Why is it important for ETD preservation?
• What is LOCKSS? How does MetaArchive use the LOCKSS software?

• Planning meetings by librarians and archivists on concerns about preserving digital archives
• Sense that we needed to do something practical to help each other preserve our data
• Not based on studies, just the observation of our anxieties about keeping our (expensive) digital materials preserved and viable.

From the NDIIPP Website on the Importance of Digital Preservation

• 66% of cultural heritage institutions (academic libraries, archives, art museums, public libraries, and other similar kinds of institutions) report that no one is responsible for digital preservation activities
• 30% of all archives have been backed up one time or not at all
Source: 2005 NEDCC Survey by Bishoff and Clareson

“The increased number and diversity of those concerned with digital preservation, coupled with the current general scarcity of resources for preservation infrastructure, suggests that new collaborative relationships that cross institutional and sector boundaries could provide important and promising ways to deal with the data preservation challenge. These collaborations could potentially help spread the burden of preservation, create economies of scale needed to support it, and mitigate the risks of data loss.”
- The Need for Formalized Trust in Digital Repository Collaborative Infrastructure, NSF/JISC Repositories Workshop (April 16, 2007)

What differentiates a schedule for data backups from a digital preservation program?
• Backups are tactical measures. Backups are typically stored in a single location (often nearby or collocated with the servers backed up) and are performed only periodically. Backups are designed to address short-term data loss via minimal investment of money and staff time. Backups are better than nothing, but not a comprehensive solution to the problem of preserving information over time.
• Digital preservation is strategic. Preserving information over long periods requires systematic attention rather than benign neglect or unthinking actions.

What differentiates an IR program from a distributed digital preservation program?
• The IR is not distributed. The IR is a centralized approach aimed at managing information flow within the institution. It typically does not attempt to securely cache prioritized content at multiple geographically dispersed sites.
• DDP mobilizes efforts of multiple institutions. A digital preservation program entails a geographically dispersed set of secure caches of critical information. A true digital preservation program will require multi-institutional collaboration and at least some ongoing investment to realistically address the issues involved in preserving information over time.

Why are the characteristics of geographic distribution and security so important?
This strategy maximizes survivability of content in both individual and collective terms:
• Security reduces the likelihood that any single cache will be compromised.
• Distribution reduces the likelihood that the loss of any single cache will lead to a loss of the preserved content.
By creating a collaborative network for secure and distributed preservation, a group can also work together on more complex issues such as format migration.
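A back-of-the-envelope sketch shows why distribution helps. It assumes, purely for illustration, that each cache fails independently with the same probability in a given period; the 5% figure is invented, and real failures (shared software bugs, regional disasters) are not fully independent.

```python
def loss_probability(p_single: float, n_caches: int) -> float:
    """Probability that all n caches are lost in the same period,
    assuming independent failures with equal probability p_single."""
    return p_single ** n_caches


# Compare one cache, three caches, and six caches at an illustrative 5% rate.
for n in (1, 3, 6):
    print(n, loss_probability(0.05, n))
```

Under these assumptions the chance of losing every copy shrinks geometrically as caches are added, which is the intuition behind dispersing content across several sites.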

• A single cultural heritage organization is unlikely to have the capability to operate several geographically dispersed and securely maintained servers
• Collaboration between institutions on technological solutions is essential
• Similarly, inter-institutional agreements must be put in place or there will be no commitment to act in concert over time

Lessons from the NDIIPP Archive Ingest and Handling Test (AIHT) and other shared archiving experiments:
• Participants encountered many unexpected incompatibilities because of different systems and data packaging
• Much of the cost of preserving digital material lies in coordinating the organizational and institutional imperatives of preservation, not in the technological costs of storage space

A distributed digital preservation cooperative for digital archives
• Established under the auspices of and with funding from the National Digital Information and Infrastructure Preservation Program (NDIIPP) of the Library of Congress
• A functioning DDP network and cooperative for libraries and other cultural memory organizations
• Sustained by cooperative fee memberships, LC contracts, and other sponsored funding
• Provides training and models for other groups to establish similar distributed digital preservation networks
• Fosters broader awareness of digital preservation issues

MetaArchive Cooperative
• A distributed digital preservation cooperative for digital archives, based on LOCKSS
• 286 TB network with 24 secure caches
• Preserving collections for/with 18 members and 46 institutions in 4 countries
• Actively growing (outreach campaign in progress, aim to double membership)
• Provides preservation consulting and training

LOCKSS E-Journal Preservation Network Software
• Developed at Stanford University by Vicky Reich and David Rosenthal
• Enables libraries to preserve subscribed electronic journal content
• Used by hundreds of libraries worldwide
• MetaArchive adapted this software for preserving digital archives

• Format agnostic
• Collections include:
  ▪ Images
  ▪ Text files
  ▪ Multimedia files
  ▪ Datasets
  ▪ Program executables

MetaArchive: 46 institutions, 12 states/districts, 4 countries

• Began hosting workshops in distributed digital preservation strategies in 2007
• Instruct new MetaArchive members in processes
• Advise other groups considering DDP approaches
• Assisted in creation of two additional DDPNs
  ▪ Alabama: state digitization projects
  ▪ Arizona: state government records

• Conspectus Database (Original)
  ▪ Curators enter collection-level entries for collections
  ▪ Meant to be used for cooperative prioritization in DDP selection and decision-making activities
  ▪ Not interactive with some key MetaArchive systems (Cache Manager, Ingest Plugins)
• Second Generation Conspectus Database
  ▪ Integrates operation of all network functions
  ▪ Designed in concert with guidance from other private LOCKSS networks (PLNs) in ways that enable re-use

• Developed a new cooperative with guidance from a legal team, librarians, and intellectual property specialists
• Created core organizational documents in 2006: charter, membership agreement, papers of incorporation, business plans, etc.
• Allows members to understand their commitment and liability clearly

• Southern Digital Culture (initial collecting area, founding members were Southeastern)
• Transatlantic Slave Trade Historical Data (made cooperative international)
• Electronic Theses and Dissertations (inter-consortia strategic alliance with NDLTD)
• Early Modern Literature (broad area, with Folger Shakespeare Library as cornerstone)
• Additional archives regularly added

• LOCKSS (collaborative development of LOCKSS Cache Manager)
• Data-PASS Alliance (developing in-common standard and tools for Private LOCKSS Network (PLN) interoperation)
• ECHO DEPository Project (PLN interoperation standard using HandS)
• SDSC Chronopolis (PLN/SRB interoperation testing and bridges)

• Preservation Members are organizations responsible for the ongoing activity of preserving digital content. At a minimum, every preservation site must include responsible staff and a node server of the relevant preservation network. Preservation sites collectively comprise a preservation network.
• Sustaining Members are preservation sites that wish to participate as leaders of the cooperative and serve on the Steering Committee.
• Collaborative Members act as preservation sites for groups of institutions.

MetaArchive Home Page

Dr. Martin Halbert Dean, UNT Libraries & President, MetaArchive Cooperative Distributed Digital Preservation for ETDs Workshop Cape Town, South Africa Friday, September 16, 2011

• Overview of the NDLTD/MetaArchive ETD Program
• MetaArchive Member Strategies for Preserving ETDs
• Considerations for Prospective ETD Preservation Sites

Overview of the NDLTD/MetaArchive ETD Program
• Started in 2008 with the establishment of a partnership between MetaArchive and NDLTD
• Project allowed us to begin studying the genre-specific preservation issues that arise with ETD collections
• Initial partners: Virginia Tech, Boston College, Georgia Tech, Rice U, Emory U, and Auburn U.
• Highly successful: preserving ETDs for most of our members, including a consortium of 20 members in Barcelona, Spain.

MetaArchive Member Work on ETD Preservation
• Studied the "calf path" issues that arise in ETD programs
• Analyzed a range of ETD repository structures and developed exchange mechanisms between those and LOCKSS (CONTENTdm, ETD-db, DSpace)
• Provided simple addition mechanisms so that as new and embargoed ETDs are added, members are able to easily add them to the archive
• Developed mechanisms to version content, so that if ETDs are changed or replaced, the changes are reflected in the preservation copies
• Determined the need for documented best practices for ETD preservation readiness (IMLS project)

Considerations for Prospective ETD Preservation Sites
• A partnership between college and libraries has to be established with particular roles and responsibilities.
• Metadata, metadata, metadata! Many programs have their students assigning this using either DC or non-standard metadata formats due to CAS involvement and ownership.
• The folder and file structure in which ETD collections are stored matters greatly, especially since preservation will be ongoing. New files need to be submitted each semester for preservation, and that is easiest if the storage structure allows for this. Grouping by year may be helpful as a start.

Considerations for Prospective ETD Preservation Sites (cont.)
• Issues of rights management and embargo must be managed well. Dark archiving for everything? For portions? Most institutions need that dark element for their preservation work in ETD collections.
• The partnership you choose matters greatly. Look at the differences between what you can do with content digitized by a vendor vs. doing it yourself.
• Similar issues arise in preservation: what are your rights to your own content? Are there unnecessary charges or restrictions? What happens if you want to move to a new solution? Taking an active role (like members in MetaArchive do) helps to ensure you are driving the solution, not being driven by the vendor's constraints.

ETD Collections Management for Preservation Readiness Prof. Gail McMillan Director, Digital Library and Archives, Virginia Tech Distributed Digital Preservation for ETDs Workshop Cape Town, South Africa Friday, September 16, 2011

Best Practices: Directory Names
Unique, standardized, uniform, easy to decipher: Timestamp
• etd-mmddyyyy-tttttt
• ETD submission began on Oct. 2, 2007 at 2:48:64 pm
• Use same naming convention for scanned and born-digital TDs
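The timestamp convention above can be sketched in a few lines. This assumes that `tttttt` stands for the time as hhmmss on a 24-hour clock, which the slide does not spell out; the function name and the sample date are hypothetical.

```python
from datetime import datetime


def etd_directory_name(started: datetime) -> str:
    """Format a submission start time as etd-mmddyyyy-tttttt.

    Assumption: tttttt is hhmmss in 24-hour time.
    """
    return started.strftime("etd-%m%d%Y-%H%M%S")


# A hypothetical submission started on Oct. 2, 2007 at 2:48:30 pm.
name = etd_directory_name(datetime(2007, 10, 2, 14, 48, 30))
print(name)
```

Because the name is derived from the moment the submission began, two ETDs only collide if they start in the same second, which keeps directory names unique without a central counter.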

Best Practices: File Names
• etd.pdf NOT
  ▪ Identical file names can work when directory names are unique
  ▪ May not be good for local management
• Lastname_initials_doctype_year.format
  ▪ McMillanGM_T_1981.pdf
  ▪ SoundararajanS_D_2010.pdf
  ▪ SoundararajanS_D_2010_copyright.pdf
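A small helper can keep file names consistent with the pattern above. This sketch follows the slide's examples, where the initials are joined directly to the last name; the function name and the optional `suffix` parameter are hypothetical, and reading `T` and `D` as thesis and dissertation is an assumption.

```python
def etd_file_name(last_name: str, initials: str, doctype: str,
                  year: int, fmt: str = "pdf", suffix: str = "") -> str:
    """Build a file name such as McMillanGM_T_1981.pdf.

    doctype is assumed to be "T" (thesis) or "D" (dissertation);
    suffix marks a supplementary file, e.g. "copyright".
    """
    stem = f"{last_name}{initials}_{doctype}_{year}"
    if suffix:
        stem += f"_{suffix}"
    return f"{stem}.{fmt}"


thesis = etd_file_name("McMillan", "GM", "T", 1981)
permission = etd_file_name("Soundararajan", "S", "D", 2010, suffix="copyright")
```

Generating names from structured fields, rather than typing them by hand, is one way to avoid the drift that makes later triage necessary.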

Best Practices: Archival Units
• Discrete, unchanging groupings
  ▪ Periodic ingest into preservation caches
  ▪ etd …
• Not too big and not too small
  ▪ >20 GB
  ▪ Divide directories into subunits
  ▪ etd …-etd ….
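One way to picture the subdivision step is a greedy grouping by size. This sketch reads the slide as saying that a grouping larger than roughly 20 GB should be divided into subunits; that reading, the function name, and the directory names and sizes are all assumptions for illustration.

```python
def split_into_subunits(sizes_gb: dict[str, float],
                        limit_gb: float = 20.0) -> list[list[str]]:
    """Group directories (in sorted name order) into consecutive subunits
    whose combined size stays at or under limit_gb where possible."""
    units: list[list[str]] = []
    current: list[str] = []
    total = 0.0
    for name in sorted(sizes_gb):
        size = sizes_gb[name]
        # Close the current subunit if adding this directory would exceed the limit.
        if current and total + size > limit_gb:
            units.append(current)
            current, total = [], 0.0
        current.append(name)
        total += size
    if current:
        units.append(current)
    return units


# Hypothetical directory sizes in GB.
sizes = {"etd-a": 8, "etd-b": 9, "etd-c": 7, "etd-d": 12}
subunits = split_into_subunits(sizes)
```

Because names sort by their timestamp-based prefix, each subunit covers a contiguous range of directories, which matches the range-style labels on the slide. A single directory larger than the limit still ends up as its own subunit.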

Best Practices: Triage for ETDs
• Recognize there is a problem.
• Stop poor practices.
• Isolate the problem files.
• How? Data wrangling
  ▪ Define direct path for ingest into the network
  ▪ Everything that does not follow the definition becomes one group: an outlier Archival Unit

Best Practices: Web Accessible
• Keep ETDs on servers (live, spinning discs)
  ▪ Not on CDs or other static storage devices
  ▪ Avoids problems: locating discs, loading them onto servers, rectifying errors, failed media
• Declining cost of online storage
  ▪ $1/GB/year

Best Practices: Web Accessible
• Public
• NDLTD preservation partners only
  ▪ Add IPs to server’s firewall to enable access
  ▪ Restricted and Withheld/Embargoed ETDs
• Permission
  ▪ “LOCKSS system has permission to collect, preserve, and serve this Archival Unit”
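The permission statement has to appear on a web page that the crawler can reach. A minimal sketch of generating such a page follows; the page layout, the `manifest_page` function, and the example URL are hypothetical, and only the permission sentence itself comes from the slide.

```python
from html import escape

# Permission sentence quoted on the slide.
PERMISSION = ("LOCKSS system has permission to collect, preserve, "
              "and serve this Archival Unit")


def manifest_page(title: str, links: list[str]) -> str:
    """Return a simple HTML page carrying the permission statement
    and links to the content of one archival unit."""
    items = "\n".join(
        f'    <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in links
    )
    return f"""<!DOCTYPE html>
<html>
<head><title>{escape(title)}</title></head>
<body>
  <p>{PERMISSION}</p>
  <ul>
{items}
  </ul>
</body>
</html>
"""


page = manifest_page("ETD manifest 2007", ["https://etd.example.edu/2007/"])
```

One page per archival unit (for example, per year) keeps the permission scoped to the same discrete groupings used for ingest.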

Best Practices: Metadata Discipline
• Describe institution’s individual ETDs
  ▪ ETD-MS: ETD Metadata Standard
  ▪ MARC: MAchine Readable Cataloging
• Describe institution’s ETD collection
  ▪ MetaArchive Conspectus Database

MetaArchive Conspectus Database: Collection-level Metadata (1)
• Title
• Describe the collection
• Subjects, key words/phrases
• Uniform Resource Identifier: usually a locator (URL) or name (URN)

MetaArchive Conspectus Database: Collection-level Metadata (2)
• Formatting, size, language(s)
• Formats
  ▪ image: jpg
  ▪ text: pdf
  ▪ video: mpeg
• Language(s) of the content
• Type of content
  ▪ Text, sound, datasets, software, animation, etc.
• Extent: size or duration of the entire collection

MetaArchive Conspectus Database: Accrual Information
• Anticipate growth of the ETD collection
• Accrual Periodicity
  ▪ How frequently will items be added?
  ▪ Yearly? Twice-yearly?
• Accrual Policy
  ▪ How is it decided to add items to the collection?
  ▪ Every approved ETD? Except embargoed ETDs?

MetaArchive Conspectus Database: Rights and Ownership
• Institution hosting ETD collection
• Publisher: entity responsible for making the ETDs available
• Rights: statement about who owns the copyright
• Access Rights: Unrestricted, Restricted, Embargoed
• Custodial History: provenance

MetaArchive Conspectus Database: Harvesting Information
• Details the web crawl to gather files into the ETD Dark Archive in the Preservation Network
• Harvest Procedure: Web crawl or OAI harvest
• Identifier: URI or URL
• Extra Parameters: Archival Units, e.g., year = 2007
• LOCKSS Manifest Page: permission to preserve
• OAI provider

ETD Management for Preservation
1. Live storage media
2. Standardize file, directory structures
3. Metadata discipline
4. Preservation viability, recovery program

Dr. Martin Halbert Dean, UNT Libraries & President, MetaArchive Cooperative Distributed Digital Preservation for ETDs Workshop Cape Town, South Africa Friday, September 16, 2011

• MetaArchive Charter and Membership Agreement
• Three types of membership that are available
• Associated fees and responsibilities

• Charter is a formative agreement that lays out the conceptual roles and responsibilities of participants
• Membership agreement is between new members and MetaArchive’s administrative nonprofit corporation
  ▪ Agreement to preserve content for specified period
  ▪ Pledge to not intentionally harm the network

• Program Managers are leaders who accept responsibility for coordinating the activities of a digital preservation network.
• Data Wranglers are programmers and other technically adept workers who prepare local digital archives for ingestion into a preservation network.
• System Administrators are staff members who maintain individual preservation node servers of the relevant preservation network.
• IR/ETD Program Managers are staff who are knowledgeable about ETD collection structure.

• A “Plugin” is written for collections selected for preservation
• Plugins are programs describing rules and structure for the “archival unit”
• Either local staff or MetaArchive staff write these plugins and install them in the network
• At least 6 dispersed sites are selected for repositing the archival unit
• Caching process begins, with updates following if necessary

• Be able to bring up and maintain a Linux server over time
• Task local staff with both program management and systems administration duties, and preferably data wrangling as well
• Contribute content and monitor system functioning occasionally
• Sign membership agreement and pay membership dues

• Sustaining Members: contribute the most and receive the most in terms of control and leadership (the Steering Committee is composed of representatives from Sustaining Members)
• Preservation Members: participants and beneficiaries, rather than leaders of the Cooperative
• Collaborative Members: groups of institutions that act as one unified member because they share a central server, allowing existing digital collaboratives to preserve their co-hosted content for a fraction of what it would cost to do so as individual members

• Three Membership Levels:
1. Preservation Members ($3,000/year): Ability to reposit content in the shared network infrastructure
2. Sustaining Site Members ($5,500/year): The above, plus a seat on the Steering Committee and participation in directing the cooperative
3. Collaborative Members (contact MetaArchive): Requires a group-negotiated membership
• All members are obligated to provide and operate a minimal server on the network and to accept at least as much content from others as they themselves reposit into the network
• Membership commitment is in three-year increments
• Membership fees are reduced for members joining both NDLTD and MetaArchive simultaneously
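The fee and reciprocity rules above reduce to simple arithmetic, sketched here. The function and variable names are hypothetical. The fees are the 2011 figures quoted on the slide. The Collaborative fee and the NDLTD joint-membership discount are left out because the slide gives no amounts for them.

```python
# Annual fees in US dollars, as quoted on the slide (2011).
ANNUAL_FEES_USD = {"preservation": 3000, "sustaining": 5500}
COMMITMENT_YEARS = 3  # membership is committed in three-year increments


def commitment_cost(level, terms=1):
    """Total fee for `terms` consecutive three-year commitments."""
    if level not in ANNUAL_FEES_USD:
        # Collaborative membership is negotiated, so no fixed fee exists here.
        raise ValueError(f"no fixed fee known for level {level!r}")
    if terms < 1:
        raise ValueError("terms must be at least 1")
    return ANNUAL_FEES_USD[level] * COMMITMENT_YEARS * terms


def meets_storage_obligation(hosted_for_others_gb, reposited_gb):
    """True if a member hosts at least as much as it reposits."""
    return hosted_for_others_gb >= reposited_gb


print(commitment_cost("preservation"))     # one three-year term
print(commitment_cost("sustaining", 2))    # two terms, six years
print(meets_storage_obligation(500, 400))
```

Under these figures, one three-year Preservation term costs 3 × $3,000 = $9,000, and one Sustaining term costs 3 × $5,500 = $16,500.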

Dr. Martin Halbert
Dean, UNT Libraries & President, MetaArchive Cooperative
Distributed Digital Preservation for ETDs Workshop
Cape Town, South Africa
Friday, September 16, 2011

Reasons for this project
• Universities have been steadily transitioning from traditional paper/microfilm to digital ETD submission, dissemination, and preservation processes.
• While this move from print-based to digital-based theses and dissertations greatly enhances the accessibility and sharing of graduate student research, it also raises grave concerns about the potential ephemerality of these digital resources.

Reasons for this project (cont.)
• The intended audience for this project includes academic libraries currently managing or prospectively considering programs for ETD preservation.
• How will institutions ensure that the electronic theses and dissertations they acquire from students today will be available to future researchers?
• We need to better understand, document, and address the preservation challenges presented by ETDs to ensure that colleges and universities have the requisite knowledge to properly curate these new collections.

ETD Lifecycle Management Project Goals and Products
• Dissemination of Guidance Documents for Lifecycle Management of ETDs: based on collaborative research between NDLTD and MetaArchive on how best to manage the lifecycle of ETDs
• Production of ETD Lifecycle Management Tools: modular micro-services that can be used alone or incorporated into larger repository systems to address targeted needs in managing ETDs throughout their lifecycle
• Creation of Educational Materials and Associated Workshop: materials will be made freely available and used in a workshop offered in the second year of the project. They will include curriculum syllabi, training handouts, PowerPoint presentations, exercises, and other relevant items

Project Partners
1. University of North Texas Libraries
2. Networked Digital Library of Theses and Dissertations (NDLTD)
3. Educopia Institute and MetaArchive Cooperative
4. Virginia Tech Library
5. Rice University Library
6. Boston College Library
7. Indiana State University Library
8. Pennsylvania State University Library
9. University of Arizona Library

Project Timeline
• The project will take place over a two-year period from October 2011 to September 2013
• The workshop will be held in February 2013 at the Texas ETD Association conference
• The project will create two public websites (maintained by MetaArchive and NDLTD) to disseminate the documents and micro-services produced

Contact information:
• Gail McMillan
• Martin Halbert
• Katherine Skinner