The Cost of Archiving: The AILLA Perspective Susan Smythe Kung, PhD 3rd INNET Conference “Costing and sustainable.

Presentation transcript:

The Cost of Archiving: The AILLA Perspective. Susan Smythe Kung, PhD. 3rd INNET Conference “Costing and sustainable funding of endangered language archives”, April 29, 2014

Increased number of deposits due to: increased awareness of the need to preserve primary language materials; increased awareness of AILLA; “new” requirement (of US federal funding agencies) for a Data Management Plan (DMP) – NSF requirement since Jan

3 Parts: Part 1 – AILLA’s background; Part 2 – The costing exercise; Part 3 – AILLA’s administrative costs

Part 1: AILLA’s Background

AILLA is a digital repository of multimedia resources in and about the indigenous languages of Latin America. It is a small, special collection within the Benson Latin American Collection at UT-Austin. The collections consist of (1) linguistic primary source field data such as field notes, audio and video recordings, photos and sketches in a wide range of genres (stories, myths, chants, songs, conversations, prayers, rituals, etc.) and (2) analyzed data such as grammars, dictionaries, ethnographies, and manuscripts.

AILLA’s Mission: Preservation: To preserve irreplaceable materials in and about indigenous languages of Latin America, especially primary source field data of the type that has traditionally not been publicly available. Access: To make these materials and/or their metadata available to everyone, especially indigenous people, over the Internet.

History: Founded as a joint project between the College of Liberal Arts (COLA) and the University of Texas Libraries (UTL) by Joel Sherzer (Anthropology), Anthony Woodbury (Linguistics), and Mark McFarland (UTL Digital Initiatives). Project began in 2000 with seed money from COLA. Pilot site launched March Permanent site launched Jan. 31, Repository and website upgrade to take place (we hope!).

Today: Jointly supported by COLA and UTL. Part of LLILAS Benson Latin American Studies and Collections. Located inside the Nettie Lee Benson Latin American Collection on the campus of the University of Texas at Austin.

AILLA Collection Statistics (stats as of August 29, 2014): 298 languages; 22 Latin American countries; 12,796 resources; 100,041 media files; 19,294 audio recordings (6,773 hrs, 14 min, 18 sec); 2,373 video recordings (1,215 hrs, 34 min, 23 sec)

AILLA Collection Statistics (cont’d): 5,302 digital texts (97,580 pages); 38,491 scanned pages; 4,331 images; only 20% restricted access; 1.8 TB; 138 depositors; over 5,000 registered users from all over the world

AILLA Staff: Full-time Manager – Susan Kung (supported by COLA & UTL) 2 Graduate Research Assistants, 20 hrs/wk ea. (supported by grant-funded projects)

Work is also done by: UTL Digital Library Services staff, who provide server management and minimal technical support (their salaries do NOT come out of the AILLA budget); undergraduate interns (paid university stipends, independent research credit, or volunteer); MLIS/MSIS Capstone (thesis) projects; volunteers.

AILLA’s Costs:
1. Digitization, curation, ingestion – Part 2
2. Data and metadata storage – Part 2
3. Administration – Part 3
4. Software development and maintenance – Not covered here
5. Data and metadata migration – Not covered here

Part 2: The Costing Exercise. Collection 1: Analog; Collection 2: Born-Digital

Collection 1 Analog Contents: 20 audio cassettes, each 60 min. long, in good condition (unknown number of recording events); metadata spreadsheet for recordings on cassettes; 5 transcriptions, hand-written (unknown # of pages); 200 photographs on photo paper + paper list of photo contents. Collection size (after digitization) = 100 GB

Additional specifications needed at AILLA for Collection 1: Q1: How many different speech events are on each tape? Our preference is to separate different speech events into separate resources. I’ll assume there are 3 narratives (of about 10 minutes) per side for a total of (3x2x20) 120 speech events. Q2: How long are the transcriptions? I’ll assume they are about 25 pages each, for a total of 125 scanned pages. Q3: How many research participants were involved? I’ll assume that there were 10 participants.

A resource is AILLA’s term for an organized bundle or set of related files. A resource might consist of just 1 file, e.g., a single mp3 audio file of a recorded narrative, or numerous files, e.g., simultaneous audio and video recordings of a speech event, plus an Elan transcription, or a semester’s worth of recorded lectures about indigenous languages, plus the class syllabus and handouts.

Collection 1 Audio: Required Tasks
1. Digitize the cassettes: each side of each cassette = 1 wav file; total wavs = 40; file names = tape1_sideA, etc.
2. Edit the wav files into individual speech events and assign AILLA IDs: assuming 3 speech events per side, total speech events = 3x2x20 = 120 wav files.
3. Convert wav (archival) files to mp3 (access) files.
4. Add AILLA IDs to the metadata spreadsheet and collect additional metadata about each speech event, e.g., length of wav, recording specifications, original source, etc.
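The file counts implied by these audio tasks can be sketched in a few lines of Python. This is only an illustration: the side labels A and B, the `.wav` extension on the names, and the `_eventN` suffix are assumptions (the slides give only `tape1_sideA, etc.`, and real AILLA IDs would take the place of the placeholder suffix).

```python
import os

TAPES = 20           # cassettes in Collection 1
SIDES = "AB"         # assumed side labels
EVENTS_PER_SIDE = 3  # assumption made in the slides


def side_filenames(tapes=TAPES, sides=SIDES):
    """One wav file per cassette side: tape1_sideA.wav, tape1_sideB.wav, ..."""
    return [f"tape{t}_side{s}.wav" for t in range(1, tapes + 1) for s in sides]


def event_filenames(side_file, events=EVENTS_PER_SIDE):
    """Placeholder names for the speech events cut from one side.

    A real workflow would assign AILLA IDs here; the _eventN suffix is hypothetical.
    """
    stem, ext = os.path.splitext(side_file)
    return [f"{stem}_event{n}{ext}" for n in range(1, events + 1)]


sides = side_filenames()
events = [name for side in sides for name in event_filenames(side)]
print(len(sides), len(events))  # 40 120
```

The two counts match the 40 digitized sides and the 3x2x20 = 120 speech events assumed in the slides.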

Collection 1 Paper Transcriptions: Required Tasks
1. Scan each page and create 5 multi-page tif files, assigning AILLA IDs as filenames at the same time.
2. Add a row to the spreadsheet for each AILLA ID and its metadata.
3. Convert tif (archival) files to pdf/a (access) files.

Collection 1 Paper Photos: Required Tasks
1. Scan each photo and create 8 multi-page tif files of 25 photos each; assign AILLA IDs.
2. Add AILLA IDs, photo contents from the paper list, and other metadata to the MD spreadsheet.
3. Convert tif (archival) files to jpg (access) files.

Collection 1: Ingestion Required Tasks
1. Create a collection for the depositor.
2. Add all of the research participants to AILLA’s “people database” – assume 10 participants.
3. Upload all files to the server (100 GB).
4. These steps are done together, but consecutively: create 121 AILLA resources (120 speech events & 1 photo resource), link the relevant files, enter the metadata & assign access level, and complete Spanish (or English) translations.

Collection 1: Total One-time Cost = $1,
Task    Audio     Paper Transcription   Paper Photos   Ingestion
1       $535.90   $75.90                $121.90        $
2       $52.90    $1.84                 $9.20          $
3       $92.00    $9.20                 $13.80         $
4       $4.60     0                     0              $
Total   $685.40   $86.94                $144.90        $
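As a quick consistency check, the per-task figures that survive in this table can be summed and compared with the stated column totals. The sketch below assumes that the task 4 entry reads as $4.60 for audio and 0 for the two paper columns; the Ingestion column is left out because its values are missing from the transcript.

```python
from decimal import Decimal

# Per-task costs (tasks 1-4, in dollars) as read from the slide.
# Reading task 4 as $4.60 / 0 / 0 is an assumption about the garbled row.
costs = {
    "Audio": ["535.90", "52.90", "92.00", "4.60"],
    "Paper Transcription": ["75.90", "1.84", "9.20", "0"],
    "Paper Photos": ["121.90", "9.20", "13.80", "0"],
}
stated_totals = {
    "Audio": "685.40",
    "Paper Transcription": "86.94",
    "Paper Photos": "144.90",
}


def column_total(values):
    """Sum a column of dollar strings exactly, without float rounding."""
    return sum((Decimal(v) for v in values), Decimal("0"))


for column, values in costs.items():
    total = column_total(values)
    # Each computed total should equal the total printed on the slide.
    assert total == Decimal(stated_totals[column]), column
    print(f"{column}: ${total}")
```

Under that reading, all three columns add up to the totals given on the slide.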

Collection 1: Recurring Cost = ???
Yearly server storage for 100 GB = $66/yr
Future file conversion when/if archival and access formats change = ???
Future upgrades of digital repository and asset management software = ???
Future file and metadata migration when repository and asset management software upgrades = ???

Collection 2 Born-Digital Contents: 150 audio wav files, average length = 15 min.; 20 video mp4 files, average length = 30 min.; 250 digital images; 120 eaf files (20 for video, 100 for audio); metadata spreadsheet listing contents of all files. Collection size = 150 GB

Additional specifications needed at AILLA for Collection 2: Q1: How many research participants were involved? Again, I’ll assume that there were 10 participants. Q2: What is the file format of the digital images? I’ll assume that it is jpg

Collection 2: Required Tasks for Digital Collections
Massage the metadata (study its organization, rearrange as necessary, add missing info).
Rename files w/ AILLA IDs: rename audio and video files and add the AILLA IDs to the MD spreadsheet; match each eaf file to its corresponding audio or video file, assign the appropriate related AILLA ID, rename the file, and rearrange the MD spreadsheet if necessary.
Create mp3 access copies from the wav files.
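The eaf-matching step can be sketched as follows. The slides do not say how an eaf file is paired with its recording, so this sketch assumes the two share a base filename; the ID strings are hypothetical placeholders, not real AILLA IDs.

```python
from pathlib import PurePath


def match_eaf_to_media(eaf_files, media_files):
    """Pair each eaf file with the audio/video file that has the same base name.

    Returns (matched, unmatched): a dict eaf -> media, and a list of eaf files
    with no counterpart, which would need manual review.
    """
    media_by_stem = {PurePath(m).stem: m for m in media_files}
    matched, unmatched = {}, []
    for eaf in eaf_files:
        media = media_by_stem.get(PurePath(eaf).stem)
        if media is None:
            unmatched.append(eaf)
        else:
            matched[eaf] = media
    return matched, unmatched


def rename_plan(matched, id_for_media):
    """Map old names to new names so an eaf shares the ID of its recording.

    id_for_media is a dict media filename -> ID (hypothetical ID strings).
    """
    plan = {}
    for eaf, media in matched.items():
        new_id = id_for_media[media]
        plan[media] = new_id + PurePath(media).suffix
        plan[eaf] = new_id + ".eaf"
    return plan


matched, unmatched = match_eaf_to_media(
    ["story1.eaf", "notes.eaf"], ["story1.wav", "dance2.mp4"]
)
print(rename_plan(matched, {"story1.wav": "ID001"}))
print(unmatched)
```

Returning a rename plan instead of renaming files directly leaves room to review the mapping and copy the new names into the metadata spreadsheet first.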

Collection 2: Ingestion Required Tasks
1. Create a collection for the depositor.
2. Add all of the research participants to AILLA’s “people database” – assume 10 participants.
3. Upload all files to the server (150 GB).
4. These steps are done together, but consecutively: create 171 AILLA resources (150 audio, 20 video & 1 photo resource), link the relevant files, enter the metadata & assign access level, and complete Spanish (or English) translations.

Collection 2: Total One-time Cost = $1,
Task    Digital    Ingestion
1       $23.00     $
2a      $57.50     $
2b      $6.90      NA
2c      $230.00    NA
3a      $5.75      $
3b      $190.90    NA
4       0          $
Total   $514.05    $

Collection 2: Recurring Cost = ???
Yearly server storage for 150 GB = $99/yr
Future file conversion when/if archival and access formats change = ???
Future upgrades of digital repository and asset management software = ???
Future file and metadata migration when repository and asset management software upgrades = ???

Price List Categories
1. Digitization of analog media and digital video transfer (all formats except mp4, mpeg, mpg)
2. Curation & organization
3. File conversion
4. Ingestion (file upload, collection creation, participant metadata entry, resource creation and metadata entry)
5. Server storage fees

Category 1: Analog Media and video transfer, Part I
Audio cassette tape, 60 min.: $25
Audio cassette tape, 90 min.: $30
Audio open reel tape, 60 min.: $35
Audio open reel tape, 90 min.: $45
Audio open reel tape, 120 min.: $55
Audio minidisk, 60 min.: $25
Audio DAT tape, 60 min.: $25

Category 1: Analog Media and video transfer, Part II
Video cassettes (DV, mini DV), 60 min.: $45
VHS video cassette: $65
Digital video conversion from any format (excluding mp4, mpeg, mpg): $45
Paper scanning for items up to 24” x 18”, including typed, keyboarded or handwritten manuscripts, photographs, designs, drawings, sketches: $10 + $1 per page
Negative scanning: $10 + $X per negative
Slide scanning: $10 + $X per slide
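A minimal sketch of how the Category 1 rates might be applied to Collection 1. It assumes the $10 in "$10 + $1 per page" is charged once per scanning job, which the slide does not state; negative and slide scanning are left out because their per-item rates are given only as $X.

```python
CASSETTE_60_MIN = 25  # dollars per 60-minute audio cassette (price list)
SCAN_BASE = 10        # dollars; assumed to be charged once per scanning job
SCAN_PER_PAGE = 1     # dollars per scanned page (price list)


def cassette_fee(count, rate=CASSETTE_60_MIN):
    """Digitization fee for a number of 60-minute cassettes."""
    return count * rate


def paper_scanning_fee(pages, base=SCAN_BASE, per_page=SCAN_PER_PAGE):
    """Scanning fee for one job; nothing is charged if there are no pages."""
    if pages <= 0:
        return 0
    return base + per_page * pages


print(cassette_fee(20))         # 500
print(paper_scanning_fee(125))  # 135
```

The inputs are the 20 cassettes and the 125 assumed transcription pages of Collection 1. These price-list amounts need not equal the figures in the costing exercise, which may have been calculated on a different basis.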

Category 2: Curation & Organization
Metadata handling fee (required for all collections so that we can determine the state and organization of the metadata): $25 flat fee
Metadata compilation (e.g., from notes written on paper, tape covers, etc.): $25/hour
File/materials organization: $25/hour
Digital file renaming: 50¢/file

Category 3: File Splitting and Conversion
Audio file splitting (only done when there are detailed and specific notes indicating where the split should be made): $10 per wav file created by the split
Digital audio files, wav to mp3: 5¢/file
Video file conversions, mpeg/mpg to mp4: Missing info
Image file conversions to pdf/a (manuscripts) or jpg (images): $2/file

Category 4: Ingestion
On-line collection creation (for 1st-time depositors or to start a new/different collection; there is no collection fee to add new/more data to an existing collection): $15
Add participants to the AILLA people database (all research participants MUST be added to the database unless they have chosen to be, or are required to be, anonymous, or for very old or poorly identified data for which research participants’ names are not known): $4/participant
Upload all files to AILLA’s server: $25
Create resources, link the relevant files, enter the metadata, and complete Spanish (or English) translations – IFF Spanish/English translations are included in your metadata: $10/resource
Create resources, link the relevant files, enter the metadata, and add Spanish/English translations: $15/resource
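The Category 4 rates can be combined into a single estimate along these lines. This is a sketch under stated assumptions: the function name and parameters are illustrative, and it assumes the $25 upload fee is charged once per deposit regardless of size.

```python
COLLECTION_FEE = 15          # dollars; new collections only
PARTICIPANT_FEE = 4          # dollars per participant added to the people database
UPLOAD_FEE = 25              # dollars; assumed to be charged once per deposit
RESOURCE_FEE_INCLUDED = 10   # dollars per resource, translations already in metadata
RESOURCE_FEE_ADDED = 15      # dollars per resource, translations added by AILLA


def ingestion_fee(participants, resources,
                  translations_included=True, new_collection=True):
    """Estimate the Category 4 ingestion fee in dollars."""
    fee = COLLECTION_FEE if new_collection else 0
    fee += PARTICIPANT_FEE * participants
    fee += UPLOAD_FEE
    per_resource = RESOURCE_FEE_INCLUDED if translations_included else RESOURCE_FEE_ADDED
    fee += per_resource * resources
    return fee


# Collection 1 as described in the slides: 10 participants, 121 resources.
print(ingestion_fee(participants=10, resources=121))
print(ingestion_fee(participants=10, resources=121, translations_included=False))
```

The result is what the price list alone would yield; it is not necessarily the Ingestion figure from the costing-exercise tables, whose values are missing from the transcript.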

Category 5: Storage Fees
I haven’t quite figured out how to calculate this charge. I think it’s better to charge a flat fee up front (which can be written into a grant budget), but I want to hear the results of our DELAMAN discussion.
100 GB: $66/year
150 GB: $99/year
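Both quoted storage figures are consistent with a rate of $0.66 per GB per year (66/100 = 99/150). The sketch below applies that rate; treating pricing as linear at other sizes, and multiplying by a number of years to get an up-front flat fee, are assumptions rather than anything stated in the slides.

```python
from decimal import Decimal

# $0.66 per GB per year, derived from the quoted 100 GB = $66/year.
RATE_PER_GB_YEAR = Decimal("66") / Decimal("100")


def yearly_storage_fee(size_gb):
    """Yearly fee in dollars, assuming linear per-GB pricing."""
    return (Decimal(size_gb) * RATE_PER_GB_YEAR).quantize(Decimal("0.01"))


def upfront_storage_fee(size_gb, years):
    """Hypothetical flat fee covering a fixed number of years, paid up front."""
    return yearly_storage_fee(size_gb) * years


print(yearly_storage_fee(100))      # 66.00
print(yearly_storage_fee(150))      # 99.00
print(upfront_storage_fee(100, 5))  # 330.00
```

An up-front figure of this kind is the sort of number that could be written into a grant budget, though the term it should cover is left open in the slides.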

Part 3: AILLA’s Administrative Costs

3 Areas that fund AILLA’s Administrative Costs:
1. Institutional Support
2. Grants – Direct Costs
3. Grants – Indirect Costs
A 4th area, the AILLA endowment, which was and still is built from monetary donations to AILLA, will cover some costs (to be determined) in the future, but it has not been accessed yet.

Institutional Support covers: Manager’s salary & fringe (UTL & COLA); office space & some furniture (UTL); phone service (COLA); electricity (UT); Manager’s travel for professional development (UTL & COLA); computer ITS (COLA); server ITS (UTL)

Direct Grant Costs (currently) cover: Manager travel to get collections & to make presentations about them at conferences; 2 GRAs: salary, fringe & tuition remission; depositor/collaborator trips to AILLA; shipping; some server costs

Direct Grant Costs have covered (past): all of the above, plus: PC and Mac computers and laptops; scanners – 2 flat bed, 2 ADF; software – digitization and conversion; audio equipment (tape cassette decks, MD deck, reel-to-reel players); workshops organized by AILLA (including travel for invited participants)

Indirect Grant Costs cover: computers for administration, digitization and ingestion; other computer accessories – sound cards, storage media, printers; software – both administrative and for digitization and conversion; equipment repair (e.g., cassette decks, reel-to-reel players); office supplies (paper, printer ink, pens, pencils, sticky notes, paper clips, etc.); printing – AILLA brochure, business cards; shipping; visitor expenses (e.g., lunches, parking); Manager’s membership dues; administrative cloud storage; some office furniture

Operating Budget: Counting direct and indirect costs from AILLA’s grants, our operating budget is about $75,000. This figure does not include the administrative costs that are provided by UT-Austin. BUT, we have a data backlog of approximately 3 years: we do not have time to process the unsolicited deposits because we are so busy with our “solicited” deposit (our DEL grant to archive Terrence Kaufman’s collection).

Thank you! Please send comments or questions to or