Published by Leona Stafford. Modified over 9 years ago.
1
The Cost of Archiving: The AILLA Perspective
Susan Smythe Kung, PhD
skung@austin.utexas.edu
www.ailla.utexas.org
3rd INNET Conference “Costing and sustainable funding of endangered language archives”
April 29, 2014
2
Increased number of deposits due to:
  Increased awareness of the need to preserve primary language materials
  Increased awareness of AILLA
  “New” requirement (of US federal funding agencies) for a Data Management Plan (DMP); an NSF requirement since Jan. 2011
3
3 Parts
  Part 1: AILLA’s background
  Part 2: The costing exercise
  Part 3: AILLA’s administrative costs
4
Part 1: AILLA’s Background
5
AILLA is a digital repository of multimedia resources in and about the indigenous languages of Latin America. It is a small, special collection within the Benson Latin American Collection at UT-Austin. The collections consist of:
  linguistic primary source field data such as field notes, audio and video recordings, photos and sketches in a wide range of genres (stories, myths, chants, songs, conversations, prayers, rituals, etc.)
  analyzed data such as grammars, dictionaries, ethnographies, and manuscripts
6
AILLA’s Mission:
  Preservation: To preserve irreplaceable materials in and about indigenous languages of Latin America, especially primary source field data of the type that has traditionally not been publicly available.
  Access: To make these materials and/or their metadata available to everyone, especially indigenous people, over the Internet.
7
History:
  Founded as a joint project between the College of Liberal Arts (COLA) and the University of Texas Libraries (UTL) by
    Joel Sherzer, Anthropology
    Anthony Woodbury, Linguistics
    Mark McFarland, UTL Digital Initiatives
  Project began in 2000 with seed money from COLA.
  Pilot site launched March 2001.
  Permanent site launched Jan. 31, 2003.
  Repository and website upgrade to take place 2015-2017 (we hope!).
8
Today:
  Jointly supported by COLA and UTL.
  Part of LLILAS Benson Latin American Studies and Collections.
  Located inside the Nettie Lee Benson Latin American Collection on the campus of the University of Texas at Austin.
9
AILLA Collection Statistics (stats as of August 29, 2014):
  298 languages
  22 Latin American countries
  12,796 resources
  100,041 media files
  19,294 audio recordings (6,773 hrs, 14 min, 18 sec)
  2,373 video recordings (1,215 hrs, 34 min, 23 sec)
10
AILLA Collection Statistics (cont’d):
  5,302 digital texts (97,580 pages)
  38,491 scanned pages
  4,331 images
  Only 20% restricted access
  1.8 TB
  138 depositors
  Over 5,000 registered users from all over the world
11
AILLA Staff:
  Full-time Manager: Susan Kung (supported by COLA & UTL)
  2 Graduate Research Assistants, 20 hrs/wk each (supported by grant-funded projects)
12
Work is also done by:
  UTL Digital Library Services staff, who provide server management and minimal technical support; their salaries do NOT come out of the AILLA budget.
  Undergraduate interns (paid university stipends; independent research credit; volunteer)
  MLIS/MSIS Capstone (thesis) projects
  Volunteers
13
AILLA’s Costs:
1. Digitization, curation, ingestion (Part 2)
2. Data and metadata storage (Part 2)
3. Administration (Part 3)
4. Software development and maintenance (not covered here)
5. Data and metadata migration (not covered here)
14
Part 2: The Costing Exercise
  Collection 1: Analog
  Collection 2: Born-Digital
15
Collection 1 Analog Contents:
  20 audio cassettes, each 60 min. long, in good condition (unknown number of recording events)
  Metadata spreadsheet for recordings on cassettes
  5 transcriptions, handwritten (unknown number of pages)
  200 photographs on photo paper + paper list of photo contents
  Collection size (after digitization) = 100 GB
16
Additional specifications needed at AILLA for Collection 1:
  Q1: How many different speech events are on each tape? Our preference is to separate different speech events into separate resources. I’ll assume there are 3 narratives (of about 10 minutes) per side for a total of 3 x 2 x 20 = 120 speech events.
  Q2: How long are the transcriptions? I’ll assume they are about 25 pages each, for a total of 125 scanned pages.
  Q3: How many research participants were involved? I’ll assume that there were 10 participants.
17
A resource is AILLA’s term for an organized bundle or set of related files. A resource might consist of just 1 file, e.g., a single mp3 audio file of a recorded narrative, or numerous files, e.g., simultaneous audio and video recordings of a speech event, plus an Elan transcription, or a semester’s worth of recorded lectures about indigenous languages, plus the class syllabus and handouts.
18
Collection 1 Audio: Required Tasks
1. Digitize the cassettes: each side of each cassette = 1 wav file; total wavs = 40; file names = tape1_sideA, etc.
2. Edit the wav files into individual speech events and assign AILLA IDs: assuming 3 speech events per side, total speech events = 3 x 2 x 20 = 120 wav files.
3. Convert wav (archival) files to mp3 (access) files.
4. Add AILLA IDs to the metadata spreadsheet and collect additional metadata about each speech event, e.g., length of wav, recording specifications, original source, etc.
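The file counts in tasks 1 and 2 can be sketched as below. This is only an illustration of the arithmetic under the slide's assumptions (20 cassettes, 2 sides, 3 speech events per side); the `.wav` extension on the names and the function names are assumptions added for the example, not AILLA's actual tooling.

```python
# Assumptions taken from the costing exercise: 20 cassettes, 2 sides each,
# and 3 speech events per side.
CASSETTES = 20
SIDES = ("A", "B")
EVENTS_PER_SIDE = 3


def side_filenames(cassettes=CASSETTES, sides=SIDES):
    """Return one wav name per cassette side, e.g. tape1_sideA.wav."""
    return [
        f"tape{number}_side{side}.wav"
        for number in range(1, cassettes + 1)
        for side in sides
    ]


def speech_event_count(cassettes=CASSETTES, sides=SIDES, per_side=EVENTS_PER_SIDE):
    """Number of wav files after splitting each side into speech events."""
    return cassettes * len(sides) * per_side


names = side_filenames()
print(len(names), names[0], names[-1])  # 40 tape1_sideA.wav tape20_sideB.wav
print(speech_event_count())             # 120
```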
19
Collection 1 Paper Transcriptions: Required Tasks
1. Scan each page and create 5 multi-page tif files. Simultaneously assign AILLA IDs as filenames.
2. Add a row to the spreadsheet for the AILLA ID & metadata.
3. Convert tif (archival) files to pdf/a (access) files.
20
Collection 1 Paper Photos: Required Tasks
1. Scan each photo and create 8 multi-page tif files of 25 photos each; assign AILLA IDs.
2. Add AILLA IDs, photo contents from the paper list, and other metadata to the metadata (MD) spreadsheet.
3. Convert tif (archival) files to jpg (access) files.
21
Collection 1: Ingestion Required Tasks
1. Create a collection for the depositor.
2. Add all of the research participants to AILLA’s “people database” (assume 10 participants).
3. Upload all files to the server (100 GB).
4. These steps are done together, but consecutively: create 121 AILLA resources (120 speech events & 1 photo resource), link the relevant files, enter the metadata & assign access level, and complete Spanish (or English) translations.
22
Collection 1: Total One-time Cost = $1,922.34
Task    Audio      Paper Transcription   Paper Photos   Ingestion
1       $535.90    $75.90                $121.90        $11.50
2       $52.90     $1.84                 $9.20          $39.10
3       $92.00     $9.20                 $13.80         $11.50
4       $4.60      0                     0              $943.00
Total   $685.40    $86.94                $144.90        $1,005.10
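As a cross-check, the column totals and the grand total for Collection 1 can be recomputed from the per-task amounts. The sketch below reads the fourth task row as $4.60 for audio, 0 for both paper columns, and $943.00 for ingestion; that reading is an assumption about the flattened slide, though it is the one under which the stated totals add up.

```python
from decimal import Decimal

# Per-task amounts (tasks 1-4) for each column of the Collection 1 cost slide.
costs = {
    "Audio": ["535.90", "52.90", "92.00", "4.60"],
    "Paper Transcription": ["75.90", "1.84", "9.20", "0"],
    "Paper Photos": ["121.90", "9.20", "13.80", "0"],
    "Ingestion": ["11.50", "39.10", "11.50", "943.00"],
}

# Decimal avoids binary floating-point rounding when adding currency amounts.
column_totals = {
    name: sum(Decimal(amount) for amount in amounts)
    for name, amounts in costs.items()
}
grand_total = sum(column_totals.values())

for name, total in column_totals.items():
    print(f"{name}: ${total}")
print(f"Total one-time cost: ${grand_total}")
```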
23
Collection 1: Recurring Cost = ???
  Yearly server storage for 100 GB = $66/yr
  Future file conversion when/if archival and access formats change = ???
  Future upgrades of digital repository and asset management software = ???
  Future file and metadata migration when the repository and asset management software is upgraded = ???
24
Collection 2 Born-Digital Contents:
  150 audio wav files, average length = 15 min.
  20 video mp4 files, average length = 30 min.
  250 digital images
  120 eaf files (20 for video, 100 for audio)
  Metadata spreadsheet listing contents of all files
  Collection size = 150 GB
25
Additional specifications needed at AILLA for Collection 2:
  Q1: How many research participants were involved? Again, I’ll assume that there were 10 participants.
  Q2: What is the file format of the digital images? I’ll assume that it is jpg.
26
Collection 2: Required Tasks for Digital Collections
  Massage the metadata (study its organization, rearrange as necessary, add missing info).
  Rename files with AILLA IDs: rename audio and video files and add the AILLA IDs to the metadata (MD) spreadsheet; match each eaf file to its corresponding audio or video file, assign the appropriate related AILLA ID, rename the file, and rearrange the MD spreadsheet if necessary.
  Create mp3 access copies from the wav files.
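The matching and renaming step could be sketched roughly as follows. The slides do not say how eaf files are matched to their media, so this example assumes that an eaf file shares its base name with the recording it annotates; the IDs (`ID0001`, etc.) are placeholders, not AILLA's real ID scheme, and both function names are hypothetical.

```python
from pathlib import PurePath


def match_annotations(media_files, eaf_files):
    """Pair each .eaf file with the media file that has the same base name.

    Assumes base names are unique among the media files; files that share a
    stem (e.g. a wav and an mp4 of the same event) would need manual review.
    """
    media_by_stem = {PurePath(media).stem: media for media in media_files}
    matched, unmatched = {}, []
    for eaf in eaf_files:
        media = media_by_stem.get(PurePath(eaf).stem)
        if media is None:
            unmatched.append(eaf)  # flag for manual matching
        else:
            matched[eaf] = media
    return matched, unmatched


def rename_plan(ids, matched):
    """Map old file names to new ones; an eaf inherits its media file's ID."""
    plan = {media: new_id + PurePath(media).suffix for media, new_id in ids.items()}
    for eaf, media in matched.items():
        plan[eaf] = ids[media] + ".eaf"
    return plan


media = ["story01.wav", "song02.wav", "dance03.mp4"]
eafs = ["story01.eaf", "dance03.eaf", "orphan.eaf"]
ids = {"story01.wav": "ID0001", "song02.wav": "ID0002", "dance03.mp4": "ID0003"}

matched, unmatched = match_annotations(media, eafs)
print(rename_plan(ids, matched))
print("needs manual matching:", unmatched)
```

Producing a plan (old name to new name) rather than renaming directly keeps the step reviewable, and the same mapping can be pasted into the metadata spreadsheet.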
27
Collection 2: Ingestion Required Tasks
1. Create a collection for the depositor.
2. Add all of the research participants to AILLA’s “people database” (assume 10 participants).
3. Upload all files to the server (150 GB).
4. These steps are done together, but consecutively: create 171 AILLA resources (150 audio, 20 video & 1 photo resource), link the relevant files, enter the metadata & assign access level, and complete Spanish (or English) translations.
28
Collection 2: Total One-time Cost = $1,910.15
Task    Digital    Ingestion
1       $23.00     $11.50
2a      $57.50     $39.10
2b      $6.90      NA
2c      $230.00    NA
3a      $5.75      $11.50
3b      $190.90    NA
4       0          $1,334.00
Total   $514.05    $1,396.10
29
Collection 2: Recurring Cost = ???
  Yearly server storage for 150 GB = $99/yr
  Future file conversion when/if archival and access formats change = ???
  Future upgrades of digital repository and asset management software = ???
  Future file and metadata migration when the repository and asset management software is upgraded = ???
30
Price List Categories
1. Digitization of analog media and digital video transfer (all formats except mp4, mpeg, mpg)
2. Curation & organization
3. File conversion
4. Ingestion (file upload, collection creation, participant metadata entry, resource creation and metadata entry)
5. Server storage fees
31
Category 1: Analog Media and Video Transfer, Part I
  Audio cassette tape, 60 min.      $25
  Audio cassette tape, 90 min.      $30
  Audio open reel tape, 60 min.     $35
  Audio open reel tape, 90 min.     $45
  Audio open reel tape, 120 min.    $55
  Audio minidisk, 60 min.           $25
  Audio DAT tape, 60 min.           $25
32
Category 1: Analog Media and Video Transfer, Part II
  Video cassettes (DV, mini DV), 60 min.    $45
  VHS video cassette    $65
  Digital video conversion from any format (excluding mp4, mpeg, mpg)    $45
  Paper scanning for items up to 24” x 18”, including typed, keyboarded or handwritten manuscripts; photographs, designs, drawings, sketches    $10 + $1 per page
  Negative scanning    $10 + $X per negative
  Slide scanning    $10 + $X per slide
33
Category 2: Curation & Organization
  Metadata handling fee (required for all collections so that we can determine the state and organization of the metadata)    $25 flat fee
  Metadata compilation (e.g., from notes written on paper, tape covers, etc.)    $25/hour
  File/materials organization    $25/hour
  Digital file renaming    50¢/file
34
Category 3: File Splitting and Conversion
  Audio file splitting (only done when there are detailed and specific notes indicating where the split should be made)    $10 per wav file created by the split
  Digital audio files, wav to mp3    5¢/file
  Video file conversions, mpeg/mpg to mp4    Missing info
  Image file conversions to pdf/a (manuscripts) or jpg (images)    $2/file
35
Category 4: Ingestion
  Online collection creation (for 1st-time depositors or to start a new/different collection); there is no collection fee to add new/more data to an existing collection    $15
  Add participants to the AILLA people database (all research participants MUST be added to the database unless they have chosen to be, or are required to be, anonymous, or unless the data are very old or poorly identified and the research participants’ names are not known)    $4/participant
  Upload all files to AILLA’s server    $25
  Create resources, link the relevant files, enter the metadata, and complete Spanish (or English) translations, IFF Spanish/English translations are included in your metadata    $10/resource
  Create resources, link the relevant files, enter the metadata, and complete Spanish (or English) translations, plus adding the Spanish/English translations    $15/resource
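A minimal sketch of how the Category 4 prices combine into an ingestion quote. It assumes the $15/resource rate applies when AILLA has to add the translations itself, which is one reading of the slide, and the function and parameter names are made up for the example. The result is not expected to reproduce the ingestion totals on the costing-exercise slides; the presentation does not say how those amounts were derived.

```python
def ingestion_fee(resources, participants, new_collection=True,
                  translations_included=True):
    """Estimate the Category 4 ingestion fee in US dollars."""
    fee = 25                       # upload all files to the server (flat)
    if new_collection:
        fee += 15                  # online collection creation
    fee += 4 * participants        # people-database entries
    per_resource = 10 if translations_included else 15
    fee += per_resource * resources
    return fee


# Collection 1 from the costing exercise: 121 resources, 10 participants.
print(ingestion_fee(121, 10))                               # 1290
print(ingestion_fee(121, 10, translations_included=False))  # 1895
```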
36
Category 5: Storage Fees
I haven’t quite figured out how to calculate this charge. I think it’s better to charge a flat fee up front (which can be written into a grant budget), but I want to hear the results of our DELAMAN discussion.
  100 GB    $66/year
  150 GB    $99/year
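Both quoted storage prices work out to the same rate, $0.66 per GB per year ($66 / 100 GB and $99 / 150 GB). The sketch below assumes that this rate scales linearly to other collection sizes and that an up-front fee would simply be the yearly fee times a chosen number of years; neither assumption is stated in the slides, and the number of years is a free parameter.

```python
from decimal import Decimal

# Rate implied by both quoted prices: $66 / 100 GB = $99 / 150 GB = $0.66.
RATE_PER_GB_YEAR = Decimal("66") / Decimal("100")


def yearly_storage_fee(size_gb):
    """Yearly fee in dollars, assuming the rate scales linearly with size."""
    fee = RATE_PER_GB_YEAR * Decimal(str(size_gb))
    return fee.quantize(Decimal("0.01"))


def upfront_storage_fee(size_gb, years):
    """Flat up-front fee covering a fixed number of years (an assumption)."""
    return yearly_storage_fee(size_gb) * years


print(yearly_storage_fee(100))      # 66.00
print(yearly_storage_fee(150))      # 99.00
print(upfront_storage_fee(100, 5))  # 330.00
```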
37
Part 3: AILLA’s Administrative Costs
38
3 Areas that fund AILLA’s Administrative Costs:
1. Institutional Support
2. Grants: Direct Costs
3. Grants: Indirect Costs
A 4th area, the AILLA endowment, which was and still is built from monetary donations to AILLA, will cover some costs (to be determined) in the future, but it has not been accessed yet.
39
Institutional Support covers:
  Manager’s salary & fringe (UTL & COLA)
  Office space & some furniture (UTL)
  Phone service (COLA)
  Electricity (UT)
  Manager’s travel for professional development (UTL & COLA)
  Computer ITS (COLA)
  Server ITS (UTL)
40
Direct Grant Costs (currently) cover:
  Manager travel to get collections & to make presentations about them at conferences
  2 GRAs: salary, fringe & tuition remission
  Depositor/collaborator trips to AILLA
  Shipping
  Some server costs
41
Direct Grant Costs have covered (past):
All of the above, plus:
  PC and Mac computers and laptops
  Scanners: 2 flatbed, 2 ADF
  Software: digitization and conversion
  Audio equipment (tape cassette decks, MD deck, reel-to-reel players)
  Workshops organized by AILLA (including travel for invited participants)
42
Indirect Grant Costs cover:
  Computers for administration, digitization and ingestion
  Other computer accessories: sound cards, storage media, printers
  Software, both administrative and for digitization and conversion
  Equipment repair (e.g., cassette decks, reel-to-reel players)
  Office supplies (paper, printer ink, pens, pencils, sticky notes, paper clips, etc.)
  Printing: AILLA brochure, business cards
  Shipping
  Visitor expenses (e.g., lunches, parking)
  Manager’s membership dues
  Administrative cloud storage
  Some office furniture
43
Operating Budget
Counting direct and indirect costs from AILLA’s grants, our operating budget is about $75,000. This figure does not include the administrative costs that are provided by UT-Austin.
BUT, we have a data backlog of approximately 3 years: we do not have time to process the unsolicited deposits because we are so busy with our “solicited” deposit (our DEL grant to archive Terrence Kaufman’s collection).
44
Thank you!
www.ailla.utexas.org
Please send comments or questions to ailla@ailla.utexas.org or skung@austin.utexas.edu