Published by Patricia Baker. Modified over 9 years ago.
Revitalizing Endangered Language Data: Case studies in rescuing legacy documentation
CELCNA 2007
Naomi Fox, Julia James, University of Utah
Digital formats
● Why do you want your data in digital format?
  – Digital: a dictionary database
  – Non-digital: notecards in a shoebox under the bed
● Examples of the increased functionality of digital formats
  – Even in Word format, you can use 'Find' instead of flipping through pages
What are Best Practices (BP) and why should I care?
● Why follow BP
  – interoperability/data sharing
  – protect valuable data from loss (obsolescence)
  – make sure your data outlives you
● Finding out about BP: resources
  – E-MELD (http://www.emeld.org)
  – OLAC (http://www.language-archives.org/)
  – DELAMAN (http://www.delaman.org/)
  – Edata (http://www.endangereddata.org)
Quick and Dirty Best Practice Recommendations
● Audio
  – uncompressed .wav or .aiff, minimum 44.1 kHz/16-bit
● Text
  – XML, tagged, with a valid DTD
  – Unicode
  – indexed to audio
● Metadata
  – have some
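The audio recommendation can be checked mechanically before files go to an archive. A minimal sketch in Python using only the standard-library `wave` module; the function name and constants are illustrative, and .aiff files are not covered (Python's old `aifc` module was removed in Python 3.13).

```python
import wave

# Thresholds taken from the "quick and dirty" audio recommendation.
MIN_SAMPLE_RATE_HZ = 44_100
MIN_BITS_PER_SAMPLE = 16


def meets_audio_minimum(path: str) -> bool:
    """Return True if a PCM .wav file is at least 44.1 kHz/16-bit.

    The wave module only reads uncompressed PCM WAV files; other input
    raises wave.Error (or EOFError if the file is truncated).
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
    return rate >= MIN_SAMPLE_RATE_HZ and bits >= MIN_BITS_PER_SAMPLE
```

Running it over a folder of digitized recordings quickly flags any files that were captured below the recommended minimum.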
Getting there from here
● I've accepted BP, now what?
● My computer won't read my old WordStar file. What program can I use?
● I have a PC, but all of my data was entered on a Mac.
● My data is in [insert database name here], which is not supported outside [insert obsolete OS here]. It's fine for me, but others can't use it.
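For the PC-versus-Mac problem, much of the trouble is often character encoding and line endings rather than the files themselves. A minimal sketch, assuming plain-text files saved under classic Mac OS in the Mac OS Roman encoding; data typed with custom linguistic fonts maps characters to nonstandard code points and would need its own conversion table instead.

```python
import unicodedata


def mac_roman_to_unicode(raw: bytes) -> str:
    """Decode classic Mac OS text (assumed Mac OS Roman) into normalized Unicode.

    Classic Mac OS ended lines with a bare carriage return; those are
    converted to "\n" so the text reads correctly on other systems.
    """
    text = raw.decode("mac_roman")
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # NFC composes base letters and combining marks where a precomposed form exists.
    return unicodedata.normalize("NFC", text)
```

Read the old file in binary mode, pass the bytes through this function, and write the result back out with `encoding="utf-8"`.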
General physical format issues
● Analog recordings
  – cassette
  – reel-to-reel
  – wax cylinder
● Field notes
  – field notes, notebooks
  – notecards
  – annotated descriptive materials
● Outdated computer media/drives
Digitizing analog recordings
● Outsourced
  – audio-technical experts
  – equipment and staff limitations
  – equipment procurement and maintenance problems
● In-house
  – equipment and space appropriate as part of CAIL's ongoing mission
  – valuable training for students
Field Notes
● Issues
  – notes may not be in any logical order, even if they are well catalogued
  – need to design a digital data structure that represents all of the written information
● Options
  – scanning as images
  – scanning/OCR to text files
  – manual data entry
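Designing the digital data structure is the step that keeps manual entry or OCR output consistent. A minimal sketch as a Python dataclass; the class and field names are hypothetical and should follow what the notes actually contain. Storing each entry's catalogue identifier means the digital records can be re-sorted later even if the originals were entered out of order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldNoteEntry:
    """One transcribed notecard or notebook entry (all field names hypothetical)."""
    source_id: str                    # notebook or card identifier as catalogued
    page: Optional[str] = None        # page/card number kept as text, e.g. "12a"
    vernacular: str = ""              # form as written by the collector
    gloss: str = ""                   # translation or gloss
    marginalia: List[str] = field(default_factory=list)  # side notes, corrections
    image_file: Optional[str] = None  # scan of the original, if one exists

    def is_empty(self) -> bool:
        """True if nothing from the original page has been transcribed yet."""
        return not (self.vernacular or self.gloss or self.marginalia)
```

Keeping marginalia as a separate list, rather than folding it into the gloss, preserves the collector's corrections as written information in their own right.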
Outdated computer media
● General problems
  – [QUIRKY QUOTE NEEDED]
● Floppy disks
  – consult an expert
  – build a system that can read old disks and create modern media such as CD or DVD-RAM
Software Issues
● More difficult to diagnose
● Usually go hand in hand with hardware issues
  – If your floppy disk is obsolete, the data on it will likely need some updating too
● Fast-paced world of software development
  – Even files from older versions of the same program may not transfer properly to current software
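Once files are off the old media, a first diagnostic step is to look at their leading bytes, which often identify the format regardless of the file's name or extension. A rough sketch covering a handful of well-known signatures; older formats such as WordStar files or HyperCard stacks are not included and will need further investigation.

```python
# Leading-byte signatures for a few common formats (not exhaustive).
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy .doc/.xls)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx/.xlsx)"),
    (b"{\\rtf", "Rich Text Format"),
    (b"%PDF", "PDF"),
    (b"<?xml", "XML"),
]


def sniff_format(header: bytes) -> str:
    """Guess a file's format from its first bytes; returns 'unknown' otherwise."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # RIFF and FORM containers put a 4-byte size before the type tag.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV audio"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "AIFF audio"
    return "unknown"
```

Usage: open each recovered file in binary mode and call `sniff_format(f.read(16))`.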
Case Studies
● HyperCard
● Shoebox 3.0
● Word processors/spreadsheets
  – MS Word
  – Excel
  – WordPerfect
  – Plain text (.txt) (not a total pain in the ass)
HyperCard data
Floppy > CD > HyperCard data > HyperCard-to-text custom tool (VN) > Word docs > FileMaker Pro (FMPro) database > print reference
1) Read the floppy
2) Analyze the data (What format is it in? What kind of data is it?)
3) Get the data into a transportable format
4) Structure the data
5) Use the data
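Step 4, structuring the data, usually means turning a text dump into one row per card so it can be imported into a database. A minimal sketch, assuming a hypothetical export layout of blank-line-separated cards made of `Field: value` lines; the actual output of the custom HyperCard-to-text tool is not described here. CSV is used because most database programs, FileMaker Pro included, can import it.

```python
import csv
import io
from typing import Dict, List


def parse_cards(text: str) -> List[Dict[str, str]]:
    """Split a text dump into cards: blank-line-separated 'Field: value' blocks.

    This layout is a hypothetical example, not the real tool's output.
    """
    cards = []
    for block in text.strip().split("\n\n"):
        card = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)  # values may contain colons
                card[key.strip()] = value.strip()
        if card:
            cards.append(card)
    return cards


def cards_to_csv(cards: List[Dict[str, str]]) -> str:
    """Write cards as CSV, one column per field name seen on any card."""
    fieldnames: List[str] = []
    for card in cards:
        for key in card:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(cards)
    return out.getvalue()
```

Collecting field names across all cards matters because legacy card stacks are rarely uniform: a field that appears on only a few cards still gets its own column.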
Shoebox
Shoebox 3.0 --> Toolbox --> XML --> XSLT --> HTML online Mocho dictionary
1) Figured out the structure of the database (Shoebox 3.0)
  ● ascertained the data collector's conventions where possible
2) Migrated to the newest form of the software (Toolbox)
3) Exported an XML or text version
4) Wrote an XSLT document to create HTML (online/book tutorial)
5) Built a basic online web version
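Shoebox and Toolbox store records as lines that begin with backslash markers. A minimal sketch of the text-to-XML step, assuming MDF-style markers in which \lx starts each record (with \ps and \ge for part of speech and English gloss); every database defines its own marker set, which is why step 1 matters. This is an illustration of the idea, not the exporter used in the project.

```python
import xml.etree.ElementTree as ET


def sfm_to_xml(sfm_text: str, record_marker: str = "lx") -> ET.Element:
    """Convert backslash-coded (Standard Format) records into an XML tree.

    Each record starts at record_marker (assumed \\lx); every other marker
    line becomes a child element named after its marker.
    """
    root = ET.Element("database")
    record = None  # current <record>; None until the first record marker
    last = None    # most recent field element, for continuation lines
    for line in sfm_text.splitlines():
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            if marker == record_marker:
                record = ET.SubElement(root, "record")
            if record is not None:  # header lines before the first record are skipped
                last = ET.SubElement(record, marker)
                last.text = value.strip()
        elif line.strip() and last is not None:
            # An unmarked, non-blank line continues the previous field.
            last.text = (last.text + " " + line.strip()).strip()
    return root
```

`ET.tostring(sfm_to_xml(text), encoding="unicode")` returns the XML string that an XSLT stylesheet can then turn into HTML. Markers must be valid XML element names for the output to be well-formed.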
(Word) transcriptions
Word transcriptions > Excel > autoglossing tool > Excel dictionary > XML or presentation format
1) Interlinear text documents
2) Tool (VN) to import the documents into an Excel data template
3) Visual Basic tool (VN) to autogloss morphemes
  ● from an Excel dictionary (or other database)
4) Corpus tool
5) Export
  ● XML for archival format
  ● XSLT for presentation formats
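Autoglossing is essentially a dictionary lookup for each morpheme. The tool described above was written in Visual Basic for Excel; the following is only an illustrative Python sketch of the same idea, assuming morphemes are separated by hyphens and the dictionary stores one gloss per form. Real data also needs a way to handle homophonous morphemes, which this sketch does not attempt.

```python
from typing import Dict, List


def autogloss(line: str, lexicon: Dict[str, str], unknown: str = "???") -> List[str]:
    """Gloss each hyphen-segmented word morpheme by morpheme.

    Morphemes missing from the lexicon get the `unknown` placeholder so a
    human can find and fill them in later.
    """
    glossed = []
    for word in line.split():
        morphemes = word.split("-")
        glossed.append("-".join(lexicon.get(m, unknown) for m in morphemes))
    return glossed
```

For example, with the English toy lexicon `{"walk": "walk", "ed": "PST", "dog": "dog", "s": "PL"}`, the line `walk-ed dog-s` glosses as `["walk-PST", "dog-PL"]`.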
Don't forget Metadata
● What is metadata?
  – documenting your documents
● Why metadata?
  – saves lots of trouble later
● Resources
  – IMDI
  – OLAC
  – your local archivist
Minimal metadata
These are enough to be getting on with, but always follow your archivist's recommendations:
  – Language
  – Speaker
  – Time and place of recording
  – Collector
  – Transcriber
  – Software version and revisions
  – Transcription conventions/abbreviations
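Even this minimal list is easier to keep complete when it lives in a structured record that can report what is still missing. A sketch in Python; the class and field names are illustrative and are not an IMDI or OLAC schema, so map them onto whatever your archivist requires.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class RecordingMetadata:
    """The minimal fields listed above (names illustrative, not a standard schema)."""
    language: str = ""
    speaker: str = ""
    time_of_recording: str = ""
    place_of_recording: str = ""
    collector: str = ""
    transcriber: str = ""
    software_version: str = ""           # software version and revisions
    transcription_conventions: str = ""  # conventions and abbreviations used

    def missing(self) -> List[str]:
        """Names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-ASCII characters readable in the output.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)
```

Calling `missing()` before a file is deposited gives a quick checklist of what still needs to be filled in.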
(A bit of) What we have learned
Save time. Hire a genius (e.g. Vivian Ngai)
  – Initiative goes a long way
  – Knowing your end goal (desired end data format, best practices) makes the intermediate steps more focused
  – Ask. There are too many people to mention who have answered questions and suggested solutions