Revitalizing Endangered Language Data: Case studies in rescuing legacy documentation CELCNA 2007 Naomi Fox, Julia James, University of Utah.



Digital formats
● Why do you want your data in digital format?

  Digital               Non-digital
  Dictionary database   notecards in a shoebox under the bed

● Examples of increased functionality of digital formats
  – even in Word format, you can use 'find' instead of flipping through pages

What are Best Practices and why should I care?
● Why follow BP
  – interoperability/data sharing
  – protect valuable data from loss (obsolescence)
  – make sure your data outlives you
● Finding out BP: resources
  – E-MELD
  – OLAC
  – DELAMAN
  – Edata

Quick and Dirty Best Practice Recommendations
Audio
● uncompressed, .wav or .aiff, minimum 44.1 kHz/16-bit
Text
● XML, tagged, with valid DTD
● Unicode
● indexed to audio
Metadata
● have some
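The audio minimum above can be checked mechanically. The following is a minimal sketch in Python, not part of the original slides: it uses the standard-library `wave` module, which only opens uncompressed PCM .wav files, and the function name `meets_audio_minimum` is a hypothetical helper introduced here for illustration. It does not handle .aiff files.

```python
import wave

MIN_RATE_HZ = 44_100    # 44.1 kHz minimum, from the recommendation above
MIN_SAMPLE_BYTES = 2    # 16 bits = 2 bytes per sample


def meets_audio_minimum(path):
    """Return True if the PCM .wav file at `path` is at least 44.1 kHz / 16-bit.

    Hypothetical helper for illustration. wave.open raises wave.Error
    for files it cannot read as uncompressed PCM.
    """
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() >= MIN_RATE_HZ
                and wav.getsampwidth() >= MIN_SAMPLE_BYTES)
```

Running a check like this over a whole folder of digitized recordings is one way to catch files that were captured at a lower setting before they go to an archive.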

Getting there from here
● I've accepted BP, now what?
● My computer won't read my old WordStar file. What program can I use?
● I have a PC, but all of my data was entered on a Mac.
● My data is in [insert database name here], which is not supported outside [insert obsolete OS here]. It's fine for me, but others can't use it.

General physical format issues
● Analog recordings
  – cassette
  – reel-to-reel
  – wax cylinder
● Field notes
  – field notes, notebooks
  – notecards
  – annotated descriptive materials
● Outdated computer media/drives

Digitizing analog recordings
● outsourced
  – audiotechnical experts
  – equipment and staff limitations
  – equipment procurement and maintenance problems
● in-house
  – equipment and space appropriate as part of CAIL's ongoing mission
  – valuable training for students

Field Notes
● Issues
  – notes may not be in any logical order, even if they are well-catalogued
  – need to design a digital data structure that represents all of the written information
● Options
  – scanning as images
  – scanning/OCR to text files
  – manual data entry

Outdated computer media
● General problems
  – QUIRKY QUOTE NEEDED
● Floppy disks
  – consult an expert
  – building a system that can read old disks and create modern media such as CD or DVD-RAM

Software Issues
● More difficult to diagnose
● Usually go hand-in-hand with hardware issues
  – If your floppy disk is obsolete, the data on it will likely need some updating too
● Fast-paced world of software development
  – Even files from older versions of the same program may not transfer properly to current software

Case Studies
● Hypercard
● Shoebox 3.0
● Word processors/spreadsheets
  – MS Word
  – Excel
  – WordPerfect
  – Plain text (.txt) (Not a total pain in the ass)

Hypercard data
Floppy > CD > Hypercard data > Hypercard-to-text custom tool (VN) > Word docs > FMPro database > print reference
1) Read the floppy
2) Analyze the data (what format is it in, what kind of data is it?)
3) Get the data into a transportable format
4) Structure the data
5) Use the data

Shoebox
Shoebox > Toolbox > XML > XSLT > HTML (online Mocho dictionary)
1) Figured out the structure of the database (Shoebox 3.0)
   ● ascertain data collector's conventions if possible
2) Migrated to the newest form of the software (Toolbox)
3) Export XML or text version
4) Write XSLT document to create HTML (online/book tutorial)
5) Basic online web version
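As an illustration of step 3, here is a minimal Python sketch of turning a text export into XML. It is not the tool used in this project. It assumes the export is backslash-coded text with one field per line and that each record starts with an `\lx` marker; the markers `\lx`, `\ps`, and `\ge`, the element names `lexicon`, `entry`, and `field`, and the placeholder entries are all assumptions for the example, so check them against the collector's actual conventions (step 1). Fields that wrap over several lines are not handled.

```python
import xml.etree.ElementTree as ET


def parse_records(text, record_marker="lx"):
    """Split backslash-coded text into records of (marker, value) pairs.

    Assumes one field per line and that `record_marker` opens each record.
    Anything before the first record (e.g. a file header line) is dropped.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # blank or continuation line: ignored in this sketch
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            records.append([])
        if records:
            records[-1].append((marker, value.strip()))
    return records


def to_xml(records):
    """Build an XML tree with one <entry> per record (element names are assumed)."""
    root = ET.Element("lexicon")
    for fields in records:
        entry = ET.SubElement(root, "entry")
        for marker, value in fields:
            ET.SubElement(entry, "field", marker=marker).text = value
    return root


# Placeholder data, not real dictionary entries.
SAMPLE = "\\lx word1\n\\ps n\n\\ge gloss1\n\n\\lx word2\n\\ge gloss2\n"
print(ET.tostring(to_xml(parse_records(SAMPLE)), encoding="unicode"))
```

Keeping the original marker as an attribute rather than renaming fields means nothing is lost in the conversion; an XSLT stylesheet (step 4) can then decide how each marker is presented.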

(Word) transcriptions
Word transcriptions > Excel > autoglossing tool > Excel dictionary > XML or presentation format
1) Interlinear text documents
2) Tool (VN) to import document to Excel data template
3) Visual Basic tool (VN) to autogloss morphemes
   ● from Excel dictionary (or other database)
4) Corpus tool
5) Export
   ● XML for archival format
   ● XSLT presentation formats
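The idea behind step 3 is a dictionary lookup per morpheme. The project's tool was written in Visual Basic against Excel; the sketch below is a Python illustration of the general technique only, not a port of that tool. It assumes words are separated by spaces and morphemes by hyphens, and the function name `autogloss`, the `???` marker for unknown morphemes, and the sample forms are all made up for the example.

```python
def autogloss(morpheme_line, dictionary, unknown="???"):
    """Gloss a hyphen-segmented line morpheme by morpheme.

    `dictionary` maps morpheme -> gloss. Morphemes that are not found
    are marked with `unknown` so they can be reviewed by hand.
    """
    glossed_words = []
    for word in morpheme_line.split():
        glosses = [dictionary.get(m, unknown) for m in word.split("-")]
        glossed_words.append("-".join(glosses))
    return " ".join(glossed_words)


# Invented placeholder forms, not data from any real language.
toy_dictionary = {"ka": "1SG", "tal": "come", "i": "PST"}
print(autogloss("ka-tal-i xyz", toy_dictionary))  # prints: 1SG-come-PST ???
```

A plain lookup like this cannot choose between homophonous morphemes, so the output is a first pass to be corrected, not a finished gloss line.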

Don't forget Metadata
● What is metadata?
  – Documenting your documents
● Why metadata?
  – saves lots of trouble later
● Resources
  – IMDI
  – OLAC
  – Your local archivist

Minimal metadata
These are enough to be getting on with, but always follow your archivist's recommendations:
– Language
– Speaker
– Time and place of recording
– Collector
– Transcriber
– Software version and revisions
– Transcription conventions / abbreviations
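One lightweight way to keep this list attached to every file is a small record that can be saved alongside the data. The Python sketch below mirrors the seven items above; the class name `RecordingMetadata`, the field names, and the `missing_fields` helper are hypothetical, and this is not the IMDI or OLAC schema. An archive's own format takes precedence.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class RecordingMetadata:
    """The minimal metadata items from the list above (field names are assumed)."""
    language: str = ""
    speaker: str = ""
    time_and_place: str = ""
    collector: str = ""
    transcriber: str = ""
    software_version: str = ""
    transcription_conventions: str = ""


def missing_fields(meta):
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name).strip()]


# Only the language is filled in here; the rest is deliberately left blank.
record = RecordingMetadata(language="Mocho")
print(json.dumps(asdict(record), indent=2))
print("Still missing:", missing_fields(record))
```

Writing the record out as a plain JSON or text file next to each recording keeps the metadata readable even if the software that produced the data becomes obsolete.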

(A bit of) What we have learned
Save time.
– Hire a Genius (e.g. Vivian Ngai)
– Initiative goes a long way
– Knowing your end goal (desired end data format, best practices) makes the intermediate steps more focused
– Ask. There are too many people to mention who have answered questions and suggested solutions