Language Documentation & Archiving Heidi Johnson The Archive of the Indigenous Languages of Latin America (AILLA) The University of Texas at Austin.


Acknowledgements
Language Digitization Project Conference 2003
EMELD Working Group on Resource Archiving:
Gary Holton, ANLC
Heidi Johnson, AILLA
Nick Thieberger, PARADISEC
Gary Simons, SIL International
Wallace Hooper, Indiana University
Susan Hooyenga, University of Michigan

A little history Boasian tradition: grammar, dictionary, collection of texts. Linguists gave field materials to museums & libraries, e.g. the Smithsonian, seeking a permanent home for endangered language materials. Museums & libraries were not really able to preserve recordings, other than by storing them in a cool dark place.

History, cont. Anything that can be published was & is a distillation - the product of analysis. Secondary/tertiary resources. Hitherto no feasible means of preserving OR publishing primary materials. The new millennium: digital archives can preserve and/or publish anything.

What is an archive? Archive: a trusted repository created and maintained by an institution with a demonstrated commitment to permanence and the long-term preservation of archived resources. Collection: the body of documentary materials created by researchers and native speakers. Serves as the basis for research & education. Will be deposited in an archive.

Why should you archive? to preserve recordings of endangered/minority languages for future generations. to facilitate the re-use of primary materials (recordings, databases, field notes) for: language maintenance & revitalization programs; typological, historical, comparative studies; any kind of linguistic, anthropological, psychological, etc. study that you yourself won't do.

More reasons to archive to foster development of both oral and written literatures for endangered languages. to make known what documentation there is for which languages. to build your CV and get credit for all your hard work.

Archiving is a form of publishing Even if the resources are restricted, the metadata is public. Get credit for fieldwork in the early stages: list Archived Resources on your CV. Cite data from archived resources. Give consultants proper credit for their work and their creations.

Citing archived resources Sánchez Morales, Germán. (1994). "Satornino y los soldados." [online] Heidi Johnson, (Res.) : Archive of the Indigenous Languages of Latin America. Access=public. ZOH001R010.

What should you archive? Recordings of discourse - audio and/or video - in as wide a range of genres as your community employs. Always get permission for everything: recording, archiving, excerpting, publishing, etc.

Things you should archive
public events: ceremonies, oratory, dances, chants
narratives: historical, traditional, myths, personal, children's stories,...
instructions: how to build a house, how to weave a mat, how to catch a fish,...
literature: oral or written, poetry, any creative work
conversations: anything that's not gossip or too personal, e.g. what we did last spring festival

More things you should archive
transcriptions, translations, & annotations of recordings
field notes, elicitation lists, orthographies - anything other people might find useful
datasets, databases, spreadsheets - your secondary (unpublishable) materials
sketches of all kinds: grammar, ethnography
photographs

Things you should not archive Anything that would cause injury, arrest, or embarrassment to the speakers. Example: Pamela Munro's interviews with Zapotecs in L.A. about entering the U.S. illegally. Sacred works with highly restricted uses. But talk to people about safe ways to preserve such works, if they want.

How should you manage your collection? Corpus management rule #1: Label everything you produce with RUTHLESS CONSISTENCY. Corpus management rule #2: Set up a system before you leave & test it along with your equipment. (Tape your friends and relatives to try things out.)

1. Find an archive & get their guidelines DOBES, for their grant recipients: Regional archives: AILLA, ANLC, PARADISEC, others? (See AILLA's Links page) Note: it's not either/or, it's both/all. If there isn't one, write to any one of us, we'll help you.

2. Identify your archival objects Not necessarily the same as a file or a tape. Language documentation materials typically come in related sets, or bundles. Be aware of relations among materials as you create them so you can label them correctly and keep them together.

Relations among items
derivation: e.g. a transcription is derived from a recording
series: e.g. a long recording that spans several tapes/discs
part-whole: e.g. video & audio recordings made simultaneously of the same event
association: (fuzzy) e.g. photographs of the narrator of a recording, commentaries

3. Labelling field materials Nothing could possibly be more important than labelling every single item you produce - track, tape, disc, notebook, file slip, digital file, photograph - with RUTHLESS CONSISTENCY.

Example 1: AILLA resource ID
ZOH001R040I001.mp3
ZOH = language code
001 = deposit number (first deposit)
R040 = 40th resource in that deposit
I001 = 1st item in that resource
.mp3 = what kind of file
If you have an archive, write and ask them for labelling guidelines.
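A label built this way can be taken apart again by a program, which is one practical payoff of consistency. The sketch below is only an illustration based on the single example above: it assumes a three-letter language code and three-digit deposit, resource, and item numbers, which may not match AILLA's actual rules. The names `ResourceId` and `parse_resource_id` are made up for this example.

```python
import re
from typing import NamedTuple


class ResourceId(NamedTuple):
    """Parts of a label shaped like ZOH001R040I001.mp3 (hypothetical type)."""
    language: str   # language code, e.g. "ZOH"
    deposit: int    # deposit number
    resource: int   # resource number within the deposit
    item: int       # item number within the resource
    extension: str  # file type, without the dot


# Field widths are an assumption inferred from the one example on the slide.
_PATTERN = re.compile(
    r"^(?P<language>[A-Z]{3})(?P<deposit>\d{3})"
    r"R(?P<resource>\d{3})I(?P<item>\d{3})\.(?P<extension>[A-Za-z0-9]+)$"
)


def parse_resource_id(label: str) -> ResourceId:
    """Split a label into its parts, or raise ValueError if it does not fit."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"label does not follow the expected pattern: {label!r}")
    return ResourceId(
        match["language"],
        int(match["deposit"]),
        int(match["resource"]),
        int(match["item"]),
        match["extension"],
    )
```

Calling `parse_resource_id("ZOH001R040I001.mp3")` gives language `ZOH`, deposit 1, resource 40, item 1, extension `mp3`; a label that breaks the pattern raises an error instead of slipping through.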

Example 2: participant initials plus a media type code
gsm1_au1 - audio part 1
gsm1_au2 - audio part 2
gsm1_db - shoebox interlin of the audio
gsm1_tx1 - text, misc notes
gsm1_ph1 - photo of Germán
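A scheme like this is easy to check mechanically, so stray labels get caught before they reach the archive. The sketch below is one possible check, not part of any archive's tooling. It assumes the number after the initials is a session or text number (the slide does not say), that the media codes are exactly the four shown above, and that the trailing part number is optional. `MEDIA_CODES` and `inconsistent_labels` are names invented here.

```python
import re

# Media type codes taken from the example labels; extend to suit your own scheme.
MEDIA_CODES = {"au": "audio", "db": "Shoebox database", "tx": "text", "ph": "photo"}

# initials + number + "_" + two-letter media code + optional part number
_LABEL = re.compile(r"^(?P<initials>[a-z]+)(?P<number>\d+)_(?P<media>[a-z]{2})(?P<part>\d*)$")


def inconsistent_labels(labels):
    """Return the labels that do not follow the initials-plus-media-code scheme."""
    bad = []
    for label in labels:
        match = _LABEL.match(label)
        if match is None or match["media"] not in MEDIA_CODES:
            bad.append(label)
    return bad
```

Run over a list such as `["gsm1_au1", "gsm1_db", "GSM-1 audio"]`, it reports only the last entry.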

Example 3: label by media unit, recordings are primary
md1t1 - minidisc 1, track 1
md1t1.db - shoebox database for that text
nb1 - field notebook 1
ds19.xls - spreadsheet dataset (e.g. verb roots)

Metadata I
Catalog information for digital resources. Supports:
archive & collection management
protection of sensitive materials
searching
use of resources by many people
proper citation of archived resources

Metadata II : Minimum info Speakers' full names (plus alias if you want to anonymize in text). Language: Be specific! Zoque of San Miguel Chimalapa, Oaxaca, Mexico. Date of creation: YYYY-MM-DD. Use the primary (recording) date for the bundle. Place of creation: Be specific: village, state, country, or river valley, region, country… Access restrictions & instructions, if necessary. Genre keyword: dependent on choice of schema.
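If you keep this minimum information in a script or database rather than on paper, a small record type can flag gaps as you go. The sketch below is an illustration only: it is neither the IMDI nor the OLAC schema, and the class name `BundleMetadata`, its field names, and the `problems` method are all invented for this example. The `label` field is there so the record carries the same label as the resource it describes.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


def _valid_iso_date(text: str) -> bool:
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


@dataclass
class BundleMetadata:
    """Minimum information for one bundle (hypothetical structure)."""
    label: str                    # same label as the resource itself
    speakers: List[str]           # full names
    language: str                 # be specific
    created: str                  # primary recording date, YYYY-MM-DD
    place: str                    # village, state, country
    genre: str                    # keyword from your chosen schema
    access: Optional[str] = None  # restrictions & instructions, if necessary

    def problems(self) -> List[str]:
        """List what is missing or malformed; an empty list means complete."""
        issues = []
        for name in ("label", "language", "place", "genre"):
            if not getattr(self, name).strip():
                issues.append(f"missing {name}")
        if not self.speakers:
            issues.append("missing speakers")
        if not _valid_iso_date(self.created):
            issues.append("created must be YYYY-MM-DD")
        return issues
```

Checking each record with `problems()` right after a recording session is cheaper than reconstructing a speaker's name or a date years later.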

Metadata III Choose either IMDI or OLAC schema. If you have an archive, use the one they tell you. LABEL every metadata entry with the same label you use for the resource. List every related item in the metadata.

IMDI: Session bundle = resource
Title, date, place, description
Depositor (you): contact info
Project: name, director, sponsor, etc.
Participants: role, demographic data, contact
Resources: provenance, formats, relations, etc.
Content: context, genre, narrative description, etc.
References: relevant publications

OLAC: Archival object definition is up to you
Contributors / creators
Title, date, description
Resource info: formats
Relation to other objects
Subject - linguistic subfield
Type.linguistic = genre

Corpus management tools From MPI: IMDI Browser & IMDI Data entry. I have a Shoebox 2.0 template that needs porting to Shoe 5.0 (?). Someday, we'll do a Filemaker Pro one. Otherwise, use any database or spreadsheet or Word template and create your own.

Intellectual property rights Define a policy concerning IPR and develop a consistent practice for obtaining consent, e.g., forms and/or recorded statements. Learn how to talk to your consultants about IPR. Ask other researchers who have worked in your region or language community. Note the IPR status of each resource and each item in the metadata.

Formats
              Text (a grammar)   Audio (a recording)   Video (a film)
archival      tiff / XML         wav 44.1/16           mp2
presentation  pdf / html         mp3                   ??
working       ms / MS Word       minidisc              ??
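The archival audio entry, wav 44.1/16, means a WAV file sampled at 44.1 kHz with 16-bit samples. A quick check along these lines can confirm that a file meets that target before you send it off. This is a rough sketch using Python's standard `wave` module; the function name `is_archival_wav` is invented here, and your archive may set different or additional requirements.

```python
import wave


def is_archival_wav(source, rate_hz=44100, sample_bits=16):
    """Return True if the WAV file has the given sample rate and bit depth.

    `source` may be a file path or an open binary file object.
    """
    with wave.open(source, "rb") as wav:
        # getsampwidth() reports bytes per sample, so convert to bits.
        return wav.getframerate() == rate_hz and wav.getsampwidth() * 8 == sample_bits
```

The `wave` module reads uncompressed PCM WAV files only, so a compressed or damaged file raises `wave.Error` rather than returning False.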

Archive-quality formats are: non-proprietary (that is, the encoding is in the public domain); able to support forward migration to new formats; portable, re-useable, repurposeable; capable of the best possible reproduction of the original.

When should you archive? As soon as you get back from the field: to prevent accidental damage or loss; to get back handy presentation formats; to build your CV even before you are ready to publish results. If not then, as soon as possible. At the very least, mention your data and an archive in your will.

Archive your data We encourage you to archive recordings ASAP and add transcriptions, translations, annotations, etc. later. Secondary materials are generally reproducible; the primary recordings are not! Students should password-protect their data until they finish their theses.

Useful websites DELAMAN: IMDI: OLAC: EMELD: AILLA: Write to me: