Published by Felicity Barton. Modified over 9 years ago.
1
Customizing the IMDI metadata schema for endangered languages
Heidi Johnson (AILLA)
Arienne Dwyer (DOBES)
2
Introduction
IMDI: the ISLE (International Standards for Language Engineering) Metadata Initiative
DOBES: the Volkswagen Foundation’s Documentation of Endangered Languages initiative
AILLA: the Archive of the Indigenous Languages of Latin America
3
Types of resources
Audio and video recordings in various digital formats
Annotation text files, e.g. transcriptions and translations
Standalone texts, e.g. dictionaries, poetry
Wide range of genres: from verbal art to scholarly analyses
4
Bundles of resources
Session (IMDI, 2001): the resources resulting from a linguistic elicitation session, i.e. recordings and annotations.
A Session models only one kind of resource production: a recording session.
Collections will include a greater variety of resources, in sets of related materials.
5
Types of bundles
Canonical bundle: the original session. A digitized recording, in different formats, and some textual annotation files, also in different formats.
Minimal bundle: a single file. Examples: dictionary, poem, recording of uninterpretable chants.
Meta-bundle: a bundle containing other bundles. Example: a book about a set of annotated recordings.
6
Bundle elements
Current:
–Name of bundle
–Date and place of production
Proposed:
–Resource relations
–Date archived
–Last modified
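The bundle elements and the three bundle types can be pictured as a single record. The following is only a minimal sketch, not the IMDI or AILLA data model: the field names, the `files` and `sub_bundles` fields, and the `bundle_kind` helper are assumptions introduced for illustration, and the classification rule is a simplified reading of the slide on types of bundles.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Bundle:
    # Current elements listed on the slide
    name: str
    date_produced: Optional[date] = None
    place_produced: Optional[str] = None
    # Proposed elements listed on the slide
    resource_relations: List[str] = field(default_factory=list)
    date_archived: Optional[date] = None
    last_modified: Optional[date] = None
    # Hypothetical fields, used here only to tell the bundle types apart
    files: List[str] = field(default_factory=list)
    sub_bundles: List["Bundle"] = field(default_factory=list)


def bundle_kind(bundle: Bundle) -> str:
    """Classify a bundle (simplified: assumes these three cases are exhaustive)."""
    if bundle.sub_bundles:
        return "meta-bundle"   # contains other bundles
    if len(bundle.files) == 1:
        return "minimal"       # a single file
    return "canonical"         # a recording plus annotation files
```

Under these assumptions, a bundle holding one dictionary file would be classified as minimal, while a book that points at a set of annotated recording bundles would be a meta-bundle.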
7
Major subschemas
Project
Collector
Content
Participants
Resources
References
8
The Content subschema
Genre is the top-level category:
–Interaction: conversation, interview …
–Explanation: description, recipe …
–Performance: narrative, poem, oratory …
–Teaching: primer, textbook …
–Analysis: grammar, dictionary …
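The two-level Genre vocabulary can be sketched as a simple lookup table. This is an illustrative sketch only: the slide's lists end in "…", so the sub-genres below are just the examples given there, and the `GENRES` name and `top_level_genre` helper are hypothetical.

```python
from typing import Optional

# Only the example sub-genres named on the slide; the real lists are longer.
GENRES = {
    "Interaction": ["conversation", "interview"],
    "Explanation": ["description", "recipe"],
    "Performance": ["narrative", "poem", "oratory"],
    "Teaching": ["primer", "textbook"],
    "Analysis": ["grammar", "dictionary"],
}


def top_level_genre(subgenre: str) -> Optional[str]:
    """Return the top-level Genre for a sub-genre, or None if it is not listed."""
    wanted = subgenre.strip().lower()
    for genre, subgenres in GENRES.items():
        if wanted in subgenres:
            return genre
    return None
```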
9
Other Content categories
Modality: speech, writing, gesture
Communication context:
–Interactivity
–Planning
–Involvement
Languages
Task
Description
Keys
10
AILLA’s Content Keys
Register: a characterization of how the discourse reflects the social context. Example: honorific speech.
Style: poetic and stylistic effects. Examples: parallelism, metered verse.
11
The Project subschema
Current elements:
–Name: a nickname or acronym
–Title: official title
–ID: a unique identifier
–Contact information
Proposed element:
–Funder: name of funding organization
12
The Collector subschema
AILLA renames this Depositor, since this is the individual we have to keep track of (e.g. for Level 3 access permission).
When the Depositor is not also the Collector, the Collector can be listed under Participants.
13
The Participants subschema
Type: functional role, e.g. creator
Role: family relationship
Name/Full name
Language(s)
Ethnic group, age, sex
Education
Anonymous: True if the participant’s Full name is reserved; False otherwise
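A participant record with the Anonymous flag might look like the sketch below. The field names are adapted from the slide, but the `public_name` helper and its rule (show only the short Name when the Full name is reserved) are assumptions about how an archive could apply the flag, not documented IMDI or AILLA behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    type: str                 # functional role, e.g. "creator"
    role: str                 # family relationship
    name: str                 # short name
    full_name: str
    languages: List[str] = field(default_factory=list)
    ethnic_group: Optional[str] = None
    age: Optional[int] = None
    sex: Optional[str] = None
    education: Optional[str] = None
    anonymous: bool = False   # True if full_name is reserved


def public_name(participant: Participant) -> str:
    """Hypothetical display rule: withhold the full name of anonymous participants."""
    if participant.anonymous:
        return participant.name
    return participant.full_name
```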
14
AILLA additions to Participants
Origin: place (country, region, etc.) of origin of the creator of the primary resource in the bundle (e.g. the speaker whose voice is recorded).
Occupation: can be relevant in assessing the accuracy of some kinds of data.
15
The Resources subschema
Resources contains information about the formats and provenance of the files in a bundle.
Media Files: audio, video, etc.
Annotation Files: text files.
Proposal: call them all Media Files, to reduce redundancy in the database. (All have URL, size, etc. elements.)
16
Text resources
Current elements:
–Type: type of annotation, e.g. phonetic transcription.
–Content encoding: annotation encoding scheme, e.g. EUROTYP.
–Character encoding: character set(s) used in a text file.
17
Text resources 2
Proposed elements:
–Transcription type
–Translation (aka Glossing) type
–Software: used to produce transcriptions, translations, other annotations (e.g. Shoebox)
Describe the Annotator in Participants (along with Translator, etc.)
18
Proposed subschema
Place: composed of several elements:
–Continent
–Country
–Region
–Subregion (address)
Repeated at least twice, in Bundle and in Participants (Origin).
Might also be useful in the Language subschema.
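Because Place recurs in several subschemas, it lends itself to a single reusable type. The sketch below is one possible rendering, not part of the proposal itself: the four fields come from the slide, while the `label` method, the choice of which fields are optional, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    continent: str
    country: str
    region: str = ""
    subregion: str = ""   # the slide glosses this as "address"

    def label(self) -> str:
        """Join the non-empty elements from most to least specific."""
        parts = (self.subregion, self.region, self.country, self.continent)
        return ", ".join(part for part in parts if part)


# The same type can serve both uses named on the slide.
bundle_place = Place("South America", "Peru", "Cusco")          # Bundle: place of production
participant_origin = Place("South America", "Peru", "Cusco")    # Participants: Origin
```

Making the record immutable and comparable by value means the Bundle place and a participant's Origin can be checked for equality directly, which is one way to exploit the repetition the slide mentions.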
19
Conclusion
The IMDI schema is a flexible tool.
Customization through Key/Value pairs allows local modifications.
Most of the proposed changes are terminological, moving from the DOBES in-house terminology to more general usage.
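Customization through Key/Value pairs can be illustrated with AILLA's two Content Keys, Register and Style. This is a minimal sketch under stated assumptions: the `Content` class, the `AILLA_KEYS` set, and the `set_key` helper are hypothetical, and rejecting unknown keys is an illustrative local policy rather than something the IMDI schema is known to require.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

# The two Content Keys AILLA defines locally, per the slides.
AILLA_KEYS: FrozenSet[str] = frozenset({"Register", "Style"})


@dataclass
class Content:
    genre: str
    modality: str
    keys: Dict[str, str] = field(default_factory=dict)  # local Key/Value pairs


def set_key(content: Content, key: str, value: str,
            allowed: FrozenSet[str] = AILLA_KEYS) -> None:
    """Attach a local Key/Value pair, rejecting keys the archive has not defined."""
    if key not in allowed:
        raise KeyError(f"undefined content key: {key}")
    content.keys[key] = value


oratory = Content(genre="Performance", modality="speech")
set_key(oratory, "Register", "honorific speech")
set_key(oratory, "Style", "parallelism")
```

The core elements stay fixed while each archive supplies its own set of allowed keys, which is the kind of local modification the conclusion describes.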