Jan 9, 2004 Symposium on Best Practice LSA, Boston, MA
Metadata
  Helen Aristar Dry
  Eastern Michigan University
  LINGUIST List
Outline
  What is metadata?
  Why use OLAC metadata?
  How can you write OLAC metadata for your resources?
  Metadata in XML
  Using ORE
Preliminaries
  Language documentation is valuable only if it is findable
  On the Internet, this means “findable by computational means”
  Efficient search and retrieval of language resources requires the use of metadata
Metadata is:
  Structured data about data
  Similar to catalogue information
  Usually a set of elements, each of which describes a property of the resource
  The elements of a metadata set can be encoded in different “languages,” e.g., HTML, XML, RDF/XML
An example
  Title: Biao Min Data
  Creator (depositor): David Solnit
  Subject (linguistic field): Language Description
  Subject (language): Biao Min
  Date created: April 5, 1982
  Description: The Biao Min data on the E-MELD site includes over 3,000 lexical items...
Example in HTML
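The HTML encoding of the example record can be sketched as follows. This is a minimal illustration, not the markup shown on the original slide: the `DC.` name prefix on each `<meta>` tag is an assumed convention, and `to_html_meta` is a hypothetical helper.

```python
from html import escape

# The example record, as (element, value) pairs.
# Pairs (rather than a dict) let an element such as Subject appear twice.
record = [
    ("title", "Biao Min Data"),
    ("creator", "David Solnit"),
    ("subject", "Language Description"),
    ("subject", "Biao Min"),
]


def to_html_meta(pairs):
    """Render each (element, value) pair as one HTML <meta> tag."""
    lines = []
    for element, value in pairs:
        # Assumption: element names are written with a "DC." prefix.
        lines.append(
            '<meta name="DC.{}" content="{}">'.format(
                escape(element), escape(value, quote=True)
            )
        )
    return "\n".join(lines)


html_meta = to_html_meta(record)
print(html_meta)
```

Tags of this kind would sit in the `<head>` of the page that describes the resource.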
Example in XML
  Biao Min Data
  David Solnit
  Biao Min
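Only the element values of the XML example survive here, so the sketch below shows one way such a record could be serialized. It assumes the three values are a title, a creator, and a subject, and uses the standard Dublin Core element namespace. The `record` wrapper element and the `build_record` helper are hypothetical, not part of any schema named in the talk.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(pairs):
    """Build an XML tree with one dc:* child per (element, value) pair."""
    root = ET.Element("record")  # hypothetical wrapper element
    for element, value in pairs:
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, element))
        child.text = value
    return root


# Assumed mapping of the three surviving values to elements.
record_xml = build_record([
    ("title", "Biao Min Data"),
    ("creator", "David Solnit"),
    ("subject", "Biao Min"),
])
xml_text = ET.tostring(record_xml, encoding="unicode")
print(xml_text)
```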
Metadata
  Different metadata specifications: MARC, METS, Dublin Core, IMDI, OLAC
  IMDI & OLAC designed specifically for language documentation
OLAC Metadata
  Product of the Open Language Archives Community
  Strengths:
    Ease of creation
    Search & retrieval via the protocols of the Open Archives Initiative
Open Archives Initiative
  Cross-disciplinary initiative for search and retrieval of metadata from multiple archives
  Establishes protocols for “harvesting” metadata records of participating archives and making them available via “Service Providers”
  Supports formation of discipline-specific sub-communities such as OLAC (Open Language Archives Community)
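To make “harvesting” concrete, the sketch below builds the kind of request URL a service provider might send to an archive under the OAI Protocol for Metadata Harvesting (OAI-PMH). The six verbs and the `oai_dc` metadata prefix come from that protocol. The base URL is a placeholder, `harvest_url` is a hypothetical helper, and no network request is made.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def harvest_url(base_url, verb, **params):
    """Return a harvesting request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError("unknown OAI-PMH verb: " + verb)
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


# Placeholder archive address, for illustration only.
url = harvest_url(
    "https://archive.example.org/oai", "ListRecords", metadataPrefix="oai_dc"
)
print(url)
```

The archive would answer such a request with an XML document containing its metadata records, which the service provider can then index for searching.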
LINGUIST List = OLAC Gateway
  LINGUIST List is the main service provider for OLAC
  Harvests metadata from 27 major archives
  Collects metadata from individual linguists about their language documentation
  Offers search interface for over 30,000 records of language-related data
  See:
OLAC Metadata
  OAI uses the Dublin Core (DC) metadata standard
    15 elements (each optional & repeatable)
    Core vocabulary for refining elements (dcterms)
  Sub-communities may qualify DC metadata to suit their specific needs
  OLAC has qualified DC metadata to better describe language resources
OLAC Qualifies 5 of the 15 DC Elements
  Contributor, Coverage, Creator, Date, Description, Format, Identifier
  Language, Publisher, Relation, Rights, Source, Subject, Title, Type
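Because every element is optional and repeatable, checking a record mostly comes down to confirming that each name is one of the 15 elements. A minimal sketch follows. The qualified set is taken from the extensions listed later in the talk (Language, Subject, Type, Contributor, Creator), and `unknown_elements` is a hypothetical helper.

```python
# The 15 Dublin Core elements, lowercased.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# The five elements for which OLAC defines extensions.
OLAC_QUALIFIED = {"contributor", "creator", "language", "subject", "type"}


def unknown_elements(pairs):
    """Return the element names in a record that are not DC elements.

    No element is required and any element may repeat, so an empty
    record and a record with two subjects are both acceptable.
    """
    return [name for name, _ in pairs if name.lower() not in DC_ELEMENTS]


record = [
    ("title", "Biao Min Data"),
    ("subject", "Language Description"),
    ("subject", "Biao Min"),
    ("author", "David Solnit"),  # not a DC element; "creator" is
]
print(unknown_elements(record))
```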
OLAC recommends 5 extensions:
  Language: OLAC Language
  Subject: OLAC Language, Linguistic Field
  Type: Linguistic Data Type, Discourse Type
  Contributor: Role
  Creator: Role
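One way to picture an extension applied to a DC element is as an `xsi:type` attribute plus a code drawn from the relevant vocabulary. The sketch below rests on assumptions: the OLAC namespace URI, the extension tokens (`language`, `linguistic-field`, `linguistic-type`, `discourse-type`, `role`), and the `olac:code` attribute are illustrative guesses at the XML form, and `qualified` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
OLAC_NS = "http://www.language-archives.org/OLAC/1.0/"  # assumed URI
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xsi", XSI_NS)
ET.register_namespace("olac", OLAC_NS)

# Which extensions may qualify which DC element (tokens are assumed).
EXTENSIONS = {
    "language": ["language"],
    "subject": ["language", "linguistic-field"],
    "type": ["linguistic-type", "discourse-type"],
    "contributor": ["role"],
    "creator": ["role"],
}


def qualified(element, extension, code, text=None):
    """Build a DC element qualified by an OLAC extension and code."""
    if extension not in EXTENSIONS.get(element, []):
        raise ValueError("%s does not apply to %s" % (extension, element))
    el = ET.Element("{%s}%s" % (DC_NS, element))
    el.set("{%s}type" % XSI_NS, "olac:" + extension)
    el.set("{%s}code" % OLAC_NS, code)
    el.text = text
    return el


# "Creator (depositor): David Solnit" from the example record.
creator = qualified("creator", "role", "depositor", "David Solnit")
creator_xml = ET.tostring(creator, encoding="unicode")
print(creator_xml)
```

Keeping the element-to-extension table in one place means a mismatch, such as a role on Title, is rejected before any XML is written.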
Participant Role
  Provides a controlled vocabulary for identifying the role of a Creator or Contributor more precisely.
  The vocabulary identifies approximately twenty roles that are common in the development of language resources.
  Examples: depositor, signer, transcriber, respondent, editor, consultant, researcher.
  Documentation:
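A controlled vocabulary is useful mainly because free-text input can be checked against it. The sketch below uses only the seven example roles named above, so it is a partial stand-in for the full vocabulary of roughly twenty; `normalize_role` is a hypothetical helper.

```python
# Partial role list: only the examples given on the slide.
EXAMPLE_ROLES = {
    "depositor", "signer", "transcriber", "respondent",
    "editor", "consultant", "researcher",
}


def normalize_role(raw):
    """Return the lowercase role if it is in the list, else None."""
    role = raw.strip().lower()
    return role if role in EXAMPLE_ROLES else None


print(normalize_role(" Depositor "))
print(normalize_role("unknown-role"))
```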
Language Identification:
  Provides codes for identifying all known languages, both living and extinct.
  Applies to: Language, Subject
Linguistic Field
  Provides codes for identifying the content of a resource as relevant to a particular subfield of linguistic science
  Applies to: Subject
  Examples: anthropological_linguistics, applied_linguistics, cognitive_science, computational_linguistics, lexicography, discourse_analysis
Linguistic Data Type
  Describes the resource as representing a recognized structural type of linguistic information
  Applies to: Type
  Examples:
    Lexicon
    Primary text
    Language description
    Dataset (already in DCterms)
Discourse Type
  Provides a controlled vocabulary for identifying approximately ten discourse types.
  It is used with Type to identify the genre of a language resource (particularly a primary text).
  Types: Interactive Discourse, Report, Singing, Oratory, Narrative, Formulaic Discourse, Procedural Discourse, Language Play, Unintelligible Speech
  archives.org/REC/discourse.html
Writing metadata
  See “metadata” in the E-MELD School of Best Practices:
  Or use the OLAC Repository Editor:
  See: