The Open Language Archives Community: Building a worldwide library of digital language resources Gary Simons, SIL International LSA Tutorial on Archiving.

Slides:



Advertisements
Similar presentations
The OLAC Metadata Set and Controlled Vocabularies Steven Bird Gary Simons Penn SIL.
Advertisements

CIDOC 2000 Using GEM Metadata to Access Education Resources Nancy Virgil Morgan Coordinator
Dublin Core for Digital Video: Overview of the ViDe Application Profile.
OLAC Metadata Steven Bird University of Melbourne / University of Pennsylvania OLAC Workshop 10 December 2002.
Accessing Distributed Resources Information: An OLAC perspective Steven Bird Gary Simons Chu-Ren Huang Melbourne SIL Academia Sinica ENABLER/ELSNET Workshop.
The Seven Pillars of Open Language Archiving: A Vision Statement Gary Simons and Steven Bird Workshop on Web-based Language Documentation and Description.
Outreach Jeff Good UC Berkeley. OLAC's Needs Maximal involvement from the whole community –The more data providers involved the more useful the services.
White Paper on Establishing an Infrastructure for Open Language Archiving Steven Bird and Gary Simons.
OLAC Process and OLAC Protocol: A Guided Tour Gary F. Simons SIL International ___________________________ OLAC Workshop 10 Dec 2002, Philadelphia.
Jan 7, 2005 Linguistic Society of America 2005 Annual Meeting, Oakland, CA The E-MELD Project: Helen Aristar Dry The LINGUIST List Eastern Michigan University.
An Overview of OLAC: The Open Language Archives Community Gary Simons and Steven Bird Workshop on The Digitization of Language Data: The Need for Standards.
The OLAC Metadata Set Gary Simons Workshop on The Digitization of Language Data: The Need for Standards June 2001.
LSA Archiving Tutorial January 2005 Archives, linguists, and language speakers.
Getting Involved in OLAC Steven Bird University of Pennsylvania LREC Symposium: The Open Language Archives Community 29 May 2002.
Getting Involved in OLAC Steven Bird University of Pennsylvania LSA Symposium: The Open Language Archives Community 4 January 2002.
Helen Dry & Anthony Aristar LINGUIST List: LREC Symposium: The Open Language Archives Community 29 May 2002http://linguistlist.org.
The Seven Pillars of Open Language Archiving: Introducing the OLAC Vision Gary Simons SIL International LREC Symposium: The Open Language Archives Community.
Helen Dry & Anthony Aristar LINGUIST List: LSA Symposium: The Open Language Archives Community 4 January 2002http://linguistlist.org.
The Seven Pillars of Open Language Archiving: Introducing the OLAC Vision Gary Simons SIL International LSA Symposium: The Open Language Archives Community.
OLAC: Open Language Archives Community OLAC : The Open Language Archives Community Gary F. Simons SIL International and Graduate Institute of Applied Linguistics.
Creating Institutional Repositories Stephen Pinfield.
Subject Based Information Gateways in The UK Coordinated Activities in The UK Within the UK Higher Education community, the JISC (Joint Information Systems.
Towards consensus on collection-level description Collection Description Focus Briefing Day 1 British Library, St Pancras, London 22 October 2001 Bridget.
Final Report of Working Group 5 Interoperation G. Simons (chair), H. Aristar-Dry, D. Iannucci, E. Richter, H. Sicard, N. Thieberger, P. Wittenburg G. Simons.
The Library behind the scene How does it work ? The Library behind the scenes 1 JINR / CERN Grid and advanced information systems 2012 Anne Gentil-Beccot.
© Tefko Saracevic, Rutgers University1 metadata considerations for digital libraries.
Kristin Eberle Monica Hampton Carmen Velasquez Kristin Eberle Monica Hampton Carmen Velasquez Knowledge Management.
The Open Archives Initiative Simeon Warner (Cornell University) Symposium on “Scholarly Publishing and Archiving on the Web”, University.
Introduction to Implementing an Institutional Repository Delivered to Technical Services Staff Dr. John Archer Library University of Regina September 21,
ACCESS TO QUALITY RESOURCES ON RUSSIA Tanja Pursiainen, University of Helsinki, Aleksanteri institute. EVA 2004 Moscow, 29 November 2004.
Digital Encoding What’s behind E-text Resources?.
Educause October 29, 2001 A GEM of a Resource: The Gateway to Educational Materials Copyright Nancy Virgil Morgan, This work is the intellectual.
Digital Library Architecture and Technology
© LNB Latvian Digital Library as a Resource for Life-long Learning Uldis Straujums Lecturer, University of Latvia, Information System Designer, National.
Publishing Digital Content to a LOR Publishing Digital Content to a LOR 1.
Resource Discovery (metadata and searching) Working Group Report.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
Serenate1 Non-standard users: The Library Raf Dekeyser K.U.Leuven.
June 20, 2006E-MELD 2006, MSU1 Toward Implementation of Best Practice: Anthony Aristar, Wayne State University Other E-MELD Outcomes.
The Archive of the Indigenous Languages of Latin America Goals and Visions.
ALCME: OAI at OCLC Jeffrey A. Young OCLC Online Computer Library Center, Inc.
Metadata Considerations Implementing Administrative and Descriptive Metadata for your digital images 1.
AILLA:The Archive of the Indigenous Languages of Latin America Heidi Johnson / The University of Texas at Austin.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Meta Tagging / Metadata Lindsay Berard Assisted by: Li Li.
Jan 9, 2004 Symposium on Best Practice LSA, Boston, MA 1 Metadata Helen Aristar Dry Eastern Michigan University LINGUIST List.
Is Dublin Core Dying? Kayla Willey – Brigham Young University Cheryl Walters – Utah State University Utah Library Association Annual Conference St. George,
19/10/20151 Semantic WEB Scientific Data Integration Vladimir Serebryakov Computing Centre of the Russian Academy of Science Proposal: SkTech.RC/IT/Madnick.
P. Schirmbacher Humboldt-Universität zu Berlin The Changing Process of Scholarly Publishing or the Necessity of a New Culture of Electronic.
Integrating a Statewide Web Gateway With Digital Collections ______________________ Eric Weig and Beth Kraemer University of Kentucky and KCVL.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
Aug 2-5, 2002 EMELD Workshop Overview & Update Helen Aristar Dry The LINGUIST List & Eastern Michigan University EMELD Workshop on The Digitization.
Metadata for the Web Andy Powell UKOLN University of Bath
A Data Category Registry- and Component- based Metadata Framework Daan Broeder et al. Max-Planck Institute for Psycholinguistics LREC 2010.
Recent Developments in CLARIN-NL Jan Odijk P11 LREC, Istanbul, May 23,
This presentation describes the development and implementation of WSU Research Exchange, a permanent digital repository system that is being, adding WSU.
Slavic Digital Text Workshop 2006 The Open Archives Initiative Protocol for Metadata Harvesting: an Opportunity for Sharing Content in a Distributed Environment.
Preservation Program Digital Preservation Program Digital Preservation Services: Extending tools to meet campus needs Patricia Cruse, Director, Digital.
Digital Library The networked collections of digital text, documents, images, sounds, scientific data, and software that are the core of today’s Internet.
National Library of the Czech Republic Integration of digital materials into EDL Adolf Knoll National Library of the Czech Republic Helsinki CENL Workshop.
The Open Archives Initiative Marshall Breeding Director for Innovative Technologies and Research Vanderbilt University
Serenate1 The librarian’s view Raf Dekeyser K.U.Leuven.
Sharing Digital Scores: Will the Open Archives Initiative Protocol for Metadata Harvesting Provide the Key? Constance Mayer, Harvard University Peter Munstedt,
Open Archive Forum Rachel Heery UKOLN, University of Bath UKOLN is funded by Resource: The Council for Museums, Archives.
Online Information and Education Conference 2004, Bangkok Dr. Britta Woldering, German National Library Metadata development in The European Library.
Developing our Metadata: Technical Considerations & Approach Ray Plante NIST 4/14/16 NMI Registry Workshop BIPM, Paris 1 …don’t worry ;-) or How we concentrate.
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
Outline Pursue Interoperability: Digital Libraries
Cataloging the Internet
Open Archives Initiative
Presentation transcript:

The Open Language Archives Community: Building a worldwide library of digital language resources Gary Simons, SIL International LSA Tutorial on Archiving and Linguistic Resources 6 Jan 2005, Oakland, CA

Unprecedented opportunity Digital archiving of language documentation and description on the World-Wide Web offers: Digital archiving of language documentation and description on the World-Wide Web offers: Minimal cost multimedia publishing Minimal cost multimedia publishing Maximal access by the citizens of the world Maximal access by the citizens of the world This holds the promise of unparalleled access to information. This holds the promise of unparalleled access to information.

Or, Unprecedented chaos? Pursuing digital archiving of language documentation in isolation will result in: Pursuing digital archiving of language documentation in isolation will result in: Resources that are as good as lost since others wont be able to find them. Resources that are as good as lost since others wont be able to find them. Resources that are not usable by others due to the proliferation of idiosyncratic formats and practices. Resources that are not usable by others due to the proliferation of idiosyncratic formats and practices. This holds out the specter of unparalleled frustration and confusion. This holds out the specter of unparalleled frustration and confusion.

The vision Fulfill the promise (and avoid the specter) by acting in community to define and follow best common practice Fulfill the promise (and avoid the specter) by acting in community to define and follow best common practice A gap analysis: A gap analysis: What users wantthe ideal What users wantthe ideal What users actually get the gap What users actually get the gap What it would take to bridge the gap a community infrastructure What it would take to bridge the gap a community infrastructure

What users want The individuals who use and create language documentation and description are looking for three things: The individuals who use and create language documentation and description are looking for three things: Primary and secondary data about languages Primary and secondary data about languages Computational tools to create, view, query, or otherwise use language data Computational tools to create, view, query, or otherwise use language data Advice on how best to do the above Advice on how best to do the above

The ideal situation

What users actually get The data are archived at hundreds of sites The data are archived at hundreds of sites Some are on Web and user finds them Some are on Web and user finds them Some are on Web but user cant find them Some are on Web but user cant find them Some are not even on Web Some are not even on Web The tools and advice are at different sites than the data The tools and advice are at different sites than the data

The gap

Its even worse The user may not find all existing data about the language of interest because different sites have called it by different names. The user may not find all existing data about the language of interest because different sites have called it by different names. The user may not be able to use an accessible data file for lack of being able to match it with the right tools. The user may not be able to use an accessible data file for lack of being able to match it with the right tools. The user may locate advice that seems relevant but then has no way to judge how good it is. The user may locate advice that seems relevant but then has no way to judge how good it is.

What a community could provide In order to bridge the gap, the individuals who use and create language documentation and description need a community with standards that define: In order to bridge the gap, the individuals who use and create language documentation and description need a community with standards that define: Uniform metadata for describing resources Uniform metadata for describing resources A single gateway for finding resources A single gateway for finding resources A process to review practices and standards A process to review practices and standards

A community infrastructure

Open Language Archives Community OLAC is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: OLAC is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: Developing consensus on best current practice for the digital archiving of language resources Developing consensus on best current practice for the digital archiving of language resources Developing a network of interoperating repositories and services for housing and accessing such resources Developing a network of interoperating repositories and services for housing and accessing such resources

Participating Archives Aboriginal Studies Electronic Data Archive (ASEDA) Aboriginal Studies Electronic Data Archive (ASEDA) Academia Sinica Academia Sinica Alaska Native Language Center Alaska Native Language Center Archive of Indigenous Languages of Latin America (AILLA) Archive of Indigenous Languages of Latin America (AILLA) ATILF Resources ATILF Resources CHILDES Data Repository CHILDES Data Repository Cornell Language Acquisition Laboratory (CLAL) Cornell Language Acquisition Laboratory (CLAL) Dictionnaire Universel Boiste 1812 Dictionnaire Universel Boiste 1812 Digital Archive of Research Papers in Computational Linguistics Digital Archive of Research Papers in Computational Linguistics Ethnologue: Languages of the World Ethnologue: Languages of the World European Language Resources Association (ELRA) European Language Resources Association (ELRA) LACITO Archive LACITO Archive LDC Corpus Catalog LDC Corpus Catalog LINGUIST List Language Resources LINGUIST List Language Resources Natural Language Software Registry Natural Language Software Registry Oxford Text Archive Oxford Text Archive PARADISEC PARADISEC Perseus Digital Library Perseus Digital Library Rosetta Project 1000 Languages Rosetta Project 1000 Languages SIL Language & Culture Archives SIL Language & Culture Archives Surrey Morphology Group Databases Surrey Morphology Group Databases Survey for California and Other Indian Languages Survey for California and Other Indian Languages TalkBank TalkBank Tibetan and Himalayan Digital Library Tibetan and Himalayan Digital Library TRACTOR TRACTOR Typological Database Project Typological Database Project Univ. of Bielefeld Language Archive Univ. of Bielefeld Language Archive Univ. of Queensland Flint Archive Univ. of Queensland Flint Archive

Metadata standard Based on Dublin Core metadata standard: Based on Dublin Core metadata standard: Contributor, Coverage, Creator, Date, Description, Format, Identifier, Language, Publisher, Relation, Rights, Source, Subject, Title, Type Contributor, Coverage, Creator, Date, Description, Format, Identifier, Language, Publisher, Relation, Rights, Source, Subject, Title, Type OLAC adds extensions (with controlled vocabularies) specific to our community: OLAC adds extensions (with controlled vocabularies) specific to our community: Language Identification, Linguistic Data Type, Linguistic Field, Participant Role, Discourse Type Language Identification, Linguistic Data Type, Linguistic Field, Participant Role, Discourse Type

Gateway standard Based on a Digital Library Federation standard Based on a Digital Library Federation standard Open Archives Initiative Protocol for Metadata Harvesting Open Archives Initiative Protocol for Metadata Harvesting Service providers use the protocol to harvest metadata from data providers Service providers use the protocol to harvest metadata from data providers OLAC has four ways to become a data provider OLAC has four ways to become a data provider Implement a dynamic interface to existing database Implement a dynamic interface to existing database Map existing database to a static XML document Map existing database to a static XML document Use web forms of OLAC Repository Editor service Use web forms of OLAC Repository Editor service Under development: Install an E-prints server Under development: Install an E-prints server

Process standard Defines how OLAC is organized: Defines how OLAC is organized: Coordinators, Advisory Board, Council, Archives, Services, Working Groups, Participating individuals Coordinators, Advisory Board, Council, Archives, Services, Working Groups, Participating individuals Defines three types of documents: Defines three types of documents: Standards, Recommendations, Notes Standards, Recommendations, Notes Defines how a document moves from one life-cycle status to another. Defines how a document moves from one life-cycle status to another. Draft, Proposed, Candidate, Adopted, Retired Draft, Proposed, Candidate, Adopted, Retired

Call for participation All institutions and individuals with language resources to share are enthusiastically invited to participate. All institutions and individuals with language resources to share are enthusiastically invited to participate. Visit to: Visit to: Try our two search services Try our two search services Read workpapers and published articles Read workpapers and published articles Subscribe to the OLAC-General mailing list Subscribe to the OLAC-General mailing list Learn how to become a data provider Learn how to become a data provider