On Building a Multimedia Language Archive. Dagmar Jung (Institut für Linguistik, Allgemeine Sprachwissenschaft, Universität zu Köln). CCeH opening workshop.

On Building a Multimedia Language Archive
Dagmar Jung (Institut für Linguistik, Allgemeine Sprachwissenschaft, Universität zu Köln)
CCeH opening workshop “IT in der Forschung an der Philosophischen Fakultät der Universität zu Köln” (IT in research at the Faculty of Arts and Humanities, University of Cologne)

A brief step back in time: setting the scene of language documentation
– The linguist/anthropologist
– The native speaker
– The transcription
– The translation
– The analysis
– The ideal outcome: texts, grammar, dictionary

Pliny Earle Goddard 1914: The Present Condition of Our Knowledge of North American Languages
“There remains a great amount of linguistic work to be done. With so little known of the origin of languages, and the conditions controlling their development and their dispersion, it is important that a record should be preserved of every language spoken. In order that that record be adequate, great care must be taken in phonetic representation. The sounds which correspond to the characters employed in writing should be so carefully described as to their manner of articulation and their acoustic effects as to make them thoroughly intelligible for all time. Sufficient material from each dialect should be recorded in the connected form of texts to furnish a fairly complete lexicon of the words it contains and a representation of the grammatical forms in use.”
(1914:592, American Anthropologist Vol. 16)

The DoBeS Program (Dokumentation bedrohter Sprachen, “Documentation of Endangered Languages”)
– Funded by the Volkswagen Foundation
– Started in 2000; ca. 45 projects worldwide
– Technical team and archive development: MPI (Max Planck Institute for Psycholinguistics, Nijmegen)
– Two main goals:
  – Documentation of endangered languages (gathering audio and video data in the field and annotating them)
  – Creation of a web-accessible multimedia digital archive designed to persist over the long term

The DoBeS projects (2008)

KÖBES – Cologne DoBeS projects
– Prof. G. Dimmendaal (African Studies):
  – “A multi-media documentation of verbal communication among the Tima” ( )
  – “A linguistic and anthropological documentation of Tima” ( )
– Dr. K. Haude (ASW, General Linguistics):
  – “Documenting Movima, an unclassified language of the Moxos region (Bolivia)” ( )
  – “Making Movima visible: documenting a linguistic isolate in the Moxos cultural complex” ( )
– Dr. D. Jung (ASW, General Linguistics):
  – “Beaver knowledge systems: language documentation from a placenames’ perspective” ( )
  – “Real places and virtual representation - Beaver language documentation” ( )

Challenges today
– Once the fieldwork situation is set up, vast amounts of language data can be recorded
– Hardware no longer limits the quantity of recordings
– Potentially a flood of audio and video data is collected -> how can it be processed so that it becomes useful?

Flexible Annotation Tools
– ELAN (time-aligned video/audio annotation)
– Toolbox (parsing tool and lexical database)
– Interoperable with other representational and analytic tools (e.g. by providing XML interfaces)
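The XML interface mentioned on this slide can be illustrated with a minimal Python sketch. It assumes a heavily simplified fragment in the style of an ELAN .eaf file (time slots in milliseconds, tiers holding time-aligned annotations); the sample content, the tier names, and the helper `read_tiers` are hypothetical and only show how another tool might consume such XML.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment in the style of an ELAN .eaf file.
# Real files carry more attributes and header information.
EAF_SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3200"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="translation">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>translation of first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tiers(xml_text):
    """Return {tier_id: [(start_ms, end_ms, text), ...]} for time-aligned annotations."""
    root = ET.fromstring(xml_text)
    # Map each time slot id to its time value in milliseconds.
    slots = {s.get("TIME_SLOT_ID"): int(s.get("TIME_VALUE"))
             for s in root.iter("TIME_SLOT")}
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, text))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


if __name__ == "__main__":
    for tier_id, rows in read_tiers(EAF_SAMPLE).items():
        print(tier_id, rows)
```

Because the annotations end up as plain (start, end, text) tuples per tier, any downstream tool can work with them without knowing the annotation software itself.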

ELAN: annotation of multi-modal data

ELAN: multiple tiers

Toolbox

Tools: LEXUS (under development)
– Web-based lexical database: allows for customized lexicon creation
– Can also import from Toolbox
– Multi-media links allowed
– Its online nature is ideal for collaborative efforts
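As a rough illustration of what an import from Toolbox involves, the following Python sketch reads backslash-marked lexicon records into dictionaries. It assumes the commonly used markers `\lx` (lexeme), `\ps` (part of speech), and `\ge` (English gloss); the function `parse_toolbox` and the placeholder entries are hypothetical and are not taken from LEXUS or from any project data.

```python
def parse_toolbox(text, record_marker="lx"):
    """Parse backslash-marked lines into a list of records.

    Each record is a dict {marker: [values, ...]}; a new record starts at
    every line carrying the record marker. Lines that do not start with a
    backslash (e.g. wrapped continuation lines) are ignored in this sketch.
    """
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            current = {}
            records.append(current)
        if current is not None:  # skip header lines before the first record
            current.setdefault(marker, []).append(value.strip())
    return records


# Placeholder entries; not real language data.
SAMPLE = r"""
\lx word1
\ps n
\ge gloss one

\lx word2
\ps v
\ge gloss two
\ge alternative gloss
"""

if __name__ == "__main__":
    for record in parse_toolbox(SAMPLE):
        print(record)
```

Keeping every field as a list preserves repeated markers, such as several glosses for one entry, which a customizable lexicon structure would need to accommodate.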

The Multi-Media Archive
Is not a place to merely ‘dump’ data and forget about them, but serves for:
– Data preservation
– Data presentation
– Data analysis (e.g. by making use of metadata or intelligent searches)
And last but not least (for the scientific community):
– Data accountability: unique resource identifiers

The Archive Location

The Archive: flexible corpus structures

Metadata
– Necessary for archival organization
  – Identity of resources: language name, etc.
  – Also physical characteristics: quality, quantity
– Desirable for scientific use of resources
  – Sociolinguistic data of participants
  – Characteristics of genre
  – Key words (free)
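The split between archival and scientific metadata can be sketched as a small Python record type. All field names, identifiers, and sample values below are hypothetical and do not reproduce the archive's actual metadata schema; the language names are the ones mentioned in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class SessionMetadata:
    # Necessary for archival organization (identity, physical characteristics).
    resource_id: str
    language_name: str
    media_type: str            # e.g. "audio" or "video"
    duration_seconds: int
    # Desirable for scientific use (optional in this sketch).
    genre: str = ""
    participant_info: list = field(default_factory=list)
    keywords: list = field(default_factory=list)


def find_sessions(sessions, language=None, genre=None, keyword=None):
    """Return the sessions that satisfy every criterion that was given."""
    hits = []
    for s in sessions:
        if language is not None and s.language_name != language:
            continue
        if genre is not None and s.genre != genre:
            continue
        if keyword is not None and keyword not in s.keywords:
            continue
        hits.append(s)
    return hits


# Hypothetical sample records for illustration only.
SESSIONS = [
    SessionMetadata("res-001", "Beaver", "video", 620,
                    genre="narrative", keywords=["placenames"]),
    SessionMetadata("res-002", "Beaver", "audio", 310, genre="conversation"),
    SessionMetadata("res-003", "Movima", "video", 900,
                    genre="narrative", keywords=["history"]),
]

if __name__ == "__main__":
    for s in find_sessions(SESSIONS, language="Beaver", genre="narrative"):
        print(s.resource_id, s.keywords)
```

Making the scientific fields optional mirrors the distinction on the slide: a resource can be archived with identity and physical data alone, while richer descriptions make it easier to find and compare.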

ANNEX searches in the archive
– Allows for simple searches or advanced multi-tier searches within annotations
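The difference between a simple search and a multi-tier search can be made concrete with a short Python sketch. It is not ANNEX's implementation; it assumes annotations stored as (start_ms, end_ms, text) tuples per tier, and the tier names, sample values, and functions `simple_search` and `multi_tier_search` are hypothetical.

```python
import re


def overlaps(a, b):
    """True if two (start, end, text) annotations share some stretch of time."""
    return a[0] < b[1] and b[0] < a[1]


def simple_search(tiers, pattern):
    """Find every annotation on any tier whose text matches the pattern."""
    return [(name, ann) for name, anns in tiers.items()
            for ann in anns if re.search(pattern, ann[2])]


def multi_tier_search(tiers, conditions):
    """Find time-overlapping matches across several tiers.

    `conditions` maps tier name -> regex. The first tier acts as the anchor;
    a hit requires, for each further tier, an annotation that matches its
    pattern and overlaps the anchor annotation in time.
    """
    (anchor_tier, anchor_pat), *others = conditions.items()
    hits = []
    for ann in tiers[anchor_tier]:
        if not re.search(anchor_pat, ann[2]):
            continue
        hit = {anchor_tier: ann}
        for tier_name, pat in others:
            match = next((o for o in tiers[tier_name]
                          if overlaps(ann, o) and re.search(pat, o[2])), None)
            if match is None:
                break
            hit[tier_name] = match
        else:  # no break: every further tier contributed a match
            hits.append(hit)
    return hits


# Hypothetical sample annotations for illustration only.
TIERS = {
    "transcription": [(0, 1500, "utterance A"), (1500, 3200, "utterance B"),
                      (3200, 5000, "utterance A again")],
    "gesture": [(200, 900, "pointing"), (3300, 4000, "nodding")],
}

if __name__ == "__main__":
    print(simple_search(TIERS, r"utterance A"))
    print(multi_tier_search(TIERS, {"transcription": r"utterance A",
                                    "gesture": r"pointing"}))
```

In this sample the simple search returns both annotations containing "utterance A", while the multi-tier search keeps only the one that co-occurs with a "pointing" annotation on the gesture tier.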

ANNEX: multiple views

Ways of Access and Visualization: Google Earth layer

Ways of Access and Visualization

Ways of access: web-accessible stories (derived from ELAN)

Ways of access: Community Portal

Changes in Language Resources: Data and Tools
– Data are not the same (audio, video, quantity and quality)
– The archive is inherently a work in progress, NOT a published end product
– Tools are certainly not the same (annotation, presentation, search engines)
– Linguistic work has become more cooperative: with communities, with international colleagues, with other disciplines
– New foundation for linguistics as an empirical science

PS: Goddard, documentation of Beaver Athabaskan (1917); Rousselot apparatus (kymography)