Published by Myles Cannon. Modified over 8 years ago.
1
Exploring and Enriching a LR Archive via the Web
Marc Kemps-Snijders, Alex Klassmann, Claus Zinn, Peter Berck, Albert Russel, Peter Wittenburg
MPI for Psycholinguistics
DOBES Endangered Languages Project
2
What is a digital archive?
Two essential dimensions
–Long-term preservation of all resources and relations
–Accessibility and interpretability
Why preserve?
–face the loss of our cultural memory on electronic media
–UNESCO: 80% of the recordings about languages and cultures are highly endangered
There are no guarantees for preservation, but we can increase chances of survival
–store everything in a well-organized repository (browsable/searchable)
–take care of redundancy, migration and curation on various dimensions
–establish organizations that take responsibility
Digital archives are living entities!
Live Archives concept: allow enrichments (standoff), relations, etc.
3
What is in MPI’s archive?
Endangered language documentation resources
–Representative record of a language in its cultural context
–Crucial is the active involvement of the community
–May help in maintaining and revitalizing languages
–Therefore: trend towards complementing linguistic information with ontological information in collaborative spaces
Child language, bilingualism, gesture, sign language, spoken Dutch corpus, sound corpora, second-language learner corpora, etc.
Mostly annotated audio/video recordings
30 terabytes, 53,000 AV resources, 24,000 annotation files, 60 million annotations, lexicons, sketch grammars, etc.
All from a large number of depositors
4
DOBES Languages 40 language teams from the DOBES program documenting about 60 languages and working independently
5
Language Archiving Technology
LAT to support operations during resource life-time
support standards where possible
ELAN/LEXUS/SYNPATHY: Annotation + Lexicon preparation
IMDI: Data Organization, Metadata
LAMUS: Data Uploading and Management
IMDI / GIS: Metadata Browsing & Searching
ANNEX/LEXUS/IMEX/TROVA: Complex Access via Web
ODIT/ISOcat: Ontology management framework utilization
ADDIT/VICOS/MEL: Enrichments/Views
Further diagram labels: Shoebox/CHAT, Transcriber, XML; Data Archiving and Copying; Access Management; integration; Archive Grid Federation
6
LAT Dimensions: Management & Upload
–take care of consistency
–check uploaded formats
–convert where possible
–create presentation formats
–create indexes
–allow access rights definition
–add unique & persistent IDs
–take care of distribution
Basis is a robust repository system with reliable mechanisms
Diagram labels: resources, metadata, metadata editing, repository system
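The upload steps listed on this slide (check the format, convert where possible, add a unique ID, define access rights) can be pictured with a small sketch. It is only an illustration under stated assumptions: the format lists, the `Resource` class, the `ingest` function and the `hdl:example/` prefix are all hypothetical and do not describe how LAMUS or any other LAT component is actually implemented.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical format rules, for illustration only (not LAT's real policy).
ACCEPTED = {".wav", ".mpg", ".eaf", ".imdi"}
CONVERTIBLE = {".aiff": ".wav", ".txt": ".eaf"}


@dataclass
class Resource:
    name: str
    data: bytes
    pid: str = ""        # unique, persistent identifier
    checksum: str = ""   # lets later consistency checks detect corruption
    access: set = field(default_factory=set)


def ingest(name: str, data: bytes, allowed_users: set) -> Resource:
    """Check the format, convert where possible, then add ID, checksum and rights."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix in CONVERTIBLE:
        # A real system would transcode the payload; this sketch only renames.
        name = str(PurePosixPath(name).with_suffix(CONVERTIBLE[suffix]))
    elif suffix not in ACCEPTED:
        raise ValueError(f"unsupported format: {suffix or '(none)'}")
    return Resource(
        name=name,
        data=data,
        pid=f"hdl:example/{uuid.uuid4()}",  # placeholder, not a real handle
        checksum=hashlib.sha256(data).hexdigest(),
        access=set(allowed_users),
    )
```

Rejecting unknown formats at upload time, rather than storing them and sorting them out later, is one way to keep the repository consistent; whether LAT makes exactly this choice is not stated on the slide.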
7
LAT Dimensions: Complex Access
–access to annotated media or multimedia lexica
–callable via any other web application
8
LAT Dimensions: Customized views
–fostering the creation of special web-sites by REST interfaces and templates
–fostering GIS presentations by special converters
9
Who are our users?
Stakeholder    Interest
archivist      easy management, easy discovery, consistency, statistics, versioning, ..
researchers    easy visualization, easy discovery, virtual collections, extensions, permissions, ..
communities    semantic exploration, extensions, permissions, ..
journalists    appetizers, easy inspection, ..
students       curiosity, navigation, inspection, ..
Still in a download-first paradigm, not cyberinfrastructure usage (result of an ESF/NSF workshop)
10
‘Download first…’ problems and disadvantages
–Tool and format updates are propagated to users at a slow rate
–‘Legacy’ formats offered to archives pose an increasing burden on archives or tool builders (conversion/migration)
–New techniques slowly spread through the community
–Can we provide more incentives on the tools side?
–Orchestration between tools becomes much more difficult if not impossible
–Users need to install tools locally
11
How to extend LAT?
Paper dictionaries’ limited usefulness in language maintenance & language revival (Manning et al., 2000)
“Linear” lexicons not at all interesting except for linguists
Speech community may prefer explicit semantic access and links, possibly of a wide variety of types (i.e. beyond formal systems)
Semantic view not limited to lexicons, but should include all fragments
Therefore, introduction of conceptual spaces, where concepts are
–related to others
–anchored in language
–illustrated with multimedia
Extension of LAT with ADDIT and VICOS towards cyberspace paradigm
–ADDIT: relations between arbitrary fragments
–VICOS (Visualizing Conceptual Spaces): relations within and across lexicons and easy visualization
–make VICOS a collaborative tool
12
ADDIT: Commentary & Relations
–allow authorized people to make arbitrary comments on and relations between object fragments
–visualize them in tools and via VICOS
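One way to picture the idea on this slide is a standoff store: comments and relations live outside the archived objects and only point at fragments of them. The sketch below is a minimal illustration under that assumption. The class names (`Fragment`, `Relation`, `RelationStore`), the millisecond time spans and the identifiers in the usage note are hypothetical; ADDIT's actual data model is not given in the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fragment:
    resource_id: str                 # identifier of an archived object
    start_ms: Optional[int] = None   # optional time span inside a recording
    end_ms: Optional[int] = None


@dataclass(frozen=True)
class Relation:
    source: Fragment
    target: Fragment
    kind: str      # free-form relation type, e.g. "illustrates"
    comment: str
    author: str


class RelationStore:
    """Standoff store: the archived objects themselves are never modified."""

    def __init__(self, authorized):
        self.authorized = set(authorized)
        self.relations: List[Relation] = []

    def add(self, rel: Relation) -> None:
        # Only authorized people may add comments and relations.
        if rel.author not in self.authorized:
            raise PermissionError(f"{rel.author} may not add relations")
        self.relations.append(rel)

    def touching(self, resource_id: str) -> List[Relation]:
        """All relations whose source or target lies in the given resource."""
        return [
            r for r in self.relations
            if resource_id in (r.source.resource_id, r.target.resource_id)
        ]
```

A viewer could then call `store.touching("video-17")` (an invented identifier) to find everything to display alongside that recording.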
13
VICOS: Lexical relations & navigation
Allow users to create relations within and across lexicons
–across: cognate sets etc.
Visualize and allow easy navigation in conceptual spaces
Empower community members to actively describe their L&C and to learn from such resources
–Decide which words offer key access to cultural concepts
–Technology needed to link words (and the associations they evoke) to other words and to all sorts of relevant fragments
Conceptual Spaces = informal ontology of fuzzily-defined concepts and relationships
But where “concepts” are anchored in corresponding formal lexicon entries
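A conceptual space as described on this slide can be thought of as a labelled graph whose nodes are anchored in lexicon entries. The following is a rough sketch of that reading, not VICOS's implementation: the `ConceptualSpace` class, its method names and the breadth-first `within` navigation are assumptions made for illustration.

```python
from collections import defaultdict, deque


class ConceptualSpace:
    """Informal ontology: concepts anchored in lexicon entries, linked by labelled relations."""

    def __init__(self):
        self.anchors = {}              # concept -> (lexicon, entry id)
        self.edges = defaultdict(set)  # concept -> {(relation label, other concept)}

    def add_concept(self, concept, lexicon, entry_id):
        self.anchors[concept] = (lexicon, entry_id)

    def relate(self, a, label, b):
        # Relations may cross lexicons, but both ends must be anchored.
        for concept in (a, b):
            if concept not in self.anchors:
                raise KeyError(f"unanchored concept: {concept}")
        self.edges[a].add((label, b))
        self.edges[b].add((label, a))

    def within(self, start, max_hops):
        """Breadth-first navigation: concepts reachable in at most max_hops steps."""
        hops = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if hops[current] == max_hops:
                continue
            for _, neighbour in self.edges[current]:
                if neighbour not in hops:
                    hops[neighbour] = hops[current] + 1
                    queue.append(neighbour)
        del hops[start]
        return hops
```

Relations are stored in both directions here so that a user can navigate from either end; a space with directed relation types would need a different edge representation.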
15
Team and Acknowledgements
LAT Team
–System Managers
–Archive Managers & Digitization
–Software Developers
Acknowledgements
The work was funded by the Volkswagen Foundation, the European Commission, the Dutch Science Organization, the Dutch Institute for Lexicology, the Max Planck Society and the Max Planck Institute for Psycholinguistics.