Published by Laura Feld. Modified over 5 years ago.
1
Uralic multimedia corpora: ISO/TEI corpus data in the project INEL
Timofey Arkhangelskiy (Universität Hamburg / Alexander von Humboldt Foundation)
Anne Ferger (Universität Hamburg)
Hanna Hedeland
2
INEL
Long-term documentation project at Hamburg; corpora of Selkup, Kamas and Dolgan are currently being prepared.
Spoken corpora (+ archival transcriptions).
All annotated data are stored and edited in EXMARaLDA (time-aligned XML format + GUI).
Our goals are (a) long-term preservation of the data and (b) easy access to the corpora through an online user interface.
3
EXMARaLDA > ISO/TEI > tsakorpus
We transform EXMARaLDA data to XML based on the ISO/TEI standard (good for long-term preservation).
We use the Tsakorpus corpus platform for online access.
ISO/TEI files are converted to Tsakorpus JSON.
The pipeline is applicable to other spoken corpora hosted at the Hamburg Center for Language Corpora.
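The ISO/TEI > Tsakorpus JSON step can be illustrated with a minimal sketch. It is not the project's converter. It assumes a heavily simplified ISO/TEI file in which each utterance is a `<u>` element containing `<w>` tokens, and it emits a reduced JSON document whose field names (`sentences`, `words`, `wf`, `wtype`, `off_start`, `off_end`) are modelled on the Tsakorpus format; time alignment, speakers, glosses and metadata are left out. The function name `tei_to_sentences` and the sample tokens `aaa` and `bbb` are hypothetical placeholders.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified ISO/TEI fragment with placeholder tokens.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <annotationBlock>
      <u><seg><w>aaa</w> <w>bbb</w></seg></u>
    </annotationBlock>
  </body></text>
</TEI>"""


def tei_to_sentences(tei_xml):
    """Turn each <u> into one sentence with character offsets per word."""
    root = ET.fromstring(tei_xml)
    sentences = []
    for u in root.iterfind(".//tei:u", TEI_NS):
        text = ""
        words = []
        for w in u.iterfind(".//tei:w", TEI_NS):
            wf = (w.text or "").strip()
            if not wf:
                continue
            if text:
                text += " "  # tokens are joined with a single space
            start = len(text)
            text += wf
            words.append({
                "wf": wf,            # word form
                "wtype": "word",     # assumed token type label
                "off_start": start,  # offset of the first character in text
                "off_end": len(text),
            })
        sentences.append({"text": text, "words": words})
    # Metadata is left empty in this sketch.
    return {"meta": {}, "sentences": sentences}


if __name__ == "__main__":
    print(json.dumps(tei_to_sentences(SAMPLE_TEI), ensure_ascii=False, indent=2))
```

A real conversion would also have to carry over the timeline anchors, speaker information and annotation tiers that the ISO/TEI files contain.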
4
Disclaimers
(from all of us) INEL-internal data handling (tools, glossing strategies, choice of EXMARaLDA, etc.) is outside the scope of our presentation.
(from me personally) I am only responsible for the ISO/TEI > Tsakorpus conversion and do not participate in INEL.
5
Thank you for your attention!