Uralic multimedia corpora: ISO/TEI corpus data in the project INEL


1 Uralic multimedia corpora: ISO/TEI corpus data in the project INEL
Timofey Arkhangelskiy (Universität Hamburg / Alexander von Humboldt Foundation)
Anne Ferger (Universität Hamburg)
Hanna Hedeland

2 INEL
Long-term documentation project at Hamburg. Corpora of Selkup, Kamas and Dolgan are currently being prepared.
Spoken corpora (+ archival transcriptions).
All annotated data are stored and edited in EXMARaLDA (time-aligned XML format + GUI).
Our goal is (a) long-term preservation of the data; (b) easy access to the corpora through an online user interface.

3 EXMARaLDA > ISO/TEI > tsakorpus
We transform EXMARaLDA data to XML based on the ISO/TEI standard (good for long-term preservation).
We use the Tsakorpus corpus platform for online access.
ISO/TEI files are converted to Tsakorpus JSON.
The pipeline is applicable to other spoken corpora hosted at the Hamburg Center for Language Corpora.
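The ISO/TEI > JSON step can be illustrated with a minimal sketch. It is not the project's actual converter: the sample XML is an invented, heavily simplified fragment in the spirit of ISO/TEI transcriptions, the function name tei_to_sentences is hypothetical, and the output keys ("text", "words", "wf", "wtype", "speaker", "start", "end") are only loosely modelled on sentence-based corpus JSON. The real Tsakorpus format is richer and may differ.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented, simplified ISO/TEI-like fragment (assumption, not INEL data).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <timeline>
      <when xml:id="T0" interval="0.0"/>
      <when xml:id="T1" interval="1.5"/>
    </timeline>
    <body>
      <annotationBlock who="#SPK0" start="#T0" end="#T1">
        <u><w>first</w><w>word</w></u>
      </annotationBlock>
    </body>
  </text>
</TEI>"""


def tei_to_sentences(tei_xml: str) -> dict:
    """Convert a simplified ISO/TEI-like string to a sentence-based dict.

    Hypothetical helper: the output layout is an assumption made for
    illustration, not the documented Tsakorpus schema.
    """
    root = ET.fromstring(tei_xml)

    # Map each timeline anchor id to its offset in seconds.
    times = {
        when.get(XML_ID): float(when.get("interval", "0"))
        for when in root.iterfind(".//tei:when", TEI_NS)
    }

    sentences = []
    for block in root.iterfind(".//tei:annotationBlock", TEI_NS):
        # Collect the word tokens of this utterance block in document order.
        words = [w.text or "" for w in block.iterfind(".//tei:w", TEI_NS)]
        sentences.append({
            "text": " ".join(words),
            "words": [{"wf": wf, "wtype": "word"} for wf in words],
            "speaker": block.get("who", "").lstrip("#"),
            # Resolve "#T0"-style references against the timeline.
            "start": times.get(block.get("start", "").lstrip("#")),
            "end": times.get(block.get("end", "").lstrip("#")),
        })
    return {"sentences": sentences}


if __name__ == "__main__":
    print(json.dumps(tei_to_sentences(SAMPLE_TEI), ensure_ascii=False, indent=2))
```

Keeping the time alignment as plain start and end offsets on each sentence is one simple way to let an online interface link a search hit back to the corresponding stretch of the recording.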

4 Disclaimers
(from all of us) INEL-internal data handling (tools, glossing strategies, choice of EXMARaLDA etc.) is outside the scope of our presentation.
(from me personally) I am only responsible for the ISO/TEI > Tsakorpus conversion and do not participate in INEL.

5 Thank you for your attention!

