Uralic multimedia corpora: ISO/TEI corpus data in the project INEL

Timofey Arkhangelskiy (Universität Hamburg / Alexander von Humboldt Foundation)
Anne Ferger (Universität Hamburg)
Hanna Hedeland

INEL
Long-term documentation project at Hamburg; corpora of Selkup, Kamas and Dolgan are currently being prepared
Spoken corpora (+ archival transcriptions)
All annotated data are stored and edited in EXMARaLDA (time-aligned XML format + GUI)
Our goals are (a) long-term preservation of the data; (b) easy access to the corpora through an online user interface

EXMARaLDA > ISO/TEI > Tsakorpus
We transform EXMARaLDA data to XML based on the ISO/TEI standard (good for long-term preservation)
We use the Tsakorpus corpus platform for online access
ISO/TEI files are converted to Tsakorpus JSON
The pipeline is applicable to other spoken corpora hosted at the Hamburg Center for Language Corpora
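To make the ISO/TEI > JSON step more concrete, here is a minimal sketch of what such a conversion could look like. It is not the actual INEL converter. The sample XML is a hypothetical, heavily simplified ISO/TEI-style fragment with placeholder tokens, and the output field names (`wf`, `wtype`, `off_start`, `off_end`, `speaker`, `start`, `end`) are assumptions loosely modeled on a sentence-and-word JSON layout, not a statement of the real Tsakorpus schema.

```python
import json
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified ISO/TEI-style fragment (placeholder tokens, not INEL data).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <timeline unit="s">
      <when xml:id="T0" interval="0.0"/>
      <when xml:id="T1" interval="1.5"/>
    </timeline>
    <body>
      <annotationBlock who="#SPK0" start="#T0" end="#T1">
        <u><seg><w>word1</w> <w>word2</w></seg></u>
      </annotationBlock>
    </body>
  </text>
</TEI>"""


def tei_to_sentences(tei_xml):
    """Turn each annotationBlock into one sentence dict with word offsets."""
    root = ET.fromstring(tei_xml)

    # Map timeline anchor ids to their time values in seconds.
    times = {
        when.get(XML_ID): float(when.get("interval"))
        for when in root.iterfind(".//tei:when", NS)
    }

    sentences = []
    for block in root.iterfind(".//tei:annotationBlock", NS):
        words, parts, offset = [], [], 0
        for w in block.iterfind(".//tei:w", NS):
            wf = (w.text or "").strip()
            if not wf:
                continue
            if parts:
                offset += 1  # account for the space joining tokens
            words.append({
                "wf": wf,
                "wtype": "word",
                "off_start": offset,
                "off_end": offset + len(wf),
            })
            parts.append(wf)
            offset += len(wf)

        sentences.append({
            "text": " ".join(parts),
            "words": words,
            # Attribute values are references such as "#SPK0"; drop the "#".
            "speaker": block.get("who", "").lstrip("#"),
            "start": times.get(block.get("start", "").lstrip("#")),
            "end": times.get(block.get("end", "").lstrip("#")),
        })
    return sentences


if __name__ == "__main__":
    print(json.dumps({"sentences": tei_to_sentences(SAMPLE_TEI)}, indent=2))
```

The sketch resolves timeline references first so that each sentence carries its own start and end time, which is the part of time-aligned data that an online interface needs for media playback. Annotation tiers such as glosses are left out here.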

Disclaimers
(from all of us) INEL-internal data handling (tools, glossing strategies, choice of EXMARaLDA etc.) is outside the scope of our presentation
(from me personally) I am only responsible for the ISO/TEI > Tsakorpus conversion and do not participate in INEL

Thank you for your attention!
