Published by Gordon Andrews. Modified over 8 years ago.
1
CMD and TEI
CMDI interoperability workshop
2013-06-04, Utrecht
Matej Ďurčo, ICLTT, Vienna
2
TEI at ICLTT
AAC – Austrian Academy Corpus
– diachronic corpus, ~500 million tokens
– being converted into TEI
C4 – distributed corpus of 20th-century German
– Basel, Berlin, Bozen, Wien
– harmonized format (TEI/teiHeader)
Dict-Gate
– TEI-encoded multilingual lexicons (Persian, Arabic, German, English)
– however, described with LexicalResourceProfile
Abacus – Austrian Baroque Corpus
– 3 (5) historical texts encoded in TEI
– elaborate teiHeader
3
TEI (and friends?) in CMD
Overview of currently existing TEI-ish CMD profiles (per project: author and year, profile, Comp/Elem/Datcats, instances)
Deutsches Text Archiv
– author, year: ?
– profile: teiHeader #clarin.eu:cr1:p_1345180279115 (NOT in CompReg!)
– Comp/Elem/Datcats, instances: 56/82/10857
ICLTT
– author, year: Durco, 2010
– profile: teiHeader #clarin.eu:cr1:p_1282306194508
– Comp/Elem/Datcats: 16/35/13 (7 dublincore, 6 isocat)
– instances: 467
Leipzig Corpora
– author, year: Eckart, 2012
– profile: TEIDocumentDescription #clarin.eu:cr1:p_1337778924992
– Comp/Elem/Datcats: 4/17/17 (isocat)
– instances: ?
Nederlab
– author, year: Zhang, 2013 ?
– profiles: DBNL_Tekst #clarin.eu:cr1:p_1361876010678; DBNL_Tekst_Onzelfstandig #clarin.eu:cr1:p_1366279029218 (private)
– Comp/Elem/Datcats: 20/38/15; 20/47/21
– instances: ?
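The Comp/Elem/Datcats figures in this overview can be read as three counts taken over a profile specification. The following is a minimal sketch of how such counts might be derived, assuming a CMDI 1.1 style specification in which components are `CMD_Component` nodes, elements are `CMD_Element` nodes, and data categories are referenced through a `ConceptLink` attribute. The sample profile is invented for illustration and is not one of the profiles listed above; counting distinct `ConceptLink` values as "datcats" is likewise an assumption about how the slide's numbers were obtained.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened profile specification (CMDI 1.1 style).
# It is NOT one of the profiles from the overview.
SAMPLE_PROFILE = """
<CMD_ComponentSpec isProfile="true">
  <CMD_Component name="teiHeader">
    <CMD_Component name="fileDesc">
      <CMD_Component name="titleStmt">
        <CMD_Element name="title" ValueScheme="string"
                     ConceptLink="http://purl.org/dc/elements/1.1/title"/>
        <CMD_Element name="author" ValueScheme="string"
                     ConceptLink="http://purl.org/dc/elements/1.1/creator"/>
      </CMD_Component>
      <CMD_Component name="sourceDesc">
        <CMD_Element name="title" ValueScheme="string"
                     ConceptLink="http://purl.org/dc/elements/1.1/title"/>
        <CMD_Element name="note" ValueScheme="string"/>
      </CMD_Component>
    </CMD_Component>
  </CMD_Component>
</CMD_ComponentSpec>
"""


def profile_stats(spec_xml):
    """Return (components, elements, distinct data categories) for a spec."""
    root = ET.fromstring(spec_xml)
    components = list(root.iter("CMD_Component"))
    elements = list(root.iter("CMD_Element"))
    # Assumption: a "datcat" is a distinct ConceptLink value on any node.
    datcats = {
        node.get("ConceptLink")
        for node in components + elements
        if node.get("ConceptLink")
    }
    return len(components), len(elements), len(datcats)


stats = profile_stats(SAMPLE_PROFILE)
print("Comp/Elem/Datcats: {}/{}/{}".format(*stats))
```

Elements without a `ConceptLink` (here `note`) still count as elements but contribute no data category, which is one way a profile can end up with fewer datcats than elements.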
4
teiHeader (ICLTT)
[figure] size = reuse in other profiles
5
teiHeader (DTA)
[figure] size = count of elements in instance data
6
datcats in teiHeader (DTA)
7
TEI and ISOcat
a special DCS: TEI Header (2.1.0)
– Windhouwer, 2012
– a datcat for every element of the teiHeader (135 datcats)
– based on an ODD file (ODD2DCIF.xsl and DCIF2ODD.xsl available)
– owed to CLARIN-NL projects using the TEI header
an enriched schema was generated = annotated with these new data categories (dcr:datcat attribute)
– put in SCHEMAcat: http://lux13.mpi.nl/schemacat/schema/teiHeader
define relations between TEI and other data categories in RELcat (the relation registry)
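To illustrate what "annotated with data categories via a dcr:datcat attribute" can look like in practice, here is a rough sketch that reads such annotations back out of a schema. Everything concrete in it is an assumption: the schema fragment is invented (the real enriched schema may use a different schema language), the `dcr` namespace URI is the one commonly seen in ISOcat-annotated schemas but is not confirmed by the slides, and the `example.org` data category identifiers are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the dcr prefix; not stated in the slides.
DCR_NS = "http://www.isocat.org/ns/dcr"

# Invented schema fragment; the datcat URIs are placeholders.
SAMPLE_SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dcr="http://www.isocat.org/ns/dcr">
  <xs:element name="titleStmt" dcr:datcat="http://example.org/datcat/titleStmt"/>
  <xs:element name="title" dcr:datcat="http://example.org/datcat/title"/>
  <xs:element name="extent"/>
</xs:schema>
"""


def datcat_annotations(schema_xml):
    """Map each annotated declaration's name to its dcr:datcat value."""
    key = "{%s}datcat" % DCR_NS  # ElementTree's form of a namespaced attribute
    root = ET.fromstring(schema_xml)
    return {
        node.get("name"): node.get(key)
        for node in root.iter()
        if key in node.attrib
    }


for name, datcat in sorted(datcat_annotations(SAMPLE_SCHEMA).items()):
    print(name, "->", datcat)
```

Declarations without the attribute (here `extent`) are simply skipped, so the resulting mapping doubles as a quick check of which schema elements still lack a data category.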
8
Next Step(s)?
create (or adapt an existing) teiHeader profile
– as a union of the existing profiles?
– based on the enriched schema, i.e. linking to the new TEI data categories
– define a relation set in RELcat between TEI and ISOcat (and dublincore) data categories
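The two ideas in these next steps, a union profile and a relation set, can be sketched with plain data structures. The element sets, the `tei:`/`dc:` identifiers, and the `sameAs` relation name below are all hypothetical stand-ins chosen for illustration; they are not taken from the actual profiles or from RELcat.

```python
# Hypothetical element inventories of two existing teiHeader profiles.
PROFILES = {
    "teiHeader (ICLTT)": {"title", "author", "publisher", "language"},
    "teiHeader (DTA)": {"title", "author", "editor", "idno"},
}

# Hypothetical relation set: (subject, relation type, object) triples.
RELATIONS = [
    ("tei:title", "sameAs", "dc:title"),
    ("tei:author", "sameAs", "dc:creator"),
    ("tei:publisher", "sameAs", "dc:publisher"),
]


def union_profile(profiles):
    """Return the sorted union of all elements used by any profile."""
    merged = set()
    for elements in profiles.values():
        merged |= elements
    return sorted(merged)


def related(datcat, relations):
    """Return data categories linked to `datcat`, in either direction."""
    linked = set()
    for subject, _relation, obj in relations:
        if subject == datcat:
            linked.add(obj)
        elif obj == datcat:
            linked.add(subject)
    return sorted(linked)


print(union_profile(PROFILES))
print(related("dc:creator", RELATIONS))
```

Treating the relation as symmetric is a simplification made here; a real relation set may distinguish directed relation types, in which case the lookup would need to respect direction.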
9
profile: data (LINDAT)
dublincore + metashare
10
profile: data (LINDAT)
resourceInfo-component
11
dublincore I
2 profiles with dc-terms (55 data categories)
2 profiles with dc-elements (called "dc-terms")
as of 2013-01
12
dublincore II
currently (2013-06): 4 DCMI-terms profiles
13
dublincore III
(almost) all datcats shared by all
14
dublincore IV
1 profile has an extra component: DANS-DC-metadata
example: language