
CMD and TEI CMDI interoperability workshop 2013-06-04 - Utrecht Matej Ďurčo, ICLTT, Vienna.


1 CMD and TEI
CMDI interoperability workshop, 2013-06-04, Utrecht
Matej Ďurčo, ICLTT, Vienna

2 TEI at ICLTT
AAC – Austrian Academy Corpus
– diachronic corpus, ~500 million tokens
– being converted into TEI
C4
– distributed corpus of 20th-century German
– Basel, Berlin, Bozen, Wien
– harmonized format (TEI/teiHeader)
Dict-Gate
– TEI-encoded multilingual lexicons (Persian, Arabic, German, English)
– however, described with LexicalResourceProfile
Abacus – Austrian Baroque Corpus
– 3 (5) historical texts encoded in TEI
– elaborate teiHeader

3 TEI (and friends?) in CMD
Overview of currently existing TEI-ish CMD profiles
Deutsches Text Archiv
– author, year: ?
– profile: teiHeader #clarin.eu:cr1:p_1345180279115 (NOT in CompReg!)
– Comp/Elem/Datcats and instances: 56/82/10857
ICLTT
– author, year: Durco, 2010
– profile: teiHeader #clarin.eu:cr1:p_1282306194508
– Comp/Elem/Datcats: 16/35/13 (7 dublincore, 6 isocat)
– instances: 467
Leipzig Corpora
– author, year: Eckart, 2012
– profile: TEIDocumentDescription #clarin.eu:cr1:p_1337778924992
– Comp/Elem/Datcats: 4/17/17 (isocat)
– instances: ?
Nederlab
– author, year: Zhang, 2013 ?
– profile: DBNL_Tekst #clarin.eu:cr1:p_1361876010678, Comp/Elem/Datcats: 20/38/15
– profile: DBNL_Tekst_Onzelfstandig #clarin.eu:cr1:p_1366279029218 (private), Comp/Elem/Datcats: 20/47/21
– instances: ?

4 teiHeader (ICLTT)
size = reuse in other profiles

5 teiHeader (DTA)
size = count of elements in instance data

6 datcats in teiHeader (DTA)

7 TEI and ISOcat
A special DCS: TEI Header (2.1.0)
– Windhouwer, 2012
– a datcat for every element of the teiHeader (135 datcats)
– based on an ODD file (ODD2DCIF.xsl and DCIF2ODD.xsl available)
– prompted by CLARIN-NL projects using the TEI header
An enriched schema was generated
– annotated with these new data categories (dcr:datcat attribute)
– put in SCHEMAcat: http://lux13.mpi.nl/schemacat/schema/teiHeader
Define relations between TEI and other data categories in RELcat (the relation registry)
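
The annotation idea on this slide can be illustrated with a minimal sketch. It assumes the enriched schema is an XML Schema whose element declarations carry a dcr:datcat attribute, and that the dcr prefix is bound to http://www.isocat.org/ns/dcr. The inline schema fragment and the example.org data category URIs are hypothetical stand-ins, not the content of the schema in SCHEMAcat.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DCR = "http://www.isocat.org/ns/dcr"  # assumed namespace URI for the dcr prefix

# Hypothetical fragment: the element names exist in the TEI header, but the
# datcat URIs are placeholders, not real ISOcat identifiers.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dcr="http://www.isocat.org/ns/dcr">
  <xs:element name="titleStmt" dcr:datcat="http://example.org/datcat/DC-0001"/>
  <xs:element name="title" dcr:datcat="http://example.org/datcat/DC-0002"/>
  <xs:element name="note"/>
</xs:schema>
"""

def datcat_links(schema_text):
    """Map each annotated element name to its data category URI."""
    root = ET.fromstring(schema_text)
    links = {}
    for el in root.iter(f"{{{XS}}}element"):
        datcat = el.get(f"{{{DCR}}}datcat")
        if datcat is not None:  # skip elements without an annotation
            links[el.get("name")] = datcat
    return links

links = datcat_links(SCHEMA)
for name, uri in sorted(links.items()):
    print(name, "->", uri)
```

A mapping like this is the kind of input a profile builder could use to attach a concept link to each generated CMD element.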

8 Next Step(s) ?
Create (or adapt an existing) teiHeader profile
– as a union of the existing profiles?
– based on the enriched schema, i.e. linking to the new TEI data categories
Define a relation set in RELcat between TEI and ISOcat (and dublincore) data categories
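
The two proposed steps can be sketched on toy data: merging existing profiles into one set of data categories, and looking up equivalents through a relation set. Everything here is an assumption made for illustration: the identifiers such as tei:title are hypothetical shorthand rather than real data category URIs, relations are treated as symmetric pairs, and no actual RELcat or Component Registry interface is used.

```python
# Hypothetical data category identifiers per profile (shorthand, not real URIs).
PROFILES = {
    "teiHeader (ICLTT)": {"tei:title", "tei:author", "dc:language"},
    "teiHeader (DTA)": {"tei:title", "tei:publisher", "tei:idno"},
}

# Hypothetical relation set: pairs of data categories treated as equivalent.
RELATION_SET = [
    ("tei:title", "dc:title"),
    ("tei:author", "dc:creator"),
    ("tei:publisher", "dc:publisher"),
]

def union_profile(profiles):
    """Collect every data category used by at least one profile."""
    merged = set()
    for datcats in profiles.values():
        merged |= datcats
    return merged

def related(datcat, relations):
    """Data categories directly related to datcat (relations read as symmetric)."""
    out = set()
    for a, b in relations:
        if a == datcat:
            out.add(b)
        elif b == datcat:
            out.add(a)
    return out

merged = union_profile(PROFILES)
for datcat in sorted(merged):
    print(datcat, "~", sorted(related(datcat, RELATION_SET)))
```

The lookup follows one relation step only. Chaining relations transitively would be a separate design decision that depends on the relation types in the set.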

9 profile: data (LINDAT)
dublincore + metashare

10 profile: data (LINDAT)
resourceInfo
resourceInfo-component

11 dublincore I
as of 2013-01:
– 2 profiles with dc-terms (55 data categories)
– 2 profiles with dc-elements (called "dc-terms")

12 dublincore II
currently (2013-06): 4 DCMI-terms profiles

13 dublincore III
(almost) all datcats shared by all

14 dublincore IV
1 profile has an extra component: DANS-DC-metadata
example: language

