digiTAAL Some exciting examples Ineke Schuurman coordinator CLARIN-Vlaanderen
Digital Humanities Language as object of research Language as means for research Modern languages Old languages Written, audio, video (collections of) documents
Treebanks Available for most ‘modern’ languages But also possible for ‘dead’ languages like Latin, Ancient Greek Index Thomisticus Treebank, Milano Full query language needed
More treebanks Medieval Portuguese treebank –Under construction In the near future: INPOLDER (CLARIN NL) A parser, not yet a corpus, BUT: through web interface raw older Dutch text can be entered, and parsed text (syntactically analysed) will be returned –Uncorrected, but manual correction is possible
Visualization Gabmap: doing dialect analysis on the web ADEPT-project (CLARIN-NL) Dialects (examples Netherlands/Flanders + USA) including tutorial, manual, video, FAQ, …
Pronunciation distance Gabmap: doing dialect analysis on the web
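The slide does not say how the pronunciation distance is computed. As a minimal sketch, assuming a length-normalised Levenshtein (edit) distance over phonetic transcriptions, a measure commonly used in dialectometry, the distance between two sites could be estimated as below. The function names and the toy transcriptions are hypothetical and are not taken from Gabmap or its data.

```python
def levenshtein(a, b):
    """Edit distance between two transcriptions (insert, delete, substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pronunciation_distance(site_a, site_b):
    """Mean length-normalised edit distance over the words both sites share.

    Each site is a dict mapping a word to one transcription (an assumption;
    real dialect data often has several variants per word).
    """
    shared = sorted(set(site_a) & set(site_b))
    if not shared:
        raise ValueError("the two sites have no words in common")
    total = 0.0
    for word in shared:
        x, y = site_a[word], site_b[word]
        total += levenshtein(x, y) / max(len(x), len(y), 1)
    return total / len(shared)


# Hypothetical toy transcriptions, for illustration only.
site_a = {"house": "hus", "milk": "mɛlk"}
site_b = {"house": "hœys", "milk": "mɛlək"}
print(pronunciation_distance(site_a, site_b))
```

Computing this value for every pair of sites gives the distance matrix that maps and cluster diagrams are built from.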
Dendrogram Gabmap: doing dialect analysis on the web
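A dendrogram shows the order in which sites are grouped when the closest clusters are merged step by step. The sketch below assumes average-linkage agglomerative clustering over a pairwise distance table; the slides do not state which clustering method is used, and the site labels and distances here are made up for illustration.

```python
from itertools import combinations


def build_dendrogram(names, dist):
    """Average-linkage clustering.

    names: list of site labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns (tree, merges): tree is a nested tuple, merges lists each
    merged subtree with the distance at which it was formed.
    """
    if not names:
        raise ValueError("need at least one site")
    clusters = [(name, [name]) for name in names]  # (subtree, member labels)
    merges = []
    while len(clusters) > 1:
        best = None
        for (i, (_, mi)), (j, (_, mj)) in combinations(enumerate(clusters), 2):
            # Average distance between all cross-cluster member pairs.
            d = sum(dist[frozenset((a, b))] for a in mi for b in mj)
            d /= len(mi) * len(mj)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        tree = (clusters[i][0], clusters[j][0])
        members = clusters[i][1] + clusters[j][1]
        merges.append((tree, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((tree, members))
    return clusters[0][0], merges


# Hypothetical distances between four sites A-D.
names = ["A", "B", "C", "D"]
dist = {frozenset(pair): 0.9 for pair in combinations(names, 2)}
dist[frozenset(("A", "B"))] = 0.1
dist[frozenset(("C", "D"))] = 0.2

tree, merges = build_dendrogram(names, dist)
print(tree)
```

The merge distances in `merges` correspond to the heights at which branches join in the drawn dendrogram.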
Audio CLARIN pilot (NL/FL) TTNWW, audio part TAAL2SPRAAK (CLARIN-Vlaanderen) Audio as a means to enlarge accessibility of larger collections of data (tapes) Transcription, even if not 100% correct, is very helpful in finding what you are looking for, especially if synchronized with time (useful for psychology, sociology, history)
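To illustrate why a time-synchronised transcript helps in finding passages, here is a minimal sketch that looks up a search term and returns the positions in the recording where it occurs. The transcript format (a list of start time in seconds and word) and the function names are assumptions for illustration; they are not the output format of TTNWW or TAAL2SPRAAK.

```python
def find_term(transcript, term):
    """Return the start times (seconds) at which `term` occurs.

    transcript: list of (start_seconds, word) pairs (hypothetical format).
    Matching ignores case and surrounding punctuation.
    """
    term = term.lower()
    return [start for start, word in transcript
            if word.lower().strip(".,;:!?") == term]


def as_timestamp(seconds):
    """Format a position in seconds as HH:MM:SS for jumping into the audio."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Made-up transcript fragment, for illustration only.
transcript = [(12.0, "The"), (12.4, "aldermen"), (95.2, "Aldermen,"), (3725.4, "court")]
for start in find_term(transcript, "aldermen"):
    print(as_timestamp(start))
```

Even when some words are misrecognised, hits like these narrow hours of tape down to a few positions worth listening to.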
Audio and older texts Digitization of old texts still problematic (cf DigiHIST) Experiment: Read medieval text aloud and have it automatically transcribed (not trained, modern language model used)
Audio Leuvense Schepenbank 8130_0093_inge_moris.hardsubs.mp4 8130_0093_inge_moris_4gr.pdf Raw material!
Written part TTNWW Relate documents, make texts more accessible by making explicit data that are not expressed as such Paris formulated objections, London/John didn’t What is a name, what kind of name is it? Analysis of names in fiction Sagalassos project (archaeology): temporal and geospatial analysis web service, end of 2012
Some more examples When is ‘now’? And where?
Stylometry Stylene (CLARIN-Vlaanderen) –UAntwerpen/Univ.College Gent Is text as a whole written by same person? Show development in style of a specific author Is a text clear? Is it really understandable by, say, children age 10-12? Web service (autumn 2012)
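Questions such as "same author?" or "readable for children?" are typically approached by turning a text into measurable style features. The sketch below computes three very simple ones: average sentence length, average word length and type-token ratio. These features and the function name are illustrative assumptions; the slides do not describe which features Stylene actually uses.

```python
import re


def style_features(text):
    """Return a few basic stylometric features of a text."""
    # Sentences: split on runs of ., ! or ? (a rough heuristic).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of letters, lower-cased (digits and underscores excluded).
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not sentences or not words:
        raise ValueError("text contains no sentences or no words")
    return {
        "avg_sentence_length": len(words) / len(sentences),  # words per sentence
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),     # vocabulary richness
    }


features = style_features("The cat sat. The dog ran fast.")
print(features)
```

Comparing such feature vectors across chapters, authors or age groups is one simple way to frame the questions on this slide; real systems use far richer feature sets.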
‘Stylometry’ as means Is thesis X written by the student or by ‘Wikipedia’? (reliability) Could text X have been written by a 10-year-old girl? (paedophilia)
Reusability of data For the same kind of research For a completely different kind of research Both should be encouraged (time and money) To be taken into account: IPR!
Veterans project Interviews veterans Dutch military actions ( ) 1000 interviews (2.5 h), semi-structured Original: social and military historians Who else can use this archive ? First: reluctance
Veterans 2 People from diverse disciplines invited to write a paper (theology, psychology, discourse analysis, anthropology, sociology, ...) Turned out to be a very valuable corpus! Digital Humanities aspect: several tools were made available to facilitate research in different disciplines, tools to give access to spoken content
“Circulation of Knowledge” “Geleerdenbrievenproject” (Letters of scientists) 17th century: Grotius (Hugo de Groot), Constantijn Huygens, Christiaan Huygens, Descartes, … letters, mainly Dutch, French, Latin Intended for “history of science”, of course also relevant for other disciplines
Polish example: Sejm Polish parliament, 1918 – now Texts, records, video Goal: all kinds of linguistic research But of course: wealth of information for other disciplines as well
Conclusions Several ‘easy-to-use’ research possibilities are (or will soon be) available Others are still more complex, but do offer possibilities for new kinds of projects (or easier ways of doing research) Lots of material could be used by third parties as well: do not keep stuff “in your drawer” Students and (young) researchers should be made aware of new possibilities
Sound Registers ( )