DigiTAAL: Some exciting examples. Ineke Schuurman, coordinator CLARIN-Vlaanderen.


1 digiTAAL: Some exciting examples
Ineke Schuurman, coordinator CLARIN-Vlaanderen

2 Digital Humanities
Language as object of research
Language as means for research
Modern languages
Old languages
Written, audio, video (collections of) documents

3 Treebanks
Available for most ‘modern’ languages
But also possible for ‘dead’ languages like Latin, Ancient Greek
http://nlp.perseus.tufts.edu/syntax/treebank/getinvolved.html
Index Thomisticus Treebank, Milano
http://itreebank.marginalia.it/
Full query language needed

4 More treebanks
Medieval Portuguese treebank
– Under construction
In the near future: INPOLDER (CLARIN-NL)
A parser, not yet a corpus, BUT: through a web interface, raw older Dutch text can be entered, and parsed (syntactically analysed) text will be returned
– Uncorrected, but manual correction is possible

5 Visualization
Gabmap: doing dialect analysis on the web
ADEPT project (CLARIN-NL)
Dialects (examples Netherlands/Flanders + USA)
www.gabmap.nl, including tutorial, manual, video, FAQ, …

6 Pronunciation distance
Gabmap: doing dialect analysis on the web
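
A pronunciation distance between two dialect sites is commonly derived from the edit distance between phonetic transcriptions of the same word. The sketch below is a plain Levenshtein distance written for illustration; it is not Gabmap's implementation, and the function names and the two transcriptions in the usage note are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, symbol_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(
                previous[j] + 1,         # delete symbol_a
                current[j - 1] + 1,      # insert symbol_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so the result lies in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

For example, `normalized_distance("melk", "mɛlək")` compares two made-up transcriptions of one word; averaging such values over many words gives one distance per pair of sites.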

7 Dendrogram
Gabmap: doing dialect analysis on the web
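
A dendrogram is the tree produced by hierarchical clustering of the sites, given a table of pairwise distances. The following is a minimal sketch using average linkage, chosen here only for illustration; the slide does not say which clustering method was used, and the function name and data layout are assumptions.

```python
from itertools import combinations


def average_linkage(distances, labels):
    """Build a dendrogram as nested tuples (left, right, merge_height).

    distances: dict mapping frozenset({a, b}) to the distance between sites a and b.
    labels: list of site names.
    """
    # Each cluster is (tree, members); a leaf's tree is just its label.
    clusters = [(label, [label]) for label in labels]

    def cluster_distance(x, y):
        # Average of all pairwise distances between members of the two clusters.
        pairs = [(a, b) for a in x[1] for b in y[1]]
        return sum(distances[frozenset((a, b))] for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the two closest clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        left, right = clusters[i], clusters[j]
        height = round(cluster_distance(left, right), 3)
        merged = ((left[0], right[0], height), left[1] + right[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

Sites that merge at a low height have similar pronunciations; the height of the last merge shows how far apart the main dialect groups are.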

8 Audio
CLARIN pilot (NL/FL) TTNWW, audio part
TAAL2SPRAAK (CLARIN-Vlaanderen)
Audio as a means to increase the accessibility of larger collections of data (tapes)
Transcription, even if not 100% correct, is very helpful in finding what you are looking for, especially if synchronized with time (useful for psychology, sociology, history)
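
To illustrate why a time-synchronized transcript helps with finding material, here is a minimal sketch that looks up a word and returns the positions in the recording where it occurs. The `Word` structure and both function names are hypothetical; they do not describe the output format of TTNWW or TAAL2SPRAAK.

```python
from typing import NamedTuple


class Word(NamedTuple):
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


def find_term(transcript, term):
    """Return the start times of every occurrence of term (case-insensitive)."""
    needle = term.casefold()
    return [word.start for word in transcript if word.text.casefold() == needle]


def format_timestamp(seconds):
    """Render a position in seconds as HH:MM:SS for jumping into the audio."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

Even when some words are misrecognized, a hit list like this lets a researcher jump straight to candidate passages instead of listening to an entire tape.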

9 Audio and older texts
Digitization of old texts still problematic (cf. DigiHIST)
Experiment: read medieval text aloud and have it automatically transcribed (not trained, modern language model used)

10 Audio Leuvense Schepenbank
http://www.ccl.kuleuven.be/CLARIN/SAL 8130_0093_inge_moris.hardsubs.mp4
http://www.ccl.kuleuven.be/CLARIN/SAL 8130_0093_inge_moris_4gr.pdf
Raw material!!

11 Written part TTNWW
Relate documents, make texts more accessible by making explicit data that are not expressed as such
Paris formulated objections, London/John didn’t
What is a name, what kind of name is it?
Analysis of names in fiction
Sagalassos project (archaeology): temporal and geospatial analysis
Web service, end of 2012

12 Some more examples
When is ‘now’? And where?

13 Stylometry
Stylene (CLARIN-Vlaanderen)
– UAntwerpen/Univ. College Gent
Is a text as a whole written by the same person?
Show the development in style of a specific author
Is a text clear? Is it really understandable by, say, children aged 10-12?
Web service (autumn 2012)
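
Questions like these are typically approached by turning a text into numeric style features and comparing them across texts or authors. The sketch below computes three very simple features; it is an illustration only, the feature set and the function name are assumptions, and it says nothing about which features Stylene actually uses.

```python
import re


def style_features(text):
    """Compute three crude style and readability indicators for a text."""
    # Sentences: split on runs of terminal punctuation, drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: lower-cased runs of word characters.
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return {"words_per_sentence": 0.0, "letters_per_word": 0.0, "type_token_ratio": 0.0}
    return {
        "words_per_sentence": len(words) / len(sentences),           # longer is harder to read
        "letters_per_word": sum(len(w) for w in words) / len(words),  # longer words are harder
        "type_token_ratio": len(set(words)) / len(words),             # vocabulary richness
    }
```

Comparing such feature values between chapters of one text, or between a text and a reference collection written for 10-12 year olds, gives a first rough indication; real stylometric systems rely on far richer features.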

14 ‘Stylometry’ as means
Is thesis X written by the student or by ‘Wikipedia’? (reliability)
Can text X have been written by a 10-year-old girl? (paedophilia)

15 Reusability of data
For the same kind of research
For a completely different kind of research
Both should be encouraged (time and money)
To be taken into account: IPR!

16 Veterans project
Interviews with veterans of Dutch military actions (1940-2010)
1000 interviews (2.5 h), semi-structured
Originally: social and military historians
Who else can use this archive?
First: reluctance

17 Veterans 2
People from diverse disciplines invited to write a paper (theology, psychology, discourse analysis, anthropology, sociology, ...)
Turned out to be a very valuable corpus!
Digital Humanities aspect: several tools were made available to facilitate research in different disciplines, tools to give access to spoken content

18 “Circulation of Knowledge”
“Geleerdenbrievenproject” (Letters of scientists)
17th century: Grotius (Hugo de Groot), Constantijn Huygens, Christiaan Huygens, Descartes, …
20,000 letters, mainly Dutch, French, Latin
Intended for “history of science”, of course also relevant for other disciplines

19 Polish example: Sejm
Polish parliament, 1918 – now
Texts, records, video
Goal: all kinds of linguistic research
But of course: wealth of information for other disciplines as well

20 Conclusions
Several ‘easy-to-use’ research possibilities are (or will soon be) available
Others are still more complex, but do offer possibilities for new kinds of projects (or easier ways of doing research)
Lots of material could be used by third parties as well: do not keep stuff “in your drawer”
Students and (young) researchers should be made aware of new possibilities


22 Sound Registers (1739-1799)
