DigiTAAL Some exciting examples Ineke Schuurman coordinator CLARIN-Vlaanderen.

Digital Humanities
Language as an object of research, and language as a means for research.
Modern languages and old languages.
Written, audio and video documents (and collections of documents).

Treebanks
Available for most ‘modern’ languages, but also possible for ‘dead’ languages such as Latin and Ancient Greek.
Example: the Index Thomisticus Treebank, Milan.
A full query language is needed.
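To illustrate why a treebank needs a real query language rather than plain text search, here is a minimal sketch of a structural query over a toy dependency-annotated sentence. The token fields (`idx`, `form`, `pos`, `head`, `rel`), the relation labels and the helper `find_subjects` are hypothetical, loosely modelled on CoNLL-style dependency annotation; the Index Thomisticus Treebank and other treebanks have their own formats and query tools.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int   # 1-based position in the sentence
    form: str  # word form as it appears in the text
    pos: str   # part-of-speech tag (hypothetical tag set)
    head: int  # idx of the governing token (0 = sentence root)
    rel: str   # dependency relation to the head (hypothetical labels)


# Hand-annotated toy sentence: Latin "Deus creat mundum" ("God creates the world").
SENTENCE = [
    Token(1, "Deus", "NOUN", 2, "subj"),
    Token(2, "creat", "VERB", 0, "root"),
    Token(3, "mundum", "NOUN", 2, "obj"),
]


def find_subjects(sentence):
    """Return (verb, subject) pairs: a query on tree structure, not on word order."""
    by_idx = {tok.idx: tok for tok in sentence}
    pairs = []
    for tok in sentence:
        head = by_idx.get(tok.head)  # None for the root (head 0)
        if tok.rel == "subj" and head is not None and head.pos == "VERB":
            pairs.append((head.form, tok.form))
    return pairs


print(find_subjects(SENTENCE))  # [('creat', 'Deus')]
```

Since Latin word order is free, the same annotated structure in another order ("mundum creat Deus") would yield the same pair; the query addresses the annotation, not the surface string.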

More treebanks
Medieval Portuguese treebank: under construction.
In the near future: INPOLDER (CLARIN-NL). This is a parser, not yet a corpus, but through a web interface raw older Dutch text can be entered, and the parsed (syntactically analysed) text will be returned. The output is uncorrected, but manual correction is possible.

Visualization
Gabmap: doing dialect analysis on the web (ADEPT project, CLARIN-NL).
Dialect data (examples from the Netherlands/Flanders and the USA), including a tutorial, manual, video, FAQ, …

Pronunciation distance (Gabmap)
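Pronunciation distance between two sites is built up from comparisons of phonetic transcriptions of the same words. The sketch below assumes plain, unweighted Levenshtein (edit) distance, normalised by the length of the longer transcription and averaged over the words both sites share. The transcriptions and sites are invented, and Gabmap's own measure (segment weighting, normalisation) may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def site_distance(site_a: dict, site_b: dict) -> float:
    """Mean normalised Levenshtein distance over the words both sites have transcribed."""
    shared = site_a.keys() & site_b.keys()
    if not shared:
        raise ValueError("sites share no transcribed words")
    total = 0.0
    for word in shared:
        x, y = site_a[word], site_b[word]
        total += levenshtein(x, y) / max(len(x), len(y), 1)
    return total / len(shared)


# Invented transcriptions of two words at two hypothetical sites.
site_1 = {"huis": "hois", "melk": "melk"}
site_2 = {"huis": "huus", "melk": "mjelk"}
print(round(site_distance(site_1, site_2), 2))  # 0.35
```

Computing this for every pair of sites gives the distance matrix that dialect maps and cluster analyses are drawn from.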

Dendrogram (Gabmap)
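A dendrogram shows how sites merge into ever larger groups as the allowed distance grows. The sketch below implements one common choice, average-linkage (UPGMA) agglomerative clustering, on a small invented distance matrix. The function name `upgma` and the nested-tuple tree format `(left, right, height)` are assumptions for illustration; Gabmap may use other clustering methods or settings.

```python
def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering.

    labels: list of site names; dist: square matrix (list of lists) of distances.
    Returns a nested tuple (left, right, height); leaves are the label strings.
    """
    # Active clusters: id -> (tree, number of sites in the cluster).
    clusters = {i: (lab, 1) for i, lab in enumerate(labels)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        height = d.pop((i, j))
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster is the size-weighted mean of the old distances.
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj, height), ni + nj)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Invented distances between three hypothetical sites.
print(upgma(["A", "B", "C"], [[0, 1, 4], [1, 0, 5], [4, 5, 0]]))
# ('C', ('A', 'B', 1), 4.5)
```

In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"`, plotted with `scipy.cluster.hierarchy.dendrogram`, would normally be used; the hand-written version only makes the merge rule explicit.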

Audio
CLARIN pilot (NL/FL): TTNWW, audio part; TAAL2SPRAAK (CLARIN-Vlaanderen).
Audio technology as a means to make large collections of recorded data (tapes) more accessible.
Transcription, even if not 100% correct, is very helpful in finding what you are looking for, especially if it is synchronized with time (useful for psychology, sociology, history).
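A minimal sketch of why time-synchronized transcription helps: if each recognized segment keeps its start and end time, a keyword hit points straight to a position in the recording, even when other words in the segment were misrecognized. The `Segment` class, the helpers `find_term` and `as_timestamp`, and the example segments are hypothetical; the slide does not describe the output format of TTNWW or TAAL2SPRAAK.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str     # automatic (possibly imperfect) transcription


def find_term(segments, term):
    """Return (start, end) of segments containing the term as a whitespace-separated word."""
    term = term.lower()
    return [(s.start, s.end) for s in segments if term in s.text.lower().split()]


def as_timestamp(seconds):
    """Format seconds as H:MM:SS, e.g. for jumping to a position on the tape."""
    total = int(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Invented segments from a hypothetical interview recording.
tape = [
    Segment(0.0, 4.2, "we arrived in the harbour"),
    Segment(4.2, 9.8, "the harbour was full of ships"),
    Segment(3725.0, 3731.5, "after the war we went home"),
]
print(find_term(tape, "Harbour"))  # [(0.0, 4.2), (4.2, 9.8)]
print(as_timestamp(3725.0))        # 1:02:05
```

The simple whitespace split means that "ship" does not match "ships"; a real search would add normalization or stemming.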

Audio and older texts
Digitization of old texts is still problematic (cf. DigiHIST).
Experiment: read a medieval text aloud and have it transcribed automatically (without training, using a language model for the modern language).

Audio: Leuvense Schepenbank
Files: 8130_0093_inge_moris.hardsubs.mp4 and 8130_0093_inge_moris_4gr.pdf
Raw material!

Written part of TTNWW
Relate documents, and make texts more accessible by making explicit information that is not expressed as such. Example: “Paris formulated objections, London/John didn’t.”
What is a name, and what kind of name is it? Analysis of names in fiction.
Sagalassos project (archaeology): temporal and geospatial analysis, available as a web service by the end of 2012.

Some more examples
When is ‘now’? And where?

Stylometry
Stylene (CLARIN-Vlaanderen): University of Antwerp / University College Ghent.
Was the text as a whole written by the same person?
Show the development in style of a specific author.
Is a text clear? Is it really understandable by, say, children aged 10-12?
Available as a web service (autumn 2012).
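The question "was the whole text written by the same person?" is often approached by comparing how often each part of a text uses common function words. The sketch below is a generic, deliberately simple version of that idea: relative frequencies of a short, hypothetical English function-word list, compared by cosine similarity. It is not Stylene's method; established techniques such as Burrows's Delta standardize the frequencies first and use much longer, language-specific word lists.

```python
import math
from collections import Counter

# Hypothetical short list of English function words, for illustration only.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "is", "was"]


def profile(text):
    """Relative frequency of each function word among the whitespace-separated tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero for empty text
    return [counts[w] / n for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two frequency profiles (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Comparing the profiles of consecutive chunks of one text, or of a questioned text against samples from candidate authors, gives a first, rough indication; on short texts such profiles are very noisy.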

‘Stylometry’ as a means
Was thesis X written by the student or by ‘Wikipedia’? Relevant for reliability.
Could text X have been written by a 10-year-old girl? Relevant for paedophilia investigations.

Reusability of data
For the same kind of research, and for a completely different kind of research.
Both should be encouraged, as they save time and money.
To be taken into account: IPR (intellectual property rights)!

Veterans project
Interviews with veterans of Dutch military actions ( ).
1000 semi-structured interviews (2.5 h).
Originally intended for social and military historians. Who else can use this archive?
At first: reluctance.

Veterans (2)
People from diverse disciplines (theology, psychology, discourse analysis, anthropology, sociology, …) were invited to write a paper.
It turned out to be a very valuable corpus!
Digital Humanities aspect: several tools were made available to facilitate research in different disciplines, including tools that give access to spoken content.

“Circulation of Knowledge”
“Geleerdenbrievenproject” (letters of scientists)
17th century: letters by Grotius (Hugo de Groot), Constantijn Huygens, Christiaan Huygens, Descartes, …, mainly in Dutch, French and Latin.
Intended for the history of science, but of course also relevant for other disciplines.

Polish example: Sejm
Polish parliament, 1918 to the present: texts, records, video.
Goal: all kinds of linguistic research.
But of course it is a wealth of information for other disciplines as well.

Conclusions
Several ‘easy-to-use’ research possibilities are (or will soon be) available.
Others are still more complex, but do offer possibilities for new kinds of projects (or easier ways of doing research).
Lots of material could be used by third parties as well: do not keep your data “in your drawer”.
Students and (young) researchers should be made aware of these new possibilities.

Sound Registers ( )
