Nederlab: Laboratory for research on the patterns of change in the Dutch language and culture. E-Humanities Group Research Meeting, May 16th, 2013, Meertens Institute, Amsterdam

A bit of history
The CLARIN EU project ( ) was intended to answer the digital challenge set out by the EU:
– How can large amounts of data from all over Europe be brought together with the tools needed to process them?
It was followed by a number of national CLARIN projects (CLARIN-NL, D-SPIN…) tackling these challenges at the national level.

A bit of history (cont)
The CatchPlus project turns scientific research results into usable tools and services for the entire Dutch heritage sector.
– The resulting software improves the disclosure and accessibility of collections held by heritage institutions.

A bit of history (cont)
It brought us:
– PID (persistent identifier) services
– ‘concept’ registries
– Flexible metadata formats (CMDI)
– Standard publication protocols (OAI-PMH)
– Web authentication methods (SAML 2)
– And a lot of tools and data sets at the national levels (Anyone remember the CLARIN-NL call 1-4 projects?)
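To make the "standard publication protocols" item concrete, here is a minimal sketch of harvesting record identifiers over OAI-PMH, following resumption tokens page by page. The endpoint URL and the `cmdi` metadata prefix are assumptions for illustration, and `fetch` is a hypothetical callable (URL in, response body out) injected so the sketch runs without network access. Only the verb, parameter names and XML namespace come from the OAI-PMH 2.0 protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    A follow-up request carries only the resumptionToken, because the protocol
    makes it exclusive with metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest(base_url, metadata_prefix, fetch):
    """Yield all record identifiers, following resumption tokens page by page.

    fetch is a hypothetical callable mapping a URL to the response body.
    """
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        ids, token = parse_page(fetch(url))
        yield from ids
        if token is None:
            break
```

In a real harvester `fetch` could wrap `urllib.request.urlopen`; which metadata prefixes an endpoint actually supports should be checked with the protocol's `ListMetadataFormats` verb rather than assumed.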

CLARIN center network
From a scenario characterized mainly by accidental and temporary interactions to a scenario in which dedicated service centres of a new type interact in a stable way and offer persistent, easy-to-use services to the community. Researchers must be able to rely on the services offered.

Source: Riding the Wave: How Europe can gain from the rising tide of scientific data. Report of the High Level Expert Group on Scientific Data

Arguments for Nederlab
– Bridge the gap between community support services and the user community/data providers
– Seven points raised in the criticism of digitisation in NRC Handelsblad (science section), September 10 and 11, 2011: ‘Digitisation of older texts is going wrong’, ‘A lot of money is wasted’

7 points
1. All the money for digitisation has to come from a single fund; the funding body should impose quality requirements.
2. Funds are only provided if both the digitisation and the metadata meet (international) standards. This is the only way that sub-collections can eventually be combined.
3. Link money to quality. Text quality varies greatly, from corrected OCR to messy, uncorrected OCR.
4. Scientists, researchers and other users have to be more closely involved in the development of large websites: better cooperation with users.
5. A central register should show what has already been digitised, as much work is needlessly repeated. Money is only offered to institutions that first investigate what has already been done.
6. The central register has to be accessible to the public. This way, people can donate books that they would otherwise throw away and that can then be cut up, which saves a lot of time when digitising.
7. A national plan should be drawn up to professionally digitise the most important sources within 10 years, at the lowest possible cost.

Hypothesis
The hypothesis is that changes in language and culture, both of which express human cognition, are related to each other and are based on identical or comparable regularities. By means of Nederlab we want to uncover these regularities. Research into those regularities will show which parts of the Dutch language and culture are subject to change, and which remain constant.

Hypothesis
Nature versus nurture debate

Some research questions
– Detecting new concepts, words and combinations of words.
– Concept history: what is meant by ‘burgerschap’ (‘citizenship’)?
– Systematically mapping linguistic changes, for example deflexion (the loss of inflectional endings).
– Determining patterns and motives: how are the nobility, the clergy, etc. described, and with which motives are these ‘groups’ associated?
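As an illustration of the first question, a naive way to detect new words is to list, for each period, the words that were not attested in any earlier period. This is a minimal sketch, not Nederlab's method: it assumes the corpus is already tokenized and grouped by a sortable period label (e.g. a decade), and it ignores historical spelling variation, which in practice would have to be normalized first. The `min_count` threshold is a hypothetical knob for filtering one-off OCR errors.

```python
from collections import Counter


def new_words_by_period(tokens_by_period, min_count=1):
    """For each period (in sorted order), return the words not seen in any earlier period.

    tokens_by_period: dict mapping a sortable period label to a list of tokens.
    min_count: a word must occur at least this often in a period to be reported as new.
    """
    seen = set()
    result = {}
    for period in sorted(tokens_by_period):
        counts = Counter(t.lower() for t in tokens_by_period[period])
        frequent = {w for w, c in counts.items() if c >= min_count}
        result[period] = sorted(frequent - seen)
        # Any occurrence, even below min_count, counts as an earlier attestation.
        seen |= counts.keys()
    return result


corpus = {
    1800: "de burger en de stad".split(),
    1850: "de burger en het burgerschap".split(),
    1900: "burgerschap en de fiets".split(),
}
print(new_words_by_period(corpus))
```

On this toy corpus, ‘burgerschap’ and ‘het’ are reported as new in 1850 and ‘fiets’ in 1900. Comparing word combinations instead of single words works the same way once the tokens are replaced by n-grams.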

Some research questions
– Detecting similarities in texts: who is citing whom?
– When were terse phrases, idioms and expressions coined, and how were they taken over by authors and by different text genres?
– In which text genre was a certain metaphor first used?
– Author recognition: who was the author of a certain text?
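For questions like "who is citing whom?", a common baseline for detecting text reuse is to compare the sets of word n-grams ("shingles") two texts share and score their overlap with the Jaccard similarity. The sketch below shows that generic technique under stated assumptions (trigrams, lowercasing, `\w+` tokenization); it is not a description of the tooling Nederlab uses, and historical texts would additionally need spelling normalization before shingling.

```python
import re


def shingles(text, n=3):
    """Set of lowercased word n-grams ("shingles") in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shared_passages(text_a, text_b, n=3):
    """Word n-grams that occur in both texts, joined back into strings."""
    return sorted(" ".join(g) for g in shingles(text_a, n) & shingles(text_b, n))


a = "De tijd van de burger is gekomen."
b = "Men zegt dat de tijd van de burger voorbij is."
print(shared_passages(a, b))  # ['de tijd van', 'tijd van de', 'van de burger']
print(jaccard(shingles(a), shingles(b)))  # 0.3
```

A high score flags a candidate pair for closer reading; the shared n-grams point to the passage that was probably taken over.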

(Some) Challenges for Nederlab
1. Usability
2. Handling large amounts of data from various sources and of varying quality
3. Handling the editorial process
4. Dealing with diachronic (processing) issues
5. Integrating technologies from different technology providers
6. Integrating technologies that contribute towards answering research questions
7. Identifying gaps

De Gids
DBNL has mass-digitised all volumes of ‘De Gids’. Not only are their contents now accessible, but so are the contributions by individual authors.
– How did the number of contributions by female authors develop over the years?
– How did the average age of the authors vary over the years?
– Where do the authors come from?
– How did the percentage of poetry versus prose change over the years?
– What ‘new’ words occur over the years?
– Which terms are used frequently over the years, and how do these change?
– Which words are used in one period but not in another?
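Several of these questions reduce to a per-year percentage over contribution metadata. The sketch below assumes each contribution is available as a record with a year, the author's gender and a genre label; the field names `year`, `gender` and `genre` and the toy records are hypothetical and invented for illustration, not actual De Gids data.

```python
from collections import defaultdict


def share_per_year(records, predicate):
    """Return {year: percentage of that year's records for which predicate(record) is true}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["year"]] += 1
        if predicate(rec):
            hits[rec["year"]] += 1
    return {year: 100.0 * hits[year] / totals[year] for year in sorted(totals)}


# Made-up toy records with hypothetical field names.
records = [
    {"year": 1850, "gender": "m", "genre": "prose"},
    {"year": 1850, "gender": "m", "genre": "poetry"},
    {"year": 1850, "gender": "f", "genre": "poetry"},
    {"year": 1850, "gender": "m", "genre": "prose"},
    {"year": 1851, "gender": "f", "genre": "poetry"},
    {"year": 1851, "gender": "m", "genre": "prose"},
]

print(share_per_year(records, lambda r: r["gender"] == "f"))  # {1850: 25.0, 1851: 50.0}
print(share_per_year(records, lambda r: r["genre"] == "poetry"))  # {1850: 50.0, 1851: 50.0}
```

Passing the question in as a predicate keeps one counting routine for all the "share per year" questions on the slide.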

Dutch language innovations
The second research pilot concerns the hypothesis that in the 19th century, innovations in the Dutch language started overseas: in Indonesia, Surinam and the Dutch Antilles. The hypothesis is supported by the fact that in this period, for the first time, relatively large contingents of bilingual speakers were living in the Dutch colonies, which is an important condition for language innovation. Sjef Barbiers and Nicoline van der Sijs will test the hypothesis by comparing texts printed in the Netherlands with texts printed overseas.
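One simple way such a comparison could be operationalized is to find, for each word form, its earliest attestation per region of printing and flag forms that appear overseas before they appear in the Netherlands. This is only an illustrative sketch: the slide does not describe the actual method, the region labels and word forms below are hypothetical placeholders, and first attestation in a corpus depends heavily on how well each region is covered.

```python
def first_attestations(occurrences):
    """Map each word form to {region: earliest year it occurs in texts printed there}.

    occurrences: iterable of (form, year, region) tuples.
    """
    first = {}
    for form, year, region in occurrences:
        by_region = first.setdefault(form, {})
        if region not in by_region or year < by_region[region]:
            by_region[region] = year
    return first


def attested_overseas_first(first, home="NL"):
    """Return forms whose earliest overseas attestation precedes the earliest home attestation.

    Forms never attested at home count as overseas-first if attested anywhere overseas.
    """
    result = []
    for form, by_region in sorted(first.items()):
        overseas = [y for r, y in by_region.items() if r != home]
        if overseas and (home not in by_region or min(overseas) < by_region[home]):
            result.append(form)
    return result


# Hypothetical placeholder data.
occurrences = [
    ("form_a", 1860, "NL"), ("form_a", 1845, "Surinam"),
    ("form_b", 1830, "NL"), ("form_b", 1850, "Indonesia"),
    ("form_c", 1870, "Antilles"),
]
print(attested_overseas_first(first_attestations(occurrences)))  # ['form_a', 'form_c']
```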

Processing pipeline (diagram)
– KB data (KB DIDL, KB ALTO): extract articles → convert to FoLiA → FoLiA XML
– DBNL data: extract articles → convert to FoLiA → FoLiA XML
– FoLiA XML: POS tagging (Frog) + cleanup (TICCL) → index POS tags → BlackLab POS indices (Lucene)
– FoLiA XML: n-gram generation → n-grams → index n-grams → n-gram indices (SOLR)
– KB and DBNL metadata: index metadata → Nederlab metadata index (SOLR)
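The n-gram generation step amounts to sliding a window over each document's tokens and counting the results per time slice before they are indexed. The tool used for this step is not named in the extracted slide, so the sketch below is generic: it assumes tokens have already been taken from the converted texts and that each document carries a year; the indexing into SOLR is not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return an iterator over consecutive n-grams (as tuples) of a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams_by_year(docs, n=2):
    """Count n-grams per year. docs is an iterable of (year, tokens) pairs."""
    counts = {}
    for year, tokens in docs:
        counts.setdefault(year, Counter()).update(ngrams(tokens, n))
    return counts


docs = [
    (1850, ["de", "burger", "en", "de", "burger"]),
    (1851, ["de", "burger"]),
]
counts = count_ngrams_by_year(docs, n=2)
print(counts[1850][("de", "burger")])  # 2
```

Keeping counts per year rather than per document is what makes frequency-over-time queries, such as those in the De Gids pilot, cheap to answer from the resulting index.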

Thank you