The CLARIN INFRASTRUCTURE (NL PART) Jan Odijk IAP Event Utrecht, 2013-09-04



Overview
CLARIN-NL & CLARIN
CLARIN Infrastructure (NL part)
Conclusions
Invitation

CLARIN-NL
National project in the Netherlands
Budget: 9.01 million euro
Funding by NWO (National Roadmap Large Scale Infrastructures)
Coordinated by Utrecht University
>33 partners (universities, royal academy institutes, independent institutes, libraries, etc.)

CLARIN-NL
Dutch national contribution to the Europe-wide CLARIN infrastructure
Prepared by the CLARIN preparatory project ( )
– Also coordinated by Utrecht University
From Feb 2012 coordinated by the CLARIN-ERIC, hosted by the Netherlands
– ERIC: a legal entity at the European level specifically for research infrastructures

CLARIN Infrastructure
A technical research infrastructure in which a humanities researcher who works with language-related resources
– Can find all data relevant for the research
– Can find all tools and services relevant for the research
– Can apply the tools and services to the data without any technical background or ad-hoc adaptations
– Can store data and tools resulting from the research
via one portal

CLARIN Infrastructure
Can find all data
Can find all tools and services
Can apply the tools and services
Can store data and tools resulting from the research
via one portal

CLARIN Infrastructure ‘Can find all data’
Virtual Language Observatory
– Faceted browsing and geographical navigation
– CLARIN-prep
CLARIN Metadata Search
– Search & Develop
MPI-PL corpus tool (CMDI-fied IMDI)
– Original MPI/TLA

CLARIN Infrastructure ‘data’
Lexical Data
– COAVA project: curated Dutch dialect dictionaries for Brabant and Limburg
– Cornetto-LMF-RDF project: Cornetto data in LMF and RDF format, and Interface to Cornetto
– DuELME project: pre-CLARIN data and interface, new metadata (data via the HLT-Agency)

CLARIN Infrastructure ‘data’
Literary Data
– COBWWWEB: WomenWriters database connected to other national collections in women's literature … expected in 2014
– eBNM+: curated e-BNM collection of textual, codicological and historical information about thousands of Middle Dutch manuscripts kept worldwide … expected in 2014

CLARIN Infrastructure ‘data’
Linguistically Annotated Data
– Database of the Longitudinal Utrecht Collection of English Accents (D-LUCEA): curated data … expected in September 2013
– DISCAN: text corpus enriched with discourse annotation, and its metadata … expected in September 2013
– EXILSEA project: enhancements of the Corpus NGT, the world’s first open access sign language corpus, by updating the existing IMDI metadata to CLARIN-standard CMDI descriptions using bilingual ISOcat categories … expected in 2014

CLARIN Infrastructure ‘data’
Linguistically Annotated Data
– FESLI: curated specific language impairment data … expected in October 2013
– INPOLDER: curated data … expected in October 2013
– IPROSLA project: website and metadata via the VLO (license needed for access to the data)
– LAISEANG: language documentation data … expected in December 2013

CLARIN Infrastructure ‘data’
Linguistically Annotated Data
– MIMORE project: metadata for DiDDD, Dynasand, and GTRP via Metadata Search (use the MIMORE Search Engine to search in these data)
– NEHOL project: Negerhollands data (via the Virtual Language Observatory)
– WIVU Hebrew Text Database curated by the SHEBANQ project … expected in 2014

CLARIN Infrastructure ‘data’
Linguistically Annotated Data
– VALID project: curated five existing, digital data sets of language pathology data collected in the Netherlands, primarily on Dutch … expected in 2014
– VU-DNC project: data and its metadata, and documentation

CLARIN Infrastructure ‘data’
Historical and Contemporary Data
– Maritime history legacy datasets, curated with the tool chain and methodology developed by the DSS project … expected in 2014
– Polimedia: curated multi-media data … expected in September 2013
– Loe de Jong’s texts on the Second World War, curated (by the Verrijkt Koninkrijk project), also via DANS

CLARIN Infrastructure ‘data’
Religious Data
– PILNAR: curated Pilgrimage data … expected in October 2013
– WIVU Hebrew Text Database curated by the SHEBANQ project … expected in 2014
Art History Data
– Rembrandt Documents (RemDoc) database linked with RKD resources and a library catalogue (by the RemBench project) … expected in 2014

CLARIN Infrastructure ‘Can find all data’
From the CLARIN-Centres
– INL
– Meertens Institute
– MPI-PL
– (TST-Centrale)
– DANS … to follow soon
– Huygens ING … to follow soon

CLARIN Infrastructure ‘Can find all data’
From the CLARIN Data Providers
– Beeld en Geluid (Netherlands Institute for Sound & Vision): Academia Collection via the VLO
– Koninklijke Bibliotheek (National Library): digital collections … expected by the end of 2013
– Utrecht University Library: Digital Collection … expected by the end of 2013

CLARIN Infrastructure ‘Can find all data’
Curated by the Data Curation Service
– IPNV Interviews with veterans (to DANS)
– Dictionary ‘Gelderse’ Dialects, Rivierengebied and Veluwe (to Meertens)
– Curation of organisation names for OpenSkos (for the CLAVAS project)
– LESLLA Lower Education Second Language Learner Acquisition data
– Dutch Bilingualism Database / TCULT … soon
– 5 more dialect dictionaries … soon

CLARIN Infrastructure
Can find all data
Can find all tools and services
Can apply the tools and services
Can store data and tools resulting from the research
via one portal

CLARIN Infrastructure ‘Can find all tools’
VLO
– Application / Tools
– Software
– Services
– Tools
– Web services
All from the CLARIN PP Tools Inventory
None from CLARIN-NL, but metadata profile and components for software by the MD4T project … expected in October 2013

CLARIN Infrastructure
Can find all data
Can find all tools and services
Can apply the tools and services
– Search in and through the data
– Annotation
– Processing
Can store data and tools resulting from the research
via one portal

CLARIN Infrastructure ‘Can apply the tools and services’
Search in and through the data: Lexical Resources
– COAVA application: Dialect Lexicon Browser
– Cornetto-LMF-RDF project: Interface to Cornetto
– DuELME project: interface (new interface hopefully soon)
– GTB (Integrated Language Bank), including the WFT-GTB Frisian dictionary in the GTB (Dutch interface)

CLARIN Infrastructure ‘Can apply the tools and services’
Search in and through the data: Lexical Resources
– GrNe project: search interface for searching in a Greek-Dutch dictionary (letter Π only), Dutch interface
– SignLinc subproject: enhancements to LEXUS (version 3.00 and higher) and ELAN tool (version 4.00 and higher) (SignLinC website)

CLARIN Infrastructure ‘Can apply the tools and services’
Search in and through the data: Linguistically Annotated Corpora
– COAVA application: CHILDES browser
– Search interface (beta) to Corpus Gysseling, provided by INL
– FESLI: search application for search in language selective impairment acquisition data … expected in October 2013
– GreTel (result of CLARIN Flanders in the context of the CLARIN-NL/CLARIN Flanders Cooperation)

CLARIN Infrastructure ‘Can apply the tools and services’
Search in and through the data: Linguistically Annotated Corpora
– Mimore: search engine through 3 Dutch dialect databases, and a presentation of a demonstration scenario
– OpenSoNaR: tool for exploring the SoNaR-500 reference corpus … expected in 2014
– SHEBANQ: web application demonstrator that enables researchers to perform linguistic queries on the curated WIVU web resource and preserve significant results as annotations to this resource … expected in 2014

CLARIN Infrastructure ‘Can apply the tools and services’
Search in and through the data: Linguistically Annotated Corpora
– SignLinc subproject: enhancements to LEXUS (version 3.00 and higher) and ELAN tool (version 4.00 and higher) (SignLinC website)
– TDS-Curator project: access to the Typological Database System (TDS)

CLARIN Infrastructure ‘Can apply the tools and services’
Search in and through the data: Literary Data
– Arthurian Fiction: website, metadata via the VLO, and web application (ArthurianFiction subproject)
– C-DSD project: Liederenbank (Song Database) metadata via the VLO or via a direct page
– COBWWWEB: scholar application for research on the WomenWriters Database and connected databases … expected in 2014

CLARIN Infrastructure ‘Can apply the tools and services’
Search in and through the data: Literary Data
– eBNM+: web application for consultation, using faceted search, and collaborative editing … expected in 2014
– EMIT-X … to appear soon
– Namescape project: web page, (Dutch) search interface, Barcode browser, Visualiser, and Sandbox

CLARIN Infrastructure ‘Can apply the tools and services’
Search in and through the data: Historical and Contemporary Resources
– BILAND: multilingual application for search and discourse analysis in historical text corpora … expected in September 2013
– CKCC (Geleerdenbrieven) project: ePistolarium (partially funded by CLARIN-NL)
– Search via the Oral History Annotation Tool [special license required] and its documentation in a collection of 250 interviews from the interview project Nederlandse Veteranen (Dutch Veterans) (INTER-VIEWs subproject)
– Nederlab: CLARIN demonstrator … expected in 2014

CLARIN Infrastructure ‘Can apply the tools and services’
Search in and through the data: Historical and Contemporary Resources
– Polimedia project: application for cross-media analysis
– Quamerdes: application for quantitative content analysis of television and printed media … expected in 2014
– Search application of the Verrijkt Koninkrijk project in Loe de Jong’s work on the Second World War
– WAHSP project: Search Engine for historical sentiment mining in public media, and Documentation
– WIP project: Search Engine for search in the Dutch parliamentary proceedings

CLARIN Infrastructure ‘Can apply the tools and services’
Search in and through the data: Other Data
– MIGMAP project: Dutch Interface or English Interface for migration analysis, and web service plus documentation
– PILNAR: search application for search in Pilgrimage data … expected in October 2013

CLARIN Infrastructure
Can find all data
Can find all tools and services
Can apply the tools and services
– Search in and through the data
– Annotation
– Processing
Can store data and tools resulting from the research
via one portal

CLARIN Infrastructure ‘Can apply the tools and services’
Annotation & Related Tools
– AAM-LR: CLAM Webservice supporting annotation of audio files
– Extensions of the ELAN and ANNEX applications for the annotation and display of time-based resources by the ColTime project … expected in 2014
– eBNM+: web application for consultation, using faceted search, and collaborative editing … expected in 2014
– eLaborate: extended and made CLARIN-compatible by the Huygens Institute, plus documentation … expected by the end of 2013

CLARIN Infrastructure ‘Can apply the tools and services’
Annotation & Related Tools
– EXILSEA project: enhancements of ELAN and ANNEX with the multilingual features of ISOcat … expected in 2014
– Oral History Annotation Tool [special license required] and its documentation, for annotation of a collection of 250 interviews from the interview project Nederlandse Veteranen (Dutch Veterans) (INTER-VIEWs subproject)

CLARIN Infrastructure ‘Can apply the tools and services’
Annotation & Related Tools
– Multicon: enhancements for multimodal collocations in new versions of the ELAN and ANNEX tools, together with a screencast explaining the new functionality
– SignLinc subproject: enhancements to LEXUS (version 3.00 and higher) and ELAN tool (version 4.00 and higher) (SignLinC website)
– Transcription Quality Evaluation (TQE) Tool and its CMDI metadata, made by the TQE subproject

CLARIN Infrastructure
Can find all data
Can find all tools and services
Can apply the tools and services
– Search in and through the data
– Annotation
– Processing
Can store data and tools resulting from the research
via one portal

CLARIN Infrastructure ‘Can apply the tools and services’
Processing Data
– open source, web-based, user-friendly workflow from textual digital images to TEI … expected in
– Adelheid project: website, web service for PoS-tagging, tokenizer, lexicon, and editor/visualiser
– Gabmap: website for analysis of dialect variation, and introduction video (by the ADEPT subproject)
– DSS: tool chain and methodology for converting legacy datasets in the area of maritime history … expected in 2014

CLARIN Infrastructure ‘Can apply the tools and services’
Processing Data
– INPOLDER project: parsing application for Historical Dutch (also includes a workflow in which it is combined with the Adelheid Tagger)
– Namescape project: Named Entity Tagger
– TICCLops project: application and demonstrator for orthographic normalisation

CLARIN Infrastructure ‘Can apply the tools and services’
Processing Data
– TTNWW workflow system (result of CLARIN-NL / CLARIN Flanders Cooperation)
  Spelling normalisation
  Part of Speech-tagging
  Parsing
  Named Entity Recognition
  Semantic Role Assignment
  Assignment of co-referential relations
  Transcription of speech files

CLARIN Infrastructure
Can find all data
Can find all tools and services
Can apply the tools and services
– Search in and through the data
– Annotation
– Processing
Can store data and tools resulting from the research
via one portal

CLARIN Infrastructure ‘Can store the data & tools’
Profiles, Components and Tools for Creating Metadata
– Introduction to Component Metadata (CMDI)
– ARBIL Metadata Editor, enhanced by the Metadata Project
– CMDI Component Registry (including Metadata Component and Profile Editor) and Documentation, with profiles and components from the Metadata project
– Metadata profile and components for software by the MD4T project … expected in October 2013

CLARIN Infrastructure ‘Can store the data & tools’
Ensuring formal and semantic interoperability
– CLARIN standards and best practices
– ISOCAT: Web interface; Web Services; Manuals, help, and tutorials
– RELCAT alpha version
– SCHEMACAT alpha version (CGN)
– CLAVAS Vocabulary Service … expected in September 2013

CLARIN Infrastructure ‘Can store the data & tools’
LAMUS (the Language Archive) and its documentation, online or as PDF
EASY (DANS) and its Help and Support Page

CLARIN Infrastructure
Can find all data
Can find all tools and services
Can apply the tools and services
– Search in and through the data
– Annotation
– Processing
Can store data and tools resulting from the research
via one portal

CLARIN INFRASTRUCTURE ‘via one portal’
Portal is under construction (CLAPOP project)
This page is a brief overview of what CLARIN-NL has produced, ordered as in this presentation

Conclusions
CLARIN is starting to provide the data, facilities and services to carry out humanities research supported by large amounts of data and tools
With easy interfaces and easy search options (no technical background needed)
Still, some training is required to understand both the possibilities and the limitations of the data and the tools
– Educational modules are being developed for selected functionality
– Coordinated by Gerrit Bloothooft & David Onland (UU)

Conclusions
But there is still a lot to do
– Not all data (even some crucial data) are visible via the VLO or via Metadata Search
– Very few tools and web services are currently visible via the VLO
– Many tools are still prototypes or first versions
– There are good search facilities for some individual resources but not for all
– The search facilities so far are aimed at a single resource, or a small group of closely related resources
– Federated content search, which enables one to search with one query in multiple, quite diverse, resources, is still being worked on but is difficult
Actual use of the facilities leads to suggestions for improvements
And to suggestions for new functionality
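
The federated content search mentioned above can be pictured as one query fanned out to several differently structured resources, with a small adapter per resource that translates the query into that resource's own fields. What follows is only a minimal Python sketch under stated assumptions: the two in-memory resources, their field names, and the functions `search_lexicon`, `search_corpus` and `federated_search` are all hypothetical and do not represent CLARIN's actual federated search interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory resources. Each has its own record layout, which is
# why each needs its own adapter (illustration only, not real CLARIN data).
LEXICON = [
    {"lemma": "fiets", "gloss": "bicycle"},
    {"lemma": "huis", "gloss": "house"},
]
CORPUS = [
    {"sentence": "De fiets staat buiten."},
    {"sentence": "Het regent."},
]


def search_lexicon(query):
    """Adapter 1: match the query against the lemma field of the lexicon."""
    return [
        {"source": "lexicon", "hit": entry["lemma"]}
        for entry in LEXICON
        if query in entry["lemma"].lower()
    ]


def search_corpus(query):
    """Adapter 2: match the query against the sentence text of the corpus."""
    return [
        {"source": "corpus", "hit": record["sentence"]}
        for record in CORPUS
        if query in record["sentence"].lower()
    ]


# Hypothetical registry of adapters, one per resource.
ADAPTERS = [search_lexicon, search_corpus]


def federated_search(query):
    """Send one query to every adapter in parallel and merge the hits."""
    normalized = query.lower()
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the order of ADAPTERS.
        result_lists = list(pool.map(lambda adapter: adapter(normalized), ADAPTERS))
    return [hit for hits in result_lists for hit in hits]


if __name__ == "__main__":
    for hit in federated_search("fiets"):
        print(hit["source"], "->", hit["hit"])
```

In this toy setup the fan-out itself is a few lines; the effort sits in the adapters, since every resource needs its own mapping from the shared query to its structure and annotation.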

Invitation
Use (elements from) the CLARIN infrastructure (Questions? Problems? CLARIN-NL Helpdesk!)
Join user groups of specific services
Provide feedback
So that we can further improve CLARIN
So that you can improve your research

Thanks for your attention!

DO NOT ENTER HERE

Improvement Suggestions
Actual use of the search facilities leads to suggestions for improvements, e.g.
– Selection of inflection (extended PoS) in GreTel was originally not possible (and is still not possible) for LASSY-Small but has been added for search in CGN
– In the Dutch CGN/SONAR (de facto standard) PoS tagging system one cannot easily express ‘definite determiner’ (only as a complex regular expression over PoS tags): a special facility for this is required
– The Dutch CGN/SONAR (de facto standard) PoS tagging system uses, for adjectives, the ø-form tag for cases where the distinction between e-form and ø-form is neutralized. This is not incorrect, but a facility to distinguish the two would be very desirable (and this is possible by making use of the CGN lexicon and/or the CELEX lexicon)
– The same holds for adjectives that have an e-form identical to a ø-form for phonological reasons (adjectives ending in two syllables headed by schwa)
– Zero-inflection in MIMORE is represented by absence of an inflection tag. That makes search for such examples very difficult and requires either a NOT-operator (which is not there) or explicit tagging of absence of inflection
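
The zero-inflection point can be made concrete with a small Python sketch. It assumes a toy token list with invented tag names (`V`, `infl:t`, `infl:en`, `infl:zero`); these are hypothetical and are not MIMORE's actual tagset or query language. The sketch contrasts the two remedies named above: a query that negates (a NOT-operator over inflection tags) and explicit tagging of the absence of inflection, after which a plain positive query is enough.

```python
# Hypothetical tokens: a word plus a set of tags. Tag names are invented for
# illustration; zero inflection is encoded as "no inflection tag at all".
TOKENS = [
    {"word": "loopt", "tags": {"V", "infl:t"}},
    {"word": "loop", "tags": {"V"}},
    {"word": "lopen", "tags": {"V", "infl:en"}},
]


def has_inflection(token):
    """True if the token carries any inflection tag."""
    return any(tag.startswith("infl:") for tag in token["tags"])


def find_zero_inflected(tokens, pos):
    """Remedy 1: a NOT-operator, i.e. match on the absence of inflection tags."""
    return [
        token["word"]
        for token in tokens
        if pos in token["tags"] and not has_inflection(token)
    ]


def add_explicit_zero_tag(tokens):
    """Remedy 2: tag the absence of inflection explicitly (returns new tokens)."""
    return [
        {
            "word": token["word"],
            "tags": token["tags"] if has_inflection(token) else token["tags"] | {"infl:zero"},
        }
        for token in tokens
    ]


def find_by_tags(tokens, required):
    """A purely positive query: every required tag must be present."""
    return [token["word"] for token in tokens if required <= token["tags"]]


if __name__ == "__main__":
    print(find_zero_inflected(TOKENS, "V"))
    print(find_by_tags(add_explicit_zero_tag(TOKENS), {"V", "infl:zero"}))
```

With only positive queries and the original encoding, `find_by_tags(TOKENS, {"V"})` returns all three verbs, which is the search problem the suggestion describes.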