A Unified Structure for Dutch Dialect Dictionary Data Folkert de Vriend 1, Lou Boves 1,2, Henk van den Heuvel 1, Roeland van Hout 2, Joep Kruijsen 2, Jos.

Presentation transcript:

A Unified Structure for Dutch Dialect Dictionary Data

Folkert de Vriend 1, Lou Boves 1,2, Henk van den Heuvel 1, Roeland van Hout 2, Joep Kruijsen 2, Jos Swanenberg 2
1 Centre for Language and Speech Technology (CLST)
2 Department of Linguistics
Radboud University Nijmegen, The Netherlands

Introduction

The dialect vocabulary of the Netherlands and Flanders is recorded and researched in several Dutch and Belgian research institutes and universities. Most dictionary creation and research projects collaborate in the "Permanent Overlegorgaan Regionale Woordenboeken" (ReWo). In the project "Digital databases and digital tools for WBD and WLD" (D-Square), the dialect data published by two of these dictionary projects, the Woordenboek van de Brabantse Dialecten (WBD) and the Woordenboek van de Limburgse Dialecten (WLD), is being digitised. In addition, the D-Square project aims to develop an infrastructure for electronic access to all dialect dictionaries collaborating in the ReWo. Eventually, this infrastructure will enable unified access to dialect geographic data for the complete Dutch language area through one interface and one set of research tools, as if it were one homogeneous data collection.

The dialect data reconsidered 1

All dialect dictionary projects in the ReWo use the same core data types, viz. form, sense and location. The most striking difference between the projects is the organisation of their data, which is either form-based or sense-based.

[Figure panels: Form-based; Sense-based]

The dialect data reconsidered 2

However, the data has no intrinsic "sense over form" or "form over sense" hierarchy. Instead, the relation between the core data types is heterarchical.

[Figure: Heterarchical relation]

The core data types can be further classified in higher order structures.

[Figure: Classifications]

Advantages:
Optimal flexibility in working with the data.
Possibility of treating all data from the various dictionaries as one huge data set.
Differences in the more precise nature of each of the data types can be specified by the classifications.

Implementation issues

Encoding of the data
For the core data types, a relational database will be used.
For the classifications, XML will be used.

Standardisation
By using LEXUS we will adhere to:
the Data Category Registry
the Lexical Markup Framework

One interface for unified data: Google Earth

Concluding remarks

The D-Square project lasts until the summer of
The unified structure as described will then have been implemented for WBD and WLD.
The project is partly funded by the Netherlands Organisation for Scientific Research (NWO).
The project website is:

Unifying the different classifications: Difficulties and solutions

Sense
Dictionaries use different taxonomies or no taxonomy at all.
Use the taxonomy already present in WBD and WLD.
Senses from other dictionaries can be mapped onto this taxonomy.
New senses can be added.
Senseless forms (words with only a grammatical function) will be mapped to a separate branch of the taxonomy.

Form
Form classifications are based on a number of different linguistic criteria.
It used to be up to the intuition of the editor which criteria prevailed.
As a result, the same form may be classified differently in different dictionaries.
Expert users should be able to choose any of the possible classification mergers, or no merger at all.
The general public is presented with one kind of merger by default.

Location
Place name ambiguity is introduced when merging location classifications.
Either a geopolitical taxonomy covering all locations is introduced, or all locations are converted to a geocoding system that can uniquely encode geographical locations worldwide: longitude and latitude.
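
The second option for locations (keying every place on longitude and latitude) could be sketched as follows. This is only an illustration under stated assumptions: the place names, the coordinates, the record layout and the helper names `location_key` and `merge_locations` are all hypothetical and do not come from the project.

```python
from collections import defaultdict

# Hypothetical location lists from two dictionaries.
# Names and coordinates are illustrative placeholders, not project data.
wbd_locations = [{"name": "Beek", "lat": 51.53, "lon": 5.64}]
wld_locations = [
    {"name": "Beek", "lat": 50.94, "lon": 5.80},
    {"name": "Weert", "lat": 51.25, "lon": 5.71},
]


def location_key(loc, precision=2):
    """Use rounded (latitude, longitude) as the identifier (an assumption)."""
    return (round(loc["lat"], precision), round(loc["lon"], precision))


def merge_locations(*sources):
    """Merge location lists, keyed on coordinates rather than on names."""
    merged = {}
    for source in sources:
        for loc in source:
            merged.setdefault(location_key(loc), set()).add(loc["name"])
    return merged


merged = merge_locations(wbd_locations, wld_locations)

# Invert the mapping to find names that refer to more than one place.
by_name = defaultdict(list)
for key, names in merged.items():
    for name in names:
        by_name[name].append(key)
ambiguous = {name: keys for name, keys in by_name.items() if len(keys) > 1}
```

In this sketch the two places called "Beek" stay distinct because their coordinates differ, while the name-based view still reports "Beek" as ambiguous so that an interface can ask the user which place is meant.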
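
The heterarchical relation between form, sense and location, stored in a relational database as stated under "Encoding of the data", might look roughly like the sketch below. The table layout, the `attestation` link table, the column names and the placeholder values are assumptions made for illustration; the poster does not specify the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: three equal-ranking core tables plus one link table.
# No core type is nested under another, so neither form nor sense is "on top".
conn.executescript("""
CREATE TABLE form (id INTEGER PRIMARY KEY, spelling TEXT NOT NULL);
CREATE TABLE sense (id INTEGER PRIMARY KEY, gloss TEXT NOT NULL);
CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attestation (
    form_id INTEGER NOT NULL REFERENCES form(id),
    sense_id INTEGER NOT NULL REFERENCES sense(id),
    location_id INTEGER NOT NULL REFERENCES location(id),
    source TEXT NOT NULL,  -- dictionary the record comes from
    PRIMARY KEY (form_id, sense_id, location_id, source)
);
""")

# Placeholder records, not real dialect data.
conn.executemany("INSERT INTO form VALUES (?, ?)", [(1, "form_a"), (2, "form_b")])
conn.executemany("INSERT INTO sense VALUES (?, ?)", [(1, "sense_x"), (2, "sense_y")])
conn.executemany("INSERT INTO location VALUES (?, ?)", [(1, "place_1"), (2, "place_2")])
conn.executemany(
    "INSERT INTO attestation VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "WBD"), (1, 2, 2, "WLD"), (2, 1, 2, "WLD")],
)


def senses_for_form(spelling):
    """Form-based view: start from a form and list its senses."""
    rows = conn.execute(
        """SELECT DISTINCT s.gloss FROM attestation a
           JOIN form f ON f.id = a.form_id
           JOIN sense s ON s.id = a.sense_id
           WHERE f.spelling = ? ORDER BY s.gloss""",
        (spelling,),
    )
    return [row[0] for row in rows]


def forms_for_sense(gloss):
    """Sense-based view: start from a sense and list its forms."""
    rows = conn.execute(
        """SELECT DISTINCT f.spelling FROM attestation a
           JOIN form f ON f.id = a.form_id
           JOIN sense s ON s.id = a.sense_id
           WHERE s.gloss = ? ORDER BY f.spelling""",
        (gloss,),
    )
    return [row[0] for row in rows]
```

Because both views are queries over the same link table, a form-based and a sense-based dictionary can be loaded into one data set and consulted in either direction.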