Catia Cucchiarini, Walter Daelemans and Helmer Strik Strengthening the Dutch Language and Speech Technology Infrastructure
Dutch HLT Platform
  Aim: to contribute to the further development of an adequate language and speech technology infrastructure for Dutch
Dutch Language Union (NTU)
  Intergovernmental organisation based on Language Union Treaty between the Netherlands and Belgium
  Mission: fostering integration between the Netherlands and Flanders in the field of language and literature
  Policy: Dutch and Flemish ministers of culture and education
  NTU already active in HLT:
    –Spoken Dutch Corpus: NTU owner, responsible for exploitation
    –NL-Translex: NTU coordinator, owner
Other participants
  the Ministry of the Flemish Community
  the Flemish Institute for the Promotion of Scientific-technological Research in Industry
  the Fund for Scientific Research – Flanders
  the Dutch Ministry of Education, Culture and Sciences
  the Dutch Ministry of Economic Affairs
  the Netherlands Organisation for Scientific Research
  Senter (an agency of the Dutch Ministry of Economic Affairs)
Objectives
  strengthening the position of Dutch in HLT
  establishing the proper conditions for a successful management and maintenance of basic HLT resources developed through governmental funding
  stimulating co-operation between academia and industry in the field of HLT
  contributing to the realisation of European co-operation in HLT-relevant areas
  establishing a network that brings together demand and supply of knowledge, products and services
Action line A: ‘broking and linking’ function
  encouraging co-operation between industry, academia and policy institutions
  raising awareness of, and giving publicity to, the results of HLT research
Action line B: strengthening the digital language infrastructure
  defining the so-called BLARK (Basic LAnguage Resources Kit) for Dutch
  carrying out a survey to determine what is needed to complete this BLARK and what costs are associated with the development of the material needed
  drawing up a priority list with cost estimates which can serve as a policy guideline
Action line C: working out standards and evaluation criteria
  drawing up a set of standards and criteria for the evaluation of the basic materials contained in the BLARK and for the assessment of project results
Action line D: management, maintenance and distribution plan
  defining a blueprint for management (including intellectual property rights), maintenance, and distribution of HLT resources
Action lines B and C
  Steering committee
    –to draw up plan of activities
    –to develop initial survey framework
    –to define BLARK
    –to supervise survey
  Field researchers
    –to refine framework
    –to conduct survey
    –to write report
Survey instruments
  Applications: classes of applications rather than specific applications or products.
  Modules (or semi-products): the basic software components of HLT applications.
  Data: sets of language data and descriptions in machine readable form, to be used in building, improving or evaluating natural language and speech processing systems.
BLARK language technology
  Modules
    –Robust modular text preprocessing
    –Morphological analysis and morphosyntactic disambiguation / unknown words
    –Robust syntactic analysis
    –Aspects of semantic analysis (word meaning and reference)
  Data
    –Monolingual lexicon
    –Annotated corpus of written Dutch
    –Benchmarks for evaluation
BLARK speech technology
  Modules
    –Automatic speech recognition (module)
    –Speech synthesis system (module)
    –Tools for annotation of speech corpora
    –Confidence measures and utterance verification
    –Identification (speaker, language, dialect)
    –Evaluation of speech technology tools and applications
  Data
    –Monolingual speech corpora for specific applications
    –Multilingual speech corpora
    –Multimodal/medial speech corpora
    –Richly annotated speech corpora
    –Pronunciation lexicons
Modules-applications: LT
Modules-applications: ST
Data-modules: LT
Data-modules: ST
Further survey instruments
  Table containing information on availability of modules and data
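The slides do not say how the survey matrices and the availability table are turned into a priority list. As a rough illustration only, the sketch below shows one conceivable way to combine them: each module gets an importance score per application class and an availability score, and modules that are important but poorly available rank highest. The module names are taken from the BLARK language technology slide; the application classes (app_class_1, app_class_2), the 0 to 3 scales, every number, and the scoring rule itself are assumptions made for this example, not survey data or the method the steering committee used.

```python
# Hypothetical modules-applications matrix: importance of each module for
# each application class (0 = irrelevant, 3 = essential). All values are
# placeholders invented for illustration.
IMPORTANCE = {
    "Robust modular text preprocessing": {"app_class_1": 3, "app_class_2": 2},
    "Morphological analysis": {"app_class_1": 2, "app_class_2": 3},
    "Robust syntactic analysis": {"app_class_1": 1, "app_class_2": 3},
}

# Hypothetical availability table (0 = nothing available, 3 = fully
# available). Again, placeholder values only.
AVAILABILITY = {
    "Robust modular text preprocessing": 2,
    "Morphological analysis": 3,
    "Robust syntactic analysis": 0,
}


def priority_list(importance, availability, max_availability=3):
    """Rank modules by (summed importance) x (availability gap).

    This scoring rule is an assumption of the sketch; ties are broken
    alphabetically so the result is deterministic.
    """
    scores = {}
    for module, per_application in importance.items():
        total_importance = sum(per_application.values())
        gap = max_availability - availability[module]
        scores[module] = total_importance * gap
    return sorted(scores, key=lambda module: (-scores[module], module))


if __name__ == "__main__":
    for rank, module in enumerate(priority_list(IMPORTANCE, AVAILABILITY), 1):
        print(rank, module)
```

With these placeholder numbers, the module that is completely unavailable comes first and the fully available one comes last. A real priority list would also need the cost estimates mentioned under Action line B, which this sketch leaves out.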
Survey results
  Preliminary priority list that will be submitted to the whole HLT field
  Comments from the HLT field on priority list
  Final priority list