Published by Erick Kelley. Modified over 9 years ago.
1
ENABLER, BLARK, what’s next?
Steven Krauwer
Utrecht University / ELSNET
2
Overview
ENABLER
BLARK
BLARK Results
Recent developments
CLARIN
Some reflections
MyBLARK
Concluding remarks
3
ENABLER
EU project, FP5, under Information Society Technologies (see www.enabler-network.org)
bringing together national language resources projects in many EU countries
aimed at providing a cooperative framework to foster cooperation and interoperability
with a strong industrial drive
led by Pisa, and ended (as an EU project) in 2004
… but still existing as a community, in close collaboration with ELSNET (www.elsnet.org)
4
BLARK (1)
Basic Language Resource Kit
Idea (first launched in 1998): definition of the minimal set of language resources needed to do any (precompetitive) R&D and education at all
The definition should in principle be language independent (although specific languages may require specific adaptations)
5
BLARK (2)
The definition should include both data collections (corpora, lexicons) and modules (taggers, parsers, synthesizers, annotation tools)
It should include both qualitative aspects (e.g. standards) and quantitative aspects (e.g. size)
6
BLARK (3)
Once the definition is available, it can be used as a common reference point that allows one to
  – assess the resources situation of a language (how much of the BLARK is available, and what is still missing)
  – make priority plans for bringing the resources situation up to date
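As a rough illustration of the assessment step on this slide, the sketch below compares a BLARK definition with the resources available for a language and reports coverage plus the missing components. The component names, the `EXAMPLE_BLARK` dictionary and the `assess` function are hypothetical and not part of any actual BLARK definition; real definitions would also carry qualitative and quantitative requirements, which are left out here.

```python
# Hypothetical BLARK definition: component name -> kind ("data" or "module").
# The entries are illustrative only, not an actual BLARK.
EXAMPLE_BLARK = {
    "written corpus": "data",
    "lexicon": "data",
    "tagger": "module",
    "parser": "module",
}


def assess(blark, available):
    """Return (coverage fraction, sorted list of missing components)."""
    required = set(blark)
    # Resources outside the definition do not count towards coverage.
    present = required & set(available)
    missing = sorted(required - present)
    coverage = len(present) / len(required) if required else 1.0
    return coverage, missing


# Invented inventory for some language.
coverage, missing = assess(EXAMPLE_BLARK, {"lexicon", "tagger", "speech corpus"})
print(f"{coverage:.0%} available; missing: {missing}")
```

The list of missing components is the natural starting point for the priority plan mentioned in the second bullet.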
7
BLARK (4)
Note that the BLARK is necessarily dynamic, as new technological developments will come with new requirements
Note that the BLARK for a language will only work if there is a body that takes responsibility for its implementation and for the maintenance and distribution of the resources created
8
BLARK Results
First adopted by the Dutch Language Union, resulting in a first 12 million euro implementation programme launched at the end of 2004
Explored and developed for Arabic in the NEMLAR project (CST, ELDA, ELSNET, and others; see www.nemlar.org and presentation O27-G at this conference on Thursday)
BLARK concept included in a number of proposals, but without tangible results
Suggestions for a more advanced variant (ELARK) have been put forward by ELDA and others
9
Recent developments
CLARIN: Common Language Resources and Technology Infrastructure (see the LREC 2006 workshop on May 22, or otherwise www.mpi.nl/clarin)
NOT a project proposal, but rather a proposal for a Research Infrastructure to be included in the European Roadmap for Research Infrastructures
10
CLARIN (1)
Creation of an open European Language Resources Network with strong service centers and repositories, providing the humanities community at large (i.e. not just the language and speech technology community) with
  – knowledge about which language resources and tools exist and how to use them
  – access to existing language resources
  – coordinated creation of new resources
  – access to advanced services for access and adaptation
  – bundling of expertise in specific problem areas
  – training centers
11
CLARIN (2)
Three important observations:
CLARIN has no industrial drive
CLARIN aims at addressing all languages in the EU (and associated countries)
One of CLARIN’s objectives is the definition and the coordinated creation of BLARKs for all languages of the EU
12
Some reflections
Whatever progress has been made (DLU, NEMLAR, ELARK) was mostly inspired by industrial needs
Industrial considerations do not favour smaller languages
Progress of the BLARK since 1998 has been slow
No new funding opportunities in FP6 to get anything done
CLARIN may offer exciting opportunities (if successful), but this will take a lot of time
13
More reflections
The present (embryonic) BLARK definition may be one or more steps too far for under-resourced languages
So why not add to the concept the BLARKette, a very basic entry-level variant of the BLARK, targeting exclusively the research and (especially) education community?
Small and simple; it should fit on a CD-ROM
14
And yet more reflections
Nothing funded will happen before well into 2007
Why wait until then, e.g. until CLARIN is in place and some formal process has been put in motion to define the BLARK (and the BLARKette)?
Why not start an action to consult the language communities and to arrive at a first proposal for a BLARK and BLARKette definition?
15
MyBLARK, the proposal
We initiate MyBLARK, aiming at collecting (for each language in the EU)
  – a description of the essential components of the BLARK
  – and of the BLARKette
We try to distill from this a broadly supported proposal for the definition of both concepts
We offer this as an input to the CLARIN project if it ever happens, or otherwise use it to launch other initiatives
16
MyBLARK, the process
ELSNET (possibly in collaboration with COCOSDA/WRITE) will send out a simple questionnaire to all known language resources centers, asking for descriptions of BLARK and BLARKette components
ELSNET (maybe with COCOSDA/WRITE) will set up a committee to synthesize the results in the form of recommendations
17
MyBLARK participants
Language resources centers for languages of the EU and associated countries that are known to us
Language resources centers in the EU (and associated countries) that send me a message saying they are willing to participate (steven.krauwer@elsnet.org)
18
MyBLARK Questionnaire
Language
Type of resource
Usage
Size
Annotation required
Brief description
Available for your language?
  – If so: pointer to it
  – If not: pointer to a similar resource for another language
References
Comments
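To make the questionnaire fields concrete, here is a minimal sketch of how one answer might be stored as a record. The class name `QuestionnaireEntry`, the field names and types, and the sample values are all assumptions made for illustration; the slide only lists the field headings and does not prescribe any format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QuestionnaireEntry:
    """One questionnaire answer; field names mirror the slide headings."""
    language: str
    resource_type: str
    usage: str
    size: str
    annotation_required: str
    description: str
    available: bool                 # "Available for your language?"
    pointer: Optional[str] = None   # the resource itself, or a similar one
    references: List[str] = field(default_factory=list)
    comments: str = ""

    def pointer_kind(self) -> str:
        """Say what the pointer refers to, following the slide's if/else."""
        if self.pointer is None:
            return "no pointer given"
        if self.available:
            return "the resource itself"
        return "a similar resource for another language"


# Invented sample values, for illustration only.
entry = QuestionnaireEntry(
    language="ExampleLanguage",
    resource_type="lexicon",
    usage="education",
    size="unspecified",
    annotation_required="part of speech",
    description="Basic word list with word classes",
    available=False,
    pointer="pointer to a comparable lexicon (placeholder)",
)
print(entry.pointer_kind())
```

Keeping a single `pointer` field and interpreting it through `available` follows the slide's two-way branch; separate fields for the two cases would work just as well.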
19
MyBLARK Schedule
June – August 2006: collection of contacts
September 2006: questionnaires sent out
October 2006: questionnaires in, first analysis and draft definition proposals
November 2006: proposals sent out for feedback
December 2006 – January 2007: collecting feedback
February 2007: final report
20
Concluding remarks
I have proposed the introduction of a slightly weaker variant of the BLARK, the BLARKette, for under-resourced languages
I have proposed an action entitled MyBLARK to arrive at an initial definition of both the BLARK and the BLARKette
I hope that this will (a) speed up the process, and (b) provide an intermediate coverage level for under-resourced languages