Dutch HLT Resources: from BLARK to Priority Lists


1 Dutch HLT Resources: from BLARK to Priority Lists
Helmer Strik, Diana Binnenpoorte, Janienke Sturm, Folkert de Vriend, and Catia Cucchiarini*
A²RT, Dept. of Language and Speech, Nijmegen
* NTU, Dutch Language Union, The Hague
Walter Daelemans
Dept. of CNTS Language Technology, Antwerp

2 Dutch HLT Platform
NTU: Nederlandse Taalunie (Dutch Language Union)
Mission: strengthening the position of the Dutch language
Dutch HLT Platform
Aim: to contribute to the further development of an adequate language and speech technology infrastructure for Dutch

3 Dutch HLT Platform: other participants
- Ministry of the Flemish Community
- Flemish Institute for the Promotion of Scientific-Technological Research in Industry
- Fund for Scientific Research - Flanders
- Dutch Ministry of Education, Culture and Sciences
- Dutch Ministry of Economic Affairs
- Netherlands Organisation for Scientific Research
- Senter (an agency of the Dutch Ministry of Economic Affairs)

4 Dutch HLT Platform: four action lines
A. Performing a market place function
B. Strengthening the HLT infrastructure
C. Working out standards and evaluation criteria
D. Developing a management, maintenance, and distribution plan

5 This presentation: Platform BC
A. -
B. Strengthening the HLT infrastructure
C. Working out standards and evaluation criteria
D. -
B+C => Platform BC
- Focus on method (skip many details)
- More details: see publications, web sites

6 Platform BC: What?
1. BLARK: Basic LAnguage Resources Kit
2. Inventory & Evaluation
3. Priority lists

7 Platform BC: Who?
Steering committee:
- 8 HLT experts
- NTU
- NWO (funding body)
4 field researchers

8 Platform BC: How?
1. BLARK, 2. Inventory & Eval., 3. Priority lists => Report 1
Feedback: Dutch HLT field; workshop 15/11/2001
1. BLARK, 2. Inventory & Eval., 3. Priority lists => Report 2

9 1. BLARK: Basic LAnguage Resources Kit
Components:
- Applications: classes of applications rather than specific applications or products.
- Modules (or semi-products): the basic software components of HLT applications.
- Data: sets of language data and descriptions in machine-readable form.

10 BLARK: Basic LAnguage Resources Kit
2 matrices:
1. Modules x Data
2. Modules x Applications
=> BLARK
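One way to read the two matrices on this slide is as lookup tables that can be combined: for a class of applications, find the modules it uses, then the data those modules need. The sketch below is only an illustration under that assumption. The module and data names are taken from the BLARK slides, but the marked cells and the application class names are hypothetical and do not reproduce the actual matrices.

```python
# Illustrative only: names of modules and data come from the slides, but
# which cells are filled in is invented for this example.
MODULES_X_DATA = {
    "Morphological analysis": {
        "Monolingual lexicon",
        "Annotated corpus of written Dutch",
    },
    "Robust syntactic analysis": {"Annotated corpus of written Dutch"},
}

# Application class names here are hypothetical placeholders.
MODULES_X_APPLICATIONS = {
    "Morphological analysis": {"application class A"},
    "Robust syntactic analysis": {"application class A", "application class B"},
}


def data_needed_for(application):
    """Collect the data sets needed by every module the application class uses."""
    needed = set()
    for module, applications in MODULES_X_APPLICATIONS.items():
        if application in applications:
            needed |= MODULES_X_DATA.get(module, set())
    return needed
```

Calling `data_needed_for("application class B")` walks from the application class through its modules to the data they depend on, which is the kind of chain the two matrices make explicit.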

11 [Figure: matrices of Modules against Data and Applications]

12 BLARK: Language technology
Modules:
- Robust modular text preprocessing
- Morphological analysis and morphosyntactic disambiguation / unknown words
- Robust syntactic analysis
- Aspects of semantic analysis (word meaning and reference)
Data:
- Monolingual lexicon
- Annotated corpus of written Dutch
- Benchmarks for evaluation

13 BLARK: Speech technology
Modules:
- Automatic speech recognition
- Speech synthesis system
- Tools for annotation of speech corpora
- Confidence measures and utterance verification
- Identification (speaker, language, dialect)
Data:
- Monolingual speech corpora for specific applications
- Multilingual speech corpora
- Multimodal/medial speech corpora
- Benchmarks for evaluation

14 2. Inventory & Evaluation
B. Inventory: which components in BLARK are available?
C. Evaluation: and are they of sufficient quality?
Checklist approach => B & C together: Platform BC
See matrix 3: Availability

15 [Figure: matrix 3, Modules x Availability]

16 3. Priority lists
BLARK + Inventory => Priority lists

17 The prioritisation was based on the following requirements:
- The components should currently be unavailable, inaccessible, or of insufficient quality.
- The components should be relevant for a large number of applications.
- Developing the components should be possible in the short term.
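The three requirements above can be read as a filter over the inventory. The sketch below shows one possible encoding; the `Component` fields, the `min_applications` threshold, and the ranking by application count are assumptions made for illustration and are not described in the presentation.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    # True if the component is available, accessible, and of sufficient quality.
    available_with_quality: bool
    # Number of application classes that rely on the component.
    application_count: int
    # True if development is judged possible in the short term.
    feasible_short_term: bool


def prioritise(components, min_applications=2):
    """Keep components meeting all three requirements, most widely used first.

    Both the threshold and the ordering are hypothetical choices.
    """
    candidates = [
        c
        for c in components
        if not c.available_with_quality
        and c.application_count >= min_applications
        and c.feasible_short_term
    ]
    return sorted(candidates, key=lambda c: c.application_count, reverse=True)
```

A component that already exists in good quality, serves too few application classes, or cannot be built in the short term is dropped before ranking.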

18 Priority list: Language technology
1. Annotated corpus of written Dutch
2. Syntactic analysis
3. Robust text pre-processing
4. Semantic annotations for the treebank in 1
5. Translation equivalents
6. Benchmarks for evaluation

19 Priority list: Speech technology
1. Automatic speech recognition
2. Speech corpora
3. Multi-media speech corpora
4. Tools for (semi-)automatic transcription of speech data
5. Speech synthesis
6. Benchmarks for evaluation

20 Feedback
Report 1
Feedback:
- Sent to the Dutch-Flemish HLT field (2000)
- Workshop 15/11/2001
=> Report 2

21 Platform BC: How?
1. BLARK, 2. Inventory & Eval., 3. Priority lists => Report 1
Feedback: Dutch HLT field; workshop 15/11/2001
1. BLARK, 2. Inventory & Eval., 3. Priority lists => Report 2

22 When BLARK is established...
- Intellectual rights by NTU
- Actual management and maintenance of resources by an HLT agency, to be founded
- Maintenance of expertise by Dutch-Flemish steering committees and an HLT management committee, both to be founded

23 General conclusions
The goals have been achieved: the preconditions for developing the materials in BLARK have been created.
This work, carried out in the Dutch-speaking area, can be useful to other countries starting similar activities:
- Presentations & publications
- Part of the report has been translated into English

24 Web sites //

25 That’s it


28 Objectives
- strengthening the position of Dutch in HLT
- establishing the proper conditions for successful management and maintenance of basic HLT resources developed through governmental funding
- stimulating co-operation between academia and industry in the field of HLT
- contributing to the realisation of European co-operation in HLT-relevant areas
- establishing a network that brings together supply and demand for knowledge, products, and services

29 Platform BC: Who?
Steering committee: 8 HLT experts
             Lang. Tech.            Speech Tech.
Flanders     1. WD  2. FvE          1. JPM  2. DvC
Netherlands  1. GB  2. AN/DH/FdJ    1. HS   2. RV / AD
