Published by Emil Hines, modified over 9 years ago
1
ELRA & ELDA TC-STAR General Meeting Lux. 2007-05 KC 1
European Language Resources Association (ELRA)
HLT Evaluations
Khalid CHOUKRI
ELRA/ELDA
55 Rue Brillat-Savarin, F-75013 Paris, France
Tel. +33 1 43 13 33 33 -- Fax. +33 1 43 13 33 30
Email: choukri@elda.org
http://www.elda.org/ or http://www.elra.info/
2
Presentation Outline
- European Language Resources Association
- Evaluation to drive research progress
- Human Language Technologies Evaluation(s)
  - What, why, for whom, how … (some figures from TC-STAR)
- Examples of evaluation campaigns
- Demo … (available afterwards)
3
European Language Resources Association
An improved infrastructure for data sharing & HLT evaluation:
(1) An association of users of Language Resources
(2) An infrastructure for the evaluation of Human Language Technologies, providing resources, tools, methodologies, and logistics
4
The Association Membership Drive
- ELRA is open to European & non-European institutions
- Resources are available to Members & Non-Members (pay per resource)
Some of the benefits of becoming a member:
- Substantial discounts on LR prices (over 70%)
- Substantial discounts on LREC registration fees
- Legal and contractual assistance with respect to LR matters
- Access to validation and production manuals (quality assessment)
- Figures and facts about the market (results of ELRA surveys)
- Newsletter and other publications
- New: Fidelity program … earn miles and get more benefits
5
ELRA: An efficient infrastructure to serve the HLT community
Strategies for the next decade …
New ELRA status (2005): extension of ELRA's official mission to promote LRs and evaluation for Human Language Technology (HLT):
"The mission of the Association is to promote language resources (henceforth LRs) and evaluation for the Human Language Technology (HLT) sector in all their forms and all their uses."
6
Courtesy G. Thurmair, Malta workshop.
7
What to evaluate … Levels of Evaluation
[Diagram: Basic Research, Technology Development, Application Development; Quantitative Evaluation, Usage Evaluation. Labels: long term / high risk; large return on investment; evolutionary; usability; acceptability; bottleneck identification; research results in quantitative evaluation; technologies necessitated for applications; technologies which have been validated for applications; meeting points with technology development.]
8
What to evaluate … Levels of Evaluation
- Basic Research Evaluation (validate research direction)
- Technology Evaluation (assessment of solution for a well-defined problem)
- Usage Evaluation (end-users in the field)
- Impact Evaluation (socio-economic consequences)
- Programme Evaluation (funding agencies)
Our concern
9
Why Evaluate?
- Validate research hypotheses
- Assess progress
- Choose between research alternatives
- Identify promising technologies (market)
- Benchmarking … state of the art
- Share knowledge … dedicated workshops
- Feedback … funding agencies
- Share costs ???
10
Progress & Evaluation (Courtesy Charles Wayne)
11
Technology performance & Applications
- Bad technology may be used to design useful applications
- What about good technology? …
  - Software industry
12
HLT Evaluations … For whom
- MT developers want to improve the "quality" of MT output
- MT users (humans or software, e.g. CLIR) want to improve productivity by using the most suitable MT system (e.g. multilinguality) …
- Basic Research Evaluation (validate research direction)
- Technology Evaluation (assessment of solution for a well-defined problem)
- Usage Evaluation (end-users in the field)
- Impact Evaluation (socio-economic consequences)
- Programme Evaluation (funding agencies)
13
For whom … essential for technology development
- Sharing of information and knowledge between participants: how to get the best results, access to data, scoring tools
- Information obtained by industrialists: state of the art, technology choice, market strategy, new products
- Information obtained by funding agencies: technology performance, progress/investment, priorities
14
Some types of evaluations
- Comparative evaluation: the same or similar control tasks and related data, with agreed-upon metrics
- Competitive vs. cooperative
- Black-box evaluation … glass-box
- Objective evaluation … subjective (human-based)
- Corpus-based (test suites)
- Quantitative measures … qualitative
15
Comparative Evaluation of Technology
- Used successfully in the USA by DARPA and NIST (since 1984)
- Similar efforts in Europe on a smaller scale, mainly projects (EU-funded or national programmes)
- Select a common "task"
- Attract enough participants
- Organize the campaign (protocol/metrics/data)
- Follow-up workshop: interpret results and share information
16
Requirements for an evaluation campaign
- Reference language resources (data) (the "truth")
- Metric(s): automatic, human judgments … scoring software
- Scale/range of performance to compare with (baseline)
- Logistics management
- Reliability assessment: independent body
- Participants: technology providers
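The requirements above (reference data, an agreed metric, scoring software, and a baseline to compare with) can be illustrated with a minimal scoring sketch. It assumes a toy task in which each test item has one correct label and uses exact-match accuracy as a stand-in metric; the names `accuracy`, `score_campaign`, `site_A` and `site_B` are hypothetical and not part of any ELRA/ELDA tool.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical metric signature: (reference items, system outputs) -> score, higher is better.
Metric = Callable[[List[str], List[str]], float]


def accuracy(reference: List[str], hypothesis: List[str]) -> float:
    """Fraction of test items whose output matches the reference ("truth") exactly."""
    if not reference:
        raise ValueError("empty reference set")
    if len(reference) != len(hypothesis):
        raise ValueError("a submission must cover every test item")
    correct = sum(r == h for r, h in zip(reference, hypothesis))
    return correct / len(reference)


def score_campaign(reference: List[str],
                   submissions: Dict[str, List[str]],
                   baseline: str,
                   metric: Metric = accuracy) -> List[Tuple[str, float, float]]:
    """Score all submissions with the same data and metric, best first.

    Each entry is (participant, score, score minus the baseline's score).
    """
    scores = {name: metric(reference, hyp) for name, hyp in submissions.items()}
    base = scores[baseline]  # the baseline is scored like any other submission
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, score - base) for name, score in ranked]


if __name__ == "__main__":
    truth = ["a", "b", "c", "d"]
    runs = {
        "baseline": ["a", "x", "x", "x"],
        "site_A": ["a", "b", "c", "x"],
        "site_B": ["a", "b", "x", "x"],
    }
    for name, score, delta in score_campaign(truth, runs, baseline="baseline"):
        print(f"{name}: {score:.2f} ({delta:+.2f} vs baseline)")
```

Passing the metric as a parameter mirrors the point of the slide: the data and the metric are fixed for everyone, and only the submissions change.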
17
HLT Evaluation Portal… http://www.hlt-evaluation.org/
- Pointers to projects
- Overview
- HLT evaluations
- Activities by technology
- Activities by geographical region
- Players
- Evaluation resources
- Evaluation services
Let us list some well-known campaigns
18
Examples of Evaluation Campaigns – Capitalization
Speech & Audio/Sound
- ASR: TC-STAR, CHIL, ESTER
- TTS: TC-STAR, EVASY
- Speaker identification (CHIL)
- Speech-to-speech translation
- Speech understanding (MEDIA)
- Acoustic person tracking
- Speech activity detection, …
19
Examples of Evaluation Campaigns – Capitalization
Multimodal: Video, Vision technologies
- Face detection
- Visual person tracking
- Visual speaker identification
- Head pose estimation
- Hand tracking
20
Some of the technologies being evaluated within CHIL … http://chil.server.de/
A) Vision technologies
  – A.1) Face Detection
  – A.2) Visual Person Tracking
  – A.3) Visual Speaker Identification
  – A.4) Head Pose Estimation
  – A.5) Hand Tracking
B) Sound and Speech technologies
  – B.1) Close-Talking Automatic Speech Recognition
  – B.2) Far-Field Automatic Speech Recognition
  – B.3) Acoustic Person Tracking
  – B.4) Acoustic Speaker Identification
  – B.5) Speech Activity Detection
  – B.6) Acoustic Scene Analysis
C) Contents Processing technologies
  – C.1) Automatic Summarisation … Question Answering
More at the CHIL/CLEAR workshops
21
Examples of Evaluation Campaigns – Capitalization
Written NLP & Content
- IR, CLIR, QA (Amaryllis, EQUER, CLEF)
- Text analysers (Grace, EASY)
- MT (CESTA, TC-STAR)
- Corpus alignment & processing (Arcade, Arcade-2, Romanseval/Senseval, …)
- Term & terminology extraction
- Summarisation
22
Evaluation Projects … The French scene
Some projects in NL, Italy, ...
Technolangue/Evalda: the Evalda platform consists of 8 evaluation campaigns with a focus on spoken and written language technologies for the French language:
– ARCADE II: evaluation of bilingual corpora alignment systems.
– CESART: evaluation of terminology extraction systems.
– CESTA: evaluation of machine translation systems (Ar, Eng => Fr).
– EASY: evaluation of parsers.
– ESTER: evaluation of broadcast news automatic transcribing systems.
– EQUER: evaluation of question answering systems.
– EVASY: evaluation of speech synthesis systems.
– MEDIA: evaluation of in-context and out-of-context dialog systems.
23
Some details from relevant projects
- CLEF
- TC-STAR
24
Example of Evaluation Initiatives
CLEF (Cross-Language Evaluation Forum)
Promoting research and development in Cross-Language Information Retrieval (CLIR) by:
(i) providing an infrastructure for the testing and evaluation of information retrieval systems
  - European languages
  - monolingual and cross-language contexts
(ii) creating test packages of reusable data which can be employed by system developers for benchmarking purposes.
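Retrieval runs in test collections of this kind are typically scored against relevance judgements with ranked-retrieval measures such as mean average precision (MAP). The sketch below assumes that measure and a simple in-memory format (hypothetical `run` and `qrels` dictionaries keyed by topic ID); it is not the official CLEF scoring software. Because scoring only compares document IDs, the same code serves monolingual and cross-language runs.

```python
from typing import Dict, List, Set


def average_precision(ranked_docs: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked list for one topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant document
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP over every judged topic; a topic missing from the run scores 0."""
    if not qrels:
        raise ValueError("no judged topics")
    total = sum(average_precision(run.get(topic, []), relevant)
                for topic, relevant in qrels.items())
    return total / len(qrels)


if __name__ == "__main__":
    # Hypothetical run: document IDs ranked per topic, best first.
    run = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4"]}
    qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}
    print(f"MAP = {mean_average_precision(run, qrels):.4f}")
```

Averaging over the judged topics (rather than the topics present in the run) keeps a system from improving its score by skipping hard topics.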
25
QA-CLEF: State of the art & improvement
26
Back to Evaluation Tasks within TC-STAR (http://www.tc-star.org/)
TC-STAR: Speech-to-speech translation
2 categories of transcribing and translating tasks:
– European Parliament Plenary Sessions (EPPS): English (En) and Spanish (Es)
– Broadcast News (Voice of America, VoA): Mandarin Chinese (Zh) and English (En)
Packages for speech recognition, speech translation, and speech synthesis: development and test data, metrics & results.
27
TC-STAR evaluations …
3 consecutive annual evaluations
1. SLT in the following directions:
   i. Chinese-to-English (Broadcast News)
   ii. Spanish-to-English (European Parliament plenary speeches)
   iii. English-to-Spanish (European Parliament plenary speeches)
2. ASR in the following languages:
   i. English (European Parliament plenary speeches)
   ii. Spanish (European Parliament plenary speeches)
   iii. Mandarin Chinese (Broadcast News)
3. TTS in Chinese, English, and Spanish under the following conditions:
   i. Complete system
   ii. Voice conversion (intralingual and crosslingual), expressive speech
   iii. Component evaluation
28
Improvement of SLT Performance (En→Es)
Input = Text, Verbatim, Speech recognition
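The slide does not say which automatic metric the progress curves use. Purely as an illustration of automatic MT scoring, here is a minimal corpus-level BLEU, a widely used MT measure, assuming tokenized text, one reference translation per segment, n-grams up to 4 and no smoothing. It is a sketch, not the scoring tool used in TC-STAR.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU, one reference per segment, no smoothing."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, one empty n-gram order zeroes the score
    # Geometric mean of the clipped n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only output shorter than the references is penalized.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

Counts are pooled over the whole test set before the precisions are computed, which is why the score is reported per corpus rather than averaged per sentence.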
29
Improvement of SLT Performance (Es→En)
Input = Text, Verbatim, Speech recognition
30
Improvement of ASR Performance (En, Es)
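ASR output is conventionally scored by word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A minimal sketch, assuming already tokenized and normalized text; real scoring tools also handle case, punctuation and alternative spellings, which are left out here.

```python
from typing import List


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum substitutions + deletions + insertions (word-level edit distance)."""
    prev = list(range(len(hypothesis) + 1))  # distances from an empty reference prefix
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,                           # reference word deleted
                curr[j - 1] + 1,                       # extra word inserted
                prev[j - 1] + (ref_word != hyp_word),  # substitution, or match at cost 0
            )
        prev = curr
    return prev[-1]


def wer(references: List[List[str]], hypotheses: List[List[str]]) -> float:
    """Total word errors divided by the total number of reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("need one hypothesis per reference segment")
    n_ref_words = sum(len(r) for r in references)
    if n_ref_words == 0:
        raise ValueError("WER is undefined for an empty reference")
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    return errors / n_ref_words
```

For example, "the cat sit on mat" against "the cat sat on the mat" has one substitution and one deletion, so 2 errors over 6 reference words. Since insertions count as errors, WER can exceed 100%.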
31
Human Evaluation of Translations … En→Es adequacy (1-5)
[Chart labels: Combinations, Commercial]
32
End-to-End
The end-to-end evaluation is carried out for 1 translation direction: English-to-Spanish
- Evaluation of the ASR (ROVER) + SLT (ROVER) + TTS (UPC) system
- Same segments as for the SLT human evaluation
- Evaluation tasks:
  – Adequacy: comprehension test
  – Fluency: judgement test with several questions related to fluency and also usability of the system
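The ASR (ROVER) and SLT (ROVER) components are system combinations. ROVER-style combination votes, position by position, for the word that most systems agree on. The sketch below shows only that voting step and assumes the hypotheses have already been aligned into slots of equal length, with `""` marking an empty slot; the alignment step of real ROVER and confidence-weighted voting are left out, and the `rover_vote` name is hypothetical.

```python
from collections import Counter
from typing import List

NULL = ""  # hypothetical marker for "no word" in an aligned slot


def rover_vote(aligned: List[List[str]]) -> List[str]:
    """Majority vote per slot over pre-aligned system outputs.

    aligned[k][i] is system k's word in slot i. Real ROVER builds this
    alignment itself; here it is assumed to be given.
    Ties go to the word proposed by the earliest-listed system.
    """
    if not aligned:
        return []
    n_slots = len(aligned[0])
    if any(len(hyp) != n_slots for hyp in aligned):
        raise ValueError("all hypotheses must have the same number of slots")
    output = []
    for i in range(n_slots):
        column = [hyp[i] for hyp in aligned]
        counts = Counter(column)
        best = max(counts.values())
        # First word, in system order, that reaches the top count wins.
        winner = next(w for w in column if counts[w] == best)
        if winner != NULL:  # a winning empty slot means "output nothing here"
            output.append(winner)
    return output
```

For instance, combining "the cat sat", "the bat sat down" and "a cat sat down" (aligned slot by slot) yields "the cat sat down": each slot keeps the majority word, even though no single system produced that exact output.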
33
Fluency questionnaire
[Understanding] Do you think that you have understood the message?
  1: Not at all ... 5: Yes, absolutely
[Fluent Speech] Is the speech in good Spanish?
  1: No, it is very bad ... 5: Yes, it is perfect
[Effort] Rate the listening effort
  1: Very high ... 5: Low, as natural speech
[Overall Quality] Rate the overall quality of this audio sample
  1: Very bad, unusable ... 5: It is very useful
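All four questions use a 1 to 5 scale oriented so that 5 is the best answer (for Effort, 5 means low listening effort), so per-question means can be compared directly. A minimal aggregation sketch, assuming a hypothetical flat list of (segment, judge, question, score) records; the record layout and function name are illustrative, not the TC-STAR data format.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

QUESTIONS = ("Understanding", "Fluent Speech", "Effort", "Overall Quality")

# Hypothetical record: (segment_id, judge_id, question, score on the 1..5 scale)
Judgement = Tuple[str, str, str, int]


def mean_scores(judgements: Iterable[Judgement]) -> Dict[str, float]:
    """Mean score per questionnaire item, pooled over segments and judges."""
    by_question: Dict[str, List[int]] = defaultdict(list)
    for _segment, _judge, question, score in judgements:
        if question not in QUESTIONS:
            raise ValueError(f"unknown question: {question}")
        if not 1 <= score <= 5:
            raise ValueError(f"score outside the 1..5 scale: {score}")
        by_question[question].append(score)
    # Questions that received no judgements are left out rather than scored 0.
    return {q: mean(by_question[q]) for q in QUESTIONS if by_question[q]}
```

Keeping segment and judge IDs in each record, although this function ignores them, leaves room to break results down per judge or per segment later.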
34
End-to-end results (subjective test: 1…5)
35
TC-STAR Tasks
More results from the 2007 campaign: http://www.tc-star.org/
Evaluation packages available
36
Some concluding remarks on technology evaluation
- It saves developers time and money
- It helps assess progress accurately
- It produces reusable evaluation packages
- It helps identify areas where more R&D is needed