
1 The Evolution of Shared-Task Evaluation
Douglas W. Oard
College of Information Studies and UMIACS
University of Maryland, College Park, USA
December 4, 2013
FIRE

2 The Story Evaluation-guided research The three C’s Five examples Thinking forward

3 Evaluation-Guided Research Information Retrieval Text classification Automatic Speech Recognition Optical Character Recognition Named Entity Recognition Machine Translation Extractive summarization …

4 Key Elements Task model Single-valued evaluation measure Affordable evaluation process

5 Critiques Early convergence Duplicative ($) Incrementalism Privileging the measurable

6 The Big Four TREC NTCIR CLEF FIRE

7 10 More TDT Amaryllis INEX TRECVid TAC MediaEval STD OAEI CoNLL WePS

8 What We Create
Collections
Comparison points
– Baseline results
Communities
Competition?

9 Elsewhere in the Ecosystem …
Capacity
– From universities, industry, individuals, and funding agencies
Completed work
– Often requires working outside our year-long innovation cycles with rigid timelines
Culling
– Conferences and journals are the guardians of community standards

10 A Typical Task Life Cycle
Year 1:
– Task definition
– Evaluation design
– Community building
Year 2:
– Creating training data
Year 3:
– Reusable test collection
– Establishing strong baselines

11 Some Sea Stories TDT CLIR Speech Retrieval E-Discovery

12 Topic Detection and Tracking
Cultures
– Speech, sponsor
Event-based relevance
Document boundary discovery
Complexity
– 5 tasks, 3 languages, 2 modalities
Lasting influence

13 Cross-Language IR
TREC CLIR (Arabic)
– Standard resources
– Light stemming
– Problematic task model
CLEF Interactive CLIR
– Controlled user studies
– Problematic evaluation design
– Qualitative vs. quantitative

14 Speech Retrieval
TREC Spoken Document Retrieval
– The “solved problem”
CLEF Cross-Language Speech Retrieval
– Grounded queries
– Start time error evaluation measure
FIRE QA for the Spoken Web

15 TREC Legal Track Iterative task design Sampling Measurement error Families Cultures

16 What’s in a Test Collection? Queries Documents Relevance judgments

17 What’s in a Test Collection? Queries Content Units of judgment Relevance judgments Evaluation measure(s)
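The components listed on this slide can be pictured as a small data structure plus a scoring function. The following is only a minimal sketch under stated assumptions: the class name `SharedTaskCollection`, the toy queries and documents, and the choice of mean average precision as the single-valued measure are all illustrative and are not taken from the talk. It also assumes that the unit of judgment is a whole document and that judgments are binary.

```python
from dataclasses import dataclass, field


@dataclass
class SharedTaskCollection:
    """Hypothetical container for the parts of a test collection."""
    queries: dict[str, str]                  # query id -> query text
    documents: dict[str, str]                # doc id -> content
    # Relevance judgments: query id -> ids of documents judged relevant.
    # Assumption: the unit of judgment is the whole document.
    qrels: dict[str, set[str]] = field(default_factory=dict)


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Average precision of one ranked list against binary judgments."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def mean_average_precision(collection: SharedTaskCollection,
                           run: dict[str, list[str]]) -> float:
    """Single-valued measure: mean of per-query average precision."""
    scores = [
        average_precision(run.get(qid, []), collection.qrels.get(qid, set()))
        for qid in collection.queries
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data, invented purely for illustration.
collection = SharedTaskCollection(
    queries={"q1": "speech retrieval", "q2": "light stemming"},
    documents={"d1": "...", "d2": "...", "d3": "...", "d4": "..."},
    qrels={"q1": {"d1", "d3"}, "q2": {"d2"}},
)
run = {"q1": ["d1", "d2", "d3"], "q2": ["d1", "d2"]}
score = mean_average_precision(collection, run)
print(f"MAP = {score:.4f}")
```

Keeping the judgments and the measure separate from the ranked run means that a system that took no part in building the collection can still be scored against it, which is one reading of what makes a test collection reusable.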

18 Personality Types Innovators Organizers Optimizers Deployers Resourcers

19 Some Takeaways Progressive invalidation Social engineering Innovation from outside

20 A Final Thought It isn’t what you don’t know that limits your thinking. Rather, it is what you know that isn’t true.

