Task 1: Intrinsic Evaluation
Vasile Rus, Wei Chen, Pascal Kuyten, Ron Artstein
Task definition
Only interested in info-seeking questions
Evaluation biased towards current technology
– Asking for the “trigger” text is problematic:
  – Future QG systems may not employ a trigger
  – Trigger less important for deep/holistic questions
Need to define what counts as QG
– Would mining for questions be acceptable?
– Require generative component? (defined how?)
– Internal representation? Structure?
Evaluation criteria
Evaluate question alone, or question + answer?
– System provides question
– Evaluator decides if answer is available
– Separately, evaluate system answer if given
Answer = contiguous text?
– Can this be relaxed?
Additional criteria: conciseness?
Annotation guidelines
Question type: need more detailed definition
– Yao et al. (submitted): the “What” category includes (what|which) (NP|PP)
– Question type identified mechanically with ad-hoc rules
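A minimal sketch of what such mechanical, ad-hoc question-type rules could look like; the rule set and labels below are illustrative assumptions, not the actual rules from Yao et al.:

```python
import re

# Illustrative, ad-hoc question-type rules (assumed for this sketch; not
# the actual rule set from Yao et al.). First matching rule wins.
QUESTION_TYPE_RULES = [
    ("what",   re.compile(r"^\s*(what|which)\b", re.I)),   # "which NP/PP" folded into "what"
    ("who",    re.compile(r"^\s*(who|whom|whose)\b", re.I)),
    ("when",   re.compile(r"^\s*when\b", re.I)),
    ("where",  re.compile(r"^\s*where\b", re.I)),
    ("why",    re.compile(r"^\s*why\b", re.I)),
    ("how",    re.compile(r"^\s*how\b", re.I)),
    ("yes/no", re.compile(r"^\s*(is|are|was|were|do|does|did|can|could|will|would)\b", re.I)),
]

def question_type(question: str) -> str:
    """Return the first matching type label, or 'other' if no rule fires."""
    for label, pattern in QUESTION_TYPE_RULES:
        if pattern.search(question):
            return label
    return "other"

# Example: question_type("Which year was the treaty signed?") -> "what"
```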
Terminology
For the QG from Sentences task:
– “Ambiguity” is really specificity or concreteness
– “Relevance” is really answerability
Rating disagreements
Many (most?) of the disagreements are between close ratings (e.g. 3 vs. 4)
– Need a measure that considers magnitudes, such as Krippendorff’s α (see the sketch below)
– Perhaps normalize ratings by rater?
Specific disagreement on in-situ questions
– “The codes are not what?”
– Needs to be addressed in the guidelines
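A minimal sketch of an interval-scale Krippendorff’s α over the collected ratings; the data layout (one list of numeric ratings per rated question) is an assumption and would need to be adapted to the actual annotation files:

```python
from itertools import permutations

def krippendorff_alpha_interval(units):
    """Krippendorff's alpha for interval-scale ratings.

    `units` is a list of lists: one inner list per rated question,
    holding the numeric ratings it received. Questions with fewer
    than two ratings are unpairable and are ignored.
    """
    pairable = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in pairable)  # total number of pairable ratings
    if n <= 1:
        return None

    # Observed disagreement: squared differences within each question,
    # each question weighted by 1 / (m_u - 1).
    d_o = 0.0
    for u in pairable:
        d_o += sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
    d_o /= n

    # Expected disagreement: squared differences over all pairs of
    # pairable ratings, regardless of which question they came from.
    values = [v for u in pairable for v in u]
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))

    return 1.0 - d_o / d_e if d_e > 0 else 1.0

# Example with two raters per question, mostly close ratings:
print(krippendorff_alpha_interval([[3, 4], [4, 4], [2, 2], [1, 2], [5, 4]]))
```

Unlike raw percent agreement or an unweighted kappa, the squared-difference metric makes a 3-vs-4 disagreement count far less than a 1-vs-4 disagreement, which is exactly the property wanted here.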
New tasks
Replace QG from sentences with QG from metadata
– Evaluates only the generation component
– Finding things to ask remains a component of the QG from paragraphs task
Make all system results public for analysis
– Required? Voluntary?
– Use data to learn from others’ problems