The First Question Generation Shared Task and Evaluation Campaign Vasile Rus, Brendan Wyse, Paul Piwek, Mihai Lintean, Svetlana Stoyanchev, and Cristian.

Presentation transcript:

The First Question Generation Shared Task and Evaluation Campaign Vasile Rus, Brendan Wyse, Paul Piwek, Mihai Lintean, Svetlana Stoyanchev, and Cristian Moldovan

Outline
Overview
Task A: Question Generation from Paragraphs
Task B: Question Generation from Sentences
Conclusions

Overview
Two tasks selected through community polling from 5 proposed tasks:
–Task A: Question Generation from Paragraphs
–Task B: Question Generation from Sentences
–Ranking Automatically Generated Questions (Michael Heilman and Noah Smith)
–Concept Identification and Ordering (Rodney Nielsen and Lee Becker)
–Question Type Identification (Vasile Rus and Arthur Graesser)

Guiding Principles
Application-independence
–PROS: a larger pool of participants; a fairer basis for comparison
–CONS: difficult to determine whether a particular question is good without knowing the context in which it is posed

Guiding Principles
No representational commitment for input: raw text
–aimed at attracting as many participants as possible
–a fairer comparison environment

Data
Sources:
–Wikipedia
–OpenLearn
–Yahoo!Answers
Development Set –
Test Set –

Task A: Question Generation from Paragraphs
The University of Memphis
–Vasile Rus, Mihai Lintean, Cristian Moldovan
5 registered participants
1 submission – University of Pennsylvania

Task A
Given an input paragraph:
Two-handed backhands have some important advantages over one-handed backhands. Two-handed backhands are generally more accurate because by having two hands on the racquet, this makes it easier to inflict topspin on the ball allowing for more control of the shot. Two-handed backhands are easier to hit for most high balls. Two-handed backhands can be hit with an open stance, whereas one-handers usually have to have a closed stance, which adds further steps (which is a problem at higher levels of play).

Task A
Generate 6 questions at different levels of specificity
–1 x General: what question does the paragraph answer?
–2 x Medium: asking about major ideas in the paragraph, e.g. relations among larger chunks of text in the paragraph, such as cause-effect
–3 x Specific: focusing on specific facts (somewhat similar to Task B)
Focus on questions answered explicitly by the paragraph
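The 1/2/3 scope mix required above can be checked mechanically. The following is a minimal sketch, not the organizers' tooling: the `Question` class, its field names, and `check_task_a_submission` are hypothetical names introduced only for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Required number of questions per scope level, as stated on the slide.
REQUIRED_SCOPES = {"general": 1, "medium": 2, "specific": 3}


@dataclass
class Question:
    """One generated question (hypothetical structure, for illustration)."""
    text: str
    scope: str        # "general", "medium", or "specific"
    answer_span: str  # paragraph fragment the participant says answers it


def check_task_a_submission(questions):
    """Return a list of problems; an empty list means the scope mix is as required."""
    counts = Counter(q.scope for q in questions)
    problems = []
    for scope, needed in REQUIRED_SCOPES.items():
        found = counts.get(scope, 0)
        if found != needed:
            problems.append(f"expected {needed} {scope} question(s), got {found}")
    for scope in counts:
        if scope not in REQUIRED_SCOPES:
            problems.append(f"unknown scope: {scope}")
    return problems
```

A submission with one general, two medium, and three specific questions returns an empty list; anything else returns one message per violated constraint.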

Examples
What are the advantages of two-handed backhands in tennis?
–Answer: the whole paragraph
Why is a two-hand backhand more accurate [when compared to a one-hander]?
–Answer: “Two-handed backhands are generally more accurate because by having two hands on the racquet, this makes it easier to inflict topspin on the ball allowing for more control of the shot.”
What kind of spin does a two-handed backhand inflict on the ball?
–Answer: “topspin”

Evaluation Criteria
Five criteria
–Scope: general, medium, specific
  Some challenges: rater-selected vs. participant-selected
  Implications for syntactic and semantic validity
–Grammaticality: 1-4 scale (1=best), based on participant-selected paragraph fragment
–Semantic validity: 1-4 scale, based on participant-selected paragraph fragment
–Question type correctness: 0-1
–Diversity: 1-4 scale

Evaluation Criteria
Scores
1 – semantically correct and idiomatic/natural
2 – semantically correct and close to the text or other questions
3 – some semantic issues
4 – semantically unacceptable (unacceptable may also mean implied, generic, etc.)

Evaluation Methodology
Peer-review
–Only one submission so …
Two independent annotators
UPenn Results/Inter-annotator agreement
–Scope: g - 100%, m - 117%, s - 80%, other - 0.8%
–Syntactic Correctness: 1.82/87.64%
–Semantic Correctness: 1.97/78.73%
–Q-diversity: 2.84/100%
–Q-type correctness: 83.62%
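Figures of the form "average score / agreement" can be derived from two annotators' ratings as sketched below. The slides do not say how agreement was computed, so exact-match percent agreement is an assumption here, and the ratings used in the usage note are invented for illustration rather than taken from the campaign data.

```python
def mean_score(ratings_a, ratings_b):
    """Average rating over both annotators (lower is better on the 1-4 scales)."""
    all_ratings = list(ratings_a) + list(ratings_b)
    return sum(all_ratings) / len(all_ratings)


def percent_agreement(ratings_a, ratings_b):
    """Share of items, in percent, on which both annotators gave the same rating.

    Assumption: agreement means exact match; the slides do not define it.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both annotators must rate the same items")
    if not ratings_a:
        raise ValueError("no ratings given")
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return 100.0 * matches / len(ratings_a)
```

For the invented ratings `[1, 2, 2, 3]` and `[1, 2, 3, 3]`, `mean_score` gives 2.125 and `percent_agreement` gives 75.0. Exact-match agreement does not correct for chance; a chance-corrected coefficient such as Cohen's kappa is a common alternative.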

Task B: QG from Sentences
Organizing team:
–Brendan Wyse
–Paul Piwek
–Svetlana Stoyanchev
Four participating systems:
–Lethbridge: University of Lethbridge, Canada
–MrsQG: Saarland University and DFKI, Germany
–JUQGG: Jadavpur University, India
–WLV: University of Wolverhampton, United Kingdom

Task definition
Input instance:
–single sentence
  The poet Rudyard Kipling lost his only son in the trenches in 1915.
–target question type (e.g., who, why, how, when, …)
  Who
Output instance:
–two different questions of the specified type that are answered by the input sentence
  1) Who lost his only son in the trenches in 1915?
  2) Who did Rudyard Kipling lose in the trenches in 1915?
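The input/output contract above can be written down as a small data structure with a sanity check. This is a sketch under stated assumptions: `TaskBInstance` and `check_task_b_output` are hypothetical names, and testing the question type by looking at the first word is a crude heuristic of this example, not the campaign's evaluation method (human raters judged question type).

```python
from dataclasses import dataclass


@dataclass
class TaskBInstance:
    """One Task B input: a sentence plus the target question type."""
    sentence: str
    question_type: str  # e.g. "who", "why", "how", "when"


def check_task_b_output(instance, questions):
    """Return a list of problems; an empty list means the output looks well-formed."""
    problems = []
    if len(questions) != 2:
        problems.append("expected exactly two questions")
    # Compare case-insensitively and ignore differences in whitespace.
    normalized = [" ".join(q.lower().split()) for q in questions]
    if len(set(normalized)) != len(normalized):
        problems.append("questions are not different")
    target = instance.question_type.lower()
    for original, norm in zip(questions, normalized):
        # Heuristic (assumption): a question of type X starts with the word X.
        if not norm.startswith(target):
            problems.append(f"not a {target!r} question: {original}")
    return problems
```

With the Kipling sentence and target type "Who", the two example questions from the slide pass with no problems, while submitting the same question twice is flagged.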

Results: Relevance
1 The question is completely relevant to the input sentence.
2 The question relates mostly to the input sentence.
3 The question is only slightly related to the input sentence.
4 The question is totally unrelated to the input sentence.

Results: Relevance
WLV 1.17
MrsQG 1.61
JUQGG 1.68
Lethbridge 1.74
Agreement: 63%

Results: Question Type
1 The question is of the target question type.
2 The type of the generated question and the target question type are different.

Results: Question Type
Lethbridge 1.05
WLV 1.06
MrsQG 1.13
JUQGG 1.19
Agreement: 88%

Results: Syntactic Correctness and Fluency
1 The question is grammatically correct and idiomatic/natural.
2 The question is grammatically correct but does not read as fluently as we would like.
3 There are some grammatical errors in the question.
4 The question is grammatically unacceptable.

Results: Syntactic Correctness and Fluency
WLV 1.75
MrsQG 2.06
JUQGG 2.44
Lethbridge
Agreement: 46%

Results: Ambiguity
1 The question is unambiguous.
  Who was nominated in 1997 to the U.S. Court of Appeals for the Second Circuit?
2 The question could provide more information.
  Who was nominated in 1997?
3 The question is clearly ambiguous when asked out of the blue.
  Who was nominated?

Results: Ambiguity
WLV 1.30
MrsQG 1.52
Lethbridge 1.74
JUQGG 1.76
Agreement: 55%

Results: Variety
1 The two questions are different in content.
  Where was X born? / Where did X work?
2 Both ask the same question, but there are grammatical and/or lexical differences.
  What is X for? / What purpose does X serve?
3 The two questions are identical.

Results: Variety
Lethbridge 1.76
MrsQG 1.78
JUQGG 1.86
WLV 2.08
Agreement: 58%
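The per-criterion results can be compared side by side with a few lines of code. The sketch below uses only the mean ratings reported on the results slides; Syntactic Correctness and Fluency is left out because the Lethbridge value is missing from this transcript. The names `REPORTED`, `rank_systems`, and `best_per_criterion` are introduced here for illustration.

```python
# Mean ratings as reported on the results slides (lower is better on every scale).
REPORTED = {
    "relevance": {"WLV": 1.17, "MrsQG": 1.61, "JUQGG": 1.68, "Lethbridge": 1.74},
    "question type": {"Lethbridge": 1.05, "WLV": 1.06, "MrsQG": 1.13, "JUQGG": 1.19},
    "ambiguity": {"WLV": 1.30, "MrsQG": 1.52, "Lethbridge": 1.74, "JUQGG": 1.76},
    "variety": {"Lethbridge": 1.76, "MrsQG": 1.78, "JUQGG": 1.86, "WLV": 2.08},
}


def rank_systems(scores):
    """Return system names ordered from best (lowest mean rating) to worst."""
    return [name for name, _ in sorted(scores.items(), key=lambda item: item[1])]


def best_per_criterion(reported):
    """Map each criterion to the system with the lowest mean rating."""
    return {criterion: rank_systems(scores)[0] for criterion, scores in reported.items()}
```

On these figures WLV has the best mean rating for relevance and ambiguity, and Lethbridge for question type and variety. Given the moderate inter-annotator agreement reported for several criteria, small differences between systems should be read with caution.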

Results: Variety (chart comparing Lethbridge, MrsQG, JUQGG, WLV)

Conclusions
Task A
–The scope criterion is more complex than initially thought
–The naturalness of the generated questions and the diversity of question types need improvement

THANK YOU ! QUESTIONS?