ConvAI2 Competition: The Task

ConvAI2 Competition: The Task

Organizers: Mikhail Burtsev, Varvara Logacheva, Valentin Malykh, Ryan Lowe, Iulian Serban, Shrimai Prabhumoye, Emily Dinan, Douwe Kiela, Alexander Miller, Kurt Shuster, Arthur Szlam, Jack Urbanek and Jason Weston.
Advisory board: Yoshua Bengio, Alan W. Black, Joelle Pineau, Alexander Rudnicky, Jason Williams.
Speaker: Jason Weston, Facebook AI Research

Competition Aim: A Shared Task to Find Better Chit-Chat Dialogue Models

Few datasets are available for non-goal-oriented dialogue (chit-chat), and there is currently no standard evaluation procedure.

ConvAI2 goals:
- A concrete scenario for testing chatbots
- A standard evaluation tool
- Datasets are open source
- Baselines and evaluation code (automatic + MTurk) are open source
- All competitors are encouraged to open source their code; the winner must do so

ConvAI2: Conversational AI Challenge

This is the second ConvAI Challenge. Last year's challenge focused on dialogue based on a news/Wikipedia article. This year we aim to improve over last year by:
- Providing a dataset from the beginning: PersonaChat
- Making the conversations more engaging for humans
- A clearer evaluation process (automatic evaluation, followed by human evaluation)

PersonaChat Dataset (Zhang et al., 2018)

Available at http://parl.ai/. The dataset consists of 164,356 utterances in roughly 11k dialogues, covering over 1,155 personas.
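For a quick look at the data, PersonaChat can be browsed through ParlAI itself. The snippet below is a minimal sketch, assuming a working ParlAI installation; the `convai2` task name and `RepeatLabelAgent` follow ParlAI's display-data example, and the data downloads automatically on first use.

```python
from parlai.core.params import ParlaiParser
from parlai.core.worlds import create_task
from parlai.agents.repeat_label.repeat_label import RepeatLabelAgent

# Point ParlAI at the ConvAI2 (PersonaChat) training split.
opt = ParlaiParser().parse_args(['--task', 'convai2', '--datatype', 'train'])

# RepeatLabelAgent simply echoes the gold label; that is enough to
# step through the data and print persona lines plus dialogue turns.
agent = RepeatLabelAgent(opt)
world = create_task(opt, agent)

# Display the first few exchanges.
for _ in range(5):
    world.parley()
    print(world.display())
```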

Competition Aim: A Shared Task to Find Better Chit-Chat Dialogue Models

Common issues with chit-chat models include:
(i) no consistent personality (Li et al., 2016);
(ii) no long-term memory: seq2seq models only take the last lines of dialogue as input (Vinyals et al., 2015);
(iii) a tendency toward generic responses, e.g. "I don't know" (Li et al., 2015).
ConvAI2 aims to find models that address these issues.

Prize

The winning entry will receive $20,000 in Mechanical Turk funding, to encourage further data collection for dialogue research.

Schedule

- Competitors submit models until Sep 30th:
  - Models are evaluated on a hidden test set via automated metrics (PPL, hits@1, F1; a toy F1 sketch follows this slide).
  - The leaderboard is visible to all competitors.
  - 'Wild' live evaluation on Messenger/Telegram can be used to tune models.
- Sep 30th: systems locked.
- The 7 best-performing systems make it to the next round and are evaluated using Mechanical Turk and the 'wild' evaluation.
- Winners announced at NeurIPS 2018.
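Of these automated metrics, perplexity (PPL) and hits@1 are standard; F1 measures word overlap between the model's response and the gold response. Below is a toy sketch of that computation, using plain whitespace tokenization as a simplifying assumption (the official evaluation code in ParlAI also normalizes case and punctuation).

```python
from collections import Counter

def unigram_f1(prediction: str, reference: str) -> float:
    """Word-level F1 between a model response and the gold response."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Count unigrams appearing in both, with multiplicity.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: partial overlap yields a score between 0 and 1 (~0.62 here).
print(unigram_f1("i love to ski in the winter", "i like skiing in the winter"))
```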

Rules

- Competitors must indicate which training sources are used; training may be augmented with other data as long as it is publicly released (and hence reproducible).
- Competitors must provide their source code, so that the hidden test set evaluation can be computed and so that the competition has further impact in the future.
- Code can be in any language, but a thin Python wrapper must be provided to work with our evaluation code via ParlAI's interface; a minimal sketch of such a wrapper follows this slide.
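ParlAI models implement its `Agent` interface: `observe()` receives the other speaker's message and `act()` returns a reply. The sketch below shows the shape of such a wrapper; `MySubmissionAgent`, the model-loading step, and the `generate_reply` call are hypothetical placeholders for a competitor's own code.

```python
from parlai.core.agents import Agent

class MySubmissionAgent(Agent):
    """Thin ParlAI wrapper around a competitor's dialogue model
    (a sketch; the wrapped model here is hypothetical)."""

    def __init__(self, opt, shared=None):
        super().__init__(opt, shared)
        self.id = 'MySubmissionAgent'
        # Load the underlying model here, e.g. from opt['model_file'].
        self.model = None  # placeholder for the real model

    def observe(self, observation):
        # Store the latest message (persona lines + dialogue history).
        self.observation = observation
        return observation

    def act(self):
        context = self.observation.get('text', '')
        # generate_reply is a hypothetical method on the wrapped model:
        # reply_text = self.model.generate_reply(context)
        reply_text = 'hello ! i like to talk about my persona .'
        return {'id': self.id, 'text': reply_text}
```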