Towards a Method For Evaluating Naturalness in Conversational Dialog Systems
Victor Hung, Miguel Elvir, Avelino Gonzalez & Ronald DeMara
Intelligent Systems Laboratory, University of Central Florida
IEEE International Conference on Systems, Man, and Cybernetics
San Antonio, Texas, October 12, 2009

Agenda
- Introduction
- Background
- Approach
- Project LifeLike

Introduction
- Interactive conversational agent evaluation
  - Cannot rely solely on quantitative methods
  - Subjectivity in 'naturalness'
- No general method to judge how well a conversational agent performs
- Pivotal focus will be defining naturalness
  - How well a chatbot can maintain a natural conversation flow
- LifeLike virtual avatar project as a backdrop
  - Provide a suitable validation and verification method

Background: Early Systems
- Declarative knowledge to process data
- Explicitly defined rules
- Constrained knowledge
- Limited capacity to assess and adapt
- Goal-oriented and data-driven behavior
- ALICEbot

Background: Naturalness
- Automatic Speech Recognition
- Context retrieval experimentation
- Intelligent tutoring
- Adaptive Control of Thought
- Knowledge Acquisition agents
- Quality of the information received
- Conversation length metric
- ALICE-based bots

Background: Recent Advances
- Sentence-based template matching
  - Simple conversational memory
  - CMU's Julia, Extempo's Erin
  - Interaction occurs in a reactive manner
- Wlodzislaw et al.
  - Development of cognitive modules and human interface realism
  - Ontologies, concept description vectors, semantic memory models, CYC

Background: Recent Advances (continued)
- Becker and Wachsmuth
  - Representation and actuation of coherent emotional states
- Lars et al.
  - Model for sustainable conversation
  - Awareness of the human users and the conversation topics
  - Relies on textual input, similar to ELIZA
  - Uses natural language processing for reasoning about human speech

Background: Conclusion
- Breadth of research using chatbots
- Focus on creating more sophisticated interpretive conversational modules
- Need exists for generalizable metrics
- Conversational agents have been widely experimented with, but the field lacks a basic framework for universal performance comparison

Approach: Previous Approaches
- Mix of quantitative and qualitative measures
  - Subjective matters are assessed with human user questionnaires
- Semeraro et al.'s bookstore chatbot
  - 7 characteristics: impression, command, effectiveness, navigability, ability to learn, ability to aid, comprehension
  - Does not provide statistical conclusiveness
  - Serves as a general indicator of performance

Approach: Previous Approaches (continued)
- Shawar and Atwell's universal chatbot evaluation system
  - ALICE-based Afrikaans conversational agent
  - Dialog efficiency
  - Dialog quality: responses rated as reasonable, weird but understandable, or nonsensical
  - Users' satisfaction, measured qualitatively
- Proper assessment ultimately rests on how successfully the agent accomplishes its intended goals

Approach: Previous Approaches (continued)
- Evaluation of naturalness is similar to general chatbot assessment
- Rzepka et al.'s 1-to-10 scale metrics
  - Degree of naturalness
  - Degree of willingness to continue the conversation
  - Human judges used these measures to evaluate a conversational agent's utterances
- No concrete baseline for naturalness, but relative measurements of naturalness between dialog agents are possible

Approach: Chatbot Objectives
- Walker et al.'s PARAdigm for DIalogue System Evaluation (PARADISE)
  - Dialog performance relates to the experience of the interaction (means)
  - Task success is concerned with the utility of the dialog exchange (ends)
- Objectives
  - Better than other dialog system solutions
  - Similar to a human-to-human (naturalness) interaction

Approach: Task Success
- Measure of goal satisfaction
- Attribute-value matrix
  - Derived from PARADISE
  - Expected vs. actual
- Task success (κ) computed as the percentage of correct responses
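For reference, the PARADISE measure this is derived from is the Kappa coefficient computed over the attribute-value matrix, which corrects the raw proportion of correct responses for agreement expected by chance; a sketch of that definition (Walker et al.):

```latex
% PARADISE task success: Kappa over the attribute-value matrix (expected vs. actual values)
\kappa = \frac{P(A) - P(E)}{1 - P(E)}
% P(A): proportion of actual attribute values that agree with the expected (key) values
% P(E): proportion of agreement expected by chance
```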

Approach: Performance Function
- Derived from PARADISE
- Total effectiveness
  - Task success (κ) weighted by (α)
  - Dialog costs (c_i) weighted by (w_i)
  - Function (N) uses Z-score normalization
  - Balances out (κ) and (c_i)
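Assembling the terms named above, the PARADISE performance function can be sketched as follows, with N denoting the Z-score normalization applied to both the task-success and cost measures:

```latex
% PARADISE-style performance function (sketch)
\mathrm{Performance} = \alpha\,\mathcal{N}(\kappa) - \sum_{i=1}^{n} w_i\,\mathcal{N}(c_i),
\qquad \mathcal{N}(x) = \frac{x - \bar{x}}{\sigma_x}
% alpha : weight on task success kappa
% w_i   : weight on dialog cost c_i (e.g., number of turns, elapsed time)
% N     : Z-score normalization, balancing kappa against the cost measures
```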

Approach: Proposed System
- Task success
- Dialog costs
  - Efficiency
    - Resource consumption
    - Quantitative
  - Quality
    - Actual conversational content
    - Quantitative or qualitative
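A minimal sketch of how such a scorer could be assembled, assuming hypothetical metric names (turns as an efficiency cost, weird_responses as a quality cost) and placeholder weights rather than the authors' actual implementation:

```python
import statistics

def z_normalize(values):
    """Z-score normalization of a metric across the evaluated dialogs."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values) or 1.0  # guard against zero variance
    return [(v - mean) / stdev for v in values]

def performance(kappas, costs, alpha, weights):
    """PARADISE-style score per dialog: alpha * N(kappa) - sum_i w_i * N(c_i)."""
    n_kappa = z_normalize(kappas)
    n_costs = {name: z_normalize(vals) for name, vals in costs.items()}
    scores = []
    for d in range(len(kappas)):
        cost_term = sum(weights[name] * n_costs[name][d] for name in costs)
        scores.append(alpha * n_kappa[d] - cost_term)
    return scores

# Hypothetical data for three evaluated dialogs.
kappas = [0.90, 0.75, 0.60]                       # task success per dialog
costs = {
    "turns": [12, 20, 9],          # efficiency cost: resource consumption
    "weird_responses": [1, 4, 0],  # quality cost: judged conversational content
}
weights = {"turns": 0.5, "weird_responses": 0.5}

print(performance(kappas, costs, alpha=1.0, weights=weights))
```

Which cost measures count as efficiency and which as quality, and how they are weighted, is the proposed system's design decision; the sketch only shows the combination mechanics.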

Questions