WITcHCRafT: A Workbench for Intelligent exploraTion of Human ComputeR conversations
Alexander Schmitt, Gregor Bertrandt, Tobias Heinroth, Wolfgang Minker
Information Technology – Dialogue Systems, Ulm University (Germany)
LREC Conference, Valletta, Malta | May 2010

Overview
– Motivation
– Prediction and Classification Models
– Features
– Demo

Motivation I: Interactive Voice Response Development
[Figure: application complexity from low to high, spanning informational, transactional and problem-solving applications: weather information, stock trading, package tracking, flight reservation, banking, customer care, technical support]
How can we handle, explore and mine corpora of 100,000 dialogues with 50 and more exchanges each?
Vision: create a framework that allows exploration and mining of huge dialogue corpora.

Motivation II: Towards Intelligent IVRs
Strive for "intelligent" Voice User Interfaces. Many studies explore:
– emotional state, gender, age, native/non-native speech, dialect, etc. (Metze et al., Burkhardt et al., Lee & Narayanan, Polzehl et al.)
– probability of task completion (Walker et al., Levin & Pieraccini, Paek & Horvitz, Schmitt et al.)
– …
Evaluation takes place on corpus level, i.e. batch testing.
What does it mean for the user when we deploy an anger detection system that reaches 78% accuracy?
Vision: create a framework that simulates the deployment of prediction models on specific dialogues.

Introducing Witchcraft

Training Prediction and Classification Models

Employing Prediction Models in Witchcraft
Procedure:
– Define a model in Witchcraft, e.g. "Age Model", "Cooperativity Model", etc.
– Determine which type it belongs to: discriminative binary classification, discriminative multi-class classification, or regression (see the sketch after this list)
– Define the machine learning framework and process definition (currently RapidMiner or an XML interface)
– "Brain" the call
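The slides describe the model definition only conceptually. As a minimal sketch, the Java fragment below illustrates one way such a per-turn prediction model could be typed; the names PredictionModel, ModelType, predictTurn and AngerModel are assumptions for illustration and not the actual Witchcraft API.

// Minimal sketch of a per-call prediction model definition, assuming a
// hypothetical API; the real Witchcraft interfaces may differ.
import java.util.List;

enum ModelType { BINARY_CLASSIFICATION, MULTI_CLASS_CLASSIFICATION, REGRESSION }

interface PredictionModel {
    String name();                                  // e.g. "Age Model", "Cooperativity Model"
    ModelType type();                               // determines how predictions are evaluated
    double predictTurn(List<Double> turnFeatures);  // one prediction per dialogue turn
}

// Example: an "Anger Model" treated as binary classification.
class AngerModel implements PredictionModel {
    public String name() { return "Anger Model"; }
    public ModelType type() { return ModelType.BINARY_CLASSIFICATION; }
    public double predictTurn(List<Double> turnFeatures) {
        // Placeholder decision rule; in practice the score would come from
        // RapidMiner or an external process via the XML interface.
        double score = turnFeatures.isEmpty() ? 0.0 : turnFeatures.get(0);
        return score > 0.5 ? 1.0 : 0.0;
    }
}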

What can Witchcraft do for you?
Exploring and Mining:
– Manage large dialogue corpora
– Group calls by category
– Simulate the interaction between user and system based on interaction logs
– Listen to full recordings or concatenated user utterances
– Implement your own plugins
Model Testing:
– Analyze the impact of your classifiers on an ongoing interaction
– Evaluate discriminative classification and regression models
– Retrieve precision, recall, F-score, accuracy, least mean squared error, etc. on call level (see the sketch below)
– Search for calls with low performance
– Tune your model
Technical details:
– Based on Java and Eclipse RCP
– Database: MySQL
– Currently connected machine learner: RapidMiner
– Download at witchcraftwb.sourceforge.net
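Call-level evaluation aggregates per-turn predictions against annotated reference labels. The following is a minimal sketch, assuming binary labels encoded as 0/1 and a hypothetical helper class CallMetrics; the metric names match the slide, but the code is illustrative and not Witchcraft's implementation.

// Sketch: call-level accuracy, precision, recall and F-score from per-turn
// binary predictions vs. reference labels (0 = negative, 1 = positive).
public class CallMetrics {
    public static double[] evaluate(int[] predicted, int[] reference) {
        int tp = 0, fp = 0, fn = 0, correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == reference[i]) correct++;
            if (predicted[i] == 1 && reference[i] == 1) tp++;
            if (predicted[i] == 1 && reference[i] == 0) fp++;
            if (predicted[i] == 0 && reference[i] == 1) fn++;
        }
        double accuracy  = (double) correct / predicted.length;
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall    = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1        = precision + recall == 0 ? 0.0
                         : 2 * precision * recall / (precision + recall);
        return new double[] { accuracy, precision, recall, f1 };
    }
}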

Adaptability to Your Corpus
Exploring, mining and managing is straightforward:
– Parse your interaction logs into the Witchcraft DB structure (see the sketch below)
– Provide the path to the WAV files
– Play
Model testing:
– Create a process that delivers one XML per turn as prediction (discriminative classification or regression)
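The log-parsing step above depends on your corpus format. As a hedged illustration, the JDBC sketch below loads parsed turns into a MySQL table; the table turns(call_id, turn_index, wav_path, transcript) and the connection settings are assumptions for illustration, not the actual Witchcraft DB structure.

// Sketch: loading parsed interaction-log turns into MySQL via JDBC.
// Table name, columns and connection details are assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class LogImporter {
    public static void insertTurn(String callId, int turnIndex,
                                  String wavPath, String transcript) throws Exception {
        String url = "jdbc:mysql://localhost:3306/witchcraft";   // assumed database name
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                 "INSERT INTO turns (call_id, turn_index, wav_path, transcript) "
                 + "VALUES (?, ?, ?, ?)")) {
            ps.setString(1, callId);
            ps.setInt(2, turnIndex);
            ps.setString(3, wavPath);
            ps.setString(4, transcript);
            ps.executeUpdate();
        }
    }
}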

Thank you for your attention! See you at witchcraftwb.sourceforge.net

References
[1] A. Batliner and R. Huber. Speaker characteristics and emotion classification. Pages 138–151.
[2] P. Boersma. Praat, a system for doing phonetics by computer. Glot International, 5(9/10):341–345, 2001.
[5] F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss. A database of German emotional speech. In European Conference on Speech and Language Processing (EUROSPEECH), pages 1517–1520, Lisbon, Portugal, September 2005.
[8] R. Leonard and G. Doddington. TIDIGITS speech corpus. Texas Instruments, Inc.
[9] F. Metze, J. Ajmera, R. Englert, U. Bub, F. Burkhardt, J. Stegmann, C. Müller, R. Huber, B. Andrassy, J. Bauer, and B. Littel. Comparison of four approaches to age and gender recognition. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 1, 2007.
[10] F. Metze, R. Englert, U. Bub, F. Burkhardt, and J. Stegmann. Getting closer: tailored human-computer speech dialog. Universal Access in the Information Society.
[11] I. Mierswa, M. Wurst, R. Klinkenberg, M. Scholz, and T. Euler. YALE: Rapid prototyping for complex data mining tasks. In L. Ungar, M. Craven, D. Gunopulos, and T. Eliassi-Rad, editors, KDD '06, New York, NY, USA, August 2006. ACM.
[13] A. Schmitt and J. Liscombe. Detecting problematic calls with automated agents. In 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, Irsee, Germany, June 2008.