
A Meeting Browser that Learns
Patrick Ehlen * Matthew Purver * John Niekrasz
Computational Semantics Laboratory, Center for the Study of Language and Information, Stanford University

The CALO Meeting Assistant
Observe human-human meetings:
– Audio recording & speech recognition
– Video recording & processing
– Process written and typed notes, & whiteboard sketches
Produce a useful record of the interaction

The CALO Meeting Assistant (cont’d)
Offload some cognitive effort from participants during meetings
Learn to do this better over time
For now, focus on identifying:
– Action items people commit to during the meeting
– Topics discussed during the meeting

Human Interpretation Problems Compare to a new temp taking notes during meetings of Spacely Sprockets

Human Interpretation Problems (cont’d)
Problems in recognizing and interpreting content:
– Lack of lexical knowledge, specialized idioms, etc.
– Overhearer understanding problem (Schober & Clark, 1989)

Machine Interpretation Problems
Machine interpretation:
– Similar problems with the lexicon
– Trying to do interpretation from messy, overlapping multi-party speech transcribed by ASR, with multiple word-level hypotheses

Human Interpretation Solution
A human temp can still do a good job (while understanding little) at identifying things like action items
When people commit to do things with others, they adhere to a rough dialogue pattern
The temp performs shallow discourse understanding
Then gets implicit or explicit “corrections” from meeting participants

How Do We Do Shallow Understanding of Action Items?
Four types of dialogue moves:
– Description of task: “Somebody needs to do this TZ-3146!”
– Owner: “I guess I could do it.”
– Timeframe: “Can you do it by tomorrow?”
– Agreement: “Sure.” “Excellent!” “Sounds good to me!” “Sweet!”
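These four move types map naturally onto a small data model. Below is a minimal, hypothetical sketch (the slides do not show an implementation; all names are illustrative) of how the subclass moves and the superclass action-item hypothesis might be represented:

```python
# Hypothetical data model for the four subclass dialogue moves and the
# superclass action-item hypothesis they support (names are illustrative).
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class MoveType(Enum):
    TASK_DESCRIPTION = auto()  # "Somebody needs to do this TZ-3146!"
    OWNER = auto()             # "I guess I could do it."
    TIMEFRAME = auto()         # "Can you do it by tomorrow?"
    AGREEMENT = auto()         # "Sure." / "Sounds good to me!"


@dataclass
class UtteranceMove:
    speaker: str
    text: str
    move_type: MoveType
    confidence: float          # classifier score for this move label


@dataclass
class ActionItemHypothesis:
    """Superclass hypothesis posited from a cluster of subclass moves."""
    moves: List[UtteranceMove] = field(default_factory=list)
    task: Optional[str] = None       # what
    owner: Optional[str] = None      # who
    timeframe: Optional[str] = None  # when
```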

Machine Interpretation
Shallow understanding for action item detection:
– Use our knowledge of this exemplary pattern
– Skip over deep semantic processing
– Create classifiers that identify those individual moves
– Posit action items
– Get feedback from participants after the meeting
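A rough sketch of the detection loop this slide describes, reusing the data model above. The `classify_move` callable stands in for the per-move classifiers and its interface is an assumption, not the system’s actual API; utterances are assumed to carry `speaker` and `text` attributes.

```python
# Hypothetical sketch of shallow action-item detection: run a per-utterance
# move classifier over the transcript and posit an action item whenever a
# short window of talk covers enough of the four move types, including an
# agreement. classify_move(u) -> (MoveType, confidence) or None is assumed.
def posit_action_items(utterances, classify_move, window=10, min_move_types=3):
    hypotheses = []
    i = 0
    while i < len(utterances):
        segment = utterances[i:i + window]
        labelled = [(u, classify_move(u)) for u in segment]
        labelled = [(u, m) for u, m in labelled if m is not None]
        covered = {mtype for _, (mtype, _) in labelled}
        if len(covered) >= min_move_types and MoveType.AGREEMENT in covered:
            hyp = ActionItemHypothesis(
                moves=[UtteranceMove(u.speaker, u.text, mtype, conf)
                       for u, (mtype, conf) in labelled])
            hypotheses.append(hyp)
            i += window   # move past this window once an item is posited
        else:
            i += 1
    return hypotheses
```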

Challenge to Machine Learning and UI Design
Detection challenges:
– classify 4 different types of dialogue moves
– want classifiers to improve over time
– thus, need differential feedback on interpretations of these different types of dialogue moves
Participants should see and evaluate our results while doing something that’s valuable to them
And, from those user actions, give us the feedback we need for learning

Feedback Proliferation Problem
To improve action item detection, need feedback on performance of five classifiers (4 utterance classes, plus overall “this is an action item” class)
All on noisy, human-human, multi-party ASR results
So, we could use a lot of feedback

Feedback Proliferation Problem (cont’d)
Need a system to obtain feedback from users that is:
– light-weight and usable
– valuable to users (so they will use it)
– able to solicit different types of feedback in a non-intrusive, almost invisible way

Feedback Proliferation Solution
Meeting Rapporteur – a type of meeting browser used after the meeting

Feedback Proliferation Solution (cont’d)
Many “meeting browser” tools are developed for research, and focus on signal replay
Ours:
– tool to commit action items from the meeting to the user’s to-do list
– relies on implicit user supervision to gather feedback to retrain classification models

Meeting Rapporteur

Action Items

Subclass hypotheses
Top hyp is highlighted
Mouse-over hyps to change them
Click to edit them (confirm, reject, replace, create)

Action Items
Superclass hypothesis:
– delete = neg. feedback
– commit = pos. feedback
– merge, ignore
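One way (purely illustrative; the browser’s internals are not shown on these slides) to turn those user actions into training labels at the two levels:

```python
# Hypothetical mapping from browser actions to training feedback.
# Superclass actions label the action item as a whole; subclass actions
# label individual task/owner/timeframe/agreement hypotheses.
SUPERCLASS_FEEDBACK = {
    "commit": +1,   # added to the to-do list -> positive example
    "delete": -1,   # deleted -> negative example
    "merge":   0,   # structural edit, treated as neutral here
    "ignore":  0,   # no evidence either way
}

SUBCLASS_FEEDBACK = {
    "confirm": +1,  # kept the highlighted hypothesis
    "reject":  -1,  # dismissed it
    "replace": -1,  # counts against the original hypothesis; the replacement
                    # text becomes a new positive example
    "create":  +1,  # user-supplied text is a positive example
}
```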

Feedback Loop
Each participant’s implicit feedback for a meeting is stored as an “overlay” to the original meeting data
– Overlay is reapplied when participant views meeting data again
– Same implicit feedback also retrains models
– Creates a personalized representation of meeting for each participant, and personalized classification models
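A minimal sketch of how such an overlay might be stored and reused, assuming the ActionItemHypothesis data model sketched earlier; the attribute and method names are invented for illustration.

```python
# Hypothetical per-participant overlay: the original meeting hypotheses are
# left untouched, the participant's edits are stored separately, reapplied
# whenever they reopen the meeting, and converted into retraining examples.
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass
class FeedbackOverlay:
    meeting_id: str
    participant_id: str
    # each edit: (hypothesis_id, action, replacement_text_or_None)
    edits: List[Tuple] = field(default_factory=list)

    def apply(self, hypotheses: Dict[str, "ActionItemHypothesis"]):
        """Return this participant's personalized view of the action items."""
        view = dict(hypotheses)
        for h_id, action, new_text in self.edits:
            if action == "delete":
                view.pop(h_id, None)
            elif action == "replace" and h_id in view:
                view[h_id] = replace(view[h_id], task=new_text)  # copy, don't mutate
        return view

    def training_examples(self, hypotheses):
        """Convert the same edits into (hypothesis, label) pairs for retraining."""
        label = {"commit": +1, "delete": -1}
        return [(hypotheses[h_id], label.get(action, 0))
                for h_id, action, _ in self.edits if h_id in hypotheses]
```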

Problem
In practice (e.g., CALO Y3 CLP data):
– we seem to get a lot of feedback at the superclass level (i.e., people are willing to accept or delete an action item)
– but not as much (other than implicit confirmation) at the subclass level (i.e., people are not as willing to change descriptions, etc.)

Questions
User feedback provides information along different dimensions:
– Information about the time an event (like discussion of an action item) happened
– Information about the text that describes aspects of the event (like the task description, owner, and timeframe)

Questions (cont’d)
Which of these dimensions contribute most to improving models during retraining?
Which dimensions require more cognitive effort for the user when giving feedback?
What is the best balance between getting feedback information and not bugging the user too much?
What is the best use of initiative in such a system (user- vs. system-initiative)?
– During the meeting?
– After the meeting?

Experiments
2 evaluation experiments:
– “Ideal feedback” experiment
– Wizard-of-Oz experiment

Ideal Feedback Experiment
Turn gold-standard human annotations of meeting data into posited “ideal” human feedback
Using that ideal feedback to retrain, determine which dimensions (time, text, initiative) contribute most to improving classifiers
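A hypothetical sketch of how gold annotations might be converted into simulated feedback events. The attribute names (`start`, `end`, `task`) are assumptions about the annotation format, not the actual CALO schema, and the matching rules are illustrative only.

```python
# Hypothetical simulation of "ideal" feedback: treat the gold annotations as
# a perfectly attentive user who commits every correct hypothesis (optionally
# supplying the gold text) and deletes every spurious one.
def simulate_ideal_feedback(hypotheses, gold_items, use_time=True, use_text=True):
    feedback = []
    for h_id, hyp in hypotheses.items():
        if use_time:
            # time dimension: match a hypothesis to gold by overlapping spans
            match = next((g for g in gold_items
                          if g.start <= hyp.end and hyp.start <= g.end), None)
        else:
            # without timing, fall back to matching on task-text overlap
            match = next((g for g in gold_items
                          if hyp.task and g.task and hyp.task in g.task), None)
        if match is None:
            feedback.append((h_id, "delete", None))
        else:
            # text dimension: the gold description acts as the user's edit
            feedback.append((h_id, "commit", match.task if use_text else None))
    return feedback
```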

Ideal Feedback Experiment (cont’d)
Results:
– both time and text dimensions alone improve accuracy over the raw classifier
– using both time and text together performs best
– textual information is more useful than temporal
– user initiative provides extra information not gained by system initiative

Wizard-of-Oz Experiment
Create different Meeting Assistant interfaces and feedback devices (including our Meeting Rapporteur)
See how real-world feedback data compares to the ideal feedback described above
Assess how the tools affect and change behavior during meetings

Action Item Identification
Use four classifiers to identify dialogue moves associated with action items in utterances of meeting participants
Then posit the existence of an action item, along with its semantic properties (what, who, when), using those utterances
[Diagram: linearized utterances u1 … uN feeding the task, owner, timeframe, and agreement classifiers]
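To make the what/who/when step concrete, here is an illustrative property-filling pass over a posited hypothesis, using the data model sketched earlier; this is not the actual CALO code, and taking the highest-confidence move of each type is just one plausible strategy.

```python
# Hypothetical extraction of an action item's semantic properties from its
# highest-confidence supporting moves.
def fill_properties(hyp: "ActionItemHypothesis") -> "ActionItemHypothesis":
    def best_text(move_type):
        candidates = [m for m in hyp.moves if m.move_type is move_type]
        if not candidates:
            return None
        return max(candidates, key=lambda m: m.confidence).text

    hyp.task = best_text(MoveType.TASK_DESCRIPTION)   # what
    hyp.owner = best_text(MoveType.OWNER)             # who
    hyp.timeframe = best_text(MoveType.TIMEFRAME)     # when
    return hyp
```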

Like hiring a new temp to take notes during meetings of Spacely Sprockets
Even if we say, “Just write down the action items people agree to, and the topics,” that temp will run up against a couple of problems in recognizing and interpreting content (rooted in the collaborative underpinnings of semantics):
– Overhearer understanding problem (Schober & Clark, 1989)
– Lack of vocabulary knowledge, etc.
A machine overhearer that uses noisy multi-party transcripts is even worse
We do what a human overhearer might do: shallow discourse understanding
If you were to go into a meeting, you might not understand what was being talked about, but you could understand when somebody agreed to do something. Why? Because when people make commitments to do things with others, they typically adhere to a certain kind of dialogue pattern

Superclass Feedback Actions
Superclass hypothesis:
– delete = neg. feedback
– commit = pos. feedback (add to “to-do” list)

A Great Example

Some Bad Examples

Feedback Loop Diagram