Detecting Action Items in Multi-Party Meetings: Annotation and Initial Experiments Matthew Purver, Patrick Ehlen, John Niekrasz Computational Semantics Laboratory Center for the Study of Language and Information Stanford University

The CALO Project
Multi-institution, multi-disciplinary project
Working towards an intelligent personal assistant that learns
Three major areas:
– managing personal data: clustering email, documents, managing contacts
– assisting with task execution: learning to carry out computer-based tasks
– observing interaction in meetings

The CALO Meeting Assistant
Observe human-human meetings
– Audio recording & speech recognition (ICSI/CMU)
– Video recording & processing (MIT/CMU)
– Written notes, via digital ink (NIS) or typed (CMU)
– Whiteboard sketch recognition (NIS)
Produce a useful record of the interaction
– answer questions about what happened
– can be used by attendees or non-attendees
Learn to do this better over time (LITW)

The CALO Meeting Assistant
Primary focus on the end user
Develop something that can genuinely help people deal with the many meetings we all have to attend

What do people want to know from meetings?

Banerjee et al. (2005) survey of 12 academics:
– Missed a meeting - what do you want to know?
– Topics: which were discussed, what was said?
– Decisions: what decisions were made?
– Action items/tasks: was I assigned something?

What do people want to know from meetings?
Banerjee et al. (2005) survey of 12 academics:
– Missed a meeting - what do you want to know?
– Topics: which were discussed, what was said?
– Decisions: what decisions were made?
– Action items/tasks: was I assigned something?
Lisowska et al. (2004) survey of 28 people:
– What would you ask a meeting reporter system?
– Similar responses about topics, decisions
– Who attended, who asked/decided what?
– Did they talk about me?

Purpose
A helpful system not only records and transcribes a meeting, but extracts, from streams of potentially messy human-human speech:
– topics discussed
– decisions made
– tasks assigned ("action items")
The system should highlight this information over meeting "noise"

Example: an impromptu meeting you might have after your team has boarded a rebel spacecraft in search of stolen plans, and you're trying to figure out what to do next

Commander, tear this ship apart until you’ve found those plans!

A section of discourse in a meeting where someone is made responsible for taking care of something

Action Items
Concrete decisions; public commitments to be responsible for a particular task
Want to know:
– Can we find them?
– Can we produce useful descriptions of them?
Not aware of previous discourse-based work

Action Item Detection in Email: Corston-Oliver et al., 2004
Marked a corpus of email with "dialogue acts"
Task act:
– "items appropriate to add to an ongoing to-do list"
Good inter-annotator agreement (kappa > 0.8)
Per-sentence classification using SVMs
– lexical features, e.g. n-grams; punctuation; message features
– f-scores around 0.6
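For readers who want to see what this kind of per-sentence classification looks like in practice, here is a minimal sketch of n-gram features feeding a linear SVM. The sentences, labels, and the use of scikit-learn are illustrative assumptions, not Corston-Oliver et al.'s actual pipeline.

```python
# A minimal sketch of per-sentence binary classification with n-gram
# features and a linear SVM. Data and library choice are assumptions
# for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training data: sentences labelled 1 if "task act", else 0.
sentences = [
    "could you send me the revised draft by friday",
    "i think the weather has been awful lately",
    "let's add a review step to the to-do list",
    "thanks for the update",
]
labels = [1, 0, 1, 0]

# Unigram + bigram bag-of-words features feeding a linear SVM.
clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), lowercase=True),
    LinearSVC(),
)
clf.fit(sentences, labels)

print(clf.predict(["please update the server and report back"]))
```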

A First Try: Flat Annotation
Gruenstein et al. (2005) annotated 65 meetings drawn from:
– ICSI Meeting Corpus (Janin et al., 2003)
– ISL Meeting Corpus (Burger et al., 2002)
Two human annotators: "Mark utterances relating to action items"
– create groups of utterances for each action item
– no distinction made between utterance type/role

A First Try: Flat Annotation (cont'd)
The two annotators identified 921 and 1267 action item-related utterances respectively
Human agreement poor (κ < 0.4)
Tried binary classification using SVMs (as in Corston-Oliver et al.)
Precision, recall, f-score: all below 0.25
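For reference, a small sketch of how the precision, recall and f-score figures reported here are computed for a binary utterance classifier; the gold and predicted labels below are made-up placeholders.

```python
# Precision / recall / F1 for a binary "action-item utterance" classifier.
def precision_recall_f1(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical gold annotations and classifier output for ten utterances.
gold = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
pred = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0]
print(precision_recall_f1(gold, pred))  # -> (0.5, 0.666..., 0.571...)
```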

Try a more restricted dataset?
Sequence of 5 related CALO meetings
– similar amount of ICSI/ISL data for training
Same annotation schema
SVMs with words & n-grams as features
– also tried other discriminative classifiers, and 2- & 3-grams, with no improvements
Similar performance
– improved f-scores, but still poor
– recall up to 0.67, precision still low (< 0.36)

Should we be surprised?
Our human annotator agreement was poor
DAMSL schema has dialogue acts Commit, Action-directive
– annotator agreement poor (κ ~ 0.15) (Core & Allen, 1997)
ICSI MRDA has a dialogue act commit
– most DA tagging work concentrates on 5 broad DA classes
Perhaps "action items" comprise a more heterogeneous set of utterances

Rethinking Action Item Acts
Maybe action items are not aptly described as singular "dialogue acts"
Rather: multiple people making multiple contributions of several types
Action item-related utterances represent a form of group action, or social action
That social action has several components, giving rise to a heterogeneous set of utterances
What are those components?

Commander, tear this ship apart until you’ve found those plans! A person commits or is committed to “own” the action item

Commander, tear this ship apart until you’ve found those plans! A person commits or is committed to “own” the action item A description of the task itself is given

Commander, tear this ship apart until you’ve found those plans! A person commits or is committed to “own” the action item A description of the task itself is given A timeframe is specified

Yes, Lord Vader! A person commits or is committed to “own” the action item A description of the task itself is given A timeframe is specified Some form of agreement

Exploiting discourse structure
Action items have distinctive properties
– task description, owner, timeframe, agreement
Action item utterances can simultaneously play different roles
– assigning properties
– agreeing/committing
These classes may be more homogeneous & distinct than a single "action item" utterance class
– could improve classification performance

New annotation schema
Annotated and classified again using the new schema
Classify utterances by their role in the action item discourse
– an utterance can play more than one role
Define action items by grouping subclass utterances together into an action-item discussion
– a subclass can be missing
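As a rough illustration of this schema in classifier form, the sketch below trains one binary n-gram classifier per subclass, so a single utterance can receive several role labels at once. The training utterances and the use of scikit-learn are assumptions for illustration only, not the authors' actual system.

```python
# One independent binary classifier per action-item subclass; each
# utterance may carry any subset of the four roles.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

SUBCLASSES = ["owner", "timeframe", "description", "agreement"]

# Hypothetical annotated utterances: text plus the set of roles it plays.
train = [
    ("jack i'd like you to come back to me", {"owner"}),
    ("by the start of week three", {"timeframe"}),
    ("with the details on the printer and server", {"description"}),
    ("okay sure", {"agreement"}),
    ("so you'll send the report by friday", {"owner", "timeframe"}),
]
texts = [t for t, _ in train]

classifiers = {}
for role in SUBCLASSES:
    y = [1 if role in roles else 0 for _, roles in train]
    clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, y)
    classifiers[role] = clf

def predict_roles(utterance):
    """Return the set of subclass labels hypothesised for one utterance."""
    return {role for role, clf in classifiers.items()
            if clf.predict([utterance])[0] == 1}

print(predict_roles("okay i'll send the details by friday"))
```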

Action Item discourse: an example

New Experiment
Annotated the same set of CALO/ICSI/ISL data using the new schema
Trained classifiers to identify utterances belonging to each of the 4 subclasses

Encouraging signs
Between-class distinction (cosine distances)
– agreement vs. any other is good: 0.05 to 0.12
– timeframe vs. description is OK: 0.25
– owner/timeframe/description: 0.36 to 0.47
Improved inter-annotator agreement?
– timeframe: κ = 0.86
– owner 0.77, agreement & description 0.73
– warning: this is only on one meeting, although it's the most difficult one we could find
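The agreement figures above are kappa statistics; the sketch below shows one common way to compute them (Cohen's kappa for two annotators) on a binary per-utterance label such as "is this a timeframe utterance?". The two annotators' label sequences are placeholders.

```python
# Cohen's kappa for two annotators on a binary per-utterance label.
def cohens_kappa(a, b):
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement from each annotator's marginal label frequencies.
    pa1 = sum(a) / n
    pb1 = sum(b) / n
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (observed - expected) / (1 - expected)

annotator_1 = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
annotator_2 = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # -> 0.78
```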

Combined classification
Still don't have enough data for proper combined classification
– recall 0.3 to 0.5, precision 0.1 to 0.5
– agreement subclass is best, with f-score = 0.40
Overall decision based on sub-classifier outputs
Ad-hoc heuristic:
– prior context window of 5 utterances
– agreement plus one other class
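A minimal sketch of this ad-hoc heuristic, assuming the sub-classifiers have already tagged each utterance with a set of subclass labels: flag an action-item discussion whenever an agreement utterance is preceded, within a 5-utterance window, by at least one utterance carrying another subclass. The data structure is an illustrative assumption.

```python
# Combine sub-classifier output with the window-based heuristic above.
WINDOW = 5
OTHER = {"owner", "timeframe", "description"}

def detect_action_items(tagged_utterances):
    """tagged_utterances: list of (utterance_text, set_of_subclass_labels)."""
    hits = []
    for i, (text, labels) in enumerate(tagged_utterances):
        if "agreement" not in labels:
            continue
        # Current utterance plus up to 5 utterances of prior context.
        window = tagged_utterances[max(0, i - WINDOW):i + 1]
        if any(lbls & OTHER for _, lbls in window):
            hits.append((i, [t for t, _ in window]))  # index + supporting context
    return hits

# Hypothetical sub-classifier output for a short stretch of dialogue.
dialogue = [
    ("so for the demo next week", {"timeframe"}),
    ("jack could you set up the server", {"owner", "description"}),
    ("and send round the details", {"description"}),
    ("okay sure", {"agreement"}),
]
print(detect_action_items(dialogue))
```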

Questions we can ask
Does overall classification look useful?
– Whole-AI-based f-score 0.40 to 1.0 (one meeting perfectly correlated with human annotation)
Does overall output improve the sub-classifiers?
– Agreement: f-score 0.40 → 0.43
– Timescale: f-score 0.26 → 0.07
– Owner: f-score 0.12 → 0.24
– Description: f-score 0.33 → 0.24

Example output
From a CALO meeting:
– t = [the, start, of, week, three, just, to]
– o = [reconfirm, everything, and, at, that, time, jack, i'd, like, you, to, come, back, to, me, with, the]
– d = [the, details, on, the, printer, and, server]
– a = [okay]
Another (less nice?) example:
– o = [/h#/, so, jack, /uh/, for, i'd, like, you, to]
– d = [have, one, more, meeting, on, /um/, /h#/, /uh/]
– t = [in, in, a, couple, days, about, /uh/]
– a = [/ls/, okay]

Where next for action items?
More data annotation
– using NOMOS, our annotation tool
Meeting browser to get user feedback
Improved individual classifiers
Improved combined classifier
– maximum entropy model
– not enough data yet
Moving from words to symbolic output
– Gemini (Dowding et al., 1990) bottom-up parser
