Collaborative Annotation of the AMI Meeting Corpus. Jean Carletta, University of Edinburgh. www.amiproject.org

Presentation transcript:

Collaborative Annotation of the AMI Meeting Corpus Jean Carletta University of Edinburgh

Carletta, 20 June: AMI Partners

NXT Major Development Sites

AMI's aim
- Aim: to develop technologies for browsing meetings and for assisting people during meetings
- Interdisciplinary: signal processing, language engineering, theoretical linguistics, human-computer interfaces, organizational psychology, ...

Why annotation?
- For basic scientific understanding, e.g.:
  - How do people choose a next speaker?
  - What is the relationship between speech and gesture during deixis?
- For machine learning:
  - Hand-code, e.g., statement vs. question
  - Identify features for each, such as word sequences and prosody
  - Use the data to fit a statistical classifier that codes new data automatically
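
The machine-learning loop on this slide (hand-code, extract features, fit a classifier, code new data) can be sketched in a few lines. This is only a toy illustration, not the AMI project's method: the utterances, the `rising_pitch` flag standing in for prosody, and the choice of a naive Bayes classifier are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Toy hand-coded data, invented for illustration: (words, rising_pitch, label).
TRAIN = [
    ("do we need a remote".split(), True, "question"),
    ("what is the budget".split(), True, "question"),
    ("is it made of plastic".split(), True, "question"),
    ("the budget is fixed".split(), False, "statement"),
    ("we need a remote".split(), False, "statement"),
    ("it is made of plastic".split(), False, "statement"),
]

def features(words, rising_pitch):
    """Two word-sequence features plus one (hypothetical) prosodic feature."""
    return [
        f"first={words[0]}",
        f"bigram={words[0]}_{words[1]}",
        f"rising={rising_pitch}",
    ]

def fit(examples):
    """Count labels and per-label features from the hand-coded examples."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for words, rising, label in examples:
        label_counts[label] += 1
        for f in features(words, rising):
            feature_counts[label][f] += 1
            vocab.add(f)
    return label_counts, feature_counts, vocab

def predict(model, words, rising):
    """Pick the label with the highest add-one-smoothed naive Bayes score."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in features(words, rising):
            score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(TRAIN)
print(predict(model, "do we have time".split(), True))
print(predict(model, "the remote is small".split(), False))
```

In practice the features would come from the corpus annotations themselves (word sequences from the transcription, prosody from the signal), and the classifier would be chosen to suit the task.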

AMI Meeting Rooms
- 4 close-view and 2 wide-view cameras
- 4 headset and 8 array microphones
- presentation screen capture, whiteboard capture, pen devices
- plus extra site-dependent devices
Rooms shown: TNO, Edinburgh, IDIAP

IS1004d, 3:07 - 4:11

Corpus Overview
- 100 hrs of well-recorded meetings
- orthographically transcribed, with word timings by forced alignment
- ASR output
- heavily annotated by hand for communicative behaviours
- Creative Commons Share-Alike licensing, with demo DVD

Hand Annotations
- transcription with word-level timings from forced alignment (100%)
- timestamping against signal (10-30%): head gestures; hand gestures for addressing and interactions with objects; location in room; gaze; emotion?
- discourse structure (70%): dialogue acts (some w/ addressing), named entities, topic segments, linked extractive and abstractive summaries

Costs in person-hrs/hr

Annotation                                   Cost
transcription                                30
topic segments + abstractive summaries       6-10
dialogue acts w/ some relations              20
addressing                                   12
extractive summaries linked to abstract      1
named entities                               2-5
hand gestures (rough timings)                6
head gestures (rough timings)                6
head gestures (precision timings)            20
movement around room                         4

Core Problems
- How do we represent all of these kinds of annotation on the same base data, including both structural relationships and timing?
- How do we allow for multiple (human and machine) annotations of the same property, so that we can compare them?

NITE XML Toolkit
- Mature toolkit for handling annotations with temporal ordering and full structural relations
- Data storage format designed to support distributed corpus development
- Libraries for data handling, query, and writing graphical user interfaces
- End-user annotation tools for common tasks
- Command-line utilities for analysis and feature extraction
- Open source

NXT corpus design
- the data model is a multi-rooted tree with arbitrary graph structure over the top
- each node has one set of children but can have multiple parents
- annotations often map naturally to a tree
- corpus design means deciding where the trees intersect
- NXT can represent arbitrary graphs, but the more the data has this character, the less useful the query language is
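
A minimal sketch of the data model described on this slide, assuming a hypothetical `Node` class (this is not NXT's Java API): each node keeps a single ordered child list but may be attached under several parents, so two annotation trees can intersect at shared word nodes.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    """One annotation element; all names here are illustrative, not NXT's API."""
    kind: str
    label: str = ""
    children: list = field(default_factory=list, repr=False)
    parents: list = field(default_factory=list, repr=False)

    def add_child(self, child):
        """Attach child under this node and record the extra parent on it."""
        self.children.append(child)
        child.parents.append(self)
        return child

def leaf_labels(node):
    """Collect leaf labels in order by walking the node's single child list."""
    if not node.children:
        return [node.label]
    labels = []
    for child in node.children:
        labels.extend(leaf_labels(child))
    return labels

# Shared base layer: the words of one (invented) utterance.
words = [Node("word", w) for w in ["ask", "Jean", "Carletta", "tomorrow"]]

# Root 1: a dialogue act spanning every word.
act = Node("dialogue-act", "suggestion")
for w in words:
    act.add_child(w)

# Root 2: a named entity covering two of the same word nodes.
entity = Node("named-entity", "person")
for w in words[1:3]:
    entity.add_child(w)

print(leaf_labels(act))
print(leaf_labels(entity))
print([p.kind for p in words[1].parents])
```

Here the two roots intersect at the word layer, which is the kind of design decision the slide refers to: the trees stay simple, and the sharing happens at a chosen level.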

Stand-off XML
[Figure: extract from Bdb001.A.words.xml and extract from Bdb001.A.speech-quality.xml, shown against a timeline]
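
The slide's XML extracts did not survive in this transcript, so the following is a simplified sketch of the stand-off idea rather than a reproduction of the corpus files. The element names, the `start`/`end` attributes, and the `file#first..last` pointer syntax are assumptions for illustration; NXT's actual link syntax differs. The point is that timed words live in one file, and a second file annotates them by pointing at word ids instead of copying the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical base file: words with timings against the signal.
WORDS_XML = """
<words>
  <w id="w1" start="0.00" end="0.21">so</w>
  <w id="w2" start="0.21" end="0.40">that</w>
  <w id="w3" start="0.40" end="0.95">works</w>
  <w id="w4" start="1.10" end="1.32">okay</w>
</words>
"""

# Hypothetical stand-off file: annotations that point into the words file.
QUALITY_XML = """
<speech-quality>
  <segment id="q1" type="emphasis" href="words.xml#w2..w3"/>
</speech-quality>
"""

def resolve(href, words):
    """Return the word elements covered by a 'file#first..last' pointer."""
    _, _, span = href.partition("#")
    first, _, last = span.partition("..")
    last = last or first  # a pointer to a single word has no range part
    ids = [w.get("id") for w in words]
    return words[ids.index(first): ids.index(last) + 1]

def segment_spans(words_xml, quality_xml):
    """Resolve each segment to (type, text, start, end) via its word children."""
    words = list(ET.fromstring(words_xml))
    spans = []
    for seg in ET.fromstring(quality_xml):
        covered = resolve(seg.get("href"), words)
        text = " ".join(w.text for w in covered)
        start = float(covered[0].get("start"))
        end = float(covered[-1].get("end"))
        spans.append((seg.get("type"), text, start, end))
    return spans

print(segment_spans(WORDS_XML, QUALITY_XML))
```

Because the segment carries no timing of its own, its start and end are derived from the words it points to, which is how structural annotations can stay aligned with the signal.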

Metadata file
Like a set of DTDs for the XML files, plus:
- connections between the files
- list of "observations" (coded dialogues/group discussions/texts)
- catalog for finding signals and data on disk

Simple example query
($w word)($r reference): ($w@pos = "NN") && ($r ^ $w)
Returns a list of 2-tuples of words and referring expressions where the word's part of speech is NN and the word is inside the referring expression.
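
To make the semantics of this query concrete, here is a rough Python emulation over invented toy data. It is not the NXT query engine: the dictionaries, the `dominates` helper, and the one-level containment check are assumptions for illustration (the real `^` operator tests dominance through the annotation structure).

```python
from itertools import product

# Toy data, invented for illustration: words with a part-of-speech tag.
words = [
    {"id": "w1", "text": "the", "pos": "DT"},
    {"id": "w2", "text": "remote", "pos": "NN"},
    {"id": "w3", "text": "works", "pos": "VBZ"},
    {"id": "w4", "text": "battery", "pos": "NN"},
]

# One referring expression covering "the remote".
references = [
    {"id": "r1", "words": ["w1", "w2"]},
]

def dominates(reference, word):
    """Stand-in for the query's '^' test: is the word inside the reference?"""
    return word["id"] in reference["words"]

# Bind every (word, reference) pair, then keep pairs passing both tests.
matches = [
    (w["id"], r["id"])
    for w, r in product(words, references)
    if w["pos"] == "NN" and dominates(r, w)
]

print(matches)
```

The word "battery" is tagged NN but is dropped because no referring expression contains it, and "the" is dropped because its tag is not NN, so only one tuple remains.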

General features of the language
- Match variable by no type, single type, or disjunctive type
- Attribute and content tests for existence, ordering, equality, and match to regexp
- The usual boolean combinators
- Quantifiers forall and exists
- Filtering by passing results to another query to create a result tree (not a list)

Uses for queries
- Exploring the data in a browser
- Basic frequency counts
- Verifying data quality
- Indexing complexes for further use
- Finding things for screen rendering in a GUI

Only configuration needed to:
- search/index data in NXT format
- display data in a standardized (ugly) way
- set up annotation tools for some common tasks:
  - dialogue act
  - named entity
  - time-stamped labelling

[named entity demo]

Programming tailored interfaces
Development time is 1.5 days to 2 weeks, depending on:
- how clear the spec is
- complexity of the interface, and whether our "transcription view" middleware fits
- familiarity with Swing

Named entity coder

Summary
NXT provides infrastructure for collaborative annotation that:
- is distributed
- provides structural relationships
- provides timing w.r.t. signals
- works for large-scale projects
NXT's best current demonstration is in the AMI Meeting Corpus.