Collaborative Annotation of the AMI Meeting Corpus
Jean Carletta
University of Edinburgh
Carletta20 June AMI Partners
NXT Major Development Sites
AMI's aim
  Aim: to develop technologies for browsing meetings and for assisting people during meetings
  Interdisciplinary: signal processing, language engineering, theoretical linguistics, human-computer interfaces, organizational psychology, ...
Why annotation?
  For basic scientific understanding, e.g.:
    How do people choose a next speaker?
    What is the relationship between speech and gesture during deixis?
  For machine learning:
    Hand-code the data, e.g. statement vs. question
    Identify features for each, such as word sequences and prosody
    Use the data to fit a statistical classifier that codes new data automatically
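The machine-learning workflow above can be sketched in a few lines. This is only an illustration under stated assumptions: the six hand-coded utterances are invented, the features are word unigrams and bigrams only (prosody is left out), and a small naive Bayes model stands in for whatever statistical classifier a project would actually use.

```python
import math
from collections import Counter, defaultdict

def features(utterance):
    """Word unigrams and bigrams, a stand-in for 'word sequence' features."""
    words = ["<s>"] + utterance.lower().split()
    return words[1:] + [f"{a}_{b}" for a, b in zip(words, words[1:])]

def train(examples):
    """Count features per label from hand-coded (utterance, label) pairs."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for utterance, label in examples:
        label_counts[label] += 1
        for f in features(utterance):
            feature_counts[label][f] += 1
            vocab.add(f)
    return label_counts, feature_counts, vocab

def classify(model, utterance):
    """Pick the label with the highest naive Bayes score (add-one smoothing)."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in features(utterance):
            score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented hand-coded data, for illustration only.
HAND_CODED = [
    ("do you think that works", "question"),
    ("what is the budget", "question"),
    ("can we move on", "question"),
    ("i think that works", "statement"),
    ("the budget is fixed", "statement"),
    ("we can move on", "statement"),
]

model = train(HAND_CODED)
new_codes = [classify(model, u) for u in ("what is the deadline", "the deadline is fixed")]
```

With real data the hand-coded set would come from the corpus annotations, and the fitted model would be evaluated against held-out human codes before being trusted on new material.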
AMI Meeting Rooms
  4 close- and 2 wide-view cameras, 4 head-set and 8 array microphones, presentation screen capture, whiteboard capture, pen devices, plus extra site-dependent devices
  TNO, Edinburgh, IDIAP
IS1004d, 3:07 - 4:11
Corpus Overview
  100 hrs of well-recorded meetings
  orthographically transcribed with word timings by forced alignment
  ASR output
  heavily annotated by hand for communicative behaviours
  Creative Commons Share-Alike licensing, with demo DVD
Hand Annotations
  transcription with word-level timings from forced alignment (100%)
  timestamping against signal (10-30%)
    head gestures; hand gestures for addressing and interactions with objects; location in room; gaze; emotion?
  discourse structure (70%)
    dialogue acts (some w/ addressing), named entities, topic segments, linked extractive and abstractive summaries
Costs in person-hrs/hr
  transcription                                30
  topic segments + abstractive summaries       6-10
  dialogue acts w/ some relations              20
  addressing                                   12
  extractive summaries linked to abstract      1
  named entities                               2-5
  hand gestures (rough timings)                6
  head gestures (rough timings)                6
  head gestures (precision timings)            20
  movement around room                         4
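To get a feel for what these rates mean at corpus scale, here is a rough worked calculation. It assumes effort scales linearly with the amount of recorded material, takes the 100-hour corpus size and the coverage figures from the corpus overview and hand annotation slides, and assumes the 70% discourse-structure coverage applies uniformly to dialogue acts. The function name is made up for this sketch.

```python
def person_hours(rate_per_hour, corpus_hours, coverage_percent=100):
    """Effort = rate (person-hrs per recorded hr) x hours x share actually annotated."""
    return rate_per_hour * corpus_hours * coverage_percent / 100

CORPUS_HOURS = 100  # corpus size quoted in the overview

# Transcription covers the whole corpus at 30 person-hrs/hr.
transcription = person_hours(30, CORPUS_HOURS)

# Dialogue acts at 20 person-hrs/hr, assuming the 70% coverage applies to them.
dialogue_acts = person_hours(20, CORPUS_HOURS, coverage_percent=70)

print(f"transcription: {transcription:.0f} person-hours")
print(f"dialogue acts: {dialogue_acts:.0f} person-hours")
```

Under these assumptions transcription alone comes to about 3000 person-hours and dialogue act coding to about 1400.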
Core Problems
  How do we represent all of these kinds of annotation on the same base data, including both structural relationships and timing?
  How do we allow for multiple (human and machine) annotations of the same property, so that we can compare them?
NITE XML Toolkit
  Mature toolkit for handling annotations with temporal ordering and full structural relations
  Data storage format designed to support distributed corpus development
  Libraries for data handling, query, and writing graphical user interfaces
  End user annotation tools for common tasks
  Command line utilities for analysis, feature extraction
  Open source
NXT corpus design
  data model is a multi-rooted tree with arbitrary graph structure over the top
    each node has one set of children, but can have multiple parents
  annotations often naturally map to a tree
  corpus design decides where the trees intersect
  NXT can represent arbitrary graphs, but the more the data has this character, the less useful the query language is
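A minimal sketch of the multi-rooted tree idea, assuming an invented four-word utterance and two invented annotation layers. The `Node` class and `roots` helper are illustrative only and are not NXT's API. Each node keeps a single ordered list of children, while the shared word nodes acquire a parent from each tree that covers them.

```python
class Node:
    """A node with one ordered list of children and any number of parents."""

    def __init__(self, kind, label):
        self.kind = kind
        self.label = label
        self.children = []
        self.parents = []

    def add_child(self, child):
        self.children.append(child)
        child.parents.append(self)

    def leaves(self):
        """Return the leaf nodes under this node, in order."""
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

def roots(node):
    """Labels of all root nodes reachable by following parent links upward."""
    if not node.parents:
        return {node.label}
    found = set()
    for parent in node.parents:
        found |= roots(parent)
    return found

# Shared base layer: the words.
words = [Node("word", w) for w in ["move", "the", "red", "pen"]]

# First tree: a dialogue act covering the whole utterance.
act = Node("dialogue-act", "instruct")
for w in words:
    act.add_child(w)

# Second tree: a named entity covering only the last three words.
entity = Node("named-entity", "object")
for w in words[1:]:
    entity.add_child(w)

entity_text = " ".join(leaf.label for leaf in entity.leaves())
pen_roots = roots(words[3])   # the word "pen" sits under both trees
move_roots = roots(words[0])  # the word "move" sits under the dialogue act only
```

Here the two trees intersect at the word layer, which is the kind of decision the corpus design has to make.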
Stand-off XML
  [Figure: extract from Bdb001.A.words.xml and extract from Bdb001.A.speech-quality.xml, shown against a time line]
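The stand-off idea can be illustrated with two small XML fragments, one holding timed words and one holding an annotation that points at them by id. The element and attribute names below (`w`, `segment`, `first`, `last`) and the data are invented for this sketch. They are a simplification and should not be read as NXT's actual file format.

```python
import xml.etree.ElementTree as ET

# Base layer: words with timings (invented data, simplified format).
WORDS_XML = """
<words>
  <w id="w1" start="0.00" end="0.21">so</w>
  <w id="w2" start="0.21" end="0.55">the</w>
  <w id="w3" start="0.55" end="1.02">remote</w>
</words>
"""

# Stand-off layer: refers to words by id instead of copying them.
QUALITY_XML = """
<speech-quality>
  <segment id="q1" label="emphasis" first="w2" last="w3"/>
</speech-quality>
"""

def resolve(words_root, segment):
    """Return the word elements spanned by a stand-off segment."""
    words = list(words_root.iter("w"))
    ids = [w.get("id") for w in words]
    i = ids.index(segment.get("first"))
    j = ids.index(segment.get("last"))
    return words[i:j + 1]

words_root = ET.fromstring(WORDS_XML.strip())
quality_root = ET.fromstring(QUALITY_XML.strip())

segment = quality_root.find("segment")
covered = resolve(words_root, segment)

text = " ".join(w.text for w in covered)
start = float(covered[0].get("start"))  # timing is read from the words,
end = float(covered[-1].get("end"))     # not stored on the segment itself
```

Because the second file only points into the first, different sites can add or revise annotation layers without touching the transcription file.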
Metadata file
  Like a set of DTDs for the XML files, plus:
    connections between the files
    list of "observations" (coded dialogues/group discussions/texts)
    catalog for finding signals and data on disk
Simple example query
  ($w word)($r reference): = “NN”) && ($r ^ $w)
  Return list of 2-tuples of words and referring expressions where the word’s part of speech is NN and the word is in the referring expression.
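What the query returns can be emulated over toy data in plain Python. This sketch follows the description above (pairs of a word and a referring expression where the word is tagged NN and lies inside the expression). The dictionaries, ids, and the `query` function are invented for illustration, and this is not how NXT evaluates queries internally.

```python
from itertools import product

# Invented toy data: four words and one referring expression over w1-w2.
words = [
    {"id": "w1", "text": "the", "pos": "DT"},
    {"id": "w2", "text": "remote", "pos": "NN"},
    {"id": "w3", "text": "works", "pos": "VBZ"},
    {"id": "w4", "text": "battery", "pos": "NN"},
]
references = [
    {"id": "r1", "word_ids": ["w1", "w2"]},
]

def query(words, references):
    """Return (word id, reference id) pairs: word is NN and inside the reference."""
    return [
        (w["id"], r["id"])
        for w, r in product(words, references)  # every binding of both variables
        if w["pos"] == "NN" and w["id"] in r["word_ids"]
    ]

matches = query(words, references)
```

The word `w4` is also tagged NN but is not inside any referring expression, so it does not appear in the result.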
General features of the language
  Match variable by no type, single type, or disjunctive type
  Attribute and content tests for existence, ordering, equality, match to regexp
  The usual boolean combinators
  Quantifiers forall and exists
  Filtering by passing results to another query to create a result tree (not list)
Uses for queries
  Exploring the data in a browser
  Basic frequency counts
  Verifying data quality
  Indexing complexes for further use
  Finding things for screen rendering in GUI
Only configuration needed to:
  search/index data in NXT format
  display data in a standardized (ugly) way
  set up annotation tools for some common tasks
    dialogue act
    named entity
    time-stamped labelling
[named entity demo]
Programming tailored interfaces
  development time is 1.5 days - 2 weeks, depending on
    how clear the spec is
    complexity of the interface and whether our "transcription view" middleware fits
    familiarity with Swing
Named entity coder
Summary
  NXT provides infrastructure for collaborative annotation that
    is distributed
    provides structural relationships
    provides timing w.r.t. signals
    works for large-scale projects
  NXT’s best current demonstration is in the AMI Meeting Corpus