1
gesture features for coreference
Jacob Eisenstein and Randall Davis, MIT CSAIL
2
coreference resolution
when do two noun phrases refer to the same thing? "This circle is rotating clockwise and this piece of wood is attached at this point and this point but it can rotate. So as the circle rotates, this moves in and out. So this whole thing is just going back and forth."
4
coreference resolution
Are “This Wheel,” “This Bar,” and “This” the same?
5
coreference resolution
Traditional coreference resolution sees only surface features: “This Wheel” and “This Bar” are demonstrative NPs (singular, neutral gender), and “This” is a pronoun (singular, neutral gender). Multimodal coreference resolution brings gesture to bear on the same question: are they the same?
6
coreference annotated cheaply and reliably
a building block for NLP applications: summarization, segmentation, information retrieval
7
coreference and catchments
a catchment: recurring gesture features that match semantic patterns
when gesture features disambiguate coreference, we have a catchment
so studying coreference gives a quantitative analysis of catchments
8
dataset: a new corpus of spontaneous multimodal communication
nine speaker-listener pairs
explanations of mechanical device behavior
manipulation: which modalities are available, speech + {diagram | sketch | gesture only}
for this study, speech + diagram only: more deixis, easier to interpret
a total of 16 documents, 2-3 minutes in length
9
tracking hand position
motion, color, and edge cues guide an articulated upper-body model (13 DOF, 2.5D)
10
particle filtering: online search over model configurations
a sampled representation maintains multiple hypotheses; at each time step:
update weights based on the new observation
resample particles (with replacement)
“drift” to capture system dynamics
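To make the update loop concrete, here is a minimal sketch of one step of a generic particle filter, following the three operations listed above. The drift and likelihood functions are placeholders for the system dynamics and the image-cue scoring; this is an illustrative sketch, not the tracker used in this work.

```python
import numpy as np

def particle_filter_step(particles, weights, observation, drift, likelihood):
    """One time step of a generic particle filter (illustrative only).

    particles:  (N, D) array of candidate model configurations
    weights:    (N,) importance weights
    drift:      function applying system dynamics plus noise to particles
    likelihood: function scoring one particle against the new observation
    """
    # 1. Update weights based on the new observation
    #    (e.g. motion, color, and edge cues).
    weights = weights * np.array([likelihood(p, observation) for p in particles])
    weights = weights / weights.sum()

    # 2. Resample particles (with replacement) in proportion to their weights,
    #    dropping unlikely hypotheses and duplicating likely ones.
    idx = np.random.choice(len(particles), size=len(particles),
                           replace=True, p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))

    # 3. "Drift": propagate each hypothesis through the system dynamics.
    particles = drift(particles)

    return particles, weights
```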
11
extracted data
position, velocity, and acceleration of hands, arms, body, and head
occlusion is modeled directly
manually annotated speech transcripts, force-aligned for time synchronization
coreference annotations
12
gesture features
features on pairs of gestures: to predict coreference
features on individual gestures: to predict whether an NP introduces a new entity, and whether gesture is relevant to coreference
13
features on pairs of gestures
distance between gestures
is the same hand gesturing?
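As an illustration, here is one way such pairwise features might be computed from tracked hand positions; the gesture representation here (a mean hand position plus an active-hand label) is an assumption made for the sketch, not the feature extraction used in the study.

```python
import numpy as np

def pairwise_gesture_features(gesture_a, gesture_b):
    """Illustrative pairwise features for two gestures.

    Each gesture is assumed to be a dict with:
      "positions": (T, 2) array of tracked hand positions (pixels)
      "hand":      "left", "right", or None if no gesture
    """
    pos_a = np.asarray(gesture_a["positions"]).mean(axis=0)
    pos_b = np.asarray(gesture_b["positions"]).mean(axis=0)

    # Distance between the two gestures' focus positions, in pixels.
    distance = float(np.linalg.norm(pos_a - pos_b))

    # Is the same hand gesturing in both cases?
    same_hand = (gesture_a["hand"] is not None
                 and gesture_a["hand"] == gesture_b["hand"])

    return {"distance": distance, "same_hand": same_hand}
```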
14
features on individual gestures
speed
jitter
“purpose” = speed / jitter
bimanual synchronization
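A sketch of how these single-gesture features could be computed from one hand trajectory; the particular definition of jitter (mean acceleration magnitude) and the constant frame rate are assumptions, not the authors' exact formulation.

```python
import numpy as np

def single_gesture_features(positions, dt=1.0):
    """Illustrative single-gesture features from one hand trajectory.

    positions: (T, 2) array of hand positions (pixels)
    dt:        time between frames (seconds), assumed constant here
    """
    positions = np.asarray(positions, dtype=float)
    velocity = np.diff(positions, axis=0) / dt

    # Speed: average magnitude of the frame-to-frame velocity.
    speed = float(np.linalg.norm(velocity, axis=1).mean())

    # Jitter: here taken as the average acceleration magnitude,
    # i.e. how erratically the velocity changes (one possible definition).
    acceleration = np.diff(velocity, axis=0) / dt
    jitter = float(np.linalg.norm(acceleration, axis=1).mean())

    # "Purposefulness": fast but smooth motion scores high.
    purpose = speed / jitter if jitter > 0 else float("inf")

    return {"speed": speed, "jitter": jitter, "purpose": purpose}
```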
15
results: pairwise features
distance between gestures (pixels):
  coreferent:     mean distance = 48.4
  non-coreferent: mean distance = 74.8

which hand is used? (%)
                   same hand   different hands   no gesture
  coreferent          59.9          19.9             20.2
  non-coreferent      52.8          22.2             25.1
16
results: single-gesture features
does the NP have “parents”? not predicted by these features
does the NP have “children”? predicted by speed and purpose
17
results: meta-features
correlate single-gesture features with the discriminability of pairwise distance:
speed, purpose (r = -.17)
x distance from body center (r = .22)
regression over single-gesture features (r = .42)
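For concreteness, a hypothetical version of this meta-feature analysis, assuming a per-gesture discriminability score has already been computed; the definition of that score and the regression setup are illustrative only, not the analysis reported here.

```python
import numpy as np
from scipy.stats import pearsonr

def meta_feature_analysis(single_feats, discriminability):
    """Correlate single-gesture features with how discriminative the
    pairwise-distance feature is for each gesture (illustrative only).

    single_feats:     (N, K) array, one row of features per gesture
                      (e.g. speed, purpose, x distance from body center)
    discriminability: (N,) array, some per-gesture measure of how well
                      pairwise distance separates coreferent from
                      non-coreferent pairs involving that gesture
    """
    # Per-feature Pearson correlations with discriminability.
    correlations = [pearsonr(single_feats[:, k], discriminability)[0]
                    for k in range(single_feats.shape[1])]

    # Linear regression over all single-gesture features, and the
    # correlation of its predictions with the target.
    X = np.column_stack([single_feats, np.ones(len(single_feats))])
    coefs, *_ = np.linalg.lstsq(X, discriminability, rcond=None)
    r_regression = pearsonr(X @ coefs, discriminability)[0]

    return correlations, r_regression
```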
18
when do catchments happen?
what types of NP coreference are disambiguated by gesture? we assumed pronouns and “this.” not so: definite NPs are not predicted well by gesture
19
when do catchments happen?
there is a lot of research on gesture-speech synchronization, which typically measures timing at the beginning of motion; this offers a different way to measure gesture-speech synchronization quite precisely
20
where do catchments happen?
21
future work: move beyond deictic data and features
we have data without diagrams, which includes more representational gestures
recognize or annotate hand shape
pairwise features that compare gesture trajectories
22
done? almost
23
does gesture actually improve coreference resolution?
initial evaluation described in NAACL 2006
the answer is yes, but not by as much as you'd hope: 54.9% with gestures, 52.8% without
coreference resolution in spoken dialogues is hard
better feature combination techniques may improve performance, as with prosody
need to figure out how to use the meta-features
24
All Done! Thank You