
1 Modeling individual and group actions in meetings with layered HMMs
Dong Zhang, Daniel Gatica-Perez, Samy Bengio, Iain McCowan, Guillaume Lathoud
IDIAP Research Institute, Martigny, Switzerland

2 meetings as sequences of actions
– human interaction: similar/complementary roles; individuals constrained by the group
– agenda as a prior sequence: discussion points, presentations, decisions to be made
– minutes as a posterior sequence: key phases, summarized discussions, decisions made

3 the goal: recognizing sequences of meeting actions
[timeline figure: a meeting viewed along parallel tracks such as discussion phase (presentation, group discussion), topic (weather, budget), group interest level (high, neutral), and group task (information sharing, decision making)]
meeting views: group-level actions = meeting actions

4 our work: two-layer HMMs
decompose the recognition problem; both layers use HMMs
– individual action layer (I-HMM): various models
– group action layer (G-HMM)
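As an illustration of the decomposition (not the authors' code), the sketch below assumes per-person I-HMMs whose action probabilities are stacked into the G-HMM observation vector; all function names, the window length, and the model interface are hypothetical.

```python
# Minimal sketch of the two-layer decomposition (illustrative, hypothetical
# names): an I-HMM per individual action scores each person's audio-visual
# features, and the per-person probability vectors are concatenated into the
# observation sequence for the group-level G-HMM.
import numpy as np

def individual_layer(person_features, i_hmm_models):
    """person_features: (T, D) array for one person.
    i_hmm_models: dict {action_name: model with a score(window) -> log-lik}.
    Returns a (T, num_actions) array of per-frame action probabilities."""
    T = len(person_features)
    probs = np.zeros((T, len(i_hmm_models)))
    for t in range(T):
        window = person_features[max(0, t - 10):t + 1]        # short sliding window
        log_liks = np.array([m.score(window) for m in i_hmm_models.values()])
        log_liks -= log_liks.max()                             # numerical stability
        probs[t] = np.exp(log_liks) / np.exp(log_liks).sum()   # soft decision
    return probs

def group_layer_observations(all_person_probs, group_features):
    """Concatenate every person's I-HMM outputs with group-level features
    (e.g. whiteboard/projector activity) to form the G-HMM observations."""
    return np.hstack(all_person_probs + [group_features])
```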

5 our work in detail
1. definition of meeting actions
2. audio-visual observations
3. action recognition
4. results
references: D. Zhang et al., "Modeling Individual and Group Actions in Meetings with Layered HMMs", IEEE CVPR Workshop on Event Mining, 2004; N. Oliver et al., ICMI 2002; I. McCowan et al., ICASSP 2003 and PAMI 2005.

6 1. defining meeting actions
multiple parallel views
– tech-based: what can we recognize?
– application-based: respond to user needs
– psychology-based: coding schemes from social psychology
each view defines a set of actions A = { A1, A2, A3, A4, …, AN }
actions in a set are
– consistent: one view, answering one question
– mutually exclusive
– exhaustive

7 multi-modal turn-taking
describes the group discussion state
group actions A = { 'discussion', 'monologue' (x4), 'white-board', 'presentation', 'note-taking', 'monologue + note-taking' (x4), 'white-board + note-taking', 'presentation + note-taking' }
individual actions I = { 'speaking', 'writing', 'idle' }
actions are multi-modal in nature
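For concreteness, the two vocabularies can be transcribed directly from the slide; expanding the "(x4)" entries to one action per participant (following the "monologue1" numbering used in the example slide) is an assumption about notation, not new content.

```python
# The two action vocabularies of the turn-taking view, transcribed from the
# slide; the "(x4)" entries are expanded to one action per participant.
INDIVIDUAL_ACTIONS = ["speaking", "writing", "idle"]

GROUP_ACTIONS = (
    ["discussion", "white-board", "presentation", "note-taking",
     "white-board + note-taking", "presentation + note-taking"]
    + [f"monologue{i}" for i in range(1, 5)]
    + [f"monologue{i} + note-taking" for i in range(1, 5)]
)
```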

8 example
[timeline figure: each person's individual actions over time (S = speaking, W = writing), together with whiteboard and presentation usage tracks, and the resulting group-action track: discussion, monologue1 + note-taking, presentation + note-taking, whiteboard + note-taking]

9 2. audio-visual observations
audio: 12 channels at 48 kHz (4 lapel microphones + 1 microphone array)
video: 3 CCTV cameras
all streams synchronized

10 multimodal feature extraction: audio
microphone array
– speech activity (SRP-PHAT) at the seats and the presentation/whiteboard area
– speech/silence segmentation
lapel microphones
– speech pitch
– speech energy
– speaking rate
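A minimal sketch of the lapel-channel prosodic features, assuming librosa for energy and pitch; the SRP-PHAT localization on the microphone array and the speaking-rate estimate are not shown, and the 0.2 s hop is chosen only to match the 5 frames/s feature rate mentioned later.

```python
# Sketch of lapel-microphone prosodic features (RMS energy, pitch) with
# librosa; array processing (SRP-PHAT) and speaking rate are omitted.
import librosa
import numpy as np

def lapel_features(wav_path, sr=16000, hop=0.2):
    """Return a (T, 2) array of [RMS energy, pitch] at `hop`-second frames."""
    y, sr = librosa.load(wav_path, sr=sr)
    hop_length = int(hop * sr)
    energy = librosa.feature.rms(y=y, frame_length=2 * hop_length,
                                 hop_length=hop_length)[0]
    pitch = librosa.yin(y, fmin=60, fmax=400, sr=sr,
                        frame_length=2 * hop_length, hop_length=hop_length)
    T = min(len(energy), len(pitch))
    return np.stack([energy[:T], pitch[:T]], axis=1)
```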

11 multimodal feature extraction: video
head + hands blobs
– skin colour models (GMM)
– head position
– hands position + features (eccentricity, size, orientation)
– head + hands blob motion
moving blobs from background subtraction
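A sketch of the skin-colour blob step, assuming a GMM from scikit-learn and OpenCV for connected components; the colour space, likelihood threshold, and minimum blob area are illustrative assumptions, and the motion features from background subtraction are omitted.

```python
# Sketch of skin-colour blob detection: a GMM trained on labelled skin pixels
# scores every pixel of a frame, and connected skin regions give candidate
# head/hand blob positions. Thresholds here are placeholders.
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture

def train_skin_gmm(skin_pixels_rgb, n_components=3):
    """skin_pixels_rgb: (N, 3) training pixels cropped from labelled skin."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(skin_pixels_rgb.astype(float))
    return gmm

def skin_blobs(frame_bgr, gmm, log_lik_threshold=-12.0, min_area=50):
    """Return centroids (x, y) of connected skin-coloured regions."""
    pixels = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB).reshape(-1, 3).astype(float)
    mask = (gmm.score_samples(pixels) > log_lik_threshold)
    mask = mask.reshape(frame_bgr.shape[:2]).astype(np.uint8) * 255
    num, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    return [tuple(centroids[i]) for i in range(1, num)
            if stats[i, cv2.CC_STAT_AREA] > min_area]
```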

12 3. recognition with two-layer HMM
each layer trained independently, as in ASR (using Torch)
simultaneous segmentation and recognition
advantages over a single-layer HMM:
– smaller observation spaces
– I-HMM trained with much more data
– G-HMM less sensitive to feature variations
– combinations can be explored
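The paper trains each layer independently with the Torch ASR toolkit; purely as an illustrative stand-in, the sketch below uses hmmlearn to show what "one HMM per action class, trained only on that class's sequences" could look like.

```python
# Illustrative stand-in for independent training of the two layers (the paper
# used the Torch speech toolkit; hmmlearn is used here only as a sketch).
import numpy as np
from hmmlearn import hmm

def train_action_hmm(sequences, n_states=3):
    """Train one Gaussian HMM on the feature sequences of a single action class."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

# Individual layer: one I-HMM per individual action ('speaking', 'writing',
# 'idle'), trained on per-person AV features. Group layer: one G-HMM per group
# action, trained on the I-HMM output vectors plus group-level features.
```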

13 models for I-HMM
early integration
– all observations concatenated
– models correlation between streams
– assumes frame-synchronous streams
asynchronous HMM (Bengio, NIPS 2002)
– audio and visual streams share a single state sequence
– states emit on one or both streams, given a synchronization variable
– allows inter-stream asynchrony
multi-stream HMM (Dupont, TMM 2000)
– one HMM per stream (audio or visual), trained independently
– decoding: weighted likelihoods combined at each frame
– allows little inter-stream asynchrony
– used in multi-band and audio-visual ASR
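To make the multi-stream decoding step concrete: each stream's HMMs produce per-frame log-likelihoods, which are merged with fixed stream weights before decoding. A minimal sketch; the weight values are placeholders, not values from the paper.

```python
# Sketch of the multi-stream combination: audio and visual HMMs are trained
# separately and their per-frame log-likelihoods are merged with stream
# weights at each frame before decoding.
import numpy as np

def combine_streams(audio_log_lik, video_log_lik, w_audio=0.5, w_video=0.5):
    """Each input: (T, n_models) array of frame log-likelihoods per model."""
    return w_audio * np.asarray(audio_log_lik) + w_video * np.asarray(video_log_lik)
```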

14 linking the two layers
hard decision: the individual-action model with the highest probability outputs 1; all other models output 0
soft decision: outputs a probability for each individual action model
example on audio-visual features: HD = (1, 0, 0), SD = (0.9, 0.05, 0.05)
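A small sketch of the two linking strategies, matching the HD/SD example on the slide; the softmax-style normalization is an assumption about how the soft probabilities are formed from model log-likelihoods.

```python
# Sketch of hard vs. soft linking: the I-HMM log-likelihoods for one person
# become either a one-hot vector (hard decision) or a normalized probability
# vector (soft decision) that feeds the G-HMM.
import numpy as np

def link_outputs(log_liks, soft=True):
    """log_liks: per-action-model log-likelihoods, e.g. 3 values for
    ('speaking', 'writing', 'idle')."""
    log_liks = np.asarray(log_liks, dtype=float)
    if soft:
        p = np.exp(log_liks - log_liks.max())   # softmax-style normalization
        return p / p.sum()                      # e.g. (0.9, 0.05, 0.05)
    hard = np.zeros_like(log_liks)
    hard[np.argmax(log_liks)] = 1.0             # e.g. (1, 0, 0)
    return hard
```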

15 4. experiments: data + setup
59 meetings (30/29 train/test): four-person, five-minute scripted meetings
– schedule of actions
– natural behavior
features extracted at 5 frames/s
dataset: mmm.idiap.ch

16 performance measures
individual actions: frame error rate (FER)
group actions: action error rate (AER) = (Subs + Del + Ins) / Total actions × 100%
– Subs: number of substituted actions
– Del: number of deleted actions
– Ins: number of inserted actions
– Total actions: number of target actions
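AER mirrors word error rate: the recognized action sequence is aligned to the reference by edit distance, and the substitution, deletion, and insertion counts are summed. A minimal sketch:

```python
# Sketch of the action error rate: align hypothesis to reference with edit
# distance; AER = (substitutions + deletions + insertions) / reference length.
def action_error_rate(reference, hypothesis):
    """reference, hypothesis: lists of action labels."""
    n, m = len(reference), len(hypothesis)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # deletions
    for j in range(m + 1):
        d[0][j] = j                      # insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[n][m] / max(n, 1)
```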

17 results: individual actions
test data: 43,000 frames (0.2–2.2 s)
[chart comparing visual-only, audio-only, and audio-visual models (0.8, 0.2)]
audio-visual modelling captures asynchronous effects between modalities
accuracy: speaking 96.6%, writing 90.8%, idle 81.5%

18 results: group actions
– multi-modality outperforms single modalities
– two-layer HMM outperforms single-layer HMM for audio-only, visual-only, and audio-visual
– best model: A-HMM
– soft decision slightly better than hard decision
– 8% improvement, significant at the 96% level

19 action-based meeting structuring

20 conclusions
structuring meetings as sequences of meeting actions
– layered HMMs successful for recognition
– turn-taking patterns: useful for browsing
– public dataset, standard evaluation procedures
open issues
– learning with less training data (unsupervised; ACM MM 2004)
– other relevant actions (interest level; ICASSP 2005)
– other features (words, emotions)
– efficient models for many interacting streams

21 Linking Two Layers (1)

22 Linking Two Layers (2): Normalization
Please refer to: D. Zhang et al., "Modeling Individual and Group Actions in Meetings: A Two-Layer HMM Framework", IEEE CVPR Workshop on Event Mining, 2004.

