Segmentation and Recognition of Meeting Events
M4 – Meeting Munich, 23 September 2004
Stephan Reiter
2/12 23.09.2004 M4 – Meeting Munich
Overview
- Meeting Event Recognition (MER) by user modelling
- MER from the audio signal
- MER from the binary speech profile
- MER from transcriptions
- Late semantic fusion of three recognisers
- Integration of two feature streams via DBNs
- Segmentation based on higher semantic features
3/12 23.09.2004 M4 – Meeting Munich
Meeting Event Recognition
Well-known meeting events: Discussion, Monologue 1, Monologue 2, Monologue 3, Monologue 4, Note-taking, Presentation, Whiteboard, (Consensus), (Disagreement)
Data: scripted meetings
4/12 23.09.2004 M4 – Meeting Munich
MER by User Modelling
Annotations → User state → Meeting event
5/12 23.09.2004 M4 – Meeting Munich
MER by User Modelling (cont.)
Definition of five states a participant can be in:
- sitting – silent
- sitting – silent – writing
- sitting – talking
- standing – talking
- standing – talking – writing
Annotations → User state → Meeting event
6/12 23.09.2004 M4 – Meeting Munich
MER by User Modelling (cont.)
Two-step approach based on annotations:
- From annotations to user states (features: talking, writing, sitting, standing), using SVMs: 97.75 %
- From user states to meeting events, using SVMs: 100.0 %
Annotations → User state → Meeting event
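A minimal sketch of such a two-step SVM classification, assuming per-participant binary annotation features and scikit-learn; the arrays and class counts below are placeholders, not the original M4 data:

```python
# Two-step classification with SVMs (a sketch, not the original M4 setup):
# step 1 maps per-participant annotation features to one of the five user
# states, step 2 maps the four participants' user states to a meeting event.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Step 1: annotation features -> user state (one row per participant per window)
X_annot = rng.integers(0, 2, size=(200, 4))   # [talking, writing, sitting, standing]
y_state = rng.integers(0, 5, size=200)        # 5 user states
state_clf = SVC(kernel="rbf").fit(X_annot, y_state)

# Step 2: user states of the 4 participants -> meeting event
X_states = rng.integers(0, 5, size=(60, 4))   # one state per participant
y_event = rng.integers(0, 10, size=60)        # 10 meeting events
event_clf = SVC(kernel="rbf").fit(X_states, y_event)

# Chain the two steps for one meeting window (4 participants):
states = state_clf.predict(X_annot[:4])
print(event_clf.predict(states.reshape(1, -1)))
```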
7/12 23.09.2004 M4 – Meeting Munich
MER from the Audio Signal
Using the single lapel-microphone files: 12 MFCCs, continuous HMMs with 6 states
Recognition rate: 78.69 %
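A sketch of this kind of audio recogniser, assuming librosa for the 12 MFCCs and hmmlearn for the 6-state continuous HMMs; one model per meeting event, classification by maximum log-likelihood. The `train_files` dictionary and file paths are hypothetical:

```python
# One 6-state continuous (Gaussian) HMM per meeting event over 12 MFCCs,
# classification by maximum log-likelihood over the trained models.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_features(path, n_mfcc=12):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, 12)

models = {}
for event, paths in train_files.items():          # train_files: event -> lapel recordings (hypothetical)
    feats = [mfcc_features(p) for p in paths]
    models[event] = GaussianHMM(n_components=6).fit(np.vstack(feats),
                                                    [len(f) for f in feats])

def classify(path):
    feats = mfcc_features(path)
    return max(models, key=lambda e: models[e].score(feats))
```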
8/12 23.09.2004 M4 – Meeting Munich
MER from the Binary Speech Profile
Using the speaker-turn detection results from IDIAP
Discrete HMMs, codebook with 64 entries, 32 states
Recognition rate: 81.97 %
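A sketch of the codebook-plus-discrete-HMM idea under assumed data shapes: the binary speech profile is vector-quantised into a 64-entry codebook with k-means, and a 32-state discrete HMM is trained on the symbol stream. The windowing, the synthetic data and the use of hmmlearn's CategoricalHMM (version 0.2.8 or later) are assumptions:

```python
# Vector-quantise the binary speech profile into 64 codebook entries, then
# train a 32-state discrete HMM on the resulting symbol sequence.
import numpy as np
from sklearn.cluster import KMeans
from hmmlearn.hmm import CategoricalHMM

rng = np.random.default_rng(0)
profile = rng.integers(0, 2, size=(1000, 4))       # per-frame speaker activity (synthetic)

window = 4                                         # group frames before quantising (assumption)
frames = profile[: len(profile) // window * window].reshape(-1, window * 4)

codebook = KMeans(n_clusters=64, n_init=10).fit(frames)
symbols = codebook.predict(frames).reshape(-1, 1)  # discrete observation sequence

model = CategoricalHMM(n_components=32, n_iter=20).fit(symbols)
print(model.score(symbols))                        # log-likelihood under this model
```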
9/12 23.09.2004 M4 – Meeting Munich
MER from Transcriptions
Using transcriptions from the media file server
1-state discrete HMM; all monologues put together
Recognition rate: 60.61 %
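A 1-state discrete HMM over the word sequence reduces to a per-class unigram model, since only the emission distribution remains. A minimal sketch with a hypothetical `train_transcripts` dictionary (event name to list of transcript strings):

```python
# A 1-state discrete HMM over words is just a per-class unigram model: only
# the emission probabilities matter, so classify by summed word log-likelihood.
import math
from collections import Counter

def train_unigram(transcripts):
    counts = Counter(w for t in transcripts for w in t.lower().split())
    total, vocab = sum(counts.values()), len(counts)
    # Laplace-smoothed log probability, so unseen words keep a small mass.
    return lambda w: math.log((counts.get(w, 0) + 1) / (total + vocab + 1))

models = {event: train_unigram(texts) for event, texts in train_transcripts.items()}

def classify(transcript):
    words = transcript.lower().split()
    return max(models, key=lambda e: sum(models[e](w) for w in words))
```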
10/12 23.09.2004 M4 – Meeting Munich
Late Semantic Fusion
Joining the results of three recognisers (all 10 meeting events):
- MER from annotations: 82.79 %
- MER from audio files: 68.03 %
- MER from transcriptions: 44.44 %
Simple rule-based fusion: if two or more results are equal, the fused result is that class; otherwise the result with the highest score is taken (see the sketch below).
Recognition rate after fusion: 86.07 %
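A direct implementation of that fusion rule; only the hypothesis representation (one label/score pair per recogniser) is an assumption:

```python
# Fusion rule from the slide: if two or more recognisers agree, take that
# class; otherwise take the hypothesis with the highest score.
from collections import Counter

def fuse(hypotheses):
    """hypotheses: list of (class_label, score), one per recogniser."""
    labels = [label for label, _ in hypotheses]
    winner, votes = Counter(labels).most_common(1)[0]
    if votes >= 2:
        return winner
    return max(hypotheses, key=lambda h: h[1])[0]

# Two recognisers agree on "discussion", so it wins despite the lower scores.
print(fuse([("discussion", 0.61), ("monologue 1", 0.90), ("discussion", 0.43)]))
```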
11/12 23.09.2004 M4 – Meeting Munich
MER using DBNs
Integration of two feature streams:
- Binary speech profile (5 Hz)
- Global motion features (12.5 Hz)
Recognition rate: 73.28 %
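The two streams run at different rates, so any joint model first needs them on a common time base. The sketch below shows only that alignment step (nearest-frame resampling to 12.5 Hz), not the DBN itself; the data and the motion feature dimension are synthetic, not the original M4 features:

```python
# Bring the 5 Hz speech profile and the 12.5 Hz motion features onto one
# 12.5 Hz time axis by nearest-frame lookup, then stack them frame by frame.
import numpy as np

rng = np.random.default_rng(0)
speech = rng.integers(0, 2, size=(50, 4))          # 10 s of 5 Hz speech profile
motion = rng.standard_normal((125, 7))             # 10 s of 12.5 Hz motion features

t = np.arange(len(motion)) / 12.5                  # target time axis at 12.5 Hz
idx = np.minimum((t * 5).astype(int), len(speech) - 1)
combined = np.hstack([speech[idx], motion])        # (125, 11) synchronous frames
print(combined.shape)
```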
12/12 23.09.2004 M4 – Meeting Munich
Segmentation Based on Higher Semantic Features
Benefits from speaker-turn detection and gesture recognition (81.76 %)
Segmentation via sliding windows
Results:
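A sketch of sliding-window segmentation: classify each window with one of the recognisers above and merge runs of identical labels into segments. The `classify_window` function, window length and step size are placeholders, not values from the slide:

```python
# Slide a window over the feature stream, label each window, and merge
# consecutive windows with the same label into one segment.
def segment(frames, classify_window, win=100, step=50):
    segments = []                                  # [start_frame, end_frame, event]
    for start in range(0, max(len(frames) - win, 0) + 1, step):
        label = classify_window(frames[start:start + win])
        if segments and segments[-1][2] == label:
            segments[-1][1] = start + win          # extend the current segment
        else:
            segments.append([start, start + win, label])
    return [tuple(s) for s in segments]
```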