Segmentation and Recognition of Meeting Events
M4 – Meeting Munich, 23 September 2004
Stephan Reiter
2/12 23.09.2004 M4 – Meeting Munich
Overview
- Meeting Event Recognition (MER) by user modelling
- MER from the audio signal
- MER from the binary speech profile
- MER from transcriptions
- Late semantic fusion of three recognisers
- Integration of two feature streams via DBNs
- Segmentation based on higher semantic features
3/12 23.09.2004 M4 – Meeting Munich
Meeting Event Recognition
Well-known meeting events: Discussion, Monologue 1, Monologue 2, Monologue 3, Monologue 4, Note-taking, Presentation, Whiteboard, (Consensus), (Disagreement)
Data: scripted meetings
4/12 23.09.2004 M4 – Meeting Munich
MER by User Modelling
Annotations → User state → Meeting event
5/12 23.09.2004 M4 – Meeting Munich
MER by User Modelling (cont.)
Definition of five states a participant can be in:
- sitting – silent
- sitting – silent – writing
- sitting – talking
- standing – talking
- standing – talking – writing
Annotations → User state → Meeting event
6/12 23.09.2004 M4 – Meeting Munich
MER by User Modelling (cont.)
Two-step approach based on annotations:
- From annotations to user states (features: talking, writing, sitting, standing), using SVMs: 97.75 %
- From user states to meeting events, using SVMs: 100.0 %
Annotations → User state → Meeting event
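A minimal sketch of such a two-step SVM classification, assuming per-participant binary annotation features and scikit-learn; the arrays and class counts below are placeholders, not the original M4 data:

```python
# Two-step classification with SVMs (a sketch, not the original M4 setup):
# step 1 maps per-participant annotation features to one of the five user
# states, step 2 maps the four participants' user states to a meeting event.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Step 1: annotation features -> user state (one row per participant per window)
X_annot = rng.integers(0, 2, size=(200, 4))   # [talking, writing, sitting, standing]
y_state = rng.integers(0, 5, size=200)        # 5 user states
state_clf = SVC(kernel="rbf").fit(X_annot, y_state)

# Step 2: user states of the 4 participants -> meeting event
X_states = rng.integers(0, 5, size=(60, 4))   # one state per participant
y_event = rng.integers(0, 10, size=60)        # 10 meeting events
event_clf = SVC(kernel="rbf").fit(X_states, y_event)

# Chain the two steps for one meeting window (4 participants):
states = state_clf.predict(X_annot[:4])
print(event_clf.predict(states.reshape(1, -1)))
```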
7/12 23.09.2004 M4 – Meeting Munich
MER from the Audio Signal
Using the single lapel-microphone files: 12 MFCCs, continuous HMMs with 6 states
Recognition rate: 78.69 %
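A sketch of this kind of audio recogniser, assuming librosa for the 12 MFCCs and hmmlearn for the 6-state continuous HMMs; one model per meeting event, classification by maximum log-likelihood. The `train_files` dictionary and file paths are hypothetical:

```python
# One 6-state continuous (Gaussian) HMM per meeting event over 12 MFCCs,
# classification by maximum log-likelihood over the trained models.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_features(path, n_mfcc=12):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, 12)

models = {}
for event, paths in train_files.items():          # train_files: event -> lapel recordings (hypothetical)
    feats = [mfcc_features(p) for p in paths]
    models[event] = GaussianHMM(n_components=6).fit(np.vstack(feats),
                                                    [len(f) for f in feats])

def classify(path):
    feats = mfcc_features(path)
    return max(models, key=lambda e: models[e].score(feats))
```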
8/12 23.09.2004 M4 – Meeting Munich
MER from the Binary Speech Profile
Using the speaker-turn detection results from IDIAP
Discrete HMMs, codebook with 64 entries, 32 states
Recognition rate: 81.97 %
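A sketch of the codebook-plus-discrete-HMM idea under assumed data shapes: the binary speech profile is vector-quantised into a 64-entry codebook with k-means, and a 32-state discrete HMM is trained on the symbol stream. The windowing, the synthetic data and the use of hmmlearn's CategoricalHMM (version 0.2.8 or later) are assumptions:

```python
# Vector-quantise the binary speech profile into 64 codebook entries, then
# train a 32-state discrete HMM on the resulting symbol sequence.
import numpy as np
from sklearn.cluster import KMeans
from hmmlearn.hmm import CategoricalHMM

rng = np.random.default_rng(0)
profile = rng.integers(0, 2, size=(1000, 4))       # per-frame speaker activity (synthetic)

window = 4                                         # group frames before quantising (assumption)
frames = profile[: len(profile) // window * window].reshape(-1, window * 4)

codebook = KMeans(n_clusters=64, n_init=10).fit(frames)
symbols = codebook.predict(frames).reshape(-1, 1)  # discrete observation sequence

model = CategoricalHMM(n_components=32, n_iter=20).fit(symbols)
print(model.score(symbols))                        # log-likelihood under this model
```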
9/12 23.09.2004 M4 – Meeting Munich
MER from Transcriptions
Using transcriptions from the media file server
1-state discrete HMM; all monologues put together
Recognition rate: 60.61 %
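A 1-state discrete HMM over the word sequence reduces to a per-class unigram model, since only the emission distribution remains. A minimal sketch with a hypothetical `train_transcripts` dictionary (event name to list of transcript strings):

```python
# A 1-state discrete HMM over words is just a per-class unigram model: only
# the emission probabilities matter, so classify by summed word log-likelihood.
import math
from collections import Counter

def train_unigram(transcripts):
    counts = Counter(w for t in transcripts for w in t.lower().split())
    total, vocab = sum(counts.values()), len(counts)
    # Laplace-smoothed log probability, so unseen words keep a small mass.
    return lambda w: math.log((counts.get(w, 0) + 1) / (total + vocab + 1))

models = {event: train_unigram(texts) for event, texts in train_transcripts.items()}

def classify(transcript):
    words = transcript.lower().split()
    return max(models, key=lambda e: sum(models[e](w) for w in words))
```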
10/12 23.09.2004 M4 – Meeting Munich
Late Semantic Fusion
Joining the results of three recognisers (all 10 meeting events):
- MER from annotations: 82.79 %
- MER from audio files: 68.03 %
- MER from transcriptions: 44.44 %
Simple rule-based fusion: if two or more results are equal, the fused result is that class; otherwise the result with the highest score is taken (see the sketch below).
Recognition rate after fusion: 86.07 %
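A direct implementation of that fusion rule; only the hypothesis representation (one label/score pair per recogniser) is an assumption:

```python
# Fusion rule from the slide: if two or more recognisers agree, take that
# class; otherwise take the hypothesis with the highest score.
from collections import Counter

def fuse(hypotheses):
    """hypotheses: list of (class_label, score), one per recogniser."""
    labels = [label for label, _ in hypotheses]
    winner, votes = Counter(labels).most_common(1)[0]
    if votes >= 2:
        return winner
    return max(hypotheses, key=lambda h: h[1])[0]

# Two recognisers agree on "discussion", so it wins despite the lower scores.
print(fuse([("discussion", 0.61), ("monologue 1", 0.90), ("discussion", 0.43)]))
```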
11/12 23.09.2004 M4 – Meeting Munich
MER using DBNs
Integration of two feature streams:
- Binary speech profile (5 Hz)
- Global motion features (12.5 Hz)
Recognition rate: 73.28 %
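The two streams run at different rates, so any joint model first needs them on a common time base. The sketch below shows only that alignment step (nearest-frame resampling to 12.5 Hz), not the DBN itself; the data and the motion feature dimension are synthetic, not the original M4 features:

```python
# Bring the 5 Hz speech profile and the 12.5 Hz motion features onto one
# 12.5 Hz time axis by nearest-frame lookup, then stack them frame by frame.
import numpy as np

rng = np.random.default_rng(0)
speech = rng.integers(0, 2, size=(50, 4))          # 10 s of 5 Hz speech profile
motion = rng.standard_normal((125, 7))             # 10 s of 12.5 Hz motion features

t = np.arange(len(motion)) / 12.5                  # target time axis at 12.5 Hz
idx = np.minimum((t * 5).astype(int), len(speech) - 1)
combined = np.hstack([speech[idx], motion])        # (125, 11) synchronous frames
print(combined.shape)
```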
12/12 23.09.2004 M4 – Meeting Munich
Segmentation Based on Higher Semantic Features
Benefits from speaker-turn detection and gesture recognition (81.76 %)
Segmentation via sliding windows
Results:
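A sketch of sliding-window segmentation: classify each window with one of the recognisers above and merge runs of identical labels into segments. The `classify_window` function, window length and step size are placeholders, not values from the slide:

```python
# Slide a window over the feature stream, label each window, and merge
# consecutive windows with the same label into one segment.
def segment(frames, classify_window, win=100, step=50):
    segments = []                                  # [start_frame, end_frame, event]
    for start in range(0, max(len(frames) - win, 0) + 1, step):
        label = classify_window(frames[start:start + win])
        if segments and segments[-1][2] == label:
            segments[-1][1] = start + win          # extend the current segment
        else:
            segments.append([start, start + win, label])
    return [tuple(s) for s in segments]
```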