Segmentation and Recognition of Meeting Events
M4 – Meeting Munich, 23 September 2004
Stephan Reiter


2/ Overview
- Meeting Event Recognition (MER) by user modelling
- MER from the audio signal
- MER from a binary speech profile
- MER from transcriptions
- Late semantic fusion of three recognisers
- Integration of two feature streams via DBNs
- Segmentation based on higher semantic features

3/ Meeting Event Recognition
Well-known meeting events:
- Discussion
- Monologue 1
- Monologue 2
- Monologue 3
- Monologue 4
- Note-taking
- Presentation
- Whiteboard
- (Consensus)
- (Disagreement)
Data: scripted meetings

4/ MER by User Modelling
Processing chain: Annotations → User state → Meeting event

5/ MER by User Modelling (cont.)
Definition of five states a participant can be in:
- sitting – silent
- sitting – silent – writing
- sitting – talking
- standing – talking
- standing – talking – writing
Processing chain: Annotations → User state → Meeting event

6/ MER by User Modelling (cont.)
Two-step approach based on annotations:
- From annotations to user states: features talking, writing, sitting, standing; classified with SVMs ( %)
- From user states to meeting events: classified with SVMs ( %)
Processing chain: Annotations → User state → Meeting event
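The two-step mapping above can be sketched as follows. The slide's SVM classifiers are replaced here by hand-written illustrative rules; only the feature names (talking, writing, sitting, standing) and the five user states come from the slides, everything else is an assumption.

```python
# Sketch of the two-step approach: annotations -> user states -> meeting event.
# The trivial rules below stand in for the trained SVMs on the slide.

def annotations_to_user_state(ann):
    """Step 1: map one participant's binary annotation features to a user state."""
    posture = "standing" if ann["standing"] else "sitting"
    state = f"{posture}-talking" if ann["talking"] else f"{posture}-silent"
    if ann["writing"]:
        state += "-writing"
    return state

def user_states_to_event(states):
    """Step 2: map the user states of all participants to one meeting event."""
    talkers = [s for s in states if "talking" in s]
    if any("standing" in s for s in talkers):
        return "presentation"
    if len(talkers) >= 2:
        return "discussion"
    if len(talkers) == 1:
        return "monologue"
    return "note-taking"

participants = [
    {"talking": True, "standing": True, "writing": False},
    {"talking": False, "standing": False, "writing": True},
]
states = [annotations_to_user_state(a) for a in participants]
event = user_states_to_event(states)
```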

7/ MER from Audio Signal
Using the individual lapel microphone files: 12 MFCCs; continuous HMMs with 6 states ( %)
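An HMM-based recogniser of this kind scores the MFCC sequence with the forward algorithm under each event class's model and picks the best-scoring class. The sketch below uses toy 2-state models with 2-dimensional feature vectors instead of 6 states and 12 MFCCs; all parameter values are invented.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a diagonal Gaussian (one emission model per HMM state)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_loglik(obs, log_trans, means, variances, log_init):
    """Sequence log-likelihood under a continuous HMM (forward algorithm in log space)."""
    n = len(log_init)
    alpha = [log_init[i] + log_gauss(obs[0], means[i], variances[i]) for i in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)

# Toy demo: two event classes, shared transitions, different emission means.
lt = [[math.log(0.7), math.log(0.3)], [math.log(0.3), math.log(0.7)]]
li = [math.log(0.5), math.log(0.5)]
seq = [[0.1, 0.2], [0.9, 1.1], [1.0, 0.8]]
score_a = forward_loglik(seq, lt, [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0]] * 2, li)
score_b = forward_loglik(seq, lt, [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0]] * 2, li)
best = "a" if score_a > score_b else "b"
```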

8/ MER from Binary Speech Profile
Using the speaker turn detection results from IDIAP: discrete HMMs, codebook with 64 entries, 32 states ( %)
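A discrete HMM consumes symbol indices, so each profile frame is first quantized to its nearest codebook entry. A minimal sketch of that vector-quantization step, using a 4-entry toy codebook instead of the 64 entries on the slide:

```python
def quantize(frame, codebook):
    """Map a feature frame to the index of its nearest codeword
    (squared Euclidean distance), as required by a discrete HMM."""
    def dist(codeword):
        return sum((f - c) ** 2 for f, c in zip(frame, codeword))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

# Toy codebook; a real one would be trained (e.g. by k-means) on the profiles.
codebook = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
profile = [[0.1, 0.1], [0.9, 0.2], [0.1, 0.95]]
symbols = [quantize(f, codebook) for f in profile]
```

The resulting symbol sequence is what the 32-state discrete HMMs would be trained and scored on.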

9/ MER from Transcriptions
Using transcriptions from the media file server: discrete 1-state HMM; all monologues merged into one class ( %)
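A discrete HMM with a single state reduces to a per-class unigram word model, so the transcription-based recogniser can be sketched as smoothed unigram scoring. The vocabulary and training snippets below are invented for illustration.

```python
import math
from collections import Counter

def train_unigram(docs, vocab_size, alpha=1.0):
    """Per-class unigram model with add-alpha smoothing; a discrete
    1-state HMM over words is exactly such a model."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return lambda w: math.log((counts[w] + alpha) / (total + alpha * vocab_size))

def classify(words, models):
    """Pick the class whose unigram model gives the transcript the highest log-likelihood."""
    return max(models, key=lambda c: sum(models[c](w) for w in words))

vocab_size = 5
models = {
    "monologue": train_unigram([["slide", "next", "slide", "point"]], vocab_size),
    "discussion": train_unigram([["agree", "think", "agree", "point"]], vocab_size),
}
label = classify(["slide", "slide", "next"], models)
```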

10/ Late Semantic Fusion
Joining the results of three recognisers (all 10 meeting events):
- MER from annotations:  %
- MER from audio files:  %
- MER from transcriptions:  %
Simple rule-based fusion: if two or more results agree, the fused result is that class; otherwise the result with the highest score is taken.
Recognition rate after fusion:  %
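The fusion rule stated on the slide translates directly into code; the labels and scores below are illustrative.

```python
from collections import Counter

def fuse(results):
    """Late semantic fusion: `results` is a list of (label, score) pairs,
    one per recogniser. If two or more recognisers agree, take that label;
    otherwise take the label with the highest score."""
    label, count = Counter(label for label, _ in results).most_common(1)[0]
    if count >= 2:
        return label
    return max(results, key=lambda r: r[1])[0]

fused = fuse([("discussion", 0.6), ("monologue", 0.9), ("discussion", 0.4)])
# two of three recognisers say "discussion", so the majority wins over the score
```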

11/ MER using DBNs
Integration of two feature streams:
- binary speech profile (5 Hz)
- global motion features (12.5 Hz)
Recognition rate:  %
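Feeding streams sampled at 5 Hz and 12.5 Hz into one model requires bringing them to a common frame rate. The slide does not say how the streams were synchronised; one simple option, sketched here, is repeating frames up to a shared 25 Hz rate (5 Hz × 5, 12.5 Hz × 2).

```python
def upsample(frames, repeat):
    """Repeat each frame `repeat` times to reach a common frame rate,
    e.g. 5 Hz * 5 = 25 Hz and 12.5 Hz * 2 = 25 Hz."""
    return [f for f in frames for _ in range(repeat)]

speech = ["s0", "s1"]                    # 5 Hz binary speech profile, 0.4 s
motion = ["m0", "m1", "m2", "m3", "m4"]  # 12.5 Hz global motion features, 0.4 s
common = list(zip(upsample(speech, 5), upsample(motion, 2)))
# `common` now holds one (speech, motion) pair per 25 Hz frame,
# ready to be treated as two observation streams of a DBN.
```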

12/ Segmentation Based on Higher Semantic Features
Benefits from speaker turn detection and gesture recognition (81.76 %)
Segmentation via sliding windows
Results:
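Segmentation via sliding windows can be sketched as classifying each window and merging runs of identical labels into segments; the window spacing and labels below are illustrative stand-ins for the real recogniser output.

```python
def segment(window_labels, window_starts):
    """Merge consecutive sliding-window decisions that share a label
    into (start_time, label) segments."""
    segments = []
    for start, label in zip(window_starts, window_labels):
        if segments and segments[-1][1] == label:
            continue  # the same event continues, keep the open segment
        segments.append((start, label))
    return segments

# One classified window per second (toy output of the event recogniser).
labels = ["monologue", "monologue", "discussion", "discussion", "presentation"]
starts = [0, 1, 2, 3, 4]
segs = segment(labels, starts)
```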