Segmentation and Event Detection in Soccer Audio. Lexing Xie, Prof. Dan Ellis. EE6820, Spring 2001. April 24th, 2001.

Presentation transcript:

Segmentation and Event Detection in Soccer Audio
  Lexing Xie, Prof. Dan Ellis
  EE6820, Spring 2001
  April 24th, 2001

2 The problem
  - Event detection in sports video
  - In this project: the audio part
  - Our approach
      - Segmentation + Event Detection
      - Incorporate domain knowledge

3 Outline
  - Related work
  - Observations on soccer audio
  - Segmentation
      - Features
      - Decision scheme
      - Result
  - Event detection
      - Scope
      - Feature metric
      - Result
  - Generalization
  - Next step

4 Related Work
  - Audio segmentation
      - Speech-silence discrimination [Rabiner78]
      - Speech / music / mixture segmentation [Saunders96] [Scheirer97] [Williams99]
  - Sports audio analysis
      - Classify excited speech [Rui2000]
      - Keyword/event template matching [Chang96] [Rui2000]

5 Observations #1
  - Sound Types
      - Foreground speech: noisy vocal sound with visible phoneme structure
      - Background noise: ambient crowd, whistles, cheers, etc.
  - Acoustics [Fahy2001]
      - Sound intensity in open space: (formula not preserved in the transcript)
      - Sound attenuation in air
  - Production conditions
      - Frequency response of microphone
      - Automatic Gain Control

6 Observations #2
  - Large variety across games
      - Commentator "verbosity"
      - Audience "excitability" → not labeling and training
      - In different languages → not ASR
  - Not template-matching & training
  - Assumptions on temporal characteristics
      - Short-term dynamics →
      - Long-term variety →

                 Seg.        Det.
      unit       0.03 sec    0.5~1
      context    15          >100

7 Segmentation Algorithm
  - Commentary vs. Crowd segmentation
  - Decision Rules
      - Energy > Global Avg. & adaptive threshold
      - 1st formant energy
      - Fricative energy
  - [Block diagram labels: sound; Feature extraction; Morphological operations; Post-processing; Seg. boundary]
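The slide names an energy rule and morphological post-processing but gives no parameters, so the following is only a minimal sketch of that idea, not the authors' implementation. It labels a frame as a commentary candidate when its energy exceeds the global average, then cleans the label sequence with a closing followed by an opening. The function names, the `ratio` and `radius` parameters, and the omission of the adaptive threshold and the formant and fricative energies are all assumptions made for illustration.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def threshold_frames(energies, ratio=1.0):
    """Label a frame 1 when its energy exceeds ratio * global average, else 0.

    `ratio` is a hypothetical tuning knob, not a value from the slides.
    """
    global_avg = sum(energies) / len(energies)
    return [1 if e > ratio * global_avg else 0 for e in energies]


def dilate(labels, radius):
    """Binary dilation: a frame becomes 1 if any neighbour within radius is 1."""
    n = len(labels)
    return [max(labels[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def erode(labels, radius):
    """Binary erosion: a frame stays 1 only if all neighbours within radius are 1."""
    n = len(labels)
    return [min(labels[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def smooth(labels, radius=1):
    """Closing (fills short gaps) followed by opening (removes short bursts)."""
    closed = erode(dilate(labels, radius), radius)
    return dilate(erode(closed, radius), radius)


def boundaries(labels):
    """Frame indices at which the label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

With a frame length corresponding to the 0.03 sec unit quoted earlier, `boundaries(smooth(threshold_frames(frame_energies(samples, frame_len))))` would give candidate segment boundaries in frame indices.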

8 Segmentation Result
  - Sound length: 100 sec
  - Table columns: Ground truth | Hits | Misses | False Alarms
  - Table rows: crowd, commentary (numeric entries not preserved in the transcript)
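The hit, miss and false-alarm counts in this table are presumably what the precision and recall figures quoted in the summary are derived from. A small sketch of the standard definitions follows; the function name and the example counts are hypothetical, since the actual table entries are not available.

```python
def precision_recall(hits, misses, false_alarms):
    """Standard detection metrics from raw counts.

    precision = hits / (hits + false_alarms)
    recall    = hits / (hits + misses)
    """
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    return precision, recall


# Hypothetical counts, for illustration only (not the values from the slide).
p, r = precision_recall(hits=8, misses=2, false_alarms=2)
```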

9 Detection #1
  - Detecting audio events in crowd noise
      - Examples: crowd cheering, whistle, ...
      - Subjective definition
  - Spectral: centroid, roll-off
  - Energies: E, Er1, Er2
  - Feature contour and moments of the contours
  - [Block diagram labels: Pick up crowd, chop into units; Distance metric; Seg. boundaries; Feature calculation; Most distinctive segment]
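The spectral features and contour moments listed here can be sketched as follows. This is an illustrative version using common textbook definitions, not necessarily the exact ones used in the project: the 0.85 roll-off fraction, the use of magnitudes rather than power, the function names, and the choice of mean and population variance as the moments are assumptions.

```python
def spectral_centroid(mags, freqs):
    """Magnitude-weighted mean frequency of one frame's spectrum."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, mags)) / total


def spectral_rolloff(mags, freqs, fraction=0.85):
    """Lowest bin frequency below which `fraction` of the total magnitude lies.

    The default fraction is an assumed, commonly used value.
    """
    target = fraction * sum(mags)
    cumulative = 0.0
    for f, m in zip(freqs, mags):
        cumulative += m
        if cumulative >= target:
            return f
    return freqs[-1]


def contour_moments(contour):
    """Mean and population variance of a per-frame feature contour in one unit."""
    n = len(contour)
    mean = sum(contour) / n
    variance = sum((x - mean) ** 2 for x in contour) / n
    return mean, variance
```

Computing a feature per frame across one 0.5~1 unit gives a contour, and `contour_moments` summarizes that contour into a short feature vector for the unit.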

10 Detection #2
  - Compute Mahalanobis distances [Duda 73]
      - Feature element normalization and decorrelation
  - Pick up distinctive segments
      - Largest distance to all other segments (typically top 5~10%)
      - Clustering: detecting outliers
  - Merge adjacent segments
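The selection step above can be sketched as follows, under stated assumptions: the covariance is pooled over all segments, a segment's score is the sum of its Mahalanobis distances to every other segment, and the top fraction by score is kept and then merged into runs of adjacent indices. The slides do not specify these details, so the scoring rule, the `top_fraction` default and all function names are illustrative. Dividing by the covariance is what provides the normalization and decorrelation of feature elements.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^-1 (x - y)), computed without forming the inverse."""
    diff = [a - b for a, b in zip(x, y)]
    weighted = solve(cov, diff)
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))


def most_distinctive(features, top_fraction=0.1):
    """Indices of the segments with the largest summed distance to all others."""
    cov = covariance(features)
    n = len(features)
    scores = [
        sum(mahalanobis(features[i], features[j], cov) for j in range(n) if j != i)
        for i in range(n)
    ]
    k = max(1, round(top_fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def merge_adjacent(indices):
    """Merge sorted segment indices into (first, last) runs of neighbours."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs
```

For example, `merge_adjacent(most_distinctive(features))` would return the retrieved events as runs of unit indices, which can be converted to times using the unit length.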

11 Detection Results
  - The game: River Plate vs. Los Andes
  - Assumptions:
      - The majority are Unimportant
      - We do have Important parts!
  - Cluster analysis helps
  - [Timeline figure, axis: Time (sec). Labels: Start; 49.1; Attacking; Foul!; 95.2; Penalty kick; GOAL!]

12 Generalization
  - Segmentation tasks
      - Other Sports (baseball, tennis, etc.)
      - Film sound track (Sense and Sensibility)
  - Detection of sparse audio events
      - Surveillance video
  - [Diagram labels: Silence; Music; Speech]

13 Next step
  - More experiments
  - Improve decision scheme
      - Improve GMM in segmentation
      - Use cluster analysis in detection
  - New features
  - Wish list
      - Classification of speech segments
      - Other interesting noise patterns
      - Investigate sound mixtures

14 Summary
  - Segmentation
      - Use energy features
      - Best result: precision 95%, recall 92%
  - Event detection
      - Use feature distance
      - Interesting segments retrieved
  - More work to follow

15 Thanks!
