1 A scheme for racquet sports video analysis with the combination of audio-visual information Visual Communication and Image Processing 2005 Liyuan Xing,

Slides:

Advertisements

Similar presentations

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki

Advertisements

Advanced Image Processing Student Seminar: Lipreading Method using color extraction method and eigenspace technique ( Yasuyuki Nakata and Moritoshi Ando.

DONG XU, MEMBER, IEEE, AND SHIH-FU CHANG, FELLOW, IEEE Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment.

Learning Techniques for Video Shot Detection Under the guidance of Prof. Sharat Chandran by M. Nithya.

Problem Semi supervised sarcasm identification using SASI

Finding Structure in Home Videos by Probabilistic Hierarchical Clustering Daniel Gatica-Perez, Alexander Loui, and Ming-Ting Sun.

Sequence Clustering and Labeling for Unsupervised Query Intent Discovery Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: WSDM’12 Date: 1 November,

One-Shot Learning Gesture Recognition Students:Itay Hubara Amit Nishry Supervisor:Maayan Harel Gal-On.

Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots Chao-Yeh Chen and Kristen Grauman University of Texas at Austin.

Personalized Abstraction of Broadcasted American Football Video by Highlight Selection Noboru Babaguchi (Professor at Osaka Univ.) Yoshihiko Kawai and.

A Novel Scheme for Video Similarity Detection Chu-Hong Hoi, Steven March 5, 2003.

Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.

Toward Semantic Indexing and Retrieval Using Hierarchical Audio Models Wei-Ta Chu, Wen-Huang Cheng, Jane Yung-Jen Hsu and Ja-LingWu Multimedia Systems,

ICME 2008 Huiying Liu, Shuqiang Jiang, Qingming Huang, Changsheng Xu.

Video retrieval using inference network A.Graves, M. Lalmas In Sig IR 02.

1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

ADVISE: Advanced Digital Video Information Segmentation Engine

Segmentation and Event Detection in Soccer Audio Lexing Xie, Prof. Dan Ellis EE6820, Spring 2001 April 24 th, 2001.

A neural approach to extract foreground from human movement images S.Conforto, M.Schmid, A.Neri, T.D’Alessio Compute Method and Programs in Biomedicine.

Video summarization by graph optimization Lu Shi Oct. 7, 2003.

Presented by Zeehasham Rasheed

1 Automated Feature Abstraction of the fMRI Signal using Neural Network Clustering Techniques Stefan Niculescu and Tom Mitchell Siemens Medical Solutions,

Jacinto C. Nascimento, Member, IEEE, and Jorge S. Marques

Video Trails: Representing and Visualizing Structure in Video Sequences Vikrant Kobla David Doermann Christos Faloutsos.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Multimodal Analysis Video Representation Video Highlights Extraction Video Browsing Video Retrieval Video Summarization.

김덕주 (Duck Ju Kim). Problems What is the objective of content-based video analysis? Why supervised identification has limitation? Why should use integrated.

Introduction to machine learning

SoundSense by Andrius Andrijauskas. Introduction  Today’s mobile phones come with various embedded sensors such as GPS, WiFi, compass, etc.  Arguably,

MediaEval Workshop 2011 Pisa, Italy 1-2 September 2011.

Processing of large document collections Part 2 (Text categorization) Helena Ahonen-Myka Spring 2006.

TEMPORAL EVENT CLUSTERING FOR DIGITAL PHOTO COLLECTIONS Matthew Cooper, Jonathan Foote, Andreas Girgensohn, and Lynn Wilcox ACM Multimedia ACM Transactions.

Wavelet-Based Multiresolution Matching for Content-Based Image Retrieval Presented by Tienwei Tsai Department of Computer Science and Engineering Tatung.

Presented by Tienwei Tsai July, 2005

Player Action Recognition in Broadcast Tennis Video with Applications to Semantic Analysis of Sport Game Guangyu Zhu, Changsheng Xu Qingming Huang, Wen.

© Negnevitsky, Pearson Education, Will neural network work for my problem? Will neural network work for my problem? Character recognition neural.

General Tensor Discriminant Analysis and Gabor Features for Gait Recognition by D. Tao, X. Li, and J. Maybank, TPAMI 2007 Presented by Iulian Pruteanu.

Multimodal Information Analysis for Emotion Recognition

Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.

Using Support Vector Machines to Enhance the Performance of Bayesian Face Recognition IEEE Transaction on Information Forensics and Security Zhifeng Li,

So Far……  Clustering basics, necessity for clustering, Usage in various fields : engineering and industrial fields  Properties : hierarchical, flat,

NOISE DETECTION AND CLASSIFICATION IN SPEECH SIGNALS WITH BOOSTING Nobuyuki Miyake, Tetsuya Takiguchi and Yasuo Ariki Department of Computer and System.

Character Identification in Feature-Length Films Using Global Face-Name Matching IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 11, NO. 7, NOVEMBER 2009 Yi-Fan.

Levi Smith.  Reading papers  Getting data set together  Clipping videos to form the training and testing data for our classifier  Project separation.

PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL Seo Seok Jun.

Scene Completion Using Millions of Photographs James Hays, Alexei A. Efros Carnegie Mellon University ACM SIGGRAPH 2007.

PATTERN RECOGNITION : CLUSTERING AND CLASSIFICATION Richard Brereton

Levels of Image Data Representation 4.2. Traditional Image Data Structures 4.3. Hierarchical Data Structures Chapter 4 – Data structures for.

1 Applications of video-content analysis and retrieval IEEE Multimedia Magazine 2002 JUL-SEP Reporter: 林浩棟.

Semi-Automatic Image Annotation Liu Wenyin, Susan Dumais, Yanfeng Sun, HongJiang Zhang, Mary Czerwinski and Brent Field Microsoft Research.

Unsupervised Mining of Statistical Temporal Structures in Video Liu ze yuan May 15,2011.

Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -

Using Cross-Media Correlation for Scene Detection in Travel Videos.

Combining Evolutionary Information Extracted From Frequency Profiles With Sequence-based Kernels For Protein Remote Homology Detection Name: ZhuFangzhi.

Using decision trees to build an a framework for multivariate time- series classification 1 Present By Xiayi Kuang.

杜嘉晨 PlantMiRNAPred: efficient classification of real and pseudo plant pre-miRNAs.

Tracking Groups of People for Video Surveillance Xinzhen(Elaine) Wang Advisor: Dr.Longin Latecki.

Content-Based Image Retrieval Using Color Space Transformation and Wavelet Transform Presented by Tienwei Tsai Department of Information Management Chihlee.

Ontology-based Automatic Video Annotation Technique in Smart TV Environment Jin-Woo Jeong, Hyun-Ki Hong, and Dong-Ho Lee IEEE Transactions on Consumer.

Trajectory-Based Ball Detection and Tracking with Aid of Homography in Broadcast Tennis Video Xinguo Yu, Nianjuan Jiang, Ee Luang Ang Present by komod.

Ontology Engineering and Feature Construction for Predicting Friendship Links in the Live Journal Social Network Author:Vikas Bahirwani 、 Doina Caragea.

Automatic Classification of Audio Data by Carlos H. L. Costa, Jaime D. Valle, Ro L. Koerich IEEE International Conference on Systems, Man, and Cybernetics.

How to forecast solar flares?

Traffic State Detection Using Acoustics

My Tiny Ping-Pong Helper

Traffic Sign Recognition Using Discriminative Local Features Andrzej Ruta, Yongmin Li, Xiaohui Liu School of Information Systems, Computing and Mathematics.

Pattern Recognition Sergios Theodoridis Konstantinos Koutroumbas

EE513 Audio Signals and Systems

Weigang Zhang, Qixiang Yeb, Liyuan Xingc, Qingming Huangc, Wen Gaoa

Presentation transcript:

1 A scheme for racquet sports video analysis with the combination of audio-visual information Visual Communication and Image Processing 2005 Liyuan Xing, Qixiang Ye, Weigang Zhang, Qingming Huang and Hua Yu

2 Outline Introduction System Overview Audio and Video Classification Rally Detection and Ranking Experiment Results Conclusion and Future Work

3 Introduction In the past years, lots of work has been done on sports video analysis and understanding. But the research is mainly following the work on field sports video and there is still no general scheme for different kinds of racquet sports video analysis

4 Introduction We try to provide a general solution to analyze the racquet sports video by fully explore its characteristics on audio, visual features and their temporal relationship.

5 System Overview The audio information is more related to the state of the racquet sports. If we can group the video shots of similar content into same cluster, the temporal video structure of a match will be clear.

6 System Overview A supervised classification method is used to detect important audio symbols such as impact, audience cheers or commentator speech. An unsupervised scene categorization algorithm is employed to group video shots into various clusters which have not been specified with the semantic clusters labels.

7 System Overview Rally scenes have a strong relationship with obvious audio symbols. So we can identify semantic information of the video scene clusters by audio classification results. A refinement procedure for removing false rally scenes through further audio analysis. Finally, the detected rally scenes are ranked by pre-defined exciting model.

8 System Overview

9 Audio and Video Classification I. Audio feature extraction and selection Four groups of clip-level features are extracted Short-time energy (STE) based features Mean value of STE (1) The standard deviation of STE (2) The low STE ratio (3) The high STE ratio (4)

10 Audio and Video Classification I. Audio feature extraction and selection Zero-crossing rate (ZCR) based features Mean value of ZCR (5) Standard deviation of ZCR (6) The low ZCR ratio (7) Mean value of different ZCR (8) Standard deviation of different ZCR (9) The only pitch based feature is the mean value of pitch (10)

11 Audio and Video Classification I. Audio feature extraction and selection The others are frequency based features Mean value of brightness (11) Standard deviation of brightness (12) Mean value of bandwidth (13) Standard deviation of bandwidth (14) Spectrum flux (SF, 15) Mean value of sub-band power (16-19) Standard deviation of sub-band power (20-23) Mean value of LPCC (24-31) Standard deviation of LPCC (32-39) Mean value of MFCC (40-47) Standard deviation of MFCC (48-55)

12 Audio and Video Classification I. Audio feature extraction and selection Using only a small set of the most powerful features will reduce the time for feature extraction and classification. Feature selection is performed before they are fed into the classifier. A forward algorithm is used to perform the feature selection task.

13 Audio and Video Classification I. Audio feature extraction and selection

14 Audio and Video Classification I. Audio training and classification We use the SVM-based classifier which is easier to train, needs fewer training samples and has better generalization ability. To determine the best sound clip length for samples we design a procedure based on SVM classification evaluation. (one second)

15 Audio and Video Classification II. Unsupervised video shot classification - Merging

16 : the total inter-cluster scatter of the initial scene sequence : the intra-cluster scatter of scene cluster c : the total scene number in the initial scene sequence : the shot number of scene cluster c : Euclidean distance : shot I in scene cluster c (the initial scene sequence) : the mean feature value of shots in scene c (the initial scene sequence) Audio and Video Classification Merging Stop criteria (Fisher Discriminant Function)

17 Audio and Video Classification Merging Stop criteria (Fisher Discriminant Function) It is expected that J value is small and the scene number is small. While in real situation, the smaller the scene number is, the large the J value will be. We choose a point where the sum of the J value and the ratio of the scene number to the total number of scenes in the initial scene sequence in the merging process is the smallest.

18 Audio and Video Classification Video shot classification

19 Rally Detection and Ranking Scene Recognition by Audio Symbol : number of sound classes : number of video scene clusters : the number of time slots and belong to the k-th sound class that is distributed in the n-th cluster Supposed that k-th sound class has M time slots and is within the time domain of the i-th time slot : the result of video clustering, : the exact match of k

20 Rally Detection and Ranking Refinement by audio symbol We should make a tradeoff between the precision and the recall of the rally event to make some improvement. RULE1: IF there is an impact in the time scope of coarse rally THEN judge whether at the end of the coarse rally there is excited speech or cheer IF it is THEN it is a rally ELSE it is not RULE2: IF there is no impact in the time scope of coarse rally THEN judge whether at the end of the coarse rally there is excited speech or cheer and whether at the beginning of the coarse rally there is plain speech or cheer IF both are satisfied THEN it is a rally ELSE it is not

21 Rally Detection and Ranking Rally ranking by excitement : the important weight : the duration of a rally : the relative energy of the cheer clips after the rally : the pitch of speech after a rally The higher the value is, the more exciting the corresponding rally will be.

22 Experiment Results

23 Experiment Results

24 Experiment Results

25 Experiment Results

26 Conclusion and Future Work In this paper, we propose a general scheme for racquet sports video content analysis including rallies detection and ranking. In future work, more sophisticated audio/visual information combination model will be developed and more reasonable rally ranking method can be explored.