Trajectory Analysis of Broadcast Soccer Videos Computer Science and Engineering Department Indian Institute of Technology, Kharagpur by Prof. Jayanta Mukherjee
Collaborators: V. Pallavi (research scholar), Prof. A.K. Majumdar (CSE), Prof. Shamik Sural (SIT)
OUTLINE Motivation and Objective State Based Video Model Extraction of Features Trajectory Detection States and Event Detection
Motivation
– Increasing availability of soccer videos
– Soccer videos appeal to a large audience
– Soccer videos need to be processed for delivery over narrow-band networks
– The relevance of soccer videos drops significantly after a short period of time
Therefore, soccer video analysis needs to be made automatic, and the results must be semantically meaningful.
State based Video Model
Video data model: a representation of the information contained in unstructured video in order to support users' queries.
State based model: states of soccer video objects and their transitions (due to some event).
State Chart Diagram for Ball Possession
Immediate Goal Our objective is to identify these states and their transitions by analyzing the unstructured video.
Detection of States and Events (contd..)
In a soccer match, the ball possession state may be any of the following:
– possession of Team A
– possession of Team B
– both teams fighting to possess the ball
– ball in possession of none during a break
Features Used
Cinematic Features
– Shot Transitions
– Shot Types
– Shot Durations
Object Based Features
– Players
– Ball
– Billboards
– Field Descriptors
Cinematic Features: Feature Extraction
A shot is a continuous sequence of frames captured from the same camera in a video.
Shot detection algorithms segment videos into shots automatically.
Shot classification algorithms partition a video stream into a set of meaningful and manageable segments.
Shot Classification
Shots can be classified into:
– Long shot: captures a global view of the field
– Medium shot: shows a close-up view of one or more players in a specific part of the field
– Close shot: shows an above-waist view of a single player
Cinematic Features: Shot Classification (contd..)
A soccer field has one distinct dominant color, green, which varies
– from stadium to stadium
– with lighting conditions
In long views it has been observed that either grass dominates the entire frame or the crowd covers the upper part of the frame.
Typical long views in soccer videos Grass covering entire frame Grass covering partial frame
Shot Classification (contd..)
For each frame of the soccer video sequence, if the dominant color is green, the dominant color ratio decides the shot type:
– Long Shot: dominant color ratio > 0.75 and <= 1.0
– Medium Shot: dominant color ratio > 0.5 and <= 0.75
– Close Shot: dominant color ratio > 0.25 and <= 0.5
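The thresholds above map directly to a few comparisons. A minimal sketch, assuming grass_mask is a per-frame boolean mask of green (grass) pixels such as the one produced in the grass-pixel step described later in the talk:

```python
import numpy as np

def classify_shot(grass_mask):
    """Classify a frame by its dominant (grass) color ratio,
    following the thresholds quoted on the slide."""
    ratio = float(np.mean(grass_mask))   # fraction of grass pixels in the frame
    if 0.75 < ratio <= 1.0:
        return "long"
    if 0.5 < ratio <= 0.75:
        return "medium"
    if 0.25 < ratio <= 0.5:
        return "close"
    return "unclassified"
```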
Shot Classification Results
[Confusion matrix (number of frames): true class vs. predicted class for Long, Medium, Close and Unclassified shots]
Shot Classification Results
[Table: percentage of true classification per shot type (Long, Medium, Close); one recovered value is 87.63%]
Cinematic Features: Shot Detection
Shot transitions in sports videos can be:
– Wipe
– Dissolve
– Hard cut
– Fade
Proposed Shot Detection Method
Extends the approach proposed by Vadivel et al. to broadcast soccer videos.
Combines the shot detection method of Vadivel et al. with the proposed shot classification method.
Limitation of Vadivel et al.'s method for broadcast soccer videos: hard cuts are missed.
Proposed Shot Detection Method
Each frame in a shot is classified with the shot classification algorithm.
If a long shot is segmented into a sequence of long and medium view frames, and the number of frames in that sequence is above a certain threshold, a hard cut exists within the shot (see the sketch below).
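A minimal sketch of this check, assuming frame_labels holds the per-frame shot types produced by the classifier and min_run stands in for the unspecified threshold:

```python
def find_hard_cuts(frame_labels, min_run=10):
    """Report a missed hard cut when a detected long shot contains a run of
    'medium' view frames longer than the threshold. min_run is a placeholder
    value; the slide only says 'above a certain threshold'."""
    cuts, run_start, run_len = [], None, 0
    for i, label in enumerate(frame_labels + ["long"]):   # sentinel flushes the last run
        if label == "medium":
            if run_len == 0:
                run_start = i
            run_len += 1
        else:
            if run_len >= min_run:
                cuts.append(run_start)     # a hard cut starts around this frame
            run_len = 0
    return cuts
```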
Proposed Shot Detection Results
Overall recall and precision:
– Vadivel et al.'s method: 85.43%, 89.02%
– Proposed method: 91.76%, 93.65%
Shot detection improved by shot classification
Object Based Features: Feature extraction for grass pixels
Each frame is processed in the YIQ color space.
It is found experimentally that grass pixels have I values ranging between 25 and 55, while Q values range between 0 and 12.
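A minimal sketch of this thresholding, assuming 8-bit RGB frames and the usual NTSC YIQ conversion coefficients (the exact scaling used in the experiments is not stated on the slide):

```python
import numpy as np

def grass_mask(frame_rgb):
    """Boolean mask of grass pixels using the quoted YIQ ranges
    (25 <= I <= 55, 0 <= Q <= 12)."""
    rgb = frame_rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = 0.596 * r - 0.274 * g - 0.322 * b    # in-phase chrominance
    q = 0.211 * r - 0.523 * g + 0.312 * b    # quadrature chrominance
    return (i >= 25) & (i <= 55) & (q >= 0) & (q <= 12)
```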
Playfield region detected Grass pixels detected for a long view frame
Object Based Features (contd..): Playfield Line Detection
A playfield line separates the playfield from the non-playfield background, which usually consists of the billboards (also called advertisement boards).
The Hough transform is used to detect the playfield line.
Object Based Features (contd..): Midfield Line Detection
The midfield line is the line that divides the playfield in half along its width.
The Hough transform is applied to detect the midfield line.
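A minimal sketch of line detection with OpenCV's probabilistic Hough transform; the edge-detector settings, vote threshold and angle ranges are illustrative assumptions, not the parameters used in the talk:

```python
import cv2
import numpy as np

def detect_field_lines(frame_bgr):
    """Find near-horizontal lines (playfield-line candidates) and strongly
    slanted lines (midfield-line candidates) via the Hough transform."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=100, maxLineGap=10)
    playfield, midfield = [], []
    for x1, y1, x2, y2 in ([] if lines is None else lines.reshape(-1, 4)):
        angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        if angle < 20:
            playfield.append((x1, y1, x2, y2))   # roughly horizontal
        elif angle > 60:
            midfield.append((x1, y1, x2, y2))    # roughly vertical
    return playfield, midfield
```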
Object Based Features (contd..): Ball Detection
Challenges:
– Features of the ball (color, size, shape) vary with time
– Relative size of the ball is very small
– Ball may not be an ideal circle because of fast motion and illumination conditions
– Objects in the field or in the crowd may look similar to the ball
– Field appearance changes from place to place and time to time
– No definite property uniquely identifies the ball in a frame
Object Based Features (contd..): Detecting Ball Candidates in Long Shots
Obtain ball candidates by detecting circular regions using the circular Hough transform.
Filter out non-ball candidates by:
– Removing candidates from the channel's logo
– Removing candidates from the gallery region
– Removing candidates from the midfield line
– Filtering out candidates moving against the camera
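A minimal sketch of the candidate-generation step using OpenCV's circular Hough transform; the radius range and accumulator parameters are assumptions, and the four filtering steps above would then be applied to the returned list:

```python
import cv2

def ball_candidates(frame_bgr, min_radius=2, max_radius=8):
    """Return (x, y, r) circles detected as possible ball locations."""
    gray = cv2.medianBlur(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY), 3)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=10,
                               param1=100, param2=12,
                               minRadius=min_radius, maxRadius=max_radius)
    return [] if circles is None else [tuple(c) for c in circles[0]]
```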
Object Based Features (contd..)
Ball candidates before and after filtering: ball candidates before filtering; ball candidates after filtering
Object Based Features (contd..): Detecting Players in Long Shots
Challenges:
– Features of the players (color, texture, size, motion) are neither static nor uniform
– Players appear very small in size
– Size of players changes with their position and the zooming of cameras
– Color and texture of the jersey and shorts vary from team to team
– Players in the field do not have constant motion
Object Based Features (contd..): Detecting Player Regions
Obtain player pixels by removing non-player pixels:
– Removing grass pixels
– Removing the broadcasting channel's logo
– Removing the extra field region (billboards and gallery)
– Removing pixels from the midfield line
Segment the image containing player pixels into isolated player regions with a region growing algorithm.
The center of the bounding rectangle of each region is taken as the location of the player.
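A minimal sketch of the segmentation step, assuming non_player_mask already combines the grass, logo, billboard/gallery and midfield-line masks; connected-component labelling stands in here for the region-growing step:

```python
import cv2
import numpy as np

def player_locations(non_player_mask, min_area=30):
    """Centres of the bounding rectangles of the remaining (player) regions.
    min_area is an assumed noise threshold."""
    player_pixels = (~non_player_mask).astype(np.uint8)
    n, _, stats, _ = cv2.connectedComponentsWithStats(player_pixels)
    locations = []
    for i in range(1, n):                   # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            locations.append((x + w // 2, y + h // 2))   # bounding-box centre
    return locations
```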
A Long Shot View Object Based Features (contd..)
Player pixels detected Object Based Features (contd..)
Players detected in long shot views Object Based Features (contd..)
Feature Detection (contd..): Team Identification in Soccer Videos
Players in a soccer video are classified using a supervised classification method.
Mean I and Q values of the player regions are obtained from a few randomly selected frames.
The minimum and maximum I and Q values are set as the range for classifying player regions.
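A minimal sketch of the supervised range test, assuming team_ranges holds the (min, max) I and Q values collected from the hand-labelled frames; the YIQ conversion mirrors the grass-pixel step:

```python
import numpy as np

def classify_team(region_rgb, team_ranges):
    """Assign a player region to a team by its mean I and Q values.
    team_ranges: {team: ((i_min, i_max), (q_min, q_max))}."""
    rgb = region_rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i_mean = float((0.596 * r - 0.274 * g - 0.322 * b).mean())
    q_mean = float((0.211 * r - 0.523 * g + 0.312 * b).mean())
    for team, ((i_lo, i_hi), (q_lo, q_hi)) in team_ranges.items():
        if i_lo <= i_mean <= i_hi and q_lo <= q_mean <= q_hi:
            return team
    return "unclassified"
```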
Team Classification in Soccer Videos Experiments were performed on two different matches: Real Madrid and Manchester United (UEFA Champions League 2003) Chelsea and Liverpool (UEFA Champions League 2007) Feature detection (contd.)
Team Classification Results: Real Madrid vs Manchester United
[Confusion matrix: number of players (and % misclassified) predicted as Team A, Team B or Unclassified against the true class]
Team Classification Results (contd..): Chelsea vs Liverpool
[Confusion matrix: number of players (and % misclassified) predicted as Team A, Team B or Unclassified against the true class]
Object Based Features (contd..): Camera Related Feature
Camera Direction Estimation:
1. Optical flow velocities and their directions are computed using Horn and Schunck's method.
2. Based on the sign of the horizontal component for the majority of pixels in a frame, the direction of movement (left or right) of the camera is estimated.
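A minimal sketch of the majority-sign test; OpenCV's Farneback flow is used here as a stand-in for Horn and Schunck's method, and the mapping from flow sign to camera direction may need flipping depending on the coordinate convention:

```python
import cv2

def camera_direction(prev_gray, curr_gray):
    """Estimate left/right camera motion from the sign of the horizontal
    optical-flow component of the majority of pixels."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5,
                                        poly_sigma=1.2, flags=0)
    positive_fraction = (flow[..., 0] > 0).mean()
    return "right" if positive_fraction > 0.5 else "left"
```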
Camera Direction Estimation (contd..) Optical flow velocities for the camera moving towards right
Tracking of Broadcast Video Objects Challenges Camera parameters are unknown Cameras are not fixed Cameras are zoomed and rotated Broadcast video is an edited video
Construction of a Directed Weighted Graph
Objects in a frame form nodes.
An arc (edge) is formed between two correlated objects in two different frames.
The measure of correlation or similarity provides the edge weight.
The temporal direction provides the direction of the edge.
Directed Weighted Graph (contd..) Tracking of Broadcast Video Objects (contd..)
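A minimal sketch of the graph construction, assuming frames is a list of per-frame object feature lists, similarity is the correlation measure used to weight edges, and max_gap bounds how far ahead an object may be linked (both placeholders for choices not detailed on the slide):

```python
def build_graph(frames, similarity, max_gap=2):
    """Directed weighted graph: node (t, k) is the k-th object in frame t,
    edges run forward in time to correlated objects, weighted by similarity."""
    graph = {}
    for t, objects in enumerate(frames):
        for a, feat_a in enumerate(objects):
            graph.setdefault((t, a), [])
            for dt in range(1, max_gap + 1):
                if t + dt >= len(frames):
                    break
                for b, feat_b in enumerate(frames[t + dt]):
                    weight = similarity(feat_a, feat_b)
                    if weight > 0:               # keep only correlated pairs
                        graph[(t, a)].append(((t + dt, b), weight))
    return graph
```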
Tracking of Broadcast Video Objects (contd..): Object Trajectory Detection
Given a source node, the longest path of the graph, obtained by dynamic programming, gives the path of the object.
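A minimal sketch of the longest-path computation; because edges only run forward in time the graph is a DAG, so one pass over the nodes in temporal order suffices:

```python
def longest_path(graph, source):
    """Longest (maximum total weight) path from source in the frame-ordered
    DAG built above; the node sequence is taken as the object trajectory."""
    best = {source: 0.0}                  # best accumulated weight per node
    back = {}                             # predecessor on the best path
    for node in sorted(graph, key=lambda n: n[0]):   # temporal order
        if node not in best:
            continue                      # not reachable from the source
        for succ, weight in graph[node]:
            if best[node] + weight > best.get(succ, float("-inf")):
                best[succ] = best[node] + weight
                back[succ] = node
    end = max(best, key=best.get)         # node where the heaviest path ends
    path = [end]
    while path[-1] in back:
        path.append(back[path[-1]])
    return list(reversed(path))
```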
Tracking of Broadcast Video Objects (contd..): Ball detection results for long shots
[Table per test sequence, including the Liang et al.* sequences: frame range, total number of frames, number of frames in which the ball is present, number of frames in which the ball is identified, recall and precision; average recall and average precision are reported]
* Liang D., Liu Y., Huang Q. & Gao W., A Scheme for Ball Detection and Tracking in Broadcast Soccer Video, Pacific Rim Conference on Multimedia, 2005, LNCS 3767.
Results for ball detection in long shots (contd..)
Tracking of Broadcast Video Objects (contd..): Tracking a Single Player
Given a source node (a player in the first frame), the longest path of the graph obtained by dynamic programming gives the path of the player over the whole sequence.
Player being tracked
Tracking of Broadcast Video Objects (contd..): Tracking Multiple Players
The longest path from each node (a player in the first frame) of the graph, obtained by dynamic programming, gives the trajectories of the players for the sequence of frames.
Limitations:
– Occlusion between players
– Players in contact
– Similarity between players belonging to the same team
Tracking Multiple Players (contd..): Resolving Conflicting Player Trajectories
If more than one player trajectory shares more than two common nodes, only one amongst them is true.
The path having the maximum weight is taken as the true trajectory.
Nodes constituting the paths of correctly detected players are removed and the graph is constructed again.
The mistracked players are then tracked again.
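A minimal sketch of the conflict-resolution loop, assuming trajectories maps each source node (player in the first frame) to its longest path and path_weight to that path's total weight, both coming from the previous step:

```python
def resolve_conflicts(trajectories, path_weight, graph):
    """Accept trajectories greedily by weight; whenever two trajectories share
    more than two nodes, only the heavier one is kept. Nodes of accepted paths
    are removed so the remaining (mistracked) players can be re-tracked."""
    accepted, pending = [], dict(trajectories)
    while pending:
        src = max(pending, key=lambda s: path_weight[s])
        path = pending.pop(src)
        accepted.append(path)
        used = set(path)
        pending = {s: p for s, p in pending.items()
                   if len(used & set(p)) <= 2}       # drop conflicting paths
    kept = {node for path in accepted for node in path}
    pruned = {n: [(m, w) for m, w in edges if m not in kept]
              for n, edges in graph.items() if n not in kept}
    return accepted, pruned        # re-run the tracker on the pruned graph
```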
Tracking Multiple Players (contd..): Multiple Player Detection Results
[Table per video file (Soccer 1 to 6): number of frames, SOP present, SOP detected, SOTP detected, recall (%) and precision (%); Average Precision is 90.18%]
Tracking Multiple Players (contd..): Multi-Player Tracking Results
[Table per video file (Soccer 1 to 6): number of frames, SOP present, SOTP tracked by the tracking algorithm, SOTP tracked by the tracking and re-tracking algorithm, accuracy (%)]
Tracking Multiple Players (contd..): Occlusion Results
[Table per video file (Soccer 1 to 4): number of occlusion and contact cases, number of cases that could be resolved, accuracy (%)]
Multi - Player Tracking Results
Multi - Player Tracking with Occlusion Results
Multi - Player Tracking with Occlusion Results (contd..)
Tracking the Mistracked Player (contd..)
Detection of States and Events The features extracted and the trajectories detected are used to detect states and events based on the proposed state based video model. States identified - Ball possession states Events detected - Ball passing events
State Chart Diagram for Ball Possession Detection of States and Events (contd..)
Play Break Detection Detection of States and Events (contd..)
State Detection
Ball possession states are obtained by spatial proximity analysis:
– Distance between the nearest player and the second nearest player to the ball
– Spatial arrangement between the players and the ball
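A minimal sketch of the proximity test, assuming players is a list of ((x, y), team) pairs for one frame and fight_margin stands in for the unspecified closeness criterion that triggers the fight state:

```python
import math

def possession_state(ball_xy, players, fight_margin=15.0):
    """Ball-possession state from the nearest and second-nearest players."""
    if not players:
        return "none"
    ranked = sorted((math.hypot(x - ball_xy[0], y - ball_xy[1]), team)
                    for (x, y), team in players)
    if len(ranked) == 1:
        return ranked[0][1]
    (d1, team1), (d2, team2) = ranked[0], ranked[1]
    if team1 != team2 and d2 - d1 < fight_margin:
        return "fight"                    # both teams contesting the ball
    return team1                          # nearest player's team possesses
```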
Ball Possession State Detection
Ball in possession of player 1's team
Ball Possession State Detection
Ball in possession of player 1's team; ball in a fight state
Ball Possession Results
[Confusion matrix (number of frames and % of misclassified frames): true class vs. predicted class for Team A, Team B and Fight states]
Edit Distance as a performance measure for ball possession states
If the actual state sequence for a sequence of frames is: AAAAFFFFFFFFFFBBBB
and the state sequence obtained by the proposed algorithm is: AAAAFFFFFFFFBBBBBB
both sequences are represented as strings S1 and S2.
The edit distance D(S1, S2) is defined as the minimum number of point mutations required to change S1 to S2, where a point mutation is one of:
– replacing a symbol
– inserting a symbol
– deleting a symbol
The edit distance for the above sequences is 2, while the normalized edit distance is D(S1, S2)/|S1|.
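A minimal sketch of the edit-distance computation (standard dynamic programming), checked against the example above:

```python
def edit_distance(s1, s2):
    """Minimum number of substitutions, insertions and deletions turning s1 into s2."""
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

actual, predicted = "AAAAFFFFFFFFFFBBBB", "AAAAFFFFFFFFBBBBBB"
assert edit_distance(actual, predicted) == 2
print(edit_distance(actual, predicted) / len(actual))   # normalized edit distance
```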
Shot wise ball possession results
Event Detection The event detected in this work is the ball passing event. It can be: Forward pass Reverse pass
Event Detection (contd..)
The ball passing event cannot be detected from state transition graphs because:
– The ball is usually passed between players of the same team
– State transition graphs only show changes in the ball possession state: Team A – Team B, Team B – Team A, Team B – Fight, Fight – Team B, Team A – Fight or Fight – Team A
Schematic diagram for ball passing events
The ball is said to be passed in a sequence of frames if:
– The player nearest to the ball in the initial frames of the sequence is the second nearest player to the ball in the subsequent frames
– The nearest and the second nearest players to the ball belong to the same team
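A minimal sketch of this test, assuming nearest_pairs lists, for each frame of the sequence, the (player_id, team) of the nearest and second-nearest players to the ball:

```python
def is_ball_pass(nearest_pairs):
    """True if the initially nearest player later becomes the second nearest,
    and both nearest and second-nearest players belong to the same team."""
    (first_player, first_team), _ = nearest_pairs[0]
    for (near_id, near_team), (second_id, second_team) in nearest_pairs[1:]:
        if (second_id == first_player and
                near_team == second_team == first_team):
            return True
    return False
```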
Example of a ball passing event:
Example of a ball passing event (contd..)
Classifying ball passing events
Forward pass: the direction of camera motion is towards the goal post of the team opposing that of the nearest player.
Reverse pass: the direction of camera motion is towards the goal post of the nearest player's own team.
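A minimal sketch of this rule, assuming goal_direction maps each team to the image-space direction ('left' or 'right') of the opponents' goal and camera_dir comes from the optical-flow estimate earlier:

```python
def classify_pass(camera_dir, passer_team, goal_direction):
    """Forward pass if the camera moves towards the opponents' goal post,
    reverse pass otherwise."""
    return "forward" if camera_dir == goal_direction[passer_team] else "reverse"

# Example: Team A attacks to the right; camera panning right implies a forward pass.
print(classify_pass("right", "A", {"A": "right", "B": "left"}))
```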
Results for ball passing events Average Recall = 100% and Precision = 60%
Classification of ball passing events:
[Table: number of forward and reverse passes among the true and false ball passes detected]
Graphs for ball possession and ball passing Graphs illustrating ball possession states and ball passing events for Sequence 7
Graphs for ball possession and ball passing Graphs illustrating ball possession states and ball passing events for Sequence 10
Publications
– V. Pallavi, A. Vadivel, Shamik Sural, A.K. Majumdar, Jayanta Mukherjee, "Identification of moving objects in a soccer video", Workshop on Computer Vision, Graphics and Image Processing, 2006, Hyderabad, India, pp.
– V. Pallavi, J. Mukherjee, A.K. Majumdar and Shamik Sural, "Shot classification in soccer videos", Proceedings of the National Conference on Recent Trends in Information Systems, 2006, Kolkata, India, pp.
– V. Pallavi, J. Mukherjee, A.K. Majumdar and Shamik Sural, "Identification of team in possession of ball in a soccer video using static and dynamic segmentation", Proceedings of the Sixth International Conference on Advances in Pattern Recognition, 2007, Kolkata, India, pp.
– V. Pallavi, J. Mukherjee, A.K. Majumdar and Shamik Sural, "Ball detection from broadcast soccer videos using static and dynamic features", Journal of Visual Communication and Image Representation (accepted for a second review).