Trajectory Analysis of Broadcast Soccer Videos Computer Science and Engineering Department Indian Institute of Technology, Kharagpur by Prof. Jayanta Mukherjee
Collaborators: V. Pallavi (research scholar), Prof. A.K. Majumdar (CSE), Prof. Shamik Sural (SIT)
OUTLINE Motivation and Objective State Based Video Model Extraction of Features Trajectory Detection States and Event Detection
Motivation
– Increasing availability of soccer videos
– Soccer videos appeal to a large audience
– Soccer videos need to be processed for delivery over narrow-band networks
– The relevance of soccer videos drops significantly after a short period of time
Therefore, soccer video analysis needs to be made automatic, and the results must be semantically meaningful.
State based Video Model
Video data model: a representation of the information contained in unstructured video in order to support users' queries.
State based model: states of soccer video objects and their transitions (due to some event).
State Chart Diagram for Ball Possession
Immediate Goal Our objective is to identify these states and their transitions by analyzing the unstructured video.
Detection of States and Events (contd..)
In a soccer match, the ball possession state may be any of the following:
– possession of Team A
– possession of Team B
– both teams fighting to possess the ball
– ball in possession of none during a break
Features Used
Cinematic Features
– Shot Transitions
– Shot Types
– Shot Durations
Object Based Features
– Players
– Ball
– Billboards
– Field Descriptors
Cinematic Features: Feature Extraction
A shot is a continuous sequence of frames captured from the same camera in a video.
Shot detection algorithms segment videos into shots automatically.
Shot classification algorithms partition a video stream into a set of meaningful and manageable segments.
Shot Classification
Shots can be classified into:
– Long shot: captures a global view of the field
– Medium shot: shows a close-up view of one or more players in a specific part of the field
– Close shot: shows an above-waist view of a single player
Cinematic Features: Shot Classification (contd..)
A soccer field has one distinct dominant color, green, which varies
– from stadium to stadium
– with lighting conditions
In long views it has been observed that either grass dominates the entire frame or the crowd covers the upper part of the frame.
Typical long views in soccer videos Grass covering entire frame Grass covering partial frame
Shot Classification (contd..)
For each frame of the soccer video sequence, if the dominant color is green, the dominant color ratio decides the shot type:
– Long Shot: dominant color ratio > 0.75 and <= 1.0
– Medium Shot: dominant color ratio > 0.5 and <= 0.75
– Close Shot: dominant color ratio > 0.25 and <= 0.5
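The thresholds above map directly to a few comparisons. A minimal sketch, assuming grass_mask is a per-frame boolean mask of green (grass) pixels such as the one produced in the grass-pixel step described later in the talk:

```python
import numpy as np

def classify_shot(grass_mask):
    """Classify a frame by its dominant (grass) color ratio,
    following the thresholds quoted on the slide."""
    ratio = float(np.mean(grass_mask))   # fraction of grass pixels in the frame
    if 0.75 < ratio <= 1.0:
        return "long"
    if 0.5 < ratio <= 0.75:
        return "medium"
    if 0.25 < ratio <= 0.5:
        return "close"
    return "unclassified"
```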
Shot Classification Results
[Confusion matrix (number of frames): true class vs. predicted class for Long, Medium, Close and Unclassified shots]
Shot Classification Results
[Table: percentage of true classification per shot type (Long, Medium, Close); one recovered value is 87.63%]
Cinematic Features: Shot Detection
Shot transitions in sports videos can be:
– Wipe
– Dissolve
– Hard cut
– Fade
Proposed Shot Detection Method
Extends the approach proposed by Vadivel et al. to broadcast soccer videos.
Combines the shot detection method of Vadivel et al. with the proposed shot classification method.
Limitation of Vadivel et al.'s method for broadcast soccer videos: hard cuts are missed.
Proposed Shot Detection Method
Each frame in a shot is classified with the shot classification algorithm.
If a long shot is segmented into a sequence of long and medium view frames, and the number of frames in that sequence is above a certain threshold, a hard cut exists within the shot (see the sketch below).
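A minimal sketch of this check, assuming frame_labels holds the per-frame shot types produced by the classifier and min_run stands in for the unspecified threshold:

```python
def find_hard_cuts(frame_labels, min_run=10):
    """Report a missed hard cut when a detected long shot contains a run of
    'medium' view frames longer than the threshold. min_run is a placeholder
    value; the slide only says 'above a certain threshold'."""
    cuts, run_start, run_len = [], None, 0
    for i, label in enumerate(frame_labels + ["long"]):   # sentinel flushes the last run
        if label == "medium":
            if run_len == 0:
                run_start = i
            run_len += 1
        else:
            if run_len >= min_run:
                cuts.append(run_start)     # a hard cut starts around this frame
            run_len = 0
    return cuts
```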
Proposed Shot Detection Results
Overall recall and precision:
– Vadivel et al.'s method: 85.43%, 89.02%
– Proposed method: 91.76%, 93.65%
Shot detection improved by shot classification
Object Based Features: Feature extraction for grass pixels
Each frame is processed in the YIQ color space.
It is found experimentally that grass pixels have I values ranging between 25 and 55, while Q values range between 0 and 12.
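A minimal sketch of this thresholding, assuming 8-bit RGB frames and the usual NTSC YIQ conversion coefficients (the exact scaling used in the experiments is not stated on the slide):

```python
import numpy as np

def grass_mask(frame_rgb):
    """Boolean mask of grass pixels using the quoted YIQ ranges
    (25 <= I <= 55, 0 <= Q <= 12)."""
    rgb = frame_rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = 0.596 * r - 0.274 * g - 0.322 * b    # in-phase chrominance
    q = 0.211 * r - 0.523 * g + 0.312 * b    # quadrature chrominance
    return (i >= 25) & (i <= 55) & (q >= 0) & (q <= 12)
```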
Playfield region detected Grass pixels detected for a long view frame
Object Based Features (contd..): Playfield Line Detection
A playfield line separates the playfield from the non-playfield background, which usually consists of the billboards (also called advertisement boards).
The Hough transform is used to detect the playfield line.
Object Based Features (contd..): Midfield Line Detection
The midfield line is the line that divides the playfield in half along its width.
The Hough transform is applied to detect the midfield line.
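A minimal sketch of line detection with OpenCV's probabilistic Hough transform; the edge-detector settings, vote threshold and angle ranges are illustrative assumptions, not the parameters used in the talk:

```python
import cv2
import numpy as np

def detect_field_lines(frame_bgr):
    """Find near-horizontal lines (playfield-line candidates) and strongly
    slanted lines (midfield-line candidates) via the Hough transform."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=100, maxLineGap=10)
    playfield, midfield = [], []
    for x1, y1, x2, y2 in ([] if lines is None else lines.reshape(-1, 4)):
        angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
        if angle < 20:
            playfield.append((x1, y1, x2, y2))   # roughly horizontal
        elif angle > 60:
            midfield.append((x1, y1, x2, y2))    # roughly vertical
    return playfield, midfield
```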
Object Based Features (contd..): Ball Detection
Challenges:
– Features of the ball (color, size, shape) vary with time
– Relative size of the ball is very small
– Ball may not be an ideal circle because of fast motion and illumination conditions
– Objects in the field or in the crowd may look similar to the ball
– Field appearance changes from place to place and time to time
– No definite property uniquely identifies the ball in a frame
Object Based Features (contd..): Detecting Ball Candidates in Long Shots
Obtain ball candidates by detecting circular regions using the circular Hough transform.
Filter out non-ball candidates by:
– Removing candidates from the channel's logo
– Removing candidates from the gallery region
– Removing candidates from the midfield line
– Filtering out candidates moving against the camera
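A minimal sketch of the candidate-generation step using OpenCV's circular Hough transform; the radius range and accumulator parameters are assumptions, and the four filtering steps above would then be applied to the returned list:

```python
import cv2

def ball_candidates(frame_bgr, min_radius=2, max_radius=8):
    """Return (x, y, r) circles detected as possible ball locations."""
    gray = cv2.medianBlur(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY), 3)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=10,
                               param1=100, param2=12,
                               minRadius=min_radius, maxRadius=max_radius)
    return [] if circles is None else [tuple(c) for c in circles[0]]
```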
Object Based Features (contd..)
Ball candidates before and after filtering: ball candidates before filtering; ball candidates after filtering
Object Based Features (contd..): Detecting Players in Long Shots
Challenges:
– Features of the players (color, texture, size, motion) are neither static nor uniform
– Players appear very small in size
– Size of players changes with their position and the zooming of cameras
– Color and texture of the jersey and shorts vary from team to team
– Players in the field do not have constant motion
Object Based Features (contd..): Detecting Player Regions
Obtain player pixels by removing non-player pixels:
– Removing grass pixels
– Removing the broadcasting channel's logo
– Removing the extra field region (billboards and gallery)
– Removing pixels from the midfield line
Segment the image containing player pixels into isolated player regions with a region growing algorithm.
The center of the bounding rectangle of each region is taken as the location of the player.
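A minimal sketch of the segmentation step, assuming non_player_mask already combines the grass, logo, billboard/gallery and midfield-line masks; connected-component labelling stands in here for the region-growing step:

```python
import cv2
import numpy as np

def player_locations(non_player_mask, min_area=30):
    """Centres of the bounding rectangles of the remaining (player) regions.
    min_area is an assumed noise threshold."""
    player_pixels = (~non_player_mask).astype(np.uint8)
    n, _, stats, _ = cv2.connectedComponentsWithStats(player_pixels)
    locations = []
    for i in range(1, n):                   # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:
            locations.append((x + w // 2, y + h // 2))   # bounding-box centre
    return locations
```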
A Long Shot View Object Based Features (contd..)
Player pixels detected Object Based Features (contd..)
Players detected in long shot views Object Based Features (contd..)
Feature Detection (contd..): Team Identification in Soccer Videos
Players in a soccer video are classified using a supervised classification method.
Mean I and Q values of the player regions are obtained from a few randomly selected frames.
The minimum and maximum I and Q values are set as the range for classifying player regions.
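A minimal sketch of the supervised range test, assuming team_ranges holds the (min, max) I and Q values collected from the hand-labelled frames; the YIQ conversion mirrors the grass-pixel step:

```python
import numpy as np

def classify_team(region_rgb, team_ranges):
    """Assign a player region to a team by its mean I and Q values.
    team_ranges: {team: ((i_min, i_max), (q_min, q_max))}."""
    rgb = region_rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i_mean = float((0.596 * r - 0.274 * g - 0.322 * b).mean())
    q_mean = float((0.211 * r - 0.523 * g + 0.312 * b).mean())
    for team, ((i_lo, i_hi), (q_lo, q_hi)) in team_ranges.items():
        if i_lo <= i_mean <= i_hi and q_lo <= q_mean <= q_hi:
            return team
    return "unclassified"
```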
Team Classification in Soccer Videos Experiments were performed on two different matches: Real Madrid and Manchester United (UEFA Champions League 2003) Chelsea and Liverpool (UEFA Champions League 2007) Feature detection (contd.)
Team Classification Results: Real Madrid vs Manchester United
[Confusion matrix: number of players (and % misclassified) predicted as Team A, Team B or Unclassified against the true class]
Team Classification Results (contd..): Chelsea vs Liverpool
[Confusion matrix: number of players (and % misclassified) predicted as Team A, Team B or Unclassified against the true class]
Object Based Features (contd..): Camera Related Feature
Camera Direction Estimation:
1. Optical flow velocities and their directions are computed using Horn and Schunck's method.
2. Based on the sign of the horizontal component for the majority of pixels in a frame, the direction of movement (left or right) of the camera is estimated.
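A minimal sketch of the majority-sign test; OpenCV's Farneback flow is used here as a stand-in for Horn and Schunck's method, and the mapping from flow sign to camera direction may need flipping depending on the coordinate convention:

```python
import cv2

def camera_direction(prev_gray, curr_gray):
    """Estimate left/right camera motion from the sign of the horizontal
    optical-flow component of the majority of pixels."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5,
                                        poly_sigma=1.2, flags=0)
    positive_fraction = (flow[..., 0] > 0).mean()
    return "right" if positive_fraction > 0.5 else "left"
```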
Camera Direction Estimation (contd..) Optical flow velocities for the camera moving towards right
Tracking of Broadcast Video Objects Challenges Camera parameters are unknown Cameras are not fixed Cameras are zoomed and rotated Broadcast video is an edited video
Construction of a Directed Weighted Graph
Objects in a frame form nodes.
An arc (edge) is formed between two correlated objects in two different frames.
The measure of correlation or similarity provides the edge weight.
The temporal direction provides the direction of the edge.
Directed Weighted Graph (contd..) Tracking of Broadcast Video Objects (contd..)
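A minimal sketch of the graph construction, assuming frames is a list of per-frame object feature lists, similarity is the correlation measure used to weight edges, and max_gap bounds how far ahead an object may be linked (both placeholders for choices not detailed on the slide):

```python
def build_graph(frames, similarity, max_gap=2):
    """Directed weighted graph: node (t, k) is the k-th object in frame t,
    edges run forward in time to correlated objects, weighted by similarity."""
    graph = {}
    for t, objects in enumerate(frames):
        for a, feat_a in enumerate(objects):
            graph.setdefault((t, a), [])
            for dt in range(1, max_gap + 1):
                if t + dt >= len(frames):
                    break
                for b, feat_b in enumerate(frames[t + dt]):
                    weight = similarity(feat_a, feat_b)
                    if weight > 0:               # keep only correlated pairs
                        graph[(t, a)].append(((t + dt, b), weight))
    return graph
```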
Tracking of Broadcast Video Objects (contd..): Object Trajectory Detection
Given a source node, the longest path of the graph, obtained by dynamic programming, gives the path of the object.
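A minimal sketch of the longest-path computation; because edges only run forward in time the graph is a DAG, so one pass over the nodes in temporal order suffices:

```python
def longest_path(graph, source):
    """Longest (maximum total weight) path from source in the frame-ordered
    DAG built above; the node sequence is taken as the object trajectory."""
    best = {source: 0.0}                  # best accumulated weight per node
    back = {}                             # predecessor on the best path
    for node in sorted(graph, key=lambda n: n[0]):   # temporal order
        if node not in best:
            continue                      # not reachable from the source
        for succ, weight in graph[node]:
            if best[node] + weight > best.get(succ, float("-inf")):
                best[succ] = best[node] + weight
                back[succ] = node
    end = max(best, key=best.get)         # node where the heaviest path ends
    path = [end]
    while path[-1] in back:
        path.append(back[path[-1]])
    return list(reversed(path))
```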
Tracking of Broadcast Video Objects (contd..): Ball detection results for long shots
[Table per test sequence, including the Liang et al.* sequences: frame range, total number of frames, number of frames in which the ball is present, number of frames in which the ball is identified, recall and precision; average recall and average precision are reported]
* Liang D., Liu Y., Huang Q. & Gao W., A Scheme for Ball Detection and Tracking in Broadcast Soccer Video, Pacific Rim Conference on Multimedia, 2005, LNCS 3767.
Results for ball detection in long shots (contd..)
Tracking of Broadcast Video Objects (contd..): Tracking a Single Player
Given a source node (a player in the first frame), the longest path of the graph obtained by dynamic programming gives the path of the player over the whole sequence.
Player being tracked
Tracking of Broadcast Video Objects (contd..): Tracking Multiple Players
The longest path from each node (a player in the first frame) of the graph, obtained by dynamic programming, gives the trajectories of the players for the sequence of frames.
Limitations:
– Occlusion between players
– Players in contact
– Similarity between players belonging to the same team
Tracking Multiple Players (contd..): Resolving Conflicting Player Trajectories
If more than one player trajectory shares more than two common nodes, only one amongst them is true.
The path having the maximum weight is taken as the true trajectory.
Nodes constituting the paths of correctly detected players are removed and the graph is constructed again.
The mistracked players are then tracked again.
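A minimal sketch of the conflict-resolution loop, assuming trajectories maps each source node (player in the first frame) to its longest path and path_weight to that path's total weight, both coming from the previous step:

```python
def resolve_conflicts(trajectories, path_weight, graph):
    """Accept trajectories greedily by weight; whenever two trajectories share
    more than two nodes, only the heavier one is kept. Nodes of accepted paths
    are removed so the remaining (mistracked) players can be re-tracked."""
    accepted, pending = [], dict(trajectories)
    while pending:
        src = max(pending, key=lambda s: path_weight[s])
        path = pending.pop(src)
        accepted.append(path)
        used = set(path)
        pending = {s: p for s, p in pending.items()
                   if len(used & set(p)) <= 2}       # drop conflicting paths
    kept = {node for path in accepted for node in path}
    pruned = {n: [(m, w) for m, w in edges if m not in kept]
              for n, edges in graph.items() if n not in kept}
    return accepted, pruned        # re-run the tracker on the pruned graph
```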
Tracking Multiple Players (contd..): Multiple Player Detection Results
[Table per video file (Soccer 1 to 6): number of frames, SOP present, SOP detected, SOTP detected, recall (%) and precision (%); Average Precision is 90.18%]
Tracking Multiple Players (contd..): Multi-Player Tracking Results
[Table per video file (Soccer 1 to 6): number of frames, SOP present, SOTP tracked by the tracking algorithm, SOTP tracked by the tracking and re-tracking algorithm, accuracy (%)]
Tracking Multiple Players (contd..): Occlusion Results
[Table per video file (Soccer 1 to 4): number of occlusion and contact cases, number of cases that could be resolved, accuracy (%)]
Multi - Player Tracking Results
Multi - Player Tracking with Occlusion Results
Multi - Player Tracking with Occlusion Results (contd..)
Tracking the Mistracked Player (contd..)
Detection of States and Events The features extracted and the trajectories detected are used to detect states and events based on the proposed state based video model. States identified - Ball possession states Events detected - Ball passing events
State Chart Diagram for Ball Possession Detection of States and Events (contd..)
Play Break Detection Detection of States and Events (contd..)
State Detection
Ball possession states are obtained by spatial proximity analysis:
– Distance between the nearest player and the second nearest player to the ball
– Spatial arrangement between the players and the ball
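A minimal sketch of the proximity test, assuming players is a list of ((x, y), team) pairs for one frame and fight_margin stands in for the unspecified closeness criterion that triggers the fight state:

```python
import math

def possession_state(ball_xy, players, fight_margin=15.0):
    """Ball-possession state from the nearest and second-nearest players."""
    if not players:
        return "none"
    ranked = sorted((math.hypot(x - ball_xy[0], y - ball_xy[1]), team)
                    for (x, y), team in players)
    if len(ranked) == 1:
        return ranked[0][1]
    (d1, team1), (d2, team2) = ranked[0], ranked[1]
    if team1 != team2 and d2 - d1 < fight_margin:
        return "fight"                    # both teams contesting the ball
    return team1                          # nearest player's team possesses
```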
Ball Possession State Detection
Ball in possession of player 1's team
Ball Possession State Detection
Ball in possession of player 1's team; ball in a fight state
Ball Possession Results
[Confusion matrix (number of frames and % of misclassified frames): true class vs. predicted class for Team A, Team B and Fight states]
Edit Distance as a performance measure for ball possession states
If the actual state sequence for a sequence of frames is: AAAAFFFFFFFFFFBBBB
and the state sequence obtained by the proposed algorithm is: AAAAFFFFFFFFBBBBBB
both sequences are represented as strings S1 and S2.
The edit distance D(S1, S2) is defined as the minimum number of point mutations required to change S1 to S2, where a point mutation is one of:
– replacing a symbol
– inserting a symbol
– deleting a symbol
The edit distance for the above sequences is 2, while the normalized edit distance is D(S1, S2)/|S1|.
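A minimal sketch of the edit-distance computation (standard dynamic programming), checked against the example above:

```python
def edit_distance(s1, s2):
    """Minimum number of substitutions, insertions and deletions turning s1 into s2."""
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

actual, predicted = "AAAAFFFFFFFFFFBBBB", "AAAAFFFFFFFFBBBBBB"
assert edit_distance(actual, predicted) == 2
print(edit_distance(actual, predicted) / len(actual))   # normalized edit distance
```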
Shot wise ball possession results
Event Detection The event detected in this work is the ball passing event. It can be: Forward pass Reverse pass
Event Detection (contd..)
The ball passing event cannot be detected from state transition graphs because:
– The ball is usually passed between players of the same team
– State transition graphs only show changes in the ball possession state: Team A – Team B, Team B – Team A, Team B – Fight, Fight – Team B, Team A – Fight or Fight – Team A
Schematic diagram for ball passing events
The ball is said to be passed in a sequence of frames if:
– The player nearest to the ball in the initial frames of the sequence is the second nearest player to the ball in the subsequent frames
– The nearest and the second nearest players to the ball belong to the same team
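A minimal sketch of this test, assuming nearest_pairs lists, for each frame of the sequence, the (player_id, team) of the nearest and second-nearest players to the ball:

```python
def is_ball_pass(nearest_pairs):
    """True if the initially nearest player later becomes the second nearest,
    and both nearest and second-nearest players belong to the same team."""
    (first_player, first_team), _ = nearest_pairs[0]
    for (near_id, near_team), (second_id, second_team) in nearest_pairs[1:]:
        if (second_id == first_player and
                near_team == second_team == first_team):
            return True
    return False
```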
Example of a ball passing event:
Example of a ball passing event (contd..)
Classifying ball passing events
Forward pass: the direction of camera motion is towards the goal post of the team opposing that of the nearest player.
Reverse pass: the direction of camera motion is towards the goal post of the nearest player's own team.
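A minimal sketch of this rule, assuming goal_direction maps each team to the image-space direction ('left' or 'right') of the opponents' goal and camera_dir comes from the optical-flow estimate earlier:

```python
def classify_pass(camera_dir, passer_team, goal_direction):
    """Forward pass if the camera moves towards the opponents' goal post,
    reverse pass otherwise."""
    return "forward" if camera_dir == goal_direction[passer_team] else "reverse"

# Example: Team A attacks to the right; camera panning right implies a forward pass.
print(classify_pass("right", "A", {"A": "right", "B": "left"}))
```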
Results for ball passing events Average Recall = 100% and Precision = 60%
Classification of ball passing events:
[Table: number of forward and reverse passes among the true and false ball passes detected]
Graphs for ball possession and ball passing Graphs illustrating ball possession states and ball passing events for Sequence 7
Graphs for ball possession and ball passing Graphs illustrating ball possession states and ball passing events for Sequence 10
Publications
– V. Pallavi, A. Vadivel, Shamik Sural, A.K. Majumdar, Jayanta Mukherjee, "Identification of moving objects in a soccer video", Workshop on Computer Vision, Graphics and Image Processing, 2006, Hyderabad, India, pp.
– V. Pallavi, J. Mukherjee, A.K. Majumdar and Shamik Sural, "Shot classification in soccer videos", Proceedings of the National Conference on Recent Trends in Information Systems, 2006, Kolkata, India, pp.
– V. Pallavi, J. Mukherjee, A.K. Majumdar and Shamik Sural, "Identification of team in possession of ball in a soccer video using static and dynamic segmentation", Proceedings of the Sixth International Conference on Advances in Pattern Recognition, 2007, Kolkata, India, pp.
– V. Pallavi, J. Mukherjee, A.K. Majumdar and Shamik Sural, "Ball detection from broadcast soccer videos using static and dynamic features", Journal of Visual Communication and Image Representation (accepted for a second review).