Low-level Motion Activity Features for Semantic Characterization of Video Kadir A. Peker, A. Aydin Alatan, Ali N. Akansu International Conference on Multimedia and Expo 2000
Introduction We want to establish connections between low-level motion activity features of video segments and semantically meaningful characterizations of those segments. Two computationally simple descriptors of the motion activity of video content are used.
Motion Activity Descriptors act 0: monotonous (steady) motion activity descriptor. act 1: non-monotonous (unsteady) motion activity descriptor.
act 0 is sensitive to global motion, such as a camera pan, and to objects moving very close to the camera. act 1 filters out the component of motion activity that does not change from frame to frame. In contrast to act 0, act 1 is more sensitive to unsteady motion, such as the fickle motion of a non-rigid object in close-up.
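The formulas for the two descriptors are not preserved here, so the following is only a minimal sketch under assumed definitions: act 0 is taken as the mean magnitude of a frame's motion vectors, and act 1 as the mean magnitude of the change in each motion vector relative to the previous frame. The function names and the toy vector fields are hypothetical and are not taken from the paper.

```python
import math

def act0(field):
    """Assumed definition: mean magnitude of the motion vectors in one frame."""
    return sum(math.hypot(dx, dy) for dx, dy in field) / len(field)

def act1(prev_field, field):
    """Assumed definition: mean magnitude of the per-block change since the previous frame."""
    return sum(math.hypot(dx - px, dy - py)
               for (px, py), (dx, dy) in zip(prev_field, field)) / len(field)

# Steady pan (hypothetical data): every block moves by (4, 0) in both frames.
pan_prev = [(4.0, 0.0)] * 4
pan_curr = [(4.0, 0.0)] * 4

# Unsteady motion (hypothetical data): each vector flips direction between frames.
jit_prev = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]
jit_curr = [(-3.0, 0.0), (3.0, 0.0), (0.0, -3.0), (0.0, 3.0)]

pan_a0, pan_a1 = act0(pan_curr), act1(pan_prev, pan_curr)
jit_a0, jit_a1 = act0(jit_curr), act1(jit_prev, jit_curr)
```

Under these assumptions the steady pan yields a high act 0 and a zero act 1, while the unsteady field yields an act 1 above act 0, which matches the qualitative behaviour described above.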
Results from Application Examples We use the two descriptors in two different application contexts: browsing through a sports video, and retrieval from a database of shots.
Detecting Close-ups in Sports Video We observe that the difference act 1 (n) - act 0 (n) is highest for close-up shots, where the irregular motion of the players in view dominates the regular global motion. The test video is Basketball from the MPEG-7 data set (10 minutes, frames, 4800 P frames). Ground truth was prepared manually by segmenting the video into wide-angle and close-up shots (59 segments, 30 of them close-ups).
We expect m1 (act 0 (n)) to be high for close-up frames because the motion vectors are larger under zoom or when the action is close to the camera. We also expect that if the non-monotonous activity act 1 (n) is significantly higher than act 0 (n) in a frame, then with high probability the frame is a close-up on a highly active object.
Frame-based detection Thresholds on m1 and m2 are set to select 250 P-frames. Bounding boxes mark the close-up segments. Positive impulses mark frames where m2 suggests a close-up; negative impulses mark frames where m1 suggests a close-up.
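A minimal sketch of the frame-based selection, assuming each threshold is chosen so that a fixed number of frames with the largest measure values is flagged. The definition of m2 is not given above; here it is assumed to be act 1 (n) - act 0 (n). All function names and numbers are hypothetical toy values.

```python
def threshold_for_top_k(values, k):
    """Return the k-th largest value, so that `v >= threshold` flags at least k frames."""
    return sorted(values, reverse=True)[k - 1]

def flag_frames(values, k):
    """Indices of the frames whose measure reaches the top-k threshold."""
    t = threshold_for_top_k(values, k)
    return [n for n, v in enumerate(values) if v >= t]

# Toy per-P-frame descriptor values (hypothetical).
a0 = [1.0, 5.0, 1.2, 2.0, 0.8, 4.5]
a1 = [1.1, 1.0, 3.9, 5.0, 0.9, 1.5]

m1 = a0                                # m1(n) = act0(n), as stated in the text
m2 = [y - x for x, y in zip(a0, a1)]   # assumption: m2(n) = act1(n) - act0(n)

flags_m1 = flag_frames(m1, k=2)        # frames where m1 suggests a close-up
flags_m2 = flag_frames(m2, k=2)        # frames where m2 suggests a close-up
```

In this toy data the two measures flag disjoint sets of frames: m1 picks the frames with large overall motion, while m2 picks the frames where the unsteady component exceeds the steady one.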
Segment-based Detection
We find the close-up segments by sorting the segments with respect to sm1 and sm2 and choosing the top K. Retrieval using sm1 (the average of act 0 over the segment) is misled by camera motion: the first retrieved segment is a fast pan. We find sm2 to be a more reliable detector of close-ups.
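The segment-based ranking can be sketched as follows. The text defines sm1 as the average of act 0 over a segment; sm2 is assumed here to be the segment average of act 1 - act 0. The segment boundaries, helper names, and descriptor values are hypothetical.

```python
def segment_mean(values, start, end):
    """Mean of values[start:end] (end exclusive)."""
    chunk = values[start:end]
    return sum(chunk) / len(chunk)

def top_k_segments(values, segments, k):
    """Rank segments by their mean measure and return the k highest, best first."""
    ranked = sorted(segments, key=lambda s: segment_mean(values, *s), reverse=True)
    return ranked[:k]

# Toy per-frame descriptor values (hypothetical).
a0 = [6.0, 6.0, 1.0, 1.0, 2.0, 2.0, 0.5, 0.5]
a1 = [0.5, 0.5, 1.0, 1.0, 5.0, 5.0, 0.5, 0.5]
diff = [y - x for x, y in zip(a0, a1)]   # assumed per-frame measure behind sm2

# (start, end) frame ranges: fast pan, wide shot, close-up, static wide shot.
segments = [(0, 2), (2, 4), (4, 6), (6, 8)]

by_sm1 = top_k_segments(a0, segments, k=1)
by_sm2 = top_k_segments(diff, segments, k=1)
```

In this toy data the pan segment tops the sm1 ranking, while the sm2 ranking puts the close-up-like segment first, mirroring the observation above.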
Retrieval of High Activity Shots The database contains 600 shots from the MPEG-7 test set, drawn from various programs such as news, sports, entertainment, and education. The 5 highest-activity shots are retrieved using act 0, act 1, and (act 1 - act 0). act 0 and act 1 retrieve shots that contain fast camera motion or an object passing very close to the camera, which are not commonly considered high activity. (act 1 - act 0) retrieves 5 shots of dancing people.
Conclusion We described two descriptors that indicate whether the activity content is dominated by monotonous, steady motion or by unsteady, inconstant motion. This kind of characterization of the activity content can be used to detect close-up segments in a sports video or to answer an activity-based query against a database of video shots.