Video Shot Boundary Detection at RMIT University Timo Volkmer, Saied Tahaghoghi, and Hugh E. Williams School of Computer Science & IT, RMIT University.


1 Video Shot Boundary Detection at RMIT University Timo Volkmer, Saied Tahaghoghi, and Hugh E. Williams School of Computer Science & IT, RMIT University {tvolkmer, saied, hugh}@cs.rmit.edu.au

2 Overview
- Our general approach
- The moving query window
- Details of the approach
- How we measure frame similarity
- Improvements for 2004 cut detection
- Detection of gradual transitions
- Evaluation
- Experimental results
- Conclusions

3 The Moving Query Window
- A moving query window consists of two equal-sized half windows surrounding a current frame
- The moving query window is advanced through the video frame by frame
- Cut detection and gradual transition detection are performed with separate decision stages during a single pass
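The window layout described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `moving_query_windows` and the `half_window` parameter are hypothetical, and the sketch assumes frames near the start and end of the video (where a full window does not fit) are simply skipped.

```python
from typing import Iterator, List, Tuple


def moving_query_windows(
    num_frames: int, half_window: int
) -> Iterator[Tuple[List[int], int, List[int]]]:
    """Yield (pre_frames, current, post_frames) frame indices.

    Assumption: only frames with a complete half window on both
    sides are visited; the slides do not say how edges are handled.
    """
    for current in range(half_window, num_frames - half_window):
        pre = list(range(current - half_window, current))
        post = list(range(current + 1, current + half_window + 1))
        yield pre, current, post


# Advance the window frame by frame through a 10-frame clip.
for pre, current, post in moving_query_windows(10, 3):
    print(pre, current, post)
```

Both decision stages (cuts and gradual transitions) can then consume the same `(pre, current, post)` triples in a single pass.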

4 Frame feature representation
- We use one-dimensional, localised histograms with 4x4 regions in the HSV colour space (16 bins per colour component)
- A colour histogram represents each frame region
- Corresponding regions are compared
- Different weights can be applied to each region during comparison
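A rough sketch of this representation in plain Python is given below. The slide does not name the distance measure or the normalisation, so the Manhattan (L1) distance and the per-region normalisation are assumptions, as are the names `region_histograms` and `frame_distance` and the choice of a frame as rows of 8-bit RGB tuples.

```python
import colorsys
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B); an assumed input format
GRID = 4    # 4x4 regions per frame
BINS = 16   # bins per colour component


def region_histograms(frame: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Return 16 region features, each three concatenated 16-bin
    one-dimensional histograms (H, S, V), normalised per region."""
    height, width = len(frame), len(frame[0])
    hists = [[0.0] * (3 * BINS) for _ in range(GRID * GRID)]
    counts = [0] * (GRID * GRID)
    for y, row in enumerate(frame):
        ry = y * GRID // height
        for x, (r, g, b) in enumerate(row):
            region = ry * GRID + x * GRID // width
            hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            for c, value in enumerate(hsv):
                hists[region][c * BINS + min(int(value * BINS), BINS - 1)] += 1.0
            counts[region] += 1
    return [[v / n for v in h] if n else h for h, n in zip(hists, counts)]


def frame_distance(
    a: List[List[float]],
    b: List[List[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-region L1 distances (L1 is an assumption)."""
    if weights is None:
        weights = [1.0] * len(a)
    return sum(
        w * sum(abs(p - q) for p, q in zip(ha, hb))
        for w, ha, hb in zip(weights, a, b)
    )
```

Passing a weight of 0 for a region removes it from the comparison, which is how a per-region weighting of this kind could be used to ignore parts of the frame.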

5 Cut detection
- We disregard the four central regions of each frame (that is, their weight = 0) to avoid the effect of rapid activity
- Using the remaining regions, each frame in the moving window is ranked by decreasing similarity to the current frame
- Frame similarity is the sum of the inter-region similarities
- The number of pre-frames that are ranked in the top half of the rankings is monitored
- When a cut is passed, the number of top-ranked pre-frames (usually) rises to a maximum and falls to a minimum within a few frames
- We have determined an optimum window size and optimum thresholds that are effective for all our training sets
- Our cut detection is (now) parameter free
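The ranking idea on this slide can be illustrated with a small sketch. It is not the authors' code: the function names, the tie-breaking rule, and the `upper`, `lower`, and `max_gap` parameters are hypothetical stand-ins for the fixed thresholds the slide mentions but does not state. Frames are represented by arbitrary features with a caller-supplied distance, so a weighted region-histogram distance could be plugged in.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")


def pre_frame_counts(
    frames: Sequence[F], half_window: int, distance: Callable[[F, F], float]
) -> List[int]:
    """For each current frame, rank all window frames by distance to it
    and count how many pre-frames land in the top half of the ranking."""
    counts = []
    for cur in range(half_window, len(frames) - half_window):
        pre = [(distance(frames[i], frames[cur]), 0)
               for i in range(cur - half_window, cur)]
        post = [(distance(frames[i], frames[cur]), 1)
                for i in range(cur + 1, cur + half_window + 1)]
        # Ties are broken in favour of pre-frames (an arbitrary choice).
        ranked = sorted(pre + post)
        counts.append(sum(1 for _, side in ranked[:half_window] if side == 0))
    return counts


def detect_cuts(
    counts: Sequence[int], half_window: int,
    upper: int, lower: int, max_gap: int,
) -> List[int]:
    """Report frame indices where the count falls to <= lower within
    max_gap frames of having been >= upper (thresholds are assumed)."""
    cuts, last_high = [], None
    for i, c in enumerate(counts):
        if c >= upper:
            last_high = i
        elif c <= lower and last_high is not None and i - last_high <= max_gap:
            cuts.append(i + half_window)  # convert back to a frame index
            last_high = None
    return cuts


# Toy video: scalar "features", a hard cut between frames 9 and 10.
video = [0.0] * 10 + [1.0] * 10
counts = pre_frame_counts(video, 4, lambda a, b: abs(a - b))
print(counts)
print(detect_cuts(counts, 4, upper=4, lower=0, max_gap=2))
```

In the toy clip the count stays at its maximum up to the last frame of the first shot and drops to its minimum on the first frame of the second shot, which is the pattern the slide describes.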

6 Gradual transition detection
- Pre-frames and post-frames are combined into two distinct sets of frames
- The average distance of each set to the current frame is computed
- We use all frame regions (with identical weights)
- The ratio between the pre-frame set distance and the post-frame set distance, the PrePostRatio, is monitored
- The end of most gradual transitions is indicated by a peak in the PrePostRatio curve
- We maintain a moving average PrePostRatio for calculating a dynamic threshold to detect transitions
- As a final decision step, we require a minimum difference between the last frame of the previous shot and the first frame of the new shot
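The PrePostRatio and its dynamic threshold might be sketched as below. Several details are assumptions rather than facts from the slides: the small `eps` that guards against division by zero, the threshold formula (`factor` times the moving average over the last `history` ratios), the local-maximum test, and all function and parameter names. The final decision step is left out.

```python
from collections import deque
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")


def pre_post_ratios(
    frames: Sequence[F], half_window: int,
    distance: Callable[[F, F], float], eps: float = 1e-6,
) -> List[float]:
    """Average pre-frame distance divided by average post-frame
    distance, for every frame that has a full window."""
    ratios = []
    for cur in range(half_window, len(frames) - half_window):
        pre = sum(distance(frames[i], frames[cur])
                  for i in range(cur - half_window, cur)) / half_window
        post = sum(distance(frames[i], frames[cur])
                   for i in range(cur + 1, cur + half_window + 1)) / half_window
        ratios.append((pre + eps) / (post + eps))  # eps is an assumption
    return ratios


def detect_peaks(ratios: Sequence[float], history: int, factor: float) -> List[int]:
    """Indices where the ratio is a local maximum above
    factor * (moving average of the previous `history` ratios)."""
    recent: deque = deque(maxlen=history)
    peaks = []
    for i, r in enumerate(ratios):
        if len(recent) == history:
            threshold = factor * sum(recent) / history
            rising = ratios[i - 1] <= r
            falling = i + 1 == len(ratios) or ratios[i + 1] < r
            if r > threshold and rising and falling:
                peaks.append(i)
        recent.append(r)
    return peaks


# Toy dissolve: shot A (0.0), four blended frames, shot B (1.0).
video = [0.0] * 12 + [0.2, 0.4, 0.6, 0.8] + [1.0] * 12
ratios = pre_post_ratios(video, 4, lambda a, b: abs(a - b))
print([i + 4 for i in detect_peaks(ratios, history=4, factor=2.0)])
```

Adding `half_window` to a peak index converts it back to a frame index. In the toy dissolve the ratio is near its minimum as the transition begins and peaks on the first frame that lies fully in the new shot.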

7 PrePostRatio in detail
[Figure: a schematised dissolve between a shot A and a shot B]
- The PrePostRatio is usually minimal at the beginning of a gradual transition and rises to a maximum at the end of the transition

8 PrePostRatio curve example The curve shows two short gradual transitions and two cuts within a range of 1000 frames

9 Training and Evaluation
- We have trained on the TRECVID 2003 shot boundary test set
- Main parameters for gradual transition detection are:
  - The query window size
  - The size of the history buffer for dynamic thresholding
  - A threshold level factor
- Results are discussed on the next slides. (We achieve similar or better results on the 2002 and 2001 test sets in blind runs.)

10 Results at TRECVID 2004

         All                 Cuts                Gradual Transitions
SysID    Recall  Precision   Recall  Precision   Recall  Precision
rmit1    0.915   0.829       0.944   0.922       0.852   0.671
rmit2    0.901   0.850       0.944   0.921       0.810   0.714
rmit3    0.907   0.859       0.944   0.921       0.828   0.738
rmit4    0.893   0.870       0.944   0.921       0.783   0.762
rmit5    0.897   0.877       0.944   0.921       0.798   0.782
rmit6    0.883   0.885       0.944   0.921       0.753   0.802
rmit7    0.889   0.890       0.944   0.921       0.772   0.819
rmit8    0.871   0.893       0.944   0.921       0.715   0.824
rmit9    0.881   0.899       0.944   0.921       0.746   0.844
rmit10   0.860   0.900       0.944   0.921       0.681   0.844
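One way to compare the runs on a single number is the F1 score, the harmonic mean of recall and precision. The slides do not report F1, so using it here is an assumption made for illustration; the values are the gradual transition columns from the results above.

```python
# Gradual transition (recall, precision) per run, copied from the results slide.
GRADUAL = {
    "rmit1": (0.852, 0.671), "rmit2": (0.810, 0.714),
    "rmit3": (0.828, 0.738), "rmit4": (0.783, 0.762),
    "rmit5": (0.798, 0.782), "rmit6": (0.753, 0.802),
    "rmit7": (0.772, 0.819), "rmit8": (0.715, 0.824),
    "rmit9": (0.746, 0.844), "rmit10": (0.681, 0.844),
}


def f1(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (0.0 if both are zero)."""
    total = recall + precision
    return 2 * recall * precision / total if total else 0.0


# Rank the runs by F1 (not a measure used in the slides).
for run in sorted(GRADUAL, key=lambda r: f1(*GRADUAL[r]), reverse=True):
    print(f"{run:7s} F1 = {f1(*GRADUAL[run]):.3f}")
```

The ordering makes the recall and precision trade-off between the runs easier to see at a glance, but it weights the two measures equally, which may not match every application.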

11 Overall results

12 Frame recall and precision for gradual transitions

13 Discussion
- Cut detection is highly effective
  - This year, recall is 94% and precision is 92%
  - Improvements over 2003 are due to ignoring the centre regions
- Gradual transition detection has improved significantly since 2003
  - Recall is now between 68% and 85%, precision between 67% and 84%
  - A high detection threshold favours precision; a low one favours recall
  - A short detection threshold history length was found to be preferable
  - The final decision step reduces false positives
- For television news, we are able to use a fixed moving query window size of 24 frames
- We experimented with a simple ASR technique in 10 additional runs, which removed detected transitions that coincided with spoken words. It was ad hoc and very unsuccessful

14 Conclusions
- Disregarding the focus area of frames for cut detection has improved our results by 3% in recall and 9% in precision
- Our parameter-free ranking scheme is highly effective in cut detection on a wide variety of footage
- Our gradual transition detection method is relatively simple and needs only a few parameters
- The additional final decision step reduces false positives and has improved results significantly
- The use of localised histograms and more dynamic thresholding has also improved results in gradual transition detection
- Our approach is computationally inexpensive, simple to implement, and effective
  - 15,500 seconds to process the video (around 4 hours, 18 minutes)

15 Questions?

