Text Detection in Video
Min Cai
Background
Video OCR: text detection, extraction, and recognition.
Detection target: artificial text.
Text detection:
  Detect the region from a single frame.
  Refine the region by combining consecutive frames.
Existing Work
Feature Extraction    Text detection based on the feature
Color                 Connected-component
Texture               Texture segmentation
Edge                  Top-down, bottom-up
Connected-component-based methods
Basic idea:
  Treat text as a uniform color (color level) and classify each pixel as text or non-text according to its color value.
  Combine connected text pixels into connected components.
  Group collinear connected components into a text string.
Advantage:
  Can detect text of arbitrary orientation, provided it has a similar color and a simple background.
Disadvantage:
  Sensitive to color variance.
  Lossy compression of video introduces color bleeding.
  Struggles with complex backgrounds.
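The first two steps of the basic idea can be sketched as follows. This is a minimal illustration, not the method of any particular system: the function names, the single gray level `text_level`, the `tolerance` parameter, and the choice of 4-connectivity are all assumptions made for the example.

```python
from collections import deque

def text_pixel_mask(gray, text_level, tolerance):
    """Mark pixels whose gray level lies within `tolerance` of `text_level`.
    Both parameters are hypothetical; the slides do not give values."""
    return [[abs(v - text_level) <= tolerance for v in row] for row in gray]

def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right), inclusive,
    of the 4-connected regions of True pixels, in scan order."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Grouping the resulting boxes into text strings would then test whether neighboring boxes are collinear; that step is omitted here.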
Texture segmentation method
Basic idea:
  Treat text as a type of texture.
  Use texture segmentation algorithms to detect text, for example Gabor filters or Gaussian derivatives.
Advantage:
  Can efficiently separate text areas from graphic areas on a simple background. It is usually used in document analysis.
Disadvantage:
  Time-consuming.
  Does not handle text embedded in varied backgrounds well.
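As an illustration of the kind of filter involved, the sketch below builds a real-valued Gabor kernel from the standard formula (a Gaussian envelope multiplied by a cosine carrier). The parameter names and defaults are assumptions for the example; the slides do not specify which filter bank or parameters were used.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=1.0, psi=0.0):
    """Return a size x size real Gabor kernel as a list of rows.
    `size` is assumed odd; theta is the orientation in radians."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel
```

Convolving the image with several such kernels at different orientations and wavelengths gives per-pixel texture features, which is also why the approach tends to be time-consuming.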
Bottom-up method
Basic idea:
  A seed region is defined as a small region with high edge density.
  Grow each seed region into successively larger components until all seed regions in the image have been absorbed.
Advantage:
  It is a generic method for detecting a homogeneous object of various shapes. That is, it can detect not only rectangular objects but also other shapes.
Disadvantage:
  Sensitive to noise.
  Cannot handle a large range of font sizes.
  Sensitive to stroke density (which differs between languages).
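A minimal sketch of seed selection and growth, under simplifying assumptions: the edge map is binary, seeds are fixed-size blocks on a grid, and growth merges seeds that touch each other. The names `seed_blocks` and `grow`, the block size, and the density threshold are hypothetical.

```python
def seed_blocks(edge, block=4, min_density=0.5):
    """Return (top, left) corners of blocks whose edge density is high.
    `edge` is a binary edge map given as a list of rows of 0/1."""
    h, w = len(edge), len(edge[0])
    seeds = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            count = sum(edge[y][x]
                        for y in range(by, by + block)
                        for x in range(bx, bx + block))
            if count / (block * block) >= min_density:
                seeds.append((by, bx))
    return seeds

def grow(seeds, block):
    """Merge seed blocks that touch (4-neighborhood) and return the
    bounding box (top, left, bottom, right), end-exclusive, of each group."""
    remaining = set(seeds)
    regions = []
    while remaining:
        start = min(remaining)  # deterministic starting seed
        remaining.remove(start)
        stack = [start]
        top, left = start
        bottom, right = top + block, left + block
        while stack:
            y, x = stack.pop()
            top, left = min(top, y), min(left, x)
            bottom, right = max(bottom, y + block), max(right, x + block)
            for n in ((y - block, x), (y + block, x), (y, x - block), (y, x + block)):
                if n in remaining:
                    remaining.remove(n)
                    stack.append(n)
        regions.append((top, left, bottom, right))
    return regions
```

The fixed block size in this sketch also shows where the font-size limitation comes from: a block tuned to one stroke density will miss much larger or much smaller text.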
Top-down method
Basic idea:
  Based on the run-length smoothing algorithm.
  Analyze horizontal and vertical projection profiles.
Advantage:
  Can detect the boundary of a horizontally aligned text string quickly and correctly.
  Insensitive to noise.
Disadvantage:
  Cannot handle diagonally aligned text.
  One pass of horizontal and vertical projection cannot handle a complex layout.
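The two ingredients named above can be sketched in a few lines. The run-length smoothing step fills short gaps between foreground pixels in each row, and the projection profiles are row and column sums. The function names and the `max_gap` parameter are assumptions for the example.

```python
def rlsa_row(row, max_gap):
    """Run-length smoothing of one binary row: fill runs of 0s that are
    no longer than `max_gap` and lie between two 1s."""
    out = list(row)
    last_one = None
    for i, v in enumerate(row):
        if v:
            if last_one is not None and 0 < i - last_one - 1 <= max_gap:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out

def horizontal_profile(img):
    """Sum of foreground pixels in each row."""
    return [sum(row) for row in img]

def vertical_profile(img):
    """Sum of foreground pixels in each column."""
    return [sum(col) for col in zip(*img)]
```

Rows with a zero horizontal profile separate text lines, and columns with a zero vertical profile separate blocks within a line, which is why the approach works well for horizontal text but not for diagonal text.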
Analysis (1)
A certain contrast against the background: artificial text strings are designed to be read easily.
A certain stroke density.
Text strings always appear horizontally.
Spatial cohesion: characters of the same text string have similar height, orientation, and spacing.
Size constraint: text strings are subject to certain size restrictions.
A text string appears in multiple consecutive frames at a similar position.
Analysis (2)
Problem                                           Resolution
How to extract more useful edges?                 Local thresholding
How to highlight text areas?                      Text area recovery
How to detect text regions fast and correctly?    Coarse-to-fine detection
Single Threshold
Local threshold (1)
Use a small kernel (red) to scan the whole image.
In a bigger window (gray) surrounding the kernel, calculate the local threshold from the window's local histogram.
[Figure: (a) window movement; (b) local threshold selection, a histogram of count against edge strength with MIN, MAX, and T-local marked and the range divided into a low half and a high half]
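The scanning scheme can be sketched as below. The slide does not state the exact rule used to pick the threshold from the local histogram, so this example assumes the midpoint between the minimum and maximum edge strength in the window; that rule, the function name, and the `kernel` and `margin` parameters are all assumptions.

```python
def local_threshold(edge, kernel=2, margin=2):
    """Binarize an edge-strength map with a per-kernel local threshold.
    For each kernel x kernel block, the threshold is computed from a
    larger window extending `margin` pixels beyond the block."""
    h, w = len(edge), len(edge[0])
    out = [[0] * w for _ in range(h)]
    for ky in range(0, h, kernel):
        for kx in range(0, w, kernel):
            # Window surrounding the kernel, clipped to the image.
            y0, y1 = max(0, ky - margin), min(h, ky + kernel + margin)
            x0, x1 = max(0, kx - margin), min(w, kx + kernel + margin)
            window = [edge[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # ASSUMPTION: midpoint of the local range splits low and high halves.
            t_local = (min(window) + max(window)) / 2
            for y in range(ky, min(h, ky + kernel)):
                for x in range(kx, min(w, kx + kernel)):
                    out[y][x] = 1 if edge[y][x] > t_local else 0
    return out
```

Because the threshold adapts to each window, weak edges in a low-contrast area can survive while equally weak responses next to strong edges are suppressed, which a single global threshold cannot do.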
Local threshold (2)
Text-like area recovery (1)
[Figure: before recovery; after recovery]
Text-like area recovery (2)
[Figure: before recovery; after recovery]
High pass filter
Coarse-to-fine detection: using a top-down scheme to detect text-like areas
Initial: add the whole image to the processing array.
Take the first region from the array.
Apply horizontal projection, then vertical projection.
Can the region be divided?
  Yes: add the resulting regions to the processing array.
  No: add the region to the result array.
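The flow above can be sketched as a loop over a processing queue. This is an illustration under simplifying assumptions: the input is a binary map of text-like pixels, a region is "divided" wherever a projection profile drops to zero, and empty regions are discarded. The names `split_runs` and `detect_regions` are hypothetical.

```python
from collections import deque

def split_runs(profile, offset=0):
    """Return (begin, end) ranges, end-exclusive, of nonzero runs in a profile."""
    runs, begin = [], None
    for i, value in enumerate(profile):
        if value and begin is None:
            begin = i
        elif not value and begin is not None:
            runs.append((offset + begin, offset + i))
            begin = None
    if begin is not None:
        runs.append((offset + begin, offset + len(profile)))
    return runs

def detect_regions(img):
    """Return boxes (top, left, bottom, right), end-exclusive, that cannot
    be divided further by horizontal and vertical projection."""
    processing = deque([(0, 0, len(img), len(img[0]))])  # start with the whole image
    results = []
    while processing:
        top, left, bottom, right = processing.popleft()
        # Horizontal projection: split the region into row bands.
        rows = [sum(img[y][left:right]) for y in range(top, bottom)]
        pieces = []
        for r0, r1 in split_runs(rows, top):
            # Vertical projection within each band: split into column spans.
            cols = [sum(img[y][x] for y in range(r0, r1)) for x in range(left, right)]
            for c0, c1 in split_runs(cols, left):
                pieces.append((r0, c0, r1, c1))
        if pieces == [(top, left, bottom, right)]:
            results.append((top, left, bottom, right))  # cannot divide
        else:
            processing.extend(pieces)  # divided (or empty): process the parts
    return results
```

Re-queuing the pieces is what lets repeated passes resolve layouts that a single horizontal and vertical projection would leave merged.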
Detect text-like areas
[Figure: (b) coarse vertical projection, steps 1) to 4)]
Refinement
Combine neighboring text areas of similar height.
Use size constraints to remove areas that do not satisfy them.
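Both refinement steps can be sketched as a greedy merge followed by a filter. All thresholds here (`max_gap`, `height_ratio`, and the size limits) are placeholder values chosen for the example, not values from the slides, and the function name `refine` is hypothetical.

```python
def refine(boxes, max_gap=5, height_ratio=0.8, min_w=8, min_h=4, max_h=100):
    """Merge horizontally neighboring boxes of similar height, then drop
    boxes that violate the size constraints.
    Boxes are (top, left, bottom, right), end-exclusive, with positive size."""
    merged = []
    for box in sorted(boxes, key=lambda b: b[1]):  # left to right
        bt, bl, bb, br = box
        for i, (t, l, b, r) in enumerate(merged):
            h1, h2 = b - t, bb - bt
            similar = min(h1, h2) / max(h1, h2) >= height_ratio
            same_line = min(b, bb) - max(t, bt) > 0   # vertical overlap
            close = bl - r <= max_gap                 # small horizontal gap
            if similar and same_line and close:
                merged[i] = (min(t, bt), l, max(b, bb), max(r, br))
                break
        else:
            merged.append(box)
    return [m for m in merged
            if m[3] - m[1] >= min_w and min_h <= m[2] - m[0] <= max_h]
```

For example, two word-sized boxes on the same line separated by a three-pixel gap are joined into one string, while a tiny isolated box is removed by the size filter.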
Multi-frame analysis
Text region matching: find all the regions corresponding to the same text.
Text region enhancement: enhance the text image quality by multi-frame integration.
Repetitive text elimination: record the text only at its first appearance.
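A minimal sketch of the three steps, with assumptions stated plainly: regions are matched between consecutive frames by bounding-box overlap (intersection over union), and integration is done by pixel-wise averaging. The slides name neither the matching criterion nor the integration rule, so both are stand-ins, and the function names and the `min_iou` threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (top, left, bottom, right), end-exclusive."""
    inter_h = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def first_appearances(frames_boxes, min_iou=0.7):
    """Return (frame_index, box) for each text region at its first appearance.
    A box that matches a box in the previous frame is treated as a repeat."""
    records = []
    previous = []
    for index, boxes in enumerate(frames_boxes):
        for box in boxes:
            if not any(iou(box, old) >= min_iou for old in previous):
                records.append((index, box))
        previous = boxes
    return records

def average_patches(patches):
    """ASSUMPTION: integrate same-size patches of one text by pixel-wise mean."""
    n = len(patches)
    height, width = len(patches[0]), len(patches[0][0])
    return [[sum(p[y][x] for p in patches) / n for x in range(width)]
            for y in range(height)]
```

Averaging keeps the static text stable while a changing background tends to blur out, which is one plausible reason multi-frame integration improves text image quality.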
Thank you! End