A New Approach for Video Text Detection and Localization
M. Cai, J. Song and M.R. Lyu
VIEW Technologies, The Chinese University of Hong Kong

Related work
Text area detection
  – Uncompressed-domain methods: texture-based, color-based, edge-based
  – Compressed-domain methods: DCT coefficients; number of intra-coded blocks in P-/B-frames
Text string localization
  – Bottom-up scheme
  – Top-down scheme

Language-independent characteristics
Contrast
  – An adaptive contrast threshold is chosen according to the background complexity
Color
  – Color bleeding is caused by compression
Orientation
  – Well-defined size and orientation make the text easy to understand
Stationary location
  – Text stays on screen for a certain length of time

Language-dependent characteristics

Characteristic               English            Chinese
Stroke density               Roughly similar    Varies dramatically
Min(font size)               10 pixels high     20 pixels high
Min(aspect ratio)            Relatively large   Relatively small
Stroke direction statistics  Mainly vertical    Vertical, horizontal, left diagonal, right diagonal

Workflow
  – Sampling and color space conversion
  – Multi-frame comparison
  – Video text detection and localization on every sampled frame

A sequential multi-resolution paradigm
[Flowchart: edge detection converts the original image into an edge map. At each level l = 1, 2, ..., n-1, n, the edge map is resized (Size / f(l)), text area detection and text string localization produce text regions, and the regions are mapped back to original coordinates (Size × f(l)). The output is the final text regions with original coordinates.]
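The level loop in this paradigm can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not define f(l), so `scale_factor` (halving per level) is hypothetical, as are the block-maximum `downscale` and the pluggable `detect` callback. Any sequential interaction between levels is not modeled here.

```python
def scale_factor(level):
    """Hypothetical f(l): full size at level 1, halved at each further level."""
    return 2 ** (level - 1)


def downscale(edge_map, factor):
    """Shrink the map by keeping the strongest edge in each factor x factor block.

    Block-maximum pooling is an assumption; the slides only show Size / f(l).
    """
    height, width = len(edge_map), len(edge_map[0])
    return [
        [
            max(
                edge_map[yy][xx]
                for yy in range(y, min(y + factor, height))
                for xx in range(x, min(x + factor, width))
            )
            for x in range(0, width, factor)
        ]
        for y in range(0, height, factor)
    ]


def to_original(region, factor):
    """Map a (top, left, bottom, right) region back to original coordinates."""
    top, left, bottom, right = region
    return (top * factor, left * factor, bottom * factor, right * factor)


def detect_multi_resolution(edge_map, detect, levels):
    """Run `detect` on every level and collect regions in original coordinates."""
    found = []
    for level in range(1, levels + 1):
        factor = scale_factor(level)
        small = downscale(edge_map, factor)
        found.extend(to_original(region, factor) for region in detect(small))
    return found
```

Here `detect` stands for the per-level text area detection and text string localization; it receives the resized edge map and returns regions in that level's coordinates.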

Text detection
Edge detection
  – Sobel edge detector
Local thresholding
  – Adaptive to background complexity
Text-like area recovery
  – Enhances the density of text areas
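As a rough illustration of the edge detection step, the standard 3x3 Sobel operator can be written in plain Python as below. The function name `sobel_edge_map`, the |Gx| + |Gy| strength measure, and the zeroed border are assumptions made for this sketch; the slides only name the Sobel detector.

```python
def sobel_edge_map(gray):
    """Return edge strength |Gx| + |Gy| for a 2-D list of gray levels.

    Border pixels are left at 0 (an assumption; the slides do not say
    how borders are handled).
    """
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            edges[y][x] = abs(gx) + abs(gy)
    return edges
```

A vertical step between a dark and a bright column yields a strong horizontal gradient, while a flat area yields zero everywhere.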

Local thresholding
  – Use a small kernel (gray) to scan the whole edge map row by row.
  – In the bigger window surrounding the kernel, check the background type: “Clear” or “Noisy”.
  – For a Clear background, determine the local threshold from the low part of the edge strength histogram in the bigger window; for a Noisy background, use the high part.
[Figure: (a) Concentric kernel and window (kernel of size h inside a window of size 3h). (b) A window on the multi-line text area and the horizontal projection in it. (c) Local threshold selection: histogram of count versus edge strength, divided into a low part and a high part.]
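One possible way to pick a threshold from the low or high part of a window's edge strengths is sketched below. The slides do not give the exact rule, so everything specific here is an assumption: the split of the strength range at half its maximum, the use of the mean of the selected part, and the name `local_threshold`. The Clear/Noisy decision is taken as an input rather than computed.

```python
def local_threshold(window_strengths, noisy):
    """Pick a threshold from the low or high part of the window's strengths.

    Assumptions (not from the slides): the range [0, max] is split at
    max / 2, and the threshold is the mean of the nonzero strengths
    falling in the selected part.
    """
    nonzero = [s for s in window_strengths if s > 0]
    if not nonzero:
        return 0.0
    midpoint = max(nonzero) / 2
    if noisy:
        part = [s for s in nonzero if s >= midpoint]  # high part
    else:
        part = [s for s in nonzero if s < midpoint]   # low part
    if not part:
        # All strengths lie in the other part; fall back to using them all.
        part = nonzero
    return sum(part) / len(part)
```

With this rule a noisy window gets a higher threshold than a clear one, so weak background edges are rejected only where the background is cluttered.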

Thresholding result comparison
[Figure panels: video image; local thresholding results; global thresholding results]

Text-like area recovery
Labeling
  – Classify each current edge pixel as “TEXT” or “NON_TEXT” based on its local density.
Recovery/Suppression
  – Bring back the neighboring lower-strength edge pixels of the TEXT edge pixels.
  – Suppress the NON_TEXT edge pixels.
[Figure panels: before recovery; after recovery]
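The labeling and recovery/suppression steps might look like the following sketch. The density window (`radius`), the density cutoff (`min_density`), the minimum strength for recovered pixels (`floor`), and the 8-neighborhood used for recovery are all hypothetical parameters; the slides state only that labeling depends on local density.

```python
def recover_text_like_areas(strength, kept, radius=1, min_density=0.4, floor=1):
    """Label kept edge pixels as TEXT or NON_TEXT, then recover and suppress.

    strength: 2-D list of edge strengths.
    kept:     2-D list of booleans, True where a pixel passed thresholding.
    Returns a 2-D boolean mask of the final edge pixels.
    """
    height, width = len(strength), len(strength[0])

    def density(y, x):
        # Fraction of kept pixels in the square neighborhood around (y, x).
        total = hits = 0
        for yy in range(max(0, y - radius), min(height, y + radius + 1)):
            for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                total += 1
                hits += bool(kept[yy][xx])
        return hits / total

    # Labeling: a kept pixel is TEXT when its neighborhood is dense enough.
    text = [
        [bool(kept[y][x]) and density(y, x) >= min_density for x in range(width)]
        for y in range(height)
    ]

    # Suppression: start from TEXT pixels only, so NON_TEXT pixels are dropped.
    result = [row[:] for row in text]

    # Recovery: bring back weaker edge pixels adjacent to TEXT pixels.
    for y in range(height):
        for x in range(width):
            if not text[y][x]:
                continue
            for yy in range(max(0, y - 1), min(height, y + 2)):
                for xx in range(max(0, x - 1), min(width, x + 2)):
                    if not kept[yy][xx] and strength[yy][xx] >= floor:
                        result[yy][xx] = True
    return result
```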

Coarse-to-fine text localization
  – Projection-based top-down localization.
  – Designed to handle complex text layouts.
[Flowchart]
  – Initialization: the whole edge map is the only region in the processing array.
  – If the array is empty, terminate. Otherwise, pop the first region from the processing array.
  – Apply horizontal projection and vertical projection to the region and test whether it is divisible.
  – Divisible (Y): add each sub-region to the processing array.
  – Indivisible (N): check the aspect ratio. Add regions that pass (Y) to the resulting text regions; discard the rest (N) as false regions.
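A minimal sketch of this projection-based loop is given below, assuming a binary edge mask as input. Several details are assumptions rather than the authors' method: a region is split wherever a projection profile drops to zero, the aspect ratio is taken as width divided by height, and `min_aspect` is a hypothetical cutoff. Regions are `(top, left, bottom, right)` tuples with exclusive bottom and right.

```python
from collections import deque


def _runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive nonzero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def localize(edge_mask, min_aspect=1.0):
    """Split the edge mask top-down until regions are indivisible."""
    height, width = len(edge_mask), len(edge_mask[0])
    queue = deque([(0, 0, height, width)])  # the whole map is the first region
    results = []
    while queue:  # terminate when the processing array is empty
        top, left, bottom, right = queue.popleft()
        # Horizontal projection: edge count per row inside the region.
        rows = [sum(edge_mask[y][left:right]) for y in range(top, bottom)]
        subs = []
        for r0, r1 in _runs(rows):
            # Vertical projection: edge count per column inside the row band.
            cols = [
                sum(edge_mask[y][x] for y in range(top + r0, top + r1))
                for x in range(left, right)
            ]
            for c0, c1 in _runs(cols):
                subs.append((top + r0, left + c0, top + r1, left + c1))
        if subs == [(top, left, bottom, right)]:
            # Indivisible: keep it only if it is wide enough to be a text line.
            if (right - left) / (bottom - top) >= min_aspect:
                results.append((top, left, bottom, right))
        else:
            queue.extend(subs)  # divisible: process each sub-region later
    return results
```

Every sub-region pushed back is strictly smaller than its parent, so the loop always terminates. Re-projecting each sub-region also trims empty rows left over after a column split, which is what makes the scheme coarse-to-fine.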

Localization steps
[Figure: four panels, (1) to (4)]

Experimental results

Performance statistics
Statistics of 10 news videos:
  – Processing time per frame: 0.25 s (PIII 1 GHz CPU)
  – Detection rate = 93.6%
  – Detection accuracy = 87.2%
  – Localization accuracy > 90%