Chinese Character Recognition for Video
Presented by: Vincent Cheung
Date: 25 October 1999
Introduction
- Chinese has many dialects, but written Chinese characters are common to all of them
- Many video programs have Chinese subtitles nowadays
- Extracting text from digital video programs can help with indexing, searching and retrieval
Features of Subtitles
- Characters are in the foreground
- They are monochrome
- They are rigid from frame to frame
- They are upright
- They have size restrictions
- They contrast with the background
- They appear in clusters, at a limited distance from one another, aligned to a horizontal line
Steps to Recognise Text
1. Clear the background and remove noise
2. Segment the characters
3. Recognise them by pattern matching
Demo Video
- An ATV news item about Airport Authority Hong Kong, reported in Cantonese
- In MPEG format
- Action!
MPEG Video
- Consists of a video track and an audio track
- The video track consists of frames
- Each frame of the video track represents a static image
Steps to Remove Background
Agnihotri & Dimitrova suggested a seven-step procedure:
1. Channel Separation
2. Image Enhancement
3. Edge Detection
4. Edge Filtering
5. Character Detection
6. Text Box Detection
7. Text Line Detection & Enhancement
Sample Frame n The 100th frame of the demo video
Channel Separation
- Use the red channel, which gives higher-contrast edges
- Natural backgrounds are more likely to be blue or green
[Figure panels: Green Channel, Red Channel, Blue Channel]
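A minimal sketch of channel separation, assuming a decoded frame is held as rows of (R, G, B) tuples with values from 0 to 255. The `red_channel` helper and the sample pixel values are illustrative assumptions, not part of the original talk.

```python
def red_channel(frame):
    """Return the red channel of an RGB frame as a 2D list of intensities."""
    return [[pixel[0] for pixel in row] for row in frame]


# Hypothetical 2x2 frame: bright subtitle pixels next to bluish and greenish background.
frame = [
    [(250, 250, 250), (20, 90, 160)],
    [(30, 120, 40), (255, 255, 255)],
]
red = red_channel(frame)
```

In this toy frame the background pixels are weak in red, so the bright subtitle pixels stand out more in `red` than they would in the green or blue channel.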
Image Enhancement
- Filters salt-and-pepper noise
- Sharpens the edges
- The quality of our MPEG video is good enough that we can skip this step
Edge Detection
- Find the edges in the image
- Use a 3x3 matrix mask [ ]
- Use a Sobel filter instead
- Edges around text may be broken and not connected
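The Sobel step could be sketched as follows, assuming the input is a single-channel image stored as a 2D list of intensities. The function names, the |gx| + |gy| approximation of the gradient magnitude, and the threshold of 200 are assumptions chosen for illustration; the talk does not give these details.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(gray):
    """Approximate gradient magnitude |gx| + |gy|; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            out[y][x] = abs(gx) + abs(gy)
    return out


def edge_map(gray, threshold=200):
    """Binary edge image: 1 where the Sobel response reaches the threshold."""
    return [[1 if m >= threshold else 0 for m in row]
            for row in sobel_magnitude(gray)]


# Hypothetical 5x5 image with a vertical step from dark (0) to bright (255).
gray = [[0, 0, 0, 255, 255] for _ in range(5)]
magnitude = sobel_magnitude(gray)
edges = edge_map(gray)
```

Here the response is concentrated in the two columns on either side of the step, which is the kind of edge a subtitle stroke produces against a darker background.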
Sample Edge Image
Edge Filtering
- Removes areas that are unlikely to contain text
- Characters give a high density of objects, hence a high density of edges
- Areas with a high density of edges hint at where the characters are located
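One way to measure edge density is to count edge pixels per row and per column of a binary edge image. The sketch below assumes the edge image is a 2D list of 0/1 values; `row_density`, `column_density`, `dense_rows` and the `min_fraction` cut-off are hypothetical names and values introduced for illustration.

```python
def row_density(edges):
    """Number of edge pixels in each horizontal line."""
    return [sum(row) for row in edges]


def column_density(edges):
    """Number of edge pixels in each vertical line."""
    return [sum(col) for col in zip(*edges)]


def dense_rows(edges, min_fraction=0.5):
    """Indices of rows whose edge count is at least min_fraction of the width."""
    width = len(edges[0])
    return [y for y, count in enumerate(row_density(edges))
            if count >= min_fraction * width]


# Hypothetical edge image: rows 2 and 3 play the role of the subtitle band.
edges = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
rows = row_density(edges)
cols = column_density(edges)
band = dense_rows(edges)
```

Rows outside `band` can then be treated as irrelevant edges and cleared before the same idea is applied in the vertical direction.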
Density of Edges in Horizontal Lines
Filtering the Irrelevant Edges
Density of Edges in Vertical Lines
What If the Subtitle Is Short?
- Cut the image into several parts and calculate the density of edges in each part
- This guards against a short subtitle that does not show up clearly in the overall view
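Dividing the image into parts can be sketched as splitting the edge image into vertical strips and computing the row density inside each strip. This assumes a 2D list of 0/1 edge values and that the number of parts does not exceed the image width; the helper names and the choice of four parts are illustrative assumptions.

```python
def split_into_strips(edges, parts):
    """Cut the edge image into `parts` vertical strips of near-equal width."""
    width = len(edges[0])
    bounds = [width * i // parts for i in range(parts + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in edges]
            for i in range(parts)]


def strip_row_density(edges, parts):
    """For each strip, the fraction of edge pixels in each of its rows."""
    result = []
    for strip in split_into_strips(edges, parts):
        strip_width = len(strip[0])
        result.append([sum(row) / strip_width for row in strip])
    return result


# Hypothetical edge image: a short subtitle occupies only the left quarter of row 1.
edges = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
whole_row_fraction = sum(edges[1]) / len(edges[1])
per_strip = strip_row_density(edges, parts=4)
```

Over the full width the subtitle row reaches a density of only 0.25, but within the first strip it reaches 1.0, so a per-strip threshold can still pick it up.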
Sample Image Divided in Parts
Challenges in Chinese Character Segmentation
- Square?
- Not really: characters vary in size, with different heights and widths
- e.g. 日, 曰
- This causes problems for fixed-distance segmentation
- More problems arise when Chinese is mixed with English, numbers and symbols
- e.g. 18 部「IBM」電腦
Challenges in Chinese Character Segmentation
- Usually written horizontally, like English
- Can we segment it like English?
- English: each character is horizontally linked
- Chinese: a character may not have such linkage
- e.g. 八, 川
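The linkage problem can be illustrated with a simple column-projection segmenter. This is a sketch under stated assumptions, not the method used in the talk: the text line is a binary 2D list, blank columns separate pieces, and a hypothetical `max_width` parameter says how wide one character may be.

```python
def column_runs(binary):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    profile = [sum(col) for col in zip(*binary)]
    runs, start = [], None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x
        elif not count and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def merge_runs(runs, max_width):
    """Greedily merge neighbouring runs while the merged box fits in max_width."""
    merged = []
    for start, end in runs:
        if merged and end - merged[-1][0] <= max_width:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Hypothetical line: three separate strokes (like 川) followed by one solid block.
line = [[1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1] for _ in range(3)]
pieces = column_runs(line)
characters = merge_runs(pieces, max_width=5)
```

Splitting at blank columns alone breaks the 川-like glyph into three pieces, and merging by expected width repairs it. The same greedy merge would also join narrow digits or Latin letters such as "18" or "IBM", which is the mixed-script difficulty noted above.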
Character Recognition
Pattern Matching
- The most straightforward approach
- Two patterns are compared
- The comparison uses a pattern distance
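As an illustration of pattern distance, the sketch below counts the pixels that differ between two binary bitmaps of the same size and picks the template with the smallest count. The talk does not specify which distance it uses, so this pixel-mismatch measure, the helper names and the 3x3 toy templates are all assumptions.

```python
def pattern_distance(a, b):
    """Number of pixels that differ between two same-sized binary bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def best_match(unknown, templates):
    """Label of the template with the smallest pattern distance to `unknown`."""
    return min(templates,
               key=lambda label: pattern_distance(unknown, templates[label]))


# Toy 3x3 templates, far coarser than real character bitmaps.
templates = {
    "一": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "十": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
}
# A noisy "十" with one pixel missing.
unknown = [[0, 1, 0], [1, 1, 0], [0, 1, 0]]
label = best_match(unknown, templates)
```

Comparing the unknown bitmap against every template is simple but slow for a large character set, which motivates the classification step that follows.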
Classification for Faster Matching
- By blackness (e.g. 一, 鬱)
- By projection profiles
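Both features can be computed cheaply from a binary bitmap, so they can narrow the candidate set before full pattern matching. The sketch assumes blackness means the fraction of ink pixels and that profiles are per-row and per-column ink counts; the function names, the tolerance of 0.15 and the toy bitmaps are illustrative assumptions.

```python
def blackness(glyph):
    """Fraction of pixels in the bitmap that are ink."""
    total = sum(len(row) for row in glyph)
    return sum(map(sum, glyph)) / total


def projection_profiles(glyph):
    """Ink counts per row and per column of the bitmap."""
    rows = [sum(row) for row in glyph]
    cols = [sum(col) for col in zip(*glyph)]
    return rows, cols


def candidates_by_blackness(unknown, templates, tolerance=0.15):
    """Labels whose blackness is within `tolerance` of the unknown glyph's."""
    target = blackness(unknown)
    return [label for label, glyph in templates.items()
            if abs(blackness(glyph) - target) <= tolerance]


# Toy 3x3 bitmaps; the dense one is only a stand-in for a character like 鬱.
templates = {
    "一": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "鬱": [[1, 1, 1], [1, 1, 1], [1, 0, 1]],
}
unknown = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]
shortlist = candidates_by_blackness(unknown, templates)
profiles = projection_profiles(templates["一"])
```

Only the templates in `shortlist` then need the more expensive pixel-by-pixel comparison.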
Possible Enhancement
- Pick out the moving objects by keeping track of a number of consecutive frames
- Use a lexicon to choose the most probable character
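The first enhancement could be sketched as a per-pixel stability check over a short window of frames: pixels whose intensity changes a lot are treated as moving, while subtitle pixels, which are rigid from frame to frame, stay stable. The `moving_mask` helper, the threshold of 30 and the sample frames are hypothetical and only illustrate the idea.

```python
def moving_mask(frames, threshold=30):
    """Mark pixels whose intensity range across the frames exceeds the threshold."""
    h, w = len(frames[0]), len(frames[0][0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            values = [frame[y][x] for frame in frames]
            if max(values) - min(values) > threshold:
                mask[y][x] = 1
    return mask


# Three hypothetical 2x3 grayscale frames; pixel (0, 0) is a steady subtitle pixel.
frames = [
    [[255, 10, 50], [50, 50, 50]],
    [[255, 120, 52], [50, 48, 50]],
    [[255, 200, 49], [50, 50, 90]],
]
mask = moving_mask(frames)
```

Pixels marked 1 in `mask` can be discarded as moving background before edge detection, leaving the static regions where subtitles are expected.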
Q & A