1 Incremental Detection of Text on Road Signs from Video Wen Wu Joint work with Xilin Chen and Jie Yang
2 Acquire Text, Process Text [Diagram labels: Corpus, Language (Text), Web, Visual, Speech, NLP, Translation, IR/IE, Multimedia, Speech]
3 Text helps to understand images
4 Why are we interested in text on signs? Signs are everywhere in our daily life: shop names, billboards, street names, etc.; Like other information devices, road signs are placed to convey information to humans for different purposes; Text could be the most flexible way to express dynamic information. Why not make computers understand that text and further assist humans?
5 Too many signs cause problems
6 It happened in Pittsburgh too!
7 Task Automatically detect text on road signs from video.
8 Related work
9 What makes us detect a sign?
10 What do you think?
11 Vertical plane property of signs
12 Divide-and-Conquer Strategy Decompose the original task into two sub-tasks: localization of road signs and detection of text; Propose an algorithm for each sub-task and integrate the two by mapping corresponding feature points; Use features not only from individual 2D images but also from the temporal dependency between them.
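To make "mapping corresponding feature points" concrete, here is a minimal sketch of carrying a region from one frame to the next. It is an illustration under stated assumptions, not the method on the slides: the function name `propagate_box`, the `(x, y, w, h)` box layout, and the use of a median translation are all hypothetical choices. It ignores the scale change that occurs as the vehicle approaches a sign.

```python
from statistics import median


def propagate_box(box, prev_pts, curr_pts):
    """Shift an (x, y, w, h) box by the median displacement of matched points.

    prev_pts[i] and curr_pts[i] are assumed to be the same feature point
    observed in two consecutive frames (hypothetical input format).
    """
    if not prev_pts or len(prev_pts) != len(curr_pts):
        raise ValueError("need equal-length, non-empty lists of matched points")
    # The median is less sensitive than the mean to a few bad matches.
    dx = median(c[0] - p[0] for p, c in zip(prev_pts, curr_pts))
    dy = median(c[1] - p[1] for p, c in zip(prev_pts, curr_pts))
    x, y, w, h = box
    return (x + dx, y + dy, w, h)


# Two consistent matches and one outlier: the outlier does not move the box.
prev_pts = [(110, 60), (120, 55), (130, 65)]
curr_pts = [(113, 58), (123, 53), (180, 90)]
print(propagate_box((100, 50, 40, 20), prev_pts, curr_pts))  # (103, 48, 40, 20)
```

A translation-only model is the simplest possible choice; a real system would likely also need to estimate how the region grows between frames.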
13 Incremental Detection Framework
14 Why incremental? Computation requirement: – Detection is a computationally expensive step; – In contrast, mapping corresponding points is a cheap step; Video resolution: – Detection needs only low resolution; – OCR requires high resolution. [Timeline: Localize, Detect, Recognize, in that order over time]
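The computation argument can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that full detection runs once every few frames and cheap point mapping runs on the frames in between. The slides do not state such a schedule, and the millisecond costs are hypothetical numbers, not measurements from the prototype.

```python
def average_cost_per_frame(detect_cost, map_cost, detect_every):
    """Average per-frame cost when full detection runs on one frame out of
    every `detect_every` frames and cheap mapping runs on the others."""
    if detect_every < 1:
        raise ValueError("detect_every must be at least 1")
    return (detect_cost + (detect_every - 1) * map_cost) / detect_every


# Hypothetical costs: 200 ms per detection, 20 ms per mapping step.
print(average_cost_per_frame(200, 20, 1))  # 200.0 (detect on every frame)
print(average_cost_per_frame(200, 20, 5))  # 56.0  (detect on every 5th frame)
```

Under these assumed numbers the average cost drops by more than a factor of three, which is the kind of saving the "cheap step versus expensive step" contrast points to.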
15 System Implementation Prototype: built on a PC with Intel Pentium 4 GHz and 1 GB memory, running Windows XP; Data: 1) Captured by a DV camera mounted on a minivan. 2) Video frame size is 640×480. 3) The database includes about 3 hours of video, captured under different conditions, i.e., in the morning, in the afternoon, and at dusk.
16 A Demo Demo
17 Sequences of the Demo
18 Incremental vs. Non-incremental Another demo
19 Summary of Evaluation 22 video sequences with different driving situations; Vehicle's speed varies from 20 to 55 MPH; Testing data contain ~90 road signs and > 300 words.
Table 1. Sign localization performance
  # of signs    Hit rate      False hits
  [missing]     [missing]%    17.9%
Table 2. Text detection performance
                    Hit rate    False hits    Speed
  Non-incremental   80.2%       85.6%         2-6 fps
  Incremental       88.9%       9.2%          8-16 fps
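The slides do not define "hit rate" or "false hits", so the following is only a sketch under one common convention, stated here as an assumption: hit rate is the fraction of ground-truth items that were found, and the false-hit rate is the fraction of reported detections that were wrong. The function name and the counts in the example are hypothetical and are not taken from the evaluation.

```python
def detection_rates(true_hits, false_hits, ground_truth_total):
    """Return (hit_rate, false_hit_rate) under the assumed definitions:
    hit_rate       = true_hits / ground_truth_total
    false_hit_rate = false_hits / (true_hits + false_hits)
    """
    if ground_truth_total <= 0:
        raise ValueError("ground_truth_total must be positive")
    reported = true_hits + false_hits
    hit_rate = true_hits / ground_truth_total
    # With no reported detections there are no false hits to count.
    false_hit_rate = false_hits / reported if reported else 0.0
    return hit_rate, false_hit_rate


# Hypothetical counts: 80 correct detections, 20 false ones, 100 true items.
print(detection_rates(80, 20, 100))  # (0.8, 0.2)
```

If the evaluation instead measured false hits relative to the number of ground-truth items, only the denominator of the second ratio would change.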
20 Contributions Proposed a unified framework for automatically detecting text on road signs from video, based on the natural characteristics of the task; Exploited features for text detection not only from individual 2D images but also from temporal dependency in video; Made a connection between understanding visual information and understanding language (text).
21 Conclusions & Future Work Automatic detection of text on road signs could be very useful in various applications; Experiments have shown that the new framework can significantly improve the robustness and efficiency of an existing text detection algorithm; Future work: apply various language methods to the text detected in video, e.g., translation, IR, etc.
22 Questions? Thank You