OCRdroid: A Framework to Digitize Text Using Mobile Phones. Authors: Mi Zhang, Anand Joshi, Ritesh Kadmawala, Karthik Dantu, Sameera Poduri, and Gaurav Sukhatme (University of Southern California). Presenter: Mi Zhang.
Outline: What is OCRdroid?; Related Work; Design Considerations; System Architecture; Experimental Results; Summary.
What is OCRdroid? Why: there is huge demand for recognizing text in camera-captured pictures, and mobile phones are ubiquitous and powerful. What: OCRdroid = OCR + mobile phone. Two applications: PocketPal, a personal receipt management tool; PocketReader, a personal mobile screen reader.
Related Work: "Design and implementation of a Card Reader based on build-in camera," X.P. Luo, J. Li, and L.X. Zhen; "Automatic detection and recognition of signs from natural scenes," X. Chen, J. Yang, and A. Waibel; "A morphological image preprocessing suite for OCR on natural scene images," M. Elmore and M. Martonosi.
Design Considerations: Real-Time Processing; Lighting Conditions; Text Skew; Perception Distortion (Tilt); Text Misalignment; Blur (Out-of-Focus).
Real-Time Processing. Issues: limited memory; relatively low processing power; quick response required. Our solutions: multi-threaded system architecture; image compression; computationally efficient algorithms.
Lighting Conditions. Issues: uneven lighting (shadows, reflection, flooding, etc.).
Lighting Conditions. Our solution: local binarization with fast Sauvola’s algorithm.
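To make the local binarization idea concrete, here is a minimal sketch of Sauvola thresholding in pure Python. It uses the standard Sauvola formula, T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of the window around each pixel, and it computes them with summed-area tables, which is one common way to make the method fast. The function names, the window size, and the values k = 0.5 and R = 128 are assumptions chosen for illustration; the slides do not state which parameters or which speed-up OCRdroid uses.

```python
import math


def integral_image(rows):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(rows), len(rows[0])
    table = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        running = 0.0
        for x in range(w):
            running += rows[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + running
    return table


def box_sum(table, y0, x0, y1, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1 in constant time."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


def sauvola_binarize(gray, window=15, k=0.5, R=128.0):
    """Return a mask (list of lists of bool) that is True for text pixels.

    gray is a list of rows of grayscale values in 0..255.
    window, k, and R are illustrative defaults, not values from the slides.
    """
    h, w = len(gray), len(gray[0])
    half = window // 2
    sums = integral_image(gray)
    squares = integral_image([[v * v for v in row] for row in gray])
    mask = []
    for y in range(h):
        # Clip the window at the image edges.
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        row_mask = []
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = box_sum(sums, y0, x0, y1, x1) / n
            var = max(0.0, box_sum(squares, y0, x0, y1, x1) / n - mean * mean)
            threshold = mean * (1.0 + k * (math.sqrt(var) / R - 1.0))
            row_mask.append(gray[y][x] < threshold)
        mask.append(row_mask)
    return mask
```

Because the threshold follows the local mean, a dark stroke is separated from the page in both the dim and the bright parts of an unevenly lit image, which a single global threshold cannot guarantee.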
Text Skew. Issues: when the perspective is not fixed, text lines may be skewed from their original orientation.
Text Skew. Our solution: a branch-and-bound text-line finding algorithm plus auto-rotation.
Perception Distortion (Tilt). Issues: distortion arises when the text plane is not parallel to the imaging plane; mobile phones are susceptible to tilt; even a small perception distortion causes OCR to fail.
Perception Distortion (Tilt). Our solution: use the embedded orientation sensor (pitch and roll); calibration.
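A tilt check based on pitch and roll can be sketched as follows. This is only an illustration: the class name `TiltChecker`, the offset-based calibration, and the way readings are passed in are all hypothetical, since the slides do not describe how OCRdroid performs calibration. The 10-degree default mirrors the perception distortion tolerance reported in the experimental results.

```python
class TiltChecker:
    """Decides whether the phone is close enough to parallel with the text plane."""

    def __init__(self, tolerance_deg=10.0):
        self.tolerance_deg = tolerance_deg
        # Offsets recorded during calibration (assumed approach, see lead-in).
        self.pitch_offset = 0.0
        self.roll_offset = 0.0

    def calibrate(self, pitch_deg, roll_deg):
        """Store readings taken while the phone rests parallel to the text plane."""
        self.pitch_offset = pitch_deg
        self.roll_offset = roll_deg

    def is_level(self, pitch_deg, roll_deg):
        """Return True when both corrected angles are within the tolerance."""
        pitch = abs(pitch_deg - self.pitch_offset)
        roll = abs(roll_deg - self.roll_offset)
        return pitch <= self.tolerance_deg and roll <= self.tolerance_deg
```

A front end built along these lines would allow a capture only while `is_level` returns True for the latest sensor reading.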
Text Misalignment. Issues: the camera screen covers only part of the text region; text characters have irregular shapes.
Text Misalignment. Our solution, Step 1: a modified version of Sauvola’s algorithm. (Figure labels: Top Border, Right Border, Left Border, Bottom Border.)
Text Misalignment. Our solution, Step 1 (cont.): routes along which Sauvola’s algorithm is performed.
Text Misalignment. Our solution, Step 2: noise reduction. (Figure labels: Right Border, Left Border, Bottom Border, Top Border, W.)
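The two steps above suggest a border test: look for text pixels near each of the four image borders and ignore isolated specks. The sketch below is one simplified reading of that idea, not the authors' algorithm. It assumes a binarized mask is already available (True for text pixels), and the function name and the `band` and `min_pixels` parameters are hypothetical stand-ins for whatever border width and noise rule the real system uses.

```python
def cut_off_borders(text_mask, band=3, min_pixels=5):
    """Return the names of borders where text appears to be cut off.

    text_mask is a list of rows of booleans (True = text pixel).
    band is the border strip width in pixels; min_pixels is a crude
    noise filter. Both are illustrative assumptions.
    """
    h, w = len(text_mask), len(text_mask[0])

    def count(rows, cols):
        # Number of text pixels inside the given strip.
        return sum(1 for y in rows for x in cols if text_mask[y][x])

    counts = {
        "top": count(range(0, band), range(w)),
        "bottom": count(range(h - band, h), range(w)),
        "left": count(range(h), range(0, band)),
        "right": count(range(h), range(w - band, w)),
    }
    return [name for name, n in counts.items() if n >= min_pixels]
```

An empty result means no border strip contains enough text pixels, so the text region is treated as properly aligned; otherwise the returned names indicate which way the user should move the camera.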
Blur (Out-of-Focus). Issues: OCR needs a sharp edge response.
Blur (Out-of-Focus). Our solution: the Android autofocus mechanism.
System Architecture (diagram). Components: Android Phone; Internet; Web Server; OCR Engine (Tesseract). Steps: 1. Photo of a receipt; 2. Front-end processing; 3. Upload image; 4. Perform back-end processing and OCR; 5. Return OCR results; 6. Results returned; 7. Information extraction.
Front-End Architecture (diagram). Components: Camera Preview; Orientation Handler; Alignment Checker; Image Upload; OCR Data Receiver; Information Extraction; Mobile Database; Internet. Transition labels: Capture; Improper Alignment Detected; Proper Alignment Detected.
Back-End Architecture (diagram). Components: Internet; Store Image; Skew Detection & Auto-rotation; OCR Text Output; Binarization; Tesseract OCR Engine; Sends Results Back to Mobile Device.
Experimental Results. Test corpus: ten distinct black-and-white images; three distinct lighting conditions (Normal: adequate light; Poor: dim; Flooding: light source focused on a particular portion of the image). Performance metrics: character accuracy; word accuracy; timing.
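The slides do not spell out how the accuracy metrics are computed. A common convention, assumed here, is accuracy = (n - errors) / n, where n is the length of the ground-truth text and errors is the edit distance (insertions, deletions, substitutions) between the ground truth and the OCR output. The sketch below applies that convention to characters and, by analogy, to whitespace-separated words; the function names are hypothetical.

```python
def edit_distance(truth, output):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(output) + 1))
    for i, a in enumerate(truth, 1):
        cur = [i]
        for j, b in enumerate(output, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution or match
        prev = cur
    return prev[-1]


def accuracy(truth, output):
    """(n - errors) / n, clamped at 0; truth must be non-empty."""
    n = len(truth)
    if n == 0:
        raise ValueError("ground truth must not be empty")
    return max(0.0, (n - edit_distance(truth, output)) / n)


def character_accuracy(truth_text, ocr_text):
    return accuracy(truth_text, ocr_text)


def word_accuracy(truth_text, ocr_text):
    # Assumption: words are compared as whole tokens split on whitespace.
    return accuracy(truth_text.split(), ocr_text.split())
```

For example, if the ground truth is "TOTAL 12.50" and the OCR output is "T0TAL 12.50", one of eleven characters and one of two words is wrong, so word accuracy drops much faster than character accuracy.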
Experimental Results. Binarization (measured by character accuracy): Normal, around 97%; Poor, around 60%; Flooding, around 60%. Skew tolerance: up to 30 degrees. Perception distortion: up to 10 degrees.
Experimental Results. Misalignment detection: (figure not preserved). Timing performance: misalignment detection, less than 6 seconds; overall process, less than 11 seconds.
More Information. Project: http://www-scf.usc.edu/~ananddjo/ocrdroid/index.php. Available there: test cases and results; demo video; paper; presentation slides; tools information (mobile phone + software).
Summary. OCRdroid: a generic framework for developing OCR-based applications on mobile phones. Six design considerations and our solutions; in particular, we propose a new real-time, computationally efficient algorithm for text misalignment detection. Experimental results.
Questions?
Thank You