Laserfiche Clinic 2006-2007 Liaison HMC, Sept. 12th, 2006. Adam Field, Stephen Smith, Ben Tribelhorn (PM), Aaron Wolin. Advisor: Zach Dodds.


1 Laserfiche Clinic 2006-2007. Liaison Luncheon @ HMC, Sept. 12th, 2006. Adam Field, Stephen Smith, Ben Tribelhorn (PM), Aaron Wolin. Advisor: Zach Dodds.

2 The Problem. To convert pictures of documents taken with a digital camera into images that can be organized using Laserfiche's OCR and database technologies. Project goal: raw image -> OCR-able image.

3 The Problem. To convert pictures of documents taken with a digital camera into images that can be organized using Laserfiche's OCR and database technologies. Project goal: raw image -> OCR-able image. Some important cases: presence of paperclips and/or staples; varied/confusing backgrounds (including stacks of papers); one or more edges off the edge of the image; knowing when the system has failed; camera perspective issues, i.e. documents not imaged head-on (?); other important cases?

4 Approaches. Outside-In: approach taken by previous clinic. Finding document corners; unwarping to 8.5 x 11". Inside-Out (?): possible approach taken by current clinic. First analyzing text-line boundaries; then unwarping to straighten them.
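
The outside-in idea (find the four document corners, then unwarp to 8.5 x 11") can be sketched as a four-point projective mapping. This is a minimal illustration under stated assumptions, not the previous clinic's code: the corner coordinates are invented, the function names are hypothetical, and the corners are assumed to be given in the order top-left, top-right, bottom-right, bottom-left.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 projective map sending four src points to four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_h(h, x, y):
    """Map one image point through the homography."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical corner pixels of a photographed page (TL, TR, BR, BL).
corners = [(120, 80), (910, 140), (860, 1180), (60, 1100)]
# Target page in inches: 8.5 wide, 11 tall.
page = [(0.0, 0.0), (8.5, 0.0), (8.5, 11.0), (0.0, 11.0)]
H = homography(corners, page)
```

To produce an actual unwarped image, one would typically compute the inverse mapping (page to photo) and sample the photo at each output pixel; that resampling step is left out here.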

5 Camera Document Restoration for OCR. Uses “Vertical Stroke Boundaries” (VSBs) of characters. Several algorithms use VSBs to detect and correct the image. Able to detect the type of distortion or severity of the warping. [Figure: VSBs.] Source: Lu and Tan. “Camera Document Restoration for OCR.” http://www.m.cs.osakafu-u.ac.jp/cbdar/proceedings/papers/O1-3.pdf

6 Finding Vertical Stroke Boundaries. Connected components first. Find the "top" and "base" lines for a line of text. Scan between the top and base lines, searching for pixels that form relatively orthogonal and straight lines. [Figure: Tip point tracing process.] Source: Lu, Chen, and Ko. “Perspective rectification of document images using fuzzy set and morphological operations.” http://vlab.ee.nus.edu.sg/~bmchen/papers/ivc.pdf
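
The scan described on this slide can be illustrated with a toy example. This is a heavily simplified sketch, not the method from the cited paper: it skips connected-component labeling and the fuzzy-set and morphological steps, it takes the top and base lines to be the first and last rows containing ink, and it only finds boundaries that are exactly vertical. The function names, the `#` ink character, and the `min_frac` threshold are all assumptions made for illustration.

```python
def text_band(img, ink="#"):
    """Return (top, base): the first and last row indices that contain ink."""
    rows = [r for r, line in enumerate(img) if ink in line]
    if not rows:
        raise ValueError("no ink in image")
    return rows[0], rows[-1]


def vertical_boundaries(img, top, base, min_frac=0.8, ink="#"):
    """Find columns whose ink/background edge spans most of the text band.

    A pixel counts toward a 'left' boundary if it is ink and its left
    neighbor is background (or off the image); 'right' is symmetric.
    """
    height = base - top + 1
    width = len(img[0])
    found = []
    for col in range(width):
        for side, dx in (("left", -1), ("right", 1)):
            n = col + dx
            count = 0
            for r in range(top, base + 1):
                is_ink = img[r][col] == ink
                neighbor_bg = not (0 <= n < width) or img[r][n] != ink
                if is_ink and neighbor_bg:
                    count += 1
            if count >= min_frac * height:
                found.append((col, side))
    return found


# Toy binary image: a vertical bar and a box-shaped character.
img = [
    "..........",
    ".#....###.",
    ".#....#.#.",
    ".#....#.#.",
    ".#....###.",
    "..........",
]
top, base = text_band(img)
boundaries = vertical_boundaries(img, top, base)
```

In a perspective-distorted photo the stroke boundaries would be slanted rather than exactly vertical, so a real detector would need to trace near-vertical edges and record their angles instead of counting pixels per column.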

7 A Fast Orientation and Skew Detection Algorithm. Uses connected components and nearest neighbors to find document skew. Places the text line angles into two histograms spanning ±90º, with precisions of 1.0º and 0.1º. The skew angle is the histogram peak. Source: Avila and Lins. “A Fast Orientation and Skew Detection Algorithm for Monochromatic Document Images.” http://delivery.acm.org/10.1145/1100000/1096631/p118-avila.pdf
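
The nearest-neighbor and two-histogram idea on this slide can be sketched roughly as follows. This follows only the slide's summary, not the paper's exact procedure: the input is assumed to be a list of connected-component centroids (hypothetical data here), each centroid is paired with its single nearest neighbor, and the fine 0.1º histogram is built only from angles near the coarse 1.0º peak.

```python
import math
from collections import Counter


def neighbor_angles(points):
    """Angle (degrees, folded into (-90, 90]) from each point to its nearest neighbor."""
    angles = []
    for i, (x, y) in enumerate(points):
        best = None
        for j, (u, v) in enumerate(points):
            if i == j:
                continue
            d = (u - x) ** 2 + (v - y) ** 2
            if best is None or d < best[0]:
                best = (d, u - x, v - y)
        if best is None:
            continue
        a = math.degrees(math.atan2(best[2], best[1]))
        if a > 90:
            a -= 180
        elif a <= -90:
            a += 180
        angles.append(a)
    return angles


def histogram_peak(angles, step):
    """Center of the most populated histogram bin of width `step` degrees."""
    bins = Counter(round(a / step) for a in angles)
    return max(bins, key=bins.get) * step


def estimate_skew(points):
    """Coarse 1.0-degree peak, then a 0.1-degree peak among nearby angles."""
    angles = neighbor_angles(points)
    coarse = histogram_peak(angles, 1.0)
    near = [a for a in angles if abs(a - coarse) <= 1.0]
    return histogram_peak(near, 0.1)


# Hypothetical centroids: 3 text lines of 20 characters, skewed by 3.3 degrees.
theta = math.radians(3.3)
centroids = [
    (10 * k * math.cos(theta) - 40 * line * math.sin(theta),
     10 * k * math.sin(theta) + 40 * line * math.cos(theta))
    for line in range(3) for k in range(20)
]
skew = estimate_skew(centroids)
```

Because characters within a line sit closer together than lines do, most nearest-neighbor pairs lie along the text direction, which is why the histogram peak tracks the skew. The sign of the angle depends on whether the image y axis points up or down.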

8 Problem Taxonomy. [Figure: chart with axes "warp severity" (labels: Geometric, Perspective, Skew) and "document difficulty" (labels: Handwriting, Magazines/Newspaper, Forms, Mostly text documents).]

9 Problem Priorities (?). [Figure: chart with axes "warp severity" (labels: Geometric, Perspective, Skew) and "document difficulty" (labels: Handwriting, Magazines/Newspaper, Forms, Mostly text documents), with regions marked "primary focus" and "secondary focus".]

10 Pair 1's plan. Finding character strokes. Estimating warp severity. Thresholding. [Picture from Ben and Stephen.]

11 Pair 2's plan. Least-squares line-fitting. Visualizing the processing. Finding skew estimates. Two-tier assessment: 1) reasonable? 2) OCR accuracy. [Picture from Aaron & Adam.]
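
As a rough illustration of how least-squares line fitting can yield a skew estimate, the sketch below fits y = m x + b to points along one text line and converts the slope to an angle. The sample points and function names are hypothetical, and this is not necessarily how Pair 2 planned to implement it.

```python
import math


def fit_line(points):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; line is vertical")
    m = sxy / sxx
    return m, mean_y - m * mean_x


def skew_degrees(points):
    """Skew estimate: the angle of the fitted line, in degrees."""
    m, _ = fit_line(points)
    return math.degrees(math.atan(m))


# Hypothetical baseline points from one text line, lying on y = 0.1*x + 5.
baseline = [(x, 0.1 * x + 5) for x in range(0, 100, 10)]
slope, intercept = fit_line(baseline)
angle = skew_degrees(baseline)
```

Fitting y as a function of x works well for near-horizontal text lines; for pages rotated close to 90º the roles of x and y would need to be swapped, or a total least-squares fit used instead.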

12 Tentative Schedule. Th 9/21 (11:30 am): Call, progress update. T 9/26: Initial presentation @ Harvey Mudd. Th 9/28: Prototype of each algorithm. F 10/6 (?): Site visit and presentation @ Laserfiche. Weekly conference calls with Ed Heaney. Accessible codebase and performance updates. Other deliverables?

13 Comments?

14 Other Papers

15 Image Warping. [Figure: chart with labels Geometric, Perspective, Skew and Handwriting, Magazines, Forms, Plain Text.]

16 Taxonomy. [Figure: chart with labels Geometric, Perspective, Skew and Handwriting, Magazines/Newspaper, Forms, Mostly text documents.]

