UC Berkeley CS294-9 Fall 2000
Slide 4-1: Document Image Analysis
Lecture 4: Image Transformations
Richard J. Fateman, University of California – Berkeley
Henry S. Baird, Xerox Palo Alto Research Center

Slide 4-2: The course so far…
- Reminder: all course materials are online: http://www-inst.eecs.berkeley.edu/~cs294-9/
- Overview of the DIA research field
- Some applications (postal addresses, checks)
- Research objectives: more systematic modeling, design
- Some basic engineering

Slide 4-3: Some disclaimers: we are not experts
- Contrast w/ computer vision, psychophysical image processing
- Contrast w/ Gestalt theory, human reading, psychophysics of reading

Slide 4-4: Do we attempt to emulate humans by programming? (Ha & Bunke paper)
- Image acquisition
- Image transformation
- Image segmentation
- Feature extraction
- No, but we reach for similar goals

Slide 4-5: Psychophysical questions
- Biological, especially human, vision is the existence proof of algorithms that solve our problems
- How do brains learn to see and connect to the visual system? (The wiring is not encoded in genes.) Self-organization seems key.

Slide 4-6: Psychophysical Reading
- How fast can one read? What about comprehension? (Typically, comprehension declines above 200 wpm.)
- What do we read? Words, not letters.
- How is reading disability (e.g. dyslexia) related to processing?
- What, if anything, does this have to do with DIA?

Slide 4-7: Computer Vision: different emphasis from DIA
- See, for example, David Forsyth’s Computer Vision text
- Recognition of objects, scenes, faces, patterns, visual memory; attention; and visual (and cognitive) pleasure
- Change, motion, relationship to motor activities

Slide 4-8: Computer Vision: relations
- Solving CV would solve DIA
- Solving DIA (more likely in some senses) might serve as a paradigm for CV, at least if we did it in some respectable fashion
- Actually, recent activity in speech understanding seems to be relevant to DIA...

Slide 4-9: Gestalt Theory I
- Fundamentally, the issue is one of understanding invariance: how can an object, say a square or a triangle, be recognized regardless of its rotation, translation, scale, contrast, outline or solid rendering, texture, motion…

Slide 4-11: Gestalt Theory II
- Biological vision handles these easily
- This suggests that invariance is fundamental to our visual representation
- E.g. in the case of rotation invariance, perhaps we separately perceive/encode:
  – structure
  – orientation

Slide 4-12: Gestalt Theory III
- We keep track of objects when we turn our head or walk
- Translation and rotational constancy of the perceived world vs. what is received
- Whatever the computational mechanism, it has to account for these issues (+ and -)
- A A A a a a
- Context: Univ. of Illinois, Chapter III, 3.l4l59 durnptruck

Slide 4-13: Examples of post-acquisition image analysis
- Preparation for OCR
- Not symbol- or character-based
- (We acknowledge that this is feedforward, and not optimal, but so it goes.)

Slide 4-14: What can we do?
- Transform the image by local morphological computation
- Look for more global attributes (e.g. texture and FFT)
- If possible, do transformations on the compressed form

Slide 4-15: Can we find some tools?
- Finding connected components
- Boundaries
- Morphological transforms
- Thinning or “skeletonization”
- (Gray-scale) contour following
- Edge encoding / vectorization
- Recursive X-Y cuts
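
The first tool on this list, finding connected components, can be sketched as a flood fill over a binary image. This is only a minimal illustration, not the method used in the course: the function name `label_components`, the 0/1 list-of-lists image format, and the choice of 8-connectivity are all assumptions made for the example.

```python
from collections import deque


def label_components(img):
    """Label 8-connected components of foreground (1) pixels.

    Returns (labels, count): labels has the same shape as img, with 0 for
    background and 1..count for the components in row-major discovery order.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] and not labels[r][c]:
                # New component: flood-fill it breadth-first.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


# Hypothetical 4x5 test image: an L-shaped blob, a diagonal pair, a single dot.
page = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]
labels, count = label_components(page)
```

With 8-connectivity the diagonal pair counts as one component; under 4-connectivity it would split into two, which is one reason the choice matters for broken or touching characters.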

Slide 4-16: e.g. Removing rotation (skew)
- Some excellent methods (e.g. HSB)
- Humans notice skew of even a fraction of a degree; it doesn’t inhibit our reading, but it DOES make trouble for OCR
- Removing skew approximately: [symbols lost in extraction]

Slide 4-17: Deskewing / matrix transform
[Matrices lost in extraction; the surviving labels are:]
- True rotation
- Again, at 90 degrees
- Side slip
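
One plausible reading of the labels on this slide is that a small rotation can be approximated by a horizontal "side slip" (shear) followed by the same kind of shear applied at 90 degrees (a vertical shear). The sketch below compares that two-shear product with a true rotation matrix. Since the slide's own matrices did not survive, the exact form used here (shear factors of -tan θ and +tan θ) is an assumption, and the function names are hypothetical.

```python
import math


def rotation(theta):
    """True rotation by theta radians, as a 2x2 matrix acting on (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def two_shear(theta):
    """Approximate rotation: horizontal shear, then vertical shear.

    Horizontal shear: x' = x - t*y, y' = y
    Vertical shear:   y'' = y' + t*x', x'' = x'
    with t = tan(theta). The product is [[1, -t], [t, 1 - t*t]].
    """
    t = math.tan(theta)
    return [[1.0, -t], [t, 1.0 - t * t]]


def apply_matrix(m, p):
    """Apply a 2x2 matrix to a point (x, y)."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)


# Compare the two transforms for a 1-degree skew at a point far from the origin.
theta = math.radians(1.0)
point = (1000.0, 500.0)
exact = apply_matrix(rotation(theta), point)
approx = apply_matrix(two_shear(theta), point)
error = (abs(exact[0] - approx[0]), abs(exact[1] - approx[1]))
```

To remove a measured skew of θ, the same transform would be applied with -θ. Each shear only shifts whole rows or whole columns, which is cheap on raster data; the price is an error that grows with the angle and with distance from the origin.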

Slide 4-18: Remove noise (many models)
- More later (HSB)
- A few for now:
  – Salt & pepper
  – Too much ink (blurred, touching)
  – Too little ink (broken characters)
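
For the salt-and-pepper case, one very simple filter flips any pixel that disagrees with every one of its neighbors. This is a sketch under assumptions (a 0/1 list-of-lists image, an 8-neighborhood, the hypothetical name `remove_isolated_pixels`), not one of the noise models the lecture promises for later.

```python
def remove_isolated_pixels(img):
    """Flip pixels whose value differs from all of their 8-neighbors.

    Isolated ink specks ("pepper") become background, and isolated
    one-pixel holes ("salt") become ink. Larger blobs are untouched.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h):
        for c in range(w):
            neighbours = [img[y][x]
                          for y in range(max(0, r - 1), min(h, r + 2))
                          for x in range(max(0, c - 1), min(w, c + 2))
                          if (y, x) != (r, c)]
            if neighbours and all(v != img[r][c] for v in neighbours):
                out[r][c] = 1 - img[r][c]
    return out


# Hypothetical example: one stray speck at (1, 1) next to a solid 2x2 blob.
noisy = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
clean = remove_isolated_pixels(noisy)
```

A filter this local does nothing for the other two cases on the slide: touching and broken characters involve many pixels at once.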

Slide 4-19: Removing slant from characters
- Mask out horizontal lines (optional)
- Look for “best slant”
- ABCDEFGHIJKLMN
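
The slide does not say how the "best slant" is scored. One common heuristic, assumed here purely for illustration, is to try several shear factors and keep the one that makes the vertical projection profile most peaked, since upright strokes pile their ink into few columns. The names `column_profile_score` and `best_slant`, and the sum-of-squares score, are choices made for this sketch.

```python
def column_profile_score(img, k):
    """Sum of squared column counts after undoing a slant of factor k.

    k is the rightward lean in columns per row of height above the
    bottom row. A higher score means ink is concentrated in fewer columns.
    """
    h = len(img)
    counts = {}
    for r, row in enumerate(img):
        shift = round(k * (h - 1 - r))  # how far this row leans right
        for c, v in enumerate(row):
            if v:
                counts[c - shift] = counts.get(c - shift, 0) + 1
    return sum(n * n for n in counts.values())


def best_slant(img, candidates):
    """Return the candidate shear factor with the most peaked profile."""
    return max(candidates, key=lambda k: column_profile_score(img, k))


# Hypothetical example: a single stroke leaning right by one column per row.
stroke = [[1 if c == 4 - r else 0 for c in range(5)] for r in range(5)]
slant = best_slant(stroke, [-1.0, -0.5, 0.0, 0.5, 1.0])
```

Masking out horizontal lines first, as the slide suggests, would keep long horizontal strokes from flattening the profile and hiding the peak.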

Slide 4-20: Erode, Dilate, Open, Close
- Erosion: remove 1 layer of boundary
- Dilation: add 1 layer of boundary
- Open: erode, then dilate
- Close: dilate, then erode
- Hit/Miss
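
The four operations above can be sketched for a binary image and a fixed 3x3 square structuring element. The image format (0/1 lists of lists), the treatment of pixels outside the image as background, and the function names are assumptions for this example; hit/miss is not covered.

```python
def dilate(img):
    """Add one layer of boundary: a pixel is set if any 3x3 neighbor is set."""
    h, w = len(img), len(img[0])
    return [[int(any(img[y][x]
                     for y in range(max(0, r - 1), min(h, r + 2))
                     for x in range(max(0, c - 1), min(w, c + 2))))
             for c in range(w)] for r in range(h)]


def erode(img):
    """Remove one layer of boundary: a pixel survives only if its whole
    3x3 neighborhood is set (pixels outside the image count as background)."""
    h, w = len(img), len(img[0])

    def full(r, c):
        for y in range(r - 1, r + 2):
            for x in range(c - 1, c + 2):
                if not (0 <= y < h and 0 <= x < w) or not img[y][x]:
                    return False
        return True

    return [[int(full(r, c)) for c in range(w)] for r in range(h)]


def opening(img):
    """Erode, then dilate: removes specks smaller than the 3x3 element."""
    return dilate(erode(img))


def closing(img):
    """Dilate, then erode: fills holes and gaps smaller than the element."""
    return erode(dilate(img))


# Hypothetical 7x7 example: a solid 3x3 block, with a speck or a hole added.
block = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        block[y][x] = 1
speckled = [row[:] for row in block]
speckled[0][6] = 1   # an isolated speck of ink
holed = [row[:] for row in block]
holed[3][3] = 0      # a one-pixel hole in the block
```

In this example opening removes the speck and closing fills the hole, and in both cases the 3x3 block itself comes back unchanged.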

Slide 4-21: [Figure; only the label SE3 survives in the transcript]

Slide 4-22: Objectives
- SE1: looks for 3 horizontal dots
- SE2: identify
- SE3 & SE4: identify corners
- SE6: isolate lines 6 units apart.. Or..

Slide 4-23: Segmentation by recursive X/Y cuts
- “Top down”
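
A top-down recursive X/Y cut can be sketched as follows: trim the region to its ink, look for the widest blank band of rows or columns, split there, and recurse until no band is wide enough. The details here (the `min_gap` threshold, always cutting at the widest gap, the 0/1 list-of-lists image, the name `xy_cut`) are assumptions for illustration rather than the lecture's exact procedure.

```python
def xy_cut(img, min_gap=2):
    """Return leaf blocks as (top, bottom, left, right), ends exclusive."""
    blocks = []

    def ink_rows(t, b, l, r):
        return [any(img[y][l:r]) for y in range(t, b)]

    def ink_cols(t, b, l, r):
        return [any(img[y][x] for y in range(t, b)) for x in range(l, r)]

    def trim(profile):
        idx = [i for i, v in enumerate(profile) if v]
        return (idx[0], idx[-1] + 1) if idx else None

    def widest_gap(profile):
        """Widest interior run of blanks as (length, start), or None."""
        best, start = None, None
        for i, v in enumerate(profile):
            if not v and start is None:
                start = i
            elif v and start is not None:
                if best is None or i - start > best[0]:
                    best = (i - start, start)
                start = None
        return best

    def cut(t, b, l, r):
        rows = trim(ink_rows(t, b, l, r))
        if rows is None:
            return  # no ink in this region
        t, b = t + rows[0], t + rows[1]
        cols = trim(ink_cols(t, b, l, r))
        l, r = l + cols[0], l + cols[1]
        candidates = []
        gap_y = widest_gap(ink_rows(t, b, l, r))
        gap_x = widest_gap(ink_cols(t, b, l, r))
        if gap_y and gap_y[0] >= min_gap:
            candidates.append((gap_y[0], "y", gap_y[1]))
        if gap_x and gap_x[0] >= min_gap:
            candidates.append((gap_x[0], "x", gap_x[1]))
        if not candidates:
            blocks.append((t, b, l, r))  # nothing left to cut: a leaf
            return
        length, axis, start = max(candidates)
        if axis == "y":
            cut(t, t + start, l, r)
            cut(t + start + length, b, l, r)
        else:
            cut(t, b, l, l + start)
            cut(t, b, l + start + length, r)

    cut(0, len(img), 0, len(img[0]))
    return blocks


# Hypothetical page: a full-width header above two text columns.
page = [[0] * 10 for _ in range(8)]
for y in (0, 1):
    for x in range(10):
        page[y][x] = 1
for y in range(4, 8):
    for x in list(range(0, 4)) + list(range(6, 10)):
        page[y][x] = 1
blocks = xy_cut(page)
```

On this page the header prevents any vertical cut at the top level, so the first cut is horizontal and the column gap is only found on recursion. The method assumes the page has already been deskewed; even a small skew closes the blank bands it depends on.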

Slide 4-24: Segmentation by smearing
- Smear horizontally until letters touch, then more until words touch → lines
- Smear vertically until lines touch → paragraphs
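
Smearing can be sketched as run-length filling: any run of background pixels lying between two ink pixels is filled if it is no longer than a threshold. Raising the threshold merges letters into words, then words into lines. The function names and the threshold values below are assumptions for this illustration.

```python
def smear_row(row, threshold):
    """Fill background runs of length <= threshold that lie between ink."""
    out = row[:]
    last_ink = None
    for i, v in enumerate(row):
        if v:
            if last_ink is not None and 0 < i - last_ink - 1 <= threshold:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


def smear_horizontal(img, threshold):
    """Smear every row independently."""
    return [smear_row(row, threshold) for row in img]


def smear_vertical(img, threshold):
    """Smear every column independently (transpose, smear, transpose back)."""
    cols = [list(col) for col in zip(*img)]
    smeared = [smear_row(col, threshold) for col in cols]
    return [list(row) for row in zip(*smeared)]


# Hypothetical text line: two "words" of two "letters" each.
line = [[1, 0, 1, 0, 0, 0, 1, 0, 1]]
words = smear_horizontal(line, 1)      # letter gaps close, word gap stays
whole_line = smear_horizontal(line, 3)  # the word gap closes too
```

After each smear, the connected components of the result give the units at that level (words, lines, or paragraphs).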

Slide 4-25: Smearing example
[Figure; labels: character, word, line, paragraph]

Slide 4-26: Canonicalize elongated objects by thinning
- A A A: these should all be “the same”
- Not useful for squares, circles
- Perhaps most useful for handwritten data
- Huge literature, far in excess of what it deserves (relative to usefulness)
- Nevertheless…

Slide 4-27: Skeletonization requirements
- Connected image regions → connected lines
- Result is minimally 8-connected
- Approximate “medial lines”
- Extraneous spurs should be minimized
- Loss of information makes it not always advisable

Slide 4-28: Medial axis computation
- For every point P in the object, locate the closest point on the boundary. If there are two such points (at the minimum distance), then P is on the medial axis (MA).
- Alternatively, think of boundary pixels as point sources of a wave front; two waves meet at the MA.
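
The first definition can be applied literally to a pixel grid by brute force: for each ink pixel, measure the distance to every background pixel and check whether the minimum is attained at least twice. This sketch assumes a 0/1 list-of-lists image, Euclidean distance (compared in squared form to stay in integers), and background pixels standing in for the boundary; the name `medial_axis` is hypothetical, and the quadratic cost makes it suitable only for small examples.

```python
def medial_axis(img):
    """Return the set of (row, col) ink pixels with >= 2 nearest background pixels."""
    h, w = len(img), len(img[0])
    background = [(y, x) for y in range(h) for x in range(w) if not img[y][x]]
    axis = set()
    for r in range(h):
        for c in range(w):
            if not img[r][c]:
                continue
            # Squared Euclidean distance to every background pixel.
            d2 = [(r - y) ** 2 + (c - x) ** 2 for y, x in background]
            if d2 and d2.count(min(d2)) >= 2:
                axis.add((r, c))
    return axis


# Hypothetical example: a 3-row by 7-column bar with a one-pixel background border.
bar = [[0] * 9 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 8):
        bar[y][x] = 1
axis = medial_axis(bar)
```

For this bar the result is the middle row away from the ends, plus the four corner pixels, and the pieces are not connected to each other. A bar that is an even number of pixels thick yields no centre line at all under this literal definition, since no pixel is exactly equidistant from both sides.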

Slide 4-29: Medial axis computation
- Medial axis and skeletons with 4-distance, 8-distance, Euclidean distance (J. R. Parker)

Slide 4-30: The computation is fragile
- The T-shaped object, but with one pixel missing

Slide 4-31: Iterative morphological thinning

Slide 4-32: Hypermedia Image Processing Reference © http://www.cee.hw.ac.uk/hipr/html/thin.html

Slide 4-33: Other approaches
- Cellular automata more generally
- Geometric computation (Voronoi diagrams)
- Stroke-based decomposition / syntactic generation
- Computation based on compressed version (RLE boundary), skew on CCs
