Published by Jean Barrett. Modified over 9 years ago.
1
Development of an OCR System
Nathan Harmata
TJHSST Computer Systems Lab 2007-2008
2
What is OCR?
- Optical Character Recognition
- Font-based and handwriting-based recognition
3
Goals of My Project
- Generic recognition for Latin-based fonts
- Proper handling of most formatting
- System built from scratch
4
Overview of the Idocrase System
5
Image Processing
6
Transformations
- Attribute Character Model
7
Transformations
- Sector Vector
  - The image is parsed into parts that pass the vertical line test (every vertical line crosses the part at most once)
  - Each part is then transformed into a collection of line segments
- Gap Vector
  - Gaps, if any, are found on the four sides of the image
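The Sector Vector and Gap Vector steps above can be sketched in Python. This is not the Idocrase code: the functions `column_runs`, `sectors`, `to_segments`, `_has_opening`, and `gap_vector`, the list-of-rows 0/1 image format, and the `min_depth` threshold are all hypothetical. A part is assumed to pass the vertical line test when it holds at most one run of ink per column, and runs in neighboring columns are linked only when they share a row. Each part is reduced to the polyline through its run midpoints with collinear points dropped. A "gap" on a side is assumed to mean a concavity: seen from that side, the ink lies deeper in the middle than above and below it.

```python
from typing import Dict, List, Tuple

# Hypothetical image format: a list of rows, 1 = ink, 0 = background.
Image = List[List[int]]
Run = Tuple[int, int]                # (first_row, last_row) of a vertical ink run
Part = List[Tuple[int, int, int]]    # (column, first_row, last_row) per column
Segment = Tuple[Tuple[float, float], Tuple[float, float]]


def column_runs(img: Image, x: int) -> List[Run]:
    """Return the vertical runs of ink in column x, top to bottom."""
    runs, start = [], None
    for y, row in enumerate(img):
        if row[x] and start is None:
            start = y
        elif not row[x] and start is not None:
            runs.append((start, y - 1))
            start = None
    if start is not None:
        runs.append((start, len(img) - 1))
    return runs


def sectors(img: Image) -> List[Part]:
    """Split the ink into parts that each pass the vertical line test.

    A run extends an existing part only if it shares a row with exactly one
    run of the previous column and that part has no run in this column yet;
    otherwise (fork, merge, or fresh stroke) it starts a new part.
    """
    parts: List[Part] = []
    prev: List[Tuple[int, int, int]] = []  # (first_row, last_row, part index)
    for x in range(len(img[0])):
        cur, taken = [], set()
        for start, end in column_runs(img, x):
            hits = [i for s, e, i in prev if s <= end and start <= e]
            if len(hits) == 1 and hits[0] not in taken:
                idx = hits[0]
            else:
                idx = len(parts)
                parts.append([])
            taken.add(idx)
            parts[idx].append((x, start, end))
            cur.append((start, end, idx))
        prev = cur
    return parts


def to_segments(part: Part) -> List[Segment]:
    """Approximate a part by the polyline through its run midpoints."""
    if len(part) == 1:  # a single-column part becomes one vertical segment
        x, start, end = part[0]
        return [((x, float(start)), (x, float(end)))]
    # Store y doubled (start + end) so the collinearity test stays exact integer math.
    pts = [(x, s + e) for x, s, e in part]
    keep = [pts[0]]
    for a, b, c in zip(pts, pts[1:], pts[2:]):
        if (b[0] - a[0]) * (c[1] - b[1]) != (b[1] - a[1]) * (c[0] - b[0]):
            keep.append(b)  # direction changes at b, so b ends a segment
    keep.append(pts[-1])
    return [((x0, y0 / 2), (x1, y1 / 2)) for (x0, y0), (x1, y1) in zip(keep, keep[1:])]


def _has_opening(lines: List[List[int]], min_depth: int) -> bool:
    """True if the depth to the first ink dips inward between shallower lines."""
    depths = [line.index(1) for line in lines if 1 in line]  # skip empty lines
    for i in range(1, len(depths) - 1):
        if depths[i] - max(min(depths[:i]), min(depths[i + 1:])) >= min_depth:
            return True
    return False


def gap_vector(img: Image, min_depth: int = 2) -> Dict[str, bool]:
    """Report which sides of the glyph's bounding box have an opening (assumes some ink)."""
    ys = [y for y, row in enumerate(img) if any(row)]
    xs = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    box = [row[xs[0]:xs[-1] + 1] for row in img[ys[0]:ys[-1] + 1]]
    cols = [list(col) for col in zip(*box)]
    return {
        "left": _has_opening(box, min_depth),
        "right": _has_opening([row[::-1] for row in box], min_depth),
        "top": _has_opening(cols, min_depth),
        "bottom": _has_opening([col[::-1] for col in cols], min_depth),
    }
```

For a 5x5 "C" drawn as `[[0,1,1,1,1],[1,0,0,0,0],[1,0,0,0,0],[1,0,0,0,0],[0,1,1,1,1]]`, `sectors` returns three parts (the left stroke and the two bars) and `gap_vector` reports an opening only on the right. Linking runs only when they share a row keeps the sketch simple but splits one-pixel diagonal strokes into many parts; an 8-connected rule would be a natural refinement.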
8
Transformations
- Pixel Concentration Vector
  - Which sides, if any, have a higher concentration of pixels
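A minimal sketch of a Pixel Concentration Vector, assuming the glyph image is already cropped to its bounding box. The function names, the two-label output (top vs. bottom, then left vs. right), and the 10% `tol` threshold for calling a pair of sides "balanced" are hypothetical choices rather than the project's documented rules. For odd sizes the middle row or column is ignored.

```python
from typing import List, Tuple


def compare(first: int, second: int, names: Tuple[str, str], tol: float) -> str:
    """Name the heavier side, or "balanced" if the counts differ by at most tol of their total."""
    total = first + second
    if total == 0 or abs(first - second) <= tol * total:
        return "balanced"
    return names[0] if first > second else names[1]


def pixel_concentration_vector(img: List[List[int]], tol: float = 0.1) -> Tuple[str, str]:
    """Return (vertical, horizontal) labels for a 0/1 glyph image."""
    h, w = len(img), len(img[0])
    top = sum(sum(row) for row in img[: h // 2])
    bottom = sum(sum(row) for row in img[(h + 1) // 2:])  # middle row skipped if h is odd
    left = sum(sum(row[: w // 2]) for row in img)
    right = sum(sum(row[(w + 1) // 2:]) for row in img)   # middle column skipped if w is odd
    return (compare(top, bottom, ("top", "bottom"), tol),
            compare(left, right, ("left", "right"), tol))
```

For example, an "L" drawn as `[[1,0,0],[1,0,0],[1,1,1]]` yields `("bottom", "left")`, while a solid 2x2 block yields `("balanced", "balanced")`.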
9
Character Recognition
- GCDD: Generic Character Definition Database
  - Stores the average Character Model of every character, computed over many different fonts
(Figure: sample Character Model with GapVector 0, SectorVector 4 3, PixelConcentrationVector balanced balanced)
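One way a GCDD-style database could be built and queried is sketched below, assuming each Character Model is flattened into a numeric feature vector. The `encode` scheme (gap flags as 0/1, the number of sector parts, concentration labels mapped to -1/0/1), the function names, and the nearest-average (nearest-centroid) matching by Euclidean distance are all assumptions; the slides do not say how a model is compared against the database.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, ...]

# Hypothetical numeric encoding of the concentration labels.
LABEL = {"top": -1.0, "left": -1.0, "balanced": 0.0, "bottom": 1.0, "right": 1.0}


def encode(gaps: Dict[str, bool], n_sectors: int, conc: Tuple[str, str]) -> Features:
    """Flatten one Character Model into a feature vector."""
    gap_bits = tuple(float(gaps[side]) for side in ("left", "right", "top", "bottom"))
    return gap_bits + (float(n_sectors),) + tuple(LABEL[c] for c in conc)


def build_gcdd(samples: Sequence[Tuple[str, Features]]) -> Dict[str, Features]:
    """Average the feature vectors of each character over all sample fonts."""
    groups: Dict[str, List[Features]] = defaultdict(list)
    for char, feats in samples:
        groups[char].append(feats)
    return {char: tuple(sum(col) / len(vecs) for col in zip(*vecs))
            for char, vecs in groups.items()}


def recognize(gcdd: Dict[str, Features], feats: Features) -> str:
    """Return the character whose averaged model is nearest in Euclidean distance."""
    return min(gcdd, key=lambda char: dist(gcdd[char], feats))
```

Averaging per character keeps the database to one entry per symbol no matter how many fonts were sampled; the trade-off is that fonts that draw a letter very differently (for example, the two common forms of "a" or "g") get blurred into a single average.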
10
Character Recognition
- For a single character: (Figure: single-character recognition process)
- For words, dictionary and grammar references are used.
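A common way to apply a dictionary to recognized words is to replace the raw string with the closest dictionary entry by Levenshtein edit distance. The slides do not say which technique Idocrase used, so this is a swapped-in illustration: `edit_distance`, `correct_word`, and its `max_edits` cutoff are hypothetical, and the grammar reference is not modeled.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def correct_word(raw: str, dictionary: Iterable[str], max_edits: int = 2) -> str:
    """Return the closest dictionary word, or raw if nothing is within max_edits."""
    scored = [(edit_distance(raw.lower(), w.lower()), w) for w in dictionary]
    if not scored:
        return raw
    best_dist, best = min(scored)  # ties go to the alphabetically first word
    return best if best_dist <= max_edits else raw
```

For example, `correct_word("rec0gnition", ["recognition", "cognition", "ignition"])` returns `"recognition"` (one substitution), while a string far from every entry is returned unchanged.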
11
Idocrase Application
12
Results
- Mediocre word recognition
- Does not handle formatting well
- Does not handle small letters well
- Fairly accurate single-character recognition (93.7% accuracy)