
1 Optical Data Capture: Optical Character Recognition (OCR), Intelligent Character Recognition (ICR) and Intelligent Recognition (IR)
UNSD-UNESCAP Regional Workshop on Census Data Processing: Contemporary technologies for data capture, methodology and practice of data editing, documentation and archiving
Bangkok, Thailand, September 2008

2 Bangkok, Thailand, 15-19 September 2008
Summary
- Concept/Definition
- Forms Design
- Scanners & Software
- Storage
- Accuracy
- OCR/ICR Advantages and Disadvantages
- Intelligent Recognition (IR)
- Commercial Suppliers

3 Definition/Concept of OCR
OCR gives scanning and imaging systems the ability to convert images of machine-printed characters into machine-readable characters. The images of the machine-printed characters are extracted from a bitmap of the scanned image.

4 Definition/Concept of ICR
ICR gives scanning and imaging systems the ability to convert images of handwritten characters into machine-readable characters. The images of the handwritten characters are extracted from a bitmap of the scanned image.

5 OCR and ICR Differences
OCR is less accurate than optical mark recognition (OMR) but more accurate than ICR. ICR requires editing to achieve high data coverage.
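
In practice, "editing" usually means that uncertain fields are routed to operators rather than accepted automatically. The sketch below assumes the recognition engine reports a confidence score between 0 and 1 for each field; the `FieldResult` type, the `route_for_editing` function and the 0.90 threshold are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FieldResult:
    """One recognised field (hypothetical structure)."""
    form_id: str
    field: str
    value: str
    confidence: float  # assumed engine confidence, 0.0 (unsure) to 1.0 (sure)


def route_for_editing(
    results: List[FieldResult], threshold: float = 0.90
) -> Tuple[List[FieldResult], List[FieldResult]]:
    """Split results into auto-accepted fields and fields needing manual editing."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    accepted: List[FieldResult] = []
    to_edit: List[FieldResult] = []
    for r in results:
        # Low-confidence fields go to an operator who corrects them from the image.
        (accepted if r.confidence >= threshold else to_edit).append(r)
    return accepted, to_edit


if __name__ == "__main__":
    batch = [
        FieldResult("F0001", "age", "34", 0.98),
        FieldResult("F0001", "district", "KH0N KAEN", 0.62),
        FieldResult("F0002", "age", "7", 0.91),
    ]
    ok, review = route_for_editing(batch)
    print(len(ok), "accepted;", len(review), "sent for editing")
```

Raising the threshold sends more fields to editing (more manual work) but lets fewer recognition errors through; the value would have to be tuned on test data.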

6 Forms
Form design for OCR/ICR is less strict than for OMR:
- No timing tracks
- Registration marks are used
- ICR requires hand-print boxes, filled with one alphanumeric character per box
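
Because an ICR field holds one character per box, the field can be cut into per-character cells before recognition. A minimal sketch, assuming the field's bounding box is already known in pixel coordinates from the form template and that the boxes are equally wide; `split_comb_field` and the `inset` margin (used to trim the printed box borders) are illustrative names.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def split_comb_field(field: Box, n_boxes: int, inset: int = 0) -> List[Box]:
    """Divide a boxed field into n_boxes equal cells, one character per cell."""
    left, top, width, height = field
    if n_boxes < 1 or width < n_boxes:
        raise ValueError("field must be at least one pixel wide per box")
    if width // n_boxes - 2 * inset <= 0 or height - 2 * inset <= 0:
        raise ValueError("inset too large for the cell size")
    cells = []
    for i in range(n_boxes):
        # Integer arithmetic spreads any leftover pixels evenly across cells.
        x0 = left + (i * width) // n_boxes
        x1 = left + ((i + 1) * width) // n_boxes
        # Shrink each cell by `inset` on every side to drop the box outline.
        cells.append((x0 + inset, top + inset, x1 - x0 - 2 * inset, height - 2 * inset))
    return cells


if __name__ == "__main__":
    # Hypothetical 3-box age field at (100, 200), 300 px wide and 40 px high.
    for cell in split_comb_field((100, 200, 300, 40), 3, inset=2):
        print(cell)
```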

7 OCR Forms
OCR/ICR is more flexible because:
- No timing tracks are required
- The image can float on the page
- The use of drop-out color reduces the size of the scanner's output and enhances accuracy
OCR/ICR technology often uses registration marks at the four corners of a document to locate the image during recognition.
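
Registration marks are what allow the image to float: the software finds the marks on the scanned page and maps template field positions onto the image. A minimal sketch, assuming the mark centres have already been detected and that the scan differs from the template only by a shift, a rotation (skew) and a uniform scale; it fits that similarity transform by least squares, treating points as complex numbers. The function names and coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def fit_similarity(template: Sequence[Point], scanned: Sequence[Point]) -> Tuple[complex, complex]:
    """Least-squares fit of z -> a*z + b mapping template marks onto scanned marks.

    Points are treated as complex numbers x + iy, so `a` encodes rotation and
    uniform scale, and `b` encodes translation.
    """
    if len(template) != len(scanned) or len(template) < 2:
        raise ValueError("need at least two matching registration marks")
    p = [complex(x, y) for x, y in template]
    q = [complex(x, y) for x, y in scanned]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    # Closed-form least-squares solution on the centred points.
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum((pi - p_mean).real ** 2 + (pi - p_mean).imag ** 2 for pi in p)
    if den == 0:
        raise ValueError("template marks must not all coincide")
    a = num / den
    b = q_mean - a * p_mean
    return a, b


def to_image(a: complex, b: complex, point: Point) -> Point:
    """Map a template coordinate (e.g. a field corner) into scanned-image pixels."""
    z = a * complex(point[0], point[1]) + b
    return (z.real, z.imag)


if __name__ == "__main__":
    # Hypothetical mark centres at the four corners of a 2000 x 2800 px template.
    template_marks = [(0, 0), (2000, 0), (0, 2800), (2000, 2800)]
    # The same marks on a scan shifted 12 px right and 8 px up.
    scanned_marks = [(12, -8), (2012, -8), (12, 2792), (2012, 2792)]
    a, b = fit_similarity(template_marks, scanned_marks)
    print(to_image(a, b, (300, 450)))  # field corner in image coordinates
```

Here the fitted transform is a pure shift, so the template point (300, 450) lands at about (312, 442) on the scan. Using all four marks rather than two averages out small errors in locating each mark.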


9 OCR/ICR Scanners and Software
Forms are scanned, and the recognition engine of the OCR/ICR system then interprets the images, turning images of handwritten or printed characters into ASCII data (machine-readable characters).
- Users can scan forms without performing OCR at the same time
- Speeds range from … sheets/min, depending on the recognition engine
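
Since scanning and recognition can run separately, overall capture time is governed by the slower of the two stages. A back-of-the-envelope sketch, assuming both stages run concurrently as a pipeline at steady average rates; the function name and all figures in the example are hypothetical, not measured values.

```python
def capture_hours(
    n_sheets: int,
    scan_rate: float,    # sheets/min for one scanner
    recog_rate: float,   # sheets/min for one recognition engine instance
    n_scanners: int = 1,
    n_engines: int = 1,
) -> float:
    """Rough elapsed hours to scan and recognise n_sheets in a pipeline."""
    if min(scan_rate, recog_rate) <= 0 or min(n_scanners, n_engines) < 1:
        raise ValueError("rates must be positive and device counts at least 1")
    scan_capacity = scan_rate * n_scanners    # sheets/min across all scanners
    recog_capacity = recog_rate * n_engines   # sheets/min across all engines
    bottleneck = min(scan_capacity, recog_capacity)  # the slower stage sets the pace
    return n_sheets / bottleneck / 60.0


if __name__ == "__main__":
    # Hypothetical: 1.2 million sheets, 4 scanners at 100 sheets/min,
    # 6 engine instances at 50 sheets/min each.
    print(round(capture_hours(1_200_000, 100, 50, n_scanners=4, n_engines=6), 1))
```

With these assumed figures, recognition (300 sheets/min) rather than scanning (400 sheets/min) is the bottleneck, giving about 66.7 hours; two more engine instances would make the stages balanced.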

10 OCR/ICR Storage Characteristics
Storage/Retrieval
- Images are scanned, stored and maintained electronically
- There is no need to store the paper forms as long as the electronic files are safeguarded
- With OCR/ICR technologies, images can be scanned, indexed and written to optical media
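
If the paper forms are discarded, the electronic images must be protected against loss and silent corruption. One common safeguard, sketched here under the assumption that images are stored as TIFF files in a directory tree, is a checksum manifest: index each image by relative path and SHA-256 digest, then re-verify copies later (for example after writing them to optical media). The function names and the `*.tif` pattern are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(image_dir: Path, pattern: str = "*.tif") -> Dict[str, str]:
    """Index every matching image under image_dir: relative path -> SHA-256."""
    return {
        p.relative_to(image_dir).as_posix(): sha256_of(p)
        for p in sorted(image_dir.rglob(pattern))
        if p.is_file()
    }


def verify_manifest(image_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose contents changed."""
    problems = []
    for rel_path, expected in manifest.items():
        p = image_dir / rel_path
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel_path)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "ea001").mkdir()
        (root / "ea001" / "form0001.tif").write_bytes(b"fake image 1")
        (root / "ea001" / "form0002.tif").write_bytes(b"fake image 2")
        manifest = build_manifest(root)
        print(verify_manifest(root, manifest))  # nothing changed yet
        (root / "ea001" / "form0002.tif").write_bytes(b"corrupted")
        print(verify_manifest(root, manifest))  # the altered file is reported
```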

11 Ideal OCR/ICR Accuracy Thresholds
- The accuracy achieved by data entry clerks (~99.5%) is approximately equal to that of OCR/ICR with perfect tuning (~99.5%)
- Up to 99.9% accuracy with editing (as with OMR)
- The recognition engine must be tuned, tested and validated very carefully
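
Testing and validating an engine means measuring its output against a trusted reference, for example a sample of the same forms keyed by clerks. A minimal sketch, assuming accuracy is measured at the character level as 1 - (edit distance / reference characters); other definitions, such as field-level accuracy, are also used. The function names and sample values are illustrative.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def character_accuracy(recognised: Sequence[str], reference: Sequence[str]) -> float:
    """Character-level accuracy of recognised fields against keyed reference fields."""
    if len(recognised) != len(reference):
        raise ValueError("need one recognised value per reference value")
    total = sum(len(r) for r in reference)
    if total == 0:
        raise ValueError("reference contains no characters")
    errors = sum(edit_distance(x, r) for x, r in zip(recognised, reference))
    return 1.0 - errors / total


if __name__ == "__main__":
    engine = ["CENSUS", "8O23", "BANGKOK"]  # letter 'O' misread for digit '0'
    keyed = ["CENSUS", "8023", "BANGKOK"]
    print(f"{character_accuracy(engine, keyed):.3f}")
```

The thresholds above translate into residual error counts: on a hypothetical 10 million captured characters, 99.5% accuracy still leaves about 50,000 wrong characters, and 99.9% about 10,000.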

12 OCR/ICR Advantages
- Recognition engines used with imaging can capture highly specialized data sets
- OCR/ICR recognizes machine-printed or hand-printed characters
- Scanning and recognition allow efficient management and planning of the rest of the processing workload
- Quick retrieval for editing and reprocessing

13 OCR/ICR Disadvantages
- The technology is costly
- May require significant manual intervention
- Additional workload for data collectors
- ICR has severe limitations with human handwriting:
  - Characters must be hand-printed or machine-printed, with separate characters in boxes
  - It is ineffective with cursive characters

14 OMR-OCR/ICR Compared

15 OCR/ICR Challenges/Issues
- Shares the corresponding issues of OMR
- Algorithm development (preparation of a memory dictionary)
- Processing time considerations due to the recognition engine
- Development costs
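
The slide does not say how the memory dictionary is built or used. One common use of a dictionary in forms processing is post-correction: snapping a recognised value to the closest entry in a list of valid answers, such as place names. A minimal sketch using Python's standard `difflib`; the function names, the 0.8 cutoff and the sample entries are assumptions.

```python
import difflib
from typing import Iterable, List, Tuple


def build_dictionary(entries: Iterable[str]) -> List[str]:
    """Normalise valid answers: trimmed, upper case, de-duplicated, sorted."""
    return sorted({e.strip().upper() for e in entries if e.strip()})


def correct_field(raw: str, dictionary: List[str], cutoff: float = 0.8) -> Tuple[str, bool]:
    """Return (value, matched): the closest dictionary entry, or the raw value if none is close."""
    candidate = raw.strip().upper()
    matches = difflib.get_close_matches(candidate, dictionary, n=1, cutoff=cutoff)
    if matches:
        return matches[0], True
    return candidate, False  # unmatched values would go to manual editing


if __name__ == "__main__":
    districts = build_dictionary(["Bangkok", "Chiang Mai", "Khon Kaen"])
    print(correct_field("BANGK0K", districts))  # digit zero misread for letter O
    print(correct_field("XYZ", districts))
```

A higher cutoff accepts fewer automatic corrections but reduces the risk of snapping a genuinely different answer onto a wrong entry.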

16 Definition/Concept of IR
- State-of-the-art recognition technology
- Gives scanning and imaging systems the ability to convert images of handwritten and cursive characters into machine-readable characters
- Images of the handwritten and cursive characters are extracted from a bitmap of the scanned image
- The ability to capture cursive makes this method unique

17 Definition/Concept of IR
Figure 1: The eight elements that make up the trajectories of all cursive letters. (Photo: Parascript LLC)

18 Definition/Concept of IR
- Intelligent Recognition uses context dynamically: context is applied during the recognition process, improving the accuracy of the results
- Context helps to identify letters where the symbol segmentation of an image is ambiguous
(Photo: Parascript LLC)
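
As a simplified illustration of why context helps, suppose the engine produces several readings of a word because its segmentation is ambiguous (an "m" can look like "rn"). Down-weighting readings that are not plausible answers for the field lets context pick the right one. This is a generic sketch, not Parascript's algorithm: real systems apply context inside the recognizer, whereas here it is applied to a ranked list of readings, and the scores, lexicon and weight are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple


def pick_with_context(
    readings: Iterable[Tuple[str, float]],
    lexicon: Set[str],
    out_of_lexicon_weight: float = 0.1,
) -> str:
    """Choose the reading with the best score after down-weighting non-words."""
    best_text: Optional[str] = None
    best_score = float("-inf")
    for text, score in readings:
        # Context: readings found in the field's lexicon keep their full score.
        weight = 1.0 if text in lexicon else out_of_lexicon_weight
        combined = score * weight
        if combined > best_score:
            best_text, best_score = text, combined
    if best_text is None:
        raise ValueError("no readings supplied")
    return best_text


if __name__ == "__main__":
    # Shape-only scores prefer 'RNONTH' because 'M' was segmented as 'R' + 'N'.
    readings = [("RNONTH", 0.55), ("MONTH", 0.45)]
    print(pick_with_context(readings, {"DAY", "MONTH", "YEAR"}))
```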

19 Technology Evolution
Figure: Technology evolution (OCR, ICR, Intelligent Recognition) by form type and text style. Labels in the diagram: machine print; constrained handprint; unconstrained handprint; condensed strings; cursive; forms specially designed for automatic recognition; constraining boxes or combs; drop-out ink for preprinted text and boxes; no special form design; no constraining boxes or combs; legacy forms; dirty and noisy forms; bad-quality paper; bad-quality machine print. (Illustration: Conference on Technology Options for 2011 Census)

20 Major Commercial Suppliers
- Top Image Systems (TIS)
- ReadSoft
- Teleform
- Scanner suppliers: Fujitsu, Canon, Bell & Howell, Kodak

21 THANK YOU!

