UN Workshop on Data Capture, Minsk, Session 15: Data Capture Process with Optical Character Recognition, Image Character Recognition, Intelligent Recognition

1 UN Workshop on Data Capture, Minsk, Session 15: Data Capture Process with Optical Character Recognition, Image Character Recognition, Intelligent Recognition. Christoph Steinl, Vice Director International Enterprise Content Management

2 OCR Optical Character Recognition
Agenda: OCR (Optical Character Recognition), ICR (Image Character Recognition), DFR (Dynamic Form Recognition) 11/14/2018

3 OCR = optical character recognition
The technology was first invented in 1929, when Gustav Tauschek obtained a patent on OCR in Germany for a mechanical device that used templates. The first commercial system was installed at Reader's Digest in 1955 and was donated to the Smithsonian Institution years later. Today, recognition of machine-written text is considered a largely solved problem, with accuracy rates exceeding 99%.

4 OCR: Beta Systems has extensive experience with these recognition engines in banks. Germany: OCR A (special characters ⑁ Chair, ⑀ Hook, ⑂ Fork). Austria: OCR B Plus.

5 ICR Image Character Recognition
The technique has moved far ahead of OCR because of the ongoing development of ICR. As a handwriting recognition system, it allows different styles of handwriting to be learned by a computer before or during processing, which improves accuracy and recognition rates.

6 ICR Process: Capturing the image with Scanners
Processing by ICR and/or OCR. Segmentation is a very important step: it decides whether homogeneous regions belong to the foreground or to the background. Human editors can do this depending on the context. It is comparable to computer tomography, where the computer reconstructs the picture from X-ray measurements taken at different angles. The first step only yields a suitable starting point (a set of seed pixels); the growing process then links all neighbouring pixels (computation of valleys and peaks with a high degree of confidence).
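The growing step can be illustrated with a minimal seeded region-growing sketch. It assumes a grayscale page stored as a list of rows with values 0 to 255 (dark ink on light paper), 4-connected neighbours and an arbitrary `max_diff` tolerance; the function name and these choices are illustrative, not the parameters of any particular recognition engine.

```python
from collections import deque

def grow_region(image, seed, max_diff=40):
    """Seeded region growing: starting from `seed` (row, col), collect all
    4-connected pixels whose intensity is within `max_diff` of the seed's."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= max_diff):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# A dark pen stroke (low values) on a light background.
page = [
    [250, 20, 245, 250],
    [248, 30, 240, 252],
    [251, 25, 35, 249],
    [250, 240, 245, 250],
]
stroke = grow_region(page, seed=(0, 1))
print(sorted(stroke))  # [(0, 1), (1, 1), (2, 1), (2, 2)]
```

Comparing every candidate with the seed value (rather than with its immediate neighbour) keeps the region from drifting gradually into the background.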

7 ICR Process: Pre-processing. Deskew; shift and rotate; stretch.
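Shift, rotation and stretch can be expressed as one geometric transform applied by inverse mapping. This sketch rests on stated assumptions: the transform is modelled as stretch, then rotation about the top-left corner, then shift; pixels are sampled by nearest neighbour; and `warp` and its parameters are hypothetical names.

```python
import math

def warp(image, angle_deg=0.0, scale_x=1.0, scale_y=1.0, shift=(0.0, 0.0), fill=0):
    """Return `image` stretched, rotated about the top-left corner and
    shifted (in that order). For every output pixel the inverse transform
    locates the source pixel (nearest neighbour), so the output has no
    holes. To correct a scan, pass the parameters that reverse its
    distortion."""
    rows, cols = len(image), len(image[0])
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Undo the shift.
            tx, ty = x - shift[0], y - shift[1]
            # Undo the rotation (rotate by -angle).
            rx = cos_a * tx + sin_a * ty
            ry = -sin_a * tx + cos_a * ty
            # Undo the stretch and pick the nearest source pixel.
            sx, sy = round(rx / scale_x), round(ry / scale_y)
            if 0 <= sy < rows and 0 <= sx < cols:
                out[y][x] = image[sy][sx]
    return out

scan = [[1, 2, 3],
        [4, 5, 6]]
print(warp(scan, shift=(1, 0)))  # [[0, 1, 2], [0, 4, 5]]
```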

8 Recognition – Image Pre-processing
Skewed document... ...after alignment. (Version / editors: Sing / Kleber / Reim; status: released; area of use: Beta Systems)
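A common way to measure the skew before alignment is a projection profile. The sketch below assumes a binarised page (1 = ink) and a list of candidate angles; it shears the ink pixels instead of truly rotating them, which is only a small-angle approximation, and it is not necessarily the method used by the product on this slide.

```python
import math
from collections import Counter

def estimate_skew(binary, angles_deg):
    """Projection-profile skew estimate for a binary page (1 = ink).
    For each candidate angle, ink pixels are sheared back by that angle
    and counted per row; the angle giving the sharpest row profile
    (largest sum of squared counts) wins. Image y grows downwards, so a
    positive angle means text lines fall towards the right."""
    ink = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    best_angle, best_score = None, -1
    for angle in angles_deg:
        t = math.tan(math.radians(angle))
        profile = Counter(round(y - x * t) for x, y in ink)
        score = sum(n * n for n in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

# Synthetic page: one 60-pixel text line drawn with a 5 degree skew.
page = [[0] * 60 for _ in range(20)]
for x in range(60):
    page[round(10 + x * math.tan(math.radians(5)))][x] = 1
print(estimate_skew(page, range(-10, 11)))  # 5
```

The estimated angle can then be passed to a rotation step to produce the aligned page.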

9 ICR Process: Less / More Contrast
Contrast enhancement (less or more contrast) and clean-up (de-noising, halftone removal) enable the recognition engine to give the best results.
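Two of these clean-up steps, linear contrast stretching and median de-noising, can be sketched on a grayscale list-of-rows image. Halftone removal is not shown, and the 3x3 filter size is an assumption.

```python
from statistics import median_low

def stretch_contrast(image):
    """Linear contrast stretch: darkest pixel -> 0, lightest -> 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in image]

def median_denoise(image):
    """3x3 median filter. An isolated speck takes the value of its
    surroundings; at the borders only existing neighbours are used, and
    the lower median is taken for even-sized windows."""
    rows, cols = len(image), len(image[0])
    return [
        [median_low([image[j][i]
                     for j in range(max(0, y - 1), min(rows, y + 2))
                     for i in range(max(0, x - 1), min(cols, x + 2))])
         for x in range(cols)]
        for y in range(rows)
    ]

print(stretch_contrast([[100, 140], [200, 100]]))  # [[0, 102], [255, 0]]
speck = [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
print(median_denoise(speck))  # [[255, 255, 255], [255, 255, 255], [255, 255, 255]]
```

A plain median filter also thins or removes one-pixel-wide pen strokes, so the filter size has to be tuned to the scan resolution and stroke width.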

10 Recognition: Noise and box removal
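Box removal can be sketched with a run-length rule: long straight runs of ink are treated as pre-printed lines and erased. This assumes a binarised image (1 = ink) and an illustrative `min_run` threshold; handwriting that touches the box loses the overlapping pixels, which a production engine would need to repair.

```python
def remove_box_lines(binary, min_run=5):
    """Erase horizontal and vertical runs of ink at least `min_run` pixels
    long (box borders, lines). Shorter runs, such as a handwritten digit
    inside the box, are kept. Runs are detected on the original image so
    that box corners are found in both directions."""
    rows, cols = len(binary), len(binary[0])
    out = [row[:] for row in binary]

    def clear_long_runs(cells):
        # `cells` lists the (y, x) coordinates of one row or one column.
        run = []
        for y, x in cells + [(None, None)]:  # sentinel flushes the last run
            if y is not None and binary[y][x]:
                run.append((y, x))
                continue
            if len(run) >= min_run:
                for ry, rx in run:
                    out[ry][rx] = 0
            run = []

    for y in range(rows):
        clear_long_runs([(y, x) for x in range(cols)])
    for x in range(cols):
        clear_long_runs([(y, x) for y in range(rows)])
    return out

# A 7 x 12 answer box with a short vertical stroke (a "1") inside it.
box = [[0] * 12 for _ in range(7)]
for x in range(12):
    box[0][x] = box[6][x] = 1
for y in range(7):
    box[y][0] = box[y][11] = 1
for y in (2, 3, 4):
    box[y][5] = 1
cleaned = remove_box_lines(box)
print(sorted((y, x) for y, row in enumerate(cleaned) for x, v in enumerate(row) if v))
# [(2, 5), (3, 5), (4, 5)]
```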

11 ICR Process: Classification
A "1" was written. The classifier's alternatives: 1 = 90 %, 7 = 8 %, 4 = 2 %.
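The classification result above can be turned into a decision with a simple rule: accept the most probable character only if its confidence is high and clearly ahead of the runner-up, otherwise flag it for manual correction. The thresholds `min_confidence` and `min_margin` and the function name are illustrative assumptions.

```python
def decide(alternatives, min_confidence=0.85, min_margin=0.5):
    """Pick the most probable character from {char: probability}.
    Returns (char, probability, accepted); a rejected character goes to
    manual correction."""
    ranked = sorted(alternatives.items(), key=lambda kv: kv[1], reverse=True)
    best_char, best_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0
    accepted = best_p >= min_confidence and best_p - runner_up_p >= min_margin
    return best_char, best_p, accepted

print(decide({"1": 0.90, "7": 0.08, "4": 0.02}))  # ('1', 0.9, True)
print(decide({"1": 0.55, "7": 0.40, "4": 0.05}))  # ('1', 0.55, False)
```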

12 ICR Algorithms: neural networks, kNN (k-nearest neighbour) and SVM (support vector machine). SVMs simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers.
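Of the three algorithms listed, kNN is the simplest to sketch. The feature vectors below are made-up two-number descriptors (a real engine derives its features from the character image), and using the vote share as a confidence is an illustrative choice; `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter

def knn_classify(train, sample, k=3):
    """k-nearest-neighbour classification: find the k training vectors
    closest to `sample` (Euclidean distance) and return the majority label
    together with its share of the k votes as a simple confidence."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k

# Hypothetical feature vectors for handwritten digits.
train = [((0.90, 0.10), "1"), ((0.80, 0.15), "1"), ((0.85, 0.20), "7"),
         ((0.30, 0.60), "0"), ((0.35, 0.55), "0"), ((0.40, 0.50), "6")]
print(knn_classify(train, (0.88, 0.12)))  # ('1', 0.6666666666666666)
```

Adding new handwriting samples to `train` is how such a classifier "learns" additional writing styles, in the sense of slide 5.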

13 ICR Process: Each classification alternative is reported with its confidence. Recognition can be limited to the most probable characters: e.g. if only the characters 3, 6 and 0 are possible, the engine can be restricted to this set and the results are much better. Voting machine usability: security, efficiency and accuracy.
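Limiting recognition to a character set can be sketched as filtering the engine's alternatives and rescaling the remaining scores. The score dictionary and function name are hypothetical; a real engine may instead apply the restriction inside the classifier.

```python
def restrict(alternatives, allowed):
    """Drop characters that cannot occur in this field and rescale the
    remaining scores so that they sum to 1 again."""
    kept = {c: p for c, p in alternatives.items() if c in allowed}
    total = sum(kept.values())
    if total == 0:
        return {}  # nothing plausible left: send the field to manual correction
    return {c: p / total for c, p in kept.items()}

raw = {"8": 0.50, "3": 0.30, "6": 0.15, "0": 0.05}   # hypothetical engine output
limited = restrict(raw, allowed={"3", "6", "0"})
print(max(raw, key=raw.get), "->", max(limited, key=limited.get))  # 8 -> 3
```

Here the unrestricted engine would have read an impossible "8"; with the field's character set applied, the best candidate becomes "3".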

14 Dynamic Field Recognition
No fixed position is required. If only half of a form is available, that half is still readable. No special forms are required. No timing tracks are necessary on the forms for OMR, yet the results are available at the same time, and no cleaning of LEDs in the scanner is necessary. Robust against vertical and horizontal stretching, shrinking and displacement (e.g. variations in printing).
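The robustness against stretching and displacement can be illustrated with a much simpler stand-in than real DFR, which works with features such as word pixel clouds, boxes and lines. This sketch assumes two reference points (anchors) have already been located on the scan and maps a field rectangle with a separate scale and offset per axis; all names and coordinates are hypothetical.

```python
def map_field(template_anchors, found_anchors, field_box):
    """Map a field rectangle (left, top, right, bottom) from template
    coordinates to scan coordinates using two reference points found on
    the page. Independent scale and offset per axis absorb horizontal and
    vertical stretching, shrinking and displacement."""
    (tx1, ty1), (tx2, ty2) = template_anchors
    (fx1, fy1), (fx2, fy2) = found_anchors
    sx = (fx2 - fx1) / (tx2 - tx1)
    sy = (fy2 - fy1) / (ty2 - ty1)

    def to_scan(x, y):
        return fx1 + (x - tx1) * sx, fy1 + (y - ty1) * sy

    left, top, right, bottom = field_box
    return (*to_scan(left, top), *to_scan(right, bottom))

# Page printed 25% wider and shifted by (+30, -10) pixels.
print(map_field([(100, 100), (1100, 1500)],
                [(130, 90), (1380, 1490)],
                (300, 400, 500, 450)))  # (380.0, 390.0, 630.0, 440.0)
```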

15 Dynamic Field Recognition
Recognizes features (a word as a pixel cloud), boxes, lines and symbols.

16

17 Hardware and Software Requirements
Hardware: scanner, PC, network, disc storage (necessary for re-processing and if images are needed for audit purposes). Software: scan software; one recognition and voting software for OMR, OCR, ICR and barcode.

18 Cost Comparison in General: OMR/ICR from Image vs. OMR/ICR from a Dedicated OMR Scanner
Forms design: the same for both. Forms production: up to 50% more with a dedicated OMR scanner. Enumerator training: up to double the cost with a dedicated OMR scanner. Further cost items compared: scanners, PCs (low-cost PC vs. PC), operators, servers. Cost of more/new flexibility: low from image; high with a dedicated OMR scanner.

19 ICR Advantages: Better Than...
Manual keying: 90 % (plus) correct keys; manual keying has a higher substitution rate than automated recognition, is time consuming, and deliberate manipulation is possible. OMR, because OMR is space consuming. OCR, because OCR reads only machine-written text and is therefore of limited use.

20 ICR Advantages: Clear accuracy for OMR, because the software removes dirt depending on the mark size and shape. It can detect lines and ignore dirt, giving a clear result.
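Ignoring dirt when reading an OMR box can be sketched by discarding tiny connected ink components before measuring how much of the box is filled. The binary box images, `min_blob` and `min_fill` are illustrative assumptions, not values from the slide.

```python
def components(binary):
    """Return the 4-connected ink components (1 = ink) as lists of (y, x)."""
    rows, cols = len(binary), len(binary[0])
    seen, comps = set(), []
    for y in range(rows):
        for x in range(cols):
            if binary[y][x] and (y, x) not in seen:
                stack, comp = [(y, x)], []
                seen.add((y, x))
                while stack:
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                comps.append(comp)
    return comps

def is_marked(box, min_blob=3, min_fill=0.3):
    """Decide whether an OMR box is marked: drop specks smaller than
    `min_blob` pixels (dirt), then compare the remaining ink with the box area."""
    area = len(box) * len(box[0])
    ink = sum(len(c) for c in components(box) if len(c) >= min_blob)
    return ink / area >= min_fill

dirt = [[1, 0, 1, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 1]]
filled = [[1, 1, 1, 0],
          [1, 1, 1, 0],
          [1, 1, 1, 0],
          [0, 0, 0, 0]]
print(is_marked(dirt), is_marked(filled))  # False True
```

Counting raw pixels would have reported the dirty box as marked (6 of 16 pixels are black); removing isolated specks first avoids that.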

21 ICR Advantages: Barcode, OCR, OMR and ICR Recognition with One Software

22 ICR Advantages: Only rejected characters or fields need correction; the rest of the form is left untouched. Open to new technologies for faster processing and better quality in the future. Standardized correction mode. The handwriting of the respective country is recognized. The advantages mentioned previously are not repeated here.

23 ICR Advantages: Capture Process
SORM (Scan Once, Read Multiple): images are scanned once and stored for re-processing (disk space is cheap). In several serial sessions, parts of the data are collected from the image, important fields first. Example: SORM session 1: age, sex and nationality -> provisional partial results; SORM session 2: all other numeric fields; SORM session 3: alphanumeric fields that need more manual coding (occupation -> occupation code). Each session updates the data files / database until all data is captured. Benefits: faster preliminary results; less political stress; faster data for PES (post-enumeration survey) planning; analysis of session 1 results is possible in parallel to recognition, coding and editing of session 2; data lifting is possible on different batching levels (EA, i.e. enumeration area, or settlement).
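The SORM idea can be sketched as running several sessions over the same stored images, each reading only its own fields and merging them into the database. The session plan follows the slide's example, but the field names for session 2, the record key and the `fake_recognize` stand-in for the recognition engine are hypothetical.

```python
SESSIONS = [
    ["age", "sex", "nationality"],   # session 1: provisional partial results
    ["household_size", "rooms"],     # session 2: other numeric fields (made-up names)
    ["occupation"],                  # session 3: fields needing manual coding
]

def run_session(stored_images, fields, recognize, database):
    """Read only `fields` from every stored image and merge them into the
    database; values captured in earlier sessions are kept."""
    for image_id, image in stored_images.items():
        record = database.setdefault(image_id, {})
        for field in fields:
            record[field] = recognize(image, field)
    return database

def fake_recognize(image, field):
    # Stand-in for a recognition engine: here the "image" is already a dict.
    return image.get(field)

stored = {
    "EA01-0001": {"age": "34", "sex": "F", "nationality": "AB",
                  "household_size": "4", "rooms": "3", "occupation": "teacher"},
}
db = {}
run_session(stored, SESSIONS[0], fake_recognize, db)
print(db)  # {'EA01-0001': {'age': '34', 'sex': 'F', 'nationality': 'AB'}}
for fields in SESSIONS[1:]:
    run_session(stored, fields, fake_recognize, db)
```

Because the images are stored, session 1 results can be analysed while sessions 2 and 3 are still running, without rescanning the paper.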

24 Process Stages of Census Surveys
Christoph J. Steinl, Vice Director International ECM, December 2008, Minsk

25 Capture Process: store-in (EA batch header creation, EA paper store database); scanning; recognition; verifying processes; the solution; data capture; census process internal; census data flow; quality assurance.

26 Scanning Kleindienst SC80HC

27 Scanning: Simultaneous Creation of up to Six Images
Optical lens > 10 mm, depth of field 3 mm. Optical and ultrasonic double-feed control. Energy-saving / life-cycle-extending mode. Consistent jam handling: no document is lost or captured twice due to physical jams. Cleanliness check program: detects white and black dirt spots.

28 Scanning: Pockets 2–12. Why?
Documents are routed to different pockets depending on: whether the document was scanned skewed or de-skewed; whether the very important questions are filled in and readable (for OMR, OCR, barcode); whether there are fingerprints on the questionnaire; whether the barcode/OCR/OMR numbers (not ICR) are within the given range; whether there are double entries (checked via the unique number); whether (colour) copies were used; whether there are quantity mismatches (e.g. the batch header shows 50 but only 40 are scanned). A transport stop can be programmed to clarify the issue.
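Some of these pocket rules can be sketched as an ordered series of checks. The pocket numbers, skew threshold, valid number range and record fields are hypothetical; fingerprint and colour-copy detection are left out because they depend on image analysis.

```python
def choose_pocket(doc, seen_ids, valid_range=(1, 9999), max_skew_deg=5.0):
    """Route one scanned questionnaire to an output pocket.
    Pocket numbers, thresholds and field names are hypothetical; pocket 1
    receives documents that passed every check."""
    if abs(doc["skew_deg"]) > max_skew_deg:
        return 2                      # scanned too skewed
    if not doc["key_fields_readable"]:
        return 3                      # very important questions empty or unreadable
    low, high = valid_range
    if not low <= doc["number"] <= high:
        return 4                      # barcode/OCR/OMR number outside the given range
    if doc["unique_id"] in seen_ids:
        return 5                      # double entry: unique number already scanned
    seen_ids.add(doc["unique_id"])
    return 1

def batch_complete(header_count, scanned_count):
    """Quantity check: header says 50 but only 40 scanned -> False (stop)."""
    return header_count == scanned_count

seen = set()
good = {"skew_deg": 0.4, "key_fields_readable": True, "number": 17, "unique_id": "EA01-0001"}
print(choose_pocket(good, seen))   # 1
print(choose_pocket(good, seen))   # 5 (same unique number again)
print(batch_complete(50, 40))      # False
```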

29 Scanning: Customer: "I have a printer and have long been printing my own questionnaires... I learnt from the internet that it is just a matter of software..." We should be consulted before printing so that we can give the best advice; we will test and optimize. Single-sided printing or heavier (higher-grammage) paper is necessary (shine-through factor = opacity). Paper should be white, without any spots. Discuss different methods before making big investments.

30 Census Process (Internal)
Diagram stages: form type analysis, structure analysis, ICR voting, batch job processing, ICR 1, ICR 2, editing / coding, logical result analysis, ICR result analysis, output assembly.

31 The Path to Recognition
Analyze the structure of documents for identification.

32 The Path to Recognition
Perform proper clean-up and image pre-processing. Analyze the individual page layout. Dynamically locate the fields of interest. Character recognition: numeric handwriting, with voting between two ICR engines and also with OMR. Compile the results.
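Voting between two ICR engines and OMR can be sketched as a small combination rule. The rule order, the `min_confidence` threshold and the (value, confidence) result format are assumptions for illustration, not the documented behaviour of any product.

```python
def vote(icr1, icr2, omr=None, min_confidence=0.9):
    """Combine two ICR results, each (value, confidence), with an optional
    OMR reading of the same field. Returns (value, needs_review)."""
    (v1, c1), (v2, c2) = icr1, icr2
    if v1 == v2:
        return v1, False                 # both engines agree
    if omr is not None and omr in (v1, v2):
        return omr, False                # OMR confirms one of the two readings
    value, conf = (v1, c1) if c1 >= c2 else (v2, c2)
    return value, conf < min_confidence  # disagreement: review unless very sure

print(vote(("7", 0.80), ("7", 0.60)))            # ('7', False)
print(vote(("1", 0.60), ("7", 0.70), omr="1"))   # ('1', False)
print(vote(("1", 0.60), ("7", 0.70)))            # ('7', True)
```

Only fields returned with `needs_review = True` would go to manual correction, which matches the "only rejected characters/fields need correction" advantage on slide 22.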

33 Verifying Processes
Unique number (double-scan check); double-feed check; copy check; trace of all editing work; logical checks; completeness checks; reports.
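Completeness and logical checks on a captured record can be sketched as a function that returns a list of problems. The required fields, the age range and the `years_at_address` consistency rule are hypothetical examples, not rules from the census described here.

```python
REQUIRED = ("unique_id", "age", "sex")   # hypothetical required fields

def check_record(record):
    """Completeness and logical checks for one captured record.
    Returns a list of problem descriptions (empty if the record passes)."""
    problems = [f"missing {f}" for f in REQUIRED if record.get(f) in (None, "")]
    age = record.get("age")
    if isinstance(age, int):
        if not 0 <= age <= 120:          # hypothetical plausibility range
            problems.append("age out of range")
        years = record.get("years_at_address")
        if isinstance(years, int) and years > age:
            problems.append("years_at_address exceeds age")
    return problems

print(check_record({"unique_id": "A1", "age": 30, "sex": "M"}))   # []
print(check_record({"unique_id": "A2", "age": 10, "sex": "",
                    "years_at_address": 12}))
# ['missing sex', 'years_at_address exceeds age']
```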

34 The Solution: SC80HC + FC Census
Diagram components: FC Census recognition; DevInfo; work data storage & DB (data + images); preparation: cut & jog; CSPro; export; batch header; archive data storage & DB; paper archive; Formx; Redatam; tape; editing; local reports.

35 Data Capture: Data Processing Centres in Different Locations
Peak period: 3 shifts; on average 1–2 shifts. Local operators trained by our supervisors; local supervisors; central support from the lab. Training and documentation provided in advance. Help with designing the documents.

36 Thank you for your attention