UN Workshop on Data Capture, Minsk Session 15 Data Capture Process with Optical Character Recognition Image Character Recognition Intelligent Recognition.

Presentation transcript:

UN Workshop on Data Capture, Minsk
Session 15: Data Capture Process with Optical Character Recognition, Image Character Recognition, Intelligent Recognition
Christoph Steinl, Vice Director International, Enterprise Content Management

Agenda
- OCR: Optical Character Recognition
- ICR: Image Character Recognition
- DFR: Dynamic Form Recognition
11/14/2018

OCR = Optical Character Recognition
- The technology was first invented in 1929: Gustav Tauschek obtained a patent on OCR in Germany for a mechanical device that used templates.
- The first commercial system was installed at Reader's Digest in 1955; years later it was donated to the Smithsonian Institution.
- Today, recognition of machine-written text is considered largely a solved problem, with accuracy rates exceeding 99%.

OCR
Beta Systems has long experience with these recognition engines in banks:
- Germany: OCR-A, with the special characters ⑁ (Chair), ⑀ (Hook) and ⑂ (Fork)
- Austria: OCR-B, with the special character + (Plus)

ICR: Image Character Recognition
- The technique is far ahead of OCR because ICR is under ongoing development.
- It is a handwriting recognition system.
- It allows different styles of handwriting to be learned by the computer before or during processing, which improves accuracy and recognition rates.

ICR Process:
- Capturing the image with scanners
- Processing by ICR and/or OCR
- Segmentation is a very important step: the decision whether a homogeneous region belongs to the foreground or to the background. Human editors can decide that from the context.
- It is comparable to computer tomography, where the computer reconstructs the picture from X-ray measurements taken at different angles.
- The first step yields only a suitable starting point (sets of pixels).
- The growing process then links all neighbouring pixels (computation of valleys and peaks with a high degree of confidence).
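
The growing step described above can be illustrated with a minimal region-growing sketch. It is not the Beta Systems implementation; the function name `grow_region`, the grey-value `tolerance` and the 4-neighbour rule are assumptions chosen for illustration.

```python
from collections import deque


def grow_region(image, seed, tolerance):
    """Collect all pixels 4-connected to `seed` whose grey value differs
    from the seed's grey value by at most `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Hypothetical 4x4 grey-scale patch: 255 is paper, low values are ink.
page = [
    [255, 255, 255, 255],
    [255,  20,  30, 255],
    [255,  25, 255, 255],
    [255, 255, 255,  10],
]
stroke = grow_region(page, seed=(1, 1), tolerance=40)
```

Starting from the dark seed pixel, the region picks up the two adjacent ink pixels but not the isolated dark pixel in the corner, because that pixel is not connected to the stroke.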

ICR Process: Pre-processing
- Deskew
- Shift, rotate
- Stretch
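
As a rough sketch of the deskew step, the skew angle can be estimated from two marks that should lie on a horizontal line, and coordinates can then be rotated back. The use of two registration marks and the names `skew_angle` and `rotate_point` are assumptions for illustration; the slides do not say how the product measures skew.

```python
import math


def skew_angle(left_mark, right_mark):
    """Angle in degrees of the line through two marks that should be horizontal.
    Points are (x, y) pairs."""
    dx = right_mark[0] - left_mark[0]
    dy = right_mark[1] - left_mark[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(point, angle_degrees, origin=(0.0, 0.0)):
    """Rotate `point` around `origin` by minus `angle_degrees`, undoing the skew."""
    t = math.radians(angle_degrees)
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    x = origin[0] + dx * math.cos(t) + dy * math.sin(t)
    y = origin[1] - dx * math.sin(t) + dy * math.cos(t)
    return (x, y)


# Hypothetical marks on a page that was fed with a skew of 3 degrees.
left = (0.0, 0.0)
right = (1000.0, 1000.0 * math.tan(math.radians(3.0)))
angle = skew_angle(left, right)
aligned = rotate_point(right, angle, origin=left)
```

After the rotation the right-hand mark lies on the same horizontal line as the left-hand mark again.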

Recognition – Image Pre-processing
[Figure: skewed document, and the same document after alignment]

ICR Process: Enhance
- Less / more contrast
- Clean up (de-noise, halftone removal) so that the recognition engine can give its best results

Recognition: Noise and box removal
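
One simple way to remove noise is to delete small connected specks from the binarised image. The sketch below shows that idea only; the function name `remove_specks`, the 8-neighbour rule and the `min_size` threshold are assumptions, and box removal is not covered.

```python
def remove_specks(bitmap, min_size):
    """Return a copy of a binary image (1 = ink) in which every
    8-connected ink component smaller than `min_size` pixels is erased."""
    rows, cols = len(bitmap), len(bitmap[0])
    cleaned = [row[:] for row in bitmap]
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != 1 or (r, c) in seen:
                continue
            # Flood-fill one component starting at (r, c).
            component = [(r, c)]
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                cr, cc = stack.pop()
                for nr in range(cr - 1, cr + 2):
                    for nc in range(cc - 1, cc + 2):
                        inside = 0 <= nr < rows and 0 <= nc < cols
                        if inside and bitmap[nr][nc] == 1 and (nr, nc) not in seen:
                            seen.add((nr, nc))
                            component.append((nr, nc))
                            stack.append((nr, nc))
            if len(component) < min_size:
                for pr, pc in component:
                    cleaned[pr][pc] = 0
    return cleaned


# Hypothetical scan: a vertical stroke plus two isolated dirt pixels.
scan = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
cleaned = remove_specks(scan, min_size=2)
```

The three-pixel stroke survives, while the two single-pixel specks are erased; the input image itself is left unchanged.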

ICR Process: Classification
A one was written; the engine returns: 90 % = 1, 8 % = 7, 2 % = 4
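
A sketch of how such a confidence list might be turned into an accept-or-reject decision follows. The threshold value and the function name `decide` are assumptions; the slides do not state the rule actually used.

```python
def decide(candidates, accept_threshold=0.85):
    """Return the most probable character if its confidence reaches the
    threshold, otherwise None (the character is rejected for manual correction)."""
    best_char, best_conf = max(candidates.items(), key=lambda item: item[1])
    if best_conf >= accept_threshold:
        return best_char
    return None


# The example from the slide: a written "1" with its classification alternatives.
clear_one = decide({"1": 0.90, "7": 0.08, "4": 0.02})

# A hypothetical ambiguous character that would be sent to a human editor.
unclear = decide({"1": 0.55, "7": 0.40, "4": 0.05})
```

With the assumed threshold of 0.85 the first character is accepted as "1" and the second is rejected.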

ICR Algorithm:
- Neural network
- kNN: k-Nearest Neighbour
- SVM: Support Vector Machine. SVMs simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers.
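
To make the k-Nearest Neighbour idea concrete, here is a minimal sketch: a character is described by a feature vector and receives the label that the majority of its k closest training samples carry. The two-dimensional features and the training values are invented for illustration; a real engine would use far richer features.

```python
import math
from collections import Counter


def knn_classify(sample, training_set, k=3):
    """Label `sample` by majority vote among its k nearest training samples.
    `training_set` is a list of (feature_vector, label) pairs."""
    nearest = sorted(training_set, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features, e.g. (width/height ratio, ink density).
training = [
    ((0.20, 0.30), "1"), ((0.25, 0.35), "1"), ((0.22, 0.28), "1"),
    ((0.80, 0.70), "8"), ((0.75, 0.72), "8"), ((0.82, 0.68), "8"),
]
narrow_char = knn_classify((0.24, 0.31), training)
wide_char = knn_classify((0.79, 0.70), training)
```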

ICR Process:
- After the different classification alternatives have been evaluated, the appropriate confidence is provided.
- Recognition can be limited to the most probable characters: e.g. if only the characters 3, 6 and 0 are possible, the engine can be limited to this set and the results are much better.
- Voting machine
- Usability: security, efficiency and accuracy
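
The two ideas above, limiting the character set and voting, can be sketched as follows. Both functions are illustrative assumptions: `restrict` simply drops impossible characters and renormalises the confidences, and `vote` accepts a character only when two engines agree with sufficient confidence. The real voting machine may use a different rule.

```python
def restrict(candidates, allowed):
    """Keep only candidates from the allowed set and renormalise confidences."""
    kept = {ch: conf for ch, conf in candidates.items() if ch in allowed}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {ch: conf / total for ch, conf in kept.items()}


def vote(result_a, result_b, min_confidence=0.8):
    """Accept a character only if both engines return the same character
    and each is at least `min_confidence` sure; otherwise reject (None)."""
    char_a, conf_a = result_a
    char_b, conf_b = result_b
    if char_a == char_b and min(conf_a, conf_b) >= min_confidence:
        return char_a
    return None


# Hypothetical field where only 3, 6 and 0 are valid answers.
raw = {"8": 0.45, "3": 0.40, "6": 0.10, "0": 0.05}
limited = restrict(raw, allowed={"3", "6", "0"})
best = max(limited, key=limited.get)

agreed = vote(("3", 0.92), ("3", 0.88))
disputed = vote(("3", 0.92), ("8", 0.90))
```

Without the restriction the engine would have returned the invalid "8"; with it, "3" becomes the clear winner. A disputed character is rejected and left for manual correction.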

Dynamic Field Recognition
- No fixed position is required.
- If only half of the form is available, that half is still readable.
- No special forms are required.
- No timing tracks are necessary on the forms for OMR, yet the results are available at the same time.
- No cleaning of LEDs in the scanner is necessary.
- Robust against vertical / horizontal stretching, shrinking and displacement (e.g. variation in printing)
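
One way to picture the robustness against stretching and displacement: if two reference features of the form are found in the scanned image, a scale and an offset per axis can be derived and applied to the field positions of the form definition. This is only a sketch under that assumption; the names `fit_axis` and `locate_field` and all coordinates are hypothetical, and the actual product recognises features such as boxes, lines and word shapes.

```python
def fit_axis(template_a, template_b, found_a, found_b):
    """Scale and offset that map two template coordinates onto the found ones."""
    scale = (found_b - found_a) / (template_b - template_a)
    offset = found_a - scale * template_a
    return scale, offset


def locate_field(field_box, template_anchors, found_anchors):
    """Map a field box (left, top, right, bottom) from template coordinates
    into the coordinates of the scanned image."""
    (tx1, ty1), (tx2, ty2) = template_anchors
    (fx1, fy1), (fx2, fy2) = found_anchors
    sx, ox = fit_axis(tx1, tx2, fx1, fx2)
    sy, oy = fit_axis(ty1, ty2, fy1, fy2)
    left, top, right, bottom = field_box
    return (sx * left + ox, sy * top + oy, sx * right + ox, sy * bottom + oy)


# Hypothetical form printed 1 % too large and shifted on the page.
template_anchors = ((100, 100), (900, 1300))
found_anchors = ((110, 95), (918, 1307))
age_box = locate_field((200, 300, 400, 340), template_anchors, found_anchors)
```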

Dynamic Field Recognition
Recognizes:
- features (word as pixel cloud)
- boxes, lines and symbols

Hardware / Software Requirements
- Scanner
- PC
- Network
- Disc storage, necessary for re-processing and if images are needed for audit purposes
- Software: scan software; one recognition and voting software for OMR, OCR, ICR and barcode

Cost Comparatives in general (columns: OMR/ICR from image; OMR/ICR from dedicated OMR Scanner): Forms Design Same Forms Production - Up to 50% More Enumerator Training Up to double the cost Scanners PC Low cost PC PC Operators Servers Cost of more/new flexibility low high

ICR Advantages
Better than:
- Manual keying: 90 % (plus) correct keys; a higher substitution rate than automated recognition; time consuming; deliberate manipulation possible
- OMR, because OMR is space consuming
- OCR, because OCR handles only machine-written text and is therefore of limited use

ICR Advantages
- Clear accuracy for OMR, because dirt is removed by software depending on the mark size and shape
- Can detect a line and ignore dirt
- Clear result

ICR Advantages
Barcode, OCR, OMR and ICR recognition with one software

ICR Advantages
Pro:
- Only rejected characters/fields need correction; the rest of the form is untouched.
- Open for future technologies: faster, better quality
- Standardized correction mode
- Handwriting of the corresponding country will be recognized.
- The previously mentioned advantages are not repeated here.

ICR Advantages: Capture Process
SORM: Scan Once, Read Multiple
- Images are scanned once and stored for re-processing (disk space is cheap).
- In several serial sessions, parts of the data are collected from the image (important fields first).
- Example:
  SORM Session 1: fields Age, Sex and Nationality -> provisional partial results
  SORM Session 2: all other numeric fields
  SORM Session 3: alphanumeric fields that need more manual coding (Occupation -> Occupation Code)
- Each session updates the data files / database until all data is captured.
- Faster preliminary results. Less political stress. Faster data for PES planning.
- Analysis of Session 1 results is possible in parallel to recognition, coding and editing of Session 2.
- Data lifting on different batching levels is possible (EA, settlement).
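
The SORM sessions can be sketched as repeated passes over the same stored images, each pass reading a different group of fields and updating one database. Everything in this sketch is a stand-in: the stored "images" are plain dictionaries, `read_field` is a hypothetical placeholder for the recognition engine, and the field names beyond Age, Sex, Nationality and Occupation are invented.

```python
# Field groups per session, most important fields first.
SESSIONS = [
    ["age", "sex", "nationality"],   # Session 1: provisional partial results
    ["household_size", "rooms"],     # Session 2: other numeric fields (hypothetical names)
    ["occupation"],                  # Session 3: fields that need manual coding
]


def read_field(image, field):
    """Placeholder for the recognition engine: looks the value up in the stand-in image."""
    return image[field]


def run_session(stored_images, fields, database):
    """Read the given fields from every stored image and update the database."""
    for form_id, image in stored_images.items():
        record = database.setdefault(form_id, {})
        for field in fields:
            record[field] = read_field(image, field)


# Images are scanned once and kept; here one form stands in for the archive.
stored_images = {
    "F1": {"age": "34", "sex": "F", "nationality": "BY",
           "household_size": "4", "rooms": "3", "occupation": "teacher"},
}
database = {}

run_session(stored_images, SESSIONS[0], database)
after_session_1 = dict(database["F1"])   # snapshot for preliminary results

for fields in SESSIONS[1:]:
    run_session(stored_images, fields, database)
```

After Session 1 the database already holds the fields needed for preliminary results; the later sessions fill in the rest without rescanning any paper.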

Process Stages of Census Surveys Christoph J. Steinl, Vice Director int. ECM December 2008, Minsk

Capture Process
- Store In (EA Batch Header Creation – EA Paper store Database)
- Scanning
- Recognition
- Verifying Processes
- The solution
- Data capture
- Census Process internal
- Census data flow
- Quality assurance

Scanning Kleindienst SC80HC

Scanning
- Simultaneous creation of up to six images
- Optical lens > 10 mm, depth of field 3 mm
- Optical and ultrasonic double-feed control
- Energy-saving / life-cycle-extending mode
- Consistent jam handling: no document is lost or captured twice due to physical jams
- Cleanness check program: detects white and black dirt spots

Scanning
Pockets 2 – 12. Why? The scanner can check:
- if the document is scanned skewed or de-skewed
- if the very important questions are filled in / readable (for OMR, OCR, barcode)
- if there are fingerprints on the questionnaire or not
- if the barcode/OCR/OMR numbers (not ICR numbers) are in the given range
- if there are double entries: we check the unique number
- if (colour) copies are used
- if there are mismatches in quantity: the batch header shows 50 and only 40 are scanned
A transport stop can be programmed to clarify the issue.
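
Three of these checks (number range, double entries, quantity mismatch) can be sketched in a few lines. The function name `check_batch`, the issue texts and the number range are assumptions made for illustration only.

```python
def check_batch(batch_header_count, scanned_ids, valid_range):
    """Return a list of (form_id, issue) pairs for one scanned batch."""
    issues = []
    seen = set()
    low, high = valid_range
    for form_id in scanned_ids:
        if not low <= form_id <= high:
            issues.append((form_id, "out of range"))
        if form_id in seen:
            issues.append((form_id, "double entry"))
        seen.add(form_id)
    if len(scanned_ids) != batch_header_count:
        issues.append(
            (None, f"batch header shows {batch_header_count}, scanned {len(scanned_ids)}")
        )
    return issues


# Hypothetical batch: one form scanned twice, one number outside the range,
# and one form fewer than the batch header announces.
issues = check_batch(5, [1001, 1002, 1002, 9999], valid_range=(1000, 1999))
```

In a production line each issue would route the document to its own pocket or trigger the programmed transport stop.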

Scanning
- Customer: "I have a printer and have long printed my own questionnaires ... I learnt from the internet that it is just a matter of software ..."
- We should be consulted before printing so that we can give the best advice; we will test and optimize.
- Single-sided printing or heavier paper is necessary (shine-through factor = opacity).
- Paper should be white, without any spots.
- Discuss different methods before making big investments.

Census Process internal
- Form Type Analysis
- Structure Analysis
- ICR voting
- Batch Job Processing
- ICR 1
- ICR 2
- Editing / Coding
- Logical Result Analysis
- ICR Result Analysis
- Output Assembly

The Path to Recognition: Analyze the structure of documents for identification

The Path to Recognition
- Perform proper clean-up and image pre-processing
- Analyze individual page layout
- Dynamically locate fields of interest
- Character recognition: numeric handwriting, voting of two ICR engines and also with OMR
- Compile results

Verifying Processes
- Unique number: double-scan check
- Double-feed check
- Check if copy
- Trace of all editing work
- Logical checks
- Completeness checks
- Reports

The Solution: SC80HC + FC Census
[Diagram labels: Preparation: cut & jog; Batch Header; FC Census recognition; Editing; Work Data Storage & DB (Data + Images); Archive Data Storage & DB; Paper Archive; TAPE; Export: DevInfo, CSPro, Formx, Redatam; Local reports]

Data capture
- Data processing centres in different locations
- Peak period: 3 shifts; on average 1-2 shifts
- Local operators trained by our supervisors
- Local supervisors
- Central support from the lab
- Training and documentation prepared in advance
- Help with designing the documents

Thank you for your attention