
Workshop on international standards, contemporary technologies and regional cooperation
Noumea, New Caledonia, 4 – 8 February 2008

Introduction to Optical Character Recognition (OCR)

Summary
• Overview of OCR
• System Requirements
• Advantages and Disadvantages
• Operation and Management
• Questionnaire Design and Preparation
• OCR Field Operation
• OCR Country Outlook

OCR (Optical Character Recognition)
• Function & Features of OCR/ICR
• ICR, OCR and OMR Compared
  - Optical Mark Reader (OMR)
  - OCR/ICR

OCR (Optical Character Recognition)
• Also referred to as Optical Character Reader.
• “…a system that provides a full alphanumeric recognition of printed or handwritten characters at electronic speed by simply scanning the form.” (UNESCAP, Pop-IT project, )
• Intelligent Character Recognition (ICR) describes the process of interpreting image data, in particular alphanumeric text.
• OCR is sometimes also referred to as ICR.

Functions & Features of OCR
• Forms are scanned, and the recognition engine of the OCR system then interprets the images, converting images of handwritten or printed characters into ASCII data (machine-readable characters).
• The technology provides a complete form-processing and document-capture solution.
• It allows an open, scalable workflow.
• It includes forms definition, scanning, image pre-processing and recognition capabilities.

ICR, OCR and OMR Differences
• ICR and OCR are recognition engines used with imaging.
• OMR is a data collection technology that does not require a recognition engine.
• OMR cannot recognize hand-printed or machine-printed characters.

Optical Mark Reader (OMR)
• Forms: OMR works with a specialized form that carries timing tracks along one edge to tell the scanner where to read for marks; these timing marks look like black boxes along the top or bottom of the form. The form must be cut very precisely, and the bubbles must be in the same location on every form.
• Storage: with OMR, the image of the document is not scanned and stored.
• Accuracy: OMR is simpler than OCR; when designed properly, OMR is more accurate than OCR.
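
The OMR reading step can be sketched as follows. This is a minimal illustration, assuming the scanner has already used the timing tracks to locate each bubble and reports a darkness value from 0.0 (white) to 1.0 (fully dark) per bubble; the function name `read_omr_row` and the 0.5 threshold are hypothetical, not taken from any OMR product.

```python
from typing import Optional, Sequence

# Hypothetical threshold: a bubble counts as marked when at least half of it is dark.
MARK_THRESHOLD = 0.5


def read_omr_row(darkness: Sequence[float], codes: Sequence[str],
                 threshold: float = MARK_THRESHOLD) -> Optional[str]:
    """Return the code of the single marked bubble in one question row.

    `darkness` holds one reading per bubble (0.0 = white, 1.0 = fully dark),
    in the same order as `codes`. Returns None when no bubble is marked and
    raises ValueError when more than one bubble is marked.
    """
    if len(darkness) != len(codes):
        raise ValueError("one darkness reading is needed per bubble")
    marked = [code for code, level in zip(codes, darkness) if level >= threshold]
    if not marked:
        return None
    if len(marked) > 1:
        raise ValueError(f"multiple marks: {marked}")
    return marked[0]


if __name__ == "__main__":
    # Hypothetical question "Sex": bubble 1 = Male, bubble 2 = Female
    print(read_omr_row([0.05, 0.82], ["1", "2"]))  # prints 2
```

Because no image is kept, the reader only needs these darkness values, which is one reason OMR is simpler than OCR.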

OCR/ICR
• Forms: OCR/ICR is more flexible, since no timing tracks or block-like form IDs are required, and the image can float on the page. ICR/OCR technology uses registration marks on the four corners of a document to locate the image during recognition. Respondents place one character per box on the form. The use of a drop-out color reduces the size of the scanner's output and improves accuracy.
• Storage/retrieval: if the document needs to be electronically stored and maintained, OCR/ICR is needed. With OCR/ICR technologies, images can be scanned, indexed and written to optical media.
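
To illustrate how registration marks let the image float on the page, here is a minimal sketch, assuming two of the four corner marks (top-left and bottom-right) have been found and the page has already been de-skewed, so only a shift and a scale remain. `Box` and `map_box` are hypothetical names; real OCR/ICR packages use their own, usually more complete, registration methods.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class Box:
    """A character box in pixel coordinates: top-left corner plus size."""
    x: float
    y: float
    width: float
    height: float


def map_box(box: Box, template_tl: Point, template_br: Point,
            found_tl: Point, found_br: Point) -> Box:
    """Map a box defined on the form template into the scanned image.

    Assumes no rotation (de-skew is done beforehand). The distance between
    the two registration marks gives a scale factor per axis, and the
    top-left mark gives the offset.
    """
    sx = (found_br[0] - found_tl[0]) / (template_br[0] - template_tl[0])
    sy = (found_br[1] - found_tl[1]) / (template_br[1] - template_tl[1])
    x = found_tl[0] + (box.x - template_tl[0]) * sx
    y = found_tl[1] + (box.y - template_tl[1]) * sy
    return Box(x, y, box.width * sx, box.height * sy)


# Hypothetical example: the page was fed 12 pixels right and 5 pixels up.
shifted = map_box(Box(400, 600, 40, 50), (100, 100), (1700, 2300), (112, 95), (1712, 2295))
```

Here `shifted` is `Box(412.0, 595.0, 40.0, 50.0)`: the character box moves with the page, so no fixed cut or timing track is needed.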

OMR and OCR/ICR Compared

System Requirements
• Minimum PC requirements:
  - Processor: Pentium 200 MHz
  - RAM: 32 MB
  - Disk: 4 GB
• Form modules are designed to operate in batch processing mode; they run on LAN and PC-based platforms and take full advantage of the graphical user interface and 32-bit processing power available with most Windows versions.
• Software:
  - OCR software with ICR capability
  - Questionnaire design software

System Requirements (cont.)
• Scanner: OCR scanners with minimum capacity:
  - Duplex scanning
  - Speed: 60 sheets/min
  - Automatic Document Feeder (ADF)
• Scanning can take a significant amount of time, so the system lets the user scan forms first without running the OCR at the same time.
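
The scanner capacity above can be turned into a rough workload estimate. This is a minimal sketch, assuming a duplex scanner rated in sheets per minute and a hypothetical `efficiency` factor (0.5 is a placeholder, not a measured figure) for batch preparation, jams and rescans; the function name and all example figures are illustrative.

```python
import math


def scanning_days(questionnaires: int, sheets_per_form: int,
                  sheets_per_minute: float = 60.0, efficiency: float = 0.5,
                  hours_per_day: float = 8.0, scanners: int = 1) -> int:
    """Estimate the whole working days needed to scan all questionnaires.

    A duplex scanner captures both sides of a sheet in one pass, so the
    rated speed is counted in sheets (each sheet yields two images).
    `efficiency` is the assumed fraction of the rated speed achieved in practice.
    """
    total_sheets = questionnaires * sheets_per_form
    effective_rate = sheets_per_minute * efficiency * scanners  # sheets per minute
    minutes = total_sheets / effective_rate
    return math.ceil(minutes / 60.0 / hours_per_day)


# Hypothetical census: 50,000 questionnaires of 4 sheets each.
one_scanner = scanning_days(50_000, 4)               # 14 days
two_scanners = scanning_days(50_000, 4, scanners=2)  # 7 days
```

With these assumptions, 200,000 sheets at an effective 30 sheets/min take about 111 hours, or 14 eight-hour days on one scanner; this is why scanning is often run ahead of recognition.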

Advantages and Disadvantages
• Advantages of Using Images Rather Than Paper
  - Quicker processing; no moving or storage of questionnaires near operators.
  - Cost savings and efficiency gains from not handling the paper questionnaires.
  - Scanning and recognition allow efficient management and planning of the rest of the processing workload.
  - Reduced long-term storage requirements: questionnaires could be destroyed after the initial scanning, recognition and repair.
  - Quick retrieval for editing and reprocessing.
  - Fewer errors associated with the physical handling of the questionnaires.

Advantages and Disadvantages
• Disadvantages of Using Images Rather Than Paper
  - Accuracy: while OCR technology can be effective in converting handwritten or typed characters, it is not as accurate as OMR for reading data where users are actually marking forms.
  - Additional workload for data collectors: OCR has severe limitations when it comes to human handwriting, so characters must be hand-printed separately, one per box.

Operation and Management
• OCR Process Stages
  - Document scanning process: scanning speed is determined by the quality of the scanners, the extent of non-drop-out color on the form, and the quality, cleanliness and weight of the paper.
  - Recognizing process: the recognition process interprets the images. The right memory (dictionary) and the configured threshold determine the accuracy of ICR interpretation.
  - Verifying process: the interpreted value is compared with the actual image of the form.
  - Processing can be in geographic order or in random order.
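
The interplay of dictionary, threshold and verification can be sketched as a routing rule. This is a minimal illustration, assuming a hypothetical ICR engine that reports a value and a confidence between 0.0 and 1.0 per field; `FieldResult`, `route_fields` and the 0.90 threshold are hypothetical names and values, not any product's API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class FieldResult:
    """Recognition output for one field, as a hypothetical ICR engine might report it."""
    field: str
    value: str
    confidence: float  # 0.0 to 1.0


def route_fields(results: Iterable[FieldResult], dictionary: Set[str],
                 threshold: float = 0.90) -> Tuple[List[FieldResult], List[FieldResult]]:
    """Split recognized fields into auto-accepted and to-be-verified lists.

    A field is accepted only if the confidence reaches the configured
    threshold AND the value is in the dictionary of permitted answers;
    everything else goes to an operator, who compares it with the field image.
    """
    accepted: List[FieldResult] = []
    to_verify: List[FieldResult] = []
    for r in results:
        if r.confidence >= threshold and r.value in dictionary:
            accepted.append(r)
        else:
            to_verify.append(r)
    return accepted, to_verify


# Hypothetical "age" field: permitted values 0 to 120.
ages = {str(a) for a in range(121)}
ok, check = route_fields([FieldResult("age", "34", 0.97),
                          FieldResult("age", "3A", 0.95),   # not in dictionary
                          FieldResult("age", "41", 0.62)],  # below threshold
                         ages)
```

Raising the threshold sends more fields to verification (slower, more accurate); lowering it does the opposite, which is the trade-off the configuration controls.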

Operation and Management (cont.)
• Image Manipulation
  - Electronic questionnaires can be sent to specialist operators and then back to the original operator if necessary.
  - The same questionnaire can be worked on simultaneously by two or more people.
  - Electronic questionnaires are readily available for post-census analysis (easier access to questionnaires).
  - Parts of various questionnaires can be shown on screen at once for inter-record editing.
  - Operators can view the relevant field book entry on screen together with the questionnaires, which helps with coding and editing.

Operation and Management (cont.)
• Coding Assistance
  - Problems are easier for the operator to identify.
  - Images of questions that are not captured (scanned but not recognized), for example answers written in light pencil, can still be used to help the coding process.
  - Operators can magnify images to read characters not discernible to the naked eye.
  - Appropriate software ensures that the data are validated as the forms are read.
  - Checks ensure that selections on a form are filled in.
  - It is possible to distinguish between intended marks and marks that have been erased.
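
One common way to separate intended marks from erasures is to compare the darkest and second-darkest bubbles in a row. This is a minimal sketch, assuming darkness readings from 0.0 to 1.0 per bubble (at least one bubble per row); `resolve_marks`, `min_mark` and `min_gap` are hypothetical and would be tuned on test forms.

```python
from typing import Optional, Sequence, Tuple


def resolve_marks(darkness: Sequence[float], codes: Sequence[str],
                  min_mark: float = 0.5, min_gap: float = 0.2) -> Tuple[Optional[str], bool]:
    """Pick the intended mark in a row that may contain erasures.

    Returns (code, needs_review). The darkest bubble is taken as the intended
    answer when it is dark enough (min_mark) and clearly darker than the
    second-darkest bubble (min_gap), which is assumed to be an erasure.
    """
    ranked = sorted(zip(darkness, codes), reverse=True)
    best_level, best_code = ranked[0]
    second_level = ranked[1][0] if len(ranked) > 1 else 0.0
    if best_level < min_mark:
        return None, False          # nothing marked: treated as blank
    if best_level - second_level < min_gap:
        return None, True           # two similar marks: send to an operator
    return best_code, False


# Bubble 1 was erased (faint residue) and bubble 2 was marked.
answer = resolve_marks([0.35, 0.85, 0.05], ["1", "2", "3"])  # ("2", False)
```

Rows where the two darkest readings are close are not guessed; they are flagged so an operator can check the image.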

Operation and Management (cont.)
• OMR Scanner Speed Factors
  - Skew: each document is moved from an automatic feeder into a scanner, and an angle of skew is sometimes introduced.
  - De-skew: analyzes the image bitmap, then calculates and returns the angle of skew, up to +/-25.
  - De-skew is often expressed as a percentage, which is the pixel shift: 10% is a 20-pixel shift in a line of 200 pixels, or one tenth of an inch in an inch-long line.
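
The percentage and the angle are two views of the same skew. The sketch below is a minimal illustration, assuming a 200 dpi image (so a 200-pixel line is one inch, matching the example above); the function names are hypothetical, and correction is shown as a plain rotation of coordinates rather than any particular scanner's de-skew routine.

```python
import math
from typing import Tuple


def skew_percent(pixel_shift: float, line_length: float) -> float:
    """Skew expressed as pixel shift per line length, in percent."""
    return 100.0 * pixel_shift / line_length


def skew_angle_degrees(pixel_shift: float, line_length: float) -> float:
    """The same skew expressed as an angle in degrees."""
    return math.degrees(math.atan2(pixel_shift, line_length))


def deskew_point(x: float, y: float, angle_degrees: float,
                 cx: float = 0.0, cy: float = 0.0) -> Tuple[float, float]:
    """Rotate (x, y) by -angle around (cx, cy) to undo a measured skew."""
    a = math.radians(-angle_degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))


# A 20-pixel shift over a 200-pixel (one-inch) line:
pct = skew_percent(20, 200)          # 10.0
angle = skew_angle_degrees(20, 200)  # about 5.71 degrees
```

Rotating the end point (200, 20) by the measured angle brings it back onto the horizontal line (y = 0), which is what a de-skew step does for every pixel of the image.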

Operation and Management (cont.)
• Landscape Detection and Auto Rotation: landscape detection automatically detects landscape images and rotates them 90 degrees.
• White Page Detection: normally, a double-sided scanner creates two images per scanned page. However, if the front or back of the page is blank, there is no need to store that image; white page detection allows the user to avoid storing blank pages.
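
Both checks can be sketched on a grayscale image stored as rows of pixel values (0 = black, 255 = white). This is a minimal illustration under two stated simplifications: landscape detection here only compares width and height (real systems usually also examine text orientation), and the blank-page thresholds `dark_level` and `max_dark_fraction` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def is_landscape(image: Image) -> bool:
    """True when the image is wider than it is tall (simplified test)."""
    return len(image[0]) > len(image)


def rotate_90_clockwise(image: Image) -> Image:
    """Rotate a grayscale image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def is_blank(image: Image, dark_level: int = 128, max_dark_fraction: float = 0.005) -> bool:
    """Treat a page as blank when almost none of its pixels are dark."""
    total = sum(len(row) for row in image)
    dark = sum(1 for row in image for p in row if p < dark_level)
    return dark / total <= max_dark_fraction


def pages_to_store(front: Image, back: Image) -> List[Image]:
    """Keep the non-blank duplex images, rotating landscape ones upright."""
    kept = []
    for page in (front, back):
        if is_blank(page):
            continue  # white page detection: do not store this image
        kept.append(rotate_90_clockwise(page) if is_landscape(page) else page)
    return kept
```

For a duplex sheet whose back is empty, `pages_to_store` returns only the front image, halving the storage for that sheet.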

Operation and Management (cont.)
• Other Factors
  - Automatic Image Registration
  - De-Speckle and Shade Removal
  - Character Enhancer
  - Cost Savings
  - Automatic processes to improve recognition rates: voting techniques, multiple engines, learning
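
Voting across multiple recognition engines can be sketched as a character-by-character majority vote. This is a minimal illustration, assuming each engine returns a reading of the same length for a field; `vote_field` and the two-engine agreement rule are hypothetical, and commercial systems typically also weight engines by their confidence scores.

```python
from collections import Counter
from typing import List, Optional, Sequence


def vote_character(candidates: Sequence[str], min_agree: int = 2) -> Optional[str]:
    """Return the character most engines agree on, or None if agreement is too weak."""
    if not candidates:
        return None
    char, count = Counter(candidates).most_common(1)[0]
    return char if count >= min_agree else None


def vote_field(engine_outputs: Sequence[str], min_agree: int = 2) -> List[Optional[str]]:
    """Vote position by position over equal-length readings from several engines.

    A None in the result marks a character position to send to an operator.
    """
    length = len(engine_outputs[0])
    if any(len(o) != length for o in engine_outputs):
        raise ValueError("engine outputs must have the same length")
    return [vote_character([o[i] for o in engine_outputs], min_agree)
            for i in range(length)]


# Three engines read the same four-digit code; one misreads "1" as "7".
result = vote_field(["4172", "4172", "4772"])  # ['4', '1', '7', '2']
```

When all engines disagree on a position (for example `vote_field(["35", "38", "36"])`), that character comes back as None and goes to verification instead of being guessed.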

Questionnaire Design and Preparation
• Drop-Out Color: usually red; the color facility in an OCR system that allows it to pick up only the meaningful information from an OCR form. The system does not need to read anything printed in the drop-out color, including the tick boxes. The OCR system only needs to see the black parts, which it compares with the form specifications to detect which parts are filled in or written.
• Characters or Marks: considering the speed of the data capture process, and to reduce error rates, it is advisable to use marks or “ticks” as much as possible.
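
The effect of a red drop-out color can be approximated in software by looking only at the red channel. This is a minimal sketch, assuming RGB pixels with values 0 to 255 and a hypothetical `dark_level` of 100; in practice drop-out is often done by the scanner itself (for example through its lamp or color filtering), so this only illustrates the principle.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def drop_out_red(pixels: List[List[RGB]], dark_level: int = 100) -> List[List[int]]:
    """Convert an RGB image to black/white, discarding red printed form elements.

    Red ink is bright in the red channel, so it turns white together with the
    paper, while black or dark-blue handwriting stays dark.
    Returns 0 for ink and 255 for background.
    """
    return [[0 if r < dark_level else 255 for (r, _g, _b) in row] for row in pixels]


# White paper, red box outline, black pen, blue pen:
row = [(250, 250, 250), (220, 30, 40), (20, 20, 20), (20, 30, 150)]
cleaned = drop_out_red([row])  # [[255, 255, 0, 0]]
```

Only the respondent's writing survives, which is why the drop-out color both shrinks the scanner output and improves recognition.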

Questionnaire Design and Preparation (cont.)
• How to Obtain Good Scanning Results
  - Select adequate paper quality.
  - Use a reliable printing press.
  - Use appropriate ink for the questionnaires, taking the drop-out color into account.
  - Paper heavier than 80 grams per square meter can help avoid paper jams and reading through to the other side of a single page.
• Form Design Advice
  - Number the items included in a form.
  - Design the size of the box for each character answer carefully.
  - Define the drop-out color properly; use registration marks.
  - Pre-print the codes near the tick boxes.
  - Maintain a consistent pattern for where the information to be collected is located.
  - Do not obscure the ticks and marks with titles, labels or instructions.
  - Avoid placing the answer area for a question on a different page from the question.
  - Avoid using open-ended questions.

OCR Field Operation
• Training for Collection and Processing Staff
  - Basic software and scanner operations, including installation and troubleshooting.
  - Applications, with emphasis on developing custom applications: configuring non-standard forms, pre-marking forms and using overprinting to customize forms, processing surveys, and creating custom output file formats.

OCR Field Operation (cont.)
• Causes of OCR Misreading
  - Bad condition of the form (dirty, folded, crumpled, etc.).
  - Forms fed into the OCR scanner crooked (at an angle).
  - Incompletely filled forms.
• Reducing OCR Misreading
  - Check the questionnaires for completeness and consistency.
  - Prepare the system's own memory (dictionary).
  - Define permissible margins of OCR reading errors.
• Particular Care in Writing Numbers or Letters
  - One box contains only one character.
  - Characters should not extend outside the designated boxes.
  - Unnecessary strokes such as dots, decorative strokes and hooks are prohibited; strokes should not end with flourishes or extensions.
  - All lines should be connected without breaks.
  - All lines and dots should be written with the same pressure.
• Value Checking Steps
  - Verify that the information captured by OMR matches the questionnaire.
  - Control for blanks: decide what type of control to apply when the information is blank.
  - Control steps should be taken when the information in the image is partial or missing, to assure the quality of the generated files.
• Missing Questionnaires
  - Make sure that all questionnaires are scanned completely, with none missing and none duplicated.
  - Control procedures therefore include producing control tables to compare against manual records.
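
A control table for missing and duplicate questionnaires can be sketched as a comparison between manual counts and scanned IDs. This is a minimal illustration, assuming field control sheets record how many questionnaires each enumeration area returned and that each scanned form carries an area code and a questionnaire ID; all names and codes here are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def control_table(expected: Dict[str, int],
                  scanned: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, int, int, int]]:
    """Compare scanned questionnaires with manually recorded counts per area.

    `expected` maps an enumeration area code to the number of questionnaires
    recorded on the field control sheets; `scanned` yields
    (area_code, questionnaire_id) pairs from the scanning batches.
    Returns rows of (area, expected, scanned_unique, duplicates, missing).
    """
    ids_per_area: Dict[str, Counter] = defaultdict(Counter)
    for area, qid in scanned:
        ids_per_area[area][qid] += 1
    rows = []
    for area in sorted(set(expected) | set(ids_per_area)):
        counts = ids_per_area.get(area, Counter())
        unique = len(counts)
        duplicates = sum(n - 1 for n in counts.values())
        missing = max(expected.get(area, 0) - unique, 0)
        rows.append((area, expected.get(area, 0), unique, duplicates, missing))
    return rows


table = control_table({"EA01": 3, "EA02": 2},
                      [("EA01", "001"), ("EA01", "002"), ("EA01", "002"), ("EA02", "101")])
```

Here EA01 shows one duplicate scan and one missing questionnaire, and EA02 one missing questionnaire; those areas would be rechecked before the files are released.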

OCR Country Outlook
• Countries using optical mark recognition: Greece
• Countries using optical character recognition: Croatia (in use for the next census round); Japan (outsources the entire process; in use for the next census round)
• Countries using both: Belgium
• Countries planning to use OCR: Tajikistan; Tonga (looking to introduce and use OCR for its next census)

OCR Country Outlook
• Common devices/scanners and software used by NSOs
  - (Croatia) KODAK DS3520 bitonal scanners; IBM IFP (Intelligent Forms Processing).
  - (Greece) OMR devices/scanners “axm 990/995” with FORM/AXF/ADELE+ software.
  - (New Zealand) Kodak scanners i830 and i…; the scanning and raw data capture process (the recognition aspect) was outsourced. For the next census, the …-end scanning and data capture process will more than likely be outsourced, but this is really a variation of a current supplier agreement.
  - (Belgium) AGFA high-resolution scanner.

OCR in Use
• Editing method used for the census
  - (Japan) Cold-deck method, hot-deck method, etc.
  - (Croatia) Developed in-house: logical checking with automatic and manual correction.
  - (Greece) Via PC: an editor (an officer of N.S.S.G.) confirms or rejects an inaccurate value or inputs a missing one.
  - (New Zealand) A mixture of micro and macro editing practices. Individual responses may have range or validity edits, inter-field edits and also inter-form edits (within a household). Macro editing is used particularly during the data evaluation process, and data may be reprocessed as a result.
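
Two of the editing ideas above, a range (validity) edit and hot-deck imputation, can be sketched together. This is a generic textbook version of sequential hot-deck, not the procedure of any particular country; the field names, the 0 to 120 age range and the use of sex as the single imputation class are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[int]]


def range_edit(record: Record, field: str, low: int, high: int) -> None:
    """Validity edit: blank out a value outside its permitted range so it is imputed later."""
    value = record.get(field)
    if value is not None and not (low <= value <= high):
        record[field] = None


def hot_deck(records: List[Record], field: str, class_field: str) -> None:
    """Sequential hot-deck imputation, modifying records in place.

    Records are processed in file order (e.g. geographic order); a missing
    value is replaced by the last valid value seen for a donor in the same
    imputation class.
    """
    last_donor: Dict[Optional[int], int] = {}
    for rec in records:
        key = rec.get(class_field)
        value = rec.get(field)
        if value is None:
            if key in last_donor:
                rec[field] = last_donor[key]
        else:
            last_donor[key] = value


people = [{"sex": 1, "age": 34}, {"sex": 2, "age": 29},
          {"sex": 1, "age": None}, {"sex": 2, "age": 150}]
for person in people:
    range_edit(person, "age", 0, 120)  # age 150 fails the range edit
hot_deck(people, "age", "sex")         # ages become 34, 29, 34, 29
```

A cold-deck method differs in that donor values come from a fixed external source (such as a previous census) rather than from records processed earlier in the same file.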

OCR Country Outlook
• Common commercial or free software used in OCR
  - (Croatia) ACTR (Automated Coding by Text Recognition), software developed by Statistics Canada, used for coding.
  - (Greece) Commercial software, selected through an open bidding process according to the budgetary plan of the population census.
  - (New Zealand) IBM Intelligent Forms Processing (IFP) system, through an established user agreement.
  - (Belgium) IRIS (Image Recognition Integrated Systems).
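
The idea behind automated coding of write-in answers can be sketched as normalize-then-look-up. This is a simplified, hypothetical illustration, not how ACTR itself works (ACTR uses more elaborate text parsing and weighted matching); the stop words, the codes "T01" and "F01" and the ASCII-only normalization (which would strip accented letters) are all assumptions.

```python
import re
from typing import Dict, Optional

# Hypothetical stop words removed before matching.
STOP_WORDS = {"A", "AN", "THE", "OF", "AND"}


def normalize(text: str) -> str:
    """Uppercase, replace punctuation with spaces, drop stop words, collapse spaces."""
    words = re.sub(r"[^A-Z0-9 ]", " ", text.upper()).split()
    return " ".join(w for w in words if w not in STOP_WORDS)


def auto_code(text: str, coding_index: Dict[str, str]) -> Optional[str]:
    """Return the code for a write-in answer, or None to send it to manual coding."""
    return coding_index.get(normalize(text))


# Hypothetical occupation index, normalized the same way as the answers.
index = {normalize(k): v for k, v in {"Primary school teacher": "T01",
                                      "Fisherman": "F01"}.items()}
code = auto_code("  primary-school teacher ", index)  # "T01"
```

Answers that do not match exactly (for example "Teacher of the primary school") return None and are passed to a human coder, who can see the answer image on screen.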

OCR Country Outlook
• Concerns/issues with the use of optical character recognition for census data capture
  - (Japan) Speed of data capture and recognition, recognition accuracy for Japanese characters, etc.
  - (Greece) OMR: issues related to the optical recognition of numbers, the speed of optical recognition itself and the electronic storage of the questionnaires.
  - (Tajikistan) Obtaining equipment and training.
  - (Samoa) Insufficient financial support and technical human resources.

THANK YOU!