Census Data Capture: ABS Experience 1991 to 2006 Noumea February 2008.

Slides:



Advertisements
Similar presentations
Collecting data Chapter 6. What is data? Data is raw facts and figures. In order to process data it has to be collected. The method of collecting data.
Advertisements

SADC Course in Statistics The Use of Optical Character Recognition Technology In National Statistical Offices.
AQA INFO 1 SECTION 4 Selection & Use of Input devices and media tcowling 2009 from Mott, Leaming & Williams.
TIS in the Finance Department. Cashflow management –DSO v DPO –Visibility of information –Manage accruals Resource management –Staff turnover –Risk, training.
Software Engineering CSE470: Process 15 Software Engineering Phases Definition: What? Development: How? Maintenance: Managing change Umbrella Activities:
INTRODUCTION ABOUT OMR. INDEX  Concept/Definition  Form Design  Scanners & Software  Storage  Accuracy  OMR Advantages  Commercial Suppliers.
HARDWARE INPUT DEVICES ITGS. Strand 3.1 Hardware Input Devices Keyboards Pointing devices: Mice Touch pads Reading tools: Optical mark recognition (OMR)
Commercial Data Processing Lesson 2: The Data Processing Cycle.
Data Capture Methods. In this topic, we will be looking at: Methods of data capture When it would be appropriate to use each method Advantages and disadvantages.
UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data.
AUTOMATIC DATA CAPTURE  a term to describe technologies which aim to immediately identify data with 100 percent accuracy.
Irwin/McGraw-Hill Copyright © 2000 The McGraw-Hill Companies. All Rights reserved Whitten Bentley DittmanSYSTEMS ANALYSIS AND DESIGN METHODS5th Edition.
بسم الله الرحمن الرحيم معالج الحروف الضوئي OCR. Introduction Definition : OCR stands for O ptical C haracter R ecognition refers to the branch of computer.
Input devices, processing and output devices Hardware Senior I.
UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data.
Workshop on international standards, contemporary technologies and regional cooperation Noumea, New Caledonia, 4 – 8 February 2008 Introduction to Optical.
Hardware, Software & Automatic input devices LO: Recognise hardware, software. Learning outcome: Correctly identify hardware and software. Recognise and.
Database Design IST 7-10 Presented by Miss Egan and Miss Richards.
Census Data Capture Challenge Intelligent Document Capture Solution UNSD Workshop - Minsk Dec 2008 Amir Angel Director of Government Projects.
UNSD Regional Workshop on Census Data Processing for the English speaking African Countries: Contemporary technologies for data capture, methodology and.
Android Core Logging Application Keith Schneider Introduction The Core Logging application is part of a software suite that is designed to enable geologic.
INPUT DEVICES. KEYBOARD Most common input device for a computer.
Copyright 2010, The World Bank Group. All Rights Reserved. PROCESSING, Part 1 Data capture, editing, imputation and tabulation Quality assurance for census.
1 Use of scanning technology for data capture ICR System (Intelligent Character Recognition) Information and Communication Technology Center National Statistical.
OCR GCSE ICT DATA CAPTURE METHODS. LESSON OVERVIEW In this lesson you will learn about the various methods of capturing data.
 By the end of this, you should be able to state the difference between DATE and INFORMAITON.
UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data.
True OMR Second Darkest Mark Detection For Erasure Analysis.
UNSD Regional Workshop on Census Data Processing for the English speaking African Countries: Contemporary technologies for data capture, methodology and.
Regional Workshop on the 2010 World Programme on Population and Housing Censuses: International standards, contemporary technologies for census mapping.
HOUSELISTING SCHEDULE NPR SCHEDULE HOUSEHOLD SCHEDULE.
IN THE MEANTIME…. INTERIM SOLUTIONS TO AUTOMATED DATA CAPTURE.
Using OCR for Census Data Capture in China National Bureau of Statistics of China.
© Beta Systems Software AG Process Stages of Census Surveys Richard J. Lang, International Manager September 2008, Bangkok.
DATA COLLECTION METHODS CONTENT PAGE How data is collected via questionnaires. How data is collected via questionnaires. How data is collected with mark.
Data Capture Overview United Nations Statistics Division
UNSD Census Workshop Day 2 - Session 7 Data Capture: Intelligent Character Recognition Andy Tye – International Manager DRS are Worldwide specialists in.
Data Capture Technology Statistical Centre Of IRAN Presented by : MS. SOMAYE AHANGAR Vice – Presidency for Strategic Planning and Supervision Statistical.
UNSD Regional Workshop on Census Data Processing for the English speaking African Countries: Contemporary technologies for data capture, methodology and.
Uganda – October 2009 Census Data Collection & Processing John Gomersall.
0 Paper rocess Scanner Throughput P eople PP P Effective Scanner Throughput Consider KOFAX – VRS (Virtual Re-Scan) Increase Productivity.
The Dark Side of Document Imaging: ‘The Hidden Cost of Capture’
© 2006 Formic Wednesday 7th November 2007 Formic Scoop Training Mikey Desai.
By Blake Stratton. Data Chapter The questionnaire is Printed on paper. People write or tick the boxes. Someone needs to type it in the computer. Some.
Statistical Expertise for Sound Decision Making Quality Assurance for Census Data Processing Jean-Michel Durr 28/1/20111Fourth meeting of the TCG - Lubjana.
Paolo Valente - UNECE Statistical Division Slide 1 Technology for census data coding, editing and imputation Paolo Valente (UNECE) UNECE Workshop on Census.
Presenter: Tracy Wessler June 5, 2007 The Use of High Speed Data Processing to Capture Census Data U.S. Census Bureau Decennial Response Integration System.
UNSD Workshop Tanzania June 2008 JOHN GOMERSALL ANDY TYE.
Data Processing of the 2010 Population and Housing Census September 2008, Bangkok, Thailand National Statistical Office, Thailand.
Regional Workshop on the 2010 World Programme on Population and Housing Censuses: International standards, contemporary technologies for census mapping.
Sri Lanka. History  First Population & Housing Census : 1871  139 years ago  Last Population & Housing Census : 2001  After a lapse of 20 years 
Regional Workshop on the 2010 World Programme on Population and Housing Censuses: International standards, contemporary technologies for census mapping.
Census Data Capture with OCR Technology: Ghana’s Experience Presented at the UNSD Regional Workshop on Census Data Processing Dar es Salaam, Tanzania 9.
Census Processing Baku Training Module.  Discuss:  Processing Strategies  Processing operations  Quality Assurance for processing  Technology Issues.
© 2008 Lockheed Martin Corporation. All Rights Reserved. Capture Data Quality Session 9 – Data Capture: Process Stages UNSD-ESCWA Regional Workshop on.
 ReadSoft 2004 Processing census forms.  ReadSoft 2004 ReadSoft Corporate Profile n Swedish company - founded1991 n Listed in Stockholm stock exchange.
UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data.
The Big Picture Things to think about What different ways are there to collect information automatically? What are the advantages and disadvantages of.
Submitted by: Siddharth Jain (08EJCIT075) Shirin Saluja (08EJCIT071) Shweta Sharma (08EJCIT074) VIII Semester, I.T Department Submitted to: Mr. Abhay Kumar.
Unit 2 Technology Systems
DATA COLLECTION Data Collection Data Verification and Validation.
Kofax Express and CaptureBites
UNSD Census Workshop Data Capture: Intelligent Character Recognition
OCR GCSE ICT Data capture methods.
OCR GCSE ICT Data capture methods.
Optical Data Capture: Optical Character Recognition (OCR)
Data Capture Process Stages
Data Capture - ICR Typical Workflow
Optical Data Capture: Optical Mark Recognition (OMR)
Presentation transcript:

Census Data Capture: ABS Experience 1991 to 2006 Noumea February 2008

Optical Mark Recognition (OMR) 1991 and 1996 A quick, cost effective and high quality way of capturing households and persons within households Most Census information can be appropriately collected via tick-boxes Technology has become more reliable over time

Optical Mark Recognition (OMR) Quality of printed forms is critical to success Need the right drop out colour and form designed to meet scanner requirements Quality assure OMR process Routine preventative maintenance Need in-house expertise (not a black box) Data Reformat – business rules

Imaging 2001 introduced imaging Images replaced paper forms for all post data capture processing Efficiency gains in Editing and Computer Assisted Coding (CAC) particularly Supports evaluation and validation Efficient reprocessing for errors

Imaging Developed application to capture part of image for a question (“snippets”) Call up form snippets relevant for editing or coding process OHAS issues with intensified screen-based work

ICR and Beyond 2001 and 2006 use ICR (IBM Intelligent Form Processing) for capture of write-in responses Aim is efficiency through Automatic Coding of captured text responses Critical to design form to suit ICR

ICR 1.Recognition engine attempts to recognise characters 2.Run (usually partially) recognised text string against Automatic Coding indexes of anticipated responses 3.If no match go to character repair (computer assisted) 4.Run repaired text against AC indexes 5.If no match go to CAC

ICR - Tactical Deciding the appropriate probability for accepting characters from ICR Deciding what is an acceptable match in AC Deciding when to go to character repair and when to go straight to CAC Understanding and optimising algorithm for AC matching –ABS developed a customised AC algorithm for multi- field coding (eg addresses)

ICR – Quality Assurance Need to understand the solution well to manage quality – develop in-house expertise Substantial investment required in Auto Coding Indexes Comprehensive testing of ICR engine, AC algorithms/indexes and outputs from current “real world” responses Extensive validation effort to detect and correct systematic false positives – will be very visible in output

ICR - Costs 2006 Census purchased 12 Inotec Scamax 510 document scanners. A total of 65.5 million pages were scanned over 71 working days. The images from the scanners were transferred to 18 form processing pcs where the IFP application processed the images using ICR technology. The resultant data and associated images were then loaded into Oracle and image data stores IBM contract and scanners cost $A2.5M

Where to for 2011 Further refine ICR, Auto Coding algorithm and AC indexes to improve AC match rates >>>> less CAC More macro approach to quality assurance Promote eCensus response channel