CENSUS DATA ANALYSIS TOOLS, AREAS, ISSUES & NEEDS
Neena Sharma, IAS
Director of Census Operations, Uttar Pradesh
Office of the Registrar General & Census Commissioner, India
Stages in Census Operations-2011
1 Data Collection
2 Data Processing
3 Data Analysis
4 Data Dissemination
DATA COLLECTION
Census of India 2011: Data Collection
Census 2011 is the 15th Census of India since 1872.
Census 2011 was held in two phases:
– Houselisting & Housing Census (April to September 2010)
– Population Enumeration (9th to 28th February 2011)
Reference Date: 0:00 hours of 1st March 2011
– In snow-bound areas, the Population Enumeration was conducted from 11th to 30th September 2010
Reference Date: 0:00 hours of 1st October 2010
Some Facts about Census 2011
– Cost: USD 445 Mn
– Cost per person: USD 0.37
– No. of Census functionaries: 2.7 Mn
– No. of languages in which Schedules were canvassed: 16
– No. of languages in which Training Manuals were prepared: 18
– No. of Schedules printed: 340 Mn
– No. of Training Manuals printed: 5.4 Mn
– Paper utilised: 8,000 MT
– Material transported: 10,500 MT
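The cost-per-person figure is the total cost divided by the number of people counted. The population used below (1,210,854,977, the Census 2011 total) is not stated on this slide and is supplied as an assumption; the snippet is only a sketch of that arithmetic check.

```python
# Hypothetical check of the slide's cost-per-person figure.
# Assumption: a Census 2011 population of 1,210,854,977, which is not
# stated on the slide itself.
TOTAL_COST_USD = 445e6         # USD 445 Mn, from the slide
POPULATION = 1_210_854_977     # assumed Census 2011 population


def cost_per_person(total_cost: float, population: int) -> float:
    """Average cost per person counted."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_cost / population


per_head = cost_per_person(TOTAL_COST_USD, POPULATION)
print(f"USD {per_head:.2f} per person")
```

Rounded to two decimals this reproduces the USD 0.37 on the slide.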
Yes. We have been counted!
DATA PROCESSING
Capturing Information and Processing Huge Volume of Census Data
The Indian Census has always been at the forefront of using the latest technology:
– 1961 Census: Unit record machines used
– 1971 Census: Key-punching (electrical-cum-mechanical) machines used; an IBM 1401 computer with an IBM card reader used
– 1981 Census: Data entry made using key-to-disk machines; processing by HP 1000, CDC Cyber 730 and NEC computer systems at NIC
– 1991 Census: Medha 930 mainframe computer system used for data processing; Unix-based dumb terminals used for data entry
– 2001 Census: First large country to use image-based Automatic Form Processing Technology; high-speed duplex scanners used for image capturing
– 2011 Census: More developed ICR (Intelligent Character Recognition) technology with advanced features used
Census Data Processing - 17 locations
Scanning → Prepare Batch → Recognition → Tiling → Completion → Exception → Export / Archival → ASCII file
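The recognition, completion, exception and export stages can be pictured as functions that each take a batch and pass it on. The sketch below is a simplified, hypothetical model, not the software actually used by the Census: scanning and batch preparation are omitted, the 0.90 confidence threshold and the field names are assumptions, and operator keying is simulated with a dictionary.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical confidence cut-off: fields read below it go to manual completion.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class FieldValue:
    name: str
    value: str           # text returned by a (hypothetical) recognition engine
    confidence: float    # recognition confidence between 0 and 1
    needs_review: bool = False


@dataclass
class Batch:
    batch_id: str
    fields: List[FieldValue]


def recognition(batch: Batch) -> Batch:
    """Flag every field whose recognition confidence is below the threshold."""
    for f in batch.fields:
        f.needs_review = f.confidence < CONFIDENCE_THRESHOLD
    return batch


def completion(batch: Batch, keyed: Dict[str, str]) -> Batch:
    """Overwrite flagged fields with operator-keyed values where available."""
    for f in batch.fields:
        if f.needs_review and f.name in keyed:
            f.value = keyed[f.name]
            f.needs_review = False
    return batch


def exceptions(batch: Batch) -> List[str]:
    """Names of fields still unresolved after completion."""
    return [f.name for f in batch.fields if f.needs_review]


def export_ascii(batch: Batch) -> str:
    """Write the batch as one comma-separated plain-text (ASCII) record."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(
        [batch.batch_id] + [f.value for f in batch.fields]
    )
    return buf.getvalue()


batch = Batch("B-0001", [
    FieldValue("age", "34", 0.98),
    FieldValue("sex", "7", 0.72),       # misread; operator keys the correct code
    FieldValue("religion", "1", 0.95),
])
batch = completion(recognition(batch), keyed={"sex": "2"})
print("unresolved:", exceptions(batch))
print(export_ascii(batch), end="")
```

Keeping each stage as a separate function mirrors the slide's idea that a batch moves through distinct stations, and anything left unresolved surfaces at the exception stage rather than being exported silently.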
Tile
The unique TILE module optimizes data accuracy by displaying recognized characters grouped together in a systematic way, so that they are easy to identify. The operator can see which characters have been recognized correctly and which have not, and can mark the incorrect ones as rejects. This makes the completion step more accurate.
[Figure: Tiling station, image-based forms processing]
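The tile idea amounts to grouping character images by the value the recognizer assigned, so that a misread character looks out of place among its neighbours. A minimal sketch, assuming hypothetical snippet IDs and simulating the operator's judgement with a `written` field that a real system would not have:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class CharSnippet:
    snippet_id: str   # hypothetical ID of one character image cut from a form
    recognised: str   # character the recognition engine assigned
    written: str      # what was actually written (known here only to simulate the operator)


def build_tiles(snippets: List[CharSnippet]) -> Dict[str, List[CharSnippet]]:
    """Group snippets by recognised character, so each tile shows one value."""
    tiles: Dict[str, List[CharSnippet]] = defaultdict(list)
    for s in snippets:
        tiles[s.recognised].append(s)
    return dict(tiles)


def review_tiles(tiles: Dict[str, List[CharSnippet]]) -> Set[str]:
    """Simulate an operator scanning each tile and rejecting the odd ones out."""
    return {
        s.snippet_id
        for label, group in tiles.items()
        for s in group
        if s.written != label
    }


snippets = [
    CharSnippet("c1", "7", "7"),
    CharSnippet("c2", "7", "7"),
    CharSnippet("c3", "7", "1"),   # a "1" misread as "7" stands out in the "7" tile
    CharSnippet("c4", "3", "3"),
]
tiles = build_tiles(snippets)
rejected = review_tiles(tiles)
print({label: len(group) for label, group in tiles.items()}, sorted(rejected))
```

Rejected snippets would then be routed to completion for manual keying, which is why tiling improves the accuracy of that later step.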
DATA ANALYSIS
Data Analysis
– Provisional Population Totals for India and the States, compiled manually from the Enumerator's Abstract, were declared within about four weeks:
   – Population
   – Population aged 0-6
   – Number of literates
– Filled-in Schedules are collected, scanned and processed in two phases: Houselisting & Housing Census, and Population Enumeration
– Extensive quality checks and data validation undertaken
– CSPro software used for tabulation
– More than 300 tables to be published on Census 2011 at national, state and district levels, including the Primary Census Abstracts
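The three provisional indicators combine into the effective literacy rate, which the Indian Census computes over the population aged seven and above, that is, after excluding the 0-6 group. The district figures below are hypothetical and serve only to illustrate the formula.

```python
def effective_literacy_rate(population: int, population_0_6: int, literates: int) -> float:
    """Literates as a percentage of the population aged 7 and above."""
    eligible = population - population_0_6
    if eligible <= 0:
        raise ValueError("population aged 7+ must be positive")
    if not 0 <= literates <= eligible:
        raise ValueError("literates must lie between 0 and the population aged 7+")
    return 100.0 * literates / eligible


# Hypothetical district totals, for illustration only.
rate = effective_literacy_rate(population=1_000_000, population_0_6=150_000, literates=595_000)
print(f"{rate:.2f}%")
```

Dividing by the whole population instead would understate literacy, since children aged 0-6 are not counted as literate.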
Administrative Units in India
Country → State → District → Sub-district → Town → Ward
Country → State → District → Sub-district → Village
Country → State → District → CD Block → Panchayat → Village
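Because the units nest, figures collected at the lowest level can be rolled up the hierarchy. A minimal tree sketch, assuming hypothetical unit names and populations and modelling only the Sub-district → Town/Village branch (the parallel CD Block → Panchayat branch would be a second tree over the same villages):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    name: str
    level: str
    population: int = 0                      # set only on the lowest units (villages, wards)
    children: List["Unit"] = field(default_factory=list)

    def total_population(self) -> int:
        """Sum population over this unit and every unit below it."""
        return self.population + sum(c.total_population() for c in self.children)


# Hypothetical units and figures, for illustration only.
district = Unit("District A", "District", children=[
    Unit("Sub-district A1", "Sub-district", children=[
        Unit("Village 1", "Village", population=1_200),
        Unit("Village 2", "Village", population=800),
        Unit("Town T", "Town", children=[
            Unit("Ward 1", "Ward", population=5_000),
            Unit("Ward 2", "Ward", population=4_500),
        ]),
    ]),
])
print(district.total_population())
```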
Number of Administrative Units in Census 2011
– States/UTs: 35
– Districts: 640
– Sub-districts: 5,924
– Towns: 7,935
– Villages: 0.64 million
Census - Not merely a head count
Biggest source of comprehensive data, with information on:
– Population
– Age
– Marital Status
– Scheduled Castes
– Scheduled Tribes
– Mother Tongue & Language
– Religion
– Village Directory and Town Directory
– Literacy & Educational Status
– Economic activity
– Migration & Urbanization
– Fertility & Mortality
– Disability
– Housing
– Availability of amenities
Issues in Data Analysis
Census creates two separate databases:
– Houselisting & Housing Census data, at household level (April to September 2010)
– Population Enumeration data, at the level of each individual member of the household (February 2011)
In Census 2011, an attempt is being made to link these two databases so that their information can be cross-tabulated. This could not be done in past censuses and is being tested now. Linking would make it possible to cross-tabulate, for example, the condition of housing against economic condition.
– The boundary of each Enumeration Area (EA) is kept unchanged during the two phases of operation.
– Provision has been made in the Household Questionnaire (Phase 2 operation) to record the Household Number marked in the Phase 1 operation.
– The EA and Household Numbers serve as link fields between the two databases.
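Linking on the EA and Household Numbers is a keyed join: index the household records by (EA, HH) and look up each person. The sketch below is illustrative only; the field names (`ea`, `hh`, `roof`, `worker`) and all records are hypothetical, and a plain dictionary join is used in place of whatever tooling the Census actually employs. Persons with no matching household are kept aside, since they point to mis-recorded numbers.

```python
from collections import Counter
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (EA number, household number)

# Hypothetical Phase 1 (Houselisting) records, one per household.
houselisting = [
    {"ea": "001", "hh": "0001", "roof": "Concrete"},
    {"ea": "001", "hh": "0002", "roof": "Thatch"},
    {"ea": "002", "hh": "0001", "roof": "Concrete"},
]

# Hypothetical Phase 2 (Population Enumeration) records, one per person,
# each carrying the household number marked in Phase 1.
persons = [
    {"ea": "001", "hh": "0001", "worker": "Main"},
    {"ea": "001", "hh": "0001", "worker": "Non-worker"},
    {"ea": "001", "hh": "0002", "worker": "Marginal"},
    {"ea": "002", "hh": "0001", "worker": "Main"},
    {"ea": "003", "hh": "0009", "worker": "Main"},   # no matching household
]


def link(households: List[dict], people: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Attach each person to the household with the same (EA, HH) key."""
    by_key: Dict[Key, dict] = {(h["ea"], h["hh"]): h for h in households}
    linked, unmatched = [], []
    for p in people:
        h = by_key.get((p["ea"], p["hh"]))
        if h is None:
            unmatched.append(p)
        else:
            linked.append({**h, **p})
    return linked, unmatched


linked, unmatched = link(houselisting, persons)
crosstab = Counter((r["roof"], r["worker"]) for r in linked)
for (roof, worker), n in sorted(crosstab.items()):
    print(f"{roof:<10}{worker:<12}{n}")
print("unmatched persons:", len(unmatched))
```

The final count is a housing-by-economic-activity cross-tabulation of the kind described above, which neither database could produce on its own.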
Generating time-series tables from previous censuses:
– As the boundaries of Enumeration Areas (EAs) are not permanent, it is not possible to link an EA from one census to the next.
– EAs are carved out on the basis of population size, so if the population changes, the number of EAs carved out also changes.
– Consequently, every census has generated a stand-alone database.
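A short sketch of why EA numbers do not carry over: if each EA is sized to cover roughly a fixed number of persons, population growth changes how many EAs an area is divided into, so "EA 3" in one census covers different ground from "EA 3" in the next. The target size of 650 persons and the populations are hypothetical.

```python
import math


def number_of_eas(population: int, target_size: int) -> int:
    """EAs needed if each covers at most about target_size persons."""
    if population <= 0 or target_size <= 0:
        raise ValueError("population and target_size must be positive")
    return math.ceil(population / target_size)


TARGET = 650  # hypothetical persons per EA
for year, pop in [(2001, 2_900), (2011, 3_500)]:
    print(year, number_of_eas(pop, TARGET))
```

The same village is split into 5 EAs in one round and 6 in the next, so every EA boundary inside it may move.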
– New districts, sub-districts, towns and villages have been created, which has impeded time-series analysis; the number of these administrative units has changed significantly over the last three censuses.
– An attempt is underway to link the databases available since the 1991 Census on jurisdictional changes down to town and village level.
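Linking jurisdictional changes at village level allows earlier figures to be re-aggregated to later boundaries. A minimal sketch, assuming a hypothetical concordance that maps each earlier-census village code to the district it falls in at the later census; villages that were themselves split would need population or area shares, which this sketch does not handle.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical 2001 village populations, keyed by village code.
pop_2001: Dict[str, int] = {"V01": 1_000, "V02": 2_000, "V03": 1_500, "V04": 500}

# Hypothetical concordance: the 2011 district each 2001 village falls in
# after a new district was carved out.
village_to_district_2011: Dict[str, str] = {
    "V01": "Old District", "V02": "Old District",
    "V03": "New District", "V04": "New District",
}


def reaggregate(pop: Dict[str, int], concordance: Dict[str, str]) -> Dict[str, int]:
    """Sum earlier-census village figures into later-census districts."""
    totals: Dict[str, int] = defaultdict(int)
    for village, n in pop.items():
        if village not in concordance:
            raise KeyError(f"village {village} missing from concordance")
        totals[concordance[village]] += n
    return dict(totals)


print(reaggregate(pop_2001, village_to_district_2011))
```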
– In Census 2001, sample micro-data files from the Housing Census were released:
   – India and States (1% sample)
   – States and Districts (5% sample for large states and 10% for smaller states)
– Sample micro-data files from the Population Enumeration have not been released in the public domain.
   – There are plans to make micro-data files available for research in institutions and universities through work-stations.
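The slides do not say how the 2001 sample files were drawn. As an illustration only, the sketch below swaps in simple random sampling without replacement over hypothetical household IDs, with a fixed seed so the same file can be regenerated.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def draw_sample(records: Sequence[T], fraction: float, seed: int) -> List[T]:
    """Simple random sample, without replacement, of round(fraction * N) records."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = round(fraction * len(records))
    rng = random.Random(seed)   # fixed seed so the same file can be regenerated
    return rng.sample(list(records), k)


households = [f"HH{i:05d}" for i in range(10_000)]  # hypothetical household IDs
one_pct = draw_sample(households, 0.01, seed=2001)
five_pct = draw_sample(households, 0.05, seed=2001)
print(len(one_pct), len(five_pct))
```

Drawing separately within each state or district, with a larger fraction for smaller units, would give the 5%/10% pattern described above.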
Needs
– Pilot-testing the linking of files
– Building the capacity of staff in the data processing and analysis units in SPSS, SAS, etc., at national and state levels
– Organizing data on jurisdictional changes (redistricting) for trend analysis
– Developing an architecture for data warehousing and mining to enable trend and in-depth analysis; a feasibility study is to be undertaken
– Support in setting up work-stations for research on anonymized micro-data, drawing on good practices from other countries
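The slides do not specify how micro-data would be anonymized. One common pair of techniques is dropping direct identifiers and top-coding extreme values; the sketch below applies both to hypothetical records, with hypothetical field names and an assumed top-code of 80 years. Real disclosure control would also consider coarsening geography and rare combinations of characteristics.

```python
from typing import Dict

DIRECT_IDENTIFIERS = {"name", "hh_number", "ea_number"}   # hypothetical field names
AGE_TOP_CODE = 80                                        # hypothetical threshold


def anonymise(record: Dict[str, object]) -> Dict[str, object]:
    """Drop direct identifiers and top-code age before release to a work-station."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    age = out.get("age")
    if isinstance(age, int) and age >= AGE_TOP_CODE:
        out["age"] = AGE_TOP_CODE   # to be read as "80 and above"
    return out


raw = [
    {"name": "A", "hh_number": "0001", "ea_number": "001", "district": "D1", "age": 34, "sex": 1},
    {"name": "B", "hh_number": "0001", "ea_number": "001", "district": "D1", "age": 92, "sex": 2},
]
released = [anonymise(r) for r in raw]
print(released)
```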
Thank you