Session 8 Data Processing

1 Session 8 Data Processing
United Nations Technical Meeting on Use of Technology in Population and Housing Censuses
28th - 1st December 2016
Statistics Estonia
Diana Beltadze

New dimensions
Data sources: some data are taken from registers and not asked from respondents personally; some data are asked personally.
Response method: self-enumeration, personal interview and proxy interview are used at the same time.
Response technology: via the internet and using computers; paper questionnaires are prepared as a backup in order to reduce risks.

3 Different prognoses
Only experience in Estonia: Doctors' career study (pilot) >50%
Estonian pilot census (2009–2010): %
Statistical prognosis: 27%
Expert prognoses: from 10% up to 75%
Statistics Estonia's official prognosis was 25%. This prognosis was used to decide the number of enumerators and to calculate the technical parameters of the data channels.

PHC 2011: methodology
For the first time in Estonia, PHC 2011 used a combined methodology for the census.
Use of registers:
in the preparation of the census;
in the conduct of the census;
in the processing of census data.
The e-census combined with the interview census gives better quality of the Census.

Persons covered by the Census comprise those whose permanent place of residence on 31 December 2011 is in Estonia and who, at the moment of the Census, have been living or are going to live in Estonia for at least 12 months (including persons staying illegally and persons with residence permits, except for the diplomatic staff of foreign countries, military and marine personnel of foreign countries and family members living with them in foreign countries). Also covered are persons who have left Estonia for a foreign country by the moment of the Census but who are not planning to stay away from Estonia for more than 12 months, as well as the diplomatic staff of the Republic of Estonia, military and marine personnel and the family members living with them in foreign countries during the Census.

During the Census, Statistics Estonia co-operates with 2,236 institutions to enumerate members of institutional households and homeless people. To get better results, all Census employees are trained because, as mentioned already at the beginning of the article, "Everyone counts".

5 The additional opportunities resulting from the use of computers:
By taking into account the information entered in real time, the length of questionnaires to be filled in by individual users can be minimized, as people will not see the questions that they do not have to answer;
Implementation of logical controls enables immediate detection of mistakes in answers and presentation of suggestions for correction;
The amount of information shown to a person can be optimized by placing additional information in help texts which can be opened only if needed;
Classifications displayed in the form of menus can be used for questions with a long list of response options.
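The first point, hiding questions that a respondent does not have to answer, could be sketched roughly as follows. This is a minimal illustration only: the questions, the age limit of 15 and the names `Question` and `visible_questions` are assumptions made for the example and are not taken from the PHC 2011 questionnaire.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Answers = Dict[str, object]


def always(answers: Answers) -> bool:
    """Default routing rule: the question is shown to everyone."""
    return True


@dataclass(frozen=True)
class Question:
    key: str
    text: str
    # Returns True when the question applies, given the answers so far.
    applies: Callable[[Answers], bool] = always


# Hypothetical questions and routing rules, for illustration only.
QUESTIONS: List[Question] = [
    Question("age", "How old are you?"),
    Question("employed", "Did you work last week?",
             applies=lambda a: a.get("age", 0) >= 15),
    Question("occupation", "What is your occupation?",
             applies=lambda a: a.get("employed") is True),
]


def visible_questions(answers: Answers) -> List[str]:
    """Return the keys of the questions the respondent should see."""
    return [q.key for q in QUESTIONS if q.applies(answers)]
```

Calling `visible_questions` again after each answer is entered keeps the form as short as the answers allow, for example a respondent aged 10 would only see the age question.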

6 When were text answers used?
Occupation (description of the work)
Area of activity (EMTAK)
All other variables:
3–7 values (answers) in the questionnaire
Long list (classification)
Possibility to add text when a suitable answer was not found in the list
For occupation, ironic or humorous answers were sometimes given (internet).
Text answers were better written in internet responses.

7 Data collection process
UNECE meeting in Amman

8 Data collection on the Web

9 Progress of the interview census
The enumerators were on schedule:
23.02    8 %     11 %
01.03    30 %    38 %
08.03    60 %    66 %
29.03    100 %   99 %
Officially, there were 32 people who refused to be interviewed.

10 Logical controls
Hard controls (father is younger than son): You cannot continue without correction!
Soft controls (difference between spouses' ages is more than 19 years): Is this true? Possibility to continue.
Also a control (in most cases soft): You did not answer the question…
The internet version did not have the answers "Do not know" and "No answer"; interviewers had such buttons.
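The difference between hard and soft controls can be illustrated with a small sketch. The two rules and the 19-year threshold come from the slide; the record layout and the names `Finding`, `run_controls` and `can_continue` are assumptions for illustration, not the census system's actual design.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_SPOUSE_AGE_GAP = 19  # threshold quoted on the slide


@dataclass(frozen=True)
class Finding:
    severity: str  # "hard" blocks the form, "soft" asks for confirmation
    message: str


def run_controls(record: Dict[str, int]) -> List[Finding]:
    """Apply the two example controls to a (hypothetical) flat record."""
    findings: List[Finding] = []
    if "father_age" in record and "son_age" in record:
        if record["father_age"] < record["son_age"]:
            findings.append(Finding("hard", "Father is younger than son."))
    if "respondent_age" in record and "spouse_age" in record:
        gap = abs(record["respondent_age"] - record["spouse_age"])
        if gap > MAX_SPOUSE_AGE_GAP:
            findings.append(Finding("soft", "Spouses' ages differ by more than 19 years. Is this true?"))
    return findings


def can_continue(findings: List[Finding], soft_confirmed: bool = False) -> bool:
    """Hard findings always block; soft findings block until confirmed."""
    if any(f.severity == "hard" for f in findings):
        return False
    if any(f.severity == "soft" for f in findings):
        return soft_confirmed
    return True
```

A hard finding cannot be overridden by the respondent, while a soft finding only requires the respondent to confirm that the unusual answer is correct.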

11 Helpdesk statistics

12 Changes in data collection
Pre-filling was used in the e-census.
The interview census was practically paperless: enumerators entered the answers on a laptop and the data were sent to a raw database, which saved data entry resources.
Logical controls were used during the census to prevent most of the possible entry mistakes.
The spatial coordinates of each residential building were determined using GPS.

13 Current issues after data collection
Identification of persons (missing personal codes)
Identification of addresses
Elimination of duplicates
Encoding
The majority of identification and encoding exercises are completed by the end of April.
Data processing continues until 31 October 2012.

14 Data arrangement

15 Our system
Population register
VVIS
Buildings register, Land Board
Data collection system, primary data correction
Education register
VAIS
Databases for EUROSTAT and local customers
Data correction system, imputations, controls etc.

Data processing
The first stage of data processing, primary data arrangement, runs in parallel with data collection and immediately after it. As a result, the collected data are prepared for the next stages of data processing. The primary data arrangement includes automatic data control and manual processing by operators (handling of duplicates, coding). A separate step is checking the eliminated questionnaires, which is carried out by quality control managers.

17 Data arrangement: automatic and manual
Arrangement of addresses: do all the entered addresses exist?
Identification of persons: is the personal identification number of every person valid?
Coding
Separating duplicates
Separating incomplete and interrupted questionnaires from complete questionnaires
Adding incomplete and interrupted questionnaires to the enumerators' work tasks
Data arrangement started in the first week of the e-census and continued throughout the whole e-census period.
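The slides do not say how the validity of a personal identification number was checked. One plausible automatic step is the publicly documented check-digit rule for Estonian personal identification codes (11 digits, two rounds of weighted sums modulo 11). The sketch below implements only that rule; using it in the census pipeline is an assumption, and the function name `is_valid_personal_code` is hypothetical.

```python
def is_valid_personal_code(code: str) -> bool:
    """Check the length, digits and check digit of an Estonian personal code.

    Only the check-digit rule is verified; the encoded birth date and the
    existence of the person in the Population register are not checked.
    """
    if len(code) != 11 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]

    first_weights = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]
    second_weights = [3, 4, 5, 6, 7, 8, 9, 1, 2, 3]

    # zip stops after ten weights, so the check digit itself is excluded.
    remainder = sum(d * w for d, w in zip(digits, first_weights)) % 11
    if remainder == 10:
        remainder = sum(d * w for d, w in zip(digits, second_weights)) % 11
        if remainder == 10:
            remainder = 0
    return remainder == digits[10]
```

Codes that fail such a check, or that are missing altogether, would then be the ones passed on to an operator for manual identification.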

18 Quality of raw data
The response rate of variables was more than 99%.
In general, it was higher on the internet than in interviews.

19 Primary data correction team
Data correction manager
Operator for ID codes
Operator for addresses
Operator for duplicates
Coders: coder, senior coder

20 Data correction manager
assigns roles and tasks in the IT system
distributes tasks among coders by dictionary
helps operators to solve more difficult problems
activates automatic controls for finding duplicates

Different operators
The operators are responsible for resolving any conflicts in the received data:
ID operators check and add personal ID codes from Population register data;
address operators check and correct addresses of buildings;
duplicate operators resolve duplicates that cannot be resolved automatically;
coders encode text answers according to prescribed rules.
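The split between automatically resolved duplicates and those left to the duplicate operators might look roughly like this. The rule used here (records for the same person are merged automatically only when they are identical) is an assumption chosen for illustration; the slides do not describe the actual matching rules, and the field name `person_id` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def split_duplicates(records: List[Record]) -> Tuple[List[Record], List[List[Record]]]:
    """Group records by person and separate easy cases from conflicts.

    Returns (resolved, for_operator): one record per person where all
    copies agree, and the conflicting groups that need manual review.
    """
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        groups[record["person_id"]].append(record)

    resolved: List[Record] = []
    for_operator: List[List[Record]] = []
    for group in groups.values():
        if all(record == group[0] for record in group):
            resolved.append(group[0])  # single record or identical copies
        else:
            for_operator.append(group)  # conflicting answers: manual task
    return resolved, for_operator
```

In this sketch a person who answered both on the internet and in an interview with differing answers ends up in `for_operator`, which corresponds to the duplicate tasks handled manually.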

22 Encoding of text answers
Variable       Manual %   Automatic %
Occupation     98.1       1.9
EMTAK          91.9       8.1
Religion       68.0       32.0
VS-RTK         50.4       49.6
Citizenship    71.4       28.6
Language       0.4        99.6
Nationality    73.3       26.7
Dialect        52.9       47.1
Address        100.0      0.0
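The manual and automatic shares above suggest a two-step process: answers that match a dictionary are encoded automatically and the rest are queued for coders. A minimal sketch of that idea follows; the dictionary contents, the exact-match rule after normalisation and the function names are assumptions for illustration and do not describe the system actually used.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: normalised answer text -> classification code.
LANGUAGE_CODES: Dict[str, str] = {
    "estonian": "et",
    "russian": "ru",
    "finnish": "fi",
}


def normalise(text: str) -> str:
    """Lower-case the answer and collapse surrounding and repeated spaces."""
    return " ".join(text.lower().split())


def encode_answers(
    answers: List[str], dictionary: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (automatically coded pairs, answers left for manual coding)."""
    coded: List[Tuple[str, str]] = []
    manual: List[str] = []
    for answer in answers:
        code = dictionary.get(normalise(answer))
        if code is None:
            manual.append(answer)
        else:
            coded.append((answer, code))
    return coded, manual


def automatic_share(coded: List[Tuple[str, str]], manual: List[str]) -> float:
    """Percentage of answers encoded automatically, rounded to one decimal."""
    total = len(coded) + len(manual)
    return round(100 * len(coded) / total, 1) if total else 0.0
```

Under this scheme, variables answered mostly from a menu would show a high automatic share, while free-text variables would fall mostly to the manual queue.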

23 Conclusion: different ways to avoid and correct mistakes
using data from the population register
controls while the questionnaire is being filled in
using classifications in the questionnaire
identification of persons and addresses
coding
removing duplicates
additional controls
imputation

Thank You!


