Statistical Methods for Detection of Falsified Interviews in Surveys: Experience from Election Polls in Ukraine
Eugen Bolshov, Marina Shpiker
Theoretical approaches to fraud detection
Interviewer fraud in face-to-face surveys: the Ukrainian context
Decreasing response rates in face-to-face surveys;
50% of all surveys in Ukraine are conducted face-to-face (Paniotto, Kharchenko 2011) because of low home telephone penetration;
Ukrainian research companies (especially field staff) are overloaded with work because of the political context (majoritarian elections in 225 districts, many candidates, strong demand for polls);
Interviewers’ lack of professional ethics.
As a result, the risk of falsification increases.
Weakness of the traditional method of field control
The usual method of fraud detection and quality control consists of selective re-interviewing based on a random sample of interviewers;
This method helps to prove that an interview was conducted, but it is ineffective if only part of the questions were actually asked;
The chances of detecting all falsifiers are relatively low.
Alternative approach
There is a body of theoretical research, including experimental data analysis, that offers a number of methods for detecting fraud. The methods are based on the assumption that forgers are unable to reproduce the patterns and certain characteristics of genuine data. Analysis of the collected data yields a list of “at risk” interviewers who then need to be checked by re-interviewing.
Benefits of alternative methods of fraud detection
They increase the share of detected falsifications;
They may be used separately or all together;
They are easy to implement and apply;
They may be partially programmed (automated).
Summary of alternative methods of fraud detection (1)
Analysis of distributions: Forgers do not know the real distribution of certain variables in the population, so their results differ significantly from those of honest interviewers.
Analysis of variation: Forgers avoid extreme alternatives and tend to choose moderate answers; as a result, they produce lower variation.
Logical control: False interviews contain more logical mistakes and contradictory answers than real ones.
Analysis of path: Forgers tend to skip parts of the questionnaire using filter questions. An alternative hypothesis is that falsifiers skip fewer questions than honest interviewers. Either way, the number of chosen filters tends to differ between real and false interviews.
Summary of alternative methods of fraud detection (2)
Analysis of “other” and “difficult to say” alternatives: Forgers use “difficult to say” / “don’t know” / “no answer” alternatives less often. They also take the easy route and avoid “other” answers, which require additional writing. However, we witnessed a case in which a falsifier filled in more open questions than expected.
Analysis of questions with multiple alternatives: Forgers choose significantly more or fewer alternatives in multiple-response questions than honest interviewers do.
Benford’s law: The leading digits of fiscal and some other numbers are distributed according to Benford’s law, while the leading digits of falsified number sequences are distributed differently (see the sketch after this list).
Analysis of patterns and relations: Forgers are worse at reproducing the relations between variables that exist in real data.
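Benford’s law is the only hypothesis above that is not illustrated with survey data later in this deck. A minimal sketch of such a check, assuming a hypothetical interview-level data frame df with an interviewer_id column and a positive numeric answer stored in a column named income (neither name comes from the actual study):

```python
import numpy as np
import pandas as pd
from scipy.stats import chisquare

# Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)
DIGITS = np.arange(1, 10)
BENFORD_P = np.log10(1 + 1 / DIGITS)

def benford_test(values):
    """Chi-square test of the leading-digit distribution against Benford's law."""
    v = pd.Series(values).dropna().abs()
    v = v[v > 0]
    # Scientific notation puts the leading significant digit first, e.g. 0.042 -> "4.2e-02"
    first_digit = v.apply(lambda x: int(f"{x:e}"[0]))
    observed = first_digit.value_counts().reindex(DIGITS, fill_value=0).to_numpy()
    expected = BENFORD_P * observed.sum()
    return chisquare(f_obs=observed, f_exp=expected)

# Per-interviewer screening (column names are illustrative, not from the study):
# results = {i: benford_test(g["income"]) for i, g in df.groupby("interviewer_id")}
```

Interviewers whose numeric answers deviate strongly from the Benford pattern would then join the “at risk” list for re-interviewing.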
Practical implementation: the case of a Ukrainian electoral study
Electoral study: description of a case
“The opinions and views of residents of Kievsky region” – an electoral study conducted in anticipation of the Supreme Rada elections.
Time: April – May 2012.
Sample: 350 interviews on average in each of 9 electoral districts; 24 interviewers engaged. Maximum number of interviews per interviewer: 647 (153 on average).
The questionnaire contained the following sets of questions: evaluation of the governor’s and local authorities’ performance; political knowledge and attitudes at the national and local level; mass media; attitudes to national reforms and the programs of local authorities; demographic questions.
Applied methods:
Analysis of distributions;
Analysis of variation;
Logical control;
Analysis of path;
Analysis of “other” and “difficult to say” alternatives;
Analysis of questions with multiple alternatives.
Analysis of distributions
We explored the distributions of all questions, split by interviewer, looking for impossible or simply strange answers. The procedure is time-consuming but appears to be the most effective one. Extremely unusual distributions that are far from reality are the most obvious evidence of fraud.
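The deck describes this step as manual inspection of cross-tabulations. One hypothetical way to pre-screen interviewers automatically is to compare each interviewer’s answer mix for a question with the answer mix of all other interviewers; the data frame df and the column names interviewer_id and vote_intention below are assumptions, not the study’s actual variables:

```python
import pandas as pd
from scipy.stats import chi2_contingency

def distribution_flags(df, question="vote_intention", alpha=0.01):
    """Flag interviewers whose answer distribution differs from everyone else's."""
    flags = {}
    for interviewer, group in df.groupby("interviewer_id"):
        rest = df[df["interviewer_id"] != interviewer]
        # Two-column contingency table: answer counts for this interviewer vs. the rest
        table = pd.concat(
            [group[question].value_counts(), rest[question].value_counts()],
            axis=1, keys=["interviewer", "rest"],
        ).fillna(0)
        _, p_value, _, _ = chi2_contingency(table.to_numpy())
        flags[interviewer] = p_value < alpha  # True => atypical answer mix
    return flags
```

As in the deck, a statistical flag is only a reason for re-interviewing, not proof of fraud.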
Analysis of distributions: example
“If the presidential elections were held next Sunday, who would you vote for?”
“Normal” interviewers (typical answers): Interviewer A, Interviewer B. “At risk” interviewers: Interviewer C, Interviewer D.
Number of interviews: A 101, B 78, C 53, D 42.
Julia Tymoshenko: A 21%, B 17%, C 32%, D 50%.
Viktor Yanukovitch: A 6%, B 10%, C 13%, D 5%.
Vitaliy Klychko: 14%, 8%.
Arseniy Jatsenyuk: 4%, 7%.
…others…
Against all candidates: A 0%, B 1%, C 15%, D 12%.
Do not vote: 9%.
Difficult to say: 28%, 19%.
No answer: 2%.
Logical control
We identified logically contradictory sequences of answers (for example, a respondent is going to vote for a candidate whose name he or she does not know). The number of mistakes per interview was counted, and the average number of mistakes per interview was then computed for every interviewer. A high number of mistakes may indicate that a falsifier answered the questions mechanically, without thinking.
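A minimal sketch of such a rule-based count, assuming one data frame row per interview; the rule and the column names (vote_candidate, knows_candidate_x, interviewer_id) are illustrative, not the study’s actual variables:

```python
import pandas as pd

def count_mistakes(row: pd.Series) -> int:
    """Count logical contradictions within one interview (illustrative rules)."""
    mistakes = 0
    # Rule: intends to vote for candidate X but did not name X among
    # the politicians he or she knows.
    if row["vote_candidate"] == "X" and not row["knows_candidate_x"]:
        mistakes += 1
    # ...one such rule per contradiction defined for the questionnaire
    return mistakes

# df["mistakes"] = df.apply(count_mistakes, axis=1)
# mean_per_interviewer = df.groupby("interviewer_id")["mistakes"].mean()
```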
Logical control: example
“Normal” interviewers (typical means): Interviewer A, Interviewer B. “At risk” interviewers: Interviewer C, Interviewer D.
Mean number of all logical mistakes per interview: A 0.32, B 0.36, C 2.25, D 2.18.
Mean number of critical logical mistakes per interview: A 0.06, B 0.03, C 0.78, D 0.88.
Analysis of “other” and “difficult to say” alternatives
Two variables were created for each interview: the number of chosen “do not know / difficult to say / no answer” alternatives and the number of chosen “other” alternatives (open questions). The average of these two variables was computed for each interviewer. “At risk” interviewers tend to avoid alternatives of this type because (1) they underestimate the share of undecided respondents and (2) they do not want to invent detailed responses.
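A sketch of how these two per-interview counts could be computed, assuming one data frame row per interview; QUESTIONS, OTHER_FIELDS, DK_CODES and all column names are placeholders for the real questionnaire layout:

```python
import pandas as pd

QUESTIONS = ["q1", "q2", "q3"]           # closed questions (illustrative names)
OTHER_FIELDS = ["q2_other", "q3_other"]  # open-ended "other" text fields
DK_CODES = {98, 99}                      # codes for "don't know" / "no answer"

def dk_and_other_means(df: pd.DataFrame) -> pd.DataFrame:
    """Mean number of DK and 'other' answers per interview, by interviewer."""
    counts = pd.DataFrame({
        "interviewer_id": df["interviewer_id"],
        "n_dk": df[QUESTIONS].isin(DK_CODES).sum(axis=1),
        "n_other": df[OTHER_FIELDS].notna().sum(axis=1),
    })
    return counts.groupby("interviewer_id")[["n_dk", "n_other"]].mean()
```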
Analysis of “other” and “difficult to say” alternatives: example
“Normal” interviewers (typical means): Interviewer A, Interviewer B. “At risk” interviewers: Interviewer C, Interviewer D.
Mean number of “difficult to say” alternatives per interview: A 11.54, B 12.97, C 2.68, D 4.94.
Mean number of “other” alternatives per interview: 0.35, 0.23, 0.03.
Analysis of questions with multiple alternatives
For each multiple-response question, a single variable indicating the number of chosen alternatives was calculated. The average of these variables was computed for each interviewer. It is difficult for a falsifier to guess the correct average number of chosen alternatives in multiple-response questions; in our case, they tended to misjudge respondents’ knowledge of national and local politicians. Still, some “at risk” interviewers guessed correctly.
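A sketch of the count, assuming each multiple-response question is stored as one 0/1 dummy column per alternative; the column names are illustrative:

```python
import pandas as pd

# Dummy columns for the national politicians a respondent says he or she knows
KNOWS_NATIONAL = ["knows_nat_01", "knows_nat_02", "knows_nat_03"]  # illustrative

def mean_alternatives(df: pd.DataFrame, dummy_cols, by="interviewer_id") -> pd.Series:
    """Mean number of alternatives chosen per interview, by interviewer."""
    n_chosen = df[dummy_cols].sum(axis=1)
    return n_chosen.groupby(df[by]).mean()

# mean_alternatives(df, KNOWS_NATIONAL)
```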
Analysis of questions with multiple alternatives: example
“Normal” interviewers (typical means): Interviewer A, Interviewer B. “At risk” interviewers: Interviewer C, Interviewer D.
Mean number of national politicians known per interview: A 10.90, B 11.35, C 8.08, D 8.25.
Mean number of local politicians known per interview: A 5.26, B 5.95, C 4.83, D 4.70.
Analysis of path
We computed a variable indicating the number of chosen alternatives that lead to skipping the following question(s). The average was calculated for each interviewer. Although the common hypothesis states that forgers tend to skip questions, our case shows the opposite tendency: “at risk” interviewers skip less.
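A sketch of the count, assuming a mapping from each filter question to the answer codes that route the respondent past the following question(s); the mapping and column names are illustrative:

```python
import pandas as pd

# Answer codes that trigger a skip, per filter question (illustrative)
FILTER_ANSWERS = {"q5": {2}, "q8": {3, 4}}

def count_filters(row: pd.Series) -> int:
    """Number of chosen alternatives that skip the following question(s)."""
    return sum(row[q] in codes for q, codes in FILTER_ANSWERS.items())

# df["n_filters"] = df.apply(count_filters, axis=1)
# mean_per_interviewer = df.groupby("interviewer_id")["n_filters"].mean()
```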
Analysis of path: example
“Normal” interviewers (typical means): Interviewer A, Interviewer B. “At risk” interviewers: Interviewer C, Interviewer D.
Mean number of chosen “filter” alternatives per interview: A 4.19, B 4.53, C 3.0, D 2.25.
Analysis of variation
The standard deviation over a chosen set of questions was calculated for every interviewer. A lower standard deviation is very typical for “at risk” interviewers.
“Normal” interviewers (typical means): Interviewer A, Interviewer B. “At risk” interviewers: Interviewer C, Interviewer D.
Standard deviation: A 2.03, B 2.11, C 1.5, D 1.55.
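The deck does not say how the single standard-deviation figure was aggregated across questions; one plausible sketch takes the per-item standard deviation within each interviewer’s interviews and averages it over the chosen items (ITEMS and the column names are assumptions):

```python
import pandas as pd

ITEMS = ["q10", "q11", "q12"]  # rating-scale questions chosen for the check (illustrative)

def variation_by_interviewer(df: pd.DataFrame) -> pd.Series:
    """Average per-item standard deviation within each interviewer's interviews,
    sorted so the lowest-variation (most suspicious) interviewers come first."""
    return df.groupby("interviewer_id")[ITEMS].std().mean(axis=1).sort_values()
```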
Results of re-interviewing
According to the results of the foregoing analysis, “at risk” interviewers were listed for every district. The list was compared with the results of traditional field control (random re-interviewing).
Analysis of collected data: 7 suspicious interviewers, 5 of them highly suspicious.
Re-interviewing: 6 of the suspicious interviewers produced at least one “black” (unconfirmed) interview; 1 was not checked.
Conclusion
The described methods of detecting falsified interviews are a promising way to improve the effectiveness of field control. Although no interviewer demonstrated all of the suspicious patterns at once, likely falsifiers usually show several atypical characteristics, so the proposed methods work best in combination. In the case of CAPI, where the data arrive on the server gradually during the field stage, falsifications can be detected even earlier.
Thank you for your attention!