1 Combining individual and aggregate data to improve estimates of ethnic voting in Britain in 2001 and 2005
Stephen Fisher, Jane Holmes, Nicky Best, Sylvia Richardson
Department of Sociology, University of Oxford; Department of Epidemiology and Biostatistics, Imperial College London

This is joint work with Nicky Best and Sylvia Richardson from Imperial College, and with Stephen Fisher from the Department of Sociology at Oxford.

2 Outline
- Question of interest
- Model we will use
- Analysis

First I will describe the data we are interested in and what we want to find out. I will then describe the model we use and show how it is motivated by the data. Finally I will give the results of the various analyses we carried out.

3 Ecological regression
[Diagram: three study designs]
Target analysis: individual outcome $y_{ij}$; individual exposure $x_{ij}$; aggregate exposure $Z_i$, $\bar{X}_i$.
Ecological regression: aggregate outcome $Y_i$; aggregate exposure $Z_i$, $\bar{X}_i$.
Hierarchical Related Regression (HRR): individual outcome $y_{ij}$ and aggregate outcome $Y_i$; individual exposure $x_{ij}$; aggregate exposure $Z_i$, $\bar{X}_i$.

Ecological studies analyse data defined at a group level but aim to make inferences about the individuals within the groups. The aim of an analysis is usually to make inference at the individual level, so ideally we would use individual data in an individual analysis.

The groups could be areas. Here i denotes the area and j the subject within the area; y is the outcome, e.g. whether the subject within a constituency votes Labour or not; and x is the individual's exposure, such as whether or not they are Muslim. We might also have contextual effects, z and the area mean of x.

But we often only have aggregate data for both the exposure and the outcome. We might know the number of people who vote Labour within a constituency and the proportion of Muslims in that area, but not how the individuals themselves voted.

I am going to show how we can fit a better aggregate model that can be interpreted at the individual level, and how to combine the two sources of data to get a better analysis when both are available. I will motivate the model with an application, which I will now describe.

4 A decline in ethnic minority support for Labour?
From 1974 to 2001 around 80% of ethnic minorities voted Labour.
Between 2001 and 2005 there were:
- Islamic terrorist attacks
- US and UK led invasions of Afghanistan and Iraq
- Heightened security and suspicion of non-whites
- Unlawful detention of foreign terror suspects
- Convictions of British soldiers for Iraqi prisoner abuse
These and other events are thought to have undermined support for Labour among ethnic minorities.
On the other hand, the harsh stance on immigration in the Conservative 2005 election campaign may have alienated ethnic minority voters.

We are interested in ethnic minority support for Labour, and in particular whether it has declined, because historically ethnic minorities have been Labour voters: up to 2001 about 80% of them voted Labour. Between 2001 and 2005, however, the events listed above took place.

5 A decline in Muslim support for Labour?
Initially: we found that the gap in Labour vote between whites and non-whites narrowed between 2001 and 2005. These results were presented at PSA 2009 (PSA = Political Studies Association). The audience found this interesting, but really wanted to know whether the same was true of Muslims.
So: we tested whether the gap in Labour vote between Muslims and non-Muslims narrowed between 2001 and 2005.

To start with we compared non-whites with whites, and found the narrowing described above.

6 Individual-level model
British Election Study post-election survey (BES): a cross-sectional survey carried out after every general election.
For subject j in constituency i:
$y_{ij}$ = voted Labour (1) / didn't vote Labour (0)
$x_{ij}$ = Muslim (1) / non-Muslim (0)
$y_{ij} \sim \text{Bernoulli}(p_{ij})$, $\text{logit}(p_{ij}) = \mu_i + \beta x_{ij}$
where $p_{ij}$ is the probability that subject j votes Labour, $\mu_i$ is an area-level random effect, and $\beta$ is the log odds ratio of a Muslim voting Labour compared with a non-Muslim.
But: of 1,898 subjects with validated data, only 20 are Muslim.

After every general election a cross-sectional survey is carried out, the British Election Study post-election survey (BES), which is supposed to be representative of the UK electorate. This gives us individual-level data and an individual-level model.

For subject j in constituency i, $y_{ij}$ is 1 if the subject voted Labour and 0 otherwise, and $x_{ij}$ is 1 if they are Muslim. The outcome follows a Bernoulli distribution with probability $p_{ij}$, and we model $p_{ij}$ with a logistic link, so $\text{logit}(p_{ij})$ is a linear function of an area-level random effect $\mu_i$ and an effect of being Muslim, $\beta$. This is a standard multi-level logistic regression model. It accounts for the correlation in the outcome among individuals who live in the same area, and quantifies unexplained between-area variability in the probability of the outcome.

Although the study has around 3,000 respondents, some of these cannot be validated: for instance, they do not appear to be registered, or they might have a wrong address. Out of nearly 2,000 respondents with validated data, only 20 are Muslim, and we cannot do a very good analysis with only 20 Muslims.
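
As a rough illustration of the individual-level model above, the following sketch evaluates the Bernoulli log-likelihood for given values of $\mu_i$ and $\beta$. The function names and the toy records are hypothetical, not BES data, and the distribution of the random effects is deliberately left out because the slides do not specify it.

```python
import math

def inv_logit(eta):
    """Logistic function, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def individual_loglik(records, mu, beta):
    """Bernoulli log-likelihood of the individual-level model.

    records: iterable of (area, x_ij, y_ij) with x_ij, y_ij in {0, 1}
    mu:      dict mapping area -> area-level effect mu_i (treated as given here)
    beta:    log odds ratio for x_ij = 1 versus x_ij = 0
    """
    total = 0.0
    for area, x, y in records:
        p = inv_logit(mu[area] + beta * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Hypothetical toy data: (constituency, Muslim indicator, voted Labour)
records = [("A", 0, 1), ("A", 0, 0), ("A", 1, 1), ("B", 0, 0), ("B", 1, 1)]
mu = {"A": 0.0, "B": -0.5}  # illustrative values only
ll = individual_loglik(records, mu, beta=1.0)
print(ll)
```

In a full analysis $\mu_i$ and $\beta$ would be estimated rather than fixed, and with so few subjects having $x_{ij} = 1$ the likelihood carries little information about $\beta$.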

7 Aggregate data
However, we have data at the aggregate level for the entire population:
- 2001 Census data on the percentage who are Muslim
- Number of people who vote Labour in each constituency, from the general election results
The data can be viewed as a 2x2 table. For constituency i:
$y_i$ = number who vote Labour
$n_i$ = number who are eligible to vote
$x_i$ = number who are Muslim

              Vote Labour   Don't vote Labour   Margin
Non-Muslim         ?                ?           $1 - \bar{x}_i$
Muslim             ?                ?           $\bar{x}_i$
Total            $y_i$        $n_i - y_i$       $n_i$

Row margins are given as proportions, with $\bar{x}_i = x_i / n_i$ the proportion who are Muslim.

Because this is a general election, we know the number of people who vote Labour in every constituency from the election results, and we have the proportion who are Muslim from the 2001 UK Census. We can view these data as a 2x2 table. For constituency i, the outcome $y_i$ is the number who vote Labour and the exposure $x_i$ is the number who are Muslim. The total electorate is $n_i$: that is not the whole population, only those who are eligible to vote. We want to fill in the question marks, with numbers, proportions, or probabilities.

8 Ecological bias
Standard analysis of these data will probably lead to biased results.
Bias in ecological studies can be caused by:
- Confounding
  Confounders can be area-level (between-area) or individual-level (within-area), so include control variables and/or random effects in the model.
- A non-linear covariate-outcome relationship, combined with within-area variability of the covariate
  There is no bias if the covariate is constant within the area (a contextual effect).
  Bias increases as within-area variability increases, unless models are refined to account for this hidden variability.
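
The second source of bias can be seen numerically. The sketch below, with purely illustrative parameter values and hypothetical function names, compares the true area-level rate (the average of the individual probabilities) with the rate obtained by plugging the area mean of the covariate into the individual-level logistic curve. The two agree when the covariate is constant within the area and differ when it varies.

```python
import math

def inv_logit(eta):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-eta))

def true_area_rate(x_values, mu, beta):
    """Average of the individual probabilities g(mu + beta * x) over an area."""
    return sum(inv_logit(mu + beta * x) for x in x_values) / len(x_values)

def naive_area_rate(x_values, mu, beta):
    """Plug the area mean of x into the individual-level curve."""
    xbar = sum(x_values) / len(x_values)
    return inv_logit(mu + beta * xbar)

mu, beta = -1.0, 2.0           # illustrative values, not estimates
mixed = [1] * 30 + [0] * 70    # binary covariate varying within the area
constant = [0.3] * 100         # same area mean, no within-area variability

gap_mixed = true_area_rate(mixed, mu, beta) - naive_area_rate(mixed, mu, beta)
gap_constant = true_area_rate(constant, mu, beta) - naive_area_rate(constant, mu, beta)
print(gap_mixed, gap_constant)
```

The gap for the mixed area is non-zero purely because the logistic curve is non-linear; no confounding is involved in this toy example.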

9 Improving ecological inference
Aim: alleviate the bias associated with within-area covariate variability.
Data at area level, for constituency i:
Area-level outcome $y_i$ = number of people who vote Labour
Area-level predictor $\bar{x}_i$ = proportion who are Muslim
Then $y_i \sim \text{Binomial}(n_i, p_i)$, where the area-level probability $p_i$ is calculated by integrating the individual-level probabilities given by the individual-level model with respect to the within-area joint distribution $f_i(x)$ of all individual-level predictors:
$$p_i = \int p_{ij}(x) \, f_i(x) \, dx$$
$p_i$ is the average group-level probability (of voting Labour)
$p_{ij}(x)$ is the individual-level probability given covariates x
$f_i(x)$ is the distribution of covariate x within area i

So we want a model that alleviates the bias associated with within-area covariate variability. Our data at the area level consist of an area-level outcome, the number who vote Labour, and an area-level predictor, the proportion who are Muslim. We model the outcome with a Binomial distribution with area-level probability $p_i$. To get the right aggregate probability we need to integrate the individual-level probabilities with respect to the within-area joint distribution of the individual-level predictors, which is the integral above. So $p_i$ is the average group-level probability (of voting Labour, in our case).

10 The model for a single binary covariate
Consider a single binary covariate x, e.g. Muslim/non-Muslim.
$f_i(x)$ is the proportion of individuals with x = 1 in each area, i.e. the proportion Muslim in each constituency.
Individual-level model:
$p_{ij} = g(\mu_i + \beta x_{ij})$, where $g(\eta) = e^{\eta} / (1 + e^{\eta})$
$p_{ij} = g(\mu_i)$ if person j is non-Muslim
$p_{ij} = g(\mu_i + \beta)$ if person j is Muslim
Integrated group-level model:
$y_i \sim \text{Binomial}(n_i, p_i)$, $p_i = (1 - \bar{x}_i) \, g(\mu_i) + \bar{x}_i \, g(\mu_i + \beta)$
$\bar{x}_i$ = proportion Muslim in constituency i (mean of $x_{ij}$)
$p_i$ = average probability (proportion) of voting Labour in area i
In the expression for $p_i$, $1 - \bar{x}_i$ is the probability of being non-Muslim, $g(\mu_i)$ the probability that a non-Muslim votes Labour, $\bar{x}_i$ the probability of being Muslim, and $g(\mu_i + \beta)$ the probability that a Muslim votes Labour.

To be a bit more explicit, consider a single binary covariate, in our case whether the subject is Muslim or not. Then $f_i(x)$ is the proportion of individuals with x = 1 in each area, i.e. the proportion Muslim in each constituency.

Remember that for our individual-level model we assume a logistic link, so the individual probability of voting Labour, $p_{ij}$, is a function of the area-level random effect and the effect of being Muslim.

For the integrated group-level model we need the average probability of voting Labour in constituency i. Because the covariate is binary, the integral becomes a sum over the two values that x can take, Muslim or not. So $p_i$ = (probability a non-Muslim votes Labour) x (probability of being non-Muslim) + (probability a Muslim votes Labour) x (probability of being Muslim). We therefore have an area-level outcome which is Binomial, with the probability of voting Labour given by this $p_i$.
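
The integrated group-level model for one binary covariate can be sketched as follows. The function names and the numbers in the example call are illustrative assumptions; only the formula for $p_i$ and the Binomial form come from the slide.

```python
import math

def inv_logit(eta):
    """g(eta) = e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

def area_probability(xbar, mu, beta):
    """p_i = (1 - xbar_i) * g(mu_i) + xbar_i * g(mu_i + beta)."""
    return (1.0 - xbar) * inv_logit(mu) + xbar * inv_logit(mu + beta)

def binomial_loglik(y, n, p):
    """Log-likelihood of y successes out of n with success probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(p) + (n - y) * math.log(1.0 - p)

# Hypothetical constituency: 10% Muslim, 30,000 Labour votes, electorate 70,000
p_i = area_probability(xbar=0.10, mu=-0.5, beta=2.0)
ll_i = binomial_loglik(y=30000, n=70000, p=p_i)
print(p_i, ll_i)
```

When `xbar` is 0 or 1 the mixture collapses to a single logistic term, which matches the remark that there is no bias when the covariate is constant within an area.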

11 Hierarchical Related Regression
The parameters of the aggregate model have been derived from an underlying individual-level model, so the exposure-outcome relationship is assumed to be the same in both the aggregate data and the individual-level data.
This means that the individual and aggregate data can be used simultaneously to make inference on the underlying individual-level model.
The likelihood for the combined data is simply the product of the likelihoods of each set of data.
This combined model is termed a hierarchical related regression (HRR) (Jackson, Best and Richardson, 2006).

Remember that there were not enough Muslims in our individual data to carry out a good analysis, but we can combine these individual data with our constituency-level data to improve on the aggregate analysis. So with an HRR model we can make use of all of our data and do a better analysis than we would with either dataset by itself.
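
Because the combined likelihood is a product, the combined log-likelihood is a sum of an individual-level part and an aggregate part that share the same $\mu_i$ and $\beta$. The sketch below shows that structure for a single binary covariate. The function name and data layout are hypothetical, and the prior distributions and fitting procedure used in the actual HRR analysis are not represented.

```python
import math

def inv_logit(eta):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-eta))

def hrr_loglik(individuals, areas, mu, beta):
    """Combined log-likelihood: individual Bernoulli terms plus area Binomial terms.

    individuals: iterable of (area, x_ij, y_ij)
    areas:       iterable of (area, xbar_i, y_i, n_i)
    mu:          dict mapping area -> mu_i, shared by both data sources
    beta:        individual-level log odds ratio, shared by both data sources
    """
    total = 0.0
    for area, x, y in individuals:
        p = inv_logit(mu[area] + beta * x)
        total += math.log(p if y == 1 else 1.0 - p)
    for area, xbar, y, n in areas:
        p = (1.0 - xbar) * inv_logit(mu[area]) + xbar * inv_logit(mu[area] + beta)
        # The binomial coefficient is omitted: it does not depend on mu or beta.
        total += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return total

# Hypothetical toy inputs
individuals = [("A", 0, 1), ("A", 1, 0)]
areas = [("A", 0.1, 3, 10)]
print(hrr_loglik(individuals, areas, {"A": 0.0}, beta=0.5))
```

The individual records are what identify $\beta$ directly, while the aggregate counts add information from every constituency.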

12 Recap
Question of interest: How do Muslims vote? And did they change their voting behaviour between the 2001 and 2005 general elections?
i denotes constituency, j denotes subject within a constituency.

                       Individual-level data                          Aggregate data
Outcome                $y_{ij}$ = 1 if subject j votes Labour,        $y_i$ = number who vote Labour
                       0 if they don't vote Labour                    $n_i$ = electorate
Explanatory variable   $x_{ij}$ = 1 if subject j is Muslim,           $\bar{x}_i$ = proportion who are Muslim
                       0 if subject j is not Muslim

13 Proportion of electorate who voted Labour in 2001 and 2005, by constituency
These are our data. The lines are lowess smooths.

We only looked at England and Wales, because we ultimately want to compare 2001 with the 2005 election, and the constituency boundaries changed in Scotland between the two elections but not in England and Wales. There are not many Muslims living in Scotland, so we are not losing much information by doing this. There are 569 constituencies in England and Wales.

The analysis is quite challenging because of the very low number of Muslims in the individual data and because, at the area level, the proportion of Muslims in each constituency is low: in 93% of constituencies, less than 10% of residents are Muslim.

14 Analyses
To start, various models are fitted to the 2001 general election only:
- Simple model with only an individual Muslim effect

To start with, we fitted various models to the 2001 general election only. The first was a simple model with only an individual Muslim effect. Looking at the data, it does not look like this will be a very good model, but it is a good idea to start simple.

15 Analyses
To start, various models are fitted to the 2001 general election only:
- Simple model with only an individual Muslim effect
- Add a contextual effect of Muslim as well as an individual effect

Then we added a contextual effect of Muslim as well as an individual effect. This model is not identifiable with only aggregate data.

16 Analyses
To start, various models are fitted to the 2001 general election only:
- Simple model with only an individual Muslim effect
- Add a contextual effect of Muslim as well as an individual effect
- Add an interaction term

Then we added an interaction term.

17 Analyses
To start, various models are fitted to the 2001 general election only:
- Simple model with only an individual Muslim effect
- Add a contextual effect of Muslim as well as an individual effect
- Add an interaction term
- Include socio-economic status as a confounder
  Partly motivated by the apparent interaction
  Socio-economic status coded as manual/non-manual

We want to see whether social class can explain the interaction term. Socio-economic status is coded roughly as manual/non-manual. Here we are adding socio-economic status as an individual effect, and adding another individual-level (i.e. not contextual) covariate complicates things.

18 More than one individual-level binary covariate
For the integrated group-level model, when we have more than one binary covariate we need to know the cross-classification of individuals between covariate categories within each area, e.g. the number of Muslims who have a manual job.
Then the average probability of voting Labour in area i is
$$p_i = \sum_{x \in \{0,1\}} \sum_{z \in \{0,1\}} p_{ij}(x, z) \, p(x, z)$$
Estimate $p(x_{ij}, z_{ij})$ by the proportion in area i with covariates $x_{ij}$, $z_{ij}$.
The Census does not contain these cross-classifications.
Estimate by the product of the two marginals (Lasserre et al.).

Our second covariate is defined as having a manual or non-manual job, so for each constituency we need to know the number of Muslims who have a manual job, the number of Muslims who have a non-manual job, and the same for non-Muslims. The average probability of voting Labour in each area is then a sum over all covariate combinations: the probability that a Muslim with a manual job votes Labour times the probability of being a Muslim with a manual job, and so on.

We can estimate the probability distribution of the covariates, $p(x, z)$, by the proportion in each area with covariates x and z. However, the Census does not contain these cross-classifications. Lasserre et al. demonstrated that, in a typical case with two covariates, bias is negligible when the joint covariate distribution is estimated by the product of the two marginal distributions, even when the covariates are correlated. So we can estimate the probability of being a Muslim with a manual job by the probability of being Muslim times the probability of having a manual job. These marginals are in the Census, so we are OK.
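
A minimal sketch of the two-covariate case, using the product-of-marginals approximation described above. The symbol `gamma` for the socio-economic status effect, the argument `zbar` for the proportion in the z = 1 category, and the function names are assumptions introduced for illustration; the slides do not name them, and no interaction term is included here.

```python
import math

def inv_logit(eta):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-eta))

def area_probability_two_covariates(xbar, zbar, mu, beta, gamma):
    """Average probability over the four (x, z) cells of an area.

    The joint proportion p(x, z) is approximated by the product of the
    two marginal proportions, as the cross-classification is unavailable.
    """
    total = 0.0
    for x in (0, 1):
        for z in (0, 1):
            p_x = xbar if x == 1 else 1.0 - xbar
            p_z = zbar if z == 1 else 1.0 - zbar
            total += p_x * p_z * inv_logit(mu + beta * x + gamma * z)
    return total

# Hypothetical constituency: 10% Muslim, 40% in the z = 1 category
p_i = area_probability_two_covariates(0.10, 0.40, mu=-0.5, beta=2.0, gamma=-0.6)
print(p_i)
```

With `gamma` set to zero the expression reduces to the single-covariate formula for $p_i$, which is a quick sanity check on the cell weights.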

19 Odds ratio of voting Labour for Muslims = 9.45 (3.20, 19.81)
This is a plot of the predicted probability of voting Labour by proportion Muslim in a constituency, for all the models. The odds ratio of voting Labour for Muslims compared with non-Muslims is about 9.5, so Muslims are more likely to vote Labour than non-Muslims.

20 Comparison of voting behaviour in 2001 and 2005
What we are really interested in is whether Muslims changed their voting behaviour between the 2001 and 2005 general elections.
Individual model for 2001 election
Individual model for 2005 election

These are the individual-level models fitted for 2001 and 2005, but we fitted the models with the aggregate data as well. Both elections share a common contextual effect of Muslim and a common effect of individual socio-economic status, but have different individual effects of being Muslim and different random effects. We are interested in the difference between the $\beta_{01}$ and $\beta_{05}$ parameters.
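
One way to write down the structure described in these notes is sketched below. Only $\beta_{01}$ and $\beta_{05}$ are named in the source; the symbols $\gamma$ (contextual effect of Muslim), $\delta$ (individual socio-economic status effect), the covariate $z_{ij}$ and the superscripts marking the election are assumptions made for illustration, and the sketch leaves open whether an interaction term was retained.

```latex
\begin{align*}
\operatorname{logit} p^{(01)}_{ij} &= \mu^{(01)}_i + \beta_{01}\, x_{ij} + \gamma\, \bar{x}_i + \delta\, z_{ij} \\
\operatorname{logit} p^{(05)}_{ij} &= \mu^{(05)}_i + \beta_{05}\, x_{ij} + \gamma\, \bar{x}_i + \delta\, z_{ij}
\end{align*}
```

Under this reading, the quantity of interest is $\beta_{01} - \beta_{05}$, which on the odds-ratio scale is $\exp(\beta_{01} - \beta_{05})$.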

21 Results: odds ratios

Individual Muslim effect, 2001            8.32 (3.99, 16.47)
Individual Muslim effect, 2005            3.55 (1.48, 6.73)
Difference in individual Muslim effect    2.51 (1.18, 4.61)
Socio-economic status                     0.52 (0.45, 0.59)

Here we can see that in both elections Muslims are more likely to vote Labour than non-Muslims: these are the $\beta_{01}$ and $\beta_{05}$ parameters. In 2005 Muslims were less likely to vote Labour than in 2001; the odds ratio for the difference between the $\beta_{01}$ and $\beta_{05}$ parameters is 2.5. Also, people with a non-manual job are less likely to vote Labour than those with a manual job.
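
The following sketch shows how the reported odds ratios relate to the log-odds parameters, assuming the difference row is $\exp(\beta_{01} - \beta_{05})$. The variable names are hypothetical. The ratio of the two reported point estimates comes out near 2.34 rather than the reported 2.51; a plausible explanation, stated here only as an assumption, is that the reported figure summarises the distribution of the ratio itself, which need not equal the ratio of the two separate summaries.

```python
import math

# Point estimates reported on the results slide
or_2001 = 8.32
or_2005 = 3.55

# Odds ratios are exponentiated log-odds parameters
beta01 = math.log(or_2001)
beta05 = math.log(or_2005)

# Ratio of the two point estimates on the odds-ratio scale
ratio_of_point_estimates = math.exp(beta01 - beta05)
print(round(ratio_of_point_estimates, 2))
```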

22 Conclusions
- Muslims are more likely to vote Labour than non-Muslims.
- Muslims did significantly change their voting behaviour between 2001 and 2005: in 2005 they were less likely to support Labour than in 2001.
- We need to find and include more individual Muslim data in our analysis.

Jackson, C. H., Best, N. G. and Richardson, S. (2006). Improving ecological inference using individual-level data. Statist. Med., 25.
Lasserre, V., Guihenneuc-Jouyaux, C. and Richardson, S. (2000). Biases in ecological studies: utility of including within-area distribution of confounders. Statist. Med., 19, 45-59.

So to conclude: Muslims are more likely to vote Labour than non-Muslims, and they did significantly change their voting behaviour between 2001 and 2005, being less likely to support Labour in 2005 than in 2001. There are not many Muslims in our individual-level data, and we would like to find and include more in our analysis.

