School of Geography, University of Leeds

Presentation on theme: "School of Geography, University of Leeds"— Presentation transcript:

1 School of Geography, University of Leeds
The effects of small cell adjustment on origin-destination data in the 2001 Census
Oliver Duke-Williams, School of Geography, University of Leeds
Additional contributions to be made by Eileen Howes of the GLA

2 Effects of SCAM On the SMS On the SWS / STS

3 Questions Does SCAM affect the o-d data?
Does it affect it in a different way to other outputs from the 2001 Census? What can users do? Are there any better approaches?

4 Assumptions about SCAM
Cells with initial values of 1 or 2 are adjusted
A 1 may become 0 or 3, with 0 being more likely
A 2 may become 0 or 3, with 3 being more likely
We can’t distinguish between ‘genuine’ 0s or 3s, and cells that have been adjusted
Cells with initial values of 4 or more remain the same
The actual method is not published, therefore all work has to be based on assumptions about the method employed
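The assumptions above can be sketched in code. This is only an illustration: the actual method is unpublished, so the probabilities used here (1/3 and 2/3, chosen so that the expected adjusted value equals the original value) are an assumption, as is the function name `scam_adjust` and the treatment of 0 and 3 as unchanged.

```python
import random

# Assumed probabilities: the real SCAM procedure is unpublished.
# They are chosen so the expected adjusted value equals the original value.
P_ONE_BECOMES_THREE = 1 / 3  # a 1 becomes 3 with probability 1/3, otherwise 0
P_TWO_BECOMES_THREE = 2 / 3  # a 2 becomes 3 with probability 2/3, otherwise 0


def scam_adjust(value, rng):
    """Return a hypothetical adjusted value for one interior cell."""
    if value == 1:
        return 3 if rng.random() < P_ONE_BECOMES_THREE else 0
    if value == 2:
        return 3 if rng.random() < P_TWO_BECOMES_THREE else 0
    # Values of 4 or more remain the same; 0 and 3 are assumed unchanged too.
    return value


rng = random.Random(2001)  # fixed seed so the sketch is repeatable
adjusted_ones = [scam_adjust(1, rng) for _ in range(10_000)]
adjusted_twos = [scam_adjust(2, rng) for _ in range(10_000)]
mean_of_ones = sum(adjusted_ones) / len(adjusted_ones)
mean_of_twos = sum(adjusted_twos) / len(adjusted_twos)
```

Under these assumed probabilities the averages stay close to 1 and 2, while every individual cell ends up as 0 or 3.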

5 SMS Level 3 Output area to output area flows 1,799,061 flows
Each flow is presented as a single age-by-sex table
There are 1.8 million non-zero flows; the total number of potential flows is all OAs to all OAs. This is a vast number: the number of OAs squared is about 49 billion. The 1.8 million flows therefore indicate very sparse data: about 0.004% of the matrix of possible flows.

6 Table MG301 The table framework for MG301 is very simple

7 SMS Level 3 Average flow total is 3.58 persons
Distribution of flow totals reveals obvious effects of SCAM
The average flow is 3.58 across the 1.8 million non-zero flows. This average is for total migrants (cell 1 in the framework).

8 Distribution of flow sizes, MG301
Note log scale on y axis Large number of small flows; small number of large flows

9 Distribution of flow sizes, MG301
This expands part of the previous graph, focusing on flows of up to 50 migrants. The mode is at 3, but there are also pronounced peaks at 6, 9 and (just visible) 12, i.e. multiples of 3. Note that other values (1, 2, …) also occur: MG301 is not modified in Scotland, therefore these values remain.
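The clustering described here can be quantified with a simple diagnostic: the share of flows whose total is a multiple of 3. The sketch below is illustrative only; the function name and the example counts are hypothetical and are not the MG301 figures.

```python
def share_at_multiples_of_three(flow_counts):
    """Share of flows whose total is a multiple of 3.

    flow_counts maps a (non-zero) flow total to the number of flows
    observed with that total.
    """
    all_flows = sum(flow_counts.values())
    clustered = sum(n for total, n in flow_counts.items() if total % 3 == 0)
    return clustered / all_flows


# Hypothetical counts, for illustration only (not the MG301 distribution).
example_counts = {1: 50, 2: 40, 3: 600, 4: 60, 5: 30, 6: 150, 7: 20, 9: 50}
example_share = share_at_multiples_of_three(example_counts)
```

A share far above what neighbouring flow sizes would suggest is a sign that the totals have been built from adjusted cells.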

10 SMS Level 3 Average flow total is 3.58 persons
Distribution of flow totals reveals obvious effects of SCAM But… SCAM only affects interior cells Previous slides have mentioned totals; but note that SCAM affects the interior cells only (i.e. those base cells within the framework from which totals and subtotals are calculated)

11 Table MG301 Highlights interior cells

12 SMS Level 3 Average flow total is 3.58 persons
Distribution of flow totals reveals obvious effects of SCAM
But… SCAM only affects interior cells
Average of interior cells is 0.60 persons
The average of these cells is very low, so they would seem to be susceptible to SCAM.

13 Is this important?
The averages are affected by large numbers of small flows: are most migrants in large flows?
Number of flows for which all interior cells are equal to 4 or more is… 2
Is this really a problem? Recall from an earlier slide that there are a small number of large flows. What proportion of migrants are in such flows? Pretty much all migrants are in fairly small flows: 99% are in flows of 16 or fewer migrants. How many flows have all interior cells above the assumed SCAM threshold? It turns out to be only 2 flows!

14 So, how many are slightly SCAMMED?
Number of unmodified interior cells    Number of flows    Number of migrants
0                                      1,761,815          5,852,646
1                                      31,973             218,929
2                                      4,626              78,013
3                                      457                8,428
4                                      182                6,435
5                                      6                  388
6                                      2                  157
1.76 million flows are ones in which 0 interior cells have the value 4 or higher, i.e. all cells are potentially modified.
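As a rough check on the scale of the problem, the share of flows with no unmodified interior cell can be worked out from the figures quoted in the slides. The variable names below are illustrative.

```python
total_flows = 1_799_061        # SMS Level 3 non-zero flows
fully_exposed_flows = 1_761_815  # flows with 0 interior cells of 4 or more
fully_exposed_migrants = 5_852_646  # migrants counted in those flows

# Share of all OA-to-OA flows in which every interior cell may have been modified
share_fully_exposed = fully_exposed_flows / total_flows

# Average reported size of such a flow
mean_size_fully_exposed = fully_exposed_migrants / fully_exposed_flows

print(f"{share_fully_exposed:.1%}")  # prints 97.9%
```

On these figures, roughly 98 in every 100 flows have no interior cell at or above the assumed threshold.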

15 SMS Level 2 Ward to ward flows 1,275,067 flows
Each flow is presented in 5 disaggregate tables:
Age by sex
Moving groups
Ethnic group by sex
Moving groups by NS-SEC of group reference person
Moving groups by tenure
There are fewer ward to ward flows, but this is in a smaller matrix (10,608^2). The sparsity is about 1.1% (i.e. 1.1% of the matrix is filled).
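The sparsity figures can be reproduced as the number of non-zero flows divided by the number of possible flows. In this sketch the ward figure uses the 10,608 wards quoted above; the OA figure uses the rounded total of 49 billion possible flows quoted for Level 3, so it is approximate. The function name is illustrative.

```python
def sparsity(non_zero_flows, possible_flows):
    """Fraction of the origin-destination matrix that is filled."""
    return non_zero_flows / possible_flows


wards = 10_608
ward_sparsity = sparsity(1_275_067, wards ** 2)

# 49 billion is the rounded number of possible OA-to-OA flows quoted for Level 3.
oa_sparsity = sparsity(1_799_061, 49e9)

print(f"wards: {ward_sparsity:.1%}")  # prints wards: 1.1%
print(f"OAs: {oa_sparsity:.3%}")      # prints OAs: 0.004%
```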

16 SMS Level 2 Using the two tables of ‘migrants’, the average flow is:
4.91, according to table MG201
4.86, according to table MG203
Distribution also varies

17 MG201 and MG203
Note that clustering around multiples of 3 is less pronounced for MG203, which builds the total from fewer internal cells.

18 SMS Level 2 The number of migrants can also be determined by summing across moving group tables MG202 allows the total to be constructed with the fewest components

19 Table MG202 Average flow is 4.62
Distribution of values is less clustered Can generate total migrants from cells 3 and 4 The average flow (i.e. total migrants) gets slightly lower as the number of component cells is reduced – is this coincidence or a pattern?

20 MG201, MG203 and MG202
Top two graphs are as before; note that clustering is further reduced in the bottom graph (MG202).

21 SMS Level 1 ‘District’ to ‘district’ flows 133,490 flows
Flow total averages are all similar:
MG101: average 46.36
MG102: average 46.38
MG103: average 46.38
MG104: average 46.37
However, distribution of flows is still clustered

22 Table MG101, total flows
Note that this graph includes a large number of observations of 0. This is because it is based on the set of flows for which any of the tables MG101-MG110 are present; in many cases MG101 (or any given table) is not present. Note the dominance of multiples of 3.

23 SMS Level 1 Definition of total migrants with fewest components is using table MG106 Cells

24 Effects on the SMS Distributions of flow totals are clustered
Does this matter if the data are spatially aggregated?

25 Level 3 data aggregated to Level 1
The effects of adding together lower level data. The top graph is Level 3 (OA) data aggregated to LA level; the other graph is Level 1 data. Aggregation does not ‘even out’ the effects of SCAM.

26 Effects of SCAM Problems persist through aggregation of small units to large areas
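One way to see why aggregation does not remove the clustering: if every small-area total has already been pushed to a multiple of 3, any sum of those totals is also a multiple of 3. The sketch below uses hypothetical area codes, a hypothetical lookup and made-up flow values purely for illustration.

```python
from collections import defaultdict

# Hypothetical adjusted OA-to-OA flows: (origin OA, destination OA, total).
oa_flows = [
    ("OA1", "OA3", 3),
    ("OA1", "OA4", 6),
    ("OA2", "OA3", 3),
    ("OA2", "OA4", 3),
    ("OA3", "OA1", 9),
]

# Hypothetical lookup from each OA to the larger area that contains it.
oa_to_district = {"OA1": "A", "OA2": "A", "OA3": "B", "OA4": "B"}


def aggregate(flows, lookup):
    """Sum small-area flows into flows between the larger areas."""
    totals = defaultdict(int)
    for origin, destination, count in flows:
        totals[(lookup[origin], lookup[destination])] += count
    return dict(totals)


district_flows = aggregate(oa_flows, oa_to_district)
# Every aggregated total is still a multiple of 3.
still_clustered = all(total % 3 == 0 for total in district_flows.values())
```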

27 Effects on SWS Similar problems arise with the SWS
Pattern of clustering is different

28 SWS Level 3 Output area to output area 5,951,376 flows
Each flow is presented in one table (method of transport to work)

29 Table W301 Note extreme dominance of flows of 3

30 SWS Level 3 85% of flows are shown as 3
This is likely to exacerbate aggregation problems i.e. 85% of all OA-OA flows have a total of 3.

31 SWS Level 2 Ward to ward flows 2,108,999 flows
Each flow is presented in 6 tables
Average total flow is around 11.5
35% of flows have a total of 3

32 SWS Level 2 All 6 tables have same definitions
This suggests a possible solution Use average of all 6 totals, rounding up to nearest integer

33 SWS Level 2, average flow
This is the distribution of average flows, calculated as ceil(average of all separate totals), where ceil means round up to the next integer. It certainly looks more like what we would expect, but is it useful?
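The rounded average can be sketched as follows. The function name and the example totals are hypothetical; the calculation itself is the ceil-of-mean described above, written with integer arithmetic to avoid floating-point rounding surprises.

```python
def rounded_average(totals):
    """Average of the per-table totals, rounded up to the next integer."""
    # -(-a // b) is integer ceiling division for positive b.
    return -(-sum(totals) // len(totals))


# Hypothetical totals for one ward-to-ward flow across the six tables.
example_totals = [3, 6, 3, 0, 6, 3]
example_average = rounded_average(example_totals)  # 21 / 6 = 3.5, rounded up to 4
```

Note that a single table reporting 3 while the other five report 0 still yields a rounded average of 1, so the result is never 0 unless every table reports 0.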

34 Rounded average
The distribution of values looks ‘better’ to users of data
However, the average is inappropriate to many tables: too high or too low to generate appropriate rates
Is such a value useful?

35 Alternatives to small cell adjustment
Provision of independent total
Provision of asymmetric flows
Don’t try to provide OA level data?
1. Get ONS et al. to provide a non-modified count solely of the total flow (or, if that is not possible, a count modified only on the basis of itself).
2. Provide asymmetric flows, e.g. ward to OA. This was done in the past and was unpopular, but I believe software is more suited to such data now.
3. If the data are left as is, was it worth it?

