Numbers, places, decisions: connecting statistical and geographic data
Bill Roberts, Swirrl (@billroberts)
Objective: ‘analysis ready data’
George Percivall mentioned this morning the theme from the September workshop: ‘analysis ready data’. This is very much the objective here, where the scope is ‘all available statistical data about the places I’m interested in’.
We’re definitely talking about geodata, and we need it to be linked. Is it big? Of the volume/velocity/variety dimensions, it is ‘biggest’ in the sense of variety, and that’s what makes it challenging. Naturally, volume and velocity are increasing too.
A few hundred billion statistical observations
Hundreds of government organisations
Getting bigger: finer grained in space and time
Big in terms of organisational effort: must be distributed
Many organisations providing data
Finding the data you want
Dealing with inconsistency
Joining the data together
Different audiences want different things
As well as different ‘technical levels’ for how people want to use it, there are all kinds of niches in terms of the topics that user communities are interested in. There is a role for an intermediary to gather and present data to communities like this.
We’re generally talking about multidimensional statistical data (data cubes), and of the dimensions the geospatial one is usually the richest and most important. Each data point typically refers to an area or a point, so there is one spatial dimension, not three. There is a lot of ‘structure’ in the spatial dimension.
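As a minimal sketch of the data cube idea (an illustration, not the speaker's implementation), observations can be keyed by one value per dimension, with an area code as the single spatial dimension. The area codes, the parent lookup and the counts below are all made-up placeholders; the parent lookup stands in for the ‘structure’ in the spatial dimension.

```python
from collections import defaultdict

# Hypothetical cube: (area, year, measure) -> value.
# All codes and numbers are placeholders, not real statistics.
cube = {
    ("area-A", 2015, "population"): 100,
    ("area-B", 2015, "population"): 250,
    ("area-A", 2016, "population"): 110,
    ("area-B", 2016, "population"): 240,
}

# Hypothetical spatial hierarchy: each small area sits inside a larger one.
parent_area = {"area-A": "region-1", "area-B": "region-1"}


def slice_by_area(cube, area):
    """Fix the spatial dimension and return the remaining (year, measure) -> value."""
    return {
        (year, measure): value
        for (a, year, measure), value in cube.items()
        if a == area
    }


def roll_up(cube, parent_area):
    """Sum values from each area into its parent area."""
    totals = defaultdict(int)
    for (area, year, measure), value in cube.items():
        totals[(parent_area[area], year, measure)] += value
    return dict(totals)
```

Summing in `roll_up` is only meaningful for additive measures such as counts; rates or percentages would need a weighted aggregation instead.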
Many ways to refer to a place:
M1 1EZ (postcode)
53.481751, -2.234625 (latitude, longitude)
384526, 398362 (easting, northing)
Manchester
Dale Street
Near the station
Use standard spatial techniques and indexing techniques to add some intelligence to this.
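One way to give a vague reference like ‘near the station’ some meaning is a nearest-neighbour lookup against a gazetteer. The sketch below is an assumption about how that could look, not a method from the talk: it uses the haversine formula and a linear scan, where a production system would more likely use a proper spatial index. The gazetteer entries are hypothetical and their coordinates are approximate.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical gazetteer; coordinates are approximate, for illustration only.
GAZETTEER = {
    "Manchester Piccadilly station": (53.4774, -2.2309),
    "Manchester Victoria station": (53.4875, -2.2422),
}


def nearest_place(lat, lon, gazetteer=GAZETTEER):
    """Return (name, distance in metres) of the closest gazetteer entry."""
    name = min(gazetteer, key=lambda n: haversine_m(lat, lon, *gazetteer[n]))
    return name, haversine_m(lat, lon, *gazetteer[name])


# The latitude/longitude pair shown on the slide.
name, distance_m = nearest_place(53.481751, -2.234625)
```

With a distance attached, ‘near’ becomes a threshold that can be chosen and tested rather than a free-text label.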
Edinburgh, City of
City of Edinburgh
Edinburgh
230
S12000036
And of course the data is messy: there are lots of ways to refer to the same thing.
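A common first step in dealing with this is an alias table that maps every known label to one canonical identifier. The sketch below uses the five labels from the slide; treating S12000036 as the canonical form, and the function names, are assumptions made for illustration.

```python
# Alias table built from the labels on the slide; keys are in normalised form.
ALIASES = {
    "edinburgh, city of": "S12000036",
    "city of edinburgh": "S12000036",
    "edinburgh": "S12000036",
    "230": "S12000036",
    "s12000036": "S12000036",
}


def normalise(label):
    """Case-fold a label and collapse runs of whitespace."""
    return " ".join(str(label).casefold().split())


def canonical_code(label, aliases=ALIASES):
    """Return the canonical area code for a label, or None if it is unknown."""
    return aliases.get(normalise(label))
```

For example, `canonical_code("  City of   Edinburgh ")` and `canonical_code(230)` both resolve to the same code, while an unknown label returns `None` so it can be flagged for review instead of being silently guessed.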
[Diagram. Publishers: Office for National Statistics, National Health Service, Department of Environment, Department for Work and Pensions, Local Government. Audiences: Developers, Analysts/Researchers, Fact-finders. Presentations: Dataset catalogue, Dataset downloads, Correlation tool, Poverty map, Data journalism, Sheep database.]
So, what’s the answer?
Decouple presentation method from the data
Distributed, consistent, standards-based publishing on the web
Not just one data portal
Not just one way to access the data: JSON API, SPARQL query, etc.
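To make the ‘SPARQL query’ access route concrete, here is a hedged sketch that builds a SPARQL-protocol GET request for observations about one area. It assumes the data is published with the W3C RDF Data Cube vocabulary and the SDMX `refArea` and `obsValue` properties, which the slides do not state. The endpoint and area URI are hypothetical placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and area URI; substitute those of a real publisher.
ENDPOINT = "https://example.org/sparql"
AREA_URI = "http://example.org/id/geography/S12000036"

# Assumes RDF Data Cube modelling; a real publisher may use other properties.
QUERY_TEMPLATE = """
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>

SELECT ?dataset ?value WHERE {{
  ?obs a qb:Observation ;
       qb:dataSet ?dataset ;
       sdmx-dimension:refArea <{area}> ;
       sdmx-measure:obsValue ?value .
}}
LIMIT 10
"""


def build_request_url(endpoint, area_uri):
    """Build a SPARQL-protocol GET URL asking for observations about one area."""
    query = QUERY_TEMPLATE.format(area=area_uri).strip()
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, AREA_URI)
```

Because the query is just a URL parameter, the same published data can sit behind a JSON API, a download page or a visualisation without the publisher changing the data itself.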
“The future is already here — it's just not very evenly distributed” William Gibson
What’s the solution?
Standards and governance
Knowledge and skills
Tools
Culture and mindset
Think ‘supply chain’
This is mostly not a technological issue.
Standards and governance: standards for metadata, data model, vocabularies and codelists; setting and disseminating what the standards are, while allowing for evolution and special cases.
Knowledge and skills: knowledge about the web, and the skills to use web-based data.
Tools: for data creation and data consumption, validation, etc.
Culture and mindset: thinking about data as something valuable in itself, and accepting that producing good data is part of your job.
Supply chain: at the moment most of this data is not ‘analysis ready’. Every analyst has to get their own data ready. It’s a cottage industry: if you’re a furniture maker, you have to go out to the forest and chop down your own tree. We want to industrialise this process and allow people to specialise in different parts of it.