Published by Branden Stewart Crawford. Modified over 8 years ago.
1
Using geolocated Twitter traces to infer residence and mobility
Nigel Swier, Bence Kormaniczky, and Ben Clapperton
2
Background
ONS Big Data Project: this is one of four pilots exploring the use of big data for official statistics.
Users tweeting from a smartphone have the option to attach a GPS location.
More than 300,000 such tweets are sent daily within GB.
The data are relatively accessible.
Can these data be used to infer residence and mobility patterns?
3
Age Distribution of UK Twitter Users
4
Data Acquisition
Target data: all geolocated tweets sent within Great Britain between 1 April and 31 October 2014.
Sources: a combination of the Twitter API and procured data (GNIP).
Total: 81.4 million tweets.
Stored as JSON documents in MongoDB.
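As an illustration of the acquisition step, the sketch below pulls the fields the later analysis needs (user, timestamp, GPS point) out of one stored tweet. It assumes the Twitter API v1.1 JSON layout, in which an exact GPS location is a GeoJSON Point under "coordinates" in [longitude, latitude] order and "created_at" is a UTC string. Procured GNIP data may use a different layout, and the GeoTweet and parse_geotweet names are hypothetical.

```python
import json
from datetime import datetime
from typing import NamedTuple, Optional


class GeoTweet(NamedTuple):  # hypothetical record type for illustration
    user_id: str
    created_at: datetime
    lon: float
    lat: float


def parse_geotweet(raw: str) -> Optional[GeoTweet]:
    """Extract user, timestamp and GPS point from one tweet JSON document.

    Assumes the Twitter API v1.1 layout (an assumption, not the project's code).
    Returns None for tweets that carry no exact GPS point.
    """
    tweet = json.loads(raw)
    point = tweet.get("coordinates")
    if not point or point.get("type") != "Point":
        return None
    lon, lat = point["coordinates"]  # GeoJSON order is [lon, lat]
    # v1.1 timestamps look like "Fri Aug 15 08:30:00 +0000 2014"
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return GeoTweet(tweet["user"]["id_str"], created, float(lon), float(lat))
```

Tweets without a point (for example, those tagged only with a place name) are skipped, which matches the target of GPS-located tweets only.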
5
Distribution of user activity
6
Distribution of persistence levels
Chart of user frequency counts; users with geolocated tweets on only one day are not shown.
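A minimal sketch of how a persistence distribution like this could be tabulated, assuming "persistence" means the number of distinct days on which a user sent at least one geolocated tweet (the slide does not define it). The function name and the min_days cut-off, which mimics the exclusion of single-day users, are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Set, Tuple


def persistence_distribution(
    tweets: Iterable[Tuple[str, date]], min_days: int = 2
) -> Dict[int, int]:
    """Map persistence level (distinct active days) -> number of users.

    Assumption: persistence = count of distinct calendar days with at least
    one geolocated tweet. Users below min_days are dropped from the result.
    """
    days_by_user: Dict[str, Set[date]] = {}
    for user_id, day in tweets:
        days_by_user.setdefault(user_id, set()).add(day)
    levels = Counter(len(days) for days in days_by_user.values())
    return {level: n for level, n in sorted(levels.items()) if level >= min_days}
```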
7
Geolocated Twitter volumes by device type, Great Britain, 15 August to 31 October 2014
8
Lots of activity in different places, but where does this person* live?
* This example is based on real data but has been altered to prevent identification.
9
DBSCAN
DBSCAN (Density-Based Spatial Clustering of Applications with Noise), developed by Ester et al. (1996).
ε (eps) = neighbourhood distance (radius)
minPts = minimum number of points required to form a cluster
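To make the two parameters concrete, here is a plain-Python sketch of the standard DBSCAN algorithm applied to one user's tweet locations. It assumes the points are already projected to easting/northing in metres, so that Euclidean distance is meaningful; the function names are hypothetical, and this is not necessarily the implementation used in the project. scikit-learn provides the same algorithm as sklearn.cluster.DBSCAN(eps=..., min_samples=...).

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (easting, northing) in metres: an assumption
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if math.hypot(x - xi, y - yi) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (hypothetical helper).

    Brute-force O(n^2) neighbour search: fine for one user's tweets only.
    """
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also core: keep expanding
        cluster += 1
    return labels
```

For example, dbscan([(0, 0), (1, 0), (0, 1), (1, 1), (500, 500)], eps=2, min_pts=3) labels the four nearby points as cluster 0 and the isolated point as noise (-1). As in scikit-learn's min_samples, the point itself counts towards min_pts.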
10
Map legend: raw data, cluster centroid, noise

Cluster_id | Northing | Easting | Count | Type
60033_1 | 105?31 | 530?02 | 28 | Residential
60022_2 | 104?41 | 530?94 | 4 | Residential
60033_6 | 182?46 | 532?10 | 13 | Commercial
60033_13 | 104?56 | 531?17 | 3 | Commercial
60033_15 | 179?30 | 533?95 | 3 | Commercial
60033_21 | 165?47 | 532?51 | 3 | Commercial

Most likely lives here: "Dominant Residential Cluster"
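A sketch of the step from clusters to a "dominant residential cluster": compute each cluster's centroid and tweet count, then pick the residential cluster with the most tweets. The classification is passed in as a plain dictionary standing in for an AddressBase lookup of each centroid. That lookup, the "largest count wins" rule, and the function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (easting, northing)


def cluster_summaries(
    points: Sequence[Point], labels: Sequence[int]
) -> Dict[int, Tuple[Point, int]]:
    """Centroid and tweet count per cluster; noise (label -1) is skipped."""
    members: Dict[int, List[Point]] = defaultdict(list)
    for p, label in zip(points, labels):
        if label != -1:
            members[label].append(p)
    return {
        c: ((sum(x for x, _ in ps) / len(ps), sum(y for _, y in ps) / len(ps)), len(ps))
        for c, ps in members.items()
    }


def dominant_residential_cluster(
    summaries: Dict[int, Tuple[Point, int]], cluster_type: Dict[int, str]
) -> Optional[int]:
    """Residential cluster with the most tweets, or None if there is none.

    cluster_type is a hypothetical stand-in for an AddressBase classification.
    """
    residential = [c for c in summaries if cluster_type.get(c) == "Residential"]
    if not residential:
        return None
    return max(residential, key=lambda c: summaries[c][1])
```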
11
Time of day profile by address type
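One way to build a time-of-day profile is to count each address type's tweets by hour and normalise to shares, so that types with very different volumes can be compared on one chart. This is a sketch under assumptions: tweets are taken to be already labelled with the address type of the cluster they fall in, and their timestamps already converted from the UTC used by the Twitter API to UK local time. The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def time_of_day_profile(
    tweets: Iterable[Tuple[datetime, str]]
) -> Dict[str, List[float]]:
    """Share of each address type's tweets sent in each hour 0-23.

    Assumes (timestamp in local time, address type) pairs as input.
    """
    counts: Dict[str, Counter] = {}
    for ts, address_type in tweets:
        counts.setdefault(address_type, Counter())[ts.hour] += 1
    profiles: Dict[str, List[float]] = {}
    for address_type, by_hour in counts.items():
        total = sum(by_hour.values())
        profiles[address_type] = [by_hour[h] / total for h in range(24)]
    return profiles
```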
12
Geolocated penetration rates* by local authority
* Users with a dominant residential cluster spanning a date range of at least one month.
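A hedged sketch of a penetration-rate calculation: count users whose dominant residential cluster spans at least one month, then divide by the local authority's population. The denominator (a resident population figure), the approximation of "one month" as 30 days, the assignment of each user to the local authority containing their dominant residential cluster, and the function name are all assumptions.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def penetration_rates(
    users: Iterable[Tuple[str, date, date]],
    population: Dict[str, int],
    min_span: timedelta = timedelta(days=30),
) -> Dict[str, float]:
    """Qualifying users per resident, by local authority (hypothetical helper).

    users: (local authority, first tweet date, last tweet date) for each user's
    dominant residential cluster. Local authorities without a population
    figure are ignored.
    """
    counts: Dict[str, int] = {la: 0 for la in population}
    for la, first, last in users:
        if la in counts and last - first >= min_span:
            counts[la] += 1
    return {la: counts[la] / population[la] for la in population if population[la] > 0}
```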
13
Student mobility
14
Conclusions
Twitter may be useful for identifying short-term mobility patterns.
DBSCAN can identify anchor points, and AddressBase can classify them.
Results are indicators, NOT estimates. It may be possible to produce new de facto population statistics.
Twitter could help inform public policy, but we need to be extremely alert to changes in the source.
15
Next Steps
Technical report to be published shortly.
Developing methods for inferring socio-demographic characteristics.
Development of an estimation framework (including a benchmarking survey).