Using geolocated Twitter traces to infer residence and mobility Nigel Swier, Bence Kormaniczky, and Ben Clapperton.



Background
ONS Big Data Project: this is one of four pilots exploring the use of big data for official statistics.
Users tweeting from a smartphone have the option to provide a GPS location.
More than 300,000 such tweets are sent daily within GB.
The data are relatively accessible.
Can these data be used to infer residence and mobility patterns?

Age Distribution of UK Twitter Users

Data Acquisition
Target data: all geolocated tweets sent within Great Britain between 1 April 2014 and 31 October 2014.
Source: a combination of the Twitter API and procured data (GNIP).
Volume: 81.4 million tweets.
Stored as JSON files in MongoDB.
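
As an illustration of what working with the stored JSON might look like, the sketch below pulls a user id, timestamp and GPS point out of a single tweet. It assumes the Twitter REST API v1.1 tweet layout (a GeoJSON `coordinates` field and a `created_at` string); records procured through GNIP may be laid out differently, and the function name `extract_point` and the sample values are hypothetical, not taken from the project.

```python
import json
from datetime import datetime

# Timestamp layout used in v1.1 tweet objects, e.g. "Wed Aug 27 13:08:45 +0000 2014".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_point(raw_json):
    """Return (user_id, timestamp, lon, lat), or None if the tweet has no GPS point."""
    tweet = json.loads(raw_json)
    geo = tweet.get("coordinates")
    if not geo or geo.get("type") != "Point":
        return None  # tweet was not geolocated with an exact point
    lon, lat = geo["coordinates"]  # GeoJSON order: longitude first, then latitude
    timestamp = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
    return tweet["user"]["id_str"], timestamp, lon, lat


# Made-up sample record for illustration only.
sample = json.dumps({
    "created_at": "Wed Aug 27 13:08:45 +0000 2014",
    "user": {"id_str": "12345"},
    "coordinates": {"type": "Point", "coordinates": [-0.14, 50.82]},
})
no_gps = json.dumps({
    "created_at": "Wed Aug 27 13:08:45 +0000 2014",
    "user": {"id_str": "12345"},
    "coordinates": None,
})

print(extract_point(sample))
print(extract_point(no_gps))
```

Longitude and latitude would still need converting to eastings and northings before any distance-based clustering in metres.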

Distribution of user activity

Distribution of persistence levels
[Figure: user frequency count by persistence level. Users with geolocated tweets on just one day are not shown.]

Geo-located Twitter volumes by Device Type: Great Britain, 15 August to 31 October 2014

Lots of activity in different places, but where does this person* live?
* This example is based on real data but has been altered to prevent identification.

DBSCAN
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
ε = distance (radius)
minpts = minimum points to define a cluster
Developed by Ester et al. (1996)
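
To make the roles of the distance radius and minpts concrete, here is a minimal pure-Python sketch of the DBSCAN idea. It is not the project's implementation: the brute-force neighbour search, the parameter values (`eps=50`, `minpts=3`) and the sample coordinates are all assumptions chosen for illustration.

```python
from math import hypot

NOISE = -1


def dbscan(points, eps, minpts):
    """Label each (easting, northing) point with a cluster id, or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i, including i itself.
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbours(i)
        if len(seeds) < minpts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= minpts:
                queue.extend(reachable)  # j is also a core point: keep growing
        cluster += 1
    return labels


# Illustrative coordinates in metres: two dense groups and one isolated point.
points = [(0, 0), (10, 0), (0, 10), (10, 10),
          (1000, 1000), (1010, 1000), (1000, 1010),
          (5000, 5000)]
labels = dbscan(points, eps=50, minpts=3)
print(labels)
```

A larger radius merges nearby groups into one cluster, while a larger minpts pushes sparse groups into noise, so both values would need tuning against real tweet densities.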

Map legend: Raw Data, Cluster Centroid, Noise

Cluster_id   Northing   Easting   Count   Type
60033_1      105?31     530?02    28      Residential
60022_2      104?41     530?94    4       Residential
60033_6      182?46     532?10    13      Commercial
60033_13     104?56     531?17    3       Commercial
60033_15     179?30     533?95    3       Commercial
60033_21     165?47     532?51    3       Commercial

Most likely lives here: “Dominant Residential Cluster”
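
One way to read "Dominant Residential Cluster" is as the residential cluster containing the most tweets. The slide does not spell out the rule, so that reading is an assumption, and the function name `dominant_residential` is hypothetical. The sketch applies it to the cluster ids, counts and types from the table above.

```python
# Cluster summaries copied from the slide's table (coordinates omitted).
clusters = [
    {"cluster_id": "60033_1", "count": 28, "type": "Residential"},
    {"cluster_id": "60022_2", "count": 4, "type": "Residential"},
    {"cluster_id": "60033_6", "count": 13, "type": "Commercial"},
    {"cluster_id": "60033_13", "count": 3, "type": "Commercial"},
    {"cluster_id": "60033_15", "count": 3, "type": "Commercial"},
    {"cluster_id": "60033_21", "count": 3, "type": "Commercial"},
]


def dominant_residential(clusters):
    """Return the residential cluster with the highest tweet count, or None."""
    residential = [c for c in clusters if c["type"] == "Residential"]
    if not residential:
        return None  # user has no cluster classified as residential
    return max(residential, key=lambda c: c["count"])


home = dominant_residential(clusters)
print(home["cluster_id"], home["count"])
```

Under this rule the user's inferred home is cluster 60033_1; users with no residential cluster get no inferred residence.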

Time of day profile by address type

Geolocated penetration rates* by local authority
* Dominant residential cluster with a date range of at least one month

Student mobility

Conclusions
Twitter may be useful for identifying short-term mobility patterns.
DBSCAN can identify anchor points and AddressBase can classify them.
Results are indicators, NOT estimates; it may be possible to produce new de facto-based population statistics.
Twitter could help inform public policy, but we need to be extremely alert to source changes.

Next Steps
Technical Report to be published shortly.
Developing methods for inferring socio-demographic characteristics.
Development of an estimation framework (including a benchmarking survey).