Estimating Individual Behaviour from Massive Social Data for An Urban Agent-Based Model. Nick Malleson & Mark Birkin, School of Geography, University of Leeds. ESSA 2012.

Presentation transcript:

Estimating Individual Behaviour from Massive Social Data for An Urban Agent-Based Model
Nick Malleson & Mark Birkin, School of Geography, University of Leeds
ESSA 2012

Outline
Research aim: develop a model of urban dynamics, calibrated using novel crowd-sourced data.
Background:
  Data for evaluating agent-based models
  Crowd-sourced data
Data and study area: Twitter in Leeds
Establishing behaviour from tweets
Integrating with a model of urban dynamics

Agent-Based Modelling
Autonomous, interacting agents
Represent individuals or groups
Usually spatial
Model social phenomena from the ground up
A natural way to describe systems
Ideal for social systems

Data in Agent-Based Models
Data required at every stage:
  Understanding the system
  Calibrating the model
  Validating the model
But high-quality data are hard to come by
  Many sources are too sparse, with low spatial/temporal resolution
  Censuses focus on attributes rather than behaviour and occur infrequently
Understanding social behaviour
  How to estimate leisure times / locations?
  Where to socialise?

Crowd-Sourced Data for Social Simulation
Movement towards use of massive data sets
  Fourth paradigm: data-intensive research (Bell et al., 2009) in the physical sciences
  “Crisis” in “empirical sociology” (Savage and Burrows, 2007)
New sources
  Social media: Facebook, Twitter, Flickr, FourSquare, etc.
  Volunteered geographical information (VGI: Goodchild, 2007): OpenStreetMap
  Commercial: loyalty cards, Amazon customer database, Acxiom
Potentially very useful for agent-based models
  Calibration / validation
  Evaluating models in situ (cf. meteorological models)

New Paradigms for Data Collection
(Successful) mobile apps to collect data
  Offer something to users
  New methodology for survey design (?)
E.g. mappiness
  Ask people about happiness
  Relate to environment, time, weather, etc.

Data and Study Area
Data from Twitter
  Restricted to tweets with GPS coordinates near Leeds
  ‘Streaming API’ provides real-time access to tweets
  Filtered out non-people and users with < 50 tweets
Before filtering
  2.4M+ geo-located tweets (June 2011 – Sept 2012)
  60,000+ individual users
  Highly skewed: 10% of tweets from the 32 most prolific users
After filtering
  2.1M+ tweets
  7,500 individual users
  Similar skew (10% of tweets from 28 users)
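The filtering and skew figures on this slide can be illustrated with a minimal sketch. It assumes each tweet is a `(user, text)` pair; the function names and the toy data are hypothetical and are not the authors' code. The "non-people" filter is not reproduced because the slide does not say how such accounts were identified.

```python
from collections import Counter

def filter_users(tweets, min_tweets=50):
    """Keep only tweets from users with at least `min_tweets` tweets."""
    counts = Counter(user for user, _ in tweets)
    return [(user, text) for user, text in tweets if counts[user] >= min_tweets]

def users_for_share(tweets, share=0.10):
    """Smallest number of most-prolific users who together produce `share` of all tweets."""
    counts = Counter(user for user, _ in tweets)
    target = share * len(tweets)
    running = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running >= target:
            return n
    return len(counts)

# Toy data (hypothetical): two prolific accounts and one occasional user.
tweets = (
    [("bot", f"t{i}") for i in range(60)]
    + [("alice", f"t{i}") for i in range(55)]
    + [("bob", f"t{i}") for i in range(3)]
)
kept = filter_users(tweets)
print(len(kept), users_for_share(kept))
```

On the real data, `users_for_share` is the kind of calculation behind statements such as "10% of tweets from 28 users".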

Prolific Users

Temporal Trends
Hourly peak in activity at 10pm
Daily peak on Tuesday to Thursday
General increase in activity over time
Old data…

Identifying actions
Need to estimate what people are doing when they tweet (the ‘key’ behaviours)
Analyse tweet text
  Automatic routine
  Keyword search for: ‘home’, ‘shop’, ‘work’
‘Home’ appears to be the area with the highest tweet density
Unfortunately, even tweets that match keywords are most dense around the home

Spatio-temporal text mining
Individual tweets show why keyword search fails:
  Work: “Does anyone fancy going to work for me? Don’t want to get up”
  Home: “Pizza ordered ready for ones arrival home”
  Shop: “Ah the good old sight of The White Rose shopping centre. Means I’m nearly home”
But still potential to estimate activity, e.g. “I’m nearly home”
Combination of spatial and textual analysis is required
Parallels in text mining (e.g. NaCTeM) and other fields (e.g. crime modus operandi or The Guardian analysis of recent British riots)
New research direction: “Spatial text mining”?
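The keyword routine and its failure mode can be sketched as follows. This is an illustrative guess at a naive matcher, not the authors' routine: the prefix-matching rule and the name `keyword_labels` are assumptions. The three example tweets are the ones quoted on the slide.

```python
import re

KEYWORDS = ("home", "shop", "work")

def keyword_labels(text):
    """Return the keywords that occur as a word prefix in the tweet ('shopping' matches 'shop')."""
    words = re.findall(r"[a-z]+", text.lower())
    return {k for k in KEYWORDS if any(w.startswith(k) for w in words)}

examples = {
    "work": "Does anyone fancy going to work for me? Don’t want to get up",
    "home": "Pizza ordered ready for ones arrival home",
    "shop": "Ah the good old sight of The White Rose shopping centre. Means I’m nearly home",
}

for label, text in examples.items():
    # Each tweet matches its keyword, yet none of the authors is actually
    # at work, at home, or shopping at the time of tweeting.
    print(label, sorted(keyword_labels(text)))
```

The third tweet matches both ‘shop’ and ‘home’, which shows that text alone cannot separate "passing a shopping centre" from "shopping"; the tweet's location relative to the user's home is needed as well.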

Analysis of Individual Behaviour – Anchor Points
Spatial analysis to identify the home locations of individual users
Some clear spatio-temporal behaviour (e.g. commuting, socialising etc.)
Estimate ‘home’ and then calculate distance from home at different times
Journey to work?
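A minimal sketch of the anchor-point idea, assuming that ‘home’ is the centre of the grid cell in which a user tweets most often (the talk only says that home appears to be the area of highest tweet density). The cell size, function names, and coordinates are illustrative assumptions.

```python
import math
from collections import Counter

def estimate_home(points, cell_deg=0.005):
    """Return the centre (lat, lon) of the grid cell containing most of the user's tweets."""
    cells = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in points
    )
    (i, j), _ = cells.most_common(1)[0]
    return ((i + 0.5) * cell_deg, (j + 0.5) * cell_deg)

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Hypothetical user: five tweets at one spot, two tweets elsewhere in the city.
points = [(53.8008, -1.5491)] * 5 + [(53.8067, -1.5562)] * 2
home = estimate_home(points)
distances = [haversine_km(home, p) for p in points]
print(home, [round(d, 2) for d in distances])
```

Pairing each distance with the tweet's timestamp gives the "distance from home at different times" series described on the slide.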

Spatio-Temporal Behaviour
More important than aggregate patterns, we can identify the behaviour of individual users
Estimate ‘home’ and then calculate distance at different times
Could estimate journey times, means of travel etc.
Very useful for calibration of an ABM

Activity Matrices (I)
Once the ‘home’ location has been estimated, it is possible to build a profile of each user’s average daily activity
The most common behaviour at a given time period takes precedence
[Figure: two activity matrices for users a to i, showing the ‘raw’ behavioural profiles and the same profiles after interpolating to remove no-data. Legend: At Home, Away from Home, No Data]
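One row of such a matrix could be built roughly as below. The sketch assumes hourly time periods and a nearest-hour fill for the interpolation step; the slides do not specify either, so both are assumptions, as are the names `raw_profile` and `fill_gaps`.

```python
from collections import Counter

HOME, AWAY, NO_DATA = "H", "A", "-"

def raw_profile(observations):
    """observations: iterable of (hour, state). Return 24 states, the most common one per hour."""
    by_hour = [Counter() for _ in range(24)]
    for hour, state in observations:
        by_hour[hour][state] += 1
    return [c.most_common(1)[0][0] if c else NO_DATA for c in by_hour]

def fill_gaps(profile):
    """Replace NO_DATA with the state of the nearest hour that has data (the day wraps around).

    Ties between two equally distant hours go to the lower hour index.
    """
    n = len(profile)
    known = [h for h, s in enumerate(profile) if s != NO_DATA]
    if not known:
        return list(profile)

    def circular_distance(a, b):
        d = abs(a - b)
        return min(d, n - d)

    return [
        s if s != NO_DATA
        else profile[min(known, key=lambda k: (circular_distance(h, k), k))]
        for h, s in enumerate(profile)
    ]

# Hypothetical user: mostly away at 08:00 and 14:00, at home at 22:00.
obs = [(8, AWAY), (8, AWAY), (8, HOME), (14, AWAY), (22, HOME)]
raw = raw_profile(obs)
filled = fill_gaps(raw)
print("".join(raw))
print("".join(filled))
```

Stacking one such row per user gives the matrix; the nearest-hour fill is deliberately crude and stands in for whatever interpolation is eventually used.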

Activity Matrices (II)
Overall, activity matrices appear reasonably realistic
  Peak in away-from-home activity at ~2pm
  Peak in at-home activity at ~10pm
Next stages:
  Develop a more intelligent interpolation algorithm (borrow from GIS?)
  Spatio-temporal text mining routines to use textual content to improve behaviour classification

Towards A Model of Urban Dynamics (I): Microsimulation
Simulation area: Leeds and a buffer zone
Microsimulation to synthesise individual-level population
  ~0.8M people in Leeds
  2.08M in simulation area
  Iterative reweighting
  Useful attributes (employment, age, etc.)
Data:
  UK Census Small Area Statistics
  Sample of Anonymised Records
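The slide names "iterative reweighting" without giving details. One common form is iterative proportional fitting, sketched here as a stand-in: sample records (playing the role of the Sample of Anonymised Records) are given weights so that their weighted totals match area-level counts (playing the role of the Small Area Statistics). The attribute names, categories, and counts are invented for illustration, and the sketch assumes every target count is positive and every record's category appears in the targets.

```python
def reweight(records, constraints, iterations=20):
    """Iterative proportional fitting of record weights to marginal totals.

    records: list of dicts mapping attribute -> category.
    constraints: dict mapping attribute -> {category: target count} (all targets > 0).
    """
    weights = [1.0] * len(records)
    for _ in range(iterations):
        for attr, targets in constraints.items():
            # Current weighted total for each category of this attribute.
            totals = dict.fromkeys(targets, 0.0)
            for rec, w in zip(records, weights):
                totals[rec[attr]] += w
            # Scale each weight so the totals match the targets for this attribute.
            weights = [
                w * targets[rec[attr]] / totals[rec[attr]]
                for rec, w in zip(records, weights)
            ]
    return weights

# Hypothetical sample of four record types and one small area's totals.
records = [
    {"age": "young", "emp": "employed"},
    {"age": "young", "emp": "unemployed"},
    {"age": "old", "emp": "employed"},
    {"age": "old", "emp": "unemployed"},
]
constraints = {
    "age": {"young": 60, "old": 40},
    "emp": {"employed": 70, "unemployed": 30},
}
weights = reweight(records, constraints)
print([round(w, 2) for w in weights])
```

The fitted weights can then be used to draw a synthetic population for the area, one area at a time.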

Towards A Model of Urban Dynamics (II): Commuting
Estimate where people go to work from the census
Model parameters determine when people go to work and for how long
  Sample from a normal distribution
  Parameters can vary across regions
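A sketch of how an agent's working day might be drawn from region-level parameters. It assumes four parameters per region (mean and standard deviation of the departure time, and of the time spent at work); this parameterisation, the class `CommuteParams`, and the region names are assumptions made for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class CommuteParams:
    leave_mean: float     # departure time, hours after midnight
    leave_sd: float
    duration_mean: float  # time spent at work, hours
    duration_sd: float

def sample_work_day(params, rng):
    """Draw (leave_time, return_time) for one agent from normal distributions."""
    leave = rng.gauss(params.leave_mean, params.leave_sd)
    duration = max(0.0, rng.gauss(params.duration_mean, params.duration_sd))
    return leave, leave + duration

# Hypothetical regions with different commuting habits.
regions = {
    "centre": CommuteParams(8.5, 0.5, 8.0, 1.0),
    "suburb": CommuteParams(7.5, 0.75, 8.5, 1.0),
}
rng = random.Random(2012)
for name, params in regions.items():
    leave, back = sample_work_day(params, rng)
    print(f"{name}: leaves {leave:.2f}h, returns {back:.2f}h")
```

Clamping the duration at zero keeps a rare negative draw from producing a return time earlier than the departure.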

Towards A Model of Urban Dynamics (III): Calibration
Calibrate these parameters to data from Twitter (e.g. ‘activity matrices’)
Large parameter space: 4 * NumRegions
Use (e.g.) a genetic algorithm
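A toy genetic algorithm gives a feel for the calibration loop. In a real run the fitness function would execute the model and compare its simulated activity matrices with those derived from Twitter; here a fixed target vector stands in for that comparison, and every name and setting below is an illustrative assumption.

```python
import random

def genetic_search(fitness, n_params, bounds, rng,
                   pop_size=30, generations=60, mutation_sd=0.3):
    """Maximise `fitness` over vectors of `n_params` floats within `bounds`."""
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n_params)  # mutate one gene, clamped to the bounds
            child[i] = min(hi, max(lo, child[i] + rng.gauss(0, mutation_sd)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Stand-in for "observed" behaviour: 4 parameters for a single region.
target = [8.0, 1.0, 8.5, 0.5]

def fitness(params):
    """Negative squared error; a real fitness would run the simulation instead."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

best = genetic_search(fitness, n_params=4, bounds=(0.0, 12.0), rng=random.Random(42))
print([round(p, 2) for p in best], round(fitness(best), 4))
```

With 4 * NumRegions parameters and a full model run per fitness evaluation, the cost grows quickly, which is why runtime is a particular concern when a GA is used.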

Prototype Model

Videos.. A video of the prototype model running is available online:

Computational Challenges
Handling millions of agents…
  Memory
  Runtime (especially in a GA!)
  History of actions
Managing data
  Spatial analysis
    GIS don’t like millions of records
    E.g. hours to do a simple data calculation
  Storage (2 GB+ database of tweets)
  Simple analyses become difficult
    Too much data for Excel

Data and Ethical Challenges
Data bias:
  Sampling
    1% sample (from Twitter)
    <10% sample (from GPS)
    Who’s missing?
  Enormous skew
    Large quantity generated by small proportion of users
  Similar problems with other data (e.g. rail travel smart cards, Oyster)
  Some solutions?
    Geodemographics
    Linking to other individual-level data sets?
Ethics

Conclusions
Aim: develop a model of urban dynamics, calibrated using novel crowd-sourced data.
New “crowd-sourced” data can help to improve social models (?)
  Possibly insurmountable problems with bias, but methods potentially useful in the future
  Particularly in terms of how to manage the data/computations
Improved identification of behaviour
New ways to handle computational complexity
In situ model calibration

Thank you Nick Malleson & Mark Birkin, School of Geography, University of Leeds