Human Mobility Modeling at Metropolitan Scales Sibren Isaacman, Richard Becker, Ramón Cáceres, Margaret Martonosi, James Rowland, Alexander Varshavsky,

Slides:

Advertisements

Similar presentations

By Venkata Sai Pulluri ( ) Narendra Muppavarapu ( )

Advertisements

1 VLDB 2006, Seoul Mapping a Moving Landscape by Mining Mountains of Logs Automated Generation of a Dependency Model for HUG’s Clinical System Mirko Steinle,

Authors: J.A. Hausman, M. Kinnucan, and D. McFadden Presented by: Jared Hayden.

Energy-Efficient Computing for Wildlife Tracking: Design Tradeoffs and Early Experiences with ZebraNet Presented by Eric Arnaud Makita

The Changing Landscape of Privacy in a Big Data World Privacy in a Big Data World A Symposium of the Board on Research Data and Information September 23,

G. Alonso, D. Kossmann Systems Group

Statistics 100 Lecture Set 7. Chapters 13 and 14 in this lecture set Please read these, you are responsible for all material Will be doing chapters

Analysis and Multi-Level Modeling of Truck Freight Demand Huili Wang, Kitae Jang, Ching-Yao Chan California PATH, University of California at Berkeley.

Crime Scene Investigation: SMS Spam Data Analysis Ilona Murynets AT&T Security Research Center New York, NY Roger Piqueras Jover AT&T Security.

Monte Carlo Localization for Mobile Robots Karan M. Gupta 03/10/2004

Generated Waypoint Efficiency: The efficiency considered here is defined as follows: As can be seen from the graph, for the obstruction radius values (200,

SLAW: A Mobility Model for Human Walks Lee et al..

Pattern Recognition and Machine Learning

CENTRE Cellular Network’s Positioning Data Generator Fosca GiannottiKDD-Lab Andrea MazzoniKKD-Lab Puntoni SimoneKDD-Lab Chiara RensoKDD-Lab.

An Analysis of the Optimum Node Density for Ad hoc Mobile Networks Elizabeth M. Royer, P. Michael Melliar-Smith and Louise E. Moser Presented by Aki Happonen.

On Appropriate Assumptions to Mine Data Streams: Analyses and Solutions Jing Gao† Wei Fan‡ Jiawei Han† †University of Illinois at Urbana-Champaign ‡IBM.

Subcenters in the Los Angeles region Genevieve Giuliano & Kenneth Small Presented by Kemeng Li.

Radial Basis Function Networks

Performance Evaluation of Vehicular DTN Routing under Realistic Mobility Models Pei’en LUO.

6 am 11 am 5 pm Fig. 5: Population density estimates using the aggregated Markov chains. Colour scale represents people per km. Population Activity Estimation.

Adaptive Kernel Density in Demographic Analysis Richard Lycan Institute on Aging Portland State University.

Chapter Nine Copyright © 2006 McGraw-Hill/Irwin Sampling: Theory, Designs and Issues in Marketing Research.

Exploring Metropolitan Dynamics with an Agent- Based Model Calibrated using Social Network Data Nick Malleson & Mark Birkin School of Geography, University.

Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa)‏ www-kdd.isti.cnr.it Anna Monreale Fabio Pinelli Roberto Trasarti Fosca Giannotti A. Monreale,

Where did you come from? Where did you go? Robust policy relevant evidence from mobile network big data Danaja Maldeniya, Amal Kumarage, Sriganesh Lokanathan,

Tracking with Unreliable Node Sequences Ziguo Zhong, Ting Zhu, Dan Wang and Tian He Computer Science and Engineering, University of Minnesota Infocom 2009.

Using ArcView to Create a Transit Need Index John Babcock GRG394 Final Presentation.

When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.

Energy-Aware Scheduling with Quality of Surveillance Guarantee in Wireless Sensor Networks Jaehoon Jeong, Sarah Sharafkandi and David H.C. Du Dept. of.

CENTRE CEllular Network Trajectories Reconstruction Environment F. Giannotti, A. Mazzoni, S. Puntoni, C. Renso KDDLab, Pisa.

A review of M. Zonoozi, P. Dassanayake, “User Mobility and Characterization of Mobility Patterns”, IEEE J. on Sel. Areas in Comm., vol 15, no. 7, Sept.

On the Cost/Delay Tradeoff of Wireless Delay Tolerant Geographic Routing Argyrios Tasiopoulos MSc, student, AUEB Master Thesis presentation.

Recognition of spoken and spelled proper names Reporter : CHEN, TZAN HWEI Author :Michael Meyer, Hermann Hild.

Load-Balancing Routing in Multichannel Hybrid Wireless Networks With Single Network Interface So, J.; Vaidya, N. H.; Vehicular Technology, IEEE Transactions.

Managerial Economics Demand Estimation & Forecasting.

Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.

1 Chapter Two: Sampling Methods §know the reasons of sampling §use the table of random numbers §perform Simple Random, Systematic, Stratified, Cluster,

Connectivity-Aware Routing (CAR) in Vehicular Ad Hoc Networks Valery Naumov & Thomas R. Gross ETH Zurich, Switzerland IEEE INFOCOM 2007.

June 10, 1999 Discrete Event Simulation - 3 What other subsystems do we need to simulate? Although Packets are responsible for the largest amount of events,

BPS - 5th Ed. Chapter 221 Two Categorical Variables: The Chi-Square Test.

A Passive Approach to Sensor Network Localization Rahul Biswas and Sebastian Thrun International Conference on Intelligent Robots and Systems 2004 Presented.

Introduction. Spatial sampling. Spatial interpolation. Spatial autocorrelation Measure.

Replicating Memory Behavior for Performance Skeletons Aditya Toomula PC-Doctor Inc. Reno, NV Jaspal Subhlok University of Houston Houston, TX By.

ECE-7000: Nonlinear Dynamical Systems Overfitting and model costs Overfitting  The more free parameters a model has, the better it can be adapted.

MA354 An Introduction to Math Models (more or less corresponding to 1.0 in your book)

Model City Model City A study of the value of simulation and modeling as a tool for urban planners, politicians, and citizens A research proposal submitted.

© 2008 Frans Ekman Mobility Models for Mobile Ad Hoc Network Simulations Frans Ekman Supervisor: Jörg Ott Instructor: Jouni Karvo.

1 CLUSTER VALIDITY  Clustering tendency Facts  Most clustering algorithms impose a clustering structure to the data set X at hand.  However, X may not.

Mobility Models for Wireless Ad Hoc Network Research EECS 600 Advanced Network Research, Spring 2005 Instructor: Shudong Jin March 28, 2005.

Chapter 14 : Modeling Mobility Andreas Berl. 2 Motivation  Wireless network simulations often involve movements of entities  Examples  Users are roaming.

A Spatial-Temporal Model for Identifying Dynamic Patterns of Epidemic Diffusion Tzai-Hung Wen Associate Professor Department of Geography,

Hypothesis Testing Introduction to Statistics Chapter 8 Feb 24-26, 2009 Classes #12-13.

Copyright © 2009 Pearson Education, Inc. 8.1 Sampling Distributions LEARNING GOAL Understand the fundamental ideas of sampling distributions and how the.

Introduction to statistics I Sophia King Rm. P24 HWB

Predicting the Location and Time of Mobile Phone Users by Using Sequential Pattern Mining Techniques Mert Özer, Ilkcan Keles, Ismail Hakki Toroslu, Pinar.

U of Minnesota DIWANS'061 Energy-Aware Scheduling with Quality of Surveillance Guarantee in Wireless Sensor Networks Jaehoon Jeong, Sarah Sharafkandi and.

Paging Area Optimization Based on Interval Estimation in Wireless Personal Communication Networks By Z. Lei, C. U. Saraydar and N. B. Mandayam.

MA354 Math Modeling Introduction. Outline A. Three Course Objectives 1. Model literacy: understanding a typical model description 2. Model Analysis 3.

David Peterson UP206a – GIS (Estrada) December 6, 2010 Source: Ecotality.

Group #3: Mobility Models and Mobile Testbeds. The Models Motion, Traffic, Network.

The inference and accuracy We learned how to estimate the probability that the percentage of some subjects in the sample would be in a given interval by.

Performance Comparison of Ad Hoc Network Routing Protocols Presented by Venkata Suresh Tamminiedi Computer Science Department Georgia State University.

Kobe Boussauw – 15/12/2011 – Spatial Planning in Flanders: political challenges and social opportunities – Leuven Spatial proximity and distance travelled:

Virtual University of Pakistan Lecture No. 35 of the course on Statistics and Probability by Miss Saleha Naghmi Habibullah.

Privacy Vulnerability of Published Anonymous Mobility Traces Chris Y. T. Ma, David K. Y. Yau, Nung Kwan Yip (Purdue University) Nageswara S. V. Rao (Oak.

Mobility Trajectory Mining Human Mobility Modeling at Metropolitan Scales Sibren Isaacman 2012 Mobisys Jie Feng 2016 THU FIBLab.

Floating Content in Vehicular Ad-hoc Networks

Warmup To check the accuracy of a scale, a weight is weighed repeatedly. The scale readings are normally distributed with a standard deviation of

Sibren Isaacman Loyola University Maryland

Data Pre-processing Lecture Notes for Chapter 2

Presentation transcript:

Human Mobility Modeling at Metropolitan Scales Sibren Isaacman, Richard Becker, Ramón Cáceres, Margaret Martonosi, James Rowland, Alexander Varshavsky, Walter Willinger Princeton University, Princeton AT&T Labs, Florham Park, NJ, USA Mobisys 2012 Junction 101.07.09

Introduction Human mobility model Mobile sensing, opportunistic networking, urban planning, ecology, and epidemiology Aim to produce accurate models of how large populations move within different metropolitan areas Generate sequences of locations and associated times that capture how individuals move between important places in their lives, such as home and work. Aggregate the movements of many such individuals to reproduce human densities over time at the geographic scale of metropolitan areas. Take into account how different metropolitan areas exhibit distinct mobility patterns due to differences in geographic distributions of homes and jobs, transportation infrastructures and other factors

Shortages of previous works Random motion -> unrealistic motion Lack of memory about recurring movement patterns Lack of spatiotemporal realism about population densities Without diverse population and geographic concerns Small geographic area (e.g. campus) Too universal -> fail to adopt to different geographic areas

Sources Call Detail Records (CDRs) Creating human mobility models from such data Maintained by cellular network operators Information -> Sporadic samples of the approximate locations of the phone’s owner The time a voice call was placed or a text message was received The identity of the call tower with which the phone was associated Difficulties: Whether associated locations corresponding to home, work, or other important places for particular cellphone users. Both the spatial and temporal granularity of CDR data is quite coarse Spatially, the granularity of cell-tower spacing Temporally, only generated when phones are actively involved

Overall view WHERE modeling approach (Work and Home Extracted Regions) Human movement (important locations, commute distances .. ) => probability distributions => generate synthetic CDRs for an arbitrary number of synthetic people

Parameters for mobility modeling Human mobility is tightly coupled to the geography of the city people live. => should take into account both the area geography and individual user mobility patterns. Spatial information: important locations Spatiotemporal information: hourly population densities Temporal Information: calling patterns

Spatial: Important Locations A full 60% of mobility can be accounted for just the top two cellphone towers with which a user is associated. Important locations: home and work Probability distribution home locations: Home CommuteDistance : d Work locations: Work

Spatiotemporal: Hourly population densities Heavily residential area is likely to be more populated at night Commercial district is likely to be more populated during the day Hourly population density: simply reflects the probability of people being at a particular location at a particular time (Hourly) Assumption: the spatial densities of telephone calls is approximately equivalent to the spatial density of people 7 ~ 8 p.m. on weekdays Over a 3-month period

Temporal: calling patterns A user’s daily call volume characteristics from distribution: PerUserCallsPerDay ~ N(mean, sd) Temporal patterns of when those calls are made: 2 classes (clustered by k-mean method) How many calls each user makes during each hour of the day 24-dimensional vector Hourly call probability distribution: CallTime

Algorithm for model generation WHERE2 => Two-place model: Work and Home Makes use of the fact that most people spend the majority of their time either at home or at work.

WHERE2: Work and Home Users’ movement: occurs based on synthetic CDRs representing calls made at different locations at different times

Extension to Additional Places WHERE3: increasing the number of important locations Tradeoff: model complexity <-> fidelity of the synthetic trace

“distance”, the linear mapping used in EMD. Evaluation Earth Mover’s Distance (EMD) A “good” synthetic trace has the synthetic user population distributed in a very similar way in space as the real trace, at any point during the day. A measure for comparing two spatial probability distributions Attempt to find the minimum amount of energy required to transform one probability distribution into another. This energy is given by the “amount” of probability to be moved and the “distance” to move it. “distance”, the linear mapping used in EMD.

Evaluation Comparison models Random Waypoint (RWP) Each user selects a random destination from all possible destinations in the area to be simulated. Once a destination is selected, the user moves at a random velocity toward the destination. When the selected destination is reached, the user waits for a random amount of time and then selects a new destination and new velocity to begin the procedure. Weighted Random Waypoint (WRWP) The destination is chosen from a location probability distribution Here, use the distribution “Hourly”

Evaluation Sources for input probability distributions Real call detail records (CDRs) ZIP codes within a 50-mile radius of the LA and NY centers Billing addresses lie within the metropolitan regions of interest Phone id, starting time, duration, cell towers locations Data Validation (CDR can accurately represent the mobility patterns) The number of sampled phones in each ZIP code is proportional to the population of that ZIP code The maximum pairwise distance between any 2 cell towers contacted by a phone in one day is a close approximation for how far the phone’s owner traveled that day. Applying certain clustering and regression techniques produces accurate estimates of important locations in people’s lives, in particular home and work.

Evaluation Sources for input probability distributions Census Data Need to buy CDRs! Publicly available data regarding home and work locations, as well as commute distances, for large populations Provide little or no information about the hourly probabilities of a given location Make an assumption about the hourly distributions Combination of public data and CDRs home, work, and commute distances come from the census Other distributions are drawn from CDR data

Evaluation Artificial Test Cases To reason about the expected behavior of the model and its strengths and weaknesses Two locations: The model must be able to place synthetic users at home and work locations and move them according to time of day since we emulate CDRs with discrete call locations, the synthetic users must only exist at these locations The entire world is populated by users that move predictably between two locations at highly regimented intervals. From 7am to 7pm on weekdays, all of the probability is concentrated in a single “work” location. At all other times, the probability is clustered in a second “home” location.

Validation Two locations test case WHERE2 is able to correctly model the call distribution information both during “home” and “work” phases. WRWP doesn’t have sufficient temporal input to distinguish how 9am mobility probabilities should differ from 6am ones. Idea result, in which probability is a spike at the home location at 6am, and a spike at work location at 9am. RWP clearly differs greatly, because it is given so little input information regarding spatial or temporal patterns to model. Two locations test case

Evaluation Artificial Test Cases To reason about the expected behavior of the model and its strengths and weaknesses If multiple possible works exist for a home, the model must select a realistic one. has a single user travel to each of the four possible locations, resulting in a signifﬁcantly worse EMD. WHERE2 detects that users make calls only from one of two locations and thus correctly positions users.

Validation: Large Scale Real Data Modeling based on real CDRs WRWP improves over original RWP by 4 times WHERE out performs WRWP by additional 20% WHERE3 improves further (NY by additional 3 miles accuracy)

Validation: Large Scale Real Data Modeling based on real CDRs

Validation: Large Scale Real Data Modeling based on Census Data Using all-public data from the US Census With average error of 8 miles in NY Using hybrid of census data with some CDR information An average error of 6.8 miles Using all-CDR information in WHERE3 Reduces average error to some 3 miles

*. WHERE2: median is within 0.8 miles of the true value. Example Uses Daily range: maximum distance a person travels in one day serves as an important metric for verifying correctness of the generated models The EMD metric measures the aggregate behavior of the users, but daily range displays the results at a per-user granularity. Comparing NY and LA: LA with only 1 mile error applicability across cities with very different mobility patterns and geographic characteristics. (2, 25, 50, 75, 98) percentile *. WHERE2: median is within 0.8 miles of the true value. *. WHERE3 0.7 miles error => Adding a third point to the model provides very little benefit for this use.

Example Uses Message Propagation: relevant in opportunistic networking Social contacts, epidemiology and data carrying Simulator for epidemic routing As user meet they exchange all messages (0.5 radius) Delivery percentage and message delay rate

Example Uses Hypothetical Cities: what-if scenario regarding mobility in NY Create parameterized model of cities and user behavior patterns City planner to experiment with the effects of modifications that are being considered 10% of the people whose original work location were in the borough of Manhattan were instead given work locations that matched their home location.

Conclusion Human mobility modeling Refinements Capture the motion of individuals among important places in their lives Aggregating that motion to reproduce human densities over time at the scale of a metropolitan area. Accounting for differences between metropolitan areas. Refinements To produce not only sequences of locations with associated times, but also routes taken between those locations. CDRs and census tables are too coarse to provide route information Sequences of Cell towers passing by