PCFG Based Synthetic Mobility Trace Generation
S. C. Geyik, E. Bulut, and B. K. Szymanski
Department of Computer Science, Rensselaer Polytechnic Institute

Presentation transcript:

Methodology

Probabilistic Context Free Grammars (PCFG)
• Context free grammars with probabilities assigned to productions.
  Example: START -> a START (0.4) | a (0.6)
  Probability of "a a" = P(START -> a START) x P(START -> a) = 0.4 x 0.6 = 0.24
• Behavior sequences are modeled by constructing PCFGs from training data. Positive data alone is enough to learn a probabilistic CFG.

Automated Construction of PCFGs
• The algorithm is introduced in [1].
• Construction has two stages:
  (i) Data Incorporation: All sentences are added to the initial grammar as rules of the START non-terminal. One non-terminal is created for each terminal, with probability 1.0.
  (ii) Application of Operators: The Chunk and Merge operators are applied at each step to generalize and compact the grammar.
• Chunking: Generate a new non-terminal for a string and replace all occurrences of this string with the non-terminal, updating its frequency.
• Merging: Combine two non-terminals into one and replace all occurrences of each with the combined non-terminal (generalization).
• Goodness of grammar: The Bayesian posterior probability P(G|D) of the grammar G given the training data D is defined as:
  P(G|D) = P(G) x P(D|G) / P(D)
• P(G) is related to the description length of the grammar, and P(D|G) is related to the sentence probabilities of the training data.
• The time complexity is O(D^2 log(D)) in [1].
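The toy grammar in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the construction algorithm from [1]: the dictionary layout `GRAMMAR` and the helper names `sample` and `prob_of_run` are hypothetical and chosen for this sketch. It shows how a sentence is produced by expanding START and how the derivation probability is the product of the probabilities of the rules used.

```python
import random

# Toy grammar from the poster's example: START -> a START (0.4) | a (0.6).
# Each non-terminal maps to a list of (right-hand side, probability) pairs.
# This layout is an assumption made for the sketch.
GRAMMAR = {
    "START": [(("a", "START"), 0.4), (("a",), 0.6)],
}


def sample(grammar, symbol="START", rng=random):
    """Expand `symbol` top-down and return (terminals, derivation probability)."""
    if symbol not in grammar:  # anything without rules is treated as a terminal
        return [symbol], 1.0
    rules = grammar[symbol]
    # Pick one production according to the rule probabilities.
    rhs, rule_prob = rng.choices(rules, weights=[p for _, p in rules])[0]
    sentence, prob = [], rule_prob
    for child in rhs:
        words, child_prob = sample(grammar, child, rng)
        sentence.extend(words)
        prob *= child_prob  # multiply the probabilities of all rules used
    return sentence, prob


def prob_of_run(n):
    """Probability that the toy grammar produces exactly n copies of 'a'."""
    return 0.4 ** (n - 1) * 0.6


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        words, p = sample(GRAMMAR, rng=rng)
        print(" ".join(words), p)
    print(prob_of_run(2))  # the "a a" case from the text, about 0.24
```

Because this toy grammar is unambiguous, the probability of a sentence equals the probability of its single derivation. For an ambiguous grammar the sentence probability would be a sum over all derivations, which this sketch does not compute.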
Motivation
• Long-term mobility data is needed for initial testing of protocols, before actual deployment.
• It is difficult to collect realistic long-term traces.
• A method is needed to increase the amount of collected mobility traces while keeping their characteristics.

Contributions
• A Probabilistic Context Free Grammar (PCFG) based method to generate synthetic mobility traces.
• A PCFG is learned from a real mobility trace, where each sentence that can be produced by this PCFG represents a movement sequence in the real trace.
• We provide extensions to the PCFG model to capture spatial and temporal mobility characteristics.
• Once a PCFG is constructed, many sentences can be produced, each dictating the movements of an entity and hence providing the synthetic mobility data.

Extensions to the PCFG Model
• Spatial mobility properties are modeled by representing each location with a terminal symbol in the PCFG.
• Temporal mobility properties are modeled by time tokens (t), each representing a preset amount of time between the location symbols. An alternative is to have the time token represent a distribution and always use a single time token, but this is left for future work.
• Example: a node starts at location l_A, then goes to l_B after 40 seconds and to l_C after 25 seconds: l_A 40 l_B 25 l_C. If t represents 25 seconds, this movement sequence can be integrated into the grammar training as: l_A t t l_B t l_C. Note that 40 seconds is approximated with two time tokens.
• There is a trade-off between the time interval of the time token (resolution) and the complexity of the grammar.

Trace Generation Algorithm
• The algorithm creates a sentence from the constructed grammar, which represents a movement sequence for a single mobile entity. Once the sequence is over, another sentence is generated. The sequences should be filtered according to the current location of the entity.
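The time-token encoding and the generation loop described above can be sketched as follows. Several details here are assumptions rather than the authors' method: delays are rounded to the nearest whole number of tokens with a minimum of one, the filtering step is implemented as simple rejection (resample until a sentence starts at the entity's current location), and the names `encode`, `generate_trace`, and `sample_sentence` are hypothetical. `sample_sentence` stands in for any function that draws one sentence from the learned PCFG.

```python
TIME_TOKEN = "t"


def encode(start, moves, resolution):
    """Turn a start location and (delay_seconds, location) moves into grammar tokens."""
    tokens = [start]
    for delay, location in moves:
        # Assumption: round to the nearest whole token, and always emit at
        # least one token so that two locations are never adjacent.
        n = max(1, int(delay / resolution + 0.5))
        tokens.extend([TIME_TOKEN] * n)
        tokens.append(location)
    return tokens


def generate_trace(sample_sentence, start, num_sentences, max_tries=1000):
    """Chain sentences so that each one begins where the previous one ended.

    `sample_sentence` is a hypothetical callable returning one token list
    drawn from the grammar. Filtering by rejection is an assumption.
    """
    trace, current = [start], start
    for _ in range(num_sentences):
        for _ in range(max_tries):
            sentence = sample_sentence()
            if sentence and sentence[0] == current:
                break
        else:
            raise RuntimeError("no sentence starts at " + current)
        trace.extend(sentence[1:])  # skip the repeated start location
        # The entity is now at the last location symbol of the sentence.
        current = [s for s in sentence if s != TIME_TOKEN][-1]
    return trace


if __name__ == "__main__":
    # The poster's example with a 25-second time token.
    print(encode("lA", [(40, "lB"), (25, "lC")], 25))
```

A finer resolution makes the 40-second delay more accurate but lengthens every sentence, which is the resolution versus grammar-complexity trade-off noted in the text.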
Acknowledgment
This research was sponsored by the US Army Research Laboratory and the UK Ministry of Defence and was accomplished under Agreement Number W911NF and under Cooperative Agreement Number W911NF. The views and conclusions contained in this document are those of the authors, and should not be interpreted as representing the official policies, either expressed or implied, of the US Army Research Laboratory, the U.S. Government, the UK Ministry of Defence, or the UK Government. The US and UK Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

References
[1] Geyik, S. C., Szymanski, B., Event Recognition in Sensor Networks by Means of Grammatical Inference, IEEE INFOCOM 2009, Rio de Janeiro, Brazil, March 2009, Page(s):
[2] Zhang, X., Kurose, J., Levine, B. N., Towsley, D., Zhang, H., Study of a bus-based disruption tolerant network: Mobility modeling and impact on routing, In Proc. ACM Annual Intl. Conf. on Mobile Computing and Networking (Mobicom), Page(s):
[3] Piorkowski, M., Sarafijanovic-Djukic, N., Grossglauser, M., A Parsimonious Model of Mobile Partitioned Networks with Clustering, The First International Conference on COMmunication Systems and NETworkS (COMSNETS), 2009, Bangalore, India.

Conclusions and Future Work
• A PCFG-based method is provided to generate long-term synthetic mobility traces, crucial for protocol testing.
• The method performs better than Markov models in closely modeling the mobility characteristics of the real world traces.
• Future work includes future movement prediction for mobile entities and application of the PCFG-based method to other domains.

Evaluations

SF Cab Mobility Trace [3]
• Taxi movements in San Francisco.
• Evaluated based on how much the synthetic traces coincide with actual traces (i.e. how much time passes between location changes or how accurate the next location is).
• We consider two metrics.
• The first metric is how accurately the synthetic traces simulate the next movement (next location or next mobile entity encountered) given the previous k movements (e.g. Cons 2 means the next movement given just the previous movement (2-1=1 history)).
• The second metric is how accurately the synthetic traces simulate the time it takes for the next movement given the previous k movements (e.g. Intern 2 is analogous to Cons 2).
• We used Euclidean distances with weighting to calculate the scores.
• We compared the closeness of the PCFG and 2-Level Markov Model generated traces to the actual trace.
• Previous work suggests that Markov Models are good at prediction and modeling, and further shows that using more than two levels does not improve performance.
• The time complexity of building a 2-Level Markov Model is O(D log(D)), where D is the size of the trace. This follows from taking the worst-case number of entries in the state table to be D, so that each transition in the trace costs log(D) to find the entry whose statistics must be updated.

DieselNet Trace [2]
• Bus-to-bus meeting data collected in Amherst, MA.
• Evaluated based on how much time passes between consecutive meetings and how accurate the set of buses met by a bus on a certain route is.
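The 2-Level Markov Model baseline used in the comparison can be sketched as follows. The poster does not spell out the model, so this sketch assumes that "2-Level" means the next symbol is conditioned on the previous two symbols; the names `build_markov2` and `next_symbol` are hypothetical. It makes one pass over the trace and updates one state-table entry per transition.

```python
import random
from collections import Counter, defaultdict


def build_markov2(trace):
    """Count, for every pair of consecutive symbols, which symbol follows it."""
    table = defaultdict(Counter)
    for a, b, c in zip(trace, trace[1:], trace[2:]):
        table[(a, b)][c] += 1  # one state-table update per transition
    return table


def next_symbol(table, prev2, rng=random):
    """Draw the next symbol given the previous two, or None for an unseen state."""
    counts = table.get(prev2)
    if not counts:
        return None
    symbols = list(counts)
    # Sample proportionally to how often each symbol followed this state.
    return rng.choices(symbols, weights=[counts[s] for s in symbols])[0]


if __name__ == "__main__":
    trace = ["lA", "lB", "lC", "lA", "lB", "lC", "lA", "lB", "lD"]
    table = build_markov2(trace)
    print(dict(table[("lA", "lB")]))
    print(next_symbol(table, ("lB", "lC")))
```

The O(D log(D)) bound in the text assumes a state table with logarithmic lookup. A Python dictionary is hash-based, so each lookup here takes expected constant time, and the sketch should not be read as matching that bound exactly.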