IDENTIFYING USERS PROFILES FROM MOBILE CALLS HABITS August 12, 2012 - Beijing, China B Furletti, L. Gabrielli, C. Renso, S. Rinzivillo KddLab, ISTI – CNR,

Presentation transcript:

IDENTIFYING USERS PROFILES FROM MOBILE CALLS HABITS
August 12, 2012 - Beijing, China
B. Furletti, L. Gabrielli, C. Renso, S. Rinzivillo
KddLab, ISTI – CNR, Pisa (Italy)

Outline
- Profiling of user behaviors from GSM data
- GSM data
- Validation of the dataset
- Two complementary approaches
  - Deductive approach (Top Down)
  - Inductive approach (Bottom Up)
- New findings and future developments

Objective and Methods
- Partition the users tracked by GSM phone calls into profiles such as:
  - Residents
  - Commuters
  - People in transit
  - Visitors/Tourists
- Analyze the users' phone call behavior with:
  - A deductive technique (Top Down) based on spatio-temporal rules.
  - An inductive technique (Bottom Up) based on machine learning.
- Refine and integrate the Top Down result with the Bottom Up result.

The data
- GSM data provided by an Italian mobile phone operator, covering the whole province of Pisa.
- Call Data Records (CDR): records of the users' calls.

GSM data to identify Visitors
- Definition:
  - A foreign tourist is identified as a «roaming user».
  - An Italian tourist is a user who, within the observation window, appears for a certain period of time and then disappears.

Validation of the GSM sample
- The GSM data sample is validated using the market penetration factor claimed by the mobile operator in the province of Pisa.
- This factor is used to estimate the total number of residents in the province of Pisa.
- RESULT: The GSM sample (resident population in the province) is in line with the number of mobile contracts in the province.
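One plausible reading of the estimate above is that the number of users observed in the sample is scaled up by the operator's market penetration. The sketch below illustrates that arithmetic only; the function name and both figures are hypothetical and are not values from the presentation.

```python
def estimate_population(observed_users: int, market_penetration: float) -> float:
    """Scale the operator's observed users up to the whole population.

    market_penetration is the operator's claimed share of the population,
    given as a fraction in (0, 1]. Both inputs are illustrative assumptions.
    """
    if not 0.0 < market_penetration <= 1.0:
        raise ValueError("market_penetration must be in (0, 1]")
    return observed_users / market_penetration


# Hypothetical figures, not taken from the slides.
estimate = estimate_population(observed_users=100_000, market_penetration=0.25)
print(estimate)
```

The estimate can then be compared with an independent figure, such as the number of mobile contracts mentioned in the result.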

Rule-Based Classifier (Top Down): Deductive approach
- Objective: partition the users seen in the urban area of Pisa into Residents, Commuters, and People in Transit.
- Based on the definitions of these categories, a set of spatio-temporal rules is implemented to separate the users.
- Resident. A person is resident in an area A when his/her home is inside A. The person's mobility therefore tends to be from and towards home.
- Commuter. A person is a commuter between an area B and an area A if his/her home is in B while the workplace is in A. The daily mobility of this person is therefore mainly between B and A.
- In Transit. An individual is "in transit" over an area A if his/her home and workplace are outside A, and his/her presence inside A is limited by a temporal threshold representing the time needed to transit through A.
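The three definitions above can be sketched as a small rule cascade. This is an illustration under stated assumptions, not the authors' implementation: the `UserSummary` fields, the `classify` function, and the one-hour default threshold are all hypothetical, and the sketch assumes that home and workplace locations have already been inferred from the call records.

```python
from dataclasses import dataclass


@dataclass
class UserSummary:
    """Hypothetical per-user summary derived from the call records."""
    home_in_area: bool       # inferred home location lies inside area A
    work_in_area: bool       # inferred workplace lies inside area A
    max_stay_hours: float    # longest continuous presence observed in A


def classify(user: UserSummary, transit_threshold_hours: float = 1.0) -> str:
    """Apply the three definitions in order; other users stay unclassified."""
    if user.home_in_area:
        return "resident"
    if user.work_in_area:
        # Home is outside A (previous rule failed) and the workplace is inside A.
        return "commuter"
    if user.max_stay_hours <= transit_threshold_hours:
        # Home and workplace are both outside A and the stay is short.
        return "in_transit"
    return "unclassified"


print(classify(UserSummary(True, False, 20.0)))
print(classify(UserSummary(False, True, 9.0)))
print(classify(UserSummary(False, False, 0.5)))
print(classify(UserSummary(False, False, 6.0)))
```

Users that match none of the rules are left unclassified, which is the group the Bottom Up analysis is meant to refine.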

User's Temporal Profile
- Preliminary data preparation before the Bottom Up analysis.
- Aggregation of the call data into a Temporal Profile for each user:
  - Daily profile
  - Weekly profile
  - Shifted profile
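The slides do not specify the layout of the temporal profile, so the following is only a sketch of one possible daily aggregation. It assumes that calls arrive as `(user_id, timestamp)` pairs and that each day is split into three time slots; the slot boundaries, the `daily_profiles` name, and the sample calls are all hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime

import numpy as np

# Assumed time slots as hour ranges; the slides do not specify them.
SLOTS = [(0, 8), (8, 19), (19, 24)]


def slot_index(hour: int) -> int:
    """Return the index of the assumed time slot containing the given hour."""
    for i, (start, end) in enumerate(SLOTS):
        if start <= hour < end:
            return i
    raise ValueError(f"hour out of range: {hour}")


def daily_profiles(calls, first_day: date, n_days: int):
    """Return {user_id: array of shape (n_days, len(SLOTS))} holding call counts."""
    profiles = defaultdict(lambda: np.zeros((n_days, len(SLOTS)), dtype=int))
    for user_id, ts in calls:
        day = (ts.date() - first_day).days
        if 0 <= day < n_days:  # ignore calls outside the observation window
            profiles[user_id][day, slot_index(ts.hour)] += 1
    return dict(profiles)


# Hypothetical calls for two users over a one-week window.
calls = [
    ("u1", datetime(2012, 1, 2, 9, 30)),
    ("u1", datetime(2012, 1, 2, 21, 0)),
    ("u2", datetime(2012, 1, 4, 7, 15)),
]
profiles = daily_profiles(calls, first_day=date(2012, 1, 2), n_days=7)
print(profiles["u1"])
```

A weekly profile could be derived from the same matrix by summing rows over each week. The analysis then works on these aggregates rather than on the raw call records.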

Bottom Up: SOM Clustering (Inductive approach)
- Objectives:
  - Integrate and refine the Top Down results by partitioning the users left unclassified.
  - Identify the Visitors/Tourists, as well as the Residents and Commuters not captured by the Top Down method.
- Define each user's Temporal Profile from the call behavior.
- Analyze the temporal profiles with a data mining strategy* to group similar profiles and identify the categories.
* Self Organizing Maps (SOM): a type of neural network based on unsupervised learning. It produces a one- or two-dimensional representation of the input space, using a neighborhood function to preserve the topological properties of the input space.
[Diagram: Temporal Profile → SOM Map Computation → Commuters, Visitors/Tourists, Residents]
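To make the SOM step concrete, here is a minimal NumPy sketch of a rectangular Self Organizing Map with a Gaussian neighborhood function. It is not the tool or configuration used in the presentation: the map size, the decay schedules, and the seven-day binary presence vectors used as input are all assumptions chosen for illustration.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Grid position (row, col) of the unit whose weights are closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM and return weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, used by the neighborhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for epoch in range(epochs):
        # Learning rate and neighborhood radius shrink as training proceeds.
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.3)
        for x in rng.permutation(data):
            bmu = best_matching_unit(weights, x)
            # Gaussian neighborhood centered on the best matching unit.
            dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
    return weights


# Hypothetical 7-day presence vectors (Mon..Sun): always present vs weekdays only.
always = [[1, 1, 1, 1, 1, 1, 1]] * 10
weekdays = [[1, 1, 1, 1, 1, 0, 0]] * 10
weights = train_som(always + weekdays)
bmu_always = best_matching_unit(weights, np.array(always[0], dtype=float))
bmu_weekdays = best_matching_unit(weights, np.array(weekdays[0], dtype=float))
print(bmu_always, bmu_weekdays)
```

Each profile is assigned to its best matching unit, and units that attract similar profiles sit close together on the map. On toy data like this the two groups would typically land on different units, although that depends on the initialization and the parameters.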

SOM result: Visitors/Tourists
- Rotated Temporal Profile used to identify the Visitors/Tourists categories.
- Visitors/Tourists: limited presence, for a few consecutive days.

SOM results: Residents and Commuters
- Residents: presence uniformly distributed over the period (left, center and top of the map).
- Commuters: regular presence on weekdays and noticeable absence at weekends (bottom-left corner of the map).

Future steps and work in progress
- Improve the whole strategy by applying the Top Down and Bottom Up analyses to the whole dataset.
- Use the Top Down result as a validation set for the Bottom Up.
- Turn the user's temporal profile into a more informative data structure.

New results
[Figures: Resident profile, Commuter profile, Visitor profile]
Among the unclassified users there are other interesting profiles:
- The occasional visitors;
- The «night visitors».

Conclusions
- Profiling of users by means of an automatic GSM analytical procedure.
- Definition of an intermediate aggregation, the temporal profile (TP):
  - Sensitive information is preserved during the transformation.
  - Profiling can operate on the TP alone.
  - Complete separation of data provider and data analysts.
  - This may enable a continuous profiling service.