Description of compiled mobile phone data sets
Roberta Radini – Istat
I Internal Meeting of WP5 Mobile Phone Data
Madrid, 7 June
Outline
- Introduction to Process Data Collection of CDR
- Description of mobile data sets
- Results of first analysis
- IT Environment
Introduction to Process Data Collection of CDR
- ISTAT received CDR data from WIND at the end of February 2017.
- The CDRs refer to calls and SMS (text messages) in the province of Pisa in the period between 1 January and 12 February.
- According to agreements with the Privacy Guarantor:
  - WIND encrypted the calling SIMs (Subscriber Identity Modules) and, before sending the data to ISTAT, destroyed the bridge table linking encrypted SIMs to internal SIM codes.
  - ISTAT received the CDRs over a secure channel and stored the data in a database and on an IT platform for Big Data (Cloudera).
  - Access to the data is restricted to authorized users.
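The slides do not say how WIND encrypted the SIMs, so the following is only a minimal sketch of the general idea of a bridge table that is destroyed before delivery. The function name `pseudonymize`, the use of random tokens, and the sample SIM codes are all assumptions made for illustration, not WIND's actual procedure.

```python
import secrets

def pseudonymize(sim_ids):
    """Replace each SIM identifier with a random code.

    Returns the coded list and the bridge table (internal code -> random code).
    Hypothetical helper: the real encryption method is not described in the slides.
    """
    bridge = {}
    coded = []
    for sim in sim_ids:
        if sim not in bridge:
            # A random token carries no information about the original code.
            bridge[sim] = secrets.token_hex(16)
        coded.append(bridge[sim])
    return coded, bridge

# Made-up internal SIM codes, for illustration only.
internal = ["SIM-001", "SIM-002", "SIM-001"]
coded, bridge = pseudonymize(internal)

# Destroying the bridge table: after this, the coded values can still be
# used to follow one SIM across records, but not to recover its internal code.
bridge.clear()
del bridge
```

The same SIM always receives the same code, so per-SIM analysis remains possible on the delivered data.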
Description of mobile data sets
- ISTAT received the CDRs in compressed files. Each file contains the CDRs of one municipality in the province of Pisa for one of the 53 days of the defined period.
- The number of files is 1,484 (53 days x 28 municipalities).
- Note: the number of municipalities in the province of Pisa is 37, but 5 municipalities do not have antennas. For the remaining 3 municipalities, we did not receive data even though they have antennas.
- Record layout of the CDRs contained in the files:

Variable name           Description
TIPOLOGIA_CDR           Call or SMS
CHIAVE_NUM_CHIAMANTE    Code of SIM
DATA_INIZIO_CHIAMATA    The date of the beginning of the call
ORA_INIZIO_CHIAMATA     The time of the beginning of the call
DURATA_CHIAMATA         The duration of the call
COMUNE                  Municipality
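The record layout above could be read along the following lines. This is a sketch under stated assumptions: the slides give only the variable names, so the field separator (`;`), the date and time formats, the duration unit (seconds), and the sample record are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CDR:
    tipologia_cdr: str          # TIPOLOGIA_CDR: call or SMS
    chiave_num_chiamante: str   # CHIAVE_NUM_CHIAMANTE: encrypted SIM code
    inizio_chiamata: datetime   # DATA_INIZIO_CHIAMATA + ORA_INIZIO_CHIAMATA
    durata_chiamata: int        # DURATA_CHIAMATA (seconds is an assumption)
    comune: str                 # COMUNE: municipality

def parse_cdr(line, sep=";"):
    """Parse one delimited record; separator and formats are assumed."""
    tipologia, chiave, data, ora, durata, comune = line.rstrip("\n").split(sep)
    inizio = datetime.strptime(f"{data} {ora}", "%Y-%m-%d %H:%M:%S")
    return CDR(tipologia, chiave, inizio, int(durata), comune)

# Made-up record, for illustration only.
sample = "CALL;a3f9c2;2017-01-15;18:42:07;95;Pisa"
record = parse_cdr(sample)

# File count stated on the slide: one file per municipality per day.
DAYS, MUNICIPALITIES = 53, 28
expected_files = DAYS * MUNICIPALITIES
```

Merging the date and time columns into a single timestamp is a design choice that simplifies later per-day and per-hour aggregation.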
Description of mobile data sets
- The total number of CDRs is 17,755,753, divided into:
  - 10,888,301 calls
  - 6,867,452 SMS
- The total number of calling SIMs is 435,779.
- The volume is about 1.5 GB ... not really BIG!
- Note: active SIMs are 22.9% of the total number of SIMs, and the resident population of the province of Pisa is 420,913.
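As a quick consistency check on the figures above, the following sketch verifies that calls and SMS add up to the stated total and derives two simple ratios. The derived ratios are not reported in the slides; they are computed here only from the stated counts.

```python
# Counts as stated on the slide.
calls = 10_888_301
sms = 6_867_452
total_cdrs = 17_755_753
calling_sims = 435_779

# The two categories should add up to the stated total.
assert calls + sms == total_cdrs

call_share = calls / total_cdrs          # fraction of CDRs that are calls
sms_share = sms / total_cdrs             # fraction of CDRs that are SMS
cdrs_per_sim = total_cdrs / calling_sims # average CDRs per calling SIM

print(f"calls: {call_share:.1%}, SMS: {sms_share:.1%}")
print(f"average CDRs per calling SIM: {cdrs_per_sim:.1f}")
```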
Results of first analysis
[Figure: number of calling SIMs per day; annotation: weekend]
[Figure: number of CDRs per day; annotation: abnormal peak]
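The two daily series shown in the charts could be computed roughly as follows. This is a sketch, not the analysis actually run: the function names, the median-based rule for flagging an abnormal peak, the `factor` threshold, and the sample records are all assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def daily_summary(records):
    """records: iterable of (day, sim_code) pairs, one pair per CDR."""
    cdrs = defaultdict(int)
    sims = defaultdict(set)
    for day, sim in records:
        cdrs[day] += 1
        sims[day].add(sim)
    return {
        d: {
            "cdrs": cdrs[d],               # number of CDRs on that day
            "sims": len(sims[d]),          # number of distinct calling SIMs
            "weekend": d.weekday() >= 5,   # Saturday or Sunday
        }
        for d in sorted(cdrs)
    }

def abnormal_days(summary, factor=2.0):
    """Flag days whose CDR count exceeds factor x the median (assumed rule)."""
    typical = median(v["cdrs"] for v in summary.values())
    return [d for d, v in summary.items() if v["cdrs"] > factor * typical]

# Made-up records, for illustration only.
records = (
    [(date(2017, 1, 9), "a"), (date(2017, 1, 9), "b")]
    + [(date(2017, 1, 10), "a"), (date(2017, 1, 10), "a")]
    + [(date(2017, 1, 11), s) for s in "aabbccdd"]
    + [(date(2017, 1, 14), "c")]
)
summary = daily_summary(records)
peaks = abnormal_days(summary)
```

Counting distinct SIMs separately from CDRs matters because a single SIM can generate many records in one day.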
IT Environment
8-node Hadoop cluster
Cloudera Enterprise 5.8:
- Standard Hadoop: parallel storage/processing, SQL, NoSQL, Spark…
- Manager: administration console
- Impala: high-speed analytics engine
- Security: advanced access control
Technical specifications:
- 32/16 core CPUs
- 128 GB RAM per node
- 20 Gbit internal connection
- 6 x 1.2 TB hard drives per node (60 TB overall)
Thanks