Description of compiled mobile phone data sets Roberta Radini – Istat


Description of compiled mobile phone data sets
Roberta Radini – Istat
I Internal Meeting of WP5 Mobile Phone Data
Madrid, 7 June

Outline
- Introduction to the CDR data collection process
- Description of mobile data sets
- Results of first analysis
- IT Environment

Introduction to the CDR data collection process
- ISTAT received CDR (call detail record) data from WIND at the end of February 2017.
- The CDRs refer to calls and SMS (text messages) in the province of Pisa in the period between 1st January and 12th February.
- According to the agreements with the Privacy Guarantor:
  - WIND encrypted the calling SIMs (Subscriber Identity Modules) and, before sending the data to ISTAT, destroyed the bridge table linking the encrypted SIMs to the internal SIM codes.
  - ISTAT received the CDRs over a secure channel and stored the data in a database and on an IT platform for Big Data (Cloudera).
  - Access to the data is restricted to authorized users only.
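The slide does not say which encryption scheme WIND used, so the following is only a minimal sketch of the general idea: replace each SIM identifier with a random token, then discard the mapping (the "bridge table") so the tokens cannot be traced back. The function name `pseudonymize` and the `(sim_id, payload)` record shape are hypothetical.

```python
import secrets


def pseudonymize(records):
    """Replace each SIM identifier with a random token.

    `records` is an iterable of (sim_id, payload) pairs (hypothetical shape).
    This is an illustration, not WIND's actual procedure.
    """
    bridge = {}  # real SIM id -> random token: the "bridge table"
    out = []
    for sim_id, payload in records:
        if sim_id not in bridge:
            # 16 random bytes rendered as 32 hex characters
            bridge[sim_id] = secrets.token_hex(16)
        out.append((bridge[sim_id], payload))
    bridge.clear()  # destroy the bridge table before the data leaves
    return out
```

Because the token is random rather than derived from the SIM code, the same SIM stays linkable across records inside one data set, but nothing in the output allows the original code to be recomputed once the table is gone.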

Description of mobile data sets
- ISTAT received the CDRs in compressed files. Each file contains the CDRs of one municipality in the province of Pisa for one of the 53 days of the defined period.
- The number of files is 1,484 (53 days x 28 municipalities).
- Note: the province of Pisa has 37 municipalities, but 5 of them have no antennas. The remaining 3 municipalities have antennas, but we did not receive data for them.
- Record layout of the CDRs contained in the files:

  Variable name           Description
  TIPOLOGIA_CDR           Call or SMS
  CHIAVE_NUM_CHIAMANTE    Code of the SIM
  DATA_INIZIO_CHIAMATA    Date of the beginning of the call
  ORA_INIZIO_CHIAMATA     Time of the beginning of the call
  DURATA_CHIAMATA         Duration of the call
  COMUNE                  Municipality
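As an illustration of the record layout, a decompressed file could be read roughly as follows. The variable names come from the slide; the semicolon delimiter, the presence of a header row, and the `Cdr` and `read_cdrs` names are assumptions, since the slides do not describe the physical file format. Values are kept as strings because the date, time and duration formats are not given.

```python
import csv
import io
from dataclasses import dataclass

# Variable names as listed on the slide, in the same order.
FIELDS = [
    "TIPOLOGIA_CDR",
    "CHIAVE_NUM_CHIAMANTE",
    "DATA_INIZIO_CHIAMATA",
    "ORA_INIZIO_CHIAMATA",
    "DURATA_CHIAMATA",
    "COMUNE",
]


@dataclass
class Cdr:
    kind: str          # call or SMS
    sim_key: str       # encrypted code of the calling SIM
    start_date: str    # format unknown, kept as text
    start_time: str    # format unknown, kept as text
    duration: str      # unit unknown, kept as text
    municipality: str


def read_cdrs(text, delimiter=";"):
    """Parse delimited text with a header row into Cdr records.

    The delimiter and header row are assumptions about the file format.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [Cdr(*(row[name] for name in FIELDS)) for row in reader]
```

Reading fields by name rather than by position means the sketch still works if the columns arrive in a different order than on the slide.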

Description of mobile data sets
- The total number of CDRs is 17,755,753, divided into:
  - 10,888,301 calls
  - 6,867,452 SMS
- The total number of calling SIMs is 435,779.
- The volume is about 1.5 GB... not really BIG!
- Note: the number of active SIMs is 22.9% of the total number of SIMs, and the resident population of the province of Pisa is 420,913.
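The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the derived ratios (share of calls, SIMs per resident, CDRs per SIM) are not on the slide and are computed here purely for orientation.

```python
# Figures as stated on the slide.
calls = 10_888_301
sms = 6_867_452
total_cdrs = 17_755_753
calling_sims = 435_779
residents = 420_913

# The two components should add up to the stated total.
assert calls + sms == total_cdrs

# Derived ratios (not on the slide).
call_share = calls / total_cdrs
sms_share = sms / total_cdrs
sims_per_resident = calling_sims / residents
cdrs_per_sim = total_cdrs / calling_sims

print(f"calls: {call_share:.1%}, SMS: {sms_share:.1%}")
print(f"calling SIMs per resident: {sims_per_resident:.2f}")
print(f"CDRs per calling SIM over the period: {cdrs_per_sim:.1f}")
```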

Results of first analysis
[Figure: number of calling SIMs per day, with weekends marked]
[Figure: number of CDRs per day, with an abnormal peak marked]
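The two daily series shown on this slide (distinct calling SIMs per day and CDRs per day) could be derived along these lines. This is a sketch, not the analysis actually run at ISTAT; `daily_summary` and the `(sim_key, day)` input shape are hypothetical, and the weekend flag simply marks Saturdays and Sundays.

```python
from collections import defaultdict
from datetime import date


def daily_summary(cdrs):
    """Summarise CDRs per day.

    `cdrs` is an iterable of (sim_key, day) pairs, where `day` is a
    datetime.date (hypothetical input shape).
    """
    sims = defaultdict(set)    # day -> distinct calling SIMs
    counts = defaultdict(int)  # day -> number of CDRs
    for sim_key, day in cdrs:
        sims[day].add(sim_key)
        counts[day] += 1
    return {
        day: {
            "calling_sims": len(sims[day]),
            "cdrs": counts[day],
            "weekend": day.weekday() >= 5,  # 5 = Saturday, 6 = Sunday
        }
        for day in sorted(counts)
    }
```

Plotting `calling_sims` and `cdrs` against the day, with the `weekend` flag used for shading, would give charts of the same kind as the ones on the slide.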

IT Environment
8-node Hadoop cluster running Cloudera Enterprise 5.8

Cloudera Enterprise 5.8 components:
- Standard: Hadoop parallel storage/processing, SQL, NoSQL, Spark...
- Manager: administration console
- Impala: high-speed analytics engine
- Security: advanced access control

Technical specifications:
- CPUs with 32/16 cores
- 128 GB RAM per node
- 20 Gbit internal connection
- 6 x 1.2 TB hard drives per node (about 60 TB overall)
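The overall storage figure follows from the per-node specification, as the short calculation below shows. The replication factor of 3 is the HDFS default and is an assumption here: the slide does not say how the cluster is configured, so the usable-capacity figure is only indicative.

```python
# Cluster specification as stated on the slide.
nodes = 8
drives_per_node = 6
drive_tb = 1.2
ram_gb_per_node = 128

raw_tb = nodes * drives_per_node * drive_tb  # about 57.6 TB, i.e. roughly 60 TB
total_ram_gb = nodes * ram_gb_per_node       # 1024 GB across the cluster

# Assumption: HDFS default replication factor; the actual setting is not stated.
replication = 3
usable_tb = raw_tb / replication

print(f"raw disk: {raw_tb:.1f} TB, total RAM: {total_ram_gb} GB")
print(f"usable at replication {replication}: {usable_tb:.1f} TB")
```

Either way, the 1.5 GB CDR data set occupies a very small fraction of the platform.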

Thanks