S SPATE: Compacting and Exploring Telco Big Data Constantinos Costa1 , Georgios Chatzimilioudis1, Demetris Zeinalipour-Yazti2,1, Mohamed F. Mokbel3.

Slides:



Advertisements
Similar presentations
Model-Based Query Processing Over Uncertain Data (in ICDE 2011) Raw Sensor Data Inference of time-varying probability distributions Creating probabilistic.
Advertisements

Module 13: Performance Tuning. Overview Performance tuning methodologies Instance level Database level Application level Overview of tools and techniques.
Solving Manufacturing Equipment Monitoring Through Efficient Complex Event Processing Tilmann Rabl, Kaiwen Zhang, Mohammad Sadoghi, Navneet Kumar Pandey,
Research Challenges in the CarTel Mobile Sensor System Samuel Madden Associate Professor, MIT.
Kien A. Hua Division of Computer Science University of Central Florida.
Panoptes: A Scalable Architecture for Video Sensor Networking Applications Wu-chi Feng, Brian Code, Ed Kaiser, Mike Shea, Wu-chang Feng (OGI: The Oregon.
Dagstuhl Seminar 10042, Demetris Zeinalipour, University of Cyprus, 26/1/2010 NSF Workshop on Sustainable Energy Efficient Data Management (SEEDM), Arlington,
Sensor Data Management with Model-based View LSIR, EPFL.
Peer-to-peer archival data trading Brian Cooper and Hector Garcia-Molina Stanford University.
Data Mining – Intro.
A Comparsion of Databases and Data Warehouses Name: Liliana Livorová Subject: Distributed Data Processing.
Toward a Statistical Framework for Source Anonymity in Sensor Networks.
GeoPKDD Geographic Privacy-aware Knowledge Discovery and Delivery Kick-off meeting Pisa, March 14, 2005.
Ch 4. The Evolution of Analytic Scalability
These materials are prepared only for the students enrolled in the course Distributed Software Development (DSD) at the Department of Computer.
Analytics Map Reduce Query Insight Hive Pig Hadoop SQL Map Reduce Business Intelligence Predictive Operational Interactive Visualization Exploratory.
Chapter 11 Databases.
UPC/SHMEM PAT High-level Design v.1.1 Hung-Hsun Su UPC Group, HCS lab 6/21/2005.
A Unified, Low-overhead Framework to Support Continuous Profiling and Optimization Xubin (Ben) He Storage Technology & Architecture Research(STAR)
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. LogKV: Exploiting Key-Value.
Grid Failure Monitoring and Ranking using FailRank Demetris Zeinalipour (Open University of Cyprus) Kyriacos Neocleous, Chryssis Georgiou, Marios D. Dikaiakos.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
Copyright © 2012, SAS Institute Inc. All rights reserved. ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY,
CCGrid, 2012 Supporting User Defined Subsetting and Aggregation over Parallel NetCDF Datasets Yu Su and Gagan Agrawal Department of Computer Science and.
Dependable Technologies for Critical Systems Copyright Critical Software S.A All Rights Reserved. Handling big dimensions in distributed data.
Efficient OLAP Operations in Spatial Data Warehouses Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao Department of Computer Science Hong Kong.
Hierarchical Management Architecture for Multi-Access Networks Dzmitry Kliazovich, Tiia Sutinen, Heli Kokkoniemi- Tarkkanen, Jukka Mäkelä & Seppo Horsmanheimo.
Efficient Opportunistic Sensing using Mobile Collaborative Platform MOSDEN.
Course : Study of Digital Convergence. Name : Srijana Acharya. Student ID : Date : 11/28/2014. Big Data Analytics and the Telco : How Telcos.
Privacy Vulnerability of Published Anonymous Mobility Traces Chris Y. T. Ma, David K. Y. Yau, Nung Kwan Yip (Purdue University) Nageswara S. V. Rao (Oak.
Mary Ganesan and Lora Strother Campus Tours Using a Mobile Device.
The Application of Big Data and Learning Analytics in Universities Management: Practices from the Open University of China 魏顺平 博士 Shunping Wei May. 22nd,
Personal Home Healthcare System for the Cardiac Patient of Smart City Using Fuzzy Logic Shijia Liu.
Efficient Exploration of Telco Big Data with Compression and Decaying
Data Mining – Intro.
SAS users meeting in Halifax
Towards Real-Time Road Traffic Analytics using Telco Big Data
Database management system Data analytics system:
Golrezaei, N. ; Molisch, A.F. ; Dimakis, A.G.
Big Data Intro.
Ishan Sharma Abhishek Mittal Vivek Raj
Algorithms for Big Data Delivery over the Internet of Things
FailRank: Towards a Unified Grid Failure Monitoring and Ranking System
Meng Lu and Edzer Pebesma
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
SpatialHadoop: A MapReduce Framework for Spatial Data
Class project by Piyush Ranjan Satapathy & Van Lepham
به نام خدا Big Data and a New Look at Communication Networks Babak Khalaj Sharif University of Technology Department of Electrical Engineering.
Spatio-Temporal Query Processing in Smartphone Networks
FMS: Managing Crowdsourced Indoor Signals with the Fingerprint Management Studio Marileni Angelidou1, Constantinos Costa1, Artyom Nikitin2, Demetrios.
Spatial Online Sampling and Aggregation
Telco Big Data (TBD) Example:
Ch 4. The Evolution of Analytic Scalability
Managing batch processing Transient Azure SQL Warehouse Resource
Topics Covered in COSC 6340 Data models (ER, Relational, XML)
Parallel Analytic Systems
Speaker: Jin-Wei Lin Advisor: Dr. Ho-Ting Wu
Big Data Young Lee BUS 550.
Chapter 13 The Data Warehouse
Zoie Barrett and Brian Lam
Big Data Analytics: Exploring Graphs with Optimized SQL Queries
Online Analytical Processing Stream Data: Is It Feasible?
Spatio-Temporal Histograms
Carlos Ordonez, Javier Garcia-Garcia,
Dr. Risil Chhatrala Dept of E&TC I²IT, Pun
CELTIC-NEXT Event 20th June 2019, Valencia
Dr. Kunal Mankodiya University of Rhode Island, USA
Hao Hu, Luo Qi, Fazhi Qi IHEP 22 Mar. 2018
Reducing Probe Responses for faster AP discovery
Presentation transcript:

S SPATE: Compacting and Exploring Telco Big Data Constantinos Costa1 , Georgios Chatzimilioudis1, Demetris Zeinalipour-Yazti2,1, Mohamed F. Mokbel3 University of Cyprus1 Max Planck Institute for Informatics2 University of Minnesota3 Motivation SPATE Architecture Storage layer Compression logic Indexing layer Minimizing the query response time Gradual decay of the data Application layer Querying module (SQL-LIKE) User interface (Interactive Map) The expansion of mobile networks and IoT (IP-enabled hardware) have contributed to an explosion of data inside Telecommunication Companies (Telcos) Telco Data: Traditional source for OLAP Data Warehouses / Analytics. e.g., Accounting, Billing, Session data. Problem: Not enough resolution for enabling next generation applications: e.g., churn prediction, 5G network optimization, user-experience assessment, traffic mapping. Telco Big Data (TBD): Velocity data generated at the cell towers. e.g., signal strength, call drops, measurements. Size: ~5TBs/day for a 10M clientele (i.e., 2PB/year). Storage layer Indexing layer Application layer We empirically assessed appropriate compression libraries for TBD. Microbenchmark Setting: Anonymized Real Telco Trace 5GB (1.7M CDR and 21M NMS from 300K users) On top of an HDFS v2.5.2 filesystem Our index has 4 levels of temporal resolutions epoch (30 minutes), day, month, year), with The decaying process retains predefined aggregates for: extremely long time windows small amounts of storage SPATE UI: A spatio-temporal telco data exploration user interface we developed on top of Google Maps. SPATE UI Video : https://goo.gl/BNqHFV Libraries\ Objectives GZIP 7z SNAPPY ZSTD Compression Ratio (rc) 9.06 11.75 4.94 9.72 Compress. Time (Tc1) in sec 21.37 20.99 21.39 21.07 Decompr. Time (Tc2) in sec 0.11 0.12 0.13 Experimental Evaluation Conclusions Tasks: T1: Equality T2: Range T3: Aggregate T4: Join T5: Privacy T6: Multivariate Statistics T7: Clustering T8: Linear Regression Conclusion: SPATE retains comparative query response time with 1 order less data! We proposed SPATE: SPATE allows using 1 order of magnitude less storage space. SPATE retains the query response time for a variety of data exploration queries. In the future: Event detection mechanism . Smart city applications on top of the SPATE instrument. Compared Frameworks: RAW: stores snapshots on disk w/out compression or decay SHAHED: stores the snapshot using a ST-index [ICDE’15] SPATE: the proposed framework in this work. Reference: "Efficient Exploration of Telco Big Data with Compression and Decaying", C. Costa, G. Chatzimilioudis, D. Zeinalipour-Yazti, M. F. Mokbel, Proceedings of the 33rd IEEE International Conference on Data Engineering (ICDE'17), San Diego, CA, USA, April 19-22, 2017. Contact: dmsl@cs.ucy.ac.cy http://dmsl.cs.ucy.ac.cy/