Leveraging Big Data Analytics for Cache-Enabled Wireless Networks

Slides:



Advertisements
Similar presentations
Big Data, Bigger Data & Big R Data Birmingham R Users Meeting 23 rd April 2013 Andy Pryke
Advertisements

Google News Personalization: Scalable Online Collaborative Filtering
Class-constrained Packing Problems with Application to Storage Management in Multimedia Systems Tami Tamir Department of Computer Science The Technion.
SDN + Storage.
LIBRA: Lightweight Data Skew Mitigation in MapReduce
Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments Presenter: Qin Liu a,b Joint work with Chiu C. Tan b, Jie Wu b,
1 LAN Traffic Measurements Carey Williamson Department of Computer Science University of Calgary.
1 CAPS: A Peer Data Sharing System for Load Mitigation in Cellular Data Networks Young-Bae Ko, Kang-Won Lee, Thyaga Nandagopal Presentation by Tony Sung,
Performance Evaluation of Peer-to-Peer Video Streaming Systems Wilson, W.F. Poon The Chinese University of Hong Kong.
The new The new MONARC Simulation Framework Iosif Legrand  California Institute of Technology.
Hadoop Team: Role of Hadoop in the IDEAL Project ●Jose Cadena ●Chengyuan Wen ●Mengsu Chen CS5604 Spring 2015 Instructor: Dr. Edward Fox.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
TPB Models Development Status Report Presentation to the Travel Forecasting Subcommittee Ron Milone National Capital Region Transportation Planning Board.
USING HADOOP & HBASE TO BUILD CONTENT RELEVANCE & PERSONALIZATION Tools to build your big data application Ameya Kanitkar.
Tracking Pedestrians Using Local Spatio- Temporal Motion Patterns in Extremely Crowded Scenes Louis Kratz and Ko Nishino IEEE TRANSACTIONS ON PATTERN ANALYSIS.
Authors: Ing-Ray Chen Weiping He Baoshan Gu Presenters: Yao Zheng.
1 An SLA-Oriented Capacity Planning Tool for Streaming Media Services Lucy Cherkasova, Wenting Tang, and Sharad Singhal HPLabs,USA.
EXPOSE GOOGLE APP ENGINE AS TASKTRACKER NODES AND DATA NODES.
W HAT IS H ADOOP ? Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity.
Data Analysis using Java Mobile Agents Mark Dönszelmann, Information, Process and Technology Group, IT, CERN ATLAS Software Workshop Analysis Tools Meeting,
Workload-driven Analysis of File Systems in Shared Multi-Tier Data-Centers over InfiniBand K. Vaidyanathan P. Balaji H. –W. Jin D.K. Panda Network-Based.
Distributing Layered Encoded Video through Caches Authors: Jussi Kangasharju Felix HartantoMartin Reisslein Keith W. Ross Proceedings of IEEE Infocom 2001,
INTRODUCTION The GRID Data Center at INFN Pisa hosts a big Tier2 for the CMS experiment, together with local usage from other HEP related/not related activities.
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. LogKV: Exploiting Key-Value.
1 [3] Jorge Martinez-Bauset, David Garcia-Roger, M a Jose Domenech- Benlloch and Vicent Pla, “ Maximizing the capacity of mobile cellular networks with.
1 Chapter 17 Disk Storage, Basic File Structures, and Hashing Chapter 18 Index Structures for Files.
Advanced Spectrum Management in Multicell OFDMA Networks enabling Cognitive Radio Usage F. Bernardo, J. Pérez-Romero, O. Sallent, R. Agustí Radio Communications.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:
Kenza Hamidouche, Mérouane Debbah
Replicating Memory Behavior for Performance Skeletons Aditya Toomula PC-Doctor Inc. Reno, NV Jaspal Subhlok University of Houston Houston, TX By.
DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.
Copyright 2007, Information Builders. Slide 1 Machine Sizing and Scalability Mark Nesson, Vashti Ragoonath June 2008.
Content caching and scheduling in wireless networks with elastic and inelastic traffic Group-VI 09CS CS CS30020 Performance Modelling in Computer.
Big traffic data processing framework for intelligent monitoring and recording systems 學生 : 賴弘偉 教授 : 許毅然 作者 : Yingjie Xia a, JinlongChen a,b,n, XindaiLu.
The IEEE International Conference on Cluster Computing 2010
MiddleMan: A Video Caching Proxy Server NOSSDAV 2000 Brian Smith Department of Computer Science Cornell University Ithaca, NY Soam Acharya Inktomi Corporation.
Video Caching in Radio Access network: Impact on Delay and Capacity
DMAP: integrated mobility and service management in mobile IPv6 systems Authors: Ing-Ray Chen Weiping He Baoshan Gu Presenters: Chia-Shen Lee Xiaochen.
Optimal Relay Placement for Indoor Sensor Networks Cuiyao Xue †, Yanmin Zhu †, Lei Ni †, Minglu Li †, Bo Li ‡ † Shanghai Jiao Tong University ‡ HK University.
Next Generation of Apache Hadoop MapReduce Owen
INTRODUCTION TO HADOOP. OUTLINE  What is Hadoop  The core of Hadoop  Structure of Hadoop Distributed File System  Structure of MapReduce Framework.
Matrix Multiplication in Hadoop
Social and Spatial Proactive Caching for Mobile Data Offloading IEEE International Conference on Communications (ICC) – W3: Workshop on Small Cell and.
Spark on Entropy : A Reliable & Efficient Scheduler for Low-latency Parallel Jobs in Heterogeneous Cloud Huankai Chen PhD Student at University of Kent.
Data Analytics Challenges Some faults cannot be avoided Decrease the availability for running physics Preventive maintenance is not enough Does not take.
Azary Abboud, Ejder Baştuğ, Kenza Hamidouche, and Mérouane Debbah
Authors: Jiang Xie, Ian F. Akyildiz
Chapter 2 Memory and process management
By Chris immanuel, Heym Kumar, Sai janani, Susmitha
Wonkwang Shin, Byoung-Yoon Min and Dong Ku Kim
Golrezaei, N. ; Molisch, A.F. ; Dimakis, A.G.
Ioannis E. Venetis Department of Computer Engineering and Informatics
How to download, configure and run a mapReduce program In a cloudera VM Presented By: Mehakdeep Singh Amrit Singh Chaggar Ranjodh Singh.
Pagerank and Betweenness centrality on Big Taxi Trajectory Graph
Distributed Network Traffic Feature Extraction for a Real-time IDS
Big Data Caching for Networking: Moving from Cloud to Edge
Authors: Sajjad Rizvi, Xi Li, Bernard Wong, Fiodar Kazhamiaka
Dynamic Graph Partitioning Algorithm
Floating Content in Vehicular Ad-hoc Networks
Oracle SQL*Loader
Hadoop Clusters Tess Fulkerson.
Myoungjin Kim1, Yun Cui1, Hyeokju Lee1 and Hanku Lee1,2,*
Introduction to Apache
Overview of big data tools
Big Data, Bigger Data & Big R Data
COMP755 Advanced Operating Systems
2019/9/14 The Deep Learning Vision for Heterogeneous Network Traffic Control Proposal, Challenges, and Future Perspective Author: Nei Kato, Zubair Md.
SANDIE: Optimizing NDN for Data Intensive Science
Presentation transcript:

Leveraging Big Data Analytics for Cache-Enabled Wireless Networks Manhal Abdel Kader, Ejder Bastug, Mehdi Bennis, Engin Zeydan, Alper Karatepe, Ahmet Salih Er and Mérouane Debbah 2015 IEEE Globecom Workshops (GC Wkshps)

Outline Introduction Network Model Big Data Platform Numerical Results Conclusions

Introduction Big data The edge of network Proactive caching Spatio-temporal behaviour of users under high data sparsity, large amount of users and content catalog [3] J. K. Laurila, D. Gatica-Perez, I. Aad, B. J., O. Bornet, T.-M.-T. Do, O. Dousse, J. Eberle, and M. Miettinen, “The Mobile Data Challenge: Big Data for Mobile Computing Research,” in Pervasive Computing, 2012.

Network Model M small base stations N user terminals Wired backhaul link of capacity Cm Mbyte/s Wireless link with total capacity of C’m Mbyte/s Cm < C’m (SBSs are densely deployed)

Network Model A content library F Each content f has a size of L(f) Mbyte (Lmin < L(f) < Lmax) a bitrate requirement of B(f) Mbyte/s (Bmin < B(f) < Bmax) The content demand follow a Zipf-like distribution

Network Model A request d ∈ D is immediately served and is called satisfied if where τ(fd) and τ’(fd) represent the arrival and end time of delivery

Network Model The average request satisfaction ratio is where 1{...} is the indicator function which takes 1 if the statement holds and 0 otherwise

Network Model The instantaneous backhaul rate at time t for the request d is given by Rd(t) ≤ Cm Mbyte/s, ∀m ∈ M The average backhaul load is

Network Model We prefetch the strategic contents at the SBSs, aiming to obtain higher satisfaction ratio and less backhaul load

Network Model

Network Model The cache decision matrix of SBSs is X(t) ∈ {0, 1}M×F, where the entry xm,f(t) takes 0 if the f-th content is not cached at m-th SBS at time t, and 1 otherwise Pmn,f(t) represents the probability of requesting the content f by user n at time t

Network Model Each SBS has a limited storage capacity of Sm R’d(t) Mbyte/s represents the instantaneous rate of wireless link for request d ηmin describes the minimum target satisfaction ratio

Network Model A joint optimization of cache decision matrix X(t) and content popularity matrix estimation Pm(t) is required Large number of users with unknown ratings and big library size have to be considered Enable SBSs and/or their central entity to track, identify, analyse and predict the sparse content popularity/rating matrix Pm(t)

Network Model We suppose that the cache placement is done during peak-off hours, therefore X(t) is represented by a cache decision matrix X which remains static during in peak hours We consider that all SBSs have an identical and stationary content popularity matrix over T time slots, thus we represent Pm(t) by P

Network Model If we have sufficient amount of users’ ratings, we can approximately design a k- rank popularity matrix P ≈ NTF, where N ∈ Rk×N and F ∈ Rk×F are the factor matrices obtained from joint learning

Network Model where the vectors ni and fj describe the i- th and j-th columns of N and F matrices respectively ||.||2F is the Frobenius norm μ is to provide a balance between fitting training data and regularization

Network Model In order to estimate P in (6), we use a big data platform We then store contents at the cache- enabled base stations whose cache decisions are represented by X

Network Model

Big Data Platform Extract useful information for proactive caching decisions and store users’ traffic data We collect the streaming traces in a server with high speed link of 200 Mbit/sec at peak hours Network traffic is then transferred into the server on the platform

Hadoop platform Implementation and execution of distributed applications on large clusters Map-Reduce enables execution of an application in parallel, by dividing it into many small fragments of jobs on multiple nodes Hadoop Distributed File System (HDFS) is in charge of handling the data storage across the clusters

Hadoop platform We use a platform based on Cloudera’s Distribution Including Apache Hadoop (CDH4) [13] version on four nodes Computations powers corresponding to each node with INTEL Xeon CPU E5- 2670 running @2.6 GHz, 32 Core CPUs, 132 GByte RAM, 20 TByte hard disk [13] “Cloudera,” http://goo.gl/dfkWib, 2015, [Online; accessed 02-April-2015].

Data Extraction Process

Traffic Characteristics

Traffic Characteristics

Numerical Results

Methods Ground Truth: Collaborative Filtering: Content popularity matrix P is constructed by considering all available information in traces- table. This matrix has 6.42% of rating density. Collaborative Filtering: For estimation of P, 10% of ratings in traces-table are chosen uniformly at random and given to collaborative filtering (CF) for training. Then, the remaining ratings are predicted via regularized singular value decomposition (SVD) [15]. [15] J. Lee, M. Sun, and G. Lebanon, “A comparative study of collaborative filtering algorithms,” [Online] arXiv: 1205.3193, 2012.

Users’ Request Satisfaction

Backhaul Usage

Impact of Rating Density

Conclusions Demonstrate the collection/extraction process of our real traces on a big data platform Make use of machine learning tools for content popularity estimation

Conclusions Provide a more detailed characterization of the traffic Design of novel machine learning tools Design of deterministic/randomized cache placement algorithms