BD-CACHE Big Data Caching for Datacenters


BD-Cache: Big Data Caching for Datacenters
Massachusetts Open Cloud (MOC): Boston University, Northeastern University, Intel, Brocade, Lenovo, Red Hat

Network Bottlenecks

Solution? Caching and prefetching.
Characteristics of the running applications:
- High input-data reuse
- Uneven data popularity
- Sequential access
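Sequential access is what makes prefetching pay off: once a block is read, the following blocks are likely to be read next and can be staged into the cache ahead of time. A toy readahead planner (illustrative only; numeric block ids and the window size are assumptions, not BD-Cache's actual scheme):

```python
def plan_prefetch(accessed_block: int, window: int = 4) -> list[int]:
    """Given the id of a sequentially accessed block, return the ids of
    the next `window` blocks to stage into the cache ahead of the reader."""
    return list(range(accessed_block + 1, accessed_block + 1 + window))

# After block 10 is read, stage blocks 11..14.
print(plan_prefetch(10))  # [11, 12, 13, 14]
```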

Where to place SSDs for efficient caching?
The bandwidth bottleneck sits between the compute cluster (nodes organized into racks) and the storage cluster.
Option: place the SSDs in the storage cluster.

Where to place SSDs for efficient caching?
Option: place SSDs at the rack level (one per rack), which reduces backend traffic across the bottleneck between the compute cluster and the storage cluster.

Our Architecture
Cache nodes are placed per rack, built on Intel NVMe SSDs, with an anycast network solution.
Two-level caching architecture:
- L1 cache: rack-local; reduces inter-rack traffic; objects are mapped to cache nodes with a consistent-hash algorithm.
- L2 cache: cluster locality; reduces traffic between the compute clusters and the backend storage cluster.
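The L1 level relies on consistent hashing to map an object to its rack-local cache node. A minimal sketch of the idea, assuming MD5 as the hash and a virtual-node count of 64 (both are illustrative choices, not BD-Cache's actual implementation):

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Map object names to cache nodes on a hash ring, so that adding or
    removing a node only remaps the objects adjacent to it on the ring."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node gets `vnodes` points on the ring for balance.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, obj_name: str) -> str:
        # Walk clockwise from the object's hash to the next node point.
        idx = bisect_right(self._keys, self._hash(obj_name)) % len(self.ring)
        return self.ring[idx][1]

# Hypothetical rack-local cache nodes.
ring = ConsistentHashRing(["cache-node-1", "cache-node-2", "cache-node-3"])
owner = ring.node_for("A1")  # deterministic owner for object "A1"
```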

Work Flow
The slides animate a read of file A1 through the two-level cache:
1. A compute node requests file A1 from its rack-local cache node (L1 cache).
2. On an L1 miss, the request is forwarded to the L2 cache.
3. On an L2 miss, the request goes to the backend storage cluster.
4. The storage cluster returns A1, which is cached at L2 and L1 on the way back.
5. The compute node receives A1; subsequent requests are served from the caches.
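The read path above can be sketched as a minimal two-level lookup (illustrative Python with plain dicts standing in for the cache nodes and the storage cluster; not the actual RGW code):

```python
class TwoLevelCache:
    """Read path: check the rack-local L1 cache, then the cluster-wide
    L2 cache, then the backend storage cluster; on a backend fetch,
    fill both cache levels on the way back."""

    def __init__(self, backend):
        self.l1 = {}            # rack-local cache (per-rack cache node)
        self.l2 = {}            # cluster-level cache
        self.backend = backend  # storage cluster

    def read(self, name):
        if name in self.l1:          # L1 hit: served inside the rack
            return self.l1[name], "L1"
        if name in self.l2:          # L1 miss, L2 hit
            data = self.l2[name]
            self.l1[name] = data     # promote into L1
            return data, "L2"
        data = self.backend[name]    # both miss: fetch from storage
        self.l2[name] = data         # fill caches on the return path
        self.l1[name] = data
        return data, "backend"

store = {"A1": b"file-A1-bytes"}
cache = TwoLevelCache(store)
cache.read("A1")  # first read comes from the backend and fills L1/L2
cache.read("A1")  # repeat read is an L1 hit
```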

IMPLEMENTATION
The two-level caching mechanism is implemented by modifying the original Ceph RADOS Gateway (RGW).
BD-Cache supports both read and write traffic, but caches only on read operations.
Cached data is stored on SSD.
The L1 and L2 caches are logically separated but share the same physical cache infrastructure.
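The read-only caching policy amounts to a write-around cache: writes bypass the cache and go straight to the backend, and only read responses populate the SSD. A sketch under those assumptions (plain dicts stand in for RADOS and the SSD store; this is not the Cache-RGW code):

```python
class ReadOnlyCachingGateway:
    """Gateway serving both reads and writes, where only read
    responses populate the cache (write-around policy)."""

    def __init__(self, backend, cache):
        self.backend = backend  # stands in for the storage cluster
        self.cache = cache      # stands in for the SSD cache

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        data = self.backend[key]
        self.cache[key] = data       # cache fill happens only on reads
        return data

    def put(self, key, data):
        self.backend[key] = data     # write-around: bypass the cache
        self.cache.pop(key, None)    # drop any stale cached copy

backend, cache = {}, {}
gw = ReadOnlyCachingGateway(backend, cache)
gw.put("obj", b"v1")   # write lands on the backend only
gw.get("obj")          # first read fills the cache
```

Invalidating on `put` keeps a subsequently overwritten object from being served stale out of the cache, which is one simple way to stay consistent without caching writes.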

Methodology
Experimental configurations: Unmodified-RGW vs. Cache-RGW.
Ceph cluster: 10 Lenovo storage nodes, each with 9 HDDs.
Cache node: 2 x 1.5 TB Intel NVMe SSDs in RAID 0, 128 GB DRAM.
Requests: concurrent connections, each requesting 4 GB files.

CACHE MISS PERFORMANCE
Cache-RGW imposes no overhead on cache misses.

CACHE HIT PERFORMANCE
Caching improves read performance significantly; Cache-RGW saturates the SSD.

Future Work
- Evaluate the caching architecture by benchmarking real-world workloads.
- Prefetching.
- Cache replacement algorithms.
- Enable caching on write operations.
Project webpage: http://info.massopencloud.org/blog/bigdata-research-at-moc
GitHub repo for the Cache-RGW code: https://github.com/maniaabdi/engage1

Summary
Bandwidth and latency are an issue on the backend storage.
Data reuse and prefetchable access patterns are observed in workloads; caching is a solution.
Proposed two-level caching on the rack side:
- L1 cache: rack-local.
- L2 cache: cluster locality.
Initial results: negligible overhead from Cache-RGW; ~50% to ~300% improvement compared to vanilla RGW.