BD-CACHE Big Data Caching for Datacenters

Presentation on theme: "BD-CACHE Big Data Caching for Datacenters"— Presentation transcript:

1 BD-CACHE Big Data Caching for Datacenters
MOC, Boston University, Northeastern University, Intel, Brocade, Lenovo, Red Hat

2 Network Bottlenecks

3 Solution? Caching and Prefetching
Characteristics of the running applications: high data input reuse, uneven data popularity, and sequential access. Reuse and uneven popularity favor caching; sequential access favors prefetching.

4 Where to place SSDs for efficient caching?
[Diagram: compute cluster (nodes grouped in racks), storage cluster, a marked bandwidth bottleneck, and a single SSD.]

5 Where to place SSDs for efficient caching?
At rack level (per rack), to reduce backend traffic.
[Diagram: compute cluster (nodes grouped in racks), storage cluster, a marked bandwidth bottleneck, and SSDs shown per rack.]

6 Our Architecture
Cache nodes are placed per rack, using Intel NVMe SSDs.
Two-level caching architecture:
L1 cache: rack-local; reduces inter-rack traffic; consistent hash algorithm.
L2 cache: cluster locality; reduces traffic between the clusters and the back-end storage.
Anycast network solution.
[Diagram: compute cluster with Rack 1 to Rack N, each holding nodes and one cache node (Cache Node 1 to Cache Node N); layers labeled L1 cache and L2 cache; storage cluster.]
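The slide names a consistent hash algorithm but gives no details. As a minimal sketch of the general technique, the ring below maps object keys to cache nodes so that removing a node only remaps the keys that node owned. The class name `HashRing`, the node names, the MD5 hash, and the virtual-node count are all assumptions for illustration, not taken from BD-Cache.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a ring position (MD5 is used only to spread keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes   # assumed value; more vnodes give a smoother spread
        self._points = []       # sorted ring positions
        self._owner = {}        # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed at several positions on the ring.
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        # Drop only the positions that belong to the departing node.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # The owner is the first ring position clockwise from the key's hash.
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

For example, `HashRing(["cache-node-1", "cache-node-2"]).lookup("bucket/A1")` returns the same node name on every call, and keys owned by the surviving nodes keep their owner after `remove` is called on another node.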

7 Work Flow
[Diagram: compute cluster with Rack 1 to Rack N, each holding nodes and one cache node; layers labeled L1 cache and L2 cache; storage cluster holding file A1. Step 1 is marked.]

8 Work Flow
[Same diagram; steps 1 and 2 are marked.]

9 Work Flow
[Same diagram; steps 1 to 3 are marked.]

10 Work Flow
[Same diagram; steps 1 to 4 are marked, and the label A1 now also appears next to step 4.]

11 Work Flow
[Same diagram; steps 1 to 5 are marked, and the label A1 appears at further positions in the diagram.]
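The exact meaning of each numbered step is not recoverable from the transcript, so the following is only a sketch of one plausible two-level read path: check the requesting rack's L1 cache, then the L2 cache, then the storage cluster, filling the caches on the way back. The class `TwoLevelCache`, the dict-based stores, and the CRC32-modulo choice of L2 owner are assumptions for illustration.

```python
import zlib


class TwoLevelCache:
    """Toy model of an L1 (rack-local) / L2 (cluster-wide) read path."""

    def __init__(self, racks, backend):
        self.racks = list(racks)
        self.l1 = {rack: {} for rack in self.racks}  # rack-local L1 contents
        self.l2 = {rack: {} for rack in self.racks}  # L2 shard held by each cache node
        self.backend = backend                       # dict standing in for the storage cluster
        self.backend_reads = 0

    def _l2_owner(self, key):
        # Assumed placement rule: a stable hash picks the cache node holding the L2 copy.
        return self.racks[zlib.crc32(key.encode("utf-8")) % len(self.racks)]

    def read(self, rack, key):
        """Return (where the data was found, data)."""
        if key in self.l1[rack]:
            return "L1", self.l1[rack][key]
        owner = self._l2_owner(key)
        if key in self.l2[owner]:
            data = self.l2[owner][key]
            self.l1[rack][key] = data   # keep a rack-local copy for later reads
            return "L2", data
        data = self.backend[key]        # miss in both levels: go to backend storage
        self.backend_reads += 1
        self.l2[owner][key] = data
        self.l1[rack][key] = data
        return "storage", data
```

In this model the first read of `A1` from any rack reaches the storage cluster once; a repeat read from the same rack is served by L1, and a first read from a different rack is served by L2 without touching the backend.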

12 IMPLEMENTATION
Two-level caching mechanism implemented by modifying the original Ceph RADOS Gateway (RGW). BD-Cache supports both read and write traffic but caches only on read operations. Data is stored on SSD. The L1 and L2 caches are logically separate but share the same physical cache infrastructure.
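Caching only on reads can be sketched as a gateway whose reads fill the cache while writes go straight to the backend. This is not the Cache-RGW code: the class `ReadCachingGateway` and its methods are hypothetical, and dropping the cached copy on overwrite is an assumption, since the slide does not say how overwrites are handled.

```python
class ReadCachingGateway:
    """Toy gateway: reads populate the cache, writes bypass it."""

    def __init__(self, backend):
        self.backend = backend   # dict standing in for the object store
        self.cache = {}          # dict standing in for the SSD cache

    def get(self, key):
        # Read path: serve from cache when possible, otherwise fetch and keep a copy.
        if key in self.cache:
            return self.cache[key]
        data = self.backend[key]
        self.cache[key] = data
        return data

    def put(self, key, data):
        # Write path: data goes to the backend only; nothing is cached on write.
        self.backend[key] = data
        # Assumption: discard any stale cached copy so later reads see the new data.
        self.cache.pop(key, None)
```

A call to `put` leaves the cache empty for that key; the object enters the cache only on its first `get`.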

13 Methodology
Experimental configurations: Unmodified-RGW and Cache-RGW.
Ceph cluster: 10 Lenovo storage nodes, each with 9 HDDs.
Cache node: 2 x 1.5 TB Intel NVMe SSD, 128 GB DRAM, RAID 0.
Requests: concurrent connections request 4 GB files.

14 CACHE MISS PERFORMANCE
Cache-RGW imposes no overhead

15 CACHE HIT PERFORMANCE
Caching improves the read performance significantly. Cache-RGW saturates the SSD.

16 Future Work
Evaluate the caching architecture by benchmarking real-world workloads.
Prefetching.
Cache replacement algorithms.
Enable caching on write operations.
Project webpage:
GitHub repo for Cache-RGW code:

17 Summary
Bandwidth and latency issues on the backend storage.
Data reuse and prefetching opportunities are observed in workloads; caching is a solution.
Proposed two-level caching on the rack side. L1 cache: rack-local. L2 cache: cluster locality.
Initial results: negligible overhead from Cache-RGW; ~50% to ~300% improvement compared to Vanilla-RGW.

