BD-CACHE: Big Data Caching for Datacenters
Mass Open Cloud (MOC): Boston University, Northeastern University
Partners: Intel, Brocade, Lenovo, Red Hat
Network Bottlenecks
Solution? CACHING + PREFETCHING
Characteristics of the running applications:
- High data-input reuse
- Uneven data popularity
- Sequential access
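The sequential-access pattern is what makes prefetching pay off. The sketch below is a minimal, illustrative readahead loop, not the BD-Cache code; the dict-based cache and the read_from_backend() helper are hypothetical stand-ins.

```python
# Readahead sketch: when block k of an object is read, the next few blocks are
# pulled into the cache so that later sequential reads hit locally.
READAHEAD_WINDOW = 4          # blocks to prefetch; an illustrative value
cache = {}                    # (object_id, block_index) -> block data

def read_from_backend(obj_id, block_idx):
    # Stand-in for a read against the backend storage cluster.
    return f"{obj_id}:{block_idx}".encode()

def on_block_read(obj_id, block_idx):
    """Serve one block, then warm the cache for the blocks that follow."""
    key = (obj_id, block_idx)
    if key not in cache:
        cache[key] = read_from_backend(obj_id, block_idx)
    # Exploit the sequential-access pattern: prefetch the next blocks.
    for k in range(block_idx + 1, block_idx + 1 + READAHEAD_WINDOW):
        cache.setdefault((obj_id, k), read_from_backend(obj_id, k))
    return cache[key]

print(on_block_read("A1", 0))   # miss: fetched, blocks 1-4 prefetched
print(on_block_read("A1", 1))   # hit: already prefetched
```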
Where to place SSDs for efficient caching?
The compute cluster and the storage cluster are separated by a bandwidth bottleneck.
Placing SSD caches at the rack level (per rack) in the compute cluster reduces backend traffic.
Our Architecture: Anycast Network Solution
- Cache nodes are placed per rack, using Intel NVMe SSDs.
- Two-level caching architecture:
  - L1 cache: rack-local; reduces inter-rack traffic; objects are mapped to cache nodes with a consistent-hash algorithm.
  - L2 cache: cluster-level locality; reduces traffic between the compute cluster and the backend storage.
[Figure: compute cluster with Racks 1..N, one cache node per rack (L1 cache), a shared L2 cache layer, and the backend storage cluster.]
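The consistent-hash mapping named above can be illustrated with a small hash ring; this sketch is not the BD-Cache implementation, and the node names and number of virtual points are assumptions.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map object names onto cache nodes; adding or removing a node
    only remaps a small fraction of the objects."""

    def __init__(self, nodes, vnodes=100):
        points = [(self._hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)]
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, obj_name):
        """Return the cache node responsible for obj_name."""
        idx = bisect.bisect(self._hashes, self._hash(obj_name)) % len(self._hashes)
        return self._nodes[idx]

ring = ConsistentHashRing(["cache-node-1", "cache-node-2", "cache-node-3"])
print(ring.node_for("bucket/A1"))
```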
Work Flow
Example: a compute node reads file A1.
1. The compute node sends the request for A1 to its rack-local L1 cache node.
2. On an L1 miss, the request is forwarded to the L2 cache.
3. On an L2 miss, the request goes to the backend storage cluster.
4. A1 is returned and cached at the L2 and L1 levels on the way back.
5. A1 is delivered to the compute node, so later reads in the rack hit in L1.
[Figure: compute cluster with Racks 1..N and per-rack cache nodes (L1), the L2 cache, and the storage cluster; steps 1-5 annotate the path of the request and response for file A1.]
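A read-path sketch of the two-level lookup above; the dict caches and fetch_from_storage() are hypothetical stand-ins for the rack-local L1 cache node, the cluster-wide L2 cache, and the Ceph storage cluster.

```python
l1_cache, l2_cache = {}, {}

def fetch_from_storage(obj_name):
    # Stand-in for a read from the backend storage cluster.
    return f"contents of {obj_name}".encode()

def read_object(obj_name):
    """Check L1, then L2, then backend storage; fill the caches on the way back."""
    if obj_name in l1_cache:                    # steps 1-2: rack-local lookup
        return l1_cache[obj_name]
    if obj_name not in l2_cache:                # step 3: cluster-level lookup
        l2_cache[obj_name] = fetch_from_storage(obj_name)   # step 4
    l1_cache[obj_name] = l2_cache[obj_name]     # fill L1 for later rack-local hits
    return l1_cache[obj_name]                   # step 5: deliver to the compute node

print(read_object("A1"))   # cold read: misses L1 and L2, fills both
print(read_object("A1"))   # warm read: served from the rack-local L1 cache
```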
IMPLEMENTATION
- Two-level caching mechanism implemented by modifying the original Ceph RADOS Gateway (RGW).
- BD-Cache supports read and write traffic, but caches only on read operations.
- Cached data is stored on SSD.
- The L1 and L2 caches are logically separated but share the same physical cache infrastructure.
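A sketch of the "cache only on reads" policy, not the modified RGW code: writes pass straight through to the backend, while reads fill the SSD-backed cache. Dropping a stale cached copy on a write is an assumption about how consistency could be kept, not something stated on the poster.

```python
ssd_cache = {}      # stand-in for the SSD-backed object cache
backend = {}        # stand-in for the Ceph storage cluster

def handle_request(op, obj_name, data=None):
    if op == "PUT":
        backend[obj_name] = data            # write traffic is not cached
        ssd_cache.pop(obj_name, None)       # assumed invalidation of stale copies
        return None
    # op == "GET": serve from the cache, filling it on a miss
    if obj_name not in ssd_cache:
        ssd_cache[obj_name] = backend[obj_name]
    return ssd_cache[obj_name]

handle_request("PUT", "A1", b"payload")
print(handle_request("GET", "A1"))          # miss: read from backend, then cached
print(handle_request("GET", "A1"))          # hit: served from the SSD cache
```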
Methodology
Experimental configurations: Unmodified-RGW vs. Cache-RGW
- Ceph cluster: 10 Lenovo storage nodes, each with 9 HDDs
- Cache node: 2 x 1.5 TB Intel NVMe SSDs (RAID 0), 128 GB DRAM
- Requests: concurrent connections, each requesting 4 GB files
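One way such a read benchmark could be driven against the RGW's S3-compatible endpoint is sketched below; the endpoint URL, credentials, bucket, object names, and connection count are placeholders, and this is not the harness actually used for the experiments.

```python
from concurrent.futures import ThreadPoolExecutor
import time
import boto3

N_CONNECTIONS = 16   # illustrative; the experiments vary the connection count
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example:8080",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

def fetch(key):
    """GET one 4 GB object and return the observed throughput in bytes/s."""
    start = time.time()
    body = s3.get_object(Bucket="bench", Key=key)["Body"]
    nbytes = sum(len(chunk) for chunk in iter(lambda: body.read(1 << 20), b""))
    return nbytes / (time.time() - start)

with ThreadPoolExecutor(max_workers=N_CONNECTIONS) as pool:
    rates = pool.map(fetch, [f"file-4g-{i}" for i in range(N_CONNECTIONS)])
print(f"aggregate read throughput: {sum(rates) / 1e6:.1f} MB/s")
```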
CACHE MISS PERFORMANCE
On a cache miss, Cache-RGW imposes no overhead compared to Unmodified-RGW.
CACHE HIT PERFORMANCE
On cache hits, caching improves read performance significantly; Cache-RGW saturates the SSD.
Future Work
- Evaluate the caching architecture by benchmarking real-world workloads
- Prefetching
- Cache replacement algorithms
- Enable caching on write operations
Project webpage: http://info.massopencloud.org/blog/bigdata-research-at-moc
GitHub repo for the Cache-RGW code: https://github.com/maniaabdi/engage1
Summary
- Bandwidth and latency issues on the backend storage
- Data reuse and sequential access (amenable to prefetching) are observed in workloads; caching is a solution
- Proposed two-level caching on the rack side
  - L1 cache: rack-local
  - L2 cache: cluster locality
- Initial results
  - Negligible overhead from Cache-RGW
  - ~50% to ~300% improvement compared to Vanilla-RGW