Investigating Distributed Caching Mechanisms for Hadoop
Gurmeet Singh, Puneet Chandra, Rashid Tahir
Goal: Explore the feasibility of a distributed caching mechanism inside Hadoop
Presentation Overview
- Motivation
- Design
- Experimental Results
- Future Work
Motivation
- Disk access times are a bottleneck in cluster computing
- Large amounts of data are read from disk
- Related work: DARE, RAMClouds, PACMan (coordinated cache replacement)
- We want to strike a balance between RAM and disk storage
Our Approach
- Integrate Memcached with Hadoop (using Quickcached and Spymemcached)
- Reserve a portion of the main memory at each node to serve as a local cache
- The local caches aggregate into a distributed caching layer governed by Memcached
- Greedy caching strategy
- Least Recently Used (LRU) cache eviction policy
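The greedy strategy on a reserved slice of node memory can be sketched roughly as follows. All names here (`BlockCache`, `readFromDisk`) are illustrative stand-ins, not Hadoop or Memcached APIs, and eviction is handled separately (the deck uses LRU); this sketch simply stops inserting when the reserved capacity is full.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-node cache layer: a fixed-size slice of RAM holds
// recently read blocks, and every block read is inserted greedily
// ("cache everything you read").
class BlockCache {
    private final int capacityBlocks;
    private final Map<String, byte[]> cache = new HashMap<>();

    BlockCache(int capacityBlocks) {
        this.capacityBlocks = capacityBlocks;
    }

    // Greedy strategy: on a miss, read the block and cache it while
    // capacity remains (LRU eviction omitted for brevity).
    byte[] read(String blockId) {
        byte[] data = cache.get(blockId);
        if (data != null) {
            return data;                  // cache hit: served from RAM
        }
        data = readFromDisk(blockId);     // cache miss: go to local disk
        if (cache.size() < capacityBlocks) {
            cache.put(blockId, data);     // greedy insert
        }
        return data;
    }

    boolean isCached(String blockId) {
        return cache.containsKey(blockId);
    }

    // Stand-in for an actual HDFS block read.
    private byte[] readFromDisk(String blockId) {
        return ("disk:" + blockId).getBytes();
    }
}
```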
Design Overview
Memcached
Design Choice 1: Send simultaneous requests to the Namenode and Memcached. This minimizes access latency at the cost of additional network overhead.
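The "race both lookups, take the first answer" pattern of this design choice can be sketched with `CompletableFuture`. The `queryNamenode` and `queryMemcached` methods below are simulated stand-ins with artificial latencies, not real Hadoop or Memcached calls.

```java
import java.util.concurrent.CompletableFuture;

// Sketch of Design Choice 1: issue block-location lookups to the
// Namenode and to Memcached concurrently and use whichever response
// arrives first.
class ParallelLookup {
    static String locate(String blockId) {
        CompletableFuture<String> namenode =
            CompletableFuture.supplyAsync(() -> queryNamenode(blockId));
        CompletableFuture<String> memcached =
            CompletableFuture.supplyAsync(() -> queryMemcached(blockId));
        // First response wins; the duplicated request is exactly the
        // extra network overhead this design accepts for lower latency.
        return (String) CompletableFuture.anyOf(namenode, memcached).join();
    }

    static String queryNamenode(String blockId) {
        sleep(50);                        // RPC to the Namenode is slower
        return "namenode:" + blockId;
    }

    static String queryMemcached(String blockId) {
        sleep(5);                         // cache lookup is fast
        return "memcached:" + blockId;
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { }
    }
}
```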
Design Choice 2: Send a request to the Namenode only in the case of a cache miss. This minimizes network overhead at the cost of increased latency.
Design Choice 3: Datanodes send requests only to Memcached, which checks for cached blocks. If a cache miss occurs, Memcached contacts the Namenode and returns the replica addresses to the Datanodes.
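In this third design, the cache layer itself resolves misses, so Datanodes have a single entry point. A minimal sketch, assuming hypothetical names (`LookupService`, `resolveViaNamenode`) rather than actual Hadoop or Memcached APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of Design Choice 3: Datanodes talk only to the cache layer,
// which resolves misses by asking the Namenode and remembering the
// answer for subsequent lookups.
class LookupService {
    private final Map<String, String> cachedLocations = new HashMap<>();
    int namenodeRequests = 0;             // counts the miss traffic

    // Single entry point used by Datanodes.
    String locate(String blockId) {
        String loc = cachedLocations.get(blockId);
        if (loc == null) {                // cache miss: fall back to Namenode
            loc = resolveViaNamenode(blockId);
            cachedLocations.put(blockId, loc);
        }
        return loc;                       // replica address returned to caller
    }

    private String resolveViaNamenode(String blockId) {
        namenodeRequests++;
        return "replica-for-" + blockId;  // stand-in for the real RPC
    }
}
```

Repeated lookups for the same block then touch the Namenode only once, which is the point of routing everything through the cache layer.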
Global Cache Replacement: an LRU-based global cache eviction scheme
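The LRU policy itself is standard; a minimal sketch using `java.util.LinkedHashMap` in access order (capacity and key/value types are illustrative, not the deck's actual implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: LinkedHashMap with accessOrder = true keeps
// entries ordered by most recent access, and removeEldestEntry
// evicts the least recently used entry once capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);           // accessOrder = true -> LRU order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;         // evict when over capacity
    }
}
```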
Prefetching
Simulation Results: test data ranging from 2 GB to 24 GB, using the Word Count and Grep workloads
Word Count
Grep
Future Work
- Implement a prefetching mechanism
- Customized caching policies based on access patterns
- Compare and contrast caching with locality-aware scheduling
Conclusion: Caching can improve the performance of cluster-based systems, depending on the access patterns of the workload being executed.