MASSIVE ARRAYS OF IDLE DISKS FOR STORAGE ARCHIVES
D. Colarelli and D. Grunwald
U. Colorado, Boulder
Highlights
Paper proposes:
– To replace tape libraries by large non-redundant arrays of disks
– To cache on active drives:
  – Files that have been recently accessed
  – Update logs for other files
– To keep other drives mostly inactive by spinning them down between accesses
Introduction (I)
Robotic tape libraries are now the standard solution for archiving very large amounts of data
Disadvantages include:
– Slow access times: average search time of 41 s for T9940 drives
– Not much cheaper than disk drives
Could we replace them with massive arrays of hard drives?
Introduction (II)
Major limitation of the hard-drive solution is power consumption
– Almost ten times that of an equivalent tape library
Could power down disks that are not currently accessed
– 50% of data are likely to be never accessed
– 25% of data are likely to be accessed once
Introduction (III)
Must be at least as reliable as tape libraries
– No need to use a redundant scheme
Solution is a Massive Array of Idle Disks (MAID)
Paper investigates design issues through trace-driven simulations
Design Issues
Two major design decisions:
– Data migration or duplication (caching)
– File system or block-level interface
Migration or caching
Migration would move “hot” data to the active drives
– Uses disk space more efficiently
– Requires a map or directory mechanism that maps the storage across all drives
Caching would cache read data and act as a write log for write data
– Keeps two copies of all cached files
– Maps or directories are proportional to the size of the cache
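To make the directory-size point concrete, here is a minimal Python sketch; the class names and structures are illustrative assumptions, not the paper's design. Migration needs a directory covering every block in the archive, while caching only needs one proportional to the cache.

```python
# Illustrative sketch of the two directory structures; not the paper's code.

class MigrationDirectory:
    """Migration: must record the drive holding every logical block,
    so the map grows with the whole archive."""
    def __init__(self):
        self.location = {}                # logical block id -> drive id

    def place(self, block, drive):
        self.location[block] = drive

    def lookup(self, block):
        return self.location[block]      # every access consults this map


class CacheDirectory:
    """Caching: only tracks blocks currently duplicated on the active drives,
    so the map is proportional to the cache size."""
    def __init__(self):
        self.cached = {}                  # logical block id -> cache slot

    def insert(self, block, slot):
        self.cached[block] = slot

    def lookup(self, block):
        # A miss falls through to the block's fixed home on a passive drive.
        return self.cached.get(block)
```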
File system or block interface
File-system interface:
– Could use file system information to cache entire files
– Would probably perform better
– Would require system modifications
Block-level interface:
– Would work with existing systems
MAID with caching
– Active drives (always on)
– Passive drives (spin up/down)
– Virtualization Manager
– Cache Manager
– Passive Drive Manager
Design choices (I)
Compared MAID-cache and MAID-no-cache
MAID-cache:
– Caches reads and writes on the active drives
– Caching unit is a “chunk” of 64 sectors
– Cache policy is LRU
– All writes are placed in the cache write log, where they wait to be committed to the non-active (passive) drives
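As a rough illustration of the cache design just listed (64-sector chunks, LRU replacement, a write log on the active drives), here is a small Python sketch. It is not the authors' implementation; the class name, sector size, and capacity handling are assumptions.

```python
from collections import OrderedDict

SECTOR_SIZE = 512                  # bytes (assumed)
CHUNK_SECTORS = 64                 # caching unit from the slide: 64-sector chunks
CHUNK_SIZE = SECTOR_SIZE * CHUNK_SECTORS

class ActiveDriveCache:
    """LRU cache of 64-sector chunks plus a write log, both kept on the
    always-on (active) drives. Illustrative sketch only."""

    def __init__(self, capacity_chunks):
        self.capacity = capacity_chunks
        self.chunks = OrderedDict()        # chunk id -> data, kept in LRU order
        self.write_log = []                # writes waiting to be committed

    def read(self, chunk_id):
        if chunk_id in self.chunks:
            self.chunks.move_to_end(chunk_id)   # refresh LRU position on a hit
            return self.chunks[chunk_id]
        return None                             # miss: caller goes to a passive drive

    def insert(self, chunk_id, data):
        self.chunks[chunk_id] = data
        self.chunks.move_to_end(chunk_id)
        if len(self.chunks) > self.capacity:
            self.chunks.popitem(last=False)     # evict the least recently used chunk

    def write(self, chunk_id, data):
        # All writes are buffered in the write log (and also cached),
        # to be committed to the passive drives later.
        self.write_log.append((chunk_id, data))
        self.insert(chunk_id, data)
```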
Design choices (II)
Must always check the write log before reading data from the cache or the passive drives
Passive drives remain on standby until:
– A cache miss occurs
– The write log becomes too long
Passive drives return to standby when the spin-down inactivity time limit is reached
– Varying this time limit is the primary way to affect system performance and energy consumption
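A minimal sketch of the resulting read path, under the same caveats (hypothetical names, toy data structures, an assumed spin-down delay): the write log is consulted first, then the cached copy on the active drives, and only on a miss is the passive drive spun up; an inactivity timer sends it back to standby.

```python
import time

SPIN_DOWN_DELAY = 30.0     # seconds of inactivity before spin-down (assumed value)

class PassiveDrive:
    """Toy model of a passive drive that spins up on a cache miss."""
    def __init__(self, blocks=None):
        self.spinning = False
        self.last_access = 0.0
        self.blocks = blocks or {}         # chunk id -> data

    def read(self, chunk_id):
        self.spinning = True               # a real model would charge a spin-up delay here
        self.last_access = time.monotonic()
        return self.blocks.get(chunk_id)

    def maybe_spin_down(self, now):
        """Return to standby once the inactivity time limit is reached."""
        if self.spinning and now - self.last_access > SPIN_DOWN_DELAY:
            self.spinning = False

def read_chunk(chunk_id, write_log, cache, drive):
    """Read path: consult the write log first, then the cache, then the passive drive."""
    # 1. The write log may hold a newer version of the chunk than the passive drive.
    for logged_id, data in reversed(write_log):
        if logged_id == chunk_id:
            return data
    # 2. Otherwise try the cached copy on the active drives.
    if chunk_id in cache:
        return cache[chunk_id]
    # 3. Cache miss: spin up the passive drive and remember the result.
    data = drive.read(chunk_id)
    if data is not None:
        cache[chunk_id] = data
    return data
```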
Simulation parameters
1. Power management policy:
– Always on
– Fixed-delay spin-down
– Adaptive spin-down
2. Data layout:
– Linear: keep successive blocks on the same drive
– Striped: spread successive blocks across all drives
3. Caching / no caching
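The two data layouts can be made concrete with a short sketch; the drive-selection formulas below are the natural reading of "linear" and "striped", and the array sizes are made-up parameters, not values from the paper.

```python
# Illustrative mapping of logical chunks onto passive drives for the two layouts.
# NUM_DRIVES and CHUNKS_PER_DRIVE are assumed parameters, not from the paper.

NUM_DRIVES = 100
CHUNKS_PER_DRIVE = 1_000_000

def linear_drive(chunk):
    """Linear layout: successive chunks stay on the same drive until it is full."""
    return chunk // CHUNKS_PER_DRIVE

def striped_drive(chunk):
    """Striped layout: successive chunks rotate round-robin across all drives."""
    return chunk % NUM_DRIVES

# A sequential scan of 8 chunks touches one drive under the linear layout
# but 8 different drives under the striped layout:
print([linear_drive(c) for c in range(8)])   # [0, 0, 0, 0, 0, 0, 0, 0]
print([striped_drive(c) for c in range(8)])  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The layout interacts with power management: a long sequential access keeps a single drive spinning under the linear layout, while under striping it touches many drives in turn.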
Simulation results (I)
Based on a supercomputer center workload
All MAID configurations achieve similar power consumption
– 15 to 16% of that of the always-on configuration
MAID configurations without a cache have average response times comparable to that of the always-on configuration
– Workload had little locality
Simulation results (II)
Average response times of MAID configurations with a cache are much worse than that of the always-on configuration
– 0.680 to 0.720 s compared to 0.303 s
The striped configuration with a fixed spin-down delay has the lowest average response time of all MAID configurations
– 0.309 s
Conclusion
MAID can achieve average response times comparable to that of an always-on configuration with much lower power consumption
IMPORTANT: In a more recent paper, the authors found that cached configurations worked much better for workloads exhibiting more locality of access than their supercomputer center workload