FAWN: A Fast Array of Wimpy Nodes
Presented by: Aditi Bose & Hyma Chilukuri
Motivation
Large-scale data-intensive applications such as high-performance key-value storage systems are used with increasing regularity by Facebook, LinkedIn, and Amazon. These workloads share several common features: they are I/O intensive, require random access over large datasets, perform massively parallel and mostly independent operations, run on large clusters, and store small objects.
Metrics: system performance in queries/sec; energy efficiency in queries/joule.
CPU performance vs. I/O bandwidth gap: for data-intensive computing workloads, storage, network, and memory bandwidth bottlenecks lead to low CPU utilization. Solution: wimpy processors reduce I/O-induced idle cycles.
CPU power consumption: operating processors at higher frequencies requires disproportionately more energy, and the techniques used to mask the CPU bottleneck (branch prediction, speculative execution) consume more processor die area and cause energy inefficiency. Solution: slower CPUs execute more instructions per joule (roughly 1 billion vs. 100 million instructions per joule).
FAWN
A cluster of embedded CPUs using flash storage.
Flash properties:
Efficient: under 1 W even at heavy load, vs. over 10 W for magnetic disk at load.
Fast random reads: up to 175 times faster than random reads on magnetic disk.
Slow random writes: updating a single page means erasing an entire block before writing the modified block in its place.
FAWN-KV: nodes organized into a ring using consistent hashing; a physical node is a collection of virtual nodes.
FAWN-DS: a log-structured key-value store that contains the values for the key range associated with a VID.
FAWN-DS
Uses an in-memory hash index to map a 160-bit key to a value stored in the data log; the index stores only a fragment of the actual key.
Hash index bucket: the i low-order index bits of the key. Key fragment: the next 15 low-order bits.
Each bucket is 6 bytes: a 15-bit key fragment, a valid bit, and a 4-byte pointer into the log.
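Below is a minimal sketch of this lookup path in Python. The index width (INDEX_BITS), the variable names, and the collision-handling shortcut are illustrative assumptions, not the paper's actual code.

# Illustrative FAWN-DS hash index: each bucket holds a 15-bit key
# fragment, a valid bit, and a 4-byte offset into the append-only log.

INDEX_BITS = 16                        # i low-order bits pick the bucket (assumed value)
FRAG_BITS = 15                         # next 15 low-order bits form the key fragment
NUM_BUCKETS = 1 << INDEX_BITS

index = [(0, False, 0)] * NUM_BUCKETS  # (fragment, valid, log_offset) per bucket
log = []                               # append-only list of (full_key, value) records

def split_key(key160: int):
    """Split a 160-bit key into (bucket index, key fragment)."""
    bucket = key160 & (NUM_BUCKETS - 1)
    fragment = (key160 >> INDEX_BITS) & ((1 << FRAG_BITS) - 1)
    return bucket, fragment

def lookup(key160: int):
    bucket, fragment = split_key(key160)
    frag, valid, offset = index[bucket]
    if not valid or frag != fragment:
        return None                    # miss (hash-collision chaining elided)
    full_key, value = log[offset]      # one read into the data log on flash
    return value if full_key == key160 else None  # verify the full key

Because only about 6 bytes of index are kept per entry, the full keys live in the log on flash, and a lookup typically costs about one flash read.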
FAWN-DS
Basic functions: Store, Lookup, Delete; supports concurrent operations. A sketch of these semantics follows below.
Virtual node maintenance: Split, Merge, Compact.
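The sketch below shows the log-structured semantics behind these functions, assuming a simple in-memory log: Store appends a record, Delete appends a tombstone that invalidates older records, and Compact rewrites only the live entries. The class and method names are illustrative.

# All mutations are appends, so flash never sees a small random write;
# compact() reclaims the space held by stale and deleted records.

class LogStore:
    TOMBSTONE = object()               # marker record appended by delete()

    def __init__(self):
        self.log = []                  # append-only (key, value) records
        self.index = {}                # key -> offset of the newest record

    def store(self, key, value):
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def lookup(self, key):
        off = self.index.get(key)
        if off is None:
            return None
        _, value = self.log[off]
        return None if value is LogStore.TOMBSTONE else value

    def delete(self, key):
        self.store(key, LogStore.TOMBSTONE)    # append a tombstone

    def compact(self):
        """Rewrite the log, keeping only the newest live record per key."""
        new_log, new_index = [], {}
        for key, off in sorted(self.index.items(), key=lambda kv: kv[1]):
            _, value = self.log[off]
            if value is not LogStore.TOMBSTONE:
                new_index[key] = len(new_log)
                new_log.append((key, value))
        self.log, self.index = new_log, new_index

Split and Merge similarly reduce to scans of the log, copying records whose keys fall inside (or outside) a range, so maintenance runs at near-sequential flash speed.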
FAWN-KV
Organizes the back-end VIDs into a storage ring structure using consistent hashing. A management node assigns each front-end a fraction of the circular key space.
Front-end node: manages its fraction of the key space, maintains the VID membership list, and forwards out-of-range requests.
Back-end nodes (VIDs): each owns a key range and contacts a front-end when joining.
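A small consistent-hashing sketch follows, assuming a 160-bit SHA-1 ring as in Chord-style systems: each physical node contributes several VIDs, and a key is owned by the first VID clockwise from the key's position. The class, helper names, and VID count are hypothetical.

import bisect
import hashlib

def ring_hash(name: str) -> int:
    """Place a VID name or a key on the 160-bit ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class Ring:
    def __init__(self, vids_per_node=3):
        self.points = []               # sorted (hash, vid) points on the ring
        self.vids_per_node = vids_per_node

    def join(self, node: str):
        for i in range(self.vids_per_node):     # one physical node = many VIDs
            vid = f"{node}/vid{i}"
            bisect.insort(self.points, (ring_hash(vid), vid))

    def owner(self, key: str) -> str:
        """The first VID clockwise from the key's hash owns the key."""
        i = bisect.bisect_right(self.points, (ring_hash(key), ""))
        return self.points[i % len(self.points)][1]

ring = Ring()
ring.join("nodeA"); ring.join("nodeB")
print(ring.owner("some-key"))          # -> the VID responsible for this key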
FAWN-KV: Chain replication
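Per the paper, each key range is replicated along a chain of R consecutive VIDs on the ring: puts enter at the head and propagate to the tail before being acknowledged, while gets are served by the tail, so reads only ever see fully replicated writes. A minimal sketch of that flow (node and variable names invented):

# Chain replication: writes flow head -> mid -> tail; reads hit the tail.

class ChainNode:
    def __init__(self, name, successor=None):
        self.name, self.successor, self.data = name, successor, {}

    def put(self, key, value):
        self.data[key] = value                     # apply locally, then forward
        if self.successor is not None:
            return self.successor.put(key, value)
        return "ack"                               # the tail acknowledges

    def get(self, key):
        assert self.successor is None, "reads are served by the tail only"
        return self.data.get(key)

tail = ChainNode("tail")
mid = ChainNode("mid", successor=tail)
head = ChainNode("head", successor=mid)
head.put("k", "v")                                 # write enters at the head
print(tail.get("k"))                               # read at the tail -> 'v'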
FAWN-KV: Join and Leave
Join: split the key range, pre-copy the data, insert the new node into each replica chain, then flush the logged updates. A toy walk-through of these steps appears below.
Leave: merge the key ranges and join a new replica into each affected chain.
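A self-contained toy of the join steps over plain dicts; the function, its arguments, and the split-by-comparison rule are illustrative stand-ins for the real range-transfer protocol, not the paper's mechanism.

# Join, in miniature: (1) split the key range at split_point, (2) pre-copy
# that range while the old owner keeps serving, (3) chain insertion starts
# logging fresh updates, (4) log flush replays them so no update is lost.

def node_join(old_store: dict, update_log: list, split_point):
    new_store = {}
    # (1)+(2) pre-copy the split range; the old owner still serves requests
    for k, v in list(old_store.items()):
        if k < split_point:
            new_store[k] = v
    # (3) updates arriving during the copy were appended to update_log
    # (4) log flush: replay them so the new node is fully up to date
    for k, v in update_log:
        if k < split_point:
            new_store[k] = v
    # hand off: the old owner drops the transferred range
    for k in list(old_store):
        if k < split_point:
            del old_store[k]
    return new_store

store = {"a": 1, "b": 2, "m": 3}
print(node_join(store, [("a", 9)], "c"))   # -> {'a': 9, 'b': 2}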
Individual Node Performance
Lookup speed.
Bulk store speed: 23.2 MB/s, or 96% of the raw flash write speed.
Individual Node Performance
Put speed. BerkeleyDB manages only 0.07 MB/s on the same hardware, which shows the necessity of a log-structured design on flash.
Individual Node Performance Read- and write-intensive workloads
System Benchmarks System throughput and power consumption
Impact of Ring Membership Changes Query throughput during node join and maintenance operations
Alternative Architectures
Large dataset, low query rate → FAWN+Disk: the number of nodes is dominated by storage capacity per node; lowest total cost per GB.
Small dataset, high query rate → FAWN+DRAM: the number of nodes is dominated by per-node query capacity; lowest cost per query/sec.
Middle range → FAWN+SSD: best balance of storage capacity, query rate, and total cost. A toy version of this cost comparison follows.
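The comparison boils down to a provisioning model: a workload needs enough nodes to satisfy both its dataset size and its query rate, and the cheapest architecture wins. A back-of-the-envelope sketch, where every capacity, rate, and price below is a made-up placeholder rather than a figure from the paper:

from math import ceil

ARCHITECTURES = {                  # (GB/node, queries/sec/node, $/node) - placeholders
    "FAWN+Disk": (2000,    250,  350),
    "FAWN+SSD":  (  32,  35000,  400),
    "FAWN+DRAM": (   2, 100000,  250),
}

def cheapest(dataset_gb: float, qps: float):
    """Pick the architecture with the lowest total cost for this workload."""
    costs = {}
    for name, (gb, rate, price) in ARCHITECTURES.items():
        # provision for the binding constraint: capacity or query rate
        nodes = max(ceil(dataset_gb / gb), ceil(qps / rate))
        costs[name] = nodes * price
    return min(costs, key=costs.get), costs

print(cheapest(100_000, 1_000))    # storage-dominated -> FAWN+Disk wins
print(cheapest(50, 2_000_000))     # query-dominated  -> FAWN+DRAM wins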
Conclusion
Fast and energy-efficient processing of random-read-intensive workloads.
Over an order of magnitude more queries per joule than traditional disk-based systems.