
Software Overheads of Storage in Ceph


1 Software Overheads of Storage in Ceph
Akila Nagamani Aravind Soundararajan Krishnan Rajagopalan

2 Motivation
I/O has historically been the main bottleneck for applications and has been a topic of research for decades. We are now in the era of faster storage: devices that are 100X faster than traditional spinning disks. Where should the next focus be? What is the dominant cost now? The software, i.e., the kernel storage stack.

3 What overheads exist?
Processing in the software layers of the storage stack introduces overheads. A "write" issued by an application is not sent directly to the device; a software storage layer, the file system, processes it first. Filesystem operations include:
Metadata management
Journaling / logging
Compaction / compression
Caching
Distributed data management
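A minimal sketch of this point (illustrative Python, not Ceph code): an application write is handed to the kernel's software layers first, and typically sits in the page cache until the filesystem is forced to push it, and its journaled metadata, to the device.

```python
import os
import tempfile

# Illustrative sketch: os.write() hands data to the kernel, which usually
# buffers it in the page cache; only os.fsync() forces the filesystem to
# push the data (and journal its metadata) down to the storage device.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
written = os.write(fd, b"hello")   # enters kernel buffers, not necessarily on disk yet
os.fsync(fd)                       # filesystem flushes data and journals metadata
os.close(fd)
print(written)  # 5
```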

4 Why BlueStore? Why not FileStore?
FileStore has the problem of double writes: every write is journaled by the Ceph backend for consistency, and the underlying filesystem performs additional journaling of its own. An analysis of software overheads would therefore mostly point to the already well-known "journal of journals" problem.
BlueStore does not have this problem: objects/data are written directly onto raw block storage, and Ceph metadata is written to RocksDB via a lightweight filesystem called BlueFS. Because BlueStore has fewer intermediate software layers between it and the storage, studying its overheads has the potential for more interesting inferences.

5 Write data flow in BlueStore
[Flow diagram] A client write arrives at OSD::ShardedOpWQ. If the write needs a WAL (deferred) entry, it is queued on BlueStore::WALWQ and later applied to disk asynchronously; otherwise the data is written to disk asynchronously via BlueStore.bdev_aio_thread. The transaction is then placed on BlueStore.kv_queue, where BlueStore.kv_sync_thread commits the RocksDB metadata write, after which BlueStore.finishers complete the operation back to the client.
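The flow in the diagram can be sketched as a simple stage model (an assumed simplification for illustration, not BlueStore source code; the 64K threshold stands in for min_alloc_size):

```python
# Sketch: a write either takes the WAL (deferred) path or is written to disk
# directly; in both cases the metadata commit goes through the kv queue.

def write_path(size, min_alloc_size=65536):
    """Return the ordered stages a write of `size` bytes passes through."""
    stages = ["OSD::ShardedOpWQ"]
    if size < min_alloc_size:                 # small write: data journaled via WAL
        stages.append("BlueStore::WALWQ (queue deferred write)")
    else:                                     # large write: async write to disk
        stages.append("BlueStore.bdev_aio_thread (write data to disk)")
    stages += ["BlueStore.kv_queue",
               "BlueStore.kv_sync_thread (RocksDB metadata commit)",
               "BlueStore.finishers (ack to client)"]
    return stages

print(write_path(4096)[1])    # BlueStore::WALWQ (queue deferred write)
print(write_path(131072)[1])  # BlueStore.bdev_aio_thread (write data to disk)
```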

6 Experiment
We used the 'perf' tool to collect traces of our write operations and 'FlameGraph' to analyze them. The write workload size was varied from 4K up to 512K. We used a ramdisk to simulate infinitely fast storage.
A single-cluster configuration was used; network processing overheads are not considered.
No replication for objects; we do not aim to study reliability.
We categorize each of the major functions encountered in our traces into two sets, motivated by the paper "The Importance of Non-Data Touching Processing Overheads in TCP/IP":
Data touching: methods whose cost depends on the input data size.
Non-data touching: methods whose cost is independent of the input data size.
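The categorization step can be sketched as follows (the function names and sample counts below are hypothetical examples, not the exact symbols from our traces; the input format is the "collapsed stack" output of FlameGraph's stackcollapse-perf.pl, i.e. "frame1;frame2;...;leaf count"):

```python
# Sketch: bucket perf samples into data-touching vs non-data touching
# based on the leaf function of each collapsed stack.

DATA_TOUCHING = {"crc32c", "memcpy", "zero_fill"}              # cost scales with data size
NON_DATA_TOUCHING = {"rocksdb_write", "journal_commit", "queue_op"}

def categorize(collapsed_lines):
    totals = {"data": 0, "non-data": 0, "other": 0}
    for line in collapsed_lines:
        stack, count = line.rsplit(" ", 1)
        leaf = stack.split(";")[-1]
        if leaf in DATA_TOUCHING:
            totals["data"] += int(count)
        elif leaf in NON_DATA_TOUCHING:
            totals["non-data"] += int(count)
        else:
            totals["other"] += int(count)
    return totals

samples = ["osd_op;do_write;memcpy 120",
           "osd_op;do_write;journal_commit 300",
           "osd_op;do_write;crc32c 40"]
print(categorize(samples))  # {'data': 160, 'non-data': 300, 'other': 0}
```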

7 Flame Graph for a write size of 4K

8 Major functions identified by the FlameGraph
The time spent in "swapper" is very high in the case of an HDD. This shows that Ceph performs synchronous I/O, which makes sense for achieving reliability. The time spent in "swapper" can be viewed as the I/O wait time.

9 Non-data touching overhead - Journaling
Assumption: Ceph journals metadata, so journaling time should not depend on data size.
Expectation: constant time for any data size.
Observation: two regions of constant time; writes < 64K spend longer in journaling.
Reason: data journaling (deferred writes) happens for small writes < min_alloc_size.
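The observation above can be sketched with a simple model (the threshold follows the 64K min_alloc_size from the slide; the 512-byte metadata constant is an illustrative assumption, not a measured value):

```python
# Sketch of why small writes spend longer in journaling: below
# min_alloc_size the data itself is journaled (deferred write), so the
# journaled byte count jumps for small writes.

MIN_ALLOC_SIZE = 64 * 1024        # 64K threshold from the observation above
METADATA_BYTES = 512              # assumed fixed per-transaction metadata size

def journaled_bytes(write_size):
    if write_size < MIN_ALLOC_SIZE:
        return METADATA_BYTES + write_size   # data journaling for small writes
    return METADATA_BYTES                    # metadata only for large writes

print(journaled_bytes(4096))     # 4608 -> small writes journal the data too
print(journaled_bytes(131072))   # 512  -> large writes journal metadata only
```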

10 Non-data touching overhead
Ceph performs other non-data touching operations, such as RocksDB compaction and socket handling. These overheads were too small to be analyzed.

11 Data touching overhead
Ceph performs the following operations on the data: CRC calculation and zero-filling. These overheads were too small (0.01% of total time) to be analyzed.
We ran into a problem for data > 5 MB on the ramdisk (1 GB); Sage feels that the backend is saturated.

12 Conclusion
Ceph BlueStore is better tuned for faster storage than FileStore.
Journaling is the major software overhead added by the storage layer. The only way to avoid it is by trading away consistency, which might not be suitable for Ceph.
Data touching overheads in Ceph are very small; the storage layer has more non-data touching overheads than data touching overheads. This is in contrast to the network layer.
The extra overhead caused by the software storage layer buys additional consistency and reliability guarantees, and could be avoided where those guarantees are unnecessary.

13 THANK YOU! QUESTIONS?

