Efficient Data Maintenance in GlusterFS Using Databases. Joseph Fernandes, Dan Lambright
Who we are: Joseph Fernandes (Senior Engineer, Red Hat Storage); Dan Lambright (Principal Engineer, Red Hat Storage)
Agenda: Quick GlusterFS overview; Data maintenance challenges; Existing solutions; Proposed solution: optimized database; Case study: GlusterFS data cache tier; Lessons learned; What's next
What is GlusterFS? A distributed file system; software-defined NAS; runs over TCP/IP or RDMA; accessed via the native client, SMB, or NFS.
What is data maintenance? Maintenance tasks performed on data for protection, performance, and optimal storage utilization.
Challenges in data maintenance: Data maintenance imposes overhead on CPU, memory, storage, and network. Therefore it needs: fast search (search should be precise and fast); rich metadata (filters on modification frequency, I/O sizes, etc.); distribution (it should deal with the distributed nature of the data); load balancing (it should balance the load).
Existing solutions: file system crawl (slow); file system log (fast writes, but slow reads and more space); metadata databases (Gluster does not have one); in-memory inode caches (not durable).
Proposed solution: an optimized DB for GlusterFS
Optimized DB for GlusterFS: “Record now, consume later.” A database optimized for fast recording, with good querying capabilities, used as an embedded database.
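The "record now, consume later" idea can be sketched with Python's built-in `sqlite3` module as the embedded database: the I/O path only appends one row per operation, and aggregation is deferred until a scanner queries. This is a minimal illustration under stated assumptions, not libgfdb's schema; the `fop_log` table, its columns, and the `record`/`consume` helpers are hypothetical.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical append-only table: one row per file operation (fop).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fop_log ("
        " gfid TEXT NOT NULL, op TEXT NOT NULL, nbytes INTEGER, ts REAL)"
    )
    return conn

def record(conn: sqlite3.Connection, gfid: str, op: str, nbytes: int,
           ts: Optional[float] = None) -> None:
    # "Record now": a plain INSERT, no read-modify-write on the I/O path.
    conn.execute("INSERT INTO fop_log VALUES (?, ?, ?, ?)",
                 (gfid, op, nbytes, time.time() if ts is None else ts))

def consume(conn: sqlite3.Connection, since: float) -> List[Tuple[str, int, int]]:
    # "Consume later": aggregate per file only when a scanner asks,
    # returning (gfid, operation count, total bytes), hottest first.
    return conn.execute(
        "SELECT gfid, COUNT(*), SUM(nbytes) FROM fop_log"
        " WHERE ts >= ? GROUP BY gfid ORDER BY COUNT(*) DESC, gfid",
        (since,),
    ).fetchall()
```

The write path stays a cheap append, while the cost of grouping and counting is paid by the (less frequent) consumer.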
LibgfDB: API abstraction (works with any DB); rich search filters (frequency counters, I/O size counters, parts of the file metadata, etc.); non-centralized (each database is local to its brick); performance optimization options.
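To illustrate what an "any DB" abstraction buys, here is a minimal sketch of a backend-neutral interface with two interchangeable backends and a scanner written only against the interface. The names `HeatStore`, `DictStore`, `SqliteStore`, `record_write`, `files_with_min_writes`, and `hot_files` are hypothetical and are not libgfdb's actual API; the counters mirror the frequency and I/O-size filters named on the slide.

```python
import sqlite3
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List

class HeatStore(ABC):
    """Hypothetical backend-neutral interface (not libgfdb's real API)."""

    @abstractmethod
    def record_write(self, gfid: str, nbytes: int) -> None: ...

    @abstractmethod
    def files_with_min_writes(self, min_writes: int) -> List[str]: ...

class DictStore(HeatStore):
    """In-memory backend: fast, but not durable."""

    def __init__(self) -> None:
        self.writes: Dict[str, int] = defaultdict(int)  # frequency counter
        self.bytes: Dict[str, int] = defaultdict(int)   # I/O size counter

    def record_write(self, gfid: str, nbytes: int) -> None:
        self.writes[gfid] += 1
        self.bytes[gfid] += nbytes

    def files_with_min_writes(self, min_writes: int) -> List[str]:
        return sorted(g for g, n in self.writes.items() if n >= min_writes)

class SqliteStore(HeatStore):
    """Embedded SQLite backend; one instance would be local to each brick."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS heat (gfid TEXT PRIMARY KEY,"
            " writes INTEGER NOT NULL, bytes INTEGER NOT NULL)")

    def record_write(self, gfid: str, nbytes: int) -> None:
        with self.conn:  # one transaction per recorded write
            self.conn.execute(
                "INSERT OR IGNORE INTO heat VALUES (?, 0, 0)", (gfid,))
            self.conn.execute(
                "UPDATE heat SET writes = writes + 1, bytes = bytes + ?"
                " WHERE gfid = ?", (nbytes, gfid))

    def files_with_min_writes(self, min_writes: int) -> List[str]:
        rows = self.conn.execute(
            "SELECT gfid FROM heat WHERE writes >= ? ORDER BY gfid",
            (min_writes,))
        return [r[0] for r in rows]

def hot_files(store: HeatStore, threshold: int) -> List[str]:
    # A data maintenance scanner that works with any backend.
    return store.files_with_min_writes(threshold)
```

Swapping the backend changes durability and speed trade-offs without touching scanner code.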
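[Figure: LIBGFDB architecture. A Gluster client sends I/O to a Gluster brick, where the CTR xlator (above the POSIX xlator) inserts/updates records in the DataStore through LIBGFDB; data maintenance scanners query the DataStore through LIBGFDB.]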
Datastore optimization: SQLite3. PRAGMA page_size: align the page size; PRAGMA cache_size: increase the cache size; PRAGMA journal_mode: switch to WAL; PRAGMA wal_autocheckpoint: checkpoint less often; PRAGMA synchronous: set to NORMAL; PRAGMA auto_vacuum: set to NONE.
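The PRAGMA settings above can be applied when the datastore is opened, as in this minimal Python sketch. The numeric values are assumptions for illustration only (the slide names the PRAGMAs, not the numbers); the ordering matters because `page_size` and `auto_vacuum` only take effect on a database that has not been created yet, so they are set before switching to WAL or creating tables. `open_datastore` is a hypothetical helper name.

```python
import sqlite3

# Illustrative values only; the slide does not give the actual numbers.
PAGE_SIZE = 4096              # assumed to match the file system block size
CACHE_PAGES = 10000           # assumed; larger than SQLite's default cache
WAL_CHECKPOINT_PAGES = 10000  # assumed; SQLite's default is 1000 pages

def open_datastore(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # page_size and auto_vacuum must be set before the database file is
    # initialized, i.e. before any table is created or WAL is enabled.
    conn.execute(f"PRAGMA page_size = {PAGE_SIZE}")
    conn.execute("PRAGMA auto_vacuum = NONE")
    conn.execute(f"PRAGMA cache_size = {CACHE_PAGES}")
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"WAL not enabled, got {mode!r}")
    # Checkpoint less often than the default.
    conn.execute(
        f"PRAGMA wal_autocheckpoint = {WAL_CHECKPOINT_PAGES}").fetchone()
    # NORMAL skips some syncs that FULL would do; in WAL mode this
    # trades a little durability for faster commits.
    conn.execute("PRAGMA synchronous = NORMAL")
    return conn
```

WAL mode requires a file-backed database; an in-memory database reports its journal mode as `memory` instead.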
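DataStore optimization: SQLite3 (continued). [Figure: SQLite3 write path with WAL. Inserts/updates go through the buffer cache and are synced to the write-ahead log (WAL), which is indexed through a shared-memory file; a checkpoint later copies the WAL content into the database file.]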
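The WAL write path and checkpoint step can be observed directly with Python's `sqlite3`: committed inserts grow the `-wal` file, and an explicit `PRAGMA wal_checkpoint(TRUNCATE)` copies them into the main database file and truncates the WAL. In this sketch automatic checkpoints are disabled only so the effect is visible; the `heat` table and helper names are hypothetical.

```python
import os
import sqlite3
from typing import Dict, Tuple

def file_sizes(db_path: str) -> Tuple[int, int]:
    """Return (main database file size, WAL file size) in bytes."""
    wal_path = db_path + "-wal"
    wal_size = os.path.getsize(wal_path) if os.path.exists(wal_path) else 0
    return os.path.getsize(db_path), wal_size

def write_then_checkpoint(db_path: str) -> Dict[str, object]:
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = WAL").fetchone()
    # Disable automatic checkpoints so the WAL's growth stays visible.
    conn.execute("PRAGMA wal_autocheckpoint = 0").fetchone()
    conn.execute("CREATE TABLE IF NOT EXISTS heat"
                 " (gfid TEXT PRIMARY KEY, writes INTEGER)")
    with conn:  # on commit, modified pages are appended to the WAL
        conn.executemany("INSERT OR REPLACE INTO heat VALUES (?, ?)",
                         [(f"gfid-{i}", i) for i in range(100)])
    before = file_sizes(db_path)
    # Copy WAL content into the database file, then truncate the WAL.
    busy, _, _ = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    after = file_sizes(db_path)
    conn.close()
    return {"before": before, "after": after, "busy": busy}
```

Raising `wal_autocheckpoint` (or checkpointing explicitly at quiet times) is what lets inserts stay sequential appends to the WAL for longer.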
Cache tiering (Gluster 3.7 feature). Tiering: a logical volume composed of diverse storage units, e.g. secure/nonsecure or compressed/uncompressed. Cache tiering: fast storage acts as a cache for slow storage, e.g. fast (and costly) SSD in front of slow HDD, or a fast 2x-replicated tier in front of a slow erasure-coded tier. What goes in the cache? The DB tracks usage patterns, and files migrate between tiers according to usage. Migration is slow.
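A minimal sketch of how a per-brick DB that tracks usage could drive migration decisions, using Python's `sqlite3`. The `heat` table, its columns, and the thresholds are hypothetical; `access_count` is assumed to be reset at the start of each migration cycle, so it approximates recent activity.

```python
import sqlite3
from typing import List, Tuple

def open_heat_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical per-brick heat table.
    conn.execute(
        "CREATE TABLE heat (gfid TEXT PRIMARY KEY, tier TEXT NOT NULL,"
        " access_count INTEGER NOT NULL, last_access REAL NOT NULL)")
    return conn

def migration_candidates(conn: sqlite3.Connection, now: float,
                         window: float, promote_min: int
                         ) -> Tuple[List[str], List[str]]:
    """Return (promote, demote) lists of gfids.

    promote: cold files accessed recently and at least promote_min times.
    demote: hot files with no access inside the window.
    """
    cutoff = now - window
    promote = [r[0] for r in conn.execute(
        "SELECT gfid FROM heat WHERE tier = 'cold' AND last_access >= ?"
        " AND access_count >= ? ORDER BY access_count DESC, gfid",
        (cutoff, promote_min))]
    demote = [r[0] for r in conn.execute(
        "SELECT gfid FROM heat WHERE tier = 'hot' AND last_access < ?"
        " ORDER BY last_access, gfid", (cutoff,))]
    return promote, demote
```

Because migration is slow, the query only nominates candidates; how many of them actually move per cycle is a separate policy decision.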
Policies for smart migration: file size; sequential vs. random access; access rate; migration frequency; breaking files into chunks (the Gluster “sharding” feature).
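The policies listed above can be combined into a single promotion check. This is a hypothetical sketch: the slides name the criteria but not their thresholds or how they combine, so `FileStats`, `should_promote`, and every constant below are assumptions. The sequential-I/O cutoff reflects the general observation that SSDs gain most over HDDs on random I/O.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    size: int                         # bytes
    accesses_per_hour: float          # access rate
    sequential_ratio: float           # 0.0 = all random, 1.0 = all sequential
    hours_since_last_migration: float

# Hypothetical thresholds, for illustration only.
MAX_PROMOTE_SIZE = 1 << 30        # 1 GiB; larger files are better sharded
MIN_ACCESS_RATE = 10.0            # accesses per hour
MAX_SEQUENTIAL_RATIO = 0.8        # mostly-streaming files stay on HDD
MIN_HOURS_BETWEEN_MIGRATIONS = 1.0

def should_promote(s: FileStats) -> bool:
    if s.hours_since_last_migration < MIN_HOURS_BETWEEN_MIGRATIONS:
        return False  # limit migration frequency (avoid ping-ponging)
    if s.size > MAX_PROMOTE_SIZE:
        return False  # too costly to move whole; shards could move instead
    if s.sequential_ratio > MAX_SEQUENTIAL_RATIO:
        return False  # sequential I/O benefits less from the fast tier
    return s.accesses_per_hour >= MIN_ACCESS_RATE
```

Ordering the cheap rejections first keeps the check inexpensive when it runs over many candidates.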
Gluster implementation: a new volume type, tier. Hot bricks can be attached to and detached from existing volumes. Migration uses existing mechanisms. Tweaks to the Distributed Hash Table (DHT): old DHT: destination node = hash(file + path); new: always try the hot tier first. The hot tier may consist of multiple bricks, so which brick on the tier? Choose it with the old DHT algorithm (“stacking DHT”).
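"Stacking DHT" can be sketched as two ordinary DHT instances under a tier layer: the tier layer always consults the hot DHT first, and each DHT picks the brick within its tier by hashing the path. This is a simplified model, not Gluster's code: the real DHT assigns per-directory hash ranges and uses its own hash function, whereas `zlib.crc32` modulo the brick count is a stand-in. The class names are hypothetical, and placing new files on the hot tier is an assumption.

```python
import zlib
from typing import Dict, List, Optional, Set, Tuple

class Dht:
    """Simplified DHT: picks one brick per path by hashing.
    Stand-in only; Gluster's DHT uses its own hash and per-directory ranges."""

    def __init__(self, name: str, bricks: List[str]) -> None:
        self.name = name
        self.bricks = bricks
        self.files: Dict[str, Set[str]] = {b: set() for b in bricks}

    def brick_for(self, path: str) -> str:
        # "destination node = hash(file + path)", with crc32 as the hash
        return self.bricks[zlib.crc32(path.encode()) % len(self.bricks)]

    def has(self, path: str) -> bool:
        return path in self.files[self.brick_for(path)]

    def store(self, path: str) -> str:
        brick = self.brick_for(path)
        self.files[brick].add(path)
        return brick

class TierDht:
    """Stacks two DHTs: always try the hot tier first, then the cold tier."""

    def __init__(self, hot: Dht, cold: Dht) -> None:
        self.hot, self.cold = hot, cold

    def create(self, path: str) -> str:
        # Assumption: new files are placed on the hot tier.
        return self.hot.store(path)

    def lookup(self, path: str) -> Optional[Tuple[str, str]]:
        for tier in (self.hot, self.cold):
            if tier.has(path):
                # Within a tier, the brick comes from the old DHT algorithm.
                return tier.name, tier.brick_for(path)
        return None
```

Reusing an unmodified DHT per tier is what keeps the change small: only the thin tier layer knows about hot versus cold.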
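[Figure: tiered volume xlator stack. Client side: other client xlators above the tier xlator, which stacks a HOT DHT and a COLD DHT, with a replication xlator below. Server side, for both the HOT tier and the COLD tier: other server xlators, the CTR xlator, and the POSIX xlator over brick storage, each brick with its own heat data store. Files move between tiers by promotion (to the hot tier) and demotion (to the cold tier).]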
Benchmarking: how well does it work? Many benchmarks are a poor fit for tiering, because a cache miss triggers a migration, which is costly. Tiering needs stable workloads in which data stays in the hot tier for hours or longer, e.g. a set of videos that stays popular for several days. New benchmarking tool: can also be used with dm-cache, Ceph tiering, … DB results: scalability problems.
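Why unstable workloads hurt can be shown with a toy simulation. It assumes a hypothetical promote-on-miss policy with LRU demotion, chosen only to make the migration cost of cache misses countable; the slides do not describe Gluster's policy this way. A stable working set that fits in the hot tier stops migrating after warm-up, while a uniformly random workload migrates on almost every request.

```python
import random
from collections import OrderedDict
from typing import Hashable, List

def count_migrations(requests: List[Hashable], hot_capacity: int) -> int:
    """Count promotions plus demotions under promote-on-miss + LRU demotion."""
    hot: "OrderedDict[Hashable, bool]" = OrderedDict()
    migrations = 0
    for f in requests:
        if f in hot:
            hot.move_to_end(f)  # hit: refresh LRU position, no migration
            continue
        hot[f] = True           # miss: promote the file into the hot tier
        migrations += 1
        if len(hot) > hot_capacity:
            hot.popitem(last=False)  # demote the least recently used file
            migrations += 1
    return migrations

def workload(num_files: int, n: int, seed: int = 0) -> List[int]:
    # Uniform random file IDs; a small num_files models a stable hot set.
    rng = random.Random(seed)
    return [rng.randrange(num_files) for _ in range(n)]
```

With a hot tier of 10 files, `workload(10, 1000)` causes at most 10 migrations, while `workload(1000, 1000)` causes well over a thousand.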
Lessons learned: DB updates can be expensive, DB queries may have scalability problems, and durability (ACID semantics) is expensive. Expensive updates: each update is a read + modify + write. Scalability issues: because each database is a single file with a WAL, complex queries can be slow. Durable metadata: the DB is not suited for durable metadata.
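One common way to amortize the cost of durability is to commit in batches. A minimal sketch, assuming a hypothetical `fop_log` table and a hypothetical `BatchedRecorder` helper (neither is libgfdb code): each batch is written in one transaction, so the per-commit cost, including any sync that the `PRAGMA synchronous` setting requires, is paid once per batch instead of once per record. The trade-off is that buffered records are lost if the process crashes before a flush.

```python
import sqlite3
from typing import List, Tuple

class BatchedRecorder:
    """Buffers records and writes each batch in a single transaction.
    Up to batch_size - 1 unflushed records can be lost on a crash."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 100) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.pending: List[Tuple[str, int]] = []
        self.commits = 0
        conn.execute(
            "CREATE TABLE IF NOT EXISTS fop_log (gfid TEXT, nbytes INTEGER)")

    def record(self, gfid: str, nbytes: int) -> None:
        self.pending.append((gfid, nbytes))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        with self.conn:  # one transaction, one commit for the whole batch
            self.conn.executemany(
                "INSERT INTO fop_log VALUES (?, ?)", self.pending)
        self.pending.clear()
        self.commits += 1
```

Recording 250 operations with `batch_size=100` costs two commits plus one final `flush()`, instead of 250 separate commits.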
What's next: libgfdb performance options (PLog, SQLite3 database sharding); Ceph tier implementation (Bloom filters).
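A Bloom filter records set membership (for example, "this object was accessed during this period") in a fixed-size bit array: lookups can return false positives but never false negatives, and the memory cost does not grow with the number of items. The sketch below is a generic Bloom filter for illustration; the class name, the double-hashing scheme over SHA-256, and the sizes are illustrative choices, not Ceph's implementation.

```python
import hashlib
from typing import List

class BloomFilter:
    """Bit-array Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, nonzero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # True if every bit is set: "probably seen"; False means "never added".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

With 10,000 bits, 4 hash functions, and 500 members, the expected false-positive rate is about $(1 - e^{-kn/m})^k = (1 - e^{-0.2})^4 \approx 0.1\%$.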
Feature page: http://www.gluster.org/community/documentation/index.php/Features/data-classification. Gluster Forge: https://forge.gluster.org/data-classification. Email: Joseph Fernandes <josferna@redhat.com>, Dan Lambright <dlambrig@redhat.com>
THANK YOU