1
Introduction to new high performance storage engines in MongoDB 3.0
Henrik Ingo, Solutions Architect, MongoDB
2
Hi, I am Henrik Ingo @h_ingo
3
Introduction to new high performance storage engines in MongoDB 3.0
Agenda: - MongoDB and NoSQL - Storage Engine API - WiredTiger configuration + performance
4
Most popular NoSQL database
5
5 NoSQL categories: Key Value (Redis, Riak), Wide Column (Cassandra), Document (MongoDB), Graph (Neo4j), Map Reduce (Hadoop)
6
MongoDB is a Document Database
Example document:
{ first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: …, … },
    { model: 'Rolls Royce', year: 1965, value: …, … }
  ] }
Rich Queries: Find Paul's cars; find everybody in London with a car built between 1970 and 1980
Geospatial: Find all of the car owners within 5km of Trafalgar Sq.
Text Search: Find all the cars described as having leather seats
Aggregation: Calculate the average value of Paul's car collection
Map Reduce: What is the ownership pattern of colors by geography over time? (Is purple trending up in China?)
7
Operational Database Landscape
Speaker notes: The dotted line is the natural boundary of what is possible today. For example, Oracle lives far out on the right and does things NoSQL vendors will never do, but those things come at the expense of some degree of scale and performance. NoSQL was born out of wanting greater scalability and performance, but we think those vendors overreacted by giving up too much: caching layers give up many things, and key-value stores are super fast but give up the rich data model and rich query model. MongoDB gives up some features of a relational database (joins, complex transactions) to enable greater scalability and performance. You get most of the functionality (roughly 80%) with much better scalability and performance. Start with an RDBMS and ask what could be taken out to scale: complex transactions and joins. How? Change the data model. To enable horizontal scalability, reduce coordination between nodes (joins and transactions). Traditionally in an RDBMS you would denormalize the data or tell the system more about how pieces of data relate to one another. A more intuitive way is to use a document data model, because it is closer to the way we develop applications today with object-oriented languages like Java, .NET, Ruby, and Node.js. The document data model is a good segue to the next section: Data Model.
8
MongoDB 3.0 & storage engines
9
Current state in MongoDB 2.6
Read-heavy apps:
- Great performance: B-tree, low overhead
- Good scale-out perf: secondary reads, sharding
Write-heavy apps:
- Good scale-out perf: sharding
- Per-node efficiency wish-list: doc-level locking, write-optimized data structures (LSM), compression
Other wishes: complex transactions, in-memory engine, SSD-optimized engine, etc.
10
Current state in MongoDB 2.6
How to get all of the above?
11
MongoDB 3.0 Storage Engine API
Read-heavy app Write-heavy app Special app MMAP WiredTiger 3rd party
12
MongoDB 3.0 Storage Engine API
One at a time:
- Many engines built into mongod; choose 1 at startup
- All data stored by the same engine
- Incompatible on-disk data formats (obviously)
- Compatible client API
- Compatible oplog & replication: the same replica set can mix different engines, so no-downtime migration is possible
13
Some existing engines:
- MMAPv1: improved MMAP (collection-level locking)
- WiredTiger: discussed next
- RocksDB: LSM-style engine developed by Facebook, based on LevelDB
- TokuMXse: Fractal Tree indexing engine from Tokutek
14
Some rumored engines:
- Heap: in-memory engine
- Devnull: writes all data to /dev/null (based on an idea from a famous flash animation...); oplog stored as normal
- SSD-optimized engine (e.g. Fusion-io)
- KV: simple key-value engine
15
WiredTiger
16
What is WiredTiger
- Modern NoSQL database engine: flexible schema
- Advanced database engine: secondary indexes, MVCC, non-locking algorithms, multi-statement transactions (not exposed in MongoDB 3.0)
- Very modular, tunable: Btree, LSM and columnar indexes; Snappy, Zlib and 3rd-party compression; index prefix compression, etc.
- Built by the creators of BerkeleyDB; acquired by MongoDB in 2014
- source.wiredtiger.com
17
Choosing WiredTiger at server startup
mongod --storageEngine wiredTiger
18
Main tunables exposed as MongoDB options
mongod --storageEngine wiredTiger \
       --wiredTigerCacheSizeGB 8 \
       --wiredTigerDirectoryForIndexes /data/indexes \
       --wiredTigerCollectionBlockCompressor zlib \
       --syncDelay 30
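The same tunables can also live in a mongod configuration file. A minimal sketch in YAML config format, mirroring the flags above (note that in the config file directoryForIndexes is a boolean, not a path; the values shown are the slide's examples, not recommendations):

```yaml
storage:
  engine: wiredTiger
  syncPeriodSecs: 30
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8
      directoryForIndexes: true
    collectionConfig:
      blockCompressor: zlib
```

Start mongod with `--config /path/to/mongod.conf` to pick this up.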
19
All WiredTiger options via configString (hidden)
mongod --storageEngine wiredTiger \
       --wiredTigerEngineConfigString "cache_size=8GB,eviction=(threads_min=4,threads_max=8),checkpoint=(wait=30)" \
       --wiredTigerCollectionConfigString "block_compressor=zlib" \
       --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib" \
       --wiredTigerDirectoryForIndexes /data/indexes
See docs for wiredtiger_open() & WT_SESSION::create()
20
Also via createCollection(), createIndex()
db.createCollection( "users",
  { storageEngine: { wiredTiger: { configString: "block_compressor=none" } } } )
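Per-index overrides work the same way. A sketch in the mongo shell (requires a running mongod with WiredTiger; the collection and field names are illustrative), passing a WiredTiger configString through createIndex():

```javascript
// prefix_compression is a WiredTiger index option, on by default
// for indexes in MongoDB 3.0; here we turn it off for one index
db.users.createIndex(
  { surname: 1 },
  { storageEngine: { wiredTiger: { configString: "prefix_compression=false" } } }
)
```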
21
More: db.serverStatus(), db.collection.stats()
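A few of the WiredTiger counters worth watching. A sketch in the mongo shell (requires a running mongod with WiredTiger; the stat names are WiredTiger's verbose strings and may vary by version, and the collection name is illustrative):

```javascript
var wt = db.serverStatus().wiredTiger;
print("cache bytes in use: " + wt.cache["bytes currently in the cache"]);
print("cache max bytes:    " + wt.cache["maximum bytes configured"]);
// per-collection details, including the engine config in use:
printjson(db.users.stats().wiredTiger);
```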
22
Understanding and Optimizing WiredTiger
23
Understanding WiredTiger architecture
[Diagram: WiredTiger architecture. Btree / LSM / Columnar index structures sit on top of the WiredTiger SE cache (default: 50% of RAM); blocks pass through optional compression (none, Snappy, Zlib) to the OS disk cache (default: the other ~50% of RAM) and then to the physical disk.]
24
Covering 90% of your optimization needs
[Diagram: same architecture, annotated with the two dominant costs: decompression time when a block is read from the OS disk cache, and disk seek time when it must come from the physical disk.]
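The diagram's point can be put into a toy cost model: a read is cheapest when the page is already in the WiredTiger cache, pays a decompression penalty when served from the OS disk cache, and pays a seek on top of that when it goes to the physical disk. The constants below are invented order-of-magnitude assumptions for illustration, not measurements:

```javascript
// Toy read-cost model in microseconds. All numbers are made-up
// assumptions; only the ordering matters for the argument.
const READ_COST_US = {
  wtCache: 1,            // uncompressed page already in the WT cache
  osCache: 1 + 50,       // page in OS disk cache: decompress first
  disk: 1 + 50 + 10000,  // physical disk: seek + read + decompress
};

function readCostMicros(location) {
  if (!(location in READ_COST_US)) {
    throw new Error("unknown location: " + location);
  }
  return READ_COST_US[location];
}

console.log(readCostMicros("wtCache"), readCostMicros("osCache"), readCostMicros("disk"));
```

The tuning strategies that follow are all ways of keeping the working set as high up in this hierarchy as possible.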
25
Strategy 1: fit working set in Cache
[Diagram: the working set fits entirely in the WiredTiger SE cache, grown from the default 50% to cache_size = 80% of RAM.]
26
Strategy 2: fit working set in OS Disk Cache
[Diagram: the WiredTiger SE cache is shrunk to cache_size = 10%, leaving the remaining ~90% of RAM to the OS disk cache, which holds the compressed working set.]
27
Strategy 3: SSD disk + compression to save €
[Diagram: the working set lives on an SSD with compression enabled, trading decompression CPU for smaller, cheaper storage.]
28
Strategy 4: SSD disk (no compression)
[Diagram: the working set lives on an SSD with compression disabled, avoiding decompression cost at the price of more disk space.]
29
What problem is solved by LSM indexes?
Fast writes are easy: use no indexes. Fast reads are easy: add indexes. Getting both at once is hard: either smart schema design (hire a consultant) or write-optimized index structures such as LSM (or columnar).
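Why LSM helps write-heavy loads can be sketched with a toy log-structured merge store: writes go to an in-memory buffer (memtable) and are periodically flushed as sorted, immutable runs (sequential I/O), while reads may have to consult the memtable plus several runs. This is a minimal sketch with invented names, not WiredTiger's actual implementation; real LSM trees also merge/compact runs and use bloom filters to cheapen reads:

```javascript
// Toy LSM store: cheap appends to a memtable, sorted runs on flush.
class ToyLSM {
  constructor(memtableLimit = 4) {
    this.memtable = new Map(); // in-memory writes: cheap, unsorted
    this.runs = [];            // flushed, sorted, immutable runs
    this.memtableLimit = memtableLimit;
  }
  put(key, value) {
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }
  flush() {
    // sort once, write sequentially -- this is why LSM writes are fast
    const run = [...this.memtable.entries()].sort((a, b) => (a[0] < b[0] ? -1 : 1));
    this.runs.push(run);
    this.memtable = new Map();
  }
  get(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    // reads may touch every run, newest first -- the read-side cost
    for (let i = this.runs.length - 1; i >= 0; i--) {
      const hit = this.runs[i].find(([k]) => k === key);
      if (hit !== undefined) return hit[1];
    }
    return undefined;
  }
}

const store = new ToyLSM(2);
store.put("a", 1);
store.put("b", 2); // reaches the limit, triggers a flush
store.put("a", 3); // newer value in the memtable shadows the flushed one
```

A B-tree pays a random I/O per index per insert; the toy above pays only an in-memory insert plus an amortized sequential flush, which is the trade the "2B inserts" benchmark on the next slide exercises.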
30
2B inserts (with 3 secondary indexes)