Introduction to new high performance storage engines in MongoDB 2.8


1 Introduction to new high performance storage engines in MongoDB 3.0 (originally announced as 2.8)
Henrik Ingo, Solutions Architect, MongoDB

2 Hi, I am Henrik Ingo @h_ingo

3 Introduction to new high performance storage engines in MongoDB 3.0
Agenda: - MongoDB and NoSQL - Storage Engine API - WiredTiger configuration + performance

4 Most popular NoSQL database

5 5 NoSQL categories
Key Value (Redis, Riak), Wide Column (Cassandra), Document, Graph (Neo4j), Map Reduce (Hadoop)

6 MongoDB is a Document Database
Rich Queries: find Paul’s cars; find everybody in London with a car built between 1970 and 1980. Geospatial: find all of the car owners within 5km of Trafalgar Sq. Text Search: find all the cars described as having leather seats. Aggregation: calculate the average value of Paul’s car collection. Map Reduce: what is the ownership pattern of colors by geography over time? (Is purple trending up in China?) Example document: { first_name: ‘Paul’, surname: ‘Miller’, city: ‘London’, location: [45.123, 47.232], cars: [ { model: ‘Bentley’, year: 1973, value: …, … }, { model: ‘Rolls Royce’, year: 1965, value: …, … } ] }
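A minimal sketch, in the mongo shell, of how a couple of these queries might look against an owners collection shaped like the document above (the collection name and the 2dsphere index are illustrative assumptions, not from the slides):

// Find Paul's cars
db.owners.find( { first_name: "Paul" }, { cars: 1 } )

// Find everybody in London with a car built between 1970 and 1980
db.owners.find( { city: "London", "cars.year": { $gte: 1970, $lte: 1980 } } )

// Find all car owners within 5 km of Trafalgar Square
// (assumes a geospatial index: db.owners.createIndex( { location: "2dsphere" } ))
db.owners.find( { location: { $nearSphere: {
    $geometry: { type: "Point", coordinates: [ -0.128, 51.508 ] },
    $maxDistance: 5000 } } } )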

7 Operational Database Landscape
The dotted line is the natural boundary of what is possible today. E.g., Oracle lives far out on the right and does things NoSQL vendors will never do; those things come at the expense of some degree of scale and performance. NoSQL was born out of wanting greater scalability and performance, but we think many vendors overreacted by giving up too much: caching layers give up many things, and key-value stores are super fast but give up the rich data model and rich query model. MongoDB gives up a few features of a relational database (joins, complex transactions) to enable greater scalability and performance: you get most of the functionality (roughly 80%) with much better scalability and performance. Start with an RDBMS and ask what we could remove to scale: take out complex transactions and joins. How? Change the data model. >> segue to data model section. (May need to revise the graphic: either remove the line, or all points should be on the line.)
To enable horizontal scalability, reduce coordination between nodes (joins and transactions). Traditionally in an RDBMS you would denormalize the data or tell the system more about how the data relates to one another. Another, more intuitive way is to use a document data model; more intuitive because it is closer to the way we develop applications today with object-oriented languages like Java, .NET, Ruby, Node.js, etc. The document data model is a good segue to the next section >> Data Model

8 MongoDB 3.0 & storage engines

9 Current state in MongoDB 2.6
Read-heavy apps: great performance (B-tree, low overhead); good scale-out performance (secondary reads, sharding). Write-heavy apps: good scale-out performance (sharding); per-node efficiency wish-list: doc level locking, write-optimized data structures (LSM), compression. Other: complex transactions, in-memory engine, SSD optimized engine, etc...

10 Current state in MongoDB 2.6
How to get all of the above?

11 MongoDB 3.0 Storage Engine API
[Diagram: read-heavy, write-heavy and special-purpose apps all talk to mongod through the Storage Engine API, which can be backed by MMAP, WiredTiger or a 3rd-party engine.]

12 MongoDB 3.0 Storage Engine API
One at a time: many engines are built into mongod, but you choose 1 at startup and all data is stored by the same engine; on-disk data formats are incompatible (obviously). The client API is compatible, and so are the oplog & replication: the same replica set can mix different engines, so a no-downtime migration is possible.
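A quick way to check which engine a running node is actually using (a small sketch; the exact output fields may vary by version):

// In the mongo shell: report the active storage engine
db.serverStatus().storageEngine
// e.g. { "name" : "wiredTiger" }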

13 Some existing engines
MMAPv1: improved MMAP (collection-level locking). WiredTiger: discussed next. RocksDB: LSM-style engine developed by Facebook, based on LevelDB. TokuMXse: Fractal Tree indexing engine from Tokutek.

14 Some rumored engines
Heap: in-memory engine. Devnull: write all data to /dev/null (based on an idea from a famous flash animation...); oplog stored as normal. SSD optimized engine (e.g. Fusion-IO). KV: simple key-value engine.

15 WiredTiger

16 What is WiredTiger
Modern NoSQL database engine: flexible schema. Advanced database engine: secondary indexes, MVCC, non-locking algorithms, multi-statement transactions (not in MongoDB 3.0). Very modular and tunable: Btree, LSM and columnar indexes; Snappy, Zlib and 3rd-party compression; index prefix compression, etc. Built by the creators of BerkeleyDB; acquired by MongoDB in 2014. source.wiredtiger.com

17 Choosing WiredTiger at server startup
mongod --storageEngine wiredTiger

18 Main tunables exposed as MongoDB options
mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB 8 --wiredTigerDirectoryForIndexes /data/indexes --wiredTigerCollectionBlockCompressor zlib --syncDelay 30
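The same tunables can also live in the mongod configuration file; a rough sketch of the equivalent YAML for 3.0 (option paths should be double-checked against the docs; note that directoryForIndexes is a boolean here, placing indexes in a subdirectory of the dbpath):

storage:
  engine: wiredTiger
  syncPeriodSecs: 30
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8
      directoryForIndexes: true
    collectionConfig:
      blockCompressor: zlib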

19 All WiredTiger options via configString (hidden)
mongod --storageEngine wiredTiger --wiredTigerEngineConfigString "cache_size=8GB,eviction=(threads_min=4,threads_max=8),checkpoint=(wait=30)" --wiredTigerCollectionConfigString "block_compressor=zlib" --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib" --wiredTigerDirectoryForIndexes /data/indexes See docs for wiredtiger_open() & WT_SESSION::create()

20 Also via createCollection(), createIndex()
db.createCollection( "users", { storageEngine: { wiredTiger: { configString: "block_compressor=none" } } } )
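createIndex() accepts the same option per index; a hedged sketch (the index key and the configString contents are illustrative):

// Build a secondary index with its own WiredTiger options
db.users.createIndex(
  { surname: 1 },
  { storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } } }
)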

21 More... db.serverStatus() db.collection.stats()
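For example, to drill into the WiredTiger sections of those commands (statistic names come from WiredTiger itself and may differ between versions):

// Engine-wide cache statistics
db.serverStatus().wiredTiger.cache

// Per-collection WiredTiger details (creation string, block manager stats, ...)
db.users.stats().wiredTiger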

22 Understanding and Optimizing WiredTiger

23 Understanding WiredTiger architecture
[Diagram: the WiredTiger SE cache (default: 50% of RAM) holds Btree / LSM / columnar data; blocks pass through compression (none, snappy, zlib) on their way to the OS disk cache (default: the other 50%) and finally the physical disk.]

24 Covering 90% of your optimization needs
[Same diagram, annotated: blocks read from the OS disk cache cost decompression time; blocks read from the physical disk cost disk seek time.]

25 Strategy 1: fit working set in Cache
[Diagram: the WiredTiger SE cache is enlarged to cache_size = 80% of RAM so the working set is served straight from cache.]

26 Strategy 2: fit working set in OS Disk Cache
[Diagram: the WiredTiger SE cache is shrunk to cache_size = 10%, leaving the remaining ~90% of RAM for the OS disk cache, which holds the working set.]

27 Strategy 3: SSD disk + compression to save €
[Diagram: same layers, with the physical disk replaced by an SSD and block compression (snappy / zlib) enabled to save on storage cost.]

28 Strategy 4: SSD disk (no compression)
[Diagram: same layers, with the physical disk replaced by an SSD and block compression disabled.]
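As a rough sketch of what strategies 1 and 2 might look like on the command line for a machine with, say, 64 GB of RAM (the sizes are illustrative, not recommendations):

# Strategy 1: large WiredTiger cache, working set served uncompressed from cache
mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB 50 --wiredTigerCollectionBlockCompressor none

# Strategy 2: small WiredTiger cache, compressed blocks kept hot in the OS disk cache
mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB 6 --wiredTigerCollectionBlockCompressor zlib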

29 What problem is solved by LSM indexes?
[Diagram: fast reads are easy (add indexes); fast writes are easy (no indexes); getting both is hard: smart schema design (hire a consultant), or LSM (or columnar) index structures.]

30 2B inserts (with 3 secondary indexes)
