1
Introduction to new high performance storage engines in MongoDB 3.0
Henrik Ingo, Solutions Architect, MongoDB
2
Hi, I am Henrik Ingo @h_ingo
3
Introduction to new high performance storage engines in MongoDB 3.0
Agenda: - MongoDB and NoSQL - Storage Engine API - WiredTiger configuration + performance
4
Most popular NoSQL database
5
5 NoSQL categories: Key Value (Redis, Riak), Wide Column (Cassandra), Document (MongoDB), Graph (Neo4j), Map Reduce (Hadoop)
6
MongoDB is a Document Database
Example document:
{ first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: …, … },
    { model: 'Rolls Royce', year: 1965, value: …, … }
  ] }
Rich Queries: Find Paul's cars; find everybody in London with a car built between 1970 and 1980
Geospatial: Find all of the car owners within 5km of Trafalgar Sq.
Text Search: Find all the cars described as having leather seats
Aggregation: Calculate the average value of Paul's car collection
Map Reduce: What is the ownership pattern of colors by geography over time? (Is purple trending up in China?)
7
Operational Database Landscape
Speaker notes: The dotted line is the natural boundary of what is possible today. For example, Oracle lives far out on the right and does things NoSQL vendors will never do, but those things come at the expense of some degree of scale and performance. NoSQL was born out of wanting greater scalability and performance, but we think those vendors overreacted by giving up too much: caching layers give up many things, and key-value stores are super fast but give up the rich data model and rich query model. MongoDB gives up some features of a relational database (joins, complex transactions) to enable greater scalability and performance. You get most of the functionality (roughly 80%) with much better scalability and performance. Start with an RDBMS and ask what could be taken out to scale: complex transactions and joins. How? Change the data model. To enable horizontal scalability, reduce coordination between nodes (joins and transactions). Traditionally in an RDBMS you would denormalize the data or tell the system more about how pieces of data relate to one another. A more intuitive way is to use a document data model, because it is closer to the way we develop applications today with object-oriented languages like Java, .NET, Ruby, and Node.js. The document data model is a good segue to the next section: Data Model.
8
MongoDB 3.0 & storage engines
9
Current state in MongoDB 2.6
Read-heavy apps:
- Great performance: B-tree, low overhead
- Good scale-out perf: secondary reads, sharding
Write-heavy apps:
- Good scale-out perf: sharding
- Per-node efficiency wish-list: doc-level locking, write-optimized data structures (LSM), compression
Other wishes: complex transactions, in-memory engine, SSD-optimized engine, etc.
10
Current state in MongoDB 2.6
How to get all of the above?
11
MongoDB 3.0 Storage Engine API
Read-heavy app Write-heavy app Special app MMAP WiredTiger 3rd party
12
MongoDB 3.0 Storage Engine API
One at a time:
- Many engines built into mongod; choose 1 at startup
- All data stored by the same engine
- Incompatible on-disk data formats (obviously)
- Compatible client API
- Compatible oplog & replication: the same replica set can mix different engines, so no-downtime migration is possible
13
Some existing engines:
- MMAPv1: improved MMAP (collection-level locking)
- WiredTiger: discussed next
- RocksDB: LSM-style engine developed by Facebook, based on LevelDB
- TokuMXse: Fractal Tree indexing engine from Tokutek
14
Some rumored engines:
- Heap: in-memory engine
- Devnull: writes all data to /dev/null (based on an idea from a famous flash animation...); oplog stored as normal
- SSD-optimized engine (e.g. Fusion-io)
- KV: simple key-value engine
15
WiredTiger
16
What is WiredTiger
- Modern NoSQL database engine: flexible schema
- Advanced database engine: secondary indexes, MVCC, non-locking algorithms, multi-statement transactions (not exposed in MongoDB 3.0)
- Very modular, tunable: Btree, LSM and columnar indexes; Snappy, Zlib and 3rd-party compression; index prefix compression, etc.
- Built by the creators of BerkeleyDB; acquired by MongoDB in 2014
- source.wiredtiger.com
17
Choosing WiredTiger at server startup
mongod --storageEngine wiredTiger
18
Main tunables exposed as MongoDB options
mongod --storageEngine wiredTiger \
       --wiredTigerCacheSizeGB 8 \
       --wiredTigerDirectoryForIndexes /data/indexes \
       --wiredTigerCollectionBlockCompressor zlib \
       --syncDelay 30
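The same tunables can also live in a mongod configuration file. A minimal sketch in YAML config format, mirroring the flags above (note that in the config file directoryForIndexes is a boolean, not a path; the values shown are the slide's examples, not recommendations):

```yaml
storage:
  engine: wiredTiger
  syncPeriodSecs: 30
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8
      directoryForIndexes: true
    collectionConfig:
      blockCompressor: zlib
```

Start mongod with `--config /path/to/mongod.conf` to pick this up.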
19
All WiredTiger options via configString (hidden)
mongod --storageEngine wiredTiger \
       --wiredTigerEngineConfigString "cache_size=8GB,eviction=(threads_min=4,threads_max=8),checkpoint=(wait=30)" \
       --wiredTigerCollectionConfigString "block_compressor=zlib" \
       --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib" \
       --wiredTigerDirectoryForIndexes /data/indexes
See docs for wiredtiger_open() & WT_SESSION::create()
20
Also via createCollection(), createIndex()
db.createCollection( "users",
  { storageEngine: { wiredTiger: { configString: "block_compressor=none" } } } )
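Per-index overrides work the same way. A sketch in the mongo shell (requires a running mongod with WiredTiger; the collection and field names are illustrative), passing a WiredTiger configString through createIndex():

```javascript
// prefix_compression is a WiredTiger index option, on by default
// for indexes in MongoDB 3.0; here we turn it off for one index
db.users.createIndex(
  { surname: 1 },
  { storageEngine: { wiredTiger: { configString: "prefix_compression=false" } } }
)
```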
21
More: db.serverStatus(), db.collection.stats()
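A few of the WiredTiger counters worth watching. A sketch in the mongo shell (requires a running mongod with WiredTiger; the stat names are WiredTiger's verbose strings and may vary by version, and the collection name is illustrative):

```javascript
var wt = db.serverStatus().wiredTiger;
print("cache bytes in use: " + wt.cache["bytes currently in the cache"]);
print("cache max bytes:    " + wt.cache["maximum bytes configured"]);
// per-collection details, including the engine config in use:
printjson(db.users.stats().wiredTiger);
```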
22
Understanding and Optimizing WiredTiger
23
Understanding WiredTiger architecture
[Diagram: WiredTiger architecture. Btree / LSM / Columnar index structures sit on top of the WiredTiger SE cache (default: 50% of RAM); blocks pass through optional compression (none, Snappy, Zlib) to the OS disk cache (default: the other ~50% of RAM) and then to the physical disk.]
24
Covering 90% of your optimization needs
[Diagram: same architecture, annotated with the two dominant costs: decompression time when a block is read from the OS disk cache, and disk seek time when it must come from the physical disk.]
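The diagram's point can be put into a toy cost model: a read is cheapest when the page is already in the WiredTiger cache, pays a decompression penalty when served from the OS disk cache, and pays a seek on top of that when it goes to the physical disk. The constants below are invented order-of-magnitude assumptions for illustration, not measurements:

```javascript
// Toy read-cost model in microseconds. All numbers are made-up
// assumptions; only the ordering matters for the argument.
const READ_COST_US = {
  wtCache: 1,            // uncompressed page already in the WT cache
  osCache: 1 + 50,       // page in OS disk cache: decompress first
  disk: 1 + 50 + 10000,  // physical disk: seek + read + decompress
};

function readCostMicros(location) {
  if (!(location in READ_COST_US)) {
    throw new Error("unknown location: " + location);
  }
  return READ_COST_US[location];
}

console.log(readCostMicros("wtCache"), readCostMicros("osCache"), readCostMicros("disk"));
```

The tuning strategies that follow are all ways of keeping the working set as high up in this hierarchy as possible.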
25
Strategy 1: fit working set in Cache
[Diagram: the working set fits entirely in the WiredTiger SE cache, grown from the default 50% to cache_size = 80% of RAM.]
26
Strategy 2: fit working set in OS Disk Cache
[Diagram: the WiredTiger SE cache is shrunk to cache_size = 10%, leaving the remaining ~90% of RAM to the OS disk cache, which holds the compressed working set.]
27
Strategy 3: SSD disk + compression to save €
[Diagram: the working set lives on an SSD with compression enabled, trading decompression CPU for smaller, cheaper storage.]
28
Strategy 4: SSD disk (no compression)
[Diagram: the working set lives on an SSD with compression disabled, avoiding decompression cost at the price of more disk space.]
29
What problem is solved by LSM indexes?
Fast writes are easy: use no indexes. Fast reads are easy: add indexes. Getting both at once is hard: either smart schema design (hire a consultant) or write-optimized index structures such as LSM (or columnar).
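Why LSM helps write-heavy loads can be sketched with a toy log-structured merge store: writes go to an in-memory buffer (memtable) and are periodically flushed as sorted, immutable runs (sequential I/O), while reads may have to consult the memtable plus several runs. This is a minimal sketch with invented names, not WiredTiger's actual implementation; real LSM trees also merge/compact runs and use bloom filters to cheapen reads:

```javascript
// Toy LSM store: cheap appends to a memtable, sorted runs on flush.
class ToyLSM {
  constructor(memtableLimit = 4) {
    this.memtable = new Map(); // in-memory writes: cheap, unsorted
    this.runs = [];            // flushed, sorted, immutable runs
    this.memtableLimit = memtableLimit;
  }
  put(key, value) {
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }
  flush() {
    // sort once, write sequentially -- this is why LSM writes are fast
    const run = [...this.memtable.entries()].sort((a, b) => (a[0] < b[0] ? -1 : 1));
    this.runs.push(run);
    this.memtable = new Map();
  }
  get(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    // reads may touch every run, newest first -- the read-side cost
    for (let i = this.runs.length - 1; i >= 0; i--) {
      const hit = this.runs[i].find(([k]) => k === key);
      if (hit !== undefined) return hit[1];
    }
    return undefined;
  }
}

const store = new ToyLSM(2);
store.put("a", 1);
store.put("b", 2); // reaches the limit, triggers a flush
store.put("a", 3); // newer value in the memtable shadows the flushed one
```

A B-tree pays a random I/O per index per insert; the toy above pays only an in-memory insert plus an amortized sequential flush, which is the trade the "2B inserts" benchmark on the next slide exercises.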
30
2B inserts (with 3 secondary indexes)