MongoDB Introduction © Zoran Maksimovic
MongoDB is a scalable, high-performance, open source, schema-free, document-oriented database
History
- First developed (by 10gen)
- Became open source
- Considered production ready (v1.4 and later)
- MongoDB closes $150 million in funding
- Latest stable version (v2.6)
- Today: more than $231 million in total investment since 2007; MongoDB Inc. valued at $1.2B
NoSQL Breakdown
NoSQL encompasses a wide variety of database technologies that were developed in response to a rise in the volume of data.
- Document databases pair each key with a complex data structure known as a document (MongoDB, Couchbase Server, CouchDB).
- Key-value stores are the simplest NoSQL databases. Every item in the database is stored as an attribute name (or "key") together with its value (DynamoDB, Windows Azure Table Storage, Riak, Redis, LevelDB, Dynomite).
- Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together instead of rows.
- Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4J and HyperGraphDB.
NoSQL made by big vendors
- Oracle: NoSQL Database (Key-Value store)
- Microsoft: Azure Table Storage (Key-Value store)
- Google: BigTable (proprietary)
- Google: LevelDB (Open Source Key-Value store)
- Amazon: SimpleDB (Wide Column store)
- Amazon: DynamoDB (Key-Value store)
- Apache: HBase, …
- Basho: Riak (Key-Value store)
- Facebook: Cassandra (Wide Column store, now an Apache project)
MongoDB in a nutshell
Document-Oriented Storage » JSON-style documents with dynamic schemas offer simplicity and power.
Full Index Support » Index on any attribute, just like you're used to.
Replication & High Availability » Mirror across LANs and WANs for scale and peace of mind.
Auto-Sharding » Scale horizontally without compromising functionality.
Querying » Rich, document-based queries.
Fast In-Place Updates » Atomic modifiers for contention-free performance.
Map/Reduce » Flexible aggregation and data processing.
GridFS » Store files of any size without complicating your stack.
MongoDB Management Service » Monitoring and backup designed for MongoDB.
Professional Support by MongoDB » Enterprise class support, training, and consulting available.
MongoDB is a Document oriented database
- Think of "documents" as database records.
- No Schema!
- Documents are basically just JSON objects that Mongo stores in binary (BSON) format.
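To make the "no schema" point concrete, here is a minimal sketch in plain JavaScript. The two documents and all of their field names are invented for illustration, and nothing here talks to a MongoDB server; it only shows that documents in the same collection may have different shapes and may nest objects and arrays.

```javascript
// Two hypothetical documents that could sit side by side in one collection.
// The field names are assumptions made for this example only.
const people = [
  { _id: 1, name: "Tom", email: "tom@example.com" },
  { _id: 2, name: "Nick", address: { city: "Zurich", zip: "8000" }, tags: ["admin", "dev"] }
];

// Documents are JSON-like: each one serializes to text on its own,
// and no shared column list is declared anywhere.
for (const doc of people) {
  console.log(JSON.stringify(doc));
}
```

The second document has no `email` field and the first has no `address`; with a fixed relational schema both would need the same columns.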
MongoDB database structure
Embedded Data Model
When to use:
- "Contains" relationships between entities.
- One-to-many relationships between entities, where the "many" or child documents always appear with, or are viewed in the context of, the "one" or parent document.
Retrieving data in one query.
Data redundancy.
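The embedded model can be sketched in plain JavaScript as follows. The blog post, its comments, and the `Map` standing in for a collection are all hypothetical; the point is only that a single lookup returns the parent together with its children.

```javascript
// Hypothetical blog post with its comments embedded (a "contains" relationship).
const post = {
  _id: 1,
  title: "Intro to MongoDB",
  comments: [
    { author: "Tom", text: "Nice overview" },
    { author: "Nick", text: "Thanks!" }
  ]
};

// Stand-in for a collection: a Map keyed by _id.
const posts = new Map([[post._id, post]]);

// One lookup returns the parent and all of its child documents.
const found = posts.get(1);
console.log(found.title, "has", found.comments.length, "comments");
```

The cost is redundancy: if the same comment author data were embedded in many posts, each copy would have to be updated separately.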
Document oriented database – Normalized data model
When to use:
- When embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
- To represent more complex many-to-many relationships.
- To model large hierarchical data sets.
Multiple queries!
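A minimal sketch of the normalized model, again in plain JavaScript with in-memory stand-ins for collections. The `books` and `publishers` data and the `findBookWithPublisher` helper are hypothetical. It illustrates the "multiple queries" point: the reference is resolved by the application with a second lookup.

```javascript
// Hypothetical normalized data: each book references its publisher by id,
// so the publisher details are stored only once.
const publishers = new Map([
  ["example-press", { _id: "example-press", name: "Example Press" }]
]);
const books = [
  { _id: 1, title: "Book A", publisher_id: "example-press" },
  { _id: 2, title: "Book B", publisher_id: "example-press" }
];

// Lookup 1 fetches the book, lookup 2 fetches the referenced publisher.
function findBookWithPublisher(bookId) {
  const book = books.find((b) => b._id === bookId);
  if (!book) return null;
  const publisher = publishers.get(book.publisher_id) || null;
  return { ...book, publisher };
}

console.log(findBookWithPublisher(2));
```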
Indexing
All indexes in MongoDB are B-Tree indexes.
Index types:
- Single field index
- Compound index: more than one field in the collection
- Multikey index: index on array fields
- Geospatial index and queries
- Text index: index on string content, used for text search
- TTL (time to live) index: the collection keeps documents only for a limited time
- Unique index: the value of the indexed field has to be unique
- Sparse index: stores an index entry only for documents that have the given field
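The index types above map onto key and option documents. The sketch below lists one definition per type and applies them to anything exposing an `ensureIndex(keys, options)` method, which is the shell method name in the 2.x series (later releases use `createIndex`). All field names are hypothetical, and a stub collection records the calls so the sketch runs without a server.

```javascript
// Index definitions as (keys, options) pairs. All field names are made up.
const indexSpecs = [
  { keys: { name: 1 }, options: {} },                                 // single field
  { keys: { type: 1, price: -1 }, options: {} },                      // compound
  { keys: { tags: 1 }, options: {} },                                 // multikey when "tags" holds arrays
  { keys: { location: "2dsphere" }, options: {} },                    // geospatial
  { keys: { description: "text" }, options: {} },                     // text
  { keys: { createdAt: 1 }, options: { expireAfterSeconds: 3600 } },  // TTL
  { keys: { email: 1 }, options: { unique: true, sparse: true } }     // unique and sparse
];

// Applies every definition to a collection-like object.
function applyIndexes(collection) {
  for (const spec of indexSpecs) {
    collection.ensureIndex(spec.keys, spec.options);
  }
}

// Stub collection: it only records the calls it receives.
const recorded = [];
applyIndexes({ ensureIndex: (keys, options) => recorded.push({ keys, options }) });
console.log(recorded.length + " index definitions recorded");
```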
Security
Authentication:
- MongoDB's default username/password authentication
- x.509 certificate authentication
- LDAP proxy authentication
- Kerberos authentication
Authorization:
- Role based access control
Replication
Replication provides redundancy and increases data availability.
Sharding (Horizontal scaling)
- Sharding is a method for storing data across multiple machines.
- Used when the disk, CPU or RAM limits of a single machine are reached.
- Vertical scaling vs horizontal scaling.
- Range based vs hash based sharding.
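The difference between range based and hash based sharding can be sketched as two routing functions. This is a simplified illustration, not how MongoDB routes documents internally: the boundaries are made up, FNV-1a is a stand-in hash chosen because it fits in a few lines, and taking the hash modulo the shard count replaces the real chunk bookkeeping.

```javascript
// Range based: a key goes to the first range whose upper bound exceeds it.
// Shard 0 holds keys < 100, shard 1 holds keys < 200, shard 2 holds the rest.
const upperBounds = [100, 200];

function rangeShard(key) {
  const index = upperBounds.findIndex((bound) => key < bound);
  return index === -1 ? upperBounds.length : index;
}

// Stand-in 32-bit hash (FNV-1a); an assumption for this sketch only.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hash based: hash the key first, then derive the shard from the hash.
function hashShard(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

for (const key of [1, 2, 3, 150, 250]) {
  console.log(key, "range ->", rangeShard(key), "hash ->", hashShard(key, 3));
}
```

With range based routing, neighbouring keys land on the same shard, which suits range queries but can concentrate sequential inserts on one machine. Hashing trades that locality for a more even spread.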
How to access MongoDB? Drivers: Administration interfaces:
C# code example

    using MongoDB.Bson;
    using MongoDB.Driver;
    using MongoDB.Driver.Builders;

    // Connect (1.x C# driver API)
    var connectionString = "mongodb://localhost";
    var client = new MongoClient(connectionString);
    var server = client.GetServer();
    var database = server.GetDatabase("test");
    var collection = database.GetCollection<Entity>("entities");

    // Insert a new entity
    var entity = new Entity { Name = "Tom" };
    collection.Insert(entity);
    var id = entity.Id;                    // { _id: " ", Name: "Tom" }

    // Retrieve
    var query = Query<Entity>.EQ(e => e.Id, id);
    entity = collection.FindOne(query);

    // Save (update): sends the full content of the entity to be updated.
    entity.Name = "Nick";
    collection.Save(entity);               // { _id: " ", Name: "Nick" }

    // Update: sends partial content of the entity to be updated.
    var update = Update<Entity>.Set(e => e.Name, "Harry");
    collection.Update(query, update);      // { _id: " ", Name: "Harry" }

    // Delete the entity
    collection.Remove(query);

    public class Entity
    {
        public ObjectId Id { get; set; }
        public string Name { get; set; }
    }
Some of the MongoDB Shell methods

    db.inventory.find( { type: "snacks" } )
    db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
    db.inventory.insert( { _id: 10, type: "misc", item: "card", qty: 15 } )
    db.inventory.find( { type: 'food' } ).explain()

    {
      "cursor": "BtreeCursor type_1",
      "isMultiKey": false,
      "n": 5,
      "nscannedObjects": 5,
      "nscanned": 5,
      "nscannedObjectsAllPlans": 5,
      "nscannedAllPlans": 5,
      "scanAndOrder": false,
      "indexOnly": false,
      "nYields": 0,
      "nChunkSkips": 0,
      "millis": 0,
      "indexBounds": { "type": [ [ "food", "food" ] ] },
      "server": "mongodbo0.example.net:27017"
    }
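To show what a query document such as `{ type: 'food', price: { $lt: 9.95 } }` expresses, here is a small in-memory matcher in plain JavaScript. It is a teaching sketch rather than MongoDB's query engine: it supports only equality plus `$lt` and `$gt`, and the sample `inventory` data, `matches` and `find` are hypothetical.

```javascript
// Hypothetical sample data; not taken from the slides.
const inventory = [
  { _id: 1, type: "food", item: "bread", price: 2.5 },
  { _id: 2, type: "food", item: "caviar", price: 49.0 },
  { _id: 3, type: "snacks", item: "chips", price: 1.2 }
];

// The only comparison operators this sketch understands.
const operators = {
  $lt: (value, bound) => value < bound,
  $gt: (value, bound) => value > bound
};

// True when every field condition in the query holds for the document.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object") {
      return Object.entries(condition).every(([op, bound]) => {
        if (!Object.prototype.hasOwnProperty.call(operators, op)) {
          throw new Error("unsupported operator: " + op);
        }
        return operators[op](doc[field], bound);
      });
    }
    return doc[field] === condition; // plain equality
  });
}

// Returns all documents of the array that satisfy the query document.
function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

console.log(find(inventory, { type: "food", price: { $lt: 9.95 } }));
```

Conditions on several fields are combined with a logical AND, which is also how the shell treats multiple fields in one query document.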
What is missing (from the RDBMS perspective)
- No JOIN support
- No complex transaction support
- No constraint support (constraints have to be implemented at the application level)
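An application-level constraint might look like the sketch below, which imitates a foreign key check before an insert. The `customers` and `orders` stand-ins and the `insertOrder` helper are hypothetical. Against a real database the check and the insert would be two separate operations, so this does not give the atomicity a relational constraint provides.

```javascript
// In-memory stand-ins for two collections.
const customers = new Map([[1, { _id: 1, name: "Tom" }]]);
const orders = [];

// Rejects orders that reference a customer that does not exist.
function insertOrder(order) {
  if (!customers.has(order.customer_id)) {
    throw new Error("unknown customer_id: " + order.customer_id);
  }
  orders.push(order);
  return order;
}

insertOrder({ _id: 100, customer_id: 1, total: 25 });

try {
  insertOrder({ _id: 101, customer_id: 2, total: 10 });
} catch (err) {
  console.log("rejected:", err.message);
}
```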
Where/When to use?
Main drivers:
- Large amounts of data (Twitter: ~12TB of data per day!)
- Easier development (according to surveys); addresses the impedance mismatch problem
In general:
- Content Management and Delivery: serve content, as well as the associated metadata (attachments, images, binary)
- Big Data: data that is too diverse, fast-changing, or massive, in a wide variety of apps such as genomics, clickstream analysis, customer sentiment analysis, log data collection, etc.
- Analytics and Reporting (data warehouse)
- Market Data Management
Problems
- Maturity!!!
- Skillset?
- Organizational change?
- What about the future?
Q&A