Mongodb.org A. Im, G. Cai, H. Tunc, J. Stevens, Y. Barve, S. Hei Vanderbilt University.

Slides:



Advertisements
Similar presentations
Introduction to MongoDB
Advertisements

Mongo An alternative database system. Installing Mongo We must install both the Mongo database and at least one GUI for managing Mongo See
Chapter 10: Designing Databases
Database Ed Milne. Theme An introduction to databases Using the Base component of LibreOffice LibreOffice.
What is a Database By: Cristian Dubon.
Concepts of Database Management Seventh Edition
1 More MongoDB: Ch 3- 8, plus a little Hadoop CSSE 533 Week 2, Spring, 2015.
Introduction to Structured Query Language (SQL)
Introduction to Structured Query Language (SQL)
A Social blog using MongoDB ITEC-810 Final Presentation Lucero Soria Supervisor: Dr. Jian Yang.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation MongoDB Data Modeling.
Introduction to Database Systems
Software Engineer, #MongoDBDays.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall 1 1. Chapter 2: Relational Databases and Multi-Table Queries Exploring Microsoft Office.
Session 5: Working with MySQL iNET Academy Open Source Web Development.
MONGODB NOSQL SERIES Karol Rástočný 1. Prominent Users 2  AppScale, bit.ly, Business Insider, CERN LHC, craigslist, diaspora, Disney Interactive Media.
Systems analysis and design, 6th edition Dennis, wixom, and roth
ASP.NET Programming with C# and SQL Server First Edition
©Silberschatz, Korth and Sudarshan5.1Database System Concepts Chapter 5: Other Relational Languages Query-by-Example (QBE) Datalog.
MongoDB An introduction. What is MongoDB? The name Mongo is derived from Humongous To say that MongoDB can handle a humongous amount of data Document.
NoSQL continued CMSC 461 Michael Wilson. MongoDB  MongoDB is another NoSQL solution  Provides a bit more structure than a solution like Accumulo  Data.
WTT Workshop de Tendências Tecnológicas 2014
TM 7-1 Copyright © 1999 Addison Wesley Longman, Inc. Physical Database Design.
Goodbye rows and tables, hello documents and collections.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation MongoDB Aggregation.
PowerPoint Presentation for Dennis, Wixom, & Tegarden Systems Analysis and Design with UML, 4th Edition Copyright © 2009 John Wiley & Sons, Inc. All rights.
NoSQL Databases Oracle - Berkeley DB. Content A brief intro to NoSQL About Berkeley Db About our application.
NOSQL DATABASES Please remember to read the NOSQL Distilled book and the Seven Databases book.
Methodological Foundations of Biomedical Informatics (BMSC-GA 4449) Himanshu Grover.
Concepts of Database Management Seventh Edition
Concepts of Database Management Seventh Edition
WEEK 1, DAY 2 STEVE CHENOWETH CSSE DEPT CSSE 533 –INTRO TO MONGODB.
6 1 Lecture 8: Introduction to Structured Query Language (SQL) J. S. Chou, P.E., Ph.D.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation MongoDB Architecture.
Introduction to MongoDB
Database Objective Demonstrate basic database concepts and functions.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
MongoDB First Light. Mongo DB Basics Mongo is a document based NoSQL. –A document is just a JSON object. –A collection is just a (large) set of documents.
Dynamo: Amazon’s Highly Available Key-value Store DAAS – Database as a service.
Session 1 Module 1: Introduction to Data Integrity
PowerPoint Presentation for Dennis, Wixom, & Tegarden Systems Analysis and Design with UML, 5th Edition Copyright © 2015 John Wiley & Sons, Inc. All rights.
Introduction to MongoDB. Database compared.
Database Systems, 8 th Edition SQL Performance Tuning Evaluated from client perspective –Most current relational DBMSs perform automatic query optimization.
CS422 Principles of Database Systems Introduction to NoSQL Chengyu Sun California State University, Los Angeles.
1 Section 1 - Introduction to SQL u SQL is an abbreviation for Structured Query Language. u It is generally pronounced “Sequel” u SQL is a unified language.
Introduction to Database Programming with Python Gary Stewart
COMP 430 Intro. to Database Systems MongoDB. What is MongoDB? “Humongous” DB NoSQL, no schemas DB Lots of similarities with SQL RDBMs, but with more flexibility.
Data Integrity & Indexes / Session 1/ 1 of 37 Session 1 Module 1: Introduction to Data Integrity Module 2: Introduction to Indexes.
CSE-291 (Distributed Systems) Winter 2017 Gregory Kesden
Mongo Database (Intermediate)
and Big Data Storage Systems
Chapter 6 - Database Implementation and Use
MongoDB Er. Shiva K. Shrestha ME Computer, NCIT
MongoDB Distributed Write and Read
Learning MongoDB ZhangGang
Dineesha Suraweera.
MongoDB CRUD Operations
Aggregation Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together,
NOSQL databases and Big Data Storage Systems
CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden
1 Demand of your DB is changing Presented By: Ashwani Kumar
MongoDB for the SQL DBA.
Physical Database Design
MongoDB Aggregations.
CSE 482 Lecture 5: NoSQL.
CS5220 Advanced Topics in Web Programming Introduction to MongoDB
Building applications with MongoDB – An introduction
MongoDB Aggregations.
Server & Tools Business
Presentation transcript:

mongodb.org A. Im, G. Cai, H. Tunc, J. Stevens, Y. Barve, S. Hei Vanderbilt University

Content  Part 1: Introduction & Basics  2: CRUD  3: Schema Design  4: Indexes  5: Aggregation  6: Replication & Sharding

History  mongoDB = “Hu mongo us DB”  Open-source  Document-based  “High performance, high availability”  Automatic scaling  C-P on CAP -blog.mongodb.org/post/ /on-distributed-consistency-part-1 -mongodb.org/manual

Other NoSQL Types Key/value (Dynamo) Columnar/tabular (HBase) Document (mongoDB)

Motivations  Problems with SQL  Rigid schema  Not easily scalable (designed for 90’s technology or worse)  Requires unintuitive joins  Perks of mongoDB  Easy interface with common languages (Java, Javascript, PHP, etc.)  DB tech should run anywhere (VM’s, cloud, etc.)  Keeps essential features of RDBMS’s while learning from key-value noSQL systems

Company Using mongoDB “MongoDB powers Under Armour’s online store, and was chosen for its dynamic schema, ability to scale horizontally and perform multi-data center replication.”

-Steve Francia,

Data Model  Document-Based (max 16 MB)  Documents are in BSON format, consisting of field-value pairs  Each document stored in a collection  Collections  Have index set in common  Like tables of relational db’s.  Documents do not have to have uniform structure -docs.mongodb.org/manual/

JSON  “JavaScript Object Notation”  Easy for humans to write/read, easy for computers to parse/generate  Objects can be nested  Built on  name/value pairs  Ordered list of values

BSON “Binary JSON” Binary-encoded serialization of JSON-like docs Also allows “referencing” Embedded structure reduces need for joins Goals – Lightweight – Traversable – Efficient (decoding and encoding)

BSON Example { "_id" : "37010" "city" : "ADAMS", "pop" : 2660, "state" : "TN", “councilman” : { name: “John Smith” address: “13 Scenic Way” }

BSON Types TypeNumber Double1 String2 Object3 Array4 Binary data5 Object id7 Boolean8 Date9 Null10 Regular Expression11 JavaScript13 Symbol14 JavaScript (with scope)15 32-bit integer16 Timestamp17 64-bit integer18 Min key255 Max key127 The number can be used with the $type operator to query by type!

The _id Field By default, each document contains an _id field. This field has a number of special characteristics: – Value serves as primary key for collection. – Value is unique, immutable, and may be any non-array type. – Default data type is ObjectId, which is “small, likely unique, fast to generate, and ordered.” Sorting on an ObjectId value is roughly equivalent to sorting on creation time.

mongoDB vs. SQL mongoDBSQL DocumentTuple CollectionTable/View PK: _id FieldPK: Any Attribute(s) Uniformity not RequiredUniform Relation Schema Index Embedded StructureJoins ShardPartition

CRUD Create, Read, Update, Delete

Getting Started with mongoDB To install mongoDB, go to this link and click on the appropriate OS and architecture: First, extract the files (preferrably to the C drive). Finally, create a data directory on C:\ for mongoDB to use i.e. “md data” followed by “md data\db”

Getting Started with mongoDB Open your mongodb/bin directory and run mongod.exe to start the database server. To establish a connection to the server, open another command prompt window and go to the same directory, entering in mongo.exe. This engages the mongodb shell—it’s that easy!

CRUD: Using the Shell To check which db you’re usingdb Show all databasesshow dbs Switch db’s/make a new one use See what collections exist show collections Note: db’s are not actually created until you insert data!

CRUD: Using the Shell (cont.) To insert documents into a collection/make a new collection: db..insert( ) INSERT INTO VALUES( );

CRUD: Inserting Data Insert one document db..insert({ : }) Inserting a document with a field name new to the collection is inherently supported by the BSON model. To insert multiple documents, use an array.

CRUD: Querying  Done on collections.  Get all docs: db..find()  Returns a cursor, which is iterated over shell to display first 20 results.  Add.limit( ) to limit results  SELECT * FROM ;  Get one doc: db..findOne()

CRUD: Querying To match a specific value: db..find({ : }) “AND” db..find({ :, : }) SELECT * FROM WHERE = AND = ;

CRUD: Querying OR db..find({ $or: [ : : ] }) SELECT * FROM WHERE = OR = ; Checking for multiple values of same field db..find({ : {$in [, ]}})

CRUD: Querying Including/excluding document fields db..find({ : }, { : 0}) SELECT field1 FROM ; db..find({ : }, { : 1}) Find documents with or w/o field db..find({ : { $exists: true}})

CRUD: Updating db..update( { : }, //all docs in which field = value {$set: { : }}, //set field to value {multi:true} )//update multiple docs upsert: if true, creates a new doc when none matches search criteria. UPDATE SET = WHERE = ;

CRUD: Updating To remove a field db..update({ : }, { $unset: { : 1}}) Replace all field-value pairs db..update({ : }, { :, : }) *NOTE: This overwrites ALL the contents of a document, even removing fields.

CRUD: Removal Remove all records where field = value db..remove({ : }) DELETE FROM WHERE = ; As above, but only remove first document db..remove({ : }, true)

CRUD: Isolation By default, all writes are atomic only on the level of a single document. This means that, by default, all writes can be interleaved with other operations. You can isolate writes on an unsharded collection by adding $isolated:1 in the query area: db..remove({ :, $isolated: 1})

Schema Design

RDBMSMongoDB Database ➜ Table ➜ Collection Row ➜ Document Index ➜ Join ➜ Embedded Document Foreign Key ➜ Reference

Intuition – why database exist in the first place?  Why can’t we just write programs that operate on objects?  Memory limit  We cannot swap back from disk merely by OS for the page based memory management mechanism  Why can’t we have the database operating on the same data structure as in program?  That is where mongoDB comes in

Mongo is basically schema- free  The purpose of schema in SQL is for meeting the requirements of tables and quirky SQL implementation  Every “row” in a database “table” is a data structure, much like a “struct” in C, or a “class” in Java. A table is then an array (or list) of such data structures  So we what we design in mongoDB is basically same way how we design a compound data type binding in JSON

There are some patterns  Embedding  Linking

Embedding & Linking

One to One relationship zip = { _id: 35004, city: “ACMAR” loc: [-86, 33], pop: 6065, State: “AL”, council_person: { name: “John Doe", address: “123 Fake St.”, Phone: } zip = { _id: 35004, city: “ACMAR”, loc: [-86, 33], pop: 6065, State: “AL” } Council_person = { zip_id = 35004, name: “John Doe", address: “123 Fake St.”, Phone: }

Example 2 MongoDB: The Definitive Guide, By Kristina Chodorow and Mike Dirolf Published: 9/24/2010 Pages: 216 Language: English Publisher: O’Reilly Media, CA

One to many relationship - Embedding book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ] published_date: ISODate(" "), pages: 216, language: "English", publisher: { name: "O’Reilly Media", founded: "1980", location: "CA" }

One to many relationship – Linking publisher = { _id: "oreilly", name: "O’Reilly Media", founded: "1980", location: "CA" } book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ] published_date: ISODate(" "), pages: 216, language: "English", publisher_id: "oreilly" }

Linking vs. Embedding  Embedding is a bit like pre-joining data  Document level operations are easy for the server to handle  Embed when the “many” objects always appear with (viewed in the context of) their parents.  Linking when you need more flexibility

Many to many relationship  Can put relation in either one of the documents (embedding in one of the documents)  Focus how data is accessed queried

Example book = { title: "MongoDB: The Definitive Guide", authors : [ { _id: "kchodorow", name: "Kristina Chodorow” }, { _id: "mdirolf", name: "Mike Dirolf” } ] published_date: ISODate(" "), pages: 216, language: "English" } author = { _id: "kchodorow", name: "Kristina Chodorow", hometown: "New York" } db.books.find( { authors.name : " Kristina Chodorow " } )

What is bad about SQL ( semantically )  “Primary keys” of a database table are in essence persistent memory addresses for the object. The address may not be the same when the object is reloaded into memory. This is why we need primary keys.  Foreign key functions just like a pointer in C, persistently point to the primary key.  Whenever we need to deference a pointer, we do JOIN  It is not intuitive for programming and also JOIN is time consuming

Example 3 Book can be checked out by one student at a time Student can check out many books

Modeling Checkouts student = { _id: "joe" name: "Joe Bookreader", join_date: ISODate(" "), address: {... } } book = { _id: " " title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ],... }

Modeling Checkouts student = { _id: "joe" name: "Joe Bookreader", join_date: ISODate(" "), address: {... }, checked_out: [ { _id: " ", checked_out: " " }, { _id: " ", checked_out: " " },... ] }

What is good about mongoDB?  find() is more semantically clear for programming  De-normalization provides Data locality, and Data locality provides speed (map (lambda (b) b.title) (filter (lambda (p) (> p 100)) Book)

Part 4: Index in MongoDB

Before Index  What does database normally do when we query?  MongoDB must scan every document.  Inefficient because process large volume of data db.users.find( { score: { “$lt” : 30} } )

Definition of Index  Definition  Indexes are special data structures that store a small portion of the collection’s data set in an easy to traverse form. Diagram of a query that uses an index to select Index

Index in MongoDB  Creation index  db.users.ensureIndex( { score: 1 } )  Show existing indexes  db.users.getIndexes()  Drop index  db.users.dropIndex( {score: 1} )  Explain—Explain  db.users.find().explain()  Returns a document that describes the process and indexes  Hint  db.users.find().hint({score: 1})  Overide MongoDB’s default index selection Operations

Index in MongoDB Types Single Field Indexes – db.users.ensureIndex( { score: 1 } ) Single Field Indexes Compound Field Indexes Multikey Indexes

Index in MongoDB Types Compound Field Indexes – db.users.ensureIndex( { userid:1, score: -1 } ) Single Field Indexes Compound Field Indexes Multikey Indexes

Index in MongoDB Types Multikey Indexes – db.users.ensureIndex( { addr.zip:1} ) Single Field Indexes Compound Field Indexes Multikey Indexes

Demo of indexes in MongoDB  Import Data  Create Index  Single Field Index  Compound Field Indexes  Multikey Indexes  Show Existing Index  Hint  Single Field Index  Compound Field Indexes  Multikey Indexes  Explain  Compare with data without indexes

Demo of indexes in MongoDB  Import Data  Create Index  Single Field Index  Compound Field Indexes  Multikey Indexes  Show Existing Index  Hint  Single Field Index  Compound Field Indexes  Multikey Indexes  Explain  Compare with data without indexes

Demo of indexes in MongoDB  Import Data  Create Index  Single Field Index  Compound Field Indexes  Multikey Indexes  Show Existing Index  Hint  Single Field Index  Compound Field Indexes  Multikey Indexes  Explain  Compare with data without indexes

Demo of indexes in MongoDB  Import Data  Create Index  Single Field Index  Compound Field Indexes  Multikey Indexes  Show Existing Index  Hint  Single Field Index  Compound Field Indexes  Multikey Indexes  Explain  Compare with data without indexes

Demo of indexes in MongoDB  Import Data  Create Index  Single Field Index  Compound Field Indexes  Multikey Indexes  Show Existing Index  Hint  Single Field Index  Compound Field Indexes  Multikey Indexes  Explain  Compare with data without indexes

Demo of indexes in MongoDB  Import Data  Create Index  Single Field Index  Compound Field Indexes  Multikey Indexes  Show Existing Index  Hint  Single Field Index  Compound Field Indexes  Multikey Indexes  Explain  Compare with data without indexes

Demo of indexes in MongoDB  Import Data  Create Index  Single Field Index  Compound Field Indexes  Multikey Indexes  Show Existing Index  Hint  Single Field Index  Compound Field Indexes  Multikey Indexes  Explain  Compare with data without indexes

Demo of indexes in MongoDB  Import Data  Create Index  Single Field Index  Compound Field Indexes  Multikey Indexes  Show Existing Index  Hint  Single Field Index  Compound Field Indexes  Multikey Indexes  Explain  Compare with data without indexes Without Index With Index

Aggregation  Operations that process data records and return computed results.  MongoDB provides aggregation operations  Running data aggregation on the mongod instance simplifies application code and limits resource requirements.mongod

Pipelines  Modeled on the concept of data processing pipelines.  Provides:  filters that operate like queries  document transformations that modify the form of the output document.  Provides tools for:  grouping and sorting by field  aggregating the contents of arrays, including arrays of documents  Can use operators for tasks such as calculating the average or concatenating a string.operators

Pipelines  $limit  $skip  $sort

Map-Reduce  Has two phases:  A map stage that processes each document and emits one or more objects for each input document  A reduce phase that combines the output of the map operation.  An optional finalize stage for final modifications to the result  Uses Custom JavaScript functions  Provides greater flexibility but is less efficient and more complex than the aggregation pipeline  Can have output sets that exceed the 16 megabyte output limitation of the aggregation pipeline.

Single Purpose Aggregation Operations  Special purpose database commands:  returning a count of matching documents  returning the distinct values for a field  grouping data based on the values of a field.  Aggregate documents from a single collection.  Lack the flexibility and capabilities of the aggregation pipeline and map-reduce.

Replication & Sharding Image source:

Replication  What is replication?  Purpose of replication/redundancy  Fault tolerance  Availability  Increase read capacity

Replication in MongoDB  Replica Set Members  Primary  Read, Write operations  Secondary  Asynchronous Replication  Can be primary  Arbiter  Voting  Can’t be primary  Delayed Secondary  Can’t be primary

Replication in MongoDB  Automatic Failover  Heartbeats  Elections  The Standard Replica Set Deployment  Deploy an Odd Number of Members  Rollback  Security  SSL/TLS

Demo for Replication

Sharding  What is sharding?  Purpose of sharding  Horizontal scaling out  Query Routers  mongos  Shard keys  Range based sharding  Cardinality  Avoid hotspotting

Demo for Sharding

Thanks