NoSQL continued CMSC 461 Michael Wilson

MongoDB
- MongoDB is another NoSQL solution
  - Provides a bit more structure than a solution like Accumulo
- Data is stored as BSON (Binary JSON)
  - Binary-encoded JSON that extends JSON
  - Allows storage of large amounts of data

SQL vs. MongoDB
- SQL has databases, tables, rows, and columns
- MongoDB has databases, collections, documents, and fields
- Both have primary keys and indexes
- Collection structure is not strictly enforced: documents in the same collection can have different fields
- Inserts implicitly create the structure; no schema has to be declared in advance

Interacting with MongoDB
- Multiple databases within MongoDB
- Switch databases: use newDb
  - A new database is only stored after the first insert into it
- Create a collection: db.createCollection("collectionName")
  - Not necessary; collections are implicitly created on insert
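The lazy creation described on this slide can be illustrated with a small in-memory toy model in JavaScript, the language the mongo shell uses. This is a sketch under stated assumptions, not MongoDB's API: the class `ToyServer` and its methods are hypothetical, and the default database name `"test"` mirrors the shell's usual default.

```javascript
// Toy in-memory model (NOT MongoDB's API): databases and collections
// come into existence only when the first document is inserted.
class ToyServer {
  constructor() {
    this.dbs = new Map(); // dbName -> Map(collectionName -> array of documents)
    this.current = "test"; // assumed default database name
  }

  use(dbName) {
    // Like `use newDb`: only switches context; nothing is stored yet.
    this.current = dbName;
  }

  insert(collectionName, doc) {
    // The first insert implicitly creates the database and the collection.
    if (!this.dbs.has(this.current)) this.dbs.set(this.current, new Map());
    const db = this.dbs.get(this.current);
    if (!db.has(collectionName)) db.set(collectionName, []);
    db.get(collectionName).push(doc);
  }

  showDbs() {
    return [...this.dbs.keys()];
  }
}

const server = new ToyServer();
server.use("newDb");
console.log(server.showDbs()); // []  (switching alone creates nothing)
server.insert("people", { name: "Ada" });
console.log(server.showDbs()); // [ 'newDb' ]
```

In the real shell, `db.createCollection("collectionName")` exists mainly to create a collection up front with explicit options; the toy model skips that path because the slide's point is that inserting is enough.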

BSON
- MongoDB uses BSON very heavily
- Binary JSON: like JSON, but with a binary serialization format
- Has extensions so that it can represent data types that JSON cannot (for example, dates and raw binary data)
- Used to represent documents and to provide input to queries
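To show what "binary serialization" means concretely, here is a minimal encoder sketch that follows the layout in the BSON specification for just two element types: UTF-8 strings (type byte 0x02) and 32-bit integers (type byte 0x10). The function name `encodeBson` is hypothetical; real drivers support many more types and edge cases.

```javascript
// Minimal BSON encoder sketch: supports only string and 32-bit integer values.
// Layout: int32 total length, then elements, then a trailing 0x00 byte.
function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = Buffer.from(key + "\0", "utf8"); // element name is a NUL-terminated cstring
    if (typeof value === "string") {
      const str = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(str.length); // string length counts the trailing NUL
      parts.push(Buffer.from([0x02]), name, len, str);
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value); // little-endian int32
      parts.push(Buffer.from([0x10]), name, num);
    } else {
      throw new TypeError(`unsupported value for key ${key}`);
    }
  }
  const body = Buffer.concat(parts);
  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + body.length + 1); // length prefix + elements + final NUL
  return Buffer.concat([total, body, Buffer.from([0x00])]);
}

console.log(encodeBson({ hello: "world" }).toString("hex"));
// 160000000268656c6c6f0006000000776f726c640000
```

The leading `16000000` is the total size (22 bytes) in little-endian order; length prefixes like this let a reader skip over a document or field without parsing it, which JSON text does not allow.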

Selects/queries
- In MongoDB, querying typically consists of providing an appropriately crafted BSON document
- SELECT * FROM collectionName
  db.collectionName.find()
- SELECT * FROM collectionName WHERE field = value
  db.collectionName.find( {field: value} )
- SELECT * FROM collectionName WHERE field > 5
  db.collectionName.find( {field: {$gt: 5}} )
- Other functions that take a query argument expect queries formatted the same way
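The mapping from SQL to `find()` can be made concrete with a toy matcher that evaluates query documents against an in-memory array. This is a simplified sketch, not MongoDB's query engine: `matches` and `find` are hypothetical helpers, and only plain equality and `$gt` are supported (real MongoDB also handles type ordering, nested fields, arrays, and many more operators).

```javascript
// Toy query matcher (not MongoDB's implementation): supports equality
// and the $gt operator, mirroring the find() examples on the slide.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return value !== undefined && value > cond.$gt;
    }
    return value === cond; // a plain value means equality
  });
}

function find(collection, query = {}) {
  // An empty query matches every document, like SELECT * with no WHERE.
  return collection.filter((doc) => matches(doc, query));
}

const collectionName = [
  { _id: 1, field: 3 },
  { _id: 2, field: 7 },
  { _id: 3, field: 9, other: "x" },
];
console.log(find(collectionName).length); // 3
console.log(find(collectionName, { field: 7 }).map((d) => d._id)); // [ 2 ]
console.log(find(collectionName, { field: { $gt: 5 } }).map((d) => d._id)); // [ 2, 3 ]
```

Because a query is just a document, the same `matches` logic could be reused by any operation that takes a query argument, which is the point of the slide's last bullet.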

Interacting with MongoDB
- Insert: db.collectionName.insert( {documentBSON} )
- Update: db.collectionName.update( {queryBSON}, {updateBSON}, {optionBSON} )
  - updateBSON
    - Set field to 5: {$set: {field: 5}}
    - Increment field by 1: {$inc: {field: 1}}
  - optionBSON
    - Options that determine whether to create a new document when nothing matches (upsert), whether to update more than one document (multi), and the write concern
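The semantics of `$set`, `$inc`, and the `upsert`/`multi` options can be sketched with another in-memory toy model. The helpers `applyUpdate` and `update` are hypothetical and deliberately simplified (equality-only queries, no write concern); they illustrate the behavior described on the slide rather than reproduce MongoDB's implementation.

```javascript
// Toy model of update( query, update, options ): supports $set and $inc,
// plus simplified versions of the upsert and multi options.
function matches(doc, query) {
  return Object.entries(query).every(([k, v]) => doc[k] === v); // equality only
}

function applyUpdate(doc, upd) {
  for (const [field, value] of Object.entries(upd.$set ?? {})) {
    doc[field] = value; // $set creates the field if it is missing
  }
  for (const [field, amount] of Object.entries(upd.$inc ?? {})) {
    doc[field] = (doc[field] ?? 0) + amount; // a missing field starts at 0
  }
}

function update(collection, query, upd, options = {}) {
  const hits = collection.filter((doc) => matches(doc, query));
  // Without multi, only the first matching document is modified.
  const targets = options.multi ? hits : hits.slice(0, 1);
  targets.forEach((doc) => applyUpdate(doc, upd));
  if (hits.length === 0 && options.upsert) {
    // Upsert: build a new document from the query's equality fields.
    const doc = { ...query };
    applyUpdate(doc, upd);
    collection.push(doc);
    return 1;
  }
  return targets.length; // number of documents modified
}

const people = [
  { name: "Ada", visits: 1 },
  { name: "Bob", visits: 4 },
];
update(people, { name: "Ada" }, { $inc: { visits: 1 } });
update(people, { name: "Cy" }, { $set: { visits: 5 } }, { upsert: true });
console.log(people[0].visits); // 2
console.log(people.length); // 3
```

Defaulting `multi` to false matches the legacy shell's `update()` behavior of modifying a single document unless told otherwise.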

Interacting with MongoDB
- Delete: db.collectionName.remove( {queryBSON} )
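Removal uses the same query-document format: every document matching the query is deleted. A minimal in-memory sketch follows; the `remove` helper is hypothetical and matches on equality only.

```javascript
// Toy model of remove( query ): deletes every matching document in place.
// Equality-only matching; hypothetical helper, not MongoDB's implementation.
function remove(collection, query) {
  const matches = (doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value);
  let removed = 0;
  for (let i = collection.length - 1; i >= 0; i--) {
    // Iterate backwards so splicing does not skip elements.
    if (matches(collection[i])) {
      collection.splice(i, 1);
      removed++;
    }
  }
  return removed;
}

const tasks = [
  { id: 1, done: true },
  { id: 2, done: false },
  { id: 3, done: true },
];
console.log(remove(tasks, { done: true })); // 2
console.log(tasks.map((t) => t.id)); // [ 2 ]
```

Note that an empty query `{}` matches everything, so in this sketch (as with `remove({})` in the shell) it deletes every document in the collection.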

Apache Hive
- Also runs on Hadoop and uses HDFS as its data store
- Queryable like SQL, using an SQL-inspired language called HiveQL

Hive data organization
- Databases
- Tables
- Partitions
  - Tables are broken down into partitions
  - Partition keys allow data to be stored in separate directories, and therefore separate data files, on HDFS
  - Queries can be restricted to particular partitions
- Buckets
  - Data can be bucketed by column, which makes it possible to sample the data
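Partitioning and bucketing can be sketched as two routing functions: one maps a row to a directory named after its partition value (Hive writes partitions as `key=value` subdirectories), and one assigns a row to a bucket by hashing a column and taking the result modulo the bucket count. Everything here is a simplified toy model under stated assumptions: the helper names are hypothetical, the warehouse root `/user/hive/warehouse` is a commonly used default that is configurable, and the string hash is a Java-style hash chosen for illustration, not necessarily Hive's exact bucketing function.

```javascript
// Toy model of Hive-style partitioning and bucketing (not Hive's code).
const ROOT = "/user/hive/warehouse"; // assumed warehouse root (configurable in Hive)

function partitionPath(table, partitionKey, row) {
  // Each partition value gets its own directory, written as key=value.
  return `${ROOT}/${table}/${partitionKey}=${row[partitionKey]}`;
}

function writeRows(table, partitionKey, rows) {
  const files = new Map(); // directory -> rows stored there
  for (const row of rows) {
    const dir = partitionPath(table, partitionKey, row);
    if (!files.has(dir)) files.set(dir, []);
    files.get(dir).push(row);
  }
  return files;
}

function scanPartition(files, table, partitionKey, value) {
  // Partition pruning: a query on one partition value reads only that directory.
  return files.get(`${ROOT}/${table}/${partitionKey}=${value}`) ?? [];
}

function bucketOf(value, numBuckets) {
  // Simplified bucketing: 32-bit Java-style string hash, made non-negative, mod numBuckets.
  let h = 0;
  for (const ch of String(value)) {
    h = (h * 31 + ch.codePointAt(0)) | 0;
  }
  return (h & 0x7fffffff) % numBuckets;
}

const rows = [
  { dt: "2015-01-01", user: "a" },
  { dt: "2015-01-02", user: "b" },
  { dt: "2015-01-01", user: "c" },
];
const files = writeRows("logs", "dt", rows);
console.log(files.size); // 2
console.log(scanPartition(files, "logs", "dt", "2015-01-01").length); // 2
console.log(bucketOf("a", 4)); // 1
```

Because every value of the bucketing column lands in a predictable bucket, reading one bucket out of N gives a roughly 1/N sample of the table without scanning the rest, and pruning skips whole partition directories in the same spirit.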

Purpose of Hive
- Provide analytics: query large volumes of data
- NOT intended for real-time queries the way Postgres or Oracle are
- Hive queries take a long time to run
- Partitions and buckets can help reduce this time

Hive queries
- Hive queries actually generate MapReduce jobs
- MapReduce jobs take a while to set up and run
- MapReduce jobs can be written by hand, but for structured data and analytics, Hive can be used instead
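To make "Hive queries generate MapReduce jobs" more tangible, the sketch below expresses a simple aggregation, roughly `SELECT dept, COUNT(*) FROM employees GROUP BY dept`, as map, shuffle, and reduce phases in a single process. The table, column, and function names are hypothetical, and Hive's real query plans are considerably more involved; this only shows the shape of the computation a GROUP BY turns into.

```javascript
// Toy, single-process sketch of a GROUP BY/COUNT query as MapReduce phases.
// Roughly: SELECT dept, COUNT(*) FROM employees GROUP BY dept  (hypothetical table)

function mapPhase(rows) {
  // Emit (key, 1) for every row; the key is the GROUP BY column.
  return rows.map((row) => [row.dept, 1]);
}

function shufflePhase(pairs) {
  // Group emitted values by key, as the framework does between map and reduce.
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

function reducePhase(groups) {
  // Sum the 1s for each key, producing one output row per group.
  const out = {};
  for (const [key, values] of groups) {
    out[key] = values.reduce((a, b) => a + b, 0);
  }
  return out;
}

const employees = [
  { name: "a", dept: "eng" },
  { name: "b", dept: "ops" },
  { name: "c", dept: "eng" },
];
console.log(reducePhase(shufflePhase(mapPhase(employees)))); // { eng: 2, ops: 1 }
```

On a real cluster each phase runs as distributed tasks that read from and write to HDFS, and scheduling those tasks is a large part of why even small Hive queries take a while to start returning results.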
