Published by Nigel Carroll. Modified over 9 years ago.
1
NoSQL continued
CMSC 461
Michael Wilson
2
MongoDB
MongoDB is another NoSQL solution
Provides a bit more structure than a solution like Accumulo
Data is stored as BSON (Binary JSON)
  Binary-encoded JSON that extends JSON
Allows storage of large amounts of data
3
SQL vs. MongoDB
SQL has databases, tables, rows, columns
MongoDB has databases, collections, documents, fields
Both have primary keys and indexes
Collection structure is not strictly enforced: documents in the same collection may have different fields
Inserts implicitly create the collection; no schema has to be declared first
4
Interacting with MongoDB
Multiple databases within MongoDB
Switch databases
  use newDb
  A new database is only stored once something is inserted into it
Create collection
  db.createCollection("collectionName")
  Not necessary, collections are implicitly created on insert
5
BSON
MongoDB uses BSON very heavily
Binary JSON
  Like JSON, with a binary serialization format
  Has extensions so that it can represent data types that JSON cannot
Used to represent documents and to provide input to queries
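One concrete case of a type that JSON cannot represent is a date. The sketch below uses only plain JavaScript (no MongoDB driver) to show a Date turning into a string on a JSON round trip. BSON defines a dedicated datetime type, along with others such as binary data and 64-bit integers, so that type information is not lost. The field names are made up for illustration.

```javascript
// Plain JavaScript only; no MongoDB involved. Field names are hypothetical.
const original = { name: "sensor-1", readAt: new Date(0) };

// JSON has no date type, so the Date is serialized as an ISO 8601 string.
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.readAt instanceof Date); // true
console.log(typeof roundTripped.readAt);      // "string"
console.log(roundTripped.readAt);             // "1970-01-01T00:00:00.000Z"
```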
6
Selects/queries
In MongoDB, querying typically consists of providing an appropriately crafted BSON document
SELECT * FROM collectionName
  db.collectionName.find()
SELECT * FROM collectionName WHERE field = value
  db.collectionName.find( {field: value} )
SELECT * FROM collectionName WHERE field > 5
  db.collectionName.find( {field: {$gt: 5} } )
Other functions that take a query argument expect queries formatted the same way
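To make the shape of a query document concrete, here is a minimal in-memory sketch of how such a document could be evaluated against a list of documents. It is not MongoDB's implementation: `matches` and `find` are hypothetical helpers, the collection is a plain array, and only `$gt`, `$lt` and `$eq` are handled.

```javascript
// Hypothetical helpers for illustration; this is NOT how MongoDB is implemented.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { field: { $gt: 5 } }
      return Object.entries(condition).every(([op, operand]) => {
        switch (op) {
          case "$gt": return value > operand;
          case "$lt": return value < operand;
          case "$eq": return value === operand;
          default: throw new Error(`unsupported operator: ${op}`);
        }
      });
    }
    // Plain form, e.g. { field: 5 }, means equality.
    return value === condition;
  });
}

// An empty query matches every document, like find() with no arguments.
function find(collection, query = {}) {
  return collection.filter((doc) => matches(doc, query));
}

const collection = [{ field: 3 }, { field: 5 }, { field: 8 }];

console.log(find(collection).length);                 // 3
console.log(find(collection, { field: 5 }));          // [ { field: 5 } ]
console.log(find(collection, { field: { $gt: 5 } })); // [ { field: 8 } ]
```

The point of the sketch is that the query is itself data: a document whose keys are field names and whose values are either literals or operator documents.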
7
Interacting with MongoDB
Insert
  db.collectionName.insert( {documentBSON} )
Update
  db.collectionName.update( {queryBSON}, {updateBSON}, {optionBSON} )
  updateBSON
    Set field to 5: {$set: {field: 5}}
    Increment field by 1: {$inc: {field: 1}}
  optionBSON
    Options that determine whether to create a new document when nothing matches (upsert), whether to update more than one document (multi), and the write concern
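The effect of the two update operators can be sketched on a plain object. `applyUpdate` below is a hypothetical helper, not part of MongoDB; it assumes only `$set` and `$inc` appear in the update document and returns a modified copy instead of writing to a database.

```javascript
// Hypothetical helper for illustration; only $set and $inc are supported.
function applyUpdate(doc, update) {
  const result = { ...doc }; // shallow copy, the input is left untouched
  for (const [op, changes] of Object.entries(update)) {
    for (const [field, operand] of Object.entries(changes)) {
      if (op === "$set") {
        result[field] = operand;
      } else if (op === "$inc") {
        // A missing field is treated as 0 before incrementing.
        result[field] = (result[field] ?? 0) + operand;
      } else {
        throw new Error(`unsupported operator: ${op}`);
      }
    }
  }
  return result;
}

console.log(applyUpdate({ name: "a", field: 1 }, { $set: { field: 5 } })); // { name: 'a', field: 5 }
console.log(applyUpdate({ field: 5 }, { $inc: { field: 1 } }));            // { field: 6 }
```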
8
Interacting with MongoDB
Delete
  db.collectionName.remove( {queryBSON} )
9
Apache Hive
Also runs on Hadoop, uses HDFS as a data store
Queryable like SQL, using an SQL-inspired language, HiveQL
10
Hive data organization
Databases
Tables
Partitions
  Tables are broken down into partitions
  Partition keys allow data to be stored in separate directories of data files on HDFS
  Queries can be restricted to particular partitions
Buckets
  Can bucket by column, for example to sample data
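A rough sketch of the two ideas: partition key values become `key=value` directory names under the table's directory, and a bucket is chosen by hashing a column value modulo the number of buckets. The table path, the column names, and the use of a non-negative integer id as its own hash are assumptions made for illustration; Hive's actual hash function depends on the column type.

```javascript
// Illustrative only: paths and column names are hypothetical.
function partitionPath(tableDir, partition) {
  // Each partition key becomes one "key=value" directory level.
  const levels = Object.entries(partition).map(([key, value]) => `${key}=${value}`);
  return [tableDir, ...levels].join("/");
}

// Assumes a non-negative integer column value that serves as its own hash.
function bucketFor(columnValue, numBuckets) {
  return columnValue % numBuckets;
}

console.log(partitionPath("/warehouse/page_views", { dt: "2015-01-01", country: "US" }));
// /warehouse/page_views/dt=2015-01-01/country=US

console.log(bucketFor(42, 4)); // 2
```

A query that filters on `dt` only needs to read the matching directories, which is where the time savings from partitioning come from.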
11
Purpose of Hive
Provide analytics, query large volumes of data
NOT to be used for real-time queries, the way Postgres or Oracle are
Hive queries have high latency compared with a traditional relational database
Partitions and buckets can help reduce query time
12
Hive queries
Hive queries actually generate MapReduce jobs
MapReduce jobs take a while to set up and run
MapReduce jobs can be written and run manually, but for structured data and analytics Hive can be used instead
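As a conceptual illustration of what such a job does, the sketch below runs the map, shuffle and reduce phases in a single process for a query along the lines of SELECT country, COUNT(*) FROM page_views GROUP BY country. The table, column and data are hypothetical, and nothing here touches Hadoop; a real job distributes these phases across a cluster, which is where the setup cost comes from.

```javascript
// In-process simulation only; rows and column names are hypothetical.
const rows = [{ country: "US" }, { country: "DE" }, { country: "US" }];

// Map phase: emit one (key, 1) pair per input row.
const mapped = rows.map((row) => [row.country, 1]);

// Shuffle phase: group all emitted values by key.
const grouped = new Map();
for (const [key, value] of mapped) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

// Reduce phase: sum the values for each key.
const counts = {};
for (const [key, values] of grouped) {
  counts[key] = values.reduce((sum, v) => sum + v, 0);
}

console.log(counts); // { US: 2, DE: 1 }
```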