MapReduce and NoSQL CMSC 461 Michael Wilson

Big data
- The term "big data" has become fairly popular of late
- There is a need to store vast quantities of data and retrieve it in a short amount of time
  - Images, movies, etc.
  - Large files

MapReduce
- http://research.google.com/archive/mapreduce.html
- Concept pioneered by Google
- Performing operations on large volumes of data
- Map function
- Reduce function

Map function
- Receives a set of key-value pairs as input
- Performs some user-defined operation on them
- Produces a set of new (intermediate) key-value pairs
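The map step described above can be sketched with the classic word-count example. This is a minimal illustration in plain Python, not Hadoop's API; the function name map_word_count and the sample input are assumptions made for the example.

```python
from typing import Iterator, Tuple


def map_word_count(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit one (word, 1) pair per word in a line of text.

    key is a hypothetical document name (unused here); value is the line itself.
    """
    for word in value.lower().split():
        yield (word, 1)


# One input pair in, several intermediate pairs out.
pairs = list(map_word_count("doc1", "big data needs big clusters"))
print(pairs)
```

Note that the same key ("big") is emitted twice. The map step does no merging; that is left to the reduce step.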

Reduce function
- Receives the intermediate key-value pairs
- Can have multiple values for the same key
- Merges the values for each key together in some way
- Produces a merged output
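A reduce step for the same word-count idea might look like the sketch below. The grouping helper stands in for the work a MapReduce framework does between the two phases; the names group_by_key and reduce_sum, and the sample pairs, are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect all values that share a key (done by the framework in practice)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_sum(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: merge the values for one key by summing them."""
    return (key, sum(values))


# Intermediate pairs as a map step might have produced them.
intermediate = [("big", 1), ("data", 1), ("big", 1), ("clusters", 1)]
grouped = group_by_key(intermediate)
result = dict(reduce_sum(k, v) for k, v in grouped.items())
print(result)
```

Summing is only one way to merge; any function that combines a key's list of values into an output fits the same shape.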

When to use MapReduce
- MapReduce doesn’t work for all problems
- Problems have to be parallelizable
- In other words, an algorithm whose steps depend on state carried over from earlier steps is not necessarily a good candidate for MapReduce
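To make the distinction concrete, here is a small sketch contrasting a parallelizable computation with a stateful one. Threads stand in for cluster workers, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(chunk):
    """Work on one chunk; needs nothing from any other chunk."""
    return sum(x * x for x in chunk)


data = list(range(1, 101))
chunks = chunked(data, 25)

# Parallelizable: each chunk is processed independently, then results are merged.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum_of_squares, chunks))
total = sum(partials)


def iterate(x, steps):
    """Stateful: every step needs the previous step's result, so the
    steps cannot simply be handed out to different workers."""
    for _ in range(steps):
        x = (x * x + 1) % 1000
    return x
```

The sum of squares gives the same answer however the data is split, which is what makes it a reasonable fit. The iteration has no such split, because step n cannot start before step n-1 has finished.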

Commodity hardware
- MapReduce clusters are built from commodity hardware
- x86 processors, several gigabytes of RAM
- These days, adding more computers is cheap
- Rather than beef up the machines, just use more of them

Hadoop
- Hadoop is a Java-based MapReduce implementation
- Very popular
- Has a secondary component, HDFS
  - Hadoop Distributed File System

HDFS
- File system spread across the nodes of a Hadoop MapReduce cluster
- Large block sizes: 64 MB by default (128 MB in Hadoop 2 and later)
- Very popular base for other distributed applications
  - In particular, NoSQL applications
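As a rough illustration of what a large block size means, the sketch below computes how many blocks a file would occupy using the 64 MB figure from the slide. It is plain arithmetic, not a call into HDFS, and the function name block_count is an assumption.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # block size quoted on the slide


def block_count(file_size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold a file (ceiling division)."""
    return (file_size_bytes + block_size - 1) // block_size


one_gb_blocks = block_count(1024 * MB)   # a 1 GB file
small_file_blocks = block_count(200 * MB)  # a 200 MB file
last_block_mb = 200 - 64 * (small_file_blocks - 1)  # data in the final block
print(one_gb_blocks, small_file_blocks, last_block_mb)
```

With blocks this large, even a 1 GB file is only a handful of pieces to track, which suits the large files mentioned earlier.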

NoSQL
- NoSQL is a somewhat nebulous term
- Basically means “not SQL,” or “something other than SQL”
- Many different approaches
- Key-value stores are a big part of the NoSQL movement
  - We focus on them here

Key-Value?!
- This almost seems like a step backward
- Key-value stores are far less structured
- Can’t establish relations between entities in a key-value store
- Can’t constrain data very well
- Why is reducing the structure gaining popularity?

Distributable nature
- Many key-value stores can be distributed amongst many nodes
- By distributing the data across these nodes, searches and operations on vast swaths of data can be performed in a sensible amount of time
- Not all, however
  - Some are single-server applications that keep their data in RAM
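One simple way a key-value store could spread its data over nodes is to hash each key and use the hash to pick a node. The sketch below is a toy in-memory version under that assumption; the class ShardedStore, the helper node_for_key, and the node names are all hypothetical, and real stores generally use more elaborate placement schemes.

```python
import hashlib


def node_for_key(key: str, nodes: list) -> str:
    """Pick a node for a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


class ShardedStore:
    """Toy key-value store whose data is partitioned across named nodes."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # One plain dict per node stands in for that node's storage.
        self.shards = {name: {} for name in self.node_names}

    def put(self, key, value):
        self.shards[node_for_key(key, self.node_names)][key] = value

    def get(self, key, default=None):
        return self.shards[node_for_key(key, self.node_names)].get(key, default)


store = ShardedStore(["node-a", "node-b", "node-c"])
for i in range(100):
    store.put(f"user:{i}", f"name-{i}")
print({name: len(shard) for name, shard in store.shards.items()})
```

Because a key always hashes to the same node, a lookup only has to contact one node rather than search all of them. Note there is still no way to relate or constrain entries, only to fetch them by key.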

NoSQL Key-Value implementations
- HBase
- Accumulo
- Memcached
- Dynamo
- Many, many more
