Published by Lynn Higgins. Modified over 9 years ago.
1
MapReduce and NoSQL
CMSC 461
Michael Wilson
2
Big data
The term "big data" has become fairly popular of late. There is a need to store vast quantities of data and retrieve them in a short amount of time. Typical examples are images, movies, and other large files.
3
MapReduce
http://research.google.com/archive/mapreduce.html
A concept pioneered by Google for performing operations on large volumes of data. It is built around two user-supplied pieces: a map function and a reduce function.
4
Map function
Receives a set of key-value pairs as input. Performs some user-defined operation on them. Produces a set of new key-value pairs.
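As a rough illustration of the map step, here is a minimal Python sketch using word counting. The function name `map_word_count` and the sample documents are made up for this example, and this is plain Python rather than any particular framework's API.

```python
from typing import Dict, Iterator, List, Tuple


def map_word_count(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical user-defined map: emit one (word, 1) pair per word."""
    for word in text.lower().split():
        yield (word, 1)


# Input key-value pairs: document id -> document text (made-up sample data).
documents: Dict[str, str] = {
    "doc1": "the cat saw the dog",
    "doc2": "the dog ran",
}

# Apply the map function to every input pair and collect the new pairs.
intermediate: List[Tuple[str, int]] = []
for doc_id, text in documents.items():
    intermediate.extend(map_word_count(doc_id, text))
```

Each input pair is processed without looking at any other pair, which is what lets a real cluster run many map calls at the same time.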
5
Reduce function
Receives the intermediate key-value pairs. There can be multiple values for the same key. Merges the values together in some way. Produces a merged output.
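A minimal Python sketch of the reduce step, again using word counting. The helper names `group_by_key` and `reduce_sum` and the sample pairs are assumptions for illustration; a real framework performs the grouping itself between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect every value that shares a key (done by the framework in practice)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_sum(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical user-defined reduce: merge the values for one key by summing."""
    return (key, sum(values))


# Made-up intermediate pairs, as a map step might have produced them.
intermediate = [("the", 1), ("cat", 1), ("the", 1), ("dog", 1)]

grouped = group_by_key(intermediate)
counts = dict(reduce_sum(key, values) for key, values in grouped.items())
```

Here the merge is a sum, but any operation that combines the list of values for a key into one result would fit the same shape.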
6
When to use MapReduce
MapReduce doesn't work for all problems. Problems have to be parallelizable. In other words, an algorithm in which each step depends on state left behind by earlier steps is not necessarily a good candidate for MapReduce.
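To make "parallelizable" concrete, here is a small Python sketch of a problem that splits cleanly: summing a list. Each chunk can be summed on its own and the partial results combined afterwards. The names `partial_sum` and `parallel_sum` are made up, and a thread pool stands in for the many machines of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partial_sum(chunk: List[int]) -> int:
    """Work on one chunk without needing anything from the other chunks."""
    return sum(chunk)


def parallel_sum(numbers: List[int], workers: int = 4) -> int:
    # Split the input into roughly equal, independent chunks.
    size = max(1, (len(numbers) + workers - 1) // workers)
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    # Process the chunks concurrently, then merge the partial results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


total = parallel_sum(list(range(1, 101)))
```

By contrast, a computation where every step needs the result of the previous one cannot be chunked this way, because no chunk can start until the one before it has finished.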
7
Commodity hardware
MapReduce clusters run on commodity hardware: x86 processors and several gigabytes of RAM. In this day and age, adding more computers is cheap. Rather than beef up the machines, just use more of them.
8
Hadoop
Hadoop is a Java-based MapReduce implementation. It is very popular. It has a secondary component, HDFS, the Hadoop Distributed File System.
9
HDFS
A file system spread across a Hadoop MapReduce cluster. It uses large block sizes: 64 MB by default in early Hadoop releases (128 MB in Hadoop 2 and later). It is a very popular base for other distributed applications, in particular NoSQL applications.
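The effect of a large block size is easy to see with a little arithmetic. The Python sketch below counts how many blocks a file of a given size would occupy, taking the 64 MB figure as an assumed default. The function name `blocks_needed` is made up, and replication is ignored.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 64 * MB  # assumed default; configurable in a real cluster


def blocks_needed(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Number of blocks a file occupies; the last block may be partly filled."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Integer ceiling division avoids floating-point rounding.
    return (file_size + block_size - 1) // block_size


one_gib_blocks = blocks_needed(1024 * MB)
small_file_blocks = blocks_needed(200 * MB)
```

With these assumptions a 1 GiB file splits into 16 blocks, and a 200 MB file into 4, the last of which is only partly filled.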
10
NoSQL
NoSQL is a somewhat nebulous term. It basically means “not SQL,” or “something other than SQL.” There are many different approaches. Key-value stores are a big part of the NoSQL movement, so we focus on them here.
11
Key-Value?!
This almost seems like a step backward. Key-value stores are far less structured: you can't establish relations between entities in a key-value store, and you can't constrain data very well. Why is reducing the structure gaining popularity?
12
Distributable nature
Many key-value stores can be distributed amongst many nodes. By distributing the data across these nodes, searches and operations on vast swaths of data can be performed in a sensible amount of time. Not all key-value stores are distributed, however: some are single-server applications that keep their data in RAM.
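One simple way to spread keys over nodes is to hash each key and use the hash to pick a node. The Python sketch below shows the idea with in-memory dictionaries standing in for nodes. The names `node_for_key` and `TinyCluster` and the node names are invented for illustration, and real systems typically use more elaborate schemes such as consistent hashing.

```python
import hashlib
from typing import Dict, List, Optional


def node_for_key(key: str, nodes: List[str]) -> str:
    """Pick a node from a stable hash of the key (same key, same node)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


class TinyCluster:
    """Hypothetical toy cluster: one dict per node, no replication or failover."""

    def __init__(self, node_names: List[str]) -> None:
        self.names = list(node_names)
        self.stores: Dict[str, Dict[str, str]] = {n: {} for n in self.names}

    def put(self, key: str, value: str) -> None:
        self.stores[node_for_key(key, self.names)][key] = value

    def get(self, key: str) -> Optional[str]:
        # Only the one node responsible for the key has to be consulted.
        return self.stores[node_for_key(key, self.names)].get(key)


cluster = TinyCluster(["node-a", "node-b", "node-c"])
for i in range(100):
    cluster.put(f"user:{i}", f"value-{i}")
```

A lookup such as `cluster.get("user:42")` touches a single node, which is why adding nodes lets the store hold more data without slowing down individual reads.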
13
NoSQL Key-Value implementations
HBase, Accumulo, Memcached, Dynamo, and many, many more.