
MapReduce vs. Parallel DBMS Hamid Safizadeh, Otelia Buffington




1 MapReduce vs. Parallel DBMS Hamid Safizadeh, Otelia Buffington
CSci 5707, Fall 2013, University of Minnesota

2 MapReduce Idea
Mapping: map (k1, v1) → list(k2, v2)
Reducing: reduce (k2, list(v2)) → list(v2)
Pseudo-code for counting the number of occurrences of each word in a large collection of documents
Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI'04
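The word-count pseudo-code can be rendered in Python roughly as follows. This is a minimal sketch, not the paper's code: the names `map_fn` and `reduce_fn` are hypothetical, and splitting on whitespace is an assumed definition of a "word". Each function follows the signatures above, with a document name and its contents as (k1, v1), a word and its counts as (k2, v2).

```python
from typing import Iterator, List, Tuple

# map(k1, v1) -> list(k2, v2)
# k1: document name, v1: document contents (hypothetical names)
def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """Emit an intermediate (word, 1) pair for every word in the document."""
    for word in contents.split():  # assumption: words are whitespace-separated
        yield (word, 1)

# reduce(k2, list(v2)) -> list(v2)
# k2: a word, list(v2): every count emitted for that word by all map calls
def reduce_fn(word: str, counts: List[int]) -> List[int]:
    """Sum the partial counts for one word."""
    return [sum(counts)]

print(list(map_fn("doc1", "the cat the")))  # [('the', 1), ('cat', 1), ('the', 1)]
print(reduce_fn("the", [1, 1]))             # [2]
```

The user supplies only these two functions; grouping the intermediate pairs by key between the two calls is the framework's job.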

3 MapReduce Example: calculation of the number of occurrences of each word
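The three stages of the example (map, grouping of intermediate pairs by key, reduce) can be simulated in memory with a short sketch. The sample documents and the function name `word_count` are hypothetical, chosen only for illustration, and the whole job runs in a single Python process rather than on a cluster.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    # Map stage: each document independently emits (word, 1) pairs.
    intermediate: List[Tuple[str, int]] = []
    for text in documents.values():
        for word in text.split():
            intermediate.append((word, 1))

    # Shuffle stage: collect all values that share the same key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, one in intermediate:
        groups[word].append(one)

    # Reduce stage: sum the grouped counts for each word, in key order.
    return {word: sum(ones) for word, ones in sorted(groups.items())}

docs = {
    "doc1": "deer bear river",
    "doc2": "car car river",
    "doc3": "deer car bear",
}
print(word_count(docs))  # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```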

4 MapReduce Architecture
Execution overview
Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI'04
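The execution overview (a master that splits the input into M pieces, M map tasks that partition their output into R regions, and R reduce tasks that each read one region from every map task) can be mimicked in a single process. This sketch is an illustrative assumption, not Google's implementation: there are no real workers, files or network, the helper names are hypothetical, and `zlib.crc32` stands in for the hash in the `hash(key) mod R` partitioning function so that runs are deterministic.

```python
import zlib
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple

Pair = Tuple[str, int]

def partition(key: str, r: int) -> int:
    # hash(key) mod R; crc32 stands in for the hash so runs are deterministic.
    return zlib.crc32(key.encode("utf-8")) % r

def map_task(split: List[str], r: int) -> List[List[Pair]]:
    # A map worker applies the map function (word count here) to its split
    # and buffers the intermediate pairs into R regions, one per reducer.
    regions: List[List[Pair]] = [[] for _ in range(r)]
    for line in split:
        for word in line.split():
            regions[partition(word, r)].append((word, 1))
    return regions

def reduce_task(regions: List[List[Pair]]) -> Dict[str, int]:
    # A reduce worker gathers its region from every map task, sorts by key,
    # and applies the reduce function (a sum) once per distinct key.
    pairs = sorted(pair for region in regions for pair in region)
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=itemgetter(0))}

def run_job(lines: List[str], m: int, r: int) -> Dict[str, int]:
    # The "master": cut the input into at most M contiguous splits, run the
    # map tasks, then R reduce tasks, task j reading region j of every map output.
    size = max(1, -(-len(lines) // m))
    splits = [lines[i:i + size] for i in range(0, len(lines), size)]
    map_outputs = [map_task(split, r) for split in splits]
    result: Dict[str, int] = {}
    for j in range(r):
        result.update(reduce_task([out[j] for out in map_outputs]))
    return result

print(sorted(run_job(["a b a", "c b", "a"], m=2, r=3).items()))
# [('a', 3), ('b', 2), ('c', 1)]
```

Because every occurrence of a key lands in the same region, each reduce task produces a disjoint set of keys and the R outputs can be combined without conflicts.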

5 MapReduce or Parallel DBMS
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., and Stonebraker, M., "A comparison of approaches to large-scale data analysis", ACM SIGMOD International Conference on Management of Data, 2009
Dean, J., and Ghemawat, S., "MapReduce: A flexible data processing tool", Communications of the ACM, Vol. 53, 2010

6 MapReduce Design Properties
Heterogeneous systems: processing and combining data from a wide variety of storage systems (such as relational databases, file systems, etc.)
Fault tolerance: fine-grained fault tolerance for large jobs (a failure in the middle of a multi-hour execution does not require restarting the job from scratch)
Complex functions: simple Map and Reduce functions often have straightforward SQL equivalents, but MapReduce offers a better framework for some complicated tasks
Jeffrey Dean and Sanjay Ghemawat, MapReduce: A Flexible Data Processing Tool, Communications of the ACM, Vol. 53, 2010
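Fine-grained fault tolerance means the master keeps the results of tasks that finished and re-executes only the task whose worker failed. The sketch below is a toy model under stated assumptions: worker crashes are simulated by a hypothetical `WorkerFailure` exception, the retry limit `max_attempts` is an illustrative parameter, and details of the real system (heartbeats, and re-running completed map tasks whose output sat on a failed machine's local disk) are omitted.

```python
from typing import Callable, Dict, List, Tuple

class WorkerFailure(Exception):
    """Simulated crash of the worker that was running a task."""

def run_with_reexecution(
    tasks: Dict[int, Callable[[], int]], max_attempts: int = 3
) -> Tuple[Dict[int, int], Dict[int, int]]:
    # Toy "master": results of finished tasks are kept, and only a task
    # whose worker failed is put back on the queue and re-executed.
    results: Dict[int, int] = {}
    attempts: Dict[int, int] = {tid: 0 for tid in tasks}
    pending: List[int] = list(tasks)
    while pending:
        tid = pending.pop(0)
        attempts[tid] += 1
        try:
            results[tid] = tasks[tid]()
        except WorkerFailure:
            if attempts[tid] >= max_attempts:
                raise RuntimeError(f"task {tid} failed {max_attempts} times")
            pending.append(tid)  # retry this task only, not the whole job
    return results, attempts

def make_flaky(value: int, failures: int) -> Callable[[], int]:
    # Hypothetical task that crashes `failures` times before succeeding.
    remaining = [failures]
    def task() -> int:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise WorkerFailure()
        return value
    return task

tasks = {0: lambda: 10, 1: make_flaky(20, failures=1), 2: lambda: 30}
results, attempts = run_with_reexecution(tasks)
print(sorted(results.items()))  # [(0, 10), (1, 20), (2, 30)]
print(attempts)                 # {0: 1, 1: 2, 2: 1}
```

Tasks 0 and 2 run once; only task 1 is attempted a second time, which is the property that lets a multi-hour job survive individual machine failures.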

7 MapReduce Design Properties
Performance
Loading data: startup overhead for MapReduce
Reading data: full scan over large data files
Merging results: the next consumer of the output is often another MapReduce job
Cost
Hardware: networked workstations
Software: open source (Hadoop)
Communication: network system
Jeffrey Dean and Sanjay Ghemawat, MapReduce: A Flexible Data Processing Tool, Communications of the ACM, Vol. 53, 2010
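When the next consumer is another MapReduce job, the output of the first job can be fed straight into the second without a separate merge step. The sketch below chains two jobs in memory: a word count, then a job that groups words by frequency. The helper `run_mapreduce`, the job functions and the sample documents are all hypothetical illustrations, not part of Hadoop or of the cited paper.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Tuple

def run_mapreduce(records: Iterable[Tuple[Any, Any]],
                  mapper: Callable, reducer: Callable) -> List[Tuple[Any, Any]]:
    # Minimal in-memory job: map every (k1, v1) record, group the
    # intermediate values by k2, then reduce each group in key order.
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    return [(k2, out) for k2 in sorted(groups) for out in reducer(k2, groups[k2])]

# Job 1: word count over (document name, text) records.
def wc_map(doc, text):
    return [(word, 1) for word in text.split()]

def wc_reduce(word, counts):
    return [sum(counts)]

# Job 2: consumes Job 1's (word, count) output directly, with no merge
# step in between, and groups the words by how often they occur.
def freq_map(word, count):
    return [(count, word)]

def freq_reduce(count, words):
    return [sorted(words)]

docs = [("d1", "to be or not to be"), ("d2", "to do")]
counts = run_mapreduce(docs, wc_map, wc_reduce)
by_freq = run_mapreduce(counts, freq_map, freq_reduce)
print(counts)   # [('be', 2), ('do', 1), ('not', 1), ('or', 1), ('to', 3)]
print(by_freq)  # [(1, ['do', 'not', 'or']), (2, ['be']), (3, ['to'])]
```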

8 Companies Using Hadoop
Facebook, Yahoo!, Google, Amazon, Twitter

