Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 3 CS492 Special Topics in Computer Science Distributed Algorithms and Systems.

Similar presentations


Presentation on theme: "Lecture 3 CS492 Special Topics in Computer Science Distributed Algorithms and Systems."— Presentation transcript:

1 Lecture 3 CS492 Special Topics in Computer Science Distributed Algorithms and Systems

2 MapReduce  Simplified data processing on large clusters A programming model and an associated implementation for processing and generating large data sets  Inspired by “map” and “reduce” primitives present in Lisp and many other functional languages  Simple and powerful interface that enables automatic paralization and distribution of large-scale computations 2 Fall 2008 CS492

3 Map and Reduce  Map map  list (k2, v2) example  k1 = document, v1 = contents  k2 = words, v2 = 1  Reduce reduce (k2, list(v2))  list (v2)  k2 = words, v2 = 1  list (v2) = total count of v2 (or k2) 3 Fall 2008 CS492

4 MapReduce Execution Overview 4 Fall 2008 CS492

5 Examples  Distributed grep  Count of URL access frequency  Reverse web-link graph  Term-vector per host 5 Fall 2008 CS492

6  Pig platform for analyzing large data sets that consists of a hig h-level language for expressing data analysis programs, cou pled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is a menable to substantial parallelization, which in turns enabl es them to handle very large data sets  Dryad Distributed data-parallel programs from sequential building blocks (EuroSys 2007) 6 Fall 2008 CS492


Download ppt "Lecture 3 CS492 Special Topics in Computer Science Distributed Algorithms and Systems."

Similar presentations


Ads by Google