Download presentation
Presentation is loading. Please wait.
Published byRosamund Nelson Modified over 9 years ago
1
Lecture 3 CS492 Special Topics in Computer Science Distributed Algorithms and Systems
2
MapReduce Simplified data processing on large clusters A programming model and an associated implementation for processing and generating large data sets Inspired by “map” and “reduce” primitives present in Lisp and many other functional languages Simple and powerful interface that enables automatic paralization and distribution of large-scale computations 2 Fall 2008 CS492
3
Map and Reduce Map map list (k2, v2) example k1 = document, v1 = contents k2 = words, v2 = 1 Reduce reduce (k2, list(v2)) list (v2) k2 = words, v2 = 1 list (v2) = total count of v2 (or k2) 3 Fall 2008 CS492
4
MapReduce Execution Overview 4 Fall 2008 CS492
5
Examples Distributed grep Count of URL access frequency Reverse web-link graph Term-vector per host 5 Fall 2008 CS492
6
Pig platform for analyzing large data sets that consists of a hig h-level language for expressing data analysis programs, cou pled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is a menable to substantial parallelization, which in turns enabl es them to handle very large data sets Dryad Distributed data-parallel programs from sequential building blocks (EuroSys 2007) 6 Fall 2008 CS492
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.