MapReduce in Action Team 306 Led by Chen Lin College of Information Science and Technology.

2 Contents
1. Basic MapReduce Programs
2. Advanced MapReduce
3. Beyond the horizon
4. Discussion

3 Basic MapReduce Programs
[Figure: Job, Job Configuration, and the master JobTracker]

4 Basic MapReduce Programs
What goes into a job configuration? A Java class that implements the MapReduce interfaces, plus the environment configuration.

5 Interfaces
Mapper, Reducer, Combiner, Partitioner, InputFormat, OutputFormat

6 Environment configuration
- JVM options for the task processes: mapred.child.java.opts
- Local working directory: {mapred.local.dir}
- Input path and output path
- How many map and reduce tasks?

7 Basic MapReduce Program
Input text flows through InputFormat, Map, Reduce, and OutputFormat.
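To make that data flow concrete, here is a minimal local simulation of a word-count job in plain Python. It is not Hadoop's API: `run_job`, `word_count_map`, and `word_count_reduce` are hypothetical names, the "InputFormat" step is reduced to splitting text into lines, and the shuffle is an in-memory sort-and-group standing in for the framework.

```python
from itertools import groupby
from operator import itemgetter

def word_count_map(line):
    """Map: emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word, 1

def word_count_reduce(word, counts):
    """Reduce: sum all the counts collected for one word."""
    yield word, sum(counts)

def run_job(text, mapper, reducer):
    # "InputFormat": split the input text into line records.
    records = text.splitlines()
    # Map phase: apply the mapper to every record.
    intermediate = [kv for line in records for kv in mapper(line)]
    # Shuffle and sort: order pairs by key, then group the values of each key.
    intermediate.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    # "OutputFormat": one tab-separated key/value pair per line.
    return "\n".join(f"{key}\t{value}" for key, value in output)

print(run_job("the cat\nthe dog\n", word_count_map, word_count_reduce))
```

Only the mapper and reducer are job-specific; everything inside `run_job` plays the part the execution framework plays in Hadoop.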

8 Basic MapReduce

9 YOUR SITE HERE LOGO  Combiners an optimization in MapReduce that allow for local aggregation before the shue and sort phase  Partitioner determines which reducer will be responsible for processing a particular key, and the execution framework uses this information to copy the data to the right location during the shue and sort phase PARTITIONERS AND COMBINERS

10 InputFormat
- Built-in formats: TextInputFormat, KeyValueTextInputFormat, SequenceFileInputFormat, NLineInputFormat
- CREATING CUSTOM INPUTFORMAT

11 InputFormat
- TextInputFormat: each line in the text files is a record. The key is the byte offset of the line, and the value is the content of the line.
- KeyValueTextInputFormat: each line in the text files is a record. The first separator character divides each line: everything before the separator is the key, and everything after it is the value. The separator is set by the key.value.separator.in.input.line property, and the default is the tab (\t) character.
- NLineInputFormat: same as TextInputFormat, but each split is guaranteed to have exactly N lines (the last split may have fewer). N is set by the mapred.line.input.format.linespermap property, which defaults to one.
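The record layouts described above can be mimicked with small Python functions. These are illustrative sketches, not Hadoop classes: the function names are hypothetical, and the handling of a line with no separator (whole line as key, empty value) is this sketch's choice, modeled on what Hadoop's key/value line reader does.

```python
def text_input_format(data: bytes):
    """TextInputFormat-style records: key = byte offset of the line, value = line."""
    records, offset = [], 0
    for line in data.splitlines(keepends=True):
        records.append((offset, line.rstrip(b"\r\n").decode("utf-8")))
        offset += len(line)  # offsets count bytes, including the newline
    return records

def key_value_text_input_format(lines, separator="\t"):
    """KeyValueTextInputFormat-style records: split each line at the FIRST separator.
    In this sketch, a line without the separator becomes (whole line, "")."""
    return [tuple(line.split(separator, 1)) if separator in line else (line, "")
            for line in lines]

def nline_splits(lines, n=1):
    """NLineInputFormat-style splits: N lines per split (the last may be shorter)."""
    return [lines[i:i + n] for i in range(0, len(lines), n)]

print(text_input_format(b"ab\ncd\n"))
print(key_value_text_input_format(["k1\tv\tx", "solo"]))
print(nline_splits(["a", "b", "c"], 2))
```

Note that only the first tab splits a KeyValueTextInputFormat line, so any later tabs stay inside the value.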

12 Basic MapReduce Program
Types for the key/value pairs

13 Summary for a basic program
What's a complete MapReduce job? Code for the mapper, reducer, combiner, and partitioner, along with the job configuration parameters. The execution framework handles everything else.

14 Advanced MapReduce
- Chaining MapReduce jobs
- LOCAL AGGREGATION
- SECONDARY SORTING
- Working with Hadoop files

15 YOUR SITE HERE LOGO  You’ve been doing data processing tasks which a single MapReduce job can accomplish.  But……  As you get more comfortable writing MapReduce programs and take on more ambitious data processing tasks  you’ll find many complex tasks need to be broken down into simpler subtasks, each accomplished by an individual MapReduce job Chaining MapReduce jobs

16 YOUR SITE HERE LOGO  in Hadoop, intermediate results are written to local disk before being sent over the network.  Reductions in the amount of intermediate data translate should increase in algorithmic efficiency  use of the combiner is possible to substantially reduce both the number and size of key-value pairs that need to be shuffled from the mappers to the reducers LOCAL AGGREGATION

17 Pseudo-code for computing the mean of the values associated with the same string key.
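The slide's pseudo-code did not survive the transcript. As a hedged reconstruction, a basic mean job can use an identity mapper and a reducer that averages each key's values. The function names and the in-memory grouping in `run` are hypothetical stand-ins for the framework.

```python
from collections import defaultdict

def mean_map(key, value):
    """Identity mapper: pass each (string, integer) pair through unchanged."""
    yield key, value

def mean_reduce(key, values):
    """Reducer: average every value that arrived for this key."""
    yield key, sum(values) / len(values)

def run(pairs):
    """Group mapper output by key (the shuffle), then reduce each key."""
    groups = defaultdict(list)
    for k, v in pairs:
        for mk, mv in mean_map(k, v):
            groups[mk].append(mv)
    return dict(kv for k in sorted(groups) for kv in mean_reduce(k, groups[k]))

print(run([("a", 1), ("a", 2), ("a", 3), ("b", 10)]))
```

This version is correct but ships every single value across the network, which is what motivates adding local aggregation.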

18 LOCAL AGGREGATION: is it right?
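A tempting way to add local aggregation to the mean job is to reuse the mean reducer as the combiner. The numeric check below uses hypothetical values for a single key, split across two map tasks, and shows why that is not right: the mean of partial means is not the overall mean.

```python
def mean(values):
    return sum(values) / len(values)

# Values for one key, split across two map tasks (hypothetical data).
task1 = [1, 2, 3]
task2 = [10]

true_mean = mean(task1 + task2)  # 16 / 4
# Wrong: apply the mean reducer as a combiner on each task, then average the partial means.
mean_of_means = mean([mean(task1), mean(task2)])  # (2.0 + 10.0) / 2
print(true_mean, mean_of_means)
```

Because Hadoop may call the combiner zero, one, or several times, the job's answer would even change from run to run.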

19 YOUR SITE HERE LOGO  1. combiners must have the same input and output key-value type  2. Combiners are optimizations that cannot change the correctness of the algorithm Hadoop makes no guarantees on how many times combiners are called; it could be zero, one, or multiple times LOCAL AGGREGATION

20 LOCAL AGGREGATION: the right usage
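One way to satisfy both rules, sketched here with hypothetical names and data, is to carry (sum, count) pairs: the mapper emits (value, 1), the combiner adds pairs and emits a pair of the same type, and only the reducer divides. The check applies the combiner zero times, once per task, and twice, and gets the same mean each time.

```python
def mean_map(key, value):
    """Emit a (sum, count) pair so the value type is the same before and after combining."""
    yield key, (value, 1)

def mean_combine(key, pairs):
    """Combiner: add partial sums and counts. Input and output are both (sum, count)."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    yield key, (total, count)

def mean_reduce(key, pairs):
    """Reducer: add partial (sum, count) pairs, then divide once at the very end."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    yield key, total / count

def mapped(values):
    return [p for v in values for _, p in mean_map("a", v)]

task1, task2 = mapped([1, 2, 3]), mapped([10])
# Combiner applied zero times, once per task, and twice (re-combining combined output).
zero = task1 + task2
once = [p for t in (task1, task2) for _, p in mean_combine("a", t)]
twice = [p for _, p in mean_combine("a", once)]
results = [next(mean_reduce("a", pairs))[1] for pairs in (zero, once, twice)]
print(results)
```

Addition is associative and commutative, so regrouping partial (sum, count) pairs never changes the final division.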

21 YOUR SITE HERE LOGO  we also need to sort by value sometimes  (k1;m1; v8)  (k1;m2; v1)  (k1;m3; v7)  :::  (k2;m1; v2)  (k2;m2; v6)  (k2;m3; v9)  k1 (m1; k8)  (k1; m1) (k8) SECONDARY SORTING

22 YOUR SITE HERE LOGO  It’s a shame  The rest I will talk about Plays an important role in MapReduce, but, they are beyond my horizon.  So, need all your help, to master them together…. Beyond the horizon

23 Beyond the horizon
- Creating a custom InputFormat
- Manipulating local files
- Creating a custom Partitioner
- Pipes for C++
- Streaming for other languages
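As a taste of Streaming: with Hadoop Streaming, the mapper and reducer are separate programs that read lines from standard input and write tab-separated key/value lines to standard output, and the reducer sees its input sorted by key. The sketch below writes word-count stages as functions over iterables of lines so they can be tested locally; in a real Streaming script they would be fed `sys.stdin`. The function names are hypothetical, and the `sorted` call stands in for the framework's sort between phases.

```python
from itertools import groupby

def streaming_mapper(lines):
    """Streaming-style mapper: read raw text lines, write "word\t1" lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def streaming_reducer(lines):
    """Streaming-style reducer: input arrives sorted by key, so equal keys are
    adjacent; sum the counts of each run of equal keys."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local test of the pipeline: map, sort (done by the framework in Hadoop), reduce.
mapped = sorted(streaming_mapper(["to be or", "not to be"]))
print("\n".join(streaming_reducer(mapped)))
```

Because the contract is just text on stdin and stdout, the same structure works in any language that can read and write lines.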

24 Beyond the horizon
- Joining data from different sources
- Hive
- Pig
- HBase
- Multiple file outputs

25 Joining data from different sources
- Orders files, CSV format, fields: (Customer ID, Order ID, Price, and Purchase Date)
- Customers file, CSV format, record fields: (Customer ID, Name, and Phone Number)

26 Joining data from different sources
Customers:
Joey Leung,555-555-55
Edward,123-456-7890
Jose Madriz,281-330-8004
David Stork,408-555-0000
…....
Orders:
A,12.95,02-Jun-2008
B,88.25,20-may-2008
C,32.00,30-Nov-2007
D,25.02,22-Jan-2009
Joined result:
Joey Leung,555-555-5555,B,88.25,20-May-2008
Edward,123-456-7890,C,32.00,30-Nov-2007
Jose Madriz,281-330-8004,A,12.95,02-Jun-2008
Jose Madriz,281-330-8004,D,25.02,22-Jan-2009
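One common way to produce that result is a reduce-side (repartition) join: the mapper keys every record by Customer ID and tags it with its source file, and the reducer pairs each customer with that customer's orders. The transcript dropped the Customer ID column, so the IDs "1" to "4" below are hypothetical, chosen to reproduce the joined rows shown; the other field values are taken from the joined result. All function names are illustrative, and the grouping loop stands in for the shuffle.

```python
from collections import defaultdict

# Hypothetical Customer IDs ("1".."4"); the other fields come from the slide.
customers = ["1,Joey Leung,555-555-5555", "2,Edward,123-456-7890",
             "3,Jose Madriz,281-330-8004", "4,David Stork,408-555-0000"]
orders = ["3,A,12.95,02-Jun-2008", "1,B,88.25,20-May-2008",
          "2,C,32.00,30-Nov-2007", "3,D,25.02,22-Jan-2009"]

def join_map(source, line):
    """Key each record by Customer ID and tag it with the file it came from."""
    customer_id, rest = line.split(",", 1)
    yield customer_id, (source, rest)

def join_reduce(customer_id, tagged):
    """Pair the customer record with each of its orders (an inner join)."""
    info = [rest for source, rest in tagged if source == "customers"]
    bought = [rest for source, rest in tagged if source == "orders"]
    for c in info:
        for o in bought:
            yield f"{c},{o}"

# Shuffle stand-in: group tagged records by Customer ID.
groups = defaultdict(list)
for source, lines in (("customers", customers), ("orders", orders)):
    for line in lines:
        for key, tagged in join_map(source, line):
            groups[key].append(tagged)

joined = [row for key in sorted(groups) for row in join_reduce(key, groups[key])]
print("\n".join(joined))
```

David Stork has no orders, so the inner join emits nothing for him, matching the joined result on the slide.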

27 Thank you!