Introduction to Cloud Computing
Jiaheng Lu, Department of Computer Science, Renmin University of China
www.jiahenglu.net
Chaining MapReduce jobs in sequence

A task such as a table join followed by sorting needs two MapReduce jobs. The jobs run one after another, Unix-pipe style:

mapreduce-1 | mapreduce-2 | mapreduce-3 | ...

Each stage is driven by JobClient.runJob(), which blocks until the submitted job completes, so the next job in the chain can safely consume its predecessor's output.
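The blocking, stage-after-stage behavior can be sketched in plain Java (this is an illustration of the chaining idea, not the Hadoop API; the stage functions are hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// A plain-Java sketch of a linear job chain: each "job" is a function
// over a list of records, and stages run strictly one after another
// because each call returns only when it is finished, just as
// JobClient.runJob() blocks until a submitted job completes.
public class LinearChain {
    // "mapreduce-1": transform every record
    static List<String> toUpper(List<String> in) {
        return in.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // "mapreduce-2": sort the records produced by the first job
    static List<String> sorted(List<String> in) {
        return in.stream().sorted().collect(Collectors.toList());
    }

    public static List<String> run(List<String> input) {
        List<String> stage1 = toUpper(input); // output of job 1 ...
        return sorted(stage1);                // ... is the input of job 2
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("banana", "Apple", "cherry")));
        // prints [APPLE, BANANA, CHERRY]
    }
}
```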
Chaining MapReduce jobs with complex dependency

The chain need not be linear: mapreduce1 may process one data set while mapreduce2 independently processes another, and a third job, mapreduce3, performs an inner join of the first two jobs' output. mapreduce3 therefore depends on both mapreduce1 and mapreduce2, while those two jobs can run in parallel.
Chaining MapReduce jobs with complex dependency

Hadoop has a mechanism to simplify the management of such (nonlinear) job dependencies via the Job and JobControl classes. For Job objects x and y, x.addDependingJob(y) declares that x depends on y: x will not start until y has completed. A JobControl instance tracks the whole set of jobs and launches each one as soon as all of its dependencies have finished.
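The dependency semantics can be mimicked in a few lines of plain Java. This is a minimal sketch of what addDependingJob() expresses, not Hadoop's actual JobControl implementation; all class and method names besides addDependingJob are invented for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of job-dependency scheduling: a job may only start
// after every job it depends on has completed.
public class MiniJobControl {
    public static class Job {
        public final String name;
        public final List<Job> dependsOn = new ArrayList<>();
        public Job(String name) { this.name = name; }
        // Mirrors Hadoop's x.addDependingJob(y): this job depends on y.
        public void addDependingJob(Job y) { dependsOn.add(y); }
    }

    // Run a job's dependencies first, recording completion order.
    static void run(Job job, Set<String> done, List<String> order) {
        if (done.contains(job.name)) return;
        for (Job dep : job.dependsOn) run(dep, done, order);
        done.add(job.name);
        order.add(job.name); // "launch" the job here
    }

    public static List<String> schedule(Job... terminalJobs) {
        Set<String> done = new LinkedHashSet<>();
        List<String> order = new ArrayList<>();
        for (Job j : terminalJobs) run(j, done, order);
        return order;
    }

    public static void main(String[] args) {
        Job mr1 = new Job("mapreduce1");
        Job mr2 = new Job("mapreduce2");
        Job mr3 = new Job("mapreduce3");
        mr3.addDependingJob(mr1); // mr3 joins the outputs of mr1 and mr2,
        mr3.addDependingJob(mr2); // so it must wait for both
        System.out.println(schedule(mr3)); // mr1 and mr2 complete before mr3
    }
}
```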
Chaining preprocessing and postprocessing steps

Within a single job, ChainMapper and ChainReducer let you compose stages following the pattern

MAP+ | REDUCE | MAP*

that is, one or more mapper steps, followed by exactly one reducer step, followed by zero or more mapper steps, all executed as one MapReduce job in one pass over the data.
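The data flow of MAP+ | REDUCE | MAP* can be sketched in plain Java: two pre-reduce map steps, a reduce step that aggregates per key, and one post-reduce map step, composed in a single pass. All names here are illustrative; this is the pattern, not the ChainMapper/ChainReducer API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of [MAP+ | REDUCE | MAP*]: map, map, reduce, map in one "job".
public class MapReduceMapChain {
    public static Map<String, String> run(List<String> lines) {
        // MAP 1: split each line into words.
        // MAP 2: normalize words to lower case.
        List<String> words = lines.stream()
                .flatMap(l -> Arrays.stream(l.split("\\s+"))) // map 1
                .map(String::toLowerCase)                     // map 2
                .collect(Collectors.toList());
        // REDUCE: count occurrences per word.
        Map<String, Long> counts = words.stream()
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        // MAP 3 (post-reduce): format each aggregated record.
        return counts.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                                          e -> "count=" + e.getValue()));
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("Hello world", "hello Hadoop")));
        // hello appears twice; world and hadoop once each
    }
}
```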
Driver for chaining mappers within a MapReduce job

Configuration conf = getConf();
JobConf job = new JobConf(conf);
job.setJobName("ChainJob");
job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);

// Map1: LongWritable/Text in, Text/Text out
JobConf map1Conf = new JobConf(false);
ChainMapper.addMapper(job, Map1.class, LongWritable.class, Text.class,
    Text.class, Text.class, true, map1Conf);

// Map2: Text/Text in, LongWritable/Text out
JobConf map2Conf = new JobConf(false);
ChainMapper.addMapper(job, Map2.class, Text.class, Text.class,
    LongWritable.class, Text.class, true, map2Conf);
Driver for chaining mappers within a MapReduce job (continued)

// Reduce: LongWritable/Text in, Text/Text out
JobConf reduceConf = new JobConf(false);
ChainReducer.setReducer(job, Reduce.class, LongWritable.class, Text.class,
    Text.class, Text.class, true, reduceConf);

// Map3 (post-reduce): Text/Text in, LongWritable/Text out
JobConf map3Conf = new JobConf(false);
ChainReducer.addMapper(job, Map3.class, Text.class, Text.class,
    LongWritable.class, Text.class, true, map3Conf);

// Map4 (post-reduce): LongWritable/Text in, LongWritable/Text out
JobConf map4Conf = new JobConf(false);
ChainReducer.addMapper(job, Map4.class, LongWritable.class, Text.class,
    LongWritable.class, Text.class, true, map4Conf);

JobClient.runJob(job);
Driver for chaining mappers within a MapReduce job

The signature of ChainMapper.addMapper (old API, org.apache.hadoop.mapred.lib):

public static <K1, V1, K2, V2> void addMapper(JobConf job,
    Class<? extends Mapper<K1, V1, K2, V2>> klass,
    Class<? extends K1> inputKeyClass,
    Class<? extends V1> inputValueClass,
    Class<? extends K2> outputKeyClass,
    Class<? extends V2> outputValueClass,
    boolean byValue,
    JobConf mapperConf)

The byValue flag controls how key/value pairs are handed to the next stage: true passes a serialized copy (always safe), while false passes them by reference (faster, but correct only if the emitting mapper does not reuse its output objects after calling collect()).
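Why byValue matters can be shown in plain Java. This is an illustration of the reuse hazard, not Hadoop code: a stage that recycles its output object corrupts what a by-reference consumer sees, while a by-value copy is unaffected:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrates the hazard behind ChainMapper's byValue flag: a mapper
// may reuse and mutate its output object after emitting it. Passing
// by reference then corrupts the earlier emission as seen downstream;
// passing by value (a copy) keeps each emission intact.
public class ByValueDemo {
    public static List<String> emit(boolean byValue) {
        List<StringBuilder> downstream = new ArrayList<>();
        StringBuilder value = new StringBuilder("first");
        // "collect" the value for the next stage
        downstream.add(byValue ? new StringBuilder(value) : value);
        // the emitting stage reuses its value object afterwards
        value.setLength(0);
        value.append("second");
        downstream.add(byValue ? new StringBuilder(value) : value);
        List<String> seen = new ArrayList<>();
        for (StringBuilder sb : downstream) seen.add(sb.toString());
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(emit(true));  // [first, second]
        System.out.println(emit(false)); // [second, second] -- first emission corrupted
    }
}
```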
DistributedCache

In the driver, call DistributedCache.addCacheFile() to specify the files to be disseminated to all nodes. In the task (typically in its configure() method), call DistributedCache.getLocalCacheFiles() to obtain the local paths where those files were copied on each node.
DistributedCache

public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    JobConf job = new JobConf(conf, DataJoinDC.class);
    // args[0] is the small file to replicate to every node. Register it
    // on the JobConf itself: job was copy-constructed from conf, so
    // adding the cache file to conf at this point would be lost.
    DistributedCache.addCacheFile(new Path(args[0]).toUri(), job);
    Path in = new Path(args[1]);
    Path out = new Path(args[2]);
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    job.setInputFormat(TextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
    JobClient.runJob(job);
    return 0;
}
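What a DistributedCache-backed map-side (replicated) join does can be sketched in plain Java: the small table, which DistributedCache would have copied to every node, is loaded into an in-memory map, and each input record is joined against it with no reduce phase. The class name and the "key,value" line format are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a replicated (map-side) inner join: small table in memory,
// big table streamed through a map step.
public class ReplicatedJoin {
    public static List<String> join(List<String> smallTable, List<String> bigTable) {
        // Load the cached small table: lines of "key,value".
        Map<String, String> lookup = new HashMap<>();
        for (String line : smallTable) {
            String[] kv = line.split(",", 2);
            lookup.put(kv[0], kv[1]);
        }
        // "Map" over the big table, emitting joined records.
        List<String> out = new ArrayList<>();
        for (String line : bigTable) {
            String[] kv = line.split(",", 2);
            String match = lookup.get(kv[0]);
            if (match != null) out.add(kv[0] + "," + kv[1] + "," + match); // inner join
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> small = Arrays.asList("1,apple", "2,banana");
        List<String> big = Arrays.asList("1,red", "2,yellow", "3,green");
        System.out.println(join(small, big)); // [1,red,apple, 2,yellow,banana]
    }
}
```

Because every map task holds the full small table, no shuffle or reduce is needed; this is exactly why the small file is distributed via the cache rather than read as job input.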