Introduction to Cloud Computing
Jiaheng Lu, Department of Computer Science, Renmin University of China
www.jiahenglu.net
Chaining MapReduce jobs in sequence

A task such as a table join followed by sorting needs two MapReduce jobs. The jobs run one after another, Unix-pipe style:

mapreduce-1 | mapreduce-2 | mapreduce-3 | ...

Each stage is driven by JobClient.runJob(), which blocks until the submitted job completes, so the next job in the chain can safely consume its predecessor's output.
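The blocking, stage-after-stage behavior can be sketched in plain Java (this is an illustration of the chaining idea, not the Hadoop API; the stage functions are hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// A plain-Java sketch of a linear job chain: each "job" is a function
// over a list of records, and stages run strictly one after another
// because each call returns only when it is finished, just as
// JobClient.runJob() blocks until a submitted job completes.
public class LinearChain {
    // "mapreduce-1": transform every record
    static List<String> toUpper(List<String> in) {
        return in.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // "mapreduce-2": sort the records produced by the first job
    static List<String> sorted(List<String> in) {
        return in.stream().sorted().collect(Collectors.toList());
    }

    public static List<String> run(List<String> input) {
        List<String> stage1 = toUpper(input); // output of job 1 ...
        return sorted(stage1);                // ... is the input of job 2
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("banana", "Apple", "cherry")));
        // prints [APPLE, BANANA, CHERRY]
    }
}
```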
Chaining MapReduce jobs with complex dependency

The chain need not be linear: mapreduce1 may process one data set while mapreduce2 independently processes another, and a third job, mapreduce3, performs an inner join of the first two jobs' output. mapreduce3 therefore depends on both mapreduce1 and mapreduce2, while those two jobs can run in parallel.
Chaining MapReduce jobs with complex dependency

Hadoop has a mechanism to simplify the management of such (nonlinear) job dependencies via the Job and JobControl classes. For Job objects x and y, x.addDependingJob(y) declares that x depends on y: x will not start until y has completed. A JobControl instance tracks the whole set of jobs and launches each one as soon as all of its dependencies have finished.
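The dependency semantics can be mimicked in a few lines of plain Java. This is a minimal sketch of what addDependingJob() expresses, not Hadoop's actual JobControl implementation; all class and method names besides addDependingJob are invented for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of job-dependency scheduling: a job may only start
// after every job it depends on has completed.
public class MiniJobControl {
    public static class Job {
        public final String name;
        public final List<Job> dependsOn = new ArrayList<>();
        public Job(String name) { this.name = name; }
        // Mirrors Hadoop's x.addDependingJob(y): this job depends on y.
        public void addDependingJob(Job y) { dependsOn.add(y); }
    }

    // Run a job's dependencies first, recording completion order.
    static void run(Job job, Set<String> done, List<String> order) {
        if (done.contains(job.name)) return;
        for (Job dep : job.dependsOn) run(dep, done, order);
        done.add(job.name);
        order.add(job.name); // "launch" the job here
    }

    public static List<String> schedule(Job... terminalJobs) {
        Set<String> done = new LinkedHashSet<>();
        List<String> order = new ArrayList<>();
        for (Job j : terminalJobs) run(j, done, order);
        return order;
    }

    public static void main(String[] args) {
        Job mr1 = new Job("mapreduce1");
        Job mr2 = new Job("mapreduce2");
        Job mr3 = new Job("mapreduce3");
        mr3.addDependingJob(mr1); // mr3 joins the outputs of mr1 and mr2,
        mr3.addDependingJob(mr2); // so it must wait for both
        System.out.println(schedule(mr3)); // mr1 and mr2 complete before mr3
    }
}
```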
Chaining preprocessing and postprocessing steps

Within a single job, ChainMapper and ChainReducer let you compose stages following the pattern

MAP+ | REDUCE | MAP*

that is, one or more mapper steps, followed by exactly one reducer step, followed by zero or more mapper steps, all executed as one MapReduce job in one pass over the data.
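The data flow of MAP+ | REDUCE | MAP* can be sketched in plain Java: two pre-reduce map steps, a reduce step that aggregates per key, and one post-reduce map step, composed in a single pass. All names here are illustrative; this is the pattern, not the ChainMapper/ChainReducer API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of [MAP+ | REDUCE | MAP*]: map, map, reduce, map in one "job".
public class MapReduceMapChain {
    public static Map<String, String> run(List<String> lines) {
        // MAP 1: split each line into words.
        // MAP 2: normalize words to lower case.
        List<String> words = lines.stream()
                .flatMap(l -> Arrays.stream(l.split("\\s+"))) // map 1
                .map(String::toLowerCase)                     // map 2
                .collect(Collectors.toList());
        // REDUCE: count occurrences per word.
        Map<String, Long> counts = words.stream()
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        // MAP 3 (post-reduce): format each aggregated record.
        return counts.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                                          e -> "count=" + e.getValue()));
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("Hello world", "hello Hadoop")));
        // hello appears twice; world and hadoop once each
    }
}
```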
Driver for chaining mappers within a MapReduce job

Configuration conf = getConf();
JobConf job = new JobConf(conf);
job.setJobName("ChainJob");
job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);

// Map1: LongWritable/Text in, Text/Text out
JobConf map1Conf = new JobConf(false);
ChainMapper.addMapper(job, Map1.class, LongWritable.class, Text.class,
    Text.class, Text.class, true, map1Conf);

// Map2: Text/Text in, LongWritable/Text out
JobConf map2Conf = new JobConf(false);
ChainMapper.addMapper(job, Map2.class, Text.class, Text.class,
    LongWritable.class, Text.class, true, map2Conf);
Driver for chaining mappers within a MapReduce job (continued)

// Reduce: LongWritable/Text in, Text/Text out
JobConf reduceConf = new JobConf(false);
ChainReducer.setReducer(job, Reduce.class, LongWritable.class, Text.class,
    Text.class, Text.class, true, reduceConf);

// Map3 (post-reduce): Text/Text in, LongWritable/Text out
JobConf map3Conf = new JobConf(false);
ChainReducer.addMapper(job, Map3.class, Text.class, Text.class,
    LongWritable.class, Text.class, true, map3Conf);

// Map4 (post-reduce): LongWritable/Text in, LongWritable/Text out
JobConf map4Conf = new JobConf(false);
ChainReducer.addMapper(job, Map4.class, LongWritable.class, Text.class,
    LongWritable.class, Text.class, true, map4Conf);

JobClient.runJob(job);
Driver for chaining mappers within a MapReduce job

The signature of ChainMapper.addMapper (old API, org.apache.hadoop.mapred.lib):

public static <K1, V1, K2, V2> void addMapper(JobConf job,
    Class<? extends Mapper<K1, V1, K2, V2>> klass,
    Class<? extends K1> inputKeyClass,
    Class<? extends V1> inputValueClass,
    Class<? extends K2> outputKeyClass,
    Class<? extends V2> outputValueClass,
    boolean byValue,
    JobConf mapperConf)

The byValue flag controls how key/value pairs are handed to the next stage: true passes a serialized copy (always safe), while false passes them by reference (faster, but correct only if the emitting mapper does not reuse its output objects after calling collect()).
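Why byValue matters can be shown in plain Java. This is an illustration of the reuse hazard, not Hadoop code: a stage that recycles its output object corrupts what a by-reference consumer sees, while a by-value copy is unaffected:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrates the hazard behind ChainMapper's byValue flag: a mapper
// may reuse and mutate its output object after emitting it. Passing
// by reference then corrupts the earlier emission as seen downstream;
// passing by value (a copy) keeps each emission intact.
public class ByValueDemo {
    public static List<String> emit(boolean byValue) {
        List<StringBuilder> downstream = new ArrayList<>();
        StringBuilder value = new StringBuilder("first");
        // "collect" the value for the next stage
        downstream.add(byValue ? new StringBuilder(value) : value);
        // the emitting stage reuses its value object afterwards
        value.setLength(0);
        value.append("second");
        downstream.add(byValue ? new StringBuilder(value) : value);
        List<String> seen = new ArrayList<>();
        for (StringBuilder sb : downstream) seen.add(sb.toString());
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(emit(true));  // [first, second]
        System.out.println(emit(false)); // [second, second] -- first emission corrupted
    }
}
```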
DistributedCache

In the driver, call DistributedCache.addCacheFile() to specify the files to be disseminated to all nodes. In the task (typically in its configure() method), call DistributedCache.getLocalCacheFiles() to obtain the local paths where those files were copied on each node.
DistributedCache

public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    JobConf job = new JobConf(conf, DataJoinDC.class);
    // args[0] is the small file to replicate to every node. Register it
    // on the JobConf itself: job was copy-constructed from conf, so
    // adding the cache file to conf at this point would be lost.
    DistributedCache.addCacheFile(new Path(args[0]).toUri(), job);
    Path in = new Path(args[1]);
    Path out = new Path(args[2]);
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    job.setInputFormat(TextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
    JobClient.runJob(job);
    return 0;
}
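What a DistributedCache-backed map-side (replicated) join does can be sketched in plain Java: the small table, which DistributedCache would have copied to every node, is loaded into an in-memory map, and each input record is joined against it with no reduce phase. The class name and the "key,value" line format are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a replicated (map-side) inner join: small table in memory,
// big table streamed through a map step.
public class ReplicatedJoin {
    public static List<String> join(List<String> smallTable, List<String> bigTable) {
        // Load the cached small table: lines of "key,value".
        Map<String, String> lookup = new HashMap<>();
        for (String line : smallTable) {
            String[] kv = line.split(",", 2);
            lookup.put(kv[0], kv[1]);
        }
        // "Map" over the big table, emitting joined records.
        List<String> out = new ArrayList<>();
        for (String line : bigTable) {
            String[] kv = line.split(",", 2);
            String match = lookup.get(kv[0]);
            if (match != null) out.add(kv[0] + "," + kv[1] + "," + match); // inner join
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> small = Arrays.asList("1,apple", "2,banana");
        List<String> big = Arrays.asList("1,red", "2,yellow", "3,green");
        System.out.println(join(small, big)); // [1,red,apple, 2,yellow,banana]
    }
}
```

Because every map task holds the full small table, no shuffle or reduce is needed; this is exactly why the small file is distributed via the cache rather than read as job input.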