
1 Map Reduce Programming Waue Chen

2 Why ? Moore’s law ?  CPU clock rates used to double roughly every 18 months, but this stopped holding around 2005. The era of multicore and parallel computing has arrived.

3 What is Hadoop Hadoop is an open source framework for parallel, distributed programs that runs on large-scale clusters. It provides a distributed file system, HDFS, for storing data across the nodes. It is highly fault tolerant, handling failed nodes automatically, and it implements Google's MapReduce algorithm.

4 What is MapReduce MapReduce splits an application into many small units of work, and each unit can be executed or computed on any node.
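As a minimal illustration, using the word-count job developed later in these slides on a hypothetical input line "hello world hello": the map step emits the pairs (hello, 1), (world, 1), (hello, 1); the framework groups the pairs by key; and the reduce step sums each group to produce (hello, 2) and (world, 1).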

5 MapReduce: Example

6 MapReduce in Parallel: Example

7 Thinking in Hadoop: MapReduce
HDFS
Map class
Reduce class
Overall configuration

8 Program prototype

class MR {
  class Map ... { }
  class Reduce ... { }
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(MR.class);
    conf.setInputPath(new Path("the_path_of_HDFS"));
    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);
    JobClient.runJob(conf);
  }
}
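A typical way to launch such a job from the Hadoop installation directory, assuming the class has been packaged into a jar (the jar name mr.jar here is hypothetical):

bin/hadoop jar mr.jar MR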

9 Word Count Sample

class WordCount {
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    // set paths
    conf.setInputPath(new Path("/user/waue/input"));
    Path outputPath = new Path("counts");
    conf.setOutputPath(outputPath);
    // delete any previous output so the job can be rerun
    FileSystem.get(conf).delete(outputPath);
    // set map reduce
    conf.setOutputKeyClass(Text.class);          // set every word as key
    conf.setOutputValueClass(IntWritable.class); // set 1 as value
    conf.setMapperClass(MapClass.class);
    conf.setReducerClass(ReduceClass.class);
    conf.setNumMapTasks(1);
    conf.setNumReduceTasks(1);
    // run
    JobClient.runJob(conf);
  }
}
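Before running the job, the input text must already be in HDFS; one way to copy a local file there (the local file name is hypothetical):

bin/hadoop dfs -put my_text_file /user/waue/input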

10 Word Count Sample

class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one); // emit (word, 1) for every token
    }
  }
}

11 Word Count Sample

class ReduceClass extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  IntWritable SumValue = new IntWritable();

  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext())
      sum += values.next().get(); // add up the 1s emitted for this word
    SumValue.set(sum);
    output.collect(key, SumValue);
  }
}
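A common refinement, not shown on the slides, is to also register the reducer as a combiner, so partial sums are computed on the map side before data crosses the network:

conf.setCombinerClass(ReduceClass.class);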

12 Result
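For the hypothetical input line "hello world hello" used earlier, the output file counts/part-00000 would contain one tab-separated word and count per line:

hello	2
world	1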

13 MapReduce with HBase prototype

class MR_HBase {
  class Map extends ... { }
  class Reduce extends ... { }
  public static void main(String[] args) {
    JobConf conf = new ...;
    conf.setInputPath(...);
    conf.setMapperClass(...);
    conf.setReducerClass(...);
    JobClient.runJob(conf);
  }
}

HBase API: TableMap, TableReduce
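TableMap and TableReduce take the place of the plain Mapper and Reducer base classes when a job reads its input from, or writes its output to, an HBase table, so the prototype above changes only in which classes Map and Reduce extend.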

14 WordCountIntoHbase Sample

class WordCountIntoHbase {
  public static void main(String[] args) throws IOException {
    // create the HBase table if it does not exist yet
    BuildHTable build_table = new BuildHTable(Table_Name, ColumnF);
    if (!build_table.checkTableExist(Table_Name)) {
      if (!build_table.createTable())
        System.err.println("create table error !");
    } else
      System.out.println("Table existed !");
    JobConf conf = new JobConf(WordCountIntoHbase.class);
    conf.setJobName("wordcount");
    conf.setInputPath(new Path("/user/waue/input"));
    // the HDFS-output settings from the plain WordCount are not needed:
    // the reducer writes straight into the HBase table
    //conf.setOutputPath(new Path("counts"));
    //FileSystem.get(conf).delete(new Path(wc.outputPath));
    //conf.setOutputKeyClass(Text.class);          // set every word as key
    //conf.setOutputValueClass(IntWritable.class); // set 1 as value
    //conf.setMapperClass(MapClass.class);
    conf.setReducerClass(ReduceClass.class);
    conf.setNumMapTasks(0);
    conf.setNumReduceTasks(1);
    JobClient.runJob(conf);
  }
}

15 WordCountIntoHbase Sample

class ReduceClass extends TableReduce<LongWritable, Text> {
  // destination cell: column family "word", qualifier "text"
  Text col = new Text("word:text");
  private MapWritable map = new MapWritable();

  public void reduce(LongWritable key, Iterator<Text> values,
      OutputCollector<Text, MapWritable> output, Reporter reporter)
      throws IOException {
    ImmutableBytesWritable bytes =
        new ImmutableBytesWritable(values.next().getBytes());
    map.clear();
    map.put(col, bytes);
    // the row key is the text's byte offset; the cell holds the line itself
    output.collect(new Text(key.toString()), map);
  }
}

16 Result

17 WordCountFromHbase Word counting from HBase, run after WordCountIntoHbase. In Trac: http://trac.nchc.org.tw/cloud/browser/sample/hadoop-0.16/tw/org/nchc/code/WordCountFromHBase.java

18 What's HBaseRecordPro
Parses your record
Creates the HBase table
Uses the first line as the column qualifiers
Stores the records in HBase
Automatically, locally
http://trac.nchc.org.tw/cloud/wiki/HBaseRecordPro

19 HBaseRecordPro
Input file (the first line supplies the column qualifiers):
name:locate:years
waue:taiwan:1981
rock:taiwan:1981
aso:taiwan:1981
jazz:taiwan:1982
Run HBaseRecordPro.java, then query the result:
hql> Select * from Table;

20

21 Detailed Code Explanation Apache log parser http://trac.nchc.org.tw/cloud/wiki/LogParser

22 More.. ? Enjoy http://trac.nchc.org.tw/cloud/
How to code Hadoop in Eclipse: http://trac.nchc.org.tw/cloud/browser/hadoop-eclipse.pdf
Map Reduce in Hadoop/HBase Manual: http://trac.nchc.org.tw/cloud/wiki/MR_manual
My code sources: http://trac.nchc.org.tw/cloud/browser/sample/hadoop-0.16/tw/org/nchc/code

23 Then.. ? Intrusion-Detection-System log parser
Count => the last
Format => 6 lines / 1 cell
Apache Pig
Pig is a platform for analyzing large data sets that consists of a high-level language

Sample IDS alert:
[**] [1:2189:3] BAD-TRAFFIC IP Proto 103 PIM [**]
[Classification: Detection of a non-standard protocol or event] [Priority: 2]
07/08-14:57:56.500718 140.110.138.253 -> 224.0.0.13
PIM TTL:1 TOS:0xC0 ID:11078 IpLen:20 DgmLen:54
[Xref => http://cve.mitre.org/cgi-bin/cvename.cgi?name=2003-0567][Xref => http://www.securityfocus.com/bid/8211]
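As a starting point for such a parser, here is a minimal Java sketch that pulls the timestamp, source, and destination out of an alert's traffic line; the regular expression and field choices are assumptions based on the sample alert above, not the project's actual LogParser code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdsLineParser {
  // matches lines like: 07/08-14:57:56.500718 140.110.138.253 -> 224.0.0.13
  private static final Pattern TRAFFIC = Pattern.compile(
      "^(\\d{2}/\\d{2}-\\d{2}:\\d{2}:\\d{2}\\.\\d+)\\s+(\\S+)\\s+->\\s+(\\S+)");

  public static void main(String[] args) {
    String line = "07/08-14:57:56.500718 140.110.138.253 -> 224.0.0.13";
    Matcher m = TRAFFIC.matcher(line);
    if (m.find()) {
      System.out.println("time = " + m.group(1)); // 07/08-14:57:56.500718
      System.out.println("src  = " + m.group(2)); // 140.110.138.253
      System.out.println("dst  = " + m.group(3)); // 224.0.0.13
    }
  }
}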

24 References API
http://hadoop.apache.org/hbase/docs/current/api/index.html
http://hadoop.apache.org/core/docs/r0.16.4/api/index.html
Distributed parallel programming with Hadoop (用 Hadoop 進行分佈式並行編程):
http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/index.html

