
1 Map Reduce Programming Waue Chen

2 Why ? Moore’s law ?  CPU clock rates used to double roughly every 18 months, but this stopped holding around 2005. The era of multicore and parallel computing has arrived.

3 What is Hadoop Hadoop is an open source framework for parallel, distributed programs that runs on large-scale clusters. It provides a distributed file system, HDFS, for storing data across the nodes. It is highly fault tolerant, handling failed nodes automatically, and it implements Google's MapReduce algorithm.

4 What is MapReduce MapReduce splits an application into many small units of work, and each unit can be executed or computed on any node.
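As a minimal illustration, using the word-count job developed later in these slides on a hypothetical input line "hello world hello": the map step emits the pairs (hello, 1), (world, 1), (hello, 1); the framework groups the pairs by key; and the reduce step sums each group to produce (hello, 2) and (world, 1).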

5 MapReduce: Example

6 MapReduce in Parallel: Example

7 Thinking in Hadoop: MapReduce
HDFS
Map class
Reduce class
Overall configuration

8 Program prototype

class MR {
  class Map ... { }
  class Reduce ... { }
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(MR.class);
    conf.setInputPath(new Path("the_path_of_HDFS"));
    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);
    JobClient.runJob(conf);
  }
}
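A typical way to launch such a job from the Hadoop installation directory, assuming the class has been packaged into a jar (the jar name mr.jar here is hypothetical):

bin/hadoop jar mr.jar MR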

9 Word Count Sample

class WordCount {
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    // set paths
    conf.setInputPath(new Path("/user/waue/input"));
    Path outputPath = new Path("counts");
    conf.setOutputPath(outputPath);
    // delete any previous output so the job can be rerun
    FileSystem.get(conf).delete(outputPath);
    // set map reduce
    conf.setOutputKeyClass(Text.class);          // set every word as key
    conf.setOutputValueClass(IntWritable.class); // set 1 as value
    conf.setMapperClass(MapClass.class);
    conf.setReducerClass(ReduceClass.class);
    conf.setNumMapTasks(1);
    conf.setNumReduceTasks(1);
    // run
    JobClient.runJob(conf);
  }
}
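Before running the job, the input text must already be in HDFS; one way to copy a local file there (the local file name is hypothetical):

bin/hadoop dfs -put my_text_file /user/waue/input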

10 Word Count Sample

class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one); // emit (word, 1) for every token
    }
  }
}

11 Word Count Sample

class ReduceClass extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  IntWritable SumValue = new IntWritable();

  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext())
      sum += values.next().get(); // add up the 1s emitted for this word
    SumValue.set(sum);
    output.collect(key, SumValue);
  }
}
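A common refinement, not shown on the slides, is to also register the reducer as a combiner, so partial sums are computed on the map side before data crosses the network:

conf.setCombinerClass(ReduceClass.class);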

12 Result
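For the hypothetical input line "hello world hello" used earlier, the output file counts/part-00000 would contain one tab-separated word and count per line:

hello	2
world	1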

13 MapReduce with HBase prototype

class MR_HBase {
  class Map extends ... { }
  class Reduce extends ... { }
  public static void main(String[] args) {
    JobConf conf = new ...;
    conf.setInputPath(...);
    conf.setMapperClass(...);
    conf.setReducerClass(...);
    JobClient.runJob(conf);
  }
}

HBase API: TableMap, TableReduce
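TableMap and TableReduce take the place of the plain Mapper and Reducer base classes when a job reads its input from, or writes its output to, an HBase table, so the prototype above changes only in which classes Map and Reduce extend.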

14 WordCountIntoHbase Sample

class WordCountIntoHbase {
  public static void main(String[] args) throws IOException {
    // create the HBase table if it does not exist yet
    BuildHTable build_table = new BuildHTable(Table_Name, ColumnF);
    if (!build_table.checkTableExist(Table_Name)) {
      if (!build_table.createTable())
        System.err.println("create table error !");
    } else
      System.out.println("Table existed !");
    JobConf conf = new JobConf(WordCountIntoHbase.class);
    conf.setJobName("wordcount");
    conf.setInputPath(new Path("/user/waue/input"));
    // the HDFS-output settings from the plain WordCount are not needed:
    // the reducer writes straight into the HBase table
    //conf.setOutputPath(new Path("counts"));
    //FileSystem.get(conf).delete(new Path(wc.outputPath));
    //conf.setOutputKeyClass(Text.class);          // set every word as key
    //conf.setOutputValueClass(IntWritable.class); // set 1 as value
    //conf.setMapperClass(MapClass.class);
    conf.setReducerClass(ReduceClass.class);
    conf.setNumMapTasks(0);
    conf.setNumReduceTasks(1);
    JobClient.runJob(conf);
  }
}

15 WordCountIntoHbase Sample

class ReduceClass extends TableReduce<LongWritable, Text> {
  // destination cell: column family "word", qualifier "text"
  Text col = new Text("word:text");
  private MapWritable map = new MapWritable();

  public void reduce(LongWritable key, Iterator<Text> values,
      OutputCollector<Text, MapWritable> output, Reporter reporter)
      throws IOException {
    ImmutableBytesWritable bytes =
        new ImmutableBytesWritable(values.next().getBytes());
    map.clear();
    map.put(col, bytes);
    // the row key is the text's byte offset; the cell holds the line itself
    output.collect(new Text(key.toString()), map);
  }
}

16 Result

17 WordCountFromHbase Word counting from HBase, run after WordCountIntoHbase. In Trac: http://trac.nchc.org.tw/cloud/browser/sample/hadoop-0.16/tw/org/nchc/code/WordCountFromHBase.java

18 What's HBaseRecordPro
Parses your record
Creates the HBase table
Uses the first line as the column qualifiers
Stores the records in HBase
Automatically, locally
http://trac.nchc.org.tw/cloud/wiki/HBaseRecordPro

19 HBaseRecordPro
Input file (the first line supplies the column qualifiers):
name:locate:years
waue:taiwan:1981
rock:taiwan:1981
aso:taiwan:1981
jazz:taiwan:1982
Run HBaseRecordPro.java, then query the result:
hql> Select * from Table;

20

21 Detailed Code Explanation Apache log parser http://trac.nchc.org.tw/cloud/wiki/LogParser

22 More.. ? Enjoy http://trac.nchc.org.tw/cloud/
How to code Hadoop in Eclipse: http://trac.nchc.org.tw/cloud/browser/hadoop-eclipse.pdf
Map Reduce in Hadoop/HBase Manual: http://trac.nchc.org.tw/cloud/wiki/MR_manual
My code sources: http://trac.nchc.org.tw/cloud/browser/sample/hadoop-0.16/tw/org/nchc/code

23 Then.. ? Intrusion-Detection-System log parser
Count => the last
Format => 6 lines / 1 cell
Apache Pig
Pig is a platform for analyzing large data sets that consists of a high-level language

Sample IDS alert:
[**] [1:2189:3] BAD-TRAFFIC IP Proto 103 PIM [**]
[Classification: Detection of a non-standard protocol or event] [Priority: 2]
07/08-14:57:56.500718 140.110.138.253 -> 224.0.0.13
PIM TTL:1 TOS:0xC0 ID:11078 IpLen:20 DgmLen:54
[Xref => http://cve.mitre.org/cgi-bin/cvename.cgi?name=2003-0567][Xref => http://www.securityfocus.com/bid/8211]
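As a starting point for such a parser, here is a minimal Java sketch that pulls the timestamp, source, and destination out of an alert's traffic line; the regular expression and field choices are assumptions based on the sample alert above, not the project's actual LogParser code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdsLineParser {
  // matches lines like: 07/08-14:57:56.500718 140.110.138.253 -> 224.0.0.13
  private static final Pattern TRAFFIC = Pattern.compile(
      "^(\\d{2}/\\d{2}-\\d{2}:\\d{2}:\\d{2}\\.\\d+)\\s+(\\S+)\\s+->\\s+(\\S+)");

  public static void main(String[] args) {
    String line = "07/08-14:57:56.500718 140.110.138.253 -> 224.0.0.13";
    Matcher m = TRAFFIC.matcher(line);
    if (m.find()) {
      System.out.println("time = " + m.group(1)); // 07/08-14:57:56.500718
      System.out.println("src  = " + m.group(2)); // 140.110.138.253
      System.out.println("dst  = " + m.group(3)); // 224.0.0.13
    }
  }
}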

24 References API
http://hadoop.apache.org/hbase/docs/current/api/index.html
http://hadoop.apache.org/core/docs/r0.16.4/api/index.html
Distributed parallel programming with Hadoop (用 Hadoop 進行分佈式並行編程):
http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/index.html

