Published by Hugh O’Connor. Modified over 9 years ago.
1
Team3: Xiaokui Shu, Ron Cohen
subx@vt.edu roncohen@vt.edu
CS5604 at Virginia Tech
December 6, 2010
2
Introduction
Hadoop
MapReduce
Working With Hadoop
Environment
MapReduce Programming
Summary
3
Hadoop is a software framework
Users write their programs on top of it
Like a super-library
For distributed applications
Built-in solutions
Solutions depend on this framework
Inspired by Google's MapReduce and Google File System (GFS) papers
4
Who uses Hadoop
A9.com – Amazon
▪ Amazon's product search indices
Adobe
▪ 30 nodes running HDFS, Hadoop and HBase
Baidu
▪ Handles about 3000 TB per week
Facebook
▪ Stores copies of internal log and dimension data sources
Last.fm, LinkedIn, IBM, Yahoo!, Google…
5
Hadoop Common HDFS MapReduce ZooKeeper
6
Connections to the IR book
Ch.4 Index construction
▪ Distributed indexing (4.4)
Ch.20 Web crawling and indexes
▪ Distributed crawler (20.2)
▪ Distributed indexing (20.3)
7
MapReduce is a software framework
For distributed computing
Massive amounts of data
Simple processing requirements
Portability across a variety of platforms
▪ Clusters
▪ CMP/SMP
▪ GPGPU
Introduced by Google
8
Cited from MapReduce: Simplified Data Processing on Large Clusters
9
Map
Map(k1, v1) -> list(k2, v2)
Reduce
Reduce(k2, list(v2)) -> list(v3)
Hadoop MapReduce
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
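The two signatures on this slide can be sketched in plain Java without any Hadoop dependency. This is only an illustrative in-memory model, not Hadoop's API: the class name MiniMapReduce, the interfaces Mapper and Reducer, and the helpers WORD_MAPPER and SUM_REDUCER are hypothetical names chosen for the sketch, and the "shuffle" is simply a sorted map that groups values by key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the Map and Reduce signatures (not Hadoop's API).
public class MiniMapReduce {
    /** Map(k1, v1) -> list(k2, v2) */
    public interface Mapper<K1, V1, K2, V2> {
        List<Map.Entry<K2, V2>> map(K1 key, V1 value);
    }

    /** Reduce(k2, list(v2)) -> list(v3) */
    public interface Reducer<K2, V2, V3> {
        List<V3> reduce(K2 key, List<V2> values);
    }

    /** Runs map over every input record, groups the pairs by k2, then reduces each group. */
    public static <K1, V1, K2 extends Comparable<K2>, V2, V3> Map<K2, List<V3>> run(
            Map<K1, V1> input, Mapper<K1, V1, K2, V2> mapper, Reducer<K2, V2, V3> reducer) {
        // Map phase plus grouping: collect all v2 values under their k2 key (sorted by key).
        Map<K2, List<V2>> grouped = new TreeMap<>();
        for (Map.Entry<K1, V1> record : input.entrySet()) {
            for (Map.Entry<K2, V2> pair : mapper.map(record.getKey(), record.getValue())) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        // Reduce phase: one call per distinct k2.
        Map<K2, List<V3>> output = new TreeMap<>();
        for (Map.Entry<K2, List<V2>> group : grouped.entrySet()) {
            output.put(group.getKey(), reducer.reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    // Example mapper: emits (word, 1) for each whitespace-separated word.
    public static final Mapper<String, String, String, Integer> WORD_MAPPER = (name, text) -> {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return pairs;
    };

    // Example reducer: sums the partial counts for one word.
    public static final Reducer<String, Integer, Integer> SUM_REDUCER = (word, ones) -> {
        int sum = 0;
        for (int one : ones) {
            sum += one;
        }
        return Collections.singletonList(sum);
    };

    public static void main(String[] args) {
        Map<String, String> input = new LinkedHashMap<>();
        input.put("doc1", "a b a");
        input.put("doc2", "b c");
        System.out.println(run(input, WORD_MAPPER, SUM_REDUCER)); // prints {a=[2], b=[2], c=[1]}
    }
}
```

In a real cluster the grouping step is distributed across machines; the sketch keeps it in one process only to make the data flow visible.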
10
Source
$ cat file01
Hello World Bye World
$ cat file02
Hello Hadoop Goodbye Hadoop
11
Map Output For File01 For File02
12
Reduce Output
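The map and reduce outputs for the two source files can be reproduced with a small plain-Java trace. This is a sketch under the assumption that words are split on whitespace and that no combiner runs; the class name WordCountTrace and its methods are hypothetical and do not use Hadoop.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical trace of WordCount on file01 and file02 (plain Java, no Hadoop).
public class WordCountTrace {
    // Map step: emit (word, 1) for every token, in input order.
    public static List<Map.Entry<String, Integer>> map(String contents) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(contents);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce step: sum the values per word; TreeMap keeps the keys sorted.
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> fromFile01 = map("Hello World Bye World");
        List<Map.Entry<String, Integer>> fromFile02 = map("Hello Hadoop Goodbye Hadoop");
        // prints Map output for file01: [Hello=1, World=1, Bye=1, World=1]
        System.out.println("Map output for file01: " + fromFile01);
        // prints Map output for file02: [Hello=1, Hadoop=1, Goodbye=1, Hadoop=1]
        System.out.println("Map output for file02: " + fromFile02);

        List<Map.Entry<String, Integer>> all = new ArrayList<>(fromFile01);
        all.addAll(fromFile02);
        // prints Reduce output: {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
        System.out.println("Reduce output: " + reduce(all));
    }
}
```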
13
More input
More mappers
Combiner function after Map
More reducers
Partition function before Reduce
Focus on Map & Reduce
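A rough sketch of the two extra functions named on this slide: a combiner that pre-aggregates one mapper's output locally, and a partition function that decides which reducer receives each key. The class and method names are hypothetical. The partition formula mirrors the one used by Hadoop's default HashPartitioner, but it is applied here to plain Java strings, so the partition numbers it yields are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a combiner and a partition function (plain Java, no Hadoop).
public class CombinePartitionSketch {
    // Combiner: sums the counts of one mapper's words before anything is sent to reducers.
    public static Map<String, Integer> combine(List<String> mapperWords) {
        Map<String, Integer> local = new TreeMap<>();
        for (String word : mapperWords) {
            local.merge(word, 1, Integer::sum);
        }
        return local;
    }

    // Partition function: maps a key to a reducer index in [0, numReducers).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    public static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Output of the mapper that read file01, aggregated locally.
        Map<String, Integer> combined = combine(Arrays.asList("Hello", "World", "Bye", "World"));
        System.out.println("Combined: " + combined); // prints Combined: {Bye=1, Hello=1, World=2}

        // Every pair with the same key goes to the same reducer.
        int numReducers = 2; // assumed reducer count for the illustration
        for (String key : combined.keySet()) {
            System.out.println(key + " -> reducer " + partition(key, numReducers));
        }
    }
}
```

Because the partition depends only on the key, all partial counts for one word reach the same reducer regardless of which mapper produced them.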
14
Hadoop in Java (C++)
Runs in 3 modes
Local (Standalone) Mode
Pseudo-Distributed Mode
Fully-Distributed Mode
It is set up in Pseudo-Distributed Mode in our instance on the IBM cloud
15
Process
1. Start Hadoop service
2. Prepare input
3. Write your MapReduce program
4. Compile your program
5. Run your application with Hadoop
16
Start Hadoop service (format the NameNode on first use only; formatting erases existing HDFS data)
$ bin/hadoop namenode -format
$ bin/start-all.sh
Initialize filesystem
$ bin/hadoop fs -put localdir hinputdir
You can also use -get, -rm, -cat with fs
17
Compile your program & create jar
$ javac -classpath ${HADOOP}-core.jar -d wordcount_classes WordCount.java
$ jar -cvf wordcount.jar -C wordcount_classes/ .
Run your application with Hadoop
$ bin/hadoop jar wordcount.jar org.myorg.WordCount hinputdir houtputdir
18
void map(String name, String document):
  // name: document name
  // document: document contents
  for each word w in document:
    EmitIntermediate(w, "1");

void reduce(String word, Iterator partialCounts):
  // word: a word
  // partialCounts: a list of aggregated partial counts
  int result = 0;
  for each pc in partialCounts:
    result += ParseInt(pc);
  Emit(AsString(result));

Cited from Wikipedia
19
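// Nested inside the WordCount class.
// Requires: java.io.IOException, java.util.StringTokenizer,
//           org.apache.hadoop.io.*, org.apache.hadoop.mapred.*
public static class Map extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Called once per input line: key is the byte offset, value is the line text.
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      output.collect(word, one);  // emit <word, 1>
    }
  }
}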
20
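// Nested inside the WordCount class.
// Requires: java.io.IOException, java.util.Iterator,
//           org.apache.hadoop.io.*, org.apache.hadoop.mapred.*
public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  // Called once per distinct word with all of its partial counts.
  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));  // emit <word, total>
  }
}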
21
Configurations & Main class
Leave the remaining work to the Hadoop MapReduce framework
22
Hadoop Introduction
Connections to the IR book
MapReduce Overview
E.g. WordCount
Environment configuration
Writing your MapReduce application
23
Hadoop Project
http://hadoop.apache.org/
MapReduce in Hadoop
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html
MapReduce: Simplified Data Processing on Large Clusters
http://portal.acm.org/citation.cfm?id=1327452.1327492&coll=GUIDE&dl=&idx=J79&part=magazine&WantType=Magazines&title=Communications%20of%20the%20ACM
Hadoop Single-Node Setup
http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html
Who uses Hadoop
http://wiki.apache.org/hadoop/PoweredBy