IT0483 - PRINCIPLES OF CLOUD COMPUTING, N. ARIVAZHAGAN
Unit 5. CASE STUDY: Amazon Case Study. Introduction to MapReduce: discussion of the Google paper, GFS, HDFS, and the Hadoop framework. 3/25/2017
AMAZON WEB SERVICES
Using Amazon Web Services, an e-commerce web site can weather unforeseen demand with ease; a pharmaceutical company can "rent" computing power to execute large-scale simulations; a media company can serve unlimited videos, music, and more; and an enterprise can deploy bandwidth-consuming services and training to its mobile workforce.
BENEFITS
- No contracts or commitments
- Pay as you go
- Transparent pricing
- Better economics
- Better use of your time
- Better environmental impact
MAP REDUCE
- The idea of Map and Reduce is more than 40 years old, present in all functional programming languages (see, e.g., APL, Lisp, and ML).
- An alternate name for Map is Apply-All.
- Higher-order functions take function definitions as arguments, or return a function as output.
- Map and Reduce are higher-order functions.
MAP REDUCE
Let F(x: int) return r: int, and let V be an array of integers.
W = map(F, V) means W[i] = F(V[i]) for all i, i.e., apply F to every element of V.
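As a small sketch of this definition, Java's streams provide the same higher-order map; the choice of F here (squaring) is purely illustrative:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class MapDemo {
    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};                          // V
        // W[i] = F(V[i]) with F(x) = x * x
        int[] w = IntStream.of(v).map(x -> x * x).toArray();
        System.out.println(Arrays.toString(w));          // [1, 4, 9, 16]
    }
}
```

Note that map applies F independently to each element, which is exactly what makes it easy to parallelize.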
reduce: A Higher-Order Function
reduce is also known as fold, accumulate, compress, or inject. Reduce/fold takes in a function and folds it in between the elements of a list.
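For illustration, folding + between the elements of [1, 2, 3, 4, 5] with Java streams (the identity 0 is the starting accumulator):

```java
import java.util.stream.IntStream;

public class ReduceDemo {
    public static void main(String[] args) {
        // fold (+) between the elements: 0 + 1 + 2 + 3 + 4 + 5
        int sum = IntStream.rangeClosed(1, 5).reduce(0, Integer::sum);
        System.out.println(sum); // 15
    }
}
```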
Map/Reduce Implementation Idea
- MapReduce and a distributed file system form a framework for large commodity clusters.
- Master/slave relationship: the JobTracker handles all scheduling and data flow between TaskTrackers; a TaskTracker handles all worker tasks on a node; an individual worker task runs a map or reduce operation.
- Integrates with HDFS for data locality.
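In classic (Hadoop 1.x) deployments, the JobTracker location is a cluster-wide setting that every TaskTracker reads; a minimal mapred-site.xml sketch (the hostname master and port 9001 are assumptions for illustration):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- TaskTrackers contact the JobTracker at this host:port -->
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
```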
Hadoop Supported File Systems
- HDFS: Hadoop's own file system.
- Amazon S3 file system: targeted at clusters hosted on the Amazon Elastic Compute Cloud server-on-demand infrastructure; not rack-aware.
- CloudStore (previously Kosmos Distributed File System): like HDFS, this is rack-aware.
- FTP file system: stored on remote FTP servers.
- Read-only HTTP and HTTPS file systems.
HDFS: Hadoop Distributed File System
- Designed to scale to petabytes of storage, and to run on top of the file systems of the underlying OS.
- Master ("NameNode") handles replication, deletion, and creation.
- Slave ("DataNode") handles data retrieval.
- Files are stored in many blocks; each block has a block id.
- A block id is associated with several nodes (hostname:port), depending on the level of replication.
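A toy sketch of that block-to-location mapping (the block ids, hostnames, and ports below are made up for illustration; the real NameNode keeps this metadata in memory):

```java
import java.util.List;
import java.util.Map;

public class BlockMapSketch {
    public static void main(String[] args) {
        // block id -> replica locations (hostname:port), replication factor 3
        Map<Long, List<String>> blocks = Map.of(
            101L, List.of("dn1:50010", "dn2:50010", "dn3:50010"),
            102L, List.of("dn2:50010", "dn4:50010", "dn5:50010"));
        // to read block 101, a client may contact any of its replicas
        System.out.println(blocks.get(101L).size()); // 3
    }
}
```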
Hadoop vs. MapReduce
- MapReduce is also the name of a framework developed by Google.
- Hadoop was initially developed at Yahoo and is now part of the Apache group.
- Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
                          MapReduce (Google)   Hadoop
Organization              Google               Yahoo/Apache
Implementation            C++                  Java
Distributed file system   GFS                  HDFS
Database                  Bigtable             HBase
Distributed lock manager  Chubby               ZooKeeper
A Simple Hadoop Example: WordCount
http://wiki.apache.org/hadoop/WordCount
Word Count Example
- Read text files and count how often words occur.
- The input is text files; the output is a text file, with each line: word, tab, count.
- Map: produce a pair (word, 1) for each word occurrence.
- Reduce: for each word, sum up the counts.
WordCount Overview (the left-hand numbers are source line numbers; "..." marks elided code shown on the next slides)

 3 import ...
12 public class WordCount {
14   public static class Map extends MapReduceBase implements Mapper ... {
17     public void map ...
26   }
28   public static class Reduce extends MapReduceBase implements Reducer ... {
29     public void reduce ...
37   }
39   public static void main(String[] args) throws Exception {
       JobConf conf = new JobConf(WordCount.class);
       FileInputFormat.setInputPaths(conf, new Path(args[0]));
       FileOutputFormat.setOutputPath(conf, new Path(args[1]));
55     JobClient.runJob(conf);
57   }
59 }
wordCount Mapper

public static class Map extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      output.collect(word, one);  // emit (word, 1) for each token
    }
  }
}
wordCount Reducer

public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();  // sum all counts for this word
    }
    output.collect(key, new IntWritable(sum));
  }
}
wordCount JobConf

JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);

conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
WordCount main

public static void main(String[] args) throws Exception {
  JobConf conf = new JobConf(WordCount.class);
  conf.setJobName("wordcount");

  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(IntWritable.class);

  conf.setMapperClass(Map.class);
  conf.setCombinerClass(Reduce.class);
  conf.setReducerClass(Reduce.class);

  conf.setInputFormat(TextInputFormat.class);
  conf.setOutputFormat(TextOutputFormat.class);

  FileInputFormat.setInputPaths(conf, new Path(args[0]));
  FileOutputFormat.setOutputPath(conf, new Path(args[1]));

  JobClient.runJob(conf);
}
Invocation of wordcount
/usr/local/bin/hadoop dfs -mkdir <hdfs-dir>
/usr/local/bin/hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>
/usr/local/bin/hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>
GFS
Google File System (GFS or GoogleFS) is a proprietary distributed file system developed by Google Inc. for its own use. It is designed to provide efficient, reliable access to data using large clusters of commodity hardware.
HDFS
Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations.
Review Questions

Part A
1) What is Apache Hadoop?
2) Mention the uses of Amazon EC2 cloud computing services.
3) What is meant by MapReduce?
4) Mention the hot spots of the MapReduce framework.
5) What are the different steps in the MapReduce framework?
6) What is the use of the map partition function?
7) Mention the uses of the MapReduce function.
8) Differentiate the JobTracker and the TaskTracker.
9) What algorithm is used in the scheduling of Hadoop?
10) What is meant by the fair scheduler? Mention its uses.
11) What is meant by the capacity scheduler?
12) Mention any four applications of Hadoop.
13) Who are the main users of Hadoop?
14) Mention any four commercially supported Hadoop-related products.
Part B
1) Draw and explain the Hadoop architecture.
2) Explain the Hadoop file system.
3) Explain the Amazon EC2 cloud computing case study for a financial organization.
4) Explain the concept of MapReduce.
5) Explain the concept of the Google File System.