Published by Ella Rice. Modified over 6 years ago.
1
Ubiquitous Application Systems: Lab Guide
Artificial Intelligence Laboratory, 이남기 (beohemian@gmail.com)
2
Environment: Cloudera QuickStart VM with CDH 5.4.2
Download guide:
%BF%BC%ED%84%B0%EC%8A%A4+%EC%9D%91%EC%9A%A9%EC%8B%9C% EC%8A%A4%ED%85%9C&uid=660
3
Contents
Using HDFS: How To Use, How To Upload Files, How To View and Manipulate Files, Exercise
Running a MapReduce Job: WordCount: Goal, MapReduce Recap, Code Review, Run the WordCount Program
Extra Exercise: Number of Connections per Hour: Meaningful Data from ‘Access Log’, Foundations of Regular Expressions, Run the MapReduce Job
Importing Data With Sqoop: Review MySQL
4
Using HDFS With Exercise
5
Using HDFS
How to Use HDFS
How to Upload Files
How to View and Manipulate Files
6
Using HDFS – How To Use (1)
Running hadoop fs with no arguments prints a help message describing all the commands associated with HDFS:
$ hadoop fs
7
Using HDFS – How To Use (2)
List the contents of directories in HDFS:
$ hadoop fs -ls /
$ hadoop fs -ls /user
$ hadoop fs -ls /user/cloudera
8
Exercise How To Use
9
Using HDFS – How To Upload File (1)
Extract ‘shakespeare.tar.gz’:
$ cd ~/training_materials/developer/data
$ tar zxvf shakespeare.tar.gz
10
Using HDFS – How To Upload File (2)
Copy the ‘shakespeare’ directory into HDFS:
$ hadoop fs -put shakespeare /user/cloudera/shakespeare
11
Exercise How To Upload
12
Using HDFS – How To View and Manipulate Files (1)
List the directory, then remove a file from it:
$ hadoop fs -ls shakespeare
$ hadoop fs -rm shakespeare/glossary
13
Using HDFS – How To View and Manipulate Files (2)
Print the last 50 lines of Henry IV:
$ hadoop fs -cat shakespeare/histories \
    | tail -n 50
14
Using HDFS – How To View and Manipulate Files (3)
Download a file to the local file system and view it:
$ hadoop fs -get shakespeare/poems \
    ~/shakepoems.txt
$ less ~/shakepoems.txt
To see the other available commands, run:
$ hadoop fs
15
Exercise How To View and Manipulate Files
16
Running a MapReduce Job
With Exercise
17
Running a MapReduce Job
Goal
MapReduce Recap
Code Review
Run the WordCount Program
18
Running a MapReduce Job – Goal
Goal: submit a MapReduce job that counts the number of occurrences of every word in the works of Shakespeare.
Input: works of Shakespeare (excerpt)
ALL'S WELL THAT ENDS WELL
DRAMATIS PERSONAE
KING OF FRANCE (KING:)
DUKE OF FLORENCE (DUKE:)
BERTRAM Count of Rousillon.
LAFEU an old lord.
PAROLLES a follower of Bertram.
Steward |
        | servants to the Countess of Rousillon.
Clown   |
A Page. (Page:)
COUNTESS OF ROUSILLON mother to Bertram. (COUNTESS:)
HELENA a gentlewoman protected by the Countess.
…
Run WordCount. Final result (excerpt):
Key Value
A 2027
ADAM 16
AARON 72
ABATE 1
ABIDE
ABOUT 18
ACHIEVE
ACKNOWN
…
19
Running a MapReduce Job – Mapper
20
Running a MapReduce Job – Shuffle & Sort
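The Shuffle & Sort step can be sketched in plain Java (no Hadoop involved) to show what the framework does between the map and reduce phases: it groups the (word, 1) pairs by key and hands them over with the keys in sorted order. This is a single-JVM simplification; the class and method names (ShuffleSortSketch, shuffleAndSort) are hypothetical, and the real framework partitions, sorts, and merges the data across many machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-JVM sketch of shuffle & sort (hypothetical names, not Hadoop code):
// group (key, value) pairs by key and return the keys in sorted order.
public class ShuffleSortSketch {

    public static Map<String, List<Integer>> shuffleAndSort(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // TreeMap keeps keys sorted
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // (word, 1) pairs as a word-count mapper would emit them for two short lines
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"the cat sat on the mat", "the aardvark sat on the sofa"}) {
            for (String word : line.split("\\W+")) {
                if (!word.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
                }
            }
        }
        System.out.println(shuffleAndSort(pairs));
    }
}
```

Running main prints {aardvark=[1], cat=[1], mat=[1], on=[1, 1], sat=[1, 1], sofa=[1], the=[1, 1, 1, 1]}: one sorted key per word, each with the list of values the reducer will receive.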
21
Running a MapReduce Job – SumReducer
22
Running a MapReduce Job – WordCount Code Review
WordCount.java: a simple MapReduce driver class
WordMapper.java: a Mapper class for the job
SumReducer.java: a Reducer class for the job
23
Running a MapReduce Job – Code Review : WordCount.java
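import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.printf("Usage: WordCount <input dir> <output dir>\n");
      System.exit(-1);
    }
    // Job.getInstance() replaces the deprecated new Job() constructor
    Job job = Job.getInstance();
    job.setJarByClass(WordCount.class);
    job.setJobName("Word Count");
    // Input and output directories come from the command line
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);
    // Final output types (the mapper emits the same types)
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Run the job, wait for it, and exit 0 on success, 1 on failure
    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
}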
24
Running a MapReduce Job – Code Review : WordMapper.java
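Example text file:
the cat sat on the mat
the aardvark sat on the sofa

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    // Split on runs of non-word characters and emit (word, 1) for each word
    for (String word : line.split("\\W+")) {
      if (word.length() > 0) {
        context.write(new Text(word), new IntWritable(1));
      }
    }
  }
}

Map input: with the default TextInputFormat, Key (LongWritable) is the byte offset of the line and Value (Text) is the line itself ("the cat sat on the mat", "the aardvark sat on the sofa").
Written to the Context object:
Key (Text) Value (IntWritable)
the 1
cat 1
…
The map method runs once for each line of text in the input file.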
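To try the mapper's tokenizing logic without a cluster, here is a plain-Java sketch that mirrors the body of map(): the same split("\\W+") call and empty-token check, but returning (word, 1) pairs in a list instead of writing Hadoop Text/IntWritable objects to a Context. WordMapSketch and map(String) are hypothetical names used only for this illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of WordMapper's logic (hypothetical names, no Hadoop types).
public class WordMapSketch {

    /** Returns one (word, 1) pair per non-empty token in the line. */
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\W+")) {
            // split yields a leading empty string when the line starts with a non-word character
            if (word.length() > 0) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("the cat sat on the mat")); // [the=1, cat=1, sat=1, on=1, the=1, mat=1]
    }
}
```

Like the original mapper, this is case-sensitive: "The" and "the" would be emitted as different keys.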
25
Running a MapReduce Job – Code Review : SumReducer.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int wordCount = 0;
    // Add up all the 1s emitted for this word
    for (IntWritable value : values) {
      wordCount += value.get();
    }
    context.write(key, new IntWritable(wordCount));
  }
}

Input data (from shuffle & sort):
Key Values
aardvark [1]
cat [1]
mat [1]
on [1,1]
sat [1,1]
sofa [1]
the [1,1,1,1]
…
Output data:
Key Value
aardvark 1
cat 1
mat 1
on 2
sat 2
sofa 1
the 4
…
The reduce method runs once for each key received from the shuffle and sort phase of the MapReduce framework.
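The same summation can be sketched in plain Java, with Integer in place of IntWritable and a return value in place of context.write, so it runs without Hadoop. SumReduceSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of SumReducer's logic (hypothetical name, no Hadoop types).
public class SumReduceSketch {

    /** Adds up all the counts received for one key. */
    public static int reduce(Iterable<Integer> values) {
        int wordCount = 0;
        for (int value : values) {
            wordCount += value;
        }
        return wordCount;
    }

    public static void main(String[] args) {
        // Values for the key "the" after shuffle & sort in the example
        List<Integer> theValues = Arrays.asList(1, 1, 1, 1);
        System.out.println("the " + reduce(theValues)); // prints "the 4"
    }
}
```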
26
Running a MapReduce Job – Run WordCount in HDFS
Compile the three Java classes and collect the compiled class files into a JAR file:
$ cd ~/{Your Workspace}
$ javac -classpath `hadoop classpath` *.java
$ jar cvf wc.jar *.class
27
Running a MapReduce Job – Run WordCount in HDFS
Submit a MapReduce job to Hadoop using your JAR file to count the occurrences of each word in Shakespeare:
$ hadoop jar wc.jar WordCount shakespeare \
    wordcounts
wc.jar: the JAR file
WordCount: name of the class containing the main method (the driver class)
shakespeare: input directory
wordcounts: output directory
28
Exercise MapReduce Job : WordCount
29
Extra Exercise MapReduce Job : Number of Connections per Hour
30
Extra Exercise – Meaningful data from ‘Access Log’ data
Let's extract meaningful data from ‘Access Log’ data.
[15/Jul/2009:20:50: ] "GET /assets/img/closelabel.gif HTTP/1.1" 304 -
[15/Jul/2009:20:50: ] "GET /assets/img/loading.gif HTTP/1.1" 304 -
[15/Jul/2009:20:50: ] "GET /favicon.ico HTTP/1.1"
[15/Jul/2009:21:04: ] "GET / HTTP/1.1"
[15/Jul/2009:21:04: ] "GET /favicon.ico HTTP/1.1"
[15/Jul/2009:21:06: ] "GET / HTTP/1.1"
[15/Jul/2009:21:06: ] "GET /favicon.ico HTTP/1.1"
[15/Jul/2009:21:12: ] "GET / HTTP/1.1"
[15/Jul/2009:21:12: ] "GET /favicon.ico HTTP/1.1"
[15/Jul/2009:21:13: ] "GET / HTTP/1.1"
…
36
Extra Exercise – Meaningful data from ‘Access Log’ data
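frank [10/Oct/2000:13:55: ] "GET /apache_pb.gif HTTP/1.0"
Fields of an access log line, in order:
IP Address: the IP address of the client that made the request.
-: the client identity field; a hyphen means this information is not available.
userid: the userid of the person requesting the document (here, frank).
[The time that the request was received.]
"The request line from the client is given in double quotes."
status code: the status code that the server sends back to the client.
size of the object returned to the client (shown as - when no content is returned, as in the 304 responses).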
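As a sketch of how these fields can be pulled apart programmatically, the class below matches one full access log line against a regular expression written for this example (an assumption, not part of the exercise code). The sample input is the complete form of the line above as it appears in the Apache HTTP Server log documentation, including the IP address, seconds, time zone, status code, and size. AccessLogFields and parse are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: split one access log line into its seven fields (hypothetical names).
public class AccessLogFields {

    // IP address, identity, userid, [time], "request", status code, size
    private static final Pattern LOG_LINE = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)$");

    /** Returns the seven fields in order, or null if the line does not match. */
    public static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[7];
        for (int i = 0; i < 7; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        String[] names = {"IP address", "identity", "userid", "time", "request", "status code", "size"};
        String[] fields = parse(line);
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + fields[i]);
        }
    }
}
```

Because the size group is (\S+), a line ending in "304 -" also matches, with "-" as the size.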
39
Extra Exercise – Meaningful data from ‘Access Log’ data
Using the access log lines above: how many connections are made in each hour?
40
Extra Exercise – Regular Expression
[15/Jul/2009:20:50: ] "GET /assets/img/closelabel.gif HTTP/1.1" 304 -
Using regular expressions:
\d : Matches any digit character (0-9). Example: +1-(444)
\w : Matches any word character (letter, digit, or underscore). Example: Hello World!
\s : Matches any whitespace character. Example: Hello World!
+ : Matches 1 or more of the preceding token. Example: \w+ on Hello World!
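The four tokens can be tried directly with java.util.regex. This small sketch collects every match of a pattern in a string, using the example strings listed above; RegexTokensDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: what \d, \w, \s and \w+ match (hypothetical names).
public class RegexTokensDemo {

    /** Returns every substring of input matched by regex, in order. */
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d", "+1-(444)"));            // [1, 4, 4, 4]
        System.out.println(findAll("\\w", "Hello World!"));        // [H, e, l, l, o, W, o, r, l, d]
        System.out.println(findAll("\\s", "Hello World!").size()); // 1
        System.out.println(findAll("\\w+", "Hello World!"));       // [Hello, World]
    }
}
```

The last line shows the effect of +: instead of one character per match, each run of word characters becomes a single match.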
41
Extra Exercise – Regular Expression
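[15/Jul/2009:20:50: ]
\[\d+\/\w+\/\d+:\d+:\d+:\d+\s+-\w+\]
\[ : the opening bracket
\d+ : day (15)
\/ : slash
\w+ : month (Jul)
\/ : slash
\d+ : year (2009)
:\d+ : hour (20)
:\d+ : minute (50)
:\d+ : second
\s+ : whitespace before the time zone
-\w+ : time zone offset
\] : the closing bracket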
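The same pattern can be used in Java to pull out the hour, which is the piece the hourly-connection exercise needs. This sketch adds one assumption to the slide's regex: parentheses around the hour field to make it a capturing group. The sample line's seconds and UTC offset (-0700) are hypothetical, since they are not shown on the slide. Note that -\w+ only matches negative offsets; a log written with a positive offset such as +0900 would not match as written. TimestampHour and hourOf are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the slide's timestamp regex with a capturing group around the hour.
public class TimestampHour {

    // Same pattern as on the slide, except (\d+) now captures the hour.
    // "-\w+" only accepts negative UTC offsets such as -0700.
    private static final Pattern TIMESTAMP =
            Pattern.compile("\\[\\d+\\/\\w+\\/\\d+:(\\d+):\\d+:\\d+\\s+-\\w+\\]");

    /** Returns the hour of the first timestamp in line, or -1 if none is found. */
    public static int hourOf(String line) {
        Matcher m = TIMESTAMP.matcher(line);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        // Hypothetical log line: seconds and UTC offset are made up for illustration.
        String line = "[15/Jul/2009:20:50:00 -0700] \"GET /favicon.ico HTTP/1.1\"";
        System.out.println(hourOf(line)); // 20
    }
}
```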
52
Extra Exercise [15/Jul/2009:20:50: ]
53
Extra Exercise – Run MapReduce
Input: [15/Jul/2009:20:50: ] [15/Jul/2009:20:50: ] [15/Jul/2009:20:50: ] [15/Jul/2009:20:50: ] [15/Jul/2009:21:04: ] [15/Jul/2009:21:04: ] [15/Jul/2009:21:06: ] [15/Jul/2009:21:06: ] [15/Jul/2009:21:12: ] [15/Jul/2009:21:12: ] [15/Jul/2009:21:13: ] …
Mapper output (hour, 1):
Key Value
20 1
21 1
…
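A plain-Java sketch of the mapper step for this exercise: for each log line, emit the pair (hour, 1), and emit nothing for lines without a matching timestamp. HourMapSketch is a hypothetical name, the regex is the slide's pattern with a capturing group added around the hour, and the sample line's seconds and offset are made up.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a per-line mapper for hourly connections (hypothetical names, not Hadoop code).
public class HourMapSketch {

    // Slide's pattern with (\d+) capturing the hour
    private static final Pattern TIMESTAMP =
            Pattern.compile("\\[\\d+\\/\\w+\\/\\d+:(\\d+):\\d+:\\d+\\s+-\\w+\\]");

    /** Returns the (hour, 1) pair for this line, or null when no timestamp is found. */
    public static Map.Entry<String, Integer> map(String line) {
        Matcher m = TIMESTAMP.matcher(line);
        if (!m.find()) {
            return null; // a real mapper would simply write nothing for this line
        }
        return new AbstractMap.SimpleEntry<>(m.group(1), 1);
    }

    public static void main(String[] args) {
        // Hypothetical line: seconds and UTC offset are made up for illustration.
        System.out.println(map("[15/Jul/2009:20:50:00 -0700] \"GET / HTTP/1.1\" 200 -")); // 20=1
    }
}
```

The key is kept as the two-digit string from the log (for example "20"), so sorting the keys as text during shuffle & sort also puts the hours in numeric order.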
54
Extra Exercise – Run MapReduce
Input: [15/Jul/2009:20:50: ] [15/Jul/2009:20:50: ] [15/Jul/2009:20:50: ] [15/Jul/2009:20:50: ] [15/Jul/2009:21:04: ] [15/Jul/2009:21:04: ] [15/Jul/2009:21:06: ] [15/Jul/2009:21:06: ] [15/Jul/2009:21:12: ] [15/Jul/2009:21:12: ] [15/Jul/2009:21:13: ] …
Shuffle & Sort output:
Key Values
20 [1,1,1]
21 [1,1,1,1,1, …]
…
55
Extra Exercise – Run MapReduce
Input: [15/Jul/2009:20:50: ] [15/Jul/2009:20:50: ] [15/Jul/2009:20:50: ] [15/Jul/2009:20:50: ] [15/Jul/2009:21:04: ] [15/Jul/2009:21:04: ] [15/Jul/2009:21:06: ] [15/Jul/2009:21:06: ] [15/Jul/2009:21:12: ] [15/Jul/2009:21:12: ] [15/Jul/2009:21:13: ] …
Reducer output:
Key Value
20 87681
21 85914
…
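Putting the three stages together on a single machine gives a compact sketch of what the whole job computes: extract each line's hour, group by hour in sorted order, and sum the ones. HourlyConnections and countPerHour are hypothetical names, and the input lines are made up (the slide's timestamps omit seconds and offsets). A TreeMap stands in for shuffle & sort and Map.merge for the reducer's summation; this is not how Hadoop executes the job, only what it produces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Single-JVM sketch of the hourly-connection job (hypothetical names).
public class HourlyConnections {

    // Slide's pattern with (\d+) capturing the hour
    private static final Pattern TIMESTAMP =
            Pattern.compile("\\[\\d+\\/\\w+\\/\\d+:(\\d+):\\d+:\\d+\\s+-\\w+\\]");

    public static Map<Integer, Integer> countPerHour(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>(); // sorted by hour, like shuffle & sort
        for (String line : lines) {
            Matcher m = TIMESTAMP.matcher(line);        // "map": find the hour
            if (m.find()) {
                int hour = Integer.parseInt(m.group(1));
                counts.merge(hour, 1, Integer::sum);    // "reduce": add this line's 1
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical lines: seconds and UTC offsets are made up for illustration.
        List<String> lines = Arrays.asList(
                "[15/Jul/2009:20:50:01 -0700] \"GET /favicon.ico HTTP/1.1\"",
                "[15/Jul/2009:21:04:10 -0700] \"GET / HTTP/1.1\"",
                "[15/Jul/2009:21:06:22 -0700] \"GET / HTTP/1.1\"");
        System.out.println(countPerHour(lines)); // {20=1, 21=2}
    }
}
```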
56
Extra Exercise – Run MapReduce
Final Output Key Value 119827 1 165533 2 246174 3 273089 4 273020 5 264181 6 294837 7 312028 8 327732 9 300460 …
57
Extra Exercise
58
Exercise MapReduce Job : Number of Connections per Hour
59
Importing Data With Sqoop
Review MySQL and Exercise
60
Importing Data With Sqoop
Log on to MySQL:
$ mysql --user=root \
    --password=cloudera
Select a database:
> use retail_db;
Show databases:
> show databases;
61
Importing Data With Sqoop – Review MySQL (1)
Log on to MySQL:
$ mysql --user=root \
    --password=cloudera
Show databases:
> show databases;
Select a database:
> use retail_db;
Show tables:
> show tables;
62
Importing Data With Sqoop – Review MySQL (2)
Review the ‘customers’ table schema:
> DESCRIBE customers;
63
Importing Data With Sqoop – Review MySQL (3)
Review the ‘customers’ table:
> DESCRIBE customers;
…
> SELECT * FROM customers LIMIT 5;
64
Importing Data With Sqoop – How To Use (1)
List the databases (schemas) in your database server:
$ sqoop list-databases \
    --connect jdbc:mysql://localhost \
    --username root --password cloudera
List the tables in the ‘retail_db’ database:
$ sqoop list-tables \
    --connect jdbc:mysql://localhost/retail_db \
    --username root --password cloudera
65
Importing Data With Sqoop – How To Use (2)
Import the ‘customers’ table into HDFS:
$ sqoop import \
    --connect jdbc:mysql://localhost/retail_db \
    --table customers --fields-terminated-by '\t' \
    --username root --password cloudera
Verify that the command has worked:
$ hadoop fs -ls customers
$ hadoop fs -tail customers/part-m-00000