Ubiquitous Application Systems: Lab Guide. Namgi Lee (이남기), Artificial Intelligence Lab (beohemian@gmail.com)


1 Ubiquitous Application Systems: Lab Guide
Namgi Lee (이남기), Artificial Intelligence Lab (beohemian@gmail.com)

2 Environment: Cloudera QuickStart VM 5.4.2. A download guide is available at:
%BF%BC%ED%84%B0%EC%8A%A4+%EC%9D%91%EC%9A%A9%EC%8B%9C% EC%8A%A4%ED%85%9C&uid=660

3 Contents
Using HDFS: How To Use; How To Upload File; How To View and Manipulate Files; Exercise
Running MapReduce Job (WordCount): Goal; Remind MapReduce; Code Review; Run WordCount Program
Extra Exercise (Number of Connections per Hour): Meaningful Data from 'Access Log'; Foundation of Regular Expressions; Run MapReduce Job
Importing Data With Sqoop: Review MySQL

4 Using HDFS With Exercise

5 Using HDFS
How To Use HDFS
How To Upload File
How To View and Manipulate Files

6 Using HDFS – How To Use (1)
Running hadoop fs with no arguments prints a help message describing all the commands associated with HDFS:
$ hadoop fs

7 Using HDFS – How To Use (2)
View the contents of directories in HDFS:
$ hadoop fs -ls /
$ hadoop fs -ls /user
$ hadoop fs -ls /user/cloudera

8 Exercise How To Use

9 Using HDFS – How To Upload File (1)
Unzip 'shakespeare.tar.gz':
$ cd ~/training_materials/developer/data
$ tar zxvf shakespeare.tar.gz

10 Using HDFS – How To Upload File (2)
Put the 'shakespeare' directory into HDFS:
$ hadoop fs -put shakespeare /user/cloudera/shakespeare

11 Exercise How To Upload

12 Using HDFS – How To View and Manipulate Files (1)
List the directory, then remove a file from it:
$ hadoop fs -ls shakespeare
$ hadoop fs -rm shakespeare/glossary

13 Using HDFS – How To View and Manipulate Files (2)
Print the last 50 lines of Henry IV:
$ hadoop fs -cat shakespeare/histories \
    | tail -n 50

14 Using HDFS – How To View and Manipulate Files (3)
Download a file and manipulate it locally:
$ hadoop fs -get shakespeare/poems \
    ~/shakepoems.txt
$ less ~/shakepoems.txt
To see the other available commands:
$ hadoop fs

15 Exercise How To View and Manipulate Files

16 Running a MapReduce Job
With Exercise

17 Running a MapReduce Job
Goal
Remind MapReduce
Code Review
Run WordCount Program

18 Running a MapReduce Job – Goal
Run WordCount: we will submit a MapReduce job to count the number of occurrences of every word in the works of Shakespeare.
Works of Shakespeare (input):
ALL'S WELL THAT ENDS WELL
DRAMATIS PERSONAE
KING OF FRANCE (KING:)
DUKE OF FLORENCE (DUKE:)
BERTRAM, Count of Rousillon.
LAFEU, an old lord.
PAROLLES, a follower of Bertram.
Steward |
Clown   | servants to the Countess of Rousillon.
A Page. (Page:)
COUNTESS OF ROUSILLON, mother to Bertram. (COUNTESS:)
HELENA, a gentlewoman protected by the Countess.
Final result (Key, Value):
A        2027
ADAM     16
AARON    72
ABATE    1
ABIDE
ABOUT    18
ACHIEVE
ACKNOWN

19 Running a MapReduce Job – Mapper

20 Running a MapReduce Job – Shuffle & Sort

21 Running a MapReduce Job – SumReducer

22 Running a MapReduce Job – WordCount Code Review
WordCount.java: the simple MapReduce driver class
WordMapper.java: the Mapper class for the job
SumReducer.java: the Reducer class for the job

23 Running a MapReduce Job – Code Review : WordCount.java
public class WordCount {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.printf("Usage: WordCount <input dir> <output dir>\n");
      System.exit(-1);
    }
    Job job = new Job();
    job.setJarByClass(WordCount.class);
    job.setJobName("Word Count");
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
}

24 Running a MapReduce Job – Code Review : WordMapper.java
Ex) Text file:
the cat sat on the mat
the aardvark sat on the sofa

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    for (String word : line.split("\\W+")) {
      if (word.length() > 0) {
        context.write(new Text(word), new IntWritable(1));
      }
    }
  }
}

Input: Key (LongWritable) is the byte offset; Value (Text) is one line, e.g. "the cat sat on the mat".
Written to the Context object: Key (Text) / Value (IntWritable) pairs, e.g. (the, 1), (cat, 1), ...
The map method runs once for each line of text in the input file.
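The tokenization the mapper relies on can be seen in isolation with plain Java (a standalone sketch, no Hadoop required; the class name is hypothetical): String.split with the regex "\\W+" breaks a line on runs of non-word characters, and the length check drops any empty token that leading punctuation can produce.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of WordMapper's tokenization step, outside of Hadoop.
public class TokenizeDemo {
    public static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {   // same guard as in WordMapper.map
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [the, cat, sat, on, the, mat]
        System.out.println(tokenize("the cat sat on the mat"));
    }
}
```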

25 Running a MapReduce Job – Code Review : SumReducer.java
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int wordCount = 0;
    for (IntWritable value : values) {
      wordCount += value.get();
    }
    context.write(key, new IntWritable(wordCount));
  }
}

Input data (Key, Values):      Output data (Key, Value):
aardvark [1]                   aardvark 1
cat      [1]                   cat      1
mat      [1]                   mat      1
on       [1,1]                 on       2
sat      [1,1]                 sat      2
sofa     [1]                   sofa     1
the      [1,1,1,1]             the      4

The reduce method runs once for each key received from the shuffle and sort phase of the MapReduce framework.
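Putting the mapper's and reducer's logic together, the whole pipeline can be simulated in plain Java. This is a sketch under the assumption that a sorted in-memory map stands in for the shuffle-and-sort phase; the class name is hypothetical and not part of the lab code.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: simulate WordCount without Hadoop. The per-line loop plays the
// mapper's role (emit (word, 1)); the TreeMap groups and orders keys like
// shuffle-and-sort; merging with Integer::sum plays the reducer's role.
public class WordCountSim {
    public static Map<String, Integer> countWords(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {                       // one "map" call per line
            for (String word : line.split("\\W+")) {
                if (word.length() > 0) {                  // same guard as WordMapper
                    counts.merge(word, 1, Integer::sum);  // "reduce": sum the 1s
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"the cat sat on the mat", "the aardvark sat on the sofa"};
        // Prints {aardvark=1, cat=1, mat=1, on=2, sat=2, sofa=1, the=4}
        System.out.println(countWords(lines));
    }
}
```

The output matches the input/output tables on the SumReducer slide, which is a quick way to sanity-check the logic before submitting the real job.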

26 Running a MapReduce Job – Run WordCount in HDFS
Compile the three Java classes and package the compiled classes into a JAR file:
$ cd ~/{Your Workspace}
$ javac -classpath `hadoop classpath` *.java
$ jar cvf wc.jar *.class

27 Running a MapReduce Job – Run WordCount in HDFS
Submit a MapReduce job to Hadoop using your JAR file to count the occurrences of each word in Shakespeare:
$ hadoop jar wc.jar WordCount shakespeare \
    wordcounts
wc.jar: the JAR file
WordCount: the class containing the main method (the driver class)
shakespeare: the input directory
wordcounts: the output directory

28 Exercise MapReduce Job : WordCount

29 Extra Exercise MapReduce Job: Number of Connections per Hour

30 Extra Exercise – Meaningful data from ‘Access Log’ data
Let's extract meaningful data from 'Access Log' data:
[15/Jul/2009:20:50: ] "GET /assets/img/closelabel.gif HTTP/1.1" 304 -
[15/Jul/2009:20:50: ] "GET /assets/img/loading.gif HTTP/1.1" 304 -
[15/Jul/2009:20:50: ] "GET /favicon.ico HTTP/1.1"
[15/Jul/2009:21:04: ] "GET / HTTP/1.1"
[15/Jul/2009:21:04: ] "GET /favicon.ico HTTP/1.1"
[15/Jul/2009:21:06: ] "GET / HTTP/1.1"
[15/Jul/2009:21:06: ] "GET /favicon.ico HTTP/1.1"
[15/Jul/2009:21:12: ] "GET / HTTP/1.1"
[15/Jul/2009:21:12: ] "GET /favicon.ico HTTP/1.1"
[15/Jul/2009:21:13: ] "GET / HTTP/1.1"
…

31 Extra Exercise – Meaningful data from ‘Access Log’ data
frank [10/Oct/2000:13:55: ] "GET /apache_pb.gif HTTP/1.0" IP Address

32 Extra Exercise – Meaningful data from ‘Access Log’ data
frank [10/Oct/2000:13:55: ] "GET /apache_pb.gif HTTP/1.0" IP Address – userid

33 Extra Exercise – Meaningful data from ‘Access Log’ data
frank [10/Oct/2000:13:55: ] "GET /apache_pb.gif HTTP/1.0" IP Address – userid [The time that the request was received.]

34 Extra Exercise – Meaningful data from ‘Access Log’ data
frank [10/Oct/2000:13:55: ] "GET /apache_pb.gif HTTP/1.0" IP Address – userid [The time that the request was received.] “The request line from the client is given in double quotes.”

35 Extra Exercise – Meaningful data from ‘Access Log’ data
frank [10/Oct/2000:13:55: ] "GET /apache_pb.gif HTTP/1.0" IP Address – userid [The time that the request was received.] “The request line from the client is given in double quotes.” status code

36 Extra Exercise – Meaningful data from ‘Access Log’ data
frank [10/Oct/2000:13:55: ] "GET /apache_pb.gif HTTP/1.0"
Fields: IP address; userid; [the time that the request was received]; "the request line from the client, given in double quotes"; status code; size of the object returned to the client.



39 Extra Exercise – Meaningful data from ‘Access Log’ data
(The same access-log sample as above.)
How many connections occurred in each hour?

40 Extra Exercise – Regular Expression
[15/Jul/2009:20:50: ] "GET /assets/img/closelabel.gif HTTP/1.1" 304 -
Using regular expressions:
\d : matches any digit character (0-9)
\w : matches any word character (letters, digits, underscore)
\s : matches any whitespace character
+  : matches 1 or more of the preceding token, e.g. \w+ matches each word in "Hello World!"
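The tokens above can be tried out with Java's java.util.regex package. This is a minimal sketch, not part of the lab code: the class name is hypothetical, and the sample timestamp is assumed complete (the transcript's log lines have the seconds and timezone truncated).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: apply the slide's timestamp regex in Java, with a capture group
// added around the hour field so it can be pulled out of a matching line.
public class LogTimeDemo {
    static final Pattern TS =
        Pattern.compile("\\[\\d+/\\w+/\\d+:(\\d+):\\d+:\\d+\\s+-\\w+\\]");

    // Returns the hour field of the first timestamp found, or null.
    public static String extractHour(String logLine) {
        Matcher m = TS.matcher(logLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical completed timestamp; prints 20
        System.out.println(extractHour("[15/Jul/2009:20:50:12 -0400]"));
    }
}
```

Note that `/` needs no escaping in Java regex strings, while `[`, `]`, `\d`, `\w`, and `\s` are written with doubled backslashes inside a Java string literal.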

41 Extra Exercise – Regular Expression
The timestamp field [15/Jul/2009:20:50: ] can be matched with:
\[\d+\/\w+\/\d+:\d+:\d+:\d+\s+-\w+\]


52 Extra Exercise [15/Jul/2009:20:50: ]

53 Extra Exercise – Run MapReduce
[15/Jul/2009:20:50: ]
[15/Jul/2009:20:50: ]
[15/Jul/2009:20:50: ]
[15/Jul/2009:20:50: ]
[15/Jul/2009:21:04: ]
[15/Jul/2009:21:04: ]
[15/Jul/2009:21:06: ]
[15/Jul/2009:21:06: ]
[15/Jul/2009:21:12: ]
[15/Jul/2009:21:12: ]
[15/Jul/2009:21:13: ]
…
Mapper output (one pair per log line):
Key  Value
20   1
21   1

54 Extra Exercise – Run MapReduce
[15/Jul/2009:20:50: ]
[15/Jul/2009:20:50: ]
[15/Jul/2009:20:50: ]
[15/Jul/2009:20:50: ]
[15/Jul/2009:21:04: ]
[15/Jul/2009:21:04: ]
[15/Jul/2009:21:06: ]
[15/Jul/2009:21:06: ]
[15/Jul/2009:21:12: ]
[15/Jul/2009:21:12: ]
[15/Jul/2009:21:13: ]
…
Shuffle & Sort output:
Key  Values
20   [1,1,1]
21   [1,1,1,1,1, …]

55 Extra Exercise – Run MapReduce
[15/Jul/2009:20:50: ]
[15/Jul/2009:20:50: ]
[15/Jul/2009:20:50: ]
[15/Jul/2009:20:50: ]
[15/Jul/2009:21:04: ]
[15/Jul/2009:21:04: ]
[15/Jul/2009:21:06: ]
[15/Jul/2009:21:06: ]
[15/Jul/2009:21:12: ]
[15/Jul/2009:21:12: ]
[15/Jul/2009:21:13: ]
…
Reducer output:
Key  Value
20   87681
21   85914
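The map, shuffle-and-sort, and reduce phases above can be simulated end to end in plain Java, without Hadoop. This is only a sketch of the exercise's logic, not the lab's actual job: the class name is hypothetical, a sorted map stands in for shuffle-and-sort, and the sample timestamps are assumed complete (the transcript's are truncated).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract the hour from each timestamp ("map" emits (hour, 1)),
// let a sorted map group the keys (the shuffle-and-sort role), and sum
// the ones per key (the "reduce" role).
public class HourCountSim {
    static final Pattern HOUR =
        Pattern.compile("\\[\\d+/\\w+/\\d+:(\\d+):\\d+:\\d+\\s+-\\w+\\]");

    public static Map<String, Integer> countByHour(String[] logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = HOUR.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum); // sum the 1s per hour
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {                       // hypothetical completed timestamps
            "[15/Jul/2009:20:50:12 -0400]",
            "[15/Jul/2009:20:50:59 -0400]",
            "[15/Jul/2009:21:04:03 -0400]"
        };
        System.out.println(countByHour(lines));  // prints {20=2, 21=1}
    }
}
```

In the real job, the same extract-and-emit logic would live in the Mapper's map method and the summation in the Reducer, as in the WordCount code reviewed earlier.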

56 Extra Exercise – Run MapReduce
Final output:
Key  Value
0    119827
1    165533
2    246174
3    273089
4    273020
5    264181
6    294837
7    312028
8    327732
9    300460

57 Extra Exercise

58 Exercise MapReduce Job: Number of Connections per Hour

59 Importing Data With Sqoop
Review MySQL and Exercise

60 Importing Data With Sqoop
Log on to MySQL:
$ mysql --user=root \
    --password=cloudera
Show databases:
> show databases;
Select database:
> use retail_db;

61 Importing Data With Sqoop – Review MySQL (1)
Log on to MySQL:
$ mysql --user=root \
    --password=cloudera
Show databases:
> show databases;
Select database:
> use retail_db;
Show tables:
> show tables;

62 Importing Data With Sqoop – Review MySQL (2)
Review ‘customers’ table schema: > DESCRIBE customers;

63 Importing Data With Sqoop – Review MySQL (3)
Review ‘customers’ table: > DESCRIBE customers; … > SELECT * FROM customers LIMIT 5;

64 Importing Data With Sqoop – How To Use (1)
List the databases (schemas) in your database server:
$ sqoop list-databases \
    --connect jdbc:mysql://localhost \
    --username root --password cloudera
List the tables in the 'retail_db' database:
$ sqoop list-tables \
    --connect jdbc:mysql://localhost/retail_db \
    --username root --password cloudera

65 Importing Data With Sqoop – How To Use (2)
Import the 'customers' table into HDFS:
$ sqoop import \
    --connect jdbc:mysql://localhost/retail_db \
    --table customers --fields-terminated-by '\t' \
    --username root --password cloudera
Verify that the command worked:
$ hadoop fs -ls customers
$ hadoop fs -tail customers/part-m-00000

