유비쿼터스 응용시스템 (Ubiquitous Application Systems): Practice Guide
이남기, Artificial Intelligence Lab (beohemian@gmail.com)

Environment
Cloudera QuickStart VM 5.4.2
Download guide: http://ailab.ssu.ac.kr/rb/?c=8/29&cat=2015_2_%EC%9C%A0%EB%B9%84%EC%BF%BC%ED%84%B0%EC%8A%A4+%EC%9D%91%EC%9A%A9%EC%8B%9C%EC%8A%A4%ED%85%9C&uid=660

Contents
1. Using HDFS
   - How to use
   - How to upload files
   - How to view and manipulate files
   - Exercise
2. Running a MapReduce Job: WordCount
   - Goal
   - MapReduce recap
   - Code review
   - Run the WordCount program
3. Extra Exercise: Number of Connections per Hour
   - Meaningful data from the access log
   - Foundations of regular expressions
   - Run the MapReduce job
4. Importing Data with Sqoop
   - MySQL review

Using HDFS With Exercise

Using HDFS
- How to use HDFS
- How to upload files
- How to view and manipulate files

Using HDFS – How To Use (1) Run hadoop fs with no arguments to see a help message describing all the commands associated with HDFS: $ hadoop fs

Using HDFS – How To Use (2) List the contents of directories in HDFS:
$ hadoop fs -ls /
$ hadoop fs -ls /user
$ hadoop fs -ls /user/cloudera

Exercise How To Use

Using HDFS – How To Upload File (1) Unzip 'shakespeare.tar.gz':
$ cd ~/training_materials/developer/data
$ tar zxvf shakespeare.tar.gz

Using HDFS – How To Upload File (2) Upload the 'shakespeare' directory into HDFS:
$ hadoop fs -put shakespeare /user/cloudera/shakespeare

Exercise How To Upload

Using HDFS – How To View and Manipulate Files (1) List the directory, then remove a file from it:
$ hadoop fs -ls shakespeare
$ hadoop fs -rm shakespeare/glossary

Using HDFS – How To View and Manipulate Files (2) Print the last 50 lines of Henry IV:
$ hadoop fs -cat shakespeare/histories \
  | tail -n 50

Using HDFS – How To View and Manipulate Files (3) Download a file and examine it locally:
$ hadoop fs -get shakespeare/poems \
  ~/shakepoems.txt
$ less ~/shakepoems.txt
For the full list of commands:
$ hadoop fs

Exercise How To View and Manipulate Files

Running a MapReduce Job With Exercise

Running a MapReduce Job
- Goal
- MapReduce recap
- Code review
- Run the WordCount program

Running a MapReduce Job – Goal We will submit a MapReduce job (WordCount) to count the number of occurrences of every word in the works of Shakespeare.

Input: the works of Shakespeare
  ALL'S WELL THAT ENDS WELL
  DRAMATIS PERSONAE
  KING OF FRANCE (KING:)
  DUKE OF FLORENCE (DUKE:)
  BERTRAM Count of Rousillon.
  LAFEU an old lord.
  PAROLLES a follower of Bertram.
  Steward |
  Clown   | servants to the Countess of Rousillon.
  A Page. (Page:)
  COUNTESS OF ROUSILLON mother to Bertram. (COUNTESS:)
  HELENA a gentlewoman protected by the Countess.
  …

Final result:
  Key      Value
  A        2027
  ADAM     16
  AARON    72
  ABATE    1
  ABOUT    18
  …

Running a MapReduce Job – Mapper

Running a MapReduce Job – Shuffle & Sort

Running a MapReduce Job – SumReducer

Running a MapReduce Job – WordCount Code Review
- WordCount.java: a simple MapReduce driver class
- WordMapper.java: the Mapper class for the job
- SumReducer.java: the Reducer class for the job

Running a MapReduce Job – Code Review: WordCount.java

public class WordCount {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.printf("Usage: WordCount <input dir> <output dir>\n");
      System.exit(-1);
    }
    Job job = new Job();
    job.setJarByClass(WordCount.class);
    job.setJobName("Word Count");
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
}

Running a MapReduce Job – Code Review: WordMapper.java

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    for (String word : line.split("\\W+")) {
      if (word.length() > 0) {
        context.write(new Text(word), new IntWritable(1));
      }
    }
  }
}

The map method runs once for each line of text in the input file.

Example input (the key is the byte offset of the line in the file):
  Key (LongWritable)  Value (Text)
  1000055             the cat sat on the mat
  1000257             the aardvark sat on the sofa

Pairs written to the Context object:
  Key (Text)  Value (IntWritable)
  the         1
  cat         1
  …
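The split("\\W+") tokenization is the heart of the mapper and can be tried outside Hadoop. The following is a minimal plain-Java sketch of just that logic; the class name MapSketch and the tokens helper are illustrative, not part of the course code:

```java
import java.util.ArrayList;
import java.util.List;

public class MapSketch {
    // Mimics WordMapper's loop: split a line on runs of non-word
    // characters and keep each non-empty token (the real mapper would
    // pair each token with the count 1 and write it to the Context).
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("the cat sat on the mat"));
        // prints [the, cat, sat, on, the, mat]
    }
}
```

Note that a leading separator makes split() produce an empty first element, which is exactly why the mapper's length check is needed.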

Running a MapReduce Job – Code Review: SumReducer.java

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int wordCount = 0;
    for (IntWritable value : values) {
      wordCount += value.get();
    }
    context.write(key, new IntWritable(wordCount));
  }
}

The reduce method runs once for each key received from the shuffle and sort phase of the MapReduce framework.

Input data:                  Output data:
  Key       Values             Key       Value
  aardvark  [1]                aardvark  1
  on        [1,1]              on        2
  the       [1,1,1,1]          the       4
  …                            …
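Taken together, map and reduce amount to "tokenize, group by word, sum". The end-to-end behavior can be sketched in plain Java without Hadoop; WordCountSketch and its count helper are hypothetical names for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Map phase: emit (word, 1) per token; the merge() call stands in
    // for shuffle/sort (grouping by key) plus SumReducer's summation.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split("\\W+")) {
                if (word.length() > 0) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count(List.of(
                "the cat sat on the mat",
                "the aardvark sat on the sofa"));
        System.out.println(c.get("the")); // prints 4
        System.out.println(c.get("sat")); // prints 2
    }
}
```

This reproduces the tables above: "the" appears four times across the two sample lines, "on" and "sat" twice, "aardvark" once.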

Running a MapReduce Job – Run WordCount in HDFS Compile the three Java classes and collect the compiled class files into a JAR file:
$ cd ~/{Your Workspace}
$ javac -classpath `hadoop classpath` *.java
$ jar cvf wc.jar *.class

Running a MapReduce Job – Run WordCount in HDFS Submit a MapReduce job to Hadoop using your JAR file to count the occurrences of each word in Shakespeare:
$ hadoop jar wc.jar WordCount shakespeare \
  wordcounts
  wc.jar: the JAR file
  WordCount: the class containing the main method (driver class)
  shakespeare: input directory
  wordcounts: output directory

Exercise MapReduce Job : WordCount

Extra Exercise MapReduce Job : Number of Connection per Hour

Extra Exercise – Meaningful data from ‘Access Log’ data Let's extract meaningful data from access log data.
10.223.157.186 - - [15/Jul/2009:20:50:39 -0700] "GET /assets/img/closelabel.gif HTTP/1.1" 304 -
10.223.157.186 - - [15/Jul/2009:20:50:39 -0700] "GET /assets/img/loading.gif HTTP/1.1" 304 -
10.223.157.186 - - [15/Jul/2009:20:50:39 -0700] "GET /favicon.ico HTTP/1.1" 404 209
10.223.157.186 - - [15/Jul/2009:21:04:42 -0700] "GET / HTTP/1.1" 200 524
10.223.157.186 - - [15/Jul/2009:21:04:43 -0700] "GET /favicon.ico HTTP/1.1" 404 209
10.223.157.186 - - [15/Jul/2009:21:06:22 -0700] "GET / HTTP/1.1" 200 524
10.223.157.186 - - [15/Jul/2009:21:06:23 -0700] "GET /favicon.ico HTTP/1.1" 404 209
10.223.157.186 - - [15/Jul/2009:21:12:41 -0700] "GET / HTTP/1.1" 200 524
10.223.157.186 - - [15/Jul/2009:21:12:41 -0700] "GET /favicon.ico HTTP/1.1" 404 209
10.223.157.186 - - [15/Jul/2009:21:13:28 -0700] "GET / HTTP/1.1" 200 524
…

Extra Exercise – Meaningful data from ‘Access Log’ data Anatomy of a log entry:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
  127.0.0.1: IP address of the client
  frank: userid
  [10/Oct/2000:13:55:36 -0700]: the time that the request was received
  "GET /apache_pb.gif HTTP/1.0": the request line from the client, given in double quotes
  200: status code
  2326: size of the object returned to the client
See http://httpd.apache.org/docs/2.2/logs.html

Extra Exercise – Meaningful data from ‘Access Log’ data Question: how many connections were made during each hour of the day?

Extra Exercise – Regular Expression
10.223.157.186 - - [15/Jul/2009:20:50:39 -0700] "GET /assets/img/closelabel.gif HTTP/1.1" 304 -
Using regular expressions:
  \d  matches any digit character (0-9). Ex) the digits in +1-(444)-555-1234
  \w  matches any word character (letters, digits, underscore). Ex) Hello and World in Hello World!
  \s  matches any whitespace character. Ex) the space in Hello World!
  +   matches 1 or more of the preceding token. Ex) \w+ matches Hello in Hello World!
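These token classes behave the same way in java.util.regex, which is what the exercise's mapper will use. A small illustrative sketch (the class name RegexTokens is hypothetical; remember that each backslash must be doubled inside a Java string literal):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokens {
    public static void main(String[] args) {
        // \d+ : one or more digits; the first match in the phone number
        Matcher digits = Pattern.compile("\\d+").matcher("+1-(444)-555-1234");
        digits.find();
        System.out.println(digits.group()); // prints 1

        // \w+ : one or more word characters; the first word of the sentence
        Matcher word = Pattern.compile("\\w+").matcher("Hello World!");
        word.find();
        System.out.println(word.group()); // prints Hello

        // \s : a single whitespace character between the two words
        System.out.println("Hello World!".matches("\\w+\\s\\w+!")); // prints true
    }
}
```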

Extra Exercise – Regular Expression
[15/Jul/2009:20:50:39 -0700]
\[\d+\/\w+\/\d+:\d+:\d+:\d+\s+-\w+\]
Reading the pattern piece by piece: \[ and \] match the literal brackets; \d+\/\w+\/\d+ matches 15/Jul/2009; the three :\d+ groups match :20:50:39; \s+ matches the space; and -\w+ matches -0700.
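The pattern can be verified against a sample log line with java.util.regex. A sketch (TimestampRegex is a hypothetical illustration class; in a Java string literal the backslashes are doubled, and the slide's \/ escapes are unnecessary since / is not special in Java regex):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimestampRegex {
    // Same pattern as on the slide, written as a Java string literal.
    static final Pattern TS =
            Pattern.compile("\\[\\d+/\\w+/\\d+:\\d+:\\d+:\\d+\\s+-\\w+\\]");

    public static void main(String[] args) {
        String log = "10.223.157.186 - - [15/Jul/2009:20:50:39 -0700] "
                + "\"GET /favicon.ico HTTP/1.1\" 404 209";
        Matcher m = TS.matcher(log);
        if (m.find()) {
            // group() returns just the bracketed timestamp field
            System.out.println(m.group()); // prints [15/Jul/2009:20:50:39 -0700]
        }
    }
}
```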

Extra Exercise – Run MapReduce The Mapper extracts the hour from each timestamp and emits (hour, 1):
[15/Jul/2009:20:50:39 -0700]
[15/Jul/2009:20:50:39 -0700]
[15/Jul/2009:20:50:39 -0700]
[15/Jul/2009:20:50:39 -0700]
[15/Jul/2009:21:04:42 -0700]
[15/Jul/2009:21:04:43 -0700]
[15/Jul/2009:21:06:22 -0700]
[15/Jul/2009:21:06:23 -0700]
[15/Jul/2009:21:12:41 -0700]
[15/Jul/2009:21:12:41 -0700]
[15/Jul/2009:21:13:28 -0700]
…
Mapper output:
  Key  Value
  20   1
  21   1
  …

Extra Exercise – Run MapReduce Shuffle & Sort groups the pairs by hour:
  Key  Values
  20   [1,1,1]
  21   [1,1,1,1,1, …]
  …

Extra Exercise – Run MapReduce The Reducer sums each hour's list:
  Key  Value
  20   87681
  21   85914
  …
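The three stages above can be simulated end to end in plain Java: the map step captures the hour with a regex group, and grouping plus summing stands in for shuffle, sort, and reduce. This is a hypothetical sketch over a small hand-made timestamp list (HourlyCount and countByHour are illustration names, not the exercise's actual solution classes):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HourlyCount {
    // The slide's timestamp pattern with a capturing group around the hour.
    static final Pattern HOUR =
            Pattern.compile("\\[\\d+/\\w+/\\d+:(\\d+):\\d+:\\d+\\s+-\\w+\\]");

    // Map: emit (hour, 1); shuffle & reduce: group by hour and sum.
    static Map<String, Integer> countByHour(List<String> timestamps) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String ts : timestamps) {
            Matcher m = HOUR.matcher(ts);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = countByHour(List.of(
                "[15/Jul/2009:20:50:39 -0700]",
                "[15/Jul/2009:20:50:39 -0700]",
                "[15/Jul/2009:21:04:42 -0700]",
                "[15/Jul/2009:21:06:22 -0700]",
                "[15/Jul/2009:21:12:41 -0700]"));
        System.out.println(c); // prints {20=2, 21=3}
    }
}
```

In the real job the same capturing group goes into the Mapper's map method, with the hour written out as a Text key and an IntWritable 1 as the value.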

Extra Exercise – Run MapReduce Final Output
  Key (hour)  Value (count)
  0           119827
  1           165533
  2           246174
  3           273089
  4           273020
  5           264181
  6           294837
  7           312028
  8           327732
  9           300460
  …

Extra Exercise

Exercise MapReduce Job : Number of Connection per Hour

Importing Data With Sqoop Review MySQL and Exercise

Importing Data With Sqoop – Review MySQL (1) Log on to MySQL:
$ mysql --user=root \
  --password=cloudera
Show databases:
> show databases;
Select the database:
> use retail_db;
Show tables:
> show tables;

Importing Data With Sqoop – Review MySQL (2) Review ‘customers’ table schema: > DESCRIBE customers;

Importing Data With Sqoop – Review MySQL (3) Review the data in the ‘customers’ table:
> SELECT * FROM customers LIMIT 5;

Importing Data With Sqoop – How To Use (1) List the databases (schemas) on your database server:
$ sqoop list-databases \
  --connect jdbc:mysql://localhost \
  --username root --password cloudera
List the tables in the ‘retail_db’ database:
$ sqoop list-tables \
  --connect jdbc:mysql://localhost/retail_db \
  --username root --password cloudera

Importing Data With Sqoop – How To Use (2) Import the ‘customers’ table into HDFS:
$ sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --table customers --fields-terminated-by '\t' \
  --username root --password cloudera
Verify that the command has worked:
$ hadoop fs -ls customers
$ hadoop fs -tail customers/part-m-00000