Pig Installation Guide and Practical Example Presented by Priagung Khusumanegara Prof. Kyungbaek Kim.


1 Pig Installation Guide and Practical Example Presented by Priagung Khusumanegara Prof. Kyungbaek Kim

2 Installation Guide
Requirements:
- Java 1.6 or later (this example uses java-7-openjdk)
- Hadoop 0.23.x, 1.2.x, or 2.5.x (this example uses Hadoop 1.2.1)

3 Configuration
Make sure Hadoop is installed and runs correctly.
Download the stable Pig release (0.13):
    $ wget http://apache.tt.co.kr/pig/pig-0.13.0/pig-0.13.0.tar.gz
Unpack the downloaded Pig distribution and move it to your preferred directory (this example uses /usr/local/pig):
    $ tar -xvzf pig-0.13.0.tar.gz
    $ mv pig-0.13.0 /usr/local/pig
Edit ~/.bashrc and add the following lines at the end, then open a new shell or run source ~/.bashrc:
    export PIG_HOME=/usr/local/pig
    export PATH=$PATH:$PIG_HOME/bin
Test the Pig installation with a simple command:
    $ pig -help

4 Practical Example
Objective: sum the packet lengths between each source IP and destination IP in the network traffic.
Start Hadoop.
Download the input file and copy it to HDFS:
    $ wget https://www.dropbox.com/s/k6li67bha12geet/input.txt?dl=1 -O input.txt
    $ hadoop dfs -copyFromLocal input.txt /input/input.txt
Note: the input file was captured with tcpdump:
    tcpdump -n -i wlan0 >> input.txt

5 Screenshot: input file (input.txt)
Enter the Grunt shell:
    $ pig -x mapreduce

6 Load the text file into a bag, putting each entire line into the field 'line' of type chararray:
    RAW_LOGS = LOAD '/input/input.txt' AS (line:chararray);
Apply a schema to the raw data:
    LOGS_BASE = FOREACH RAW_LOGS GENERATE FLATTEN(
        (tuple(CHARARRAY,CHARARRAY,LONG))REGEX_EXTRACT_ALL(line, '.+\\s(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).+\\s(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).+length\\s+(\\d+)')
    ) AS (IPS:chararray, IPD:chararray, S:long);
Group the traffic information by source IP address and destination IP address:
    FLOW = GROUP LOGS_BASE BY (IPS, IPD);
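To see what the schema step extracts, the regular expression can be tried outside Pig. The following is a minimal Python sketch, not part of the original slides. The pattern is the one from the LOGS_BASE statement, written with single backslashes and no whitespace inside the capture groups. The sample line is a hypothetical one in the style of tcpdump -n output, and parse_line is an illustrative helper name. The sketch assumes that REGEX_EXTRACT_ALL must match the whole line, which is why re.fullmatch is used.

```python
import re

# Pattern from the LOGS_BASE statement. Pig string literals need doubled
# backslashes; a Python raw string needs only single ones.
PATTERN = re.compile(
    r".+\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"   # source IP (IPS)
    r".+\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"   # destination IP (IPD)
    r".+length\s+(\d+)"                           # packet length (S)
)


def parse_line(line):
    """Return (source IP, destination IP, length), or None if no match."""
    # Assumption: the whole line must match, as with REGEX_EXTRACT_ALL.
    match = PATTERN.fullmatch(line)
    if match is None:
        return None
    ips, ipd, size = match.groups()
    return ips, ipd, int(size)


# Hypothetical sample line; real captures will differ.
SAMPLE = (
    "12:00:00.000001 IP 192.168.0.5.51234 > 93.184.216.34.443: "
    "Flags [P.], seq 1:101, ack 1, win 229, length 100"
)

print(parse_line(SAMPLE))               # ('192.168.0.5', '93.184.216.34', 100)
print(parse_line("not a packet line"))  # None
```

The port numbers that tcpdump appends to each address (.51234 and .443 here) are left out of the captured groups, because each group stops after four dotted numbers.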

7 Sum the packet lengths for each (source IP, destination IP) pair:
    TRAFFIC = FOREACH FLOW {
        -- sorted is not used by the GENERATE statement below
        sorted = ORDER LOGS_BASE BY S DESC;
        GENERATE group, SUM(LOGS_BASE.S);
    }
Store the output data in HDFS (/output):
    STORE TRAFFIC INTO '/output';
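The grouping and summing steps can be checked on a small scale without a cluster. The following is a minimal Python sketch, not the Pig implementation. It takes (source IP, destination IP, length) records, groups them by the address pair as GROUP ... BY (IPS, IPD) does, and adds up the lengths as SUM does. The records and the function name total_length_per_flow are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def total_length_per_flow(records):
    """Sum packet lengths for each (source IP, destination IP) pair."""
    totals = defaultdict(int)
    for ips, ipd, size in records:
        # The (ips, ipd) tuple plays the role of Pig's `group` key.
        totals[(ips, ipd)] += size
    return dict(totals)


# Hypothetical records, in the shape produced by the schema step.
RECORDS = [
    ("192.168.0.5", "93.184.216.34", 100),
    ("192.168.0.5", "93.184.216.34", 250),
    ("93.184.216.34", "192.168.0.5", 1448),
]

for flow, total in sorted(total_length_per_flow(RECORDS).items()):
    print(flow, total)
# ('192.168.0.5', '93.184.216.34') 350
# ('93.184.216.34', '192.168.0.5') 1448
```

Each direction of a conversation is a separate key, so traffic from A to B and from B to A is totalled separately, as in the Pig script.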

8 SCREENSHOT EACH PROCESS

9

10

11 Screenshot Output File

