Published by James Stevens. Modified over 9 years ago.
1
APACHE PIG
Presented by Priagung Khusumanegara
Prof. Kyungbaek Kim
2
Agenda
- Introducing Pig
- Pig Characteristics
- Pig Elements
- Pig Latin Foundation
- Data Flow
- Pig Features
- Data Types
- Pig Operators and Functions
3
Pig Characteristics
- A platform for analyzing large data sets that runs on top of Hadoop
- Provides a high-level language for expressing data analysis
- Uses both HDFS (to read and write files) and MapReduce (to execute jobs)
4
Pig Elements
- Pig Latin: a high-level scripting language designed specifically for data transformation and flow expression
- Grunt: the environment in which Pig Latin commands are executed; it currently supports Local and Hadoop modes
- Pig interpreter: converts Pig Latin into MapReduce jobs
5
Pig Latin Data Flow
- A LOAD statement reads data from the file system.
- A series of "transformation" statements processes the data.
- A DUMP statement displays the results, or a STORE statement saves them.
LOAD -> TRANSFORM -> DUMP or STORE
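The LOAD, transform, DUMP/STORE flow can be sketched as a short script. This is a minimal illustration, not the presenter's example: the file name students.txt, the field names, and the threshold of 60 are all assumptions.

```pig
-- Hypothetical input: tab-delimited lines of name, score, status
data = LOAD 'students.txt' AS (name:chararray, score:int, status:chararray);

-- Transformation step: keep only rows with a score of at least 60 (assumed threshold)
passed = FILTER data BY score >= 60;

-- View the result on screen ...
DUMP passed;

-- ... or save it instead ('passed_out' is a hypothetical output directory)
-- STORE passed INTO 'passed_out';
```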
6
Running Pig
Script
- Execute commands in a file
- $ pig scriptFile.pig
Grunt
- Interactive shell for executing Pig commands
- Started when a script file is NOT provided
7
Running Modes
Local Mode
- Executes in a single JVM
- Works exclusively with the local file system
- Great for development, experimentation and prototyping
Hadoop Mode
- Also known as MapReduce mode
- Pig renders Pig Latin into MapReduce jobs and executes them on the cluster
- Can execute against a pseudo-distributed or fully distributed Hadoop installation
8
Running Modes
$ pig -x local
$ pig -x mapreduce
9
Hadoop Mode
10
Pig Relation
Pig Latin statements work with relations.
- A field is a piece of data: 19
- A tuple is an ordered set of fields: (19,2)
- A bag is an unordered collection of tuples: {(19,2), (18,1)}
- A relation is a bag
11
Data Type
- int: signed 32-bit integer. Example: 10
- long: signed 64-bit integer. Data: 10L or 10l. Display: 10L
- float: 32-bit floating point. Data: 10.5F or 10.5f or 10.5e2f or 10.5E2F. Display: 10.5F or 1050.0F
- double: 64-bit floating point. Data: 10.5 or 10.5e2 or 10.5E2. Display: 10.5 or 1050.0
- chararray: character array (string) in Unicode UTF-8 format. Example: hello world
- boolean: true/false (case insensitive)
- datetime: date and time value. Example: 1970-01-01T00:00:00.000+00:00
12
LOAD operator
Loads the contents of text files into a bag named data; the schema names the fields.
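A LOAD statement with a schema might look like the following sketch. The file name students.csv, the comma delimiter, and the field names are assumptions made for illustration; the slide's original example did not survive extraction.

```pig
-- Load a hypothetical comma-delimited text file into a bag named data.
-- The AS clause is the schema: it names each field and gives it a type.
data = LOAD 'students.csv' USING PigStorage(',')
       AS (name:chararray, score:int, status:chararray);

-- Print the schema of the relation to check that it was applied
DESCRIBE data;
```

Without a USING clause, Pig falls back to PigStorage with a tab delimiter.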
13
DUMP and STORE operator
- No action is taken until a DUMP or STORE command is encountered
- Until then, Pig will parse, validate and analyze statements but not execute them
- DUMP: display the results on screen
- STORE: save the results to a file
14
DUMP and STORE operator
DUMP Example STORE Example
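A sketch of both operators on one relation, assuming a hypothetical tab-delimited input file students.txt and a hypothetical output directory students_out:

```pig
-- Nothing runs yet: Pig only parses and validates this statement
data = LOAD 'students.txt' AS (name:chararray, score:int, status:chararray);

-- DUMP triggers execution and prints the tuples to the screen
DUMP data;

-- STORE triggers execution and writes the tuples as comma-delimited text
STORE data INTO 'students_out' USING PigStorage(',');
```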
15
FILTER and GROUP operator
- Filter the data bag
- Group the filtered bag by score
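The two steps could be written as below. The input file, the field names, and the condition score > 50 are assumptions; only the bag name data and the grouping field score come from the slide.

```pig
data = LOAD 'students.txt' AS (name:chararray, score:int, status:chararray);

-- Keep only the tuples that satisfy the condition (assumed threshold)
filtered = FILTER data BY score > 50;

-- Group the filtered bag by score. Each output tuple has two fields:
-- 'group' (the score value) and 'filtered' (a bag of the matching tuples)
grouped = GROUP filtered BY score;

DUMP grouped;
```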
16
ORDER operator
Note: for descending order, add DESC:
Sorted = ORDER data BY score DESC;
17
FOREACH operator
For each row, emit the score and status fields.
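A minimal sketch of that projection, assuming a hypothetical input file students.txt whose rows carry name, score and status fields:

```pig
data = LOAD 'students.txt' AS (name:chararray, score:int, status:chararray);

-- For every tuple in data, generate a new tuple holding only score and status
projected = FOREACH data GENERATE score, status;

DUMP projected;
```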
18
DISTINCT operator
Removes duplicate tuples in a bag.
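As an illustration, with an assumed input file and assumed field names:

```pig
data = LOAD 'students.txt' AS (name:chararray, score:int, status:chararray);

-- Tuples count as duplicates only when all of their fields are equal
deduped = DISTINCT data;

DUMP deduped;
```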
19
UNION operator
Merges the contents of two or more bags.
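A sketch with two hypothetical input files that share the same assumed schema:

```pig
data1 = LOAD 'class_a.txt' AS (name:chararray, score:int);
data2 = LOAD 'class_b.txt' AS (name:chararray, score:int);

-- Combine both bags into one; UNION neither removes duplicates
-- nor guarantees any ordering of the result
merged = UNION data1, data2;

DUMP merged;
```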
20
JOIN operator
Bags data1 and data2 are joined by their first fields.
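An inner join on the first field of each bag might look like this. The bag names data1 and data2 come from the slide; the file names and schemas are assumptions.

```pig
data1 = LOAD 'scores.txt' AS (id:int, score:int);
data2 = LOAD 'names.txt'  AS (id:int, name:chararray);

-- $0 refers to the first field of each relation by position.
-- The result holds all fields of both inputs for every pair of matching tuples
joined = JOIN data1 BY $0, data2 BY $0;

DUMP joined;
```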
21
SUM, MIN and AVG functions
Note:
- find min value: MIN
- find sum value: SUM
- find average value: AVG
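These functions operate on a bag, so a typical pattern is to group first and then aggregate. The sketch below assumes a hypothetical input file with a numeric score field and computes all three values over the whole relation:

```pig
data = LOAD 'students.txt' AS (name:chararray, score:int, status:chararray);

-- GROUP ... ALL places every tuple into a single group
all_rows = GROUP data ALL;

-- data.score is the bag of score values inside that group
stats = FOREACH all_rows GENERATE
            SUM(data.score) AS total,
            MIN(data.score) AS lowest,
            AVG(data.score) AS average;

DUMP stats;
```

Grouping by a field instead of ALL (for example GROUP data BY status) would yield one row of aggregates per group.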