1
Pentaho Data Integration
2014/3/26
2
Outline: Introduction, Installation, Key points, Demo
3
Introduction: Kettle is a single product, but it consists of multiple programs that are used in different phases of the ETL development and deployment cycle. Each program serves a particular purpose and is more or less independent of the others. All of the programs depend on a common set of Java archives that make up the actual data integration engine.
4
Overview of Kettle programs
5
Spoon: the integrated development environment
A GUI for designing transformations and jobs, which can then be run with the Kettle tools Pan and Kitchen. Transformations and jobs are stored either as XML files or in a Kettle database repository. Spoon also includes functionality for performance monitoring.
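Because a transformation can be saved as XML, it can be inspected with ordinary tools. The sketch below is only an illustration: the sample document is a hand-written, heavily simplified stand-in for a saved transformation, and the element names (`info/name`, `step`, `order/hop`) are assumptions about the file layout that should be checked against a file saved by your own Spoon version.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written stand-in for a saved transformation file.
# Real files contain many more elements; the layout here is an assumption.
SAMPLE_KTR = """
<transformation>
  <info><name>demo_transformation</name></info>
  <step><name>Read input</name><type>TextFileInput</type></step>
  <step><name>Write output</name><type>TextFileOutput</type></step>
  <order>
    <hop><from>Read input</from><to>Write output</to><enabled>Y</enabled></hop>
  </order>
</transformation>
"""


def summarize_transformation(xml_text):
    """Return the name, the steps and the hops found in a transformation XML."""
    root = ET.fromstring(xml_text.strip())
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
    return {"name": root.findtext("info/name"), "steps": steps, "hops": hops}


summary = summarize_transformation(SAMPLE_KTR)
print(summary["name"])
for source, target in summary["hops"]:
    print(source, "->", target)
```

Listing the hops this way gives a quick view of how data flows between steps without opening the GUI.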
6
Pan & Kitchen
Pan: a command-line program that runs transformations. It executes transformations designed in Spoon, stored either as XML files or in a database repository. Transformations can be scheduled in batch mode to run automatically at regular intervals.
Kitchen: a command-line job runner. It executes jobs designed in Spoon, stored either as XML files or in a database repository.
7
Carte: a simple web server that executes transformations and jobs remotely
It lets you monitor, start, and stop transformations and jobs remotely. It accepts an XML document that contains the transformation to execute and the execution configuration.
8
Installing Kettle
Download from http://community.pentaho.com/
Set the PENTAHO_JAVA_HOME environment variable. Extract the downloaded archive and run "Spoon.bat".
9
Setting up HDFS
Download the package that matches your Hadoop version.
Edit the configuration file C:\Program Files\Pentaho\data-integration\plugins\pentaho-big-data-plugin\plugin.properties
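Before editing plugin.properties, it can help to see which values it currently holds. The following is a minimal sketch of a reader for simple `key=value` properties text. The slides do not say which property to change; the key `active.hadoop.configuration` and the value `hadoop-20` in the sample are assumptions used purely for illustration, so check the comments in your own file.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' separator
            props[key.strip()] = value.strip()
    return props


# Hypothetical excerpt; key and value are assumptions, not taken from the slides.
SAMPLE_PROPERTIES = """
# Hadoop configuration to use (assumed key name)
active.hadoop.configuration=hadoop-20
"""

props = parse_properties(SAMPLE_PROPERTIES)
print(props.get("active.hadoop.configuration"))
```

To read the real file, pass the result of `open(path, encoding="utf-8").read()` to `parse_properties`. This reader does not handle line continuations or escaped characters.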
10
Key points
Job: the main concrete task.
Transformation: a component within a job that controls the data processing in detail.
Hop: the stream between two tasks.
11
Hops Color Convention
12
Usage
13
Spoon
14
Pan & Kitchen
Place the Kettle transformation file or job file in any subdirectory of <Kettle_Home>.
For a transformation file, run it with pan.bat.
For a job file, run it with kitchen.bat.
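The choice between pan.bat and kitchen.bat can be sketched as a small helper that picks the runner from the file type. This is a sketch under assumptions not stated in the slides: transformations are taken to use the `.ktr` extension and jobs `.kjb`, the file is passed with the Windows-style `/file:` option, and `C:\kettle` is a hypothetical install directory.

```python
from pathlib import PureWindowsPath

# Assumed mapping from file extension to the command-line runner.
RUNNERS = {".ktr": "pan.bat", ".kjb": "kitchen.bat"}


def build_command(kettle_home, target):
    """Return the command (as an argument list) that would run the given file."""
    suffix = PureWindowsPath(target).suffix.lower()
    if suffix not in RUNNERS:
        raise ValueError(f"not a transformation or job file: {target}")
    runner = PureWindowsPath(kettle_home) / RUNNERS[suffix]
    # '/file:' is the assumed Windows-style option for passing the file name.
    return [str(runner), "/file:" + target]


# Hypothetical paths, for illustration only.
trans_cmd = build_command(r"C:\kettle", r"C:\kettle\etl\load.ktr")
job_cmd = build_command(r"C:\kettle", r"C:\kettle\etl\nightly.kjb")
print(trans_cmd)
print(job_cmd)
```

On a Windows machine the returned list could be passed to `subprocess.run`. The helper only builds the command, so it can be checked without Kettle installed.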
15
Carte
16
Remote execution with Carte
Start up a slave server on port 8080
Settings: ./data-integration/pwd
Start: ./Carte.bat <IP address> <port>
Define the slave server in Kettle
Open Kettle, then open a transformation or job.
Click on the View panel.
Right-click on Slave server and select New.
A transformation uses the slave server only if you specify it in the "Execute a transformation" dialog.
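Once a slave server is running, its state can also be checked over HTTP rather than through the GUI. The sketch below makes several assumptions that are not in the slides: that Carte serves status pages under `/kettle/status/` and `/kettle/transStatus/`, that `xml=Y` requests XML output, and that the server uses HTTP basic authentication. The host, port, and transformation name are placeholders.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def status_url(host, port, trans_name=None):
    """Build the (assumed) Carte status URL for the server or one transformation."""
    base = f"http://{host}:{port}/kettle"
    if trans_name is None:
        return base + "/status/?" + urlencode({"xml": "Y"})
    return base + "/transStatus/?" + urlencode({"name": trans_name, "xml": "Y"})


def fetch_status(url, user, password):
    """Fetch a status page using HTTP basic authentication (needs a live server)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# Placeholder host, port and name; nothing is sent over the network here.
server_url = status_url("localhost", 8080)
trans_url = status_url("localhost", 8080, "load customers")
print(server_url)
print(trans_url)
```

Only the URL builder runs here. `fetch_status` needs a running slave server and the credentials configured for it.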
17
References
Looking up the matching HDFS version
Pentaho Community wiki
Pentaho Data Integration (Kettle) Tutorial