1
Talend – A Primer
Extract, Transform & Load Tool
Bob Brennan
Integrated Manufacturing Systems, Inc.
2
Introduction Integrated Manufacturing Systems, Inc.
Consumer Product Company Integration
4 Companies under 1 new roof
Bottles, Caps/Covers, Product, Filling, Warehousing and Logistics
WAMAS (Warehouse Management System) by SSI Schaefer
Autefa – Automation
5
ETL Tools - Original Concepts
Extract – Gather data from various sources
Transform – Mold and rework data elements
Load – Put the data into a container for reporting and analysis
Build data warehouses from corporate data for BI analysis
Aggregate data into common reporting structures
Not Normalized
Do the math on the load
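The three steps above can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how Talend implements it: the sample data, the `region_totals` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the reporting container. The totals are computed before loading, in the spirit of "do the math on the load".

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source data standing in for an export from a corporate system.
SOURCE_CSV = """order_id,region,amount
1,east,100.50
2,west,20.00
3,east,9.50
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy the region name and aggregate amounts per region."""
    totals = defaultdict(float)
    for row in rows:
        region = row["region"].strip().upper()
        totals[region] += float(row["amount"])
    return sorted(totals.items())

def load(summary, conn):
    """Load: store the pre-computed totals in a reporting table."""
    conn.execute("CREATE TABLE region_totals (region TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO region_totals VALUES (?, ?)", summary)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
result = conn.execute(
    "SELECT region, total FROM region_totals ORDER BY region"
).fetchall()
print(result)
```

The reporting table holds one row per region rather than one row per order, which is the kind of aggregated, non-normalized structure the slide describes.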
6
ETL Tools Today
Expanded role of these tools to include all of the original goals
Additions
  Service Bus
  Streams
  Data scrubbing or cleansing
  Non Structured Outputs
  Service Oriented Interfaces (REST/JSON, SOAP, CLOUD, Streams)
7
Partial List of Players
Oracle Warehouse Builder (OWB)
SAP Data Services
IBM InfoSphere Information Server
SAS Data Management
Informatica PowerCenter
Elixir Repertoire for Data ETL
Data Migrator (IBI)
SQL Server Integration Services (SSIS)
Talend Studio for Data Integration
Sagent Data Flow
Syncsort DMX
Actian DataConnect
Open Text Integration Center
Oracle Data Integrator (ODI)
Cognos Data Manager
CloverETL
Centerprise Data Integrator
IBM InfoSphere Warehouse Edition
Pentaho Data Integration
Adeptia Integration Server
QlikView Expressor
9
Talend Versions – Open Source
Data Integration – Most of the tools we will discuss
Enterprise Service Bus – ActiveMQ as the Bus and Kafka Streaming
Big Data – Hadoop and NoSQL
Data Preparation – Desktop Tool
Data Quality – Normalizing data like address info, titles
Master Data Management – 'One view of the truth'
Data Streams – In line Processing
10
Common Features
Eclipse-based Interface
Shared Resources Shipped in the Box
Repositories of customized objects and behaviors
Consistency in UI and Processing across all Open Source Products
Lots of Community Activity
11
Swiss Army Knife
12
Talend Open Studio
14
Repository
All the sources of data
All of the Built Out Components
Built-In and Custom Objects
15
Design Workspace
Area used to Design and Layout Jobs
Graphical Drag and Drop
Code Preview Tab
16
Configuration
Configure Components
Add Context
Configure Jobs
Run and debug
17
Palette
Lists Components Available
Over 800 come in the box
  Data Types
  Systems
  Actions
  Controls
  Interfaces
  Connections
18
Business Model
Non-technical view of a business workflow. Includes the strategic systems or processes already running and new needs.
Modeling Large Flows or Use Cases
Not something we have leveraged, even with the large commercial project
19
Job
A distinct task that starts and stops.
This is the key area of focus as you get started.
Jobs can be reused as components.
20
Route
A routine that continually processes data when available. When no data is available, it waits.
Think ESB.
A Job that keeps running in a loop.
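The difference between a Job and a Route can be sketched in plain Python. This is a conceptual illustration only, not Talend's runtime: `run_job`, `run_route` and the in-process queue are hypothetical stand-ins for a batch task and a message-driven flow.

```python
import queue
import threading

def run_job(rows):
    """Job: process a fixed batch, then stop."""
    return [r.upper() for r in rows]

def run_route(inbox, outbox, stop):
    """Route: keep running; handle messages as they arrive, wait otherwise."""
    while not stop.is_set():
        try:
            msg = inbox.get(timeout=0.1)  # wait briefly while no data is available
        except queue.Empty:
            continue
        outbox.append(msg.upper())
        inbox.task_done()

# The job runs once over its input and returns.
job_result = run_job(["a", "b"])

# The route runs in the background until it is told to stop.
inbox, outbox, stop = queue.Queue(), [], threading.Event()
worker = threading.Thread(target=run_route, args=(inbox, outbox, stop))
worker.start()
for msg in ["x", "y"]:
    inbox.put(msg)
inbox.join()   # block until the route has handled everything sent so far
stop.set()     # a real route would keep running; it is stopped here to end the demo
worker.join()
print(job_result, outbox)
```

The job has a clear start and end, while the route only ends because the demo signals it to stop.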
21
Demo Break #1 Scroll the Palette
22
Component Families
Big Data: This family provides a wide range of built-in components for Big Data technologies such as Cassandra, Google Storage, HBase, HDFS, Hive, Impala, MongoDB, Pig etc. Using these components you can connect to the modules of the Hadoop distribution. They create connections to various third-party tools used for transferring, storing or analysing Big Data.
Databases: This family provides Talend components which cover various needs like opening connections, reading and writing tables, committing transactions, performing rollbacks for error handling etc. Talend supports more than 40 RDBMS, including MySQL, MS SQL Server, Hive, Amazon, Azure etc.
23
Component Families
File: This family groups together various components which read and write data in all types of files like Delimited, Positional, XML, Excel etc. It also provides a number of components which help in performing tasks like unarchiving, deleting, copying, comparing etc. This family is further divided into subfamilies: Input, Output, and Management.
Internet: This family includes all of the components that help in accessing information from the Internet, through various means like Web services, RSS flows, SCP, MOM, e-mail, FTP etc.
24
Component Families
Logs & Errors: This family groups together all the components which are dedicated to catching log information and handling Job errors.
Misc: This family gathers miscellaneous components covering various needs like the creation of sets of dummy data rows, buffering data, loading context variables etc.
Orchestration: This family includes various components which help to sequence or orchestrate tasks and processing Jobs or SubJobs etc.
25
Open Edge Integration
Need to use JDBC to connect to OE
From … the whole of the JDBC driver is found in OPENEDGE.JAR
Prior to 10.2 you need to reference …
26
JDBC Connection To Open Edge
27
Syntax for JDBC Connection
"jdbc:datadirect:openedge://<server>:<port>;databaseName=<db>"
Where:
<server> can be the name or IP address of the server
<port> is the number you used to create the ODBC Broker
<db> is the name of the database
28
Demo #1
Read an Excel File as Input
Output as a delimited file using "{" as delimiter.
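Outside of Talend, the output half of this demo can be approximated with Python's standard `csv` module. This is a sketch under assumptions: the rows are hypothetical and are hard-coded in place of the Excel input, since reading a workbook would need a third-party library.

```python
import csv
import io

# Hypothetical rows standing in for what would be read from the Excel sheet.
rows = [
    {"item": "Bottle 500ml", "qty": 1200},
    {"item": "Cap {blue}", "qty": 800},  # value that contains the delimiter itself
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["item", "qty"], delimiter="{", lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
output = buffer.getvalue()
print(output)

# Reading the text back with the same delimiter recovers the original fields.
round_trip = list(csv.reader(io.StringIO(output), delimiter="{"))
```

An unusual delimiter like "{" reduces clashes with commas in the data, but values that contain it still need quoting, which the `csv` writer applies automatically here.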
29
Demo #2
Connect to an Open Edge Database via JDBC
Read a complex SQL query
Do in line data changes
Map Data to a specific Output
Write it out as XML
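The last three steps of this demo (in-line changes, mapping, XML output) can be sketched with Python's standard library. The rows, element names and `to_xml` helper below are hypothetical; they stand in for a query result and a mapping that would be configured graphically in the Studio.

```python
import xml.etree.ElementTree as ET

# Hypothetical query result: (customer name, quantity, unit price).
rows = [(" acme ", 3, 2.5), ("Globex", 1, 10.0)]

def to_xml(rows):
    """Clean each row, compute a derived total, and map it onto XML elements."""
    root = ET.Element("orders")
    for name, qty, price in rows:
        # In-line change: trim and normalize the customer name.
        order = ET.SubElement(root, "order", customer=name.strip().title())
        ET.SubElement(order, "quantity").text = str(qty)
        # Derived field computed during the mapping.
        ET.SubElement(order, "total").text = f"{qty * price:.2f}"
    return ET.tostring(root, encoding="unicode")

xml_text = to_xml(rows)
print(xml_text)
```

Each input row becomes one `order` element, so the shape of the XML is decided entirely by the mapping step rather than by the source query.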
30
Demo #3
Connect to an Open Edge Database via JDBC
Read a complex SQL query
Do in line data changes
Map Data to a specific Output
Write it out as CSV File (Not Normalized – Feed New Business App)
31
Take Aways
www.talend.com
Use JDBC to connect to an OE Database
Data Source Agnostic
Over 800 Components Shipped in the Box
32
Commercial versus Open Source
Training and Professional Services
Mapping Tools
Synchronization Tools
Deployment Tools
33
Bob Brennan
Integrated Manufacturing Systems, Inc.