1
Data Warehouse Tools and Technologies - ETL
By: Issarachevawat, Raynoo; Romieh, Christian; Wongkamolchun, Siri; Zhang, Ying
2
What Is ETL?
Extract: the process of reading data from an external source database.
Transform: the process of converting the extracted data into a form usable by the target database. This is done by applying rules or lookup tables, or by combining the data with other data.
Load: the process of writing the data into the target database.
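The three steps can be sketched in a few lines of Python. This is only an illustration using the standard sqlite3 module with in-memory databases; the table names (orders, sales), the column names, and the COUNTRY_LOOKUP table are hypothetical and do not come from any product discussed in this presentation.

```python
import sqlite3

# Hypothetical lookup table used by the transform step.
COUNTRY_LOOKUP = {"US": "United States", "DE": "Germany"}


def extract(source):
    """Extract: read raw rows from the source database."""
    query = "SELECT id, country_code, amount_cents FROM orders"
    return source.execute(query).fetchall()


def transform(rows):
    """Transform: apply a lookup table and a unit-conversion rule."""
    return [
        (row_id, COUNTRY_LOOKUP.get(code, "Unknown"), cents / 100.0)
        for row_id, code, cents in rows
    ]


def load(target, rows):
    """Load: write the transformed rows into the target database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    target.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    target.commit()


def run_etl(source, target):
    load(target, transform(extract(source)))


# Tiny in-memory source database standing in for an external system.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER, country_code TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "US", 1250), (2, "DE", 990), (3, "FR", 500)],
)

target = sqlite3.connect(":memory:")
run_etl(source, target)
loaded = target.execute("SELECT id, country, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```

Keeping each step in its own function mirrors the Extract, Transform, Load split: the lookup rule can change without touching the code that reads or writes the databases.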
3
What does ETL do?
Extracts data from multiple data sources.
Migrates data from one database to another.
Converts a database from one format or type to another.
Transforms the data to make it accessible for business analysis.
Forms data marts and data warehouses.
Enables loading of multiple target databases.
Performs at least three specific functions: reads data from an input source; passes the stream of information through either an ETL engine-based or code-based process that modifies, enhances, or eliminates data elements according to the instructions of the job; and writes the resulting data set back out to a flat file, relational table, etc.
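The three functions listed above (read, pass the stream through a process, write) map naturally onto a chain of Python generators. The sketch below is a code-based process, not an ETL engine; the field names, the sample records, and the job rules (drop records with no amount, tidy the name, add a cents column) are assumptions made for illustration.

```python
import csv
import io


def read_source(handle):
    """Read: yield one record (a dict) at a time from the input source."""
    yield from csv.DictReader(handle)


def apply_job(records):
    """Process: modify, enhance, or eliminate data elements in the stream."""
    for rec in records:
        if not rec["amount"]:
            continue  # eliminate: skip records with a missing amount
        rec["name"] = rec["name"].strip().title()  # modify: tidy the name
        # enhance: add a derived column (hypothetical rule)
        rec["amount_cents"] = str(round(float(rec["amount"]) * 100))
        yield rec


def write_target(records, handle):
    """Write: send the resulting data set to a flat file; return the row count."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["name", "amount", "amount_cents"],
        lineterminator="\n",
    )
    writer.writeheader()
    count = 0
    for rec in records:
        writer.writerow(rec)
        count += 1
    return count


# In-memory text buffers stand in for the input source and the flat file.
raw = io.StringIO("name,amount\n alice ,100\nbob,\ncarol,20.5\n")
out = io.StringIO()
written = write_target(apply_job(read_source(raw)), out)
print(out.getvalue())
```

Because each stage is a generator, records flow through one at a time, so the whole input never has to be held in memory.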
4
What can ETL be used for?
To acquire a temporary subset of data (like a VIEW) for reports or other purposes.
To acquire a more permanent data set for other purposes, such as populating a data mart or data warehouse.
Question: Since ETL provides a mini data warehouse component that looks remarkably like a data mart and performs all the data extraction, filtering, integration, classification, and aggregation functions that the data warehouse normally provides, why do we need a separate data warehouse that duplicates it?
Answer: In fact, when properly implemented, the data warehouse performs all data preparation functions instead of leaving those chores to ETL, so there is no duplication of function. Better yet, the data warehouse handles the data component much more efficiently than ETL does, so we can appreciate the benefits of having a central data warehouse serve as the large enterprise decision support database. Moreover, to provide better performance, ETL merges the data warehouse and data mart approaches by storing small extracts of the data warehouse at end-user workstations.
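The difference between a temporary, VIEW-like subset and a more permanent data set can be shown with SQLite. This is a minimal sketch; the sales table and the names east_sales_view and east_sales_mart are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 50.0), ("east", 25.0)],
)

# Temporary subset: a VIEW stores only the query, so it always
# reflects the current contents of the source table.
db.execute(
    "CREATE VIEW east_sales_view AS "
    "SELECT region, amount FROM sales WHERE region = 'east'"
)

# More permanent data set: a table populated once from the source,
# as a data mart load would do.
db.execute(
    "CREATE TABLE east_sales_mart AS "
    "SELECT region, amount FROM sales WHERE region = 'east'"
)

# A new source row arrives after both objects were created.
db.execute("INSERT INTO sales VALUES ('east', 10.0)")


def row_count(name):
    """Count the rows in a table or view (name is trusted, not user input)."""
    return db.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


view_rows = row_count("east_sales_view")
mart_rows = row_count("east_sales_mart")
print(view_rows, mart_rows)
```

The view sees the new row immediately, while the populated table keeps its snapshot until the next load.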
5
ETL SYSTEM (diagram)
Components shown: Outer Sources (different vendors, different formats); Operational Data; ETL Engine (Extract, Filter, Transform, Load); Data Warehouse; Local Data Marts; OLAP; End Users.
Data extracted from the data warehouse provides faster processing.
6
Issues that are key to an effective ETL tool
Scheduling and job dependencies: relies particularly on a graphical environment.
Session nesting: when developing an ETL session for a particular part of the system, nesting eliminates duplicate development.
Robust SQL support: increases speed compared with using code to read from and write to a database.
Version management: enables quick rollback rather than manual code changes. In many cases, the database's version control may not work on the ETL tool.
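Job dependencies of the kind mentioned above amount to ordering a dependency graph. As a rough sketch (not how any particular ETL tool schedules work), Python's standard graphlib module (Python 3.9 or later) can compute a valid run order; the job names below are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL jobs; each job maps to the set of jobs it depends on.
jobs = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_sales": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_sales"},
    "refresh_data_mart": {"load_warehouse"},
}


def run_order(dependencies):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(jobs)
print(order)
```

If the dependencies form a cycle, static_order() raises graphlib.CycleError, which is a useful check before any job runs.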
7
Key Issues … (Cont’d)
Debugging functionality: very useful for developer support.
Security: ETL should rely on the underlying database security.
Transformation capabilities vs. cleansing capabilities: tools are seldom very strong in both.
Metadata support: must work with the overall metadata strategy.
8
Current ETL Market Share
Total Market Share: $667 Million
9
ETL Evaluation
The following sections evaluate each vendor and its ETL product, focusing on the primary differences between the products.
Ascential Software
Formed in July 2001.
Focuses on improving, developing, and perfecting its ETL and “back-end” tools.
Has no current plans to enter the BI tool market.
The Ascential DataStage product family:
A highly scalable ETL solution that uses end-to-end metadata management and data quality assurance functions.
Can create and manage scalable, complex data integration for enterprise applications such as CRM, ERP, SCM, BI/analytics, e-business, and data warehouses.
11
Cognos Corporation
Founded in 1969.
Prefers that all components of the enterprise data warehouse be Cognos products.
DecisionStream integrates easily with Cognos BI tools, etc., but has difficulty integrating with other vendors' products.
DecisionStream is powerful ETL software:
Allows users to extract and unite data from disparate sources and deliver coordinated business intelligence across the organization.
Includes advanced data merging, aggregation, and transformation capabilities that let users unite data from different sources and transform it into information using best-practice dimensional design.
13
Informatica PowerConnect
An extension to the Informatica PowerCenter and PowerCenterRT data integration software.
Eliminates the need for customers to manually code data extraction programs for their enterprise applications.
Ensures that mission-critical operational data can be used effectively to inform key business decisions across the enterprise.
Allows companies to directly source and integrate ERP, CRM, real-time message queue, mainframe, AS/400, remote data, and metadata with other enterprise data, and deliver it to data warehouses, operational data stores, business intelligence tools, and packaged analytic applications.
15
Conclusion
Issues analyzed: development environments, version control, security, metadata exchange standards, and cost.
Cognos could not compete, given the relative youth and limitations of its ETL tools:
Unable to show support for version or revision control.
Security provided by the underlying database: favors non-Cognos products.
The ETL tools presented by Ascential and Informatica are comparable in numerous ways, but it would be best to select Informatica as the ETL vendor:
More mature and stable as a company.
More comprehensive ETL at an efficient price.
16
Questions?
For copies of the paper, please contact Christian Romieh.