Download presentation
Presentation is loading. Please wait.
1
1 A Comparative Study between ETL and E-LT approaches for loading data into a Data Warehouse Vikas Ranjan CSCI 693
2
2 Agenda Introduction Extract, Transform and Load (ETL) Approach Strengths/ Weaknesses of ETL Extract, Load and Transform (E-LT) Approach Strengths/ Weaknesses of E-LT Experiments Results Conclusion/ Future Work References Q & A
3
3 Introduction All Business Intelligence applications are data-centric. Large volumes of data is stored and processed along with its history in the Data Warehouse. The Data is extracted from various heterogeneous source systems and transformed as per the business requirements. Data is used for analytical purposes- future forecasting, profitability analysis, trend analysis etc. to drive the business.
4
4 Extract, Transform and Load (ETL) Traditional approach of loading data into data warehouses. Data is first pulled/ pushed from various heterogeneous sources like ERP, CRM, RDBMS, Flat Files. Business rules are applied on the data in the Staging Area. Transformed data is loaded into target database. Often designed backward, thus only the relevant data is fetched.
5
5 Strengths/ Weaknesses of ETL Strengths Can perform complex operations in single data flow diagrams. Mostly designed backward, thus only relevant data is loaded. Used for building real-time data warehouses. Robust tools are available like Informatica, Ab Initio, Data Stage. Weaknesses Data transformation step of ELT is performed by ETL engine, therefore increasing processing time. Data is moved over the network twice. Since it is developed backward, more effort for future redesign.
6
6 Extract, Load and Transform (E-LT) Newer Approach of loading into target data warehouse. The Data is extracted from various sources same way it is done via ETL. This extracted data is loaded directly into target data warehouse. The transformations and complex business rules are applied by native SQL drivers. The processing is done by database engine rather than ETL engine.
7
7 Strengths/ Weaknesses of E-LT Strengths Since all the data is available, the future changes can be easily incorporated. Once the data is loaded on the target platform, all transformations are placed on the RDBMS engine. This reduces network congestion. Provides optimal performance as no extra hardware needed. Weaknesses E-LT suffers from a limited availability of tools like Informatica Pushdown, Data Integrator etc. Does not work well for complex business cases. E-LT cannot be used to design near real-time enterprise data warehouse.
8
8 Tools and Data Set Software / Hardware OS Platform: Sun M9000 Server (www.sun.com).www.sun.com Relational Database Management System: Teradata V2R6.2.1 (www.teradata.com).www.teradata.com ETL/E-LT Tool: Informatica PC 8.6.1 Hot Fix 9 Advanced Edition (www.informatica.com).www.informatica.com Data Set All the data used is experiments is obtained from the test database a telecom company. All the experiments were conducted using Informatica as ETL and E-LT tool.
9
9 Experiment 1 ETL vs. E-LT (Full Pushdown): Informatica has recently introduced both ETL and E-LT capabilities in its Power Center tool Job Name: Tax _Write_Off Job Same Job was developed using ETL approach as well as E-LT approach. Informatica pushed down all the code processing to the RDBMS engine.
10
10 Experiment1 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL
11
11 Experiment 1 contd. E-LT Job ( Full Pushdown): In the full pushdown, Informatica Server pushed all the code to RDBMS engine and Informatica Server worked purely as an E-LT tool
12
12 Experiment 1 Results Approximately Five Times more performance gain using E-LT Full Pushdown ApproachData Read (rows) Data Load (rows) Runtime (second) Throughput (rows/sec) MemoryCPU ETL32044 110291DB/ETL Server E-LT Full32044 65340DB Memory DB CPU
13
13 Experiment 2 ETL vs. E-LT (Target Pushdown): Job Name: Tax_Extract_ETL_tst Job Same job was developed using ETL approach as well as E-LT approach with Target Pushdown only The Pushdown to database engine happened at Target database only
14
14 Experiment 2 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL
15
15 Experiment 2 contd. E-LT Job (Target Pushdown): Informatica Server pushed code processing on target database to RDBMS engine
16
16 Experiment 2 Results No Performance Differences since the Pushdown was on Target Database only ApproachData Read (rows) Data Load (rows) Runtime (second) Throughput (rows/sec) MemoryCPU ETL1542714628131879246ETL/Partial DB E-LT Target1542714628131887245.26DB/E-LT Server
17
17 Experiment 3 ETL vs. E-LT (Source Pushdown): Job Name: Tax_Extract_ETL Job Same job was developed using ETL approach as well as E-LT approach with Source Pushdown only The Pushdown to database engine happened at Source database only
18
18 Experiment 3 contd. ETL Job: Informatica Server handled all the code processing and generated its own internal SQL
19
19 Experiment 3 contd. E-LT Job (Source Pushdown): Informatica Server pushed down code processing on source side to RDBMS engine
20
20 Experiments 3 Results No Performance Differences as Pushdown was on Source Database only Source Pushdown E-LT does not work with Teradata Sequence Generator ApproachData Read (rows) Data Load (rows) Runtime (second) Throughput (rows/sec) MemoryCPU ETL154271 948162ETL/Partial DB E-LT Source 154271 947162DB/E-LT Server
21
21 Results Significant performance gains were obtained using full pushdown E- LT over ETL approach. No performance gains using both source and target pushdown of E- LT approach and ETL approach as both used ETL Server and Database Engine resources( memory and CPU). Code changes were needed to redesign existing ETL jobs to use E-LT power of database engine. E-LT does not work in building Real Time Data warehouses
22
22 Conclusion and Future Work ETL works well for very complex transformations and active data warehouses E-LT works well for small and medium-sized data marts and when the source and target are on the same database platform only Future: Building data warehouse solutions using hybrid approach (combination of ETL and E-LT processes).
23
23 References. [1] A. Simitsis, P. Vassiliadis, T. Sellis “Optimizing ETL Processes in Data Warehouses,” in Proc. 21st International Conference on Data Engineering, 2005, (ICDE 2005), pp. 564-575. [2] G.X. Zhou, Q.S. Xie, Y. Hu, “E-LT Integration to Heterogeneous Data Information for SMEs Networking based on E-HUB,” in Proc. Fourth International Conference on Natural Computation, 2008, IEEE, pp. 212-216. [3] I. William, S. Derek, and N. Genia, DW 2.0: The Architecture for the Next Generation of Data Warehousing. Burlington, MA: Morgan Kaufman, 2008, pp. 215- 229. [4] R. J. Davenport, September 2007. [Online] ETL vs. ELT: A Subjective View. InSource IT Consulting Ltd., U.K. Available at: http://www.insource.co.uk/pdf/ETL_ELT.pdf http://www.insource.co.uk/pdf/ETL_ELT.pdf [5] L. Troy, C. Pydimukkala, How to Use PowerCenter with Teradata to Load and Unload Data, Informatica Corporation [Online], Available at: www.myinformatica.com www.myinformatica.com
24
24 Questions ???
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.