Loco Extract-Transform-Load (ETL) Overview
What is it: Loco-ETL is a low-cost, high-performance and functionally rich ETL application. Its low memory footprint allows multiple instances to run concurrently, e.g. several batch processes updating static data from external systems alongside several on-line services. Some offerings from the major vendors require a minimum of 4 gigabytes of memory; Loco-ETL will run in less than 32 megabytes. It can be used either as a batch process or as an on-line service. It is written in Java, with a web-based configuration client that makes installation and deployment very simple. From the ground up it has been designed to handle complex XML data structures with ease, which is often where other ETL applications fall down. The configuration client is designed to be usable by people with little or no technical experience.
What's in it for me:
Low cost of ownership – Loco-ETL costs a fraction of what the major suppliers charge. Typically the major vendors charge per CPU and per concurrent developer; Loco-ETL has neither constraint.
Time to production – The application ships with a good selection of Extractor, Loader and Transformer classes that should cater for most scenarios, enabling the configuration and testing cycles to be completed quickly.
Extensibility – Transformers implement data transformations and apply business logic to the source data. The delivered library can be extended simply by adding new user-defined classes to the Java classpath; no recompilation or redeployment is required.
Ownership – For clients with their own development facilities, the source code is available; the benefit here is no recurring fees after the initial purchase. If this option is taken, a handover period is included.
Versatility – As well as being ideal for production batch and on-line services, the application is well suited to other tasks such as data migration.
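The extensibility point above can be sketched in code. The actual Loco-ETL Transformer interface is not shown in this overview, so the interface and class names below are assumed stand-ins; the idea is simply that a user-defined class compiled against the library's interface and placed on the classpath becomes available to the engine.

```java
/* Hypothetical sketch of a user-defined Transformer. The real Loco-ETL
 * interface is not documented in this overview; this stand-in interface
 * illustrates the pattern: implement it, drop the class on the Java
 * classpath, and no recompilation or redeployment of the engine is needed. */
interface Transformer {
    String transform(String value);
}

public class UpperCaseTransformer implements Transformer {
    // Example business rule: trim the source field and force upper case.
    @Override
    public String transform(String value) {
        return value == null ? "" : value.trim().toUpperCase();
    }
}
```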
Jargon Buster:
Parallel Processing and Partitioning – This refers to splitting a task across multiple process nodes, each of which handles a subset of the data; the obvious advantage is scalability. Loco-ETL addresses this by being able to run many instances concurrently, which its low memory footprint makes possible.
Pipelining – This refers to running the Extract, Transform and Load components concurrently rather than sequentially. Loco-ETL addresses this with its multi-threaded architecture.
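Pipelining as described above can be illustrated with a minimal sketch, not Loco-ETL's actual code: extract and transform stages run in their own threads and hand records downstream through bounded queues, while the load stage drains the final queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/* Illustrative pipeline: the three ETL stages run concurrently, linked by
 * bounded queues, so transformation starts before extraction finishes. */
public class Pipeline {
    static final String EOF = "\u0000EOF"; // sentinel marking end of stream

    public static List<String> run(List<String> source) throws InterruptedException {
        BlockingQueue<String> extracted = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> transformed = new ArrayBlockingQueue<>(16);
        List<String> loaded = new ArrayList<>();

        Thread extractor = new Thread(() -> {
            try {
                for (String rec : source) extracted.put(rec);
                extracted.put(EOF);
            } catch (InterruptedException ignored) { }
        });
        Thread transformer = new Thread(() -> {
            try {
                String rec;
                while (!(rec = extracted.take()).equals(EOF))
                    transformed.put(rec.toUpperCase()); // stand-in transform
                transformed.put(EOF);
            } catch (InterruptedException ignored) { }
        });
        extractor.start();
        transformer.start();

        // The loader stage drains the transformed queue on this thread.
        String rec;
        while (!(rec = transformed.take()).equals(EOF)) loaded.add(rec);
        extractor.join();
        transformer.join();
        return loaded;
    }
}
```

The bounded queues also give natural back-pressure: a slow loader throttles the extractor rather than letting records pile up in memory.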
Component Structure - Server
A Server comprises, from source system to target system: an Extractor, a Transformation Engine (including the Translation Engine and Transformers) and a Loader. A Response Translator (including stylesheet translation) carries responses from the target system back to the source system; it is only configured for source systems that require a response.
Structural Overview
Server: An instance of the ETL is encapsulated in what we call a Server, outlined in the previous slide. A Server comprises four main components:
The Extractor – responsible for processing the data supplied by the source system, including: Files – delimited, fixed field width, XLS and XML formats; Database – SQL query data; Middleware – sockets, JMS and MQ.
The Transformation Engine – responsible for mapping the data from the source format to the target format. The mapped data then has any Transformer definitions applied before being submitted to the target system. A full list of Transformers is available as a download from www.loco-etl.com; examples include string manipulation, filters, decision making, data formatting and data lookups.
The Loader – responsible for loading the transformed data into the target system; again, this includes: Files – delimited, fixed field width and XLS formats.
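To make the Extractor's role concrete, here is a minimal sketch of the simplest case it handles, a delimited file record. The class name and API are invented for illustration; the real Extractor also covers fixed-width, XLS, XML, SQL and middleware sources.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/* Minimal, hypothetical sketch of a delimited-record extractor: split each
 * record into fields on a configured delimiter, preserving empty fields. */
public class DelimitedExtractor {
    private final String delimiter;

    public DelimitedExtractor(String delimiter) {
        this.delimiter = delimiter;
    }

    public List<String> extract(String record) {
        // limit -1 keeps trailing empty fields, which matter in fixed layouts
        return Arrays.asList(record.split(Pattern.quote(delimiter), -1));
    }
}
```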
Structural Overview - continued
The Response Translator – This component is configured only when the source system requires a response from the target system. The source will often require the response in a different format from that supplied by the target system; to cater for this, the Translator can apply XSLT stylesheets to the response.
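A response translation of this kind can be sketched with the JDK's standard XML transformation API. This is an assumed illustration, not Loco-ETL's implementation; the stylesheet and element names are invented.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/* Hypothetical sketch: apply an XSLT stylesheet to a target-system response
 * so it reaches the source system in the format it expects. */
public class ResponseTranslator {
    public static String translate(String responseXml, String stylesheet) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(responseXml)), new StreamResult(out));
        return out.toString();
    }
}
```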
Performance
Multi-threading – Performance is always a concern, so to optimize it each of the four components runs in a separate thread. In addition, the Transformation Engine can be further multi-threaded via the configuration client.
Memory footprint – The use and availability of memory has an impact on performance. Typically each instance of the application requires just 32 MB, though this can be configured on an individual basis. With its low memory footprint, Loco-ETL can easily be configured to run multiple instances concurrently, with an obvious benefit to throughput.
Additional Features
Query development – Where the source system is a database, the SQL query used to extract the data can be developed using any proprietary tool; the query then just has to be made available to the ETL by defining its location using the ETL Configuration client. The query should follow the guidelines provided in the Help system, and a Query Checker is provided to show exactly what the Extractor component makes of the query.
Failure Mechanic – This feature is used for source systems that provide data in XML format: if the data fails to load, it can be persisted along with the error message. The Mechanic is then used to inspect, update and re-submit the failed data.
Transformers – The supplied library includes a good set of Transformer classes covering lookups, cashflow generators, string manipulation, decision-making structures and more.
Mapping Engine – The Mapping Engine needs to know the format of the source and target data; this is accomplished using what we call a data template. These templates and the Mapping Engine are the key items for dealing with complex nested XML structures, so to make life easy, Loco-ETL includes a facility to import templates directly from example XML files.
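The template-import idea above can be illustrated with a small sketch that walks an example XML document and derives a flat list of element paths, a crude stand-in for a data template. Loco-ETL's real template format is not described in this overview, so the representation below is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/* Hypothetical sketch: derive element paths (a crude "data template")
 * from an example XML document by a pre-order walk of its DOM tree. */
public class TemplateImporter {
    public static List<String> paths(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        List<String> out = new ArrayList<>();
        walk(root, "/" + root.getTagName(), out);
        return out;
    }

    private static void walk(Element e, String path, List<String> out) {
        out.add(path);
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n instanceof Element)
                walk((Element) n, path + "/" + ((Element) n).getTagName(), out);
        }
    }
}
```

Nested elements naturally produce nested paths, which is why template import from an example file copes with deeply structured XML.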