TotalETL: infoServer Chris Fournier, Nathan Clark, Scott Longley, Cyril Shilnikov MQP Project 2005 Sponsored by TotalETL Inc.
TotalETL Small ETL company ETL (Extract, Transform, Load) –Used in large companies –Multimillion-dollar business Existing product is infoSight, a desktop solution
infoSight
infoSight Current Features GUI Project creation Library of Transformers Works with multiple input types Single machine Single user One project at a time
MQP Project goals Prototype the client-server version of infoSight –Distributed –Multi-user –Database-centric –Extensible –Alpha-level code –Focus on back-end design
Project Methodology Met with TotalETL team on-site Design requirements Refine and discuss requirements as needed Build core modules, demo at end of first term Build additional modules, final demo
General design overview Thin Clients Thick Clients Repository Distributed Server System
Actual design overview Security Manager Session Manager Repository Manager Event & Log Manager Project Manager Scheduling Manager Job Manager DB Client Version Manager
Repository Manager System core Store all information about –System operation –Security –Projects XML Parser to store Projects JDBC to connect to databases
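As an illustration of the "XML Parser to store Projects" step, here is a minimal sketch using the XML APIs that ship with the JDK. The `ProjectRecord` and `ProjectXml` names, and the choice of a single `project` element with `name` and `version` attributes, are assumptions for illustration only and are not taken from the infoServer code. The JDBC side (writing the resulting string to a repository table) is left out because it depends on the database driver in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical, minimal stand-in for an in-memory project.
class ProjectRecord {
    final String name;
    final int version;

    ProjectRecord(String name, int version) {
        this.name = name;
        this.version = version;
    }
}

class ProjectXml {
    // Serialize a project to an XML string that could be stored in a text column.
    static String toXml(ProjectRecord p) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("project");
        root.setAttribute("name", p.name);
        root.setAttribute("version", Integer.toString(p.version));
        doc.appendChild(root);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parse the stored XML back into an in-memory project.
    static ProjectRecord fromXml(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return new ProjectRecord(root.getAttribute("name"),
                                 Integer.parseInt(root.getAttribute("version")));
    }
}
```

A real project would carry its transformers and input definitions as child elements; the round trip (object to XML, XML back to object) is the part this sketch is meant to show.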
Repository Table Design
Project Manager & Version Control Storage and Retrieval of Projects –In-memory Object -> XML File -> Repository Version Control –Per user locking –Version tracking
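Per-user locking and version tracking could look roughly like the sketch below. The `VersionManager` class shape, its method names, and the check-out/check-in wording are assumptions made for illustration; the slides do not describe the actual interface. State is kept in memory here, whereas the design above stores it in the repository.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one lock owner and one version counter per project.
class VersionManager {
    private final Map<String, String> lockOwner = new HashMap<>();
    private final Map<String, Integer> latestVersion = new HashMap<>();

    // A user may check out a project if it is unlocked or already locked by that user.
    synchronized boolean checkOut(String project, String user) {
        String owner = lockOwner.get(project);
        if (owner != null && !owner.equals(user)) {
            return false;
        }
        lockOwner.put(project, user);
        return true;
    }

    // Checking in requires holding the lock; it bumps the version and releases the lock.
    synchronized int checkIn(String project, String user) {
        if (!user.equals(lockOwner.get(project))) {
            throw new IllegalStateException(user + " does not hold the lock on " + project);
        }
        int next = latestVersion.getOrDefault(project, 0) + 1;
        latestVersion.put(project, next);
        lockOwner.remove(project);
        return next;
    }

    // Version 0 means the project has never been checked in.
    synchronized int latest(String project) {
        return latestVersion.getOrDefault(project, 0);
    }
}
```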
Job Manager Combine Projects into Jobs Set interdependencies Running Jobs
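One plausible way to honor interdependencies when running a Job is to order its Projects topologically. The sketch below uses Kahn's algorithm; this is a standard technique chosen for illustration, not necessarily what the Job Manager does, and `JobPlan` with its methods is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a job is a set of projects plus "must run first" edges.
class JobPlan {
    // project -> projects that must finish before it starts
    private final Map<String, Set<String>> prerequisites = new LinkedHashMap<>();

    void addProject(String project) {
        prerequisites.computeIfAbsent(project, k -> new LinkedHashSet<>());
    }

    void addDependency(String project, String mustRunFirst) {
        addProject(project);
        addProject(mustRunFirst);
        prerequisites.get(project).add(mustRunFirst);
    }

    // Kahn's algorithm: repeatedly run a project whose prerequisites are all done.
    List<String> runOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>();
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Set<String>> e : prerequisites.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            if (e.getValue().isEmpty()) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String done = ready.remove();
            order.add(done);
            for (Map.Entry<String, Set<String>> e : prerequisites.entrySet()) {
                if (e.getValue().contains(done)) {
                    int left = remaining.merge(e.getKey(), -1, Integer::sum);
                    if (left == 0) {
                        ready.add(e.getKey());
                    }
                }
            }
        }
        if (order.size() != prerequisites.size()) {
            throw new IllegalStateException("cyclic dependency between projects");
        }
        return order;
    }
}
```

Rejecting cycles when the order is computed means a misconfigured Job fails before any Project starts.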
Schedule Manager Schedule Jobs –On request –Per schedule Multiple scheduling strategies
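"Multiple scheduling strategies" suggests a pluggable policy. The sketch below shows one way to structure that with a strategy interface. The two strategies (first-in-first-out and highest-priority-first) and all class names are assumptions; the slides do not say which strategies were implemented. Time-based triggering ("per schedule") is not covered here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical description of a job waiting to run.
class QueuedJob {
    final String name;
    final int priority;      // larger means more urgent (assumption)
    final long submittedAt;  // submission order or timestamp

    QueuedJob(String name, int priority, long submittedAt) {
        this.name = name;
        this.priority = priority;
        this.submittedAt = submittedAt;
    }
}

// The pluggable policy: choose which pending job runs next.
interface SchedulingStrategy {
    QueuedJob pick(List<QueuedJob> pending);
}

class FifoStrategy implements SchedulingStrategy {
    public QueuedJob pick(List<QueuedJob> pending) {
        return Collections.min(pending,
                Comparator.comparingLong((QueuedJob j) -> j.submittedAt));
    }
}

class PriorityStrategy implements SchedulingStrategy {
    public QueuedJob pick(List<QueuedJob> pending) {
        return Collections.max(pending,
                Comparator.comparingInt((QueuedJob j) -> j.priority));
    }
}

class ScheduleManager {
    private final List<QueuedJob> pending = new ArrayList<>();
    private final SchedulingStrategy strategy;

    ScheduleManager(SchedulingStrategy strategy) {
        this.strategy = strategy;
    }

    void submit(QueuedJob job) {
        pending.add(job);
    }

    // Returns the name of the job chosen to run next, or null if none is pending.
    String runNext() {
        if (pending.isEmpty()) {
            return null;
        }
        QueuedJob next = strategy.pick(pending);
        pending.remove(next);
        return next.name;
    }
}
```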
Session Manager Establish and maintain client connections RMI –Simple, robust, built into Java Front end for all functions in server Security checking –Authentication of users –Authorization of commands
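A rough sketch of what an RMI-facing front end with authentication and authorization checks might look like follows. The `InfoServerSession` interface, its two methods, the token scheme, and the plaintext password map are all illustrative assumptions and not the project's actual API. Only the `java.rmi.Remote` and `RemoteException` conventions come from RMI itself.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical remote interface: every RMI method must declare RemoteException.
interface InfoServerSession extends Remote {
    String login(String user, String password) throws RemoteException;
    String execute(String token, String command) throws RemoteException;
}

class SessionManager implements InfoServerSession {
    private final Map<String, String> passwords;             // plaintext only for this sketch
    private final Map<String, Set<String>> allowedCommands;  // user -> permitted commands
    private final Map<String, String> sessions = new HashMap<>(); // token -> user

    SessionManager(Map<String, String> passwords, Map<String, Set<String>> allowedCommands) {
        this.passwords = passwords;
        this.allowedCommands = allowedCommands;
    }

    // Authentication: verify the credentials and hand back a session token.
    public synchronized String login(String user, String password) throws RemoteException {
        if (password == null || !password.equals(passwords.get(user))) {
            throw new SecurityException("authentication failed");
        }
        String token = UUID.randomUUID().toString();
        sessions.put(token, user);
        return token;
    }

    // Authorization: every command is checked before it is dispatched.
    public synchronized String execute(String token, String command) throws RemoteException {
        String user = sessions.get(token);
        if (user == null) {
            throw new SecurityException("no such session");
        }
        Set<String> allowed = allowedCommands.getOrDefault(user, Collections.<String>emptySet());
        if (!allowed.contains(command)) {
            throw new SecurityException(user + " may not run " + command);
        }
        return "ok: " + command; // a real server would dispatch to the other managers here
    }
}
```

To serve remote clients, an instance would be exported with `UnicastRemoteObject.exportObject(manager, 0)` and bound in a registry obtained from `LocateRegistry`; that step is omitted so the sketch runs without opening a network port.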
Security Manager Determine user’s privileges Control access to Projects, Jobs, etc. Custom Security Model –Role-based ACLs –Read, Write, Execute (Projects and Jobs) –Read, Create, Modify (System Configuration)
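The role-based ACL idea can be sketched as follows: users hold roles, and each resource (a Project or Job) maps roles to Read, Write, and Execute permissions. The `AccessControl` class and its method names are hypothetical. The separate Read/Create/Modify permissions for system configuration would follow the same pattern with a second enum and are omitted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

enum Permission { READ, WRITE, EXECUTE }

// Hypothetical sketch of a role-based access control list.
class AccessControl {
    private final Map<String, Set<String>> rolesByUser = new HashMap<>();
    // resource -> role -> permissions granted to that role on that resource
    private final Map<String, Map<String, EnumSet<Permission>>> aclByResource = new HashMap<>();

    void assignRole(String user, String role) {
        rolesByUser.computeIfAbsent(user, k -> new HashSet<>()).add(role);
    }

    void grant(String resource, String role, Permission... permissions) {
        aclByResource.computeIfAbsent(resource, k -> new HashMap<>())
                .computeIfAbsent(role, k -> EnumSet.noneOf(Permission.class))
                .addAll(Arrays.asList(permissions));
    }

    // A user is allowed if any of the user's roles carries the permission.
    boolean isAllowed(String user, String resource, Permission permission) {
        Map<String, EnumSet<Permission>> acl =
                aclByResource.getOrDefault(resource, Collections.emptyMap());
        for (String role : rolesByUser.getOrDefault(user, Collections.emptySet())) {
            EnumSet<Permission> granted = acl.get(role);
            if (granted != null && granted.contains(permission)) {
                return true;
            }
        }
        return false;
    }
}
```

Access is denied by default: a user with no roles, or a resource with no ACL entry, yields `false`.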
Event Manager & Logger Useful for future expansion Complex Hierarchy of Events All Events Logged –Log4J format
Event Hierarchy
InfoserverEvent
  UserEvent
    UserLoginFailedEvent
    UserLoginEvent
    UserLogoutEvent
    (Other level-3 events)
  ProjectEvent
  (Other level-2 events)
Listeners
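The event class names on this slide map naturally onto a Java class hierarchy with listeners subscribed by event type. The sketch below assumes that the three login/logout events sit directly under `UserEvent`; the constructors, fields, and the `EventManager` methods are hypothetical. Logging is left out to keep the example free of external libraries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class InfoserverEvent {
    final String message;
    InfoserverEvent(String message) { this.message = message; }
}

class UserEvent extends InfoserverEvent {
    final String user;
    UserEvent(String user, String message) { super(message); this.user = user; }
}

class UserLoginEvent extends UserEvent {
    UserLoginEvent(String user) { super(user, "login"); }
}

class UserLoginFailedEvent extends UserEvent {
    UserLoginFailedEvent(String user) { super(user, "login failed"); }
}

class UserLogoutEvent extends UserEvent {
    UserLogoutEvent(String user) { super(user, "logout"); }
}

class ProjectEvent extends InfoserverEvent {
    ProjectEvent(String message) { super(message); }
}

// Hypothetical dispatcher: a listener subscribed to a type also sees its subtypes.
class EventManager {
    private final Map<Class<? extends InfoserverEvent>, List<Consumer<InfoserverEvent>>> listeners =
            new HashMap<>();

    void subscribe(Class<? extends InfoserverEvent> type, Consumer<InfoserverEvent> listener) {
        listeners.computeIfAbsent(type, k -> new ArrayList<>()).add(listener);
    }

    void publish(InfoserverEvent event) {
        for (Map.Entry<Class<? extends InfoserverEvent>, List<Consumer<InfoserverEvent>>> e
                : listeners.entrySet()) {
            if (e.getKey().isInstance(event)) {
                for (Consumer<InfoserverEvent> listener : e.getValue()) {
                    listener.accept(event);
                }
            }
        }
    }
}
```

Subscribing to `InfoserverEvent` itself gives a listener that receives everything, which is one way an "all events logged" policy could be attached.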
Saving and Loading Projects Security Manager Session Manager Repository Manager Event & Log Manager Project Manager DB Client Version Manager
Creating Jobs from Projects Security Manager Session Manager Repository Manager Event & Log Manager Project Manager Job Manager DB Client
Scheduling Jobs to Run Security Manager Session Manager Repository Manager Event & Log Manager Scheduling Manager Job Manager DB Client
Project Summary Relational Database storage –Projects –Operational Information Job Scheduling Tailored Security Model Version control Logging
Future work Distributed servers Clients, thick and thin Support for more databases More advanced scheduling algorithms
Thanks Professor E. A. Rundensteiner Arun Shastry Greg Goldberg Rest of the TotalETL Team
Questions?