MartLoader 0.7


MartLoader 0.7
- Called “DbLoader” in 0.7:
  - Convenient for distinguishing the 2 versions
  - Name too generic?
- Written in Perl:
  - Not suitable anymore (scalability issues among others)
  - BioMart 0.8 is written in Java
- Not a criticism of former developers of DbLoader

MartLoader 0.7 – I/O
- Input:
  - Files:
    - Denormalized datafiles
    - Metadata file (Excel spreadsheet)
  - Same datafiles as 0.8 (as introduced by Joachim earlier)
  - Metadata file differs: 0.8 will use front-end + machine-parseable format instead (XML or JSON)
- Output:
  - Files: database bulk-loading datafiles (in reverse-star format)
  - Database: multiple loaded marts (from the aforementioned datafiles)

MartLoader 0.7 – Execution 1/3
- Parsing the hard-coded metadata file (Data Model description)

MartLoader 0.7 – Execution 2/3
- Checking the format of the TSV input datafiles (described in the Excel file Data Model)

MartLoader 0.7 – Execution 3/3
- Running the remaining code from the “runme.pl” Perl script, which contains sequential, non-customizable instructions for:
  - Processing datafiles:
    - File concatenations
    - Values decoding
    - Partial normalization (fields “deconvolution”)
    - Validation (minimal)
    - Joins
  - Creating database tables
  - Loading data

MartLoader 0.7 – Conclusion
- Not maintainable: duplicated code everywhere
- Not scalable:
  - No customization possible
  - Memory-dependent processes (in-memory joins)
- Not reliable: serious bugs reported
- Not user-friendly: no proper user interface
- Not presentable: main script is one giant file of ~1500 lines (so much for cohesion)

MartLoader 0.8
- MartPlanner (Joachim's baby)
- MartExecutor (Anthony's baby)
- Over to Joachim

MartLoader 0.8 back-end
- Called “MartExecutor” for now:
  - Name suggestions are welcome (not “Runner”)
- A separate entity from MartPlanner (abstraction layer)
- Goal: execute the plan as per MartPlanner's instructions:
  - Datafiles processing (rewriting, validation, sort/join, ...)
  - Database creation/loading

MartExecutor – Planner→Executor communication
- File: exchange ML file (likely XML or JSON)
  - Likely to evolve over time
  - Fairly small
  - Fairly simple
- Essentially a directed graph with:
  - Nodes + properties: steps (actions)
  - Edges: step dependencies (for instance, a UNIX join needs to wait for the 2 sorting steps to be finished)
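
To make the "directed graph of steps" idea concrete, here is a minimal Python sketch (Python 3.9+ for `graphlib`). It is not the MartPlanner exchange format, which the slide only describes as "likely XML or JSON"; the step names `sort_left`, `sort_right`, `join` and `load` and the helper `ready_batches` are hypothetical. It only shows how edges force the join to wait for both sorts.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each step maps to the set of steps it depends on.
plan = {
    "sort_left": set(),
    "sort_right": set(),
    "join": {"sort_left", "sort_right"},
    "load": {"join"},
}

def ready_batches(dependencies):
    """Group steps into batches; a batch can start once earlier batches are done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all finished
        batches.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock successors
    return batches

batches = ready_batches(plan)
```

With this plan the two sorting steps land in the first batch and could run concurrently, while the join only appears in the second batch.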

MartExecutor – Implementation
- Scheduler: Quartz Scheduler
  - Leading Java scheduler
  - No scheduler-specific functionality needed (cron, concurrency) → MartPlanner's job
  - Does not offer its own dependency mechanism → use a batch processing library instead
- Batch processing: Spring Batch
  - Spring Framework: Java application platform (based on the IoC principle)
  - Spring Batch: batch framework
  - Spring Batch Admin: web-based admin user interface for Spring Batch
  - Quartz (again): can integrate nicely with Spring Batch if needed after all!
- Notable Spring Batch vocabulary: a job is made up of steps

MartExecutor – Execution (1/3) – Starting job
- Start job with either:
  - CLI:
    - Exact call is wrapped in a script
    - Underlying call is to Spring Batch's class CommandLineJobRunner + MartPlanner's file as argument
  - Web interface:
    - Deploy Spring Batch Admin to Jetty (also wrapped in a script)
    - Underlying call is a simple: mvn jetty:run
    - Browse to page and launch job
- Demo later: using dummy configuration/steps
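
As an illustration of what the CLI wrapper script might assemble, the Python sketch below builds the argument list for Spring Batch's `CommandLineJobRunner` (job configuration path, job identifier, then `key=value` job parameters). The classpath, file names, job name and the `plan.file` parameter are all assumptions made for the example; the slides do not say how MartPlanner's file is actually passed.

```python
def build_launch_command(classpath, job_config, job_name, parameters=None):
    """Build the argv list for launching a Spring Batch job from the command line."""
    command = [
        "java", "-cp", classpath,
        "org.springframework.batch.core.launch.support.CommandLineJobRunner",
        job_config,  # path to the Spring Batch job configuration
        job_name,    # identifier of the job to run
    ]
    # Job parameters are appended as key=value pairs, in a stable order.
    for key, value in sorted((parameters or {}).items()):
        command.append(f"{key}={value}")
    return command

# All values below are hypothetical placeholders.
command = build_launch_command(
    classpath="martexecutor.jar",
    job_config="mart-job.xml",
    job_name="loadMart",
    parameters={"plan.file": "plan.xml"},
)
```

The resulting list could be handed to `subprocess.run(command, check=True)`; building it separately keeps the sketch free of side effects.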

MartExecutor – Execution (2/3) – Monitoring
- Monitor steps as they run and how they relate to each other:
  - JUNG: using JUNG (graph library) and another abstraction layer (GraphML files)
  - JGraph: considering JGraph in the future (more powerful graph library)

MartExecutor – Execution (3/3) – Workflow
- Convert MartPlanner's instructions to a Spring Batch XML configuration file
- Run Spring Batch based on it
- Regenerate the GraphML file when starting and finishing a step (to reflect progress in the monitoring window)
- Diagram labels: MartLoader ML file, Spring Batch XML, Spring Batch, GraphML file, Loaded mart
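
The "regenerate the GraphML file on each step change" idea can be sketched in Python with the standard library. This is an illustration only, not MartExecutor's code (which is Java and uses JUNG): the function name `to_graphml`, the `status` data key and the step names are hypothetical. The element and attribute names follow the GraphML format.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def to_graphml(dependencies, status):
    """Serialize steps, their dependencies and their current status as GraphML."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare a string attribute named "status" that nodes can carry.
    ET.SubElement(root, "key", {
        "id": "status", "for": "node",
        "attr.name": "status", "attr.type": "string",
    })
    graph = ET.SubElement(root, "graph", {"id": "job", "edgedefault": "directed"})
    for step in sorted(dependencies):
        node = ET.SubElement(graph, "node", {"id": step})
        data = ET.SubElement(node, "data", {"key": "status"})
        data.text = status.get(step, "new")  # steps not yet seen are "new"
    for step in sorted(dependencies):
        for prerequisite in sorted(dependencies[step]):
            # Edge direction: prerequisite -> dependent step.
            ET.SubElement(graph, "edge", {"source": prerequisite, "target": step})
    return ET.tostring(root, encoding="unicode")

# Hypothetical snapshot taken while the join is running.
document = to_graphml(
    {"sort_left": set(), "sort_right": set(), "join": {"sort_left", "sort_right"}},
    {"sort_left": "finished", "sort_right": "finished", "join": "started"},
)
```

Calling the function again after every status change and overwriting the file would give a monitoring view something to reload.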

MartExecutor – Error handling
- Validation:
  - MartPlanner: as presented by Joachim
  - MartExecutor:
    - Dedicated validation steps:
      - “decoding” validation: unknown code
      - “deconvolution” validation: invalid value format
      - ...
    - On-the-go validation: orphan key (“join” validation)
- Marking of steps as new, started or finished: using Spring Batch's in-memory database or persisted in the XML configuration
- Failure/Recovery:
  - Skip finished steps
  - Clean up started steps + restart them
  - Start new steps
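
The failure/recovery rule above is small enough to write down directly. The Python sketch below assumes the three step states named on the slide (new, started, finished); the function name `recovery_plan`, the action labels and the step names are hypothetical, and nothing here reflects how Spring Batch itself stores or restarts steps.

```python
# Action taken for each step state when a failed job is restarted.
RECOVERY_ACTIONS = {
    "finished": "skip",
    "started": "clean up and restart",
    "new": "start",
}

def recovery_plan(step_status):
    """Map each step to its recovery action, rejecting unknown states."""
    plan = {}
    for step, status in step_status.items():
        if status not in RECOVERY_ACTIONS:
            raise ValueError(f"unknown status {status!r} for step {step!r}")
        plan[step] = RECOVERY_ACTIONS[status]
    return plan

# Hypothetical state of a job that failed during the join.
plan = recovery_plan({"sort_left": "finished", "join": "started", "load": "new"})
```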

MartExecutor – Demo 1/2
- Input: hard-coded Spring Batch XML configuration file:
  - Describes a subset of the DbLoader (0.7) workflow
  - Highlights typical dependencies among steps
  - Presents the monitoring tool
  - Would normally be the result of transforming MartPlanner's file (not ready yet)

MartExecutor – Demo 2/2
- Processing: fake steps (each one just sleeps for a random time)

MartExecutor – Improvements over 0.7
- Using the Java language:
  - More suitable given the project's size
  - More libraries to help (especially the very powerful Spring Framework)
  - Better enforcement of OO concepts
- Using abstraction layers promotes:
  - Maintainability
  - Scalability
- Using intensive continuous integration (Spring's strongest suit), which promotes reliability
- Using UNIX commands:
  - Some of the best sort/join algorithms out there
  - SQL can't easily be distributed
  - Use of UNIX and grid engine obviously remains optional
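
For readers unfamiliar with the sort-then-join approach, the Python sketch below shows the general sort-merge join technique: once both inputs are sorted on the key, they can be joined in a single streaming pass rather than by loading one side into memory, which is the 0.7 weakness noted earlier. It is a generic illustration, not the algorithm used by any particular UNIX tool or by MartExecutor; `merge_join` and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs that are already sorted by key."""
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    end = (None, None)  # sentinel returned when one side is exhausted
    lkey, lrows = next(left_groups, end)
    rkey, rrows = next(right_groups, end)
    while lrows is not None and rrows is not None:
        if lkey < rkey:
            lkey, lrows = next(left_groups, end)    # key only on the left: skip it
        elif lkey > rkey:
            rkey, rrows = next(right_groups, end)   # key only on the right: skip it
        else:
            # Same key on both sides: emit every left/right combination.
            right_values = [value for _, value in rrows]
            for _, left_value in lrows:
                for right_value in right_values:
                    yield (lkey, left_value, right_value)
            lkey, lrows = next(left_groups, end)
            rkey, rrows = next(right_groups, end)

# Hypothetical inputs; sorting first is what makes the single pass possible.
left = sorted([("b", 1), ("a", 2), ("c", 3)])
right = sorted([("a", "x"), ("c", "y"), ("c", "z"), ("d", "w")])
joined = list(merge_join(left, right))
```

Only the rows sharing the current key are held in memory at any time, so memory use depends on the largest key group rather than on the size of the files.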

MartLoader – Conclusion
- Another meeting necessary in a few weeks
- New and upcoming challenges require a solid understanding of basic MartLoader functionality (hence today's meeting)
- Next meeting will (hopefully) cover:
  - Progress made
  - Feedback: users' first impressions and consequences
  - Planner→Executor communication established
  - Basic use cases implemented
  - Presentation of new use cases and strategy:
    - Incremental loading
    - Updating data
    - ...?
  - Scalability test results:
    - 0.7 versus 0.8
    - UNIX+grid versus SQL

Thank you! Q&A time