1 MartLoader 0.7
Called “DbLoader” in 0.7:
- Convenient for distinguishing the 2 versions
- Name too generic?
Written in Perl:
- No longer suitable (scalability issues, among others)
- BioMart 0.8 is written in Java
- Not a criticism of former developers of DbLoader

2 MartLoader 0.7 – I/O
Input:
- Files: denormalized datafiles and a metadata file (Excel spreadsheet)
- Same datafiles as 0.8 (as introduced by Joachim earlier)
- Metadata file differs: 0.8 will use a front-end plus a machine-parseable format (XML or JSON) instead
Output:
- Files: database bulk-loading datafiles (in reverse-star format)
- Database: multiple loaded marts (from the aforementioned datafiles)

3 MartLoader 0.7 – Execution 1/3
Parsing hardcoded metadata file (Data Model description):

4 MartLoader 0.7 – Execution 2/3
Checking TSV input datafiles format (described in Excel file Data Model):
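As a rough illustration of this kind of format check (not the actual DbLoader code, which is Perl), here is a minimal Python sketch. The column names and validators in EXPECTED_COLUMNS are hypothetical stand-ins for what the Excel data model would describe.

```python
import csv
import io

# Hypothetical data-model description: column name -> validator.
# In 0.7 this information comes from the Excel metadata file.
EXPECTED_COLUMNS = {
    "donor_id": lambda v: v != "",
    "age": lambda v: v == "" or v.isdigit(),
}

def check_tsv(handle, expected=EXPECTED_COLUMNS):
    """Return a list of (line_number, message) problems found in a TSV stream."""
    problems = []
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header != list(expected):
        return [(1, f"unexpected header: {header!r}")]
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                (line_number, f"expected {len(header)} fields, got {len(row)}")
            )
            continue
        for name, value in zip(header, row):
            if not expected[name](value):
                problems.append((line_number, f"bad value for {name}: {value!r}"))
    return problems

sample = io.StringIO("donor_id\tage\nD1\t42\nD2\tforty\nD3\n")
for problem in check_tsv(sample):
    print(problem)
```

The check reports every offending line instead of stopping at the first one, which is usually more useful when validating large submitted datafiles.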

5 MartLoader 0.7 – Execution 3/3
Running the remaining code from the “runme.pl” Perl script, which contains sequential, non-customizable instructions for:
- Processing datafiles:
  - File concatenations
  - Value decoding
  - Partial normalization (field “deconvolution”)
  - Validation (minimal)
  - Joins
- Creating database tables
- Loading data

6 MartLoader 0.7 – Conclusion
- Not maintainable: duplicated code everywhere
- Not scalable: no customization possible, memory-dependent processes (in-memory joins)
- Not reliable: serious bugs reported
- Not user-friendly: no proper user interface
- Not presentable: main script is one giant file of ~1500 lines (so much for cohesion)

7 MartLoader 0.8
- MartPlanner (Joachim's baby)
- MartExecutor (Anthony's baby)
Over to Joachim

8 MartLoader 0.8 back-end
- Called “MartExecutor” for now
- Name suggestions are welcome (not “Runner”)
- A separate entity from MartPlanner (abstraction layer)
- Goal: execute the plan as per MartPlanner's instructions
  - Datafile processing (rewriting, validation, sort/join, ...)
  - Database creation/loading

9 MartExecutor – Planner→Executor communication
File: exchange ML file (likely XML or JSON)
- Likely to evolve over time
- Fairly small
- Fairly simple
- Essentially a directed graph with:
  - Nodes + properties: steps (actions)
  - Edges: step dependencies (for instance, a UNIX join needs to wait for the 2 sorting steps to be finished)
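To make the directed-graph idea concrete, here is a small Python sketch (illustrative only; the real exchange format and the MartExecutor code are not shown in these slides). The step names in PLAN are hypothetical. It uses the standard-library graphlib module (Python 3.9+) to group steps into waves that could run once their dependencies have finished.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: step name -> set of steps it depends on.
PLAN = {
    "sort_a": set(),
    "sort_b": set(),
    "join_ab": {"sort_a", "sort_b"},  # the join waits for both sorts
    "load_db": {"join_ab"},
}

def execution_waves(plan):
    """Group steps into waves; steps in the same wave do not depend on each other."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves

for wave in execution_waves(PLAN):
    print(wave)
```

With this plan, the two sorting steps land in the first wave, the join in the second and the database load in the third.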

10 MartExecutor – Implementation
Scheduler: Quartz Scheduler
- Leading Java scheduler
- No scheduler-specific functionality needed (cron, concurrency) → MartPlanner's job
- Does not offer its own dependency mechanism → use a batch processing library instead
Batch processing: Spring Batch
- Spring Framework: Java application platform (based on the IoC principle)
- Spring Batch: batch framework
- Spring Batch Admin: web-based admin user interface for Spring Batch
- Quartz (again): can integrate nicely with Spring Batch if needed after all!
Notable Spring Batch vocabulary: a job is made up of steps

11 MartExecutor – Execution (1/3) – Starting job
Start job with either:
- CLI:
  - Exact call is wrapped in a script
  - Underlying call is to Spring Batch's class CommandLineJobRunner, with MartPlanner's file as argument
- Web interface:
  - Deploy Spring Batch Admin to Jetty (also wrapped in a script)
  - Underlying call is a simple: mvn jetty:run
  - Browse to the page and launch the job
Demo later: using dummy configuration/steps

12 MartExecutor – Execution (2/3) – Monitoring
Monitor steps as they run and how they relate to each other:
- JUNG: using JUNG (graph library) and another abstraction layer (GraphML files)
- JGraph: considering JGraph in the future (more powerful graph library)

13–19 MartExecutor – Execution (3/3) – Workflow
- Convert MartPlanner's instructions to a Spring Batch XML configuration file
- Run Spring Batch based on it
- Regenerate the GraphML file when starting and finishing a step (to reflect progress in the monitoring window)
Diagram elements, in build order: MartLoader ML file, Spring Batch XML, Spring Batch, GraphML file, Loaded mart
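The GraphML regeneration step could look roughly like the Python sketch below. This is an assumption-laden illustration, not MartExecutor code: the function name build_graphml, the graph id "plan" and the "status" attribute key are all hypothetical; only the GraphML element names and namespace come from the GraphML format itself.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def build_graphml(dependencies, statuses):
    """Serialize steps (nodes), dependencies (edges) and step statuses to GraphML.

    dependencies: step name -> set of prerequisite step names
    statuses:     step name -> "new" | "started" | "finished" (defaults to "new")
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare a string attribute named "status" that is attached to nodes.
    ET.SubElement(root, "key", {"id": "status", "for": "node",
                                "attr.name": "status", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", id="plan", edgedefault="directed")
    for step in dependencies:
        node = ET.SubElement(graph, "node", id=step)
        data = ET.SubElement(node, "data", key="status")
        data.text = statuses.get(step, "new")
    for step, prerequisites in dependencies.items():
        for prerequisite in sorted(prerequisites):
            # Edge direction: prerequisite -> dependent step.
            ET.SubElement(graph, "edge", source=prerequisite, target=step)
    return ET.tostring(root, encoding="unicode")

plan = {"sort_a": set(), "sort_b": set(), "join_ab": {"sort_a", "sort_b"}}
print(build_graphml(plan, {"sort_a": "finished", "sort_b": "started"}))
```

Calling such a function each time a step starts or finishes, and overwriting the file the monitoring window reads, would be one simple way to keep the displayed graph in sync with progress.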

20 MartExecutor – Error handling
Validation:
- MartPlanner: as presented by Joachim
- MartExecutor:
  - Dedicated validation steps:
    - “decoding” validation: unknown code
    - “deconvolution” validation: invalid value format
    - ...
  - On-the-go validation: orphan key (“join” validation)
Marking of steps as new, started or finished: using Spring Batch's in-memory database or persisted in XML configuration
Failure/Recovery:
- Skip finished steps
- Clean up started steps and restart them
- Start new steps
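The failure/recovery rules above map directly onto the three step statuses. A minimal Python sketch of that mapping follows; the function and action names are hypothetical, and in the real system the statuses would come from Spring Batch's database or the XML configuration rather than a dictionary.

```python
def recovery_plan(statuses):
    """Decide what to do with each step after a failure, based on its status.

    statuses: step name -> "new" | "started" | "finished"
    """
    actions = {
        "finished": "skip",                # already done, nothing to redo
        "started": "cleanup_and_restart",  # partial output must be removed first
        "new": "start",                    # never ran, run normally
    }
    plan = {"skip": [], "cleanup_and_restart": [], "start": []}
    for step, status in statuses.items():
        plan[actions[status]].append(step)  # KeyError on an unknown status
    return plan

print(recovery_plan({"sort_a": "finished", "sort_b": "started", "join_ab": "new"}))
```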

21 MartExecutor – Demo 1/2
Input: hard-coded Spring Batch XML configuration file:
- Describes a subset of the DbLoader (0.7) workflow
- Highlights typical dependencies among steps
- Presents the monitoring tool
- Would normally be the result of transforming MartPlanner's file (not ready)

22 MartExecutor – Demo 2/2
Processing: fake steps (each one simply sleeps for a random duration)

23 MartExecutor – Improvements over 0.7
Using the Java language:
- More suitable given the project's size
- More libraries to help (especially the very powerful Spring Framework)
- Better enforcement of OO concepts
Using abstraction layers promotes:
- Maintainability
- Scalability
Using intensive continuous integration (Spring's strongest suit), which promotes reliability
Using UNIX commands:
- Some of the best sort/join algorithms out there
- SQL can't easily be distributed
- Use of UNIX and a grid engine obviously remains optional
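For readers unfamiliar with the sort-then-join approach behind the UNIX sort and join commands, here is a small Python sketch of a sort-merge join that also reports orphan keys, in the spirit of the “join” validation. It is illustrative only: it keeps everything in memory, whereas the point of the UNIX tools is that they can work on files larger than memory, and the function name and row layout are assumptions.

```python
def sort_merge_join(left, right):
    """Join two lists of (key, value) rows on key.

    Returns (joined, orphans), where orphans are the left rows whose key
    has no matching row on the right.
    """
    left = sorted(left)    # both inputs must be sorted on the join key,
    right = sorted(right)  # as with `sort` before `join`
    joined, orphans = [], []
    j = 0
    for key, left_value in left:
        # Advance the right cursor past keys smaller than the current left key.
        while j < len(right) and right[j][0] < key:
            j += 1
        # Emit one output row per matching right row, without moving j,
        # so that duplicate left keys still see all their matches.
        k = j
        matched = False
        while k < len(right) and right[k][0] == key:
            joined.append((key, left_value, right[k][1]))
            matched = True
            k += 1
        if not matched:
            orphans.append((key, left_value))
    return joined, orphans

rows, orphans = sort_merge_join(
    [("g2", "b"), ("g1", "a"), ("g3", "c")],
    [("g1", "X"), ("g2", "Y"), ("g2", "Z")],
)
print(rows)
print(orphans)
```

Because each input is scanned once after sorting, the same idea splits naturally across sorted chunks, which is part of why it suits a grid engine better than a single SQL server.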

24 MartLoader – Conclusion
- Another meeting necessary in a few weeks
- New and upcoming challenges require a solid understanding of basic MartLoader functionality (hence today's meeting)
- Next meeting will (hopefully) cover:
  - Progress made
  - Feedback: users' first impressions and consequences
  - Planner→Executor communication established
  - Basic use cases implemented
  - Presentation of new use cases and strategy:
    - Incremental loading
    - Updating data
    - ...?
  - Scalability test results:
    - 0.7 versus 0.8
    - UNIX+grid versus SQL

25 Thank you! Q&A time

