Published by Rudolph Cook. Modified over 6 years ago.
MartLoader 0.7
- Called “DbLoader” in 0.7: convenient for distinguishing the 2 versions (name too generic?)
- Written in Perl: no longer suitable (scalability issues, among others)
- BioMart 0.8 is written in Java
- Not a criticism of the former developers of DbLoader
MartLoader 0.7 – I/O
Input:
- Files: denormalized datafiles (the same datafiles as in 0.8, as introduced by Joachim earlier)
- Metadata file (Excel spreadsheet): differs in 0.8, which will use a front-end- and machine-parseable format instead (XML or JSON)
Output:
- Files: database bulk-loading datafiles (in reverse-star format)
- Database: multiple loaded marts (built from the aforementioned datafiles)
MartLoader 0.7 – Execution 1/3
Parsing a hardcoded metadata file (the Data Model description).
MartLoader 0.7 – Execution 2/3
Checking the format of the TSV input datafiles (as described in the Excel Data Model file).
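As a rough illustration of this kind of format check, the sketch below validates TSV lines against a simplified column list. It assumes the Data Model has already been reduced to an ordered list of column types, with only two hypothetical types (`INT` and `STRING`); `ColumnType` and `TsvFormatChecker` are illustrative names, not DbLoader's actual (Perl) code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column types for a simplified data model description.
// STRING accepts any value; INT must be an optionally signed integer.
enum ColumnType { INT, STRING }

final class TsvFormatChecker {
    private final List<ColumnType> columns;

    TsvFormatChecker(List<ColumnType> columns) {
        this.columns = List.copyOf(columns);
    }

    /** Returns one message per problem found; an empty list means all lines conform. */
    List<String> check(List<String> lines) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // limit -1 keeps trailing empty fields, so "a\t" counts as 2 fields
            String[] fields = lines.get(i).split("\t", -1);
            int lineNo = i + 1;
            if (fields.length != columns.size()) {
                errors.add("line " + lineNo + ": expected " + columns.size()
                        + " fields, found " + fields.length);
                continue; // per-field checks are meaningless if the field count is off
            }
            for (int c = 0; c < fields.length; c++) {
                if (columns.get(c) == ColumnType.INT && !fields[c].matches("-?\\d+")) {
                    errors.add("line " + lineNo + ", column " + (c + 1)
                            + ": not an integer: '" + fields[c] + "'");
                }
            }
        }
        return errors;
    }
}
```

Collecting all errors instead of stopping at the first one lets a single run report every malformed line of a large datafile.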
MartLoader 0.7 – Execution 3/3
Running the remaining code of the “runme.pl” Perl script, which contains sequential, non-customizable instructions for:
- Processing datafiles:
  - File concatenation
  - Value decoding
  - Partial normalization (field “deconvolution”)
  - Validation (minimal)
  - Joins
- Creating database tables
- Loading data
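Two of the processing steps listed above, value decoding and field “deconvolution”, can be sketched as follows. Assumptions for illustration only: decoding is a lookup in a code table (unknown codes are rejected), and a deconvolved field holds several values separated by a delimiter, which are split into one (key, value) row per value. `FieldProcessing`, `decode` and `deconvolve` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class FieldProcessing {
    /** Replaces a coded value with its decoded label; unknown codes are rejected. */
    static String decode(String code, Map<String, String> codeTable) {
        String decoded = codeTable.get(code);
        if (decoded == null) {
            throw new IllegalArgumentException("Unknown code: " + code);
        }
        return decoded;
    }

    /**
     * Splits a multi-valued field into one (key, value) row per value,
     * e.g. key "G1" with "a;b" becomes rows [G1, a] and [G1, b].
     * Values are trimmed and empty values are dropped.
     */
    static List<String[]> deconvolve(String key, String multiValued, String delimiter) {
        List<String[]> rows = new ArrayList<>();
        // Pattern.quote so that delimiters such as "|" are not treated as regex syntax
        for (String value : multiValued.split(Pattern.quote(delimiter))) {
            String trimmed = value.trim();
            if (!trimmed.isEmpty()) {
                rows.add(new String[] {key, trimmed});
            }
        }
        return rows;
    }
}
```

Splitting one field into several rows is what turns a denormalized column into a separate, partially normalized table keyed by the original row's key.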
MartLoader 0.7 – Conclusion
- Not maintainable: duplicated code everywhere
- Not scalable: no customization possible; memory-dependent processes (in-memory joins)
- Not reliable: serious bugs reported
- Not user-friendly: no proper user interface
- Not presentable: the main script is one giant file of ~1500 lines (so much for cohesion)
MartLoader 0.8
- MartPlanner (Joachim's baby)
- MartExecutor (Anthony's baby)
Over to Joachim
MartLoader 0.8 back-end
- Called “MartExecutor” for now (name suggestions are welcome, but not “Runner”)
- A separate entity from MartPlanner (abstraction layer)
- Goal: execute the plan according to MartPlanner's instructions:
  - Datafile processing (rewriting, validation, sort/join, ...)
  - Database creation/loading
MartExecutor – Planner→Executor communication
- File: an exchange ML file (likely XML or JSON)
- Likely to evolve over time
- Fairly small
- Fairly simple
- Essentially a directed graph with:
  - Nodes + properties: steps (actions)
  - Edges: step dependencies (for instance, a UNIX join needs to wait for the 2 sorting steps to finish)
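Since the plan is essentially a directed graph of steps and dependencies, an executor has to derive an order in which every step runs after the steps it depends on. A minimal sketch of that idea using Kahn's topological sort, assuming step identifiers are plain strings; `PlanGraph`, `addStep` and `executionOrder` are hypothetical names, not the actual MartExecutor API or exchange-file format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PlanGraph {
    // step id -> ids of the steps it depends on (insertion order kept for determinism)
    private final Map<String, List<String>> dependencies = new LinkedHashMap<>();

    void addStep(String id, String... dependsOn) {
        dependencies.put(id, List.of(dependsOn));
    }

    /** Returns the steps in an order that respects every dependency (Kahn's algorithm). */
    List<String> executionOrder() {
        Map<String, Integer> pending = new LinkedHashMap<>();        // unmet dependency count
        Map<String, List<String>> dependents = new LinkedHashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : dependencies.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                if (!dependencies.containsKey(dep)) {
                    throw new IllegalStateException("Unknown dependency: " + dep);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((id, count) -> { if (count == 0) ready.add(id); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String step = ready.poll();
            order.add(step);
            // Releasing a step may make its dependents ready
            for (String next : dependents.getOrDefault(step, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != dependencies.size()) {
            throw new IllegalStateException("Plan contains a dependency cycle");
        }
        return order;
    }
}
```

For the example on the slide (a join waiting for 2 sorts), adding `sortA`, `sortB`, then `join` depending on both yields the order `sortA, sortB, join`. A cyclic plan is rejected rather than silently truncated.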
MartExecutor – Implementation
- Scheduler: Quartz Scheduler
  - Leading Java scheduler
  - No scheduler-specific functionality needed (cron, concurrency) → that is MartPlanner's job
  - Does not offer its own dependency mechanism → use a batch processing library instead
- Batch processing: Spring Batch
  - Spring Framework: Java application platform (based on the IoC principle)
  - Spring Batch: batch framework
  - Spring Batch Admin: web-based admin user interface for Spring Batch
  - Quartz (again): can integrate nicely with Spring Batch if needed after all!
- Notable Spring Batch vocabulary: a job is made up of steps
MartExecutor – Execution (1/3) – Starting job
Start a job with either:
- CLI:
  - The exact call is wrapped in a script
  - The underlying call is to Spring Batch's CommandLineJobRunner class, with MartPlanner's file as argument
- Web interface:
  - Deploy Spring Batch Admin to Jetty (also wrapped in a script); the underlying call is simply: mvn jetty:run
  - Browse to the page and launch the job (demo later, using a dummy configuration/steps)
MartExecutor – Execution (2/3) – Monitoring
Monitor steps as they run and how they relate to each other:
- JUNG: currently using JUNG (a graph library) plus another abstraction layer (GraphML files)
- JGraph: being considered for the future (a more powerful graph library)
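To give a sense of the GraphML abstraction layer, here is a minimal sketch that serializes the step graph together with a per-step status, so that a monitoring view can reload the file and redraw progress. The `status` attribute key, the status labels and the `GraphMlWriter` class are assumptions for illustration; the slides do not describe MartExecutor's actual GraphML layout.

```java
import java.util.List;
import java.util.Map;

final class GraphMlWriter {
    /** Escapes characters that are not allowed verbatim in XML attribute/text values. */
    private static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * @param statuses step id -> status label (e.g. "new", "started", "finished")
     * @param edges    pairs {source, target}, meaning target depends on source
     */
    static String write(Map<String, String> statuses, List<String[]> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        // Declares a string-valued "status" attribute attached to nodes
        sb.append("  <key id=\"status\" for=\"node\" attr.name=\"status\" attr.type=\"string\"/>\n");
        sb.append("  <graph id=\"plan\" edgedefault=\"directed\">\n");
        statuses.forEach((id, status) -> sb.append("    <node id=\"").append(esc(id))
                .append("\"><data key=\"status\">").append(esc(status)).append("</data></node>\n"));
        for (String[] e : edges) {
            sb.append("    <edge source=\"").append(esc(e[0]))
              .append("\" target=\"").append(esc(e[1])).append("\"/>\n");
        }
        sb.append("  </graph>\n</graphml>\n");
        return sb.toString();
    }
}
```

Regenerating the whole (small) file on every status change keeps the writer stateless; the viewer only has to reread it.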
MartExecutor – Execution (3/3) – Workflow
- Convert MartPlanner's instructions into a Spring Batch XML configuration file
- Run Spring Batch based on it
- Regenerate the GraphML file when starting and finishing a step (to reflect progress in the monitoring window)
Diagram: MartLoader ML file → Spring Batch XML → Spring Batch → GraphML file and loaded mart
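A minimal sketch of the conversion step, assuming the planner's steps have already been put in a dependency-respecting order (for example by a topological sort): it emits a Spring Batch `<job>` whose `<step>` elements are chained through their `next` attribute. The tasklet bean naming convention (`<stepId>Tasklet`) and the `SpringBatchXmlGenerator` class are hypothetical, and the output is only the job fragment, not a complete Spring context file. Running independent steps in parallel (such as the 2 sorts before a join) would instead call for Spring Batch's `<split>`/`<flow>` elements, which this sketch does not generate.

```java
import java.util.List;

final class SpringBatchXmlGenerator {
    /**
     * Emits a Spring Batch job definition that runs the given steps one after
     * another, chaining them with the "next" attribute. Each step delegates to a
     * tasklet bean assumed to be named "<stepId>Tasklet" (hypothetical convention).
     */
    static String toJobXml(String jobId, List<String> orderedSteps) {
        StringBuilder sb = new StringBuilder();
        sb.append("<job id=\"").append(jobId)
          .append("\" xmlns=\"http://www.springframework.org/schema/batch\">\n");
        for (int i = 0; i < orderedSteps.size(); i++) {
            String id = orderedSteps.get(i);
            sb.append("  <step id=\"").append(id).append('"');
            // Every step except the last points to its successor
            if (i + 1 < orderedSteps.size()) {
                sb.append(" next=\"").append(orderedSteps.get(i + 1)).append('"');
            }
            sb.append(">\n    <tasklet ref=\"").append(id).append("Tasklet\"/>\n  </step>\n");
        }
        sb.append("</job>\n");
        return sb.toString();
    }
}
```

Linearizing the graph keeps the generated configuration trivially valid at the cost of parallelism, which is acceptable for a first version of the Planner→Executor bridge.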
MartExecutor – Error handling
- Validation:
  - MartPlanner: as presented by Joachim
  - MartExecutor:
    - Dedicated validation steps: “decoding” validation (unknown code), “deconvolution” validation (invalid value format), ...
    - On-the-fly validation: orphan keys (“join” validation)
- Marking of steps as new, started or finished: using Spring Batch's in-memory database, or persisted in the XML configuration
- Failure/recovery:
  - Skip finished steps
  - Clean up started steps and restart them
  - Start new steps
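The failure/recovery rules above map each step's recorded state to one action on relaunch. A minimal sketch of that mapping, assuming the states of the previous run are available as a map from step id to state; `StepState`, `RecoveryAction` and `RecoveryPlanner` are hypothetical names, and the actual clean-up work is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Step states named on the slide: new, started, finished.
enum StepState { NEW, STARTED, FINISHED }

// What to do with a step when a failed job is relaunched (hypothetical names).
enum RecoveryAction { SKIP, CLEAN_UP_AND_RESTART, START }

final class RecoveryPlanner {
    static RecoveryAction actionFor(StepState state) {
        switch (state) {
            case FINISHED: return RecoveryAction.SKIP;                 // output already complete
            case STARTED:  return RecoveryAction.CLEAN_UP_AND_RESTART; // partial output may exist
            default:       return RecoveryAction.START;                // never ran
        }
    }

    /** Maps every step of a previous run to its recovery action, keeping the input order. */
    static Map<String, RecoveryAction> plan(Map<String, StepState> previousRun) {
        Map<String, RecoveryAction> actions = new LinkedHashMap<>();
        previousRun.forEach((step, state) -> actions.put(step, actionFor(state)));
        return actions;
    }
}
```

Cleaning up started steps before rerunning them matters because a step interrupted mid-way (for instance, a half-written sorted file) would otherwise feed corrupt input to its dependents.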
MartExecutor – Demo 1/2
Input: a hard-coded Spring Batch XML configuration file, which:
- Describes a subset of the DbLoader (0.7) workflow
- Highlights typical dependencies among steps
- Presents the monitoring tool
- Would normally be the result of transforming MartPlanner's file (not ready yet)
MartExecutor – Demo 2/2
Processing: fake steps (random sleeping processing)
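The demo's fake steps can be approximated with plain `java.util.concurrent`: each dummy step sleeps for a random time and starts only once all the steps it depends on have completed. This is a hypothetical stand-in (`FakeStepRunner` is not part of Spring Batch or MartExecutor) that uses `CompletableFuture.allOf` to express the dependencies; in the actual demo the steps run inside Spring Batch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadLocalRandom;

final class FakeStepRunner {
    /**
     * Runs dummy steps that only sleep for a random time (0..maxSleepMillis ms).
     * Steps must be listed after their dependencies. Returns the finishing order.
     */
    static List<String> run(Map<String, List<String>> dependsOn, int maxSleepMillis) {
        List<String> finished = Collections.synchronizedList(new ArrayList<>());
        Map<String, CompletableFuture<Void>> futures = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
            String id = entry.getKey();
            List<String> depIds = entry.getValue();
            CompletableFuture<?>[] deps = new CompletableFuture<?>[depIds.size()];
            for (int i = 0; i < deps.length; i++) {
                deps[i] = futures.get(depIds.get(i));
                if (deps[i] == null) {
                    throw new IllegalArgumentException(id + " is listed before its dependency " + depIds.get(i));
                }
            }
            // The step body starts only after all dependency futures have completed
            futures.put(id, CompletableFuture.allOf(deps).thenRunAsync(() -> {
                try {
                    Thread.sleep(ThreadLocalRandom.current().nextInt(maxSleepMillis + 1));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                finished.add(id);
            }));
        }
        // Block until every step is done before reporting the finishing order
        CompletableFuture.allOf(futures.values().toArray(new CompletableFuture<?>[0])).join();
        return new ArrayList<>(finished);
    }
}
```

With `sortA` and `sortB` both feeding `join`, the two sorts may finish in either order from run to run, but `join` always finishes after both of them.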
MartExecutor – Improvements over 0.7
- Using the Java language:
  - More suitable given the project's size
  - More libraries to help (especially the very powerful Spring Framework)
  - Better enforcement of OO concepts
- Using abstraction layers promotes maintainability and scalability
- Using intensive continuous integration (Spring's strongest suit) promotes reliability
- Using UNIX commands:
  - Some of the best sort/join algorithms out there
  - SQL can't easily be distributed
  - Using UNIX and a grid engine obviously remains optional
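The sort/join point can be illustrated with a sort-merge join: once both inputs are sorted on the key, they are joined in a single pass that holds only the current row of each side, which is the approach UNIX `join` takes on sorted files, rather than loading one side into memory as the 0.7 in-memory joins did. A minimal sketch, assuming unique keys on both sides (a one-to-one join) and reporting left-side keys without a match as orphans; `SortMergeJoin` is a hypothetical name, and results are collected in lists only for simplicity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SortMergeJoin {
    /** Joined rows and left-side keys that had no partner on the right. */
    static final class Result {
        final List<String[]> joined = new ArrayList<>();
        final List<String> orphanKeys = new ArrayList<>();
    }

    /**
     * Joins two streams of {key, value} rows, both sorted by key (String order) with
     * unique keys. Only the current row of each input is held at any time.
     */
    static Result join(Iterator<String[]> left, Iterator<String[]> right) {
        Result result = new Result();
        String[] l = left.hasNext() ? left.next() : null;
        String[] r = right.hasNext() ? right.next() : null;
        while (l != null) {
            int cmp = (r == null) ? -1 : l[0].compareTo(r[0]);
            if (cmp == 0) {
                result.joined.add(new String[] {l[0], l[1], r[1]});
                l = left.hasNext() ? left.next() : null;
                r = right.hasNext() ? right.next() : null;
            } else if (cmp < 0) {
                result.orphanKeys.add(l[0]); // no matching key on the right: orphan
                l = left.hasNext() ? left.next() : null;
            } else {
                r = right.hasNext() ? right.next() : null; // right-only key: skipped
            }
        }
        return result;
    }
}
```

Because each input is read strictly forward, memory use does not grow with file size, and the orphan check comes for free during the merge.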
MartLoader – Conclusion
- Another meeting will be necessary in a few weeks
- New and upcoming challenges required a solid understanding of basic MartLoader functionality (hence today's meeting)
- The next meeting will (hopefully) cover:
  - Progress made
  - Feedback: users' first impressions and consequences
  - Planner→Executor communication established
  - Basic use cases implemented
  - Presentation of new use cases and strategy: incremental loading, updating data, ...?
  - Scalability test results: 0.7 versus 0.8; UNIX+grid versus SQL ()
Thank you! Q&A time