dtk-tools
Benoit Raybaud, Research Software Manager

dtk-tools
To carry out experiments and analysis, we rely on models. Those models need to be parametrized with files, usually coming from the local machine or from services such as the IDM Weather and Demographics service. They also need to run somewhere: the local machine, an on-premise cluster, or cloud environments. dtk-tools is the suite of tools that puts it all together.

Overview
- Provide a shared way for the different teams to create and execute workflows involving experiments, analyses, and calibrations
- Make those workflows easy to reproduce, share, and reuse
- Abstract away a lot of technical minutiae
- Lower the technical expertise required to leverage available resources
- Accelerate the creation of large experiments
- Provide a central repository for utilities

Before this framework was created, each team had its own way of running workflows and using the IDM ecosystem. By providing a common way to leverage the available resources, we enabled users to easily share, reuse, and reproduce workflows, and we abstracted away a lot of technical minutiae. This lowered the technical expertise required and allowed our users to focus more on the science and less on the software. Now that dtk-tools is widely adopted, it also serves as a centralized repository for all sorts of utilities.

Technical details
- Written in Python 3.6
- Free, open source, and easily extensible
- Supports EMOD and CMS
- Disease-specific packages: malaria, HIV, TB, typhoid

Disease-specific packages: each disease has its own custom reports and matching analyzers to consume them.

Main features – Input files
- EMOD needs input files to represent the experiments' environment
- Heterogeneous sources:
  - Local files
  - IDM Weather and Demographics input service
  - National services (NOAA, WorldPop, etc.)
- Not all models need weather

Input files workflow
A usual workflow is to query the IDM Weather and Demographics service for the climate in a given place and time range (for example, 2014–2016 at 2.5 arc-minute resolution); you can query the service directly from within the tools and retrieve the set of files. Now imagine I want to override the temperature for a given location with a CSV file I created locally: the tools let you easily combine the base weather and the custom temperature file into a custom climate. We also provide utilities to transform this custom climate into a model-consumable set of files, which you can then use in your simulations. Utilities are also provided to go the other way and turn model files back into a human-readable form.
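The override step above can be sketched in plain Python. This is a hypothetical illustration, not the real dtk-tools API: the base series, the CSV column names, and `apply_override` are all assumptions.

```python
import csv
import io

# Hypothetical base temperature series, as if retrieved from the input service.
base_temperature = {"2014-01-01": 24.0, "2014-01-02": 25.5, "2014-01-03": 23.0}

# A locally created CSV overriding the temperature for one date.
custom_csv = io.StringIO("date,temperature\n2014-01-02,30.0\n")

def apply_override(base, csv_file):
    """Overlay CSV rows on top of the base series; CSV values win."""
    merged = dict(base)
    for row in csv.DictReader(csv_file):
        merged[row["date"]] = float(row["temperature"])
    return merged

custom_weather = apply_override(base_temperature, custom_csv)
```

The merged series keeps every base value except the dates present in the CSV, which mirrors the "base weather plus custom temperature file" combination described on the slide.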

Main features – Model parameters
- Models need parameters and scenarios
- The EMOD model needs parameter and intervention definition files
- The tools provide:
  - Starting points with default files for different types of simulations
  - Convenient ways of editing model parameters
  - Shortcut functions to accelerate the creation of interventions
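The "defaults plus edits plus shortcuts" pattern can be sketched as follows. This is a hypothetical illustration of the idea, not the real dtk-tools API: `DEFAULTS`, `build_config`, `add_bednet_event`, and the parameter names are all assumptions.

```python
# Hypothetical default parameter set for one simulation type.
DEFAULTS = {"Simulation_Duration": 365, "Run_Number": 0}

def build_config(overrides=None):
    """Start from the defaults and apply user edits on top."""
    config = dict(DEFAULTS, Campaign_Events=[])
    config.update(overrides or {})
    return config

def add_bednet_event(config, day, coverage):
    """Shortcut appending a (hypothetical) intervention definition."""
    config["Campaign_Events"].append(
        {"type": "Bednet", "Start_Day": day, "Coverage": coverage}
    )
    return config

cfg = build_config({"Simulation_Duration": 730})
add_bednet_event(cfg, day=100, coverage=0.6)
```

The point of the sketch is the layering: a shared starting point, targeted parameter edits, and small shortcut functions so users do not hand-write intervention blocks.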

Main features – Asset collections
- When running simulations, collects all the necessary files
- Packaging handled transparently by the tools
- Easy to share, reproduce, and reuse
Example: a new asset collection created from different collections, such as the Namawala input files, the EMOD 2.13 executable, and local custom files.
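Conceptually, building a new collection from existing ones is a layered merge. The sketch below is a hypothetical model of that idea, not the real dtk-tools implementation: a collection is represented as a mapping of relative path to content hash, and the collection names and hashes are invented.

```python
def merge_collections(*collections):
    """Layer collections; later ones override earlier ones on path conflicts."""
    merged = {}
    for coll in collections:
        merged.update(coll)
    return merged

# Hypothetical collections, keyed by relative path with fake content hashes.
namawala_inputs = {"demographics.json": "a1f3", "climate.bin": "9c2e"}
emod_binary = {"Eradication.exe": "77d0"}
local_files = {"demographics.json": "b812"}  # local override wins

new_collection = merge_collections(namawala_inputs, emod_binary, local_files)
```

Keying by content hash is what makes a collection reproducible and shareable: the same mapping always identifies the same set of files.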

Main features – Sweeps
- Convenient and easy way of exploring a parameter space
- Accelerates the creation and tagging of simulations
Example sweep definition: coverage from 20% to 75% in 5% steps (12 values), crossed with a rainfall multiplier of 0.5, 1, 2, or 5 (4 values), applied to the base configuration files to create 12 * 4 = 48 simulations.
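The sweep from the slide is just a cross product of the two parameter lists. A minimal sketch, not the real dtk-tools sweep API:

```python
from itertools import product

# Coverage: 20% to 75% in 5% steps (12 values).
coverages = [c / 100 for c in range(20, 80, 5)]
# Rainfall multiplier: 4 values.
multipliers = [0.5, 1, 2, 5]

# One tagged simulation definition per combination: 12 * 4 = 48.
simulations = [
    {"coverage": cov, "rainfall_multiplier": mult}
    for cov, mult in product(coverages, multipliers)
]

print(len(simulations))  # 48
```

Each dictionary doubles as the simulation's tags, which is what makes the resulting 48 simulations easy to filter and analyze later.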

Main features – Templating
- Templating system to modify specific blocks within the model inputs
- Build complicated scenarios easily
Example: an experiment created from templates, combining a base campaign with a scenario definition (for example, coverage: 10%, 20%, 30%; intervention: A, B; multiplier: 0.5, 1, 2, 5) to generate one simulation per intervention variant.
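The core idea, stamping scenario values into tagged blocks of a base campaign, can be sketched as below. This is a hypothetical illustration, not the real dtk-tools templating system: the campaign shape, the `tag` field, and `fill_template` are assumptions.

```python
import copy

# Hypothetical base campaign with one templated event block.
BASE_CAMPAIGN = {
    "Events": [
        {"tag": "intervention", "Coverage": None, "Drug": None},
    ]
}

def fill_template(base, tag, **values):
    """Return a copy of the base campaign with the tagged blocks filled in."""
    campaign = copy.deepcopy(base)
    for event in campaign["Events"]:
        if event.get("tag") == tag:
            event.update(values)
    return campaign

scenarios = [{"Coverage": 0.1, "Drug": "A"}, {"Coverage": 0.2, "Drug": "B"}]
campaigns = [fill_template(BASE_CAMPAIGN, "intervention", **s) for s in scenarios]
```

Deep-copying the base keeps the template pristine, so one base campaign can safely generate every scenario variant in an experiment.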

Main features – Analyzers
Simulations can produce a lot of results, but usually only a subset of those results is relevant. Analyzers extract the needed data from each simulation and then combine the results to support your investigations. The process works as follows: on one side, we have a set of simulation outputs, which can come from multiple experiments; on the other, we have our analysis, which can include one or more analyzers. Each analyzer automatically selects the data it needs and produces its output. Analyzers can span a wide range of outputs, from charts to CSV files or heatmaps, but they are flexible enough to generate basically anything that Python can handle. They are not tied to a specific experiment or user and can be shared. Another major advantage is that you are not required to know where the output files are located: the tools take care of all the technical details so you can focus on how you want to process the data.
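The select-then-combine pattern described above can be sketched as a small class. This is a hypothetical illustration of the pattern, not the real dtk-tools analyzer interface: the class, method names, and the `prevalence` output key are assumptions.

```python
class MeanPrevalenceAnalyzer:
    """Hypothetical analyzer: extract a series per simulation, then average."""

    filenames = ["InsetChart.json"]  # files this analyzer would select

    def select(self, sim_output):
        # Per-simulation step: keep only the relevant series.
        return sim_output["prevalence"]

    def combine(self, per_sim):
        # Cross-simulation step: average all extracted values.
        flat = [v for series in per_sim for v in series]
        return sum(flat) / len(flat)

# Fake outputs from two simulations (possibly from different experiments).
outputs = [{"prevalence": [0.1, 0.2]}, {"prevalence": [0.3, 0.4]}]

analyzer = MeanPrevalenceAnalyzer()
result = analyzer.combine([analyzer.select(o) for o in outputs])
```

Because the per-simulation step and the combining step are separate, the framework can fetch each simulation's files wherever they live and hand the analyzer only the data it asked for.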

Main features – Calibration
- Enables exploration of a multidimensional parameter space
- Fits the model outputs to reference data
- Several algorithms available to optimize the search of the parameter space
- Sets model parameters that cannot be measured directly
- Produces a reasonable parameter set when data is sparse

Calibration – Overview
1. Study site: start from a base simulation and reference data
2. Model: run the scenarios and produce outputs
3. Comparison: compare the outputs with the reference data and assign a likelihood
4. Next-point algorithm: use the likelihoods and the search state to define the next best set of parameters to try, with monitoring throughout
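The loop above can be sketched with toy stand-ins. Everything here is a hypothetical illustration, not the real dtk-tools calibration machinery: `model` stands in for a full simulation run, the Gaussian-style log-likelihood is an assumption, and `next_points` is a deliberately naive next-point rule.

```python
def model(x):
    return x * x  # stand-in for running a full simulation at parameter x

def log_likelihood(output, reference):
    return -(output - reference) ** 2  # higher is better

def next_points(scored, step=0.5):
    # Toy next-point algorithm: propose candidates around the current best.
    best = max(scored, key=lambda s: s[1])[0]
    return [best - step, best, best + step]

reference = 4.0            # reference data to fit against
candidates = [0.0, 1.0, 3.0]

for _ in range(5):         # calibration iterations
    # Run the scenarios, compare with the reference, assign a likelihood.
    scored = [(x, log_likelihood(model(x), reference)) for x in candidates]
    candidates = next_points(scored)

best = max(scored, key=lambda s: s[1])[0]
```

With `model(x) = x^2` and a reference of 4.0, the loop homes in on x = 2; in practice the next-point step is where the slide's "several algorithms" plug in.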

How to get the tools?
GitHub repository (private): https://github.com/InstituteforDiseaseModeling/dtk-tools
Contact us: support@idmod.org

Thank you!