Published by Amber McDonald. Modified over 6 years ago.
1
Automating reproducible analyses with AnADAMA2 and bioBakery Workflows
Lauren McIver, Curtis Huttenhower, Galeb Abu-Ali, Gholamali (Ali) Rahnavard
STAMPS 2017
Harvard T.H. Chan School of Public Health, Department of Biostatistics
2
AnADAMA2: Another Automated Data Analysis Management Application
Creating efficient, reproducible workflows
A set of modular tasks to transform inputs into outputs
  Data, tables, visualizations, statistics…
Reproducible
  All tasks are logged
    Includes commands and software versions
  Automated documentation generation
Efficient
  Only those tasks that need to be rerun will run
    Make-like operation using targets and dependencies
  Local and/or grid parallelization
    Jobs are dispatched/monitored/logged
    Resubmit if job exceeds time/memory
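The "make-like operation using targets and dependencies" can be illustrated in plain Python. This is a minimal sketch of the general make rule, not AnADAMA2's own implementation: the function name `needs_rerun` is hypothetical, and comparing file modification times is an assumption made only for illustration.

```python
import os


def needs_rerun(targets, depends):
    """Return True if a task should run under a make-style rule.

    Hypothetical helper for illustration only (not part of AnADAMA2).
    A task runs when it declares no targets, when any target is missing,
    or when any dependency is newer than the oldest target.
    Dependencies are assumed to exist on disk.
    """
    # Without declared targets there is nothing to check, so always run.
    if not targets:
        return True
    # A missing target always forces the task to run.
    if not all(os.path.exists(t) for t in targets):
        return True
    # With no dependencies, existing targets count as up to date.
    if not depends:
        return False
    oldest_target = min(os.path.getmtime(t) for t in targets)
    newest_depend = max(os.path.getmtime(d) for d in depends)
    return newest_depend > oldest_target
```

Under this rule, touching an input file after its outputs were written is enough to mark the task as stale, while untouched tasks are skipped.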
3
An example workflow

    from anadama2 import Workflow

    workflow = Workflow(remove_options=["input", "output"])
    workflow.do("ls /usr/bin/ | sort > [t:global_exe.txt]")
    workflow.do("ls $HOME/.local/bin/ | sort > [t:local_exe.txt]")
    workflow.do("join [d:global_exe.txt] [d:local_exe.txt] > [t:match_exe.txt]")
    workflow.go()
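In the commands on this slide, `[t:...]` marks a file the command creates (a target) and `[d:...]` marks a file it reads (a dependency). The sketch below shows one way such markers could be pulled out of a command string. `parse_markers` is a hypothetical helper written for this illustration; it is not AnADAMA2's parser.

```python
import re

# Matches [t:path] or [d:path]; group 1 is the marker kind, group 2 the path.
MARKER = re.compile(r"\[([td]):([^\]]+)\]")


def parse_markers(command):
    """Split a marked-up command into (shell_command, targets, depends)."""
    targets, depends = [], []
    for kind, path in MARKER.findall(command):
        (targets if kind == "t" else depends).append(path)
    # Replace each marker with the bare path to get a runnable command.
    shell_command = MARKER.sub(lambda m: m.group(2), command)
    return shell_command, targets, depends


cmd, targets, depends = parse_markers(
    "join [d:global_exe.txt] [d:local_exe.txt] > [t:match_exe.txt]")
print(cmd)      # join global_exe.txt local_exe.txt > match_exe.txt
print(targets)  # ['match_exe.txt']
print(depends)  # ['global_exe.txt', 'local_exe.txt']
```

Because the third command depends on the targets of the first two, a workflow engine can infer that it must run last without being told the order explicitly.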
4
A fancier example workflow
5
Why workflows?
Reproducibility
  Everything for an analysis is in one place.
  Provenance logged: commands, versions, times.
Efficiency
  You will have to rerun your analysis.
Modularity (=reusability)
Simplicity
  Tasks are legible and understandable.
  Not to mention automatically documented.
6
bioBakery workflows
7
AnADAMA2 templated documentation
8
bioBakery automated reports
9
AnADAMA2 tutorial and the bioBakery Workflows
11
An even fancier example workflow
12
But all of that runs with a single command
13
AnADAMA2 local parallelization
    from anadama2 import Workflow

    workflow = Workflow(remove_options=["input","output"])
    downloads=["ftp://public-ftp.hmpdacc.org/HM16STR/by_sample/SRS fsa.gz",
        "ftp://public-ftp.hmpdacc.org/HM16STR/by_sample/SRS fsa.gz",
        "ftp://public-ftp.hmpdacc.org/HM16STR/by_sample/SRS fsa.gz"]
    for link in downloads:
        workflow.add_task(
            "wget -O [targets[0]] [args[0]]",
            targets=link.split("/")[-1],
            args=link)
    workflow.go()

Workflow (written to a file named “download.py”) downloads three files.
Input/output command line options are not used as all files are written to the current working directory.
To run:
    $ python download.py
Add the option “--local-jobs 3” to run all three downloads at once on your local machine.
Rerun the exact command again and all tasks will be skipped because the files have already been downloaded.
Rerun with the option “--skip-nothing” and all tasks will rerun.
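The behaviour described for “--local-jobs 3” and for skipping finished downloads can be pictured with Python's standard library. This is a sketch under stated assumptions, not how AnADAMA2 schedules jobs: `run_tasks`, its `jobs` and `skip_nothing` parameters, and the `fake_download` action are hypothetical names, and a task is skipped here simply because its target file already exists.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, jobs=1, skip_nothing=False):
    """Run (action, target, arg) tasks, up to `jobs` at a time.

    Returns the list of targets whose tasks actually ran.
    """
    # Keep tasks whose target is missing, unless told to rerun everything.
    todo = [t for t in tasks if skip_nothing or not os.path.exists(t[1])]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(action, target, arg)
                   for action, target, arg in todo]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return [target for _, target, _ in todo]


def fake_download(target, link):
    """Stand-in for wget: writes the link text into the target file."""
    with open(target, "w") as handle:
        handle.write(link)
```

Calling `run_tasks(tasks, jobs=3)` twice in a row runs everything the first time and nothing the second, mirroring the rerun behaviour on the slide; passing `skip_nothing=True` forces all tasks to run again.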