Biopackages.net: Operating System Packages for Bioinformatics. Allen Day
What is a package? Software, config files, documentation, and/or data encapsulated in a single file, plus metadata describing: version, license, and package “category”; dependencies; what the package provides.
GMOD target audience: small MODs (model organism databases).
Package Dependency Graph (figure). Legend: dependencies; what the package provides. Packages shown: chado, chado-Hsa, genome-Hsa-nib, ucsc-blat, genome-Hsa-annotation-affymetrix, genome-Hsa-annotation-gene, postgresql-AffxSeq, postgresql-server, perl-bioperl, obo-core, perl-go-perl.
Dependencies come in two kinds: build dependencies (needed to build the package) and installation dependencies (needed to install and run it).
What is a Package Manager? Tools to manage installation, upgrade, and uninstallation of packages; verify package integrity (checksums); maintain system integrity (transactional, allowing rollbacks); check dependencies, recursing through the dependency graph; and allow software customization (patches).
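The "dependency graph recursion" point can be sketched as a depth-first walk that installs every dependency before the package that needs it. This is a minimal illustration, not how RPM or yum is implemented. The package names come from the dependency graph slide, but the edges in `DEPENDS` are assumed for the example, and `install_order` is a hypothetical helper.

```python
# Hypothetical dependency edges: package -> packages it requires.
# Names are from the dependency graph slide; the edges themselves are assumed.
DEPENDS = {
    "chado-Hsa": ["chado", "genome-Hsa-nib"],
    "chado": ["postgresql-server", "perl-bioperl", "perl-go-perl", "obo-core"],
    "genome-Hsa-nib": [],
    "postgresql-server": [],
    "perl-bioperl": [],
    "perl-go-perl": ["perl-bioperl"],
    "obo-core": [],
}


def install_order(package, depends, installed=None, visiting=None):
    """Return packages ordered so every dependency precedes its dependent."""
    if installed is None:
        installed = []
    if visiting is None:
        visiting = set()
    if package in installed:
        return installed  # already scheduled via another path
    if package in visiting:
        raise ValueError(f"dependency cycle at {package}")
    if package not in depends:
        raise KeyError(f"unknown package: {package}")
    visiting.add(package)
    for dep in depends[package]:
        install_order(dep, depends, installed, visiting)  # recurse into the graph
    visiting.discard(package)
    installed.append(package)  # all dependencies are in place, schedule the package
    return installed


order = install_order("chado-Hsa", DEPENDS)
print(order)
```

Requesting `chado-Hsa` alone pulls in everything beneath it, which is the "automatic dependency installation" behaviour a package manager provides.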
Current generation of package managers (PMs): RPM, Dpkg, Apt, Yum, Emerge, tgz/bz2, Windows Installer.
Why bioinformatics packages? Consistency of the installation process: bioinformatics package installs vary wildly and commonly lack documentation. Automatic dependency installation: Perl modules are especially bad; bioperl has 60+ modules in its dependency tree. Integrity and auditing of system state: know that an installed package works, which version it is, and how to replicate the system setup. Tighter integration with the operating system: daemons, config and log file locations, etc.
What’s available? RPM packages only right now. Primary focus is on Fedora Core 2. Some RPMs are also available for Fedora Core 3, RedHat 9, and Cygwin.
What’s available? Three primary foci: applications, libraries, data sets.
Applications: Gbrowse, Textpresso, BLAT daemon, NCBI Toolkit (BLAST, etc.), HMMer.
What’s available? Libraries: Bioperl, R & Bioconductor, Squid, EMBOSS.
What’s available? Data sets: genome & protein sequence, sequence features, ontologies. All are installed using a common directory structure.
What’s available? UCSC tools (utilities, BLAT system service, CGI scripts); Bioperl; R / Bioconductor; GMOD apps (Gbrowse, Textpresso, …); data packages: genome sequence (fa, nib, blastdb) and genome features (Affy probeset alignments, mRNA, etc.).
GMOD Components Available: chado-Hsa, gbrowse, textpresso, gmod-web-Hsa, turnkey, chado, das2-Hsa, apollo-Hsa, cmap-Hsa, ucsc-BLAT, genome-Hsa-nib. ‘Hsa’ can be substituted for your organism; currently built for ‘Cel’, ‘Hsa’, ‘Sce’.
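The organism substitution can be illustrated with a small sketch that expands package name templates for each organism code. The templates are taken from the component names on this slide with ‘Hsa’ replaced by a placeholder. Whether every template is actually built for every organism is an assumption, and `packages_for` is a hypothetical helper.

```python
# Package name templates from the slide, with 'Hsa' replaced by a placeholder.
TEMPLATES = [
    "chado-{org}",
    "gmod-web-{org}",
    "das2-{org}",
    "apollo-{org}",
    "cmap-{org}",
    "genome-{org}-nib",
]

# Organism codes the slide lists as currently built.
BUILT_ORGANISMS = ("Cel", "Hsa", "Sce")


def packages_for(organism):
    """Expand the name templates for one organism code (hypothetical helper)."""
    if organism not in BUILT_ORGANISMS:
        raise ValueError(f"no packages built for organism code {organism!r}")
    return [template.format(org=organism) for template in TEMPLATES]


for name in packages_for("Cel"):
    print(name)
```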
More details… (dependency graph figure). Packages shown: chado, chado-Hsa, genome-Hsa-nib, ucsc-blat, perl-go-perl, genome-Hsa-annotation-affymetrix, genome-Hsa-annotation-gene, postgresql-AffxSeq, postgresql-server, perl-bioperl, …
Gene Expression Components: chado-Hsa; Bioconductor; R; Quant/Norm Pipeline; chado-GEC; DAS/2 for Genotyping, GeneChip.
Resources: ~1000 RPMs for Fedora Core 2 and 3, available via yum. See site for a configuration example.
TODO: Support more architectures: build for Cygwin & OS X (RPM has been ported to both). Automate the package build process: a build farm of multiple architectures, controllable via a scheduler (GridEngine). Automate (if possible) inclusion of new software / data releases.
TODO: Build community interest and involvement. Keep adding more packages! Keep existing packages current!
Acknowledgements: Patrick Alger, Jared Fox, Brian O’Connor, Todd Harris, Lincoln Stein, Stanley Nelson.
Anatomy of a specfile. Metadata: Name, Depends (the Requires and BuildRequires tags in RPM), Provides, Changelog. Build & install script hooks: %prep, %build, %install, %post, %preun.
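To make the layout concrete, the sketch below renders a skeleton containing only the parts named on this slide: metadata tags first, then the script hooks, then the changelog. Every value is a hypothetical placeholder, not an actual Biopackages specfile, and `render_spec` is an illustrative helper. A spec that rpmbuild accepts also needs sections not listed here, such as %description and %files.

```python
def render_spec(meta, hooks, changelog):
    """Render a specfile-like skeleton: tags, then script hooks, then changelog."""
    lines = [f"{tag}: {value}" for tag, value in meta]
    for section in ("%prep", "%build", "%install", "%post", "%preun"):
        lines.append("")
        lines.append(section)
        lines.extend(hooks.get(section, []))
    lines.append("")
    lines.append("%changelog")
    lines.extend(changelog)
    return "\n".join(lines) + "\n"


# All values below are hypothetical placeholders.
META = [
    ("Name", "example-tool"),
    ("Version", "1.0"),
    ("Release", "1"),
    ("Summary", "Placeholder package"),
    ("License", "GPL"),
    ("BuildRequires", "gcc"),          # build dependency
    ("Requires", "perl-bioperl"),      # installation dependency
    ("Provides", "example-tool"),
]
HOOKS = {
    "%prep": ["%setup -q"],                                # unpack sources
    "%build": ["make"],                                    # compile
    "%install": ["make install DESTDIR=%{buildroot}"],     # stage files
    "%post": ["/sbin/ldconfig"],                           # runs after install
    "%preun": ["# stop services before uninstall"],        # runs before removal
}
CHANGELOG = [
    "* Mon Jan 03 2005 Packager Name <packager@example.org> 1.0-1",
    "- Initial package",
]

spec_text = render_spec(META, HOOKS, CHANGELOG)
print(spec_text)
```

The BuildRequires and Requires lines correspond to the build and installation dependencies distinguished earlier in the deck.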