UPPMAX and UPPNEX: Enabling high performance bioinformatics
Ola Spjuth, UPPMAX

High-performance bioinformatics
Trivial/embarrassingly parallelizable
  – Mass of individual tasks (or divide up problems), run in parallel
  – E.g. analyze several sequences
Non-trivial parallelism
  – Single task on many processors (data partitioning)
  – Example: Molecular dynamics

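The "analyze several sequences" case can be sketched in a few lines of Python. This is only an illustration of the embarrassingly parallel pattern, not code from the talk: the function names `gc_content` and `analyze_all`, the choice of GC content as the per-sequence analysis, and the example sequences are all assumptions made for the example. Each sequence is an independent task, so the work can be handed to a pool of worker processes without any communication between them.

```python
from concurrent.futures import ProcessPoolExecutor


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in one sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def analyze_all(sequences, workers=4):
    """Run gc_content on every sequence, one independent task per sequence."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the sequences.
        return list(pool.map(gc_content, sequences))


if __name__ == "__main__":
    # Hypothetical toy input; real runs would read sequences from files.
    example = ["ACGT", "GGCC", "ATAT"]
    print(analyze_all(example))  # prints [0.5, 1.0, 0.0]
```

The non-trivial case (a single task partitioned over many processors, as in molecular dynamics) does not fit this pattern, because the partitions have to exchange data while the computation runs.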
Resources for high-performance computing (HPC)
Supercomputers
  – “a computer at the frontline of current processing capacity, particularly speed of calculation”
Clusters
  – Processors in close proximity
Grid computing
  – Distributed systems (joined clusters)

UPPMAX
Uppsala University’s resource for high performance computing (HPC) and related know-how
  – Computational clusters: 6000 cores
  – Storage: 1.4 PB parallel storage

A project at UPPMAX
13,152 MSEK from KAW/SNIC ( )
~1 M cpuh/month on a shared cluster (kalkyl)
~1 PB cluster-attached parallel storage (bubo)
Long term storage on SweStore (>1 PB)
SMP machine, 64 cores, 2 TB RAM (halvan)

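To get a feel for what ~1 M cpuh/month means, the allocation can be converted into the number of cores it would keep busy around the clock. The sketch below is a back-of-the-envelope calculation, not a figure from the talk; the 30-day month and the helper name `equivalent_cores` are assumptions.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def equivalent_cores(core_hours_per_month, hours_per_month=HOURS_PER_MONTH):
    """Number of cores that, running continuously, use up the monthly allocation."""
    return core_hours_per_month / hours_per_month


if __name__ == "__main__":
    # ~1 M core hours per month, as stated on the slide
    print(round(equivalent_cores(1_000_000)))  # prints 1389
```

Under these assumptions the allocation corresponds to roughly 1400 cores in constant use.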
The cluster kalkyl
348 nodes with 8 cores each
  – 324 nodes with 24 GB
  – 16 nodes with 48 GB
  – 16 nodes with 72 GB
  – Total: 2784 cores
SLURM queuing system

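The totals on this slide can be checked with simple arithmetic. The sketch below is an illustration with hypothetical names (`total_cores`, `nodes_in_groups`); the numbers are copied from the slide as listed. The core total follows from 348 × 8, while the three memory groups as listed add up to 356 nodes rather than 348, so one of the per-group counts is presumably off by 8.

```python
NODES_TOTAL = 348
CORES_PER_NODE = 8

# (number of nodes, GB of RAM per node), exactly as listed on the slide
MEMORY_GROUPS = [(324, 24), (16, 48), (16, 72)]


def total_cores(nodes, cores_per_node):
    """Total core count for a homogeneous set of nodes."""
    return nodes * cores_per_node


def nodes_in_groups(groups):
    """Sum of the node counts over all memory groups."""
    return sum(count for count, _gb in groups)


if __name__ == "__main__":
    print(total_cores(NODES_TOTAL, CORES_PER_NODE))      # prints 2784
    print(nodes_in_groups(MEMORY_GROUPS))                # prints 356
    print(nodes_in_groups(MEMORY_GROUPS) - NODES_TOTAL)  # prints 8
```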
UPPNEX data flow
Knowledge Base / Community website
UPPNEX Application Experts
Assist with NGS analysis
Available via mailing list or by direct contact

Project growth
UPPNEX storage usage
Used CPU core hours / month
(One-week maintenance stop for the move to a new computer hall)
A typical day at UPPMAX
UPPNEX software used
Conclusions: Community needs (storage)
Access to high-availability storage
Access to long term storage
Sustainable file infrastructure

Conclusions: UPPNEX main challenges
Support new types of HPC users and usage
Keep up with the bioinformatics software flood
Managing data growth (previously only computations)