Swarms and Bundles: Bioinformatics and Biostatistics on Biowulf
David Hoover, Scientific Computing Branch, Division of Computer System Services, CIT, NIH



Embarrassingly Parallel Problems: GWAS with huge numbers of SNPs; sequence analysis, assembly, and mapping; testing and validating statistical models; protein folding and threading; molecular docking and compound screening; tomographic reconstruction.

Characterization of Surface Protein 3 from the malaria parasite P. falciparum: protein folding calculations with Rosetta++, 100,000 CPU hours (Tsai et al., Mol. Biochem. Parasitology, online preprint 2008).

How to run multiple independent processes in parallel: 16 independent processes, each with its own input, command, and output (diagram: input -> command -> output, repeated per process).
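The pattern on this slide, many independent commands that each read their own input and write their own output, can be sketched on a single machine as follows. This is only an illustration of the idea, not how the Biowulf batch system works: the workload (each process squaring a number) and the names run_command, run_all, and demo_commands are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(argv):
    """Run one independent process and return its captured standard output."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_all(commands, max_workers=4):
    """Run every command as its own process, at most max_workers at a time.

    Results are returned in the same order as the input commands.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))


def demo_commands(n=16):
    """Hypothetical stand-in workload: each process squares its own input."""
    return [[sys.executable, "-c", f"print({i} * {i})"] for i in range(1, n + 1)]


if __name__ == "__main__":
    # 16 independent processes, mirroring job1 ... job16 on the slide.
    for i, out in enumerate(run_all(demo_commands()), start=1):
        print(f"job{i}.out: {out}")
```

Because the processes share nothing, the only coordination needed is collecting the outputs; that independence is what makes the problem embarrassingly parallel.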

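Biowulf Cluster Batch System: each job is submitted as its own batch script and produces its own output file (diagram: batch script job1 -> job1.out, through batch script job16 -> job16.out).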

Swarm: biowulf% swarm -f file (diagram: Node 1 runs job1 -> job1.out; Node 2 runs job2 -> job2.out; Node 3 runs job3 -> job3.out; Node 4 runs job4 -> job4.out).
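The file passed to swarm -f is not shown in the transcript. Assuming it is a plain text file holding one command per line, a file for the 16 jobs could be generated with a short script like the one below. The program name "myprog", the job<i>.in and job<i>.out file names, and the output file name "file" are placeholders for illustration only.

```python
from pathlib import Path


def build_commands(n_jobs, program="myprog"):
    """Build one shell command line per independent job.

    "myprog" and the job<i>.in / job<i>.out names are hypothetical placeholders.
    """
    return [f"{program} < job{i}.in > job{i}.out" for i in range(1, n_jobs + 1)]


def write_command_file(path, commands):
    """Write the commands to a text file, one command per line (assumed format)."""
    Path(path).write_text("".join(cmd + "\n" for cmd in commands))
    return path


if __name__ == "__main__":
    # Produces a 16-line file that could then be passed as: swarm -f file
    write_command_file("file", build_commands(16))
```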

Bundled Swarm: biowulf% swarm -f file -b 4 (diagram: Node 1 runs job1 -> job1.out).
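The slide does not spell out what -b 4 does. Assuming it groups the commands into bundles of four that run one after another within a single job, the grouping and the resulting job count can be sketched as below; bundle and bundle_count are hypothetical helper names, not part of swarm.

```python
import math


def bundle(commands, size):
    """Split commands into consecutive groups of at most `size` commands.

    Each group stands for one bundle whose commands would run sequentially
    (assumed behavior of the bundle option).
    """
    if size < 1:
        raise ValueError("bundle size must be at least 1")
    return [commands[i:i + size] for i in range(0, len(commands), size)]


def bundle_count(n_commands, size):
    """Number of bundles needed to hold n_commands at `size` commands each."""
    return math.ceil(n_commands / size)


if __name__ == "__main__":
    commands = [f"job{i}" for i in range(1, 17)]
    for n, group in enumerate(bundle(commands, 4), start=1):
        print(f"bundle {n}: {' ; '.join(group)}")
```

Under this assumption, 16 commands with a bundle size of 4 collapse into 4 bundles, which trades some parallelism for fewer, longer jobs.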

Swarm Facts: written and maintained by Helix Systems staff; swarm introduced in late [...]; [...]% of all batch jobs run on the cluster since 2002 are swarm jobs; ~60% of all wall time is spent on swarm jobs; swarm has been shared with clusters around the world.

Swarm World Records: largest swarm, 683,445 commands; largest bundle, 24,000 commands per CPU.

Future Challenges: how to deal with larger multicore nodes? (diagram: Node 1, Node 2, Node 3)