Harnessing the Power of Condor for Human Genetics

Presentation transcript:

Harnessing the Power of Condor for Human Genetics
Bret A. Payseur
Laboratory of Genetics, University of Wisconsin

Our research: evolutionary genetics
Analysis of DNA variation across human populations to understand:
  Roles of different evolutionary forces
  Prospects for finding genes that cause disease
Analysis of crosses between mouse strains to understand:
  How anatomy evolves
  How new species arise

Our computational needs
  Multi-dimensional statistical inference: we measure many different (partially correlated) features of DNA variation
  Genome-scale analyses: we measure variation at thousands to millions of sites
  Replicates: we conduct population simulations to measure stochastic effects

Haplotype phasing
Each human has two copies of each site on a chromosome (one from each parent).
[Figure: variants A and T at Site 1; variants G and C at Site 2]

Haplotype phasing
We want to know which variant goes with which on the chromosome.
[Figure: variants A and T at Site 1; variants G and C at Site 2]

Haplotype phasing
Genotyping technology cannot distinguish between these two possibilities in individuals that vary at both sites.
[Figure: Configuration 1 and Configuration 2, the two possible arrangements of A/T (Site 1) and G/C (Site 2) on the two chromosome copies]
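
The ambiguity on this slide can be illustrated with a small Python sketch. It is not part of the talk and is not how PHASE works internally; it only enumerates the haplotype pairs that are consistent with a set of unphased genotypes. The function name `phase_configurations` and the string encoding of haplotypes are assumptions made for illustration.

```python
from itertools import product


def phase_configurations(genotypes):
    """Return all distinct haplotype pairs consistent with unphased genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one tuple per site.
    Each haplotype is returned as a string with one allele per site.
    """
    configs = set()
    # For every site, choose which of the two alleles sits on the first copy.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # The two chromosome copies are unordered, so store the pair sorted.
        configs.add(tuple(sorted((hap1, hap2))))
    return sorted(configs)


# An individual that varies at both sites: A/T at Site 1 and G/C at Site 2.
double_heterozygote = [("A", "T"), ("G", "C")]
for pair in phase_configurations(double_heterozygote):
    print(pair)
```

An individual that varies at only one of the two sites yields a single configuration, which is why the phasing problem arises only for individuals that vary at both.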

Solution: PHASE algorithm
  Uses a Markov chain Monte Carlo (MCMC) sampling scheme
  Uses coalescent simulations based on population genetic principles
  Identifies haplotypes for each individual with statistical uncertainty (posterior probability)
  State-of-the-art method in human genetics

Scope of problem
Goal: reconstruct phase in a human dataset of genomic proportions
Dataset is large:
  720 regions of the genome
  100 variable sites per region
  3 populations
  60 individuals per population
Computational approach is intensive

Scope of problem
Average run time: 8 hours
720 regions x 3 populations x 8 hours = 17,280 hours

Scope of problem
Running full time on 5 Payseur lab computers: 144 days!
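
The run-time estimate can be checked with a few lines of Python. The numbers are the ones given on the slides; the variable names are illustrative, and the sketch assumes the 5 lab computers run in parallel around the clock.

```python
# Figures taken from the "Scope of problem" slides.
regions = 720
populations = 3
hours_per_run = 8      # average run time of one PHASE run
lab_computers = 5      # machines assumed to run in parallel, 24 hours a day

total_hours = regions * populations * hours_per_run
days_on_lab_machines = total_hours / lab_computers / 24

print(f"Total CPU time: {total_hours:,} hours")
print(f"Wall-clock time on {lab_computers} computers: {days_on_lab_machines:.0f} days")
```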

ENTER CONDOR

Approach
  Create a submit file for each job (automated using a Perl script)
  Submit each job (automated using a Perl script)
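
The two automated steps might look roughly like the sketch below. The talk used Perl scripts that are not shown, so this is a Python stand-in rather than the authors' code. The submit-file fields are copied from the submit file in the talk; the per-job naming scheme (`region001_pop1` and so on), the `.sub` extension, and the assumption of one job per region and population are hypothetical.

```python
import subprocess
from pathlib import Path

# Submit-description fields as shown in the talk; {job} is a hypothetical
# per-job base name that replaces the fixed "phase" file names.
TEMPLATE = """universe = standard
executable = PHASE
error = {job}.err
log = {job}.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = {job}.in
transfer_output_files = {job}.out
Requirements = ((OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")))
Arguments = -MR -P1 {job}.in {job}.out
queue
"""


def job_names(n_regions=720, n_populations=3):
    """One hypothetical job name per (region, population) combination."""
    return [
        f"region{region:03d}_pop{pop}"
        for region in range(1, n_regions + 1)
        for pop in range(1, n_populations + 1)
    ]


def write_submit_files(out_dir, names):
    """Step 1: write one submit file per job and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in names:
        path = out_dir / f"{name}.sub"
        path.write_text(TEMPLATE.format(job=name))
        paths.append(path)
    return paths


def submit_all(paths, dry_run=True):
    """Step 2: submit each file with condor_submit (only printed if dry_run)."""
    for path in paths:
        cmd = ["condor_submit", path.name]
        if dry_run:
            print(" ".join(cmd))
        else:
            # Run from the job directory so relative file names resolve.
            subprocess.run(cmd, cwd=path.parent, check=True)
```

Calling `submit_all(write_submit_files("jobs", job_names()))` prints the commands without submitting anything; passing `dry_run=False` requires a working Condor installation and the per-job input files in the same directory.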

CONDOR submit file
  universe = standard
  executable = PHASE
  error = phase.err
  log = phase.log
  should_transfer_files = YES
  when_to_transfer_output = ON_EXIT
  transfer_input_files = phase.in
  transfer_output_files = phase.out
  Requirements = ((OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")))
  Arguments = -MR -P1 phase.in phase.out
  queue

Running on vanilla universe
  Huge increase in efficiency
Challenge
  Run times often exceeded allocated CPU time
  Many jobs did not finish

CONDOR solution
  Use condor_compile to build PHASE for the standard universe, which allows checkpointing
  Expand machine pool to include X86_64/LINUX and INTEL/LINUX nodes

Result
                  Vanilla universe    Standard universe
  Jobs finished   500                 720
  Required time   2 months            10 days

We have also used CONDOR to…
  Simulate genetic mapping of complex diseases in mice (Payseur and Place 2007; Genetics)
  Infer relationships among mouse strains used in biomedical research

We hope to use CONDOR for…
EVERYTHING

Acknowledgments
  Miron Livny
  Zach Miller
  David Schwartz