Harnessing the Power of Condor for Human Genetics


1 Harnessing the Power of Condor for Human Genetics
Bret A. Payseur, Laboratory of Genetics, University of Wisconsin

2 Our research: evolutionary genetics
Analysis of DNA variation across human populations to understand: roles of different evolutionary forces; prospects for finding genes that cause disease
Analysis of crosses between mouse strains to understand: how anatomy evolves; how new species arise

3 Our computational needs
Multi-dimensional statistical inference: we measure many different (partially correlated) features of DNA variation
Genome-scale analyses: we measure variation at thousands to millions of sites
Replicates: we conduct population simulations to measure stochastic effects

4 Haplotype phasing
Each human has two copies of each site on a chromosome (one from each parent)
[Figure: two chromosome copies carrying variants A/T at Site 1 and G/C at Site 2]

5 Haplotype phasing
We want to know which variant goes with which on the chromosome
[Figure: variants A/T at Site 1 and G/C at Site 2]

6 Haplotype phasing
Genotyping technology cannot distinguish between these two possibilities in individuals that vary at both sites
[Figure: Configuration 1 pairs A with G and T with C; Configuration 2 pairs T with G and A with C]
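
The ambiguity can be made concrete with a small sketch. This is only an illustration of the phasing problem, not the PHASE algorithm; the function name `possible_phasings` and the genotype encoding (one pair of alleles per site) are assumptions made for this example.

```python
from itertools import product

def possible_phasings(genotypes):
    """Enumerate the distinct haplotype pairs consistent with unphased genotypes.

    genotypes: list of 2-tuples, the two alleles observed at each site.
    Returns a set of frozensets, each holding the haplotypes of one configuration.
    """
    phasings = set()
    # For each site, choose which allele sits on the first chromosome copy;
    # the other allele then sits on the second copy.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # The two copies are unordered, so store the pair as a frozenset.
        phasings.add(frozenset((hap1, hap2)))
    return phasings

# An individual that varies at both sites: A/T at site 1, G/C at site 2.
configs = possible_phasings([("A", "T"), ("G", "C")])
for pair in sorted(sorted(p) for p in configs):
    print(pair)
```

An individual that varies at only one site has a single possible configuration, which is why the problem only arises for individuals that vary at two or more sites.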

7 Solution: PHASE algorithm
Uses a Markov chain Monte Carlo (MCMC) sampling scheme
Uses coalescent simulations based on population genetic principles
Identifies haplotypes for each individual with statistical uncertainty (posterior probability)
State-of-the-art method in human genetics

8 Scope of problem
Goal: reconstruct phase in a human dataset of genomic proportions
Dataset is large: 720 regions of the genome; 100 variable sites per region; 3 populations; 60 individuals per population
Computational approach is intensive

9 Scope of problem
Average run time: 8 hours
720 regions x 3 populations x 8 hours = 17,280 hours

10 Scope of problem
Running full time on 5 Payseur lab computers: 144 days!
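
The estimate can be checked with a few lines of arithmetic. The variable names below are chosen for this sketch; the numbers are the ones given on the slides, and the calculation assumes the work divides evenly across the five machines.

```python
regions = 720
populations = 3
hours_per_run = 8      # average PHASE run time quoted on the slide
lab_computers = 5

# One PHASE run per region and population.
total_hours = regions * populations * hours_per_run

# Spread evenly over the lab machines, running 24 hours a day.
days_on_lab_machines = total_hours / lab_computers / 24

print(total_hours)            # 17280
print(days_on_lab_machines)   # 144.0
```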

11 ENTER CONDOR

12 Approach
Create a submit file for each job – automated using a Perl script
Submit each job – automated using a Perl script

13 CONDOR submit file
universe = standard
executable = PHASE
error = phase.err
log = phase.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = phase.in
transfer_output_files = phase.out
Requirements = ((OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")))
Arguments = -MR -P1 phase.in phase.out
queue
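
The automation described on the Approach slide was done with Perl scripts that are not shown. As a rough sketch of the same idea in Python, the code below writes one submit file per region and population from a template that mirrors the slide. The per-job file names (`phase_pop1_region001.in` and so on), the population labels, and the helper names `write_submit_files` and `submit_all` are assumptions made for illustration; the slide itself uses the fixed names `phase.in` and `phase.out`.

```python
import subprocess
import tempfile
from pathlib import Path

# Template mirroring the submit file on the slide, with a hypothetical
# per-job name substituted for the fixed "phase" prefix.
SUBMIT_TEMPLATE = """\
universe = standard
executable = PHASE
error = {job}.err
log = {job}.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = {job}.in
transfer_output_files = {job}.out
Requirements = ((OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")))
Arguments = -MR -P1 {job}.in {job}.out
queue
"""

def write_submit_files(regions, populations, out_dir):
    """Write one submit file per (region, population) pair and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for region in regions:
        for pop in populations:
            job = f"phase_{pop}_region{region:03d}"  # hypothetical naming scheme
            path = out_dir / f"{job}.sub"
            path.write_text(SUBMIT_TEMPLATE.format(job=job))
            paths.append(path)
    return paths

def submit_all(paths):
    """Submit each file with condor_submit (requires a working Condor installation)."""
    for path in paths:
        subprocess.run(["condor_submit", path.name], cwd=path.parent, check=True)

# Demonstration in a temporary directory; nothing is submitted here.
with tempfile.TemporaryDirectory() as tmp:
    files = write_submit_files(range(1, 721), ["pop1", "pop2", "pop3"], tmp)
    n_files = len(files)
    first = files[0].read_text()
print(n_files)
```

Calling `submit_all(files)` on a machine with Condor installed would queue every job; it is left out of the demonstration so the sketch runs anywhere.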

14 Running on vanilla universe
Huge increase in efficiency
Challenge: run times often exceeded allocated CPU time; many jobs did not finish

15 CONDOR solution
Relink PHASE with condor_compile so it can run in the standard universe, which allows checkpointing
Expand machine pool to include X86_64/LINUX and INTEL/LINUX nodes

16 Result
Universe    Jobs finished    Required time
Vanilla     500              2 months
Standard    720              10 days

17 We have also used CONDOR to…
Simulate genetic mapping of complex diseases in mice (Payseur and Place 2007; Genetics)
Infer relationships among mouse strains used in biomedical research

18 We hope to use CONDOR for…
EVERYTHING

19 Acknowledgments
Miron Livny, Zach Miller, David Schwartz

