Published by Clifford Melton. Modified over 6 years ago.
1
Harnessing the Power of Condor for Human Genetics
Bret A. Payseur
Laboratory of Genetics
University of Wisconsin
2
Our research: evolutionary genetics
Analysis of DNA variation across human populations to understand:
  Roles of different evolutionary forces
  Prospects for finding genes that cause disease
Analysis of crosses between mouse strains to understand:
  How anatomy evolves
  How new species arise
3
Our computational needs
Multi-dimensional statistical inference: we measure many different (partially correlated) features of DNA variation
Genome-scale analyses: we measure variation at thousands to millions of sites
Replicates: we conduct population simulations to measure stochastic effects
4
Haplotype phasing
Each human has two copies of each site on a chromosome (one from each parent)
[Figure: one individual's two chromosome copies, with variants A and T at Site 1 and G and C at Site 2]
5
Haplotype phasing
We want to know which variant goes with which on the chromosome
[Figure: one individual's two chromosome copies, with variants A and T at Site 1 and G and C at Site 2]
6
Haplotype phasing
Genotyping technology cannot distinguish between these two possibilities in individuals that vary at both sites
[Figure: Configuration 1 pairs A with G and T with C; Configuration 2 pairs T with G and A with C]
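The ambiguity on this slide can be made concrete with a small enumeration. The sketch below is not the PHASE algorithm; it only lists every pair of haplotypes that is consistent with a set of unphased genotypes. The function name `possible_phasings` and the tuple representation of genotypes are assumptions made for illustration.

```python
from itertools import product


def possible_phasings(genotypes):
    """List the haplotype pairs consistent with unphased genotypes.

    genotypes: one (allele, allele) tuple per site, in chromosome order.
    Returns a sorted list of (haplotype, haplotype) tuples; the order of
    the two haplotypes within a pair carries no meaning.
    """
    configs = set()
    # For each site, choose which of the two alleles sits on the first
    # chromosome copy; the other allele goes on the second copy.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as one configuration.
        configs.add(tuple(sorted((hap1, hap2))))
    return sorted(configs)


if __name__ == "__main__":
    # An individual that varies at both sites: A/T at Site 1, G/C at Site 2.
    for pair in possible_phasings([("A", "T"), ("G", "C")]):
        print(pair)
```

An individual that varies at only one site has a single possible configuration, so no phasing is needed; with k variable sites the number of candidate configurations grows as 2^(k-1), which is why a statistical method is used for regions with many sites.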
7
Solution: PHASE algorithm
Uses a Markov chain Monte Carlo (MCMC) sampling scheme
Uses coalescent simulations based on population genetic principles
Identifies haplotypes for each individual with statistical uncertainty (posterior probability)
State-of-the-art method in human genetics
8
Scope of problem
Goal: reconstruct phase in a human dataset of genomic proportions
Dataset is large
  720 regions of the genome
  100 variable sites per region
  3 populations
  60 individuals per population
Computational approach is intensive
9
Scope of problem
Average run time: 8 hours
720 regions x 3 populations x 8 hours = 17,280 hours
10
Scope of problem
Running full time on 5 Payseur lab computers:
17,280 hours / 5 computers / 24 hours per day = 144 days!
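The figures on the last three slides can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch; it assumes, as the slides do, that every run takes the average 8 hours and that the 5 lab computers run continuously. The variable names are illustrative.

```python
# Figures taken from the slides.
REGIONS = 720
POPULATIONS = 3
HOURS_PER_RUN = 8      # average PHASE run time
LAB_COMPUTERS = 5

total_runs = REGIONS * POPULATIONS          # one PHASE run per region per population
total_hours = total_runs * HOURS_PER_RUN    # total CPU time required
days_on_lab_machines = total_hours / LAB_COMPUTERS / 24

print(f"{total_runs} runs, {total_hours:,} CPU hours, "
      f"{days_on_lab_machines:.0f} days on {LAB_COMPUTERS} machines")
```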
11
ENTER CONDOR
12
Approach
Create a submit file for each job (automated using a Perl script)
Submit each job (automated using a Perl script)
13
CONDOR submit file
universe = standard
executable = PHASE
error = phase.err
log = phase.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = phase.in
transfer_output_files = phase.out
Requirements = ((OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")))
Arguments = -MR -P1 phase.in phase.out
queue
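The talk automated submit-file creation and job submission with Perl scripts that are not shown. As a rough sketch of the same idea in Python, the code below writes one submit file per region and population, reusing the settings from the submit file above. The per-job file naming (`popA_region001.in` and so on), the population labels, and the function names are assumptions made for illustration; they are not taken from the original scripts.

```python
import tempfile
from pathlib import Path

# Settings copied from the submit file on the slide; only the file names
# are made job-specific through the hypothetical {tag} placeholder.
SUBMIT_TEMPLATE = """\
universe = standard
executable = PHASE
error = {tag}.err
log = {tag}.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = {tag}.in
transfer_output_files = {tag}.out
Requirements = ((OpSys == "LINUX") && ((Arch == "INTEL") || (Arch == "X86_64")))
Arguments = -MR -P1 {tag}.in {tag}.out
queue
"""


def write_submit_files(out_dir, regions, populations):
    """Write one submit file per (population, region) and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for pop in populations:
        for region in regions:
            tag = f"{pop}_region{region:03d}"  # assumed naming scheme
            path = out_dir / f"{tag}.sub"
            path.write_text(SUBMIT_TEMPLATE.format(tag=tag))
            paths.append(path)
    return paths


def submit_commands(paths):
    """Build the condor_submit command for each file without running it."""
    return [["condor_submit", str(p)] for p in paths]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Small demonstration: 3 regions and 2 hypothetical population labels.
        files = write_submit_files(tmp, range(1, 4), ["popA", "popB"])
        for cmd in submit_commands(files):
            print(" ".join(cmd))
```

The commands are only printed here; on a machine with Condor installed, each could be passed to `subprocess.run` to submit the job.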
14
Running on vanilla universe
Huge increase in efficiency
Challenge
  Run times often exceeded allocated CPU time
  Many jobs did not finish
15
CONDOR solution
Use condor_compile to relink PHASE for the standard universe, which allows checkpointing
Expand machine pool to include X86_64/LINUX and INTEL/LINUX nodes
16
Result
                 Vanilla universe    Standard universe
Jobs finished    500                 720
Required time    2 months            10 days
17
We have also used CONDOR to…
Simulate genetic mapping of complex diseases in mice (Payseur and Place 2007; Genetics)
Infer relationships among mouse strains used in biomedical research
18
We hope to use CONDOR for…
EVERYTHING
19
Acknowledgments
Miron Livny
Zach Miller
David Schwartz