Bickhart, ADSA Meeting 2013: Tools to Exploit Sequence Data to Find New Markers and Disease Loci in Cattle. D. M. Bickhart, H. A. Lewin and G. E. Liu.

Presentation transcript:

1 Tools to Exploit Sequence Data to Find New Markers and Disease Loci in Cattle
D. M. Bickhart, H. A. Lewin and G. E. Liu

2 Amount of sequence data
[Figure: SRA growth chart, from Wikimedia Commons. Two points are annotated: ~312.5 human genome equivalents and ~312,500 human genome equivalents.]
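The two annotations on the chart are consistent with dividing a total number of sequenced bases by the size of one human genome. The sketch below assumes a human genome of about 3.2 gigabases and chart points at 1 terabase and 1 petabase. Neither value is stated on the slide; they are assumptions that happen to reproduce both figures.

```python
# Assumed human genome size in bases; not given on the slide.
HUMAN_GENOME_BASES = 3.2e9


def genome_equivalents(total_bases, genome_bases=HUMAN_GENOME_BASES):
    """Return how many genomes of size `genome_bases` fit in `total_bases`."""
    return total_bases / genome_bases


# Assumed chart points: 1 terabase and 1 petabase of archived sequence.
at_terabase = genome_equivalents(1e12)
at_petabase = genome_equivalents(1e15)

print(f"1 terabase ~ {at_terabase:,.1f} human genome equivalents")
print(f"1 petabase ~ {at_petabase:,.1f} human genome equivalents")
```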

3 Why sequence DNA?
- Best genotyping tool
  - BovineHD chip (~0.03% of the genome)
  - Whole Genome Seq (~90% of the genome)
- New Disease Discovery
  - Low frequency variants
  - Sometimes not SNPs
- Arrays are cost effective
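The ~0.03% figure for the BovineHD chip can be checked with a back-of-the-envelope calculation: the number of positions an array assays divided by the genome size. The marker count (777,962) and the cattle genome size (~2.7 gigabases) used here are assumptions brought in for illustration; the slide gives only the percentage.

```python
# Both constants are assumptions for illustration, not values from the slide.
BOVINEHD_MARKERS = 777_962     # assumed marker count on the BovineHD array
CATTLE_GENOME_BASES = 2.7e9    # assumed approximate cattle genome size


def percent_of_genome(positions_assayed, genome_bases=CATTLE_GENOME_BASES):
    """Percentage of genome positions directly assayed by a genotyping tool."""
    return 100.0 * positions_assayed / genome_bases


chip_percent = percent_of_genome(BOVINEHD_MARKERS)
print(f"BovineHD chip assays about {chip_percent:.2f}% of the genome")
```

Under these assumptions the result rounds to 0.03%, in line with the slide.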

4 Sequencing Stage
Whole Genome Sequencing
- Based on Genomic DNA
- Samples turned into “libraries”
Illumina HiSeq 2000 Sequencer
- Takes ~10-14 days for 100 x 100
- Minimal hands-on time
- Produces 600 gigabases

5 Reads must be aligned to a reference genome
Raw Sequencer Output -> Alignment to the Genome -> Variant Detection
This analysis is very disk-IO intensive.

6 So you decided to start sequencing
- Total Time (sample to sequence): 3 weeks
  - That’s assuming nothing went wrong!
  - More realistic: months
- Total Cost: ~$2400 per sample
- Resulting Data
  - Large text files
  - ~300 gigabytes compressed
- Analysis
  - Often underestimated
  - Can take months as well

7 Why you need to use a Pipeline
- Automates analysis
- Maximizes resource consumption
- You don’t want to burn out your PostDoc

8 CoSVarD
- Easy Config File Input
- “Divide and Conquer”
- Flexible and customizable
- Excel spreadsheets
- Summary Statistics
- All Variants
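The slide does not say how CoSVarD implements “Divide and Conquer”. As a generic illustration of the idea only, the sketch below splits a list of reads into fixed-size chunks, processes the chunks in parallel, and combines the partial results. The function names and the placeholder per-chunk work (counting bases) are hypothetical and are not CoSVarD's code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Divide: cut the input into consecutive chunks of at most chunk_size."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_chunk(reads):
    """Placeholder per-chunk work: count the bases in this chunk.

    A real pipeline would align reads or call variants here.
    """
    return sum(len(read) for read in reads)


def divide_and_conquer(reads, chunk_size, workers=4):
    """Process chunks in parallel, then combine the partial results."""
    chunks = split_into_chunks(reads, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(process_chunk, chunks))
    return sum(partial_counts)


example_reads = ["ACGT", "GGCCA", "TTA", "ACGTACGT", "C"]
total_bases = divide_and_conquer(example_reads, chunk_size=2)
print(total_bases)
```

Threads keep the example short; for CPU-bound work a process pool or separate cluster jobs would be the more usual choice.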

9 Configuration File Input

10 Output Summary
- Full Sequence Alignment
- CNVs, SNPs, INDELs
- Genome-wide Copy Number
- Gene Annotation

11 Holstein Bulls Sequenced
Dataset      Number of Animals   Millions of Reads   Avg X coverage
Low Cov.     24                  3,269               5 X
High Cov.    9                   2,539               20 X
Server: 100 GB RAM, 24 processor cores
Processing time:
Dataset      CPU days   Real days
Low Cov.     415        17.3
High Cov.    317        13.2
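The "real days" on this slide match the CPU days divided by the 24 processor cores of the server. The sketch below reproduces that arithmetic; it assumes the work was spread evenly across all cores, which the slide implies but does not state.

```python
SERVER_CORES = 24  # from the slide: 24 processor cores


def real_days(cpu_days, cores=SERVER_CORES):
    """Wall-clock days if cpu_days of work is spread evenly over `cores`."""
    return cpu_days / cores


low_cov_real = real_days(415)    # low-coverage dataset, 415 CPU days
high_cov_real = real_days(317)   # high-coverage dataset, 317 CPU days

print(f"Low Cov.:  {low_cov_real:.1f} real days")
print(f"High Cov.: {high_cov_real:.1f} real days")
```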

12 Identifying interesting SNPs
Type (alphabetical order)    Count         Percent
DOWNSTREAM                   641,623       4.034%
EXON                         5,765         0.036%
INTERGENIC                   10,483,570    65.911%
INTRON                       3,993,921     25.11%
NON_SYNONYMOUS_CODING        47,634        0.299%
NON_SYNONYMOUS_START         5             0%
SPLICE_SITE_ACCEPTOR         473           0.003%
SPLICE_SITE_DONOR            479           0.003%
START_GAINED                 870           0.005%
START_LOST                   58            0%
STOP_GAINED                  725           0.005%
STOP_LOST                    36            0%
SYNONYMOUS_CODING            54,817        0.345%
SYNONYMOUS_STOP              33            0%
UPSTREAM                     641,381       4.032%
Callout on slide: Stop Gain
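One way to narrow a table like this down to "interesting" SNPs is to keep only the effect types most likely to disrupt a protein. The sketch below uses the counts from the slide; the choice of which categories count as likely high-impact is an assumption made for illustration, not the authors' stated filter.

```python
# Counts copied from the slide's table.
snp_counts = {
    "DOWNSTREAM": 641_623,
    "EXON": 5_765,
    "INTERGENIC": 10_483_570,
    "INTRON": 3_993_921,
    "NON_SYNONYMOUS_CODING": 47_634,
    "NON_SYNONYMOUS_START": 5,
    "SPLICE_SITE_ACCEPTOR": 473,
    "SPLICE_SITE_DONOR": 479,
    "START_GAINED": 870,
    "START_LOST": 58,
    "STOP_GAINED": 725,
    "STOP_LOST": 36,
    "SYNONYMOUS_CODING": 54_817,
    "SYNONYMOUS_STOP": 33,
    "UPSTREAM": 641_381,
}

# Assumed set of likely high-impact effect types (illustrative choice).
LIKELY_HIGH_IMPACT = {
    "SPLICE_SITE_ACCEPTOR",
    "SPLICE_SITE_DONOR",
    "START_LOST",
    "STOP_GAINED",
    "STOP_LOST",
}

candidates = {
    effect: count
    for effect, count in snp_counts.items()
    if effect in LIKELY_HIGH_IMPACT
}
candidate_total = sum(candidates.values())

for effect, count in sorted(candidates.items()):
    print(f"{effect:<22} {count:>6,}")
print(f"{'Total':<22} {candidate_total:>6,}")
```

With this filter, millions of annotated SNPs reduce to fewer than two thousand candidates, with stop gains (725) the largest group.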

13 Genetic impact of Copy Number
[Figure: copy number heat map with columns numbered 1 to 24 and rows for PRP1, ODC, Ferritin and FABP2. Copy Number Color Scale marks: 9, 7, 5, 3, 2.]

14 Conclusions
- Sequencing is a powerful tool
  - Not useful for everything
  - Future is in Whole Genome Seq
- Analysis is a huge concern
- CoSVarD
  - Flexible and customizable
  - Powerful
  - Expected Public Release: End of Year

15 Acknowledgements
BFGL
- George Liu
- Lingyang Xu
AIPL
- George Wiggans
- Tabatha Cooper
- Jana Hutchison
- Paul VanRaden
- John Cole
Fernando Garcia of UNESP
Harris Lewin of University of Illinois
Jerry Taylor and Bob Schnabel of University of Missouri
Funded by National Research Initiative (NRI) Grant Nos. 2007-35205-17869 and 2011-67015-30183 from USDA-NIFA

16 Sample Preparation Time is Substantial
- DNA Extraction: ~12 hours (30 mins)
- DNA QC: ~1-2 hours (1-2 hours)
- Library Construction: 48 hours (12 hours)
- Library QC: ~2-4 hours (1 hour)
- Total: 3-4 days (15.5 hours)
*Parentheses indicate “hands-on” time
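The totals on this slide can be cross-checked by summing the per-step ranges. The sketch below treats each "~" value as exact and each range as a (minimum, maximum) pair, which is a simplifying assumption.

```python
# Hours per step: (elapsed_min, elapsed_max, hands_on_min, hands_on_max).
# Values are taken from the slide, with "~" treated as exact.
STEPS = {
    "DNA extraction":       (12, 12, 0.5, 0.5),
    "DNA QC":               (1, 2, 1, 2),
    "Library construction": (48, 48, 12, 12),
    "Library QC":           (2, 4, 1, 1),
}


def total_range(steps, min_index, max_index):
    """Sum the chosen minimum and maximum columns across all steps."""
    low = sum(values[min_index] for values in steps.values())
    high = sum(values[max_index] for values in steps.values())
    return low, high


elapsed_hours = total_range(STEPS, 0, 1)
hands_on_hours = total_range(STEPS, 2, 3)

print(f"Elapsed:  {elapsed_hours[0]}-{elapsed_hours[1]} hours")
print(f"Hands-on: {hands_on_hours[0]}-{hands_on_hours[1]} hours")
```

The hands-on maximum matches the slide's 15.5 hours. The elapsed steps sum to 63-66 hours, just under three days of back-to-back work, so the slide's 3-4 day total presumably includes some slack between steps.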

17 Storage Concerns
- What to save?
  - Raw data?
  - Processed results?
- How much workspace?
- Suggestions:
  - Workspace: 10 x the size of the compressed files
  - Save alignments
  - Backup REGULARLY!!!
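The workspace suggestion turns into a simple sizing rule. The sketch below combines it with the "~300 gigabytes compressed" figure from slide 6, treating that figure as a per-sample size. The slides do not say so explicitly, so that is an assumption.

```python
COMPRESSED_GB_PER_SAMPLE = 300   # slide 6 figure, assumed to be per sample
WORKSPACE_MULTIPLIER = 10        # suggested workspace: 10 x compressed size


def workspace_gb(n_samples,
                 compressed_gb=COMPRESSED_GB_PER_SAMPLE,
                 multiplier=WORKSPACE_MULTIPLIER):
    """Estimated scratch space in gigabytes for analysing n_samples."""
    return n_samples * compressed_gb * multiplier


one_sample = workspace_gb(1)
five_samples = workspace_gb(5)

print(f"1 sample:  {one_sample / 1000:.0f} TB of workspace")
print(f"5 samples: {five_samples / 1000:.0f} TB of workspace")
```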

18 We are here

19 Computational Logistics
- Desktop computers
  - Viable for single lanes
  - Long computation time
- Servers
  - Best solution
  - >100 GB RAM and >16 processor cores
- Cloud
  - Amazon Web Services (http://aws.amazon.com/lifesciences/)
  - IAnimal/IPlant (http://www.iplantcollaborative.org/)
- Bottlenecks to consider
  - Alignment: disk-IO
  - Variant calling: memory & CPU

