Published by Morgan Pearson. Modified over 9 years ago.
1
Tools to Exploit Sequence Data to Find New Markers and Disease Loci in Cattle
D. M. Bickhart, H. A. Lewin and G. E. Liu
Bickhart, ADSA Meeting, 2013
2
Amount of sequence data
[Figure: SRA chart, from Wikimedia Commons; annotated at ~312.5 human genome equivalents and ~312,500 human genome equivalents]
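The two annotations on the chart appear to mark one terabase and one petabase of sequence. A minimal sketch of that conversion follows, assuming one human genome equivalent is about 3.2 gigabases; the slide does not state the genome size it used, and the function name is illustrative.

```python
# Assumption (not stated on the slide): one human genome equivalent
# is about 3.2 gigabases.
HUMAN_GENOME_BASES = 3.2e9


def equivalents_to_bases(genome_equivalents: float) -> float:
    """Convert human genome equivalents to a raw base count."""
    return genome_equivalents * HUMAN_GENOME_BASES


for equivalents in (312.5, 312_500):
    bases = equivalents_to_bases(equivalents)
    print(f"{equivalents:,} genome equivalents = {bases:.1e} bases")
```

Under this assumption the smaller figure works out to 10^12 bases and the larger to 10^15 bases.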
3
Why sequence DNA?
- Best genotyping tool
  - BovineHD chip (~0.03% of the genome)
  - Whole Genome Seq (~90% of the genome)
- New Disease Discovery
  - Low frequency variants
  - Sometimes not SNPs
- Arrays are cost effective
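The ~0.03% figure can be roughly reproduced by dividing the number of positions a chip assays by the genome length. The sketch below assumes about 777,962 markers on the BovineHD chip and a cattle genome of about 2.7 gigabases; neither number appears on the slide, so treat both as assumptions.

```python
# Assumptions (not on the slide): the BovineHD chip carries roughly
# 777,962 SNP markers and the cattle genome is roughly 2.7 gigabases.
BOVINE_HD_MARKERS = 777_962
CATTLE_GENOME_BASES = 2.7e9


def percent_of_genome(positions_assayed: float,
                      genome_bases: float = CATTLE_GENOME_BASES) -> float:
    """Percentage of genome positions that are directly assayed."""
    return 100.0 * positions_assayed / genome_bases


chip_percent = percent_of_genome(BOVINE_HD_MARKERS)
print(f"BovineHD chip: ~{chip_percent:.2f}% of the genome")
```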
4
Sequencing Stage
Whole Genome Sequencing
- Based on Genomic DNA
- Samples turned into “libraries”
Illumina HiSeq 2000 Sequencer
- Takes ~10-14 days for a 100 x 100 (paired-end, 100 bp) run
- Minimal hands-on time
- Produces 600 gigabases per run
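To relate the 600 gigabase output to sequencing depth, mean coverage can be estimated as bases sequenced divided by genome length. This is a rough sketch that assumes a cattle genome of about 2.7 gigabases (not given on the slide), reads the 600 gigabases as the output of one run, and ignores losses from duplicates, low-quality reads, and unmapped reads. The function names are illustrative.

```python
import math

RUN_OUTPUT_BASES = 600e9     # from the slide: 600 gigabases
CATTLE_GENOME_BASES = 2.7e9  # assumption: approximate cattle genome size


def mean_coverage(bases_sequenced: float,
                  genome_bases: float = CATTLE_GENOME_BASES) -> float:
    """Average number of times each genome position is sequenced."""
    return bases_sequenced / genome_bases


def animals_per_run(target_coverage: float,
                    run_bases: float = RUN_OUTPUT_BASES,
                    genome_bases: float = CATTLE_GENOME_BASES) -> int:
    """Whole animals that fit in one run at the requested mean coverage."""
    return math.floor(run_bases / (target_coverage * genome_bases))


print(f"One run on a single animal: ~{mean_coverage(RUN_OUTPUT_BASES):.0f} X")
for target in (5, 20):
    print(f"{target} X per animal: up to {animals_per_run(target)} animals per run")
```

These are upper bounds under the stated assumptions; real yields per animal will be lower.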
5
Reads must be aligned to a reference genome
Raw Sequencer Output -> Alignment to the Genome -> Variant Detection
This analysis is very disk-IO intensive.
6
So you decided to start sequencing
- Total Time (sample to sequence): 3 weeks
  - That’s assuming nothing went wrong!
  - More realistic: months
- Total Cost: ~$2400 per sample
- Resulting Data
  - Large text files
  - ~300 gigabytes compressed
- Analysis
  - Often underestimated
  - Can take months as well
7
Why you need to use a Pipeline
- Automates analysis
- Maximizes resource utilization
- You don’t want to burn out your PostDoc
8
CoSVarD
- Easy Config File Input
- “Divide and Conquer”
- Flexible and customizable
- Excel spreadsheets
- Summary Statistics
- All Variants
9
Configuration File Input
10
Output Summary
- Full Sequence Alignment
- CNVs, SNPs, INDELs
- Genome-wide Copy Number
- Gene Annotation
11
Holstein Bulls Sequenced

Dataset     Number of Animals   Millions of Reads   Avg X coverage
Low Cov.    24                  3,269               5 X
High Cov.   9                   2,539               20 X

Server: 100 GB RAM, 24 processor cores

Processing time:
Low Cov.    415 CPU days   17.3 real days
High Cov.   317 CPU days   13.2 real days
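The "real days" figures on this slide are consistent with dividing the CPU days evenly across the server's 24 cores. A minimal sketch of that arithmetic follows; it assumes perfect parallel scaling, which a real pipeline will only approximate, and the function name is illustrative.

```python
SERVER_CORES = 24  # from the slide: 24 processor cores

# CPU days per dataset, as reported on the slide.
CPU_DAYS = {"Low Cov.": 415, "High Cov.": 317}


def wall_clock_days(cpu_days: float, cores: int = SERVER_CORES) -> float:
    """Elapsed days if the work is spread evenly over all cores."""
    return cpu_days / cores


for dataset, cpu_days in CPU_DAYS.items():
    print(f"{dataset}: {cpu_days} CPU days = {wall_clock_days(cpu_days):.1f} real days")
```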
12
Identifying interesting SNPs

Type (alphabetical order)   Count        Percent
DOWNSTREAM                  641,623      4.034%
EXON                        5,765        0.036%
INTERGENIC                  10,483,570   65.911%
INTRON                      3,993,921    25.11%
NON_SYNONYMOUS_CODING       47,634       0.299%
NON_SYNONYMOUS_START        5            0%
SPLICE_SITE_ACCEPTOR        473          0.003%
SPLICE_SITE_DONOR           479          0.003%
START_GAINED                870          0.005%
START_LOST                  58           0%
STOP_GAINED                 725          0.005%
STOP_LOST                   36           0%
SYNONYMOUS_CODING           54,817       0.345%
SYNONYMOUS_STOP             33           0%
UPSTREAM                    641,381      4.032%

Highlighted on the slide: Stop Gain
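One way to narrow a table like this down to "interesting" SNPs is to keep only the effect classes expected to change the protein product. The sketch below uses the counts from the slide. The choice of which classes count as protein-altering is an assumption made for illustration, not necessarily the filter the authors applied.

```python
# Counts per effect class, copied from the slide.
SNP_COUNTS = {
    "DOWNSTREAM": 641_623,
    "EXON": 5_765,
    "INTERGENIC": 10_483_570,
    "INTRON": 3_993_921,
    "NON_SYNONYMOUS_CODING": 47_634,
    "NON_SYNONYMOUS_START": 5,
    "SPLICE_SITE_ACCEPTOR": 473,
    "SPLICE_SITE_DONOR": 479,
    "START_GAINED": 870,
    "START_LOST": 58,
    "STOP_GAINED": 725,
    "STOP_LOST": 36,
    "SYNONYMOUS_CODING": 54_817,
    "SYNONYMOUS_STOP": 33,
    "UPSTREAM": 641_381,
}

# Assumption: classes treated here as likely to alter the protein.
PROTEIN_ALTERING = {
    "NON_SYNONYMOUS_CODING",
    "NON_SYNONYMOUS_START",
    "SPLICE_SITE_ACCEPTOR",
    "SPLICE_SITE_DONOR",
    "START_LOST",
    "STOP_GAINED",
    "STOP_LOST",
}

# Keep only the protein-altering classes, preserving the slide's order.
candidates = {k: v for k, v in SNP_COUNTS.items() if k in PROTEIN_ALTERING}
candidate_total = sum(candidates.values())

for effect, count in candidates.items():
    print(f"{effect:<24}{count:>8,}")
print(f"{'Total candidates':<24}{candidate_total:>8,}")
```

Stop-gain variants, the class highlighted on the slide, are a small subset of these candidates (725 SNPs).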
13
Genetic impact of Copy Number
[Figure: copy number heat map; columns labeled 1 to 24; rows for PRP1, ODC, Ferritin and FABP2; Copy Number Color Scale: 9, 7, 5, 3, 2]
14
Conclusions
- Sequencing is a powerful tool
  - Not useful for everything
  - Future is in Whole Genome Seq
- Analysis is a huge concern
- CoSVarD
  - Flexible and customizable
  - Powerful
  - Expected Public Release: End of Year
15
Acknowledgements
BFGL
- George Liu
- Lingyang Xu
AIPL
- George Wiggans
- Tabatha Cooper
- Jana Hutchison
- Paul VanRaden
- John Cole
Fernando Garcia of UNESP
Harris Lewin of University of Illinois
Jerry Taylor and Bob Schnabel of University of Missouri
Funded by National Research Initiative (NRI) Grant No. 2007-35205-17869 and 2011-67015-30183 from USDA-NIFA
16
Sample Preparation Time is Substantial
- DNA Extraction: ~12 hours (30 mins)
- DNA QC: ~1-2 hours (1-2 hours)
- Library Construction: 48 hours (12 hours)
- Library QC: ~2-4 hours (1 hour)
- Total: 3-4 days (15.5 hours)
*Parentheses indicate “hands-on” time
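The totals can be checked by summing the per-step ranges. In this sketch the step durations come from the slide, while the data layout and function name are illustrative.

```python
# Hours per step from the slide:
# (elapsed_low, elapsed_high, hands_on_low, hands_on_high)
STEPS = {
    "DNA Extraction": (12, 12, 0.5, 0.5),
    "DNA QC": (1, 2, 1, 2),
    "Library Construction": (48, 48, 12, 12),
    "Library QC": (2, 4, 1, 1),
}


def total_hours(steps):
    """Sum the low and high bounds for elapsed and hands-on time."""
    elapsed = (sum(s[0] for s in steps.values()),
               sum(s[1] for s in steps.values()))
    hands_on = (sum(s[2] for s in steps.values()),
                sum(s[3] for s in steps.values()))
    return elapsed, hands_on


elapsed, hands_on = total_hours(STEPS)
print(f"Elapsed: {elapsed[0]}-{elapsed[1]} hours "
      f"({elapsed[0] / 24:.1f}-{elapsed[1] / 24:.1f} days of continuous time)")
print(f"Hands-on: {hands_on[0]}-{hands_on[1]} hours")
```

The hands-on upper bound matches the slide's 15.5 hours. The elapsed sum is a little under three days of continuous time, so the slide's 3-4 days presumably allows for working hours and handoffs between steps.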
17
Storage Concerns
- What to save?
  - Raw data?
  - Processed results?
- How much workspace?
- Suggestions:
  - Workspace: 10 x compressed files
  - Save alignments
  - Backup REGULARLY!!!
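As a rough sizing sketch, the 10 x rule can be combined with the ~300 gigabytes of compressed data quoted earlier in the talk. Reading that figure as a per-sample size is an assumption, and the function name is illustrative.

```python
# From the earlier "Resulting Data" slide: ~300 gigabytes compressed.
# Assumption: that figure is per sample.
COMPRESSED_GB_PER_SAMPLE = 300
WORKSPACE_FACTOR = 10  # from this slide: workspace = 10 x compressed files


def workspace_gb(n_samples: int,
                 compressed_gb: float = COMPRESSED_GB_PER_SAMPLE,
                 factor: float = WORKSPACE_FACTOR) -> float:
    """Scratch space to reserve while the given samples are processed."""
    return n_samples * compressed_gb * factor


for n in (1, 4):
    print(f"{n} sample(s): ~{workspace_gb(n) / 1000:.0f} TB of workspace")
```

This covers working space only; backups of raw data and saved alignments need to be budgeted separately.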
18
We are here
19
Computational Logistics
Desktop computers
- Viable for single lanes
- Long computation time
Servers
- Best solution
- >100 GB RAM and >16 processor cores
Cloud
- Amazon Web Services (http://aws.amazon.com/lifesciences/)
- iAnimal/iPlant (http://www.iplantcollaborative.org/)
Bottlenecks to consider
- Alignment: disk IO
- Variant calling: memory and CPU