BickhartADSA Meeting(1) 2013 Tools to Exploit Sequence data to find new markers and Disease Loci in Cattle D. M. Bickhart, H. A. Lewin and G. E. Liu.

Presentation transcript:

BickhartADSA Meeting(1) 2013 Tools to Exploit Sequence data to find new markers and Disease Loci in Cattle D. M. Bickhart, H. A. Lewin and G. E. Liu

BickhartADSA Meeting(2) 2013 Amount of sequence data
[Figure: SRA growth chart, from Wikimedia Commons, annotated in human genome equivalents]

BickhartADSA Meeting(3) 2013 Why sequence DNA?
- Best genotyping tool
  - BovineHD chip (~0.03% of the genome)
  - Whole Genome Seq (~90% of the genome)
- New Disease Discovery
  - Low frequency variants
  - Sometimes not SNPs
- Arrays are cost effective
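The ~0.03% figure for the BovineHD chip can be sanity-checked with simple arithmetic. A minimal sketch, assuming 777,962 markers on the BovineHD chip and a cattle genome of roughly 2.7 gigabases (both figures are assumptions brought in for illustration; neither appears on the slide), and counting each SNP as one genotyped base:

```python
# Fraction of the genome directly genotyped by a SNP chip.
BOVINEHD_MARKERS = 777_962        # ASSUMPTION: marker count, not from the slide
GENOME_SIZE_BP = 2_700_000_000    # ASSUMPTION: approximate cattle genome size


def percent_of_genome(bases_assayed: int, genome_size_bp: int) -> float:
    """Return the percentage of genome positions that are assayed."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return 100.0 * bases_assayed / genome_size_bp


if __name__ == "__main__":
    pct = percent_of_genome(BOVINEHD_MARKERS, GENOME_SIZE_BP)
    print(f"BovineHD chip covers about {pct:.2f}% of the genome")
```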

BickhartADSA Meeting(4) 2013 Sequencing Stage
- Whole Genome Sequencing
  - Based on Genomic DNA
  - Samples turned into “libraries”
- Illumina HiSeq 2000 Sequencer
  - Takes ~10-14 days for 100 x 100
  - Minimal hands-on time
  - Produces 600 gigabases
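For a sense of scale, the 600 gigabases from one run can be converted into a number of animals at a target depth. A minimal sketch, assuming a cattle genome of about 2.7 gigabases (an assumed round figure, not stated on the slide) and ignoring duplicate or unmapped reads:

```python
# Rough estimate of how many animals one sequencing run can cover.
GENOME_SIZE_GB = 2.7   # ASSUMPTION: approximate cattle genome size in gigabases
RUN_YIELD_GB = 600.0   # from the slide: one HiSeq 2000 run produces 600 gigabases


def animals_per_run(target_coverage: float,
                    run_yield_gb: float = RUN_YIELD_GB,
                    genome_size_gb: float = GENOME_SIZE_GB) -> int:
    """Whole animals that fit in one run at the requested average coverage."""
    if target_coverage <= 0:
        raise ValueError("target_coverage must be positive")
    bases_per_animal = target_coverage * genome_size_gb
    return int(run_yield_gb // bases_per_animal)


if __name__ == "__main__":
    for depth in (5, 20):
        print(f"{depth}X: about {animals_per_run(depth)} animals per run")
```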

BickhartADSA Meeting(5) 2013 Reads must be aligned to a reference genome
Raw Sequencer Output -> Alignment to the Genome -> Variant Detection
This analysis is very disk-IO intensive.

BickhartADSA Meeting(6) 2013 So you decided to start sequencing
- Total Time (sample to sequence): 3 weeks
  - That’s assuming nothing went wrong!
  - More realistic: months
- Total Cost: ~$2400 per sample
- Resulting Data
  - Large text files
  - ~300 gigabytes compressed
- Analysis
  - Often underestimated
  - Can take months as well

BickhartADSA Meeting(7) 2013 Why you need to use a Pipeline
- Automates analysis
- Maximizes resource consumption
- You don’t want to burn out your PostDoc

BickhartADSA Meeting(8) 2013 CoSVarD
- Easy Config File Input
- “Divide and Conquer”
- Flexible and customizable
- Excel spreadsheets
- Summary Statistics
- All Variants
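The “Divide and Conquer” idea can be illustrated generically: split the genome into windows, process the windows in parallel, then merge the results. The sketch below is not CoSVarD's implementation; the function names, the window size and the placeholder per-window work are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_regions(chrom_lengths, chunk_size):
    """Yield (chrom, start, end) half-open windows covering each chromosome."""
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, chunk_size):
            yield chrom, start, min(start + chunk_size, length)


def analyze_region(region):
    """HYPOTHETICAL placeholder for per-window work such as variant calling."""
    chrom, start, end = region
    return chrom, end - start  # here we only report how many bases were seen


def run_divide_and_conquer(chrom_lengths, chunk_size, workers=4):
    """Process all windows in parallel and merge the results per chromosome."""
    regions = list(split_regions(chrom_lengths, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(analyze_region, regions))
    totals = {}
    for chrom, n_bases in results:
        totals[chrom] = totals.get(chrom, 0) + n_bases
    return totals


if __name__ == "__main__":
    # Toy chromosome lengths, chosen only for illustration.
    print(run_divide_and_conquer({"chr1": 250, "chr2": 100}, chunk_size=100))
```

Threads keep the sketch short; a real pipeline would more likely dispatch each window to a separate process or cluster job.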

BickhartADSA Meeting(9) 2013 Configuration File Input

BickhartADSA Meeting(10) 2013 Output Summary
- Full Sequence Alignment
- CNVs, SNPs, INDELs
- Genome-wide Copy Number
- Gene Annotation

BickhartADSA Meeting(11) 2013 Holstein Bulls Sequenced

Dataset      Number of Animals   Millions of Reads   Avg X coverage
Low Cov.     24                  3,269               5 X
High Cov.    9                   2,539               20 X

Server: 100 GB RAM, 24 processor cores

Processing time:
- Low Cov.: 415 CPU days (17.3 real days)
- High Cov.: 317 CPU days (13.2 real days)
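The real-day figures on this slide (17.3 and 13.2) are consistent with dividing the CPU days (415 and 317) by the 24 cores of the server. A minimal sketch of that conversion, assuming perfectly even use of all cores (an idealization; the function name is illustrative):

```python
def real_days(cpu_days: float, cores: int) -> float:
    """Convert CPU days to wall-clock days, assuming ideal use of every core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cpu_days / cores


if __name__ == "__main__":
    for label, cpu_days in (("Low Cov.", 415), ("High Cov.", 317)):
        print(f"{label}: {real_days(cpu_days, 24):.1f} real days on 24 cores")
```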

BickhartADSA Meeting(12) 2013 Identifying interesting SNPs
Type (alphabetical order) / Count / Percent
DOWNSTREAM 641, %
EXON 5, %
INTERGENIC 10,483, %
INTRON 3,993, %
NON_SYNONYMOUS_CODING 47, %
NON_SYNONYMOUS_START 50%
SPLICE_SITE_ACCEPTOR %
SPLICE_SITE_DONOR %
START_GAINED %
START_LOST 580%
STOP_GAINED %
STOP_LOST 360%
SYNONYMOUS_CODING 54, %
SYNONYMOUS_STOP 330%
UPSTREAM 641, %
Stop Gain

BickhartADSA Meeting(13) 2013 Genetic impact of Copy Number
[Figure: copy number at PRP1, ODC, Ferritin and FABP2; Copy Number Color Scale: 9, 7, 5, 3, 2]

BickhartADSA Meeting(14) 2013 Conclusions
- Sequencing is a powerful tool
  - Not useful for everything
  - Future is in Whole Genome Seq
- Analysis is a huge concern
- CoSVarD
  - Flexible and customizable
  - Powerful
  - Expected Public Release: End of Year

Acknowledgements
- BFGL
  - George Liu
  - Lingyang Xu
- AIPL
  - George Wiggans
  - Tabatha Cooper
  - Jana Hutchison
  - Paul VanRaden
  - John Cole
- Fernando Garcia of UNESP
- Harris Lewin of University of Illinois
- Jerry Taylor and Bob Schnabel of University of Missouri
Funded by National Research Initiative (NRI) Grant No and from USDA-NIFA

Sample Preparation Time is Substantial
- DNA Extraction: ~12 hours (30 mins)
- DNA QC: ~1-2 hours (1-2 hours)
- Library Construction: 48 hours (12 hours)
- Library QC: ~2-4 hours (1 hour)
- Total: 3-4 days (15.5 hours)
*Parentheses indicate “hands-on” time

Storage Concerns
- What to save?
  - Raw data?
  - Processed results?
- How much workspace?
- Suggestions:
  - Workspace: 10 x compressed files
  - Save alignments
  - Backup REGULARLY!!!
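The workspace suggestion can be turned into a quick estimate. A minimal sketch, assuming the ~300 gigabytes of compressed data per sample quoted earlier in the talk and the 10x rule of thumb from this slide; the function name and sample counts are illustrative:

```python
COMPRESSED_GB_PER_SAMPLE = 300   # from the talk: ~300 gigabytes compressed
WORKSPACE_FACTOR = 10            # from the slide: workspace = 10 x compressed files


def workspace_gb(n_samples: int,
                 compressed_gb: float = COMPRESSED_GB_PER_SAMPLE,
                 factor: float = WORKSPACE_FACTOR) -> float:
    """Suggested scratch space in gigabytes for n_samples analyzed together."""
    if n_samples < 0:
        raise ValueError("n_samples must not be negative")
    return n_samples * compressed_gb * factor


if __name__ == "__main__":
    for n in (1, 10):
        print(f"{n} sample(s): about {workspace_gb(n) / 1000:.0f} TB of workspace")
```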

We are here

Computational Logistics
- Desktop computers
  - Viable for single lanes
  - Long computation time
- Servers
  - Best solution
  - >100 GB RAM and >16 processor cores
- Cloud
  - Amazon Web Services (
  - iAnimal/iPlant (
- Bottlenecks to consider
  - alignment: disk-IO
  - variant calling: memory & CPU