CSU IDRC Next Generation Sequencing Core
Genomic Sequencing Services
Semiconductor DNA Sequencing
Ion Proton and Ion Torrent: “Sequencing on a Chip”
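Semiconductor Sequencing in a Nutshell
“It’s a computational pH meter”: each nucleotide incorporated into the growing DNA strand releases a hydrogen ion, and a sensor beneath each well on the chip detects the resulting pH change.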
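To make the "pH meter" idea concrete: nucleotides are flowed over the chip one type at a time, and a run of identical bases (a homopolymer) releases proportionally more hydrogen ions in a single flow. The sketch below is a toy flow-space base caller, assuming already-normalized per-flow signals (about 1.0 per incorporated base) and a hypothetical cyclic flow order `"TACG"`. Real base calling also corrects for noise and phasing, which this ignores; `call_bases` is an illustrative name, not part of any Ion Torrent software.

```python
from itertools import cycle


def call_bases(signals, flow_order="TACG"):
    """Toy flow-space base caller.

    signals: normalized per-flow signals; ~1.0 means one base incorporated,
             ~2.0 a two-base homopolymer, ~0.0 no incorporation.
    flow_order: hypothetical repeating order of nucleotide flows (an assumption).
    """
    read = []
    for nucleotide, signal in zip(cycle(flow_order), signals):
        # Round to the nearest whole homopolymer length; clamp noise below zero.
        length = max(0, round(signal))
        read.append(nucleotide * length)
    return "".join(read)


print(call_bases([1.05, 0.02, 1.96, 0.1, 0.03, 0.98, 1.1, 0.0]))
# TCCAC
```

The third flow (C) reads about 2.0, so it contributes two bases at once; that proportional signal is why homopolymer length is the hardest thing for this chemistry to call exactly.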
Metagenomics
Environmental samples of communities of organisms: water and soil samples; human and animal microbiomes; mine tailings and oil spills; deep sea and polar ice; etc.
Metagenomics Pipeline
Sequencing: Ion Torrent/Proton sequencers
Sequence search: CSU Cray supercomputer; Oak Ridge Titan supercomputer; NCBI nucleotide databases
Taxonomic analysis: MEGAN
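MEGAN assigns each read to the lowest common ancestor (LCA) of the taxa its database hits belong to, so a read matching two closely related species is credited only to the group they share. A minimal sketch of that LCA step follows, assuming each hit has already been mapped to a root-to-leaf taxonomy path. The lineages are illustrative (two Everglades sunfishes), not results from any study, and the sketch ignores the hit-score filtering MEGAN applies before the LCA step.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by every lineage, or None.

    Each lineage is a root-to-leaf tuple of taxon names, e.g. the taxonomy
    path of one BLAST hit's subject sequence.
    """
    if not lineages:
        return None
    shared = []
    # zip stops at the shortest lineage, so the LCA is never deeper than it.
    for names_at_depth in zip(*lineages):
        if all(name == names_at_depth[0] for name in names_at_depth):
            shared.append(names_at_depth[0])
        else:
            break  # lineages diverge here; the previous taxon is the LCA
    return shared[-1] if shared else None


# Illustrative hits for a single read (simplified lineages).
hits = [
    ("Eukaryota", "Chordata", "Actinopterygii", "Centrarchidae",
     "Micropterus", "Micropterus salmoides"),
    ("Eukaryota", "Chordata", "Actinopterygii", "Centrarchidae",
     "Lepomis", "Lepomis macrochirus"),
]
print(lowest_common_ancestor(hits))
# Centrarchidae
```

Assigning to the LCA trades resolution for safety: a conserved read is reported at family level rather than being guessed into one species.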
Metagenomics Tools
Ion Proton sequencer. In: sample DNA. Out: 50M DNA fragments.
NCBI nucleotide database: 15M+ DNA sequence records.
Do the math: 50M fragments × 15M records = 7.5 × 10^14 query-record comparisons.
mpiBLAST: a highly parallelized BLAST that queries NGS sample DNA against the NCBI database.
CSU Cray XT6m: 2,016 CPU cores.
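The "do the math" figure is a raw pair count and an upper bound: BLAST's seed-and-extend heuristic avoids fully aligning most pairs. The sketch below first reproduces that arithmetic, then illustrates database segmentation, the approach mpiBLAST is known for: the database is split into fragments, each worker searches all queries against its own fragment, and the per-fragment hits are merged. The shared-k-mer score is a hypothetical stand-in for a real BLAST alignment score, and `segment`, `search_fragment`, and `segmented_search` are illustrative names, not mpiBLAST's API.

```python
from heapq import nlargest

# Workload estimate using the figures on the slide.
READS = 50_000_000        # DNA fragments from one Ion Proton run
DB_RECORDS = 15_000_000   # NCBI nucleotide database records
CORES = 2_016             # CSU Cray XT6m CPU cores

comparisons = READS * DB_RECORDS  # 750,000,000,000,000 read-record pairs
per_core = comparisons / CORES    # about 3.7e11 pairs per core, ignoring overhead


def kmer_score(query, subject, k=4):
    """Hypothetical stand-in for a BLAST score: count of shared k-mers."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    s = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return len(q & s)


def segment(database, n_workers):
    """Split (record_id, sequence) pairs into contiguous fragments."""
    size = max(1, -(-len(database) // n_workers))  # ceiling division
    return [database[i:i + size] for i in range(0, len(database), size)]


def search_fragment(queries, fragment):
    """One worker's job: score every query against its own fragment only."""
    return {q: [(kmer_score(q, seq), rec_id) for rec_id, seq in fragment]
            for q in queries}


def segmented_search(queries, database, n_workers, top_k=1):
    """Search each fragment independently, then merge the top hits per query."""
    # On a cluster each fragment would go to a different node; here they run
    # one after another so the example is self-contained.
    partials = [search_fragment(queries, frag)
                for frag in segment(database, n_workers)]
    return {q: nlargest(top_k, (hit for part in partials for hit in part[q]))
            for q in queries}


db = [("rec1", "ACGTACGTAA"), ("rec2", "TTTTGGGGCC"), ("rec3", "ACGTTTTGGG")]
print(segmented_search(["ACGTACG", "TTGGGGC"], db, n_workers=2))
# {'ACGTACG': [(4, 'rec1')], 'TTGGGGC': [(4, 'rec2')]}
```

The key property is that the merged result does not depend on how many fragments the database was cut into, which is what lets the search spread across thousands of cores.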
Metagenomics
Dr. Toni Piaggio, National Wildlife Research Center, Fort Collins
Florida Everglades water samples (4): “What species are in the water?”
CSU NextGen Sequencing Core: Ion Proton; 2 weeks
CSU Cray: 1,000 cores, 24 hours, 4 runs; 1 week
Results
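The compute budget for this case study can be checked with a little arithmetic. The sketch assumes, as the slide suggests but does not state outright, that each of the 4 runs used 1,000 cores for 24 hours and that the runs went back to back; under that assumption the wall-clock time fits inside the one-week compute phase.

```python
CORES = 1_000        # CSU Cray cores per run (from the slide)
HOURS_PER_RUN = 24   # wall-clock hours per run (from the slide)
RUNS = 4             # assumed: one run per Everglades water sample

core_hours = CORES * HOURS_PER_RUN * RUNS    # total CPU time consumed
sequential_days = HOURS_PER_RUN * RUNS / 24  # assumed: runs executed back to back

print(f"{core_hours:,} core-hours; {sequential_days:.0f} days of wall-clock time")
# 96,000 core-hours; 4 days of wall-clock time
```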
Metagenomics
Rarefaction curves: estimate species richness
Is the curve asymptotic? If it is still rising, more sampling should find rare species.
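A rarefaction curve plots the expected number of species observed in a random subsample of n reads, for increasing n. A minimal sketch using the classic rarefaction formula E[S_n] = Σ_i (1 − C(N − N_i, n) / C(N, n)), where N is the total number of reads and N_i the reads assigned to species i, is shown below. The per-species counts are illustrative, not data from the Everglades samples.

```python
from math import comb


def rarefaction(counts, n):
    """Expected number of species in a random subsample of n reads.

    counts: reads assigned to each species (e.g. per-taxon read counts).
    Each species contributes the probability that at least one of its
    reads is drawn: 1 - C(N - N_i, n) / C(N, n).
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total read count")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step=1):
    """Points (n, E[S_n]) for n = 0, step, 2*step, ... up to the total."""
    total = sum(counts)
    return [(n, rarefaction(counts, n)) for n in range(0, total + 1, step)]


counts = [5, 3, 1, 1]  # illustrative read counts for four species
print(round(rarefaction(counts, 2), 3))
# 1.711
```

If the curve is still climbing at the full sample size, the singletons (species seen only once) suggest that more sequencing would reveal additional rare species; a flat tail suggests the sample has captured most of the richness.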
Computational Resources
Oak Ridge Titan, Cray XK7 supercomputer: 300K CPU cores; 50M GPU cores; mpiBLAST against the NCBI nucleotide DB; query 100% of sample DNA
CSU Cray XT6m supercomputer: 2,016 CPU cores; mpiBLAST against the NCBI nucleotide DB; query 1% of sample DNA
Strong scaling
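Strong scaling keeps the total problem fixed and asks how much faster it runs as cores are added: speedup is the reference time divided by the new time, and efficiency is that speedup divided by the ideal (the ratio of core counts). Amdahl's law gives the ceiling. The sketch below assumes wall-clock timings are available; the 95% parallel fraction in the example is a hypothetical value, not a measurement from the CSU or Titan runs.

```python
def speedup(t_ref, t_p):
    """Strong-scaling speedup: same problem, reference time over new time."""
    return t_ref / t_p


def efficiency(t_ref, cores_ref, t_p, cores_p):
    """Fraction of the ideal speedup achieved going from cores_ref to cores_p."""
    ideal = cores_p / cores_ref
    return speedup(t_ref, t_p) / ideal


def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: best possible speedup for a fixed-size problem."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical: 95% of the work parallelizes, run on 300K cores.
print(round(amdahl_speedup(0.95, 300_000), 2))
# 20.0
```

Even at Titan's scale, a 5% serial fraction would cap speedup near 20x, which is why an embarrassingly parallel workload such as independent BLAST queries is a good fit for strong scaling.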
Summary: Big Data Issues
Semiconductor sequencer data
Large-scale database queries
High-performance computing