Dale Beach, Longwood University Lisa Scheifele, Loyola University Maryland.

2 Next-generation sequencing has revolutionized both biological research and clinical medicine: entire human genomes are now sequenced to predict drug responsiveness and to diagnose disease (for example, Choi 2009).

3 http://www.pnas.org/content/106/45/19096/F3.expansion.html http://www.pnas.org/content/106/45/19096.full.pdf+html
Compared with traditional Sanger sequencing, next-generation sequencing datasets have shorter read lengths and higher error rates. This creates challenges for downstream analysis: because a single run produces so many reads, even a small per-base error rate leaves a large number of reads that contain at least one error. For example, Illumina MiSeq reads have an error rate of about 0.1% per base (Glenn 2011), yet this means only about 86% of 150 bp reads ($0.999^{150} \approx 0.86$) are error-free.
[Figure: Sequencing error in read]
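The error-free fraction quoted above comes from raising the per-base accuracy to the power of the read length. The minimal Python sketch below lets students vary read length and error rate. It assumes errors are independent and equally likely at every position, which is a simplification because real Illumina error rates are not uniform along a read.

```python
def error_free_fraction(per_base_error: float, read_length: int) -> float:
    """Probability that a read has no sequencing errors.

    Simplifying assumption: each base is miscalled independently with the
    same probability. Real error rates tend to rise toward the 3' end of reads.
    """
    return (1.0 - per_base_error) ** read_length


if __name__ == "__main__":
    # 0.1% per-base error rate, as quoted for Illumina MiSeq on the slide.
    for length in (50, 100, 150, 250):
        frac = error_free_fraction(0.001, length)
        print(f"{length:>3} bp reads: {frac:.1%} error-free")
```

Longer reads at the same per-base error rate are less likely to be error-free. This is one reason error correction matters more as read lengths grow.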

4
• This module is designed for a genetics or molecular biology class. It requires three lecture/seminar class periods, with optional additional Linux-based lab activities.
• Before beginning this module, students should be familiar with:
  ▪ Sample preparation techniques for DNA sequencing
  ▪ DNA replication and the enzymes that synthesize DNA
  ▪ Nucleic acid and nucleotide structure

5
• Initial evaluation of the quality of eukaryotic genome sequencing data
• Implementation of error correction techniques
• Comparison of the quality of sequencing data before and after error correction
• Completed sequencing data for a small eukaryotic genome, generated on the Illumina platform
• If students will not be doing command-line work themselves, the data should be pre-processed with:
  ▪ Jellyfish, to produce k-mer frequency data that students can plot as a histogram in Excel
  ▪ Quake, to perform error correction so that students can be given both pre- and post-correction datasets
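Quake's error correction starts from k-mer counts like those Jellyfish produces. K-mers seen many times are treated as "trusted", meaning they are likely genuine genome sequence. Rarely seen k-mers are treated as "untrusted", meaning they likely contain an error. Quake itself sets the boundary by modeling quality-weighted k-mer counts. The sketch below swaps in a simpler heuristic that takes the first valley of the k-mer histogram as the cutoff. The function names and the toy histogram are hypothetical and are for illustration only.

```python
from typing import Dict, Set, Tuple


def choose_cutoff(histogram: Dict[int, int]) -> int:
    """Pick a coverage cutoff at the first valley of a k-mer histogram.

    `histogram` maps multiplicity (times a k-mer was seen) to the number of
    distinct k-mers observed with that multiplicity. Simplified heuristic,
    not Quake's model-based cutoff.
    """
    xs = sorted(histogram)
    for prev, cur, nxt in zip(xs, xs[1:], xs[2:]):
        # A valley: no higher than the point before it, lower than the one after.
        if histogram[cur] <= histogram[prev] and histogram[cur] < histogram[nxt]:
            return cur
    raise ValueError("no valley found; the histogram may not be bimodal")


def split_kmers(counts: Dict[str, int], cutoff: int) -> Tuple[Set[str], Set[str]]:
    """Split k-mers into trusted (count >= cutoff) and untrusted (count < cutoff)."""
    trusted = {kmer for kmer, n in counts.items() if n >= cutoff}
    untrusted = set(counts) - trusted
    return trusted, untrusted


if __name__ == "__main__":
    # Hypothetical histogram: an error peak at multiplicity 1 and a
    # coverage peak near multiplicity 8.
    toy = {1: 5000, 2: 900, 3: 300, 4: 250, 5: 400,
           6: 800, 7: 1200, 8: 1500, 9: 1300, 10: 900}
    print(choose_cutoff(toy))  # prints 4
```

The low-multiplicity peak on the left is dominated by error-containing k-mers. The peak on the right reflects genomic coverage, and the valley between them is a natural place to draw the line.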

6
• At the completion of this module, students will be able to:
  ▪ Describe the important differences between high-throughput and traditional (low-throughput) experiments
  ▪ Explain the reasons for variation in the quality of high-throughput datasets
  ▪ Use computational tools to quantify errors in sequencing data
  ▪ Interpret the quality of a sequencing experiment and implement effective quality control measures

7
• Excel or another analytical package, to create a k-mer frequency distribution
• Galaxy, to create a boxplot of PHRED33 quality scores
• Optional: Quake and Jellyfish on a Linux system, to generate k-mer data and perform error correction

8
• This module will develop students’ abilities to:
  ▪ Apply the process of science
    ▪ Design experiments, from methodological design through data analysis
    ▪ Analyze and interpret data
  ▪ Use modeling and simulation
    ▪ Design experimental strategies and predict outcomes
  ▪ Use quantitative reasoning
    ▪ Depict data using histograms and boxplots
    ▪ Interpret graphs and use the results of their analysis to modify error correction strategies

9
• Introduction to sequencing history and platforms
• Discuss typical sources of error in sequencing reads
• Discuss sequence output formats and PHRED33 scores
• Upload raw data to Galaxy
• Optional: use Quake in Linux to manipulate parameters and improve quality
http://www.nimr.mrc.ac.uk/mill-hill-essays/bringing-it-all-back-home-next-generation-sequencing-technology-and-you#
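The sketch below connects the discussion of output formats to PHRED33 scores. It reads FASTQ records and decodes each quality character with the Phred+33 convention, where the score equals the ASCII code minus 33. It then converts a score Q to an error probability of 10^(-Q/10). The sketch assumes unwrapped, four-line records as typically produced by Illumina instruments. The function names are illustrative and are not part of any particular tool.

```python
from typing import Iterable, Iterator, List, Tuple

PHRED_OFFSET = 33  # Phred+33 ("PHRED33") encoding: score = ASCII code - 33


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, quality_scores) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, [ord(ch) - PHRED_OFFSET for ch in qual]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)
```

For a hypothetical file `reads.fastq`, `for read_id, seq, quals in read_fastq(open("reads.fastq")):` iterates over its reads. A quality character `I` decodes to Q40, which is an error probability of 1 in 10,000. The character `#` decodes to Q2.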

10
• Introduce software packages that can be used to assess data quality
• Demonstrate breaking sequencing reads into k-mers
• Use Excel or Jellyfish to create a k-mer graph
• Use Excel or Jellyfish to create a k-mer graph after manipulating error correction parameters (variations in k-mer size)
[Figure: K-mer frequency distribution]
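Jellyfish breaks reads into k-mers and tallies how often each k-mer occurs, at scale. The resulting histogram shows how many distinct k-mers occur once, twice, and so on, and this is the k-mer frequency distribution students plot. The small Python sketch below implements the same idea and writes CSV that Excel can chart. Unlike Jellyfish run with canonical counting, it does not merge a k-mer with its reverse complement. The toy reads are made up for illustration.

```python
import csv
import sys
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def kmers(read: str, k: int) -> List[str]:
    """Break a read into its overlapping k-mers (len(read) - k + 1 of them)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def kmer_histogram(reads: Iterable[str], k: int) -> Tuple[Counter, Dict[int, int]]:
    """Count k-mers across reads, then tally how many distinct k-mers
    occur once, twice, three times, and so on."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    histogram = Counter(counts.values())  # multiplicity -> number of distinct k-mers
    return counts, dict(sorted(histogram.items()))


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA"]  # toy reads for illustration
    _, hist = kmer_histogram(reads, k=4)
    writer = csv.writer(sys.stdout)  # CSV that Excel can open and chart
    writer.writerow(["multiplicity", "distinct_kmers"])
    for multiplicity, n in hist.items():
        writer.writerow([multiplicity, n])
```

Re-running with a different `k` shows how k-mer size shifts the distribution. Larger k-mers are more often unique in the genome, but they are also more likely to contain an error.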

11
• Discuss using PHRED33 scores to assess data quality
• Create boxplots of PHRED33 scores in Galaxy for the raw data
• Create boxplots of PHRED33 scores in Galaxy for the data after Quake correction
• Optional: have students compare the outcomes of Quake correction run with different parameters
[Figure panels: Raw data | Data post Quake correction]
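A boxplot of PHRED33 scores summarizes the spread of quality scores across all reads at each position, or sequencing cycle. The sketch below computes the five values a boxplot is drawn from at each position: minimum, lower quartile, median, upper quartile, and maximum. Students can use it to check what Galaxy plots. It is a stand-in under stated assumptions, not Galaxy's implementation, and Galaxy's quartile method and whisker rules may differ.

```python
import statistics
from typing import Dict, Iterable, List, Tuple

PHRED_OFFSET = 33  # Phred+33 encoding assumed


def per_position_summary(
    quality_strings: Iterable[str], offset: int = PHRED_OFFSET
) -> Dict[int, Tuple[float, float, float, float, float]]:
    """Five-number summary of quality scores at each read position (1-based)."""
    scores_by_pos: Dict[int, List[int]] = {}
    for qual in quality_strings:
        for pos, ch in enumerate(qual, start=1):
            scores_by_pos.setdefault(pos, []).append(ord(ch) - offset)
    summary = {}
    for pos in sorted(scores_by_pos):
        scores = scores_by_pos[pos]
        if len(scores) > 1:
            q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        else:
            q1 = median = q3 = float(scores[0])
        summary[pos] = (min(scores), q1, median, q3, max(scores))
    return summary


if __name__ == "__main__":
    # Hypothetical quality strings from three reads of length 3.
    for pos, stats in per_position_summary(["II#", "I5#", "+5#"]).items():
        print(pos, stats)
```

Running the same summary on quality strings before and after correction gives a numerical counterpart to the two boxplot panels.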

12
• Why has next-generation sequencing technology led to a revolution in biology and medicine?
• Discuss and predict how chemical and physical mechanisms lead to errors
• Compare sequence improvement based on different parameters
• How do software packages determine which base is in error and which is correct if sequencing reads conflict?
• Why is it important to have a numerical measure of error in addition to the nucleotide sequence?
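One answer to the question of how software decides which base is wrong follows the spirit of k-mer-based correctors such as Quake. A single error makes every k-mer that covers it untrusted, so the error must lie where all untrusted k-mers overlap. The preferred fix is a substitution that makes every k-mer in the read trusted. This simplified sketch handles only one substitution per read and ignores quality scores, whereas Quake also uses base qualities to rank candidate corrections. The function names and the toy sequence are hypothetical.

```python
from typing import List, Optional, Set


def kmers(read: str, k: int) -> List[str]:
    """Overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def all_trusted(read: str, trusted: Set[str], k: int) -> bool:
    return all(km in trusted for km in kmers(read, k))


def correct_single_error(read: str, trusted: Set[str], k: int) -> Optional[str]:
    """Return the read with at most one substitution fixed, or None if the
    fix is impossible or ambiguous under this single-error assumption."""
    untrusted = [i for i, km in enumerate(kmers(read, k)) if km not in trusted]
    if not untrusted:
        return read  # every k-mer already trusted
    # A single error lies in every untrusted k-mer, i.e. in the overlap of
    # their spans [i, i + k); an empty overlap means more than one error.
    start, end = max(untrusted), min(untrusted) + k
    candidates = []
    for pos in range(start, end):
        for base in "ACGT":
            if base == read[pos]:
                continue
            fixed = read[:pos] + base + read[pos + 1:]
            if all_trusted(fixed, trusted, k):
                candidates.append(fixed)
    return candidates[0] if len(candidates) == 1 else None
```

If no substitution makes the read fully trusted, or if more than one does, the sketch returns `None`. This is a conservative choice that leaves ambiguous reads uncorrected rather than guessing.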

13
• This module will be performed as a team-based project, with students preparing and handing in a report at the end. Students will be able to:
  ▪ Predict predominant types or sources of error based on experimental design and sequencing platform
  ▪ Prepare a boxplot using Galaxy for an exemplary dataset and use it to evaluate the quality of the sequence data
  ▪ Effectively improve the quality of a set of NGS reads prior to assembly

14
• https://banana-slug.soe.ucsc.edu/bioinformatic_tools:jellyfish
• www.en.wikipedia.org/wiki/FASTQ_format
• Kelley DR, Schatz MC, Salzberg SL. 2010. Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11:R116.
• Marçais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764-770. [Jellyfish program]
• http://res.illumina.com/documents/products/techspotlights/techspotlight_sequencing.pdf
• http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2581791/pdf/ukmss-2586.pdf

