The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported under Contract No. DE-AC02-05CH11231.


ABSTRACT

Nanopore sequencing allows direct sequencing of single-stranded DNA molecules in real time. To explore whether nanopore sequencing can discriminate methylated bases, we sequenced two plasmids with and without methylation at the dam and dcm methylation sites (GmeATC and CmeCWGG motifs, respectively). By analyzing the resistance of 5-mers occupying a pore, we observed significant differences between methylated and unmethylated sequences at the sites of both types of methylation, presumably because the methyl groups obstruct the current flow across the pore differently. This finding indicates that nanopore sequencing has the potential to robustly detect methylated bases, and that the current basecalling model would be improved by incorporating models of modified bases. Additionally, we observed a large fraction of 5-mers that deviate significantly from the default METRICHOR basecalling models, indicating that recalibration is necessary to correct systematic biases in those models and improve overall sequencing accuracy.

OBJECTIVES

1. Discover whether differences in resistance can discriminate methylated 5-mers from unmethylated ones within the pore.
2. If so, develop methods to identify methylated bases within a genome that has a reference.
3. Investigate the behavior of the nanopore resistance model and the differences caused by methylation.

MATERIALS & METHODS

Two pUC19 plasmid samples were prepared, one in the presence and one in the absence of two methylation enzymes: dam and dcm. Each sample was sequenced on the MinION nanopore sequencer according to the manufacturer's instructions and processed through METRICHOR for sequence analysis and basecalling.
MATERIALS & METHODS (cont)

FASTQ sequences were extracted from the FAST5 files with PORETOOLS and aligned with BWA to the pUC19 reference sequence, giving 302x and 509x coverage for the control and methylated samples, respectively. From the read alignments in the resulting BAM files, the event alignments to the reference were recalculated with a modified version of NANOPOLISH's eventalign to obtain metrics for each event, including the observed current and the expected current from the pore model that METRICHOR generated. Statistics were then calculated between the observed and expected current values for the events of the reads.

CONCLUSIONS

Nanopore sequencing shows great promise as a novel and robust method for identifying methylated sites within a genome. We demonstrated that every methylated site leads to a systematic deviation from the current models. We suggest incorporating methylated bases into the basecalling models to increase their accuracy.

REFERENCES

Loman NJ, Quinlan AR. Poretools: a toolkit for analyzing nanopore sequence data. Bioinformatics. 2014 Dec 1;30(23):3399-401. doi: 10.1093/bioinformatics/btu555. Epub 2014 Aug 20. PubMed PMID: 25143291; PubMed Central PMCID: PMC4296151.

Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18. PubMed PMID: 19451168; PubMed Central PMCID: PMC2705234.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.

Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. bioRxiv. doi: http://dx.doi.org/10.1101/015552

Simpson JT. Nanopolish - Signal-level algorithms for MinION data. https://github.com/jts/nanopolish

Quick J, Quinlan AR, Loman NJ. A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer. Gigascience. 2014 Oct 20;3:22. doi: 10.1186/2047-217X-3-22. eCollection 2014. Erratum in: Gigascience. 2015;4:6. PubMed PMID: 25386338; PubMed Central PMCID: PMC4226419.

ACKNOWLEDGEMENTS

Many thanks to Rich Roberts for providing the pUC19 samples. Thanks to the JGI team members: Matt Blow, Chris Daum, Matt Zane, Chew Yee Ngan, Cindy Choi, Hope Tice, Shweta Deshpande, Chia-Lin Wei, Len Pennacchio and Eddy Rubin.

Contact Rob Egan (RSEgan@lbl.gov) with any questions.

Identification of Methylation sites on pUC19 using Oxford Nanopore Sequencing
Rob Egan, Don Kang and Zhong Wang

RESULTS

Figure 1: The 15 most significantly different 5-mer z-scores (>0.33) between the control (green) and methylated (blue) samples of pUC19. The GATC and CCAGG motifs dominate the set: methylated GATC significantly reduces the current and methylated CCAGG significantly increases it. Of the 15 k-mers, only the adjacent k-mers CGGAC and GGACA cannot be explained by methylation events.

Figure 2: Average change in the current z-scores between the modified and control samples by position along pUC19, separated by event type (template-forward, template-reverse, complement-forward and complement-reverse). Blue and red vertical lines are the CCWGG and GATC methylation motifs, respectively.

Figure 3: Rolling window, over all 5-mer events that overlap a given base position of pUC19, of the z-score differences between the control and methylated samples. Blue and red vertical lines are the CCWGG and GATC methylation motifs, respectively.
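The motif positions that the figures mark with vertical lines could be located along the following lines. This is a minimal sketch, not the authors' code: the function names and the toy sequence are hypothetical, the reference is treated as linear (a motif spanning the origin of a circular plasmid would be missed), and the methylated base is taken to be the second base of each motif, following the GmeATC / CmeCWGG notation.

```python
import re

# IUPAC code W means A or T, so CCWGG covers CCAGG and CCTGG.
DAM_MOTIF = re.compile("GATC")
DCM_MOTIF = re.compile("CC[AT]GG")


def motif_sites(sequence, pattern):
    """Return 0-based start positions of every motif match on a linear sequence."""
    return [match.start() for match in pattern.finditer(sequence.upper())]


def methylated_positions(sites, offset=1):
    """Shift motif starts to the methylated base (assumed offset 1 for both motifs)."""
    return [start + offset for start in sites]


# Toy sequence for illustration only; it is not the pUC19 reference.
toy = "TTGATCAACCAGGTTCCTGGAAGATC"
dam_sites = motif_sites(toy, DAM_MOTIF)
dcm_sites = motif_sites(toy, DCM_MOTIF)
```

Both motifs are palindromic as a set (the reverse complement of CCAGG is CCTGG), so scanning the forward strand is enough to find the sites on both strands.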
RESULTS (cont)

Figure 4: The 11 5-mer models most severely biased in the E_coli_R73 dataset provided by Quick et al. Note that 4 of the 11 include the GATC motif, suggesting that the basecalling of E. coli, which was methylated when sequenced, would be improved if calibrated for a methylated sample.

We prepared two samples of pUC19 with and without the dam / dcm methylation sites populated (GmeATC / CmeCWGG, respectively). For each 5-mer in the basecalling model, we calculated the z-score between the observed current and the expected current of the basecalling model. We then compared the distributions of z-scores for the control and methylated samples of pUC19. Figure 1 plots the 15 most significantly changed z-score distributions between the two samples; 13 of them are from 5-mers that participate in the two studied methylation motifs, with the most significantly separated being TGATC. Interestingly, all of the methylated GATC motifs have decreased current, while the methylated CCAGG motif has increased current relative to the unmodified version.

When comparing the nanopore basecalling models for events aligned to a reference, 4 distinct observation types per reference base must be considered: the combinations of strand direction relative to the reference (forward / reverse) and the two directions of the 2D read (template and complement). Figure 2 shows the difference in mean z-score between the two samples for each of the 4 observation types over the pUC19 reference genome; the differences correlate strongly with the methylation sites (each depicted with a colored vertical line). Additionally, Figure 3 shows the average difference in z-scores between the modified and control samples over a rolling window of all the 5-mer events that cover a base of pUC19; only the methylated motifs rise above 0.4.
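The z-score comparison described above can be sketched as follows. This is an illustrative reading of the text, not the authors' pipeline: the event layout, the function names, and the use of the pore model's standard deviation as the z-score denominator are assumptions, and the window is interpreted as the average over the 5-mers that cover each base.

```python
from collections import defaultdict
from statistics import mean


def z_score(observed, model_mean, model_sd):
    """Standardize an observed event current against the pore model (assumed form)."""
    return (observed - model_mean) / model_sd


def mean_z_by_kmer(events, pore_model):
    """Average z-score per 5-mer.

    events: iterable of (kmer, observed_current) pairs (hypothetical layout).
    pore_model: dict mapping kmer -> (expected_mean, expected_sd).
    """
    scores = defaultdict(list)
    for kmer, observed in events:
        model_mean, model_sd = pore_model[kmer]
        scores[kmer].append(z_score(observed, model_mean, model_sd))
    return {kmer: mean(values) for kmer, values in scores.items()}


def rank_kmer_shifts(control_events, methylated_events, pore_model):
    """Sort shared 5-mers by |mean z (methylated) - mean z (control)|, largest first."""
    control = mean_z_by_kmer(control_events, pore_model)
    methylated = mean_z_by_kmer(methylated_events, pore_model)
    shared = control.keys() & methylated.keys()
    shifts = {kmer: methylated[kmer] - control[kmer] for kmer in shared}
    return sorted(shifts.items(), key=lambda item: abs(item[1]), reverse=True)


def per_base_shift(shift_by_start, seq_len, k=5):
    """Average the shifts of all k-mers covering each base of a linear reference.

    shift_by_start: dict mapping k-mer start position -> z-score shift.
    Bases with no covering k-mer in the dict get None.
    """
    result = []
    for base in range(seq_len):
        first = max(0, base - k + 1)
        last = min(base, seq_len - k)
        covering = [shift_by_start[s] for s in range(first, last + 1)
                    if s in shift_by_start]
        result.append(mean(covering) if covering else None)
    return result
```

Splitting the events by observation type (template or complement, forward or reverse) before calling `mean_z_by_kmer` would give one profile per type, in the spirit of Figure 2.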
Lastly, we took the E_coli_R73 dataset published by Quick et al. and found 11 5-mers with a mean z-score > 0.45. Four of them match NGATC, which is not surprising given that the sequenced E. coli would have the dam motif (GmeATC) modified across its genome. The remaining 7 suggest a systematic bias in the basecalling model that leaves room for improvement. In their earlier E_coli_R7_NONI dataset, produced with previous chemistries and software, 88 5-mers at the same >0.45 z-score level indicated systematic bias in the basecalling model (not shown due to space), so this technology is improving rapidly.
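The thresholding step above could be sketched as below. The function names and input layout are hypothetical, and applying the 0.45 cutoff to the absolute value of the mean z-score is an assumption, since the text does not state how the sign is handled.

```python
def flag_biased_kmers(mean_z, threshold=0.45):
    """Return (kmer, mean_z) pairs with |mean z| above the threshold, largest first.

    mean_z: dict mapping kmer -> mean z-score against the pore model.
    Using the absolute value is an assumption, not stated in the text.
    """
    flagged = [(kmer, z) for kmer, z in mean_z.items() if abs(z) > threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


def split_by_motif(flagged, motif="GATC"):
    """Separate flagged k-mers that contain the motif from those that do not."""
    with_motif = [kmer for kmer, _ in flagged if motif in kmer]
    without_motif = [kmer for kmer, _ in flagged if motif not in kmer]
    return with_motif, without_motif
```

K-mers that land in the second list are candidates for model bias unrelated to dam methylation, which is the distinction the paragraph draws between the NGATC 5-mers and the remaining 7.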

