Download presentation
Presentation is loading. Please wait.
Published byHugo Eaton Modified over 6 years ago
1
Fig. 1. Sample NGS data in FASTQ format (SRA's srr032209), with parts being shortened and numbered: (1) read identifiers; (2) sequence of bases; (3) ‘+’ followed by optional comments; and (4) Q-scores. From: Transformations for the compression of FASTQ quality scores of next-generation sequencing data Bioinformatics. 2011;28(5): doi: /bioinformatics/btr689 Bioinformatics | © The Author Published by Oxford University Press. All rights reserved. For Permissions, please
2
Fig. 2. Some statistics for each of the three described datasets: distribution of minimal and maximal Q-scores over the population of reads (a and b), and distribution of Q-scores (c) and T-gaps (d). From: Transformations for the compression of FASTQ quality scores of next-generation sequencing data Bioinformatics. 2011;28(5): doi: /bioinformatics/btr689 Bioinformatics | © The Author Published by Oxford University Press. All rights reserved. For Permissions, please
3
Fig. 3. Workflow of our investigation
Fig. 3. Workflow of our investigation. Dashed arrows depict a representation of Q-scores alone. From: Transformations for the compression of FASTQ quality scores of next-generation sequencing data Bioinformatics. 2011;28(5): doi: /bioinformatics/btr689 Bioinformatics | © The Author Published by Oxford University Press. All rights reserved. For Permissions, please
4
Fig. 4. Relative mapping accuracy of the three lossy transformations, measured for the dataset SRR The 100% baseline employs standard Q-scores, resulting in 12.1 (out of a total of 18.8) million reads being mapped successfully by MAQ. The horizontal dotted line shows where relative mapping accuracy is 99%. The two vertical dotted lines indicate the lowest values for |Σ| at which this accuracy is still attainable (|Σ|=19 for Truncating and UniBinning; |Σ|=5 for LogBinning). From: Transformations for the compression of FASTQ quality scores of next-generation sequencing data Bioinformatics. 2011;28(5): doi: /bioinformatics/btr689 Bioinformatics | © The Author Published by Oxford University Press. All rights reserved. For Permissions, please
5
Fig. 5. Compression effectiveness for SRR using various block sizes and different lossless transformations: (a) No transformation; (b) MinShifting; (c) FreqOrdering; (d) GapTranslating. The x-axis represents block size. As reference, the zero-order self-information is indicated as solid black lines. From: Transformations for the compression of FASTQ quality scores of next-generation sequencing data Bioinformatics. 2011;28(5): doi: /bioinformatics/btr689 Bioinformatics | © The Author Published by Oxford University Press. All rights reserved. For Permissions, please
6
Fig. 6. Compression effectiveness achieved with LogBinning for SRR032209, with k and |Σ| varying on the horizontal, and lossless transformations varying on the vertical dimensions. Black lines represent the self-information, which is unaffected by block size. From: Transformations for the compression of FASTQ quality scores of next-generation sequencing data Bioinformatics. 2011;28(5): doi: /bioinformatics/btr689 Bioinformatics | © The Author Published by Oxford University Press. All rights reserved. For Permissions, please
7
Fig. 7. Compression and decompression (including transformation) times averaged over three trials for SRR Lossless transformation GapTranslating and no lossy transformation for (a) k=1 and (b) k=256; and with no lossless transformation and lossy transformation LogBinning at |Σ|=5 for (c) k=1 and (d) k=256. From: Transformations for the compression of FASTQ quality scores of next-generation sequencing data Bioinformatics. 2011;28(5): doi: /bioinformatics/btr689 Bioinformatics | © The Author Published by Oxford University Press. All rights reserved. For Permissions, please
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.