What should a bioinformatician know about DNA sequencing, and why?
Update this table: remove SOLiD; add Life Technologies Ion Proton (PGM) and Illumina MiSeq. Update all entries with the latest info on read length.
What are the error types and rates of the different platforms?
Quality scores

Phred Q = -10 log10(e), where e is the probability that the base call is wrong

Quality score   Prob. wrong base call   Accuracy of base call
10              1/10                    90%
20              1/100                   99%
30              1/1,000                 99.9%
40              1/10,000                99.99%
50              1/100,000               99.999%
Wikipedia.org
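The Phred relationship can be checked numerically. The following is a minimal sketch; the function names are illustrative and not taken from any particular library.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(e):
    """Phred score for an error probability e (0 < e <= 1)."""
    return -10 * math.log10(e)

# Print error probability and accuracy for a few common scores.
for q in (10, 20, 30, 40, 50):
    e = phred_to_error_prob(q)
    print(f"Q{q}: P(wrong) = {e:g}, accuracy = {100 * (1 - e):.3f}%")
```

Each step of 10 in Q corresponds to a tenfold drop in the error probability.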
FASTQ format
4 lines, sequence + quality (+optional description)

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

Line 1: @ followed by the sequence identifier (+optional description)
Line 3: + and an optional repeat of line 1, often left as just the + character to save space
Line 4: one quality character per base, so it must be the same length as line 2

But beware! There are at least 3 different FASTQ file standards, indistinguishable in format but incompatible with each other.
Wikipedia.org
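The four-line layout can be illustrated with a small record checker. This is only a sketch: `parse_fastq_record` is a hypothetical helper, the example read is made up, and real files are better handled with an established parser.

```python
def parse_fastq_record(lines):
    """Split four FASTQ lines into (title, sequence, quality).

    Raises ValueError if the record does not follow the 4-line layout.
    """
    title, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not title.startswith("@"):
        raise ValueError("line 1 must start with '@'")
    if not plus.startswith("+"):
        raise ValueError("line 3 must start with '+'")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return title[1:], seq, qual

# Made-up record for illustration only.
record = ["@read1 hypothetical example", "GATTACA", "+", "IIIIH@5"]
title, seq, qual = parse_fastq_record(record)
print(title, seq, qual)
```

Note that nothing in the record itself says which quality encoding was used; that has to be known or inferred separately.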
FASTQ variants

Name                                 ASCII range, offset   Q score type   Q score range
Sanger standard; fastq-sanger        33-126, 33            PHRED          0 to 93 (raw 0 to 40)
Solexa/Illumina <1.3; fastq-solexa   59-126, 64            Solexa         -5 to 62 (raw -5 to 40)
Illumina 1.3+; fastq-illumina        64-126, 64            PHRED          0 to 62 (raw 0 to 40)
Illumina 1.5+                        67-126, 64            PHRED          3 to 62 (raw 3 to 40)
Illumina 1.8+                        33-126, 33            PHRED          0 to 93 (raw 0 to 41)
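To see why the variants are incompatible, the same quality string can be decoded with each offset. A minimal sketch, assuming the variant names from the table as dictionary keys; the helper names and the example string are made up for illustration.

```python
# ASCII offsets per variant. Note that fastq-solexa values are
# Solexa-scaled scores, not Phred scores, even after subtracting 64.
OFFSETS = {"fastq-sanger": 33, "fastq-solexa": 64, "fastq-illumina": 64}

def decode_quality(qual, variant):
    """Convert a quality string to integer scores for the given variant."""
    offset = OFFSETS[variant]
    return [ord(c) - offset for c in qual]

def must_be_offset_33(qual):
    """True if the string contains characters below ASCII 59.

    Such characters cannot occur in any offset-64 variant (Solexa scores
    bottom out at -5, i.e. ASCII 59), so the offset must be 33.
    """
    return min(ord(c) for c in qual) < 59

qual = "IIIIH@5"  # made-up quality string
print(decode_quality(qual, "fastq-sanger"))    # [40, 40, 40, 40, 39, 31, 20]
print(decode_quality(qual, "fastq-illumina"))  # [9, 9, 9, 9, 8, 0, -11]
print(must_be_offset_33(qual))                 # True
```

The negative score under the offset-64 reading shows that this string can only be offset 33. A string made entirely of high-ASCII characters would decode without error under either offset, which is why the variants cannot always be told apart from the file alone.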
What use is the quality score?
What factors should be considered in the choice of a DNA sequencing platform?