Published by Hugh Gaines. Modified over 9 years ago.
1
NGS data format and General Quality Control
2
Data format “Flowchart”: Sequencer raw data → Fastq → SAM/BAM
3
Fastq file
– Used to record raw reads coming off the sequencers
– Each record contains four lines
– Parameters such as read length are usually set by the sequencer
4
Fastq file
5
Line 1 begins with a '@' character, followed by a sequence identifier and an optional description (like a FASTA title line).
Line 2 contains the raw sequence letters. The read length is the length of this string.
Line 3 begins with a '+' character, optionally followed by the same sequence identifier (and any description) again.
Line 4 encodes the quality values for the sequence in Line 2 and must contain the same number of symbols as there are letters in the sequence.
Source: http://en.wikipedia.org/wiki/FASTQ_format
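The four-line layout described above can be sketched as a small reader. This is only an illustration, not part of the original slides: the names `FastqRecord` and `read_fastq` are hypothetical, and the sketch assumes each record occupies exactly four lines (no wrapped sequences).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # Line 1 without the leading '@'
    sequence: str    # Line 2: raw sequence letters
    separator: str   # Line 3 without the leading '+', often empty
    quality: str     # Line 4: one quality symbol per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError("Line 1 must begin with '@'")
        if not plus.startswith("+"):
            raise ValueError("Line 3 must begin with '+'")
        if len(sequence) != len(quality):
            raise ValueError("Line 4 must be as long as Line 2")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


# Made-up example record, for illustration only.
example = "@read1 sample description\nGATTACA\n+\nIIIIIII\n"
records = list(read_fastq(io.StringIO(example)))
```

Here the read length is simply `len(records[0].sequence)`. A record whose quality string is shorter or longer than its sequence is rejected, which mirrors the rule for Line 4.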
6
General quality control of raw reads
Using FastQC, a tool that implements some general rules:
– Basic Statistics
– Per base sequence quality
– Per sequence quality scores
– Per base sequence content
– Per base GC content
– Per sequence GC content
– Per base N content
– Sequence Length Distribution
– Sequence Duplication Levels
– Overrepresented sequences
– Kmer Content
7
Quality scores
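As a rough illustration of how quality symbols relate to scores, the sketch below decodes a quality string into Phred scores and averages them per read position. It assumes the Phred+33 (Sanger) encoding, which the slides do not state; older Illumina data may use an offset of 64. The function names are hypothetical, and this is not how FastQC itself is implemented.

```python
from typing import Iterable, List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode quality symbols to Phred scores (assumed offset: 33)."""
    return [ord(symbol) - offset for symbol in quality]


def error_probability(score: int) -> float:
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


def mean_quality_per_position(quality_strings: Iterable[str],
                              offset: int = 33) -> List[float]:
    """Average Phred score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for quality in quality_strings:
        for i, score in enumerate(phred_scores(quality, offset)):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += score
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]
```

Under this encoding, the symbol `I` corresponds to a score of 40 and `5` to a score of 20, that is, an estimated error probability of 1 in 100.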
8
Per base “N” percentage
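The statistic behind this plot can be sketched in a few lines: for each read position, count how many reads have an uncalled base (`N`) and divide by the number of reads that reach that position. The function name is hypothetical, and this is a minimal sketch of the idea rather than FastQC's own code.

```python
from typing import Iterable, List


def per_base_n_percentage(sequences: Iterable[str]) -> List[float]:
    """Percentage of 'N' calls at each read position."""
    n_counts: List[int] = []
    totals: List[int] = []
    for sequence in sequences:
        for i, base in enumerate(sequence):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                n_counts.append(0)
            totals[i] += 1
            if base in "Nn":
                n_counts[i] += 1
    return [100.0 * n / total for n, total in zip(n_counts, totals)]


# Made-up reads, for illustration only.
percentages = per_base_n_percentage(["ACGN", "NCGT", "ACG"])
```

With these three reads, the last position is covered by only two reads, one of which is `N`, so its value is 50 percent.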
9
Sample FastQC reports
Good quality: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc/fastqc_report.html
Bad quality: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/bad_sequence_fastqc/fastqc_report.html
10
Data format “Flowchart”: Sequencer → Fastq → SAM/BAM
11
– SAM stands for Sequence Alignment Map
– BAM is the binary form of SAM
– Used for mapped/aligned reads
– Generated by NGS mappers/aligners
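To make the text form concrete: each SAM alignment line is tab-separated and starts with 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), followed by optional tags. The sketch below splits one such line; the names `SamRecord`, `parse_sam_line`, and `is_unmapped` are hypothetical, and header lines (which start with '@') are assumed to have been skipped by the caller.

```python
from typing import List, NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description, e.g. "7M"
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # quality string, as in FASTQ Line 4
    tags: List[str]  # optional TAG:TYPE:VALUE fields


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line into its mandatory fields and optional tags."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], fields[11:],
    )


def is_unmapped(record: SamRecord) -> bool:
    """Flag bit 0x4 marks a read that the aligner could not map."""
    return bool(record.flag & 0x4)


# Made-up alignment line, for illustration only.
line = "read1\t0\tchr1\t100\t60\t7M\t*\t0\t0\tGATTACA\tIIIIIII\tNM:i:0"
record = parse_sam_line(line)
```

BAM stores the same records in compressed binary form, so it is normally read through a library or tool rather than split as text.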
12
SAM
13
BAM