Published by Claire Watson. Modified over 9 years ago.
1
Sequencing Data Quality Saulo Aflitos
2
Assembly - Concepts: Read (≈100 bp); Contig (≈2 Kbp); Scaffold (≈2 Mbp); Pseudo Molecule (Super Scaffold); Paired-End; Mate-Pair; Low Complexity Region
3
Scaffolding: Scaffold (≈2 Mbp); Pseudo Molecule (Super Scaffold); Paired-End; Mate-Pair; Low Complexity Region
4
Assembly
5
Scaffolding: Repeats?!
6
Depth of Coverage (Goldberg SMD et al. 2006): Reality; Reads; Contig; Consensus; coverage labels 1x, 3x, 2x, 3x, 1x
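Depth of coverage at a position is the number of reads overlapping it. A minimal Python sketch, assuming reads are given as half-open (start, end) intervals on the contig; the toy read layout is hypothetical and not taken from the figure:

```python
def depth_of_coverage(reads, contig_length):
    """Return the per-position depth for reads given as (start, end) half-open intervals."""
    depth = [0] * contig_length
    for start, end in reads:
        # Clip each read to the contig boundaries before counting.
        for pos in range(max(start, 0), min(end, contig_length)):
            depth[pos] += 1
    return depth


# Hypothetical toy layout: three overlapping reads on a 10 bp contig.
reads = [(0, 6), (2, 8), (4, 10)]
depth = depth_of_coverage(reads, 10)
mean_depth = sum(depth) / len(depth)
print(depth, mean_depth)
```

The ends of the contig are covered by a single read while the middle is covered by all three, which is the 1x ... 3x ... 1x pattern the slide labels.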
7
Heterozygosity: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA; NAAACGTACGTAAAANAAACGTACGTAAAA; A/C; A; C; 95% ±5; 50% ±10
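One way to read the percentages on this slide: a site where one allele is seen in about 95% of the reads looks homozygous, while a site where it is seen in about 50% looks heterozygous. The Python sketch below encodes that reading; the function name, the threshold defaults and the "ambiguous" bucket are assumptions made for illustration, not something stated on the slide.

```python
def classify_site(allele_counts, hom_min=0.90, het_low=0.40, het_high=0.60):
    """Classify a site from per-allele read counts (thresholds are assumed)."""
    total = sum(allele_counts.values())
    if total == 0:
        return "no coverage"
    # Fraction of reads supporting the most frequent allele.
    major_fraction = max(allele_counts.values()) / total
    if major_fraction >= hom_min:
        return "homozygous"
    if het_low <= major_fraction <= het_high:
        return "heterozygous"
    return "ambiguous"


# Hypothetical read counts at three A/C sites.
print(classify_site({"A": 19, "C": 1}))   # major allele in 95% of reads
print(classify_site({"A": 11, "C": 9}))   # major allele in 55% of reads
print(classify_site({"A": 15, "C": 5}))   # major allele in 75% of reads
```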
8
Consequences of Data Cleaning: Raw vs. Filtered (values shown: 50.37, 265.89, 48.65, 41.61, 57.60)
9
Sequencing: Shotgun; RNAseq
10
Sequencing: Paired End; Mate Pair
11
Sample Preparation: Genome; Shred (Ultrasound, Physical, RE); Size Selection (Gel, Beads); Adapter (ID, Binding to Surface, Circularization); Sequencing (Illumina, 454, PacBio)
12
Shredding
13
Size Selection
14
Sequencing: Illumina PE; Read Length 100bp; Insert Size 150bp-2Kbp
15
Sequencing: 454 MP; Insert Size 2K-20Kbp; Read Length 500bp, 150bp
16
Data
17
FastQ: Machine Name; Read ID (unique); Encoded Quality (0-40): chance of being wrong
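A FastQ record has four lines: a header starting with "@", the sequence, a separator starting with "+", and one quality character per base. The Python sketch below decodes one record, assuming the Phred+33 encoding (ASCII code minus 33); the slide does not state the offset, so `offset` is left as a parameter, and the record shown is a made-up example.

```python
def parse_fastq_record(lines, offset=33):
    """Parse one 4-line FastQ record into (read_id, sequence, quality_scores)."""
    header, sequence, separator, quality = [line.rstrip("\n") for line in lines]
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FastQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Each quality character encodes one score as ASCII code minus the offset.
    scores = [ord(ch) - offset for ch in quality]
    return header[1:], sequence, scores


# Hypothetical record; the read ID is not a real machine-generated one.
record = ["@read1\n", "ACGT\n", "+\n", "II5.\n"]
read_id, sequence, scores = parse_fastq_record(record)
print(read_id, sequence, scores)
```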
18
FastQ Format
19
FastQ Statistics: quality 13 = 0.05 = 5% chance of being wrong
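The numbers on this slide follow from the standard Phred relation P = 10^(-Q/10), where Q is the quality score and P the probability that the base call is wrong. A short Python sketch of the conversion in both directions (the function names are chosen here for illustration):

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


# Quality 13 corresponds to roughly a 5% chance of being wrong.
p13 = phred_to_error_probability(13)
print(f"Q13 -> {p13:.4f}")
print(f"p=0.05 -> Q{error_probability_to_phred(0.05):.1f}")
```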
20
Cleaning
21
FastQC Quality Checking Tool: Per base sequence quality; Per sequence quality; Per base sequence content; Per base GC content; Per sequence GC content; Per base N-content; Sequence length distribution; Sequence duplication; Contamination screen (fastq screen)
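Two of these checks are easy to illustrate per read: GC content and N content. The Python sketch below is a simplified stand-in, not FastQC's own calculation; in particular, excluding N from the GC denominator is an assumption made here, and the read is a toy string.

```python
def gc_content(sequence):
    """Fraction of called bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    called = sum(1 for base in seq if base in "ACGT")
    if called == 0:
        return 0.0
    return sum(1 for base in seq if base in "GC") / called


def n_content(sequence):
    """Fraction of positions in the read that are uncalled (N)."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


# Toy read with one uncalled base.
read = "NAAACGTACGTAAAA"
print(f"GC: {gc_content(read):.3f}  N: {n_content(read):.3f}")
```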
22
SolexaQA Cleaning Tool
26
Exercise
Create “cleaning” folder:
– mkdir cleaning; cd cleaning
Inside it, run:
– wget -O saulo.bash http://goo.gl/Tx8g6
Run it with:
– bash saulo.bash
This will download FastQC and SolexaQA:
– FASTQC HELP: http://goo.gl/EE8M7
– FASTQC TUTORIAL: http://goo.gl/rihyA
– FASTQC MANUAL: http://goo.gl/9yihC
– SolexaQA Help: http://solexaqa.sourceforge.net/
Run FastQC:
– ./FastQC/fastqc &
– File > Open [Files of Type = FastQ files]
27
Exercise
Verify the two .fq files (you can use less):
– bad_MiSeq_dataset.fq
– good_MiSeq_dataset.fq
Clean the bad dataset with SolexaQA’s DynamicTrim.pl script:
– perl SolexaQA_v.2.1/DynamicTrim.pl bad_MiSeq_dataset.fq -h 25
Verify the improvement (or not) by opening:
– bad_MiSeq_dataset.fq.trimmed
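To get a feel for what quality trimming with a cutoff of 25 does, the Python sketch below keeps the longest contiguous stretch of a read whose scores meet the cutoff. It illustrates the general idea only and is not DynamicTrim.pl's code; the function names, the use of >= at the boundary, and the example read are assumptions.

```python
def longest_high_quality_segment(scores, cutoff=25):
    """Return (start, end) of the longest run of scores >= cutoff, end exclusive."""
    best_start, best_len = 0, 0
    run_start = None
    for i, q in enumerate(scores):
        if q >= cutoff:
            if run_start is None:
                run_start = i  # a new high-quality run begins here
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # a low-quality base ends the current run
    return best_start, best_start + best_len


def trim_read(sequence, scores, cutoff=25):
    """Crop a read and its scores to the longest high-quality segment."""
    start, end = longest_high_quality_segment(scores, cutoff)
    return sequence[start:end], scores[start:end]


# Hypothetical read with two low-quality bases inside and at the ends.
trimmed_seq, trimmed_scores = trim_read("ACGTACGT", [10, 30, 30, 12, 35, 36, 37, 8])
print(trimmed_seq, trimmed_scores)
```

A read with no base at or above the cutoff comes back empty, which is one reason to compare read counts and length distributions before and after cleaning.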
28
?