Published by Kathlyn Loren Rose. Modified over 8 years ago.
1
RNA-Seq Assembly (Transcriptome Assembly)
Haibao Tang (唐海宝)
Center for Genomics and Biotechnology
November 23, 2013
2
Motivation
Why transcriptome sequencing (RNA-seq)?
- Gene expression / differential expression
- Reconstruct transcripts
- Exon-exon junction detection (genome annotation)
- Alternative splicing / isoforms
- SNP detection
- ...
3
RNA-seq with Illumina
Wang L, Brutnell T et al. (2010) Brief Funct Genomic Proteomic 9:118-28.
4
Constructing transcripts from RNA-Seq data
Haas & Zody, 2010
http://www.nature.com/nbt/journal/v28/n5/fig_tab/nbt0510-421_F1.html
5
Why de novo assembly?
- No reference genome available for the species
- Genomic sequence may be:
  - Incomplete (even reference genomes!)
  - Fragmented
  - Altered
6
Run FastQC first!
- Quality trimming, based on quality scores
7
Base quality
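To make "base quality" concrete, here is a minimal sketch of how a FASTQ quality string maps to Phred scores and error probabilities. It assumes the Phred+33 encoding; older Illumina pipelines used an offset of 64, so the offset is left as a parameter. The function names are illustrative and not taken from any tool in these slides.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 is an assumption (Sanger / Illumina 1.8+ encoding);
    pass offset=64 for older Illumina files.
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


scores = phred_scores("II5+!")
print(scores)                 # [40, 40, 20, 10, 0]
print(error_probability(20))  # Q20 corresponds to a 1-in-100 error rate
```

A score of 3, the LEADING/TRAILING cutoff used later in the Trimmomatic example, corresponds to an error probability of roughly 0.5.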
8
Data Quality Assessment
[Figure: quality scores across reads. Panels: Good; Bad (filtering needed)]
9
Data Quality Assessment - FastQC
[Figure: GC distribution. Panels: Good; Bad]
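The GC distribution plot is built from the GC fraction of each read. A minimal sketch of that per-read calculation follows; it is an illustration of the idea, not FastQC's own code, and the function name is hypothetical.

```python
def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) that are G or C.

    Uncalled bases such as N are ignored; a read with no called
    bases returns 0.0.
    """
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


reads = ["GGCCAATT", "ATATATAT", "GCGCNNNN"]
print([gc_fraction(r) for r in reads])  # [0.5, 0.0, 1.0]
```

A histogram of these values over all reads gives the distribution; a second peak or a strongly shifted peak is one hint of contamination or bias.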
10
Adapter contamination
11
Data Quality Assessment
Recommendations
- Generate quality plots for all read libraries
- Trim and/or filter data if needed
  - Always trim and filter for de novo transcriptome assembly
- Regenerate quality plots after trimming and filtering to determine effectiveness
12
TRIMMOMATIC example
This will perform the following:
- Remove adapters
- Remove leading low-quality or N bases (below quality 3)
- Remove trailing low-quality or N bases (below quality 3)
- Scan the read with a 4-base-wide sliding window, cutting when the average quality per base drops below 15
- Drop reads shorter than 36 bases
- Read and write files in gzipped format
$ java -cp trimmomatic-0.15.jar org.usadellab.trimmomatic.TrimmomaticPE \
    s_1_1_sequence.txt.gz s_1_2_sequence.txt.gz \
    lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz \
    lane1_reverse_paired.fq.gz lane1_reverse_unpaired.fq.gz \
    ILLUMINACLIP:adapters.fasta:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
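The quality-based steps above (LEADING, TRAILING, SLIDINGWINDOW, MINLEN) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Trimmomatic's implementation: it works on already-decoded Phred scores, skips adapter clipping and N handling, and cuts the read at the start of the first window whose mean quality falls below the threshold. The function name `trim_read` is hypothetical.

```python
def trim_read(seq, quals, leading=3, trailing=3, window=4, min_avg=15, min_len=36):
    """Apply simplified LEADING, TRAILING, SLIDINGWINDOW and MINLEN steps.

    seq   -- read sequence (str)
    quals -- list of integer Phred scores, same length as seq
    Returns (seq, quals) after trimming, or None if the read is dropped.
    """
    # LEADING: skip bases at the start whose quality is below `leading`
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    # TRAILING: skip bases at the end whose quality is below `trailing`
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]

    # SLIDINGWINDOW: cut at the first window whose mean quality is too low
    cut = len(quals)
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]

    # MINLEN: drop reads that ended up too short
    if len(seq) < min_len:
        return None
    return seq, quals


# A 50-base read: 2 poor leading bases, 40 good bases, 8 mediocre trailing bases
quals = [2, 2] + [30] * 40 + [5] * 8
result = trim_read("A" * 50, quals)
print(len(result[0]))  # 39
```

The steps run in the same order as on the command line, which matters: the sliding window only sees what LEADING and TRAILING left behind.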
13
Workflow
14
Assembly strategy - k-mer construction
- Create all substrings of length k (k-mers) from each read
Martin, J. A., and Wang, Z. (2011). Nature Reviews Genetics 12, 671-682.
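A minimal sketch of k-mer construction: slide a window of length k along the read, one base at a time. The function name is illustrative.

```python
def kmers(read, k):
    """Return all substrings of length k from a read, in order.

    A read of length L yields L - k + 1 k-mers (none if L < k).
    """
    return [read[i:i + k] for i in range(len(read) - k + 1)]


print(kmers("ACGTAC", 4))  # ['ACGT', 'CGTA', 'GTAC']
```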
19
Assembly strategy - k-mer construction
- Generate de Bruijn graph
Martin, J. A., and Wang, Z. (2011). Nature Reviews Genetics 12, 671-682.
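One common way to build a de Bruijn graph is sketched below: each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. This node/edge convention is an assumption; the figure in the cited review may draw k-mers as nodes instead, and real assemblers also track k-mer counts and reverse complements.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; every k-mer in a read adds an edge from its
    prefix (first k-1 bases) to its suffix (last k-1 bases).
    Returns a dict mapping each node to the set of its successors.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


graph = de_bruijn_graph(["ACGTAC", "GTACG"], 4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
# ACG -> ['CGT']
# CGT -> ['GTA']
# GTA -> ['TAC']
# TAC -> ['ACG']
```

The two reads overlap, so they collapse onto shared nodes: that merging of overlapping reads without pairwise alignment is what makes the approach practical for short reads.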
21
Assembly strategy - de Bruijn graph
Martin, J. A., and Wang, Z. (2011). Nature Reviews Genetics 12, 671-682.
23
Assembly strategy - k-mer construction
Martin, J. A., and Wang, Z. (2011). Nature Reviews Genetics 12, 671-682.
25
Assembly strategy - de Bruijn graph
Martin, J. A., and Wang, Z. (2011). Nature Reviews Genetics 12, 671-682.
26
Assembly strategy - splice isoforms
Martin, J. A., and Wang, Z. (2011). Nature Reviews Genetics 12, 671-682.
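In a transcript graph, alternative splicing shows up as a branch that later rejoins (a "bubble"), and each path through the graph is a candidate isoform. The toy sketch below enumerates those paths. The graph, its exon labels, and the function name are hypothetical, and the graph is assumed to be acyclic; real assemblers also use read and pair support to decide which paths to report.

```python
def enumerate_paths(graph, node, sink, path=None):
    """Return every path from `node` to `sink` in an acyclic graph.

    graph -- dict mapping each node to a list of successor nodes
    """
    path = (path or []) + [node]
    if node == sink:
        return [path]
    paths = []
    for successor in sorted(graph.get(node, [])):
        paths.extend(enumerate_paths(graph, successor, sink, path))
    return paths


# Toy exon-skipping event: exon2 is either included or skipped
splice_graph = {
    "exon1": ["exon2", "exon3"],
    "exon2": ["exon3"],
    "exon3": [],
}
for isoform in enumerate_paths(splice_graph, "exon1", "exon3"):
    print(" - ".join(isoform))
# exon1 - exon2 - exon3
# exon1 - exon3
```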
27
Software
An (incomplete) selection
- Trinity (single k-mer): Broad Institute and Hebrew University of Jerusalem
- Trans-ABySS (multiple k-mers): Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency
- Velvet-Oases (single & multiple k-mers): EMBL-EBI / MPI for Molecular Genetics
- SOAPdenovo-Trans: Beijing Genomics Institute (BGI)
- CLC pipeline (not free): CLC bio
28
Example assembly
Zhao et al. (2011). BMC Bioinformatics 12 Suppl 1, S2.
29
Runtime and memory usage
106.8M read pairs, on 20 CPUs
Zhao et al. (2011). BMC Bioinformatics 12 Suppl 1, S2.
30
Assemblers
- De novo assemblers are prone to miss lowly expressed transcripts
- Multi k-mer approaches can improve assembly results
- Pool RNA-seq reads from different samples
Assembler overview (Assembler / Running time / Memory requirements)
Trinity        +++
Velvet-Oases   ○ / ++
Trans-ABySS    ○ / -
SOAPdenovo     - / ○
31
Trinity
32
- Inchworm
  - Sequencing error correction
  - Assemble candidate contigs
- Chrysalis
  - Build de Bruijn transcript graphs from candidate contigs
- Butterfly
  - Resolve alternatively spliced and paralogous transcripts
- k-mer length fixed to 25
- Can incorporate results from reference-based analysis
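The idea behind Inchworm's contig step is greedy: start from the most abundant k-mer and repeatedly extend with the most abundant overlapping k-mer. The sketch below illustrates only that idea on toy data. It is not Trinity's code: it builds a single contig, ignores reverse complements and error filtering, assumes k >= 2 and at least one k-mer, and all names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend(contig, counts, used, k, rightward):
    """Greedily extend a contig in one direction, one base at a time."""
    while True:
        if rightward:
            candidates = [contig[-(k - 1):] + base for base in "ACGT"]
        else:
            candidates = [base + contig[:k - 1] for base in "ACGT"]
        # Keep only k-mers seen in the reads and not yet consumed
        candidates = [c for c in candidates if c in counts and c not in used]
        if not candidates:
            return contig
        best = max(candidates, key=counts.get)  # most abundant overlap
        used.add(best)
        contig = contig + best[-1] if rightward else best[0] + contig


def greedy_contig(reads, k):
    """Seed with the most abundant k-mer, then extend right and left."""
    counts = count_kmers(reads, k)
    seed = max(sorted(counts), key=counts.get)  # ties broken alphabetically
    used = {seed}
    contig = extend(seed, counts, used, k, rightward=True)
    return extend(contig, counts, used, k, rightward=False)


reads = ["ACGTTGCA", "CGTTGCAT", "ACGTTGCA"]
print(greedy_contig(reads, 4))  # ACGTTGCAT
```

The toy run uses k = 4 only to keep the example readable; the slide states that Trinity fixes k at 25.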
35
Assembly QC - continuity
- Average length, min and max length, combined total length (N%)
- N50 captures how much of the assembly is covered by relatively large contigs
- "Half of all the bases I've assembled are in contigs of at least {N50} bp ..."
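N50 can be computed directly from the list of contig lengths: sort from longest to shortest and walk down until the running total reaches half of the combined length. A minimal sketch, with an illustrative function name:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the example the total is 54 bases; the three longest contigs (10 + 9 + 8 = 27) reach half of that, so N50 is 8. For transcriptome assemblies N50 should be read with care, since many short contigs can be real short transcripts rather than fragmentation.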
36
Assembly QC
Ask these questions:
- Accuracy: How many of the assembled contigs map to the genome?
- Accuracy: What are the contigs that do not align? (BLAST to nr)
- Completeness: How many previously annotated genes are covered by the contigs? How many are full-length?
- Contiguity: Does a single contig cover each gene?
- Compare results from multiple programs
Martin et al., 2010
37
Exercise
- Assemble rice RNA-seq data
- Compare two different assemblers
- Compare to standard rice gene models (MSU7.0)