Transcriptome Assembly
Goals of de novo Transcriptome Assembly
- Gene discovery and sequence comparison in non-model organisms
- Generation of a reference sequence for read mapping and testing for differential gene expression

Contrasts with Genome Assembly
- Varying levels of coverage across transcripts
- (Alternative) splicing and post-transcriptional modifications
Data Sources
- Platforms: Illumina (by far the most common), 454, Ion Torrent
- Library considerations:
  - Strand-specific libraries (dUTP/UDG method)
  - Dual indexing
  - Paired-end reads and long reads
  - Fragment the RNA rather than the cDNA
  - In vitro normalization (not common)
Image: Illumina, Inc.
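In the dUTP/UDG method the second cDNA strand incorporates dUTP and is then degraded by UDG, so read 1 of each pair comes from the antisense strand and read 2 from the sense strand (the orientation Trinity calls `RF` for its `--SS_lib_type` option). The toy Python sketch below, assuming a standard dUTP library, rewrites both mates in the orientation of the source RNA; the function names are illustrative only.

```python
# Minimal sketch: orient read pairs from a dUTP (strand-specific) library.
# Assumption: standard dUTP/UDG protocol, where read 1 is antisense to the
# transcript and read 2 is sense ("RF" orientation).

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_transcript_sense(read1: str, read2: str) -> tuple[str, str]:
    """Return both mates written in the orientation of the source RNA.

    Read 1 is antisense in a dUTP library, so it is reverse-complemented;
    read 2 already matches the transcript strand and is returned unchanged.
    """
    return reverse_complement(read1), read2


if __name__ == "__main__":
    r1, r2 = to_transcript_sense("TTGCA", "ACGGT")
    print(r1, r2)  # TGCAA ACGGT
```

Knowing the strand lets the assembler keep overlapping sense and antisense transcripts apart instead of merging them.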
Assemblers
- Trinity
  - Widely used
  - High-quality assemblies
  - Integrated with annotation and differential expression (DE) pipelines
- SOAPdenovo-Trans
  - Lower CPU and RAM requirements
  - k-mer size can be specified
- Oases (Velvet)
- Trans-ABySS
- Newbler (454)
- MaLTA (Ion Torrent)
Image: Broad Institute
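Trinity, SOAPdenovo-Trans, Oases and Trans-ABySS all build de Bruijn graphs from k-mers, which is why the k-mer size matters. The toy sketch below, assuming error-free reads on a single strand, shows the core idea: each k-mer becomes an edge between its (k-1)-mer prefix and suffix, and unbranched paths spell out contigs. Real assemblers (for example Trinity's Inchworm, Chrysalis and Butterfly stages) also handle sequencing errors, reverse complements and the branches created by alternative splicing; the function names here are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unitig(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping forever on a cycle
            break
        seen.add(nxt)
        path += nxt[-1]  # each step adds one new base
        node = nxt
    return path


if __name__ == "__main__":
    reads = ["ATGGCGT", "GGCGTGA"]
    graph = build_de_bruijn(reads, k=4)
    print(walk_unitig(graph, "ATG"))  # ATGGCGTGA
```

A smaller k joins reads with shorter overlaps (helpful for low-coverage transcripts), while a larger k resolves more repeats; this trade-off is one reason some assemblers let the user choose k.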
Pre-Assembly Read Processing
- Trimming and quality filtering
  - Trimmomatic
  - Cutadapt
- In silico normalization
  - Trinity (built-in)
  - khmer
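Both pre-assembly steps can be illustrated with a toy pure-Python sketch: a sliding-window quality trim in the spirit of Trimmomatic's SLIDINGWINDOW step, and digital normalization in the spirit of khmer's normalize-by-median (keep a read only while the median abundance of its k-mers is below a cutoff). The window size, quality threshold, k and cutoff are illustrative assumptions; the real tools also handle adapters, paired reads and canonical k-mers, and khmer uses memory-efficient probabilistic counting rather than an exact `Counter`.

```python
from collections import Counter
from statistics import median


def phred33_to_scores(qual_string):
    """Decode a FASTQ quality string (Phred+33) into integer scores."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_trim(seq, quals, window=4, min_q=20):
    """Cut the read at the first window whose mean quality falls below min_q."""
    for i in range(len(seq) - window + 1):
        if sum(quals[i:i + window]) / window < min_q:
            return seq[:i], quals[:i]
    return seq, quals


def digital_normalization(reads, k=20, cutoff=20):
    """Keep a read only if the median count of its k-mers, over reads kept
    so far, is below cutoff; counts are updated only for kept reads."""
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:  # read shorter than k: nothing to judge it by
            continue
        if median(counts[km] for km in kmers) < cutoff:
            kept.append(read)
            counts.update(kmers)
    return kept


if __name__ == "__main__":
    quals = phred33_to_scores("?????+++")  # '?' = Q30, '+' = Q10
    print(sliding_window_trim("ACGTACGT", quals)[0])  # ACGT
    print(len(digital_normalization(["ACGTACGTAC"] * 5, k=4, cutoff=2)))  # 1
```

Normalization discards redundant reads from highly expressed transcripts while keeping reads from rare ones, which also lowers the memory needed for assembly.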
Computational Resources
- Linux
- RAM-intensive (Trinity: roughly 1 GB of RAM per 1 million reads)
- Public resources
  - XSEDE
  - Greenfield server at the Pittsburgh Supercomputing Center (PSC)
  - Data Intensive Academic Grid (DIAG)
  - NCGAS Galaxy portal
Image: XSEDE
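The Trinity rule of thumb turns into a quick back-of-the-envelope check when choosing between a local machine and a public resource. The sketch below assumes the ~1 GB per 1 million reads ratio holds linearly; the function names are hypothetical, and actual usage varies with the data set and settings.

```python
def estimate_trinity_ram_gb(n_reads: int, gb_per_million: float = 1.0) -> float:
    """Rough RAM estimate from the ~1 GB per 1 M reads rule of thumb."""
    return n_reads / 1_000_000 * gb_per_million


def fits_in_memory(n_reads: int, node_ram_gb: float, gb_per_million: float = 1.0) -> bool:
    """True if the estimated requirement fits within a node's RAM."""
    return estimate_trinity_ram_gb(n_reads, gb_per_million) <= node_ram_gb


if __name__ == "__main__":
    print(estimate_trinity_ram_gb(50_000_000))  # 50.0
    print(fits_in_memory(120_000_000, 256))     # True
    print(fits_in_memory(120_000_000, 64))      # False
```

Because in silico normalization reduces the number of reads passed to the assembler, it also reduces this estimate.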
Annotation
- First question: Do I need to annotate?
Annotation Tools
- TransDecoder
- Trinotate
- PASA
- CPU-intensive: many transcripts mean many homology searches
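TransDecoder identifies candidate coding regions within assembled transcripts. The toy sketch below shows the basic idea of scanning all reading frames for the longest complete ORF (ATG to stop), assuming a 100-amino-acid minimum as the illustrative default. It is not TransDecoder's actual algorithm, which also reports partial ORFs and scores candidates with a coding model; the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement (uppercase input; N and other symbols pass through)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(transcript, min_aa=100, both_strands=True):
    """Return (strand, start, end, length_aa) of the longest complete ORF, or None.

    Coordinates are 0-based on the scanned strand; `end` is exclusive and
    includes the stop codon; `length_aa` excludes the stop codon.
    """
    seq = transcript.upper()
    strands = [("+", seq)]
    if both_strands:  # unstranded assemblies may encode the ORF on either strand
        strands.append(("-", revcomp(seq)))
    best = None
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    length_aa = (i - start) // 3
                    if length_aa >= min_aa and (best is None or length_aa > best[3]):
                        best = (strand, start, i + 3, length_aa)
                    start = None  # look for the next ATG in this frame
    return best


if __name__ == "__main__":
    t = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
    print(longest_orf(t, min_aa=3))  # ('+', 2, 23, 6)
```

The predicted peptides are what downstream homology searches run against, which is where most of the CPU time in annotation goes.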