Transcriptomics II De novo assembly

1 Transcriptomics II De novo assembly

2 Sequencing Read Processing
Trimmomatic
- ILLUMINACLIP – removes specified adapters
- SLIDINGWINDOW – removes regions falling below quality threshold
- LEADING/TRAILING – removes bases at the start/end of a read if below quality threshold
- MINLEN – drops reads below length
- TOPHRED33/TOPHRED64 – converts quality scores
java -jar Trimmomatic-0.32/trimmomatic-0.32.jar SE/PE -threads 16 -phred33 -trimlog SP**trimlog SP**.fq SP**trimmed.fq ILLUMINACLIP:/TruSeq2-SE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:75
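
To make the SLIDINGWINDOW:4:20 and MINLEN:75 settings concrete, here is a minimal Python sketch of the idea. It is a simplification, not Trimmomatic's actual implementation; it assumes Phred+33 quality strings (as selected by -phred33), and the function names are hypothetical.

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(seq, qual, window=4, min_avg=20):
    """Cut the read at the start of the first window whose mean quality
    falls below min_avg (simplified view of SLIDINGWINDOW:4:20)."""
    scores = phred33_scores(qual)
    keep = len(seq)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            keep = start
            break
    return seq[:keep], qual[:keep]


def passes_minlen(seq, min_len=75):
    """Mirror MINLEN: a read survives only if it is at least min_len long."""
    return len(seq) >= min_len


# 'I' encodes quality 40 and '#' encodes quality 2 in Phred+33.
trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
```

In this toy read the low-quality tail is cut, and the short remainder would then be dropped by the MINLEN check.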

3 De novo assembly
De novo assembly using Trinity software
- Memory and time intensive
- 1 GB RAM per 1M sequences
- 1 hour per 1M sequences (more processors!)
- Consider in silico normalization
perl trinityrnaseq-2.0.3/Trinity --max_memory 240G --CPU 20 --left All_1_trimmed.fq --right All_2_trimmed.fq --SS_lib_type FR --seqType fq --normalize_reads --min_contig_length --full_cleanup --output Trinity_Pb_Normalized &> Trinity_Pb_Normalized.log
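
The rule of thumb on this slide (1 GB RAM and 1 hour per 1M sequences) can be turned into a quick back-of-the-envelope estimate. The sketch below only restates that arithmetic; the function name is hypothetical and actual requirements will vary with the data and the number of processors.

```python
def estimate_trinity_resources(n_sequences, gb_per_million=1.0, hours_per_million=1.0):
    """Scale the slide's rule of thumb linearly with the number of sequences."""
    millions = n_sequences / 1_000_000
    return {
        "ram_gb": millions * gb_per_million,
        "hours": millions * hours_per_million,
    }


# Example: an input of 240 million sequences.
estimate = estimate_trinity_resources(240_000_000)
```

Under this rule, 240 million sequences would call for roughly 240 GB of RAM, which matches the --max_memory 240G setting in the command.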

4 Why Trinity?
- Paired reads
- Isoform differentiation
- Full-length transcripts

5 Assembly Results
- Total trinity 'genes': 421044
- Total trinity transcripts:
- Percent GC: 44.20
- Contig N50: 2160
- Median contig length: 444
- Average contig:
Too many assembled transcripts
Use CD-HIT/RSEM to prune
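
For readers unfamiliar with the statistics reported here, the following Python sketch shows how contig N50 and median contig length are commonly computed from a list of contig lengths. The lengths used are made-up toy values, not data from this assembly.

```python
from statistics import median


def n50(lengths):
    """Return the length L such that contigs of length >= L contain
    at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy contig lengths, for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
toy_median = median(toy_lengths)
```

Because N50 is weighted by bases rather than by contig count, it can sit well above the median when an assembly contains many short contigs, as in the results on this slide (N50 2160 versus median 444).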

6 CD-HIT-EST and RSEM
RSEM – RNA-Seq by Expectation Maximization
- Estimates gene and transcript level abundance
- Prune transcripts with FPKM below 0.001 (1 per billion)
perl trinityrnaseq_r /util/filter_fasta_by_rsem_values.pl --rsem_output RSEM.isoforms.results --fasta Trinity.fasta --output FPKM_0.001.fasta --fpkm_cutoff 0.001
Total trinity genes = , Total trinity transcripts =
CD-HIT
- Combines sequences based upon similarity
- 100% Identity
cd-hit-v /cd-hit-est -i TB_Manuscript2\ FINAL\ DATA/Trinity.fasta -c 1.0 -n 8 -o cd-hit-v /Trinity_CDHIT100 -T 20 -M
Total trinity genes = , Total trinity transcripts =
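
The FPKM pruning step can be sketched in Python as follows. FPKM is fragments per kilobase of transcript per million mapped fragments, so a cutoff of 0.001 corresponds to one fragment per kilobase per billion. The sketch assumes the tab-delimited layout of RSEM's isoforms.results file (with transcript_id and FPKM columns) and assumes that transcripts at or above the cutoff are kept; the sample rows are invented and the function name is hypothetical.

```python
import csv
import io


def transcripts_passing_fpkm(handle, cutoff=0.001):
    """Return the IDs of transcripts whose FPKM is at or above the cutoff."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["transcript_id"] for row in reader if float(row["FPKM"]) >= cutoff]


# Invented example rows in the assumed RSEM.isoforms.results layout.
sample_table = (
    "transcript_id\tgene_id\tlength\teffective_length\texpected_count\tTPM\tFPKM\tIsoPct\n"
    "c1_g1_i1\tc1_g1\t500\t350.0\t20.0\t5.0\t4.2\t100.0\n"
    "c2_g1_i1\tc2_g1\t300\t150.0\t0.0\t0.0\t0.0\t0.0\n"
)
kept_ids = transcripts_passing_fpkm(io.StringIO(sample_table))
```

The retained IDs would then be used to select the matching records from Trinity.fasta, which is the job filter_fasta_by_rsem_values.pl performs in the command above.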

7 Annotation
BLAST – Sequence homology
- Use computing cluster; run array job if parallel implementations not available
B2G4PIPE/BLAST2GO – GO term
- Command line version/graphical version
KEGG – Pathway analysis
- Available as stand-alone or within BLAST2GO
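
Running BLAST as an array job usually means splitting the query sequences into chunks so that each task searches one chunk. A minimal Python sketch of that splitting step is shown below; the round-robin scheme, the function name, and the record names are all illustrative assumptions, and the scheduler submission itself is not shown.

```python
def split_records(records, n_chunks):
    """Distribute records round-robin into n_chunks lists of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


# Hypothetical transcript names standing in for FASTA records.
toy_records = ["t1", "t2", "t3", "t4", "t5"]
toy_chunks = split_records(toy_records, 2)
```

Each chunk would be written to its own FASTA file and searched by one array task, with the per-chunk BLAST outputs concatenated afterwards.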

9 Time to relax….not quite.

10 Map - Bowtie v. Sailfish
- Bowtie – little upfront investment, but need to map millions/billions of reads
- Sailfish – large upfront investment in k-mer library, but no need to map billions of reads
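
The trade-off above can be illustrated with a toy k-mer index in Python: the index is built once from the transcripts (the upfront cost), after which each read is assigned by k-mer lookup rather than by alignment. This is only a conceptual sketch and not Sailfish's actual algorithm; the sequences and function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def kmer_hits(read, index, k):
    """Count, per transcript, how many of the read's k-mers it shares."""
    hits = defaultdict(int)
    for i in range(len(read) - k + 1):
        for name in index.get(read[i:i + k], ()):
            hits[name] += 1
    return dict(hits)


# Toy transcripts and read, for illustration only.
toy_index = build_kmer_index({"t1": "ACGTAC", "t2": "TTTTAC"}, k=4)
toy_hits = kmer_hits("CGTAC", toy_index, k=4)
```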

11 Quantify
- Run RSEM on each individual sample
- Use Trinity Pipeline to combine samples into a single expression table
  - Gene level
  - Transcript level
- Use edgeR (Empirical analysis of digital gene expression data in R) within Trinity Pipeline
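
Combining per-sample results into a single expression table amounts to joining the samples on transcript (or gene) ID. The Python sketch below illustrates that join under simple assumptions: each sample is a dictionary of ID to expected count, missing entries are filled with zero, and all names and values are invented. The Trinity pipeline provides its own script for this step.

```python
def combine_samples(per_sample_counts):
    """Build a header and rows with one row per ID and one column per sample."""
    samples = list(per_sample_counts)
    ids = sorted({t for counts in per_sample_counts.values() for t in counts})
    header = ["id"] + samples
    rows = [
        [t] + [per_sample_counts[s].get(t, 0.0) for s in samples]
        for t in ids
    ]
    return header, rows


# Invented counts for two hypothetical samples.
toy_header, toy_rows = combine_samples({
    "sampleA": {"c1_g1_i1": 20.0, "c2_g1_i1": 3.0},
    "sampleB": {"c1_g1_i1": 15.0},
})
```

The same join works at gene level by keying each sample's dictionary on gene ID instead of transcript ID.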

12 Trinity DGE Pipeline/Post Analysis
Data Generated
- A sample to sample DGE comparison
- Consensus DGE comparison
Visualizations Possible
- Volcano Plots
- Heatmaps
- Cluster Dendrograms
Utilize Data
- Use clusters and up- and down-regulated subgroups to identify genes, GO terms and pathways that experience changes in regulation.
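
As an illustration of how up- and down-regulated subgroups are typically derived from a DGE table, the Python sketch below classifies a gene from its log2 fold change and FDR, and computes the coordinates a volcano plot would use. The thresholds are illustrative assumptions, not the settings used in this analysis, and the function names are hypothetical.

```python
import math


def classify_gene(log2_fc, fdr, min_abs_log2_fc=1.0, max_fdr=0.05):
    """Label a gene 'up', 'down', or 'unchanged' from fold change and FDR."""
    if fdr > max_fdr or abs(log2_fc) < min_abs_log2_fc:
        return "unchanged"
    return "up" if log2_fc > 0 else "down"


def volcano_point(log2_fc, fdr):
    """Return (x, y) for a volcano plot: log2 fold change vs. -log10(FDR)."""
    return log2_fc, -math.log10(fdr)


# Invented values for three hypothetical genes.
labels = [
    classify_gene(2.5, 0.001),
    classify_gene(-1.5, 0.01),
    classify_gene(3.0, 0.2),
]
```

Genes labelled "up" or "down" form the subgroups that can then be checked for enriched GO terms and pathways.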
