Transcriptomics II De novo assembly

Sequencing Read Processing
Trimmomatic
- ILLUMINACLIP: removes the specified adapter sequences
- SLIDINGWINDOW: trims the read once the average quality within the window falls below the threshold
- LEADING/TRAILING: removes bases from the start/end of the read while they are below the quality threshold
- MINLEN: drops reads shorter than the given length
- TOPHRED33/TOPHRED64: converts quality scores to Phred+33 or Phred+64

Command (SE/PE: choose single-end or paired-end mode):
java -jar Trimmomatic-0.32/trimmomatic-0.32.jar SE/PE -threads 16 -phred33 -trimlog SP**trimlog SP**.fq SP**trimmed.fq ILLUMINACLIP:/TruSeq2-SE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:75
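To make the quality-trimming steps concrete, here is a minimal Python sketch of what LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:75 do to a single read, applied in that order. It is a simplified illustration, not Trimmomatic's actual implementation: adapter clipping is left out, and the function name `trim_read` and its parameters are made up for this example.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_read(seq, qual, leading=20, trailing=20, window=4, min_avg=20, minlen=75):
    """Simplified LEADING / TRAILING / SLIDINGWINDOW / MINLEN on one read.

    Returns (seq, qual) after trimming, or None if the read is dropped.
    """
    scores = phred33(qual)

    # LEADING: skip low-quality bases at the start of the read.
    start = 0
    while start < len(scores) and scores[start] < leading:
        start += 1

    # TRAILING: skip low-quality bases at the end of the read.
    end = len(scores)
    while end > start and scores[end - 1] < trailing:
        end -= 1

    seq, qual, scores = seq[start:end], qual[start:end], scores[start:end]

    # SLIDINGWINDOW: scan from the 5' end and cut at the first window
    # whose average quality is below the threshold.
    cut = len(scores)
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window < min_avg:
            cut = i
            break
    seq, qual = seq[:cut], qual[:cut]

    # MINLEN: drop reads that ended up too short.
    if len(seq) < minlen:
        return None
    return seq, qual


# A 100 bp read whose last 20 bases have quality 2 ('#'); the rest are 40 ('I').
kept = trim_read("A" * 100, "I" * 80 + "#" * 20)
# A read with a low-quality stretch in the middle is cut there and then dropped.
dropped = trim_read("A" * 100, "I" * 50 + "#" * 4 + "I" * 46)
```

Because a read is cut at the first failing window, a short low-quality stretch in the middle of a read can remove everything after it, which is why MINLEN comes last.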

De novo assembly using Trinity software
- Memory and time intensive
- About 1 GB RAM per 1M sequences
- About 1 hour per 1M sequences (more processors!)
- Consider in silico normalization

perl trinityrnaseq-2.0.3/Trinity --max_memory 240G --CPU 20 --left All_1_trimmed.fq --right All_2_trimmed.fq --SS_lib_type FR --seqType fq --normalize_reads --min_contig_length --full_cleanup --output Trinity_Pb_Normalized &> Trinity_Pb_Normalized.log
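The rule of thumb above (1 GB RAM and 1 hour per 1M sequences) is simple arithmetic. A small sketch, with a hypothetical helper name, shows how it scales; real requirements depend on the data and on the number of processors.

```python
def estimate_trinity_resources(n_reads, gb_per_million=1.0, hours_per_million=1.0):
    """Apply the slide's rule of thumb; the defaults are the slide's figures."""
    millions = n_reads / 1_000_000
    return {"ram_gb": millions * gb_per_million, "hours": millions * hours_per_million}


# By this rule, 240 million reads would call for roughly 240 GB of RAM,
# which matches the --max_memory 240G setting in the command.
estimate = estimate_trinity_resources(240_000_000)
print(estimate)
```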

Why Trinity?
- Paired reads
- Isoform differentiation
- Full-length transcripts

Assembly Results
- Total trinity 'genes': 421044
- Total trinity transcripts:
- Percent GC: 44.20
- Contig N50: 2160
- Median contig length: 444
- Average contig:

Too many assembled transcripts: use CD-HIT/RSEM to prune.
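For readers unfamiliar with the N50 statistic reported above: it is the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch, using made-up contig lengths rather than the assembly's data:

```python
from statistics import mean, median


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


def summarize(lengths):
    """N50, median and mean contig length for a list of contig lengths."""
    return {"n50": n50(lengths), "median": median(lengths), "mean": mean(lengths)}


# Toy example: 20 bases in total, and the two longest contigs (6 + 5) cover half.
stats = summarize([2, 3, 4, 5, 6])
print(stats)
```

An N50 far above the median, as in the results above (2160 vs. 444), indicates many short contigs alongside a smaller set of long ones.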

CD-HIT-EST and RSEM
RSEM (RNA-Seq by Expectation Maximization): estimates gene- and transcript-level abundance
- Prune transcripts with FPKM below 0.001 (1 per billion)
perl trinityrnaseq_r /util/filter_fasta_by_rsem_values.pl --rsem_output RSEM.isoforms.results --fasta Trinity.fasta --output FPKM_0.001.fasta --fpkm_cutoff 0.001
- Total trinity genes = , Total trinity transcripts =

CD-HIT: combines sequences based on similarity (100% identity)
cd-hit-v /cd-hit-est -i TB_Manuscript2\ FINAL\ DATA/Trinity.fasta -c 1.0 -n 8 -o cd-hit-v /Trinity_CDHIT100 -T 20 -M
- Total trinity genes = , Total trinity transcripts =
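The FPKM cutoff can be illustrated with a short sketch. FPKM is fragments per kilobase of transcript per million mapped fragments, so a cutoff of 0.001 corresponds to one fragment per kilobase per billion. The code below is an illustration only, not the Trinity filtering script: the toy table is invented, the function names are hypothetical, and whether the real script keeps transcripts exactly at the cutoff is an assumption. RSEM itself uses effective transcript length, which the `fpkm` helper ignores.

```python
import csv
import io


def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def transcripts_passing(rsem_text, cutoff=0.001):
    """IDs of transcripts whose FPKM column is at or above the cutoff."""
    reader = csv.DictReader(io.StringIO(rsem_text), delimiter="\t")
    return [row["transcript_id"] for row in reader if float(row["FPKM"]) >= cutoff]


# Invented two-row table using the column layout of an RSEM isoforms.results file.
toy_table = (
    "transcript_id\tgene_id\tlength\teffective_length\texpected_count\tTPM\tFPKM\tIsoPct\n"
    "t1\tg1\t1000\t900\t50\t12.0\t9.5\t100\n"
    "t2\tg2\t500\t400\t0\t0\t0\t0\n"
)
kept_ids = transcripts_passing(toy_table)
```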

Annotation
- BLAST: sequence homology. Use a computing cluster; run an array job if parallel implementations are not available.
- B2G4PIPE/BLAST2GO: GO terms. Command-line version/graphical version.
- KEGG: pathway analysis. Available as a stand-alone tool or within BLAST2GO.
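Running BLAST as an array job usually means splitting the transcript FASTA into chunks, one per array task. The sketch below shows one way to do the split in Python; the helper names are hypothetical, and how chunks are written to disk and submitted depends on the cluster's scheduler.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_round_robin(records, n_chunks):
    """Distribute records over n_chunks lists, one list per array task."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


records = read_fasta(">a\nAC\nGT\n>b\nTT\n>c\nGG\n")
chunks = split_round_robin(records, 2)
```

Round-robin assignment keeps chunk sizes within one record of each other; balancing by total sequence length would be a reasonable refinement when transcript lengths vary widely.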

Time to relax….not quite.

Map: Bowtie v. Sailfish
- Bowtie: little upfront investment, but millions/billions of reads must be mapped.
- Sailfish: large upfront investment in a k-mer library, but no need to map billions of reads.
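The trade-off can be pictured with a toy k-mer index: the index over the transcripts is built once (the upfront cost), after which reads are only broken into k-mers and looked up, with no alignment step. This is a deliberately simplified sketch and not Sailfish's algorithm; it counts only k-mers unique to one transcript, and all names and sequences are invented.

```python
from collections import Counter, defaultdict


def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcripts that contain it (built once)."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def count_unique_kmer_hits(reads, index, k):
    """Count read k-mers that occur in exactly one transcript."""
    hits = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            owners = index.get(read[i:i + k], ())
            if len(owners) == 1:
                hits[next(iter(owners))] += 1
    return hits


index = build_kmer_index({"t1": "ACGTAC", "t2": "TTGGCC"}, k=4)
hits = count_unique_kmer_hits(["ACGTA", "GGCC"], index, k=4)
```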

Quantify
- Run RSEM on each individual sample
- Use the Trinity pipeline to combine samples into a single expression table (gene level and transcript level)
- Use edgeR (Empirical analysis of digital gene expression data in R) within the Trinity pipeline
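Combining per-sample results into a single expression table amounts to building a matrix with one row per transcript (or gene) and one column per sample. The sketch below illustrates the idea with invented counts and a hypothetical function name; it is not the Trinity pipeline's own script.

```python
def build_expression_matrix(sample_counts):
    """Merge {sample: {feature_id: count}} into a header plus sorted rows.

    Features missing from a sample are given a count of 0.0.
    """
    samples = sorted(sample_counts)
    feature_ids = sorted({fid for counts in sample_counts.values() for fid in counts})
    header = ["id"] + samples
    rows = [
        [fid] + [sample_counts[sample].get(fid, 0.0) for sample in samples]
        for fid in feature_ids
    ]
    return header, rows


# Invented expected counts for two samples.
header, rows = build_expression_matrix({
    "sampleA": {"t1": 10.0, "t2": 3.0},
    "sampleB": {"t1": 7.0, "t3": 1.0},
})
```

A table of this shape, with raw counts rather than normalized values, is the kind of input a count-based tool such as edgeR works from.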

Trinity DGE Pipeline/Post Analysis
Data generated
- Sample-to-sample DGE comparisons
- Consensus DGE comparison
Visualizations possible
- Volcano plots
- Heatmaps
- Cluster dendrograms
Utilize data
- Use clusters and up- and down-regulated subgroups to identify genes, GO terms and pathways that experience changes in regulation.
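A volcano plot places each gene at (log2 fold change, -log10 FDR), and the up- and down-regulated subgroups are the points beyond chosen thresholds. The sketch below computes those coordinates and labels; the thresholds (4-fold change, FDR 0.001) and the gene values are assumptions picked for illustration, not settings or results from this analysis.

```python
import math


def classify(log2fc, fdr, min_abs_log2fc=2.0, max_fdr=0.001):
    """Label a gene 'up', 'down' or 'ns' (not significant)."""
    if fdr <= max_fdr and log2fc >= min_abs_log2fc:
        return "up"
    if fdr <= max_fdr and log2fc <= -min_abs_log2fc:
        return "down"
    return "ns"


def volcano_points(results):
    """Turn (gene, log2fc, fdr) tuples into (gene, x, y, label) plot points."""
    points = []
    for gene, log2fc, fdr in results:
        # Guard against log10(0) when an FDR underflows to zero.
        y = -math.log10(max(fdr, 1e-300))
        points.append((gene, log2fc, y, classify(log2fc, fdr)))
    return points


# Invented differential-expression results.
points = volcano_points([
    ("geneA", 3.0, 1e-5),
    ("geneB", -2.5, 1e-4),
    ("geneC", 3.0, 0.5),
])
labels = [p[3] for p in points]
```

The gene lists for the "up" and "down" labels are what would then be carried into GO term and pathway analysis.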
