A Web Framework for NGS Output
NGS sequencing
Primary Analysis: Base calling / Sequence trimming
Secondary Analysis: Assembly or reference mapping
Tertiary Analysis: Calculate mapping data / expression profile; Functional inference
Tentative Procedure for RNA-Seq Analysis (Non-model Organism)
QC: Discard the low-confidence sequences for 3 groups (three time points)
  Program: SolexaQA (http://solexaqa.sourceforge.net/)
Assembly: Merge all reads from the 3 groups and assemble them into contigs
  Program: Trinity (http://trinityrnaseq.sourceforge.net/), 100 GB RAM requested
Mapping: Map paired-end reads from each group onto the contigs / annotate the contigs
  Program: LAST (http://last.cbrc.jp/), BLASTx, InterProScan
Expression: Estimate the expression value (FPKM) for each contig in each group
  Program: CummeRbund, an R/Bioconductor package (http://cufflinks.cbcb.umd.edu/)
Functional inference: Functional enrichment analysis in GO and KEGG
  Program: Because this is a non-model organism, we may have to create the identifier mapping for KEGG and GO ourselves
Tentative Procedure for RNA-Seq Analysis (Non-model Organism) for Eel Transcriptomics
QC: Discard the low-confidence sequences generated from each library (Hi-seq 200, RNA-seq data, paired-end)
  Program: SolexaQA (http://solexaqa.sourceforge.net/)
Assembly: Merge all reads from the various libraries and assemble them into contigs
  Program: Trinity (http://trinityrnaseq.sourceforge.net/), 100 GB RAM requested
Mapping: Map paired-end reads from each group onto the contigs / annotate the contigs
  Program: LAST (http://last.cbrc.jp/), BLASTx, InterProScan
Expression Profiling: Estimate the expression value (FPKM) for each contig in each group
  Program: CummeRbund, an R/Bioconductor package (http://cufflinks.cbcb.umd.edu/)
Functional inference: Functional enrichment analysis in GO and KEGG
  Program: Because this is a non-model organism, we may have to create the identifier mapping for KEGG and GO ourselves
Tentative Procedure for RNA-Seq Analysis (Non-model Organism)
QC: Remove low-quality sequencing results
  Program: SolexaQA (http://solexaqa.sourceforge.net/), SeqTrim
Assembly: From the short-read sequencing results, assemble the likely expressed gene models (merge all reads from the 3 groups and assemble them into contigs)
  Program: Trinity, MIRA, Velvet, etc.; multiple CPUs with over 100 GB RAM requested
Mapping: Using the assembled long contigs as the reference, place the short reads back onto them (map paired-end reads from each group onto the contigs)
  Program: Bowtie, LAST (http://last.cbrc.jp/)
Expression: Compute and summarize the expression of each contig across samples, and identify the differentially expressed gene sets (estimate the FPKM expression value for each contig in each group)
  Program: CummeRbund, an R/Bioconductor package (http://cufflinks.cbcb.umd.edu/), RSeQC (http://code.google.com/p/rseqc/)
Functional inference: Run functional analysis on the identified gene sets to find the regulatory pathways related to the regeneration mechanism across time points and tissues (functional enrichment analysis in GO and KEGG)
  Program: Because this is a non-model organism, we may have to create the identifier mapping for KEGG and GO ourselves
Validation: Confirm the expression profiles of regeneration-related gene sets by Q-PCR
  Design new experiments that promote or interfere with the regeneration mechanism, then use NGS to uncover finer regulatory details
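The expression step reports FPKM (fragments per kilobase of contig per million mapped fragments). As a minimal sketch of the standard formula only, the function name and the sample numbers below are illustrative assumptions and are not taken from the pipeline:

```python
def fpkm(fragments_on_contig, contig_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (contig length in bp * total mapped fragments)."""
    if contig_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("contig length and total mapped fragments must be positive")
    return fragments_on_contig * 1e9 / (contig_length_bp * total_mapped_fragments)


# Hypothetical numbers: 500 fragments on a 2,000 bp contig,
# in a library with 10 million mapped fragments.
example_fpkm = fpkm(500, 2000, 10_000_000)
print(example_fpkm)
```

In practice these values would come from the expression tools listed above rather than from a hand calculation; the sketch only shows how contig length and library size enter the normalization.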
QC by Graphs in SolexaQA
Annotations for Each Contig
Contig in FASTA (nucleic acid)
Translated sequence (amino acid) of the longest ORF
Then perform a sequence search (BLASTp) on NR, KEGG, GO, Pfam (InterPro)
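The translation step can be sketched as a six-frame scan that keeps the longest protein starting at a methionine. This is an assumption-laden illustration, not the tool used in the project: it uses the standard genetic code, defines an ORF as running from the first ATG in a stop-free stretch to the next stop codon, and also accepts an ORF that runs off the end of the contig. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    # Codons containing ambiguous bases (e.g. N) become "X".
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def longest_orf_protein(contig):
    """Return the longest M-initiated protein over all six reading frames."""
    contig = contig.upper()
    best = ""
    for strand in (contig, reverse_complement(contig)):
        for frame in range(3):
            # Each stop-free segment is a candidate; trim it to its first M.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


print(longest_orf_protein("CCATGGCTTAAGG"))  # toy sequence, not real data
```

The resulting amino-acid sequence is what would be passed to BLASTp in the step above.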
Database Structure
Tables: BLASTx, Pfam, KEGG, GO, FPKM
Primary key (PK) of each table = Contig ID
Ref: http://sysbio.iis.sinica.edu.tw/page
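A minimal sketch of this layout, using SQLite through Python's standard library, might look as follows. Only two of the five tables are shown, and the table names, column names, and the choice of SQLite are assumptions made for illustration; the sample values are the placeholder rows from the result sheet in these slides.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one row per contig in each table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE blastx (
            contig_id      TEXT PRIMARY KEY,
            hit_id         TEXT,
            hit_annotation TEXT
        );
        CREATE TABLE fpkm (
            contig_id TEXT PRIMARY KEY,
            cond1 REAL, cond2 REAL, cond3 REAL
        );
    """)
    conn.execute("INSERT INTO blastx VALUES (?, ?, ?)",
                 ("Contig 1", "BAD74118.1", "elongation factor-1 alpha (EF-1alpha)"))
    conn.execute("INSERT INTO fpkm VALUES (?, ?, ?, ?)", ("Contig 1", 190, 200, 3))
    conn.execute("INSERT INTO fpkm VALUES (?, ?, ?, ?)", ("Contig 2", 378, 22, 1000))
    return conn


def contig_detail(conn, contig_id):
    """Join the per-program tables on the shared contig ID."""
    return conn.execute("""
        SELECT f.contig_id, b.hit_id, b.hit_annotation, f.cond1, f.cond2, f.cond3
        FROM fpkm AS f
        LEFT JOIN blastx AS b ON b.contig_id = f.contig_id
        WHERE f.contig_id = ?
    """, (contig_id,)).fetchone()


conn = build_demo_db()
print(contig_detail(conn, "Contig 1"))
print(contig_detail(conn, "Contig 2"))  # no BLASTx hit: hit columns are None
```

Sharing the contig ID as the key in every table means a contig with no hit in one program simply has no row there, and a LEFT JOIN still returns its other annotations.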
Query 1: Text-based Approach
Options: Full-text search on annotation tables | Sequence Search / BLAST | Library Compare
Search term shown: Immun
Result: Detail for each contig
Query 2: By Sequences
Programs: BLASTn / megablast / tBLASTx
Options: Full-text search on annotation tables | Sequence Search / BLAST | Library Compare
Worm Contigs
Reference code: http://sysbio.iis.sinica.edu.tw/page/blast.php
BLAST Result: Detail for each contig
Detail for Each Contig: InterPro / Pfam
Query 3: Library Comparison
Options: Full-text search on annotation tables | Sequence Search / BLAST | Library Compare
Dynamic comparison like DDD
Pool A / Pool B -> Submit -> P-value
Table for BLASTX output (DB: NR)
Query_ID | Hit ID | Hit_annotation | Hit_organism | Query coverage | E-value
Contig 1 | BAD74118.1 | elongation factor-1 alpha (EF-1alpha) | Pelodiscus sinensis | 97% | 0.0
Contig 2 | | | | |
Query coverage = Matched length / Query length
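The query coverage column is the matched length divided by the query length. A small sketch of that arithmetic, assuming the matched length is taken from the aligned query coordinates of a single hit (the function name and the 1,000 nt example contig are hypothetical):

```python
def query_coverage(query_start, query_end, query_length):
    """Percent of the query covered by one alignment (1-based, inclusive coords)."""
    if query_length <= 0:
        raise ValueError("query length must be positive")
    matched_length = abs(query_end - query_start) + 1
    return 100.0 * matched_length / query_length


# Hypothetical hit: positions 1..970 of a 1,000 nt contig are aligned.
coverage = query_coverage(1, 970, 1000)
print(f"{coverage:.0f}%")
```

Taking the absolute difference keeps the result correct for hits on the reverse strand, where the start coordinate is larger than the end coordinate.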
Table for KEGG
#seq_id | hit_seq | alignment_length | identity (%) | e_value | KO_ID | Definition | pathway | Note
comp3_c0_seq1 | xla:386604 | 449 | 0.84 | | K03231 | elongation factor 1-alpha | ko03013 RNA transport; ko05134 Legionellosis |
Primary key: #seq_id
Tables for Pfam & GO: as the output from each program
The Result in One Sheet
Contig | Annotation from BLASTx | Results of Pfamscan | GO | KEGG_KO | KEGG Pathway | FPKM_cond1 | FPKM_cond2 | FPKM_cond3
Contig 1 | BAD74118.1/ elongation factor-1 alpha (EF-1alpha) [Pelodiscus sinensis] | PF00009/GTP_EFTU; PF00010/xxxxxxxx | GO:0003924 GTPase activity; GO:0005525 GTP binding | K03231/galactose oxidase | ko00052 Galactose metabolism | 190 | 200 | 3
Contig 2 | - | PF00067.17/p450 | | | | 378 | 22 | 1000
Contig 3 | CCCC | PPPP | | | | 333 | 45 | 31
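Building such a sheet amounts to collecting each program's output per contig and writing one row per contig, with a "-" where a program found nothing. The following is a rough sketch using only the standard library; the function name, the tab-separated layout, and the reduced set of columns are assumptions, and the values are the placeholder rows shown above.

```python
import csv
import io


def build_sheet(blastx, pfam, fpkm, conditions):
    """Merge per-program annotations into one tab-separated sheet keyed by contig."""
    contigs = sorted(set(blastx) | set(pfam) | set(fpkm))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["contig", "blastx", "pfam"] + [f"FPKM_{c}" for c in conditions])
    for contig in contigs:
        writer.writerow(
            [contig,
             blastx.get(contig, "-"),                  # "-" marks "no hit"
             "; ".join(pfam.get(contig, [])) or "-"]   # a contig may have several domains
            + [fpkm.get(contig, {}).get(c, "") for c in conditions]
        )
    return out.getvalue()


sheet = build_sheet(
    blastx={"Contig 1": "BAD74118.1"},
    pfam={"Contig 1": ["PF00009/GTP_EFTU"], "Contig 2": ["PF00067.17/p450"]},
    fpkm={"Contig 1": {"cond1": 190, "cond2": 200, "cond3": 3},
          "Contig 2": {"cond1": 378, "cond2": 22, "cond3": 1000}},
    conditions=["cond1", "cond2", "cond3"],
)
print(sheet)
```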
Library Compare (time points: 0 hr, 24 hrs, 48 hrs)
The Way of Redundancy Reduction
Input: 700 million reads
1st Trinity run -> 500,000 genes
Abundance sorting; mapping by Bowtie2 (LAST?), pick the longest one as the reduced set -> 48,000 genes
Refinement -> Final set
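The "pick the longest one" step can be sketched as follows. How contigs are grouped is not stated on the slide (it mentions mapping with Bowtie2 or LAST), so this example substitutes a simpler, hypothetical grouping: contigs whose IDs differ only in a trailing `_seqN` suffix, as in `comp3_c0_seq1`, are treated as one group. The sequences are toy data.

```python
import re


def longest_per_group(contigs):
    """Keep only the longest contig in each group.

    contigs: dict mapping contig ID -> sequence.
    Grouping (an assumption): the ID with its trailing "_seqN" removed.
    """
    best = {}  # group name -> ID of the longest contig seen so far
    for contig_id, seq in contigs.items():
        group = re.sub(r"_seq\d+$", "", contig_id)
        if group not in best or len(seq) > len(contigs[best[group]]):
            best[group] = contig_id
    return {contig_id: contigs[contig_id] for contig_id in best.values()}


reduced = longest_per_group({
    "comp3_c0_seq1": "ATGC",
    "comp3_c0_seq2": "ATGCATGC",
    "comp7_c0_seq1": "AT",
})
print(sorted(reduced))
```

With mapping-based clusters, only the line that derives `group` would change; the selection of the longest member stays the same.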