The Web frame for NGS output

NGS sequencing
Primary Analysis: base calling / sequence trimming
Secondary Analysis: assembly or reference mapping
Tertiary Analysis: calculate mapping data / expression profile; functional inference

Tentative Procedure for RNA-Seq Analysis (Non-model Organism)
QC: Discard the low-confidence sequences for the 3 groups (three time points). Program: SolexaQA (http://solexaqa.sourceforge.net/)
Assembly: Merge all reads from the 3 groups and assemble them into contigs. Program: Trinity (http://trinityrnaseq.sourceforge.net/), 100 GB RAM requested
Mapping: Map paired-end reads from each group onto the contigs; annotate the contigs. Program: LAST (http://last.cbrc.jp/), BLASTx, InterProScan
Expression: Estimate the expression value of each contig in each group (FPKM). Program: CummeRbund, an R/Bioconductor package (http://cufflinks.cbcb.umd.edu/)
Functional inference: Functional enrichment analysis in GO and KEGG. Program: because this is a non-model organism, we may have to create the identifier mapping to KEGG and GO ourselves

Tentative Procedure for RNA-Seq Analysis (Non-model Organism): Eel Transcriptomics
QC: Discard the low-confidence sequences generated from each library (Hi-seq 200, RNA-seq data, paired-end). Program: SolexaQA (http://solexaqa.sourceforge.net/)
Assembly: Merge all reads from the various libraries and assemble them into contigs. Program: Trinity (http://trinityrnaseq.sourceforge.net/), 100 GB RAM requested
Mapping: Map paired-end reads from each group onto the contigs; annotate the contigs. Program: LAST (http://last.cbrc.jp/), BLASTx, InterProScan
Expression profiling: Estimate the expression value of each contig in each group (FPKM). Program: CummeRbund, an R/Bioconductor package (http://cufflinks.cbcb.umd.edu/)
Functional inference: Functional enrichment analysis in GO and KEGG. Program: because this is a non-model organism, we may have to create the identifier mapping to KEGG and GO ourselves

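Tentative Procedure for RNA-Seq Analysis (Non-model Organism): Detailed Version
QC: Remove low-quality sequencing results. Program: SolexaQA (http://solexaqa.sourceforge.net/), SeqTrim
Assembly: From the short-read sequencing results, assemble the likely expressed gene models (merge all reads from the 3 groups and assemble them into contigs). Program: Trinity, MIRA, Velvet, etc.; multiple CPUs with over 100 GB RAM requested
Mapping: Using the assembled long contigs as the reference, place the short reads back onto them (map paired-end reads from each group onto the contigs). Program: Bowtie, LAST (http://last.cbrc.jp/)
Expression: Calculate and summarize the expression of each gene segment across samples, and identify the differentially expressed gene sets (estimate the expression value of each contig in each group, FPKM). Program: CummeRbund, an R/Bioconductor package (http://cufflinks.cbcb.umd.edu/), RSeQC (http://code.google.com/p/rseqc/)
Functional inference: Run functional analysis on the identified gene sets to find the regulatory pathways related to the regeneration mechanism across time points and tissues (functional enrichment analysis in GO and KEGG). Program: because this is a non-model organism, we may have to create the identifier mapping to KEGG and GO ourselves
Validation: Use Q-PCR to confirm the expression profiles of the regeneration-related gene sets. Design new experiments that promote or disrupt the regeneration mechanism, then use NGS to uncover finer regulatory details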
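The Expression step reports FPKM (fragments per kilobase of contig per million mapped fragments). As a minimal sketch of the standard definition, and not the code CummeRbund or Cufflinks actually runs, the value for one contig can be computed as below. The function and argument names are illustrative assumptions.

```python
def fpkm(fragments_on_contig, contig_length_bp, total_mapped_fragments):
    """FPKM = fragments * 10^9 / (contig length in bp * total mapped fragments).

    The 10^9 factor combines "per kilobase" (10^3) and "per million" (10^6).
    """
    if contig_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    return fragments_on_contig * 1e9 / (contig_length_bp * total_mapped_fragments)


# Hypothetical numbers: 100 fragments on a 1,000 bp contig,
# out of 1,000,000 mapped fragments in that group.
example_value = fpkm(100, 1000, 1_000_000)
```

Computing this per contig and per group (time point) gives the FPKM columns that the later slides store in the database.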

QC by Graphs in SolexaQA

Annotations for Each Contig
Contig in FASTA (nucleic acid)
Translated sequence (amino acid) from the longest ORF
Then perform a sequence search (BLASTp) against NR, KEGG, GO, and Pfam (InterPro)
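The translation step can be sketched as follows. The slide does not say which ORF finder was used, so this is only an illustration under stated assumptions: the standard genetic code, all six reading frames, and an ORF defined as the stretch from the first ATG to the next stop codon (or to the end of the contig if no stop is reached). All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    # Codons with ambiguous bases (e.g. N) become "X"; a trailing partial codon is ignored.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf_protein(nucleotides):
    """Return the longest Met-initiated peptide over all six reading frames."""
    seq = nucleotides.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Stop codons ("*") split the frame translation into candidate segments.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


peptide = longest_orf_protein("CCATGAAATTTTAGCC")  # toy contig, not real data
```

The resulting amino-acid sequence is what would be submitted to BLASTp in the step above.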

Database Structure
Tables: BLASTx, Pfam, KEGG, GO, FPKM
PK = Contig ID
Ref: http://sysbio.iis.sinica.edu.tw/page
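One way to picture this structure is a set of tables that all carry the contig ID, so any annotation can be joined to any other. The sketch below uses an in-memory SQLite database. The slides do not name the database engine, and every column name here is an assumption chosen to match the table slides later in the deck.

```python
import sqlite3


def create_annotation_db():
    """Create one table per annotation source, each keyed by contig_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE blastx (contig_id TEXT PRIMARY KEY, hit_id TEXT,
                             hit_annotation TEXT, hit_organism TEXT,
                             query_coverage REAL, e_value REAL);
        CREATE TABLE pfam   (contig_id TEXT, pfam_id TEXT, pfam_name TEXT);
        CREATE TABLE kegg   (contig_id TEXT, ko_id TEXT, definition TEXT, pathway TEXT);
        CREATE TABLE go     (contig_id TEXT, go_id TEXT, go_term TEXT);
        CREATE TABLE fpkm   (contig_id TEXT, condition TEXT, value REAL);
        """
    )
    return conn


conn = create_annotation_db()
# Example row taken from the BLASTX table slide; FPKM values from the one-sheet slide.
conn.execute(
    "INSERT INTO blastx VALUES (?, ?, ?, ?, ?, ?)",
    ("Contig 1", "BAD74118.1", "elongation factor-1 alpha (EF-1alpha)",
     "Pelodiscus sinensis", 97.0, 0.0),
)
conn.executemany(
    "INSERT INTO fpkm VALUES (?, ?, ?)",
    [("Contig 1", "cond1", 190.0), ("Contig 1", "cond2", 200.0), ("Contig 1", "cond3", 3.0)],
)
rows = conn.execute(
    "SELECT b.contig_id, b.hit_id, f.condition, f.value "
    "FROM blastx b JOIN fpkm f ON f.contig_id = b.contig_id "
    "ORDER BY f.condition"
).fetchall()
```

Tables that can hold several rows per contig (Pfam, GO, KEGG pathways, FPKM per condition) use the contig ID as a shared key rather than a unique one in this sketch.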

Query 1: Text-based Approach
Entry points: Full-text search on annotation tables | Sequence search / BLAST | Library compare
Search term shown: Immun
Result: detail for each contig
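A full-text query over the annotation tables amounts to a case-insensitive substring match on every annotation field. The sketch below assumes the annotations are already loaded as dictionaries. The second row is made-up data included only so that a search for "Immun" has something to match.

```python
def search_annotations(rows, keyword):
    """Return contig IDs whose annotation fields contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        row["contig_id"]
        for row in rows
        if any(needle in str(value).lower()
               for key, value in row.items() if key != "contig_id")
    ]


annotation_rows = [
    # From the BLASTX table slide.
    {"contig_id": "Contig 1", "blastx": "elongation factor-1 alpha (EF-1alpha)",
     "pfam": "PF00009/GTP_EFTU"},
    # Hypothetical row, for illustration only.
    {"contig_id": "Contig 9", "blastx": "immunoglobulin light chain", "pfam": "-"},
]
hits = search_annotations(annotation_rows, "Immun")
```

Each returned contig ID would then link to the per-contig detail page.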

Query 2: By Sequences
Entry points: Full-text search on annotation tables | Sequence search / BLAST | Library compare
Programs: BLASTn / megablast / tBLASTx
Worm Contigs
Reference code: http://sysbio.iis.sinica.edu.tw/page/blast.php

BLAST Result: Detail for Each Contig

Detail for Each Contig: InterPro / Pfam

Query 3: Library Comparison
Entry points: Full-text search on annotation tables | Sequence search / BLAST | Library compare
Dynamic comparison like DDD: choose Pool A and Pool B, submit, and get a P-value

Table for BLASTX output (DB: NR)
Query coverage = Matched length / Query length
Query_ID | Hit ID | Hit_annotation | Hit_organism | Query coverage | E-value
Contig 1 | BAD74118.1 | elongation factor-1 alpha (EF-1alpha) | Pelodiscus sinensis | 97% | 0.0
Contig 2 | | | | |
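The query coverage column follows directly from the definition on the slide. A minimal sketch, with hypothetical lengths chosen to reproduce the 97% shown for Contig 1:

```python
def query_coverage(matched_length, query_length):
    """Query coverage in percent = matched length / query length * 100."""
    if query_length <= 0:
        raise ValueError("query length must be positive")
    return 100.0 * matched_length / query_length


# Hypothetical lengths: 1,455 matched positions on a 1,500-position query.
coverage = query_coverage(1455, 1500)
coverage_label = f"{coverage:.0f}%"
```

Both lengths must be measured in the same units (both in nucleotides or both in residues) for the ratio to be meaningful.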

Table for KEGG
Primary key: #seq_id
#seq_id | hit_seq | alignment_length | identity (%) | e_value | KO_ID | Definition | pathway | Note
comp3_c0_seq1 | xla:386604 | 449 | 0.84 | | K03231 | elongation factor 1-alpha | ko03013 RNA transport; ko05134 Legionellosis |
Tables for Pfam & GO: as the output from each program

The Result in One Sheet
Contig | Annotation from BLASTx | Results of Pfamscan | GO | KEGG_KO | KEGG Pathway | FPKM_cond1 | FPKM_cond2 | FPKM_cond3
Contig 1 | BAD74118.1/ elongation factor-1 alpha (EF-1alpha) [Pelodiscus sinensis] | PF00009/GTP_EFTU; PF00010/ xxxxxxxx | GO:0003924 GTPase activity; GO:0005525 GTP binding | K03231/galactose oxidase | ko00052 Galactose metabolism | 190 | 200 | 3
Contig 2 | - | PF00067.17/ p450 | | | | 378 | 22 | 1000
Contig 3 | CCCC | PPPP | | | | 333 | 45 | 31
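Building the single sheet is a merge of the per-program outputs on the contig ID, with a placeholder where a program returned nothing. The sketch below uses plain dictionaries and a subset of the columns. The helper name, the field names, and the "-" placeholder convention are assumptions, and the values are the example entries from the slide.

```python
def build_sheet(contig_ids, blastx, pfam, fpkm):
    """Merge per-program annotation dicts into one row per contig."""
    sheet = []
    for contig_id in contig_ids:
        cond1, cond2, cond3 = fpkm.get(contig_id, ("-", "-", "-"))
        sheet.append({
            "contig": contig_id,
            "blastx": blastx.get(contig_id, "-"),  # "-" marks a missing annotation
            "pfam": pfam.get(contig_id, "-"),
            "fpkm_cond1": cond1,
            "fpkm_cond2": cond2,
            "fpkm_cond3": cond3,
        })
    return sheet


blastx = {"Contig 1": "BAD74118.1/ elongation factor-1 alpha (EF-1alpha) [Pelodiscus sinensis]"}
pfam = {"Contig 1": "PF00009/GTP_EFTU", "Contig 2": "PF00067.17/ p450"}
fpkm = {"Contig 1": (190, 200, 3), "Contig 2": (378, 22, 1000)}

sheet = build_sheet(["Contig 1", "Contig 2"], blastx, pfam, fpkm)
```

GO and KEGG columns would be merged the same way, each from its own program output.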

Library Compare: 0 hr, 24 hrs, 48 hrs

The Way of Redundancy Reduction
Input: 700 million reads
1st Trinity run: 500,000 genes
Refinement: abundance sorting; mapping by Bowtie2 (LAST?), pick the longest one as the reduced set
Final set: 48,000 genes
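The "pick longest one" step can be sketched as below. In the slide the groups come from mapping (Bowtie2 or LAST). As a stand-in, this sketch groups by Trinity-style IDs such as comp3_c0_seq1, treating everything before the _seqN suffix as the gene. That grouping rule is an assumption, and the sequences are toy data.

```python
import re


def pick_longest_per_gene(contigs):
    """Keep only the longest sequence for each gene.

    contigs maps a contig ID (e.g. "comp3_c0_seq1") to its sequence.
    The gene key is the ID with the trailing "_seqN" removed.
    """
    best_id_for_gene = {}
    for contig_id, sequence in contigs.items():
        gene = re.sub(r"_seq\d+$", "", contig_id)
        current = best_id_for_gene.get(gene)
        if current is None or len(sequence) > len(contigs[current]):
            best_id_for_gene[gene] = contig_id
    return {contig_id: contigs[contig_id] for contig_id in best_id_for_gene.values()}


toy_contigs = {
    "comp3_c0_seq1": "ACGT",
    "comp3_c0_seq2": "ACGTAC",
    "comp7_c0_seq1": "AC",
}
reduced_set = pick_longest_per_gene(toy_contigs)
```

Applied to the full assembly, a filter of this kind is what shrinks the first Trinity output toward the final set.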