RNA Sequencing I: De novo RNAseq


RNA Sequencing I: De novo RNAseq P. Tang (鄧致剛); RRC. Gan (甘瑞麒) Bioinformatics Center, Chang Gung University.

Why Measure Gene Expression? A unique set of genes is expressed under each growth condition and at each stage.

Experimental Workflow: De novo Transcriptome Analysis; Transcriptome Analysis with Reference; cDNA/RNA fragment

Library Preparation vs Sequencing Randomness
During a transcriptome experiment, the mRNA/cDNA is fragmented by physical or chemical methods. If the fragmentation is not sufficiently random, reads are generated more frequently from specific regions of the original transcripts, and the downstream analysis is affected.
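
One way to look at fragmentation randomness is to bin read start positions along a transcript and compare the fraction of reads in each bin. The following is a minimal sketch, not a method from the slides: the function name, the 0-based start coordinates, and the default of 10 bins are assumptions made for illustration.

```python
def positional_read_fractions(read_starts, transcript_length, n_bins=10):
    """Fraction of reads that start in each equal-width bin along a transcript.

    read_starts are assumed to be 0-based positions on the transcript.
    Roughly equal fractions suggest random fragmentation; a few dominant
    bins suggest that reads pile up in specific regions.
    """
    if transcript_length <= 0 or n_bins <= 0:
        raise ValueError("transcript_length and n_bins must be positive")
    counts = [0] * n_bins
    for start in read_starts:
        if not 0 <= start < transcript_length:
            raise ValueError(f"read start {start} lies outside the transcript")
        # Integer arithmetic maps position 0..length-1 onto bin 0..n_bins-1.
        counts[start * n_bins // transcript_length] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [count / total for count in counts]


if __name__ == "__main__":
    fractions = positional_read_fractions([0, 100, 250, 500, 999], 1000, n_bins=4)
    print(fractions)
```

In practice the fractions would be pooled over many transcripts before drawing any conclusion, since a single transcript rarely has enough reads.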

De novo Transcriptome Sequencing
Assembly is the only option when working with an organism that has no genome sequence; the resulting contigs may then be aligned to ESTs, cDNAs, etc.
Workflow: RNAseq reads → filter clean reads → de novo assembly → contigs → functional annotation
Filter clean reads:
- Remove reads containing adaptors
- Remove reads in which unknown bases are more than 5%
- Remove low-quality reads (more than half of the bases have a quality below 5)
Functional annotation:
- BLASTx against NCBI nr
- BLASTx against UniProt
- Protein domain/motif search
- Gene Ontology
- KEGG
- Specific databases
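
The three read-filtering rules can be sketched as a single predicate. This is an illustration under stated assumptions rather than the pipeline used for the slides: qualities are assumed to be Phred+33 encoded, adaptors are detected by exact substring match, and the adaptor sequence in the example is a placeholder.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def is_clean_read(sequence, quality_string, adaptors,
                  max_unknown_fraction=0.05, min_base_quality=5):
    """Return True if a read passes the three filters listed on the slide."""
    if not sequence or len(sequence) != len(quality_string):
        return False
    seq = sequence.upper()
    # Rule 1: drop reads containing an adaptor (exact match; an assumption).
    if any(adaptor and adaptor.upper() in seq for adaptor in adaptors):
        return False
    # Rule 2: drop reads in which unknown bases (N) exceed 5%.
    if seq.count("N") / len(seq) > max_unknown_fraction:
        return False
    # Rule 3: drop reads where more than half of the bases have quality < 5.
    low_quality = sum(1 for q in phred33_scores(quality_string)
                      if q < min_base_quality)
    if low_quality > len(seq) / 2:
        return False
    return True


if __name__ == "__main__":
    adaptors = ["AGATCGGAAGAGC"]  # placeholder adaptor sequence
    print(is_clean_read("ACGTACGTAC", "IIIIIIIIII", adaptors))
    print(is_clean_read("ACGTNACGTA", "IIIIIIIIII", adaptors))
```

Real adaptor trimmers tolerate mismatches and partial adaptors at read ends; exact matching is used here only to keep the sketch short.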

De novo Assemblers
- Velvet: http://www.ebi.ac.uk/~zerbino/velvet/
- Maq: http://maq.sourceforge.net/
- SOAPdenovo: http://soap.genomics.org.cn/

Parameters for Assembly
Important parameters:
1. Percentage of overlap: 100%, 80%, 50%, 20%?
2. Percentage of allowed mismatches: 10% or 20%?
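
To make the two parameters concrete, the sketch below tests whether the end of one read overlaps the start of another. It is a toy suffix/prefix comparison and not how Velvet, Maq, or SOAPdenovo work internally; the function name and the choice to express overlap as a percentage of the shorter read are assumptions.

```python
def best_overlap(left, right, min_overlap_pct=50.0, max_mismatch_pct=10.0):
    """Length of the longest suffix of `left` that matches a prefix of `right`.

    An overlap is accepted when it spans at least min_overlap_pct of the
    shorter read and has at most max_mismatch_pct mismatching bases.
    Returns 0 when no acceptable overlap exists.
    """
    shorter = min(len(left), len(right))
    for overlap in range(shorter, 0, -1):
        if overlap * 100 < min_overlap_pct * shorter:
            break  # every remaining candidate is too short
        suffix = left[-overlap:]
        prefix = right[:overlap]
        mismatches = sum(1 for a, b in zip(suffix, prefix) if a != b)
        if mismatches * 100 <= max_mismatch_pct * overlap:
            return overlap
    return 0


if __name__ == "__main__":
    print(best_overlap("ACGTACGTAC", "ACGTACTTTT", min_overlap_pct=50))
    print(best_overlap("ACGTACGTAC", "ACGTACTTTT", min_overlap_pct=80))
```

Raising the overlap percentage or lowering the mismatch percentage makes joins stricter, which gives fewer but more reliable contigs; loosening them has the opposite effect.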

Assembled/Aligned Reads
Per contig/gene:
- Total reads in a contig/gene (mapped reads)
- Forward reads
- Reverse reads
- Non-specific reads
- Non-perfect reads
- Unique reads (total reads minus non-specific reads)

Gene Expression Annotation
Gene coverage: the percentage of a gene that is covered by reads. It equals the ratio of the number of bases in a gene covered by uniquely mapping reads to the total number of bases in that gene.
Gene expression levels: Unigene expression is calculated with the RPKM method (Reads Per kb per Million reads). RPKM removes the influence of differences in gene length and sequencing depth on the calculated expression, so the values can be used directly to compare gene expression among samples.
RPKM = (10^6 × C) / (N × L / 10^3), where C = number of reads that uniquely aligned to gene A, N = total number of reads that uniquely aligned to all genes, and L = number of bases in gene A.
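
Both quantities are simple enough to compute directly. The sketch below uses the standard RPKM definition with the slide's symbols C, N, and L; the function names and the 0-based, half-open interval convention for read positions are assumptions.

```python
def rpkm(c, n, l):
    """RPKM = (10^6 * C) / (N * L / 10^3).

    c: reads uniquely aligned to the gene
    n: total reads uniquely aligned to all genes
    l: number of bases in the gene
    """
    if n <= 0 or l <= 0:
        raise ValueError("n and l must be positive")
    return (1e6 * c) / (n * l / 1e3)


def gene_coverage(gene_length, unique_read_intervals):
    """Percentage of gene bases covered by at least one uniquely mapped read.

    Intervals are assumed to be 0-based, half-open (start, end) pairs in
    gene coordinates; parts outside the gene are ignored.
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    covered = [False] * gene_length
    for start, end in unique_read_intervals:
        for pos in range(max(start, 0), min(end, gene_length)):
            covered[pos] = True
    return 100.0 * sum(covered) / gene_length


if __name__ == "__main__":
    # 500 unique reads on a 2 kb gene, out of 10 million unique reads in total
    print(rpkm(500, 10_000_000, 2000))
    print(gene_coverage(100, [(0, 30), (20, 50), (90, 120)]))
```

In the first example the gene receives 250 reads per kb, which at 10 million total reads gives 25 reads per kb per million.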

Sense vs Anti-sense Transcripts (figure panels: Human, Mouse)

BLAST
- E-value
- Score
- % Identity
- % Length
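
These four values are what one typically thresholds on when assigning annotations. The sketch below parses BLAST tabular output (the default 12 columns of `-outfmt 6`) and filters hits. The cut-offs, the sequence identifiers, and the reading of "% Length" as the aligned share of the query are assumptions, not values from the slides.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    query: str
    subject: str
    pct_identity: float
    evalue: float
    bitscore: float
    pct_query_length: float  # aligned part of the query, in percent


def parse_hits(lines, query_lengths):
    """Parse default tabular BLAST output: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        qstart, qend = int(f[6]), int(f[7])
        aligned = abs(qend - qstart) + 1  # query coordinates, either strand
        hits.append(BlastHit(f[0], f[1], float(f[2]), float(f[10]),
                             float(f[11]),
                             100.0 * aligned / query_lengths[f[0]]))
    return hits


def filter_hits(hits, max_evalue=1e-5, min_identity=30.0, min_pct_length=50.0):
    """Keep hits that pass all three illustrative thresholds."""
    return [h for h in hits
            if h.evalue <= max_evalue
            and h.pct_identity >= min_identity
            and h.pct_query_length >= min_pct_length]


if __name__ == "__main__":
    example = [
        "contig1\tP12345\t85.00\t100\t15\t0\t1\t300\t1\t100\t1e-50\t200",
        "contig2\tQ99999\t25.00\t40\t30\t0\t10\t129\t5\t44\t0.5\t20.1",
    ]
    lengths = {"contig1": 400, "contig2": 600}  # hypothetical contig lengths
    for hit in filter_hits(parse_hits(example, lengths)):
        print(hit)
```

Using query coordinates for the length percentage keeps the calculation valid for BLASTx, where the alignment length column counts amino acids while the contig length is in nucleotides.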

Stand-alone BLAST http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
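
Once stand-alone BLAST is installed, the annotation search can be scripted. The sketch below only builds the command line. It assumes the BLAST+ suite (the `blastx` program) rather than the legacy `blastall`, and the file names, database name, and E-value cut-off are placeholders.

```python
def build_blastx_command(query_fasta, database, output_path,
                         evalue=1e-5, threads=1):
    """Build a BLAST+ blastx command line that writes tabular output."""
    return [
        "blastx",
        "-query", query_fasta,      # assembled contigs (nucleotide FASTA)
        "-db", database,            # protein database, e.g. a local copy of nr
        "-evalue", str(evalue),     # report only hits at or below this E-value
        "-outfmt", "6",             # tab-separated output, one hit per line
        "-num_threads", str(threads),
        "-out", output_path,
    ]


if __name__ == "__main__":
    command = build_blastx_command("contigs.fa", "nr", "contigs_vs_nr.tsv",
                                   threads=4)
    print(" ".join(command))
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where BLAST+ and a formatted protein database are available.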

UniProt
- UniProtKB
- UniRef100
- UniRef90
- UniRef50

Gene Ontology

KEGG

Transcriptome Sequencing with Reference: to be continued