Differential Expression of RNA-Seq Data

Slides:

Advertisements

Similar presentations

SHI Meng. Abstract The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants,

Advertisements

Processing of miRNA samples and primary data analysis

DEG Mi-kyoung Seo.

Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520

Detecting Differentially Expressed Genes Pengyu Hong 09/13/2005.

Statistical Analysis of Microarray Data

GCB/CIS 535 Microarray Topics John Tobias November 8th, 2004.

Timed. Transects Statistics indicate that overall species Richness varies only as a function of method and that there is no difference between sites.

RNA-seq Analysis in Galaxy

Brief workflow RNA is isolated from cells, fragmented at random positions, and copied into complementary DNA (cDNA). Fragments meeting a certain size specification.

Wfleabase.org/docs/tileMEseq0905.pdf Notes and statistics on base level expression May 2009Don Gilbert Biology Dept., Indiana University

Introduction to DESeq and edgeR packages Peter A.C. ’t Hoen.

RNAseq analyses -- methods

Transcriptome Analysis

Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.

A A R H U S U N I V E R S I T E T Faculty of Agricultural Sciences Introduction to analysis of microarray data David Edwards.

Statistical analysis of expression data: Normalization, differential expression and multiple testing Jelle Goeman.

1 Identifying differentially expressed genes from RNA-seq data Many recent algorithms for calling differentially expressed genes: edgeR: Empirical analysis.

Back to basics – Probability, Conditional Probability and Independence Probability of an outcome in an experiment is the proportion of times that.

Introduction to Statistical Analysis of Gene Expression Data Feng Hong Beespace meeting April 20, 2005.

Suppose we have T genes which we measured under two experimental conditions (Ctl and Nic) in n replicated experiments t i * and p i are the t-statistic.

Comp. Genomics Recitation 10 4/7/09 Differential expression detection.

No reference available

Variability & Statistical Analysis of Microarray Data GCAT – Georgetown July 2004 Jo Hardin Pomona College

Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.

Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.

A Quantitative Overview to Gene Expression Profiling in Animal Genetics Armidale Animal Breeding Summer Course, UNE, Feb Analysis of (cDNA) Microarray.

Canadian Bioinformatics Workshops

Micro array Data Analysis. Differential Gene Expression Analysis The Experiment Micro-array experiment measures gene expression in Rats (>5000 genes).

Bioinformatics for biologists (2) Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented.

RNA-Seq Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520

Bioinformatics core facility, OUS/UiO

Statistics Behind Differential Gene Expression

Simon v RNA-Seq Analysis Simon v

RNA Quantitation from RNAseq Data

Moderní metody analýzy genomu

Biases and their Effect on Biological Interpretation

apeglm: Shrinkage Estimators for Differential Expression of RNA-Seq

Next – generation Transcriptome Analysis Workshop

Gene expression from RNA-Seq

RNA-Seq analysis in R (Bioconductor)

Tutorial 6 : RNA - Sequencing Analysis and GO enrichment

The RNA-Seq Bid Idea: Statistical Design and Analysis for RNA Sequencing Data The RNA-Seq Big Idea Team: Yaqing Zhao1,2, Erika Cule1†, Andrew Gehman1,

Differential Gene Expression

Understanding and Validating Experimental Expectations

Additional file 8: Estimation of biological variations

Figure 1. Effect of acute TNF treatment on transcription in human SGBS adipocytes as assessed by RNA-seq and RNAPII ChIP-seq. Following 10 days in vitro.

Day 4 Session 22: Questions and follow-up…. James C. Fleet, PhD

Significance Analysis of Microarrays (SAM)

Analysing ChIP-Seq Data

Artefacts and Biases in Gene Set Analysis

Comparative Analysis of Single-Cell RNA Sequencing Methods

Significance Analysis of Microarrays (SAM)

Assessing changes in data – Part 2, Differential Expression with DESeq2

Artefacts and Biases in Gene Set Analysis

Working with RNA-Seq Data

The DHX15–CMTR1 interaction inhibits translation of a subset of genes.

How do you know if the variation in data is the result of random chance or environmental factors? O is the observed value E is the expected value.

Volume 7, Issue 3, Pages e12 (September 2018)

NMDS clustering of sample types and differential expression analysis.

Global Hypertranscription in the Mouse Embryonic Germline

Additional file 2: RNA-Seq data analysis pipeline

Volume 10, Issue 2, Pages (August 2011)

Quantitative analyses using RNA-seq data

Sequence Analysis - RNA-Seq 2

ACF1 loss perturbs gene expression in early embryos.

Statistics for genomics

Additional file 4: Comparison of RNA-Seq and RT-qPCR expression analyses of candidate genes between RIL165 and RIL387.

Gene expression profiles of T cells.

RNA-Seq Data Analysis UND Genomics Core.

Presentation transcript:

Differential Expression of RNA-Seq Data UND Genomics Core

Differential gene expression in R DESeq2 https://bioconductor.org/packages/release/bioc/html/DESeq2.html edgeR https://bioconductor.org/packages/release/bioc/html/edgeR.html To get help https://support.bioconductor.org/ Both require count data, from featureCounts, not cufflinks

Steps in Differential Expression Analysis Normalize read counts Calculate Dispersion QC Statistical testing

Normalization Both DESeq2 and edgeR only account for factors that influence read counts between samples Sequencing depth RNA composition RNA composition bias occurs when few transcripts represent a large portion of the reads resulting in other transcripts being underestimated

RNA composition bias Both samples are sequenced at same depth. Sample A has 5 genes expressed at the same level. Sample B has the same genes as sample A plus two highly expressed genes that take up 50% of the reads Genes appear to be down regulated when they’re not

Dispersion Dispersion is a measure of variability Two sources of variability in RNA-Seq experiments: Technical variation – This is the variation that would happen if you were to re-sequence a sample Biological variation- differences between different samples Both DESEq2 and edgeR share information between genes to get better dispersion estimates

Dispersion DESeq2

Dispersion EdgeR Biological CV % variation of read counts around replicates Expected Values: ~ 10% for experiments with using cell-lines or genetically identical organisms ~ 40% for experiments using human samples.

QC - PCA

Batch effects Control Treat

QC -Multidimensional Scaling

DE testing For each gene, test if average gene expression in Condition A is significantly different than the average gene expression in Condition B Gene ID A1 A2 B1 B2 0610005C13Rik 5 4 2 0610007P14Rik 117 119 82 83 0610009L18Rik 39 40 30 22 0610009O20Rik 347 303 164 126

DESeq2 results baseMean: The average of normalized counts across all samples. log2FoldChange: The fold change for specific comparison in log2 scale. lfcSE: The standard error of the log2 Fold Change estimate. stat: The test statistic. Equals log2FoldChange/lfcSE pvalue: P-value for that gene. padj: P-value adjusted for multiple comparisons.

edgeR results table LogFC Fold Change between Condition A and Condition B in log2 scale LogCPM Average log2 Counts per Million across Condition A and B. PValue unadjusted p-value for each gene FDR p-value adjusted for multiple comparisons. Default is 0.05

Why adjust for multiple testing? P-value is the probability that the observed difference is due to chance. For each test, a significance level of 0.05 = 5% error that the test result is significant by chance Testing 30,000 genes, 30000 x 0.05 = 1500 1500 genes could be significantly DE due to chance alone Control these false discoveries with the False Discovery Rate (FDR)

You try! Follow along with an Differential expression analysis using DESeq2 described in the file DESeq2_handout.docx