Why weight? Variance modelling for designed RNA-seq experiments

Presentation transcript:

Why weight? Variance modelling for designed RNA-seq experiments
Matthew Ritchie, Walter + Eliza Hall Institute
ABiC, 11th October 2014

Abstract: Outlier samples are relatively common in RNA-seq experiments and the root cause of such variation is generally unknown. In small experiments, the analyst is left with the difficult decision of what to do: removing the offending sample may reduce variation, but at a cost of reducing power, which can limit our ability to detect biologically meaningful changes. A compromise is to use all of the available data, but to down-weight the observations from the outlier sample in the analysis. In this poster we describe a statistical approach that allows this by modelling heterogeneity at both the sample and observational level in the differential expression analysis. Using both simulations and real data, we tease apart scenarios where this strategy leads to a more powerful analysis. Our approach is implemented in the open-source limma package available from Bioconductor (http://www.bioconductor.org).

RNA-seq in genomics research
Use of high-throughput sequencing (‘next-’ or ‘second’-generation) technologies to study gene expression
Many applications: differential expression, transcript discovery, alternative splicing, allele-specific expression
Focus today is on adapting methods for differential expression analysis to work well on messy data sets

RNA sequencing is becoming increasingly popular for the study of differential gene expression. The RNA-seq technology sequences short reads and aligns them back to the genome. The number of reads that map to a particular gene, exon, or other feature is recorded, giving data in the form of counts. ChIP-seq data also come in the form of counts, and although I’ll be talking about RNA-seq data, the methods mentioned may also apply to ChIP-seq data.

Table of counts
Gene ID A1 A2 B1 B2
ENSG00000124208 478 619 4830 7165
ENSG00000182463 27 20 48 55
ENSG00000125835 132 200 560 408
ENSG00000125834 42 60 131 99
ENSG00000197818 21 29 52 44
ENSG00000125831
ENSG00000215443 4 9 7
ENSG00000222008 30 23
ENSG00000101444 46 63 54 53
ENSG00000101333 2256 2793 2702 2976
… tens of thousands more …

A tale of two experiments…

[Figure: samples from the two experiments, with the outlier samples labelled (one of them as "outlier?").]
Remove samples? That discards 16% of the data in experiment 1 and 30% of the data in experiment 2.

Play the weighting game
High precision: higher weight
Low precision: lower weight
More variable observations are given lower weight in the differential expression analysis
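The effect of down-weighting can be illustrated with an inverse-variance weighted mean. This is only a minimal sketch: the values and variances below are hypothetical, and the function name `weighted_mean` is introduced here for illustration rather than taken from limma.

```python
def weighted_mean(values, variances):
    """Inverse-variance weighted mean: precise observations count for more."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total


# Hypothetical log2 expression values for one gene in three replicates.
# The third replicate is assumed to be four times as variable as the others.
values = [5.0, 5.2, 8.0]
variances = [1.0, 1.0, 4.0]

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, variances)
print(round(unweighted, 3), round(weighted, 3))
```

The weighted estimate sits closer to the two precise replicates, while the noisy replicate still contributes some information instead of being discarded.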

Voom: Mean-Variance trend
Observational level weights: deal with the trend in variability observed as abundance changes.
Available in the limma package from Bioconductor
Law et al. Genome Biology (2014)
[Figure: mean-variance trend. x-axis: mean (log2 cpm); y-axis: sqrt(standard deviation); fitted lowess line.]
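The quantities on the two axes can be sketched in a few lines of Python. This is a simplified illustration, not limma's implementation: the counts and library sizes are hypothetical, the standard deviation is taken across replicates of a single group rather than from a fitted linear model, and the lowess fit itself is omitted. The offsets of 0.5 and 1 in the log-cpm formula follow the voom paper, and because the trend is fitted on sqrt(standard deviation), a predicted variance is the fourth power of the trend value.

```python
import math
import statistics


def log2_cpm(counts, lib_sizes):
    """log2 counts per million, offsetting counts by 0.5 and library sizes by 1."""
    return [math.log2((c + 0.5) / (n + 1.0) * 1e6)
            for c, n in zip(counts, lib_sizes)]


def mean_variance_point(log_cpm_values):
    """One point of the trend plot: (mean log2 cpm, sqrt of the standard deviation)."""
    mean = statistics.mean(log_cpm_values)
    sd = statistics.stdev(log_cpm_values)
    return mean, math.sqrt(sd)


def precision_weight(predicted_sqrt_sd):
    """Inverse of the predicted variance, given a trend value on the sqrt(sd) scale."""
    return predicted_sqrt_sd ** -4


# Hypothetical counts for one gene in three replicates, with assumed library sizes.
counts = [478, 619, 530]
lib_sizes = [20_000_000, 25_000_000, 22_000_000]
point = mean_variance_point(log2_cpm(counts, lib_sizes))
print(point)
```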

Sample-specific weights
Assume: RNA-seq experiment has some replication
How well does each sample agree with the others?
1. For each gene: Deviation = X – Average(X)
2. Average the Deviations for each sample
3. Penalise samples for disagreeing with others
Ritchie et al. BMC Bioinformatics (2006)
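The three steps above can be sketched as follows. This is a deliberately simplified illustration and not the estimator used in limma: it assumes all samples are replicates of one condition, averages squared deviations (an assumption made here so that the result behaves like a variance), and takes the weight as the inverse of that average. The function name `sample_weights` and the data are hypothetical.

```python
def sample_weights(expr):
    """expr: one list per gene, holding log-expression values for each replicate.

    Returns one weight per sample, scaled so that the weights average 1.
    """
    n_samples = len(expr[0])
    mean_sq_dev = [0.0] * n_samples
    for gene in expr:
        avg = sum(gene) / n_samples               # step 1: Average(X)
        for j, x in enumerate(gene):
            # steps 1-2: squared deviation, averaged over genes
            mean_sq_dev[j] += (x - avg) ** 2 / len(expr)
    raw = [1.0 / d for d in mean_sq_dev]          # step 3: penalise disagreement
    scale = n_samples / sum(raw)
    return [w * scale for w in raw]


# Hypothetical data: three genes, four replicates; the fourth sample disagrees.
expr = [
    [5.0, 5.1, 4.9, 6.0],
    [8.0, 7.9, 8.1, 7.0],
    [3.0, 3.1, 2.9, 4.2],
]
weights = sample_weights(expr)
print([round(w, 3) for w in weights])
```

Note that the outlier also pulls the gene averages towards itself, so the well-behaved samples are penalised slightly too; a model-based estimator handles this more carefully.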

Modified algorithm to allow block/group structure, e.g. weights estimated separately for a single low-precision sample and for the remaining samples, which are assigned higher (equal) weights
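A block structure can be illustrated by pooling the per-sample deviations within each block, so that every sample in a block receives the same weight. As before, this is a rough sketch under stated assumptions (single-condition replicates, squared deviations, inverse-variance weights); `block_weights` and the block labels are hypothetical names, not limma arguments.

```python
def block_weights(expr, blocks):
    """expr: one list per gene of log-expression values; blocks: one label per sample.

    Samples sharing a label share a weight. Weights are scaled to average 1.
    """
    n_samples = len(blocks)
    sq_dev = [0.0] * n_samples
    for gene in expr:
        avg = sum(gene) / n_samples
        for j, x in enumerate(gene):
            sq_dev[j] += (x - avg) ** 2
    # Pool the squared deviations over the samples in each block.
    pooled = {}
    for label in set(blocks):
        members = [j for j in range(n_samples) if blocks[j] == label]
        pooled[label] = sum(sq_dev[j] for j in members) / len(members)
    raw = [1.0 / pooled[label] for label in blocks]
    scale = n_samples / sum(raw)
    return [w * scale for w in raw]


# Hypothetical data: the fourth sample forms its own low-precision block.
expr = [
    [5.0, 5.1, 4.9, 6.0],
    [8.0, 7.9, 8.1, 7.0],
    [3.0, 3.1, 2.9, 4.2],
]
blocks = ["rest", "rest", "rest", "outlier"]
weights = block_weights(expr, blocks)
print([round(w, 3) for w in weights])
```

Estimating two weights instead of four uses fewer parameters, which may matter in small experiments.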

Simulating data with different fold-changes

Simulating data with outlier samples
Key:
Group 1: samples 1, 2, 3
Group 2: samples 4, 5, 6
Outlier: sample 6
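A toy version of such a simulation is sketched below. The slides do not give the simulation settings, so every number here (gene count, fold-change, noise levels) is an assumption, and the data are generated as Gaussian noise on the log scale rather than as counts.

```python
import random


def simulate(n_genes=1000, n_de=100, log_fc=1.0, outlier_sd=3.0, seed=1):
    """Simulate log-expression for samples 1-6 (group 1: 1-3, group 2: 4-6).

    All parameter values are illustrative assumptions. Sample 6 is the
    outlier: its noise has standard deviation outlier_sd instead of 1.
    """
    rng = random.Random(seed)
    groups = [1, 1, 1, 2, 2, 2]
    noise_sd = [1.0, 1.0, 1.0, 1.0, 1.0, outlier_sd]
    data, is_de = [], []
    for gene in range(n_genes):
        de = gene < n_de  # the first n_de genes are truly differentially expressed
        baseline = rng.uniform(2.0, 10.0)
        row = []
        for group, sd in zip(groups, noise_sd):
            shift = log_fc if (de and group == 2) else 0.0
            row.append(baseline + shift + rng.gauss(0.0, sd))
        data.append(row)
        is_de.append(de)
    return data, is_de


data, is_de = simulate()
print(len(data), len(data[0]), sum(is_de))
```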

Weighted analyses lead to fewer false discoveries
Take the top 200 genes from each simulation and tally up the number of false discoveries
Key:
1. Voom weights
2. No weights
3. Sample weights
4. Block weights
5. Remove outlier
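The tally itself is simple to sketch. The example assumes that each gene has a score where larger means stronger evidence of differential expression (for instance an absolute t-statistic) and that the truth is known from the simulation; the function name and the data are hypothetical.

```python
def false_discoveries_in_top(scores, is_de, top_n=200):
    """Count genes among the top_n ranked by score that are not truly DE."""
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return sum(1 for g in ranked[:top_n] if not is_de[g])


# Tiny hypothetical example: five genes, inspect the top three.
scores = [9.1, 0.3, 7.5, 6.8, 0.9]
is_de = [True, False, False, True, False]
n_false = false_discoveries_in_top(scores, is_de, top_n=3)
print(n_false)
```

Running the same tally for each analysis strategy over many simulated data sets gives the kind of comparison the slide describes.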

Weights improve power to detect differential expression
Key:
1. Voom weights
2. No weights
3. Sample weights
4. Block weights
5. Remove outlier

Better rankings for known genes regulated by Smchd1

Protocadherins   Voom      Sample Weights   Block     Remove Outlier
Experiment 1     0.00135   0.00021          0.00008   0.000175
Experiment 2     0.0581    0.00614          0.0235    0.0707

Structural Maintenance of Chromosomes Hinge Domain containing 1 (Smchd1)
Has a role in X inactivation and in regulating monoallelic gene expression: genomic imprinting and regulation of the clustered protocadherins
Gene set testing using ROAST: Wu et al. Bioinformatics (2010)
Mould et al. Epigenetics & Chromatin (2013)
Gendrel et al. MCB (2013)

Better rankings for known genes regulated by Smchd1

Imprinted genes   Voom     Sample Weights   Block      Remove Outlier
Experiment 1      0.0231   0.000105         0.000115   0.00067
Experiment 2      0.0826   0.00817          0.0232     0.0355

Gene set testing using ROAST: Wu et al. Bioinformatics (2010)
Mould et al. Epigenetics & Chromatin (2013)
Gendrel et al. MCB (2013)

Summary: why weight?
Simulations show that combining voom with sample-level weights gives better results, with the lowest numbers of false positives and improved power
Better results are also obtained on real RNA-seq data
We allow flexibility in how the sample weights are assigned: sample-specific by default, with modelling of block/group-specific structure also possible
voomWithQualityWeights()
The limma package can incorporate weights at each stage of the analysis (differential expression and gene set testing) to deliver a more powerful analysis

Other Applications
Human tumor samples
Single cell RNA-seq

Acknowledgments Aliaksei Holik Marie-Liesse Asselin-Labat Stephen Wilcox Shalin Naik Jessica Tran Ben Kile Catherine Carmichael QIMR-Berghofer Graham Kay Funding Cynthia Liu Shian Su Andy Chen Gordon Smyth Toby Sargeant Natasha Jansz Kelan Chen Darcy Moore Jamie Gearing Huei San Leong Marnie Blewitt

A data hack for medical research problems. The weekend brings together software developers, user experience designers, data analysts and visualisers working directly with researchers to create better analysis tools. http://www.healthhack.com.au #healthhack http://au.okfn.org