Single cell RNAseq Kathie Mihindukulasuriya, PhD

Single cell RNAseq Kathie Mihindukulasuriya, PhD Senior Scientist, Cruchaga Lab Department of Psychiatry Washington University in St. Louis

Plan:
- Single cell RNA-seq vs bulk RNA-seq
- Current single cell protocols and platforms
- Processing single cell RNA-seq data
- Biology based analysis
- Current challenges in single cell RNA-seq processing and analysis

Bulk RNAseq vs single cell RNASeq

Bulk RNAseq
Advantages:
- More economical
- Deeper sequencing
Challenges:
- Averaged gene expression across thousands of cells (may lose key signals)
- Deconvolution may restore these signals (CIBERSORT, xCell)

Single cell RNAseq
Advantages:
- Can identify rare cell types
- Can identify transcriptional differences in different cell types
Challenges:
- Dropout problem
- Preserving the initial relative abundance of mRNA in a cell
- Data noisier and more complex than bulk RNAseq
- Technical noise and biological variation make analysis more challenging

Hybrid
- Use scRNAseq to estimate cell type proportions in bulk RNAseq (BSEQ-sc, Cell Population Mapping (CPM))

What are some types of questions that can be answered by scRNAseq?

scRNA-seq protocols

Full-length transcript sequencing approaches
Examples: Smart-seq2, Fluidigm C1 (96 cell)
Pluses:
- Increased number of mappable reads
- Suitable for: cell-type discovery, assessing tissue composition, allelic gene expression analysis, isoform discovery
Minuses:
- Cannot be multiplexed via sample pooling into a single tube for library generation (increased cost and labor)
- No UMIs (no digital quantification of transcripts)

3’ end or 5’ end approaches
Examples:
- 3’: Drop-seq, Seq-Well, Chromium, DroNC-seq, Fluidigm C1 (800 cell)
- 5’: STRT-seq
Pluses:
- UMIs (multiplexing of samples, improved gene expression quantification and throughput)
- Lower cost
Minuses:
- Not suitable for alternative splicing (AS) detection, allelic expression exploration and RNA-editing identification
- Less sensitivity than full-length

scRNA-seq platforms

Fluidigm C1 (microfluidics) / Fluidigm C1 mRNA Seq HT
Pros:
- Allows visual inspection of captured cells (can exclude empty wells and doublets)
- Customizable
- Lower false positives than tube-based technologies
- Less bias than tube-based technologies
- C1 = full-length transcript / HT = 3’
- 300–7,000 genes per cell
Cons:
- Only 2 inlets for samples
- Low throughput (up to 96 cells / 800 cells)
- >10,000 / 1,000 cells required for capture
- Relatively long prep time (2 runs per day)
- Capture efficiency depends on uniformity of cell size and shape
- High cost of cartridges
- Cells must be fresh and processed immediately

Droplet-based (10X Genomics Chromium)
Pros:
- Very high throughput
- Up to 8 samples per run
- System cost relatively low
- 500–1,500 genes per cell
Cons:
- Limited customizability (little control over cell input; susceptible to selection biases)

Plate methods (SMART-seq2)
Pros:
- Can simultaneously measure genomic DNA and transcriptome
- Not restricted by cell size, shape, homogeneity, or total numbers (suitable for very rare cell populations)
- Economical (uses off-the-shelf reagents)
- ~4,000–7,000 genes per cell
Cons:
- No UMIs and barcodes (no gene level quantification or multiplexing of samples)

Fluidigm C1

Droplet-based
Methods of single-cell isolation:
- Limiting dilution: not very efficient
- Micromanipulation: time consuming; low throughput
- FACS: highly purified single cells, IF cells express a cell surface marker
- Laser capture microdissection: isolates cells from solid samples
- Microfluidic technology: low sample consumption, low analysis cost, precise fluid control, decreased risk of external contamination
- CellSearch: antibody conjugated to magnetic particles to isolate desired cells; good for rare cell types

Droplet-based
Workflow: cell lysis -> reverse transcription into first-strand cDNA -> second-strand synthesis -> cDNA amplification
UMIs:
- 4–10 random nucleotides that are introduced with the primer used for cDNA generation, before amplification
- Multiple reads with the same UMI sequence that map to the same gene = one molecule
Cell barcodes:
- Labeling of cDNA by a cell-specific DNA sequence that allows multiplexing at an early stage
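The UMI rule above (reads sharing a UMI and a gene count as one molecule) can be sketched in a few lines. This is a minimal illustration, not the method of any particular pipeline: the function name, the tuple layout and the toy barcodes are assumptions, and it ignores sequencing errors in UMIs, which real tools usually correct for.

```python
from collections import defaultdict

def count_molecules(reads):
    """Collapse reads into molecule counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per
    aligned read (hypothetical layout). Reads that share cell barcode, UMI
    and gene are treated as PCR copies of a single original molecule.
    """
    umis_seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis_seen[(cell_barcode, gene)].add(umi)
    # The number of distinct UMIs is the molecule count.
    return {key: len(umis) for key, umis in umis_seen.items()}

# Toy reads with made-up barcodes and UMIs.
reads = [
    ("ACGT", "AAAC", "APOE"),
    ("ACGT", "AAAC", "APOE"),  # PCR duplicate of the read above
    ("ACGT", "GGTA", "APOE"),  # second APOE molecule in the same cell
    ("TTGA", "AAAC", "APOE"),  # same UMI, but a different cell
    ("ACGT", "AAAC", "GFAP"),  # same UMI, but a different gene
]
molecule_counts = count_molecules(reads)
```

Five reads collapse to four molecules here, because the first two reads differ only by PCR duplication.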

Plate-based Template Switching Oligonucleotide

Processing scRNA-seq data
- Map reads to genome, not transcriptome
  - Decreases multi-mapping reads
  - Critical for snRNA-seq
  - Splice-aware aligners (STAR)
  - Pseudoaligners (faster)
- Associate reads with genes or transcripts
  - featureCounts
  - HTSeq
- Remove PCR noise using UMIs
- Demultiplex to identify cells
- Remove barcodes from cell-free mRNA (much lower average read count than barcodes derived from intact cells)
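Because barcodes from cell-free mRNA carry far fewer reads than barcodes from intact cells, a count threshold can separate the two. The sketch below is only an illustration: the function name and the rule "keep barcodes with at least 10% of the largest barcode's count" are assumptions, not the procedure of any specific tool, and using the maximum as the reference is sensitive to outliers.

```python
def call_cells(counts_per_barcode, min_fraction=0.1):
    """Return the barcodes that look like intact cells.

    `counts_per_barcode` maps barcode -> total read (or UMI) count.
    A barcode is kept when its count reaches `min_fraction` of the largest
    count observed; this cutoff rule is a hypothetical simplification.
    """
    if not counts_per_barcode:
        return set()
    cutoff = max(counts_per_barcode.values()) * min_fraction
    return {bc for bc, n in counts_per_barcode.items() if n >= cutoff}

# Toy counts: three cell-like barcodes and two low-count background barcodes.
counts = {"AAAC": 12000, "AAAG": 9500, "ACCC": 7000, "AACT": 150, "AAGG": 80}
cells = call_cells(counts)
```

In practice the cutoff is usually chosen by inspecting the ranked barcode count curve rather than fixed in advance.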

Processing scRNA-seq data
- Remove low-quality ‘cells’ based on mapping statistics: overrepresentation of mitochondrial RNAs, ribosomal RNAs (>40%), spike-ins, adapters and/or reads that map outside of exons
- Normalization to correct for unwanted variation among cells caused by technical variation
- Remove batch effects
- Biology-based analysis (like differential expression)
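The first two steps above, filtering low-quality cells and normalizing, can be sketched as follows. This is a simplified example under stated assumptions: it applies the 40% figure to mitochondrial reads only, identifies mitochondrial genes by an "MT-" name prefix, and uses library-size scaling to 10,000 followed by log1p as the normalization. None of these choices comes from the slides beyond the 40% number, and the function names are hypothetical.

```python
import math

def mito_fraction(cell_counts, mito_prefix="MT-"):
    """Fraction of a cell's counts coming from mitochondrial genes."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in cell_counts.items() if gene.startswith(mito_prefix))
    return mito / total

def filter_and_normalize(cells, max_mito=0.4, scale=10_000):
    """Drop cells above the mitochondrial cutoff, then log-normalize the rest.

    `cells` maps cell id -> {gene: raw count}. Cutoff and scale factor are
    illustrative assumptions, not recommended defaults.
    """
    normalized = {}
    for cell_id, counts in cells.items():
        total = sum(counts.values())
        if total == 0 or mito_fraction(counts) > max_mito:
            continue  # low-quality 'cell': skip it
        normalized[cell_id] = {
            gene: math.log1p(n / total * scale) for gene, n in counts.items()
        }
    return normalized

# Toy data: cell2 has 70% mitochondrial counts and is removed.
cells = {
    "cell1": {"MT-CO1": 10, "APOE": 60, "GFAP": 30},
    "cell2": {"MT-CO1": 70, "APOE": 20, "GFAP": 10},
}
normalized = filter_and_normalize(cells)
```

Batch-effect removal is not covered here; it generally needs a dedicated method and information about which cells came from which batch.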

Some examples of biology-based analysis
Purpose: to directly investigate AD brain changes in cell proportion and gene expression using single cell resolution
Del-Aguila, J.L. et al. A single-nuclei RNA sequencing study of Mendelian and sporadic AD in the human brain. bioRxiv. Mar. 30, 2019. doi: http://dx.doi.org/10.1101/593756.

To identify different cell types in brain samples by a CGS approach (unsupervised graph-based clustering), with clusters then annotated by cell type using marker genes

t-distributed Stochastic Neighbor Embedding (tSNE) is a dimensionality reduction technique, usually shown as a plot.
Differences with PCA:
1. tSNE is typically run to produce a 2D (or 3D) separation
2. tSNE is non-deterministic (you won't get exactly the same output each time you run it, unless the random seed is fixed)
3. tSNE tends to cope better with non-linear signals in your data (less impact of outliers; visible separation between relevant groups is improved)
4. After tSNE, input features are no longer identifiable, and you cannot make any inference based only on the output of tSNE
NOTE: very computationally intensive (may need to apply another dimensionality reduction technique like PCA first)

To identify different cell types in brain samples:
Classic Gene Set (CGS) from pooled subjects:
Seurat FindVariableGenes -> 2,360 genes -> calculate 100 PCs -> identify the optimal number of PCs (65)
Result: 6 cell types, 25 clusters

To identify different cell types in brain samples:
Consensus Gene Set (ConGen) from each subject:
Seurat FindVariableGenes -> 2,447 (S1); 2,354 (S2); 1,972 (S3) genes -> set intersection in R to identify common genes (1,434) -> calculate 100 PCs -> identify the optimal number of PCs (25)
Result: 14 cell types; better resolution
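The consensus step, keeping only the variable genes shared by every subject, is a plain set intersection. The study did this in R; the Python sketch below shows the same idea, with a hypothetical function name and toy gene lists that are not the study's data.

```python
def consensus_gene_set(variable_genes_by_subject):
    """Return the genes that are highly variable in every subject.

    `variable_genes_by_subject` maps subject id -> iterable of gene names.
    """
    gene_sets = [set(genes) for genes in variable_genes_by_subject.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)

# Toy per-subject variable gene lists (illustrative only).
variable_genes = {
    "S1": ["APOE", "GFAP", "SNAP25", "MBP"],
    "S2": ["APOE", "GFAP", "MBP", "CLU"],
    "S3": ["GFAP", "MBP", "APOE", "AQP4"],
}
common_genes = consensus_gene_set(variable_genes)
```

Because only genes variable in all subjects survive, the consensus set is smaller than any single subject's list, as in the 1,434 common genes reported above.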

Cluster annotation
Evaluate the expression of marker genes (from the literature) for neurons, astrocytes, oligodendrocytes, microglia, oligodendrocyte precursor cells, endothelial cells, excitatory and inhibitory neurons
-> Seurat DotPlot to visualize the average gene expression for the marker genes in each cluster
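A dot plot of marker genes summarizes, for each cluster and gene, the average expression and the fraction of cells expressing the gene. The sketch below computes those two numbers in plain Python to make the idea concrete. It is not Seurat's implementation (which also scales the averages), and the function name and toy values are assumptions.

```python
from collections import defaultdict

def dotplot_summary(expression, clusters, markers):
    """Compute (mean expression, fraction of expressing cells) per cluster and gene.

    `expression` maps cell id -> {gene: normalized value};
    `clusters` maps cell id -> cluster label;
    `markers` is the list of marker genes to summarize.
    """
    cells_by_cluster = defaultdict(list)
    for cell, cluster in clusters.items():
        cells_by_cluster[cluster].append(cell)

    summary = {}
    for cluster, cells in cells_by_cluster.items():
        for gene in markers:
            values = [expression[cell].get(gene, 0.0) for cell in cells]
            mean_expr = sum(values) / len(values)
            frac_expressing = sum(v > 0 for v in values) / len(values)
            summary[(cluster, gene)] = (mean_expr, frac_expressing)
    return summary

# Toy data: cluster 0 looks astrocyte-like (GFAP), cluster 1 neuron-like (SNAP25).
expression = {
    "c1": {"GFAP": 4.0},
    "c2": {"GFAP": 2.0, "SNAP25": 0.0},
    "c3": {"SNAP25": 3.0},
    "c4": {"SNAP25": 5.0, "GFAP": 0.0},
}
clusters = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}
summary = dotplot_summary(expression, clusters, ["GFAP", "SNAP25"])
```

A cluster is then labeled with the cell type whose markers show both high average expression and a high fraction of expressing cells.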

Workflow Analysis Plan

Single cell analysis: current challenges
Biggest challenge: missing data (excess zeros), “dropout”
- Technical (not captured)
- Biological (really no expression)
- Sampling (just not deep enough sequencing)
- Can’t distinguish between these
- Dropout = largest source of variation
How to deal with missing data?
- Increase read depth
- Impute the missing data based on clustered cells (DrImpute, CIDR, MAGIC, scImpute)
- Impute the missing data based on bulk RNAseq data (SCRABBLE)
- Use biological knowledge: gene-gene coexpression (netNMF-sc)
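To make "impute based on clustered cells" concrete, here is a deliberately naive sketch that replaces each zero with the mean of that gene across the cell's cluster. It is not the algorithm of DrImpute, CIDR, MAGIC or scImpute. The function name is hypothetical, every cell is assumed to list the same genes, and every zero is treated as technical, which is exactly the assumption the slide warns cannot be verified.

```python
def impute_zeros_by_cluster(expression, clusters):
    """Replace zeros with the cluster-wide mean of the same gene.

    `expression` maps cell id -> {gene: value} (all cells list the same genes);
    `clusters` maps cell id -> cluster label. The mean is taken over all
    cells in the cluster, zeros included.
    """
    sums, n_cells = {}, {}
    for cell, profile in expression.items():
        cluster = clusters[cell]
        n_cells[cluster] = n_cells.get(cluster, 0) + 1
        for gene, value in profile.items():
            sums[(cluster, gene)] = sums.get((cluster, gene), 0.0) + value

    imputed = {}
    for cell, profile in expression.items():
        cluster = clusters[cell]
        imputed[cell] = {
            gene: value if value > 0 else sums[(cluster, gene)] / n_cells[cluster]
            for gene, value in profile.items()
        }
    return imputed

# Toy data: c1 and c2 share cluster "A"; c3 is alone in cluster "B".
expression = {
    "c1": {"APOE": 4.0, "GFAP": 0.0},
    "c2": {"APOE": 0.0, "GFAP": 6.0},
    "c3": {"APOE": 2.0, "GFAP": 0.0},
}
clusters = {"c1": "A", "c2": "A", "c3": "B"}
imputed = impute_zeros_by_cluster(expression, clusters)
```

Note that the zero in c3 stays zero, because no other cell in its cluster provides evidence of expression.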

Single cell analysis: current challenges
Explosion of methods and software, but not yet clear best practices
https://github.com/seandavi/awesome-single-cell
Doublet identification:
- demuxlet - [shell] - Multiplexed droplet single-cell RNA-sequencing using natural genetic variation
- DoubletFinder - [R] - Doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. BioRxiv
- DoubletDecon - [R] - Cell-State Aware Removal of Single-Cell RNA-Seq Doublets. BioRxiv
- DoubletDetection - [R, Python] - A Python3 package to detect doublets (technical errors) in single-cell RNA-seq count matrices. An R implementation is in development.
- Scrublet - [Python] - Computational identification of cell doublets in single-cell transcriptomic data. BioRxiv

Single cell analysis: current challenges
Assigning cell types to clusters of cells:
- Dimensionality reduction (tSNE, PCA, UMAP) -> unsupervised clustering -> annotation of clusters
- Use of marker genes
  - Known marker genes
  - Expression high enough to be measured (not always true for known cell surface markers)
  - Subjective (different researchers choose different markers)
  - Novel cell types?
- Use of annotated training data (e.g. reference atlas)
  - Comparisons with annotated reference data using automatically chosen genes that optimally discriminate between cell types (scmap, SingleR)
  - Allow the assignment of cells to an intermediate or unassigned type (CHETAH)
Challenge: human data often clusters by individual, rather than cell type
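Reference-based annotation can be pictured as comparing each cell's profile with a reference profile per cell type and refusing to assign a type when nothing matches well. The sketch below uses cosine similarity and a fixed threshold. It illustrates the general idea only and is not the algorithm of scmap, SingleR or CHETAH. The function names, the 0.7 threshold and the toy reference are assumptions.

```python
import math

def cosine_similarity(a, b, genes):
    """Cosine similarity of two expression profiles over a shared gene list."""
    dot = sum(a.get(g, 0.0) * b.get(g, 0.0) for g in genes)
    norm_a = math.sqrt(sum(a.get(g, 0.0) ** 2 for g in genes))
    norm_b = math.sqrt(sum(b.get(g, 0.0) ** 2 for g in genes))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def assign_cell_type(profile, reference, genes, min_similarity=0.7):
    """Return the best-matching reference cell type, or "unassigned".

    `reference` maps cell type -> average expression profile. A type is
    only assigned when its similarity reaches `min_similarity`
    (an illustrative threshold).
    """
    best_type, best_score = "unassigned", min_similarity
    for cell_type, centroid in reference.items():
        score = cosine_similarity(profile, centroid, genes)
        if score >= best_score:
            best_type, best_score = cell_type, score
    return best_type

genes = ["GFAP", "SNAP25", "MBP"]
reference = {
    "astrocyte": {"GFAP": 5.0},
    "neuron": {"SNAP25": 5.0},
    "oligodendrocyte": {"MBP": 5.0},
}
clear_call = assign_cell_type({"GFAP": 3.0, "SNAP25": 0.2}, reference, genes)
ambiguous_call = assign_cell_type(
    {"GFAP": 1.0, "SNAP25": 1.0, "MBP": 1.0}, reference, genes
)
```

The second cell resembles all three reference types equally, so it falls below the threshold and is left unassigned rather than forced into a type.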

Single cell analysis: current challenges
How to combine datasets for analysis:
- scmap: projection of single-cell RNA-seq data across data sets
- scMerge: uses genes that do not change across all samples and a robust algorithm to infer pseudoreplicates between datasets

Single cell analysis: current challenges
Look to advances in single cell RNA-seq cancer research for solutions to these problems