Lecture 7. Topics in RNA Bioinformatics (Single-Cell RNA Sequencing)


The Chinese University of Hong Kong
CSCI5050 Bioinformatics and Computational Biology

Lecture outline
- Single-cell sequencing: why and how
- Specifics about single-cell RNA sequencing
- Computational methods for processing and analyzing single-cell sequencing data
  - Focusing on single-cell RNA sequencing
Last update: 20-Feb-2018 CSCI5050 Bioinformatics and Computational Biology | Kevin Yip-cse-cuhk | Spring 2018

Part 1. Single-Cell Sequencing: Why and How

Samples involved in sequencing
Traditional: bulk samples
Reasons:
- Alternatives not available previously
- Relatively simple procedure
- Providing sufficient materials
Results:
- Mixture of many cells
- Superposition of data
- Losing cell-specific information
- Missing rare cell types
Image credit: Owens, Nature 491(7422):27-29, (2012)

Heterogeneity in bulk samples
Different cells may have heterogeneous sequences/activities:
- Different cell types
- Different sub-clones
- Different species
- ...

Heterogeneity examples: blood samples
Image credit: https://www.ncbi.nlm.nih.gov/pubmedhealth/PMHT0022042/?figure=1; Barreto et al., Journal of Pharmacy Practice 27(5):440-446, (2014)

Heterogeneity examples: blood samples
Image credit: Villani et al., Science 356(6335):eaah4573, (2017)

Heterogeneity examples: tumor heterogeneity
Image source: http://patogeralpunf.wixsite.com/generalpathology/neoplasms

Heterogeneity examples: metagenomics
Image source: https://teachthemicrobiome.weebly.com/sequencing-the-microbiome.html

Intermediate solution: multi-region sequencing
Questions:
- How many regions?
- Which regions?
- How to know whether the decisions are good?
- What if the sample is too small?
Image credit: Gerlinger et al., New England Journal of Medicine 366(10):883-892, (2012)

Single-cell sequencing
Pushing the multi-region sequencing idea to the extreme, individual single cells are sequenced.
Main difficulties:
- Isolating single cells
- DNA amplification
- Data processing
  - Quality control, error correction and bias removal

Single-cell isolation
Image credit: Hu et al., Frontiers in Cell and Developmental Biology 4:116, (2016)

Amplification
Example: whole-genome amplification
Image credit: Gawad et al., Nature Reviews Genetics 17(3):175-188, (2016)

Types
Single-cell...
- DNA sequencing
- RNA sequencing (scRNA-seq)
- ATAC-seq
- ChIP-seq
- Bisulfite sequencing
- Hi-C
- ...
Multiple types in the same cell

Types
Image credit: Clark et al., Genome Biology 17:72, (2016)

Single-cell RNA-seq
Image source: Wikipedia

Genome + transcriptome
DR-seq: DNA-seq and RNA-seq in the same cell
Image credit: Dey et al., Nature Biotechnology 33(3):285-289, (2015)

Methylome + transcriptome
scM&T-seq: BS-seq and RNA-seq in the same cell
Image credit: Angermueller et al., Nature Methods 13(3):229-232, (2016)

Throughput
Image credit: Svensson et al., arXiv:1704.01379v2, (2017)

Issues of resulting data
- Bias in captured cells
- Non-uniform amplification
- Mixing data from different protocols
- Amplification of errors
- Allele dropout
- Sampling bias of DNA fragments

Part 2. Computational Methods for Processing and Analyzing Single-Cell Sequencing Data

Processing pipeline
Image credit: Stegle et al., Nature Reviews Genetics 16(3):133-145, (2015)

Quantitative standards
- Spike-ins: artificial RNAs, or RNAs from another species, added in known quantity
  - External RNA Control Consortium (ERCC) set: 92 synthetic spikes based on bacterial sequences
- Unique molecular identifiers (UMIs): short (6-10 nt) DNA sequences for barcoding molecules of interest before amplification
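To make the UMI idea concrete, here is a minimal sketch of how reads could be collapsed into molecule counts. It assumes each aligned read has already been reduced to a (cell barcode, UMI, gene) tuple; the function name `count_umis` and the toy reads are hypothetical, and UMI sequencing errors are ignored.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs per (cell, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per
    aligned read. Reads amplified from the same original molecule carry
    the same UMI, so they collapse to a single count.
    """
    seen = defaultdict(set)
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Hypothetical toy input: in cellA, three reads come from one molecule
# and two reads from a second molecule of the same gene.
reads = [
    ("cellA", "ACGTAC", "GAPDH"),
    ("cellA", "ACGTAC", "GAPDH"),
    ("cellA", "ACGTAC", "GAPDH"),
    ("cellA", "TTGCAA", "GAPDH"),
    ("cellA", "TTGCAA", "GAPDH"),
    ("cellB", "ACGTAC", "GAPDH"),
]
umi_counts = count_umis(reads)

# Number of distinct UMI sequences available for a given UMI length.
umi_space = {length: 4 ** length for length in (6, 10)}

print(umi_counts)
print(umi_space)
```

In this toy input cellA has five reads but only two distinct UMIs for the gene, so the molecule count is 2 rather than 5. The size of the UMI space (4^n for length n) limits how many molecules of one gene in one cell can be told apart.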

Quality control
Standard steps for NGS/RNA-seq:
- Base quality
- Nucleotide composition
- k-mer counts
- Read trimming
- Read lengths
- Alignment rate
- Duplication rate
- Contamination
- Sample mix-up
- Batch effects
- Reproducibility based on replicates
- ...

Quality control
- Comparing with quantitative standards for evaluating biases
  - Amplification bias
  - 3’ bias
  - RNA degradation
- Checking the total number of aligned reads and proportion of spike-in reads
- Checking similarity among single cells
  - Looking for outliers
- Comparing with bulk sequencing results
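The spike-in proportion check and the outlier search can be sketched together as follows. This is only an illustration under stated assumptions: the per-cell read counts are made up, and the rule of flagging values more than 3 median absolute deviations (MADs) from the median is one possible choice, not one prescribed by the slides.

```python
from statistics import median

def spike_in_fraction(spike_reads, endogenous_reads):
    """Proportion of aligned reads that map to spike-in sequences."""
    total = spike_reads + endogenous_reads
    return spike_reads / total if total else float("nan")

def flag_outliers(values, n_mads=3.0):
    """Flag values more than n_mads MADs away from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: at least half of the values equal the median.
        return [v != med for v in values]
    return [abs(v - med) > n_mads * mad for v in values]

# Hypothetical (spike-in reads, endogenous reads) per cell.
cells = {
    "cell1": (100, 9900),
    "cell2": (120, 9880),
    "cell3": (90, 9910),
    "cell4": (110, 9890),
    "cell5": (4000, 1000),
}
fractions = [spike_in_fraction(s, e) for s, e in cells.values()]
flags = flag_outliers(fractions)
suspect = [name for name, flagged in zip(cells, flags) if flagged]
print(suspect)
```

Here only cell5 is flagged. An unusually high spike-in proportion is often taken as a sign that little endogenous RNA was captured, for example from a damaged cell or an empty well, so such cells are candidates for removal before downstream analysis.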

Quantification
Quantification measures such as RPKM and FPKM do not work well for scRNA-seq due to:
- Low read counts, large sampling error
- Dropouts
- Different cell sizes/transcript levels in different cells
- Additional types of bias
  - 3’ bias makes normalization by transcript length not appropriate
More common to use a certain form of normalized absolute count
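A small numeric sketch shows why length normalization can mislead under 3’ bias. RPKM is reads per kilobase of transcript per million mapped reads; counts per million (CPM) is used here as one example of a depth-normalized count without length scaling, which is an assumption about what "normalized absolute count" could mean. The gene lengths and counts are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def cpm(read_count, total_mapped_reads):
    """Counts per million mapped reads (no length normalization)."""
    return read_count * 1e6 / total_mapped_reads

# Hypothetical cell with 1,000,000 mapped reads. Two genes are present at
# the same number of molecules. With a 3'-biased protocol, reads come
# mostly from the 3' end, so both genes yield about the same read count
# regardless of transcript length.
total = 1_000_000
short_gene = {"length": 1000, "reads": 50}
long_gene = {"length": 4000, "reads": 50}

rpkm_short = rpkm(short_gene["reads"], short_gene["length"], total)
rpkm_long = rpkm(long_gene["reads"], long_gene["length"], total)
cpm_short = cpm(short_gene["reads"], total)
cpm_long = cpm(long_gene["reads"], total)

print(rpkm_short, rpkm_long)  # length scaling makes the long gene look 4x lower
print(cpm_short, cpm_long)    # equal, matching the equal molecule numbers
```

Dividing by length is appropriate when reads are spread along the whole transcript. When they are concentrated at the 3’ end, the same division penalizes long genes, which is the point the slide makes.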

Normalization
Strategies:
- Fraction of reads mapped to endogenous RNA: normalization across samples
- Size factor for spike-ins: adjusting for sequencing depth
- Size factor for endogenous RNAs: adjusting for cell size
- Number of distinct UMIs for each gene: unaffected by amplification bias
  - Further adjusting based on spike-ins
- Normalization across genes
  - If the focus is relative expression levels among cells
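As a rough sketch of the spike-in size factor strategy: if the same amount of spike-in RNA is added to every cell, differences in total spike-in counts mostly reflect technical factors such as sequencing depth. The simple "total divided by the mean total" definition below is an assumption chosen for clarity (published methods use more robust estimators), and all counts are made up.

```python
def size_factors(totals):
    """Scale each cell's total by the mean total across cells."""
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]

def normalize(counts_by_cell, factors):
    """Divide every count in a cell by that cell's size factor."""
    return [
        [count / factor for count in counts]
        for counts, factor in zip(counts_by_cell, factors)
    ]

# Hypothetical total spike-in counts for three cells that received the
# same amount of spike-in RNA but were sequenced at different depths.
spike_totals = [100, 200, 300]

# Hypothetical endogenous counts (two genes per cell).
endogenous = [
    [10, 4],
    [20, 8],
    [30, 12],
]

factors = size_factors(spike_totals)
normalized = normalize(endogenous, factors)
print(factors)
print(normalized)
```

In this constructed example the raw endogenous counts differ threefold between cells, yet after dividing by the spike-in size factors all three cells show the same values, i.e. the apparent differences were explained entirely by depth.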

Normalization
Image credit: Stegle et al., Nature Reviews Genetics 16(3):133-145, (2015)

Confounding factors
Image credit: Stegle et al., Nature Reviews Genetics 16(3):133-145, (2015)

Dimension reduction and clustering
t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Minimizing the KL-divergence between cell-cell similarity in the original space and the reduced (usually 2D) space
- Similarity between two cells in the original space: modeled by a Gaussian distribution
- Similarity between two cells in the reduced space: modeled by a Student-t distribution
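The objective described above can be evaluated directly on a toy example. The sketch below is a simplification: it uses one fixed Gaussian bandwidth `sigma` for all cells, whereas actual t-SNE tunes a per-cell bandwidth from a perplexity setting, symmetrizes conditional probabilities, and minimizes the divergence by gradient descent. Here the divergence is only computed for two hand-made embeddings; all coordinates are hypothetical.

```python
import math

def _normalize(weights):
    """Turn a matrix of pairwise weights into joint probabilities."""
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_affinities(points, sigma=1.0):
    """Cell-cell similarity in the original space (Gaussian kernel)."""
    n = len(points)
    w = [[0.0 if i == j else
          math.exp(-_sq_dist(points[i], points[j]) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    return _normalize(w)

def student_t_affinities(points):
    """Cell-cell similarity in the reduced space (Student-t, 1 d.o.f.)."""
    n = len(points)
    w = [[0.0 if i == j else 1.0 / (1.0 + _sq_dist(points[i], points[j]))
          for j in range(n)] for i in range(n)]
    return _normalize(w)

def kl_divergence(p, q):
    """KL(P || Q) over all pairs i != j, skipping zero-probability terms."""
    n = len(p)
    return sum(p[i][j] * math.log(p[i][j] / q[i][j])
               for i in range(n) for j in range(n)
               if i != j and p[i][j] > 0)

# Hypothetical data: four cells in 3D forming two tight groups.
original = [(0, 0, 0), (0.5, 0, 0.5), (5, 5, 5), (5.5, 5, 5.5)]
# A 2D embedding that keeps the groups together, and one that splits them.
good_2d = [(0, 0), (0.5, 0), (5, 5), (5.5, 5)]
bad_2d = [(0, 0), (5, 5), (0.5, 0), (5.5, 5)]

p = gaussian_affinities(original)
kl_good = kl_divergence(p, student_t_affinities(good_2d))
kl_bad = kl_divergence(p, student_t_affinities(bad_2d))
print(kl_good, kl_bad)
```

The embedding that keeps neighboring cells together gives a much smaller divergence than the one that separates them, which is the quantity t-SNE drives down when it moves points in the reduced space.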

t-SNE example
Image credit: Lake et al., Nature Biotechnology 36(1):70-80, (2018)

Comparing with other methods
Image source: http://satijalab.org/seurat/get_started_v1_2.html

Hierarchical clustering
Image credit: Navin et al., Nature 472(7341):90-94, (2011)

Consensus clustering
Some clustering methods are not very robust and could produce very different clusters with different:
- Parameter values
- Initializations
- Sampling of data points
- Randomness of the clustering procedure
One way to deal with it is to repeat with many settings in parallel and combine the results
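One common way to combine repeated runs is a consensus (co-association) matrix that records how often each pair of cells lands in the same cluster. The sketch below is an illustration, not the method of any particular tool: it uses a tiny one-dimensional k-means, varies only the random initialization, and runs on made-up expression values.

```python
import random

def kmeans_1d(values, k, rng, n_iter=20):
    """Plain Lloyd's k-means on scalar values with random initial centers."""
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def consensus_matrix(values, k, n_runs=50, seed=0):
    """Fraction of runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        labels = kmeans_1d(values, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[count / n_runs for count in row] for row in together]

# Hypothetical expression values of one marker gene in six cells.
values = [0.0, 0.2, 0.4, 10.0, 10.2, 10.4]
consensus = consensus_matrix(values, k=2)
print(consensus[0][1], consensus[0][3])
```

Because the matrix is built from pairwise co-membership, it does not matter that cluster labels are numbered differently from run to run. Entries near 1 or 0 indicate stable assignments, intermediate values indicate cells whose membership depends on the run, and the matrix can itself be clustered to obtain a final partition.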

Consensus clustering
Image credit: Kiselev et al., Nature Methods 14(5):483-486, (2017)

Mapping cell types/clusters across time
Image credit: Wang et al., Genome Research 27(11):1783-1794, (2017)

Pseudo-time trajectories
Ordering of cells (e.g., by polygonal reconstruction)
Image credit: Trapnell et al., Nature Biotechnology 32(4):381-386, (2014)
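One way to order cells without time labels is to connect them by a minimum spanning tree (MST) in a reduced expression space and read off the longest path through the tree. The sketch below illustrates that idea only; it is an assumption about how an ordering could be computed, not a reproduction of the cited method, and the 2D coordinates are hypothetical. Cells on side branches of the tree are simply left out of the returned path.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns adjacency lists."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    # best[j] = (distance to the tree, tree node achieving it)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    while best:
        j = min(best, key=lambda node: best[node][0])
        _, parent = best.pop(j)
        adj[j].append(parent)
        adj[parent].append(j)
        for m in best:
            d = math.dist(points[j], points[m])
            if d < best[m][0]:
                best[m] = (d, j)
    return adj

def farthest_node(adj, points, start):
    """Node farthest from `start` along tree edges, plus the parent map."""
    dist_to = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in dist_to:
                dist_to[v] = dist_to[u] + math.dist(points[u], points[v])
                parent[v] = u
                stack.append(v)
    return max(dist_to, key=dist_to.get), parent

def pseudotime_order(points):
    """Cell indices along the longest path of the MST (direction arbitrary)."""
    adj = minimum_spanning_tree(points)
    end_a, _ = farthest_node(adj, points, 0)
    end_b, parent = farthest_node(adj, points, end_a)
    path, node = [], end_b
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

# Hypothetical cells lying along one trajectory, listed in shuffled order.
cells_2d = [(3, 0.1), (0, 0), (4, 0), (1, 0.1), (2, -0.1)]
order = pseudotime_order(cells_2d)
print(order)
```

The returned path visits the cells in the order of their position along the trajectory, up to direction: which end counts as "early" has to be decided from biology, for example from known marker genes.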

Pseudo-time trajectories
Image credit: Kowalczyk et al., Genome Research 25(12):1860-1872, (2015)

Spatial mapping of single cells
vISH: virtual in situ hybridization
Image credit: Karaiskos et al., Science 358(6360):194-199, (2017)

Other analyses
- Identifying differentially expressed genes
- Identifying marker genes
- Identifying outlier cells
- Reconstructing regulatory networks
- Studying kinetics of transcription
  - Burst size and burst frequency
- Studying patterns of stochastic gene expression
- Correlating with other levels of information
  - Genetic variations
  - Allele-specific expression
  - DNA accessibility
  - DNA methylation
  - ...

Validation of analysis results
- Comparing with known cell type/stage-specific markers
- Expression in bulk samples
- FISH in individual cells
- Time-lapse microscopy data
- Within-cluster similarity

Summary
- High-throughput single-cell sequencing
  - Main challenges
    - Single-cell isolation
    - Amplification
    - Processing
  - Types
- Single-cell RNA sequencing
  - Quality check, error correction, bias removal
  - Downstream analyses