Working with RNA-Seq Data

Part 1: Working with RNA-Seq Data

RNA-seq: overview
[Figure: a stretch of genome (…TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA…) containing genes A, B, and C. In this example genes A and C are transcribed (two copies of transcript A, one of transcript C), and sequencing reads are sampled from these transcripts.]

RNA-seq: some details
[Figure: library preparation and sequencing illustrated on transcripts A and C: (1) shattering (fragmentation) of the transcripts; (2) adapter ligation; (3) PCR amplification; (4) "reading" (sequencing) of the fragments.]

RNA-seq: per-sample processing
- Preprocessing:
  - adapter removal plus additional trimming;
  - removal of PCR duplicates.
- Mapping, using one of three strategies:
  - mapping on the set of known transcripts;
  - mapping on the genome (with potential identification of novel transcripts);
  - a combined strategy.
- Quantification of expression levels.
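The adapter-removal step can be illustrated with a short Python sketch. It is a simplification under stated assumptions: exact matching only, a single 3' adapter, and `AGATCGGAAGAGC` (the start of the widely used Illumina TruSeq adapter) as an example sequence. Dedicated trimmers such as Cutadapt also tolerate sequencing errors and trim low-quality bases, which this sketch does not. The function name `trim_3prime_adapter` is hypothetical.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    Two situations are handled:
    1. the full adapter occurs inside the read (short insert), so everything
       from the adapter onward is cut;
    2. only the first k bases of the adapter appear at the very end of the
       read (k >= min_overlap), so those k bases are cut.
    """
    # Case 1: full adapter found somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: longest adapter prefix that is also a suffix of the read.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix; use the one from your library kit
print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTAGATC", ADAPTER))             # ACGTACGT
print(trim_3prime_adapter("ACGTACGTACGT", ADAPTER))              # ACGTACGTACGT
```

The `min_overlap` parameter guards against cutting one or two bases that match the adapter start purely by chance.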

RNA-seq: comments
PCR duplicate removal should be used with caution to avoid removing natural duplicates, i.e., reads from distinct RNA molecules that happen to map to the same position (frequent for highly expressed genes). Valuable links:
- http://www.cureffi.org/2012/12/11/how-pcr-duplicates-arise-in-next-generation-sequencing/
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4965708/ (DNA-seq and variant calling)
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4597324/ (RNA-seq, ChIP-seq data)
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3871669/ (trimming)
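The caution above can be made concrete with a deliberately naive duplicate-marking rule, sketched in Python. It assumes single-end reads and treats two alignments as duplicates when they share chromosome, 5' mapping position, and strand; this is similar in spirit to common duplicate-marking tools, which additionally handle paired ends and base qualities. The names `Alignment` and `mark_duplicates` are illustrative, not from any library.

```python
from typing import List, NamedTuple, Tuple


class Alignment(NamedTuple):
    name: str
    chrom: str
    five_prime_pos: int  # 5' end of the read on the reference (assumed already strand-aware)
    strand: str          # "+" or "-"


def mark_duplicates(alignments: List[Alignment]) -> Tuple[List[Alignment], List[Alignment]]:
    """Keep the first alignment per (chrom, 5' position, strand); flag the rest."""
    seen = set()
    kept, duplicates = [], []
    for aln in alignments:
        key = (aln.chrom, aln.five_prime_pos, aln.strand)
        if key in seen:
            duplicates.append(aln)
        else:
            seen.add(key)
            kept.append(aln)
    return kept, duplicates


reads = [
    Alignment("r1", "chr1", 100, "+"),
    Alignment("r2", "chr1", 100, "+"),  # PCR copy of r1, or a distinct molecule?
    Alignment("r3", "chr1", 100, "-"),
    Alignment("r4", "chr1", 250, "+"),
]
kept, dups = mark_duplicates(reads)
print([a.name for a in kept], [a.name for a in dups])  # ['r1', 'r3', 'r4'] ['r2']
```

The rule cannot tell whether r2 is a PCR copy of r1 or a second molecule of a highly expressed transcript that happens to start at the same position; in RNA-seq the second case is common, so blind removal can deflate counts for exactly the most expressed genes.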

RNA-seq: processing

RNA-seq: expression level quantification
Standard measures:
- Read counts (raw or expected).
- FPKM, fragments per kilobase per million mapped reads:
$$\mathrm{FPKM}_g = \frac{\text{number of reads mapped to gene } g}{(\text{total number of mapped reads, in millions}) \times (\text{length of gene } g\text{, in kilobases})}$$
- TPM, transcripts per million. Within one sample, $\mathrm{TPM}_g = C \times \mathrm{FPKM}_g$, where the constant $C$ is chosen so that the TPM values of all genes sum to one million. The constant $C$ differs from sample to sample.
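The FPKM and TPM definitions can be checked on toy numbers with a minimal Python sketch. The counts and lengths are hypothetical, and for simplicity the sketch assumes that the total number of mapped reads equals the sum of the listed gene counts (in real data it also includes reads outside these genes, which is why `total_mapped` can be passed explicitly).

```python
def fpkm(counts, lengths_bp, total_mapped=None):
    """FPKM_g = reads_g / ((total mapped reads / 1e6) * (length_g / 1e3))."""
    if total_mapped is None:
        # Assumption for this sketch: all mapped reads are assigned to the listed genes.
        total_mapped = sum(counts)
    millions = total_mapped / 1e6
    return [c / (millions * (length / 1e3)) for c, length in zip(counts, lengths_bp)]


def tpm_from_fpkm(fpkm_values):
    """TPM_g = C * FPKM_g, with the per-sample constant C chosen so that TPMs sum to 1e6."""
    c = 1e6 / sum(fpkm_values)
    return [c * v for v in fpkm_values]


counts = [100, 200, 700]          # reads per gene (hypothetical)
lengths = [1_000, 2_000, 7_000]   # gene lengths in bp (hypothetical)
f = fpkm(counts, lengths)
print(f)                 # all three values are 100000 (up to floating-point rounding)
print(tpm_from_fpkm(f))  # all three values are ~333333.33; they sum to 1e6
```

Genes whose read counts are proportional to their lengths get identical FPKM and TPM values. Because C is computed from each sample's own FPKM total, it generally differs between samples.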

RNA-seq: expression level quantification
Alternative definition of TPM. Let $r_g$ be the number of reads mapped to gene $g$, $\bar{\ell}$ the mean read length, and $L_g$ the length of gene $g$. Then
$$\mathrm{TPM}_g = \frac{r_g \cdot \bar{\ell} \cdot 10^6}{L_g \cdot T}, \qquad T = \sum_{j} \frac{r_j \cdot \bar{\ell}}{L_j}.$$
Each term $r_j \bar{\ell} / L_j$ represents the number of sampled transcripts corresponding to gene $j$, and $T$ estimates the total number of sampled transcripts (molecules). Thus, TPM estimates the number of transcripts of a gene in every million transcripts. Details: Wagner G.P., Kin K., Lynch V.J. (Theory Biosci., 2012) https://www.ncbi.nlm.nih.gov/pubmed/22872506
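A hedged Python sketch showing that the alternative TPM definition agrees with rescaling FPKM to sum to one million when a single mean read length is used for all genes (the read length then cancels). Gene counts and lengths are hypothetical, and total mapped reads is again taken to be the sum of the listed counts.

```python
def tpm_wagner(counts, lengths_bp, mean_read_len):
    """TPM_g = r_g * rl * 1e6 / (L_g * T), with T = sum_j r_j * rl / L_j."""
    sampled = [r * mean_read_len / length for r, length in zip(counts, lengths_bp)]
    total = sum(sampled)  # T: estimated number of sampled transcripts
    return [s * 1e6 / total for s in sampled]


def tpm_via_fpkm(counts, lengths_bp):
    """TPM as FPKM rescaled to sum to 1e6 (total mapped = sum of counts here)."""
    millions = sum(counts) / 1e6
    fpkm = [c / (millions * (length / 1e3)) for c, length in zip(counts, lengths_bp)]
    scale = 1e6 / sum(fpkm)
    return [scale * v for v in fpkm]


counts = [50, 300, 150]          # hypothetical read counts
lengths = [500, 3_000, 1_000]    # hypothetical gene lengths (bp)
a = tpm_wagner(counts, lengths, mean_read_len=100)
b = tpm_via_fpkm(counts, lengths)
print([round(x, 2) for x in a])  # [285714.29, 285714.29, 428571.43]
print([round(x, 2) for x in b])  # same values
```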

RNA-seq: expression level quantification
Linear scale vs log scale
Relative differences are biologically more meaningful than absolute ones, and computations are simplified if a log-scaling is performed:
Log-scaled measure = log2(linear-scale measure + shift)
For relatively large values, a difference of 1 on the log scale corresponds to a 2-fold difference on the linear scale, a difference of 3 to an 8-fold difference, and so on. A difference of -1 on the log scale is again a 2-fold difference, but in the opposite direction (a 2-fold decrease).
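A few numbers make the log-scale rules concrete. This Python sketch assumes a shift of 1, a common but arbitrary choice (the slide leaves the shift unspecified), and also shows why the rules are stated for relatively large values.

```python
import math


def log_scale(value, shift=1.0):
    """Log-scaled measure = log2(linear-scale measure + shift)."""
    return math.log2(value + shift)


# Large values: a log difference of 3 is (close to) an 8-fold difference.
print(log_scale(1023.0) - log_scale(127.0))  # 3.0  (1023 / 127 is about 8.06)

# A log difference of -1 is a 2-fold change in the opposite direction.
print(log_scale(511.0) - log_scale(1023.0))  # -1.0

# Small values: the shift dominates, so a log difference of 1 (0 vs 1)
# does not correspond to a meaningful 2-fold change.
print(log_scale(1.0) - log_scale(0.0))  # 1.0
```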

Comparison: the role of preprocessing
[Figures: results obtained with no preprocessing, with no PCR duplicate removal, and with standard preprocessing, followed by a comparison of the resulting output.]

Extended pipeline

BREAK

Part 2: Differential expression and pathway / gene set enrichment analysis

Differential expression analysis
Quantities related to the degree of differential expression:
- Difference between mean expression levels, i.e., fold change (pay attention to the scale: linear or log);
- Statistical significance: p-value, adjusted p-value (e.g., FDR);
- Magnitude of the expression level (be cautious with low-expressed genes; consider excluding them from the analysis).
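A minimal Python sketch tying the three quantities together for a handful of hypothetical genes: low-expressed genes are removed first, p-values are adjusted with the Benjamini-Hochberg procedure (one common way to control the FDR), and genes are kept if they pass both a log2 fold-change and an FDR threshold. The thresholds, the shift of 1, the function names, and the input p-values are illustrative assumptions; in practice the p-values would come from a dedicated method such as DESeq2 or edgeR.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(rows, min_mean=10.0, min_abs_log2fc=1.0, max_fdr=0.05, shift=1.0):
    """rows: (gene, mean_in_A, mean_in_B, pvalue), means on the linear scale.

    Low-expressed genes are dropped before the FDR correction; the remaining
    genes must pass both a fold-change and an FDR threshold."""
    expressed = [r for r in rows if max(r[1], r[2]) >= min_mean]
    fdrs = benjamini_hochberg([r[3] for r in expressed])
    selected = []
    for (gene, mean_a, mean_b, _), fdr in zip(expressed, fdrs):
        log2fc = math.log2(mean_b + shift) - math.log2(mean_a + shift)
        if abs(log2fc) >= min_abs_log2fc and fdr <= max_fdr:
            selected.append((gene, round(log2fc, 2), round(fdr, 4)))
    return selected


rows = [  # hypothetical genes and p-values
    ("G1", 20.0, 90.0, 0.001),
    ("G2", 50.0, 55.0, 0.01),
    ("G3", 1.0, 6.0, 0.0001),   # very low expression: excluded
    ("G4", 100.0, 30.0, 0.02),
]
print(select_de_genes(rows))  # [('G1', 2.12, 0.003), ('G4', -1.7, 0.02)]
```

G2 is significant after adjustment but its fold change is tiny, and G3 has the smallest p-value but is dropped for low expression, which illustrates why all three quantities matter.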

Differential expression analysis

Gene set / pathway enrichment analysis
Possible options:
- Use only gene lists (thresholding required). One of the standard tools here is the Database for Annotation, Visualization and Integrated Discovery, DAVID (https://david.ncifcrf.gov/home.jsp, https://david-d.ncifcrf.gov/).
- Take into account the degrees of differential expression.
- Additionally, take into account pathway topology.
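For the list-based option, over-representation of a gene set in the thresholded list is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The Python sketch below computes that p-value with the standard library. It is an illustration, not DAVID's exact computation: DAVID reports a modified Fisher's exact p-value (the EASE score), so its numbers will differ somewhat, and p-values over many gene sets still need multiple-testing correction. The function name and numbers are hypothetical.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (universe), K: genes in the pathway / gene set,
    n: genes in the thresholded DE list, k: DE genes that are in the gene set.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 20 genes in total, 5 in the pathway,
# 4 in the DE list, 3 of those 4 in the pathway.
print(round(overrepresentation_p(k=3, n=4, K=5, N=20), 4))  # 0.032
```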

Gene set / pathway enrichment analysis

BREAK

Part 3: Unsupervised analysis

Unsupervised analysis: PCA
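The slides present PCA through figures only. As a hedged illustration of what PCA computes, the Python sketch below finds the first principal component of a small samples x genes matrix (e.g., log-scaled expression) by power iteration on the gene-gene covariance matrix, using only the standard library. The data and function name are hypothetical; real analyses would use an established implementation such as R's `prcomp` or scikit-learn's `PCA`.

```python
import math
import random


def first_principal_component(X, n_iter=500, seed=0):
    """PC1 of a samples x genes matrix X (list of rows), via power iteration.

    Returns (loadings, scores): loadings is a unit vector over genes, scores has
    one value per sample. The overall sign of PC1 is arbitrary.
    """
    n, p = len(X), len(X[0])
    # Center each gene (column) at zero mean.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Gene-gene sample covariance matrix.
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    return v, scores


# Hypothetical log-scale expression: 4 samples (rows) x 3 genes (columns).
X = [[1.0, 1.0, 5.0], [1.2, 0.9, 5.1], [5.0, 5.0, 1.0], [5.1, 4.8, 0.9]]
loadings, scores = first_principal_component(X)
print([round(s, 2) for s in scores])  # samples 1-2 and 3-4 fall on opposite sides of 0
```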

Unsupervised analysis: hierarchical clustering
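A minimal agglomerative clustering sketch in Python, assuming Euclidean distance and average linkage; other choices, such as 1 − Pearson correlation or complete linkage, are common for expression data and can change the tree. It repeatedly merges the two closest clusters, and the recorded merge heights are what a dendrogram draws. The function name and data are hypothetical.

```python
import math


def average_linkage(points):
    """Agglomerative clustering (Euclidean distance, average linkage).

    Returns the merge history as (cluster_a, cluster_b, height) tuples, where
    clusters are frozensets of sample indices; this history defines the dendrogram.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster sample pairs.
                total = sum(math.dist(points[a], points[b])
                            for a in clusters[x] for b in clusters[y])
                d = total / (len(clusters[x]) * len(clusters[y]))
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] | clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)] + [merged]
    return merges


# Hypothetical one-gene "expression" values for four samples.
for a, b, height in average_linkage([[0.0], [1.0], [10.0], [11.0]]):
    print(sorted(a), sorted(b), height)
# [0] [1] 1.0
# [2] [3] 1.0
# [0, 1] [2, 3] 10.0
```

Cutting the tree at a chosen height (here, anywhere between 1 and 10) yields the flat clusters.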

Unsupervised analysis: hierarchical clustering (dendrogram)

Unsupervised analysis: PCA (15 genes)

Unsupervised analysis: hierarchical clustering, 15 genes
[Figure: dendrogram with clusters labeled Luminal, C-low (claudin-low), N-like (normal-like), and Basal.]

Gene annotation: mapping Ensembl gene IDs (ENSG) to gene symbols, plus GO annotations

Unsupervised analysis: K-means, 15 genes
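A standard-library sketch of Lloyd's k-means in Python, alternating an assignment step and a centroid-update step until the assignments stop changing. The function name, the data, and the default initialization (the first k samples) are simplifying assumptions; real tools use random starts or k-means++ with several restarts.

```python
import math


def kmeans(points, k, init=None, max_iter=100):
    """Lloyd's k-means on a list of samples (each a list of gene values).

    `init` gives k starting centroids; by default the first k samples are used,
    a simplification (real tools use random starts or k-means++ with restarts).
    """
    centroids = [list(c) for c in (init if init is not None else points[:k])]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]  # hypothetical
labels, centroids = kmeans(samples, k=2, init=[[0.0, 0.0], [10.0, 10.0]])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[0.0, 0.5], [10.0, 10.5]]
```

Because the result depends on the starting centroids, it is common to run k-means from several starts and keep the solution with the lowest within-cluster sum of squares.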

Note on the SUM52PE cell line: “The SUM52PE cell line was derived from a pleural effusion and was found to be negative for ER and PR expression, however the original primary tumor from this patient was positive for both hormone receptors”.
- Chavez KJ, Garimella SV, Lipkowitz S. Triple negative breast cancer cell lines: one tool in the search for better treatment of triple negative breast cancer. Breast Dis. 2010; 32(1-2):35-48.
- Ethier SP, Kokeny KE, Ridings JW, Dilts CA. erbB family receptor expression and growth regulation in a newly isolated human breast cancer cell line. Cancer Res. 1996; 56(4):899-907.

BREAK

Part 4: Supervised analysis: classification

Supervised analysis: SVM with a linear kernel as an example
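To make the linear-SVM idea concrete, here is a small standard-library Python sketch that trains a linear soft-margin SVM with Pegasos-style stochastic subgradient descent on hypothetical two-gene data and then classifies new, unlabeled samples. The function names, the regularization strength `lam`, and the bias-as-extra-feature trick are assumptions of this sketch; in practice one would use a library implementation such as scikit-learn's `LinearSVC` or `e1071::svm(kernel = "linear")` in R.

```python
def train_linear_svm(X, y, lam=0.1, epochs=1000):
    """Linear soft-margin SVM via Pegasos-style subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)) with labels in {+1, -1}.
    A constant 1.0 is appended to every sample so the last weight acts as the bias
    (this also regularizes the bias, a common simplification).
    """
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xa, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Gradient step on the L2 penalty ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... plus the hinge-loss subgradient when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Class +1 or -1 from the sign of the linear decision function."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[1.0, 1.0], [2.0, 1.0], [-1.0, -1.0], [-2.0, -1.0]]  # hypothetical, 2 genes
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])                         # [1, 1, -1, -1]
print(predict(w, [3.0, 2.0]), predict(w, [-3.0, -2.0]))  # 1 -1
```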

Supervised analysis: available methods
- Linear Discriminant Analysis (LDA)
- Quadratic Discriminant Analysis (QDA)
- Random Forest
- Support Vector Machine (SVM)
- Naïve Bayes

Supervised analysis: 15 genes

BREAK

Hands-on: Separation of TCGA and breast cancer PDX samples

Hands-on: Analysis of a subset of breast cancer PDX samples