Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH


Analysis of genomes and transcriptomes using RNA-seq and ChIP-seq: Practical session

ICGEB – Practical Course "Bioinformatics: Computer Methods in Molecular Biology", June 26-30, 2017

RNA-seq workflow for the tutorial

Slides ftp://ftp.ncbi.nlm.nih.gov/pub/marino/teaching/ICGEB/2016/

Differential gene expression from RNA-Seq data

1. Get to the RNA-Seq directory and launch R

user0@head:~$ cd marino-data/RNA-Seq/
user0@head:~/marino-data/RNA-Seq$ ll
total 344372
-rw-r----- 1 user0 user0 348194816 Apr 7 20:58 GSE27003GPL9115_DGE_22ba48b764533b15733122c3e8e01ae1.db
-rw-r----- 1 user0 user0   4438128 Apr 7 20:58 GSE27003GPL9115_DGE_RNASeq_22ba48b764533b15733122c3e8e01ae1_report.pdf
user0@head:~/marino-data/RNA-Seq$ R

R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

2. Load the cummeRbund library (http://compbio.mit.edu/cummeRbund/) and perform basic statistical operations

> library(cummeRbund)
Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'

The following object(s) are masked from 'package:stats':

    xtabs

The following object(s) are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, cbind,
    colnames, duplicated, eval, get, intersect, lapply, mapply, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, rbind, rep.int,
    rownames, sapply, setdiff, table, tapply, union, unique

Loading required package: RSQLite
Loading required package: DBI
Loading required package: ggplot2
Loading required package: reshape2
Loading required package: fastcluster

Attaching package: 'fastcluster'

    hclust

Loading required package: rtracklayer
Loading required package: GenomicRanges
Loading required package: IRanges
Loading required package: Gviz
Loading required package: grid
>

> list.files()
[1] "GSE27003GPL9115_DGE_22ba48b764533b15733122c3e8e01ae1.db"
[2] "GSE27003GPL9115_DGE_RNASeq_22ba48b764533b15733122c3e8e01ae1_report.pdf"
> cuff<-readCufflinks(dbFile="GSE27003GPL9115_DGE_22ba48b764533b15733122c3e8e01ae1.db")
> dens<-csDensity(genes(cuff))
> dens
Warning messages:
1: Removed 1997 rows containing non-finite values (stat_density).
2: Removed 4973 rows containing non-finite values (stat_density).
>

The density plot shows the distribution of gene expression values (FPKM) in each condition. The warnings report how many rows with non-finite values were left out of the plot.
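The warnings above say that rows with non-finite values were removed before the density was drawn. A plausible explanation, stated here as an assumption because the slides do not give one, is that the plot uses a log10 scale, where an FPKM of 0 becomes -Inf. The base R sketch below illustrates the effect on made-up FPKM values; it does not use cummeRbund, and all object names are hypothetical.

```r
# Hypothetical FPKM values for one sample; the zeros stand for unexpressed genes.
fpkm <- c(0, 0, 0.5, 3.2, 12.8, 150, 0, 47.1)

# On a log10 scale an FPKM of 0 becomes -Inf, which is not finite.
log_fpkm <- log10(fpkm)
is_finite <- is.finite(log_fpkm)

# Count the values a log-scale density plot would have to drop.
n_dropped <- sum(!is_finite)
cat("Dropped", n_dropped, "of", length(fpkm), "values\n")

# Density estimate over the remaining finite values only.
dens_base <- density(log_fpkm[is_finite])
plot(dens_base, main = "log10(FPKM) density (toy data)", xlab = "log10(FPKM)")
```

If many genes are dropped this way, it usually just reflects genes with no detectable expression in that condition rather than a problem with the data.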

3. Display a boxplot of the expression values and a volcano plot

> b<-csBoxplot(genes(cuff))
> b
>

> v<-csVolcanoMatrix(genes(cuff))
> v
>

4. Extract the differentially expressed genes and plot a heatmap for both conditions

> mySigGeneIds<-getSig(cuff,alpha=0.05,level='genes')
> myGenes<-getGenes(cuff,mySigGeneIds)
Getting gene information:
    FPKM
    Differential Expression Data
    Annotation Data
    Replicate FPKMs
    Counts
Getting isoforms information:
Getting CDS information:
Getting TSS information:
Getting promoter information:
    distData
Getting splicing information:
Getting relCDS information:
>
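The getSig call above returns the identifiers of genes called significant at alpha=0.05. Conceptually this resembles keeping the rows of a differential-expression table whose multiple-testing-corrected q-value falls below alpha. The sketch below shows that idea in base R on a made-up table; the table, its column names, and the gene identifiers are hypothetical, and the code is not a reimplementation of getSig.

```r
# Hypothetical differential-expression table; cummeRbund is not required.
diff_table <- data.frame(
  gene_id = c("XLOC_000001", "XLOC_000002", "XLOC_000003", "XLOC_000004"),
  q_value = c(0.001, 0.20, 0.049, 0.75),
  stringsAsFactors = FALSE
)

# Significance threshold, matching the alpha used in the tutorial.
alpha <- 0.05

# Keep the identifiers of genes whose corrected q-value is below alpha.
sig_ids <- diff_table$gene_id[diff_table$q_value < alpha]
print(sig_ids)
```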

> h.rep<-csHeatmap(myGenes,cluster='both',replicates=F)
Using tracking_id, sample_name as id variables
Using as id variables
> h.rep
>

> h.rep<-csHeatmap(myGenes,cluster='both',replicates=T)
Using tracking_id, sample_name as id variables
Using as id variables
> h.rep
>



ChIP-seq analysis with DiffBind

This package is useful for manipulating ChIP-seq signal in R, for comparing signal across files, and for performing tests of differential binding.

user0@head:~$ cd workspace/chipseq/extra/
user0@head:~/workspace/chipseq/extra$ ls
config.csv  peaks  reads  tamoxifen_allfields.csv  tamoxifen.csv  tamoxifen_GEO.csv  tamoxifen_GEO.R  testdata

The dataset for this example consists of ChIPs against the transcription factor ERα in five breast cancer cell lines. Three of these cell lines are responsive to tamoxifen treatment, while the other two are resistant to it. There are at least two replicates for each cell line, with one cell line having three replicates, for a total of eleven sequenced libraries. Of the five cell lines, two are based on MCF7 cells: the regular tamoxifen-responsive line, and MCF7 cells treated with tamoxifen until a tamoxifen-resistant cell line is obtained.

user0@head:~/workspace/chipseq/extra$ R

R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(DiffBind)

> list.files()
[1] "config.csv"              "peaks"
[3] "reads"                   "tamoxifen_allfields.csv"
[5] "tamoxifen_GEO.csv"       "tamoxifen_GEO.R"
[7] "tamoxifen.csv"           "testdata"

> read.csv("tamoxifen.csv")
   SampleID Tissue Factor Condition  Treatment  Replicate
1  BT4741   BT474  ER     Resistant  Full-Media 1
2  BT4742   BT474  ER     Resistant  Full-Media 2
3  MCF71    MCF7   ER     Responsive Full-Media 1
4  MCF72    MCF7   ER     Responsive Full-Media 2
5  MCF73    MCF7   ER     Responsive Full-Media 3
6  T47D1    T47D   ER     Responsive Full-Media 1
7  T47D2    T47D   ER     Responsive Full-Media 2
8  MCF7r1   MCF7   ER     Resistant  Full-Media 1
9  MCF7r2   MCF7   ER     Resistant  Full-Media 2
10 ZR751    ZR75   ER     Responsive Full-Media 1
11 ZR752    ZR75   ER     Responsive Full-Media 2
   bamReads                   ControlID bamControl
1  reads/Chr18_BT474_ER_1.bam BT474c    reads/Chr18_BT474_input.bam
2  reads/Chr18_BT474_ER_2.bam BT474c    reads/Chr18_BT474_input.bam
3  reads/Chr18_MCF7_ER_1.bam  MCF7c     reads/Chr18_MCF7_input.bam
4  reads/Chr18_MCF7_ER_2.bam  MCF7c     reads/Chr18_MCF7_input.bam
5  reads/Chr18_MCF7_ER_3.bam  MCF7c     reads/Chr18_MCF7_input.bam
6  reads/Chr18_T47D_ER_1.bam  T47Dc     reads/T47D_input.bam
7  reads/Chr18_T47D_ER_2.bam  T47Dc     reads/T47D_input.bam
8  reads/Chr18_TAMR_ER_1.bam  TAMRc     reads/TAMR_input.bam
9  reads/TAMR_ER_2.bam        TAMRc     reads/TAMR_input.bam
10 reads/Chr18_ZR75_ER_1.bam  ZR75c     reads/ZR75_input.bam
11 reads/Chr18_ZR75_ER_2.bam  ZR75c     reads/ZR75_input.bam
   Peaks                   PeakCaller
1  peaks/BT474_ER_1.bed.gz bed
2  peaks/BT474_ER_2.bed.gz bed
3  peaks/MCF7_ER_1.bed.gz  bed
4  peaks/MCF7_ER_2.bed.gz  bed
5  peaks/MCF7_ER_3.bed.gz  bed
6  peaks/T47D_ER_1.bed.gz  bed
7  peaks/T47D_ER_2.bed.gz  bed
8  peaks/TAMR_ER_1.bed.gz  bed
9  peaks/TAMR_ER_2.bed.gz  bed
10 peaks/ZR75_ER_1.bed.gz  bed
11 peaks/ZR75_ER_2.bed.gz  bed
>

> ta <- dba(sampleSheet="tamoxifen.csv")
BT4741 BT474 ER Resistant Full-Media 1 bed
BT4742 BT474 ER Resistant Full-Media 2 bed
MCF71 MCF7 ER Responsive Full-Media 1 bed
MCF72 MCF7 ER Responsive Full-Media 2 bed
MCF73 MCF7 ER Responsive Full-Media 3 bed
T47D1 T47D ER Responsive Full-Media 1 bed
T47D2 T47D ER Responsive Full-Media 2 bed
MCF7r1 MCF7 ER Resistant Full-Media 1 bed
MCF7r2 MCF7 ER Resistant Full-Media 2 bed
ZR751 ZR75 ER Responsive Full-Media 1 bed
ZR752 ZR75 ER Responsive Full-Media 2 bed
> ta
11 Samples, 2845 sites in matrix (3795 total):
   ID     Tissue Factor Condition  Treatment  Replicate Caller Intervals
1  BT4741 BT474  ER     Resistant  Full-Media 1         bed    1080
2  BT4742 BT474  ER     Resistant  Full-Media 2         bed    1122
3  MCF71  MCF7   ER     Responsive Full-Media 1         bed    1556
4  MCF72  MCF7   ER     Responsive Full-Media 2         bed    1046
5  MCF73  MCF7   ER     Responsive Full-Media 3         bed    1339
6  T47D1  T47D   ER     Responsive Full-Media 1         bed    527
7  T47D2  T47D   ER     Responsive Full-Media 2         bed    373
8  MCF7r1 MCF7   ER     Resistant  Full-Media 1         bed    1438
9  MCF7r2 MCF7   ER     Resistant  Full-Media 2         bed    930
10 ZR751  ZR75   ER     Responsive Full-Media 1         bed    2346
11 ZR752  ZR75   ER     Responsive Full-Media 2         bed    2345
> pdf("Correlation-occupancy-data.pdf")
> plot(ta)
> dev.off()
null device
          1
>

Go to: http://23.251.138.125/~user0/workspace/chipseq/extra/Correlation-occupancy-data.pdf

> data(tamoxifen_counts)
> ta2 <- tamoxifen
> ta2 <- dba.contrast(ta2, categories=DBA_CONDITION)
> ta2 <- dba.analyze(ta2)
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> pdf("Correlation-significantly-differentially-bound.pdf")
> plot(ta2, contrast=1)
> dev.off()
null device
          1
> ta2
11 Samples, 2845 sites in matrix:
   ID     Tissue Factor Condition  Treatment  Replicate Caller Intervals FRiP
1  BT4741 BT474  ER     Resistant  Full-Media 1         counts 2845      0.16
2  BT4742 BT474  ER     Resistant  Full-Media 2         counts 2845      0.15
3  MCF71  MCF7   ER     Responsive Full-Media 1         counts 2845      0.27
4  MCF72  MCF7   ER     Responsive Full-Media 2         counts 2845      0.17
5  MCF73  MCF7   ER     Responsive Full-Media 3         counts 2845      0.23
6  T47D1  T47D   ER     Responsive Full-Media 1         counts 2845      0.10
7  T47D2  T47D   ER     Responsive Full-Media 2         counts 2845      0.06
8  MCF7r1 MCF7   ER     Resistant  Full-Media 1         counts 2845      0.20
9  MCF7r2 MCF7   ER     Resistant  Full-Media 2         counts 2845      0.13
10 ZR751  ZR75   ER     Responsive Full-Media 1         counts 2845      0.32
11 ZR752  ZR75   ER     Responsive Full-Media 2         counts 2845      0.22

1 Contrast:
  Group1    Members1 Group2     Members2 DB.DESeq2
1 Resistant 4        Responsive 7        677
>

Go to: http://23.251.138.125/~user25/workspace/chipseq/extra/Correlation-significantly-differentially-bound.pdf


> tadb <- dba.report(ta2)
> tadb
GRanges object with 677 ranges and 6 metadata columns:
     seqnames ranges               strand | Conc      Conc_Resistant Conc_Responsive Fold      p-value   FDR
     <Rle>    <IRanges>            <Rle>  | <numeric> <numeric>      <numeric>       <numeric> <numeric> <numeric>
1291 chr18    [34597700, 34598200] *      | 5.33      0.02           5.97            -5.95     1.24e-10  3.21e-07
2452 chr18    [64490684, 64491184] *      | 6.36      1.39           7               -5.61     2.26e-10  3.21e-07
2571 chr18    [69433116, 69433616] *      | 4.57      -0.79          5.21            -6        3.59e-09  3.41e-06
2771 chr18    [74536113, 74536613] *      | 3.93      -0.79          4.57            -5.35     6.56e-09  4.67e-06
976  chr18    [26860992, 26861492] *      | 7.3       3.1            7.92            -4.82     8.74e-09  4.97e-06
...  ...      ...                  ...    . ...       ...            ...             ...       ...       ...
1405 chr18    [38482733, 38483233] *      | 3.23      0.99           3.76            -2.77     0.0116    0.0489
1695 chr18    [45053220, 45053720] *      | 2.77      0.81           3.28            -2.47     0.0117    0.0492
1650 chr18    [43648626, 43649126] *      | 3.88      2.31           4.34            -2.03     0.0117    0.0492
1702 chr18    [45489315, 45489815] *      | 1.54      -0.22          2.03            -2.25     0.0118    0.0494
1506 chr18    [41736699, 41737199] *      | 1.84      0              2.34            -2.34     0.0118    0.0494
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
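The report above lists a p-value and an FDR for each site, and every listed FDR is below 0.05. The sketch below shows, in base R, how raw p-values can be turned into FDR values and thresholded. It assumes a Benjamini-Hochberg adjustment and a 0.05 cutoff, which the slides do not state explicitly; the p-values are made up for illustration and no DiffBind function is called.

```r
# Hypothetical raw p-values for six candidate binding sites.
p_raw <- c(0.0001, 0.004, 0.012, 0.03, 0.2, 0.6)

# Benjamini-Hochberg adjustment (assumed here; base R's p.adjust).
fdr <- p.adjust(p_raw, method = "BH")

# Keep sites whose adjusted value is at or below the assumed threshold.
threshold <- 0.05
keep <- fdr <= threshold

# Show raw and adjusted values side by side.
print(data.frame(p_raw = p_raw, fdr = signif(fdr, 3), keep = keep))
```

Note that the adjusted values are never smaller than the raw p-values, which is why the FDR column in the report is consistently larger than the p-value column.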

> counts <- dba.report(ta2, bCounts=TRUE)
> x <- mcols(counts)[1,-c(1:6)]
> x <- unlist(x)
> (xord <- x[match(ta2$samples$SampleID, names(x))])
 BT4741  BT4742   MCF71   MCF72   MCF73   T47D1   T47D2  MCF7r1  MCF7r2   ZR751   ZR752
   1.70    0.56   36.00   23.57   52.47   14.71   11.02    0.59    1.21  156.08  144.10
> cond <- factor(ta2$samples[,"Condition"])
> condcomb <- factor(paste(ta2$samples[,"Condition"], ta2$samples[,"Tissue"]))
> pdf("Counts-over-the-conditions.pdf")
> par(mar=c(15,5,2,2))
> stripchart(log2(xord) ~ condcomb, method="jitter", vertical=TRUE, las=2, ylab="log2 normalized counts")
> dev.off()
pdf
  2
>
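As a rough cross-check of the strip chart, the normalized counts printed above can be summarized per condition without DiffBind. The sketch below copies those eleven values and the Condition column of tamoxifen.csv into base R and compares the mean log2 count between the two groups. The difference it computes is only an informal summary and is not expected to reproduce the Fold column of the DiffBind report.

```r
# Normalized counts for the first reported site, copied from the console output.
xord <- c(BT4741 = 1.70, BT4742 = 0.56, MCF71 = 36.00, MCF72 = 23.57,
          MCF73 = 52.47, T47D1 = 14.71, T47D2 = 11.02, MCF7r1 = 0.59,
          MCF7r2 = 1.21, ZR751 = 156.08, ZR752 = 144.10)

# Condition labels in the same sample order, taken from tamoxifen.csv.
cond <- factor(c("Resistant", "Resistant", "Responsive", "Responsive",
                 "Responsive", "Responsive", "Responsive", "Resistant",
                 "Resistant", "Responsive", "Responsive"))

# Mean of the log2 counts within each condition.
mean_log2 <- tapply(log2(xord), cond, mean)
print(round(mean_log2, 2))

# Informal difference of means, Resistant minus Responsive.
delta <- mean_log2[["Resistant"]] - mean_log2[["Responsive"]]
cat("Difference in mean log2 counts:", round(delta, 2), "\n")
```

A negative difference means the site has less signal in the resistant samples, which agrees in direction with the negative Fold values in the report.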

Go to: http://23.251.138.125/~user25/workspace/chipseq/extra/Counts-over-the-conditions.pdf