1
Analysis of genomes and transcriptomes using RNA-seq and ChIP-seq
Practical session
Leonardo Mariño-Ramírez, PhD
NCBI / NLM / NIH
ICGEB Practical Course "Bioinformatics: Computer Methods in Molecular Biology"
June 2017
2
RNA-seq workflow for the tutorial
3
Slides ftp://ftp.ncbi.nlm.nih.gov/pub/marino/teaching/ICGEB/2016/
4
Differential gene expression from RNA-Seq data
5
Differential gene expression from RNA-Seq data
1. Get to the RNA-Seq directory and launch R

cd marino-data/RNA-Seq/
ll
total
-rw-r user0 user Apr 7 20:58 GSE27003GPL9115_DGE_22ba48b764533b c3e8e01ae1.db
-rw-r user0 user Apr 7 20:58 GSE27003GPL9115_DGE_RNASeq_22ba48b764533b c3e8e01ae1_report.pdf
R

R version ( ) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
6
Differential gene expression from RNA-Seq data
2. Load the cummeRbund library and perform basic statistical operations

> library(cummeRbund)
Loading required package: BiocGenerics
Attaching package: 'BiocGenerics'
The following object(s) are masked from 'package:stats':
    xtabs
The following object(s) are masked from 'package:base':
    Filter, Find, Map, Position, Reduce, anyDuplicated, cbind,
    colnames, duplicated, eval, get, intersect, lapply, mapply, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, rbind, rep.int,
    rownames, sapply, setdiff, table, tapply, union, unique
Loading required package: RSQLite
Loading required package: DBI
Loading required package: ggplot2
Loading required package: reshape2
Loading required package: fastcluster
Attaching package: 'fastcluster'
    hclust
Loading required package: rtracklayer
Loading required package: GenomicRanges
Loading required package: IRanges
Loading required package: Gviz
Loading required package: grid
>
7
Differential gene expression from RNA-Seq data
> list.files()
[1] "GSE27003GPL9115_DGE_22ba48b764533b c3e8e01ae1.db"
[2] "GSE27003GPL9115_DGE_RNASeq_22ba48b764533b c3e8e01ae1_report.pdf"
> cuff<-readCufflinks(dbFile="GSE27003GPL9115_DGE_22ba48b764533b c3e8e01ae1.db")
> dens<-csDensity(genes(cuff))
> dens
Warning messages:
1: Removed 1997 rows containing non-finite values (stat_density).
2: Removed 4973 rows containing non-finite values (stat_density).
>

The density plot shows how the gene-level expression values (FPKM) are distributed in each condition.
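The two warnings about non-finite values are what one would expect if the density is drawn on a log10 scale: a gene with an FPKM of 0 becomes -Inf and is dropped before the density is estimated. The following is a minimal base-R sketch of that effect, assuming log10 scaling; the vector `fpkm` is made up for illustration and is not taken from the tutorial database.

```r
# Hypothetical FPKM values for one condition; zeros stand for unexpressed genes.
fpkm <- c(0, 0, 0.5, 1.2, 3.4, 10, 25.7, 0, 150, 980)

# log10(0) is -Inf, which a density estimate cannot use.
log_fpkm <- log10(fpkm)
n_dropped <- sum(!is.finite(log_fpkm))

# Estimate the density on the finite values only.
dens <- density(log_fpkm[is.finite(log_fpkm)])

cat("values dropped as non-finite:", n_dropped, "\n")
```

In this toy vector the number of dropped values equals the number of genes with zero FPKM, which is how the warning counts above can be read.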
8
Differential gene expression from RNA-Seq data
3. Display a boxplot of the expression values and a volcano plot

> b<-csBoxplot(genes(cuff))
> b
>
9
Differential gene expression from RNA-Seq data
> v<-csVolcanoMatrix(genes(cuff))
> v
>
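A volcano plot places each gene by the size of its expression change (log2 fold change, x axis) against the strength of the evidence (-log10 of the p-value, y axis). The base-R sketch below computes those two coordinates for a few invented genes; the data frame `res` and its column names are hypothetical and only illustrate the transformation.

```r
# Hypothetical per-gene results for one pair of conditions.
res <- data.frame(
  gene    = c("g1", "g2", "g3", "g4"),
  fpkm_a  = c(10, 200, 5, 40),
  fpkm_b  = c(12, 25, 40, 41),
  p_value = c(0.6, 0.0001, 0.002, 0.9),
  stringsAsFactors = FALSE
)

# x axis: log2 fold change of condition b over condition a.
res$log2_fc <- log2(res$fpkm_b / res$fpkm_a)

# y axis: -log10 of the p-value, so small p-values plot high.
res$neg_log10_p <- -log10(res$p_value)

print(res[, c("gene", "log2_fc", "neg_log10_p")])
```

Genes with large changes and small p-values end up in the upper left and upper right corners, which is where differentially expressed genes are usually looked for.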
10
Differential gene expression from RNA-Seq data
4. Extract the differentially expressed genes and plot a heatmap for both conditions

> mySigGeneIds<-getSig(cuff,alpha=0.05,level='genes')
> myGenes<-getGenes(cuff,mySigGeneIds)
Getting gene information:
    FPKM
    Differential Expression Data
    Annotation Data
    Replicate FPKMs
    Counts
Getting isoforms information:
Getting CDS information:
Getting TSS information:
Getting promoter information:
    distData
Getting splicing information:
Getting relCDS information:
>
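The alpha=0.05 cutoff is applied to multiple-testing corrected values rather than to raw p-values. The base-R sketch below shows why that matters, assuming the corrected values behave like Benjamini-Hochberg adjusted p-values; the p-values and gene names are invented.

```r
# Hypothetical p-values from a differential expression test.
p_value <- c(g1 = 0.001, g2 = 0.008, g3 = 0.039, g4 = 0.041, g5 = 0.6)

# Benjamini-Hochberg adjustment gives one q-value per gene.
q_value <- p.adjust(p_value, method = "BH")

# Keep the genes whose q-value is below alpha.
alpha <- 0.05
sig_ids <- names(q_value)[q_value < alpha]

print(round(q_value, 4))
print(sig_ids)
```

Four of the five raw p-values are below 0.05, but only two genes remain after the adjustment, so a list filtered on corrected values is shorter than one filtered on raw p-values.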
11
Differential gene expression from RNA-Seq data
> h.rep<-csHeatmap(myGenes,cluster='both',replicates=F)
Using tracking_id, sample_name as id variables
Using as id variables
> h.rep
>
12
Differential gene expression from RNA-Seq data
> h.rep<-csHeatmap(myGenes,cluster='both',replicates=T)
Using tracking_id, sample_name as id variables
Using as id variables
> h.rep
>
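With cluster='both', the rows (genes) and the columns (conditions or replicates) of the heatmap are each reordered by clustering. The base-R sketch below illustrates the idea on an invented FPKM matrix. The Euclidean distance and the log10(FPKM + 1) transform are assumptions chosen for simplicity, and cummeRbund may use a different metric internally.

```r
# Hypothetical FPKM matrix: rows are genes, columns are replicates.
fpkm <- matrix(
  c(100, 110,   5,   4,
      2,   3,  80,  90,
     95, 105,   6,   7,
      1,   2,  70,  85),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "g4"),
                  c("A_0", "A_1", "B_0", "B_1"))
)

# Log-transform with a pseudocount so zeros stay finite.
m <- log10(fpkm + 1)

# Cluster rows (genes) and columns (replicates) separately.
row_tree <- hclust(dist(m))
col_tree <- hclust(dist(t(m)))

# Cut each tree into two groups to see which items cluster together.
gene_groups   <- cutree(row_tree, k = 2)
sample_groups <- cutree(col_tree, k = 2)
print(gene_groups)
print(sample_groups)
```

In this toy matrix the replicates of each condition fall into the same column cluster, which is the pattern to check for in the replicates=T heatmap.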
14
Differential gene expression from RNA-Seq data
15
ChIP-seq analysis with DiffBind
This package is useful for manipulating ChIP-seq signal in R, for comparing signal across files, and for performing tests of differential binding.

cd workspace/chipseq/extra/
ls
config.csv peaks reads tamoxifen_allfields.csv tamoxifen.csv tamoxifen_GEO.csv tamoxifen_GEO.R testdata

The dataset for this example consists of ChIPs against the transcription factor ERa in five breast cancer cell lines. Three of these cell lines are responsive to tamoxifen treatment, while the other two are resistant to it. Each cell line has at least two replicates, and one cell line has three, for a total of eleven sequenced libraries. Two of the five cell lines are based on MCF7 cells: the regular tamoxifen-responsive line, and MCF7 cells treated with tamoxifen until a tamoxifen-resistant line was obtained.
16
ChIP-seq analysis with DiffBind
R

R version ( ) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(DiffBind)
17
ChIP-seq analysis with DiffBind
> list.files()
[1] "config.csv"              "peaks"
[3] "reads"                   "tamoxifen_allfields.csv"
[5] "tamoxifen_GEO.csv"       "tamoxifen_GEO.R"
[7] "tamoxifen.csv"           "testdata"
18
ChIP-seq analysis with DiffBind
> read.csv("tamoxifen.csv")
   SampleID Tissue Factor Condition  Treatment  Replicate
1  BT4741   BT     ER     Resistant  Full-Media
2  BT4742   BT     ER     Resistant  Full-Media
3  MCF71    MCF7   ER     Responsive Full-Media
4  MCF72    MCF7   ER     Responsive Full-Media
5  MCF73    MCF7   ER     Responsive Full-Media
6  T47D1    T47D   ER     Responsive Full-Media
7  T47D2    T47D   ER     Responsive Full-Media
8  MCF7r1   MCF7   ER     Resistant  Full-Media
9  MCF7r2   MCF7   ER     Resistant  Full-Media
10 ZR751    ZR     ER     Responsive Full-Media
11 ZR752    ZR     ER     Responsive Full-Media
   bamReads                   ControlID bamControl
1  reads/Chr18_BT474_ER_1.bam BT474c    reads/Chr18_BT474_input.bam
2  reads/Chr18_BT474_ER_2.bam BT474c    reads/Chr18_BT474_input.bam
3  reads/Chr18_MCF7_ER_1.bam  MCF7c     reads/Chr18_MCF7_input.bam
4  reads/Chr18_MCF7_ER_2.bam  MCF7c     reads/Chr18_MCF7_input.bam
5  reads/Chr18_MCF7_ER_3.bam  MCF7c     reads/Chr18_MCF7_input.bam
6  reads/Chr18_T47D_ER_1.bam  T47Dc     reads/T47D_input.bam
7  reads/Chr18_T47D_ER_2.bam  T47Dc     reads/T47D_input.bam
8  reads/Chr18_TAMR_ER_1.bam  TAMRc     reads/TAMR_input.bam
9  reads/TAMR_ER_2.bam        TAMRc     reads/TAMR_input.bam
10 reads/Chr18_ZR75_ER_1.bam  ZR75c     reads/ZR75_input.bam
11 reads/Chr18_ZR75_ER_2.bam  ZR75c     reads/ZR75_input.bam
   Peaks                   PeakCaller
1  peaks/BT474_ER_1.bed.gz bed
2  peaks/BT474_ER_2.bed.gz bed
3  peaks/MCF7_ER_1.bed.gz  bed
4  peaks/MCF7_ER_2.bed.gz  bed
5  peaks/MCF7_ER_3.bed.gz  bed
6  peaks/T47D_ER_1.bed.gz  bed
7  peaks/T47D_ER_2.bed.gz  bed
8  peaks/TAMR_ER_1.bed.gz  bed
9  peaks/TAMR_ER_2.bed.gz  bed
10 peaks/ZR75_ER_1.bed.gz  bed
11 peaks/ZR75_ER_2.bed.gz  bed
>
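The sample sheet can be checked against the description of the experiment (eleven libraries, three responsive and two resistant cell lines). The base-R sketch below does this with the sample IDs and conditions copied from the sheet. The `CellLine` column is a hypothetical helper derived by stripping the trailing replicate digit from each sample ID; it is not a column of tamoxifen.csv.

```r
# Sample IDs and conditions as listed in the sample sheet.
samples <- data.frame(
  SampleID  = c("BT4741", "BT4742", "MCF71", "MCF72", "MCF73", "T47D1",
                "T47D2", "MCF7r1", "MCF7r2", "ZR751", "ZR752"),
  Condition = c("Resistant", "Resistant", "Responsive", "Responsive",
                "Responsive", "Responsive", "Responsive", "Resistant",
                "Resistant", "Responsive", "Responsive"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: drop the final replicate digit from the ID.
samples$CellLine <- sub("[0-9]$", "", samples$SampleID)

# Libraries per condition and distinct cell lines per condition.
libs_per_condition  <- table(samples$Condition)
lines_per_condition <- tapply(samples$CellLine, samples$Condition,
                              function(x) length(unique(x)))

print(libs_per_condition)
print(lines_per_condition)
```

The counts agree with the description: four resistant and seven responsive libraries, drawn from two resistant and three responsive cell lines.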
19
ChIP-seq analysis with DiffBind
> ta <- dba(sampleSheet="tamoxifen.csv")
BT4741 BT474 ER Resistant Full-Media 1 bed
BT4742 BT474 ER Resistant Full-Media 2 bed
MCF71 MCF7 ER Responsive Full-Media 1 bed
MCF72 MCF7 ER Responsive Full-Media 2 bed
MCF73 MCF7 ER Responsive Full-Media 3 bed
T47D1 T47D ER Responsive Full-Media 1 bed
T47D2 T47D ER Responsive Full-Media 2 bed
MCF7r1 MCF7 ER Resistant Full-Media 1 bed
MCF7r2 MCF7 ER Resistant Full-Media 2 bed
ZR751 ZR75 ER Responsive Full-Media 1 bed
ZR752 ZR75 ER Responsive Full-Media 2 bed
> ta
11 Samples, 2845 sites in matrix (3795 total):
   ID     Tissue Factor Condition  Treatment  Replicate Caller Intervals
1  BT4741 BT     ER     Resistant  Full-Media           bed
2  BT4742 BT     ER     Resistant  Full-Media           bed
3  MCF71  MCF7   ER     Responsive Full-Media           bed
4  MCF72  MCF7   ER     Responsive Full-Media           bed
5  MCF73  MCF7   ER     Responsive Full-Media           bed
6  T47D1  T47D   ER     Responsive Full-Media           bed
7  T47D2  T47D   ER     Responsive Full-Media           bed
8  MCF7r1 MCF7   ER     Resistant  Full-Media           bed
9  MCF7r2 MCF7   ER     Resistant  Full-Media           bed
10 ZR751  ZR     ER     Responsive Full-Media           bed
11 ZR752  ZR     ER     Responsive Full-Media           bed
> pdf("Correlation-occupancy-data.pdf")
> plot(ta)
> dev.off()
null device
          1
>
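The header "2845 sites in matrix (3795 total)" separates all merged peaks from the subset used for the correlation plot. The base-R sketch below illustrates one way such a subset and correlation could arise. It assumes (the slides do not confirm this) that a site enters the matrix when at least two samples have a peak there; the occupancy matrix and the `min_overlap` threshold are hypothetical.

```r
# Hypothetical occupancy matrix: 1 means a peak was called at that site.
occ <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 0,
    0, 0, 1, 1,
    0, 0, 1, 1,
    1, 1, 1, 1,
    1, 0, 0, 0,
    0, 0, 0, 1,
    1, 1, 1, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("site", 1:8), c("a1", "a2", "b1", "b2"))
)

# Keep sites supported by at least two samples (assumed threshold).
min_overlap <- 2
in_matrix <- occ[rowSums(occ) >= min_overlap, , drop = FALSE]
cat(nrow(in_matrix), "sites in matrix,", nrow(occ), "total\n")

# Correlate samples on their occupancy profiles over the kept sites.
sample_cor <- cor(in_matrix)
print(round(sample_cor, 2))
```

Replicates that share most of their peaks correlate strongly, and a clustered correlation heatmap makes that pattern visible.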
20
ChIP-seq analysis with DiffBind
Go to:
21
ChIP-seq analysis with DiffBind
> data(tamoxifen_counts)
> ta2 <- tamoxifen
> ta2 <- dba.contrast(ta2, categories=DBA_CONDITION)
> ta2 <- dba.analyze(ta2)
converting counts to integer mode
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> pdf("Correlation-significantly-differentially-bound.pdf")
> plot(ta2, contrast=1)
> dev.off()
null device
          1
> ta2
11 Samples, 2845 sites in matrix:
   ID     Tissue Factor Condition  Treatment  Replicate Caller Intervals FRiP
1  BT4741 BT     ER     Resistant  Full-Media           counts
2  BT4742 BT     ER     Resistant  Full-Media           counts
3  MCF71  MCF7   ER     Responsive Full-Media           counts
4  MCF72  MCF7   ER     Responsive Full-Media           counts
5  MCF73  MCF7   ER     Responsive Full-Media           counts
6  T47D1  T47D   ER     Responsive Full-Media           counts
7  T47D2  T47D   ER     Responsive Full-Media           counts
8  MCF7r1 MCF7   ER     Resistant  Full-Media           counts
9  MCF7r2 MCF7   ER     Resistant  Full-Media           counts
10 ZR751  ZR     ER     Responsive Full-Media           counts
11 ZR752  ZR     ER     Responsive Full-Media           counts

1 Contrast:
  Group1    Members1 Group2     Members2 DB.DESeq2
1 Resistant          Responsive
>
22
ChIP-seq analysis with DiffBind
Go to:
25
ChIP-seq analysis with DiffBind
> tadb <- dba.report(ta2)
> tadb
GRanges object with 677 ranges and 6 metadata columns:
  seqnames ranges    strand | Conc      Conc_Resistant
  <Rle>    <IRanges> <Rle>  | <numeric> <numeric>
  chr18    [ , ]     *      |
  chr18    [ , ]     *      |
  chr18    [ , ]     *      |
  chr18    [ , ]     *      |
  chr18    [ , ]     *      |
  chr18    [ , ]     *      |
  chr18    [ , ]     *      |
  chr18    [ , ]     *      |
  chr18    [ , ]     *      |
  chr18    [ , ]     *      |
  Conc_Responsive Fold      p-value   FDR
  <numeric>       <numeric> <numeric> <numeric>
  e e-07 e e-07 e e-06 e e-06 e e-06
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
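The report columns Conc_Resistant, Conc_Responsive and Fold summarize the read counts at each site per group. The base-R sketch below shows one plausible reading of them. It assumes that a group concentration is the log2 of the mean normalized count and that Fold is the Resistant concentration minus the Responsive one; the counts are invented and the exact DiffBind calculation may differ.

```r
# Hypothetical normalized read counts at a single binding site.
counts    <- c(12, 20, 14, 18, 120, 136, 128)
condition <- c(rep("Resistant", 4), rep("Responsive", 3))

# Assumed definition: concentration = log2 of the mean count in a group.
conc <- tapply(counts, condition, function(x) log2(mean(x)))

# Assumed definition: fold = difference of the two group concentrations.
fold <- conc[["Resistant"]] - conc[["Responsive"]]

print(conc)
cat("Fold (Resistant - Responsive):", fold, "\n")
```

Under these assumptions a negative Fold means weaker binding in the resistant group, here by a factor of 2^3 = 8.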
26
ChIP-seq analysis with DiffBind
> counts <- dba.report(ta2, bCounts=TRUE)
> x <- mcols(counts)[1,-c(1:6)]
> x <- unlist(x)
> (xord <- x[match(ta2$samples$SampleID, names(x))])
BT4741 BT4742 MCF71 MCF72 MCF73 T47D1 T47D2 MCF7r1 MCF7r2 ZR751 ZR752
> cond <- factor(ta2$samples[,"Condition"])
> condcomb <- factor(paste(ta2$samples[,"Condition"], ta2$samples[,"Tissue"]))
> pdf("Counts-over-the-conditions.pdf")
> par(mar=c(15,5,2,2))
> stripchart(log2(xord) ~ condcomb, method="jitter", vertical=TRUE, las=2, ylab="log2 normalized counts")
> dev.off()
pdf
  2
>
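The reordering step relies on match(), which returns for each sample ID its position among the names of the count vector, so the counts end up in sample-sheet order. The base-R sketch below shows this on invented values; `x` and `sample_ids` are hypothetical. It uses log2() so that the transformed values agree with an axis labelled "log2 normalized counts".

```r
# Hypothetical counts for one site, named by sample but in arbitrary order.
x <- c(MCF71 = 64, BT4742 = 8, ZR751 = 32, BT4741 = 4)
sample_ids <- c("BT4741", "BT4742", "MCF71", "ZR751")

# match() gives, for each sample ID, its position in names(x).
xord <- x[match(sample_ids, names(x))]

# Base-2 logs of the reordered counts.
log2_counts <- log2(xord)
print(log2_counts)
```

Note that log() in R is the natural logarithm, so log(8) is about 2.08 while log2(8) is 3; the function should match the axis label.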
27
ChIP-seq analysis with DiffBind
Go to: