Differential Principal Component Analysis (dPCA) for ChIP-seq

1 Differential Principal Component Analysis (dPCA) for ChIP-seq
Hongkai Ji
Department of Biostatistics, The Bloomberg School of Public Health, Johns Hopkins University

2 Functional Genomics Locations and Functions
Maston, Evans & Green, Annu Rev Genomics Hum Genet, 2006, 7: 29-59

3 Transcription Factor (TF)
[Figure: ChIP-seq schematic; labels: Transcription Factor (TF), Gene, motif]

4 Motivation: how to compare multiple ChIP profiles between two biological conditions?
[Figure: ChIP profiles in Cell Type 1 and Cell Type 2]

5 Data Structure
Cell Type 1: markers $m = 1, \dots, M$ (Marker 1 (H3K4me3), Marker 2 (H3K27me3), …, Marker M (Myc)); replicates $k = 1, \dots, K_1$; loci $g = 1, \dots, G$.
Cell Type 2: the same markers and loci; replicates $k = 1, \dots, K_2$.
Intensities for locus $g$, marker $m$, replicate $k$:
Cell Type 1: $x_{gmk} \sim G(x;\, \mu_{1gm}, \sigma^2)$
Cell Type 2: $y_{gmk} \sim G(y;\, \mu_{2gm}, \sigma^2)$
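A minimal sketch of this data layout, assuming that $G(\cdot;\, \mu, \sigma^2)$ denotes a Gaussian density with one shared variance $\sigma^2$ (the slide does not spell this out). It stores the intensities as arrays `x[g, m, k]` and `y[g, m, k]` and forms the replicate-averaged difference, whose entries each have variance $\sigma^2(1/K_1 + 1/K_2)$ under this model. The function names, sizes, seed, and the injected effect are hypothetical illustrations, not data from the talk.

```python
import numpy as np

def simulate_two_conditions(mu1, mu2, k1, k2, sigma, rng):
    """Draw x[g, m, k] ~ N(mu1[g, m], sigma^2) and y[g, m, k] ~ N(mu2[g, m], sigma^2).

    Assumption: the slide's G(.; mu, sigma^2) is read as a Gaussian with shared sigma.
    """
    n_loci, n_markers = mu1.shape
    x = mu1[:, :, None] + sigma * rng.standard_normal((n_loci, n_markers, k1))
    y = mu2[:, :, None] + sigma * rng.standard_normal((n_loci, n_markers, k2))
    return x, y

def observed_difference(x, y):
    """Replicate-averaged difference d[g, m] = mean_k x[g, m, k] - mean_k y[g, m, k]."""
    return x.mean(axis=2) - y.mean(axis=2)

# Hypothetical sizes: 1000 loci, 5 markers, K1 = 2 and K2 = 3 replicates, sigma = 0.5.
rng = np.random.default_rng(0)
n_loci, n_markers, k1, k2, sigma = 1000, 5, 2, 3, 0.5
mu2 = rng.normal(size=(n_loci, n_markers))
mu1 = mu2.copy()
mu1[:100, 0] += 2.0  # hypothetical: first 100 loci are higher in cell type 1, marker 1 only
x, y = simulate_two_conditions(mu1, mu2, k1, k2, sigma, rng)
d = observed_difference(x, y)
print(x.shape, y.shape, d.shape)  # (1000, 5, 2) (1000, 5, 3) (1000, 5)
```

Averaging replicates first keeps the later steps on a single $G \times M$ matrix; a fuller treatment could keep replicate-level data to estimate $\sigma^2$.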

6 Modeling True Difference
[Figure: pattern of zero (0) and nonzero (*) entries]
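The formula on this slide did not survive transcription. One plausible reading, consistent with the per-locus component scores $u_{gi}$ named on the Goals slide, is a low-rank model of the true difference, $\Delta_{gm} = \mu_{1gm} - \mu_{2gm} = \sum_i u_{gi} v_{mi}$, with orthonormal marker loadings $v_i$ and mostly zero locus scores (the 0/* pattern). This is an assumption for illustration, not the slide's stated equation; the sizes and nonzero scores below are hypothetical.

```python
import numpy as np

def low_rank_difference(u, v):
    """Build the G x M matrix Delta[g, m] = sum_i u[g, i] * v[m, i] (assumed low-rank model)."""
    return u @ v.T

rng = np.random.default_rng(1)
n_loci, n_markers = 8, 4
# Hypothetical orthonormal marker loadings: columns of v are v_1 and v_2.
v, _ = np.linalg.qr(rng.normal(size=(n_markers, 2)))
# Hypothetical sparse locus scores: most u[g, i] are exactly 0 (no true difference).
u = np.zeros((n_loci, 2))
u[[0, 3], 0] = [2.0, -1.5]
u[5, 1] = 1.0
delta = low_rank_difference(u, v)
print(np.linalg.matrix_rank(delta))  # 2
print(np.flatnonzero(np.abs(delta).sum(axis=1) > 0))  # [0 3 5]
```

Only loci with a nonzero score in some component carry a true difference, which is what makes ranking loci by $u_{gi}$ and testing $u_{gi} = 0$ meaningful.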

7 Bayesian Perspective

8 Goals of Analysis
1. Estimate
2. Infer …
   (2.a) Rank loci according to each component (based on $u_{gi}$);
   (2.b) Test $u_{gi} = 0$?
[Figure: pattern of zero (0) and nonzero (*) entries]
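The two goals can be sketched as follows, under stated assumptions: loadings are estimated as the top right singular vectors of the uncentered replicate-averaged difference matrix, each locus score is its projection onto a loading, loci are ranked by absolute score (2.a), and $u_{gi} = 0$ is tested with a two-sided z-test that treats the estimated loading as fixed and $\sigma$ as known (2.b). This is plain SVD plus a z-test swapped in for illustration; the deck's own estimator (it also mentions a Bayesian perspective) may differ. `dpca_sketch` and all numbers are hypothetical.

```python
import math
import numpy as np

def dpca_sketch(d, n_components, sigma, k1, k2):
    """Estimate loadings by SVD of d, then score, rank and z-test each locus (illustrative)."""
    # Right singular vectors of the uncentered G x M difference matrix.
    _, _, vt = np.linalg.svd(d, full_matrices=False)
    v_hat = vt[:n_components].T                  # M x n_components, orthonormal columns
    u_hat = d @ v_hat                            # G x n_components locus scores
    # If v_hat is held fixed, each null score has sd sigma * sqrt(1/k1 + 1/k2).
    se = sigma * math.sqrt(1.0 / k1 + 1.0 / k2)
    z = u_hat / se
    # Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    pvals = np.vectorize(lambda t: math.erfc(abs(t) / math.sqrt(2.0)))(z)
    ranking = np.argsort(-np.abs(u_hat), axis=0)  # (2.a) loci ordered per component
    return v_hat, u_hat, pvals, ranking

rng = np.random.default_rng(2)
n_loci, n_markers, k1, k2, sigma = 2000, 6, 2, 2, 1.0
v_true = np.ones(n_markers) / np.sqrt(n_markers)  # hypothetical loading
u_true = np.zeros(n_loci)
u_true[:200] = 4.0                                # hypothetical: 200 differential loci
se = sigma * np.sqrt(1.0 / k1 + 1.0 / k2)
# For brevity, draw the replicate-averaged differences directly: Delta + N(0, se^2) noise.
d = np.outer(u_true, v_true) + se * rng.standard_normal((n_loci, n_markers))
v_hat, u_hat, pvals, ranking = dpca_sketch(d, 1, sigma, k1, k2)
top200 = ranking[:200, 0]
print("fraction of true loci among top 200:", np.mean(top200 < 200))
```

The sign of an SVD loading is arbitrary, so ranking uses absolute scores; ignoring the fact that `v_hat` was estimated from the same data makes the z-test slightly optimistic when $G$ is small.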

9 Example: K562 vs. Huvec ENCODE Data
G = 138,328 MYC motif sites in the human genome; M = 18 data sets.

10 Biological meaning of PCs

11 PC1 predicts MYC differential binding better than any single marker used on its own

12 Example: K562 vs. Huvec ENCODE Data
G = 138,328 MYC motif sites in the human genome; M = 25 data sets.
Data sets: H3K4me1, H3K4me2, H3K4me3, H3K9me1, H3K27me3, H3K36me3, H4K20me1, H3K4me3, H3K27me3, H3K36me3, DNase, FAIRE, H3K9ac, H3K27ac, CTCF, Pol2, Input, CTCF, Input, CTCF, Input, Jun, Max, Pol2, Input
PC1: 50%; FDR < 5%: 65,252
PC2: 14%; FDR < 5%: 47,960
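If the percentages attached to PC1 and PC2 are shares of total variation, as is common in PCA summaries (an assumption; the slide does not say), they could be computed from the singular values $s_i$ of the $G \times M$ difference matrix as $s_i^2 / \sum_j s_j^2$. The sketch below uses the uncentered matrix and a hypothetical rank-1-plus-noise example.

```python
import numpy as np

def variance_share(d):
    """Fraction of the total (uncentered) sum of squares of d captured by each component."""
    s = np.linalg.svd(d, compute_uv=False)  # singular values, sorted in descending order
    return s**2 / np.sum(s**2)

rng = np.random.default_rng(4)
# Hypothetical G x M difference matrix: a strong rank-1 pattern plus small noise.
d = np.outer(rng.normal(size=500), [1.0, 1.0, 0.0, 0.0]) + 0.3 * rng.standard_normal((500, 4))
share = variance_share(d)
print(np.round(share, 2))
```

Whether to center the columns first is a modeling choice; leaving the matrix uncentered keeps "no difference" at zero.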

13 Other Examples

14 Implications
[Figure: TF in Cell type 1 and Cell type 2]

15 Example: K562 vs. Huvec ENCODE Data
G = human promoters; M = 16 markers.
Markers: H3K4me1, H3K4me2, H3K4me3, H3K9me1, H3K27ac, H3K27me3, H3K36me3, H4K20me1, H3K4me3, H3K27me3, H3K36me3, CTCF, H3K9ac, Input, CTCF, Input

16 PC1 predicts RNA-seq differential expression
Cor =
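A correlation like this one could be computed as sketched below, assuming a Pearson correlation between promoter PC1 scores and per-gene RNA-seq log2 fold changes with a pseudocount of 1. The correlation type, the pseudocount, the helper names, and the five-gene numbers are all hypothetical, not taken from the slide.

```python
import numpy as np

def log2_fold_change(expr1, expr2, pseudocount=1.0):
    """Per-gene log2 fold change between two cell types; the pseudocount avoids log(0)."""
    return np.log2((np.asarray(expr1) + pseudocount) / (np.asarray(expr2) + pseudocount))

def pc1_expression_correlation(pc1_scores, lfc):
    """Pearson correlation between promoter PC1 scores and expression log2 fold changes."""
    return np.corrcoef(pc1_scores, lfc)[0, 1]

# Hypothetical numbers for five genes (not data from the slides).
expr_cell1 = np.array([100.0, 5.0, 50.0, 0.0, 20.0])
expr_cell2 = np.array([10.0, 40.0, 50.0, 8.0, 20.0])
pc1 = np.array([2.1, -1.8, 0.1, -1.2, 0.0])
lfc = log2_fold_change(expr_cell1, expr_cell2)
r = pc1_expression_correlation(pc1, lfc)
print(round(r, 3))
```

Because the sign of a principal component is arbitrary, the sign of the correlation depends on how PC1 is oriented; a rank-based (Spearman) correlation would be a reasonable alternative if outliers dominate.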

17 False Discovery Rate (FDR)
[Figure: pattern of zero (0) and nonzero (*) entries]
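One standard way to turn per-locus p-values into a set of discoveries at a target FDR is the Benjamini-Hochberg step-up procedure, sketched below as a swapped-in technique. The deck's own FDR procedure may differ (it also mentions a Bayesian perspective), and `benjamini_hochberg` is a hypothetical helper name.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of discoveries controlling the FDR at level alpha (BH step-up)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, n + 1) / n  # alpha * i / n for the i-th smallest p
    below = p[order] <= thresholds
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest i with p_(i) <= alpha * i / n
        reject[order[: k + 1]] = True     # step-up: reject every hypothesis up to that rank
    return reject

mask = benjamini_hochberg([0.01, 0.02, 0.03, 0.5])
print(mask, mask.sum())
```

Summing the mask gives a count of loci passing the FDR cutoff.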

18 Simulation

19 Simulation
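The simulation slides' settings are not recoverable from the transcript. A simple simulation of the same flavor, under hypothetical settings, plants a rank-1 difference with a known loading in a fraction of loci, adds Gaussian noise, and measures how well the top right singular vector recovers the true loading (absolute cosine, since SVD signs are arbitrary) as noise grows. `loading_recovery` and every parameter value are illustrative assumptions.

```python
import numpy as np

def loading_recovery(n_loci, n_markers, n_diff, effect, noise_sd, rng):
    """|cos| between a planted rank-1 loading and the top right singular vector of noisy data."""
    v_true = rng.normal(size=n_markers)
    v_true /= np.linalg.norm(v_true)          # hypothetical unit-norm true loading
    u_true = np.zeros(n_loci)
    u_true[:n_diff] = effect                  # hypothetical: n_diff loci truly differ
    d = np.outer(u_true, v_true) + noise_sd * rng.standard_normal((n_loci, n_markers))
    _, _, vt = np.linalg.svd(d, full_matrices=False)
    return abs(vt[0] @ v_true)

rng = np.random.default_rng(3)
for noise_sd in (0.5, 1.0, 2.0, 4.0):
    print(noise_sd, round(loading_recovery(2000, 6, 200, 3.0, noise_sd, rng), 3))
```

Repeating each setting over many seeds and averaging would give a smoother picture than the single draws printed here.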

20 Summary
dPCA provides a way to concisely summarize differences between two cell types.
Differential genes along the major PC have biological meaning.
Future directions include modeling signal shapes, multiple conditions, and non-linearity, and establishing convergence rates.
