Published by Tracey Wilkinson. Modified over 9 years ago.
1
Differential Principal Component Analysis (dPCA) for ChIP-seq
Hongkai Ji Department of Biostatistics The Bloomberg School of Public Health Johns Hopkins University
2
Functional Genomics Locations and Functions
Maston, Evans & Green, Annu Rev Genomics Hum Genet, 2006, 7: 29-59
3
Transcription Factor (TF)
ChIP-seq Transcription Factor (TF) Gene motif
4
Motivation: how to compare multiple ChIP profiles between two biological conditions?
Cell Type 1 Cell Type 2
5
Data Structure
Cell Type 1: Marker 1 (H3K4me3), Marker 2 (H3K27me3), …, Marker M (Myc); replicates Rep 1, …, Rep K1. Intensities for locus g, marker m, replicate k: $x_{gmk} \sim G(x; \mu_{1gm}, \sigma^2)$.
Cell Type 2: Marker 1 (H3K4me3), Marker 2 (H3K27me3), …, Marker M (Myc); replicates Rep 1, …, Rep K2. Intensities for locus g, marker m, replicate k: $y_{gmk} \sim G(x; \mu_{2gm}, \sigma^2)$.
Loci: Locus 1, Locus 2, …, Locus G.
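A minimal sketch of this data layout, assuming G(x; μ, σ2) is a normal distribution with mean μ and variance σ2 (the slide does not name the family). The sizes, the seed, the "true" means, and the function names simulate_intensities and mean_difference are all hypothetical. It draws replicate intensities for both cell types and forms the observed loci-by-markers matrix of mean differences.

```python
import random
import statistics


def simulate_intensities(mu, n_reps, sigma, rng):
    """Draw n_reps replicates around each mean mu[g][m] (normal noise is an assumption)."""
    return [[[rng.gauss(mu_gm, sigma) for _ in range(n_reps)] for mu_gm in row]
            for row in mu]


def mean_difference(x, y):
    """Observed difference D[g][m] = mean_k x[g][m][k] - mean_k y[g][m][k]."""
    return [[statistics.fmean(xg[m]) - statistics.fmean(yg[m]) for m in range(len(xg))]
            for xg, yg in zip(x, y)]


rng = random.Random(0)
n_loci, n_markers, K1, K2, sigma = 200, 4, 2, 3, 0.1

# Hypothetical truth: the first 50 loci differ between cell types along one
# direction across markers; the remaining loci have no true difference.
direction = [1.0, 1.0, -1.0, 0.0]
mu2 = [[rng.uniform(0.0, 5.0) for _ in range(n_markers)] for _ in range(n_loci)]
mu1 = [[mu2[g][m] + (2.0 * direction[m] if g < 50 else 0.0) for m in range(n_markers)]
       for g in range(n_loci)]

x = simulate_intensities(mu1, K1, sigma, rng)  # cell type 1, K1 replicates
y = simulate_intensities(mu2, K2, sigma, rng)  # cell type 2, K2 replicates
D = mean_difference(x, y)                      # n_loci rows, n_markers columns
```

Each row of D is one locus and each column one marker, so D[g][m] estimates μ1gm − μ2gm.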
6
Modeling True Difference
7
Bayesian Perspective
8
Goals of Analysis
1. Estimate …
2. Infer …
(2.a) Rank loci according to each component (based on $u_{gi}$); (2.b) Test $u_{gi} = 0$?
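Goals (2.a) and (2.b) can be illustrated with a rough sketch. The slides do not show the actual estimator or test statistic, so everything here is an assumption: the leading component is taken as the top eigenvector of the uncentered matrix D^T D (found by power iteration), the score of locus g is the projection of its difference vector onto that direction, and the test is a z-test that treats σ as known and the direction as fixed. The data, the seed, and the helper name leading_direction are hypothetical.

```python
import math
import random


def leading_direction(D, n_iter=200):
    """Power iteration for the top eigenvector of D^T D, returned with unit length."""
    n_markers = len(D[0])
    v = [1.0 / math.sqrt(n_markers)] * n_markers
    for _ in range(n_iter):
        Dv = [sum(d * vi for d, vi in zip(row, v)) for row in D]
        w = [sum(D[g][m] * Dv[g] for g in range(len(D))) for m in range(n_markers)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v


rng = random.Random(1)
n_loci, K1, K2, sigma = 200, 2, 3, 0.1
# Standard error of a mean difference with K1 and K2 replicates (sigma assumed known).
se = sigma * math.sqrt(1.0 / K1 + 1.0 / K2)

# Hypothetical difference matrix: the first 50 loci differ along one direction.
direction = [1.0, 1.0, -1.0, 0.0]
D = [[(2.0 * d if g < 50 else 0.0) + rng.gauss(0.0, se) for d in direction]
     for g in range(n_loci)]

v = leading_direction(D)                                     # loadings of the first component
scores = [sum(d * vi for d, vi in zip(row, v)) for row in D]  # one score per locus

# (2.a) rank loci by the magnitude of their score on this component
ranked = sorted(range(n_loci), key=lambda g: -abs(scores[g]))

# (2.b) two-sided z-test of score = 0; v has unit length, so the score has s.d. se
z = [u / se for u in scores]
pvalues = [math.erfc(abs(zg) / math.sqrt(2.0)) for zg in z]
```

In this toy setting the 50 truly differential loci occupy the top 50 ranks. With real data σ would have to be estimated, and the uncertainty in the estimated direction is ignored here.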
9
Example: K562 vs. Huvec ENCODE Data
G = 138,328 MYC motif sites in human genome; M = 18 data sets.
10
Biological meaning of PCs
11
PC1 predicts MYC differential binding better than using each marker individually
12
Example: K562 vs. Huvec ENCODE Data
G = 138,328 MYC motif sites in human genome; M = 25 data sets.
PC1: 50%, FDR<5%: 65,252
PC2: 14%, FDR<5%: 47,960
Data sets: H3K4me1, H3K4me2, H3K4me3, H3K9me1, H3K27me3, H3K36me3, H4K20me1, H3K4me3, H3K27me3, H3K36me3, DNase, FAIRE, H3K9ac, H3K27ac, CTCF, Pol2, Input, CTCF, Input, CTCF, Input, Jun, Max, Pol2, Input
13
Other Examples
14
Implications
Figure labels: TF, Cell type 1, Cell type 2.
15
Example: K562 vs. Huvec ENCODE Data
G = human promoters; M = 16 markers: H3K4me1, H3K4me2, H3K4me3, H3K9me1, H3K27ac, H3K27me3, H3K36me3, H4K20me1, H3K4me3, H3K27me3, H3K36me3, CTCF, H3K9ac, Input, CTCF, Input
16
PC1 predicts RNA-seq differential expression
Cor =
17
False Discovery Rate (FDR)
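The slides report counts of loci at FDR<5% but the transcript does not preserve which FDR procedure was used. As a stand-in, the sketch below applies the Benjamini-Hochberg adjustment, a common choice that is not confirmed by the source. The p-values and the function name benjamini_hochberg are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-locus p-values for one component
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.9]
qvals = benjamini_hochberg(pvals)
n_called = sum(q < 0.05 for q in qvals)  # loci called at FDR < 5%
```

Counting adjusted values below 0.05 gives the kind of "FDR<5%" tally reported per component in the examples.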
18
Simulation
19
Simulation
20
Summary
dPCA provides a way to concisely summarize differences between two cell types.
Differential genes along the major PC have biological meaning.
Future directions include modeling the signal shapes, multiple conditions, non-linearity, and establishing convergence rate.