CTCF Peaks.


1 CTCF Peaks

2 All CTCF Peaks 15 datasets
Cells: CH12, ER4, G1E, HPC7, LSK, NEU, TCD4, TCD8, ERY, MON
123,387 merged peaks
HPC7 peaks use a more stringent cutoff of .01.
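
A minimal sketch of how per-dataset peak calls could be merged into one peak set while recording which datasets support each merged peak. The slides do not say which tool did the merging, so the function name `merge_peaks`, the tuple layout, and the toy coordinates are assumptions for illustration only.

```python
from typing import Iterable, List, Set, Tuple

# (chrom, start, end, dataset) with half-open coordinates; layout is an assumption
Peak = Tuple[str, int, int, str]
MergedPeak = Tuple[str, int, int, Set[str]]


def merge_peaks(peaks: Iterable[Peak]) -> List[MergedPeak]:
    """Merge overlapping peaks and record the datasets behind each merged peak."""
    merged: List[MergedPeak] = []
    for chrom, start, end, dataset in sorted(peaks):
        # Overlap test against the most recent merged peak on the same chromosome
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev_chrom, prev_start, prev_end, prev_sets = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end), prev_sets | {dataset})
        else:
            merged.append((chrom, start, end, {dataset}))
    return merged


# Hypothetical toy input, not data from the slides
toy_peaks = [
    ("chr1", 100, 300, "CH12"),
    ("chr1", 250, 500, "G1E"),
    ("chr1", 900, 1000, "HPC7"),
    ("chr2", 100, 200, "MON"),
]
merged = merge_peaks(toy_peaks)
for chrom, start, end, datasets in merged:
    print(chrom, start, end, len(datasets))
```

The size of the dataset set on each merged peak is the number that the later low/mid/high split would be based on.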

3 Peak length

4 Separating peaks by number of datasets

5 Intersect peak categories with ccREs
Split peaks into low (1-4), mid (5-12), and high (13-15) datasets and find overlaps with ccREs.
Low: 26,550 / 83,816 = .32
Mid: 18,384 / 26,416 = .70
High: 12,833 / 13,154 = .98
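
The category split and the overlap fractions can be re-derived with a few lines of Python. This is only a sketch: the counts are copied from the slide, while the function name `category` and the dictionary layout are assumptions.

```python
def category(n_datasets: int) -> str:
    """Map the number of datasets with a called peak to low, mid, or high."""
    if not 1 <= n_datasets <= 15:
        raise ValueError("expected between 1 and 15 datasets")
    if n_datasets <= 4:
        return "low"
    if n_datasets <= 12:
        return "mid"
    return "high"


# (peaks overlapping a ccRE, total peaks in the category), as reported on the slide
counts = {
    "low": (26_550, 83_816),
    "mid": (18_384, 26_416),
    "high": (12_833, 13_154),
}
fractions = {name: round(hit / total, 2) for name, (hit, total) in counts.items()}
for name, frac in fractions.items():
    print(f"{name:>4}: {frac:.2f}")
```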

6 Dominant IDEAS states for called peaks
Peak found in:
Low (1-4 datasets): 139,177 states
Medium (5-12 datasets): 209,074 states
High (13-15 datasets): 187,955 states
Sorted by state count in the High grouping.
One state per peak for each dataset in which the peak was called.
States with CTCF signal: 7 C, 13 CN, 24 PENCA, 25 T C, 26 CNE T

7 All states within the peak set (0's counted only when 0 is the only state)
Peak found in:
Low (1-4 datasets): 1,216,173 states
Medium (5-12 datasets): 566,212 states
High (13-15 datasets): 405,088 states
CTCF states are often present in peaks but are not always the dominant state in the peak, especially state 24 (PENCA).
This count also uses states from cell types in which the region is not a called peak.
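
The difference between the dominant-state count and the all-states count can be sketched as below. How the dominant state was actually chosen is not stated on the slides; here it is assumed to be the most frequent state among the IDEAS windows under a peak, with ties broken by the lowest state number. The function names and the toy window states are hypothetical.

```python
from collections import Counter
from typing import List


def dominant_state(window_states: List[int]) -> int:
    """Most frequent state under a peak (assumed tie-break: lowest state number)."""
    counts = Counter(window_states)
    best = max(counts.values())
    return min(state for state, n in counts.items() if n == best)


def states_present(window_states: List[int]) -> List[int]:
    """All distinct states under a peak; state 0 is kept only when it is the only state."""
    distinct = sorted(set(window_states))
    non_zero = [state for state in distinct if state != 0]
    return non_zero if non_zero else distinct


# Hypothetical states for the windows under one peak in one cell type
toy = [10, 10, 24, 0]
dom = dominant_state(toy)
present = states_present(toy)
print(dom, present)
```

In this toy case state 24 is present in the peak but is not the dominant state, which is the situation the slide describes for PENCA.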

8 Low category peaks in MultiView track

9 High category peaks in MultiView track

10 TAD boundaries
Number of peaks overlapping end base of TADs
Low: 185 / 83,816 = .002
Mid: 141 / 26,416 = .005
High: 139 / 13,154 = .011
Number of peaks overlapping end 20kb of TADs
Low: 6,128 / 83,816 = .07
Mid: 2,603 / 26,416 = .10
High: 1,719 / 13,154 = .13
Boundaries computed as start + 1 and end – 1 (end base), and as start – 20,000 to start and end – 20,000 to end (20kb).
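
A sketch of the boundary construction and overlap fraction described above. It reads "start + 1 and end – 1" as one-base intervals at each end of a TAD, which is an interpretation of the slide text rather than a confirmed detail. Function names, the half-open coordinate convention, and the toy coordinates are assumptions.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open; an assumed layout


def tad_boundary_intervals(tad: Interval, flank: int = 20_000) -> Tuple[List[Interval], List[Interval]]:
    """Return (one-base end intervals, 20 kb windows) for a single TAD."""
    chrom, start, end = tad
    end_bases = [(chrom, start, start + 1), (chrom, end - 1, end)]
    # Windows follow the slide: start - flank to start, and end - flank to end
    windows = [(chrom, max(0, start - flank), start), (chrom, max(0, end - flank), end)]
    return end_bases, windows


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(peaks: Sequence[Interval], targets: Sequence[Interval]) -> float:
    """Fraction of peaks that overlap at least one target interval."""
    hits = sum(any(overlaps(peak, target) for target in targets) for peak in peaks)
    return hits / len(peaks)


# Hypothetical toy data, not values from the slides
end_bases, windows = tad_boundary_intervals(("chr1", 100_000, 500_000))
toy_peaks = [
    ("chr1", 99_900, 100_050),   # spans the TAD start base
    ("chr1", 485_000, 485_400),  # inside the last 20 kb of the TAD
    ("chr1", 300_000, 300_500),  # TAD interior
]
frac_end_base = fraction_overlapping(toy_peaks, end_bases)
frac_window = fraction_overlapping(toy_peaks, windows)
print(round(frac_end_base, 3), round(frac_window, 3))
```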

11 Motifs from SeqUnwinder

12 Motifs from SeqUnwinder

13 Motifs from SeqUnwinder
Ongoing run adding in ccREs.

14 Other checks
ORegAnno regulatory regions: 17,188
Intersect high peaks: 509 CTCF peaks
Intersect low peaks: 1,882 CTCF peaks
ORegAnno transcription factor binding sites: 397,782
Intersect high peaks: 5,926 CTCF peaks
Intersect low peaks: 20,846 CTCF peaks
RefSeq functional elements: 1,968
Intersect high peaks: 71 RefSeq elements
Intersect low peaks: 294 RefSeq elements
ORegAnno sites are tested.

15 Cell specific peaks HPC7 (13,141 peaks)
Used all peaks as background. HPC7 has only 1 replicate, and these peaks were found only in this dataset. HPC7 has one of the largest peak counts. The number of cell-specific peaks mostly reflects the total number of peaks called. (G1E has only 1!)

16 Cell Specific peaks ERY (1,406 replicated peaks)
I required these to be in both replicates.

17 Cell specific peaks MON (1,307 peaks)

18 CTCF Peaks found in all 15 datasets (7,116 peaks)
I raised the FDR cutoff for these to .1 from [...]. Without this there were no terms.

19 Peak length by category
Category        Average length  Median length
Low             580             493
Mid             975             819
High            1364            1192
High unmerged   735             689

20 Worst case: 63 peaks merged to one
9 merged peaks contain 40 or more peaks; 84 contain 30 or more. These would have been in the mid category without the single large peak merging them.

21 Same region with IDEAS
Should I use a window slid over the merged peaks, rather than the peaks themselves? If so, what size? Homer peaks – 150, IDEAS windows – 200, ?
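
One possible way to slide a window over a merged peak is sketched below. The 200 bp default mirrors the IDEAS window size mentioned on the slide; the step size, the clipping of the last window, and the function name `tile_peak` are assumptions, since the slide leaves the choice open.

```python
from typing import Iterator, Tuple


def tile_peak(start: int, end: int, size: int = 200, step: int = 200) -> Iterator[Tuple[int, int]]:
    """Yield windows of `size` bp across [start, end); the last window is clipped to `end`."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    pos = start
    while pos < end:
        yield pos, min(pos + size, end)
        pos += step


# Hypothetical 500 bp merged peak
windows = list(tile_peak(1_000, 1_500))
print(windows)

# A step smaller than the window size gives overlapping windows
overlapping = list(tile_peak(0, 400, size=200, step=100))
print(overlapping)
```

Scoring each window separately would keep a very long merged peak from being treated as a single unit.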

22 THE END

23 Second largest peak in high category

24 Slides from before filtering HPC7 peaks and looking for non-zero states

25 All CTCF Peaks 15 datasets
Cells: CH12, ER4, G1E, HPC7, LSK, NEU, TCD4, TCD8, ERY, MON
141,678 merged peaks

26 Barplot
About half the peaks found in a single dataset come from HPC7. The majority of the peaks found in only 2 datasets are in the datasets with the highest numbers of peaks.

27 CTCF peaks intersected with ccREs
Split peaks into low (2-4), mid (5-12), and high (13-15) datasets and find overlaps with ccREs.
Low: 14,315 / 38,794 = .37
Mid: 18,600 / 26,880 = .69
High: 12,836 / 13,158 = .98
CTCF peaks found in many cell types are very likely to also be ccREs.

28 Most common pattern is all cell types
If only 1 is missing, it is most often one of the cells with the fewest peaks.
If only 2 are missing, it is most often MON, then G1E, and CH12.
The most common pattern with 4 missing is MON + G1E.
The most common patterns with fewer than half the cells include the cells with the greatest numbers of peaks: both ERY, HPC7, TCD4, TCD8.
Among peaks found in both ERY and ER4, the most common patterns are all or nearly all cell types.
Patterns in replicated peaks (47,961) are dominated by peak counts in the cell types.

29 CTCF peaks and IDEAS states
Out of 141,678 peaks:
50,641 are in state 0 in nearly all used cell types (quiescent)
4,653 are in state 1 in nearly all used cell types (transcribed)
Other frequently seen and conserved states are 7 (CTCF), 2 (heterochromatin), 10 (active promoter-like), and 15 (promoter-like)
This leaves 81,421 peaks with mixed states.

30 Counts of peaks in conserved states

31 State 7 peaks
State 7 (CTCF) peaks are called in most datasets.
Found 1,714 peaks.
Datasets per peak: min 3, max 15, average 12.4, median 13, first quartile 11, third quartile 15

32 State 0 peaks
State 0 peaks are mostly found in few datasets.
Found 50,641 peaks.
Datasets per peak: min 1, max 15, average 3, median 1, first quartile 1, third quartile 3
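
Summary statistics like those on the State 7 and State 0 slides can be produced from a list holding the number of datasets in which each peak was called. The sketch below uses Python's standard `statistics` module; the input list is hypothetical toy data, and the quartile convention (`method="inclusive"`) is an assumption, so other tools may report slightly different quartiles.

```python
import statistics
from typing import Dict, Sequence


def summarize(dataset_counts: Sequence[int]) -> Dict[str, float]:
    """Min, max, mean, and quartiles of the per-peak dataset counts."""
    q1, median, q3 = statistics.quantiles(dataset_counts, n=4, method="inclusive")
    return {
        "n": len(dataset_counts),
        "min": min(dataset_counts),
        "max": max(dataset_counts),
        "mean": statistics.mean(dataset_counts),
        "median": median,
        "q1": q1,
        "q3": q3,
    }


# Hypothetical counts: most peaks are called in few datasets, one in all 15
toy_counts = [1, 1, 1, 2, 3, 3, 15]
summary = summarize(toy_counts)
for key, value in summary.items():
    print(key, value)
```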

