

Supplementary Figure 1

[Figure: (A) schematic of intron grouping. Genes A–C contribute a 1st intron only; Genes D–F the 1st and 2nd introns; Genes G–I the 1st to 19th introns; Genes J–L the 1st to 20th introns. Introns are compared within groups G1 (1st intron), G2 (1st~2nd introns), G3 (1st~3rd introns), ..., G20 (1st~20th introns). Dark gray box = first intron. (B) Box plot; y-axis: % conserved sites.]

Figure S1. Comparison of conservation in first introns with that in the other introns using an alternative grouping strategy. (A) Schematic of the approach for preparing introns. The purpose of this analysis is the same as that of Figure S1, but using introns grouped by a different strategy: genes with two introns are used when first and second introns are compared, and genes with twenty introns are used when the first, second, …, twentieth introns are compared. (B) Box plot analyses of the proportions of conserved sites in introns of different ordinal positions.
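The grouping strategy in the Figure S1 legend can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the input layout (a dict from gene name to a list of per-intron "% conserved sites" values, ordered 1st, 2nd, ...) and all function and gene names are assumptions made for the example.

```python
from collections import defaultdict


def group_by_intron_count(genes):
    """Map intron count k -> list of per-gene value lists (each of length k)."""
    groups = defaultdict(list)
    for _gene_id, intron_values in genes.items():
        k = len(intron_values)
        if k >= 1:  # genes without introns are ignored
            groups[k].append(intron_values)
    return dict(groups)


def ordinal_distributions(groups, k):
    """For genes with exactly k introns, return one list of values per ordinal position."""
    columns = [[] for _ in range(k)]
    for values in groups.get(k, []):
        for position, value in enumerate(values):
            columns[position].append(value)
    return columns


# Hypothetical toy data: % conserved sites per intron, ordered 1st, 2nd, ...
toy_genes = {
    "geneD": [12.0, 4.0],
    "geneE": [9.5, 6.0],
    "geneF": [15.0, 3.5],
    "geneA": [8.0],
}
groups = group_by_intron_count(toy_genes)
first, second = ordinal_distributions(groups, 2)
```

Here `first` and `second` hold the values that a two-intron comparison (group G2) would feed into a box plot; a gene with only one intron never enters that comparison.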

Supplementary Figure 2

[Figure: panels (A) H1-hESC and (B) K562. x-axis: introns grouped by their ordinal positions (1st to 10th); y-axis: % signals; marks shown: TFBS, DHS, H3K4me3, H3K4me1, H3K9me3, CTCF.]

Figure S2. Proportions of regulatory chromatin marks in intron ordinal groups in H1-hESC and K562. Please refer to the legends of Figure S2. (A) Comparison of the proportions of the chromatin marks among different ordinal positions of introns in the H1-hESC cell line, and (B) comparison of the proportions of the chromatin marks among different ordinal positions of introns in the K562 cell line.

Supplementary Figure 3

[Figure: panels (A) H1-hESC and (B) K562. x-axis: % conserved sites in first introns; y-axis: % signals. Correlation values, listed in the order extracted:
(A) H1-hESC: DHS τ = 0.27 (p=0.00); H3K4me1 τ = 0.23 (p=0.00); CTCF τ = 0.12 (p=0.00); TFBS τ = 0.30 (p=0.00); H3K4me3 τ = 0.16 (p=0.00); H3K9me3 τ = [missing] (p=0.11).
(B) K562: DHS τ = 0.20 (p=0.00); H3K4me1 τ = 0.08 (p=0.00); CTCF τ = 0.07 (p=0.01); TFBS τ = 0.21 (p=0.00); H3K4me3 τ = 0.08 (p=0.00); H3K9me3 τ = 0.01 (p=0.64).]

Figure S3. Correlation between regulatory signals and conservation in first introns in H1-hESC and K562. Please refer to the legends of Figure 3. (A) Comparison between the proportions of the regulatory marks and the conservation in first introns in the H1-hESC cell line, and (B) comparison between the proportions of the regulatory marks and the conservation in first introns in the K562 cell line.
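A rank correlation of the kind reported in Figure S3 could be computed along these lines. The sketch assumes that τ denotes Kendall's rank correlation and uses the tau-b variant for ties; the source states neither. The function name and toy values are illustrative only, and no p-value is computed.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b via an O(n^2) pass over all pairs (fine for small inputs)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from every term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator if denominator else float("nan")


# Hypothetical toy data: % conserved sites vs. % signal for five first introns.
conserved = [1, 2, 3, 4, 5]
signal = [3, 1, 2, 5, 4]
tau = kendall_tau_b(conserved, signal)
```

With these toy values there are 7 concordant and 3 discordant pairs, so `tau` is 0.4.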

Supplementary Figure 4

[Figure: panels (A) GM12878, (B) H1-hESC, and (C) K562; y-axis: % signals. Correlation values, listed in the order extracted:
(A) GM12878: DHS τ = 0.22 (p=0.00); H3K4me1 τ = 0.03 (p=0.03); CTCF τ = 0.01 (p=0.76); TFBS τ = 0.22 (p=0.00); H3K4me3 τ = 0.15 (p=0.00); H3K9me3 τ = 0.03 (p=0.24).
(B) H1-hESC: DHS τ = 0.21 (p=0.00); H3K4me1 τ = 0.10 (p=0.00); CTCF τ = 0.03 (p=0.09); TFBS τ = 0.33 (p=0.00); H3K4me3 τ = 0.30 (p=0.00); H3K9me3 τ = 0.01 (p=0.75).
(C) K562: DHS τ = 0.15 (p=0.00); H3K4me1 τ = 0.03 (p=0.06); CTCF τ = 0.05 (p=0.01); TFBS τ = 0.24 (p=0.00); H3K4me3 τ = 0.15 (p=0.00); H3K9me3 τ = 0.07 (p=0.00).]

Figure S4. Correlation between regulatory signals and conservation in the upstream flanking regions in three different cell lines. Please refer to the legends of Figure S3. Comparison of the proportions of conserved sites and regulatory signals for upstream regions in (A) the GM12878 cell line, (B) the H1-hESC cell line, and (C) the K562 cell line.

Supplementary Figure 5

[Figure: two panels of flanking regions. x-axis: groups of genes containing each number of exons (G1, G5, G10, G15, G20); y-axis: % conserved sites. Fitted lines as extracted: y = 0.14x, R² = [missing] (first panel); y = 0.03x, R² = [missing] (second panel).]

Figure S5. Relationship between flanking region conservation and the numbers of exons. Please refer to the legends of Figure S4. The proportions of conserved sites in the upstream (left) and downstream (right) flanking regions are compared among groups of genes with more than one exon, more than two exons, more than three exons, and so on up to more than twenty exons.
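The fitted lines and R² values shown in Figure S5 are consistent with an ordinary least-squares trend across gene groups. The sketch below assumes exactly that (a simple linear fit with an intercept); the source does not say how the lines were fitted, and the toy numbers are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x, with R^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_residual / ss_total if ss_total else float("nan")
    return slope, intercept, r_squared


# Hypothetical toy data: group index (G1..G4) vs. % conserved sites.
group_index = [1, 2, 3, 4]
pct_conserved = [1.0, 3.0, 2.0, 4.0]
slope, intercept, r_squared = fit_line(group_index, pct_conserved)
```

For the toy data the fit is y = 0.5 + 0.8x with R² = 0.64.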

Supplementary Figure 6

[Figure: (A) From H1-hESC. Panels for the 1st to 5th introns; marks: DHS, TFBS, H3K4me1, H3K4me3, CTCF, H3K9me3. x-axis: groups of genes containing different numbers of exons (tick labels G5, G15); y-axis: % signals in introns of each ordinal position. Fitted lines as extracted, in order: y=0.07x (R² = 0.52); NA; y=0.17x (R² = 0.85); NA; y=0.39x (R² = 0.48); NA; y=0.38x (R² = 0.41); NA.]

Figure S6. Relationship between the proportions of regulatory signals in introns of each ordinal position and the numbers of exons. Please refer to the legends of Figure S5. Comparison between the proportions of active chromatin marks and the numbers of exons within genes in (A) the H1-hESC cell line.

Supplementary Figure 6

[Figure: (B) From K562. Panels for the 1st to 5th introns; marks: DHS, TFBS, H3K4me1, H3K4me3, CTCF, H3K9me3. x-axis: groups of genes containing different numbers of exons (tick labels G5, G15); y-axis: % signals in introns of each ordinal position. Fitted lines as extracted, in order: y=0.14x (R² = 0.71); NA; y=0.21x (R² = 0.51); NA; y=1.40x (R² = 0.66); NA; y=0.88x (R² = 0.46); NA; y=0.02x (R² = 0.10); NA.]

Figure S6. Relationship between the proportions of regulatory signals in introns of each ordinal position and the numbers of exons. Please refer to the legends of Figure S5. Comparison between the proportions of active chromatin marks and the numbers of exons within genes in (B) the K562 cell line.

Supplementary Figure 7

[Figure: (A) data preparation flow: UCSC_Refseq_mRNA (Jan 2013), 36,024 transcripts; transcripts with introns, 29,687 transcripts; unique transcript harboring introns for a gene (1 gene – 1 transcript), 16,374 transcripts, selected using Gene2refseq (Nov 2013), ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/. (B) x-axis: introns grouped by their ordinal positions (1st to 20th); y-axis: % conserved sites. (C) Panels for the 1st to 10th introns; x-axis: groups of genes containing each number of exons (tick labels G5, G15); y-axis: % conserved sites in introns of each ordinal position. Fitted lines as extracted, in order: y=0.06x (R² = 0.47); y=0.02x (R² = 0.32); y=0.02x (R² = 0.21); y=0.02x (R² = 0.20); y=0.02x (R² = 0.20); y=0.03x (R² = 0.22); y=0.04x (R² = 0.35); y=0.04x (R² = 0.31); y=0.00x (R² = 0.00); y=-0.01x (R² = [missing]).]

Figure S7. Analysis based on a single representative transcript for each gene. (A) Schematic illustrating data preparation. Among the 36,024 transcripts downloaded from the UCSC genome browser, a total of 29,687 transcripts are found to harbor at least one intron. Based on the transcript information in ‘Gene2Refseq’ obtained from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA, for each gene with multiple transcripts, the longest transcript is retrieved, resulting in a total of 16,374 transcripts. (B)-(D) correspond to Figures S1, S4, and S5, respectively, reanalyzed with the smaller set of transcripts. Please refer to the legends of those figures. Panel (D) is on the next page.
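The transcript selection described in panel (A) amounts to a filter followed by a per-gene maximum. A minimal sketch is given below; the record fields (`gene`, `transcript`, `length`, `n_introns`), the accession-like identifiers, and the tie-breaking rule (first transcript seen wins) are all assumptions made for the example, since the legend does not specify them.

```python
def longest_transcript_per_gene(transcripts):
    """Keep intron-containing transcripts, then pick the longest one per gene."""
    best = {}
    for record in transcripts:
        if record["n_introns"] < 1:
            continue  # transcripts without introns are dropped first
        current = best.get(record["gene"])
        # Strict '>' means the first transcript seen wins a length tie (an assumption).
        if current is None or record["length"] > current["length"]:
            best[record["gene"]] = record
    return best


# Hypothetical toy records; real input would come from the RefSeq and gene2refseq tables.
toy_transcripts = [
    {"gene": "GENE1", "transcript": "NM_A", "length": 2100, "n_introns": 3},
    {"gene": "GENE1", "transcript": "NM_B", "length": 3400, "n_introns": 5},
    {"gene": "GENE2", "transcript": "NM_C", "length": 900, "n_introns": 0},
    {"gene": "GENE3", "transcript": "NM_D", "length": 1500, "n_introns": 1},
]
representatives = longest_transcript_per_gene(toy_transcripts)
```

In this toy run GENE1 is represented by `NM_B`, GENE3 by `NM_D`, and GENE2 is absent because its only transcript has no intron.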

Supplementary Figure 7 (D)

[Figure: panels for the 1st to 5th introns; marks: DHS, TFBS, H3K4me1, H3K4me3, CTCF, H3K9me3. x-axis: groups of genes containing different numbers of exons (tick labels G5, G15); y-axis: % signals in introns of each ordinal position. Fitted lines as extracted, in order: y=0.17x (R² = 0.69); NA; y=0.29x (R² = 0.56); NA; y=1.50x (R² = 0.55); y=-0.02x (R² = 0.00); NA; y=1.57x (R² = 0.46); NA.]

Supplementary Figure 8

[Figure: panels (A) From H1-hESC and (B) From K562. Columns: Genes; Log odds ratio and 95% CI. Gene counts recoverable from the extraction:
(A) H1-hESC: DHS 4745 / 5020; TFBS 4636 / 4920; partially recovered: H3K4Me… / 3288; CTCF 1797 / …; … / 3941; H3K4Me… / 4405; H3K9Me3 273 / ….
(B) K562: DHS 4750 / 5060; TFBS 5177 / 5511; partially recovered: H3K4Me… / 2752; CTCF 2177 / …; … / 4457; H3K4Me… / 3380; H3K9Me3 628 / ….]

Figure S8. Enrichment of regulatory marks in the first intron in two additional cell lines. Please refer to the legend for Figure S7. Log-odds ratio analysis is performed for enrichment of regulatory signals in conserved regions in the first intron in (A) the H1-hESC cell line and (B) the K562 cell line.
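A log odds ratio with a 95% confidence interval, as plotted in Figure S8, is commonly obtained from a 2x2 contingency table with the standard error sqrt(1/a + 1/b + 1/c + 1/d). The sketch below assumes that approach; the source does not state how its table was built or which interval method was used, and the cell meanings in the comments are one plausible layout rather than the authors' definition.

```python
import math


def log_odds_ratio_ci(a, b, c, d, z=1.96):
    """Log odds ratio and a normal-approximation CI for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): a = conserved sites with signal,
    b = conserved sites without signal, c = non-conserved sites with signal,
    d = non-conserved sites without signal.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or - z * standard_error, log_or + z * standard_error


# Hypothetical toy counts, not taken from the figure.
log_or, lower, upper = log_odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes zero would indicate enrichment (or depletion) at roughly the 5% level under this approximation.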

Supplementary Figure 9

[Figure: (A) histogram and box plot of first intron length across transcripts; x-axis: first intron length (0 to 25k); y-axis: frequency; a "Median ≤" annotation whose value is missing. (B) Panels for Conservation, DHS, TFBS, H3K4Me1, H3K4Me3, CTCF, and H3K9Me3; x-axis: bins B1 to B5 (5’ to 3’); y-axis: % the highest bins.]

Figure S9. Five prime to three prime biases in signal density along the first intron. (A) Schematic illustrating data preparation. Genes harboring short first introns (shorter than the median first-intron length) are excluded. (B) The proportions of various signal densities are estimated over the entire first intron. The first intron is divided into five equal-sized bins. Then the fraction of each signal is estimated for each bin, and the fraction of introns in which the highest signal falls in a particular bin is shown.
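The binning procedure in panel (B) can be sketched as below. This is a simplification under stated assumptions: signals are represented as point positions (0-based offsets from the 5' end of the intron) rather than covered bases, ties for the highest bin go to the 5'-most bin, and introns with no signal are skipped. None of these choices is given in the source.

```python
def bin_fractions(intron_length, signal_positions, n_bins=5):
    """Fraction of an intron's signal falling in each of n_bins equal-sized bins."""
    counts = [0] * n_bins
    for position in signal_positions:
        if 0 <= position < intron_length:
            counts[position * n_bins // intron_length] += 1
    total = sum(counts)
    return [count / total for count in counts] if total else [0.0] * n_bins


def highest_bin_share(introns, n_bins=5):
    """Fraction of introns whose highest-signal bin is B1, B2, ..., Bn."""
    wins = [0] * n_bins
    scored = 0
    for intron_length, signal_positions in introns:
        fractions = bin_fractions(intron_length, signal_positions, n_bins)
        peak = max(fractions)
        if peak == 0:
            continue  # no signal in this intron
        wins[fractions.index(peak)] += 1  # index() returns the 5'-most bin on ties
        scored += 1
    return [win / scored for win in wins] if scored else [0.0] * n_bins


# Hypothetical toy introns: (length in bp, signal offsets from the 5' end).
toy_introns = [
    (100, [5, 10, 95]),
    (200, [150, 155, 10]),
    (50, [1, 2, 3]),
]
shares = highest_bin_share(toy_introns)
```

For the toy introns, two peak in B1 and one in B4, so `shares` is approximately `[0.667, 0, 0, 0.333, 0]`.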

Supplementary Figure 10

[Figure: (A) 14 different ranking patterns in the sizes of the histone mark signals located in the promoter (5’FR), 1st exon, and 1st intron, with candidates for spillovers marked; table of the numbers of transcripts corresponding to each pattern for each signal (columns: CpG islands, DHS, TFBS, H3K4Me1, H3K4Me3, H3K27Ac, CTCF, H3K9Me3, H3K27Me3; pattern labels and counts missing). (B) x-axis: introns grouped by their ordinal positions (1st to 20th); y-axis: % conserved sites; stars mark p-values below a threshold (value missing) in one-sided Wilcoxon rank sum tests between the first intron and the other downstream introns (2nd~20th). (C) Panels for the 1st to 10th introns; x-axis: groups of genes containing each number of exons (tick labels G5, G15); y-axis: % conserved sites in introns of each ordinal position. Fitted lines as extracted, in order: y=0.16x (R² = 0.61); y=0.05x (R² = 0.29); y=0.07x (R² = 0.32); y=0.02x (R² = 0.03); y=0.05x (R² = 0.10); y=0.08x (R² = 0.14); y=0.08x (R² = 0.19); y=0.05x (R² = 0.07); y=0.03x (R² = 0.04); y=-0.11x (R² = [missing]).]

Supplementary Figure 10 (D)

[Figure: panels for the 1st to 5th introns; marks: DHS, TFBS, H3K4me1, H3K4me3, CTCF, H3K9me3. x-axis: groups of genes containing different numbers of exons (tick labels G5, G15); y-axis: % signals in introns of each ordinal position. Fitted lines as extracted, in order: y=0.17x (R² = 0.75); NA; y=0.12x (R² = 0.28); NA; y=1.21x (R² = 0.63); NA; y=1.10x (R² = 0.61); NA.]

Figure S10. Excluding spillover of signals from the promoter. (A) The top panel illustrates the spillover definition. Briefly, the sizes of the signal proportions are ranked among the promoter, first exon, and first intron in a transcript. For example, a transcript with the highest proportion of a signal in the promoter, the next lower proportion in the first exon, and the smallest proportion in the first intron is defined as a ‘P123’ set, and a transcript with the same level of the proportion in all three structures is defined as a ‘P111’ set. A total of 14 different sets are defined by this ranking strategy, and five sets, i.e., P111, P112, P212, P122, and P123, are considered spillovers. The bottom table shows the numbers of transcripts corresponding to each pattern, where the sets colored red indicate spillovers. (B) Rebuilt Figure S1 after removing the introns with potential spillover, (C) rebuilt Figure S4 after excluding potential spillover cases, and (D) rebuilt Figure S5 after excluding potential spillover cases.
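The pattern labels in the Figure S10 legend can be reproduced with a dense ranking of the three signal proportions, where rank 1 is the highest and tied values share a rank. That reading is an assumption inferred from the examples ‘P123’ and ‘P111’ and from the listed spillover sets; the function names and toy values below are illustrative. Note that dense ranking of three values yields 13 distinct labels, so the fourteenth set mentioned in the legend is not captured by this sketch.

```python
# Spillover sets as listed in the Figure S10 legend.
SPILLOVER_PATTERNS = {"P111", "P112", "P212", "P122", "P123"}


def ranking_pattern(promoter, first_exon, first_intron):
    """Label such as 'P123': dense ranks (1 = highest) of promoter, exon, intron."""
    values = (promoter, first_exon, first_intron)
    distinct_descending = sorted(set(values), reverse=True)
    ranks = [distinct_descending.index(value) + 1 for value in values]
    return "P" + "".join(str(rank) for rank in ranks)


def is_spillover(promoter, first_exon, first_intron):
    """True when the ranking pattern is one of the legend's spillover sets."""
    return ranking_pattern(promoter, first_exon, first_intron) in SPILLOVER_PATTERNS


# Hypothetical toy proportions of one signal in promoter, 1st exon, 1st intron.
examples = {
    "decreasing": ranking_pattern(0.5, 0.3, 0.1),
    "all_equal": ranking_pattern(0.2, 0.2, 0.2),
    "intron_highest": ranking_pattern(0.1, 0.2, 0.6),
}
```

Exact equality is used to detect ties here; with real floating-point proportions a tolerance or rounding step would probably be needed.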

Supplementary Figure 11

[Figure: (A) schematic of a gene (5’FR, 1st exon, 1st intron, 2nd exon, 2nd intron) on the sense strand, with other genes (5’FR, exons, 3’FR) overlapping it on the sense and antisense strands. (B) x-axis: introns grouped by their ordinal positions (1st to 20th); y-axis: % conserved sites. (C) Panels for the 1st to 10th introns; x-axis: groups of genes containing each number of exons (tick labels G5, G15); y-axis: % conserved sites in introns of each ordinal position. Fitted lines as extracted, in order: y=0.07x (R² = 0.37); y=0.04x (R² = 0.65); y=0.03x (R² = 0.24); y=0.02x (R² = 0.12); y=0.02x (R² = 0.17); y=0.05x (R² = 0.29); y=0.04x (R² = 0.38); y=0.05x (R² = 0.27); y=0.01x (R² = 0.01); y=0.00x (R² = [missing]).]

Supplementary Figure 11 (D)

[Figure: panels for the 1st to 5th introns; marks: DHS, TFBS, H3K4me1, H3K4me3, CTCF, H3K9me3. x-axis: groups of genes containing different numbers of exons (tick labels G5, G15); y-axis: % signals in introns of each ordinal position. Fitted lines as extracted, in order: y=0.17x (R² = 0.68); NA; y=0.30x (R² = 0.69); NA; y=1.76x (R² = 0.64); NA; y=1.80x (R² = 0.50); NA.]

Figure S11. Excluding genes whose first introns overlap with exons or flanking regions of other genes. (A) Schematic showing the possible structural overlaps among different genes. (B) Rebuilt Figure S1B from the “non-overlapped” dataset, (C) rebuilt Figure 4 from the “non-overlapped” dataset, and (D) rebuilt Figure S5 from the “non-overlapped” dataset.

Supplementary Figure 12

[Figure: (A) gene schematic (TSS, 1st exon, 1st intron, 2nd exon, 2nd intron) and histograms of TSS-distances from first introns and from second introns; x-axis: distances (bp); y-axis: frequency.]

Figure S12. Analyzing the effect of proximity to the TSS. (A) Histograms showing the overlap in the distributions of distance from the TSS for the first and second introns. Please refer to the legends of Figure S8 for (B) and (C). (B) The same analysis as for Figure S8 in the H1-hESC cell line, and (C) the same analysis as for Figure S8 in the K562 cell line. Panels (B) and (C) are on the next page.

Supplementary Figure

[Figure: panels labeled From H1-hESC and From K562, each with sub-panels (A), (B), and (C) as extracted. 1st versus 2nd introns are compared for Conservation, DHS, TFBS, H3K4me1, and H3K4me3 within distance ranges A to E (500~600, 600~700, 700~800, 800~900, and 900~1000 bp); rows: number of 1st introns, number of 2nd introns (values missing). One-sided Wilcoxon rank sum tests between 1st introns and 2nd introns in the same ranges of distance; p-values as extracted: From H1-hESC: Conservation 0.00, DHS 0.00, TFBS 0.00, H3K4me1 [missing], H3K4me3 [missing]; From K562: Conservation 0.00, DHS 0.00, TFBS [missing], H3K4me1 [missing], H3K4me3 [missing].]