Supplementary Figure S1 Distribution of observed (blue) and Poisson expected (red) standard deviation of human-chimpanzee divergence over different window sizes. The observed variation is consistently larger than expected, but sampling variance starts to increase rapidly in windows smaller than ~250 kb.
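As a rough illustration of the Poisson expectation in Figure S1, the sketch below assumes that substitutions fall independently at a uniform per-base rate, so that the substitution count in a window of n aligned bases is Poisson with mean n·d. The function name and the example rate are hypothetical and are not taken from the analysis itself.

```python
import math


def poisson_expected_sd(mean_divergence: float, window_bp: int) -> float:
    """Standard deviation of per-base divergence in one window.

    Assumption (not stated in the caption): the substitution count in a
    window is Poisson with mean window_bp * mean_divergence, so its
    variance equals its mean. Dividing the count by window_bp gives a
    per-base rate whose SD is sqrt(mean_divergence / window_bp).
    """
    if window_bp <= 0:
        raise ValueError("window_bp must be positive")
    if mean_divergence < 0:
        raise ValueError("mean_divergence must be non-negative")
    return math.sqrt(mean_divergence / window_bp)


if __name__ == "__main__":
    # Illustrative per-base rate only; substitute the measured mean divergence.
    example_rate = 0.01
    for size in (100_000, 250_000, 1_000_000):
        print(size, poisson_expected_sd(example_rate, size))
```

Under this model the expected standard deviation scales as the inverse square root of the window size, which is why sampling noise grows quickly as windows shrink.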
Supplementary Figure S2 Co-variation of the divergence rate of different sequence classes in sliding 1 Mb windows. CpG and non-CpG divergence are highly correlated, as are repetitive and non-repetitive sequence divergence.
Supplementary Figure S3 Correlation between human-chimpanzee divergence and distance to the closest telomere for 1 Mb windows on metacentric (a) or acrocentric (b) chromosomes. Each dot corresponds to a unique 1 Mb window. The color of each dot represents its mean recombination rate (red = highest, dark blue = lowest).
Supplementary Figure S4 The ratio of human-chimpanzee non-CpG divergence over mouse-rat divergence vs. the ratio of human GC-content over mouse GC-content across 199 syntenic blocks greater than 1 Mb. Hominid-specific acceleration in subtelomeric blocks is evident even when ignoring CpG sites.
Supplementary Figure S5 Length distribution of small indels (< 15 kb) detected within scaffolds (a and d) or contigs only (b and c). For chimpanzee “insertions”, the former is an overestimate of the number of actual indels due to assembly artifacts, whereas the latter is an underestimate due to the small contig size.
Supplementary Figure S6 Size distribution of segmental duplications detected in the chimpanzee genome.
Supplementary Figure S7 Cumulative distribution of Ka/Ki values for 13,454 orthologs as observed (blue), as expected if all orthologs evolved at Ka/Ki = 0.23 (green), and as expected if 23% of the codons evolved at Ka/Ki = 1 and the rest at Ka/Ki = 0 (red). There is a small excess of orthologs with Ka/Ki > 1 in the observed distribution, possibly indicating an enrichment of genes under strong positive selection.
Supplementary Figure S8 Median Ka/Ki over sliding 10-gene windows across chromosome 1. Three peaks, corresponding to the indicated gene clusters, are visible.
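The windowed median in Figure S8 can be sketched as follows. This is a minimal example that assumes the per-gene values are already ordered by chromosomal position and that the window advances one gene at a time; the step size is not given in the caption, and the function name is hypothetical.

```python
from statistics import median


def sliding_median(values, window=10):
    """Median of each run of `window` consecutive values, stepping by one.

    Assumptions: `values` holds one ratio per gene, sorted by position
    along the chromosome, and the window moves one gene per step.
    Returns an empty list when there are fewer values than `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        median(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]
```

A median is less sensitive than a mean to a single extreme gene, so a peak in this profile requires several neighboring genes with elevated values.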
Supplementary Figure S9 Fraction of ancestral (blue) and derived (green) alleles by frequency in the sample. More ancestral alleles are at high frequency and more derived alleles are at low frequency. The precise distributions are skewed by the minor allele frequency distribution and the ascertainment method for the SNPs, which favored moderate-frequency (5-25%) variants over very low or very high frequency variants.
Supplementary Figure S10 Probability of an allele being ancestral vs. frequency calculated for a constant-size population (no bottleneck), shown in red, a population having just undergone a bottleneck with b = 0.2, shown in green, and one having just undergone a bottleneck with b = 0.3, shown in blue.
Supplementary Figure S11 Distribution of observed pa(x) for European (CEPH, red), Asian (Japanese and Han Chinese, blue), and West African (Yoruba, green) HapMap samples in the ENCODE regions. Due to the small number of variants sampled, the scatter is large, but the trendlines for Europeans and Asians clearly have slope < 1 while that for West Africans is ~1.
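The slope comparison in Figure S11 can be illustrated with a simple linear fit. The caption does not say how the trendlines were fit, so ordinary least squares with a free intercept is an assumption here, and the function name is hypothetical.

```python
def trendline_slope(x, y):
    """Ordinary least-squares slope of y regressed on x.

    Assumption: an unweighted linear fit with a free intercept; the
    fitting procedure behind the published trendlines is not stated.
    x: allele frequencies; y: observed fraction of ancestral alleles.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx
```

A fitted slope near 1 means the fraction of ancestral alleles tracks allele frequency, whereas a slope below 1 indicates a flatter relationship, as described for the European and Asian samples.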
Supplementary Figure S12 Distribution of diversity-divergence scores (blue) sorted from highest to lowest along with their high-frequency derived allele skew p-values (green, negative log converted and scaled). Note the large separation in score of the first six regions from the remainder of the distribution. Dashed horizontal line shows the score cutoff for skews of p-value < 0.1.