Chromosome Architecture January 2015
Motivation
- Regulation of gene expression
- Gene complexes
- DNA replication and repair
- Recombination
- Epigenetic symptoms
Chromosome conformation structure
- The restriction enzyme is selected according to the size of the loci examined
Methods
- 3C – ligation, reversal of cross-links, quantification using PCR with primers against the ligation sites
- 4C – another round of restriction digest, intermolecular ligation, inverse PCR with primers against the site near the restriction site
- 5C – multiplexed ligation-mediated amplification (LMA) using a more general primer
- All require choosing the target loci in advance
Hi-C
- Same first 3 steps as the previous methods
- Biotin is used to recognize ligated sites
- Massively parallel DNA sequencing
Problems
- Centromeres
- Heterochromatin vs. euchromatin
Statistical biases
- Neighboring DNA due to incomplete digestion
- Same-fragment re-ligation
- Size of fragment
- Nucleotide composition
- Chromatin mobility
- Other epigenetic features (replication time, trans-factors, etc.)
- Actually reflects a competitive process of ligation
Results
Domains
- Not necessarily compacted
Classifications
- Active domains: A1-A4
- Null domains: B1-B2
Boundaries
- Some DNA elements are found more often than expected at random at domain boundaries (insulators)
- Some DNA elements are found more often than expected at random inside domains (mainly histone modifications)
[Figure panels: HK, hetero, CTCF, pseudogenes, MAFK, genes, ZNF143]
isValley test
- There are 161 transcription factors
- Hundreds of other DNA elements
- We need a test to distinguish which elements are more likely to be found at domain boundaries and which inside the domain (valley-shaped diagram)
- We take the average of the ends, compare it to the average of the middle, and check whether the difference is significant (as a function of the total count)
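The comparison described above can be sketched as follows. This is a minimal illustration, not the original isValley code: the function name `is_valley`, its parameters, and the decision rule (the difference must exceed the total count divided by a "part of count" value) are assumptions, as is the binning of positions by percentage of domain length.

```python
from typing import Sequence, Tuple


def is_valley(counts: Sequence[float],
              ends_pct: float = 10.0,
              middle_pct: Tuple[float, float] = (40.0, 60.0),
              part_of_count: float = 1000.0) -> bool:
    """Return True if an element is enriched at the domain ends.

    counts[i] is the number of occurrences of one DNA element in bin i,
    where bins run from the domain start (index 0) to the domain end.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("counts must not be empty")
    total = sum(counts)
    if total == 0:
        return False

    # Translate the percentage ranges into bin indices.
    k = max(1, round(n * ends_pct / 100.0))
    lo = int(n * middle_pct[0] / 100.0)
    hi = max(lo + 1, int(n * middle_pct[1] / 100.0))

    ends = list(counts[:k]) + list(counts[n - k:])
    middle = list(counts[lo:hi])
    avg_ends = sum(ends) / len(ends)
    avg_middle = sum(middle) / len(middle)

    # Assumed decision rule: the threshold scales with the total count.
    return (avg_ends - avg_middle) > total / part_of_count


# Example: a profile that is high at both ends and low in the middle.
valley_profile = [50] * 10 + [10] * 80 + [50] * 10
flat_profile = [10] * 100
print(is_valley(valley_profile), is_valley(flat_profile))
```

Scaling the threshold with the total count keeps abundant elements from passing the test on small relative differences alone.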
isValley test
- Randomizing the counts and running isValley on them returned interesting results
- Ends: 0-10, 90-100; middle: 40-60

Part of count    P-value (simulated)    Transcription factors returning true (out of 161)
1000                                    82
2000                                    117
3000             0.01                   127
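A randomized control of this kind can be sketched as below: counts are scattered uniformly over the bins, the ends-versus-middle difference is computed, and the fraction of random profiles that pass the threshold estimates the false-positive rate. The helper names, the 100-bin layout, and the threshold rule (total count divided by the "part of count" value) are assumptions made for illustration.

```python
import random


def valley_difference(counts, ends_bins=10, middle=(40, 60)):
    """Average of the first and last `ends_bins` bins minus the middle average."""
    ends = counts[:ends_bins] + counts[-ends_bins:]
    mid = counts[middle[0]:middle[1]]
    return sum(ends) / len(ends) - sum(mid) / len(mid)


def simulated_false_positive_rate(total, part_of_count,
                                  n_bins=100, n_trials=500, seed=0):
    """Fraction of uniformly random profiles that pass the assumed threshold."""
    rng = random.Random(seed)
    threshold = total / part_of_count
    hits = 0
    for _ in range(n_trials):
        # Scatter `total` occurrences uniformly over the bins.
        counts = [0] * n_bins
        for _ in range(total):
            counts[rng.randrange(n_bins)] += 1
        if valley_difference(counts) > threshold:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    for part in (1000, 2000, 3000):
        rate = simulated_false_positive_rate(total=2000, part_of_count=part)
        print(part, rate)
```

The total of 2000 used in the example loop is arbitrary; the estimate depends on the real total count of each element.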
P-value of isValley test
- Under the assumption of a uniform distribution we can calculate it rigorously
- For a count derived from a uniform distribution, the chance of returning true is:
P-value of isValley test
- Now, the average of n cells drawn from a uniform distribution is a random variable
- The difference between avg(ends) and avg(middle) is:
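One way to make this calculation concrete, under stated assumptions, is the derivation below. It assumes that N occurrences fall independently and uniformly into B equal bins (so each bin has probability p = 1/B), that the end bins and middle bins are disjoint, and that a normal approximation is acceptable. The symbols N, B, p, n_e, n_m, D and t are introduced here for illustration and may differ from the notation originally used.

```latex
% Assumption: N counts fall independently and uniformly into B bins, p = 1/B.
% X_i is the count in bin i; n_e end bins and n_m middle bins are disjoint.
\begin{align*}
\mathbb{E}[X_i] &= Np, \qquad
\operatorname{Var}(X_i) = Np(1-p), \qquad
\operatorname{Cov}(X_i, X_j) = -Np^2 \quad (i \neq j) \\
D &= \frac{1}{n_e}\sum_{i \in \text{ends}} X_i
   \;-\; \frac{1}{n_m}\sum_{j \in \text{middle}} X_j \\
\mathbb{E}[D] &= 0 \\
\operatorname{Var}(D) &= \Big(\frac{Np}{n_e} - Np^2\Big)
   + \Big(\frac{Np}{n_m} - Np^2\Big) + 2Np^2
   = Np\Big(\frac{1}{n_e} + \frac{1}{n_m}\Big) \\
\Pr(D > t) &\approx 1 - \Phi\!\left(
   \frac{t}{\sqrt{Np\,(1/n_e + 1/n_m)}}\right)
\end{align*}
```

For example, with B = 100 one-percent bins, ends covering 0-10 and 90-100 (n_e = 20) and a middle covering 40-60 (n_m = 20), this gives Var(D) = N/1000. If the threshold is read as t = N/c for a "part of count" value c, the standardized threshold becomes sqrt(1000 N)/c, so under these assumptions the false-positive rate for a fixed c falls as the total count grows.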
Higher resolution
- Kilobase resolution instead of megabase
- 6 subcompartments with distinct patterns of histone modifications – relations between domains in the same compartment
- Diploid Hi-C maps reveal homolog-specific features (especially on the X chromosome)
Loops
- Searching for pairs of loci with significantly higher values than their neighbors
- Comparing results with 3D-FISH against a control point
- Consistent with earlier lists of loops
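The search for locus pairs that stand out from their neighbors can be illustrated with a deliberately simplified sketch. It is not the published peak-calling procedure: the function name `find_peaks`, the square neighborhood, and the fixed fold-change cutoff (used here in place of a proper significance test) are all assumptions.

```python
def find_peaks(matrix, window=2, fold=2.0, min_value=1.0):
    """Return (i, j) pairs, i < j, whose contact value exceeds `fold`
    times the mean of the surrounding (2*window + 1)^2 neighborhood."""
    n = len(matrix)
    peaks = []
    for i in range(n):
        for j in range(i + 1, n):
            value = matrix[i][j]
            if value < min_value:
                continue  # skip pixels with too few contacts
            # Collect neighboring pixels, clipped at the matrix edges.
            neighbours = [
                matrix[a][b]
                for a in range(max(0, i - window), min(n, i + window + 1))
                for b in range(max(0, j - window), min(n, j + window + 1))
                if (a, b) != (i, j)
            ]
            background = sum(neighbours) / len(neighbours)
            if value > fold * background:
                peaks.append((i, j))
    return peaks


# Example: a flat 10 x 10 contact map with one enriched locus pair.
contacts = [[1.0] * 10 for _ in range(10)]
contacts[2][7] = contacts[7][2] = 20.0
print(find_peaks(contacts))
```

A real contact map would also need correction for the decay of contact frequency with genomic distance before a local comparison like this is meaningful.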
Loops
- ~10,000 found by the “peak loci test”
- Frequently link promoters and enhancers
- CTCF sites are found in convergent orientation
- Conserved over evolution
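The orientation statement can be checked on a list of loops once each anchor has been assigned a CTCF motif strand. The sketch below assumes a hypothetical input format, one (upstream strand, downstream strand) tuple per loop, and uses the usual convention that a forward motif at the upstream anchor facing a reverse motif at the downstream anchor is convergent.

```python
from collections import Counter


def classify_ctcf_pair(upstream_strand: str, downstream_strand: str) -> str:
    """Classify the relative orientation of the CTCF motifs at two loop anchors."""
    pair = (upstream_strand, downstream_strand)
    if pair == ("+", "-"):
        return "convergent"   # motifs point toward each other
    if pair == ("-", "+"):
        return "divergent"    # motifs point away from each other
    if pair in (("+", "+"), ("-", "-")):
        return "tandem"       # motifs point the same way
    raise ValueError(f"unexpected strands: {pair!r}")


def orientation_fractions(loops):
    """Return the fraction of loops in each orientation class."""
    tally = Counter(classify_ctcf_pair(up, down) for up, down in loops)
    total = sum(tally.values())
    return {kind: count / total for kind, count in tally.items()}


# Hypothetical input: strands of the motifs at the two anchors of each loop.
example_loops = [("+", "-"), ("+", "-"), ("-", "+"), ("+", "+")]
print(orientation_fractions(example_loops))
```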
Future steps
- Testing DNA elements by compartments
- Testing DNA elements by loop domains and non-loop domains